Abstract: Segmentation is an important step in medical image
analysis and classification for radiological evaluation or computer
aided diagnosis. This paper presents the problem of inaccurate lung
segmentation as observed in algorithms presented by researchers
working in the area of medical image analysis. The different lung segmentation techniques were tested using a dataset of 19 patients consisting of a total of 917 images. We obtained datasets of 11 patients from Akron University, USA, and of 8 patients from Aga Khan Medical University, Pakistan. After testing the algorithms
against datasets, the deficiencies of each algorithm have been
highlighted.
Abstract: Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of regression models using the same learning algorithm as the base learner. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using an averaging methodology over bagging and boosting ensembles with 10 sub-learners each. We performed a comparison with simple bagging and boosting ensembles with 25 sub-learners on standard benchmark datasets, and the proposed ensemble gave better accuracy.
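The combination step described in this abstract, averaging a bagging ensemble and a boosting ensemble of 10 sub-learners each, can be sketched in pure Python. This is an illustrative sketch rather than the authors' implementation: the regression-stump base learner, 1-D inputs, and L2 (least-squares) boosting are assumptions made for brevity.

```python
import random

def fit_stump(xs, ys):
    """Fit a 1-D regression stump: a threshold with left/right mean predictions."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lm if x <= t else rm)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # degenerate data: fall back to a constant predictor
        m = sum(ys) / len(ys)
        return lambda x, m=m: m
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def bagging(xs, ys, n_learners=10, rng=None):
    """Average n_learners stumps, each fit on a bootstrap resample."""
    rng = rng or random.Random(0)
    models = []
    for _ in range(n_learners):
        idx = [rng.randrange(len(xs)) for _ in xs]  # bootstrap sample
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

def l2_boosting(xs, ys, n_learners=10, lr=0.5):
    """Additively fit stumps to the running residuals (least-squares boosting)."""
    models, resid = [], list(ys)
    for _ in range(n_learners):
        m = fit_stump(xs, resid)
        models.append(m)
        resid = [r - lr * m(x) for x, r in zip(xs, resid)]
    return lambda x: sum(lr * m(x) for m in models)

def averaged_ensemble(xs, ys):
    """The paper's idea in miniature: average the two ensembles' predictions."""
    bag, boost = bagging(xs, ys), l2_boosting(xs, ys)
    return lambda x: (bag(x) + boost(x)) / 2
```

On a noisy step function, the averaged prediction tracks both the bagged (variance-reduced) and boosted (bias-reduced) estimates.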
Abstract: Most existing text mining approaches are designed with the transaction database model in mind. Thus, the mined dataset is structured using just one concept, the "transaction", while the whole dataset is modeled using the "set" abstract type. In such cases, the structure of the whole dataset and the relationships among the transactions themselves are not modeled and, consequently, not considered in the mining process. We believe that taking the structural properties of hierarchically structured information (e.g. textual documents) into account in the mining process can lead to better results. For this purpose, a hierarchical association rule mining approach for textual documents is proposed in this paper, and the classical set-oriented mining approach is reconsidered in favor of a Directed Acyclic Graph (DAG) oriented approach. Natural language processing techniques are used to obtain the DAG structure. Based on this graph model, a hierarchical bottom-up algorithm is proposed. The main idea is that each node is mined together with its parent node.
Abstract: Frequent patterns are patterns, such as sets of features or items, that appear in data frequently. Finding such frequent patterns has become an important data mining task because it reveals associations, correlations, and many other interesting relationships hidden in a dataset. Most of the proposed frequent pattern mining algorithms have been implemented with imperative programming languages such as C, C++, and Java. The imperative paradigm is significantly inefficient when the itemset is large and the frequent pattern is long. We suggest a high-level declarative style of programming using a functional language. Our supposition is that the problem of frequent pattern discovery can be efficiently and concisely implemented via a functional paradigm, since pattern matching is a fundamental feature supported by most functional languages. Our frequent pattern mining implementation using the Haskell language confirms our hypothesis about the conciseness of the program. The performance studies on speed and memory usage support our intuition on the efficiency of a functional language.
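The frequent-pattern discovery task this abstract describes can be illustrated with a minimal Apriori-style levelwise sketch. The code below is in Python rather than the paper's Haskell, and the naive candidate generation (joining frequent itemsets of the previous level) is a simplification of production algorithms.

```python
def apriori(transactions, min_support):
    """Return all itemsets appearing in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]) for i in items
                if support(frozenset([i])) >= min_support}
    result = set(frequent)
    k = 2
    while frequent:
        # Levelwise step: join frequent (k-1)-itemsets, keep those still frequent.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = {c for c in candidates if support(c) >= min_support}
        result |= frequent
        k += 1
    return result
```

For example, with four transactions over items {a, b, c} and a minimum support of 2, the three singletons and three pairs are frequent, while {a, b, c} (support 1) is pruned.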
Abstract: This paper presents an approach based on the use of a supervised feed-forward neural network, namely a multilayer perceptron (MLP) neural network, and the finite element method (FEM) to solve the inverse problem of parameter identification. The approach is used to identify unknown parameters of ferromagnetic materials. The methodology used in this study consists of simulating a large number of parameters of a material under test using the finite element method (FEM). Variations in both the relative magnetic permeability and the electrical conductivity of the material under test are considered. The obtained results are then used to generate a set of vectors for training the MLP neural network. Finally, the trained neural network is used to evaluate a group of new materials, simulated by the FEM but not belonging to the original dataset. Noisy data added to the probe measurements are used to enhance the robustness of the method. The results obtained demonstrate the efficiency of the proposed approach and encourage future work on this subject.
Abstract: Quick training algorithms and accurate solution procedures for incremental learning aim at improving the efficiency of SVR training, but each has disadvantages: the former fail to converge for a changeable training set, and the latter are inefficient for massive datasets. To handle these problems, a new training algorithm for a changeable training set, named the Approximation Incremental Training Algorithm (AITA), is proposed. This paper explores the reason for the nonconvergence theoretically, discusses the realization of AITA, and finally demonstrates the benefits of AITA in both precision and efficiency.
Abstract: Genetic Folding (GF), a new class of evolutionary algorithm (EA), is introduced for the first time. It is based on chromosomes composed of floating genes structurally organized in a parent form and separated by dots. The genotype/phenotype system of GF generates a kernel expression, which serves as the objective function of the classifier. In this work, the question of satisfying mapping rules in evolving populations is addressed by analyzing populations that follow either Mercer's rule or non-Mercer rules. The results presented here show that populations following Mercer's rule practically improve model selection for the Support Vector Machine (SVM). The experiments train a multi-classification problem and test it on the nonlinear Ionosphere dataset. The aim of this paper is to answer the question of whether kernels evolved by Genetic Folding must satisfy Mercer's rule in order to be applied to complicated domains and problems.
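A practical numerical proxy for the Mercer condition discussed above is to check that a kernel's Gram matrix on sample points is positive semidefinite. The sketch below is an illustration, not part of the Genetic Folding system; the RBF kernel, 1-D inputs, and tolerance are assumptions.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel on scalars: a known Mercer kernel, used as the positive case."""
    return math.exp(-gamma * (x - y) ** 2)

def gram(xs, kernel):
    """Gram matrix of the kernel on the sample points."""
    return [[kernel(a, b) for b in xs] for a in xs]

def is_psd(G, tol=1e-9):
    """Cholesky-style positive-semidefiniteness check of a symmetric matrix."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = G[i][i] - s
                if d < -tol:
                    return False  # negative pivot: not PSD
                L[i][j] = math.sqrt(max(d, 0.0))
            else:
                if L[j][j] > tol:
                    L[i][j] = (G[i][j] - s) / L[j][j]
                elif abs(G[i][j] - s) > tol:
                    return False  # zero pivot with nonzero column: not PSD
    return True
```

A Gram matrix built from the RBF kernel passes the check, while an indefinite matrix such as [[0, 1], [1, 0]] fails it.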
Abstract: Software estimation accuracy is among the greatest challenges for software developers. This study aimed at building and evaluating a neuro-fuzzy model to estimate software project development time. Forty-one modules developed from ten programs were used as the dataset. Our proposed approach is compared with fuzzy logic and neural network models, and the results show that the MMRE (Mean Magnitude of Relative Error) obtained with the neuro-fuzzy model was substantially lower than the MMRE obtained with fuzzy logic and with the neural network.
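The MMRE measure used above is the mean over all projects of |actual − predicted| / actual. A minimal sketch (the example effort values are made up for illustration, not from the study's dataset):

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |a - p| / a over all cases."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Relative errors 0.2, 0.1, 0.1 -> MMRE = 0.4 / 3 ~= 0.133
score = mmre([10.0, 20.0, 40.0], [12.0, 18.0, 44.0])
```

Lower MMRE means the estimates are relatively closer to the actual development times.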
Abstract: The heterogeneity of solid waste characteristics, as well as the complex processes taking place within the landfill ecosystem, motivated the implementation of soft computing methodologies such as artificial neural networks (ANN), fuzzy logic (FL), and their combination. The present work uses a hybrid ANN-FL model that employs knowledge-based FL to describe the process qualitatively and implements the learning algorithm of ANN to optimize the model parameters. The model was developed to simulate and predict the landfill gas production at a given time based on operational parameters. The experimental data used were compiled from a lab-scale experiment that involved various operating scenarios. The developed model was validated and statistically analyzed using the F-test, linear regression between actual and predicted data, and mean squared error measures. Overall, the simulated landfill gas production rates demonstrated reasonable agreement with the actual data. The discussion focuses on the effect of the size of the training datasets and the number of training epochs.
Abstract: In this paper we designed and implemented a new ensemble of classifiers, based on a sequence of classifiers each specialized in the regions of the training dataset where the errors of its previously trained counterparts are concentrated. In order to separate these regions, and to determine the aptitude of each classifier to respond properly to a new case, another set of hierarchically built classifiers was used. We explored a selection-based variant to combine the base classifiers. We validated this model with different base classifiers using 37 training datasets. A statistical comparison of these models with the well-known Bagging and Boosting was carried out, obtaining significantly superior results with the hierarchical ensemble using the Multilayer Perceptron as base classifier. Therefore, we demonstrated the efficacy of the proposed ensemble, as well as its applicability to general problems.
Abstract: Recently, information security has become a key issue in information technology, as computer systems are exposed to an increasing number of security threats. A variety of intrusion detection systems (IDS) have been employed over the last decades to protect computers and networks from malicious network-based or host-based attacks, using approaches ranging from traditional statistical methods to new data mining techniques. However, today's commercially available intrusion detection systems are signature-based and thus not capable of detecting unknown attacks. In this paper, we present a new learning algorithm for an anomaly-based network intrusion detection system using a decision tree algorithm that distinguishes attacks from normal behaviors and identifies different types of intrusions. Experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that the proposed learning algorithm achieved a 98% detection rate (DR), in comparison with other existing methods.
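The detection rate (DR) reported above is conventionally the fraction of actual attacks that the system flags as attacks. A small sketch of that metric (the labels below are toy values, not drawn from KDD99):

```python
def detection_rate(y_true, y_pred, attack=1):
    """Fraction of true attacks (label == attack) that the IDS also flags."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == attack]
    return sum(1 for p in flagged if p == attack) / len(flagged)

# 4 true attacks, 3 detected -> DR = 0.75
dr = detection_rate([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

Note that DR alone says nothing about false alarms; IDS evaluations usually report the false-positive rate alongside it.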
Abstract: In recent work on mixture discriminant analysis (MDA), the expectation-maximization (EM) algorithm is used to estimate the parameters of Gaussian mixtures. However, the initial values given to the EM algorithm affect the final parameter estimates. Moreover, when the EM algorithm is applied twice to the same dataset, it can give different parameter estimates, which affects the classification accuracy of MDA. To overcome this problem, we use the Self Organizing Mixture Network (SOMN) algorithm to estimate the parameters of the Gaussian mixtures in MDA, since SOMN is more robust when random initial parameter values are used [5]. We show the effectiveness of this method on the popular simulated waveform datasets and the real glass dataset.
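The initialization sensitivity discussed above can be seen in even the smallest EM setting. The sketch below is a toy 1-D, two-component mixture with fixed unit variances (all assumptions for brevity, not the MDA formulation): with well-separated data and a reasonable initialization it recovers the component means, while with closer components or higher dimensions different initial means can land in different local optima.

```python
import math

def em_1d(data, init_means, n_iter=50):
    """Tiny EM for a two-component 1-D Gaussian mixture with unit variances.
    Returns (mixing weights, component means); the result depends on init_means."""
    w = [0.5, 0.5]
    means = list(init_means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w[k] * math.exp(-0.5 * (x - means[k]) ** 2) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights and means from the responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
    return w, means
```

Running the sketch twice with different `init_means` is an easy way to observe the instability that motivates replacing random-start EM with SOMN.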
Abstract: Clustering algorithms help to understand the hidden
information present in datasets. A dataset may contain intrinsic and
nested clusters, the detection of which is of utmost importance. This
paper presents a Distributed Grid-based Density Clustering algorithm
capable of identifying arbitrary shaped embedded clusters as well as
multi-density clusters over large spatial datasets. For handling
massive datasets, we implemented our method using a 'shared-nothing' architecture where multiple computers are interconnected
over a network. Experimental results are reported to establish the
superiority of the technique in terms of scale-up, speedup as well as
cluster quality.
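The core idea of grid-based density clustering can be sketched in a few lines: bin the points into grid cells, keep only the cells that are dense enough, and merge neighboring dense cells into clusters. The sketch below is a single-machine toy with a uniform grid and 8-neighborhood merging; it is an illustration of the general technique, not the paper's distributed algorithm, and the cell size and density threshold are arbitrary.

```python
from collections import deque

def grid_density_cluster(points, cell=1.0, min_pts=3):
    """Cluster 2-D points by merging adjacent dense grid cells."""
    cells = {}
    for x, y in points:
        cells.setdefault((int(x // cell), int(y // cell)), []).append((x, y))
    dense = {c for c, pts in cells.items() if len(pts) >= min_pts}
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        # BFS over the 8-neighborhood of dense cells to form one cluster
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            comp.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(comp)
    return clusters
```

Points in sparse cells are treated as noise and excluded, which is how such methods pick out embedded clusters of arbitrary shape.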
Abstract: Real-time object tracking is a problem which involves the extraction of critical information from complex and uncertain image data. In this paper, we present a comprehensive methodology to design an artificial neural network (ANN) for a real-time object tracking application. The object tracked for the purpose of demonstration is a specific airplane. However, the proposed ANN can be trained to track any other object of interest. The ANN has been simulated and tested on the training and testing datasets, as well as on a real-time streaming video. The tracking error is analyzed with a post-regression analysis tool, which finds the correlation between the calculated coordinates and the correct coordinates of the object in the image. The encouraging results from the computer simulation and analysis show that the proposed ANN architecture is a good candidate solution to the real-time object tracking problem.
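The post-regression analysis mentioned above amounts to correlating the network's predicted coordinates with the ground-truth coordinates. The Pearson correlation coefficient it relies on can be computed as follows (a sketch; the coordinate lists in the test are made-up values):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A coefficient near 1 between predicted and true coordinates indicates the tracker follows the object closely.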
Abstract: This paper presents a methodology based on machine learning approaches for a short-term rain forecasting system. Decision Tree, Artificial Neural Network (ANN), and Support Vector Machine (SVM) techniques were applied to develop classification and prediction models for rainfall forecasts. The goals of this presentation are to demonstrate (1) how feature selection can be used to identify the relationships between rainfall occurrences and other weather conditions, and (2) what models can be developed and deployed for producing accurate rainfall estimates to support decisions to launch cloud seeding operations in the northeastern part of Thailand. Datasets were collected during 2004-2006 from the Chalermprakiat Royal Rain Making Research Center at Hua Hin, Prachuap Khiri Khan, the Chalermprakiat Royal Rain Making Research Center at Pimai, Nakhon Ratchasima, and the Thai Meteorological Department (TMD). A total of 179 records with 57 features were merged and matched by unique date. There are three main parts in this work. Firstly, a decision tree induction algorithm (C4.5) was used to classify the rain status as either rain or no-rain. The classification tree achieves an overall accuracy of 94.41% with five-fold cross validation. The C4.5 algorithm was also used to classify the rain amount into three classes, no-rain (0-0.1 mm), few-rain (0.1-10 mm), and moderate-rain (>10 mm), where the classification tree achieves an overall accuracy of 62.57%. Secondly, an ANN was applied to predict the rainfall amount, and the root mean square error (RMSE) was used to measure the training and testing errors of the ANN. It is found that the ANN yields a lower RMSE, at 0.171, for daily rainfall estimates when compared to next-day and next-2-day estimation. Thirdly, the ANN and SVM techniques were also used to classify the rain amount into the same three classes, no-rain, few-rain, and moderate-rain. The results achieved 68.15% and 69.10% overall accuracy of same-day prediction for the ANN and SVM models, respectively. The obtained results illustrate the comparative predictive power of the different methods for rainfall estimation.
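The five-fold cross validation behind the accuracy figures above partitions the records into five disjoint test folds, training on the rest each time. A minimal index-splitting sketch (round-robin assignment is an arbitrary choice here; real evaluations usually shuffle or stratify first):

```python
def k_fold_indices(n, k=5):
    """Partition indices 0..n-1 into k folds; return (train, test) index lists."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)  # round-robin assignment to folds
    splits = []
    for j in range(k):
        test = folds[j]
        train = [i for f, fold in enumerate(folds) if f != j for i in fold]
        splits.append((train, test))
    return splits
```

With the 179 records mentioned in the abstract and k = 5, every record appears in exactly one test fold, and each model trains on the remaining records.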
Abstract: This research paper presents a framework for building up a malware dataset. Many researchers spend a long time cleaning a dataset of noise or transforming it into a format that can be used straight away for testing. Therefore, this research proposes a framework to help researchers speed up the malware dataset cleaning processes, so that the dataset can later be used for testing. It is believed that an efficient malware dataset cleaning process can improve the quality of the data and thus help to improve the accuracy and efficiency of the subsequent analysis. Apart from that, an in-depth understanding of the malware taxonomy is also important prior to and during the dataset cleaning processes. A new Trojan classification has been proposed to complement this framework. The experiment has been conducted in a controlled lab environment using the VxHeavens dataset. The framework is built on the integration of static and dynamic analyses, the incident response method, and knowledge discovery in databases (KDD) processes. This framework can be used as a basic guideline for malware researchers in building malware datasets.
Abstract: In this paper a combined feature selection method is proposed which takes advantage of sample domain filtering, resampling, and feature subset evaluation methods to reduce the dimensions of huge datasets and select reliable features. This method
utilizes both feature space and sample domain to improve the process
of feature selection and uses a combination of Chi squared with
Consistency attribute evaluation methods to seek reliable features.
This method consists of two phases. The first phase filters and
resamples the sample domain and the second phase adopts a hybrid
procedure to find the optimal feature space by applying Chi squared,
Consistency subset evaluation methods and genetic search.
Experiments on various sized datasets from UCI Repository of
Machine Learning databases show that the performance of five
classifiers (Naïve Bayes, Logistic, Multilayer Perceptron, Best First
Decision Tree and JRIP) improves simultaneously and the
classification error for these classifiers decreases considerably. The
experiments also show that this method outperforms other feature
selection methods.
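The Chi-squared attribute evaluation used in the hybrid above scores each feature by how far its observed contingency with the class labels deviates from independence. A sketch for a single categorical feature (the test values are toy data, not from the UCI experiments):

```python
from collections import Counter

def chi2_score(feature, labels):
    """Chi-squared statistic of a categorical feature against class labels:
    sum over contingency cells of (observed - expected)^2 / expected."""
    n = len(feature)
    obs = Counter(zip(feature, labels))
    f_tot, l_tot = Counter(feature), Counter(labels)
    stat = 0.0
    for f in f_tot:
        for l in l_tot:
            expected = f_tot[f] * l_tot[l] / n
            stat += (obs[(f, l)] - expected) ** 2 / expected
    return stat
```

A feature independent of the class scores near 0, while a perfectly predictive binary feature on a balanced 2x2 table scores n, so ranking by this statistic surfaces the most class-informative features.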
Abstract: Society has grown to rely on Internet services, and the
number of Internet users increases every day. As more and more
users become connected to the network, the window of opportunity
for malicious users to do their damage becomes very great and
lucrative. The objective of this paper is to incorporate different techniques into a classifier system to detect and classify intrusions apart from normal network packets. Among several techniques, a Steady State Genetic-based Machine Learning Algorithm (SSGBML) is used to detect intrusions, while the Steady State Genetic Algorithm (SSGA), the Simple Genetic Algorithm (SGA), a Modified Genetic Algorithm, and the Zeroth Level Classifier System are investigated in this research. SSGA is used as the discovery mechanism instead of SGA, because SGA replaces all old rules with newly produced rules, preventing good old rules from participating in the next rule generation. The Zeroth Level Classifier System plays the role of the detector by matching incoming environment messages against classifiers to determine whether the current message is normal or an intrusion, and by receiving feedback from the environment. Finally, in order to attain the best results, the modified SSGA enhances our discovery engine by using fuzzy logic to optimize the crossover and mutation probabilities. The
experiments and evaluations of the proposed method were performed
with the KDD 99 intrusion detection dataset.
Abstract: To improve the classification rate of face recognition, a features combination and a novel non-linear kernel are proposed. The feature vector concatenates local binary patterns of three different radii and Gabor wavelet features. The Gabor features are the mean, standard deviation, and skew for each scaling and orientation parameter. The aim of the new kernel is to combine the power of kernel methods with an optimal balance between the features. To verify the effectiveness of the proposed method, numerous methods are tested using four datasets, which consist of various emotions, orientations, configurations, expressions, and lighting conditions. Empirical results show the superiority of the proposed technique when compared to other methods.
Abstract: Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base classifiers. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using a voting methodology over bagging and boosting ensembles with 10 sub-classifiers in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-classifiers, as well as other well-known combining methods, on standard benchmark datasets, and the proposed technique was the most accurate.
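The voting combination of the two ensembles' sub-classifiers can be sketched as a plain majority vote over per-example predictions. This illustrates only the combining step; the prediction lists below are toy values, not outputs of trained bagging or boosting models.

```python
from collections import Counter

def majority_vote(prediction_lists):
    """prediction_lists[i][j] is the label sub-classifier i predicts for example j.
    Returns the majority label for each example (column-wise vote)."""
    return [Counter(column).most_common(1)[0][0]
            for column in zip(*prediction_lists)]
```

In the proposed scheme, the 10 bagging and 10 boosting sub-classifiers would all contribute prediction lists, and the voted label is the final ensemble output.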