Abstract: In recent years, a number of works proposing the
combination of multiple classifiers to produce a single
classification have been reported in remote sensing literature. The
resulting classifier, referred to as an ensemble classifier, is
generally found to be more accurate than any of the individual
classifiers making up the ensemble. As accuracy is the primary
concern, much of the research in the field of land cover
classification is focused on improving classification accuracy. This
study compares the performance of four ensemble approaches
(boosting, bagging, DECORATE and random subspace) with a
univariate decision tree as the base classifier. Two training datasets,
one without any noise and the other with 20 percent noise, were used to
judge the performance of the different ensemble approaches. Results
with the noise-free data set suggest an improvement of about 4% in
classification accuracy with all ensemble approaches in
comparison to the results provided by the univariate decision tree
classifier. The highest classification accuracy, 87.43%, was achieved
by the boosted decision tree. A comparison of results with the noisy data
set suggests that the bagging, DECORATE and random subspace
approaches work well with this data, whereas the performance of
the boosted decision tree degrades to a classification accuracy of
79.7%, which is even lower than the 80.02% achieved by the
unboosted decision tree classifier.
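As a rough illustration of the bagging idea compared in this abstract, the sketch below trains one-feature threshold rules (decision stumps, standing in for the univariate decision tree) on bootstrap resamples and combines them by majority vote. The function names, the stump learner and the default of 25 models are assumptions made for illustration; this is not the study's implementation.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a one-feature rule that predicts 1 when x > threshold.

    samples is a list of (x, label) pairs with labels 0 or 1.
    """
    xs = sorted({x for x, _ in samples})
    # Candidate thresholds: one below all points, then each observed value.
    candidates = [xs[0] - 1.0] + xs
    best_t, best_err = candidates[0], len(samples) + 1
    for t in candidates:
        err = sum((x > t) != bool(y) for x, y in samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_stumps(samples, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(samples) for _ in samples]  # sample with replacement
        models.append(fit_stump(boot))
    return models

def predict(models, x):
    """Combine the individual stump predictions by majority vote."""
    votes = Counter(int(x > t) for t in models)
    return votes.most_common(1)[0][0]
```

Using an odd number of models avoids tied votes in the two-class case.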
Abstract: Heart failure is the most common cause of death
nowadays, but if medical help is given immediately, the patient's life
may be saved in many cases. Numerous heart diseases can be
detected by means of analyzing electrocardiograms (ECG). Artificial
Neural Networks (ANN) are computer-based expert systems that
have proved to be useful in pattern recognition tasks. ANN can be
used in different phases of the decision-making process, from
classification to diagnostic procedures. This work concentrates on a
review followed by a novel method.
The purpose of the review is to assess the evidence of healthcare
benefits involving the application of artificial neural networks to the
clinical functions of diagnosis, prognosis and survival analysis, in
ECG signals. The developed method is based on a compound neural
network (CNN), to classify ECGs as normal or carrying an
AtrioVentricular heart Block (AVB). This method uses three
different feed forward multilayer neural networks. A single output
unit encodes the probability of AVB occurrences. A value between 0
and 0.1 is the desired output for a normal ECG; a value between 0.1
and 1 would indicate the occurrence of an AVB. The results show that
this compound network has a good performance in detecting AVBs,
with a sensitivity of 90.7% and a specificity of 86.05%. The accuracy
value is 87.9%.
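The decision rule and the reported figures of merit can be sketched as follows. The 0.1 cut-off comes from the abstract; how an output of exactly 0.1 is treated, and the function names, are assumptions. The formulas are the standard definitions of sensitivity, specificity and accuracy.

```python
def classify_output(p, threshold=0.1):
    """Map the single network output to a label using the 0.1 cut-off.

    Assumption: an output of exactly 0.1 is treated as normal.
    """
    return "AVB" if p > threshold else "normal"

def summary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # fraction of AVB cases detected
    specificity = tn / (tn + fp)   # fraction of normal ECGs recognised
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```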
Abstract: The number of features required to represent an image
can be very large. Using all available features to recognize objects
can suffer from the curse of dimensionality. Feature selection and
extraction is the pre-processing step of image mining. The main issues in
analyzing images are the effective identification of features and
their extraction. The mining problem addressed here
is the grouping of features for different shapes. Experiments
have been conducted by using shape outline as the features. Shape
outline readings are put through a normalization and dimensionality
reduction process using an eigenvector-based method to produce a
new set of readings. After this pre-processing step, the data are
grouped by shape. Through statistical analysis of these
readings together with peak measures, a robust classification and
recognition process is achieved. Tests showed that the suggested
methods are able to automatically recognize objects through their
shapes. Finally, experiments also demonstrate the system invariance
to rotation, translation, scale, reflection and to a small degree of
distortion.
Abstract: Fourier transform infrared (FT-IR) spectroscopic imaging
is an emerging technique that provides both chemically and
spatially resolved information. The rich chemical content of data
may be utilized for computer-aided determinations of structure and
pathologic state (cancer diagnosis) in histological tissue sections for
prostate cancer. FT-IR spectroscopic imaging of prostate tissue has
shown that tissue type (histological) classification can be performed to
a high degree of accuracy [1] and cancer diagnosis can be performed
with an accuracy of about 80% [2] on a microscopic (≈ 6μm)
length scale. In performing these analyses, it has been observed
that there is large variability (more than 60%) between spectra from
different points on tissue that is expected to consist of the same
essential chemical constituents. Spectra at the edges of tissues are
characteristically and consistently different from chemically similar
tissue in the middle of the same sample. Here, we explain these
differences using a rigorous electromagnetic model for light-sample
interaction. Spectra from FT-IR spectroscopic imaging of chemically
heterogeneous samples are different from bulk spectra of individual
chemical constituents of the sample. This is because spectra not
only depend on chemistry, but also on the shape of the sample.
Using coupled wave analysis, we characterize and quantify the nature
of spectral distortions at the edges of tissues. Furthermore, we
present a method of performing histological classification of tissue
samples. Since the mid-infrared spectrum is typically assumed to
be a quantitative measure of chemical composition, classification
results can vary widely due to spectral distortions. However, we
demonstrate that the selection of localized metrics based on chemical
information can make our data robust to the spectral distortions
caused by scattering at the tissue boundary.
Abstract: Lately, significant work in the area of Intelligent
Manufacturing has been published and applied, mainly for
industrial purposes. Special efforts have been made in the
implementation of new technologies and of management and control
systems, among many others, all of which have advanced the field. Given
this, the scope of new projects, and the need to
turn existing flexible ideas into more autonomous and
intelligent ones (i.e., Intelligent Manufacturing), the present paper
aims to contribute to the design and analysis
of the material flow in systems, cells or workstations under
this new “intelligent” denomination. To this end, it offers a
conceptual basis for some of the key points to be taken into account
and some general principles to consider in the design and analysis of
the material flow, together with some tips on how to define
alternative material flow scenarios and a classification of the states of a
system, cell or workstation. All this is done with
the intention of relating it to the use of simulation tools, which
are briefly addressed with a special focus on the Witness
simulation package. For better comprehension, these
elements are supported by a detailed layout, other figures and a few
expressions that could help in obtaining the necessary data. Such data and
others will be used in the future, when simulating the scenarios in the
search of the best material flow configurations.
Abstract: The belief K-modes method (BKM) is a new
clustering technique that handles uncertainty in the attribute values of
objects in both the cluster construction task and the classification task.
Like the standard version of this method, the BKM results depend on
the chosen initial modes. This paper therefore develops a method for
selecting the initial modes, aiming to improve the performance of
the BKM approach. Experiments with several sets of real data show
that, with the developed initial mode selection method, the
clustering algorithm produces more accurate results.
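To make the dependence on initial modes concrete, here is a minimal sketch of the standard (certain-data) k-modes loop with simple-matching dissimilarity. It is not the belief-function variant described in the abstract, and the function names are illustrative assumptions.

```python
from collections import Counter

def mismatch(a, b):
    """Simple-matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(objects, initial_modes, max_iter=20):
    """Cluster categorical tuples, starting from the given initial modes."""
    modes = [tuple(m) for m in initial_modes]
    assignment = None
    for _ in range(max_iter):
        # Assign each object to its nearest mode (ties go to the lower index).
        new_assignment = [
            min(range(len(modes)), key=lambda k: mismatch(obj, modes[k]))
            for obj in objects
        ]
        if new_assignment == assignment:
            break  # converged: no object changed cluster
        assignment = new_assignment
        # Recompute each mode as the most frequent value per attribute.
        for k in range(len(modes)):
            members = [o for o, a in zip(objects, assignment) if a == k]
            if members:
                modes[k] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return modes, assignment
```

An initial-mode selection method would plug in through the `initial_modes` argument; the loop itself is unchanged.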
Abstract: Current image-based individual human recognition
methods, such as fingerprints, face, or iris biometric modalities
generally require a cooperative subject, views from certain aspects,
and physical contact or close proximity. These methods cannot
reliably recognize non-cooperating individuals at a distance in the
real world under changing environmental conditions. Gait, which
concerns recognizing individuals by the way they walk, is a relatively
new biometric without these disadvantages. The inherent gait
characteristic of an individual makes it irreplaceable and useful in
visual surveillance.
In this paper, an efficient gait recognition system for human
identification is proposed that extracts two features, namely the width vector of
the binary silhouette and the MPEG-7 region-based shape
descriptors. In the proposed method, foreground objects
(i.e., humans and other moving objects) are extracted by estimating
background information with a Gaussian Mixture Model (GMM);
subsequently, a median filtering operation is performed to remove
noise in the background-subtracted image. A moving target classification
algorithm is used to separate human beings (i.e., pedestrians)
from other foreground objects (viz., vehicles). Shape and boundary
information is used in the moving target classification algorithm.
Subsequently, the width vector of the outer contour of the binary silhouette
and the MPEG-7 Angular Radial Transform coefficients are taken as
the feature vector. Next, Principal Component Analysis (PCA)
is applied to the selected feature vector to reduce its dimensionality.
These extracted feature vectors are used to train a Hidden Markov
Model (HMM) for identification of individuals. The proposed
system is evaluated using some gait sequences, and the experimental
results show the efficacy of the proposed algorithm.
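One plausible reading of the width-vector feature is sketched below: for each row of the binary silhouette, take the distance between the leftmost and rightmost foreground pixels. The inclusive pixel count and the function name are assumptions; the abstract does not give the exact definition.

```python
def width_vector(silhouette):
    """Per-row width of a binary silhouette (list of rows of 0/1 values).

    Width is counted inclusively between the leftmost and rightmost
    foreground pixels; rows with no foreground get width 0.
    """
    widths = []
    for row in silhouette:
        cols = [c for c, v in enumerate(row) if v]
        widths.append(cols[-1] - cols[0] + 1 if cols else 0)
    return widths
```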
Abstract: Cancers can normally be marked by a number of
differentially expressed genes, which show enormous potential as
biomarkers for a certain disease. In recent years, cancer classification
based on the investigation of gene expression profiles derived by
high-throughput microarrays has been widely used. The selection of
discriminative genes is, therefore, an essential preprocessing step in
carcinogenesis studies. In this paper, we propose a novel gene
selector using information-theoretic measures for biological
discovery. This multivariate filter is a four-stage framework comprising
the analyses of feature relevance, feature interdependence, feature
redundancy-dependence and subset rankings, and it has been
examined on the colon cancer data set. Our experimental results show
that the proposed method outperformed other information-theory-based
filters in all aspects of classification errors and classification
performance.
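A common information-theoretic relevance score is the mutual information between a discretized gene expression level and the class label. The sketch below shows only that one ingredient, assuming discrete inputs; it does not reproduce the four-stage filter, and the function name is illustrative.

```python
from collections import Counter
from math import log2

def mutual_information(feature, labels):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(labels)
    count_x = Counter(feature)
    count_y = Counter(labels)
    count_xy = Counter(zip(feature, labels))
    # Sum p(x, y) * log2( p(x, y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )
```

A gene whose discretized level determines the class perfectly in a balanced two-class sample scores 1 bit, while a gene that is independent of the class scores 0.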
Abstract: This paper proposes a specialized Web robot to automatically collect objectionable Web contents for use in an objectionable Web content classification system, which creates the URL database of objectionable Web contents. It aims at shortening the update period of the DB, increasing the number of URLs in the DB, and enhancing the accuracy of the information in the DB.
Abstract: This paper introduces new algorithms (a fuzzy relative
of the CLARANS algorithm, FCLARANS, and fuzzy c-medoids
based on randomized search, FCMRANS) for fuzzy clustering of
relational data. Unlike the existing fuzzy c-medoids algorithm (FCMdd),
in which the within-cluster dissimilarity of each cluster is minimized
in each iteration by recomputing new medoids given current
memberships, FCLARANS minimizes the same objective function
minimized by FCMdd by changing current medoids in such a way
that the sum of the within-cluster dissimilarities is minimized.
Computing new medoids may be affected by noise because outliers
may join the computation of medoids, while the choice of medoids in
FCLARANS is dictated by the location of a predominant fraction of
points inside a cluster and, therefore, it is less sensitive to the
presence of outliers. In FCMRANS, the step of computing new
medoids in FCMdd is modified to be based on randomized search.
Furthermore, a new initialization procedure is developed that adds
randomness to the initialization procedure used with FCMdd. Both
FCLARANS and FCMRANS are compared with the robust and
linearized version of fuzzy c-medoids (RFCMdd). Experimental
results with different samples of the Reuters-21578, Newsgroups
(20NG) and generated datasets with noise show that FCLARANS is
more robust than both RFCMdd and FCMRANS. Finally, both
FCMRANS and FCLARANS are more efficient and their outputs
are almost the same as that of RFCMdd in terms of classification
rate.
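For orientation, one iteration of the baseline FCMdd scheme that the abstract contrasts against can be sketched as below. It assumes the usual inverse-dissimilarity membership rule with fuzzifier m and a full dissimilarity matrix given as nested lists. The function names are illustrative, and this is neither FCLARANS nor FCMRANS.

```python
def memberships(dissim, medoids, m=2.0):
    """u[j][i]: membership of object j in the cluster of medoid i."""
    u = []
    for j in range(len(dissim)):
        d = [dissim[j][v] for v in medoids]
        if 0.0 in d:
            # Object coincides with a medoid: full membership there.
            hit = d.index(0.0)
            u.append([1.0 if i == hit else 0.0 for i in range(len(medoids))])
        else:
            w = [(1.0 / x) ** (1.0 / (m - 1.0)) for x in d]
            total = sum(w)
            u.append([x / total for x in w])
    return u

def update_medoids(dissim, u, m=2.0):
    """For each cluster, pick the object minimizing the weighted dissimilarity."""
    n = len(dissim)
    n_clusters = len(u[0])
    return [
        min(range(n),
            key=lambda k: sum(u[j][i] ** m * dissim[k][j] for j in range(n)))
        for i in range(n_clusters)
    ]
```

Because every object contributes to the weighted sum in `update_medoids`, an outlier with non-negligible membership can pull the medoid, which is the sensitivity the abstract refers to.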
Abstract: In this work, we improve a previously developed
segmentation scheme aimed at extracting edge information from
speckled images using a maximum likelihood edge detector. The
scheme was based on finding a threshold for the probability density
function of a new kernel defined as the arithmetic mean-to-geometric
mean ratio field over a circular neighborhood set and, in a general
context, is founded on a likelihood random field model (LRFM). The
segmentation algorithm was applied to discriminated speckle areas
obtained using simple elliptic discriminant functions based on
measures of the signal-to-noise ratio with fractional order moments.
A rigorous stochastic analysis was used to derive an exact expression
for the cumulative density function of the probability density
function of the random field. Based on this, an accurate probability
of error was derived and the performance of the scheme was
analysed. The improved segmentation scheme performed well for
both simulated and real images and showed superior results to those
previously obtained using the original LRFM scheme and standard
edge detection methods. In particular, the false alarm probability was
markedly lower than that of the original LRFM method with
oversegmentation artifacts virtually eliminated. The importance of
this work lies in the development of a stochastic-based segmentation,
allowing an accurate quantification of the probability of false
detection. Non-visual quantification and misclassification in medical
ultrasound speckled images is relatively new and is of interest to
clinicians.
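The kernel named in the abstract, the ratio of arithmetic to geometric mean over a circular neighborhood, can be sketched as follows. The image is assumed to be a nested list of strictly positive intensities, and the function name and neighborhood handling at the borders are illustrative assumptions; the thresholding and LRFM analysis are not shown.

```python
from math import exp, log

def am_gm_ratio(image, row, col, radius):
    """Arithmetic-to-geometric mean ratio in a disc centred on (row, col).

    Pixels outside the image are simply skipped. All intensities must be > 0.
    """
    values = []
    for r in range(len(image)):
        for c in range(len(image[0])):
            if (r - row) ** 2 + (c - col) ** 2 <= radius ** 2:
                values.append(image[r][c])
    arithmetic = sum(values) / len(values)
    geometric = exp(sum(log(v) for v in values) / len(values))
    return arithmetic / geometric
```

By the AM-GM inequality the ratio is at least 1, with equality only on a constant neighborhood, so larger values indicate more intensity variation across the disc.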
Abstract: In this article, a method is proposed to classify
normal and defective tiles using the wavelet transform and artificial
neural networks. The proposed algorithm calculates max and min
medians as well as the standard deviation and average of detail
images obtained from wavelet filters, then derives feature vectors
and attempts to classify the given tile using a Perceptron neural
network with a single hidden layer. In this study, along with the
proposal of using the median of optimum points as the basic feature and
its comparison with the rest of the statistical features in the wavelet
field, the relative advantages of the Haar wavelet are investigated. This
method has been tested on a number of tile designs
and, on average, it has been valid for over 90% of the cases. Among
its other advantages, high speed and low computational load are
prominent.
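A minimal sketch of the feature-extraction step: a one-level 2-D Haar decomposition over non-overlapping 2x2 blocks, followed by the median, mean and standard deviation of each detail band. The averaging normalization (division by 4), the band names and the choice of statistics are assumptions; the abstract's "max and min medians" are not reproduced here.

```python
from statistics import mean, median, pstdev

def haar_details(tile):
    """One-level 2-D Haar detail bands from non-overlapping 2x2 blocks."""
    col_diff, row_diff, diag = [], [], []
    for r in range(0, len(tile) - 1, 2):
        for c in range(0, len(tile[0]) - 1, 2):
            a, b = tile[r][c], tile[r][c + 1]
            d, e = tile[r + 1][c], tile[r + 1][c + 1]
            col_diff.append((a - b + d - e) / 4)  # left column minus right
            row_diff.append((a + b - d - e) / 4)  # top row minus bottom
            diag.append((a - b - d + e) / 4)      # diagonal difference
    return col_diff, row_diff, diag

def feature_vector(tile):
    """Median, mean and standard deviation of each detail band."""
    features = []
    for band in haar_details(tile):
        features += [median(band), mean(band), pstdev(band)]
    return features
```

The resulting nine numbers would then be the inputs to a classifier such as the perceptron network mentioned in the abstract.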
Abstract: In this research, an attempt was made to identify people's diabetes status (healthy, prediabetic and diabetic) from noninvasive palm perspiration measurements. Data clusters gathered from 200 subjects were used (1. Individual Attributes Cluster and 2. Palm Perspiration Attributes Cluster). To decrease the dimensions of these data clusters, Principal Component Analysis was used. The data clusters prepared in this way were classified with Support Vector Machines. The highest classification success rates were 82% for glucose parameters and 84% for HbA1c parameters.
Abstract: This paper proposes a technique to block adult images displayed in websites. The filter is designed to perform even in exceptional cases, such as where face detection is not possible or the face is not properly visible. This is achieved by using an alternative phase to extract the MFC (Most Frequent Color) from the human body regions estimated using a biometric of anthropometric distances between fixed, rigidly connected body locations. The logical results generated can be protected from overriding by a firewall or intrusion by encrypting the result in an SSH data packet.
Abstract: Whilst there is growing evidence that activity
across the lifespan is beneficial for improved health, there are
also many changes involved with the aging process and
subsequently the potential for reduced indices of health. The
nexus between health, physical activity and aging is complex
and has raised much interest in recent times due to the
realization that a multifaceted approach is necessary in
order to counteract a growing obesity epidemic. By
investigating age based trends within a population adhering to
competitive sport at older ages, further insight might be
gleaned to assist in understanding one of many factors
influencing this relationship.
BMI was derived using data gathered on a total of 6,071
masters athletes (51.9% male, 48.1% female) aged 25 to 91
years (x̄ = 51.5, s = ±9.7), competing at the Sydney World
Masters Games (2009). Using linear and loess regression it
was demonstrated that the usual tendency for prevalence of
higher BMI increasing with age was reversed in the sample.
This trend in reversal was repeated for both male and female
only sub-sets of the sample participants, indicating the
possibility of improved prevalence of BMI with increasing
age for both the sample as a whole and these individual subgroups.
This evidence of improved classification in one index of
health (reduced BMI) for masters athletes (when compared to
the general population) implies there are either improved
levels of this index of health with aging due to adherence to
sport or possibly the reduced BMI is advantageous and
contributes to this cohort adhering (or being attracted) to
masters sport at older ages. Demonstration of this
proportionately under-investigated World Masters Games
population having an improved relationship between BMI and
increasing age over the general population is of particular
interest in the context of the measures being taken globally to
curb an obesity epidemic.
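The two quantities behind this analysis can be sketched as below: the standard BMI formula (mass in kilograms divided by height in metres squared) and an ordinary least-squares slope of BMI against age. The function names are illustrative, and the loess fit mentioned in the abstract is not shown.

```python
def bmi(mass_kg, height_m):
    """Body mass index: mass divided by height squared."""
    return mass_kg / height_m ** 2

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

A negative slope of BMI on age corresponds to the reversed trend the abstract describes.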
Abstract: The present study presents a new approach to automatic
data clustering and classification problems in large and complex
databases and, at the same time, derives specific types of explicit rules
describing each cluster. The method works well in both sparse and
dense multidimensional data spaces. The members of the data space
can be of the same nature or represent different classes. A number
of N-dimensional ellipsoids are used for enclosing the data clouds.
Due to the geometry of an ellipsoid and its free rotation in space
the detection of clusters becomes very efficient. The method is based
on genetic algorithms that are used for the optimization of location,
orientation and geometric characteristics of the hyper-ellipsoids. The
proposed approach can serve as a basis for the development of
general knowledge systems for discovering hidden knowledge and
unexpected patterns and rules in various large databases.
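The geometric test underlying such a method is whether a point falls inside a rotated ellipsoid. The sketch below shows the two-dimensional case only, with an illustrative function name and parameterization (centre, semi-axes, rotation angle in radians); a genetic algorithm would tune these parameters, which is not shown.

```python
from math import cos, sin

def inside_ellipse(point, center, semi_axes, angle):
    """True if point lies inside (or on) the rotated 2-D ellipse."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Rotate the offset into the ellipse's own axes.
    u = dx * cos(angle) + dy * sin(angle)
    v = -dx * sin(angle) + dy * cos(angle)
    return (u / semi_axes[0]) ** 2 + (v / semi_axes[1]) ** 2 <= 1.0
```

A fitness function could, for example, count how many points of one class a candidate ellipse encloses; that choice is an assumption, not something stated in the abstract.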
Abstract: Maintenance is one of the most important activities in
the shipyard industry. However, sometimes it is not supported by
adequate services from the shipyard, where inaccuracy in estimating
the duration of the ship maintenance is still common. This makes
estimation of ship maintenance duration crucial. This study uses
a data mining approach, i.e., CART (Classification and Regression
Tree), to estimate the duration of ship maintenance that is limited to
dock works, also known as dry docking. By using the volume
of dock works as an input to estimate the maintenance duration, 4
classes of dry docking duration were obtained, with a different linear
model and job criteria for each class. These linear models can then be
used to estimate the duration of dry docking based on job criteria.
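The core of a regression tree is choosing the split that most reduces the squared error of the target. The sketch below finds the best single threshold on one input by that criterion; the function name is illustrative, and the per-class linear models described in the abstract are not shown.

```python
def best_split(xs, ys):
    """Return (threshold, cost) minimizing the summed squared error."""
    def sse(values):
        # Sum of squared deviations from the mean of one side of the split.
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal inputs
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best
```

Applying the same search recursively to each side of the split yields the leaves, which in this setting would correspond to duration classes.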
Abstract: Classification of cancers into their corresponding cohorts has been a key area of research in bioinformatics, aiming at better prognosis of the disease. The high dimensionality of gene data makes this a complex task and requires a significance-based data identification technique in order to reduce the dimensionality and identify significant information. In this paper, we propose a novel approach for the classification of oral cancer into metastasis-positive and metastasis-negative patients. We used significance analysis of microarrays (SAM) to identify significant genes, which constitute a gene signature. Three different gene signatures were identified using SAM from three different combinations of training datasets, and their classification accuracy was calculated on the corresponding testing datasets using k-Nearest Neighbour (kNN), Fuzzy C-Means Clustering (FCM), Support Vector Machine (SVM) and Backpropagation Neural Network (BPNN). A final gene signature of only 9 genes was obtained from the above three individual gene signatures. The 9-gene signature's classification capability was compared using the same classifiers on the same testing datasets. The results show that the 9-gene signature classified all samples in the testing dataset accurately, while individual genes could not classify all samples accurately.
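Of the classifiers listed, k-Nearest Neighbour is the simplest to sketch: a sample is assigned the majority label among its k closest training samples. The Euclidean distance, the default k of 3 and the function name are assumptions, since the abstract does not state these settings.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Majority label among the k training samples nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

With a gene signature, each feature vector would hold the expression values of the signature genes for one patient.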
Abstract: The availability of high-dimensional biological datasets, such as those from gene expression, proteomic, and metabolic experiments, can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and to separate predefined classes, such as patients with a particular disease versus healthy controls. However, most of the existing research focuses only on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, based on mock datasets. We mimic common biological scenarios by simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC compared to other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better when having a higher number of discriminators.
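The AUC used to compare the classifiers can be computed directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, counting ties as half. The function name and the 0/1 label encoding are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels are 0 or 1)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This form is quadratic in the number of samples, which is fine for small mock datasets; rank-based formulas give the same value more efficiently.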
Abstract: This paper presents a new method of analog fault diagnosis based on back-propagation neural networks (BPNNs) using wavelet decomposition and fractal dimension as preprocessors. The proposed method has the capability to detect and identify faulty components in an analog electronic circuit with tolerance by analyzing its impulse response. Using wavelet decomposition to preprocess the impulse response drastically de-noises the inputs to the neural network. The second preprocessing step, fractal dimension, can extract unique features, which are then fed to a neural network as inputs for further classification. A comparison of our work with [1] and [6], which also employ back-propagation (BP) neural networks, reveals that our system requires a much smaller network and performs significantly better in fault diagnosis of analog circuits due to our proposed preprocessing techniques.