Abstract: The exponential increase in the volume of medical image database has imposed new challenges to clinical routine in maintaining patient history, diagnosis, treatment and monitoring. With the advent of data mining and machine learning techniques it is possible to automate and/or assist physicians in clinical diagnosis. In this research a medical image classification framework using data mining techniques is proposed. It involves feature extraction, feature selection, feature discretization and classification. In the classification phase, the performance of the traditional kNN k nearest neighbor classifier is improved using a feature weighting scheme and a distance weighted voting instead of simple majority voting. Feature weights are calculated using the interestingness measures used in association rule mining. Experiments on the retinal fundus images show that the proposed framework improves the classification accuracy of traditional kNN from 78.57 % to 92.85 %.
Abstract: In this paper, we investigated the characteristic of a
clinical dataseton the feature selection and classification
measurements which deal with missing values problem.And also
posed the appropriated techniques to achieve the aim of the activity;
in this research aims to find features that have high effect to mortality
and mortality time frame. We quantify the complexity of a clinical
dataset. According to the complexity of the dataset, we proposed the
data mining processto cope their complexity; missing values, high
dimensionality, and the prediction problem by using the methods of
missing value replacement, feature selection, and classification.The
experimental results will extend to develop the prediction model for
cardiology.
Abstract: Text categorization is the problem of classifying text
documents into a set of predefined classes. After a preprocessing
step, the documents are typically represented as large sparse vectors.
When training classifiers on large collections of documents, both the
time and memory restrictions can be quite prohibitive. This justifies
the application of feature selection methods to reduce the
dimensionality of the document-representation vector. In this paper,
we present three feature selection methods: Information Gain,
Support Vector Machine feature selection called (SVM_FS) and
Genetic Algorithm with SVM (called GA_SVM). We show that the
best results were obtained with GA_SVM method for a relatively
small dimension of the feature vector.
Abstract: Identification of cancer genes that might anticipate
the clinical behaviors from different types of cancer disease is
challenging due to the huge number of genes and small number of
patients samples. The new method is being proposed based on
supervised learning of classification like support vector machines
(SVMs).A new solution is described by the introduction of the
Maximized Margin (MM) in the subset criterion, which permits to
get near the least generalization error rate. In class prediction
problem, gene selection is essential to improve the accuracy and to
identify genes for cancer disease. The performance of the new
method was evaluated with real-world data experiment. It can give
the better accuracy for classification.
Abstract: This paper examines the available experiment data for a copper bromide vapor laser (CuBr laser), emitting at two wavelengths - 510.6 and 578.2nm. Laser output power is estimated based on 10 independent input physical parameters. A classification and regression tree (CART) model is obtained which describes 97% of data. The resulting binary CART tree specifies which input parameters influence considerably each of the classification groups. This allows for a technical assessment that indicates which of these are the most significant for the manufacture and operation of the type of laser under consideration. The predicted values of the laser output power are also obtained depending on classification. This aids the design and development processes considerably.
Abstract: A novel application of neural network approach to
fault classification and fault location of Medium voltage cables is
demonstrated in this paper. Different faults on a protected cable
should be classified and located correctly. This paper presents the use
of neural networks as a pattern classifier algorithm to perform these
tasks. The proposed scheme is insensitive to variation of different
parameters such as fault type, fault resistance, and fault inception
angle. Studies show that the proposed technique is able to offer high
accuracy in both of the fault classification and fault location tasks.
Abstract: Sonogram images of normal and lymphocyte thyroid tissues have considerable overlap which makes it difficult to interpret and distinguish. Classification from sonogram images of thyroid gland is tackled in semiautomatic way. While making manual diagnosis from images, some relevant information need not to be recognized by human visual system. Quantitative image analysis could be helpful to manual diagnostic process so far done by physician. Two classes are considered: normal tissue and chronic lymphocyte thyroid (Hashimoto's Thyroid). Data structure is analyzed using K-nearest-neighbors classification. This paper is mentioned that unlike the wavelet sub bands' energy, histograms and Haralick features are not appropriate to distinguish between normal tissue and Hashimoto's thyroid.
Abstract: Most of the Question Answering systems
composed of three main modules: question processing,
document processing and answer processing. Question
processing module plays an important role in QA systems. If
this module doesn't work properly, it will make problems for
other sections. Moreover answer processing module is an
emerging topic in Question Answering, where these systems
are often required to rank and validate candidate answers.
These techniques aiming at finding short and precise answers
are often based on the semantic classification.
This paper discussed about a new model for question
answering which improved two main modules, question
processing and answer processing.
There are two important components which are the bases
of the question processing. First component is question
classification that specifies types of question and answer.
Second one is reformulation which converts the user's
question into an understandable question by QA system in a
specific domain. Answer processing module, consists of
candidate answer filtering, candidate answer ordering
components and also it has a validation section for interacting
with user. This module makes it more suitable to find exact
answer. In this paper we have described question and answer
processing modules with modeling, implementing and
evaluating the system. System implemented in two versions.
Results show that 'Version No.1' gave correct answer to 70%
of questions (30 correct answers to 50 asked questions) and
'version No.2' gave correct answers to 94% of questions (47
correct answers to 50 asked questions).
Abstract: A lot of research has been done in the past decade in the field of audio content analysis for extracting various information from audio signal. One such significant information is the "perceived mood" or the "emotions" related to a music or audio clip. This information is extremely useful in applications like creating or adapting the play-list based on the mood of the listener. This information could also be helpful in better classification of the music database. In this paper we have presented a method to classify music not just based on the meta-data of the audio clip but also include the "mood" factor to help improve the music classification. We propose an automated and efficient way of classifying music samples based on the mood detection from the audio data. We in particular try to classify the music based on mood for Indian bollywood music. The proposed method tries to address the following problem statement: Genre information (usually part of the audio meta-data) alone does not help in better music classification. For example the acoustic version of the song "nothing else matters by Metallica" can be classified as melody music and thereby a person in relaxing or chill out mood might want to listen to this track. But more often than not this track is associated with metal / heavy rock genre and if a listener classified his play-list based on the genre information alone for his current mood, the user shall miss out on listening to this track. Currently methods exist to detect mood in western or similar kind of music. Our paper tries to solve the issue for Indian bollywood music from an Indian cultural context
Abstract: Microaneurysm is a key indicator of diabetic retinopathy that can potentially cause damage to retina. Early detection and automatic quantification are the keys to prevent further damage. In this paper, which focuses on automatic microaneurysm detection in images acquired through non-dilated pupils, we present a series of experiments on feature selection and automatic microaneurysm pixel classification. We found that the best feature set is a combination of 10 features: the pixel-s intensity of shade corrected image, the pixel hue, the standard deviation of shade corrected image, DoG4, the area of the candidate MA, the perimeter of the candidate MA, the eccentricity of the candidate MA, the circularity of the candidate MA, the mean intensity of the candidate MA on shade corrected image and the ratio of the major axis length and minor length of the candidate MA. The overall sensitivity, specificity, precision, and accuracy are 84.82%, 99.99%, 89.01%, and 99.99%, respectively.
Abstract: In this paper a Pattern Recognition algorithm based on
a constrained version of the k-means clustering algorithm will be
presented. The proposed algorithm is a non parametric supervised
statistical pattern recognition algorithm, i.e. it works under very mild
assumptions on the dataset. The performance of the algorithm will
be tested, togheter with a feature extraction technique that captures
the information on the closed two-dimensional contour of an image,
on images of industrial mineral ores.
Abstract: Patients with diabetes are susceptible to chronic foot
wounds which may be difficult to manage and slow to heal.
Diagnosis and treatment currently rely on the subjective judgement of
experienced professionals. An objective method of tissue assessment
is required. In this paper, a data fusion approach was taken to wound
tissue classification. The supervised Maximum Likelihood and
unsupervised Multi-Modal Expectation Maximisation algorithms
were used to classify tissues within simulated wound models by
weighting the contributions of both colour and 3D depth information.
It was found that, at low weightings, depth information could show
significant improvements in classification accuracy when compared
to classification by colour alone, particularly when using the
maximum likelihood method. However, larger weightings were
found to have an entirely negative effect on accuracy.
Abstract: Text document categorization involves large amount
of data or features. The high dimensionality of features is a
troublesome and can affect the performance of the classification.
Therefore, feature selection is strongly considered as one of the
crucial part in text document categorization. Selecting the best
features to represent documents can reduce the dimensionality of
feature space hence increase the performance. There were many
approaches has been implemented by various researchers to
overcome this problem. This paper proposed a novel hybrid approach
for feature selection in text document categorization based on Ant
Colony Optimization (ACO) and Information Gain (IG). We also
presented state-of-the-art algorithms by several other researchers.
Abstract: In this paper, a new learning algorithm based on a
hybrid metaheuristic integrating Differential Evolution (DE) and
Reduced Variable Neighborhood Search (RVNS) is introduced to train
the classification method PROAFTN. To apply PROAFTN, values of
several parameters need to be determined prior to classification. These
parameters include boundaries of intervals and relative weights for
each attribute. Based on these requirements, the hybrid approach,
named DEPRO-RVNS, is presented in this study. In some cases, the
major problem when applying DE to some classification problems
was the premature convergence of some individuals to local optima.
To eliminate this shortcoming and to improve the exploration and
exploitation capabilities of DE, such individuals were set to iteratively
re-explored using RVNS. Based on the generated results on
both training and testing data, it is shown that the performance of
PROAFTN is significantly improved. Furthermore, the experimental
study shows that DEPRO-RVNS outperforms well-known machine
learning classifiers in a variety of problems.
Abstract: This paper presents the effectiveness of artificial
intelligent technique to apply for pattern recognition and
classification of Partial Discharge (PD). Characteristics of PD signal
for pattern recognition and classification are computed from the
relation of the voltage phase angle, the discharge magnitude and the
repeated existing of partial discharges by using statistical and fractal
methods. The simplified fuzzy ARTMAP (SFAM) is used for pattern
recognition and classification as artificial intelligent technique. PDs
quantities, 13 parameters from statistical method and fractal method
results, are inputted to Simplified Fuzzy ARTMAP to train system
for pattern recognition and classification. The results confirm the
effectiveness of purpose technique.
Abstract: Text similarity measurement is a fundamental issue in
many textual applications such as document clustering, classification,
summarization and question answering. However, prevailing approaches
based on Vector Space Model (VSM) more or less suffer
from the limitation of Bag of Words (BOW), which ignores the semantic
relationship among words. Enriching document representation
with background knowledge from Wikipedia is proven to be an effective
way to solve this problem, but most existing methods still
cannot avoid similar flaws of BOW in a new vector space. In this
paper, we propose a novel text similarity measurement which goes
beyond VSM and can find semantic affinity between documents.
Specifically, it is a unified graph model that exploits Wikipedia as
background knowledge and synthesizes both document representation
and similarity computation. The experimental results on two different
datasets show that our approach significantly improves VSM-based
methods in both text clustering and classification.
Abstract: Cryo-electron microscopy (CEM) in combination with
single particle analysis (SPA) is a widely used technique for
elucidating structural details of macromolecular assemblies at closeto-
atomic resolutions. However, development of automated software
for SPA processing is still vital since thousands to millions of
individual particle images need to be processed. Here, we present our
workflow for automated particle picking. Our approach integrates
peak shape analysis to the classical correlation and an iterative
approach to separate macromolecules and background by
classification. This particle selection workflow furthermore provides
a robust means for SPA with little user interaction. Processing
simulated and experimental data assesses performance of the
presented tools.
Abstract: This paper proposes a novel hybrid algorithm for feature selection based on a binary ant colony and SVM. The final subset selection is attained through the elimination of the features that produce noise or, are strictly correlated with other already selected features. Our algorithm can improve classification accuracy with a small and appropriate feature subset. Proposed algorithm is easily implemented and because of use of a simple filter in that, its computational complexity is very low. The performance of the proposed algorithm is evaluated through a real Rotary Cement kiln dataset. The results show that our algorithm outperforms existing algorithms.
Abstract: A novel method of learning complex fuzzy decision regions in the n-dimensional feature space is proposed. Through the fuzzy decision regions, a given pattern's class membership value of every class is determined instead of the conventional crisp class the pattern belongs to. The n-dimensional fuzzy decision region is approximated by union of hyperellipsoids. By explicitly parameterizing these hyperellipsoids, the decision regions are determined by estimating the parameters of each hyperellipsoid.Genetic Algorithm is applied to estimate the parameters of each region component. With the global optimization ability of GA, the learned decision region can be arbitrarily complex.
Abstract: Brain Computer Interface (BCI) has been recently
increased in research. Functional Near Infrared Spectroscope (fNIRs)
is one the latest technologies which utilize light in the near-infrared
range to determine brain activities. Because near infrared technology
allows design of safe, portable, wearable, non-invasive and wireless
qualities monitoring systems, fNIRs monitoring of brain
hemodynamics can be value in helping to understand brain tasks. In
this paper, we present results of fNIRs signal analysis indicating that
there exist distinct patterns of hemodynamic responses which
recognize brain tasks toward developing a BCI. We applied two
different mathematics tools separately, Wavelets analysis for
preprocessing as signal filters and feature extractions and Neural
networks for cognition brain tasks as a classification module. We
also discuss and compare with other methods while our proposals
perform better with an average accuracy of 99.9% for classification.