Abstract: In this paper, a method is provided for content-based image retrieval. Content-based image retrieval system searches query an image based on its visual content in an image database to retrieve similar images. In this paper, with the aim of simulating the human visual system sensitivity to image's edges and color features, the concept of color difference histogram (CDH) is used. CDH includes the perceptually color difference between two neighboring pixels with regard to colors and edge orientations. Since the HSV color space is close to the human visual system, the CDH is calculated in this color space. In addition, to improve the color features, the color histogram in HSV color space is also used as a feature. Among the extracted features, efficient features are selected using entropy and correlation criteria. The final features extract the content of images most efficiently. The proposed method has been evaluated on three standard databases Corel 5k, Corel 10k and UKBench. Experimental results show that the accuracy of the proposed image retrieval method is significantly improved compared to the recently developed methods.
Abstract: This paper presents a classifier ensemble approach for
predicting the survivability of the breast cancer patients using the
latest database version of the Surveillance, Epidemiology, and End
Results (SEER) Program of the National Cancer Institute. The system
consists of two main components; features selection and classifier
ensemble components. The features selection component divides the
features in SEER database into four groups. After that it tries to find
the most important features among the four groups that maximizes the
weighted average F-score of a certain classification algorithm. The
ensemble component uses three different classifiers, each of which
models different set of features from SEER through the features
selection module. On top of them, another classifier is used to give
the final decision based on the output decisions and confidence
scores from each of the underlying classifiers. Different classification
algorithms have been examined; the best setup found is by using the
decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the
underlying classifiers and Na¨ıve Bayes for the classifier ensemble
step. The system outperforms all published systems to date when
evaluated against the exact same data of SEER (period of 1973-2002).
It gives 87.39% weighted average F-score compared to 85.82% and
81.34% of the other published systems. By increasing the data size to
cover the whole database (period of 1973-2014), the overall weighted
average F-score jumps to 92.4% on the held out unseen test set.
Abstract: State-of-the-art methods for secondary structure (Porter, Psi-PRED, SAM-T99sec, Sable) and solvent accessibility (Sable, ACCpro) predictions use evolutionary profiles represented by the position specific scoring matrix (PSSM). It has been demonstrated that evolutionary profiles are the most important features in the feature space for these predictions. Unfortunately applying PSSM matrix leads to high dimensional feature spaces that may create problems with parameter optimization and generalization. Several recently published suggested that applying feature extraction for the PSSM matrix may result in improvements in secondary structure predictions. However, none of the top performing methods considered here utilizes dimensionality reduction to improve generalization. In the present study, we used simple and fast methods for features selection (t-statistics, information gain) that allow us to decrease the dimensionality of PSSM matrix by 75% and improve generalization in the case of secondary structure prediction compared to the Sable server.
Abstract: Sleep stage scoring is the process of classifying the
stage of the sleep in which the subject is in. Sleep is classified into
two states based on the constellation of physiological parameters.
The two states are the non-rapid eye movement (NREM) and the
rapid eye movement (REM). The NREM sleep is also classified into
four stages (1-4). These states and the state wakefulness are
distinguished from each other based on the brain activity. In this
work, a classification method for automated sleep stage scoring
based on a single EEG recording using wavelet packet decomposition
was implemented. Thirty two ploysomnographic recording from the
MIT-BIH database were used for training and validation of the
proposed method. A single EEG recording was extracted and
smoothed using Savitzky-Golay filter. Wavelet packets
decomposition up to the fourth level based on 20th order Daubechies
filter was used to extract features from the EEG signal. A features
vector of 54 features was formed. It was reduced to a size of 25 using
the gain ratio method and fed into a classifier of regression trees. The
regression trees were trained using 67% of the records available. The
records for training were selected based on cross validation of the
records. The remaining of the records was used for testing the
classifier. The overall correct rate of the proposed method was found
to be around 75%, which is acceptable compared to the techniques in
the literature.
Abstract: The healthcare environment is generally perceived as
being information rich yet knowledge poor. However, there is a lack
of effective analysis tools to discover hidden relationships and trends
in data. In fact, valuable knowledge can be discovered from
application of data mining techniques in healthcare system. In this
study, a proficient methodology for the extraction of significant
patterns from the Coronary Heart Disease warehouses for heart
attack prediction, which unfortunately continues to be a leading cause
of mortality in the whole world, has been presented. For this purpose,
we propose to enumerate dynamically the optimal subsets of the
reduced features of high interest by using rough sets technique
associated to dynamic programming. Therefore, we propose to
validate the classification using Random Forest (RF) decision tree to
identify the risky heart disease cases. This work is based on a large
amount of data collected from several clinical institutions based on
the medical profile of patient. Moreover, the experts- knowledge in
this field has been taken into consideration in order to define the
disease, its risk factors, and to establish significant knowledge
relationships among the medical factors. A computer-aided system is
developed for this purpose based on a population of 525 adults. The
performance of the proposed model is analyzed and evaluated based
on set of benchmark techniques applied in this classification problem.
Abstract: Content-based music retrieval generally involves analyzing, searching and retrieving music based on low or high level features of a song which normally used to represent artists, songs or music genre. Identifying them would normally involve feature extraction and classification tasks. Theoretically the greater features analyzed, the better the classification accuracy can be achieved but with longer execution time. Technique to select significant features is important as it will reduce dimensions of feature used in classification and contributes to the accuracy. Artificial Immune System (AIS) approach will be investigated and applied in the classification task. Bio-inspired audio content-based retrieval framework (B-ACRF) is proposed at the end of this paper where it embraces issues that need further consideration in music retrieval performances.
Abstract: Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of features selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine (called SVM_FS) and Genetic Algorithm with SVM (GA_FS). We showed that the best results were obtained with SVM_FS and GA_FS methods for a relatively small dimension of the features vector comparative with the IG method that involves longer vectors, for quite similar classification accuracies. Also we present a novel method to better correlate SVM kernel-s parameters (Polynomial or Gaussian kernel).