Identification of Printed Punjabi Words and English Numerals Using Gabor Features

Script identification is one of the challenging steps in the development of optical character recognition system for bilingual or multilingual documents. In this paper an attempt is made for identification of English numerals at word level from Punjabi documents by using Gabor features. The support vector machine (SVM) classifier with five fold cross validation is used to classify the word images. The results obtained are quite encouraging. Average accuracy with RBF kernel, Polynomial and Linear Kernel functions comes out to be greater than 99%.

Assessing and Visualizing the Stability of Feature Selectors: A Case Study with Spectral Data

Feature selection plays an important role in applications with high dimensional data. The assessment of the stability of feature selection/ranking algorithms becomes an important issue when the dataset is small and the aim is to gain insight into the underlying process by analyzing the most relevant features. In this work, we propose a graphical approach that enables to analyze the similarity between feature ranking techniques as well as their individual stability. Moreover, it works with whatever stability metric (Canberra distance, Spearman's rank correlation coefficient, Kuncheva's stability index,...). We illustrate this visualization technique evaluating the stability of several feature selection techniques on a spectral binary dataset. Experimental results with a neural-based classifier show that stability and ranking quality may not be linked together and both issues have to be studied jointly in order to offer answers to the domain experts.

Texture Feature Extraction of Infrared River Ice Images using Second-Order Spatial Statistics

Ice cover County has a significant impact on rivers as it affects with the ice melting capacity which results in flooding, restrict navigation, modify the ecosystem and microclimate. River ices are made up of different ice types with varying ice thickness, so surveillance of river ice plays an important role. River ice types are captured using infrared imaging camera which captures the images even during the night times. In this paper the river ice infrared texture images are analysed using first-order statistical methods and secondorder statistical methods. The second order statistical methods considered are spatial gray level dependence method, gray level run length method and gray level difference method. The performance of the feature extraction methods are evaluated by using Probabilistic Neural Network classifier and it is found that the first-order statistical method and second-order statistical method yields low accuracy. So the features extracted from the first-order statistical method and second-order statistical method are combined and it is observed that the result of these combined features (First order statistical method + gray level run length method) provides higher accuracy when compared with the features from the first-order statistical method and second-order statistical method alone.

Massive Lesions Classification using Features based on Morphological Lesion Differences

Purpose of this work is the development of an automatic classification system which could be useful for radiologists in the investigation of breast cancer. The software has been designed in the framework of the MAGIC-5 collaboration. In the automatic classification system the suspicious regions with high probability to include a lesion are extracted from the image as regions of interest (ROIs). Each ROI is characterized by some features based on morphological lesion differences. Some classifiers as a Feed Forward Neural Network, a K-Nearest Neighbours and a Support Vector Machine are used to distinguish the pathological records from the healthy ones. The results obtained in terms of sensitivity (percentage of pathological ROIs correctly classified) and specificity (percentage of non-pathological ROIs correctly classified) will be presented through the Receive Operating Characteristic curve (ROC). In particular the best performances are 88% ± 1 of area under ROC curve obtained with the Feed Forward Neural Network.

Optimizing Mobile Agents Migration Based on Decision Tree Learning

Mobile agents are a powerful approach to develop distributed systems since they migrate to hosts on which they have the resources to execute individual tasks. In a dynamic environment like a peer-to-peer network, Agents have to be generated frequently and dispatched to the network. Thus they will certainly consume a certain amount of bandwidth of each link in the network if there are too many agents migration through one or several links at the same time, they will introduce too much transferring overhead to the links eventually, these links will be busy and indirectly block the network traffic, therefore, there is a need of developing routing algorithms that consider about traffic load. In this paper we seek to create cooperation between a probabilistic manner according to the quality measure of the network traffic situation and the agent's migration decision making to the next hop based on decision tree learning algorithms.

Multi-Label Hierarchical Classification for Protein Function Prediction

Hierarchical classification is a problem with applications in many areas as protein function prediction where the dates are hierarchically structured. Therefore, it is necessary the development of algorithms able to induce hierarchical classification models. This paper presents experimenters using the algorithm for hierarchical classification called Multi-label Hierarchical Classification using a Competitive Neural Network (MHC-CNN). It was tested in ten datasets the Gene Ontology (GO) Cellular Component Domain. The results are compared with the Clus-HMC and Clus-HSC using the hF-Measure.

Comparison between Associative Classification and Decision Tree for HCV Treatment Response Prediction

Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.

A Kernel Based Rejection Method for Supervised Classification

In this paper we are interested in classification problems with a performance constraint on error probability. In such problems if the constraint cannot be satisfied, then a rejection option is introduced. For binary labelled classification, a number of SVM based methods with rejection option have been proposed over the past few years. All of these methods use two thresholds on the SVM output. However, in previous works, we have shown on synthetic data that using thresholds on the output of the optimal SVM may lead to poor results for classification tasks with performance constraint. In this paper a new method for supervised classification with rejection option is proposed. It consists in two different classifiers jointly optimized to minimize the rejection probability subject to a given constraint on error rate. This method uses a new kernel based linear learning machine that we have recently presented. This learning machine is characterized by its simplicity and high training speed which makes the simultaneous optimization of the two classifiers computationally reasonable. The proposed classification method with rejection option is compared to a SVM based rejection method proposed in recent literature. Experiments show the superiority of the proposed method.

Clinical Decision Support for Disease Classification based on the Tests Association

Until recently, researchers have developed various tools and methodologies for effective clinical decision-making. Among those decisions, chest pain diseases have been one of important diagnostic issues especially in an emergency department. To improve the ability of physicians in diagnosis, many researchers have developed diagnosis intelligence by using machine learning and data mining. However, most of the conventional methodologies have been generally based on a single classifier for disease classification and prediction, which shows moderate performance. This study utilizes an ensemble strategy to combine multiple different classifiers to help physicians diagnose chest pain diseases more accurately than ever. Specifically the ensemble strategy is applied by using the integration of decision trees, neural networks, and support vector machines. The ensemble models are applied to real-world emergency data. This study shows that the performance of the ensemble models is superior to each of single classifiers.

Comparative Study of Filter Characteristics as Statistical Vocal Correlates of Clinical Psychiatric State in Human

Acoustical properties of speech have been shown to be related to mental states of speaker with symptoms: depression and remission. This paper describes way to address the issue of distinguishing depressed patients from remitted subjects based on measureable acoustics change of their spoken sound. The vocal-tract related frequency characteristics of speech samples from female remitted and depressed patients were analyzed via speech processing techniques and consequently, evaluated statistically by cross-validation with Support Vector Machine. Our results comparatively show the classifier's performance with effectively correct separation of 93% determined from testing with the subjectbased feature model and 88% from the frame-based model based on the same speech samples collected from hospital visiting interview sessions between patients and psychiatrists.

Improved Tropical Wood Species Recognition System based on Multi-feature Extractor and Classifier

An automated wood recognition system is designed to classify tropical wood species.The wood features are extracted based on two feature extractors: Basic Grey Level Aura Matrix (BGLAM) technique and statistical properties of pores distribution (SPPD) technique. Due to the nonlinearity of the tropical wood species separation boundaries, a pre classification stage is proposed which consists ofKmeans clusteringand kernel discriminant analysis (KDA). Finally, Linear Discriminant Analysis (LDA) classifier and KNearest Neighbour (KNN) are implemented for comparison purposes. The study involves comparison of the system with and without pre classification using KNN classifier and LDA classifier.The results show that the inclusion of the pre classification stage has improved the accuracy of both the LDA and KNN classifiers by more than 12%.

Text Mining Technique for Data Mining Application

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

An Adaptive Model for Blind Image Restoration using Bayesian Approach

Image restoration involves elimination of noise. Filtering techniques were adopted so far to restore images since last five decades. In this paper, we consider the problem of image restoration degraded by a blur function and corrupted by random noise. A method for reducing additive noise in images by explicit analysis of local image statistics is introduced and compared to other noise reduction methods. The proposed method, which makes use of an a priori noise model, has been evaluated on various types of images. Bayesian based algorithms and technique of image processing have been described and substantiated with experimentation using MATLAB.

Gradual Shot Boundary Detection and Classification Based on Fractal Analysis

Shot boundary detection is a fundamental step for the organization of large video data. In this paper, we propose a new method for video gradual shots detection and classification, using advantages of fractal analysis and AIS-based classifier. Proposed features are “vertical intercept" and “fractal dimension" of each frame of videos which are computed using Fourier transform coefficients. We also used a classifier based on Clonal Selection Algorithm. We have carried out our solution and assessed it according to the TRECVID2006 benchmark dataset.

Association Rule and Decision Tree based Methodsfor Fuzzy Rule Base Generation

This paper focuses on the data-driven generation of fuzzy IF...THEN rules. The resulted fuzzy rule base can be applied to build a classifier, a model used for prediction, or it can be applied to form a decision support system. Among the wide range of possible approaches, the decision tree and the association rule based algorithms are overviewed, and two new approaches are presented based on the a priori fuzzy clustering based partitioning of the continuous input variables. An application study is also presented, where the developed methods are tested on the well known Wisconsin Breast Cancer classification problem.

Iris Recognition Based On the Low Order Norms of Gradient Components

Iris pattern is an important biological feature of human body; it becomes very hot topic in both research and practical applications. In this paper, an algorithm is proposed for iris recognition and a simple, efficient and fast method is introduced to extract a set of discriminatory features using first order gradient operator applied on grayscale images. The gradient based features are robust, up to certain extents, against the variations may occur in contrast or brightness of iris image samples; the variations are mostly occur due lightening differences and camera changes. At first, the iris region is located, after that it is remapped to a rectangular area of size 360x60 pixels. Also, a new method is proposed for detecting eyelash and eyelid points; it depends on making image statistical analysis, to mark the eyelash and eyelid as a noise points. In order to cover the features localization (variation), the rectangular iris image is partitioned into N overlapped sub-images (blocks); then from each block a set of different average directional gradient densities values is calculated to be used as texture features vector. The applied gradient operators are taken along the horizontal, vertical and diagonal directions. The low order norms of gradient components were used to establish the feature vector. Euclidean distance based classifier was used as a matching metric for determining the degree of similarity between the features vector extracted from the tested iris image and template features vectors stored in the database. Experimental tests were performed using 2639 iris images from CASIA V4-Interival database, the attained recognition accuracy has reached up to 99.92%.

Persian Printed Numerals Classification Using Extended Moment Invariants

Classification of Persian printed numeral characters has been considered and a proposed system has been introduced. In representation stage, for the first time in Persian optical character recognition, extended moment invariants has been utilized as characters image descriptor. In classification stage, four different classifiers namely minimum mean distance, nearest neighbor rule, multi layer perceptron, and fuzzy min-max neural network has been used, which first and second are traditional nonparametric statistical classifier. Third is a well-known neural network and forth is a kind of fuzzy neural network that is based on utilizing hyperbox fuzzy sets. Set of different experiments has been done and variety of results has been presented. The results showed that extended moment invariants are qualified as features to classify Persian printed numeral characters.

A New Face Recognition Method using PCA, LDA and Neural Network

In this paper, a new face recognition method based on PCA (principal Component Analysis), LDA (Linear Discriminant Analysis) and neural networks is proposed. This method consists of four steps: i) Preprocessing, ii) Dimension reduction using PCA, iii) feature extraction using LDA and iv) classification using neural network. Combination of PCA and LDA is used for improving the capability of LDA when a few samples of images are available and neural classifier is used to reduce number misclassification caused by not-linearly separable classes. The proposed method was tested on Yale face database. Experimental results on this database demonstrated the effectiveness of the proposed method for face recognition with less misclassification in comparison with previous methods.

One-Class Support Vector Machines for Protein-Protein Interactions Prediction

Predicting protein-protein interactions represent a key step in understanding proteins functions. This is due to the fact that proteins usually work in context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. Although it is easy to get a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to be considered as negative examples. Therefore, in this paper we solve this problem as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in training phase, the one-class SVM achieves accuracy of about 80%. These results imply that protein-protein interaction can be predicted using one-class classifier with comparable accuracy to the binary classifiers that use artificially constructed negative examples.

Trajectory Guided Recognition of Hand Gestures having only Global Motions

One very interesting field of research in Pattern Recognition that has gained much attention in recent times is Gesture Recognition. In this paper, we consider a form of dynamic hand gestures that are characterized by total movement of the hand (arm) in space. For these types of gestures, the shape of the hand (palm) during gesturing does not bear any significance. In our work, we propose a model-based method for tracking hand motion in space, thereby estimating the hand motion trajectory. We employ the dynamic time warping (DTW) algorithm for time alignment and normalization of spatio-temporal variations that exist among samples belonging to the same gesture class. During training, one template trajectory and one prototype feature vector are generated for every gesture class. Features used in our work include some static and dynamic motion trajectory features. Recognition is accomplished in two stages. In the first stage, all unlikely gesture classes are eliminated by comparing the input gesture trajectory to all the template trajectories. In the next stage, feature vector extracted from the input gesture is compared to all the class prototype feature vectors using a distance classifier. Experimental results demonstrate that our proposed trajectory estimator and classifier is suitable for Human Computer Interaction (HCI) platform.