Text Mining Technique for Data Mining Application

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Multiwavelet and Biological Signal Processing

In this paper we are to find the optimum multiwavelet for compression of electrocardiogram (ECG) signals and then, selecting it for using with SPIHT codec. At present, it is not well known which multiwavelet is the best choice for optimum compression of ECG. In this work, we examine different multiwavelets on 24 sets of ECG data with entirely different characteristics, selected from MIT-BIH database. For assessing the functionality of the different multiwavelets in compressing ECG signals, in addition to known factors such as Compression Ratio (CR), Percent Root Difference (PRD), Distortion (D), Root Mean Square Error (RMSE) in compression literature, we also employed the Cross Correlation (CC) criterion for studying the morphological relations between the reconstructed and the original ECG signal and Signal to reconstruction Noise Ratio (SNR). The simulation results show that the Cardinal Balanced Multiwavelet (cardbal2) by the means of identity (Id) prefiltering method to be the best effective transformation. After finding the most efficient multiwavelet, we apply SPIHT coding algorithm on the transformed signal by this multiwavelet.

Evolutionary Approach for Automated Discovery of Censored Production Rules

In the recent past, there has been an increasing interest in applying evolutionary methods to Knowledge Discovery in Databases (KDD) and a number of successful applications of Genetic Algorithms (GA) and Genetic Programming (GP) to KDD have been demonstrated. The most predominant representation of the discovered knowledge is the standard Production Rules (PRs) in the form If P Then D. The PRs, however, are unable to handle exceptions and do not exhibit variable precision. The Censored Production Rules (CPRs), an extension of PRs, were proposed by Michalski & Winston that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations, in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds or not. Thus, the 'If P Then D' part of the CPR expresses important information, while the Unless C part acts only as a switch and changes the polarity of D to ~D. This paper presents a classification algorithm based on evolutionary approach that discovers comprehensible rules with exceptions in the form of CPRs. The proposed approach has flexible chromosome encoding, where each chromosome corresponds to a CPR. Appropriate genetic operators are suggested and a fitness function is proposed that incorporates the basic constraints on CPRs. Experimental results are presented to demonstrate the performance of the proposed algorithm.

Non-negative Principal Component Analysis for Face Recognition

Principle component analysis is often combined with the state-of-art classification algorithms to recognize human faces. However, principle component analysis can only capture these features contributing to the global characteristics of data because it is a global feature selection algorithm. It misses those features contributing to the local characteristics of data because each principal component only contains some levels of global characteristics of data. In this study, we present a novel face recognition approach using non-negative principal component analysis which is added with the constraint of non-negative to improve data locality and contribute to elucidating latent data structures. Experiments are performed on the Cambridge ORL face database. We demonstrate the strong performances of the algorithm in recognizing human faces in comparison with PCA and NREMF approaches.

Novelty as a Measure of Interestingness in Knowledge Discovery

Rule Discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules leads to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules to ultimately improve the overall efficiency of KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify novelty of the discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to the user specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.

Human Facial Expression Recognition using MANFIS Model

Facial expression analysis plays a significant role for human computer interaction. Automatic analysis of human facial expression is still a challenging problem with many applications. In this paper, we propose neuro-fuzzy based automatic facial expression recognition system to recognize the human facial expressions like happy, fear, sad, angry, disgust and surprise. Initially facial image is segmented into three regions from which the uniform Local Binary Pattern (LBP) texture features distributions are extracted and represented as a histogram descriptor. The facial expressions are recognized using Multiple Adaptive Neuro Fuzzy Inference System (MANFIS). The proposed system designed and tested with JAFFE face database. The proposed model reports 94.29% of classification accuracy.

Cluster Algorithm for Genetic Diversity

With the hardware technology advancing, the cost of storing is decreasing. Thus there is an urgent need for new techniques and tools that can intelligently and automatically assist us in transferring this data into useful knowledge. Different techniques of data mining are developed which are helpful for handling these large size databases [7]. Data mining is also finding its role in the field of biotechnology. Pedigree means the associated ancestry of a crop variety. Genetic diversity is the variation in the genetic composition of individuals within or among species. Genetic diversity depends upon the pedigree information of the varieties. Parents at lower hierarchic levels have more weightage for predicting genetic diversity as compared to the upper hierarchic levels. The weightage decreases as the level increases. For crossbreeding, the two varieties should be more and more genetically diverse so as to incorporate the useful characters of the two varieties in the newly developed variety. This paper discusses the searching and analyzing of different possible pairs of varieties selected on the basis of morphological characters, Climatic conditions and Nutrients so as to obtain the most optimal pair that can produce the required crossbreed variety. An algorithm was developed to determine the genetic diversity between the selected wheat varieties. Cluster analysis technique is used for retrieving the results.

Classification of Soil Aptness to Establish of Panicum virgatum in Mississippi using Sensitivity Analysis and GIS

During the last decade Panicum virgatum, known as Switchgrass, has been broadly studied because of its remarkable attributes as a substitute pasture and as a functional biofuel source. The objective of this investigation was to establish soil suitability for Switchgrass in the State of Mississippi. A linear weighted additive model was developed to forecast soil suitability. Multicriteria analysis and Sensitivity analysis were utilized to adjust and optimize the model. The model was fit using seven years of field data associated with soils characteristics collected from Natural Resources Conservation System - United States Department of Agriculture (NRCS-USDA). The best model was selected by correlating calculated biomass yield with each model's soils-based output for Switchgrass suitability. Coefficient of determination (r2) was the decisive factor used to establish the 'best' soil suitability model. Coefficients associated with the 'best' model were implemented within a Geographic Information System (GIS) to create a map of relative soil suitability for Switchgrass in Mississippi. A Geodatabase associated with soil parameters was built and is available for future Geographic Information System use.

Extraction of Craniofacial Landmarks for Preoperative to Intraoperative Registration

This paper presents the automated methods employed for extracting craniofacial landmarks in white light images as part of a registration framework designed to support three neurosurgical procedures. The intraoperative space is characterised by white light stereo imaging while the preoperative plan is performed on CT scans. The registration aims at aligning these two modalities to provide a calibrated environment to enable image-guided solutions. The neurosurgical procedures can then be carried out by mapping the entry and target points from CT space onto the patient-s space. The registration basis adopted consists of natural landmarks (eye corner and ear tragus). A 5mm accuracy is deemed sufficient for these three procedures and the validity of the selected registration basis in achieving this accuracy has been assessed by simulation studies. The registration protocol is briefly described, followed by a presentation of the automated techniques developed for the extraction of the craniofacial features and results obtained from tests on the AR and FERET databases. Since the three targeted neurosurgical procedures are routinely used for head injury management, the effect of bruised/swollen faces on the automated algorithms is assessed. A user-interactive method is proposed to deal with such unpredictable circumstances.

Evolving Neural Networks using Moment Method for Handwritten Digit Recognition

This paper proposes a neural network weights and topology optimization using genetic evolution and the backpropagation training algorithm. The proposed crossover and mutation operators aims to adapt the networks architectures and weights during the evolution process. Through a specific inheritance procedure, the weights are transmitted from the parents to their offsprings, which allows re-exploitation of the already trained networks and hence the acceleration of the global convergence of the algorithm. In the preprocessing phase, a new feature extraction method is proposed based on Legendre moments with the Maximum entropy principle MEP as a selection criterion. This allows a global search space reduction in the design of the networks. The proposed method has been applied and tested on the well known MNIST database of handwritten digits.

Iris Recognition Based On the Low Order Norms of Gradient Components

Iris pattern is an important biological feature of human body; it becomes very hot topic in both research and practical applications. In this paper, an algorithm is proposed for iris recognition and a simple, efficient and fast method is introduced to extract a set of discriminatory features using first order gradient operator applied on grayscale images. The gradient based features are robust, up to certain extents, against the variations may occur in contrast or brightness of iris image samples; the variations are mostly occur due lightening differences and camera changes. At first, the iris region is located, after that it is remapped to a rectangular area of size 360x60 pixels. Also, a new method is proposed for detecting eyelash and eyelid points; it depends on making image statistical analysis, to mark the eyelash and eyelid as a noise points. In order to cover the features localization (variation), the rectangular iris image is partitioned into N overlapped sub-images (blocks); then from each block a set of different average directional gradient densities values is calculated to be used as texture features vector. The applied gradient operators are taken along the horizontal, vertical and diagonal directions. The low order norms of gradient components were used to establish the feature vector. Euclidean distance based classifier was used as a matching metric for determining the degree of similarity between the features vector extracted from the tested iris image and template features vectors stored in the database. Experimental tests were performed using 2639 iris images from CASIA V4-Interival database, the attained recognition accuracy has reached up to 99.92%.

A New Face Recognition Method using PCA, LDA and Neural Network

In this paper, a new face recognition method based on PCA (principal Component Analysis), LDA (Linear Discriminant Analysis) and neural networks is proposed. This method consists of four steps: i) Preprocessing, ii) Dimension reduction using PCA, iii) feature extraction using LDA and iv) classification using neural network. Combination of PCA and LDA is used for improving the capability of LDA when a few samples of images are available and neural classifier is used to reduce number misclassification caused by not-linearly separable classes. The proposed method was tested on Yale face database. Experimental results on this database demonstrated the effectiveness of the proposed method for face recognition with less misclassification in comparison with previous methods.

Object Recognition in Color Images by the Self Configuring System MEMORI

System MEMORI automatically detects and recognizes rotated and/or rescaled versions of the objects of a database within digital color images with cluttered background. This task is accomplished by means of a region grouping algorithm guided by heuristic rules, whose parameters concern some geometrical properties and the recognition score of the database objects. This paper focuses on the strategies implemented in MEMORI for the estimation of the heuristic rule parameters. This estimation, being automatic, makes the system a self configuring and highly user-friendly tool.

Researches on Simulation and Validation of Airborne Enhanced Ground Proximity Warning System

In this paper, enhanced ground proximity warning simulation and validation system is designed and implemented. First, based on square grid and sub-grid structure, the global digital terrain database is designed and constructed. Terrain data searching is implemented through querying the latitude and longitude bands and separated zones of global terrain database with the current aircraft position. A combination of dynamic scheduling and hierarchical scheduling is adopted to schedule the terrain data, and the terrain data can be read and delete dynamically in the memory. Secondly, according to the scope, distance, approach speed information etc. to the dangerous terrain in front, and using security profiles calculating method, collision threat detection is executed in real-time, and provides caution and warning alarm. According to this scheme, the implementation of the enhanced ground proximity warning simulation system is realized. Simulations are carried out to verify a good real-time in terrain display and alarm trigger, and the results show simulation system is realized correctly, reasonably and stable.

Extending E-learning systems based on Clause-Rule model

E-Learning systems are used by many learners and teachers. The developer is developing the e-Learning system. However, the developer cannot do system construction to satisfy all of users- demands. We discuss a method of constructing e-Learning systems where learners and teachers can design, try to use, and share extending system functions that they want to use; which may be nally added to the system by system managers.

A Meta-Analytic Path Analysis of e-Learning Acceptance Model

This study reports results of a meta-analytic path analysis e-learning Acceptance Model with k = 27 studies, Databases searched included Information Sciences Institute (ISI) website. Variables recorded included perceived usefulness, perceived ease of use, attitude toward behavior, and behavioral intention to use e-learning. A correlation matrix of these variables was derived from meta-analytic data and then analyzed by using structural path analysis to test the fitness of the e-learning acceptance model to the observed aggregated data. Results showed the revised hypothesized model to be a reasonable, good fit to aggregated data. Furthermore, discussions and implications are given in this article.

Contribution to the Query Optimization in the Object-Oriented Databases

Appeared toward 1986, the object-oriented databases management systems had not known successes knew five years after their birth. One of the major difficulties is the query optimization. We propose in this paper a new approach that permits to enrich techniques of query optimization existing in the object-oriented databases. Seen success that knew the query optimization in the relational model, our approach inspires itself of these optimization techniques and enriched it so that they can support the new concepts introduced by the object databases.

A Novel Steganographic Method for Gray-Level Images

In this work we propose a novel Steganographic method for hiding information within the spatial domain of the gray scale image. The proposed approach works by dividing the cover into blocks of equal sizes and then embeds the message in the edge of the block depending on the number of ones in left four bits of the pixel. The proposed approach is tested on a database consists of 100 different images. Experimental results, compared with other methods, showed that the proposed approach hide more large information and gave a good visual quality stego-image that can be seen by human eyes.

A Generic, Functionally Comprehensive Approach to Maintaining an Ontology as a Relational Database

An ontology is a data model that represents a set of concepts in a given field and the relationships among those concepts. As the emphasis on achieving a semantic web continues to escalate, ontologies for all types of domains increasingly will be developed. These ontologies may become large and complex, and as their size and complexity grows, so will the need for multi-user interfaces for ontology curation. Herein a functionally comprehensive, generic approach to maintaining an ontology as a relational database is presented. Unlike many other ontology editors that utilize a database, this approach is entirely domain-generic and fully supports Webbased, collaborative editing including the designation of different levels of authorization for users.

Implicit Authorization Mechanism of Object-Oriented Database

Due to its special data structure and manipulative principle, Object-Oriented Database (OODB) has a particular security protection and authorization methods. This paper first introduces the features of security mechanism about OODB, and then talked about authorization checking process of OODB. Implicit authorization mechanism is based on the subject hierarchies, object hierarchies and access hierarchies of the security authorization modes, and simplifies the authorization mode. In addition, to combine with other authorization mechanisms, implicit authorization can make protection on the authorization of OODB expediently and effectively.