Abstract: Applying knowledge discovery techniques to unstructured
text is termed knowledge discovery in text (KDT), text data mining,
or text mining. The decision tree approach is most useful in
classification problems. With this technique, a tree is constructed to
model the classification process. There are two basic steps in the
technique: building the tree and applying the tree to the database.
This paper describes a proposed C5.0 classifier that adds rulesets,
cross-validation and boosting to the original C5.0 in order to reduce
the error rate. The feasibility and the benefits of the proposed
approach are demonstrated by means of a medical data set,
hypothyroid. It is shown that the performance of a classifier on the
training cases from which it was constructed gives a poor estimate of
its accuracy on new cases. Accuracy can instead be estimated by
sampling or by using a separate test file; either way, the classifier
is evaluated on cases that were not used to build it, and the estimate
is reliable only if the numbers of cases used to build and to evaluate
the classifier are both large. If the cases in hypothyroid.data and
hypothyroid.test were to be shuffled and divided into a new 2772-case
training set and a 1000-case test set, C5.0 might construct a
different classifier with a lower or higher error rate on the test
cases. An important feature of See5 is its ability to generate
classifiers called rulesets. The ruleset has an error rate of 0.5% on
the test cases. The standard errors of the means provide an estimate
of the variability of results. One way to get a more reliable estimate
of predictive accuracy is f-fold cross-validation. The error rate of a
classifier produced from all the cases is estimated as the ratio of
the total number of errors on the hold-out cases to the total number
of cases. The Boost option with x trials instructs See5 to construct
up to x classifiers in this manner. Trials over numerous datasets,
large and small, show that on average 10-classifier boosting reduces
the error rate for test cases by about 25%.
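The cross-validation estimate described in this abstract (total errors on the hold-out cases divided by the total number of cases) can be illustrated with a minimal sketch. This is not See5 or C5.0: the learner below is a hypothetical majority-class stand-in, and the function names and the toy data are assumptions made only for illustration.

```python
import random
from collections import Counter

def majority_class_trainer(train_cases):
    """Toy stand-in for a real learner: always predicts the most common label."""
    most_common = Counter(label for _, label in train_cases).most_common(1)[0][0]
    return lambda features: most_common

def cross_validated_error(cases, trainer, folds=10, seed=0):
    """Estimate the error rate as hold-out errors over all folds / number of cases."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    # Split the shuffled cases into `folds` blocks of nearly equal size.
    blocks = [shuffled[i::folds] for i in range(folds)]
    errors = 0
    for i, hold_out in enumerate(blocks):
        # Train on every block except the one held out.
        train = [case for j, block in enumerate(blocks) if j != i for case in block]
        classifier = trainer(train)
        errors += sum(1 for features, label in hold_out
                      if classifier(features) != label)
    return errors / len(shuffled)

# Hypothetical data: 90 "negative" and 10 "positive" cases.
cases = [((i,), "negative") for i in range(90)] + \
        [((i,), "positive") for i in range(10)]
error_rate = cross_validated_error(cases, majority_class_trainer, folds=10)
```

Every case is held out exactly once, so the estimate uses all cases for evaluation while never testing a classifier on a case it was built from.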
Abstract: In this paper we seek the optimum multiwavelet for compression of electrocardiogram (ECG) signals and then select it for use with the SPIHT codec. At present, it is not well known which multiwavelet is the best choice for optimum compression of ECG. In this work, we examine different multiwavelets on 24 sets of ECG data with entirely different characteristics, selected from the MIT-BIH database. To assess the performance of the different multiwavelets in compressing ECG signals, in addition to factors well known in the compression literature, such as Compression Ratio (CR), Percent Root Difference (PRD), Distortion (D) and Root Mean Square Error (RMSE), we also employ the Cross Correlation (CC) criterion, for studying the morphological relations between the reconstructed and the original ECG signal, and the Signal to reconstruction Noise Ratio (SNR). The simulation results show that the Cardinal Balanced Multiwavelet (cardbal2) with the identity (Id) prefiltering method is the most effective transformation. After finding the most efficient multiwavelet, we apply the SPIHT coding algorithm to the signal transformed by this multiwavelet.
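The quality measures named in this abstract can be sketched as follows. The formulas are the commonly used textbook forms and are an assumption here: the paper may use variants (for example, PRD with the signal mean removed), and the sample values are made up.

```python
import math

def compression_ratio(original_bits, compressed_bits):
    """CR: size of the original signal divided by size of the compressed one."""
    return original_bits / compressed_bits

def prd(original, reconstructed):
    """Percent Root Difference, without mean removal (one common definition)."""
    num = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    den = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(num / den)

def rmse(original, reconstructed):
    """Root Mean Square Error between the two signals."""
    num = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return math.sqrt(num / len(original))

def snr_db(original, reconstructed):
    """Signal to reconstruction noise ratio in dB, signal taken about its mean."""
    mean = sum(original) / len(original)
    signal = sum((x - mean) ** 2 for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return 10.0 * math.log10(signal / noise)

def cross_correlation(original, reconstructed):
    """Normalized (Pearson) cross correlation as a morphology-similarity measure."""
    n = len(original)
    mx, my = sum(original) / n, sum(reconstructed) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(original, reconstructed))
    vx = sum((x - mx) ** 2 for x in original)
    vy = sum((y - my) ** 2 for y in reconstructed)
    return cov / math.sqrt(vx * vy)

# Made-up four-sample "signal" and a reconstruction with one wrong sample.
original = [1.0, 2.0, 3.0, 4.0]
reconstructed = [1.0, 2.0, 3.0, 5.0]
report = {
    "PRD": prd(original, reconstructed),
    "RMSE": rmse(original, reconstructed),
    "SNR": snr_db(original, reconstructed),
    "CC": cross_correlation(original, reconstructed),
}
```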
Abstract: In the recent past, there has been an increasing interest
in applying evolutionary methods to Knowledge Discovery in
Databases (KDD) and a number of successful applications of Genetic
Algorithms (GA) and Genetic Programming (GP) to KDD have been
demonstrated. The most predominant representation of the
discovered knowledge is the standard Production Rules (PRs) in the
form If P Then D. The PRs, however, are unable to handle
exceptions and do not exhibit variable precision. Censored
Production Rules (CPRs), an extension of PRs proposed by
Michalski & Winston, exhibit variable precision and support an
efficient mechanism for handling exceptions. A CPR is an
augmented production rule of the form:
If P Then D Unless C, where C (Censor) is an exception to the rule.
Such rules are employed in situations in which the conditional
statement 'If P Then D' holds frequently and the assertion C holds
rarely. By using a rule of this type we are free to ignore the exception
conditions when the resources needed to establish their presence are
tight or there is simply no information available as to whether they
hold or not. Thus, the 'If P Then D' part of the CPR expresses
important information, while the Unless C part acts only as a switch
and changes the polarity of D to ~D.
This paper presents a classification algorithm based on an evolutionary
approach that discovers comprehensible rules with exceptions in the
form of CPRs.
The proposed approach uses a flexible chromosome encoding, where
each chromosome corresponds to a CPR. Appropriate genetic
operators are suggested and a fitness function is proposed that
incorporates the basic constraints on CPRs. Experimental results are
presented to demonstrate the performance of the proposed algorithm.
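The semantics of a rule 'If P Then D Unless C' can be sketched as below. This only illustrates how a single CPR is evaluated, not the paper's genetic algorithm or chromosome encoding; the class name, the `check_censor` flag and the bird example are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class CensoredRule:
    premise: Callable[[dict], bool]            # P
    decision: str                              # D
    censor: Callable[[dict], Optional[bool]]   # C; returns None when unknown

    def fire(self, facts: dict, check_censor: bool = True) -> Optional[str]:
        """Return D, 'not D', or None when the premise does not hold."""
        if not self.premise(facts):
            return None
        # The censor acts as a switch: only a censor known to hold flips D to ~D.
        # When resources are tight (check_censor=False) or C is unknown, D stands.
        if check_censor and self.censor(facts) is True:
            return "not " + self.decision
        return self.decision

# Hypothetical rule: If bird Then flies Unless penguin.
rule = CensoredRule(
    premise=lambda f: f.get("bird", False),
    decision="flies",
    censor=lambda f: f.get("penguin"),
)

full_check = rule.fire({"bird": True, "penguin": True})
quick_check = rule.fire({"bird": True, "penguin": True}, check_censor=False)
unknown_censor = rule.fire({"bird": True})
```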
Abstract: Principal component analysis is often combined with
state-of-the-art classification algorithms to recognize human faces.
However, principal component analysis can only capture those
features contributing to the global characteristics of data because it is a
global feature selection algorithm. It misses those features
contributing to the local characteristics of data because each principal
component only contains some levels of global characteristics of data.
In this study, we present a novel face recognition approach using
non-negative principal component analysis, which adds a
non-negativity constraint to improve data locality and contribute to
elucidating latent data structures. Experiments are performed on the
Cambridge ORL face database. We demonstrate the strong
performance of the algorithm in recognizing human faces in
comparison with PCA and NREMF approaches.
Abstract: Rule Discovery is an important technique for mining
knowledge from large databases. Use of objective measures for
discovering interesting rules leads to another data mining problem,
although of reduced complexity. Data mining researchers have
studied subjective measures of interestingness to reduce the volume
of discovered rules to ultimately improve the overall efficiency of
the KDD process.
In this paper we study the novelty of the discovered rules as a
subjective measure of interestingness. We propose a hybrid approach
based on both objective and subjective measures to quantify novelty
of the discovered rules in terms of their deviations from the known
rules (knowledge). We analyze the types of deviation that can arise
between two rules and categorize the discovered rules according to
a user-specified threshold. We implement the proposed framework
and experiment with some public datasets. The experimental results
are promising.
Abstract: Facial expression analysis plays a significant role in
human-computer interaction. Automatic analysis of human facial
expressions is still a challenging problem with many applications. In
this paper, we propose a neuro-fuzzy based automatic facial expression
recognition system to recognize human facial expressions such as
happiness, fear, sadness, anger, disgust and surprise. Initially, the
facial image is segmented into three regions, from which the
distributions of uniform Local Binary Pattern (LBP) texture features
are extracted and represented as a histogram descriptor. The facial
expressions are recognized using a Multiple Adaptive Neuro Fuzzy
Inference System (MANFIS). The proposed system was designed and
tested with the JAFFE face database. The proposed model achieves a
classification accuracy of 94.29%.
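The feature extraction step in this abstract can be sketched as follows: a uniform LBP histogram (8 neighbours, radius 1) per region, concatenated into one descriptor. The abstract does not say how the three regions are chosen, so the split into three horizontal bands is an assumption, as are all function names. The MANFIS classifier is not shown.

```python
# Neighbour offsets, clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """8-bit LBP code: one bit per neighbour that is >= the centre pixel."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def transitions(code):
    """Number of 0/1 changes when the 8 bits are read as a circle."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# A pattern is "uniform" if it has at most two circular transitions.
UNIFORM_CODES = [code for code in range(256) if transitions(code) <= 2]
BIN_OF = {code: i for i, code in enumerate(UNIFORM_CODES)}
NON_UNIFORM_BIN = len(UNIFORM_CODES)  # all other patterns share one bin

def uniform_lbp_histogram(image):
    """Normalized histogram over the uniform patterns plus one catch-all bin."""
    hist = [0.0] * (len(UNIFORM_CODES) + 1)
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[BIN_OF.get(lbp_code(image, r, c), NON_UNIFORM_BIN)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def three_region_descriptor(image):
    """Concatenate the histograms of three horizontal bands (assumed split)."""
    rows = len(image)
    cuts = [0, rows // 3, 2 * rows // 3, rows]
    descriptor = []
    for top, bottom in zip(cuts, cuts[1:]):
        descriptor.extend(uniform_lbp_histogram(image[top:bottom]))
    return descriptor

# Hypothetical flat 9x3 grey-level image, used only to exercise the code.
flat = [[5, 5, 5] for _ in range(9)]
descriptor = three_region_descriptor(flat)
```

Note that each band is processed on its own, so pixels on a band boundary are treated as border pixels and skipped.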
Abstract: With hardware technology advancing, the cost of
storing data is decreasing. Thus there is an urgent need for new techniques
and tools that can intelligently and automatically assist us in
transforming this data into useful knowledge. Different data mining
techniques have been developed that are helpful for handling these
large databases [7]. Data mining is also finding its role in the field of
biotechnology. Pedigree means the associated ancestry of a crop
variety. Genetic diversity is the variation in the genetic composition
of individuals within or among species. Genetic diversity depends
upon the pedigree information of the varieties. Parents at lower
hierarchic levels have more weightage for predicting genetic
diversity as compared to the upper hierarchic levels. The weightage
decreases as the level increases. For crossbreeding, the two varieties
should be highly genetically diverse so as to incorporate the
useful characters of the two varieties in the newly developed variety.
This paper discusses the searching and analyzing of different possible
pairs of varieties selected on the basis of morphological characters,
climatic conditions and nutrients so as to obtain the optimal
pair that can produce the required crossbreed variety. An algorithm
was developed to determine the genetic diversity between the
selected wheat varieties. Cluster analysis is used for
retrieving the results.
Abstract: During the last decade Panicum virgatum, known as
Switchgrass, has been broadly studied because of its remarkable
attributes as a substitute pasture and as a functional biofuel source.
The objective of this investigation was to establish soil suitability for
Switchgrass in the State of Mississippi. A linear weighted additive
model was developed to forecast soil suitability. Multicriteria
analysis and Sensitivity analysis were utilized to adjust and optimize
the model. The model was fit using seven years of field data
associated with soil characteristics collected from the Natural Resources
Conservation Service - United States Department of Agriculture
(NRCS-USDA). The best model was selected by correlating
calculated biomass yield with each model's soils-based output for
Switchgrass suitability. The coefficient of determination (r²) was the
decisive factor used to establish the 'best' soil suitability model.
Coefficients associated with the 'best' model were implemented
within a Geographic Information System (GIS) to create a map of
relative soil suitability for Switchgrass in Mississippi. A Geodatabase
associated with soil parameters was built and is available for future
Geographic Information System use.
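A linear weighted additive model and the r²-based model selection described in this abstract can be sketched as below. The soil parameters, weights, scores and yields are made up for illustration and are not taken from the study.

```python
def suitability(scores, weights):
    """Weighted additive score: sum of weight * criterion score, normalized."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight

def r_squared(xs, ys):
    """Coefficient of determination of a simple linear fit (squared Pearson r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def best_model(candidates, soils, yields):
    """Pick the weight set whose suitability output best tracks biomass yield."""
    def score(weights):
        outputs = [suitability(soil, weights) for soil in soils]
        return r_squared(outputs, yields)
    return max(candidates, key=lambda name: score(candidates[name]))

# Hypothetical criterion scores (0..1) for four soils and their yields.
soils = [
    {"ph": 0.2, "drainage": 0.9},
    {"ph": 0.8, "drainage": 0.3},
    {"ph": 0.5, "drainage": 0.6},
    {"ph": 0.9, "drainage": 0.8},
]
yields = [9.0, 3.0, 6.0, 8.0]
candidates = {
    "ph_heavy": {"ph": 0.9, "drainage": 0.1},
    "drainage_heavy": {"ph": 0.1, "drainage": 0.9},
}
chosen = best_model(candidates, soils, yields)
```

In this toy data the yields follow the drainage score, so the drainage-heavy weight set gives the higher r² and is selected.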
Abstract: This paper presents the automated methods employed
for extracting craniofacial landmarks in white light images as part of
a registration framework designed to support three neurosurgical
procedures. The intraoperative space is characterised by white light
stereo imaging while the preoperative plan is performed on CT scans.
The registration aims at aligning these two modalities to provide a
calibrated environment to enable image-guided solutions. The
neurosurgical procedures can then be carried out by mapping the
entry and target points from CT space onto the patient's space. The
registration basis adopted consists of natural landmarks (eye corner
and ear tragus). A 5mm accuracy is deemed sufficient for these three
procedures and the validity of the selected registration basis in
achieving this accuracy has been assessed by simulation studies. The
registration protocol is briefly described, followed by a presentation
of the automated techniques developed for the extraction of the
craniofacial features and results obtained from tests on the AR and
FERET databases. Since the three targeted neurosurgical procedures
are routinely used for head injury management, the effect of
bruised/swollen faces on the automated algorithms is assessed. A
user-interactive method is proposed to deal with such unpredictable
circumstances.
Abstract: This paper proposes neural network weight and
topology optimization using genetic evolution and the
backpropagation training algorithm. The proposed crossover and
mutation operators aim to adapt the network architectures and
weights during the evolution process. Through a specific inheritance
procedure, the weights are transmitted from the parents to their
offspring, which allows re-exploitation of the already trained
networks and hence the acceleration of the global convergence of the
algorithm. In the preprocessing phase, a new feature extraction
method is proposed based on Legendre moments with the maximum
entropy principle (MEP) as a selection criterion. This allows a global
search space reduction in the design of the networks. The proposed
method has been applied and tested on the well known MNIST
database of handwritten digits.
Abstract: The iris pattern is an important biological feature of the human body, and it has become a very active topic in both research and practical applications. In this paper, an algorithm is proposed for iris recognition, and a simple, efficient and fast method is introduced to extract a set of discriminatory features using a first-order gradient operator applied to grayscale images. The gradient-based features are robust, to a certain extent, against variations that may occur in the contrast or brightness of iris image samples; these variations mostly occur due to lighting differences and camera changes. First, the iris region is located; it is then remapped to a rectangular area of 360x60 pixels. Also, a new method is proposed for detecting eyelash and eyelid points; it relies on statistical analysis of the image to mark the eyelash and eyelid points as noise. In order to cover the localization (variation) of features, the rectangular iris image is partitioned into N overlapping sub-images (blocks); then, from each block, a set of average directional gradient density values is calculated to be used as a texture feature vector. The gradient operators are applied along the horizontal, vertical and diagonal directions. The low-order norms of the gradient components are used to establish the feature vector. A Euclidean distance based classifier is used for matching, determining the degree of similarity between the feature vector extracted from the tested iris image and the template feature vectors stored in the database. Experimental tests were performed using 2639 iris images from the CASIA V4-Interval database; the attained recognition accuracy reached up to 99.92%.
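The block-wise gradient features and the Euclidean matching step can be sketched as below. This is a simplified illustration under stated assumptions: forward differences stand in for the paper's gradient operators, the mean absolute gradient (a first-order norm) stands in for its "gradient densities", blocks overlap only horizontally, and the block width, step and template names are hypothetical. Iris localization and eyelid/eyelash masking are not shown.

```python
import math

def block_gradient_features(image, block_w, step):
    """Mean absolute horizontal, vertical and diagonal gradient per block.

    Blocks span the full image height and overlap when step < block_w.
    Assumes block_w >= 2 and at least two image rows.
    """
    height, width = len(image), len(image[0])
    features = []
    for left in range(0, width - block_w + 1, step):
        sums = [0.0, 0.0, 0.0]  # horizontal, vertical, diagonal
        count = 0
        for r in range(height - 1):
            for c in range(left, left + block_w - 1):
                p = image[r][c]
                sums[0] += abs(image[r][c + 1] - p)
                sums[1] += abs(image[r + 1][c] - p)
                sums[2] += abs(image[r + 1][c + 1] - p)
                count += 1
        features.extend(s / count for s in sums)
    return features

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(probe, templates):
    """Return the name of the stored template nearest to the probe vector."""
    return min(templates, key=lambda name: euclidean(probe, templates[name]))

# Tiny made-up "unwrapped iris" with a horizontal intensity ramp.
image = [[0, 1, 2, 3],
         [0, 1, 2, 3]]
features = block_gradient_features(image, block_w=2, step=1)

# Hypothetical enrolled templates and a probe vector.
templates = {"subject_a": [0.0, 0.0], "subject_b": [1.0, 1.0]}
nearest = match([0.9, 1.2], templates)
```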
Abstract: In this paper, a new face recognition method based on
PCA (Principal Component Analysis), LDA (Linear Discriminant
Analysis) and neural networks is proposed. This method consists of
four steps: i) preprocessing, ii) dimension reduction using PCA, iii)
feature extraction using LDA and iv) classification using a neural
network. The combination of PCA and LDA is used to improve the
capability of LDA when only a few image samples are available, and
the neural classifier is used to reduce the number of misclassifications
caused by non-linearly separable classes. The proposed method was
tested on the Yale face database. Experimental results on this database
demonstrated the effectiveness of the proposed method for face
recognition, with fewer misclassifications in comparison with previous
methods.
Abstract: The MEMORI system automatically detects and recognizes
rotated and/or rescaled versions of the objects of a database within
digital color images with cluttered backgrounds. This task is accomplished
by means of a region grouping algorithm guided by heuristic
rules, whose parameters concern some geometrical properties and the
recognition score of the database objects. This paper focuses on the
strategies implemented in MEMORI for the estimation of the heuristic
rule parameters. This estimation, being automatic, makes the system
a self-configuring and highly user-friendly tool.
Abstract: In this paper, an enhanced ground proximity warning simulation and validation system is designed and implemented. First, the global digital terrain database is designed and constructed based on a square grid and sub-grid structure. Terrain data searching is implemented by querying the latitude and longitude bands and the separated zones of the global terrain database with the current aircraft position. A combination of dynamic scheduling and hierarchical scheduling is adopted to schedule the terrain data, so that terrain data can be read into and deleted from memory dynamically. Second, according to the scope of, distance to and approach speed toward the dangerous terrain ahead, and using a security-profile calculation method, collision threat detection is executed in real time and provides caution and warning alarms. The enhanced ground proximity warning simulation system is implemented according to this scheme. Simulations are carried out to verify good real-time performance in terrain display and alarm triggering, and the results show that the simulation system is implemented correctly, reasonably and stably.
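The grid lookup and dynamic loading of terrain data can be sketched as below. The tile size, the sub-grid resolution, the cache class and the loader are all assumptions made for illustration; the abstract does not give the actual database layout or scheduling policy.

```python
TILE_DEG = 1.0          # assumed size of one square grid tile, in degrees
SUB_CELLS = 4           # assumed sub-grid cells per tile side
N_ROWS = int(180 / TILE_DEG)
N_COLS = int(360 / TILE_DEG)

def tile_index(lat, lon):
    """Map a position to its (latitude band, longitude zone) tile."""
    row = min(int((lat + 90.0) // TILE_DEG), N_ROWS - 1)
    col = int((lon + 180.0) // TILE_DEG) % N_COLS
    return row, col

def sub_cell(lat, lon):
    """Map a position to its sub-grid cell inside the tile."""
    frac_lat = ((lat + 90.0) % TILE_DEG) / TILE_DEG
    frac_lon = ((lon + 180.0) % TILE_DEG) / TILE_DEG
    return (min(int(frac_lat * SUB_CELLS), SUB_CELLS - 1),
            min(int(frac_lon * SUB_CELLS), SUB_CELLS - 1))

def tiles_around(lat, lon, radius=1):
    """Tiles within `radius` tiles of the aircraft, wrapping in longitude."""
    row, col = tile_index(lat, lon)
    wanted = set()
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if 0 <= row + dr < N_ROWS:
                wanted.add((row + dr, (col + dc) % N_COLS))
    return wanted

class TerrainCache:
    """Keeps only the tiles near the aircraft in memory."""

    def __init__(self, loader):
        self.loader = loader   # hypothetical function: tile key -> terrain data
        self.tiles = {}

    def update(self, lat, lon, radius=1):
        wanted = tiles_around(lat, lon, radius)
        for key in list(self.tiles):       # delete tiles left behind
            if key not in wanted:
                del self.tiles[key]
        for key in wanted:                 # read tiles newly in range
            if key not in self.tiles:
                self.tiles[key] = self.loader(key)

# Placeholder loader that just records which tile was requested.
cache = TerrainCache(loader=lambda key: "terrain for tile %s" % (key,))
cache.update(0.5, 0.5)
first_keys = set(cache.tiles)
cache.update(0.5, 1.5)   # aircraft moves one tile east
```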
Abstract: E-Learning systems are used by many learners and
teachers. Such systems are built by developers. However,
a developer cannot construct a system that satisfies all
users' demands. We discuss a method of constructing e-Learning
systems in which learners and teachers can design, try out, and share
extensions to the system functions that they want to use, which may
finally be added to the system by system managers.
Abstract: This study reports the results of a meta-analytic path analysis of an e-learning acceptance model with k = 27 studies. Databases searched included the Information Sciences Institute (ISI) website. Variables recorded included perceived usefulness, perceived ease of use, attitude toward behavior, and behavioral intention to use e-learning. A correlation matrix of these variables was derived from the meta-analytic data and then analyzed using structural path analysis to test the fit of the e-learning acceptance model to the observed aggregated data. Results showed the revised hypothesized model to be a reasonably good fit to the aggregated data. Discussion and implications are also given in this article.
Abstract: Object-oriented database management systems appeared
around 1986, but five years after their birth they had not achieved
the expected success. One of the major difficulties is query optimization.
In this paper we propose a new approach that enriches the
query optimization techniques existing in object-oriented
databases. Given the success of query optimization in the
relational model, our approach draws on these optimization
techniques and extends them so that they can support the new concepts
introduced by object databases.
Abstract: In this work we propose a novel steganographic
method for hiding information within the spatial domain of a gray
scale image. The proposed approach works by dividing the cover into
blocks of equal size and then embedding the message in the edge of the
block depending on the number of ones in the left four bits of the pixel.
The proposed approach is tested on a database consisting of 100
different images. Experimental results, compared with other
methods, show that the proposed approach hides more
information and gives a stego-image of good visual quality as
perceived by the human eye.
Abstract: An ontology is a data model that represents a set of
concepts in a given field and the relationships among those concepts.
As the emphasis on achieving a semantic web continues to escalate,
ontologies for all types of domains increasingly will be developed.
These ontologies may become large and complex, and as their size
and complexity grow, so will the need for multi-user interfaces for
ontology curation. Herein a functionally comprehensive, generic
approach to maintaining an ontology as a relational database is
presented. Unlike many other ontology editors that utilize a database,
this approach is entirely domain-generic and fully supports Web-based,
collaborative editing including the designation of different
levels of authorization for users.
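The idea of keeping an ontology in a relational database, with per-user authorization levels, can be sketched with SQLite as below. The table names, the three authorization levels and the sample concepts are assumptions for illustration only; the paper's actual schema is not given in the abstract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE relation (
    subject_id INTEGER NOT NULL REFERENCES concept(id),
    predicate  TEXT    NOT NULL,
    object_id  INTEGER NOT NULL REFERENCES concept(id),
    PRIMARY KEY (subject_id, predicate, object_id)
);
CREATE TABLE curator (
    name  TEXT PRIMARY KEY,
    level TEXT NOT NULL CHECK (level IN ('viewer', 'editor', 'admin'))
);
""")

def can_edit(conn, user):
    """Only users with the assumed 'editor' or 'admin' level may change data."""
    row = conn.execute("SELECT level FROM curator WHERE name = ?",
                       (user,)).fetchone()
    return row is not None and row[0] in ("editor", "admin")

def add_is_a(conn, user, child, parent):
    """Record 'child is_a parent', creating both concepts if needed."""
    if not can_edit(conn, user):
        raise PermissionError(user + " may not edit the ontology")
    for name in (child, parent):
        conn.execute("INSERT OR IGNORE INTO concept(name) VALUES (?)", (name,))
    conn.execute(
        "INSERT OR IGNORE INTO relation "
        "SELECT c.id, 'is_a', p.id FROM concept c, concept p "
        "WHERE c.name = ? AND p.name = ?", (child, parent))

def ancestors(conn, name):
    """All concepts reachable from `name` by following is_a links."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM concept WHERE name = ?
            UNION
            SELECT r.object_id FROM relation r JOIN up ON r.subject_id = up.id
            WHERE r.predicate = 'is_a'
        )
        SELECT c.name FROM concept c JOIN up ON c.id = up.id
        WHERE c.name <> ? ORDER BY c.name
    """, (name, name)).fetchall()
    return [row[0] for row in rows]

conn.execute("INSERT INTO curator VALUES ('alice', 'editor')")
conn.execute("INSERT INTO curator VALUES ('guest', 'viewer')")
add_is_a(conn, "alice", "wheat", "cereal")
add_is_a(conn, "alice", "cereal", "plant")
wheat_ancestors = ancestors(conn, "wheat")
```

Because concepts and relations are plain rows, the same schema serves any domain; nothing in it is specific to the sample concepts used here.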
Abstract: Due to its special data structure and manipulation principles, an Object-Oriented Database (OODB) has particular security protection and authorization methods. This paper first introduces the features of the OODB security mechanism and then discusses the authorization checking process of an OODB. The implicit authorization mechanism is based on the subject hierarchies, object hierarchies and access hierarchies of the security authorization modes, and it simplifies the authorization mode. In addition, combined with other authorization mechanisms, implicit authorization can protect an OODB expediently and effectively.
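Implicit authorization over the three hierarchies can be sketched as below. The propagation rules used here are assumptions in the style of classic OODB authorization models, not taken from the paper: a grant to a role also holds for its superior roles, a grant on an object also holds for the objects it contains, and a stronger access type implies the weaker ones. The role, object and access names are hypothetical.

```python
# Subject hierarchy: each role lists its direct subordinate roles.
SUBORDINATES = {"admin": ["manager"], "manager": ["employee"]}

# Object hierarchy: each object points to the object that contains it.
CONTAINER = {"instance:42": "class:Employee", "class:Employee": "database"}

# Access hierarchy: each access type lists the types it implies.
IMPLIES = {"write": {"write", "read"}, "read": {"read"}}

def role_and_inferiors(role):
    """The role itself plus every role below it in the subject hierarchy."""
    seen, stack = {role}, [role]
    while stack:
        for child in SUBORDINATES.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def object_and_containers(obj):
    """The object itself plus every object that (transitively) contains it."""
    result = {obj}
    while obj in CONTAINER:
        obj = CONTAINER[obj]
        result.add(obj)
    return result

def is_authorized(subject, obj, access, explicit_grants):
    """Check a request against the few explicitly stored grants."""
    roles = role_and_inferiors(subject)
    objects = object_and_containers(obj)
    return any(s in roles and o in objects and access in IMPLIES[a]
               for s, o, a in explicit_grants)

# One explicit grant covers many implicit ones.
grants = {("manager", "class:Employee", "write")}
admin_reads_instance = is_authorized("admin", "instance:42", "read", grants)
employee_reads_instance = is_authorized("employee", "instance:42", "read", grants)
manager_reads_database = is_authorized("manager", "database", "read", grants)
```

Only the explicit grant is stored; the check derives the rest, which is how implicit authorization keeps the authorization base small.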