Abstract: This paper presents a case study of hazard identification
and sensitivity assessment for a potential emergency water supply
resource, as part of applying a methodology for classifying drinking
water resources for the emergency supply of the population. The case
study was carried out on a selected emergency water supply resource
in one region of the Czech Republic. The hazard identification and
sensitivity assessment of the potential emergency water supply
resource are based on a unique procedure and on newly developed
general registers of selected types of hazards and sensitivities. The
registers were developed using the Fault Tree Analysis method in
combination with the What-If method. The hazards identified for the
assessed resource include hailstorms and torrential rains, drought,
soil erosion, farm machinery accidents, and agricultural production.
The developed registers of hazards and vulnerabilities, together with
a semi-quantitative assessment of hazards for the individual parts of
the hydrological structure and the technological elements of the
presented drilled wells, form the basis for a semi-quantitative risk
assessment of a potential resource for the emergency supply of the
population and for the subsequent classification of such a resource
within the crisis planning system.
Abstract: In this study, a fuzzy rule-based classifier is used for
the diagnosis of congenital heart disease. Congenital heart diseases
are defined as structural or functional diseases of the heart. The
medical data sets were obtained from the Pediatric Cardiology
Department at Selcuk University and cover the years 2000 to 2003.
First, fuzzy rules were generated from the medical data. Then the
weights of the fuzzy rules were calculated. Two different reasoning
methods, the "weighted vote method" and the "single winner method",
were used in this study. The results of the fuzzy classifiers were
compared.
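As a hedged illustration (the rules, membership functions, classes, and weights below are invented for this sketch, not taken from the study), the two reasoning methods can be contrasted in a few lines of Python: the single winner method returns the class of the single best-matching weighted rule, while the weighted vote method sums compatibility times rule weight per class.

```python
def tri(x, a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical rule base: (membership params, consequent class, rule weight CF)
RULES = [
    ((0.0, 0.3, 0.6), "healthy", 0.9),
    ((0.4, 0.7, 1.0), "disease", 0.8),
    ((0.2, 0.5, 0.8), "disease", 0.4),
]

def classify(x, method="single_winner"):
    votes = {}
    best, best_score = None, -1.0
    for params, cls, cf in RULES:
        score = tri(x, *params) * cf          # compatibility * rule weight
        votes[cls] = votes.get(cls, 0.0) + score
        if score > best_score:
            best, best_score = cls, score
    if method == "single_winner":
        return best                            # class of the best single rule
    return max(votes, key=votes.get)           # class with largest vote sum
```

Near the overlap of the rules the two methods can disagree, which is exactly why the study compares them.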
Abstract: Text Mining is an important step of the Knowledge
Discovery process. It is used to extract hidden information from
non-structured or semi-structured data. This aspect is fundamental
because much of the information on the Web is semi-structured due to
the nested structure of HTML code, much of it is linked, and much of
it is redundant. Web Text Mining supports the whole knowledge mining
process by mining, extracting, and integrating useful data,
information, and knowledge from Web page contents.
In this paper, we present a Web Text Mining process able to
discover knowledge in a distributed and heterogeneous
multi-organization environment. The Web Text Mining process is based
on a flexible architecture and is implemented in four steps able to
examine Web content and to extract useful hidden information through
mining techniques. Our Web Text Mining prototype starts from the
retrieval of Web job offers from which, through a Text Mining
process, useful information for their fast classification is
extracted; this information is, essentially, the job offer location
and the required skills.
Abstract: In this paper a data miner based on learning automata,
called LA-miner, is proposed. The LA-miner extracts classification
rules from data sets automatically. The proposed algorithm is built
on function optimization using learning automata. The experimental
results on three benchmarks indicate that the performance of the
proposed LA-miner is comparable with (and sometimes better than) that
of Ant-miner (a data mining algorithm based on Ant Colony
Optimization) and CN2 (a well-known data mining algorithm for
classification).
Abstract: Data mining is an extraordinarily demanding field concerned with the extraction of implicit knowledge and relationships that are not explicitly stored in databases. A wide variety of data mining methods have been introduced (classification, characterization, generalization, ...). Each of these methods includes more than one algorithm. A data mining system involves different user categories, which means that the user's behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory purpose, which one for a decisional purpose, and how they can collaborate and communicate. The agent paradigm presents a new way of conceiving and realizing data mining systems. The purpose is to combine different data mining algorithms to prepare elements for decision-makers, benefiting from the possibilities offered by multi-agent systems. In this paper an agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results are presented.
Abstract: This paper proposes a novel feature extraction method,
based on the Discrete Wavelet Transform (DWT) and K-L Separability
(KLS), for the classification of Functional Data (FD). The method
combines the decorrelation and reduction properties of the DWT with
the additive independence property of KLS, which is helpful for
extracting classification features of FD. It is an advanced variant
of the popular wavelet-based shrinkage method for functional data
reduction and classification. A theoretical analysis is given in the
paper to prove the consistent convergence property, and a simulation
study is carried out to compare the proposed method with the earlier
shrinkage methods. The experimental results show that this method
improves classification efficiency, precision, and robustness.
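A minimal sketch of the wavelet side of such a method, assuming a plain Haar DWT and using per-level detail energy as the reduced feature vector (the KLS criterion itself is not implemented here, and the signal values are illustrative):

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: pairwise sums (approximation) and
    pairwise differences (detail), each scaled by 1/sqrt(2).
    An odd trailing sample is dropped in this simple sketch."""
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        approx.append((signal[i] + signal[i + 1]) / math.sqrt(2))
        detail.append((signal[i] - signal[i + 1]) / math.sqrt(2))
    return approx, detail

def dwt_features(signal, levels=2):
    """Reduce a sampled functional observation to a short vector:
    detail energy per level, followed by the final approximation."""
    feats = []
    for _ in range(levels):
        signal, detail = haar_step(signal)
        feats.append(sum(d * d for d in detail))  # energy at this scale
    return feats + signal
```

A constant curve yields zero detail energy at every level, while a step-shaped curve concentrates its energy at the coarser scale, which is the kind of scale-localized information a classifier can exploit.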
Abstract: Complex systems are composed of many simple, interacting, independent entities. Interaction between these entities creates a unified behavior at the global level that cannot be predicted by examining the behavior of any single component of the system. In this paper we consider a welded automobile trailer frame as a real example of a complex technical system. The purpose of this paper is to introduce a statistical method for predicting the life cycle of complex technical systems. To organize the gathering of primary data for modeling the life cycle of such systems, an automobile trailer frame, a welded structure of several pieces, was used as a prototype in this research. Both information flows underwent a computerized analysis and classification to obtain final results and reach recommendations for improving the structure of the trailers and their operational conditions.
Abstract: Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g., individuals, quadrats, species). While Kohonen's Self-Organizing Feature Map (SOFM), or Self-Organizing Map (SOM), networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control, and medical diagnosis, their potential as a robust substitute for cluster analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid, and they provide a powerful tool for data visualization. In this paper, SOM is used to create a toroidal mapping of a two-dimensional lattice to perform cluster analysis on the results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivars, referred to as the "wine recognition data" located in the University of California, Irvine database. The results are encouraging, and it is believed that SOM would make an appealing and powerful decision-support tool for clustering tasks and for data visualization.
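A toy toroidal SOM can be sketched as follows; the lattice size, learning-rate schedule, neighborhood width, and two-cluster data are hypothetical stand-ins for illustration, not the wine experiment:

```python
import math, random

def toroidal_dist(p, q, rows, cols):
    """Grid distance on a torus: opposite edges of the lattice wrap around."""
    dr = min(abs(p[0] - q[0]), rows - abs(p[0] - q[0]))
    dc = min(abs(p[1] - q[1]), cols - abs(p[1] - q[1]))
    return math.hypot(dr, dc)

def train_som(data, rows=4, cols=4, epochs=50, lr=0.5, radius=2.0):
    """Competitive learning: the best-matching unit (BMU) and its toroidal
    neighborhood are pulled toward each sample; lr and radius decay."""
    random.seed(0)
    dim = len(data[0])
    w = {(r, c): [random.random() for _ in range(dim)]
         for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        for x in data:
            bmu = min(w, key=lambda n: sum((a - b) ** 2
                                           for a, b in zip(w[n], x)))
            for n in w:
                d = toroidal_dist(bmu, n, rows, cols)
                h = math.exp(-d * d / (2 * (radius * (1 - frac) + 0.1) ** 2))
                step = lr * (1 - frac) * h
                w[n] = [a + step * (b - a) for a, b in zip(w[n], x)]
    return w

def best_unit(w, x):
    """Map a sample to its nearest lattice unit (its cluster label)."""
    return min(w, key=lambda n: sum((a - b) ** 2 for a, b in zip(w[n], x)))
```

After training, samples from well-separated clusters land on different lattice units, which is the clustering behavior the abstract evaluates.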
Abstract: The customary practice of identifying industrial sickness relies on a set of traditional techniques based on manual monitoring and the compilation of financial records. This makes the process tedious and time-consuming, and it is often susceptible to manipulation. Therefore, readily available tools are required that can deal with the uncertain situations arising out of industrial sickness. This is all the more significant for a country like India, where the fruits of development are rarely equally distributed. In this paper, we propose an approach based on Artificial Neural Networks (ANN) to deal with industrial sickness, with specific focus on a few such units taken from Assam, a less developed north-east (NE) Indian state. The proposed system provides decisions regarding industrial sickness using eight different parameters which are directly related to the stages of sickness of such units. The mechanism primarily uses certain signals and symptoms of industrial health to decide upon the state of a unit. Specifically, we formulate an ANN-based block with data obtained from a few selected units of Assam so that the required decisions related to industrial health can be taken. The system thus formulated could become an important part of planning and development. It can also contribute towards the computerization of decision support systems related to industrial health and help in better management.
Abstract: The k-nearest neighbors (k-NN) method is a simple but effective method of classification. In this paper we present an extended version of this technique for chemical compounds used in High Throughput Screening, in which the distances of the nearest neighbors are taken into account. Our algorithm uses kernel weight functions as guidance for the process of defining activity in screening data. The proposed kernel weight function aims to combine properties of the graphical structure and the molecular descriptors of the screened compounds. We apply the modified k-NN method to several experimental data sets from biological screens. The experimental results confirm the effectiveness of the proposed method.
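A hedged sketch of a kernel-weighted k-NN of this general kind, with a generic tricube distance kernel standing in for the paper's structure-plus-descriptor kernel, and toy activity labels:

```python
import math

def tricube(u):
    """Tricube kernel: weight decays smoothly to 0 as distance ratio u -> 1."""
    return (1 - u ** 3) ** 3 if 0 <= u < 1 else 0.0

def weighted_knn(train, x, k=3):
    """train: list of (feature_vector, label) pairs. The k nearest
    neighbors vote with kernel weights based on their distance to x,
    normalized by the distance to the (k+1)-th neighbor."""
    dists = sorted((math.dist(f, x), label) for f, label in train)[:k + 1]
    bandwidth = dists[-1][0] or 1.0        # avoid dividing by zero
    votes = {}
    for d, label in dists[:k]:
        votes[label] = votes.get(label, 0.0) + tricube(d / bandwidth)
    return max(votes, key=votes.get)
```

Unlike plain k-NN, closer neighbors here carry strictly more weight, which is the extension the abstract describes.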
Abstract: With the deepening of software reuse, component-related
technologies have been widely applied in the development of
large-scale complex applications. Component identification (CI) is
one of the primary research problems in software reuse: domain
business models are analyzed to obtain a set of business components
with high reuse value and good reuse performance that support
effective reuse. Based on the concept and classification of CI, its
technical stack is briefly discussed from four views, i.e., the form
of the input business models, identification goals, identification
strategies, and the identification process. Then the various CI
methods presented in the literature are classified into four types,
i.e., domain analysis based methods, cohesion-coupling based
clustering methods, CRUD matrix based methods, and other methods,
with comparisons of their advantages and disadvantages. Additionally,
some shortcomings of the studies on CI are discussed, and their
causes are explained. Finally, the paper concludes with some
promising research directions on this problem.
Abstract: Pipeline exploration is one of the various applications of
bio-mimetic robots. Such a robot may work in common buildings, for
example between ceilings and in ducts, in addition to the complicated
and massive pipeline systems of large industrial plants. The
bio-mimetic robot finds any troubled area or malfunction and then
reports its data. Importantly, it can not only prepare for but also
react to any abnormal routes in the pipeline. Pipeline monitoring
tasks require special types of mobile robots. For effective movement
along a pipeline, the movement of the robot should be similar to that
of insects or crawling animals. During its movement along the
pipelines, a pipeline monitoring robot has the important task of
finding the shape of the approaching path on the pipes. In this paper
we propose an effective solution to this pipeline pattern recognition
problem, based on fuzzy classification rules for the measured IR
distance data.
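A minimal sketch of such fuzzy classification of IR distance readings; the membership ranges, rule base, and path classes below are hypothetical, not the paper's rules:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: ramps up on [a, b], flat on [b, c],
    ramps down on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Hypothetical fuzzy sets over an IR range reading in cm.
NEAR = (0, 0, 10, 20)
FAR = (15, 30, 100, 100)

def classify_path(front_cm, side_cm):
    """Toy rule base: straight if front far and side near; elbow if
    front near and side far; branch if both far. Rule firing strength
    is the minimum of its antecedent memberships."""
    f_near, f_far = trapezoid(front_cm, *NEAR), trapezoid(front_cm, *FAR)
    s_near, s_far = trapezoid(side_cm, *NEAR), trapezoid(side_cm, *FAR)
    scores = {
        "straight": min(f_far, s_near),
        "elbow": min(f_near, s_far),
        "branch": min(f_far, s_far),
    }
    return max(scores, key=scores.get)
```

The fuzzy ramps make the decision degrade gracefully for noisy intermediate readings instead of flipping at a hard threshold.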
Abstract: Support vector machines (SVMs) are considered to be among
the best machine learning algorithms for minimizing the predictive
probability of misclassification. However, their drawback is that for
large data sets the computation of the optimal decision boundary is a
time-consuming function of the size of the training set. Hence
several methods have been proposed to speed up the SVM algorithm.
Here three methods used to speed up the computation of SVM
classifiers are compared experimentally on a musical genre
classification problem. The simplest method pre-selects a random
sample of the data before the application of the SVM algorithm. Two
additional methods use proximity graphs to pre-select data that are
near the decision boundary: one uses k-Nearest Neighbor graphs and
the other Relative Neighborhood Graphs to accomplish the task.
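The graph-based idea of keeping only points near the decision boundary can be sketched with a k-NN criterion: a point is kept when any of its k nearest neighbors carries a different label. This is a simplified stand-in for the paper's proximity-graph methods, and the data below are toy values:

```python
import math

def boundary_subset(points, labels, k=3):
    """Return indices of points whose k nearest neighbors include a
    point of a different class -- a proxy for 'near the boundary'.
    Only these points would then be passed to SVM training."""
    keep = []
    for i, p in enumerate(points):
        neighbors = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points) if j != i
        )[:k]
        if any(lab != labels[i] for _, lab in neighbors):
            keep.append(i)
    return keep
```

Interior points of homogeneous clusters are discarded, so the SVM solver sees a much smaller training set while the candidate support vectors near the boundary survive.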
Abstract: This paper deals with the extraction of information from experts to automatically identify and recognize Ganoderma infection in oil palm stems using tomography images. The experts' knowledge is used as rules in a Fuzzy Inference System to classify each individual pattern observed in the tomography image. The classification is done by defining membership functions which assign one of three possible hypotheses, Ganoderma infection (G), non-Ganoderma infection (N), or intact stem tissue (I), to every abnormality pattern found in the tomography image. A complete comparison between the Mamdani and Sugeno styles; triangular, trapezoidal, and mixed triangular-trapezoidal membership functions; and different methods of aggregation and defuzzification is also presented and analyzed in order to select suitable Fuzzy Inference System methods for the above-mentioned task. The results showed that seven out of the 30 initially possible combinations of the Fuzzy Inference methods available in the MATLAB Fuzzy Toolbox gave results close to the experts' estimates.
Abstract: The development of aid systems for medical diagnosis is not
easy because of the inhomogeneities present in MRI, the variability
of the data from one sequence to another, and other distortions that
accentuate this difficulty. A new automatic, contextual, adaptive,
and robust segmentation procedure based on MRI brain tissue
classification is described in this article. A first phase consists
in estimating the probability density of the data by the
Parzen-Rozenblatt method. The classification procedure is completely
automatic and makes no assumptions about either the number of
clusters or their prototypes, since these are detected automatically
by a mathematical morphology operator called skeleton by influence
zones (SKIZ). The problem of initializing the prototypes, as well as
their number, is thus transformed into an optimization problem;
moreover, the procedure is adaptive, since it takes into
consideration the contextual information present in every voxel
through an adaptive and robust non-parametric Markov field (MF)
model. The number of misclassifications is reduced by using the
Maximum Posterior Marginal (MPM) criterion.
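The first phase, Parzen-Rozenblatt density estimation, can be sketched in one dimension with a Gaussian window; the window width and sample values below are illustrative:

```python
import math

def parzen_density(x, samples, h=0.5):
    """Parzen-Rozenblatt estimate with a Gaussian window of width h:
    p(x) = (1 / (n * h)) * sum_i K((x - x_i) / h),
    where K is the standard normal density."""
    n = len(samples)
    norm = 1.0 / math.sqrt(2 * math.pi)
    return sum(norm * math.exp(-((x - xi) / h) ** 2 / 2)
               for xi in samples) / (n * h)
```

In the segmentation setting, the modes of this estimated density over voxel intensities are what the SKIZ operator then locates as cluster prototypes.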
Abstract: Steel surface defect detection is essentially a pattern
recognition problem. Support Vector Machines (SVMs) are known as
among the most suitable classifiers for this application. In this
paper, we introduce a more accurate classification method by using
SVMs as the final classifier of the inspection system. In this
scheme, the multiclass classification task is performed with the
"one-against-one" method, and a different kernel is used for each
pair of classes in the multiclass classification of the different
defects.
In the proposed system, a decision tree is employed in the first
stage for the two-class classification of the steel surfaces into
"defect" and "non-defect", in order to decrease the time complexity.
Based on the experimental results, generated from over one thousand
images, the proposed multiclass classification scheme is more
accurate than the conventional methods, and the overall system yields
a performance sufficient to meet the requirements of steel
manufacturing.
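The "one-against-one" voting stage can be sketched independently of the SVM training itself; here the pairwise classifiers are stub functions and the defect class names are hypothetical:

```python
from itertools import combinations

def one_against_one(binary_classifiers, classes, x):
    """Multiclass decision by max-wins voting: one binary classifier
    per class pair, binary_classifiers[(a, b)](x) returning a or b.
    With per-pair classifiers, each pair may use its own kernel."""
    votes = {c: 0 for c in classes}
    for pair in combinations(classes, 2):
        votes[binary_classifiers[pair](x)] += 1
    return max(votes, key=votes.get)
```

In the full system each stub would be an SVM trained on just the two classes of its pair, which is what allows a different kernel per pair.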
Abstract: Studies on the distribution of traffic demand have
proceeded by providing traffic information in order to reduce
greenhouse gases and reinforce the competitiveness of roads in the
transport sector. Since extensive studies on drivers' route-changing
behavior and its influencing factors are required first, this study
develops a discriminant model for route changes that considers
driving conditions, including the traffic conditions of roads and
drivers' preferences for information media. Drivers are divided into
three groups depending on driving conditions using CART analysis, and
this group classification is statistically meaningful. The extent to
which driving conditions and preferred media affect a route change is
examined through discriminant analysis, and a discriminant model
equation to predict a route change is developed. The resulting
equation shows that driving conditions affect a route change much
more; the overall discriminant hit ratio is 64.2%, which indicates a
reasonably high discriminant ability.
Abstract: A serial hierarchical support vector machine (SHSVM) is
proposed to discriminate three brain tissues: white matter (WM), gray
matter (GM), and cerebrospinal fluid (CSF). SHSVM takes a novel
classification approach by repeating the hierarchical classification
on the data set iteratively. It uses a Radial Basis Function (RBF)
kernel with different tunings to obtain accurate results. As a second
approach, segmentation was also performed with the DAGSVM method. In
this article, eight univariate features are extracted from the raw
DTI data, and all possible 2D feature sets are examined within the
segmentation process. SHSVM succeeded in obtaining DSI values higher
than 0.95 for all three tissues, which are higher than the DAGSVM
results.
Abstract: It is an important task in Korean-English machine
translation to classify the gender of names correctly. When a
sentence is composed of two or more clauses and only one subject is
given as a proper noun, it is important to find the gender of the
proper noun for a correct translation of the sentence. This is
because a singular pronoun has a gender in English while it does not
in Korean. Thus, in Korean-English machine translation, the gender of
a proper noun should be determined. More generally, this task can be
expanded to the classification of Korean names in general. This paper
proposes a statistical method for this problem. By considering a name
as just a sequence of syllables, it is possible to compute statistics
for each name from a collection of names. An evaluation of the
proposed method shows an improvement in accuracy over a simple lookup
in the collection. While the accuracy of the lookup method is 64.11%,
that of the proposed method is 81.49%. This implies that the proposed
method is more suitable for the gender classification of Korean
names.
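A hedged sketch of such a syllable-statistics classifier, here as add-one-smoothed naive Bayes over toy romanized syllables (the actual system works on Korean syllables and a large name collection, and its exact statistical model is not specified in the abstract):

```python
import math
from collections import defaultdict

def train_gender_model(names):
    """names: list of (syllable_tuple, gender). Count how often each
    syllable occurs in names of each gender."""
    counts = {"F": defaultdict(int), "M": defaultdict(int)}
    totals = {"F": 0, "M": 0}
    for syllables, gender in names:
        for s in syllables:
            counts[gender][s] += 1
            totals[gender] += 1
    return counts, totals

def classify_gender(model, syllables):
    """Pick the gender maximizing the add-one-smoothed syllable
    log-likelihood (uniform prior assumed)."""
    counts, totals = model
    vocab = {s for c in counts.values() for s in c}
    best, best_lp = None, -math.inf
    for g in counts:
        lp = sum(math.log((counts[g][s] + 1) / (totals[g] + len(vocab)))
                 for s in syllables)
        if lp > best_lp:
            best, best_lp = g, lp
    return best
```

Because every syllable contributes evidence, unseen names can still be classified from the gender tendencies of their individual syllables, which is what lifts accuracy over plain lookup.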
Abstract: The problem of spam has been seriously troubling the Internet community during the last few years and has currently reached an alarming scale. Observations made at CERN (the European Organization for Nuclear Research, located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag of Words representation of an email is widely used to stop this unwanted flood, as it combines good performance with simplicity of the training and classification processes. However, facing the constantly changing patterns of spam, it is necessary to ensure online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter is retrained with examples of spam reported by users. Tests are performed on considerable sets of mails, both from public spam archives and from CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier's precision, as happens when only the NBC based on single words is retrained.
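A minimal sketch of the proposed combination: a unigram NBC and an adjacent-word-pair NBC whose log-odds are summed. The training data, smoothing, and decision threshold are illustrative; the actual combination rule used at CERN is not detailed in the abstract.

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial NBC over token features (words or word pairs)."""
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, tokens, label):
        self.counts[label].update(tokens)
        self.docs[label] += 1

    def log_odds(self, tokens):
        """log P(spam | tokens) - log P(ham | tokens), with add-one
        smoothing; positive means spam-leaning."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for t in tokens:
            for label, sign in (("spam", 1), ("ham", -1)):
                total = sum(self.counts[label].values())
                score += sign * math.log((self.counts[label][t] + 1)
                                         / (total + vocab))
        return score

def bigrams(words):
    """Adjacent-word-pair features for the second classifier."""
    return [a + " " + b for a, b in zip(words, words[1:])]

def is_spam(word_nbc, pair_nbc, words):
    # Combine the two classifiers by summing their log-odds.
    return word_nbc.log_odds(words) + pair_nbc.log_odds(bigrams(words)) > 0
```

In the architecture described above, only the pair-based model would be retrained on user-reported spam, leaving the word-based model, and hence its precision, untouched.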