Hazard Identification and Sensitivity of Potential Resource of Emergency Water Supply

The paper presents the case study of hazard identification and sensitivity of potential resource of emergency water supply as part of the application of methodology classifying the resources of drinking water for emergency supply of population. The case study has been carried out on a selected resource of emergency water supply in one region of the Czech Republic. The hazard identification and sensitivity of potential resource of emergency water supply is based on a unique procedure and developed general registers of selected types of hazards and sensitivities. The registers have been developed with the help of the “Fault Tree Analysis” method in combination with the “What if method”. The identified hazards for the assessed resource include hailstorms and torrential rains, drought, soil erosion, accidents of farm machinery, and agricultural production. The developed registers of hazards and vulnerabilities and a semi-quantitative assessment of hazards for individual parts of hydrological structure and technological elements of presented drilled wells are the basis for a semi-quantitative risk assessment of potential resource of emergency supply of population and the subsequent classification of such resource within the system of crisis planning.

A Diagnostic Fuzzy Rule-Based System for Congenital Heart Disease

In this study, fuzzy rule-based classifier is used for the diagnosis of congenital heart disease. Congenital heart diseases are defined as structural or functional heart disease. Medical data sets were obtained from Pediatric Cardiology Department at Selcuk University, from years 2000 to 2003. Firstly, fuzzy rules were generated by using medical data. Then the weights of fuzzy rules were calculated. Two different reasoning methods as “weighted vote method" and “singles winner method" were used in this study. The results of fuzzy classifiers were compared.

A Web Text Mining Flexible Architecture

Text Mining is an important step of Knowledge Discovery process. It is used to extract hidden information from notstructured o semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of the Web information is linked, much of the Web information is redundant. Web Text Mining helps whole knowledge mining process to mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multiorganization environment. The Web Text Mining process is based on flexible architecture and is implemented by four steps able to examine web content and to extract useful hidden information through mining techniques. Our Web Text Mining prototype starts from the recovery of Web job offers in which, through a Text Mining process, useful information for fast classification of the same are drawn out, these information are, essentially, job offer place and skills.

Data Mining Using Learning Automata

In this paper a data miner based on the learning automata is proposed and is called LA-miner. The LA-miner extracts classification rules from data sets automatically. The proposed algorithm is established based on the function optimization using learning automata. The experimental results on three benchmarks indicate that the performance of the proposed LA-miner is comparable with (sometimes better than) the Ant-miner (a data miner algorithm based on the Ant Colony optimization algorithm) and CNZ (a well-known data mining algorithm for classification).

A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data

Data mining is an extraordinarily demanding field referring to extraction of implicit knowledge and relationships, which are not explicitly stored in databases. A wide variety of methods of data mining have been introduced (classification, characterization, generalization...). Each one of these methods includes more than algorithm. A system of data mining implies different user categories,, which mean that the user-s behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how can they collaborate and communicate. Agent paradigm presents a new way of conception and realizing of data mining system. The purpose is to combine different algorithms of data mining to prepare elements for decision-makers, benefiting from the possibilities offered by the multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.

Wavelet and K-L Seperability Based Feature Extraction Method for Functional Data Classification

This paper proposes a novel feature extraction method, based on Discrete Wavelet Transform (DWT) and K-L Seperability (KLS), for the classification of Functional Data (FD). This method combines the decorrelation and reduction property of DWT and the additive independence property of KLS, which is helpful to extraction classification features of FD. It is an advanced approach of the popular wavelet based shrinkage method for functional data reduction and classification. A theory analysis is given in the paper to prove the consistent convergence property, and a simulation study is also done to compare the proposed method with the former shrinkage ones. The experiment results show that this method has advantages in improving classification efficiency, precision and robustness.

Predicting the Life Cycle of Complex Technical Systems (CTS)

Complex systems are composed of several plain interacting independent entities. Interaction between these entities creates a unified behavior at the global level that cannot be predicted by examining the behavior of any single individual component of the system. In this paper we consider a welded frame of an automobile trailer as a real example of Complex Technical Systems, The purpose of this paper is to introduce a Statistical method for predicting the life cycle of complex technical systems. To organize gathering of primary data for modeling the life cycle of complex technical systems an “Automobile Trailer Frame" were used as a prototype in this research. The prototype represents a welded structure of several pieces. Both information flows underwent a computerized analysis and classification for the acquisition of final results to reach final recommendations for improving the trailers structure and their operational conditions.

Enhanced Clustering Analysis and Visualization Using Kohonen's Self-Organizing Feature Map Networks

Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species etc). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, its potential as a robust substitute for clustering analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid and provide a powerful tool for data visualization. In this paper, SOM is used for creating a toroidal mapping of two-dimensional lattice to perform cluster analysis on results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivators, referred to as the “wine recognition data" located in the University of California-Irvine database. The results are encouraging and it is believed that SOM would make an appealing and powerful decision-support system tool for clustering tasks and for data visualization.

Identification of Industrial Health Using ANN

The customary practice of identifying industrial sickness is a set traditional techniques which rely upon a range of manual monitoring and compilation of financial records. It makes the process tedious, time consuming and often are susceptible to manipulation. Therefore, certain readily available tools are required which can deal with such uncertain situations arising out of industrial sickness. It is more significant for a country like India where the fruits of development are rarely equally distributed. In this paper, we propose an approach based on Artificial Neural Network (ANN) to deal with industrial sickness with specific focus on a few such units taken from a less developed north-east (NE) Indian state like Assam. The proposed system provides decision regarding industrial sickness using eight different parameters which are directly related to the stages of sickness of such units. The mechanism primarily uses certain signals and symptoms of industrial health to decide upon the state of a unit. Specifically, we formulate an ANN based block with data obtained from a few selected units of Assam so that required decisions related to industrial health could be taken. The system thus formulated could become an important part of planning and development. It can also contribute towards computerization of decision support systems related to industrial health and help in better management.

Weighted k-Nearest-Neighbor Techniques for High Throughput Screening Data

The k-nearest neighbors (knn) is a simple but effective method of classification. In this paper we present an extended version of this technique for chemical compounds used in High Throughput Screening, where the distances of the nearest neighbors can be taken into account. Our algorithm uses kernel weight functions as guidance for the process of defining activity in screening data. Proposed kernel weight function aims to combine properties of graphical structure and molecule descriptors of screening compounds. We apply the modified knn method on several experimental data from biological screens. The experimental results confirm the effectiveness of the proposed method.

A Survey of Business Component Identification Methods and Related Techniques

With deep development of software reuse, componentrelated technologies have been widely applied in the development of large-scale complex applications. Component identification (CI) is one of the primary research problems in software reuse, by analyzing domain business models to get a set of business components with high reuse value and good reuse performance to support effective reuse. Based on the concept and classification of CI, its technical stack is briefly discussed from four views, i.e., form of input business models, identification goals, identification strategies, and identification process. Then various CI methods presented in literatures are classified into four types, i.e., domain analysis based methods, cohesion-coupling based clustering methods, CRUD matrix based methods, and other methods, with the comparisons between these methods for their advantages and disadvantages. Additionally, some insufficiencies of study on CI are discussed, and the causes are explained subsequently. Finally, it is concluded with some significantly promising tendency about research on this problem.

Development of a Pipeline Monitoring System by Bio-mimetic Robots

To explore pipelines is one of various bio-mimetic robot applications. The robot may work in common buildings such as between ceilings and ducts, in addition to complicated and massive pipeline systems of large industrial plants. The bio-mimetic robot finds any troubled area or malfunction and then reports its data. Importantly, it can not only prepare for but also react to any abnormal routes in the pipeline. The pipeline monitoring tasks require special types of mobile robots. For an effective movement along a pipeline, the movement of the robot will be similar to that of insects or crawling animals. During its movement along the pipelines, a pipeline monitoring robot has an important task of finding the shapes of the approaching path on the pipes. In this paper we propose an effective solution to the pipeline pattern recognition, based on the fuzzy classification rules for the measured IR distance data.

On Speeding Up Support Vector Machines: Proximity Graphs Versus Random Sampling for Pre-Selection Condensation

Support vector machines (SVMs) are considered to be the best machine learning algorithms for minimizing the predictive probability of misclassification. However, their drawback is that for large data sets the computation of the optimal decision boundary is a time consuming function of the size of the training set. Hence several methods have been proposed to speed up the SVM algorithm. Here three methods used to speed up the computation of the SVM classifiers are compared experimentally using a musical genre classification problem. The simplest method pre-selects a random sample of the data before the application of the SVM algorithm. Two additional methods use proximity graphs to pre-select data that are near the decision boundary. One uses k-Nearest Neighbor graphs and the other Relative Neighborhood Graphs to accomplish the task.

Towards Automatic Recognition and Grading of Ganoderma Infection Pattern Using Fuzzy Systems

This paper deals with the extraction of information from the experts to automatically identify and recognize Ganoderma infection in oil palm stem using tomography images. Expert-s knowledge are used as rules in a Fuzzy Inference Systems to classify each individual patterns observed in he tomography image. The classification is done by defining membership functions which assigned a set of three possible hypotheses : Ganoderma infection (G), non Ganoderma infection (N) or intact stem tissue (I) to every abnormalities pattern found in the tomography image. A complete comparison between Mamdani and Sugeno style,triangular, trapezoids and mixed triangular-trapezoids membership functions and different methods of aggregation and defuzzification is also presented and analyzed to select suitable Fuzzy Inference System methods to perform the above mentioned task. The results showed that seven out of 30 initial possible combination of available Fuzzy Inference methods in MATLAB Fuzzy Toolbox were observed giving result close to the experts estimation.

A New Hybrid RMN Image Segmentation Algorithm

The development of aid's systems for the medical diagnosis is not easy thing because of presence of inhomogeneities in the MRI, the variability of the data from a sequence to the other as well as of other different source distortions that accentuate this difficulty. A new automatic, contextual, adaptive and robust segmentation procedure by MRI brain tissue classification is described in this article. A first phase consists in estimating the density of probability of the data by the Parzen-Rozenblatt method. The classification procedure is completely automatic and doesn't make any assumptions nor on the clusters number nor on the prototypes of these clusters since these last are detected in an automatic manner by an operator of mathematical morphology called skeleton by influence zones detection (SKIZ). The problem of initialization of the prototypes as well as their number is transformed in an optimization problem; in more the procedure is adaptive since it takes in consideration the contextual information presents in every voxel by an adaptive and robust non parametric model by the Markov fields (MF). The number of bad classifications is reduced by the use of the criteria of MPM minimization (Maximum Posterior Marginal).

Enhanced Performance for Support Vector Machines as Multiclass Classifiers in Steel Surface Defect Detection

Steel surface defect detection is essentially one of pattern recognition problems. Support Vector Machines (SVMs) are known as one of the most proper classifiers in this application. In this paper, we introduce a more accurate classification method by using SVMs as our final classifier of the inspection system. In this scheme, multiclass classification task is performed based on the "one-againstone" method and different kernels are utilized for each pair of the classes in multiclass classification of the different defects. In the proposed system, a decision tree is employed in the first stage for two-class classification of the steel surfaces to "defect" and "non-defect", in order to decrease the time complexity. Based on the experimental results, generated from over one thousand images, the proposed multiclass classification scheme is more accurate than the conventional methods and the overall system yields a sufficient performance which can meet the requirements in steel manufacturing.

Analysis of Driving Conditions and Preferred Media on Diversion

Studies on the distribution of traffic demands have been proceeding by providing traffic information for reducing greenhouse gases and reinforcing the road's competitiveness in the transport section, however, since it is preferentially required the extensive studies on the driver's behavior changing routes and its influence factors, this study has been developed a discriminant model for changing routes considering driving conditions including traffic conditions of roads and driver's preferences for information media. It is divided into three groups depending on driving conditions in group classification with the CART analysis, which is statistically meaningful. And the extent that driving conditions and preferred media affect a route change is examined through a discriminant analysis, and it is developed a discriminant model equation to predict a route change. As a result of building the discriminant model equation, it is shown that driving conditions affect a route change much more, the entire discriminant hit ratio is derived as 64.2%, and this discriminant equation shows high discriminant ability more than a certain degree.

A Serial Hierarchical Support Vector Machine and 2D Feature Sets Act for Brain DTI Segmentation

Serial hierarchical support vector machine (SHSVM) is proposed to discriminate three brain tissues which are white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). SHSVM has novel classification approach by repeating the hierarchical classification on data set iteratively. It used Radial Basis Function (rbf) Kernel with different tuning to obtain accurate results. Also as the second approach, segmentation performed with DAGSVM method. In this article eight univariate features from the raw DTI data are extracted and all the possible 2D feature sets are examined within the segmentation process. SHSVM succeed to obtain DSI values higher than 0.95 accuracy for all the three tissues, which are higher than DAGSVM results.

Determining the Gender of Korean Names for Pronoun Generation

It is an important task in Korean-English machine translation to classify the gender of names correctly. When a sentence is composed of two or more clauses and only one subject is given as a proper noun, it is important to find the gender of the proper noun for correct translation of the sentence. This is because a singular pronoun has a gender in English while it does not in Korean. Thus, in Korean-English machine translation, the gender of a proper noun should be determined. More generally, this task can be expanded into the classification of the general Korean names. This paper proposes a statistical method for this problem. By considering a name as just a sequence of syllables, it is possible to get a statistics for each name from a collection of names. An evaluation of the proposed method yields the improvement in accuracy over the simple looking-up of the collection. While the accuracy of the looking-up method is 64.11%, that of the proposed method is 81.49%. This implies that the proposed method is more plausible for the gender classification of the Korean names.

Adaptive Naïve Bayesian Anti-Spam Engine

The problem of spam has been seriously troubling the Internet community during the last few years and currently reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag Of Words representation of an email is widely used to stop this unwanted flood as it combines good performance with simplicity of the training and classification processes. However, facing the constantly changing patterns of spam, it is necessary to assure online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter will be retrained with examples of spam reported by users. Tests are performed on considerable sets of mails both from public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier precision as it happens when only the NBC based on single words is retrained.