Risk-Management by Numerical Pattern Analysis in Data-Mining

This paper suggests a new method for risk management based on numerical patterns in data mining. These patterns are built from probability rules in decision trees and are constrained to be valid, novel, useful, and understandable. Given a set of objective functions, the system converges to a good pattern or to improved objectives. The patterns are analyzed through the resulting matrices and several results are highlighted. Using the suggested method, the direction of a system's functional route can be controlled and planning for specific objectives improved.
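To make the idea of probability rules in decision trees concrete, the following minimal Python sketch multiplies the conditional branch probabilities along a root-to-leaf path; the tree paths and probabilities are hypothetical illustrations, not the paper's actual patterns.

    # Minimal sketch: scoring a decision-tree path by its branch probabilities.
    # The paths and probabilities here are illustrative, not the paper's data.

    def path_probability(probabilities):
        """Multiply conditional branch probabilities along a root-to-leaf path."""
        p = 1.0
        for branch_p in probabilities:
            p *= branch_p
        return p

    # Two hypothetical root-to-leaf paths through a risk decision tree.
    low_risk_path  = [0.7, 0.9, 0.8]   # P(branch) at each level
    high_risk_path = [0.3, 0.4]

    print(path_probability(low_risk_path))   # 0.504
    print(path_probability(high_risk_path))  # 0.12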

Influence of Social Factors and Motives on Commitment of Sport Events Volunteers

In sport, human resource management gives special attention to methods of recruiting volunteers, retaining them, fostering their participation with one another, and management approaches for the better operation of events. Recognizing volunteers' characteristics and motives is important because these form the basis of their participation and commitment in the sport environment. The motivation and commitment of 281 volunteers were assessed using an organizational commitment scale, a motivation scale, and a personal characteristics questionnaire. The descriptive results showed that 64% of the volunteers were women, with an average age of 21.24 years. They were physical education students, single (71.9%), without occupation (53%), and with an average of 5 years of sport experience. Their most important motivation was the career factor, and the most important commitment factor was the normative factor. The results of testing the hypotheses showed that age, sport experience, and education affect the level of volunteers' commitment, and that motive factors such as career, material, purposive, and protective factors also have the power to predict the commitment of sport volunteers. It is therefore recommended that events' executive managers provide suitable opportunities for volunteers and carry out appropriate instructional courses.

Logistics Outsourcing: Performance Models and Financial and Operational Indicators

The growing outsourcing of logistics services, resulting from the ongoing drive in firms toward cost reduction and increased efficiency, means that it is becoming more and more important for the companies doing the outsourcing to carry out a proper evaluation. The multiple definitions and measures of logistics service performance found in research on the topic create a certain degree of confusion and do not clear the way towards the proper measurement of their performance. Do a model and a specific set of indicators exist that can be considered appropriate for measuring the performance of logistics services outsourcing in industrial environments? Are these indicators in keeping with the objectives pursued by outsourcing? We aim to answer these and other research questions in the study we have initiated within the framework of the international High Performance Manufacturing (HPM) project, of which this paper forms part. As the first stage of this research, this paper reviews articles dealing with the topic published in the last 15 years, with the aim of detecting the models most often used for this measurement and determining which performance indicators are proposed as part of these models and which are most used. The first steps are also taken in determining whether these indicators, financial and operational, cover the aims pursued when outsourcing logistics services. The findings show a wide variety of both models and indicators in use, which testifies to the need to continue this research in order to propose a model and a set of indicators for measuring the performance of logistics services outsourcing in industrial environments.

Yield, Yield Components, Soil Minerals and Aroma of KDML 105 Rice in Tungkularonghai, Roi-Et, Thailand

Pearson's correlation coefficient and sequential path analysis have been used to determine the interrelationships among yield, yield components, soil minerals, and aroma of Khao Dawk Mali (KDML) 105 rice grown in the Tungkularonghai area of Roi-Et province, in the northeast of Thailand. Pearson's correlation coefficient showed that the number of panicles was the only factor with a significant positive correlation (0.790**) with grain yield. Sequential path analysis revealed that the number of panicles, followed by the number of fertile spikelets and 100-grain weight, were the first-order factors with positive direct effects on grain yield, whereas the other factors analyzed influenced grain yield through indirect effects. The study also indicated no significant relationship between the aroma level and any of the factors analyzed.
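As an illustration of the first analysis step, the following Python sketch computes Pearson's correlation coefficient between one yield component and grain yield; the numbers are invented stand-ins for the field data.

    import numpy as np

    # Illustrative sketch (not the paper's data): Pearson correlation between
    # a yield component and grain yield, as used in the correlation analysis.
    panicle_number = np.array([210, 250, 230, 280, 260, 300], dtype=float)
    grain_yield    = np.array([3.1, 3.6, 3.3, 4.1, 3.8, 4.4])

    r = np.corrcoef(panicle_number, grain_yield)[0, 1]
    print(f"Pearson's r = {r:.3f}")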

Incremental Mining of Shocking Association Patterns

Association rule mining is an important problem in data mining. The massively increasing volume of data in real-life databases has motivated researchers to design novel and incremental algorithms for association rule mining. In this paper, we propose an incremental association rule mining algorithm that integrates a shocking interestingness criterion during the process of building the model. A new interestingness measure, called the shocking measure, is introduced. One of the main features of the proposed approach is that it captures the user's background knowledge, which is monotonically augmented. The resulting incremental model, which reflects both the changing data and the user's beliefs, makes the overall KDD process more effective and efficient. We implemented the proposed approach, experimented with some public datasets, and found the results quite promising.
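For reference, the sketch below computes the standard support and confidence of an association rule, plus a placeholder "shock" value measured as deviation from a user-supplied belief; the paper's actual shocking measure is not specified here, so that formula is only illustrative.

    # Sketch of the basic association-rule quantities; the "shock" formula is a
    # placeholder for deviation from user belief, not the paper's exact measure.

    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"milk", "butter", "bread"},
        {"milk"},
    ]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(antecedent, consequent):
        return support(antecedent | consequent) / support(antecedent)

    rule = (frozenset({"bread"}), frozenset({"milk"}))
    conf = confidence(*rule)
    user_belief = 0.9                  # user's expected confidence for this rule
    shock = abs(conf - user_belief)    # placeholder: surprise = deviation from belief
    print(conf, shock)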

Matching Facial Images using Age Related Morphing Changes

Each year many people are reported missing in most countries of the world for various reasons, and arrangements have to be made to find them after some time, so investigating agencies are compelled to identify these people manually. In many cases, investigations into a person who has been missing for a long time may not be successful: it may be difficult to identify the person from old photographs, because their facial appearance is likely to have changed, mainly due to the natural aging process. On some occasions in forensic medicine, when a dead body is found, investigations must establish whether the corpse belongs to a person who disappeared some time ago. With the passage of time the person's face may have changed, and there should be a mechanism to reveal the person's identity. To make this process easier, we must estimate how the person would look now. To address this problem, this paper presents a way of synthesizing a facial image with aging effects.

Fuzzy Expert System Design for Determining Wearing Properties of Nitrided and Non Nitrided Steel

This paper proposes a fuzzy expert system design for determining the wearing properties of nitrided and non-nitrided steel. The proposed fuzzy expert system helps the user and the manufacturer forecast the wearing properties of nitrided and non-nitrided steel under specified laboratory conditions. The surfaces of engineering components are often nitrided to improve their wear, corrosion, and fatigue resistance; a major benefit of the nitriding process is reduced distortion and wear of metallic alloys. A fuzzy expert system was developed for determining the wearing and durability properties of nitrided and non-nitrided steels tested under different loads and different sliding speeds under laboratory conditions.
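A minimal, hand-rolled sketch of the kind of fuzzy inference involved is given below; the membership functions, the single rule, and the crude defuzzification are invented for illustration and do not reproduce the paper's rule base.

    # Minimal Mamdani-style sketch with made-up membership functions and one
    # rule; the paper's actual rule base and universes are not specified here.

    def tri(x, a, b, c):
        """Triangular membership function."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def predict_wear(load, speed):
        high_load  = tri(load, 40.0, 80.0, 120.0)
        high_speed = tri(speed, 0.5, 1.5, 2.5)
        # Rule: IF load is high AND speed is high THEN wear is high (min = AND).
        high_wear = min(high_load, high_speed)
        # Defuzzify crudely: interpolate between assumed low/high wear rates.
        return 0.1 + high_wear * (1.0 - 0.1)

    print(predict_wear(load=90.0, speed=1.2))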

Correlation-based Feature Selection using Ant Colony Optimization

Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant, and noisy attributes in the model building process can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes Ant Colony Optimization (ACO) is presented. ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.
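The sketch below illustrates the general ACO pattern applied to feature subsets: pheromone-biased sampling of candidate subsets, followed by evaporation and reinforcement of the best subset found. The merit function is an invented stand-in for the correlation-based evaluation the paper uses.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy ACO-style feature subset search: pheromone-weighted sampling plus
    # reinforcement of good subsets. evaluate() is a stand-in merit function.
    n_features, n_ants, n_iterations = 10, 5, 20
    pheromone = np.ones(n_features)
    relevant = {0, 3, 7}                   # hypothetical truly useful features

    def evaluate(subset):
        # Stand-in merit: reward relevant features, penalize subset size.
        return len(subset & relevant) - 0.1 * len(subset)

    best_subset, best_score = set(), -np.inf
    for _ in range(n_iterations):
        for _ in range(n_ants):
            probs = pheromone / pheromone.sum()
            k = rng.integers(1, n_features + 1)
            subset = set(rng.choice(n_features, size=k, replace=False, p=probs))
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
        pheromone *= 0.9                   # evaporation
        for f in best_subset:              # reinforce the best subset found
            pheromone[f] += 1.0

    print(sorted(best_subset), best_score)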

Error Propagation of the Hidden-Point Bar Method: Effect of Bar Geometry

The hidden-point bar method is useful in many surveying applications. The method involves determining the coordinates of a hidden point as a function of horizontal and vertical angles measured to three fixed points on the bar. Using these measurements, the procedure involves calculating the slant angles, the distances from the station to the fixed points, the coordinates of the fixed points, and then the coordinates of the hidden point. The propagation of the measurement errors in this complex process has not been fully investigated in the literature. This paper evaluates the effect of the bar geometry on the position accuracy of the hidden point which depends on the measurement errors of the horizontal and vertical angles. The results are used to establish some guidelines regarding the inclination angle of the bar and the location of the observed points that provide the best accuracy.
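Assuming the coordinates of the three fixed points have already been computed from the angle measurements, the final extrapolation step can be sketched as below; the coordinates and the bar offset are illustrative values, not the paper's configuration.

    import numpy as np

    # Geometry-only sketch: once the three fixed points' coordinates have been
    # computed, the hidden point lies on the bar's axis at a known offset.
    p1 = np.array([10.0, 20.0, 5.0])
    p2 = np.array([10.5, 20.4, 5.1])
    p3 = np.array([11.0, 20.8, 5.2])

    # Take the bar direction from the outer points and extrapolate past p3.
    direction = (p3 - p1) / np.linalg.norm(p3 - p1)
    offset = 0.75              # known bar length from p3 to the hidden point
    hidden = p3 + offset * direction
    print(hidden)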

A New Hybrid Model with Passive Congregation for Stock Market Indices Prediction

In this paper, we propose a new hybrid learning model for stock market indices prediction by adding a passive congregation term to the standard hybrid model comprising Particle Swarm Optimization (PSO) with Genetic Algorithm (GA) operators for training Neural Networks (NN). This new passive congregation term bases each particle's new position on cooperation with other particles rather than on the particle's own search alone, ignoring other particles' positions; it thus enables PSO to perform both local and global search instead of only local search. An experimental study carried out on the most famous European stock market indices, for both long-term and short-term prediction, clearly shows the influence of the passive congregation term in improving prediction accuracy compared to the standard hybrid model.
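A sketch of the velocity update with a passive congregation term, in the spirit of He et al.'s PSOPC formulation, is given below; the coefficients and test vectors are illustrative, and the GA operators of the full hybrid model are omitted.

    import numpy as np

    rng = np.random.default_rng(1)

    # Sketch of a passive-congregation velocity update; coefficients are
    # illustrative, not the paper's tuned values.
    def update_velocity(v, x, pbest, gbest, swarm, w=0.7, c1=1.5, c2=1.5, c3=0.6):
        r1, r2, r3 = rng.random(3)
        neighbor = swarm[rng.integers(len(swarm))]   # randomly chosen particle
        return (w * v
                + c1 * r1 * (pbest - x)              # cognitive term
                + c2 * r2 * (gbest - x)              # social term
                + c3 * r3 * (neighbor - x))          # passive congregation term

    swarm = rng.normal(size=(10, 2))
    v = update_velocity(np.zeros(2), swarm[0], swarm[0], swarm.mean(0), swarm)
    print(v)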

Exploring Performance-Based Music Attributes for Stylometric Analysis

Music Information Retrieval (MIR) and modern data mining techniques are applied to identify style markers in MIDI music for stylometric analysis and author attribution. Over 100 attributes are extracted from a library of 2830 songs and then mined using supervised learning data mining techniques. Two attributes are identified that provide high information gain. These attributes are then used as style markers to predict authorship. Using these style markers, the authors are able to correctly distinguish songs written by the Beatles from those that were not, with a precision and accuracy of over 98%. The identification of these style markers, as well as the architecture of this research, provides a foundation for future work in musical stylometry.
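The attribute ranking rests on information gain; the following sketch shows that computation on an invented binary attribute over a balanced label set, not on the actual song corpus.

    import math
    from collections import Counter

    # Sketch of the information-gain computation behind attribute selection;
    # the labels and the attribute split below are illustrative.

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(labels, groups):
        n = len(labels)
        remainder = sum(len(g) / n * entropy(g) for g in groups)
        return entropy(labels) - remainder

    labels = ["beatles"] * 6 + ["other"] * 6
    # Hypothetical binary attribute that splits the songs into two groups.
    groups = [["beatles"] * 5 + ["other"], ["beatles"] + ["other"] * 5]
    print(information_gain(labels, groups))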

Text Mining Technique for Data Mining Application

Text mining applies knowledge discovery techniques to unstructured text and is also termed knowledge discovery in text (KDT) or text data mining. The decision tree approach is most useful for classification problems: a tree is constructed to model the classification process in two basic steps, building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that uses rulesets, cross-validation, and boosting on top of the original C5.0 in order to reduce the error rate. The feasibility and benefits of the proposed approach are demonstrated on a medical data set, hypothyroid. The performance of a classifier on the training cases from which it was constructed gives a poor estimate of its accuracy; by sampling or by using a separate test file, the classifier is instead evaluated on cases that were not used to build it. If the cases in hypothyroid.data and hypothyroid.test were shuffled and divided into a new 2772-case training set and a 1000-case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of See5 is its ability to generate classifiers called rulesets; the ruleset obtained here has an error rate of 0.5% on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive accuracy is f-fold cross-validation: the error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner; trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.
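Since C5.0/See5 is proprietary, the sketch below reproduces the evaluation protocol the abstract describes, 10-fold cross-validation and roughly 10-classifier boosting, using scikit-learn's CART trees as a stand-in on synthetic data rather than the hypothyroid files.

    # Stand-in for the C5.0 protocol: CART trees, 10-fold CV, 10-round boosting.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=3772, n_features=20, random_state=0)

    tree = DecisionTreeClassifier(random_state=0)
    cv_error = 1 - cross_val_score(tree, X, y, cv=10).mean()

    boosted = AdaBoostClassifier(n_estimators=10, random_state=0)
    boost_error = 1 - cross_val_score(boosted, X, y, cv=10).mean()

    print(f"single tree CV error: {cv_error:.3f}")
    print(f"10-classifier boosting CV error: {boost_error:.3f}")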

Evolutionary Approach for Automated Discovery of Censored Production Rules

In the recent past, there has been an increasing interest in applying evolutionary methods to Knowledge Discovery in Databases (KDD), and a number of successful applications of Genetic Algorithms (GA) and Genetic Programming (GP) to KDD have been demonstrated. The most predominant representation of the discovered knowledge is the standard Production Rule (PR) in the form If P Then D. PRs, however, are unable to handle exceptions and do not exhibit variable precision. Censored Production Rules (CPRs), an extension of PRs proposed by Michalski & Winston, exhibit variable precision and support an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form If P Then D Unless C, where C (the censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions when the resources needed to establish their presence are tight or there is simply no information available as to whether they hold or not. Thus, the 'If P Then D' part of the CPR expresses important information, while the Unless C part acts only as a switch that changes the polarity of D to ~D. This paper presents a classification algorithm based on an evolutionary approach that discovers comprehensible rules with exceptions in the form of CPRs. The proposed approach has a flexible chromosome encoding, where each chromosome corresponds to a CPR. Appropriate genetic operators are suggested, and a fitness function is proposed that incorporates the basic constraints on CPRs. Experimental results are presented to demonstrate the performance of the proposed algorithm.
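The semantics of a CPR can be sketched in a few lines of Python; the premise, decision, and censor below are hypothetical stand-ins for the predicates an evolved chromosome would encode.

    # Sketch of evaluating "If P Then D Unless C" on one example; the predicates
    # are invented illustrations, not the algorithm's discovered rules.

    def cpr_predict(example, P, D, C):
        """Return the rule's decision, flipped to ~D when the censor holds."""
        if not P(example):
            return None                  # rule does not fire
        return (not D) if C(example) else D

    P = lambda e: e["bird"]              # premise
    D = True                             # decision: "it flies"
    C = lambda e: e["penguin"]           # censor (exception)

    print(cpr_predict({"bird": True, "penguin": False}, P, D, C))  # True
    print(cpr_predict({"bird": True, "penguin": True}, P, D, C))   # False (~D)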

Data Preprocessing for Supervised Learning

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost: if there is much irrelevant and redundant information present, or the data is noisy and unreliable, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take a considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, and so on; the product of data pre-processing is the final training set. It would be convenient if a single sequence of data pre-processing algorithms gave the best performance on every data set, but this is not the case. We therefore present the best-known algorithms for each step of data pre-processing so that practitioners can achieve the best performance for their data sets.
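As one concrete sequence (not a uniquely best one, per the abstract's own point), the sketch below chains normalization, feature selection, and a learner using scikit-learn on synthetic data.

    # Illustrative pre-processing chain; the steps and synthetic data are
    # examples, not a recommended fixed sequence.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                               random_state=0)

    pipeline = Pipeline([
        ("scale", StandardScaler()),               # normalization
        ("select", SelectKBest(f_classif, k=10)),  # feature selection
        ("learn", LogisticRegression(max_iter=1000)),
    ])
    print(cross_val_score(pipeline, X, y, cv=5).mean())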

Novelty as a Measure of Interestingness in Knowledge Discovery

Rule discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules leads to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify the novelty of the discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.
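The following toy sketch categorizes a discovered rule by its deviation from a known rule, using an invented set-difference measure and a user-chosen threshold; the paper's actual deviation types are richer than this.

    # Sketch of grading a discovered rule against a known rule; the measure and
    # threshold are illustrative placeholders.

    def deviation(known, discovered):
        """Fraction of the discovered rule's conditions absent from the known rule."""
        antecedent_k, consequent_k = known
        antecedent_d, consequent_d = discovered
        new_conds = (antecedent_d - antecedent_k) | (consequent_d - consequent_k)
        return len(new_conds) / len(antecedent_d | consequent_d)

    known      = ({"age>60", "smoker"}, {"high_risk"})
    discovered = ({"age>60", "diabetic"}, {"high_risk"})

    d = deviation(known, discovered)
    print("novel" if d > 0.3 else "conforming", d)  # threshold 0.3 is user-chosen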

Granulation using Clustering and Rough Set Theory and its Tree Representation

Granular computing deals with the representation of information in the form of aggregates and with related methods for their transformation and analysis in problem solving. A granulation scheme based on clustering and Rough Set Theory, with a focus on the structured conceptualization of information, is presented in this paper. Experiments with the proposed method on four labeled data sets show good results with respect to the classification problem. The proposed granulation technique is semi-supervised, drawing on both global and local information during granulation. A tree structure is also proposed to represent the results of the attribute-oriented granulation.
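A minimal sketch of the rough-set side of such a scheme, computing the lower and upper approximations of a concept from given granules, is shown below with an invented universe of objects.

    # Rough-set lower/upper approximation of a concept given equivalence
    # classes (granules); the universe and classes are illustrative.

    granules = [{1, 2}, {3, 4, 5}, {6, 7}]   # indiscernibility classes
    concept = {1, 2, 3, 6, 7}                # target set of objects

    lower = set().union(*(g for g in granules if g <= concept))
    upper = set().union(*(g for g in granules if g & concept))

    print("lower:", lower)   # {1, 2, 6, 7}: objects certainly in the concept
    print("upper:", upper)   # {1,...,7}:    objects possibly in the concept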

The Robust Clustering with Reduction Dimension

Clustering is the process of identifying homogeneous groups of objects, called clusters, and is an interesting topic in data mining; the objects within a group share similar characteristics. This paper discusses a robust clustering process for image data with two dimension reduction approaches: two-dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to the high dimensionality of such data is dimension reduction, which transforms high-dimensional data into a lower-dimensional space with limited loss of information; one of the most common forms of dimensionality reduction is PCA. 2DPCA is often called a variant of PCA: the image matrices are treated directly as 2D matrices and do not need to be transformed into vectors, so the image covariance matrix can be constructed directly from the original image matrices. The decomposed classical covariance matrix, however, is very sensitive to outlying observations. The objective of this paper is to compare the performance of robust minimizing vector variance (MVV) under the two-dimensional projection (2DPCA) and under PCA for clustering arbitrary image data when outliers are hidden in the data set. The simulation aspects of robustness and an illustration of clustering images are discussed at the end of the paper.
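The contrast between PCA on vectorized images and 2DPCA on raw image matrices can be sketched as follows (non-robust version, on random data); the robust MVV estimator studied in the paper is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(2)
    images = rng.random((50, 16, 16))      # 50 illustrative 16x16 "images"

    # Classical PCA: vectorize each image; this 256x256 covariance (which PCA
    # would eigendecompose) is the matrix sensitive to outliers.
    flat = images.reshape(50, -1)
    cov = np.cov(flat, rowvar=False)

    # 2DPCA: work on the image matrices directly via the image covariance.
    mean_img = images.mean(axis=0)
    G = sum((A - mean_img).T @ (A - mean_img) for A in images) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)   # ascending eigenvalues
    W = eigvecs[:, -4:]                    # top 4 projection axes
    features = images @ W                  # each image -> 16x4 feature matrix
    print(cov.shape, features.shape)       # (256, 256) (50, 16, 4)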

Using the Combined Model of PROMETHEE and Fuzzy Analytic Network Process for Determining Question Weights in Scientific Exams through Data Mining Approach

The need for an appropriate system for evaluating students' educational development is a key problem in achieving predefined educational goals. The intensity of papers in recent years that try to prove or disprove the necessity and adequacy of student assessment corroborates this point. Some of these studies have tried to increase the precision of determining question weights in scientific examinations, but all of them attempt to adjust the initial question weights while the accuracy and precision of those initial weights remain in question. Thus, in order to increase the precision of assessing students' educational development, the present study proposes a new method for determining the initial question weights by considering question factors such as difficulty, importance, and complexity, and by implementing a combined method of PROMETHEE and fuzzy analytic network process, using a data mining approach to improve the model's inputs. The results of the implemented case study demonstrate the improved performance and precision of the proposed model.
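The PROMETHEE stage can be sketched with net outranking flows over invented question scores; the criteria weights that the fuzzy ANP stage would supply are likewise assumed here.

    import numpy as np

    # Sketch of PROMETHEE net flows for ranking exam questions by criteria such
    # as difficulty, importance, and complexity; scores and weights are invented.
    scores = np.array([[0.8, 0.6, 0.7],    # question 1 on 3 criteria
                       [0.5, 0.9, 0.4],
                       [0.6, 0.5, 0.9]])
    weights = np.array([0.5, 0.3, 0.2])    # criteria weights (e.g., from fuzzy ANP)

    n = len(scores)
    # Usual preference function: 1 if question a beats b on a criterion, else 0.
    pref = (scores[:, None, :] > scores[None, :, :]).astype(float)
    pi = (pref * weights).sum(axis=2)      # aggregated preference index pi(a, b)
    net_flow = (pi.sum(axis=1) - pi.sum(axis=0)) / (n - 1)
    print(net_flow)                        # higher flow -> higher question weight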

Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach

Over the past decade, biclustering has become a popular data mining technique, not only in the field of biological data analysis but also in other applications with high-dimensional two-way datasets, such as text mining and market data analysis. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering, which clusters either rows or columns; it retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. The Firefly Algorithm (FA) is a recently proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of a discrete version of FA (DFA) on the task of mining coherent, large-volume biclusters from web usage data. The experiments were conducted on two web usage datasets from a public dataset repository, and the performance of DFA was compared with that of another population-based metaheuristic, binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA in tackling the biclustering problem.
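The core continuous firefly move, attraction that decays with distance plus a random step, is sketched below; the paper's discrete variant would binarize such moves, and the bicluster-quality fitness that drives the brightness comparison is omitted.

    import numpy as np

    rng = np.random.default_rng(3)

    # Sketch of the standard firefly move toward a brighter firefly; the
    # discrete/biclustering specifics of the paper are not reproduced.
    def firefly_move(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2):
        r2 = np.sum((xi - xj) ** 2)
        beta = beta0 * np.exp(-gamma * r2)          # attractiveness
        return xi + beta * (xj - xi) + alpha * (rng.random(xi.shape) - 0.5)

    dim = 5
    dimmer, brighter = rng.random(dim), rng.random(dim)
    print(firefly_move(dimmer, brighter))           # dimmer moves toward brighter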

T-Wave Detection Based on an Adjusted Wavelet Transform Modulus Maxima

The method described in this paper deals with the problem of T-wave detection in an ECG. Determining the position of a T-wave is complicated by its low amplitude and its ambiguous, changing form. A wavelet transform approach handles these complications, so a detection method based on this concept was developed. The resulting method is able to detect T-waves with a sensitivity of 93% and a correct-detection ratio of 93%, even in the presence of serious baseline drift and noise.
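A sketch of locating wavelet-transform modulus maxima at a single scale is given below, using PyWavelets on a synthetic bump standing in for a T-wave; the scale and wavelet choice are illustrative, not the paper's tuned configuration.

    import numpy as np
    import pywt

    rng = np.random.default_rng(4)

    # Synthetic "ECG": a Gaussian bump near t = 0.6 standing in for a T-wave.
    t = np.linspace(0, 1, 500)
    signal = np.exp(-((t - 0.6) ** 2) / 0.002) + 0.05 * rng.standard_normal(500)

    coeffs, _ = pywt.cwt(signal, scales=[16], wavelet="mexh")
    modulus = np.abs(coeffs[0])

    # Local maxima of |W(s, t)| mark candidate wave locations at this scale.
    maxima = np.where((modulus[1:-1] > modulus[:-2]) &
                      (modulus[1:-1] > modulus[2:]))[0] + 1
    print(t[maxima[np.argmax(modulus[maxima])]])    # strongest candidate, ~0.6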