Abstract: In this paper a new method is proposed for risk
management based on numerical patterns in data mining. These
patterns are built from probability rules in decision trees and are
designed to be valid, novel, useful, and understandable. Given a set
of objective functions, the system converges to a good pattern or to
better objectives. The patterns are analyzed through the resulting
matrices and several results are highlighted. Using the proposed
method, the direction of a system's functional path can be controlled
and optimal planning for specific objectives carried out.
Abstract: In sport, human resources management pays special
attention to methods of recruiting volunteers, retaining them,
fostering their cooperation with one another, and management
approaches for the better running of events. Recognizing volunteers'
characteristics and motives is important because it forms the basis
of their participation and commitment in sport environments. The
motivation and commitment of 281 volunteers were assessed using
an organizational commitment scale, a motivation scale, and a
personal characteristics questionnaire. The descriptive results
showed that 64% of the volunteers were women, with an average age
of 21.24 years. They were physical education students, single
(71.9%), unemployed (53%), and had an average of 5 years of sport
experience. Their most important motivation was the career factor,
and their most important commitment factor was the normative
factor. The hypothesis tests showed that age, sport experience, and
education affect the degree of volunteers' commitment, and that
motive factors such as the career, material, purposive, and protective
factors can predict the commitment of sport volunteers. It is
therefore recommended that event executive managers provide
opportunities for volunteers and carry out appropriate instructional
courses.
Abstract: The growing outsourcing of logistics services,
driven by the ongoing push in firms for cost reduction and
increased efficiency, means that it is becoming more and more
important for the companies doing the outsourcing to carry out
a proper evaluation.
The multiple definitions and measures of logistics service
performance found in research on the topic create a certain degree of
confusion and do not clear the way towards the proper measurement
of their performance. Do a model and a specific set of indicators exist
that can be considered appropriate for measuring the performance of
logistics services outsourcing in industrial environments? Are said
indicators in keeping with the objectives pursued by outsourcing? We
aim to answer these and other research questions in the study we have
initiated in the field within the framework of the international High
Performance Manufacturing (HPM) project of which this paper
forms part.
As the first stage of this research, this paper reviews articles
dealing with the topic published in the last 15 years with the aim of
detecting the models most used to make this measurement and
determining which performance indicators are proposed as part of
said models and which are most used. The first steps are also taken in
determining whether these indicators, financial and operational, cover
the aims that are being pursued when outsourcing logistics services.
The findings show there is a wide variety of both models and
indicators used. This would seem to testify to the need to continue
with our research in order to try to propose a model and a set of
indicators for measuring the performance of logistics services
outsourcing in industrial environments.
Abstract: Pearson's correlation coefficient and sequential path
analysis were used to determine the interrelationships among
yield, yield components, soil minerals, and aroma of Khao Dawk Mali
(KDML) 105 rice grown in the Tungkularonghai area of Roi-Et
province in northeastern Thailand. Pearson's correlation coefficient
showed that the number of panicles was the only factor with a
significant positive correlation (0.790**) with grain yield.
Sequential path analysis revealed that the number of panicles,
followed by the number of fertile spikelets and 100-grain weight,
were the first-order factors with positive direct effects on grain
yield, whereas the other factors analyzed influenced grain yield
only indirectly. This study also indicated that no significant
relationship was found between the aroma level and any of the
factors analyzed.
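As an illustration of the statistical machinery involved, Pearson's correlation coefficient can be computed directly from its definition; the panicle and yield numbers below are invented for the example and are not the paper's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical plot-level observations (not the KDML 105 data):
panicles = [5, 7, 6, 9, 8]
grain_yield = [40, 55, 48, 70, 62]
r = pearson_r(panicles, grain_yield)  # close to 1: strong positive correlation
```

A coefficient near +1, as here, is what a value like 0.790** in the abstract indicates on the real data, with the asterisks marking statistical significance.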
Abstract: Association rule mining is an important problem in
data mining. The massively increasing volume of data in real-life
databases has motivated researchers to design novel and incremental
algorithms for association rule mining. In this paper, we propose an
incremental association rule mining algorithm that integrates a
shocking interestingness criterion during the process of building the
model. A new interestingness measure, called the shocking measure,
is introduced. One of the main features of the proposed approach is
that it captures the user's background knowledge, which is
monotonically augmented. An incremental model that reflects the
changing data and the user's beliefs is attractive because it makes
the overall KDD process more effective and efficient. We implemented
the proposed approach, evaluated it on some public datasets, and
found the results quite promising.
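The abstract does not define the shocking measure itself, but the objective backbone of any association rule miner is the standard support and confidence computation, sketched here over a toy transaction set (items invented for illustration).

```python
# Toy transaction database; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)
```

An incremental algorithm such as the one the paper proposes would update these counts as new transactions arrive, rather than rescanning the whole database.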
Abstract: Each year, many people are reported missing in most countries of the world for various reasons, and arrangements must be made to find them after some time, so investigating agencies are compelled to identify these people using considerable manpower. In many cases, however, investigations to find a person who has been missing for a long time may not succeed: it may be difficult to identify such people from their old photographs, because their facial appearance may have changed, mainly due to the natural aging process. In forensic medicine, when a dead body is found, investigations must sometimes establish whether the corpse belongs to a person who disappeared some time ago; with the passage of time the person's face may have changed, and there should be a mechanism to reveal the person's identity. To make this process easier, we must estimate how the person would look now. To address this problem, this paper presents a way of synthesizing a facial image with aging effects.
Abstract: This paper proposes a Fuzzy Expert System design to
determine the wear properties of nitrided and non-nitrided steel.
The proposed Fuzzy Expert System helps the user and the
manufacturer forecast the wear properties of nitrided and
non-nitrided steel under specified laboratory conditions. The
surfaces of engineering components are often nitrided to improve
their wear, corrosion, and fatigue properties, and a major benefit
of the nitriding process is reduced distortion and wear of metallic
alloys. A Fuzzy Expert System was developed to determine the wear
and durability properties of nitrided and non-nitrided steels tested
under different loads and different sliding speeds in laboratory
conditions.
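At the core of any fuzzy expert system are membership functions that map crisp inputs (here, load or sliding speed) to fuzzy degrees of membership; a minimal sketch with a triangular membership function follows, with breakpoints chosen arbitrarily for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 at a, peak 1 at b, 0 again at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy set "medium load" over a 0-10 load scale:
medium_load = lambda load: tri(load, 0, 5, 10)
```

A full system would combine such memberships for each input through fuzzy rules and defuzzify the result into a predicted wear value.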
Abstract: Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant, and noisy attributes in the model-building phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes Ant Colony Optimization (ACO) is presented. ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.
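The pheromone-plus-heuristic search can be sketched as follows. This is a minimal stand-in, not the paper's algorithm: the fitness function below simply rewards two designated "predictive" attributes and penalizes subset size, where the real procedure would score a subset by classifier performance.

```python
import random

random.seed(0)

N_FEATURES, N_ANTS, N_ITER = 6, 10, 30

# Stand-in fitness: rewards subsets containing the two "truly predictive"
# attributes (0 and 1) and penalizes subset size.  The paper would use a
# classifier's accuracy on the candidate subset here instead.
def fitness(subset):
    return sum(1.0 for f in (0, 1) if f in subset) - 0.1 * len(subset)

tau = [1.0] * N_FEATURES        # pheromone level per feature
eta = [0.5] * N_FEATURES        # local heuristic (uniform in this sketch)

best_subset, best_fit = set(), float("-inf")
for _ in range(N_ITER):
    for _ in range(N_ANTS):
        # Each ant includes feature i with probability proportional to
        # pheromone * heuristic (scaled so probabilities stay below 1).
        weights = [tau[i] * eta[i] for i in range(N_FEATURES)]
        top = max(weights)
        subset = {i for i in range(N_FEATURES)
                  if random.random() < 0.5 * weights[i] / top}
        f = fitness(subset)
        if f > best_fit:
            best_subset, best_fit = subset, f
    # Evaporate pheromone, then reinforce the features of the best subset.
    tau = [0.9 * t for t in tau]
    for i in best_subset:
        tau[i] += 0.5
```

Over the iterations, pheromone concentrates on the rewarded features, so later ants sample them more often, which is the "previous knowledge" component the abstract mentions.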
Abstract: The hidden-point bar method is useful in many
surveying applications. The method involves determining the
coordinates of a hidden point as a function of horizontal and vertical
angles measured to three fixed points on the bar. Using these
measurements, the procedure involves calculating the slant angles,
the distances from the station to the fixed points, the coordinates of
the fixed points, and then the coordinates of the hidden point. The
propagation of the measurement errors in this complex process has
not been fully investigated in the literature. This paper evaluates the
effect of the bar geometry on the position accuracy of the hidden
point, which depends on the measurement errors of the horizontal and
vertical angles. The results are used to establish some guidelines
regarding the inclination angle of the bar and the location of the
observed points that provide the best accuracy.
Abstract: In this paper, we propose a new hybrid learning model for stock market index prediction by adding a passive congregation term to the standard hybrid model, which combines Particle Swarm Optimization (PSO) with Genetic Algorithm (GA) operators to train Neural Networks (NN). The passive congregation term is based on cooperation among particles in determining new positions, rather than each particle relying on its own information alone without considering the positions of other particles; it thus enables PSO to perform both local and global search instead of only local search. An experimental study carried out on the best-known European stock market indices, for both long-term and short-term prediction, clearly shows the influence of the passive congregation term in improving prediction accuracy compared to the standard hybrid model.
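The passive congregation idea can be sketched as one extra term in the PSO velocity update, pulling each particle toward a randomly chosen swarm member. Everything below is an illustrative assumption: the parameter values are arbitrary, a 1-D quadratic stands in for the prediction-error surface, and the GA operators of the full hybrid model are omitted.

```python
import random

random.seed(1)

def f(x):                       # toy objective standing in for prediction error
    return x * x

N, ITERS = 20, 100
w, c1, c2, c3 = 0.7, 1.2, 1.2, 0.6   # c3 weights the passive-congregation term

x = [random.uniform(-10, 10) for _ in range(N)]
v = [0.0] * N
pbest = x[:]
gbest = min(x, key=f)
start = f(gbest)

for _ in range(ITERS):
    for i in range(N):
        r = random.choice(x)    # position of a randomly chosen swarm member
        v[i] = (w * v[i]
                + c1 * random.random() * (pbest[i] - x[i])   # cognitive term
                + c2 * random.random() * (gbest - x[i])      # social term
                + c3 * random.random() * (r - x[i]))         # passive congregation
        x[i] += v[i]
        if f(x[i]) < f(pbest[i]):
            pbest[i] = x[i]
        if f(x[i]) < f(gbest):
            gbest = x[i]
```

The extra `c3` term is what distinguishes this update from standard PSO: each particle is also attracted to an arbitrary neighbour, which spreads information through the swarm beyond the single global best.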
Abstract: Music Information Retrieval (MIR) and modern data mining techniques are applied to identify style markers in MIDI music for stylometric analysis and author attribution. Over 100 attributes are extracted from a library of 2830 songs and then mined using supervised learning techniques. Two attributes are identified that provide high information gain. These attributes are then used as style markers to predict authorship. Using these style markers, the authors are able to correctly distinguish songs written by the Beatles from those that were not, with a precision and accuracy of over 98 per cent. The identification of these style markers, as well as the architecture of this research, provides a foundation for future work in musical stylometry.
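The information gain used to rank attributes is the standard entropy-based measure; a minimal sketch follows. The song attributes below are invented for illustration and are not the paper's actual style markers.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """Reduction in label entropy from splitting the rows on attr."""
    base, n = entropy(labels), len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Invented attributes for illustration only:
songs = [
    {"tempo": "fast", "key": "major"},
    {"tempo": "fast", "key": "minor"},
    {"tempo": "slow", "key": "major"},
    {"tempo": "slow", "key": "minor"},
]
labels = ["beatles", "beatles", "other", "other"]
```

Here `tempo` separates the classes perfectly (gain 1 bit) while `key` tells us nothing (gain 0), which is exactly the contrast used to pick the two high-gain style markers.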
Abstract: Applying knowledge discovery techniques to
unstructured text is termed knowledge discovery in text (KDT), text
data mining, or text mining. The decision tree approach is among the
most useful for classification problems: a tree is constructed to
model the classification process in two basic steps, building the
tree and applying the tree to the database. This paper describes a
proposed C5.0 classifier that adds rulesets, cross-validation, and
boosting to the original C5.0 in order to reduce the error rate. The
feasibility and benefits of the proposed approach are demonstrated
on a medical data set (hypothyroid). It is shown that the performance
of a classifier on the training cases from which it was constructed
gives a poor estimate of its accuracy on new cases; by sampling, or
by using a separate test file, the classifier is instead evaluated on
cases that were not used to build it. If the cases in hypothyroid.data
and hypothyroid.test were shuffled and divided into a new 2772-case
training set and a 1000-case test set, C5.0 might construct a
different classifier with a lower or higher error rate on the test
cases. An important feature of See5 is its ability to generate
classifiers called rulesets; the ruleset here has an error rate of
0.5% on the test cases. The standard errors of the means provide an
estimate of the variability of the results. One way to get a more
reliable estimate of predictive accuracy is f-fold cross-validation:
the error rate of a classifier produced from all the cases is
estimated as the ratio of the total number of errors on the hold-out
cases to the total number of cases. The Boost option with x trials
instructs See5 to construct up to x classifiers in this manner.
Trials over numerous datasets, large and small, show that on average
10-classifier boosting reduces the error rate for test cases by about
25%.
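The f-fold cross-validation estimate described above (total hold-out errors divided by total cases) can be sketched as follows. The synthetic data and the deliberately weak majority-class "classifier" are stand-ins for the hypothyroid data and C5.0; only the estimation scheme itself is the point.

```python
import random

random.seed(42)

# Toy labelled cases: positive x -> class 1 (a stand-in for the
# hypothyroid data used in the paper).
cases = [(random.uniform(-1, 1),) for _ in range(100)]
labels = [1 if c[0] > 0 else 0 for c in cases]

def train(train_labels):
    # "Classifier": always predict the majority class of the training
    # cases -- a deliberately weak stand-in for C5.0.
    majority = 1 if 2 * sum(train_labels) >= len(train_labels) else 0
    return lambda case: majority

def f_fold_cv(cases, labels, f=10):
    idx = list(range(len(cases)))
    random.shuffle(idx)
    errors = 0
    for k in range(f):
        hold = set(idx[k::f])                    # the k-th hold-out block
        train_labels = [labels[i] for i in idx if i not in hold]
        clf = train(train_labels)
        errors += sum(clf(cases[i]) != labels[i] for i in hold)
    # The estimate described in the abstract:
    # total hold-out errors / total number of cases.
    return errors / len(cases)

err = f_fold_cv(cases, labels)
```

Because every case is held out exactly once, the ratio pools all f hold-out blocks into a single error-rate estimate, which is more reliable than a single train/test split.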
Abstract: In the recent past, there has been an increasing interest
in applying evolutionary methods to Knowledge Discovery in
Databases (KDD) and a number of successful applications of Genetic
Algorithms (GA) and Genetic Programming (GP) to KDD have been
demonstrated. The most predominant representation of the
discovered knowledge is the standard Production Rules (PRs) in the
form If P Then D. PRs, however, are unable to handle exceptions
and do not exhibit variable precision. Censored Production Rules
(CPRs), an extension of PRs proposed by Michalski and Winston,
exhibit variable precision and support an efficient mechanism for
handling exceptions. A CPR is an augmented production rule of the
form:
If P Then D Unless C, where C (the censor) is an exception to the
rule. Such rules are employed in situations in which the conditional
statement 'If P Then D' holds frequently and the assertion C holds
rarely. Using a rule of this type, we are free to ignore the exception
conditions when the resources needed to establish the censor's
presence are tight, or when there is simply no information available
as to whether it holds. Thus, the 'If P Then D' part of the CPR
expresses the important information, while the Unless C part acts
only as a switch that changes the polarity of D to ~D.
This paper presents a classification algorithm based on evolutionary
approach that discovers comprehensible rules with exceptions in the
form of CPRs.
The proposed approach has flexible chromosome encoding, where
each chromosome corresponds to a CPR. Appropriate genetic
operators are suggested and a fitness function is proposed that
incorporates the basic constraints on CPRs. Experimental results are
presented to demonstrate the performance of the proposed algorithm.
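A minimal sketch of how a CPR of the form If P Then D Unless C is evaluated; the bird/penguin rule is a textbook illustration, not one of the paper's discovered rules.

```python
def cpr(P, C, facts):
    """Evaluate If P Then D Unless C: return "D" when P holds and the
    censor C does not, "~D" when the censor fires, None when P fails."""
    if not P(facts):
        return None
    return "~D" if C(facts) else "D"

# Hypothetical rule: If bird Then flies Unless penguin.
is_bird = lambda facts: facts.get("bird", False)
is_penguin = lambda facts: facts.get("penguin", False)
```

When resources are tight, one can simply skip evaluating `C` and accept `D`, which is the variable-precision behaviour the abstract describes.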
Abstract: Many factors affect the success of Machine Learning
(ML) on a given task. The representation and quality of the instance
data is first and foremost. If there is much irrelevant and redundant
information present or noisy and unreliable data, then knowledge
discovery during the training phase is more difficult. It is well
known that data preparation and filtering steps take a considerable
amount of processing time in ML problems. Data pre-processing
includes data cleaning, normalization, transformation, feature
extraction and selection, and so on, and its product is the final
training set. It would be ideal if a single sequence of
data pre-processing algorithms gave the best performance on every
data set, but this is not the case. We therefore present the
best-known algorithms for each step of data pre-processing, so that
practitioners can achieve the best performance on their own data sets.
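As an example of the normalization step, min-max scaling maps a numeric attribute onto a fixed range; it is one common choice among the pre-processing algorithms surveyed, sketched here with illustrative values.

```python
def min_max(values, lo=0.0, hi=1.0):
    """Min-max normalization of a numeric attribute onto [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:                 # constant attribute: map everything to lo
        return [lo] * len(values)
    return [lo + (v - vmin) / span * (hi - lo) for v in values]
```

Scaling all attributes to a common range prevents attributes with large numeric ranges from dominating distance-based learners.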
Abstract: Rule Discovery is an important technique for mining
knowledge from large databases. Use of objective measures for
discovering interesting rules leads to another data mining problem,
although of reduced complexity. Data mining researchers have
studied subjective measures of interestingness to reduce the volume
of discovered rules and ultimately improve the overall efficiency of
the KDD process.
In this paper we study novelty of the discovered rules as a
subjective measure of interestingness. We propose a hybrid approach
based on both objective and subjective measures to quantify novelty
of the discovered rules in terms of their deviations from the known
rules (knowledge). We analyze the types of deviation that can arise
between two rules and categorize the discovered rules according to
the user specified threshold. We implement the proposed framework
and experiment with some public datasets. The experimental results
are promising.
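One simple way to quantify the deviation between a discovered rule and a known rule, and to categorize the result against a user-specified threshold, is sketched below; both the measure and the threshold are illustrative assumptions, not the paper's exact definitions.

```python
def rule_deviation(rule_a, rule_b):
    """Fraction of condition terms not shared by the two rules
    (an assumed stand-in for the paper's deviation measure)."""
    union = rule_a | rule_b
    return len(rule_a ^ rule_b) / len(union) if union else 0.0

def categorize(deviation, threshold=0.5):
    """Label a discovered rule as novel when it deviates enough
    from the known rule; the threshold is user-specified."""
    return "novel" if deviation > threshold else "conforming"

# Rules represented as sets of condition terms (hypothetical):
known = {("age", ">", 30), ("income", "=", "high")}
discovered = {("age", ">", 30), ("student", "=", "yes")}
```

Rules below the threshold merely restate known knowledge, while those above it are reported to the user as novel.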
Abstract: Granular computing deals with the representation of information in the form of aggregates and with related methods for their transformation and analysis in problem solving. This paper presents a granulation scheme based on clustering and Rough Set Theory, with a focus on the structured conceptualization of information. Experiments with the proposed method on four labeled data sets show good results on the classification problem. The proposed granulation technique is semi-supervised, incorporating both global and local information during granulation. A tree structure is also proposed to represent the results of the attribute-oriented granulation.
Abstract: Clustering is the process of identifying homogeneous
groups of objects, called clusters, and is an interesting topic in
data mining; the objects in a group share similar characteristics.
This paper discusses a robust clustering process for image data with
two dimension-reduction approaches: two-dimensional principal
component analysis (2DPCA) and principal component analysis (PCA).
A standard way to handle high-dimensional data is dimension
reduction, which transforms the data into a lower-dimensional space
with limited loss of information, and one of the most common forms
of dimensionality reduction is PCA. 2DPCA, often called a variant of
PCA, treats the image matrices directly as 2D matrices: they do not
need to be transformed into vectors, so the covariance matrix of the
images can be constructed directly from the original image matrices.
The classical covariance matrix that is decomposed, however, is very
sensitive to outlying observations. The objective of this paper is to
compare the performance of robust minimum vector variance (MVV)
estimation in the two-dimensional projection (2DPCA) and in PCA for
clustering arbitrary image data when outliers are hidden in the data
set. Simulation results on robustness and an illustration of image
clustering are discussed at the end of the paper.
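The key computational difference between PCA and 2DPCA is that 2DPCA builds the image covariance matrix G = (1/M) Σ (A_i − Ā)ᵀ(A_i − Ā) directly from the image matrices; a minimal sketch on two tiny 2×2 "images" follows (the robust MVV estimation itself is not shown, and the pixel values are invented).

```python
def mat_sub(A, B):
    """Elementwise difference of two equal-shape matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_tmul(A):
    """Compute A^T A for a single centred image matrix A."""
    rows, cols = len(A), len(A[0])
    return [[sum(A[k][i] * A[k][j] for k in range(rows))
             for j in range(cols)] for i in range(cols)]

def image_covariance(images):
    """2DPCA image covariance: G = (1/M) * sum_i (A_i - mean)^T (A_i - mean)."""
    M = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    mean = [[sum(img[r][c] for img in images) / M for c in range(cols)]
            for r in range(rows)]
    G = [[0.0] * cols for _ in range(cols)]
    for img in images:
        D = mat_tmul(mat_sub(img, mean))
        G = [[g + d / M for g, d in zip(gr, dr)] for gr, dr in zip(G, D)]
    return G

# Two tiny 2x2 "images" (values invented for illustration):
imgs = [[[1, 2], [3, 4]],
        [[3, 2], [1, 0]]]
G = image_covariance(imgs)
```

Note that G has the size of the image width, not of the flattened pixel vector, which is why 2DPCA avoids the huge covariance matrices of vectorized PCA.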
Abstract: The need for an appropriate system for evaluating
students' educational development is a key problem in achieving
predefined educational goals. The number of recent papers that try
to prove or disprove the necessity and adequacy of student
assessment corroborates this. Some of these studies have tried to
increase the precision with which question weights are determined in
scientific examinations, but all of them attempt to adjust the
initial question weights while the accuracy and precision of those
initial weights remain in question. Thus, in order to increase the
precision of assessing students' educational development, the
present study proposes a new method for determining the initial
question weights by considering question factors such as difficulty,
importance, and complexity, and by implementing a combined
PROMETHEE and fuzzy analytic network process method with a data
mining approach to improve the model's inputs. The results of the
implemented case study demonstrate the improved performance and
precision of the proposed model.
Abstract: For the past decade, biclustering has been a popular data mining technique, not only in biological data analysis but also in other applications with high-dimensional two-way datasets, such as text mining and market data analysis. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering, which clusters either rows or columns. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. The Firefly Algorithm (FA) is a recently proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of a discrete version of FA (DFA) on the task of mining coherent, large-volume biclusters from web usage data. Experiments were conducted on two web usage datasets from a public repository, and the performance of DFA was compared with that of another population-based metaheuristic, binary Particle Swarm Optimization (PSO). The results demonstrate the usefulness of DFA in tackling the biclustering problem.
Abstract: The method described in this paper deals with the problem of T-wave detection in an ECG. Determining the position of a T-wave is complicated by the wave's low amplitude and its ambiguous, changing form. A wavelet transform approach handles these complications, so a detection method based on this concept was developed. The resulting method detects T-waves with a sensitivity of 93% and a correct-detection ratio of 93%, even in the presence of serious baseline drift and noise.