Abstract: This paper explores the effectiveness of machine
learning techniques in detecting firms that issue fraudulent financial
statements (FFS) and deals with the identification of factors
associated to FFS. To this end, a number of experiments have been
conducted using representative learning algorithms, which were
trained using a data set of 164 fraud and non-fraud Greek firms in the
recent period 2001-2002. The decision of which particular method to
choose is a complicated problem. A good alternative to choosing
only one method is to create a hybrid forecasting system
incorporating a number of possible solution methods as components
(an ensemble of classifiers). For this purpose, we have implemented
a hybrid decision support system that combines the representative
algorithms using a stacking variant methodology and achieves better
performance than any examined simple and ensemble method. To
sum up, this study indicates that the investigation of financial
information can be used in the identification of FFS and underline the
importance of financial ratios.
Abstract: General requirements for knowledge representation in
the form of logic rules, applicable to design and control of industrial
processes, are formulated. Characteristic behavior of decision trees
(DTs) and rough sets theory (RST) in rules extraction from recorded
data is discussed and illustrated with simple examples. The
significance of the models- drawbacks was evaluated, using
simulated and industrial data sets. It is concluded that performance of
DTs may be considerably poorer in several important aspects,
compared to RST, particularly when not only a characterization of a
problem is required, but also detailed and precise rules are needed,
according to actual, specific problems to be solved.
Abstract: The one-class support vector machine “support vector
data description” (SVDD) is an ideal approach for anomaly or outlier
detection. However, for the applicability of SVDD in real-world
applications, the ease of use is crucial. The results of SVDD are
massively determined by the choice of the regularisation parameter C
and the kernel parameter of the widely used RBF kernel. While for
two-class SVMs the parameters can be tuned using cross-validation
based on the confusion matrix, for a one-class SVM this is not
possible, because only true positives and false negatives can occur
during training. This paper proposes an approach to find the optimal
set of parameters for SVDD solely based on a training set from
one class and without any user parameterisation. Results on artificial
and real data sets are presented, underpinning the usefulness of the
approach.
Abstract: Nevertheless the widespread application of finite
mixture models in segmentation, finite mixture model selection is
still an important issue. In fact, the selection of an adequate number
of segments is a key issue in deriving latent segments structures and
it is desirable that the selection criteria used for this end are effective.
In order to select among several information criteria, which may
support the selection of the correct number of segments we conduct a
simulation study. In particular, this study is intended to determine
which information criteria are more appropriate for mixture model
selection when considering data sets with only categorical
segmentation base variables. The generation of mixtures of
multinomial data supports the proposed analysis. As a result, we
establish a relationship between the level of measurement of
segmentation variables and some (eleven) information criteria-s
performance. The criterion AIC3 shows better performance (it
indicates the correct number of the simulated segments- structure
more often) when referring to mixtures of multinomial segmentation
base variables.
Abstract: This paper presents an intrusion detection system of hybrid neural network model based on RBF and Elman. It is used for anomaly detection and misuse detection. This model has the memory function .It can detect discrete and related aggressive behavior effectively. RBF network is a real-time pattern classifier, and Elman network achieves the memory ability for former event. Based on the hybrid model intrusion detection system uses DARPA data set to do test evaluation. It uses ROC curve to display the test result intuitively. After the experiment it proves this hybrid model intrusion detection system can effectively improve the detection rate, and reduce the rate of false alarm and fail.
Abstract: One of the main research directions in CAD/CAM
machining area is the reducing of machining time.
The feedrate scheduling is one of the advanced techniques that
allows keeping constant the uncut chip area and as sequel to keep
constant the main cutting force. They are two main ways for feedrate
optimization. The first consists in the cutting force monitoring, which
presumes to use complex equipment for the force measurement and
after this, to set the feedrate regarding the cutting force variation. The
second way is to optimize the feedrate by keeping constant the
material removal rate regarding the cutting conditions.
In this paper there is proposed a new approach using an extended
database that replaces the system model.
The feedrate scheduling is determined based on the identification
of the reconfigurable machine tool, and the feed value determination
regarding the uncut chip section area, the contact length between tool
and blank and also regarding the geometrical roughness.
The first stage consists in the blank and tool monitoring for the
determination of actual profiles. The next stage is the determination
of programmed tool path that allows obtaining the piece target
profile.
The graphic representation environment models the tool and blank
regions and, after this, the tool model is positioned regarding the
blank model according to the programmed tool path. For each of
these positions the geometrical roughness value, the uncut chip area
and the contact length between tool and blank are calculated. Each of
these parameters are compared with the admissible values and
according to the result the feed value is established.
We can consider that this approach has the following advantages:
in case of complex cutting processes the prediction of cutting force is
possible; there is considered the real cutting profile which has
deviations from the theoretical profile; the blank-tool contact length
limitation is possible; it is possible to correct the programmed tool
path so that the target profile can be obtained.
Applying this method, there are obtained data sets which allow the
feedrate scheduling so that the uncut chip area is constant and, as a
result, the cutting force is constant, which allows to use more
efficiently the machine tool and to obtain the reduction of machining
time.
Abstract: Technical laboratories are typically considered as
highly hazardous places in the polytechnic institution when
addressing the problems of high incidences and fatality rates. In
conjunction with several topics covered in the technical curricular,
safety and health precaution should be highlighted in order to
connect to few key ideas of being safe. Therefore the assessment of
safety awareness in terms of safety and health about hazardous and
risks at laboratories is needed and has to be incorporated with
technical education and other training programmes. The purpose of
this study was to determine the efficacy of technical laboratory safety
in one of the polytechnics in northern region. The study examined
three related issues that were; the availability of safety material and
equipment, safety practice adopted by technical teachers and
administrator-s safety attitudes in enforcing safety to the students. A
model of efficacy technical laboratory was developed to test the
linear relationship between existing safety material and equipment,
teachers- safety practice and administrators- attitude in enforcing
safety and to identify which of technical laboratory safety issues was
the most pertinent factor to realize safety in technical laboratory.
This was done by analyzing survey-based data sets particularly those
obtained from samples of 210 students in the polytechnic. The
Pearson Correlation was used to measure the association between the
variables and to test the research hypotheses. The result of the study
has found that there was a significant correlation between existing
safety material and equipment, safety practice adopted by teacher
and administrator-s attitude. There was also a significant relationship
between technical laboratory safety and safety practice adopted by
teacher and between technical laboratory safety and administrator
attitude. Hence, safety practice adopted by teacher and administrator
attitude is vital in realizing technical laboratory safety.
Abstract: In this paper, a neural network technique is applied to
real-time classifying media while a projectile is penetrating through
them. A laboratory-scaled penetrating setup was built for the
experiment. Features used as the network inputs were extracted from
the acceleration of penetrator. 6000 set of features from a single
penetration with known media and status were used to train the neural
network. The trained system was tested on 30 different penetration
experiments. The system produced an accuracy of 100% on the
training data set. And, their precision could be 99% for the test data
from 30 tests.
Abstract: Our Medicine-oriented research is based on a medical
data set of real patients. It is a security problem to share
patient private data with peoples other than clinician or hospital
staff. We have to remove person identification information
from medical data. The medical data without private data
are available after a de-identification process for any research
purposes. In this paper, we introduce an universal automatic
rule-based de-identification application to do all this stuff on an
heterogeneous medical data. A patient private identification is
replaced by an unique identification number, even in burnedin
annotation in pixel data. The identical identification is used
for all patient medical data, so it keeps relationships in a data.
Hospital can take an advantage of a research feedback based
on results.
Abstract: In this study, fuzzy rule-based classifier is used for the
diagnosis of congenital heart disease. Congenital heart diseases are
defined as structural or functional heart disease. Medical data sets
were obtained from Pediatric Cardiology Department at Selcuk
University, from years 2000 to 2003. Firstly, fuzzy rules were
generated by using medical data. Then the weights of fuzzy rules
were calculated. Two different reasoning methods as “weighted vote
method" and “singles winner method" were used in this study. The
results of fuzzy classifiers were compared.
Abstract: In this paper a data miner based on the learning
automata is proposed and is called LA-miner. The LA-miner extracts
classification rules from data sets automatically. The proposed
algorithm is established based on the function optimization using
learning automata. The experimental results on three benchmarks
indicate that the performance of the proposed LA-miner is
comparable with (sometimes better than) the Ant-miner (a data miner
algorithm based on the Ant Colony optimization algorithm) and CNZ
(a well-known data mining algorithm for classification).
Abstract: Support vector machines (SVMs) are considered to be
the best machine learning algorithms for minimizing the predictive
probability of misclassification. However, their drawback is that for
large data sets the computation of the optimal decision boundary is a
time consuming function of the size of the training set. Hence several
methods have been proposed to speed up the SVM algorithm. Here
three methods used to speed up the computation of the SVM
classifiers are compared experimentally using a musical genre
classification problem. The simplest method pre-selects a random
sample of the data before the application of the SVM algorithm. Two
additional methods use proximity graphs to pre-select data that are
near the decision boundary. One uses k-Nearest Neighbor graphs and
the other Relative Neighborhood Graphs to accomplish the task.
Abstract: Although there have been many researches in cluster
analysis to consider on feature weights, little effort is made on sample
weights. Recently, Yu et al. (2011) considered a probability
distribution over a data set to represent its sample weights and then
proposed sample-weighted clustering algorithms. In this paper, we
give a sample-weighted version of generalized fuzzy clustering
regularization (GFCR), called the sample-weighted GFCR
(SW-GFCR). Some experiments are considered. These experimental
results and comparisons demonstrate that the proposed SW-GFCR is
more effective than the most clustering algorithms.
Abstract: Serial hierarchical support vector machine (SHSVM)
is proposed to discriminate three brain tissues which are white matter
(WM), gray matter (GM), and cerebrospinal fluid (CSF). SHSVM
has novel classification approach by repeating the hierarchical
classification on data set iteratively. It used Radial Basis Function
(rbf) Kernel with different tuning to obtain accurate results. Also as
the second approach, segmentation performed with DAGSVM
method. In this article eight univariate features from the raw DTI data
are extracted and all the possible 2D feature sets are examined within
the segmentation process. SHSVM succeed to obtain DSI values
higher than 0.95 accuracy for all the three tissues, which are higher
than DAGSVM results.
Abstract: This study performs a comparative analysis of the 21 Greek Universities in terms of their public funding, awarded for covering their operating expenditure. First it introduces a DEA/MCDM model that allocates the fund into four expenditure factors in the most favorable way for each university. Then, it presents a common, consensual assessment model to reallocate the amounts, remaining in the same level of total public budget. From the analysis it derives that a number of universities cannot justify the public funding in terms of their size and operational workload. For them, the sufficient reduction of their public funding amount is estimated as a future target. Due to the lack of precise data for a number of expenditure criteria, the analysis is based on a mixed crisp-ordinal data set.
Abstract: Statistical learning theory was developed by Vapnik. It
is a learning theory based on Vapnik-Chervonenkis dimension. It also
has been used in learning models as good analytical tools. In general, a
learning theory has had several problems. Some of them are local
optima and over-fitting problems. As well, statistical learning theory
has same problems because the kernel type, kernel parameters, and
regularization constant C are determined subjectively by the art of
researchers. So, we propose an evolutionary statistical learning theory
to settle the problems of original statistical learning theory.
Combining evolutionary computing into statistical learning theory,
our theory is constructed. We verify improved performances of an
evolutionary statistical learning theory using data sets from KDD cup.
Abstract: Nowadays, with the emerging of the new applications
like robot control in image processing, artificial vision for visual
servoing is a rapidly growing discipline and Human-machine
interaction plays a significant role for controlling the robot. This
paper presents a new algorithm based on spatio-temporal volumes for
visual servoing aims to control robots. In this algorithm, after
applying necessary pre-processing on video frames, a spatio-temporal
volume is constructed for each gesture and feature vector is extracted.
These volumes are then analyzed for matching in two consecutive
stages. For hand gesture recognition and classification we tested
different classifiers including k-Nearest neighbor, learning vector
quantization and back propagation neural networks. We tested the
proposed algorithm with the collected data set and results showed the
correct gesture recognition rate of 99.58 percent. We also tested the
algorithm with noisy images and algorithm showed the correct
recognition rate of 97.92 percent in noisy images.
Abstract: Several works regarding facial recognition have dealt with methods which identify isolated characteristics of the face or with templates which encompass several regions of it. In this paper a new technique which approaches the problem holistically dispensing with the need to identify geometrical characteristics or regions of the face is introduced. The characterization of a face is achieved by randomly sampling selected attributes of the pixels of its image. From this information we construct a set of data, which correspond to the values of low frequencies, gradient, entropy and another several characteristics of pixel of the image. Generating a set of “p" variables. The multivariate data set with different polynomials minimizing the data fitness error in the minimax sense (L∞ - Norm) is approximated. With the use of a Genetic Algorithm (GA) it is able to circumvent the problem of dimensionality inherent to higher degree polynomial approximations. The GA yields the degree and values of a set of coefficients of the polynomials approximating of the image of a face. By finding a family of characteristic polynomials from several variables (pixel characteristics) for each face (say Fi ) in the data base through a resampling process the system in use, is trained. A face (say F ) is recognized by finding its characteristic polynomials and using an AdaBoost Classifier from F -s polynomials to each of the Fi -s polynomials. The winner is the polynomial family closer to F -s corresponding to target face in data base.
Abstract: In the recent works related with mixture discriminant
analysis (MDA), expectation and maximization (EM) algorithm is
used to estimate parameters of Gaussian mixtures. But, initial values
of EM algorithm affect the final parameters- estimates. Also, when
EM algorithm is applied two times, for the same data set, it can be
give different results for the estimate of parameters and this affect the
classification accuracy of MDA. Forthcoming this problem, we use
Self Organizing Mixture Network (SOMN) algorithm to estimate
parameters of Gaussians mixtures in MDA that SOMN is more robust
when random the initial values of the parameters are used [5]. We
show effectiveness of this method on popular simulated waveform
datasets and real glass data set.
Abstract: In this paper we use data mining techniques to investigate factors that contribute significantly to enhancing the risk of acute coronary syndrome. We assume that the dependent variable is diagnosis – with dichotomous values showing presence or absence of disease. We have applied binary regression to the factors affecting the dependent variable. The data set has been taken from two different cardiac hospitals of Karachi, Pakistan. We have total sixteen variables out of which one is assumed dependent and other 15 are independent variables. For better performance of the regression model in predicting acute coronary syndrome, data reduction techniques like principle component analysis is applied. Based on results of data reduction, we have considered only 14 out of sixteen factors.