Abstract: In this paper, a new learning algorithm based on a
hybrid metaheuristic integrating Differential Evolution (DE) and
Reduced Variable Neighborhood Search (RVNS) is introduced to train
the classification method PROAFTN. To apply PROAFTN, values of
several parameters need to be determined prior to classification. These
parameters include boundaries of intervals and relative weights for
each attribute. Based on these requirements, the hybrid approach,
named DEPRO-RVNS, is presented in this study. In some cases, the
major problem when applying DE to classification problems was the
premature convergence of some individuals to local optima. To
eliminate this shortcoming and to improve the exploration and
exploitation capabilities of DE, such individuals were iteratively
re-explored using RVNS. Based on the results obtained on
both training and testing data, it is shown that the performance of
PROAFTN is significantly improved. Furthermore, the experimental
study shows that DEPRO-RVNS outperforms well-known machine
learning classifiers in a variety of problems.
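The idea of handing stagnated DE individuals to RVNS can be sketched roughly as follows. This is a minimal illustration and not the DEPRO-RVNS implementation: the toy `sphere` objective stands in for a classification error, and the names `de_rvns`, `rvns`, `stall_limit`, the DE/rand/1/bin variant and all parameter values are assumptions made for the example.

```python
import random

def sphere(x):
    """Toy objective to minimise (assumed stand-in for a classification error)."""
    return sum(v * v for v in x)

def rvns(x, fx, objective, bounds, rng, k_max=3, tries=20):
    """Reduced VNS: random shaking in growing neighbourhoods, no local descent."""
    lo, hi = bounds
    k = 1
    for _ in range(tries):
        radius = 0.1 * k * (hi - lo)
        cand = [min(hi, max(lo, v + rng.uniform(-radius, radius))) for v in x]
        fc = objective(cand)
        if fc < fx:
            x, fx, k = cand, fc, 1      # improvement: back to the smallest neighbourhood
        else:
            k = k % k_max + 1           # no improvement: widen, cycling after k_max
    return x, fx

def de_rvns(objective, dim=5, bounds=(-5.0, 5.0), pop_size=20, gens=100,
            f=0.5, cr=0.9, stall_limit=10, seed=0):
    """DE/rand/1/bin in which stagnated individuals are re-explored with RVNS."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(p) for p in pop]
    stall = [0] * pop_size              # generations without improvement, per individual
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(hi, max(lo, v)))
                else:
                    trial.append(pop[i][d])
            ft = objective(trial)
            if ft < fit[i]:
                pop[i], fit[i], stall[i] = trial, ft, 0
            else:
                stall[i] += 1
            if stall[i] >= stall_limit:  # assumed stagnation test
                pop[i], fit[i] = rvns(pop[i], fit[i], objective, bounds, rng)
                stall[i] = 0
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

best_x, best_f = de_rvns(sphere)
```

Because both the DE selection step and the RVNS step only accept improvements, the fitness of each individual never gets worse in this sketch.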
Abstract: One main drawback of intrusion detection systems is their
inability to detect new attacks that do not have known
signatures. In this paper we discuss an intrusion detection method
that combines independent component analysis (ICA) based feature
selection heuristics with rough fuzzy clustering of the data. ICA is
used to separate the independent components (ICs) from the monitored
variables. Rough sets reduce the amount of data and remove
redundancy, while fuzzy methods allow objects to belong to several
clusters simultaneously, with different degrees of membership. Our
approach allows us to recognize not only known attacks but also to
detect activity that may be the result of a new, unknown attack.
Experimental results are reported on the Knowledge Discovery and Data
Mining (KDD Cup 1999) dataset.
Abstract: The significance of psychology in studying politics
is embedded in philosophical issues as well as behavioural
pursuits. The former is often associated with Sigmund Freud
and his followers. The latter is inspired by the writings of Harold
Lasswell. Political psychology, or psychopolitics, has left its own
impression on political thought ever since it deciphered the concepts
of human nature and political propaganda. More importantly,
psychoanalysis views political thought as textual content whose
latent content needs to be explored from the manifest content. In other
words, it reads the text symptomatically and interprets the hidden
truth. This paper explains the paradigm of dream interpretation
applied by Freud. The dream work is a process which has four
successive activities: condensation, displacement, representation
and secondary revision. Texts dealing with political thought can
also be interpreted on these principles. Freud's method of dream
interpretation draws on the hermeneutic model of
philological research. It provides a theoretical perspective and
technical rules for the interpretation of symbolic structures. The
task of interpretation remains a discovery of equivalence of
symbols and actions through perpetual analogies. Psychoanalysis
can help in studying political thought in two ways: by studying textual
distortion, using Freud's dream interpretation as a paradigm for
exploring the latent text behind the manifest text; and by applying Freud's
psychoanalytic concepts and theories, ranging from the individual mind
to civilization, religion, war and politics.
Abstract: The objectives of this research were to explore factors
influencing the knowledge management process in the manufacturing
industry and develop a model to support knowledge management
processes. The studied factors were technology infrastructure, human
resource, knowledge sharing, and the culture of the organization. The
knowledge management processes included discovery, capture,
sharing, and application. Data were collected through questionnaires
and analyzed using multiple linear regression and multiple
correlation. The results showed that technology infrastructure, human
resource, knowledge sharing, and culture of the organization
influenced the discovery and capture processes. However, knowledge
sharing had no influence on the sharing and application processes. A
model to support knowledge management processes was developed,
which indicated that sharing knowledge needed further improvement
in the organization.
Abstract: A website plays a significant role in the success of an e-business. It is the main starting point of any organization or corporation for its customers, so it is important to customize and design it according to visitors' preferences. Websites are also a place to introduce the services of an organization and to highlight new services to visitors and audiences. In this paper, we apply web usage mining techniques, a new field of research in data mining and knowledge discovery, to an Iranian government website. Using the results, a framework for web content layout is proposed. An agent is designed to dynamically update and improve the locations and layout of web links. We then explain how it is used to enable top managers of the organization to directly influence the arrangement of web content and to enhance the customization of website navigation according to online users' behavior.
Abstract: This paper investigates the problem of tracking spatiotemporal changes of a satellite image through the use of Knowledge Discovery in Databases (KDD). The purpose of this study is to help a given user effectively discover interesting knowledge and then build prediction and decision models. Unfortunately, the KDD process for spatiotemporal data is always marked by several types of imperfections. In our paper, we take these imperfections into consideration in order to provide more accurate decisions. To achieve this objective, different KDD methods are used to discover knowledge in satellite image databases. Each method presents a different point of view of the spatiotemporal evolution of a query model (which represents an object extracted from a satellite image). In order to combine these methods, we use evidence fusion theory, which considerably improves the spatiotemporal knowledge discovery process and increases our belief in the spatiotemporal model change. Experimental results on satellite images of the region of Auckland in New Zealand show an improvement in overall change detection compared to classical methods.
Abstract: Financial forecasting using machine learning techniques has received great attention in the last decade. In this ongoing work, we show how machine learning of graphical models can infer visualized causal interactions between different banks in the Saudi equities market. One important discovery from such learned causal graphs is how companies influence each other and to what extent. In this work, a set of graphical models named Gaussian graphical models is presented, together with newly developed ensemble penalized feature selection methods that combine a filtering method, a wrapper method and a regularizer. A comparison between the different ensemble combinations is also shown. The best ensemble method is then used to infer the causal relationships between banks in the Saudi equities market.
Abstract: Scale defects are common surface defects in hot steel rolling. The modelling of such defects is problematic and their causes are not straightforward. In this study, we investigated genetic algorithms in the search for a mathematical solution to scale formation. For this research, a high-dimensional data set from a hot steel rolling process was gathered. The synchronisation of the variables as well as the allocation of the measurements made on the steel strip were solved before the modelling phase.
Abstract: Association rule mining is an important problem in data
mining. The massively increasing volume of data in real-life databases
has motivated researchers to design novel and incremental algorithms
for association rule mining. In this paper, we propose an incremental
association rule mining algorithm that integrates a shocking
interestingness criterion into the process of building the model. A
new interestingness measure, called the shocking measure, is introduced. One
of the main features of the proposed approach is that it captures the user's
background knowledge, which is monotonically augmented. The
incremental model, which reflects the changing data and the user's beliefs,
is attractive because it makes the overall KDD process more
effective and efficient. We implemented the proposed approach and
experimented with it on some public datasets, and found the results quite
promising.
Abstract: Applying knowledge discovery techniques to unstructured
text is termed knowledge discovery in text (KDT), text data mining,
or text mining. The decision tree approach is most useful in
classification problems. With this
technique, a tree is constructed to model the classification process.
There are two basic steps in the technique: building the tree and
applying the tree to the database. This paper describes a proposed
C5.0 classifier that adds rulesets, cross-validation and boosting
to the original C5.0 in order to reduce the error ratio.
The feasibility and the benefits of the proposed approach are
demonstrated by means of a medical data set, hypothyroid. It is
shown that the performance of a classifier on the training cases from
which it was constructed gives a poor estimate of its accuracy. A
better estimate is obtained by sampling or by using a separate test
file; either way, the classifier is evaluated on cases that were not
used to build it, and the estimate is reliable only when the numbers
of cases used to build and evaluate the classifier are both large. If
the cases in hypothyroid.data and hypothyroid.test were to be
shuffled and divided into a new 2772-case training set and a 1000-case
test set, C5.0 might construct a different classifier with a lower
or higher error rate on the test cases. An important feature of See5 is
its ability to generate classifiers called rulesets. The ruleset has an error rate
of 0.5% on the test cases. The standard errors of the means provide an
estimate of the variability of results. One way to get a more reliable
estimate of predictive accuracy is by f-fold cross-validation. The error rate of
a classifier produced from all the cases is estimated as the ratio of the
total number of errors on the hold-out cases to the total number of
cases. The Boost option with x trials instructs See5 to construct up to
x classifiers in this manner. Trials over numerous datasets, large and
small, show that on average 10-classifier boosting reduces the error
rate for test cases by about 25%.
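The cross-validation estimate described above (total errors on the hold-out cases divided by the total number of cases) can be sketched as follows. This is an illustrative sketch only: the majority-class learner is a hypothetical stand-in for C5.0, and the function names and the interleaved fold assignment are assumptions of the example, not See5 behaviour.

```python
from collections import Counter

def train_majority(xs, ys):
    """Hypothetical stand-in learner: remembers the most frequent class."""
    return Counter(ys).most_common(1)[0][0]

def predict_majority(model, x):
    """Predicts the remembered class regardless of the case."""
    return model

def cross_validation_error(cases, labels, train, predict, folds=10):
    """Estimate the error rate as total hold-out errors / total cases."""
    n = len(cases)
    errors = 0
    for f in range(folds):
        held_out = set(range(f, n, folds))   # every folds-th case forms one block
        train_x = [cases[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        model = train(train_x, train_y)      # built without the hold-out block
        errors += sum(predict(model, cases[i]) != labels[i] for i in held_out)
    return errors / n

cases = list(range(10))
labels = ["a"] * 8 + ["b"] * 2
estimate = cross_validation_error(cases, labels, train_majority,
                                  predict_majority, folds=5)
```

Each case is held out exactly once, so the denominator is the full number of cases rather than the size of a single fold.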
Abstract: In the recent past, there has been an increasing interest
in applying evolutionary methods to Knowledge Discovery in
Databases (KDD) and a number of successful applications of Genetic
Algorithms (GA) and Genetic Programming (GP) to KDD have been
demonstrated. The most predominant representation of the
discovered knowledge is the standard Production Rules (PRs) in the
form If P Then D. The PRs, however, are unable to handle
exceptions and do not exhibit variable precision. Censored
Production Rules (CPRs), an extension of PRs proposed by
Michalski & Winston, exhibit variable precision and support an
efficient mechanism for handling exceptions. A CPR is an
augmented production rule of the form:
If P Then D Unless C, where C (Censor) is an exception to the rule.
Such rules are employed in situations in which the conditional
statement 'If P Then D' holds frequently and the assertion C holds
rarely. By using a rule of this type we are free to ignore the exception
condition when the resources needed to establish its presence are
tight or there is simply no information available as to whether it
holds or not. Thus, the 'If P Then D' part of the CPR expresses
important information, while the Unless C part acts only as a switch
and changes the polarity of D to ~D.
This paper presents a classification algorithm based on an evolutionary
approach that discovers comprehensible rules with exceptions in the
form of CPRs.
The proposed approach has flexible chromosome encoding, where
each chromosome corresponds to a CPR. Appropriate genetic
operators are suggested and a fitness function is proposed that
incorporates the basic constraints on CPRs. Experimental results are
presented to demonstrate the performance of the proposed algorithm.
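The behaviour of a rule of the form If P Then D Unless C can be illustrated with a small sketch. The class name `CensoredRule`, the `check_censor` flag and the bird/penguin example are assumptions chosen for illustration; they are not the chromosome encoding or the genetic operators of the proposed algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Predicate = Callable[[dict], bool]

@dataclass
class CensoredRule:
    premise: Predicate   # P
    decision: str        # D
    censor: Predicate    # C, the rarely true exception

    def fire(self, facts: dict, check_censor: bool = True) -> Optional[str]:
        """Return D, ~D, or None when the premise P does not hold."""
        if not self.premise(facts):
            return None
        # When resources are tight the censor may be skipped (check_censor=False),
        # and the rule then behaves like the plain production rule If P Then D.
        if check_censor and self.censor(facts):
            return "~" + self.decision   # the censor flips the polarity of D
        return self.decision

rule = CensoredRule(
    premise=lambda f: f.get("bird", False),
    decision="flies",
    censor=lambda f: f.get("penguin", False),
)
plain = rule.fire({"bird": True})
censored = rule.fire({"bird": True, "penguin": True})
skipped = rule.fire({"bird": True, "penguin": True}, check_censor=False)
```

Skipping the censor trades certainty for speed: the decision is usually right because C holds rarely, which is the variable-precision behaviour the abstract describes.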
Abstract: Many factors affect the success of Machine Learning
(ML) on a given task. The representation and quality of the instance
data is first and foremost. If there is much irrelevant and redundant
information present or noisy and unreliable data, then knowledge
discovery during the training phase is more difficult. It is well known
that data preparation and filtering steps take a considerable amount of
processing time in ML problems. Data pre-processing includes data
cleaning, normalization, transformation, feature extraction and
selection, etc. The product of data pre-processing is the final training
set. It would be convenient if a single sequence of data pre-processing
algorithms had the best performance for every data set, but this is not
the case. Thus, we present the best-known algorithms for each
step of data pre-processing so that one can achieve the best performance
for one's data set.
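As one concrete instance of the normalization step mentioned above, a min-max rescaling of a single numeric attribute might look like the sketch below. The function name and the choice to map a constant column to zeros are assumptions of this example; the abstract does not single out any particular normalization method.

```python
def min_max_normalize(column):
    """Rescale a numeric attribute to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant attribute: no spread to rescale, so map everything to 0.0 (assumed convention).
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

scaled = min_max_normalize([2, 4, 6])
```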
Abstract: Rule Discovery is an important technique for mining
knowledge from large databases. Use of objective measures for
discovering interesting rules leads to another data mining problem,
although of reduced complexity. Data mining researchers have
studied subjective measures of interestingness to reduce the volume
of discovered rules and ultimately improve the overall efficiency of
the KDD process.
In this paper we study novelty of the discovered rules as a
subjective measure of interestingness. We propose a hybrid approach
based on both objective and subjective measures to quantify novelty
of the discovered rules in terms of their deviations from the known
rules (knowledge). We analyze the types of deviation that can arise
between two rules and categorize the discovered rules according to
a user-specified threshold. We implement the proposed framework
and experiment with some public datasets. The experimental results
are promising.
Abstract: An on-demand routing protocol for wireless ad hoc
networks is one that searches for and attempts to discover a route to
some destination node only when a sending node originates a data
packet addressed to that node. In order to avoid the need for such a
route discovery to be performed before each data packet is sent, such
routing protocols must cache routes previously discovered. This
paper presents an analysis of the effect of intelligent caching in a
non-clustered network, using on-demand routing protocols in wireless ad
hoc networks. The analysis carried out is based on the Dynamic
Source Routing protocol (DSR), which operates entirely on-demand.
DSR uses the cache in every node to save the paths that are learnt
during the route discovery procedure. In this implementation, we try caching
these paths only at intermediate nodes and using the paths from these
caches when required. This technique helps in storing a larger
number of learnt routes without erasing existing entries in the
cache to make room for a newly learnt route.
The simulation results on DSR have shown that this technique
drastically increases the available memory for caching the routes
discovered without affecting the performance of the DSR routing
protocol in any way, except for a small increase in end to end delay.
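The basic reason an on-demand protocol caches routes, so that discovery is not repeated for every packet, can be sketched as follows. This is a simplified illustration and not DSR itself, nor the intermediate-node-only caching scheme studied in the paper: the breadth-first search is an assumed stand-in for route discovery, and the names `RouteCache`, `discover_route` and `discoveries` are hypothetical.

```python
from collections import deque

def discover_route(graph, src, dst):
    """Assumed stand-in for route discovery: breadth-first search over known links."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:      # walk back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None                          # destination unreachable

class RouteCache:
    """Per-node cache: discovery runs only when no route is cached."""
    def __init__(self, graph, src):
        self.graph, self.src = graph, src
        self.routes = {}
        self.discoveries = 0             # counts how often discovery was needed

    def route_to(self, dst):
        if dst not in self.routes:
            self.discoveries += 1
            route = discover_route(self.graph, self.src, dst)
            if route is None:
                return None
            self.routes[dst] = route
        return self.routes[dst]

graph = {"A": ["B"], "B": ["C"], "C": []}
cache = RouteCache(graph, "A")
first = cache.route_to("C")
second = cache.route_to("C")             # served from the cache
```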
Abstract: There are several approaches in trying to solve the
Quantitative Structure-Activity Relationship (QSAR) problem.
These approaches are based either on statistical methods or on
predictive data mining. Among the statistical methods, one should
consider regression analysis, pattern recognition (such as cluster
analysis, factor analysis and principal components analysis) or partial
least squares. Predictive data mining techniques use either neural
networks, or genetic programming, or neuro-fuzzy knowledge. These
approaches have a low explanatory capability or none at all. This
paper attempts to establish a new approach in solving QSAR
problems using descriptive data mining. This way, the relationship
between the chemical properties and the activity of a substance
would be comprehensibly modeled.
Abstract: Mining frequent tree patterns has many useful
applications in XML mining, bioinformatics, network routing, etc.
Most of the frequent subtree mining algorithms (e.g., FREQT,
TreeMiner and CMTreeMiner) use the anti-monotone property in the
phase of candidate subtree generation. However, none of these
algorithms have verified the correctness of this property in tree
structured data. In this research it is shown that anti-monotonicity
does not generally hold when using weighted support in tree pattern
discovery. As a result, tree mining algorithms that are based on this
property would probably miss some of the valid frequent subtree
patterns in a collection of trees. In this paper, we investigate the
correctness of the anti-monotone property for the problem of weighted
frequent subtree mining. In addition we propose W3-Miner, a new
algorithm for full extraction of frequent subtrees. The experimental
results confirm that W3-Miner finds some frequent subtrees that the
previously proposed algorithms are not able to discover.
Abstract: In this paper, a data mining model for SMEs that detects financial and operational risk indicators is presented. Identifying the risk factors by clarifying the relationships between the variables constitutes the discovery of knowledge from the financial and operational variables. This automatic, estimation-oriented information discovery process coincides with the definition of data mining. In forming the model, the aim is an easy-to-understand, easy-to-interpret and easy-to-apply utilitarian model that does not require a theoretical background, obtained by discovering the implicit relationships between the data and identifying the effect level of every factor. In addition, this paper is based on a project which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK).
Abstract: It has been recognized that, due to the autonomy and
heterogeneity of Web services and the Web itself, new approaches
should be developed to describe and advertise Web services. The
most notable approaches rely on the description of Web services
using semantics. This new breed of Web services, termed semantic
Web services, will enable the automatic annotation, advertisement,
discovery, selection, composition, and execution of interorganization
business logic, making the Internet become a common
global platform where organizations and individuals communicate
with each other to carry out various commercial activities and to
provide value-added services. This paper deals with two of the
hottest R&D and technology areas currently associated with the Web
– Web services and the semantic Web. It describes how semantic
Web services extend Web services as the semantic Web improves the
current Web, and presents three different conceptual approaches to
deploying semantic Web services, namely, WSDL-S, OWL-S, and
WSMO.
Abstract: With increasing data in medical databases, medical
data retrieval is growing in popularity. Some of this analysis
includes inducing propositional rules from databases using many
soft techniques, and then using these rules in an expert system.
Diagnostic rules and information on features are extracted from
clinical databases on diseases of congenital anomaly. This paper
explains the latest soft computing techniques and some of the
adaptive techniques, which encompass an extensive group of methods
that have been applied in the medical domain and that are used for
the discovery of data dependencies, importance of features,
patterns in sample data, and feature space dimensionality
reduction. These approaches pave the way for new and interesting
avenues of research in medical imaging and represent an important
challenge for researchers.
Abstract: Discovery schools in Jordan are connected in one flat
ATM bridge network. All schools connected to the network will hear
broadcast traffic. A high percentage of unwanted traffic, such as
broadcast, consumes the bandwidth between the schools and QRC.
Routers in QRC have high CPU utilization. The number of
connections on the router is very high, and may exceed the
manufacturer's recommended specifications. One way to minimize the number of
connections to the routers in QRC, and to minimize broadcast traffic, is
to use PPPoE. In this study, a PPPoE solution has been presented
which shows high performance for the clients when accessing the
school server resources. Despite the large number of the discovery
schools at MoE, the experimental results show that the PPPoE
solution is able to yield a satisfactory performance for each client at
the school and noticeably reduce the traffic broadcast to the QRC.