Abstract: In this paper, we present a new learning algorithm for
anomaly-based network intrusion detection using an improved
self-adaptive naïve Bayesian tree (NBTree), which induces a hybrid of
decision tree and naïve Bayesian classifier. The proposed approach
improves the balance of detection rates across different attack types
and keeps false positives at an acceptable level in intrusion
detection. On complex and dynamic large intrusion detection datasets,
the detection accuracy of the naïve Bayesian classifier does not scale
up as well as that of a decision tree. It has been successfully
demonstrated in other problem domains that the naïve Bayesian tree
improves classification rates on large datasets. In a naïve Bayesian
tree, internal nodes split as in a regular decision tree, but the
leaves contain naïve Bayesian classifiers. The experimental results on
the KDD99 benchmark network intrusion detection dataset demonstrate
that this new approach scales up detection rates for different attack
types and reduces false positives in network intrusion detection.
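The hybrid structure described above can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the single split attribute, the toy connection records and the Laplace smoothing are assumptions for illustration; a real NBTree chooses splits recursively by a utility criterion.

```python
from collections import Counter, defaultdict

def nb_train(rows, labels):
    # Fit a categorical naive Bayes model: class priors plus per-feature
    # value counts conditioned on the class.
    class_counts = Counter(labels)
    cond = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return class_counts, cond, len(labels)

def nb_predict(model, row):
    # Pick the class maximizing prior * product of smoothed likelihoods.
    class_counts, cond, n = model
    best, best_p = None, -1.0
    for c, cc in class_counts.items():
        p = cc / n
        for i, v in enumerate(row):
            counts = cond[(i, c)]
            p *= (counts[v] + 1) / (cc + len(counts) + 1)  # Laplace smoothing
        if p > best_p:
            best, best_p = c, p
    return best

def nbtree_train(rows, labels, split_idx):
    # One decision-tree split on attribute split_idx; each branch ends in
    # a naive Bayesian leaf, as in an NBTree.
    branches = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        branches[row[split_idx]][0].append(row)
        branches[row[split_idx]][1].append(y)
    return {v: nb_train(r, l) for v, (r, l) in branches.items()}

def nbtree_predict(tree, split_idx, row):
    # Route the sample down the split, then classify with the leaf model.
    return nb_predict(tree[row[split_idx]], row)
```

The leaf classifiers smooth over the sparse evidence remaining after the split, which is where the hybrid gains over a plain decision tree on large, noisy data.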
Abstract: In this paper, a novel algorithm based on the Ridgelet
transform and support vector machines is proposed for human action
recognition. The Ridgelet transform is a directional multi-resolution
transform and is well suited to describing human actions, as its
directional information can be used to form spatial feature vectors.
The dynamic transition between the spatial features is captured using
both Principal Component Analysis and the k-means clustering
algorithm. First, Principal Component Analysis is used to reduce the
dimensionality of the obtained vectors. Then, the k-means algorithm is
used to cluster the obtained vectors into a spatio-temporal pattern,
called a set of labels, according to the given periodicity of the
human action. Finally, a Support Vector Machine classifier is used to
discriminate between the different human actions. Various tests are
conducted on popular datasets such as Weizmann and KTH. The obtained
results show that the proposed method achieves a high accuracy rate
and remains robust in very challenging situations such as lighting
changes, scaling and dynamic environments.
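The k-means labeling step of this pipeline can be sketched as follows (the Ridgelet transform and PCA stages are omitted; the toy 2-D frame features and the Lloyd-iteration implementation are illustrative assumptions, not the authors' code):

```python
import random

def dist2(p, q):
    # Squared Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    # Component-wise mean of a set of vectors.
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans_labels(points, k, iters=50, seed=0):
    # Lloyd's k-means: returns one cluster label per frame feature vector,
    # i.e. the "set of labels" temporal pattern described above.
    rnd = random.Random(seed)
    cents = rnd.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, cents[j]))
                  for p in points]
        new = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            new.append(cents[j] if not members else centroid(members))
        if new == cents:
            break
        cents = new
    return labels
```

Applied to a sequence of per-frame feature vectors, the resulting label sequence is the discrete spatio-temporal pattern that the SVM stage would consume.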
Abstract: Many studies have been conducted worldwide to derive
attenuation relationships; however, few relationships have been
developed for the seismic region of the Iranian plateau, and only a
few of these studies have derived attenuation relationships for
parameters such as uniform duration. Uniform duration is the total
time during which the acceleration is larger than a given threshold
value (by default, 5% of PGA). In this study, the database was the
same as that used previously by Ghodrati Amiri et al. (2007), with the
same correction methods for earthquake records in Iran. However, in
this study, records from earthquakes with MS < 4.0 were excluded from
the database, each record was subsequently filtered individually, and
the dataset was expanded. This new set of attenuation relationships
for Iran is derived based on tectonic conditions, with sites
classified into rock and soil. Hypocentral distance and magnitude were
chosen as the earthquake parameters in order to make the relationships
easier to use in seismic hazard analysis. Tehran, the capital city of
Iran, has a large number of important structures. In this study, a
probabilistic approach has been utilized for seismic hazard assessment
of this city. The resulting uniform duration versus return period
diagrams are suggested for use in any project in the area.
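The uniform duration defined above can be computed directly from a discretized accelerogram. The sketch below assumes a uniformly sampled record; the sample values and time step are invented for illustration:

```python
def uniform_duration(accel, dt, threshold_ratio=0.05):
    # Total time during which |acceleration| exceeds threshold_ratio * PGA,
    # with PGA taken as the peak absolute acceleration of the record.
    pga = max(abs(a) for a in accel)
    threshold = threshold_ratio * pga
    # Each sample above the threshold contributes one time step dt.
    return sum(dt for a in accel if abs(a) > threshold)
```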
Abstract: Based on a long-term NDVI vegetation index dataset and
meteorological data from 68 meteorological stations in the
Qinghai-Tibet plateau, the variations of NDVI and their relations with
major climate factors were analyzed. The results show the following:
1) The linear trends of temperature in the Qinghai-Tibet plateau
indicate that the temperature in the plateau generally increased, and
rose faster in the last 20 years. 2) The most significant NDVI
increase occurred in the eastern and southern plateau, whereas the
western and northern plateau show a decreasing trend. 3) There is a
significant positive linear correlation between NDVI and temperature
and a negative correlation between NDVI and mean wind speed; however,
no significant statistical relationship was found between NDVI and
relative humidity, precipitation or sunshine duration. 4) The changes
in NDVI for the plateau as a whole are driven by temperature and then
precipitation, but for the desert and forest areas the dominant
factors change to precipitation-temperature-wind velocity and wind
velocity-temperature-precipitation, respectively.
Abstract: This paper describes a new approach to classification using
genetic programming. The proposed technique consists of genetically
coevolving a population of non-linear transformations of the input
data to be classified, mapping the data to a new space of reduced
dimension in order to obtain maximum inter-class discrimination. The
classification of new samples is then performed on the transformed
data, and thus becomes much easier. Contrary to existing
GP-classification techniques, the proposed one uses a dynamic
partition of the transformed data into separate intervals; the
efficacy of a given interval partition is evaluated by the fitness
criterion, which rewards maximum class discrimination. Experiments
were first performed using Fisher's Iris dataset, and then the KDD-99
Cup dataset was used to study the intrusion detection and
classification problem. The obtained results demonstrate that the
proposed genetic approach outperforms the existing GP-classification
methods [1], [2] and [3], and gives very acceptable results compared
to other existing techniques proposed in [4], [5], [6], [7] and [8].
Abstract: This study analyzes the effect of discretization on the
classification of datasets containing continuous-valued features. Six
datasets from UCI that contain continuous-valued features are
discretized with an entropy-based discretization method. The
performance difference between the datasets with original features and
the datasets with discretized features is compared using the k-nearest
neighbors, Naive Bayes, C4.5 and CN2 data mining classification
algorithms. As a result, the classification accuracies of the six
datasets improve on average by 1.71% to 12.31%.
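The core of entropy-based discretization is choosing a cut point that minimizes the class-entropy of the resulting bins. A minimal sketch of a single binary cut in the style of Fayyad and Irani's method (the toy feature values and labels are illustrative assumptions; the full method applies this recursively with a stopping criterion):

```python
from math import log2
from collections import Counter

def entropy(labels):
    # Shannon entropy of a class label list, in bits.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    # Entropy-minimizing binary cut point for one continuous feature:
    # try every boundary between distinct sorted values and keep the one
    # with the lowest weighted class entropy of the two sides.
    pairs = sorted(zip(values, labels))
    vals = [v for v, _ in pairs]
    labs = [l for _, l in pairs]
    n = len(labs)
    best_cut, best_e = None, float('inf')
    for i in range(1, n):
        if vals[i] == vals[i - 1]:
            continue
        e = (i / n) * entropy(labs[:i]) + ((n - i) / n) * entropy(labs[i:])
        if e < best_e:
            best_cut, best_e = (vals[i - 1] + vals[i]) / 2, e
    return best_cut
```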
Abstract: Nowadays, predicting the political risk level of a country
has become a critical issue for investors who intend to obtain
accurate information concerning the stability of business
environments. Since investors are often laymen rather than
professional IT personnel, this paper proposes a framework named GECR
to help non-expert persons discover political risk stability over time
based on political news and events.
To achieve this goal, the Bayesian Network approach was applied to 186
political news items from Pakistan as a sample dataset. Bayesian
Networks, an artificial intelligence approach, were employed in the
presented framework because they are a powerful technique for modeling
uncertain domains. The results showed that our framework, with
Bayesian Networks as the decision support tool, predicted the
political risk level with a high degree of accuracy.
Abstract: Computer worm detection is commonly performed by antivirus
software tools that rely on prior explicit knowledge of the worm's
code (detection based on code signatures). We present an approach for
detecting the presence of computer worms based on Artificial Neural
Networks (ANN) using the computer's behavioral measures.
Identification of the significant features that describe the activity
of a worm within a host is commonly obtained from security experts. We
suggest acquiring these features by applying feature selection
methods. We compare three different feature selection techniques for
dimensionality reduction and identification of the most prominent
features in order to capture computer behavior efficiently in the
context of worm activity. Additionally, we explore three different
temporal representation techniques for the most prominent features. In
order to evaluate the different techniques, several computers were
infected with five different worms, and 323 different features of the
infected computers were measured. We evaluated each technique by
preprocessing the dataset accordingly and training the ANN model with
the preprocessed data. We then evaluated the ability of the model to
detect the presence of a new computer worm, in particular during heavy
user activity on the infected computers.
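As an illustration of filter-style feature selection over behavioral measures, here is a small Fisher-score-style ranking sketch. It is not necessarily one of the three techniques compared in the paper; the feature values and class labels are invented:

```python
from statistics import mean, pstdev

def fisher_scores(rows, labels):
    # Score each feature by squared between-class mean gap divided by the
    # summed within-class variance; higher means more discriminative.
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        col = {c: [r[j] for r, y in zip(rows, labels) if y == c]
               for c in classes}
        num = sum((mean(col[a]) - mean(col[b])) ** 2
                  for a in classes for b in classes if a < b)
        den = sum(pstdev(col[c]) ** 2 for c in classes) or 1e-12
        scores.append(num / den)
    return scores
```

Ranking the 323 measured features by such a score and keeping the top few would be one way to replace expert-chosen features, in the spirit of the approach above.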
Abstract: In recent years, with the rapid development of the Internet and the Web, more and more web applications have been deployed in many fields and organizations such as finance, the military, and government. Together with that, hackers have found ever more subtle ways to attack web applications. According to international statistics, SQL Injection is one of the most popular vulnerabilities of web applications. The consequences of this type of attack are quite dangerous: sensitive information can be stolen or authentication systems bypassed. To mitigate the situation, several techniques have been adopted. In this research, a security solution is proposed using an Artificial Neural Network to protect web applications against this type of attack. The solution has been tested on sample datasets and has given promising results. The solution has also been developed into a prototype web application firewall called ANNbWAF.
Abstract: Simultaneously computing the facility location problem for
every location in a country is not easy. This paper describes solving
the problem using cluster computing. The technique is to design a
parallel algorithm using local search with the single-swap method in
order to solve the problem on clusters. The parallel implementation
uses portable parallel programming, the Message Passing Interface
(MPI), on a Microsoft Windows Compute Cluster. This paper presents the
algorithm, which uses local search with the single-swap method, and
the implementation of deciding which facility to open using MPI on a
cluster. If large datasets are considered, the process of calculating
a reasonable cost for a facility becomes time consuming. The results
show that the parallel computation of the facility location problem on
a cluster speeds up and scales well as the problem size increases.
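A serial sketch of local search with single swap for a p-median-style facility location problem (the MPI version would distribute candidate swaps across processes; the tiny distance matrix and the "open the first p facilities" start are invented for illustration):

```python
def total_cost(open_facs, dist):
    # Cost when each client is served by its nearest open facility;
    # dist[f][c] is the cost of serving client c from facility f.
    n_clients = len(dist[0])
    return sum(min(dist[f][c] for f in open_facs) for c in range(n_clients))

def single_swap_search(dist, p):
    # Start from the first p facilities; repeatedly swap one open facility
    # with one closed facility while an improving swap exists.
    facilities = range(len(dist))
    current = set(range(p))
    improved = True
    while improved:
        improved = False
        for out in list(current):
            for inn in facilities:
                if inn in current:
                    continue
                cand = (current - {out}) | {inn}
                if total_cost(cand, dist) < total_cost(current, dist):
                    current, improved = cand, True
                    break
            if improved:
                break
    return current, total_cost(current, dist)
```

Evaluating `total_cost` for every candidate swap is the expensive inner loop; this is the part that benefits from being spread over a cluster.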
Abstract: Instead of traditional (nominal) classification, we
investigate the subject of ordinal classification, or ranking. An
enhanced method based on an ensemble of Support Vector Machines (SVMs)
is proposed. Each binary classifier is trained with specific weights
for each object in the training data set. Experiments on benchmark
datasets and synthetic data indicate that the performance of our
approach is comparable to state-of-the-art kernel methods for ordinal
regression. The ensemble method, which is straightforward to
implement, provides a very good sensitivity-specificity trade-off for
the highest and lowest ranks.
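One common way to build an ensemble of binary classifiers for ranking is threshold decomposition in the style of Frank and Hall: classifier k estimates P(rank > k), and per-rank probabilities are recovered by differencing. The paper's per-object weighting scheme is not reproduced here; this is a generic sketch with invented numbers:

```python
def ordinal_binary_targets(ranks, n_levels):
    # One binary target vector per threshold k: 1 if rank > k, else 0.
    # Each vector is what one binary SVM of the ensemble would be trained on.
    return {k: [1 if r > k else 0 for r in ranks]
            for k in range(n_levels - 1)}

def rank_probabilities(threshold_probs):
    # threshold_probs[k] = estimated P(rank > k); difference consecutive
    # estimates to get a probability for each individual rank.
    probs, prev = [], 1.0
    for p in threshold_probs:
        probs.append(prev - p)
        prev = p
    probs.append(prev)
    return probs
```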
Abstract: Clustering ensembles combine multiple partitions generated
by different clustering algorithms into a single clustering solution.
Clustering ensembles have emerged as a prominent method for improving
the robustness, stability and accuracy of unsupervised classification
solutions. So far, many contributions have been made toward finding
consensus clusterings. One of the major problems in clustering
ensembles is the consensus function. In this paper, we first introduce
clustering ensembles, the representation of multiple partitions and
its challenges, and present a taxonomy of combination algorithms.
Secondly, we describe consensus functions in clustering ensembles,
including hypergraph partitioning, the voting approach, mutual
information, co-association based functions and the finite mixture
model, and explain their advantages, disadvantages and computational
complexity. Finally, we compare the clustering ensemble algorithms of
previous techniques with respect to characteristics such as
computational complexity, robustness, simplicity and accuracy on
different datasets.
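Of the consensus functions listed above, the co-association approach is the simplest to sketch: for every pair of objects, count the fraction of base partitions that place them in the same cluster (the example partitions are invented):

```python
def co_association(partitions):
    # partitions: list of label vectors, one per base clustering, all of
    # length n. Returns the n x n matrix of pairwise co-clustering fractions.
    n = len(partitions[0])
    m = len(partitions)
    co = [[0] * n for _ in range(n)]
    for part in partitions:
        for i in range(n):
            for j in range(n):
                if part[i] == part[j]:
                    co[i][j] += 1
    return [[c / m for c in row] for row in co]
```

The consensus clustering is then obtained by running any similarity-based clustering (single-link, for instance) on this matrix, which is why the method is robust but quadratic in the number of objects.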
Abstract: With the rapid development in the field of life sciences and
the flood of genomic information, the need for faster and more
scalable searching methods has become urgent. One of the approaches
that has been investigated is indexing. Indexing methods have been
categorized into three categories: length-based index algorithms,
transformation-based algorithms and mixed-technique algorithms. In
this research, we focused on the transformation-based methods. We
embedded the N-gram method into the transformation-based method to
build an inverted index table. We then applied parallel methods to
speed up the index building time and to reduce the overall retrieval
time when querying the genomic database. Our experiments show that the
use of the N-gram transformation algorithm is an economical solution;
it saves both time and space. The results show that the size of the
index is smaller than the size of the dataset when the N-gram size is
5 or 6. The parallel N-gram transformation algorithm's results
indicate that the use of parallel programming with large datasets is
promising and can be improved further.
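A minimal serial sketch of an N-gram inverted index over sequences (the parallel variant would shard the sequences across processes; the sample sequences and query are invented):

```python
def build_ngram_index(sequences, n):
    # Map each n-gram to the set of sequence ids that contain it.
    index = {}
    for sid, seq in enumerate(sequences):
        for i in range(len(seq) - n + 1):
            index.setdefault(seq[i:i + n], set()).add(sid)
    return index

def candidate_sequences(index, pattern, n):
    # A sequence can match the pattern only if it contains every n-gram
    # of the pattern, so intersect the posting sets of those n-grams.
    grams = [pattern[i:i + n] for i in range(len(pattern) - n + 1)]
    postings = [index.get(g, set()) for g in grams]
    return set.intersection(*postings) if postings else set()
```

The index filters the database down to a small candidate set, which is then verified with an exact match; this filtering is what reduces overall retrieval time.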
Abstract: In recent years, a number of works proposing the combination
of multiple classifiers to produce a single classification have been
reported in the remote sensing literature. The resulting classifier,
referred to as an ensemble classifier, is generally found to be more
accurate than any of the individual classifiers making up the
ensemble. As accuracy is the primary concern, much of the research in
the field of land cover classification is focused on improving
classification accuracy. This study compares the performance of four
ensemble approaches (boosting, bagging, DECORATE and random subspace)
with a univariate decision tree as the base classifier. Two training
datasets, one without any noise and the other with 20 percent noise,
were used to judge the performance of the different ensemble
approaches. Results with the noise-free dataset suggest an improvement
of about 4% in classification accuracy with all ensemble approaches in
comparison to the results provided by the univariate decision tree
classifier. The highest classification accuracy, 87.43%, was achieved
by the boosted decision tree. A comparison of results with the noisy
dataset suggests that the bagging, DECORATE and random subspace
approaches work well with this data, whereas the performance of the
boosted decision tree degrades to a classification accuracy of 79.7%,
which is even lower than the 80.02% achieved by the unboosted decision
tree classifier.
Abstract: Clustering in high-dimensional space is a difficult problem
which recurs in many fields of science and engineering, e.g.,
bioinformatics, image processing, pattern recognition and data mining.
In high-dimensional space some of the dimensions are likely to be
irrelevant, thus hiding the possible clustering. In very high
dimensions it is common for all the objects in a dataset to be nearly
equidistant from each other, completely masking the clusters. Hence,
the performance of clustering algorithms decreases.
In this paper, we propose an algorithmic framework which combines the
reduct concept of rough set theory with the k-means algorithm to
remove the irrelevant dimensions in a high-dimensional space and
obtain appropriate clusters. Our experiments on test data show that
this framework increases the efficiency of the clustering process and
the accuracy of the results.
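The reduct idea can be sketched with the classical positive-region test: an attribute may be dropped if removing it leaves every indiscernibility class consistent with the decision. The tiny decision table below is an invented example, and the k-means stage the paper combines this with is omitted:

```python
def positive_region(rows, attrs, decisions):
    # Indices of objects whose indiscernibility class (w.r.t. attrs)
    # is consistent, i.e. maps to a single decision value.
    blocks = {}
    for idx, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(idx)
    pos = set()
    for block in blocks.values():
        if len({decisions[i] for i in block}) == 1:
            pos |= block
    return pos

def greedy_reduct(rows, decisions):
    # Backward elimination: drop each attribute whose removal preserves
    # the positive region of the full attribute set.
    attrs = list(range(len(rows[0])))
    full = positive_region(rows, attrs, decisions)
    for a in list(attrs):
        rest = [x for x in attrs if x != a]
        if positive_region(rows, rest, decisions) == full:
            attrs = rest
    return attrs
```

The surviving attributes form the reduced space in which k-means is then run, so irrelevant dimensions no longer flatten the pairwise distances.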
Abstract: Load forecasting has always been an essential part of
efficient power system operation and planning. A novel approach based
on support vector machines is proposed in this paper for annual power
load forecasting. Different kernel functions are selected to construct
a combinatorial algorithm. The performance of the new model is
evaluated on a real-world dataset and compared with two neural
networks and some traditional forecasting techniques. The results show
that the proposed method exhibits superior performance.
Abstract: This article outlines the conceptualization and
implementation of an intelligent system capable of extracting
knowledge from databases. The use of hybridized features of both Rough
and Fuzzy Set theory renders the developed system flexible in dealing
with discrete as well as continuous datasets. A raw dataset provided
to the system is initially transformed into a computer-legible format,
followed by pruning of the dataset. The refined dataset is then
processed through various Rough Set operators, which enable the
discovery of parameter relationships and interdependencies. The
discovered knowledge is automatically transformed into a rule base
expressed in Fuzzy terms. Two exemplary cancer repository datasets
(for Breast and Lung Cancer) have been used to test the implementation
of the proposed framework.
Abstract: Environmental factors affect agricultural productivity and
efficiency, resulting in changes in profit efficiency. This paper
attempts to estimate the impacts of environmental factors on the
profitability of rice farmers in the Red River Delta of Vietnam. The
dataset was collected from 349 rice farmers through personal
interviews. Both OLS and MLE trans-log profit functions were used in
this study. Five production inputs and four environmental factors were
included in these functions. The estimation of the stochastic profit
frontier with a two-stage approach was used to measure profitability.
The results showed that profit efficiency was about 75% on average and
that environmental factors, besides farm-specific characteristics,
change profit efficiency significantly. Plant disease, soil fertility,
irrigation application and water pollution were the four environmental
factors causing profit loss in rice production. The results indicated
that farmers should reduce household size and the number of farm
plots, apply the row seeding technique and improve environmental
factors to obtain high profit efficiency, with special consideration
given to irrigation water quality improvement.
Abstract: This paper introduces new algorithms (a fuzzy relative of
the CLARANS algorithm, FCLARANS, and Fuzzy c-Medoids based on
randomized search, FCMRANS) for fuzzy clustering of relational data.
Unlike the existing fuzzy c-medoids algorithm (FCMdd), in which the
within-cluster dissimilarity of each cluster is minimized in each
iteration by recomputing new medoids given the current memberships,
FCLARANS minimizes the same objective function minimized by FCMdd by
changing the current medoids in such a way that the sum of the
within-cluster dissimilarities is minimized. Computing new medoids may
be affected by noise, because outliers may enter the computation of
medoids, while the choice of medoids in FCLARANS is dictated by the
location of a predominant fraction of points inside a cluster and is
therefore less sensitive to the presence of outliers. In FCMRANS, the
step of computing new medoids in FCMdd is modified to be based on
randomized search. Furthermore, a new initialization procedure is
developed that adds randomness to the initialization procedure used
with FCMdd. Both FCLARANS and FCMRANS are compared with the robust and
linearized version of fuzzy c-medoids (RFCMdd). Experimental results
with different samples of the Reuters-21578 and Newsgroups (20NG)
collections and generated datasets with noise show that FCLARANS is
more robust than both RFCMdd and FCMRANS. Finally, both FCMRANS and
FCLARANS are more efficient, and their outputs are almost the same as
that of RFCMdd in terms of classification rate.
Abstract: In this paper, a particle swarm optimization (PSO) algorithm
is proposed to solve the machine loading problem in flexible
manufacturing systems (FMS), with the bicriterion objectives of
minimizing system unbalance and maximizing system throughput in the
presence of technological constraints such as available machining time
and tool slots. A mathematical model is used to select machines and
assign operations and the required tools. The performance of the PSO
is tested on 10 sample datasets and the results are compared with the
heuristics reported in the literature. The results support that the
proposed PSO is comparable with the algorithms reported in the
literature.