Abstract: Data flow and the purpose of reporting the data are different and dependent on business needs. Different parameters are reported and transferred regularly during freight delivery. This business practices form the dataset constructed for each time point and contain all required information for freight moving decisions. As a significant amount of these data is used for various purposes, an integrating methodological approach must be developed to respond to the indicated problem. The proposed methodology contains several steps: (1) collecting context data sets and data validation; (2) multi-objective analysis for optimizing freight transfer services. For data validation, the study involves Grubbs outliers analysis, particularly for data cleaning and the identification of statistical significance of data reporting event cases. The Grubbs test is often used as it measures one external value at a time exceeding the boundaries of standard normal distribution. In the study area, the test was not widely applied by authors, except when the Grubbs test for outlier detection was used to identify outsiders in fuel consumption data. In the study, the authors applied the method with a confidence level of 99%. For the multi-objective analysis, the authors would like to select the forms of construction of the genetic algorithms, which have more possibilities to extract the best solution. For freight delivery management, the schemas of genetic algorithms' structure are used as a more effective technique. Due to that, the adaptable genetic algorithm is applied for the description of choosing process of the effective transportation corridor. In this study, the multi-objective genetic algorithm methods are used to optimize the data evaluation and select the appropriate transport corridor. The authors suggest a methodology for the multi-objective analysis, which evaluates collected context data sets and uses this evaluation to determine a delivery corridor for freight transfer service in the multi-modal transportation network. In the multi-objective analysis, authors include safety components, the number of accidents a year, and freight delivery time in the multi-modal transportation network. The proposed methodology has practical value in the management of multi-modal transportation processes.
Abstract: Prediction of wall shear stress in a rectangular channel, with non-homogeneous roughness distribution, was studied. Estimation of shear stress is an important subject in hydraulic engineering, since it affects the flow structure directly. In this study, the Genetic Algorithm Artificial (GAA) neural network is introduced as a hybrid methodology of the Artificial Neural Network (ANN) and modified Genetic Algorithm (GA) combination. This GAA method was employed to predict the wall shear stress. Various input combinations and transfer functions were considered to find the most appropriate GAA model. The results show that the proposed GAA method could predict the wall shear stress of open channels with high accuracy, by Root Mean Square Error (RMSE) of 0.064 in the test dataset. Thus, using GAA provides an accurate and practical simple-to-use equation.
Abstract: Cartesian Genetic Programming (CGP) is explored to
design an optimal circuit capable of early stage breast cancer
detection. CGP is used to evolve simple multiplexer circuits for
detection of malignancy in the Fine Needle Aspiration (FNA) samples
of breast. The data set used is extracted from Wisconsins Breast
Cancer Database (WBCD). A range of experiments were performed,
each with different set of network parameters. The best evolved
network detected malignancy with an accuracy of 99.14%, which is
higher than that produced with most of the contemporary non-linear
techniques that are computational expensive than the proposed
system. The evolved network comprises of simple multiplexers
and can be implemented easily in hardware without any further
complications or inaccuracy, being the digital circuit.
Abstract: Cardiologists perform cardiac auscultation to detect
abnormalities in heart sounds. Since accurate auscultation is
a crucial first step in screening patients with heart diseases,
there is a need to develop computer-aided detection/diagnosis
(CAD) systems to assist cardiologists in interpreting heart sounds
and provide second opinions. In this paper different algorithms
are implemented for automated heart sound classification using
unsegmented phonocardiogram (PCG) signals. Support vector
machine (SVM), artificial neural network (ANN) and cartesian
genetic programming evolved artificial neural network (CGPANN)
without the application of any segmentation algorithm has been
explored in this study. The signals are first pre-processed to remove
any unwanted frequencies. Both time and frequency domain features
are then extracted for training the different models. The different
algorithms are tested in multiple scenarios and their strengths and
weaknesses are discussed. Results indicate that SVM outperforms
the rest with an accuracy of 73.64%.
Abstract: In order to retrieve images efficiently from a large
database, a unique method integrating color and texture features
using genetic programming has been proposed. Opponent color
histogram which gives shadow, shade, and light intensity invariant
property is employed in the proposed framework for extracting color
features. For texture feature extraction, fast discrete curvelet
transform which captures more orientation information at different
scales is incorporated to represent curved like edges. The recent
scenario in the issues of image retrieval is to reduce the semantic gap
between user’s preference and low level features. To address this
concern, genetic algorithm combined with relevance feedback is
embedded to reduce semantic gap and retrieve user’s preference
images. Extensive and comparative experiments have been conducted
to evaluate proposed framework for content based image retrieval on
two databases, i.e., COIL-100 and Corel-1000. Experimental results
clearly show that the proposed system surpassed other existing
systems in terms of precision and recall. The proposed work achieves
highest performance with average precision of 88.2% on COIL-100
and 76.3% on Corel, the average recall of 69.9% on COIL and 76.3%
on Corel. Thus, the experimental results confirm that the proposed
content based image retrieval system architecture attains better
solution for image retrieval.
Abstract: Rivers have transient storage or dead zones where
injected pollutants or solutes are entrapped for considerable period of
time, known as residence time, before being released into the main
flowing zones of rivers. In this study, a new empirical expression for
residence time, implementing genetic programming on published
dispersion data, has been derived. The proposed expression uses few
hydraulic and geometric characteristics of rivers which are normally
known to the authorities. When compared with some reported
expressions, based on various statistical indices, it can be concluded
that the proposed expression predicts the residence time of pollutants
in natural rivers more accurately.
Abstract: This paper presents an extensive review of literature
relevant to the modelling techniques adopted in sediment yield and
hydrological modelling. Several studies relating to sediment yield are
discussed. Many research areas of sedimentation in rivers, runoff and
reservoirs are presented. Different types of hydrological models,
different methods employed in selecting appropriate models for
different case studies are analysed. Applications of evolutionary
algorithms and artificial intelligence techniques are discussed and
compared especially in water resources management and modelling.
This review concentrates on Genetic Programming (GP) and fully
discusses its theories and applications. The successful applications of
GP as a soft computing technique were reviewed in sediment
modelling. Some fundamental issues such as benchmark,
generalization ability, bloat, over-fitting and other open issues
relating to the working principles of GP are highlighted. This paper
concludes with the identification of some research gaps in
hydrological modelling and sediment yield.
Abstract: Hydrological modelling plays a crucial role in the planning and management of water resources, most especially in water stressed regions where the need to effectively manage the available water resources is of critical importance. However, due to the complex, nonlinear and dynamic behaviour of hydro-climatic interactions, achieving reliable modelling of water resource systems and accurate projection of hydrological parameters are extremely challenging. Although a significant number of modelling techniques (process-based and data-driven) have been developed and adopted in that regard, the field of hydrological modelling is still considered as one that has sluggishly progressed over the past decades. This is majorly as a result of the identification of some degree of uncertainty in the methodologies and results of techniques adopted. In recent times, evolutionary computation (EC) techniques have been developed and introduced in response to the search for efficient and reliable means of providing accurate solutions to hydrological related problems. This paper presents a comprehensive review of the underlying principles, methodological needs and applications of a promising evolutionary computation modelling technique – genetic programming (GP). It examines the specific characteristics of the technique which makes it suitable to solving hydrological modelling problems. It discusses the opportunities inherent in the application of GP in water related-studies such as rainfall estimation, rainfall-runoff modelling, streamflow forecasting, sediment transport modelling, water quality modelling and groundwater modelling among others. Furthermore, the means by which such opportunities could be harnessed in the near future are discussed. In all, a case for total embracement of GP and its variants in hydrological modelling studies is made so as to put in place strategies that would translate into achieving meaningful progress as it relates to modelling of water resource systems, and also positively influence decision-making by relevant stakeholders.
Abstract: Transient storage zones along the flow paths of rivers have great influence on the dispersion of pollutants that are either accidentally or otherwise led into them. The speed with which these pollution clouds get transported and dispersed downstream is, to a large extent, explained by the longitudinal dispersion coefficients in the free-flowing zones of rivers (Kf). In the present work, a new empirical expression for Kf has been derived employing genetic programming (GP) on published dispersion data. The proposed expression uses few hydraulic and geometric characteristics of a river that are readily available to field engineers. Based on various performance indices, the proposed expression is found superior to other existing expression for Kf.
Abstract: This paper presents an evolutionary method for designing
electronic circuits and numerical methods associated with
monitoring systems. The instruments described here have been used
in studies of weather and climate changes due to global warming, and
also in medical patient supervision. Genetic Programming systems
have been used both for designing circuits and sensors, and also for
determining sensor parameters. The authors advance the thesis that
the software side of such a system should be written in computer
languages with a strong mathematical and logic background in order
to prevent software obsolescence, and achieve program correctness.
Abstract: Water level forecasting using records of past time series is of importance in water resources engineering and management. For example, water level affects groundwater tables in low-lying coastal areas, as well as hydrological regimes of some coastal rivers. Then, a reliable prediction of sea-level variations is required in coastal engineering and hydrologic studies. During the past two decades, the approaches based on the Genetic Programming (GP) and Artificial Neural Networks (ANN) were developed. In the present study, the GP is used to forecast daily water level variations for a set of time intervals using observed water levels. The measurements from a single tide gauge at Urmia Lake, Northwest Iran, were used to train and validate the GP approach for the period from January 1997 to July 2008. Statistics, the root mean square error and correlation coefficient, are used to verify model by comparing with a corresponding outputs from Artificial Neural Network model. The results show that both these artificial intelligence methodologies are satisfactory and can be considered as alternatives to the conventional harmonic analysis.
Abstract: In this paper we present a GP-based method for automatically evolve projections, so that data can be more easily classified in the projected spaces. At the same time, our approach can reduce dimensionality by constructing more relevant attributes. Fitness of each projection measures how easy is to classify the dataset after applying the projection. This is quickly computed by a Simple Linear Perceptron. We have tested our approach in three domains. The experiments show that it obtains good results, compared to other Machine Learning approaches, while reducing dimensionality in many cases.
Abstract: By systematically applying different engineering
methods, difficult financial problems become approachable. Using a
combination of theory and techniques such as wavelet transform,
time series data mining, Markov chain based discrete stochastic
optimization, and evolutionary algorithms, this work formulated a
strategy to characterize and forecast non-linear time series. It
attempted to extract typical features from the volatility data sets of
S&P100 and S&P500 indices that include abrupt drops, jumps and
other non-linearity. As a result, accuracy of forecasting has reached
an average of over 75% surpassing any other publicly available
results on the forecast of any financial index.
Abstract: In the recent past, there has been an increasing interest
in applying evolutionary methods to Knowledge Discovery in
Databases (KDD) and a number of successful applications of Genetic
Algorithms (GA) and Genetic Programming (GP) to KDD have been
demonstrated. The most predominant representation of the
discovered knowledge is the standard Production Rules (PRs) in the
form If P Then D. The PRs, however, are unable to handle
exceptions and do not exhibit variable precision. The Censored
Production Rules (CPRs), an extension of PRs, were proposed by
Michalski & Winston that exhibit variable precision and supports an
efficient mechanism for handling exceptions. A CPR is an
augmented production rule of the form:
If P Then D Unless C, where C (Censor) is an exception to the rule.
Such rules are employed in situations, in which the conditional
statement 'If P Then D' holds frequently and the assertion C holds
rarely. By using a rule of this type we are free to ignore the exception
conditions, when the resources needed to establish its presence are
tight or there is simply no information available as to whether it
holds or not. Thus, the 'If P Then D' part of the CPR expresses
important information, while the Unless C part acts only as a switch
and changes the polarity of D to ~D.
This paper presents a classification algorithm based on evolutionary
approach that discovers comprehensible rules with exceptions in the
form of CPRs.
The proposed approach has flexible chromosome encoding, where
each chromosome corresponds to a CPR. Appropriate genetic
operators are suggested and a fitness function is proposed that
incorporates the basic constraints on CPRs. Experimental results are
presented to demonstrate the performance of the proposed algorithm.
Abstract: There are several approaches in trying to solve the
Quantitative 1Structure-Activity Relationship (QSAR) problem.
These approaches are based either on statistical methods or on
predictive data mining. Among the statistical methods, one should
consider regression analysis, pattern recognition (such as cluster
analysis, factor analysis and principal components analysis) or partial
least squares. Predictive data mining techniques use either neural
networks, or genetic programming, or neuro-fuzzy knowledge. These
approaches have a low explanatory capability or non at all. This
paper attempts to establish a new approach in solving QSAR
problems using descriptive data mining. This way, the relationship
between the chemical properties and the activity of a substance
would be comprehensibly modeled.
Abstract: The development of Artificial Neural Networks
(ANNs) is usually a slow process in which the human expert has to
test several architectures until he finds the one that achieves best
results to solve a certain problem. This work presents a new
technique that uses Genetic Programming (GP) for automatically
generating ANNs. To do this, the GP algorithm had to be changed in
order to work with graph structures, so ANNs can be developed. This
technique also allows the obtaining of simplified networks that solve
the problem with a small group of neurons. In order to measure the
performance of the system and to compare the results with other
ANN development methods by means of Evolutionary Computation
(EC) techniques, several tests were performed with problems based
on some of the most used test databases. The results of those
comparisons show that the system achieves good results comparable
with the already existing techniques and, in most of the cases, they
worked better than those techniques.
Abstract: Automated discovery of Rule is, due to its applicability, one of the most fundamental and important method in KDD. It has been an active research area in the recent past. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of details, and to focus our attention on the interesting aspects only. One of such efficient and easy to understand systems is Hierarchical Production rule (HPRs) system. A HPR, a standard production rule augmented with generality and specificity information, is of the following form: Decision If < condition> Generality Specificity . HPRs systems are capable of handling taxonomical structures inherent in the knowledge about the real world. This paper focuses on the issue of mining Quantified rules with crisp hierarchical structure using Genetic Programming (GP) approach to knowledge discovery. The post-processing scheme presented in this work uses Quantified production rules as initial individuals of GP and discovers hierarchical structure. In proposed approach rules are quantified by using Dempster Shafer theory. Suitable genetic operators are proposed for the suggested encoding. Based on the Subsumption Matrix(SM), an appropriate fitness function is suggested. Finally, Quantified Hierarchical Production Rules (HPRs) are generated from the discovered hierarchy, using Dempster Shafer theory. Experimental results are presented to demonstrate the performance of the proposed algorithm.
Abstract: In this paper, solution of fuzzy differential equation
under general differentiability is obtained by genetic programming
(GP). The obtained solution in this method is equivalent or very close
to the exact solution of the problem. Accuracy of the solution to this
problem is qualitatively better. An illustrative numerical example is
presented for the proposed method.
Abstract: Automated discovery of hierarchical structures in
large data sets has been an active research area in the recent past.
This paper focuses on the issue of mining generalized rules with crisp
hierarchical structure using Genetic Programming (GP) approach to
knowledge discovery. The post-processing scheme presented in this
work uses flat rules as initial individuals of GP and discovers
hierarchical structure. Suitable genetic operators are proposed for the
suggested encoding. Based on the Subsumption Matrix(SM), an
appropriate fitness function is suggested. Finally, Hierarchical
Production Rules (HPRs) are generated from the discovered
hierarchy. Experimental results are presented to demonstrate the
performance of the proposed algorithm.
Abstract: Protein residue contact map is a compact
representation of secondary structure of protein. Due to the
information hold in the contact map, attentions from researchers in
related field were drawn and plenty of works have been done
throughout the past decade. Artificial intelligence approaches have
been widely adapted in related works such as neural networks,
genetic programming, and Hidden Markov model as well as support
vector machine. However, the performance of the prediction was not
generalized which probably depends on the data used to train and
generate the prediction model. This situation shown the importance
of the features or information used in affecting the prediction
performance. In this research, support vector machine was used to
predict protein residue contact map on different combination of
features in order to show and analyze the effectiveness of the