Abstract: The distribution of macrobenthic polychaetes along
the coastal waters of Penang National Park was surveyed to estimate
the effect of various environmental parameters at three stations
(200m, 600m and 1200m) from the shoreline, during six sampling
months, from June 2010 to April 2011. The use of polychaetes in
descriptive ecology is reviewed in the light of recent investigations,
particularly those concerning soft-bottom habitats. Polychaetes, often
associated with the notion of opportunistic species able to proliferate
after an increase in organic matter, play a prominent role in the
assessment of affected soft-bottom habitats. The objective of this survey
was to investigate the effects of different environmental stresses on the
soft-bottom polychaete community along Teluk Ketapang and Pantai Acheh
(Penang National Park) over a one-year period. Variations in the
polychaete community were evaluated using univariate and
multivariate methods. Principal component analysis (PCA) revealed a
positive relationship between macrobenthic community structure and
environmental parameters such as sediment particle size and organic
matter content in the coastal waters. A total of 604 individuals were
examined and grouped into 23 families. The family Nereidae was the most
abundant (22.68%), followed by Spionidae (22.02%), Hesionidae
(12.58%), Nephtyidae (9.27%) and Orbiniidae (8.61%). It is
notable that good results can only be obtained on the basis of sound
taxonomic resolution. We propose that, in monitoring surveys,
processing time could be optimized not only by working at a higher
taxonomic level on the entire macrobenthic data set, but also by
choosing a particularly indicative group and working on it at a lower
taxonomic level.
Abstract: Investigation of soil properties such as cation exchange
capacity (CEC) plays an important role in environmental research, and
the spatial and temporal variability of this property has led to the
development of indirect methods for its estimation. Pedotransfer
functions (PTFs) provide an alternative by estimating soil parameters
from more readily available soil data. Seventy soil samples were
collected from different horizons of 15 soil profiles located in the
Ziaran region, Qazvin province, Iran. Then, a multivariate regression
and a neural network model (a feed-forward back-propagation network)
were employed to develop a pedotransfer function for predicting CEC
from the easily measured clay content and organic carbon. The
performance of the multivariate regression and neural network models
was evaluated on a test data set using the root mean square error
(RMSE). The RMSE and R2 values obtained by the ANN model for CEC were
0.47 and 0.94, respectively, while the corresponding values for the
multivariate regression model were 0.65 and 0.88. Results showed that
an artificial neural network with seven neurons in the hidden layer
predicted soil cation exchange capacity better than multivariate
regression.
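As a rough illustration of this approach, a feed-forward network with a single hidden layer of seven neurons can be fitted to clay and organic carbon inputs. This is a minimal sketch: the synthetic data, library choice (scikit-learn) and train/test split are assumptions, not the study's materials.

```python
# Minimal sketch (not the authors' code): a feed-forward network with one
# hidden layer of seven neurons predicting CEC from clay and organic carbon.
# The data below are synthetic stand-ins for the 70 field samples.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.uniform([5, 0.1], [60, 3.0], size=(70, 2))        # clay %, organic carbon %
y = 0.5 * X[:, 0] + 4.0 * X[:, 1] + rng.normal(0, 1, 70)  # synthetic CEC values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(7,), solver="lbfgs", max_iter=5000,
                     random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)
print("R2:  ", r2_score(y_te, pred))
```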
Abstract: Data clustering is an important data exploration
technique with many applications in data mining. The k-means
algorithm is well known for its efficiency in clustering large data
sets. However, the algorithm is only suitable for spherically shaped
clusters of similar size and density; the quality of the resulting
clusters decreases when the data set contains spherical clusters with
large variance in size. In this paper, we introduce a competent
procedure to overcome this problem. The proposed method is based on
shifting the center of the large cluster toward the small cluster and
re-computing the membership of the small-cluster points. The
experimental results reveal that the proposed algorithm produces
satisfactory results.
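A minimal sketch of the shifting idea, under the assumption of a standard k-means run and an illustrative shift factor; the paper's exact update rule may differ.

```python
# Minimal sketch (an assumption, not the paper's exact procedure): after a
# k-means run, shift the center of the largest cluster toward the smallest
# one, then recompute memberships against the shifted centers.
import numpy as np
from sklearn.cluster import KMeans

X = np.vstack([np.random.randn(500, 2) * 2.0,        # large cluster
               np.random.randn(50, 2) * 0.3 + 6.0])  # small cluster

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers, labels = km.cluster_centers_.copy(), km.labels_.copy()

sizes = np.bincount(labels)
big, small = np.argmax(sizes), np.argmin(sizes)

alpha = 0.2  # shift factor; an illustrative choice
centers[big] += alpha * (centers[small] - centers[big])

# Re-compute the membership of every point against the shifted centers.
d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
new_labels = d.argmin(axis=1)
```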
Abstract: This paper deals with heterogeneous autoregressive
models of realized volatility (HAR-RV models) on high-frequency
data of stock indices in the USA. Its aim is to capture the behavior of
three groups of market participants trading on a daily, weekly and
monthly basis and assess their role in predicting the daily realized
volatility. The benefit of this work lies mainly in the application of
heterogeneous autoregressive models of realized volatility to stock
indices in the USA, with a special aim of analyzing the impact of the
global financial crisis on the models' forecasting performance. We use
three data sets: the first from the pre-crisis period of 2006-2007, the
second from 2008-2009, when the global financial crisis fully hit the
U.S. financial market, and the third from 2010-2011. The model output
indicates that estimated realized volatility in the market is largely
determined by daily traders and in some cases excludes the impact of
those market participants who trade on a monthly basis.
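For reference, the daily HAR-RV specification that this family of models follows (Corsi's formulation, with the conventional 5-day and 22-day averaging windows) regresses next-day realized volatility on daily, weekly and monthly components:

```latex
RV_{t+1} = \beta_0 + \beta_d\, RV_t + \beta_w\, RV_t^{(w)} + \beta_m\, RV_t^{(m)} + \varepsilon_{t+1},
\qquad
RV_t^{(w)} = \frac{1}{5}\sum_{i=0}^{4} RV_{t-i},
\qquad
RV_t^{(m)} = \frac{1}{22}\sum_{i=0}^{21} RV_{t-i}
```

The three β coefficients correspond to the daily, weekly and monthly trader groups discussed above.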
Abstract: The microarray technique allows the simultaneous measurement of the expression levels of thousands of mRNAs. By mining these data one can identify the dynamics of gene expression time series. By recourse to principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles of the cyanobacterium Synechocystis. We applied PCA to reduce the dimensionality of the data set. Examination of the components also provides insight into the underlying factors measured in the experiments. Our results suggest that the entire rhythmic content of the data can be reduced to three main components.
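A minimal sketch of this kind of analysis, with a synthetic genes-by-timepoints matrix standing in for the Synechocystis profiles; the matrix shape and centering step are assumptions.

```python
# Minimal sketch (illustrative, not the authors' pipeline): PCA on a
# genes-by-timepoints expression matrix, keeping three components as the
# abstract suggests suffices for the rhythmic content.
import numpy as np
from sklearn.decomposition import PCA

expr = np.random.rand(1000, 24)           # 1000 genes x 24 time points (synthetic)
expr -= expr.mean(axis=1, keepdims=True)  # center each gene's profile

pca = PCA(n_components=3)
scores = pca.fit_transform(expr)          # gene loadings on the 3 components
print("variance explained:", pca.explained_variance_ratio_)
```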
Abstract: The prediction of transmembrane helical segments
(TMHs) in membrane proteins is an important field in the
bioinformatics research. In this paper, a method based on discrete
wavelet transform (DWT) has been developed to predict the number
and location of TMHs in membrane proteins. The protein with PDB code
1F88 was chosen as an example to illustrate the prediction of the
number and location of TMHs with this method. A test data set
containing a total of 19 protein sequences was used to assess the
effectiveness of the method. Compared with the
prediction results of DAS, PRED-TMR2, SOSUI, HMMTOP2.0 and
TMHMM2.0, the obtained results indicate that the presented method
has higher prediction accuracy.
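A minimal sketch of the general idea, assuming a Kyte-Doolittle hydrophobicity mapping and the PyWavelets library; the wavelet, decomposition level and toy sequence are illustrative choices, not the paper's settings.

```python
# Minimal sketch (an illustration under stated assumptions, not the paper's
# implementation): map a protein sequence to a Kyte-Doolittle hydrophobicity
# signal and take its discrete wavelet decomposition; broad positive regions
# in the low-frequency approximation hint at transmembrane helices.
import numpy as np
import pywt

KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}

# Toy sequence, loosely based on the rhodopsin N-terminus (1F88 is rhodopsin).
seq = "MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIML"
signal = np.array([KD[a] for a in seq])

coeffs = pywt.wavedec(signal, 'db4', level=2)  # multi-level DWT
approx = coeffs[0]                             # low-frequency approximation
print(approx)
```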
Abstract: There is a world-wide need for the development of sustainable management strategies to control pest infestation and the development of phosphine (PH3) resistance in the lesser grain borer (Rhyzopertha dominica). Computer simulation models can provide a relatively fast, safe and inexpensive way to weigh the merits of various management options. However, the usefulness of simulation models relies on the accurate estimation of important model parameters, such as mortality. Concentration and time of exposure are both important in determining mortality in response to a toxic agent. Recent research indicated the existence of two resistance phenotypes in R. dominica in Australia, weak and strong, and revealed that the presence of resistance alleles at two loci confers strong resistance, thus motivating the construction of a two-locus model of resistance. Experimental data sets on purified pest strains, each corresponding to a single genotype of our two-locus model, were also available. Hence it became possible to explicitly include the mortalities of the different genotypes in the model. In this paper we describe how we used two generalized linear models (GLM), the probit and logistic models, to fit the available experimental data sets. We used a direct algebraic approach, the generalized inverse matrix technique, rather than the traditional maximum likelihood estimation, to estimate the model parameters. The results show that both the probit and logistic models fit the data sets well, but the former is better in terms of smaller least-squares (numerical) errors. Meanwhile, the generalized inverse matrix technique achieved accuracy similar to that of maximum likelihood estimation, but is less time consuming and less computationally demanding.
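A minimal sketch of the generalized-inverse fitting step, with assumed (illustrative) dose-mortality data; the design matrix and probit transform follow the standard GLM setup rather than the paper's exact formulation.

```python
# Minimal sketch (an assumption about the setup, not the paper's code): fit a
# probit dose-mortality line p = Phi(b0 + b1*log10(concentration)) by
# transforming observed mortalities and solving the linear system with the
# Moore-Penrose generalized inverse instead of maximum likelihood.
import numpy as np
from scipy.stats import norm

conc = np.array([0.01, 0.02, 0.05, 0.1, 0.2])    # illustrative doses (mg/L)
mort = np.array([0.05, 0.18, 0.52, 0.83, 0.97])  # illustrative mortalities

X = np.column_stack([np.ones_like(conc), np.log10(conc)])  # design matrix
y = norm.ppf(mort)                               # probit transform of mortality

beta = np.linalg.pinv(X) @ y                     # generalized (pseudo)inverse fit
print("intercept, slope:", beta)
```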
Abstract: This paper presents a new approach for probability density function estimation using the Support Vector Machines (SVM) and the Expectation Maximization (EM) algorithms. In the proposed approach, an advanced algorithm for SVM density estimation which incorporates Mean Field theory in the learning process is used. Instead of using ad-hoc values for the parameters of the kernel function used by the SVM algorithm, the proposed approach uses the EM algorithm for an automatic optimization of the kernel. Experimental evaluation using a simulated data set shows encouraging results.
Abstract: The Neuro-Fuzzy hybridization scheme has attracted
research interest in pattern classification over the past decade. The
present paper proposes a novel Modified Adaptive Fuzzy Inference
Engine (MAFIE) for pattern classification. A modified Apriori
algorithm technique is utilized to derive a minimal set of decision
rules from input-output data sets. A TSK-type fuzzy inference
system is constructed by the automatic generation of membership
functions and rules by the fuzzy c-means clustering and Apriori
algorithm technique, respectively. The generated adaptive fuzzy
inference engine is adjusted by the least-squares fit and a conjugate
gradient descent algorithm towards better performance with a
minimal set of rules. The proposed MAFIE is able to reduce the
number of rules, which otherwise increases exponentially as more input
variables are involved. The performance of the proposed MAFIE is
compared with existing pattern classification schemes on Fisher's Iris
and Wisconsin breast cancer data sets and is shown to be very
competitive.
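As one building block of the described pipeline, a standard fuzzy c-means pass can generate the cluster centers from which membership functions are derived. This is a generic sketch, not the MAFIE implementation; the Apriori rule extraction and TSK tuning steps are omitted.

```python
# Minimal sketch (generic fuzzy c-means, an assumed building block): cluster
# the input data to obtain centers and fuzzy memberships for generating
# membership functions.
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # random initial memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))           # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.random.rand(150, 4)                        # stand-in for the Iris features
centers, U = fuzzy_c_means(X, c=3)
```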
Abstract: Poly-β-hydroxybutyrate (PHB) is one of the best-known
biopolymers, with various applications in the production of
biodegradable carriers. The most important strategy for enhancing the
efficiency of the production process and reducing the price of PHB is
an accurate kinetic model of product formation and of the parameters
that affect it, such as dry cell weight (DCW) and substrate
consumption. Given the strong capabilities of artificial neural
networks in modeling and simulating non-linear, multivariable systems
such as those found in the biological and chemical industries, a
three-layer perceptron neural network was used in this study for
kinetic modeling of the microbial production of PHB, a complex and
non-linear biological process. The network trains itself and uncovers
the hidden relationships in the data by mapping experimental inputs,
dry cell weight and substrate concentration, to the output, PHB
concentration. For training the network, a series of experimental data
for PHB production from Hydrogenophaga pseudoflava on a glucose carbon
source was used. After training, two other experimental data sets that
were not involved in training, comprising dry cell weight and substrate
concentration, were applied as inputs, and the network predicted the
PHB concentration. Comparison of the predicted and experimental data
indicated high prediction accuracy for both fructose and whey carbon
sources. In addition, to better assess the ability of the neural
network to model biological processes, the microbial production
kinetics of PHB were also modeled with the empirical Luedeking-Piret
equation. The observed results indicated that the artificial neural
network predicts the PHB concentration more accurately than the
Luedeking-Piret model.
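For context, the Luedeking-Piret equation used as the benchmark above relates the product formation rate to biomass growth; here P is the PHB concentration, X the biomass (dry cell weight), and α and β the growth-associated and non-growth-associated coefficients:

```latex
\frac{dP}{dt} = \alpha\,\frac{dX}{dt} + \beta\,X
```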
Abstract: A new dynamic clustering approach (DCPSO), based
on Particle Swarm Optimization, is proposed. This approach is
applied to unsupervised image classification. The proposed approach
automatically determines the "optimum" number of clusters and
simultaneously clusters the data set with minimal user interference.
The algorithm starts by partitioning the data set into a relatively large
number of clusters to reduce the effects of initial conditions. Using
binary particle swarm optimization the "best" number of clusters is
selected. The centers of the chosen clusters are then refined via the
K-means clustering algorithm. The experiments conducted show that
the proposed approach generally found the "optimum" number of
clusters on the tested images.
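The sketch below greatly simplifies the approach and is not the authors' DCPSO: a binary mask selects active centers from a large candidate pool, a validity index scores the choice, and K-means refines the survivors. Random sampling stands in for the binary PSO search, and the silhouette index is an assumed stand-in for the paper's fitness function.

```python
# Minimal sketch (a simplification of the described idea, not the authors'
# DCPSO): binary masks over a pool of candidate centers, scored by a cluster
# validity index, with K-means refining the selected centers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
rng = np.random.default_rng(0)
pool = X[rng.choice(len(X), 20, replace=False)]   # large initial center pool

def fitness(mask):
    k = int(mask.sum())
    if k < 2:
        return np.inf
    km = KMeans(n_clusters=k, init=pool[mask], n_init=1).fit(X)
    return -silhouette_score(X, km.labels_)       # lower is better

masks = rng.random((50, 20)) > 0.5                # stand-in for a BPSO swarm
best = min(masks, key=fitness)
print("chosen number of clusters:", int(best.sum()))
```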
Abstract: This study investigated the effect of cross sectional
geometry on sediment transport rate. The processes of sediment
transport are generally associated with environmental management
concerns, such as pollution caused by the formation of suspended
sediment in the channel network of a watershed and the preservation of
physical habitats and native vegetation, and with engineering
applications, such as the influence of sediment transport on hydraulic
structures and flood control design. Many equations have been proposed
for computing sediment transport, and the influence of many variables
on it is well understood; however, the effect of other variables still
requires further research. For open channel flow, sediment
transport capacity is recognized to be a function of friction slope,
flow velocity, grain size, grain roughness and form roughness, the
hydraulic radius of the bed section and the type and quantity of
vegetation cover. The effect of cross sectional geometry of the
channel on sediment transport is one of the variables that need
additional investigation. The width-depth ratio (W/d) is a
comparative indicator of the channel shape. The width is the total
distance across the channel and the depth is the mean depth of the
channel. The mean depth is best calculated as total cross-sectional
area divided by the top width. Channels with high W/d ratios tend to
be shallow and wide, while channels with low W/d ratios tend to be
narrow and deep. In this study, the effect of the width-depth ratio on
sediment transport was demonstrated theoretically, by inserting the
shape factor into the sediment continuity equation, and empirically, by
utilizing field data sets for the Yalobusha River. Both approaches
showed that as the width-depth ratio increases, the sediment transport
decreases.
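The shape indicator used above can be stated explicitly: with A the cross-sectional area and W the top width, the mean depth and the width-depth ratio are

```latex
d = \frac{A}{W}, \qquad \frac{W}{d} = \frac{W^{2}}{A}
```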
Abstract: Ontologies and tagging systems are two different ways to organize the knowledge present in the current Web. In this paper we propose a simple method to model folksonomies, i.e. tagging systems, with ontologies. We show the scalability of the method using real data sets. The modeling method is composed of a generic ontology that represents any folksonomy and an algorithm that transforms the information contained in folksonomies to the generic ontology. The method allows representing folksonomies at any instant of time.
Abstract: The recognition of human faces, especially those with
different orientations is a challenging and important problem in image
analysis and classification. This paper proposes an effective scheme
for rotation invariant face recognition using Log-Polar Transform and
Discrete Cosine Transform combined features. The rotation invariant
feature extraction for a given face image involves applying the
log-polar transform to eliminate the rotation effect and to produce a
row-shifted log-polar image. The discrete cosine transform is then applied
to eliminate the row shift effect and to generate the low-dimensional
feature vector. A PSO-based feature selection algorithm is utilized to
search the feature vector space for the optimal feature subset.
Evolution is driven by a fitness function defined in terms of
maximizing the between-class separation (scatter index).
Experimental results, based on the ORL face database with test sets of
images at different orientations, show that the proposed system
outperforms other face recognition methods. The overall recognition
rate for the rotated test images was 97%, demonstrating that the
extracted feature vector is an effective rotation invariant feature set
with a minimal number of selected features.
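A minimal sketch of the two transforms, using an assumed OpenCV/SciPy implementation; the file name, output resolution and 8x8 coefficient block are illustrative, and the PSO-based feature selection step is omitted.

```python
# Minimal sketch (illustrative parameter choices, not the paper's exact
# pipeline): map a face image to log-polar coordinates, so a rotation becomes
# a row shift, then apply a 2-D DCT and keep a block of low-frequency
# coefficients as the feature vector.
import cv2
import numpy as np
from scipy.fft import dctn

img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
h, w = img.shape
center = (w / 2, h / 2)
polar = cv2.warpPolar(img.astype(np.float32), (64, 64), center,
                      min(h, w) / 2,
                      cv2.INTER_LINEAR + cv2.WARP_POLAR_LOG)

coeffs = dctn(polar, norm="ortho")      # 2-D discrete cosine transform
features = coeffs[:8, :8].ravel()       # low-dimensional feature vector
```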
Abstract: This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as in shared-memory parallel mode, we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but other kernel functions can be used as well. In order to further speed up the decomposition algorithm, we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes on the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and an improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.
Abstract: In this paper we propose a new method for
simultaneously generating multiple quantiles corresponding to given
probability levels from data streams and massive data sets. This
method provides a basis for the development of single-pass, low-storage
quantile estimation algorithms, which differ in complexity, storage
requirement and accuracy. We demonstrate that such algorithms may
perform well even for heavy-tailed data.
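One simple member of this family of estimators is sketched below: it tracks each quantile by stochastic approximation, nudging every estimate up or down per observation. This is a generic single-pass, low-storage scheme, not necessarily the algorithm proposed in the paper, and the step size is an illustrative choice.

```python
# Minimal sketch (a generic single-pass quantile tracker, not necessarily the
# paper's algorithm): each estimate moves so that, at equilibrium, the
# fraction of observations below it matches the target probability level.
import numpy as np

probs = np.array([0.25, 0.5, 0.75, 0.95])  # target probability levels
q = np.zeros(len(probs))                   # running quantile estimates
lr = 0.01                                  # step size; an illustrative choice

rng = np.random.default_rng(0)
for x in rng.standard_cauchy(200_000):     # heavy-tailed stream, as in the abstract
    q += lr * (probs - (x < q))            # nudge each estimate toward its level

print(dict(zip(probs.tolist(), np.round(q, 3))))
```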
Abstract: This paper proposes to use ETM+ multispectral data
and panchromatic band as well as texture features derived from the
panchromatic band for land cover classification. Four texture features,
including one 'internal texture' and three GLCM-based textures,
namely correlation, entropy, and inverse difference moment, were used
in combination with ETM+ multispectral data. Two data sets
involving combination of multispectral, panchromatic band and its
texture were used and results were compared with those obtained by
using multispectral data alone. A decision tree classifier, with and
without boosting, was used to classify the different datasets. Results
from this study suggest that the dataset consisting of panchromatic
band, four of its texture features and multispectral data was able to
increase the classification accuracy by about 2%. In comparison, a
boosted decision tree was able to increase the classification accuracy
by about 3% with the same dataset.
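A minimal sketch of computing the three named GLCM measures with scikit-image; the window size, quantization to 32 levels and single offset are assumed choices, and entropy is derived manually since it is not a built-in property.

```python
# Minimal sketch (illustrative window and quantization choices): the three
# GLCM texture measures named above, computed for one image patch.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

patch = (np.random.rand(32, 32) * 32).astype(np.uint8)  # stand-in for a pan-band window

glcm = graycomatrix(patch, distances=[1], angles=[0], levels=32,
                    symmetric=True, normed=True)

correlation = graycoprops(glcm, "correlation")[0, 0]
idm = graycoprops(glcm, "homogeneity")[0, 0]  # homogeneity == inverse difference moment
p = glcm[:, :, 0, 0]                          # normalized co-occurrence matrix
entropy = -np.sum(p * np.log2(p + 1e-12))

print(correlation, idm, entropy)
```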
Abstract: Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extraction processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is a clustering system which determines the clustering model for data analysis and evaluates the results by itself. This system can build a clustering model more rapidly, objectively and accurately than a human analyst. The methodology for the automatic clustering intelligent system is a multi-agent system that comprises a clustering agent and a cluster performance evaluation agent. An agent exchanges information about clusters with another agent, and the system determines the optimal cluster number through this information. Experiments using data sets from the UCI Machine Learning Repository are performed in order to demonstrate the validity of the system.
Abstract: This paper presents a hybrid algorithm for solving a timetabling problem, which is commonly encountered in many universities. The problem combines both teacher assignment and course scheduling problems simultaneously, and is presented as a mathematical programming model. However, this problem becomes intractable and it is unlikely that a proven optimal solution can be obtained by an integer programming approach, especially for large problem instances. A hybrid algorithm that combines an integer programming approach, a greedy heuristic and a modified simulated annealing algorithm collaboratively is proposed to solve the problem. Several randomly generated data sets of sizes comparable to that of an institution in Indonesia are solved using the proposed algorithm. Computational results indicate that the algorithm can overcome difficulties of large problem sizes encountered in previous related works.
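A generic simulated annealing skeleton of the kind such hybrids embed is sketched below; the timetabling-specific move and cost functions are stubbed out as assumptions, not the paper's operators.

```python
# Minimal sketch (a generic simulated annealing loop, not the paper's hybrid):
# accepts worse candidate solutions with a temperature-dependent probability
# and cools geometrically.
import math
import random

def anneal(initial, cost, neighbor, T0=100.0, cooling=0.995, iters=20000):
    current = best = initial
    c_cur = c_best = cost(initial)
    T = T0
    for _ in range(iters):
        cand = neighbor(current)
        c_cand = cost(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c_cand <= c_cur or random.random() < math.exp((c_cur - c_cand) / T):
            current, c_cur = cand, c_cand
            if c_cur < c_best:
                best, c_best = current, c_cur
        T *= cooling                     # geometric cooling schedule
    return best

# Hypothetical usage for timetabling: a solution could be a tuple of
# (course, teacher, timeslot) assignments, `neighbor` could swap two
# timeslots, and `cost` could count constraint violations.
```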
Abstract: Most agricultural crops cultivated in Brazil are highly
nutrient demanding. Brazilian soils are generally acidic with low base
saturation and available nutrients. Demand for fertilizer application
has increased because of the expansion of the national agricultural
sector. To improve productivity without environmental impact, there is
a need for novel procedures and techniques to optimize fertilizer
application. These include digital soil mapping and GIS applications
for mapping at different scales. This paper is based on research
carried out from 2005 to 2010 by the Brazilian Agricultural Research
Corporation (EMBRAPA) and its partners. The purpose was to map soil
fertility at national and regional scales. A soil profile data set at
the national scale (1:5,000,000) was constructed from the soil archives
of Embrapa Soils, Rio de Janeiro, and at the regional scale (1:250,000)
from the COMIGO Cooperative soil data set, Rio Verde, Brazil. The
mapping was done using ArcGIS 9.1 tools from ESRI.