Distribution of Macrobenthic Polychaete Families in Relation to Environmental Parameters in North West Penang, Malaysia

The distribution of macrobenthic polychaetes along the coastal waters of Penang National Park was surveyed to estimate the effect of various environmental parameters at three stations (200m, 600m and 1200m) from the shoreline, during six sampling months, from June 2010 to April 2011.The use of polychaetes in descriptive ecology is surveyed in the light of a recent investigation particularly concerning the soft bottom biota environments. Polychaetes, often connected in the former to the notion of opportunistic species able to proliferate after an enhancement in organic matter, had performed a momentous role particularly with regard to effected soft-bottom habitats. The objective of this survey was to investigate different environment stress over soft bottom polychaete community along Teluk Ketapang and Pantai Acheh (Penang National Park) over a year period. Variations in the polychaete community were evaluated using univariate and multivariate methods. The results of PCA analysis displayed a positive relation between macrobenthic community structures and environmental parameters such as sediment particle size and organic matter in the coastal water. A total of 604 individuals were examined which was grouped into 23 families. Family Nereidae was the most abundant (22.68%), followed by Spionidae (22.02%), Hesionidae (12.58%), Nephtylidae (9.27%) and Orbiniidae (8.61%). It is noticeable that good results can only be obtained on the basis of good taxonomic resolution. We proposed that, in monitoring surveys, operative time could be optimized not only by working at a highertaxonomic level on the entire macrobenthic data set, but by also choosing an especially indicative group and working at lower taxonomic and good level.

Comparison of Artificial Neural Network and Multivariate Regression Methods in Prediction of Soil Cation Exchange Capacity

Investigation of soil properties like Cation Exchange Capacity (CEC) plays important roles in study of environmental reaserches as the spatial and temporal variability of this property have been led to development of indirect methods in estimation of this soil characteristic. Pedotransfer functions (PTFs) provide an alternative by estimating soil parameters from more readily available soil data. 70 soil samples were collected from different horizons of 15 soil profiles located in the Ziaran region, Qazvin province, Iran. Then, multivariate regression and neural network model (feedforward back propagation network) were employed to develop a pedotransfer function for predicting soil parameter using easily measurable characteristics of clay and organic carbon. The performance of the multivariate regression and neural network model was evaluated using a test data set. In order to evaluate the models, root mean square error (RMSE) was used. The value of RMSE and R2 derived by ANN model for CEC were 0.47 and 0.94 respectively, while these parameters for multivariate regression model were 0.65 and 0.88 respectively. Results showed that artificial neural network with seven neurons in hidden layer had better performance in predicting soil cation exchange capacity than multivariate regression.

K-Means for Spherical Clusters with Large Variance in Sizes

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results.

Performance of Heterogeneous Autoregressive Models of Realized Volatility: Evidence from U.S. Stock Market

This paper deals with heterogeneous autoregressive models of realized volatility (HAR-RV models) on high-frequency data of stock indices in the USA. Its aim is to capture the behavior of three groups of market participants trading on a daily, weekly and monthly basis and assess their role in predicting the daily realized volatility. The benefits of this work lies mainly in the application of heterogeneous autoregressive models of realized volatility on stock indices in the USA with a special aim to analyze an impact of the global financial crisis on applied models forecasting performance. We use three data sets, the first one from the period before the global financial crisis occurred in the years 2006-2007, the second one from the period when the global financial crisis fully hit the U.S. financial market in 2008-2009 years, and the last period was defined over 2010-2011 years. The model output indicates that estimated realized volatility in the market is very much determined by daily traders and in some cases excludes the impact of those market participants who trade on monthly basis.

Dynamical Analysis of Circadian Gene Expression

Microarrays technique allows the simultaneous measurements of the expression levels of thousands of mRNAs. By mining this data one can identify the dynamics of the gene expression time series. By recourse of principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles from Cyanobacterium Synechocystis. We applied PCA to reduce the dimensionality of the data set. Examination of the components also provides insight into the underlying factors measured in the experiments. Our results suggest that all rhythmic content of data can be reduced to three main components.

On the Prediction of Transmembrane Helical Segments in Membrane Proteins

The prediction of transmembrane helical segments (TMHs) in membrane proteins is an important field in the bioinformatics research. In this paper, a method based on discrete wavelet transform (DWT) has been developed to predict the number and location of TMHs in membrane proteins. PDB coded as 1F88 was chosen as an example to describe the prediction of the number and location of TMHs in membrane proteins by using this method. One group of test data sets that contain total 19 protein sequences was utilized to access the effect of this method. Compared with the prediction results of DAS, PRED-TMR2, SOSUI, HMMTOP2.0 and TMHMM2.0, the obtained results indicate that the presented method has higher prediction accuracy.

Phosphine Mortality Estimation for Simulation of Controlling Pest of Stored Grain: Lesser Grain Borer (Rhyzopertha dominica)

There is a world-wide need for the development of sustainable management strategies to control pest infestation and the development of phosphine (PH3) resistance in lesser grain borer (Rhyzopertha dominica). Computer simulation models can provide a relatively fast, safe and inexpensive way to weigh the merits of various management options. However, the usefulness of simulation models relies on the accurate estimation of important model parameters, such as mortality. Concentration and time of exposure are both important in determining mortality in response to a toxic agent. Recent research indicated the existence of two resistance phenotypes in R. dominica in Australia, weak and strong, and revealed that the presence of resistance alleles at two loci confers strong resistance, thus motivating the construction of a two-locus model of resistance. Experimental data sets on purified pest strains, each corresponding to a single genotype of our two-locus model, were also available. Hence it became possible to explicitly include mortalities of the different genotypes in the model. In this paper we described how we used two generalized linear models (GLM), probit and logistic models, to fit the available experimental data sets. We used a direct algebraic approach generalized inverse matrix technique, rather than the traditional maximum likelihood estimation, to estimate the model parameters. The results show that both probit and logistic models fit the data sets well but the former is much better in terms of small least squares (numerical) errors. Meanwhile, the generalized inverse matrix technique achieved similar accuracy results to those from the maximum likelihood estimation, but is less time consuming and computationally demanding.

Probability Density Estimation Using Advanced Support Vector Machines and the Expectation Maximization Algorithm

This paper presents a new approach for the prob-ability density function estimation using the Support Vector Ma-chines (SVM) and the Expectation Maximization (EM) algorithms.In the proposed approach, an advanced algorithm for the SVM den-sity estimation which incorporates the Mean Field theory in the learning process is used. Instead of using ad-hoc values for the para-meters of the kernel function which is used by the SVM algorithm,the proposed approach uses the EM algorithm for an automatic optimization of the kernel. Experimental evaluation using simulated data set shows encouraging results.

A Novel Modified Adaptive Fuzzy Inference Engine and Its Application to Pattern Classification

The Neuro-Fuzzy hybridization scheme has become of research interest in pattern classification over the past decade. The present paper proposes a novel Modified Adaptive Fuzzy Inference Engine (MAFIE) for pattern classification. A modified Apriori algorithm technique is utilized to reduce a minimal set of decision rules based on input output data sets. A TSK type fuzzy inference system is constructed by the automatic generation of membership functions and rules by the fuzzy c-means clustering and Apriori algorithm technique, respectively. The generated adaptive fuzzy inference engine is adjusted by the least-squares fit and a conjugate gradient descent algorithm towards better performance with a minimal set of rules. The proposed MAFIE is able to reduce the number of rules which increases exponentially when more input variables are involved. The performance of the proposed MAFIE is compared with other existing applications of pattern classification schemes using Fisher-s Iris and Wisconsin breast cancer data sets and shown to be very competitive.

Using Artificial Neural Network and Leudeking-Piret Model in the Kinetic Modeling of Microbial Production of Poly-β- Hydroxybutyrate

Poly-β-hydroxybutyrate (PHB) is one of the most famous biopolymers that has various applications in production of biodegradable carriers. The most important strategy for enhancing efficiency in production process and reducing the price of PHB, is the accurate expression of kinetic model of products formation and parameters that are effective on it, such as Dry Cell Weight (DCW) and substrate consumption. Considering the high capabilities of artificial neural networks in modeling and simulation of non-linear systems such as biological and chemical industries that mainly are multivariable systems, kinetic modeling of microbial production of PHB that is a complex and non-linear biological process, the three layers perceptron neural network model was used in this study. Artificial neural network educates itself and finds the hidden laws behind the data with mapping based on experimental data, of dry cell weight, substrate concentration as input and PHB concentration as output. For training the network, a series of experimental data for PHB production from Hydrogenophaga Pseudoflava by glucose carbon source was used. After training the network, two other experimental data sets that have not intervened in the network education, including dry cell concentration and substrate concentration were applied as inputs to the network, and PHB concentration was predicted by the network. Comparison of predicted data by network and experimental data, indicated a high precision predicted for both fructose and whey carbon sources. Also in present study for better understanding of the ability of neural network in modeling of biological processes, microbial production kinetic of PHB by Leudeking-Piret experimental equation was modeled. The Observed result indicated an accurate prediction of PHB concentration by artificial neural network higher than Leudeking- Piret model.

Dynamic Clustering using Particle Swarm Optimization with Application in Unsupervised Image Classification

A new dynamic clustering approach (DCPSO), based on Particle Swarm Optimization, is proposed. This approach is applied to unsupervised image classification. The proposed approach automatically determines the "optimum" number of clusters and simultaneously clusters the data set with minimal user interference. The algorithm starts by partitioning the data set into a relatively large number of clusters to reduce the effects of initial conditions. Using binary particle swarm optimization the "best" number of clusters is selected. The centers of the chosen clusters is then refined via the Kmeans clustering algorithm. The experiments conducted show that the proposed approach generally found the "optimum" number of clusters on the tested images.

Theoretical and Analytical Approaches for Investigating the Relations between Sediment Transport and Channel Shape

This study investigated the effect of cross sectional geometry on sediment transport rate. The processes of sediment transport are generally associated to environmental management, such as pollution caused by the forming of suspended sediment in the channel network of a watershed and preserving physical habitats and native vegetations, and engineering applications, such as the influence of sediment transport on hydraulic structures and flood control design. Many equations have been proposed for computing the sediment transport, the influence of many variables on sediment transport has been understood; however, the effect of other variables still requires further research. For open channel flow, sediment transport capacity is recognized to be a function of friction slope, flow velocity, grain size, grain roughness and form roughness, the hydraulic radius of the bed section and the type and quantity of vegetation cover. The effect of cross sectional geometry of the channel on sediment transport is one of the variables that need additional investigation. The width-depth ratio (W/d) is a comparative indicator of the channel shape. The width is the total distance across the channel and the depth is the mean depth of the channel. The mean depth is best calculated as total cross-sectional area divided by the top width. Channels with high W/d ratios tend to be shallow and wide, while channels with low (W/d) ratios tend to be narrow and deep. In this study, the effects of the width-depth ratio on sediment transport was demonstrated theoretically by inserting the shape factor in sediment continuity equation and analytically by utilizing the field data sets for Yalobusha River. It was found by utilizing the two approaches as a width-depth ratio increases the sediment transport decreases.

Self-adaptation of Ontologies to Folksonomies in Semantic Web

Ontologies and tagging systems are two different ways to organize the knowledge present in the current Web. In this paper we propose a simple method to model folksonomies, as tagging systems, with ontologies. We show the scalability of the method using real data sets. The modeling method is composed of a generic ontology that represents any folksonomy and an algorithm to transform the information contained in folksonomies to the generic ontology. The method allows representing folksonomies at any instant of time.

Rotation Invariant Face Recognition Based on Hybrid LPT/DCT Features

The recognition of human faces, especially those with different orientations is a challenging and important problem in image analysis and classification. This paper proposes an effective scheme for rotation invariant face recognition using Log-Polar Transform and Discrete Cosine Transform combined features. The rotation invariant feature extraction for a given face image involves applying the logpolar transform to eliminate the rotation effect and to produce a row shifted log-polar image. The discrete cosine transform is then applied to eliminate the row shift effect and to generate the low-dimensional feature vector. A PSO-based feature selection algorithm is utilized to search the feature vector space for the optimal feature subset. Evolution is driven by a fitness function defined in terms of maximizing the between-class separation (scatter index). Experimental results, based on the ORL face database using testing data sets for images with different orientations; show that the proposed system outperforms other face recognition methods. The overall recognition rate for the rotated test images being 97%, demonstrating that the extracted feature vector is an effective rotation invariant feature set with minimal set of selected features.

On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel

This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.

Exponentially Weighted Simultaneous Estimation of Several Quantiles

In this paper we propose new method for simultaneous generating multiple quantiles corresponding to given probability levels from data streams and massive data sets. This method provides a basis for development of single-pass low-storage quantile estimation algorithms, which differ in complexity, storage requirement and accuracy. We demonstrate that such algorithms may perform well even for heavy-tailed data.

Fusion of ETM+ Multispectral and Panchromatic Texture for Remote Sensing Classification

This paper proposes to use ETM+ multispectral data and panchromatic band as well as texture features derived from the panchromatic band for land cover classification. Four texture features including one 'internal texture' and three GLCM based textures namely correlation, entropy, and inverse different moment were used in combination with ETM+ multispectral data. Two data sets involving combination of multispectral, panchromatic band and its texture were used and results were compared with those obtained by using multispectral data alone. A decision tree classifier with and without boosting were used to classify different datasets. Results from this study suggest that the dataset consisting of panchromatic band, four of its texture features and multispectral data was able to increase the classification accuracy by about 2%. In comparison, a boosted decision tree was able to increase the classification accuracy by about 3% with the same dataset.

Multi-Agent Systems for Intelligent Clustering

Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extracting processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is the clustering system which determines the clustering model for data analysis and evaluates results by itself. This system can make a clustering model more rapidly, objectively and accurately than an analyzer. The methodology for the automatic clustering intelligent system is a multi-agent system that comprises a clustering agent and a cluster performance evaluation agent. An agent exchanges information about clusters with another agent and the system determines the optimal cluster number through this information. Experiments using data sets in the UCI Machine Repository are performed in order to prove the validity of the system.

Solving the Teacher Assignment-Course Scheduling Problem by a Hybrid Algorithm

This paper presents a hybrid algorithm for solving a timetabling problem, which is commonly encountered in many universities. The problem combines both teacher assignment and course scheduling problems simultaneously, and is presented as a mathematical programming model. However, this problem becomes intractable and it is unlikely that a proven optimal solution can be obtained by an integer programming approach, especially for large problem instances. A hybrid algorithm that combines an integer programming approach, a greedy heuristic and a modified simulated annealing algorithm collaboratively is proposed to solve the problem. Several randomly generated data sets of sizes comparable to that of an institution in Indonesia are solved using the proposed algorithm. Computational results indicate that the algorithm can overcome difficulties of large problem sizes encountered in previous related works.

Mapping Soil Fertility at Different Scales to Support Sustainable Brazilian Agriculture

Most agricultural crops cultivated in Brazil are highly nutrient demanding. Brazilian soils are generally acidic with low base saturation and available nutrients. Demand for fertilizer application has increased because the national agricultural sector expansion. To improve productivity without environmental impact, there is the need for the utilization of novel procedures and techniques to optimize fertilizer application. This includes the digital soil mapping and GIS application applied to mapping in different scales. This paper is based on research, realized during 2005 to 2010 by Brazilian Corporation for Agricultural Research (EMBRAPA) and its partners. The purpose was to map soil fertility in national and regional scales. A soil profile data set in national scale (1:5,000,000) was constructed from the soil archives of Embrapa Soils, Rio de Janeiro and in the regional scale (1:250,000) from COMIGO Cooperative soil data set, Rio Verde, Brazil. The mapping was doing using ArcGIS 9.1 tools from ESRI.