A New Approach for Classifying Large Number of Mixed Variables

The issue of classifying objects into one of predefined groups when the measured variables are mixed with different types of variables has been part of interest among statisticians in many years. Some methods for dealing with such situation have been introduced that include parametric, semi-parametric and nonparametric approaches. This paper attempts to discuss on a problem in classifying a data when the number of measured mixed variables is larger than the size of the sample. A propose idea that integrates a dimensionality reduction technique via principal component analysis and a discriminant function based on the location model is discussed. The study aims in offering practitioners another potential tool in a classification problem that is possible to be considered when the observed variables are mixed and too large.

A Novel Neighborhood Defined Feature Selection on Phase Congruency Images for Recognition of Faces with Extreme Variations

A novel feature selection strategy to improve the recognition accuracy on the faces that are affected due to nonuniform illumination, partial occlusions and varying expressions is proposed in this paper. This technique is applicable especially in scenarios where the possibility of obtaining a reliable intra-class probability distribution is minimal due to fewer numbers of training samples. Phase congruency features in an image are defined as the points where the Fourier components of that image are maximally inphase. These features are invariant to brightness and contrast of the image under consideration. This property allows to achieve the goal of lighting invariant face recognition. Phase congruency maps of the training samples are generated and a novel modular feature selection strategy is implemented. Smaller sub regions from a predefined neighborhood within the phase congruency images of the training samples are merged to obtain a large set of features. These features are arranged in the order of increasing distance between the sub regions involved in merging. The assumption behind the proposed implementation of the region merging and arrangement strategy is that, local dependencies among the pixels are more important than global dependencies. The obtained feature sets are then arranged in the decreasing order of discriminating capability using a criterion function, which is the ratio of the between class variance to the within class variance of the sample set, in the PCA domain. The results indicate high improvement in the classification performance compared to baseline algorithms.

Non-negative Principal Component Analysis for Face Recognition

Principle component analysis is often combined with the state-of-art classification algorithms to recognize human faces. However, principle component analysis can only capture these features contributing to the global characteristics of data because it is a global feature selection algorithm. It misses those features contributing to the local characteristics of data because each principal component only contains some levels of global characteristics of data. In this study, we present a novel face recognition approach using non-negative principal component analysis which is added with the constraint of non-negative to improve data locality and contribute to elucidating latent data structures. Experiments are performed on the Cambridge ORL face database. We demonstrate the strong performances of the algorithm in recognizing human faces in comparison with PCA and NREMF approaches.

Variance Based Component Analysis for Texture Segmentation

This paper presents a comparative analysis of a new unsupervised PCA-based technique for steel plates texture segmentation towards defect detection. The proposed scheme called Variance Based Component Analysis or VBCA employs PCA for feature extraction, applies a feature reduction algorithm based on variance of eigenpictures and classifies the pixels as defective and normal. While the classic PCA uses a clusterer like Kmeans for pixel clustering, VBCA employs thresholding and some post processing operations to label pixels as defective and normal. The experimental results show that proposed algorithm called VBCA is 12.46% more accurate and 78.85% faster than the classic PCA.

The Robust Clustering with Reduction Dimension

A clustering is process to identify a homogeneous groups of object called as cluster. Clustering is one interesting topic on data mining. A group or class behaves similarly characteristics. This paper discusses a robust clustering process for data images with two reduction dimension approaches; i.e. the two dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to overcome this problem is dimension reduction, which transforms a high-dimensional data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is the principal components analysis (PCA). The 2DPCA is often called a variant of principal component (PCA), the image matrices were directly treated as 2D matrices; they do not need to be transformed into a vector so that the covariance matrix of image can be constructed directly using the original image matrices. The decomposed classical covariance matrix is very sensitive to outlying observations. The objective of paper is to compare the performance of robust minimizing vector variance (MVV) in the two dimensional projection PCA (2DPCA) and the PCA for clustering on an arbitrary data image when outliers are hiden in the data set. The simulation aspects of robustness and the illustration of clustering images are discussed in the end of paper

Long- term Irrigation with Dairy Factory Wastewater Influences Soil Quality

The effects of irrigation with dairy factory wastewater on soil properties were investigated at two sites that had received irrigation for > 60 years. Two adjoining paired sites that had never received DFE were also sampled as well as another seven fields from a wider area around the factory. In comparison with paired sites that had not received effluent, long-term wastewater irrigation resulted in an increase in pH, EC, extractable P, exchangeable Na and K and ESP. These changes were related to the use of phosphoric acid, NaOH and KOH as cleaning agents in the factory. Soil organic C content was unaffected by DFE irrigation but the size (microbial biomass C and N) and activity (basal respiration) of the soil microbial community were increased. These increases were attributed to regular inputs of soluble C (e.g. lactose) present as milk residues in the wastewater. Principal component analysis (PCA) of the soils data from all 11sites confirmed that the main effects of DFE irrigation were an increase in exchangeable Na, extractable P and microbial biomass C, an accumulation of soluble salts and a liming effect. PCA analysis of soil bacterial community structure, using PCR-DGGE of 16S rDNA fragments, generally separated individual sites from one another but did not group them according to irrigation history. Thus, whilst the size and activity of the soil microbial community were increased, the structure and diversity of the bacterial community remained unaffected.

A Neural Network Approach in Predicting the Blood Glucose Level for Diabetic Patients

Diabetes Mellitus is a chronic metabolic disorder, where the improper management of the blood glucose level in the diabetic patients will lead to the risk of heart attack, kidney disease and renal failure. This paper attempts to enhance the diagnostic accuracy of the advancing blood glucose levels of the diabetic patients, by combining principal component analysis and wavelet neural network. The proposed system makes separate blood glucose prediction in the morning, afternoon, evening and night intervals, using dataset from one patient covering a period of 77 days. Comparisons of the diagnostic accuracy with other neural network models, which use the same dataset are made. The comparison results showed overall improved accuracy, which indicates the effectiveness of this proposed system.

Classification of Non Stationary Signals Using Ben Wavelet and Artificial Neural Networks

The automatic classification of non stationary signals is an important practical goal in several domains. An essential classification task is to allocate the incoming signal to a group associated with the kind of physical phenomena producing it. In this paper, we present a modular system composed by three blocs: 1) Representation, 2) Dimensionality reduction and 3) Classification. The originality of our work consists in the use of a new wavelet called "Ben wavelet" in the representation stage. For the dimensionality reduction, we propose a new algorithm based on the random projection and the principal component analysis.

A New Face Recognition Method using PCA, LDA and Neural Network

In this paper, a new face recognition method based on PCA (principal Component Analysis), LDA (Linear Discriminant Analysis) and neural networks is proposed. This method consists of four steps: i) Preprocessing, ii) Dimension reduction using PCA, iii) feature extraction using LDA and iv) classification using neural network. Combination of PCA and LDA is used for improving the capability of LDA when a few samples of images are available and neural classifier is used to reduce number misclassification caused by not-linearly separable classes. The proposed method was tested on Yale face database. Experimental results on this database demonstrated the effectiveness of the proposed method for face recognition with less misclassification in comparison with previous methods.

Dimension Reduction of Microarray Data Based on Local Principal Component

Analysis and visualization of microarraydata is veryassistantfor biologists and clinicians in the field of diagnosis and treatment of patients. It allows Clinicians to better understand the structure of microarray and facilitates understanding gene expression in cells. However, microarray dataset is a complex data set and has thousands of features and a very small number of observations. This very high dimensional data set often contains some noise, non-useful information and a small number of relevant features for disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm Local Principal Component (LPC) which aims to maps high dimensional data to a lower dimensional space. The reduced data represents the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. Moreover, experiments also show how this algorithm reduces high dimensional data whilst preserving the neighbourhoods of the points in the low dimensional space as in the high dimensional space.

Principal Component Regression in Noninvasive Pineapple Soluble Solids Content Assessment Based On Shortwave Near Infrared Spectrum

The Principal component regression (PCR) is a combination of principal component analysis (PCA) and multiple linear regression (MLR). The objective of this paper is to revise the use of PCR in shortwave near infrared (SWNIR) (750-1000nm) spectral analysis. The idea of PCR was explained mathematically and implemented in the non-destructive assessment of the soluble solid content (SSC) of pineapple based on SWNIR spectral data. PCR achieved satisfactory results in this application with root mean squared error of calibration (RMSEC) of 0.7611 Brix°, coefficient of determination (R2) of 0.5865 and root mean squared error of crossvalidation (RMSECV) of 0.8323 Brix° with principal components (PCs) of 14.

Multivariate Statistical Analysis of Decathlon Performance Results in Olympic Athletes (1988-2008)

The performance results of the athletes competed in the 1988-2008 Olympic Games were analyzed (n = 166). The data were obtained from the IAAF official protocols. In the principal component analysis, the first three principal components explained 70% of the total variance. In the 1st principal component (with 43.1% of total variance explained) the largest factor loadings were for 100m (0.89), 400m (0.81), 110m hurdle run (0.76), and long jump (–0.72). This factor can be interpreted as the 'sprinting performance'. The loadings on the 2nd factor (15.3% of the total variance) presented a counter-intuitive throwing-jumping combination: the highest loadings were for throwing events (javelin throwing 0.76; shot put 0.74; and discus throwing 0.73) and also for jumping events (high jump 0.62; pole vaulting 0.58). On the 3rd factor (11.6% of total variance), the largest loading was for 1500 m running (0.88); all other loadings were below 0.4.

Assamese Numeral Speech Recognition using Multiple Features and Cooperative LVQ -Architectures

A set of Artificial Neural Network (ANN) based methods for the design of an effective system of speech recognition of numerals of Assamese language captured under varied recording conditions and moods is presented here. The work is related to the formulation of several ANN models configured to use Linear Predictive Code (LPC), Principal Component Analysis (PCA) and other features to tackle mood and gender variations uttering numbers as part of an Automatic Speech Recognition (ASR) system in Assamese. The ANN models are designed using a combination of Self Organizing Map (SOM) and Multi Layer Perceptron (MLP) constituting a Learning Vector Quantization (LVQ) block trained in a cooperative environment to handle male and female speech samples of numerals of Assamese- a language spoken by a sizable population in the North-Eastern part of India. The work provides a comparative evaluation of several such combinations while subjected to handle speech samples with gender based differences captured by a microphone in four different conditions viz. noiseless, noise mixed, stressed and stress-free.

Dynamical Analysis of Circadian Gene Expression

Microarrays technique allows the simultaneous measurements of the expression levels of thousands of mRNAs. By mining this data one can identify the dynamics of the gene expression time series. By recourse of principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles from Cyanobacterium Synechocystis. We applied PCA to reduce the dimensionality of the data set. Examination of the components also provides insight into the underlying factors measured in the experiments. Our results suggest that all rhythmic content of data can be reduced to three main components.

Factors Influencing Students' Self-Concept among Malaysian Students

This paper examines the students’ self-concept among 16- and 17- year- old adolescents in Malaysian secondary schools. Previous studies have shown that positive self-concept played an important role in student adjustment and academic performance during schooling. This study attempts to investigate the factors influencing students’ perceptions toward their own self-concept. A total of 1168 students participated in the survey. This study utilized the CoPs (UM) instrument to measure self-concept. Principal Component Analysis (PCA) revealed three factors: academic selfconcept, physical self-concept and social self-concept. This study confirmed that students perceived certain internal context factors, and revealed that external context factor also have an impact on their self-concept.

Certain Data Dimension Reduction Techniques for application with ANN based MCS for Study of High Energy Shower

Cosmic showers, from their places of origin in space, after entering earth generate secondary particles called Extensive Air Shower (EAS). Detection and analysis of EAS and similar High Energy Particle Showers involve a plethora of experimental setups with certain constraints for which soft-computational tools like Artificial Neural Network (ANN)s can be adopted. The optimality of ANN classifiers can be enhanced further by the use of Multiple Classifier System (MCS) and certain data - dimension reduction techniques. This work describes the performance of certain data dimension reduction techniques like Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Self Organizing Map (SOM) approximators for application with an MCS formed using Multi Layer Perceptron (MLP), Recurrent Neural Network (RNN) and Probabilistic Neural Network (PNN). The data inputs are obtained from an array of detectors placed in a circular arrangement resembling a practical detector grid which have a higher dimension and greater correlation among themselves. The PCA, ICA and SOM blocks reduce the correlation and generate a form suitable for real time practical applications for prediction of primary energy and location of EAS from density values captured using detectors in a circular grid.

Fault Detection and Identification of COSMED K4b2 Based On PCA and Neural Network

COSMED K4b2 is a portable electrical device designed to test pulmonary functions. It is ideal for many applications that need the measurement of the cardio-respiratory response either in the field or in the lab is capable with the capability to delivery real time data to a sink node or a PC base station with storing data in the memory at the same time. But the actual sensor outputs and data received may contain some errors, such as impulsive noise which can be related to sensors, low batteries, environment or disturbance in data acquisition process. These abnormal outputs might cause misinterpretations of exercise or living activities to persons being monitored. In our paper we propose an effective and feasible method to detect and identify errors in applications by principal component analysis (PCA) and a back propagation (BP) neural network.

Principal Component Analysis-Ranking as a Variable Selection Method for the Simultaneous Spectrophotometric Determination of Phenol, Resorcinol and Catechol in Real Samples

Simultaneous determination of multicomponents of phenol, resorcinol and catechol with a chemometric technique a PCranking artificial neural network (PCranking-ANN) algorithm is reported in this study. Based on the data correlation coefficient method, 3 representative PCs are selected from the scores of original UV spectral data (35 PCs) as the original input patterns for ANN to build a neural network model. The results obtained by iterating 8000 .The RMSEP for phenol, resorcinol and catechol with PCranking- ANN were 0.6680, 0.0766 and 0.1033, respectively. Calibration matrices were 0.50-21.0, 0.50-15.1 and 0.50-20.0 μg ml-1 for phenol, resorcinol and catechol, respectively. The proposed method was successfully applied for the determination of phenol, resorcinol and catechol in synthetic and water samples.

Long-term Irrigation with Dairy Factory Wastewater Influences Soil Quality

The effects of irrigation with dairy factory wastewater on soil properties were investigated at two sites that had received irrigation for > 60 years. Two adjoining paired sites that had never received DFE were also sampled as well as another seven fields from a wider area around the factory. In comparison with paired sites that had not received effluent, long-term wastewater irrigation resulted in an increase in pH, EC, extractable P, exchangeable Na and K and ESP. These changes were related to the use of phosphoric acid, NaOH and KOH as cleaning agents in the factory. Soil organic C content was unaffected by DFE irrigation but the size (microbial biomass C and N) and activity (basal respiration) of the soil microbial community were increased. These increases were attributed to regular inputs of soluble C (e.g. lactose) present as milk residues in the wastewater. Principal component analysis (PCA) of the soils data from all 11sites confirmed that the main effects of DFE irrigation were an increase in exchangeable Na, extractable P and microbial biomass C, an accumulation of soluble salts and a liming effect. PCA analysis of soil bacterial community structure, using PCR-DGGE of 16S rDNA fragments, generally separated individual sites from one another but did not group them according to irrigation history. Thus, whilst the size and activity of the soil microbial community were increased, the structure and diversity of the bacterial community remained unaffected.

Discrimination of Seismic Signals Using Artificial Neural Networks

The automatic discrimination of seismic signals is an important practical goal for earth-science observatories due to the large amount of information that they receive continuously. An essential discrimination task is to allocate the incoming signal to a group associated with the kind of physical phenomena producing it. In this paper, two classes of seismic signals recorded routinely in geophysical laboratory of the National Center for Scientific and Technical Research in Morocco are considered. They correspond to signals associated to local earthquakes and chemical explosions. The approach adopted for the development of an automatic discrimination system is a modular system composed by three blocs: 1) Representation, 2) Dimensionality reduction and 3) Classification. The originality of our work consists in the use of a new wavelet called "modified Mexican hat wavelet" in the representation stage. For the dimensionality reduction, we propose a new algorithm based on the random projection and the principal component analysis.