Abstract: The problem of classifying objects into one of several predefined
groups when the measured variables are of mixed types has interested
statisticians for many years. Methods introduced for this situation
include parametric, semi-parametric and nonparametric approaches.
This paper discusses the problem of classifying data when the number
of measured mixed variables is larger than the sample size. A proposed
approach that integrates dimensionality reduction via principal
component analysis with a discriminant function based on the location
model is discussed. The study aims to offer practitioners another
potential tool for classification problems in which the observed
variables are mixed and too numerous.
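The PCA reduction step can be illustrated with a minimal sketch, assuming the variables being reduced are numeric and stored as an n x p matrix with p > n. When variables outnumber samples, the leading component can be obtained from the n x n Gram matrix of the centred data instead of the p x p covariance matrix. The function names are hypothetical, power iteration stands in for a full eigendecomposition, and the location-model discriminant itself is not shown.

```python
import math
import random


def center_columns(X):
    """Subtract each variable's (column's) mean from an n x p data matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def top_eigenpair(S, iters=500, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    size = len(S)
    v = [rng.random() + 0.1 for _ in range(size)]
    for _ in range(iters):
        w = [sum(S[i][k] * v[k] for k in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(S[i][k] * v[k] for k in range(size)) for i in range(size))
    return lam, v


def first_pc_scores_dual(X):
    """First principal-component scores computed from the n x n Gram matrix (cheap when p > n)."""
    Xc = center_columns(X)
    n = len(Xc)
    gram = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)] for i in range(n)]
    lam, u = top_eigenpair(gram)
    # If Xc = U S V^T, then Xc Xc^T has eigenvector u_1 with eigenvalue s_1^2,
    # and the scores Xc v_1 equal s_1 * u_1 = sqrt(lam) * u_1.
    return [math.sqrt(max(lam, 0.0)) * ui for ui in u]
```

For example, three samples measured on five variables whose centred values all lie along one direction receive first-component scores of magnitude 3, 0 and 3. The sign of a principal component is arbitrary.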
Abstract: This paper proposes a novel feature selection strategy to improve recognition accuracy for faces affected by nonuniform illumination, partial occlusions and varying expressions. The technique is especially applicable when a reliable intra-class probability distribution is unlikely to be obtained because few training samples are available. Phase congruency features in an image are defined as the points where the Fourier components of that image are maximally in phase. These features are invariant to the brightness and contrast of the image, which makes lighting-invariant face recognition achievable. Phase congruency maps of the training samples are generated and a novel modular feature selection strategy is applied. Smaller subregions from a predefined neighborhood within the phase congruency images of the training samples are merged to obtain a large set of features. These features are arranged in order of increasing distance between the subregions involved in merging. The region merging and arrangement strategy assumes that local dependencies among pixels are more important than global dependencies. The resulting feature sets are then arranged in decreasing order of discriminating capability using a criterion function, the ratio of the between-class variance to the within-class variance of the sample set, computed in the PCA domain. The results indicate a substantial improvement in classification performance over baseline algorithms.
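The ranking criterion can be sketched as follows, assuming each candidate feature has already been reduced to a single number per training image in the PCA domain. The between-class and within-class terms below are unnormalised sums of squares; normalising both by the sample size, or by the usual degrees of freedom, multiplies every ratio by the same constant and so does not change the ranking. The function names are hypothetical.

```python
from statistics import fmean


def fisher_ratio(values, labels):
    """Between-class variance divided by within-class variance for one scalar feature."""
    overall = fmean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        xs = [v for v, lab in zip(values, labels) if lab == c]
        m = fmean(xs)
        between += len(xs) * (m - overall) ** 2
        within += sum((x - m) ** 2 for x in xs)
    return between / within if within > 0 else float("inf")


def rank_features(feature_columns, labels):
    """Indices of the feature columns in decreasing order of discriminating capability."""
    scores = [fisher_ratio(col, labels) for col in feature_columns]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

A feature whose class means are far apart relative to the spread inside each class gets a large ratio and is ranked first.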
Abstract: Principal component analysis is often combined with
state-of-the-art classification algorithms to recognize human faces.
However, principal component analysis can capture only the
features that contribute to the global characteristics of data, because
it is a global feature selection algorithm. It misses features that
contribute to the local characteristics of data, because each principal
component contains only some level of the global characteristics of
the data. In this study, we present a novel face recognition approach
using non-negative principal component analysis, which adds a
non-negativity constraint to improve data locality and help
elucidate latent data structures. Experiments are performed on the
Cambridge ORL face database. We demonstrate the strong
performance of the algorithm in recognizing human faces in
comparison with PCA and NREMF approaches.
Abstract: This paper presents a comparative analysis of a new
unsupervised PCA-based technique for steel-plate texture segmentation
aimed at defect detection. The proposed scheme, called Variance
Based Component Analysis (VBCA), uses PCA for feature
extraction, applies a feature reduction algorithm based on the variance
of eigenpictures, and classifies pixels as defective or normal. Whereas
classic PCA uses a clustering algorithm such as k-means to group pixels,
VBCA uses thresholding and some post-processing operations to
label pixels as defective or normal. The experimental results show
that VBCA is 12.46% more accurate and 78.85% faster than
classic PCA.
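The thresholding stage can be illustrated with a minimal sketch. The abstract does not give VBCA's threshold rule or its post-processing operations, so the k-standard-deviation rule (k = 3), the 4-neighbour cleanup and the function names below are hypothetical stand-ins. The input is assumed to be a per-pixel value already computed from the retained eigenpictures.

```python
from statistics import fmean, pstdev


def threshold_mask(image, k=3.0):
    """Label a pixel 1 (defective) if it lies more than k standard deviations from the image mean, else 0 (normal)."""
    values = [v for row in image for v in row]
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        return [[0] * len(row) for row in image]
    return [[1 if abs(v - mu) > k * sigma else 0 for v in row] for row in image]


def remove_isolated(mask):
    """Post-processing: reset defective pixels that have no defective 4-neighbour."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                neighbours = [
                    mask[a][b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < h and 0 <= b < w
                ]
                if not any(neighbours):
                    out[i][j] = 0
    return out
```

Unlike a k-means clusterer, this labelling needs no iterative assignment step, which is consistent with, though not proof of, the speed advantage reported for thresholding.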
Abstract: Clustering is the process of identifying homogeneous
groups of objects, called clusters, and is an active topic in data
mining. Objects in the same group or class share similar characteristics.
This paper discusses a robust clustering process for image data with
two dimension-reduction approaches: two-dimensional principal
component analysis (2DPCA) and principal component analysis (PCA).
Image data are high-dimensional, and a standard approach to this
problem is dimension reduction, which transforms high-dimensional
data into a lower-dimensional space with limited loss of information.
One of the most common forms of dimensionality reduction is PCA.
2DPCA is a variant of PCA in which image matrices are treated
directly as 2D matrices; they do not need to be transformed into
vectors, so the image covariance matrix can be constructed directly
from the original image matrices. The decomposed classical covariance
matrix is very sensitive to outlying observations. The objective of this
paper is to compare the performance of robust minimum vector
variance (MVV) in 2DPCA and in PCA for clustering an arbitrary
image data set when outliers are hidden in it. The simulated
robustness results and an illustration of image clustering are
discussed at the end of the paper.
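The 2DPCA construction can be sketched as follows: the image covariance (scatter) matrix G = (1/M) Σ_i (A_i − Ā)^T (A_i − Ā) is built directly from M image matrices of size m × n, and an image A is projected as Y = A X onto projection axes taken from the leading eigenvectors of G. This sketch uses the classical covariance, which is the outlier-sensitive estimate the abstract mentions; the robust MVV estimator is not shown, and the function names are hypothetical.

```python
def image_covariance(images):
    """2DPCA image covariance G = (1/M) * sum_i (A_i - Abar)^T (A_i - Abar), an n x n matrix."""
    M = len(images)
    m, n = len(images[0]), len(images[0][0])
    mean = [[sum(img[r][c] for img in images) / M for c in range(n)] for r in range(m)]
    G = [[0.0] * n for _ in range(n)]
    for img in images:
        D = [[img[r][c] - mean[r][c] for c in range(n)] for r in range(m)]
        for a in range(n):
            for b in range(n):
                G[a][b] += sum(D[r][a] * D[r][b] for r in range(m)) / M
    return G


def project(image, axes):
    """Project an m x n image onto d axes (each a length-n vector): Y = A X, an m x d matrix."""
    return [[sum(row[c] * x[c] for c in range(len(row))) for x in axes] for row in image]
```

Because G is only n × n rather than (mn) × (mn), it can be estimated from far fewer images than the covariance matrix of vectorised images.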
Abstract: The effects of irrigation with dairy factory effluent (DFE) on soil properties were investigated at two sites that had received irrigation for > 60 years. Two adjoining paired sites that had never received DFE were also sampled, as well as another seven fields from a wider area around the factory. Compared with the paired sites that had not received effluent, long-term wastewater irrigation increased pH, EC, extractable P, exchangeable Na and K, and ESP. These changes were related to the use of phosphoric acid, NaOH and KOH as cleaning agents in the factory. Soil organic C content was unaffected by DFE irrigation, but the size (microbial biomass C and N) and activity (basal respiration) of the soil microbial community increased. These increases were attributed to regular inputs of soluble C (e.g. lactose) present as milk residues in the wastewater. Principal component analysis (PCA) of the soil data from all 11 sites confirmed that the main effects of DFE irrigation were an increase in exchangeable Na, extractable P and microbial biomass C, an accumulation of soluble salts, and a liming effect. PCA of soil bacterial community structure, using PCR-DGGE of 16S rDNA fragments, generally separated individual sites from one another but did not group them according to irrigation history. Thus, whilst the size and activity of the soil microbial community increased, the structure and diversity of the bacterial community remained unaffected.
Abstract: Diabetes mellitus is a chronic metabolic disorder in which poor management of blood glucose levels raises the risk of heart attack, kidney disease and renal failure. This paper attempts to improve the accuracy of predicting the future blood glucose levels of diabetic patients by combining principal component analysis and a wavelet neural network. The proposed system makes separate blood glucose predictions for the morning, afternoon, evening and night intervals, using a dataset from one patient covering 77 days. Its accuracy is compared with that of other neural network models that use the same dataset, and the comparison shows an overall improvement in accuracy, which indicates the effectiveness of the proposed system.
Abstract: The automatic classification of non-stationary signals is an important practical goal in several domains. An essential classification task is to assign the incoming signal to a group associated with the kind of physical phenomenon producing it. In this paper, we present a modular system composed of three blocks: 1) representation, 2) dimensionality reduction and 3) classification. The novelty of our work lies in the use of a new wavelet, called the "Ben wavelet", in the representation stage. For dimensionality reduction, we propose a new algorithm based on random projection and principal component analysis.
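The random-projection part of the proposed reduction step can be sketched as follows. The abstract does not say how random projection and PCA are combined, so this sketch assumes the common Gaussian construction, a p × k matrix of independent N(0, 1/k) entries that preserves squared Euclidean norms in expectation, and it leaves out the subsequent PCA stage. The function name and seed are hypothetical.

```python
import math
import random


def random_projection(X, k, seed=0):
    """Project each row of an n x p matrix to k dimensions with a Gaussian random matrix."""
    p = len(X[0])
    rng = random.Random(seed)
    # Entries drawn from N(0, 1/k): E[||x R||^2] = ||x||^2 for every input row x.
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(p)]
    return [[sum(row[i] * R[i][j] for i in range(p)) for j in range(k)] for row in X]
```

The projection is linear and data-independent, so it is cheap to apply before a more expensive PCA on the reduced coordinates.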
Abstract: In this paper, a new face recognition method based on
PCA (Principal Component Analysis), LDA (Linear Discriminant
Analysis) and neural networks is proposed. The method consists of
four steps: i) preprocessing, ii) dimension reduction using PCA, iii)
feature extraction using LDA and iv) classification using a neural
network. PCA and LDA are combined to improve the capability of LDA
when only a few image samples are available, and a neural classifier
is used to reduce the number of misclassifications caused by classes
that are not linearly separable. The proposed method was tested on the
Yale face database. Experimental results on this database
demonstrated the effectiveness of the proposed method for face
recognition, with fewer misclassifications than previous
methods.
Abstract: Analysis and visualization of microarray data are very helpful to biologists and clinicians in diagnosing and treating patients. They allow clinicians to better understand the structure of microarray data and facilitate understanding of gene expression in cells. However, microarray datasets are complex, with thousands of features and very few observations. Such very high-dimensional data often contain noise, non-useful information and only a small number of features relevant to a disease or genotype. This paper proposes Local Principal Component (LPC), a non-linear dimensionality reduction algorithm that maps high-dimensional data to a lower-dimensional space. The reduced data represent the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. The experiments also show that the algorithm reduces high-dimensional data while preserving, in the low-dimensional space, the neighbourhoods that points have in the high-dimensional space.
Abstract: Principal component regression (PCR) is a
combination of principal component analysis (PCA) and multiple linear regression (MLR). The objective of this paper is to review the
use of PCR in shortwave near infrared (SWNIR) (750-1000 nm) spectral analysis. The idea of PCR was explained mathematically and
implemented in the non-destructive assessment of the soluble solids
content (SSC) of pineapple based on SWNIR spectral data. PCR achieved satisfactory results in this application, with a root mean
squared error of calibration (RMSEC) of 0.7611 °Brix, a coefficient of determination (R²) of 0.5865 and a root mean squared error of cross-validation
(RMSECV) of 0.8323 °Brix using 14 principal components (PCs).
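The PCR procedure can be sketched in a few functions: centre the spectra, extract the leading principal components (power iteration with deflation here), regress the centred response on the component scores, and report the root mean squared error. Because score vectors from distinct components are mutually orthogonal, each regression coefficient reduces to a one-dimensional least-squares fit. This is a minimal sketch with hypothetical function names, not the authors' implementation; the number of components (14 in the study) is a parameter, and cross-validation is not shown.

```python
import math
import random


def _top_eig(S, iters=1000, seed=0):
    """Leading eigenpair of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    p = len(S)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        w = [sum(S[i][k] * v[k] for k in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(S[i][k] * v[k] for k in range(p)) for i in range(p))
    return lam, v


def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on X, then least squares of y on the PC scores."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_mean[j] for j in range(p)] for r in X]
    S = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    loadings, coefs = [], []
    for _ in range(n_components):
        lam, v = _top_eig(S)
        t = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
        tt = sum(ti * ti for ti in t)
        # Scores are orthogonal, so each coefficient is a 1-D least-squares fit.
        coefs.append(sum(ti * (yi - y_mean) for ti, yi in zip(t, y)) / tt if tt > 0 else 0.0)
        loadings.append(v)
        # Deflate so the next power iteration finds the next component.
        S = [[S[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return x_mean, y_mean, loadings, coefs


def pcr_predict(model, X):
    """Predict y for new rows of X with a fitted PCR model."""
    x_mean, y_mean, loadings, coefs = model
    preds = []
    for r in X:
        xc = [a - m for a, m in zip(r, x_mean)]
        scores = [sum(a * b for a, b in zip(xc, v)) for v in loadings]
        preds.append(y_mean + sum(c * s for c, s in zip(coefs, scores)))
    return preds


def rmse(y_true, y_pred):
    """Root mean squared error, e.g. RMSEC when evaluated on the calibration set."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))
```

Evaluating `rmse` on the calibration samples gives an RMSEC-style figure; an RMSECV-style figure would require refitting with each sample (or fold) held out.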
Abstract: The performance results of athletes who competed in
the 1988-2008 Olympic Games were analyzed (n = 166). The data
were obtained from the official IAAF protocols. In the principal
component analysis, the first three principal components explained
70% of the total variance. On the 1st principal component
(43.1% of the total variance explained), the largest factor loadings were
for the 100 m (0.89), 400 m (0.81), 110 m hurdles (0.76) and long jump
(−0.72). This factor can be interpreted as 'sprinting performance'.
The loadings on the 2nd factor (15.3% of the total variance)
presented a counter-intuitive throwing-jumping combination: the
highest loadings were for throwing events (javelin throw 0.76;
shot put 0.74; discus throw 0.73) and also for jumping events
(high jump 0.62; pole vault 0.58). On the 3rd factor (11.6% of the
total variance), the largest loading was for the 1500 m run (0.88); all
other loadings were below 0.4.
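The cumulative figure follows directly from the three reported shares. In PCA, the proportion of total variance explained by the k-th component is its eigenvalue λ_k divided by the sum of all eigenvalues, so for the first three components:

```latex
\frac{\lambda_1 + \lambda_2 + \lambda_3}{\sum_{j} \lambda_j}
  = 43.1\% + 15.3\% + 11.6\% = 70.0\%
```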
Abstract: This paper presents a set of Artificial Neural Network (ANN)
based methods for designing an effective speech recognition system
for Assamese numerals recorded under varied conditions and moods.
The work involves formulating several ANN models configured to use
Linear Predictive Code (LPC), Principal Component Analysis (PCA) and
other features to handle mood and gender variations among speakers
uttering numerals, as part of an Automatic Speech Recognition (ASR)
system in Assamese. The ANN models combine a Self-Organizing Map
(SOM) and a Multi-Layer Perceptron (MLP), constituting a Learning
Vector Quantization (LVQ) block trained in a cooperative environment
to handle male and female speech samples of numerals in Assamese, a
language spoken by a sizable population in the North-Eastern part of
India. The work provides a comparative evaluation of several such
combinations on speech samples with gender-based differences,
captured by a microphone under four different conditions, viz.
noiseless, noise-mixed, stressed and stress-free.
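LPC features of the kind used here are commonly computed per speech frame with the autocorrelation method and the Levinson-Durbin recursion. The sketch below assumes that approach; the frame length, windowing and prediction order used in the study are not stated in the abstract, so `order` and the function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(frame, order):
    """LPC coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n - k], via Levinson-Durbin.

    Returns the coefficient list and the final prediction-error energy.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0:
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:
            break  # frame is perfectly predictable; higher-order terms stay zero
    return a[1:], err
```

The recursion solves the Toeplitz normal equations in O(order²) operations, and the resulting coefficient vector per frame is what a classifier such as an MLP or LVQ network would take as input.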
Abstract: Microarray technology allows the simultaneous measurement of the expression levels of thousands of mRNAs. By mining these data, one can identify the dynamics of gene expression time series. Using principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles of the cyanobacterium Synechocystis. We applied PCA to reduce the dimensionality of the data set. Examination of the components also provides insight into the underlying factors measured in the experiments. Our results suggest that all the rhythmic content of the data can be reduced to three main components.
Abstract: This paper examines self-concept among 16- and 17-year-old adolescents in Malaysian secondary schools. Previous studies have shown that a positive self-concept plays an important role in students' adjustment and academic performance during schooling. This study investigates the factors influencing students' perceptions of their own self-concept. A total of 1168 students participated in the survey, which used the CoPs (UM) instrument to measure self-concept. Principal Component Analysis (PCA) revealed three factors: academic self-concept, physical self-concept and social self-concept. The study confirmed that students perceived certain internal context factors as shaping their self-concept, and revealed that external context factors also have an impact on it.
Abstract: Cosmic rays originating in space generate, after entering
the Earth's atmosphere, showers of secondary particles called Extensive
Air Showers (EAS). Detection and analysis of EAS and similar
high-energy particle showers involve a plethora of experimental setups
with certain constraints, for which soft-computing tools such as
Artificial Neural Networks (ANNs) can be adopted. The performance
of ANN classifiers can be further improved by using a Multiple
Classifier System (MCS) and data dimension reduction techniques.
This work describes the performance of data dimension reduction
techniques, namely Principal Component Analysis (PCA), Independent
Component Analysis (ICA) and Self-Organizing Map (SOM)
approximators, for use with an MCS formed from a Multi-Layer
Perceptron (MLP), a Recurrent Neural Network (RNN) and a
Probabilistic Neural Network (PNN). The input data are obtained from
an array of detectors placed in a circular arrangement, resembling a
practical detector grid, and are high-dimensional and strongly
correlated. The PCA, ICA and SOM blocks reduce this correlation and
put the data into a form suitable for real-time prediction of the
primary energy and location of an EAS from the density values
captured by the detectors in the circular grid.
Abstract: The COSMED K4b2 is a portable electrical device designed to test pulmonary function. It suits many applications that require measuring the cardio-respiratory response, either in the field or in the lab, and it can deliver real-time data to a sink node or a PC base station while storing the data in memory at the same time. However, the actual sensor outputs and the data received may contain errors, such as impulsive noise, which can be related to the sensors, low batteries, the environment or disturbances in the data acquisition process. These abnormal outputs might cause misinterpretation of the exercise or daily activities of the persons being monitored. In this paper, we propose an effective and feasible method to detect and identify such errors using principal component analysis (PCA) and a back-propagation (BP) neural network.
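One common way to use PCA for detecting abnormal sensor outputs is to fit the principal subspace on error-free data and flag new samples whose squared prediction error (SPE, the squared distance to that subspace) is unusually large. The sketch below assumes that residual-based approach; the abstract does not say how PCA and the BP network divide the work, the BP network is not shown, and the function names are hypothetical.

```python
import math
import random


def _top_eig(S, iters=500, seed=0):
    """Leading eigenpair of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    p = len(S)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        w = [sum(S[i][k] * v[k] for k in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(S[i][k] * v[k] for k in range(p)) for i in range(p))
    return lam, v


def fit_pca(X, n_components):
    """Mean vector and leading principal directions of error-free training data."""
    n, p = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(p)]
    Xc = [[r[j] - mean[j] for j in range(p)] for r in X]
    S = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    comps = []
    for _ in range(n_components):
        lam, v = _top_eig(S)
        comps.append(v)
        # Deflate so the next power iteration finds the next component.
        S = [[S[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return mean, comps


def spe(sample, model):
    """Squared prediction error: squared distance from the sample to the PCA subspace."""
    mean, comps = model
    xc = [a - m for a, m in zip(sample, mean)]
    recon = [0.0] * len(xc)
    for v in comps:
        t = sum(a * b for a, b in zip(xc, v))
        recon = [r + t * vi for r, vi in zip(recon, v)]
    return sum((a - r) ** 2 for a, r in zip(xc, recon))
```

A sample would then be flagged as erroneous when `spe(sample, model)` exceeds a control limit, for instance one chosen from the SPE values of the training data; how that limit is set is itself an assumption here.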
Abstract: This study reports the simultaneous determination of phenol, resorcinol and catechol with a chemometric technique, the PC-ranking artificial neural network (PCranking-ANN) algorithm. Based on the data correlation coefficient method, 3 representative PCs are selected from the scores of the original UV spectral data (35 PCs) as the input patterns for the ANN to build a neural network model. The results were obtained after 8000 iterations. The root mean squared errors of prediction (RMSEP) for phenol, resorcinol and catechol with PCranking-ANN were 0.6680, 0.0766 and 0.1033, respectively. The calibration ranges were 0.50-21.0, 0.50-15.1 and 0.50-20.0 μg mL⁻¹ for phenol, resorcinol and catechol, respectively. The proposed method was successfully applied to the determination of phenol, resorcinol and catechol in synthetic and water samples.
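The PC-ranking step can be sketched under one assumption: that the "data correlation coefficient method" ranks each principal component's score vector by its absolute Pearson correlation with the analyte concentration and keeps the top few (3 of 35 in the study). The function names are hypothetical and the ANN that consumes the selected scores is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def rank_pcs(score_columns, target, n_keep):
    """Indices of the n_keep PC score columns most correlated (in absolute value) with the target."""
    strength = [abs(pearson(col, target)) for col in score_columns]
    order = sorted(range(len(strength)), key=lambda j: strength[j], reverse=True)
    return order[:n_keep]
```

Ranking by correlation with the response, rather than by explained variance, can keep a low-variance component that happens to carry the analyte signal.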
Abstract: The automatic discrimination of seismic signals is an important practical goal for earth-science observatories because of the large amount of information they receive continuously. An essential discrimination task is to assign the incoming signal to a group associated with the kind of physical phenomenon producing it. In this paper, two classes of seismic signals recorded routinely at the geophysical laboratory of the National Center for Scientific and Technical Research in Morocco are considered. They correspond to signals associated with local earthquakes and with chemical explosions. The automatic discrimination system is developed as a modular system composed of three blocks: 1) representation, 2) dimensionality reduction and 3) classification. The novelty of our work lies in the use of a new wavelet, called the "modified Mexican hat wavelet", in the representation stage. For dimensionality reduction, we propose a new algorithm based on random projection and principal component analysis.