Abstract: Many factors affect the success of Machine Learning
(ML) on a given task. The representation and quality of the instance
data is first and foremost. If there is much irrelevant and redundant
information present or noisy and unreliable data, then knowledge
discovery during the training phase is more difficult. It is well known
that data preparation and filtering steps take considerable amount of
processing time in ML problems. Data pre-processing includes data
cleaning, normalization, transformation, feature extraction and
selection, etc. The product of data pre-processing is the final training
set. It would be nice if a single sequence of data pre-processing
algorithms had the best performance for each data set but this is not
happened. Thus, we present the most well know algorithms for each
step of data pre-processing so that one achieves the best performance
for their data set.
Abstract: Brain Computer Interface (BCI) has been recently
increased in research. Functional Near Infrared Spectroscope (fNIRs)
is one the latest technologies which utilize light in the near-infrared
range to determine brain activities. Because near infrared technology
allows design of safe, portable, wearable, non-invasive and wireless
qualities monitoring systems, fNIRs monitoring of brain
hemodynamics can be value in helping to understand brain tasks. In
this paper, we present results of fNIRs signal analysis indicating that
there exist distinct patterns of hemodynamic responses which
recognize brain tasks toward developing a BCI. We applied two
different mathematics tools separately, Wavelets analysis for
preprocessing as signal filters and feature extractions and Neural
networks for cognition brain tasks as a classification module. We
also discuss and compare with other methods while our proposals
perform better with an average accuracy of 99.9% for classification.
Abstract: This paper proposes a neural network weights and
topology optimization using genetic evolution and the
backpropagation training algorithm. The proposed crossover and
mutation operators aims to adapt the networks architectures and
weights during the evolution process. Through a specific inheritance
procedure, the weights are transmitted from the parents to their
offsprings, which allows re-exploitation of the already trained
networks and hence the acceleration of the global convergence of the
algorithm. In the preprocessing phase, a new feature extraction
method is proposed based on Legendre moments with the Maximum
entropy principle MEP as a selection criterion. This allows a global
search space reduction in the design of the networks. The proposed
method has been applied and tested on the well known MNIST
database of handwritten digits.
Abstract: In this paper, a new face recognition method based on
PCA (principal Component Analysis), LDA (Linear Discriminant
Analysis) and neural networks is proposed. This method consists of
four steps: i) Preprocessing, ii) Dimension reduction using PCA, iii)
feature extraction using LDA and iv) classification using neural
network. Combination of PCA and LDA is used for improving the
capability of LDA when a few samples of images are available and
neural classifier is used to reduce number misclassification caused by
not-linearly separable classes. The proposed method was tested on
Yale face database. Experimental results on this database
demonstrated the effectiveness of the proposed method for face
recognition with less misclassification in comparison with previous
methods.
Abstract: Biclustering is a very useful data mining technique for
identifying patterns where different genes are co-related based on a
subset of conditions in gene expression analysis. Association rules
mining is an efficient approach to achieve biclustering as in
BIMODULE algorithm but it is sensitive to the value given to its
input parameters and the discretization procedure used in the
preprocessing step, also when noise is present, classical association
rules miners discover multiple small fragments of the true bicluster,
but miss the true bicluster itself. This paper formally presents a
generalized noise tolerant bicluster model, termed as μBicluster. An
iterative algorithm termed as BIDENS based on the proposed model
is introduced that can discover a set of k possibly overlapping
biclusters simultaneously. Our model uses a more flexible method to
partition the dimensions to preserve meaningful and significant
biclusters. The proposed algorithm allows discovering biclusters that
hard to be discovered by BIMODULE. Experimental study on yeast,
human gene expression data and several artificial datasets shows that
our algorithm offers substantial improvements over several
previously proposed biclustering algorithms.
Abstract: Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP Hard. Genetic algorithms have been used in a wide variety of fields to perform clustering, however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.
Abstract: Character segmentation is an important preprocessing step for text recognition. In degraded documents, existence of touching characters decreases recognition rate drastically, for any optical character recognition (OCR) system. In this paper a study of touching Gurmukhi characters is carried out and these characters have been divided into various categories after a careful analysis.Structural properties of the Gurmukhi characters are used for defining the categories. New algorithms have been proposed to segment the touching characters in middle zone. These algorithms have shown a reasonable improvement in segmenting the touching characters in degraded Gurmukhi script. The algorithms proposed in this paper are applicable only to machine printed text.
Abstract: Microarrays have become the effective, broadly used tools in biological and medical research to address a wide range of problems, including classification of disease subtypes and tumors. Many statistical methods are available for analyzing and systematizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is the detection of samples or genes with similar expression patterns. In this paper, we express and compare the performance of several clustering methods based on data preprocessing including strategies of normalization or noise clearness. We also evaluate each of these clustering methods with validation measures for both simulated data and real gene expression data. Consequently, clustering methods which are common used in microarray data analysis are affected by normalization and degree of noise and clearness for datasets.
Abstract: Image registration plays an important role in the
diagnosis of dental pathologies such as dental caries, alveolar bone
loss and periapical lesions etc. This paper presents a new wavelet
based algorithm for registering noisy and poor contrast dental x-rays.
Proposed algorithm has two stages. First stage is a preprocessing
stage, removes the noise from the x-ray images. Gaussian filter has
been used. Second stage is a geometric transformation stage.
Proposed work uses two levels of affine transformation. Wavelet
coefficients are correlated instead of gray values. Algorithm has been
applied on number of pre and post RCT (Root canal treatment)
periapical radiographs. Root Mean Square Error (RMSE) and
Correlation coefficients (CC) are used for quantitative evaluation.
Proposed technique outperforms conventional Multiresolution
strategy based image registration technique and manual registration
technique.
Abstract: In this paper a new approach to face recognition is presented that achieves double dimension reduction making the system computationally efficient with better recognition results. In pattern recognition techniques, discriminative information of image increases with increase in resolution to a certain extent, consequently face recognition results improve with increase in face image resolution and levels off when arriving at a certain resolution level. In the proposed model of face recognition, first image decimation algorithm is applied on face image for dimension reduction to a certain resolution level which provides best recognition results. Due to better computational speed and feature extraction potential of Discrete Cosine Transform (DCT) it is applied on face image. A subset of coefficients of DCT from low to mid frequencies that represent the face adequately and provides best recognition results is retained. A trade of between decimation factor, number of DCT coefficients retained and recognition rate with minimum computation is obtained. Preprocessing of the image is carried out to increase its robustness against variations in poses and illumination level. This new model has been tested on different databases which include ORL database, Yale database and a color database. The proposed technique has performed much better compared to other techniques. The significance of the model is two fold: (1) dimension reduction up to an effective and suitable face image resolution (2) appropriate DCT coefficients are retained to achieve best recognition results with varying image poses, intensity and illumination level.
Abstract: Several combinations of the preprocessing algorithms,
feature selection techniques and classifiers can be applied to the data
classification tasks. This study introduces a new accurate classifier,
the proposed classifier consist from four components: Signal-to-
Noise as a feature selection technique, support vector machine,
Bayesian neural network and AdaBoost as an ensemble algorithm.
To verify the effectiveness of the proposed classifier, seven well
known classifiers are applied to four datasets. The experiments show
that using the suggested classifier enhances the classification rates for
all datasets.
Abstract: In this paper we present the algorithm which allows
us to have an object tracking close to real time in Full HD videos.
The frame rate (FR) of a video stream is considered to be between
5 and 30 frames per second. The real time track building will be
achieved if the algorithm can follow 5 or more frames per second. The
principle idea is to use fast algorithms when doing preprocessing to
obtain the key points and track them after. The procedure of matching
points during assignment is hardly dependent on the number of points.
Because of this we have to limit pointed number of points using the
most informative of them.
Abstract: This paper presents a method for the detection of OD in the retina which takes advantage of the powerful preprocessing techniques such as the contrast enhancement, Gabor wavelet transform for vessel segmentation, mathematical morphology and Earth Mover-s distance (EMD) as the matching process. The OD detection algorithm is based on matching the expected directional pattern of the retinal blood vessels. Vessel segmentation method produces segmentations by classifying each image pixel as vessel or nonvessel, based on the pixel-s feature vector. Feature vectors are composed of the pixel-s intensity and 2D Gabor wavelet transform responses taken at multiple scales. A simple matched filter is proposed to roughly match the direction of the vessels at the OD vicinity using the EMD. The minimum distance provides an estimate of the OD center coordinates. The method-s performance is evaluated on publicly available DRIVE and STARE databases. On the DRIVE database the OD center was detected correctly in all of the 40 images (100%) and on the STARE database the OD was detected correctly in 76 out of the 81 images, even in rather difficult pathological situations.
Abstract: This paper describes reactive neural control used to
generate phototaxis and obstacle avoidance behavior of walking
machines. It utilizes discrete-time neurodynamics and consists of
two main neural modules: neural preprocessing and modular neural
control. The neural preprocessing network acts as a sensory fusion
unit. It filters sensory noise and shapes sensory data to drive the
corresponding reactive behavior. On the other hand, modular neural
control based on a central pattern generator is applied for locomotion
of walking machines. It coordinates leg movements and can generate
omnidirectional walking. As a result, through a sensorimotor loop this
reactive neural controller enables the machines to explore a dynamic
environment by avoiding obstacles, turn toward a light source, and
then stop near to it.
Abstract: Methods for organizing web data into groups in order
to analyze web-based hypertext data and facilitate data availability
are very important in terms of the number of documents available
online. Thereby, the task of clustering web-based document structures
has many applications, e.g., improving information retrieval on the
web, better understanding of user navigation behavior, improving web
users requests servicing, and increasing web information accessibility.
In this paper we investigate a new approach for clustering web-based
hypertexts on the basis of their graph structures. The hypertexts will
be represented as so called generalized trees which are more general
than usual directed rooted trees, e.g., DOM-Trees. As a important
preprocessing step we measure the structural similarity between the
generalized trees on the basis of a similarity measure d. Then,
we apply agglomerative clustering to the obtained similarity matrix
in order to create clusters of hypertext graph patterns representing
navigation structures. In the present paper we will run our approach
on a data set of hypertext structures and obtain good results in
Web Structure Mining. Furthermore we outline the application of
our approach in Web Usage Mining as future work.
Abstract: The iris recognition technology is the most accurate,
fast and less invasive one compared to other biometric techniques
using for example fingerprints, face, retina, hand geometry, voice or
signature patterns. The system developed in this study has the
potential to play a key role in areas of high-risk security and can
enable organizations with means allowing only to the authorized
personnel a fast and secure way to gain access to such areas. The
paper aim is to perform the iris region detection and iris inner and
outer boundaries localization. The system was implemented on
windows platform using Visual C# programming language. It is easy
and efficient tool for image processing to get great performance
accuracy. In particular, the system includes two main parts. The first
is to preprocess the iris images by using Canny edge detection
methods, segments the iris region from the rest of the image and
determine the location of the iris boundaries by applying Hough
transform. The proposed system tested on 756 iris images from 60
eyes of CASIA iris database images.
Abstract: The paper presents a method for multivariate time
series forecasting using Independent Component Analysis (ICA), as a preprocessing tool. The idea of this approach is to do the forecasting in the space of independent components (sources), and then to transform back the results to the original time series
space. The forecasting can be done separately and with a different
method for each component, depending on its time structure. The
paper gives also a review of the main algorithms for independent component analysis in the case of instantaneous mixture models, using second and high-order statistics. The method has been applied in simulation to an artificial multivariate time series
with five components, generated from three sources and a mixing matrix, randomly generated.
Abstract: A new target detection technique is presented in this
paper for the identification of small boats in coastal surveillance. The
proposed technique employs an adaptive progressive thresholding (APT) scheme to first process the given input scene to separate any
objects present in the scene from the background. The preprocessing
step results in an image having only the foreground objects, such as
boats, trees and other cluttered regions, and hence reduces the search
region for the correlation step significantly. The processed image is then fed to the shifted phase-encoded fringe-adjusted joint transform
correlator (SPFJTC) technique which produces single and delta-like
correlation peak for a potential target present in the input scene. A
post-processing step involves using a peak-to-clutter ratio (PCR) to determine whether the boat in the input scene is authorized or unauthorized. Simulation results are presented to show that the
proposed technique can successfully determine the presence of an authorized boat and identify any intruding boat present in the given input scene.
Abstract: This paper proposes new hybrid approaches for face
recognition. Gabor wavelets representation of face images is an
effective approach for both facial action recognition and face
identification. Perform dimensionality reduction and linear
discriminate analysis on the down sampled Gabor wavelet faces can
increase the discriminate ability. Nearest feature space is extended to
various similarity measures. In our experiments, proposed Gabor
wavelet faces combined with extended neural net feature space
classifier shows very good performance, which can achieve 93 %
maximum correct recognition rate on ORL data set without any preprocessing
step.
Abstract: Liver segmentation is the first significant process for
liver diagnosis of the Computed Tomography. It segments the liver
structure from other abdominal organs. Sophisticated filtering techniques
are indispensable for a proper segmentation. In this paper, we
employ a 3D anisotropic diffusion as a preprocessing step. While
removing image noise, this technique preserve the significant parts
of the image, typically edges, lines or other details that are important
for the interpretation of the image. The segmentation task is done
by using thresholding with automatic threshold values selection and
finally the false liver region is eliminated using 3D connected component.
The result shows that by employing the 3D anisotropic filtering,
better liver segmentation results could be achieved eventhough simple
segmentation technique is used.