Abstract: In this paper, we present a robust algorithm to recognize text extracted from grocery product images captured by mobile phone cameras. Recognizing such text is challenging since it varies in size, orientation, style, and illumination, and can suffer from perspective distortion. Pre-processing is performed to make the characters scale- and rotation-invariant. Since text degradations cannot be adequately described by well-known geometric transformations such as translation, rotation, affine transformation, and shearing, we use all black pixels of a character as our feature vector. Classification is performed with a minimum distance classifier under the maximum likelihood criterion, which delivers a promising Character Recognition Rate (CRR) of 89%. We achieve a considerably higher Word Recognition Rate (WRR) of 99% by incorporating lower-level linguistic knowledge about product words during the recognition process.
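As a rough illustration of the classification step, here is a minimal sketch of a minimum distance classifier over binary pixel vectors. The glyph data, class templates, and helper names are hypothetical, not from the paper; each character image is flattened into a black-pixel vector and assigned the class whose mean template is nearest.

```python
import numpy as np

def train_templates(samples, labels):
    """Average the binary pixel vectors of each class into a mean template."""
    templates = {}
    for c in set(labels):
        vecs = np.array([s for s, l in zip(samples, labels) if l == c], dtype=float)
        templates[c] = vecs.mean(axis=0)
    return templates

def classify(templates, x):
    """Minimum distance classifier: pick the class with the nearest template."""
    x = np.asarray(x, dtype=float)
    return min(templates, key=lambda c: np.linalg.norm(x - templates[c]))

# Toy 2x2 "glyphs": class 'A' has the top row black, class 'B' the bottom row.
X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
y = ['A', 'A', 'B', 'B']
T = train_templates(X, y)
print(classify(T, [1, 0, 0, 0]))  # nearest template is class 'A'
```

Under a Gaussian noise model with equal class covariances, this nearest-template rule coincides with the maximum likelihood decision the abstract mentions.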
Abstract: Mammography has been one of the most reliable methods for early detection of breast cancer. Several lesions are characteristic of breast cancer, such as microcalcifications, masses, architectural distortions and bilateral asymmetry. One of the major challenges of analysing digital mammograms is how to extract efficient features for accurate cancer classification. In this paper we propose a hybrid feature extraction method to detect and classify all four signs of breast cancer. The proposed method is based on the multiscale surrounding region dependence method, Gabor filters, multifractal analysis, and directional and morphological analysis. The extracted features are input to a self-adaptive resource allocation network (SRAN) classifier for classification. The validity of our approach is extensively demonstrated on two benchmark data sets, the Mammographic Image Analysis Society (MIAS) database and the Digital Database for Screening Mammography (DDSM), and the results prove promising.
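One of the feature families the abstract names, Gabor filters, can be sketched from the standard Gabor definition. The kernel parameters, patch, and feature layout below are illustrative assumptions, not the paper's configuration; the point is that responses at several orientations form an orientation-sensitive feature vector.

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma, gamma=0.5):
    """Real part of a 2D Gabor filter at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_features(patch, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Magnitude of the filter response at several orientations."""
    feats = []
    for th in thetas:
        k = gabor_kernel(9, th, lam=4.0, sigma=2.0)
        # a single dot product with the centered kernel stands in for a
        # full convolution in this small illustration
        feats.append(np.abs(np.sum(patch[:9, :9] * k)))
    return np.array(feats)

patch = np.zeros((9, 9))
patch[4, :] = 1.0                  # a horizontal line structure
f = gabor_features(patch)
print(f.argmax())                  # the response peaks at one orientation
```

A horizontal structure excites the kernel whose carrier varies vertically, which is why oriented Gabor banks are used to characterize directional lesion textures.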
Abstract: Artificial Immune Systems (AIS), inspired by the human immune system, are self-adaptive and self-learning classification algorithms capable of recognizing and classifying through learning, long-term memory and association. Unlike other techniques inspired by human systems, such as genetic algorithms and neural networks, AIS comprises a range of algorithms modelled on different immune mechanisms of the body. In this paper, a human immune system mechanism, apoptosis, is adopted to build an Intrusion Detection System (IDS) to protect computer networks. Features are selected from network traffic using the Fisher Score. Based on the selected features, each record/connection is classified by the proposed methodology as either an attack or normal traffic. Simulation results demonstrate that the proposed apoptosis-based AIS outperforms existing AIS approaches for intrusion detection.
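The Fisher Score step named in the abstract admits a compact sketch. The toy traffic records below are hypothetical; for each feature, the between-class scatter of class means is divided by the within-class variance, and higher scores mark more discriminative features.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher Score: between-class scatter over within-class variance."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    overall = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)     # epsilon guards against zero variance

# Toy records: feature 0 separates the classes, feature 1 overlaps completely.
X = [[5.0, 1.0], [5.1, 2.0], [1.0, 1.1], [1.2, 1.9]]
y = ['attack', 'attack', 'normal', 'normal']
scores = fisher_score(X, y)
print(scores.argmax())             # feature 0 ranks highest
```

Selecting the top-scoring features before classification is the filter-style selection the abstract describes.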
Abstract: Microarray technology is widely used in the study of disease diagnosis using gene expression levels. The main shortcoming of gene expression data is that it includes thousands of genes but only a small number of samples. Abundant methods and techniques have been proposed for tumor classification using microarray gene expression data. Feature or gene selection methods can be used to mine the genes directly involved in the classification and to eliminate irrelevant genes. In this paper, statistical measures such as T-Statistics, Signal-to-Noise Ratio (SNR) and F-Statistics are used to rank the genes, and the ranked genes are used for further classification. The Particle Swarm Optimization (PSO) algorithm and the Shuffled Frog Leaping (SFL) algorithm are used to find the significant genes among the top-m ranked genes. The Naïve Bayes Classifier (NBC) is used to classify the samples based on the significant genes. The proposed work is applied to Lung and Ovarian datasets. The experimental results show that the proposed method achieves 100% accuracy on these datasets, and the results are compared with previous works.
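The SNR ranking stage can be sketched directly from its usual two-class definition, SNR = (μ₁ − μ₂) / (σ₁ + σ₂) per gene; the toy expression matrix below is illustrative, not from the Lung or Ovarian data.

```python
import numpy as np

def snr_ranking(X, y, m):
    """Rank genes by absolute signal-to-noise ratio; return top-m indices."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    snr = (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return np.argsort(-np.abs(snr))[:m]

# Toy expression matrix (rows = samples, columns = genes):
# gene 1 differs strongly between the two classes.
X = [[0.1, 9.0, 0.5], [0.2, 8.5, 0.4], [0.1, 1.0, 0.5], [0.2, 1.5, 0.6]]
y = [0, 0, 1, 1]
top = snr_ranking(X, y, m=2)
print(top)                         # gene 1 comes first
```

In the abstract's pipeline these top-m genes would then be searched by PSO or SFL for a smaller significant subset before Naïve Bayes classification.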
Abstract: Different strategies and tools are available in the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be possible and convenient with small images and little data, it becomes difficult and prone to errors when large databases of images must be treated. Moreover, the patterns may differ across the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. These classifiers allow the system, after a period of manually marking the parts of borehole images that correspond to tension regions and breakout areas, to automatically indicate and suggest new candidate regions with higher accuracy. We suggest the use of different classification methods in order to achieve different knowledge dataset configurations.
Abstract: Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly identical kinds of products compete with one another through keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead consumers and waste their time. This problem has been reported in many commercial services such as eBay and Taobao, but there has been little research to solve it. As a solution, this paper proposes a method to classify whether the keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is commonly observed in the specifications of products that are the same as, or of the same kind as, the given product. This is because the hierarchical category of a product is in general determined precisely by the seller of the product, and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliability degree is determined separately for each layer. Hence, reliability degrees from the different layers of a hierarchical category become features for keywords, and they are used together with features derived only from specifications for classification of the keywords. Support Vector Machines are adopted as the base classifier using these features, since they are powerful and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a gold-standard dataset from Yi-han-wang, a Chinese C2C E-commerce site, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified using the hierarchical category in C2C E-commerce.
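The per-layer reliability degree can be sketched as the fraction of category peers whose specification contains the keyword. The catalog entries, field names, and tolerance-free matching below are hypothetical illustrations, not the paper's actual feature definition.

```python
def reliability(keyword, product, layer, catalog):
    """Fraction of products sharing the category prefix (up to `layer`)
    whose specification contains the keyword."""
    prefix = product['category'][:layer]
    peers = [p for p in catalog if p['category'][:layer] == prefix]
    hits = sum(keyword in p['spec'] for p in peers)
    return hits / len(peers) if peers else 0.0

catalog = [
    {'category': ('electronics', 'phone'),  'spec': {'screen', 'battery'}},
    {'category': ('electronics', 'phone'),  'spec': {'screen', 'camera'}},
    {'category': ('electronics', 'laptop'), 'spec': {'screen', 'keyboard'}},
]
item = catalog[0]
# One reliability feature per category layer; a spam keyword such as
# 'free-gift' would score 0.0 at every layer.
feats = [reliability('camera', item, L, catalog) for L in (1, 2)]
print(feats)   # reliability rises at the more specific layer
```

Stacking one such value per layer alongside specification-only features yields the feature vector the abstract feeds to the SVM.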
Abstract: In this paper the issue of dimensionality reduction in finger vein recognition systems is investigated using kernel Principal Component Analysis (KPCA). One aspect of KPCA is finding the most appropriate kernel function for finger vein recognition, as there are several kernel functions that can be used within PCA-based algorithms. In this paper, however, another side of PCA-based algorithms, particularly KPCA, is investigated: the dimension of the feature vector, which is of importance especially in real-world applications and usage of such algorithms. A fixed dimension of the feature vector has to be set to reduce the dimension of the input and output data and extract features from them; a classifier is then applied to classify the data and make the final decision. We analyze KPCA with Polynomial, Gaussian, and Laplacian kernels in detail and investigate the optimal feature extraction dimension in finger vein recognition using KPCA.
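A minimal numpy sketch of KPCA with the three kernels the abstract analyzes; the kernel parameters and the random data standing in for finger vein features are assumptions for illustration only.

```python
import numpy as np

def kernel_matrix(X, kind, gamma=0.5, degree=2):
    """Gram matrix for the polynomial, Gaussian, or Laplacian kernel."""
    if kind == 'polynomial':
        return (X @ X.T + 1.0) ** degree
    if kind == 'gaussian':
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    if kind == 'laplacian':
        d1 = np.abs(X[:, None, :] - X[None, :, :]).sum(-1)
        return np.exp(-gamma * d1)
    raise ValueError(kind)

def kpca(X, kind, dim):
    """Project X onto its top `dim` kernel principal components."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    K = kernel_matrix(X, kind)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                       # double-center the kernel matrix
    vals, vecs = np.linalg.eigh(Kc)      # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                   # embedded coordinates, shape (n, dim)

X = np.random.RandomState(0).rand(10, 4)
for kind in ('polynomial', 'gaussian', 'laplacian'):
    print(kind, kpca(X, kind, dim=2).shape)
```

The `dim` argument is exactly the fixed feature-vector dimension whose optimal value the paper investigates.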
Abstract: ‘Steganalysis’ is one of the challenging and attractive research interests that has developed alongside information hiding techniques. It is the procedure of detecting hidden information in stego images created by a known steganographic algorithm. In this paper, a novel feature-based image steganalysis technique is proposed. Various statistical moments are used along with some similarity metrics. The proposed steganalysis technique is designed on transformations in four wavelet domains: Haar, Daubechies, Symlets and Biorthogonal. Each domain is subjected to various classifiers, namely K-nearest-neighbor, K* classifier, locally weighted learning, the naive Bayes classifier, neural networks, decision trees and Support Vector Machines. The experiments are performed on a large set of pictures freely available in image databases. The system also predicts different embedded message lengths.
Abstract: A brief review of empirical studies on the methodology of stock market decision support indicates that they are at the threshold of validating the accuracy of traditional models against fuzzy systems, artificial neural networks and decision trees. Many researchers have attempted to compare these models using various data sets worldwide; however, the research community has yet to reach conclusive confidence in the emerging models. This paper uses automotive sector stock prices from the National Stock Exchange (NSE), India, and analyzes them for intra-sectorial support for stock market decisions. The study identifies the significant variables, and their lags, which affect the price of the stocks, using OLS analysis and decision tree classifiers.
Abstract: This paper presents a fault identification, classification and fault location estimation method based on the Discrete Wavelet Transform and an Adaptive Network Fuzzy Inference System (ANFIS) for medium voltage cables in the distribution system. Different faults and locations are simulated with ATP/EMTP, and selected features of the wavelet-transformed signals are used as input for training the ANFIS. An accurate fault classifier and locator algorithm is then designed, trained and tested using current samples only. The results obtained from the ANFIS output were compared with the real output, and the percentage error between them was found to be less than three percent. Hence, it can be concluded that the proposed technique offers high accuracy in both fault classification and fault location.
Abstract: We propose an obstacle classification method based on a 2D LIDAR database. The existing obstacle classification method based on 2D LIDAR has advantages in terms of accuracy and shorter calculation time; however, it was difficult to classify the type of obstacle, and therefore accurate path planning was not possible. To overcome this problem, a method of classifying obstacle type based on the width data of the obstacle was proposed, but width data alone were not sufficient to improve accuracy. In this paper, a database is established from width and intensity data: a first classification stage is processed using the width data and a second using the intensity data, classification proceeds by comparison against the database, and the result is determined by finding the entry with the highest similarity value. An experiment using an actual autonomous vehicle in a real environment shows that calculation time declined in comparison to 3D LIDAR and that it is possible to classify obstacles using a single 2D LIDAR.
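The two-stage matching can be sketched as follows; the database entries, width tolerance, and similarity measure are hypothetical stand-ins, since the abstract does not give concrete values.

```python
# class -> (typical width in meters, typical reflective intensity)
DB = {
    'pedestrian': (0.5, 30.0),
    'car':        (1.8, 55.0),
    'truck':      (2.5, 70.0),
}

def classify(width, intensity, width_tol=0.5):
    """Stage 1 filters candidates by width; stage 2 picks the candidate
    whose stored intensity is most similar to the measurement."""
    cands = {c: v for c, v in DB.items() if abs(v[0] - width) <= width_tol}
    if not cands:            # no width match: fall back to all classes
        cands = DB
    return min(cands, key=lambda c: abs(cands[c][1] - intensity))

print(classify(1.7, 52.0))   # width and intensity both point to 'car'
```

Using width as a coarse pre-filter keeps the per-scan cost low, which is consistent with the shorter calculation time the abstract reports relative to 3D LIDAR.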
Abstract: Medical image fusion plays a vital role in diagnosing brain tumors, which can be classified as benign or malignant. It is the process of integrating multiple images of the same scene into a single fused image in order to reduce uncertainty and minimize redundancy while extracting all the useful information from the source images. Fuzzy logic is used to fuse two brain MRI images with different views. The fused image is more informative than the source images. Texture and wavelet features are extracted from the fused image, and a multilevel Adaptive Neuro-Fuzzy Classifier classifies the brain tumors based on the trained and tested features. The proposed method achieved 80.48% sensitivity, 99.9% specificity and 99.69% accuracy. Experimental results obtained from the fusion process prove that the proposed image fusion approach performs better when compared with conventional fusion methodologies.
Abstract: Diabetic retinopathy is characterized by the development of retinal microaneurysms. The damage can be prevented if the disease is treated in its early stages. In this paper, we compare Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for automatic microaneurysm detection in images acquired through non-dilated pupils. The Nearest Neighbor classifier is used as a baseline for comparison. Detected microaneurysms are validated against expert ophthalmologists’ hand-drawn ground truths. The sensitivity, specificity, precision and accuracy of each method are also compared.
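The four measures compared in the abstract follow directly from the confusion matrix; the counts below are illustrative toy values, not results from the paper.

```python
def metrics(tp, fp, tn, fn):
    """Standard detection measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # recall on true microaneurysms
    specificity = tn / (tn + fp)           # true negative rate
    precision   = tp / (tp + fp)           # fraction of detections that are real
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, precision, accuracy

sen, spe, pre, acc = metrics(tp=80, fp=10, tn=95, fn=20)
print(round(sen, 3), round(spe, 3), round(pre, 3), round(acc, 3))
```

Reporting all four together matters here because microaneurysms are rare, so accuracy alone can look high even when sensitivity is poor.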
Abstract: In this study, we attempt to identify several heart rhythm disorders from electrocardiography (ECG) data taken from the MIT-BIH arrhythmia database by extracting the required features and presenting them to artificial neural network (ANN), artificial immune system (AIS), artificial-immune-system-based artificial neural network (AIS-ANN) and particle-swarm-optimization-based artificial neural network (PSO-ANN) classifier systems. The main purpose of this study is to evaluate the performance of the hybrid AIS-ANN and PSO-ANN classifiers relative to ANN and AIS. For this purpose, normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK) and atrial fibrillation (AF) data were obtained for each of the RR intervals. These data were then arranged in pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK and NSR-AF), a discrete wavelet transform was applied to each of the two groups of data in a pair, and after data reduction two different data sets with 9 and 27 features were obtained. Afterwards, the data were randomly shuffled, and 4-fold cross validation was applied to create the training and testing data. The training and testing accuracy rates and the training times were compared.
As a result, the performances of the hybrid classification systems AIS-ANN and PSO-ANN were seen to be close to the performance of the ANN system, and the results of the hybrid systems were much better than those of AIS. However, ANN had a much shorter training time than the other systems; in terms of training time, ANN was followed by PSO-ANN, AIS-ANN and AIS, respectively. The features extracted from the data also affected the classification results significantly.
Abstract: This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm while dispensing with the extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a system that analyses the congruence between annotations. This ensured satisfactory results even without using annotators specialized in the context of the research, avoiding the generation of biased training data for the classifiers. To this end, a case study was conducted on an entrepreneurship blog. The experimental results were consistent with the literature on annotation using a formalized process with experts.
Abstract: Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept for selecting appropriate classifiers based on the features and qualities of data sources, comparing the performance of five classifiers on three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduce a couple of innovative models that outperform traditional sentiment classifiers on these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and Greenplum in-database analytic tools.
Abstract: Most greenhouse growers desire a predetermined amount of yield in order to accurately meet market requirements. The purpose of this paper is to model a simple but often satisfactory supervised classification method. The original naive Bayes has a serious weakness in that it retains redundant predictors. In this paper, a regularization technique is used to obtain a computationally efficient classifier based on naive Bayes. The suggested construction utilizes an L1-penalty and is capable of clearing out redundant predictors; a modification of the LARS algorithm is devised to solve this problem, making the method applicable to a wide range of data. In the experimental section, a study is conducted to examine the effect of redundant and irrelevant predictors, and the method is tested on a WSG data set of tomato yields, where there are many more predictors than data points and where the urgent need to predict weekly yield is the goal of this approach. Finally, the modified approach is compared with several naive Bayes variants and other classification algorithms (SVM and kNN), and is shown to perform fairly well.
Abstract: Associative classification (AC) is a data mining approach that combines association rule mining and classification to build classification models (classifiers). AC has attracted significant attention from several researchers mainly because it derives accurate classifiers that contain simple yet effective rules. In the last decade, a number of associative classification algorithms have been proposed, such as Classification based on Association (CBA), Classification based on Multiple Association Rules (CMAR), Class based Associative Classification (CACA), and Classification based on Predicted Association Rule (CPAR). This paper surveys major AC algorithms and compares the steps and methods performed in each algorithm, including rule learning, rule sorting, rule pruning, classifier building, and class prediction.
Abstract: Clusters of Microcalcifications (MCCs) are among the most frequent symptoms of Ductal Carcinoma in Situ (DCIS) recognized by mammography. The Least-Squares Support Vector Machine (LS-SVM) is a variant of the standard SVM. In this paper, LS-SVM is proposed as a classifier for labelling MCCs as benign or malignant based on relevant features extracted from enhanced mammograms. To establish the credibility of the LS-SVM classifier for classifying MCCs, a comparative evaluation of its relative performance with different kernel functions is made, using the confusion matrix and ROC analysis. Experiments are performed on data extracted from mammogram images of the DDSM database: a total of 380 suspicious areas are collected, containing 235 malignant and 145 benign samples. A set of 50 features is calculated for each suspicious area, after which an optimal subset of the 23 most suitable features is selected by Particle Swarm Optimization (PSO). The results of the proposed study are quite promising.
Abstract: A BCI (Brain Computer Interface) is a communication system that translates brain messages into computer commands. With the help of computer programs, these machines can recognize imagined tasks. Feature extraction is an important stage of EEG classification that affects both the accuracy and the computation time of processing the signals. In this study we process the signal in three steps: active segment selection, fractal feature extraction, and classification. One of the great challenges in BCI applications is to improve classification accuracy and computation time together. In this paper, we use Student's two-sample t-statistics on continuous wavelet transforms for active segment selection to reduce the computation time. At the next level, features are extracted from well-known fractal dimension estimates of the signal, namely the Katz and Higuchi fractal dimensions. In the classification stage we use an ANFIS (Adaptive Neuro-Fuzzy Inference System) classifier, FKNN (Fuzzy K-Nearest Neighbors), LDA (Linear Discriminant Analysis), and SVM (Support Vector Machines). We found that the active segment selection method reduces the computation time, and that fractal dimension features with ANFIS analysis on the selected active segments perform best among the investigated methods for EEG classification.
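One of the fractal features named in the abstract, the Katz fractal dimension, has a short closed form: FD = log₁₀(n) / (log₁₀(d/L) + log₁₀(n)), where L is the total waveform length, d the maximum distance from the first sample, and n the number of steps. The signals below are synthetic illustrations, not EEG from the study.

```python
import numpy as np

def katz_fd(x):
    """Katz fractal dimension of a 1D signal."""
    x = np.asarray(x, dtype=float)
    dists = np.abs(np.diff(x))          # successive amplitude steps
    L = dists.sum()                     # total "length" of the waveform
    d = np.abs(x - x[0]).max()          # farthest excursion from the start
    n = len(dists)
    return np.log10(n) / (np.log10(d / L) + np.log10(n))

t = np.linspace(0, 1, 200)
smooth = np.sin(2 * np.pi * t)
rough = smooth + 0.3 * np.random.RandomState(1).randn(200)
print(katz_fd(smooth) < katz_fd(rough))   # noisier signals score higher
```

This complexity sensitivity is why fractal dimensions serve as compact discriminative features for the imagined-task EEG segments.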