Abstract: Patient-specific models are instance-based learning
algorithms that take advantage of the particular features of the patient
case at hand to predict an outcome. We introduce two patient-specific
algorithms based on the decision tree paradigm that use the area
under the ROC curve (AUC) as the metric for selecting an attribute.
We apply the patient-specific algorithms to predict outcomes in
several datasets, including medical datasets. Compared to the
entropy-based patient-specific decision path (PSDP) and CART
methods, the AUC-based patient-specific decision path models
performed equivalently on AUC. Our results support patient-specific
methods as a promising approach for making clinical predictions.
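The attribute-selection step described above can be sketched as
follows. This is a minimal illustration of using AUC as a split
criterion, not the authors' exact PSDP implementation; the function
names and toy data are ours:

```python
import numpy as np

def auc(y, scores):
    """Probability that a randomly chosen positive case is ranked
    above a randomly chosen negative case by the attribute values."""
    pos, neg = scores[y == 1], scores[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def select_attribute(X, y):
    """Choose the attribute with the highest discriminative AUC.
    AUC is symmetric around 0.5, so values below 0.5 are flipped:
    an attribute that ranks negatives first is equally informative."""
    scores = []
    for j in range(X.shape[1]):
        a = auc(y, X[:, j])
        scores.append(max(a, 1.0 - a))
    best = int(np.argmax(scores))
    return best, scores[best]

# Toy example: attribute 1 perfectly ranks the outcome, attribute 0
# is noise.
X = np.array([[0.2, 1.0], [0.9, 2.0], [0.4, 8.0], [0.6, 9.0]])
y = np.array([0, 0, 1, 1])
best, best_auc = select_attribute(X, y)
```

At each node of a patient-specific decision path, a criterion like this
would replace the usual entropy gain.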
Abstract: In this paper, an approach for liver tumor detection in
computed tomography (CT) images is presented. The detection
process is based on classifying the features of each target liver cell
as either tumor or non-tumor. Fractional differential (FD)
enhancement is applied to the liver CT images, with the aim of
sharpening texture and edge features. A fusion method then merges
the various enhanced images, producing a variety of feature
improvements that increase classification accuracy. Each image is
divided into NxN non-overlapping blocks from which the desired
features are extracted. A support vector machine (SVM) classifier is
trained on a supplied dataset different from the test set. Finally,
each block cell is labeled as tumor or non-tumor. Our approach is
validated on a group of patients’ CT liver tumor datasets. The
experimental results demonstrate the detection efficiency of the
proposed technique.
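The block-partitioning and per-block SVM classification step can be
sketched as below. The (mean, std) features and the synthetic
training data are our stand-ins for the paper's FD/fusion features,
purely for illustration:

```python
import numpy as np
from sklearn.svm import SVC

def block_features(image, n=8):
    """Split a grayscale image into n x n non-overlapping blocks and
    compute a simple feature vector (mean, std) per block."""
    h, w = image.shape
    feats, coords = [], []
    for r in range(0, h - n + 1, n):
        for c in range(0, w - n + 1, n):
            block = image[r:r + n, c:c + n]
            feats.append([block.mean(), block.std()])
            coords.append((r, c))
    return np.array(feats), coords

# Hypothetical training data: bright, high-variance blocks are
# labelled "tumor" (1), dark uniform blocks "non-tumor" (0).
rng = np.random.default_rng(0)
train_X = np.vstack([rng.normal([50, 5], 2, (20, 2)),
                     rng.normal([150, 30], 2, (20, 2))])
train_y = np.array([0] * 20 + [1] * 20)
clf = SVC(kernel="rbf").fit(train_X, train_y)

image = np.full((16, 16), 50.0)
image[:8, :8] = rng.normal(150, 30, (8, 8))   # simulated lesion block
feats, coords = block_features(image, n=8)
labels = clf.predict(feats)
```

Each entry of `labels` marks the block at the matching `coords`
position as tumor or non-tumor.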
Abstract: Advances in the spatial and spectral resolution of satellite
images have led to tremendous growth in large image databases. The
data we acquire through satellites, radars, and sensors contains
important geographical information that can be used for remote
sensing applications such as region planning and disaster
management. Spatial data classification and object recognition are
important tasks for many applications, yet classifying and
identifying objects manually from images is difficult. Object
recognition is often treated as a classification problem, and this task
can be performed using machine-learning techniques. Among the
many machine-learning algorithms, classification is typically done
with supervised classifiers such as Support Vector Machines (SVM)
when the area of interest is known. We propose a classification
method that considers the neighboring pixels in a region for feature
extraction and evaluates classifications according to neighboring
classes for semantic interpretation of a region of interest (ROI). A
dataset was created for training and testing purposes; we generated
the attributes from pixel intensity values and mean reflectance
values. We demonstrate the benefits of applying knowledge
discovery and data-mining techniques to image data for accurate
information extraction and classification from high-spatial-resolution
remote sensing imagery.
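Building a per-pixel feature vector that includes the surrounding
region, as described above, might look like the following sketch.
The window size and the [intensity, neighborhood mean] feature pair
are our simplifying assumptions, not the paper's exact attributes:

```python
import numpy as np

def neighborhood_features(band, radius=1):
    """For each pixel, append the mean reflectance of its
    (2r+1) x (2r+1) neighborhood to the raw intensity, so that a
    classifier can take the surrounding region into account."""
    padded = np.pad(band, radius, mode="edge")
    h, w = band.shape
    means = np.empty_like(band, dtype=float)
    for r in range(h):
        for c in range(w):
            win = padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
            means[r, c] = win.mean()
    # One feature row per pixel: [intensity, neighborhood mean]
    return np.stack([band.ravel(), means.ravel()], axis=1)

# Toy 3x3 band with a bright right-hand column.
band = np.array([[10, 10, 90],
                 [10, 10, 90],
                 [10, 10, 90]], dtype=float)
feats = neighborhood_features(band)
```

The resulting rows can be fed directly to a supervised classifier such
as an SVM.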
Abstract: Segmentation of the left ventricle (LV) from cardiac
ultrasound images provides a quantitative functional analysis of the
heart for diagnosing disease. The Active Shape Model (ASM) is
widely used for LV segmentation, but it suffers from the drawback
that the initialization of the shape model is often not sufficiently
close to the target, especially for the abnormal shapes encountered in
disease. In this work, a two-step framework is developed to achieve
fast and efficient LV segmentation. First, a robust and efficient
detector based on a Hough forest localizes cardiac feature points,
which are used to predict the initial fit of the LV shape model.
Second, ASM is applied to further fit the LV shape model to the
cardiac ultrasound image. With this robust initialization, ASM
achieves more accurate segmentation. The performance of the
proposed method is evaluated on a dataset of 810 cardiac ultrasound
images, mostly of abnormal shapes, and compared with several
combinations of ASM and existing initialization methods. Our
experimental results demonstrate that the accuracy of the proposed
feature point detection for initialization was 40% higher than that of
existing methods. Moreover, the proposed method significantly
reduces the number of ASM fitting loops and thus speeds up the
whole segmentation process. It therefore achieves more accurate and
efficient segmentation and is applicable to the unusual heart shapes
found in cardiac diseases such as left atrial enlargement.
Abstract: Background modeling and subtraction in video analysis is
widely used as an effective method for moving-object detection in
many computer vision applications. Recently, a large number of
approaches have been developed to tackle different types of
challenges in this field; however, dynamic backgrounds and
illumination variations are the problems that occur most frequently
in practice. This paper presents a two-layer model based on the
codebook algorithm combined with the local binary pattern (LBP)
texture measure, targeted at handling dynamic background and
illumination variation problems. More specifically, the first layer is a
block-based codebook combining an LBP histogram with the mean
value of each RGB color channel. Because LBP features are
invariant to monotonic gray-scale changes, this layer produces
block-wise detection results with considerable tolerance to
illumination variations. A pixel-based codebook is then employed to
refine the output of the first layer and further eliminate false
positives. As a result, the proposed approach greatly improves
accuracy under dynamic background and illumination changes.
Experimental results on several popular background subtraction
datasets demonstrate very competitive performance compared to
previous models.
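The illumination invariance claimed for the LBP histogram can be
demonstrated with a minimal sketch. This is the textbook 8-neighbor
LBP, not necessarily the exact variant used in the paper:

```python
import numpy as np

def lbp_code(patch):
    """8-neighbour local binary pattern code of the centre pixel of a
    3x3 patch; invariant to any monotonic grey-scale change."""
    centre = patch[1, 1]
    # clockwise neighbour order starting at the top-left pixel
    neigh = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
             patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(int(v >= centre) << i for i, v in enumerate(neigh))

def lbp_histogram(block):
    """256-bin LBP histogram of a grey-scale block, usable as an
    illumination-tolerant texture descriptor for a codebook layer."""
    h, w = block.shape
    hist = np.zeros(256, dtype=int)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            hist[lbp_code(block[r - 1:r + 2, c - 1:c + 2])] += 1
    return hist

block = np.arange(16, dtype=float).reshape(4, 4)
h1 = lbp_histogram(block)
h2 = lbp_histogram(block * 2 + 5)   # monotonic illumination change
```

The two histograms are identical because the pairwise comparisons
against the centre pixel are unchanged by the monotonic transform.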
Abstract: Feature selection is one of the global combinatorial
optimization problems in machine learning. It is concerned with
removing irrelevant, noisy, and redundant data while preserving the
meaning of the original data. Attribute reduction in rough set theory
is an important feature selection method. Since attribute reduction is
an NP-hard problem, it is necessary to investigate fast and effective
approximate algorithms. In this paper, we propose two feature
selection mechanisms based on memetic algorithms (MAs), which
combine the genetic algorithm with a fuzzy record-to-record travel
algorithm and a fuzzy-controlled great deluge algorithm,
respectively, to strike a good balance between local search and
genetic search. To verify the proposed approaches, numerical
experiments are carried out on thirteen datasets. The results show
that the MA approaches are efficient in solving attribute reduction
problems compared with other meta-heuristic approaches.
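The general shape of a memetic feature-selection search (a genetic
algorithm whose offspring are refined by local search) can be
sketched as below. The fitness function, the local-search rule, and
all parameters are simplified placeholders, not the paper's fuzzy
record-to-record travel or great deluge procedures:

```python
import random

RELEVANT = {0, 3}          # hypothetical "truly informative" attributes
N_FEATURES = 8

def fitness(mask):
    """Stand-in evaluation: reward relevant features, penalise subset
    size (a real memetic reduct search would score classification or
    rough-set dependency quality here)."""
    chosen = {i for i, b in enumerate(mask) if b}
    return 2 * len(chosen & RELEVANT) - 0.5 * len(chosen)

def local_search(mask):
    """Memetic refinement: greedily flip single bits while improving."""
    improved = True
    while improved:
        improved = False
        for i in range(len(mask)):
            trial = mask.copy()
            trial[i] ^= 1
            if fitness(trial) > fitness(mask):
                mask, improved = trial, True
    return mask

def memetic_search(pop_size=10, generations=20, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_FEATURES)
            child = a[:cut] + b[cut:]        # one-point crossover
            if rng.random() < 0.2:           # mutation
                child[rng.randrange(N_FEATURES)] ^= 1
            children.append(local_search(child))  # memetic step
        pop = parents + children
    return max(pop, key=fitness)

best = memetic_search()
```

With this separable toy fitness, the search recovers exactly the
relevant attribute subset.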
Abstract: We present a probabilistic multinomial Dirichlet
classification model for multidimensional data with Gaussian process
priors. We consider an efficient computational method for obtaining
the approximate posteriors of the latent variables and the parameters
needed to define the multiclass Gaussian process classification
model. We first investigate the process of inducing a posterior
distribution over the various parameters and the latent function by
using variational Bayesian approximations and importance sampling,
and then derive the predictive distribution of the latent function
needed to classify new samples. The proposed model is applied to a
synthetic multivariate dataset in order to verify its performance.
Experimental results show that our model is more accurate than the
other approximation methods.
Abstract: This paper introduces an effective method of
segmenting Korean text (place names in Korean) from a Korean road
sign image. A Korean advanced directional road sign is composed of
several types of visual information such as arrows, place names in
Korean and English, and route numbers. Automatic classification of
the visual information and extraction of Korean place names from the
road sign images make it possible to avoid a lot of manual inputs to a
database system for management of road signs nationwide. We
propose a series of problem-specific heuristics that correctly
segment Korean place names, the most crucial information, from the
other content by effectively discarding non-text information.
Experimental results on a dataset of 368 road sign images show a
detection rate of 96% per Korean place name and 84% per road sign
image.
Abstract: This paper proposes a new internal holon architecture
based on a feature selection model that combines the Bees Algorithm
(BA) with an Artificial Neural Network (ANN). BA is used to
generate feature subsets, while the ANN serves as the classifier that
evaluates them. The proposed system is applied to the Wine dataset,
and the statistical results show that it is effective and able to choose
informative features with high accuracy.
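The evaluation step, in which an ANN scores a candidate feature
subset, can be sketched as follows. The network size, cross-validation
scheme, and the example subsets are our assumptions; the Bees
Algorithm search loop itself is omitted:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

def evaluate_subset(feature_idx):
    """Fitness of one candidate 'bee' (a feature subset): mean
    cross-validated accuracy of an ANN trained on those features."""
    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(10,),
                                      max_iter=1000, random_state=0))
    return cross_val_score(clf, X[:, feature_idx], y, cv=3).mean()

score_all = evaluate_subset(list(range(X.shape[1])))
score_two = evaluate_subset([0, 9])   # hypothetical 2-feature subset
```

A BA search would repeatedly call `evaluate_subset` on neighborhood
variations of the best subsets found so far.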
Abstract: In this article, a new method is proposed for measuring well-being inequality through a model composed of superimposed satisfaction waves. The displacement of a household’s satisfactory state (i.e. satisfaction) is defined on a satisfaction string. The duration of the satisfactory state over a given period is measured in order to determine the relationship between utility and total satisfactory time, itself dependent on the density and tension of each satisfaction string. Thus, individual cardinal total satisfaction values are computed by way of a one-dimensional scalar sinusoidal (harmonic) moving wave function, using satisfaction waves with varying amplitudes and frequencies, which allows us to measure well-being inequality. One advantage of using satisfaction waves is the ability to show that individual utility and consumption amounts would probably not commute; hence, it is impossible to measure or know the values of these observables simultaneously from the dataset. We therefore crystallize the problem by using a Heisenberg-type uncertainty resolution for self-adjoint economic operators. We propose to eliminate estimation bias by correlating the standard deviations of selected economic operators; this is achieved by replacing the observed uncertainties with households’ perceived uncertainties (i.e. corrected standard deviations) obtained through the logarithmic psychophysical law of Weber and Fechner.
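The one-dimensional scalar sinusoidal moving wave mentioned above
presumably takes the standard harmonic travelling-wave form below;
the per-household amplitude, wave number, angular frequency and
phase symbols are generic notation assumed here, not the paper's
own:

```latex
% Standard 1-D harmonic travelling wave for household i's satisfaction;
% A_i, k_i, \omega_i, \phi_i are assumed generic parameters.
s_i(x, t) = A_i \sin\left(k_i x - \omega_i t + \phi_i\right)
```

Varying $A_i$ and $\omega_i$ across households is what lets the
superposition of such waves encode differences in well-being.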
Abstract: Studying the response of vegetation phenology to climate
change at different temporal and spatial scales is important for
understanding and predicting future terrestrial ecosystem dynamics
and the adaptation of ecosystems to global change. In this study, the
Moderate Resolution Imaging Spectroradiometer (MODIS)
Normalized Difference Vegetation Index (NDVI) dataset and climate
data were used to analyze the dynamics of grassland phenology as well
as their correlation with climatic factors in different eco-geographic
regions and elevation units across the Tibetan Plateau. The results
showed that during 2003–2012, the start of the grassland greening
season (SOS) appeared later while the end of the growing season
(EOS) appeared earlier following the plateau’s precipitation and heat
gradients from southeast to northwest. The multi-year mean value of
SOS showed differences between various eco-geographic regions and
was significantly impacted by average elevation and regional average
precipitation during spring. Regional mean differences for EOS were
mainly regulated by mean temperature during autumn. Changes in
trends of SOS in the central and eastern eco-geographic regions were
coupled to the mean temperature during spring, advancing by about
7 d/°C. However, in the two southwestern eco-geographic regions,
SOS was delayed significantly due to the impact of spring
precipitation. The results also showed that the SOS occurred later with
increasing elevation, as expected, with a delay rate of 0.66 d/100 m.
For 2003–2012, SOS showed an advancing trend in low-elevation
areas, but a delayed trend in high-elevation areas, while EOS was
delayed in low-elevation areas, but advanced in high-elevation areas.
Grassland SOS and EOS changes may be influenced by a variety of
other environmental factors in each eco-geographic region.
Abstract: Feature selection has been used in many fields, such as
classification, data mining, and object recognition, and has proven
effective for removing irrelevant and redundant features from the
original dataset. In this paper, a new design for a distributed
intrusion detection system is presented, using a combined feature
selection model based on the Bees Algorithm and a decision tree.
The Bees Algorithm serves as the search strategy to find the optimal
subset of features, whereas the decision tree is used to judge the
selected features. Both the produced features and the generated rules
are used by a Decision Making Mobile Agent to decide whether
there is an attack in the network. The Decision Making Mobile
Agent migrates through the network, moving from one node to
another; if it finds an attack on one of the nodes, it alerts the user
through the User Interface Agent or takes action through the Action
Mobile Agent. The KDD Cup 99 dataset is used to test the
effectiveness of the proposed system. The results show that even
when only four features are used, the proposed system performs
better than the results obtained using all 41 features.
Abstract: Human beings have the ability to make logical decisions.
Although human decision-making is often optimal, it is insufficient
when huge amounts of data must be classified. Medical datasets are
a vital ingredient in predicting a patient’s health condition, and the
best predictions call for the most suitable machine learning
algorithms. This work compared the performance of Artificial
Neural Network (ANN) and Decision Tree Algorithm (DTA) models
on several performance metrics using diabetes data. The WEKA
software was used to implement the algorithms. Multilayer
Perceptron (MLP) and Radial Basis Function (RBF) networks were
the two ANN algorithms, while REPTree and LADTree were the
DTA models. From the results obtained, the DTA models performed
better than the ANN models: the Root Mean Squared Error (RMSE)
of MLP is 0.3913, that of RBF is 0.3625, that of REPTree is 0.3174,
and that of LADTree is 0.3206.
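The comparison above rests on the RMSE metric, which is simple to
state explicitly; the toy values below are ours, purely to illustrate
the computation:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of the mean squared
    difference between true and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(y_true, y_pred)) / n)

# Toy check: predictions off by a constant 0.5 give an RMSE of 0.5.
err = rmse([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

Lower RMSE means predictions closer to the observed outcomes,
which is why REPTree's 0.3174 marks it as the best model here.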
Abstract: This article describes the effect of replacing the reference
coordinate system used in georeferencing an old map of Europe. The
map was georeferenced into three types of projection: the equal-area
conic (its original cartographic projection), the cylindrical Plate
Carrée, and the cylindrical Mercator projection. The map was
georeferenced by means of affine and second-order polynomial
transformations. The resulting georeferenced raster datasets in the
Plate Carrée and Mercator projections were then projected into the
equal-area conic projection by means of the projection equations.
The output is a comparison of the drawn graphics and of the
magnitudes of the standard deviations for the individual projections
and transformation types.
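The affine step of such a georeferencing workflow amounts to a
least-squares fit over ground control points; the standard deviation of
the control-point residuals is the kind of quantity the comparison
above reports. The control points below are hypothetical:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping pixel coordinates
    (src) to map coordinates (dst): [X, Y] = [x, y, 1] @ params."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    G = np.hstack([src, np.ones((len(src), 1))])       # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(G, dst, rcond=None)   # 3x2 matrix
    residuals = G @ params - dst
    sigma = residuals.std(axis=0)   # per-axis control-point deviation
    return params, sigma

# Hypothetical ground control points: a pure scale-and-shift mapping
# X = 100 + 2x, Y = 200 - 2y.
src = [[0, 0], [10, 0], [0, 10], [10, 10]]
dst = [[100, 200], [120, 200], [100, 180], [120, 180]]
params, sigma = fit_affine(src, dst)
new_pt = np.array([5.0, 5.0, 1.0]) @ params
```

A second-order polynomial transformation extends `G` with the terms
x², xy, and y² but is fitted the same way.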
Abstract: Today, a large number of political transcripts are available
on the Web to be mined and used for statistical analysis and product
recommendations. As online political resources are used for various
purposes, automatically determining the political orientation of these
transcripts becomes crucial. The methodologies used by machine
learning algorithms for automatic classification are based on
different features, grouped under categories such as linguistic and
personality features. Considering the ideological differences between
liberals and conservatives, this paper studies the effect of personality
traits on political orientation classification. The experiments in this
study were based on the correlation between LIWC features and the
Big Five personality traits. Several experiments were conducted on
the Convote U.S. Congressional-Speech dataset with seven
benchmark classification algorithms. The different methodologies
were applied to several LIWC feature sets consisting of 8 to 64
features correlated with the five personality traits. The experiments
showed Neuroticism to be the most differentiating personality trait
for classifying political orientation. At the same time, the
personality-trait-based classification methodology was observed to
give results better than or comparable to related work.
Abstract: STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induce if-then rules from a decision table, which is considered a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for development remains before STRIM can be applied to the analysis of real-world datasets. The first requirement is to determine the size of the dataset needed for inducing true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity for rule induction from datasets whose attribute values are contaminated by missing data and noise, since real-world datasets usually contain such contamination. This paper examines the first problem theoretically, in connection with rule length. The second problem is then examined in a simulation experiment, utilizing the critical dataset size derived in the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to real-world data.
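The core idea of testing a candidate rule for statistical significance
can be sketched with a one-proportion z-test: among the rows
matching the rule's condition, does the fraction falling in the rule's
decision class deviate from the population base rate? The test form
and the toy counts are our illustration, not STRIM's exact statistic:

```python
import math

def rule_z_score(n_match, n_hit, base_rate):
    """z statistic for whether an if-then rule's hit fraction among
    its n_match matching rows deviates from the class base rate."""
    p_hat = n_hit / n_match
    se = math.sqrt(base_rate * (1 - base_rate) / n_match)
    return (p_hat - base_rate) / se

def two_sided_p(z):
    """Two-sided p-value under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical rule: 80 of 100 matching rows fall in a class whose
# base rate is 1/3, far beyond chance.
z = rule_z_score(100, 80, 1 / 3)
p = two_sided_p(z)
```

The dataset-size question studied in the paper corresponds to asking
how large `n_match` must be before a truly deviating rule reliably
clears the chosen significance threshold.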
Abstract: The present study aimed to determine potential
agricultural lands (PALs) on Gokceada (Imroz) Island in Canakkale
province, Turkey. Seven-band Landsat 8 OLI images acquired on
July 12 and August 13, 2013, and their 14-band combination image
were used to identify current Land Use Land Cover (LULC) status.
Principal Component Analysis (PCA) was applied to three Landsat
datasets in order to reduce the correlation between the bands. A total
of six Original and PCA images were classified using supervised
classification method to obtain the LULC maps including 6 main
classes (“Forest”, “Agriculture”, “Water Surface”, “Residential Area-
Bare Soil”, “Reforestation” and “Other”). Accuracy assessment was
performed by checking the accuracy of 120 randomized points for
each LULC map. The best overall accuracy and Kappa statistic
values (90.83% and 0.8791, respectively) were found for the PCA
images generated from the 14-band combined image, called 3-
B/JA.
Digital Elevation Model (DEM) with 15 m spatial resolution
(ASTER) was used to consider topographical characteristics. Soil
properties were obtained by digitizing 1:25000 scaled soil maps of
Rural Services Directorate General. Potential agricultural lands
(PALs) were determined using Geographic Information Systems
(GIS). The procedure was applied on the assumption that the
“Other” class of the LULC map may be used for agricultural
purposes in the future. An overlay analysis was conducted using the
Slope (S), Land Use Capability Class (LUCC), Other Soil Properties
(OSP) and Land Use Capability Sub-Class (SUBC) properties.
A total of 901.62 ha within the “Other” class (15798.2 ha) of the
LULC map was determined to be PALs. These lands were ranked as
“Very Suitable”, “Suitable”, “Moderate Suitable” and “Low
Suitable”. Of these, 8.03 ha were classified as “Very Suitable”,
18.59 ha as “Suitable” and 11.44 ha as “Moderate Suitable”. In
addition, 756.56 ha were found to be “Low Suitable”. The results of
this preliminary study can serve as a basis for further studies.
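The band-decorrelation step described above (PCA over the stacked
Landsat bands) can be sketched as follows on a synthetic stack; the
band count and data are our toy assumptions:

```python
import numpy as np

def pca_bands(stack, k=3):
    """Principal component transform of a (bands, rows, cols) image
    stack; keeps the k highest-variance components to reduce
    inter-band correlation before classification."""
    b, h, w = stack.shape
    X = stack.reshape(b, -1).T.astype(float)   # one row per pixel
    X -= X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    comps = X @ vecs[:, order]
    return comps.T.reshape(k, h, w), vals[order]

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 8))
# Two strongly correlated "bands" plus one independent band.
stack = np.stack([base,
                  base * 2 + 0.01 * rng.normal(size=(8, 8)),
                  rng.normal(size=(8, 8))])
pcs, variances = pca_bands(stack, k=2)
```

The correlated pair collapses onto one dominant component, which is
exactly why PCA images can classify as well as the full band set.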
Abstract: With the evolution of technology, the expression of
opinions has shifted to the digital world. The domain of politics, one
of the hottest topics of opinion mining research, here merges with
behavior analysis for determining political affiliation in texts, which
constitutes the subject of this paper. This study aims to classify the
text in news and blogs as either Republican or Democrat with the
minimum number of features. As an initial set, 68 features, 64 of
which were Linguistic Inquiry and Word Count (LIWC) features,
were tested against 14 benchmark classification algorithms. In later
experiments, the dimensionality of the feature vector was reduced
using 7 feature selection algorithms. The results show that the
“Decision Tree”, “Rule Induction” and “M5 Rule” classifiers, when
used with the “SVM” and “IGR” feature selection algorithms,
performed best, with up to 82.5% accuracy on a given dataset.
Further tests on a single feature and on linguistic-based feature sets
showed similar results. The feature “Function”, an aggregate feature
of the linguistic category, was found to be the most differentiating of
the 68 features, with an accuracy of 81% in classifying articles as
either Republican or Democrat.
Abstract: Software fault prediction models are created using source
code, metrics computed from the same or a previous version of the
code, and the related fault data. Some companies do not store and
keep track of all the artifacts required for software fault prediction.
To construct a fault prediction model for such a company, training
data from other projects is one potential solution. Moreover, the
earlier a fault is predicted, the less it costs to correct. The training
data consist of metrics data and the related fault data at the
function/module level. This paper investigates early-stage fault
prediction using cross-project data, focusing on design metrics. An
empirical analysis is carried out to validate design metrics for
cross-project fault prediction, with Naïve Bayes as the machine
learning technique used for evaluation. The design-phase metrics of
other projects can serve as an initial guideline for projects where no
previous fault data is available. We analyze seven datasets from the
NASA Metrics Data Program, which offer design as well as code
metrics. Overall, the cross-project results are comparable to learning
from within-company data.
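The cross-project setup (train Naïve Bayes on one project's metrics,
test on another's) can be sketched as below. The two synthetic
"projects" and their metric distributions are our stand-ins for the
NASA MDP datasets:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)

def make_project(n, shift=0.0):
    """Synthetic stand-in for one project's design metrics (e.g.
    coupling or complexity proxies): faulty modules (label 1) have
    larger metric values on average; `shift` models the distribution
    drift between projects."""
    y = rng.integers(0, 2, n)
    X = rng.normal(loc=y[:, None] * 2.0 + shift, scale=1.0,
                   size=(n, 2))
    return X, y

# Train on "project A", test on "project B" with shifted metrics.
Xa, ya = make_project(300)
Xb, yb = make_project(200, shift=0.3)

model = GaussianNB().fit(Xa, ya)
cross_acc = model.score(Xb, yb)
```

When the metric distributions of the two projects are reasonably
similar, such cross-project accuracy stays close to within-project
accuracy, which is the paper's overall finding.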
Abstract: The margin-based principle was proposed long ago, and it
has been proved that this principle reduces structural risk and
improves performance in both theoretical and practical respects.
Meanwhile, the feed-forward neural network is a traditional classifier
that is currently very popular in deeper architectures. However, the
training algorithm of the feed-forward neural network derives from
the Widrow-Hoff principle, which minimizes the squared error. In
this paper, we propose a new training algorithm for feed-forward
neural networks based on the margin-based principle, which
effectively improves the accuracy and generalization ability of
neural network classifiers with fewer labelled samples and a flexible
network. We conducted experiments on four UCI open datasets and
achieved good results, as expected. In conclusion, our model can
handle sparsely labelled, high-dimensional datasets with high
accuracy, while migrating from the old ANN method to ours is easy
and requires almost no work.
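The contrast between the Widrow-Hoff squared-error rule and a
margin-based objective can be illustrated on a single linear unit
trained with the regularised hinge loss. This is a minimal sketch of
the margin idea, not the paper's full feed-forward network or its
exact training algorithm:

```python
import numpy as np

def train_hinge(X, y, epochs=200, lr=0.05, lam=0.01, seed=0):
    """Sub-gradient descent on the regularised hinge loss
    L = mean(max(0, 1 - y * (w.x + b))) + lam * |w|^2
    for labels y in {-1, +1}; only points violating the unit margin
    contribute to the gradient, unlike the squared-error rule."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1            # margin violators
        grad_w = 2 * lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Linearly separable toy data with labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 2.5], [-2.0, -2.0], [-3.0, -1.5]])
y = np.array([1, 1, -1, -1])
w, b = train_hinge(X, y)
preds = np.sign(X @ w + b)
```

Replacing the squared-error objective of a full network with a loss
of this shape is the kind of "almost free" modification the abstract
refers to: the architecture stays the same and only the output loss
and its gradient change.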