Abstract: The problems arising from unbalanced data sets
generally appear in real world applications. Due to unequal class
distribution, many researchers have found that the performance of
existing classifiers tends to be biased towards the majority class. The
k-nearest neighbors’ nonparametric discriminant analysis is a method
that was proposed for classifying unbalanced classes with good
performance. In this study, the methods of discriminant analysis are
of interest in investigating misclassification error rates for classimbalanced
data of three diabetes risk groups. The purpose of this
study was to compare the classification performance between
parametric discriminant analysis and nonparametric discriminant
analysis in a three-class classification of class-imbalanced data of
diabetes risk groups. Data from a project maintaining healthy
conditions for 599 employees of a government hospital in Bangkok
were obtained for the classification problem. The employees were
divided into three diabetes risk groups: non-risk (90%), risk (5%),
and diabetic (5%). The original data including the variables of
diabetes risk group, age, gender, blood glucose, and BMI were
analyzed and bootstrapped for 50 and 100 samples, 599 observations
per sample, for additional estimation of the misclassification error
rate. Each data set was explored for the departure of multivariate
normality and the equality of covariance matrices of the three risk
groups. Both the original data and the bootstrap samples showed nonnormality
and unequal covariance matrices. The parametric linear
discriminant function, quadratic discriminant function, and the
nonparametric k-nearest neighbors’ discriminant function were
performed over 50 and 100 bootstrap samples and applied to the
original data. Searching the optimal classification rule, the choices of
prior probabilities were set up for both equal proportions (0.33: 0.33:
0.33) and unequal proportions of (0.90:0.05:0.05), (0.80: 0.10: 0.10)
and (0.70, 0.15, 0.15). The results from 50 and 100 bootstrap samples
indicated that the k-nearest neighbors approach when k=3 or k=4 and
the defined prior probabilities of non-risk: risk: diabetic as 0.90:
0.05:0.05 or 0.80:0.10:0.10 gave the smallest error rate of
misclassification. The k-nearest neighbors approach would be
suggested for classifying a three-class-imbalanced data of diabetes
risk groups.
Abstract: The issue of classifying objects into one of predefined
groups when the measured variables are mixed with different types
of variables has been part of interest among statisticians in many
years. Some methods for dealing with such situation have been
introduced that include parametric, semi-parametric and nonparametric
approaches. This paper attempts to discuss on a problem
in classifying a data when the number of measured mixed variables is
larger than the size of the sample. A propose idea that integrates a
dimensionality reduction technique via principal component analysis
and a discriminant function based on the location model is discussed.
The study aims in offering practitioners another potential tool in a
classification problem that is possible to be considered when the
observed variables are mixed and too large.
Abstract: A new target detection technique is presented in this
paper for the identification of small boats in coastal surveillance. The
proposed technique employs an adaptive progressive thresholding (APT) scheme to first process the given input scene to separate any
objects present in the scene from the background. The preprocessing
step results in an image having only the foreground objects, such as
boats, trees and other cluttered regions, and hence reduces the search
region for the correlation step significantly. The processed image is then fed to the shifted phase-encoded fringe-adjusted joint transform
correlator (SPFJTC) technique which produces single and delta-like
correlation peak for a potential target present in the input scene. A
post-processing step involves using a peak-to-clutter ratio (PCR) to determine whether the boat in the input scene is authorized or unauthorized. Simulation results are presented to show that the
proposed technique can successfully determine the presence of an authorized boat and identify any intruding boat present in the given input scene.
Abstract: In this work, we improve a previously developed
segmentation scheme aimed at extracting edge information from
speckled images using a maximum likelihood edge detector. The
scheme was based on finding a threshold for the probability density
function of a new kernel defined as the arithmetic mean-to-geometric
mean ratio field over a circular neighborhood set and, in a general
context, is founded on a likelihood random field model (LRFM). The
segmentation algorithm was applied to discriminated speckle areas
obtained using simple elliptic discriminant functions based on
measures of the signal-to-noise ratio with fractional order moments.
A rigorous stochastic analysis was used to derive an exact expression
for the cumulative density function of the probability density
function of the random field. Based on this, an accurate probability
of error was derived and the performance of the scheme was
analysed. The improved segmentation scheme performed well for
both simulated and real images and showed superior results to those
previously obtained using the original LRFM scheme and standard
edge detection methods. In particular, the false alarm probability was
markedly lower than that of the original LRFM method with
oversegmentation artifacts virtually eliminated. The importance of
this work lies in the development of a stochastic-based segmentation,
allowing an accurate quantification of the probability of false
detection. Non visual quantification and misclassification in medical
ultrasound speckled images is relatively new and is of interest to
clinicians.