Abstract: By combining k-NN, which is highly accurate, with
K-means clustering, which reduces classification time, we introduce Cluster-k-Nearest Neighbor: a "variable k"-NN that operates on the centroids (mean points) of the subclasses produced by a clustering algorithm. Because the K-means algorithm is generally unstable in terms of accuracy, we develop another algorithm for clustering the space that gives higher accuracy than K-means, fewer
subclasses, greater stability, and a classification time that remains bounded as the data size varies. We obtain between 96% and 99.7% accuracy in the classification of six different types of time series using the K-means algorithm, and 99.7% using the new clustering algorithm.
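The centroid idea can be sketched as follows (a minimal illustration assuming per-class K-means over 2-D points; the function names and parameters are ours, not the paper's implementation):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means: returns k centroids for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def fit_cluster_knn(samples_by_label, k_sub=2):
    """Summarise each class by the centroids of its K-means sub-classes."""
    return {label: kmeans(pts, min(k_sub, len(pts)))
            for label, pts in samples_by_label.items()}

def predict(x, centroids_by_label):
    """Label of the nearest sub-class centroid, instead of scanning all points."""
    return min(
        ((label, c) for label, cs in centroids_by_label.items() for c in cs),
        key=lambda lc: math.dist(x, lc[1]),
    )[0]
```

Since a query is compared only against the sub-class centroids, classification cost depends on the number of subclasses rather than on the training-set size, which is where the bounded classification time comes from.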
Abstract: Distributed denial-of-service (DDoS) attacks pose a
serious threat to network security. Many methodologies and tools
have been devised to detect DDoS attacks and reduce the damage
they cause. Still, most of these methods cannot
simultaneously achieve (1) efficient detection with a small number of
false alarms and (2) real-time transfer of packets. Here, we introduce
a method for the proactive detection of DDoS attacks, based on
classifying the network status, to be used in the detection stage of the
proposed anti-DDoS framework. Initially, we analyse the DDoS architecture
and obtain details of its phases. Then, we investigate the procedures
of DDoS attacks and select variables based on these features. Finally,
we apply the k-nearest neighbour (k-NN) method to classify the
network status into each phase of a DDoS attack. The simulation
results show that each phase of the attack scenario is classified well
and that DDoS attacks can be detected at an early stage.
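The classification step the abstract describes amounts to plain k-NN majority voting over labelled status observations. The sketch below is illustrative only: the two features (packet rate, source-address entropy) and the phase labels are invented, not taken from the paper.

```python
import math
from collections import Counter

def knn_phase(status, history, k=3):
    """Classify a network-status vector into a DDoS phase by majority
    vote among its k nearest labelled observations."""
    nearest = sorted(history, key=lambda obs: math.dist(status, obs[0]))[:k]
    votes = Counter(phase for _, phase in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical history of (feature vector, phase) pairs:
# (packet rate, source-address entropy) -> phase label.
history = [
    ((10.0, 0.9), "normal"), ((12.0, 0.8), "normal"),
    ((80.0, 0.3), "flooding"), ((95.0, 0.2), "flooding"),
    ((40.0, 0.5), "recruiting"),
]
```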
Abstract: The feature extraction method(s) used to recognize
hand-printed characters play an important role in ICR applications.
In order to achieve a high recognition rate, the choice of a feature
that suits the given script is certainly an important task. Even if a
new feature is to be designed for a given script, it is essential to
know the recognition ability of the existing features for that script.
The Devanagari script is used in various Indian languages besides
Hindi, the mother tongue of the majority of Indians. This research
examines a variety of feature extraction approaches, which have
been used in various ICR/OCR applications, in the context of
hand-printed Devanagari script. The study is conducted
theoretically and experimentally on more than 10 feature extraction
methods. The various methods have been evaluated on a Devanagari
hand-printed database comprising more than 25000 characters
belonging to 43 alphabets. The recognition ability of the features
has been evaluated using three classifiers, i.e., k-NN, MLP and SVM.
Abstract: The design of a pattern classifier includes an attempt
to select, among a set of possible features, a minimum subset of
weakly correlated features that better discriminate the pattern classes.
This is usually a difficult task in practice, normally requiring the
application of heuristic knowledge about the specific problem
domain. The selection and quality of the features representing each
pattern have a considerable bearing on the success of subsequent
pattern classification. Feature extraction is the process of deriving
new features from the original features in order to reduce the cost of
feature measurement, increase classifier efficiency, and allow higher
classification accuracy. Many current feature extraction techniques
involve linear transformations of the original pattern vectors to new
vectors of lower dimensionality. While this is useful for data
visualization and increasing classification efficiency, it does not
necessarily reduce the number of features that must be measured
since each new feature may be a linear combination of all of the
features in the original pattern vector. In this paper a new approach
to feature extraction is presented in which feature selection, feature
extraction, and classifier training are performed simultaneously using
a genetic algorithm. In this approach each feature value is first
normalized by a linear equation, then scaled by the associated weight
prior to training, testing, and classification. A k-NN classifier is used to
evaluate each set of feature weights. The genetic algorithm optimizes
a vector of feature weights, which are used to scale the individual
features in the original pattern vectors in either a linear or a nonlinear
fashion. By this approach, the number of features used in
classification can be substantially reduced.
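A rough sketch of performing selection, extraction, and classifier training simultaneously (a deliberately tiny GA; the population size, operators, and leave-one-out 1-NN fitness are our illustrative choices, not the paper's configuration): each chromosome is a vector of non-negative feature weights, fitness is 1-NN accuracy on the weighted features, and weights driven toward zero effectively deselect the corresponding features.

```python
import math
import random

def loo_1nn_accuracy(weights, data):
    """Leave-one-out 1-NN accuracy on feature vectors scaled by `weights`."""
    def d(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    hits = 0
    for i, (x, label) in enumerate(data):
        nearest = min(
            (other for j, other in enumerate(data) if j != i),
            key=lambda o: d(x, o[0]),
        )
        hits += nearest[1] == label
    return hits / len(data)

def ga_feature_weights(data, n_features, pop=12, gens=15, seed=1):
    """Tiny GA: truncation selection, uniform crossover, gaussian mutation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_features)] for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population, key=lambda w: -loo_1nn_accuracy(w, data))
        parents = scored[: pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            j = rng.randrange(n_features)      # mutate one gene, clamp at zero
            child[j] = max(0.0, child[j] + rng.gauss(0, 0.3))
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: loo_1nn_accuracy(w, data))
```

Because the fitness function is the classifier's own accuracy, selection, scaling, and training are coupled in one search, which is the point the abstract makes.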
Abstract: On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from a spectral database can be used to estimate unmeasured product properties and to monitor the operation of the process. These techniques are based on looking for similar spectra with nearest-neighbor algorithms and distance-based searching methods. Searching for nearest neighbors in the spectral space is computationally expensive: the complexity grows with the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time, some kind of indexing can be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirements of estimation algorithms by providing a two-dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of the spectral database not only supports the application of distance- and similarity-based techniques but also enables the use of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means that prediction need not work in the high-dimensional space but can be based on the mapped space as well. The results illustrate that the proposed method is able to segment (cluster) spectral databases and to detect outliers that are not suitable for instance-based learning algorithms.
Abstract: Developing a stable early warning system (EWS)
model that is capable of giving accurate predictions is a challenging
task. This paper introduces the k-nearest neighbour (k-NN) method,
which has never before been applied to predicting currency crises,
with the aim of increasing prediction accuracy. The performance of
the proposed k-NN depends on the choice of distance; in our analysis,
we consider the Euclidean and Manhattan distances. For comparison,
we employ three other methods: logistic regression analysis (logit),
back-propagation neural network (NN) and sequential minimal
optimization (SMO). The
analysis using datasets from 8 countries and 13 macro-economic
indicators for each country shows that the proposed k-NN method
with k = 4 and Manhattan distance performs better than the other
methods.
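The two distances the abstract compares can be sketched as follows; the indicator vectors and crisis labels below are invented placeholders, not the paper's data:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(x, train, k=4, distance=manhattan):
    """Majority vote among the k nearest labelled observations."""
    nearest = sorted(train, key=lambda t: distance(x, t[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Swapping `distance=euclidean` for `distance=manhattan` is the only change needed to reproduce the comparison the abstract describes.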
Abstract: The recognition of handwritten numerals is an
important area of research owing to its applications in post offices,
banks and other organizations. This paper presents automatic
recognition of handwritten Kannada numerals based on structural
features. Five different types of features, namely profile-based
10-segment string, water reservoir, vertical and horizontal strokes,
end points, and average boundary length from the minimal bounding
box, are used in the recognition of numerals. The effect of each
feature and their
combination in the numeral classification is analyzed using nearest
neighbor classifiers. It is common to combine multiple categories of
features into a single feature vector for the classification. Instead,
separate classifiers can be used to classify based on each visual
feature individually and the final classification can be obtained based
on the combination of separate base classification results. One
popular approach is to combine the classifier results into a feature
vector and leave the decision to a next-level classifier. This method
is extended here to extract richer information, a possibility
distribution, from the base classifiers to resolve conflicts among the
classification results. We use the fuzzy k-Nearest Neighbor (fuzzy
k-NN) as the base classifier for the individual feature sets, whose
results together form the feature vector for the final k-Nearest
Neighbor (k-NN) classifier. Testing is done using different features,
individually and in combination, on a database containing 1600
samples of different numerals and the results are compared with the
results of different existing methods.
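A minimal sketch of the fuzzy k-NN base-classifier step, assuming Keller-style inverse-distance weighting and crisp training labels (the feature vectors below are placeholders): each class receives a membership score, and the resulting possibility distribution per feature set is what the second-stage classifier consumes.

```python
import math

def fuzzy_knn_memberships(x, train, k=3, m=2.0):
    """Return a class -> membership map: each of the k nearest neighbours
    contributes its class weighted by 1 / d^(2/(m-1)); memberships sum to 1."""
    nearest = sorted(train, key=lambda t: math.dist(x, t[0]))[:k]
    weights = {}
    total = 0.0
    for vec, label in nearest:
        # small epsilon guards against division by zero on exact matches
        w = 1.0 / (math.dist(x, vec) ** (2.0 / (m - 1.0)) + 1e-9)
        weights[label] = weights.get(label, 0.0) + w
        total += w
    return {label: w / total for label, w in weights.items()}
```

Unlike a hard vote, the returned distribution keeps information about how close the decision was, which is what lets the final classifier resolve conflicts between feature sets.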
Abstract: High-dimensional biological datasets, such as those from gene expression, proteomic, and metabolic experiments, can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and to separate predefined classes, such as patients with a specific disease versus healthy controls. However, most of the existing research focuses on a specific dataset only. There is a lack of generic comparisons between classifiers that might provide a guideline for biologists or bioinformaticians in selecting the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, on mock datasets. We mimic common biological scenarios by simulating various proportions of truly discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC than the other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, depend strongly on the proportion of discriminators and perform better when the number of discriminators is higher.
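The AUC used to compare the classifiers can be computed directly from scores via the Mann-Whitney formulation; a minimal sketch (the scores are placeholders for any classifier's outputs, not the study's results):

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive scores above a random negative,
    counting ties as 1/2 (Mann-Whitney formulation of the ROC AUC)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 1.0 means perfect separation of the two classes, while 0.5 is chance level, which is why it is a convenient yardstick across very different classifiers.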