Predicting Application Layer DDoS Attacks Using Machine Learning Algorithms

A Distributed Denial of Service (DDoS) attack is a major threat to cyber security. It originates from the network layer or the application layer of compromised/attacker systems which are connected to the network. The impact of this attack ranges from the simple inconvenience to use a particular service to causing major failures at the targeted server. When there is heavy traffic flow to a target server, it is necessary to classify the legitimate access and attacks. In this paper, a novel method is proposed to detect DDoS attacks from the traces of traffic flow. An access matrix is created from the traces. As the access matrix is multi dimensional, Principle Component Analysis (PCA) is used to reduce the attributes used for detection. Two classifiers Naive Bayes and K-Nearest neighborhood are used to classify the traffic as normal or abnormal. The performance of the classifier with PCA selected attributes and actual attributes of access matrix is compared by the detection rate and False Positive Rate (FPR).

Statistical Measures and Optimization Algorithms for Gene Selection in Lung and Ovarian Tumor

Microarray technology is universally used in the study of disease diagnosis using gene expression levels. The main shortcoming of gene expression data is that it includes thousands of genes and a small number of samples. Abundant methods and techniques have been proposed for tumor classification using microarray gene expression data. Feature or gene selection methods can be used to mine the genes that directly involve in the classification and to eliminate irrelevant genes. In this paper statistical measures like T-Statistics, Signal-to-Noise Ratio (SNR) and F-Statistics are used to rank the genes. The ranked genes are used for further classification. Particle Swarm Optimization (PSO) algorithm and Shuffled Frog Leaping (SFL) algorithm are used to find the significant genes from the top-m ranked genes. The Naïve Bayes Classifier (NBC) is used to classify the samples based on the significant genes. The proposed work is applied on Lung and Ovarian datasets. The experimental results show that the proposed method achieves 100% accuracy in all the three datasets and the results are compared with previous works.

DWT Based Image Steganalysis

‘Steganalysis’ is one of the challenging and attractive interests for the researchers with the development of information hiding techniques. It is the procedure to detect the hidden information from the stego created by known steganographic algorithm. In this paper, a novel feature based image steganalysis technique is proposed. Various statistical moments have been used along with some similarity metric. The proposed steganalysis technique has been designed based on transformation in four wavelet domains, which include Haar, Daubechies, Symlets and Biorthogonal. Each domain is being subjected to various classifiers, namely K-nearest-neighbor, K* Classifier, Locally weighted learning, Naive Bayes classifier, Neural networks, Decision trees and Support vector machines. The experiments are performed on a large set of pictures which are available freely in image database. The system also predicts the different message length definitions.

Comparing SVM and Naïve Bayes Classifier for Automatic Microaneurysm Detections

Diabetic retinopathy is characterized by the development of retinal microaneurysms. The damage can be prevented if disease is treated in its early stages. In this paper, we are comparing Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for automatic microaneurysm detection in images acquired through non-dilated pupils. The Nearest Neighbor classifier is used as a baseline for comparison. Detected microaneurysms are validated with expert ophthalmologists’ hand-drawn ground-truths. The sensitivity, specificity, precision and accuracy of each method are also compared.

Automatic Microaneurysm Quantification for Diabetic Retinopathy Screening

Microaneurysm is a key indicator of diabetic retinopathy that can potentially cause damage to retina. Early detection and automatic quantification are the keys to prevent further damage. In this paper, which focuses on automatic microaneurysm detection in images acquired through non-dilated pupils, we present a series of experiments on feature selection and automatic microaneurysm pixel classification. We found that the best feature set is a combination of 10 features: the pixel-s intensity of shade corrected image, the pixel hue, the standard deviation of shade corrected image, DoG4, the area of the candidate MA, the perimeter of the candidate MA, the eccentricity of the candidate MA, the circularity of the candidate MA, the mean intensity of the candidate MA on shade corrected image and the ratio of the major axis length and minor length of the candidate MA. The overall sensitivity, specificity, precision, and accuracy are 84.82%, 99.99%, 89.01%, and 99.99%, respectively.

Improving Classification in Bayesian Networks using Structural Learning

Naïve Bayes classifiers are simple probabilistic classifiers. Classification extracts patterns by using data file with a set of labeled training examples and is currently one of the most significant areas in data mining. However, Naïve Bayes assumes the independence among the features. Structural learning among the features thus helps in the classification problem. In this study, the use of structural learning in Bayesian Network is proposed to be applied where there are relationships between the features when using the Naïve Bayes. The improvement in the classification using structural learning is shown if there exist relationship between the features or when they are not independent.

Optimizing Mobile Agents Migration Based on Decision Tree Learning

Mobile agents are a powerful approach to develop distributed systems since they migrate to hosts on which they have the resources to execute individual tasks. In a dynamic environment like a peer-to-peer network, Agents have to be generated frequently and dispatched to the network. Thus they will certainly consume a certain amount of bandwidth of each link in the network if there are too many agents migration through one or several links at the same time, they will introduce too much transferring overhead to the links eventually, these links will be busy and indirectly block the network traffic, therefore, there is a need of developing routing algorithms that consider about traffic load. In this paper we seek to create cooperation between a probabilistic manner according to the quality measure of the network traffic situation and the agent's migration decision making to the next hop based on decision tree learning algorithms.

Analysis of a Population of Diabetic Patients Databases with Classifiers

Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.

Detecting Email Forgery using Random Forests and Naïve Bayes Classifiers

As emails communications have no consistent authentication procedure to ensure the authenticity, we present an investigation analysis approach for detecting forged emails based on Random Forests and Naïve Bays classifiers. Instead of investigating the email headers, we use the body content to extract a unique writing style for all the possible suspects. Our approach consists of four main steps: (1) The cybercrime investigator extract different effective features including structural, lexical, linguistic, and syntactic evidence from previous emails for all the possible suspects, (2) The extracted features vectors are normalized to increase the accuracy rate. (3) The normalized features are then used to train the learning engine, (4) upon receiving the anonymous email (M); we apply the feature extraction process to produce a feature vector. Finally, using the machine learning classifiers the email is assigned to one of the suspects- whose writing style closely matches M. Experimental results on real data sets show the improved performance of the proposed method and the ability of identifying the authors with a very limited number of features.