Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Outlier detection in streaming data is challenging because a stream cannot be scanned multiple times and new concepts may keep evolving. Irrelevant (noisy) attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. The scheme is based on clustering, an unsupervised data mining task that does not require labeled data; density-based and partitioning clustering are combined for outlier detection. Partitioning clustering is also used to assign adaptive weights to attributes according to their relevance. Weighted attributes help to reduce or remove the effect of noisy attributes. In view of the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real-world data sets show that the proposed approach outperforms an existing approach (CORM) in terms of outlier detection rate and false alarm rate, and as the percentage of outliers increases.

A Review on Soft Computing Technique in Intrusion Detection System

An intrusion detection system (IDS) is a significant component of network security. It detects and identifies intrusion behavior or intrusion attempts in a computer system by monitoring and analyzing network packets in real time. In recent years, intelligent algorithms applied in IDSs have attracted increasing attention with the rapid growth of network security concerns. IDS data are voluminous and contain irrelevant and redundant features, which cause slow training and testing, higher resource consumption, and poor detection rates. Since the amount of audit data that an IDS needs to examine is very large even for a small network, classification by hand is impossible. Hence, the primary objective of this review is to survey the techniques applied prior to the classification process that suit IDS data.

Mining Network Data for Intrusion Detection through Naïve Bayesian with Clustering

Network security attacks are violations of information security policy that have received much attention from the computational intelligence community in recent decades. Data mining has become a very useful technique for detecting network intrusions by extracting useful knowledge from large volumes of network data or logs. The naïve Bayesian classifier is one of the most popular data mining algorithms for classification, and it provides an optimal way to predict the class of an unknown example. It has been observed that a single set of probabilities derived from the data is not sufficient to achieve a good classification rate. In this paper, we propose a new learning algorithm for mining network logs to detect network intrusions through a naïve Bayesian classifier, which first clusters the network logs into several groups based on the similarity of the logs, and then calculates the prior and conditional probabilities for each group of logs. To classify a new log, the algorithm determines which cluster the log belongs to and then uses that cluster's probability set to classify it. We tested the performance of the proposed algorithm on the KDD99 benchmark network intrusion detection dataset, and the experimental results showed that it improves detection rates and reduces false positives for different types of network intrusions.
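
The cluster-then-classify idea described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the Hamming-distance assignment to fixed cluster centers, the Laplace smoothing, and all function names (`nearest_cluster`, `train_nb`, `fit_clustered`, `classify`) are assumptions made for the example, and every cluster is assumed to receive at least one training record.

```python
import math
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of attribute positions in which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_cluster(record, centers):
    """Index of the center closest to the record (assumed similarity measure)."""
    return min(range(len(centers)), key=lambda i: hamming(record, centers[i]))


def train_nb(rows, labels):
    """Collect the counts needed for one naive Bayes probability set."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    distinct = defaultdict(set)          # attribute index -> values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            distinct[j].add(value)
    return class_counts, value_counts, distinct


def predict_nb(model, row):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for j, value in enumerate(row):
            seen = value_counts[(j, label)][value]
            # Laplace smoothing so unseen values do not zero the product.
            score += math.log((seen + 1) / (count + len(distinct[j])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def fit_clustered(rows, labels, centers):
    """Group training records by nearest center and train one model per group."""
    groups = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        c = nearest_cluster(row, centers)
        groups[c][0].append(row)
        groups[c][1].append(label)
    return {c: train_nb(r, l) for c, (r, l) in groups.items()}


def classify(models, centers, row):
    """Pick the record's cluster, then use that cluster's probability set."""
    return predict_nb(models[nearest_cluster(row, centers)], row)
```

Each cluster keeps its own priors and conditional probabilities, so a new log is scored only against statistics from similar logs.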

Vehicle Detection Method using Haar-like Feature on Real Time System

This paper presents a robust vehicle detection approach using Haar-like features. A strong edge feature can be obtained from the Haar-like feature, which makes it very effective at removing the shadow of a vehicle on the road and allows the boundary of vehicles to be detected accurately. The vehicle detection algorithm is divided into two main steps: hypothesis generation and hypothesis verification. In the first step, vehicle candidates are determined using features such as shadow, intensity, and vertical edges. In the second step, each candidate is verified as a vehicle or not by using the symmetry of vehicle edge features. In our experiments, detection runs at over 15 frames per second on our embedded system.
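
For readers unfamiliar with Haar-like features, the following sketch shows how a two-rectangle edge feature is commonly evaluated with an integral image. It illustrates the general technique only; the function names and the choice of a horizontal (upper minus lower) feature are assumptions, not details taken from the paper.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels inside a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_horizontal_edge(ii, top, left, height, width):
    """Two-rectangle feature: upper half minus lower half of the window."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return upper - lower
```

A large positive response indicates a bright region above a dark one, such as road surface above the shadow under a vehicle.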

A New Face Detection Technique using 2D DCT and Self Organizing Feature Map

This paper presents a new technique for detection of human faces within color images. The approach relies on image segmentation based on skin color, features extracted from the two-dimensional discrete cosine transform (DCT), and self-organizing maps (SOM). After candidate skin regions are extracted, feature vectors are constructed using DCT coefficients computed from those regions. A supervised SOM training session is used to cluster feature vectors into groups, and to assign "face" or "non-face" labels to those clusters. Evaluation was performed using a new image database of 286 images, containing 1027 faces. After training, our detection technique achieved a detection rate of 77.94% during subsequent tests, with a false positive rate of 5.14%. To our knowledge, the proposed technique is the first to combine DCT-based feature extraction with a SOM for detecting human faces within color images. It is also one of a few attempts to combine a feature-invariant approach, such as color-based skin segmentation, together with appearance-based face detection. The main advantage of the new technique is its low computational requirements, in terms of both processing speed and memory utilization.
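
The DCT-based feature extraction step can be illustrated with a direct implementation of the orthonormal 2D DCT-II on a square block. This is a sketch only: the abstract does not state which coefficients are used, so keeping the `k` x `k` low-frequency corner (the hypothetical `low_freq_features` helper) is an assumption.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block (direct O(n^4) evaluation)."""
    n = len(block)

    def alpha(k):
        return math.sqrt((1 if k == 0 else 2) / n)

    # cos_table[u][x] = cos((2x + 1) * u * pi / (2n))
    cos_table = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
                 for u in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = sum(block[x][y] * cos_table[u][x] * cos_table[v][y]
                        for x in range(n) for y in range(n))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def low_freq_features(block, k=3):
    """Feature vector from the k x k lowest-frequency coefficients (assumed choice)."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]
```

For a flat block, all energy lands in the first (DC) coefficient, which is why a handful of low-frequency coefficients can summarize a smooth skin region compactly.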

A Computer Aided Detection (CAD) System for Microcalcifications in Mammograms - MammoScan μCaD

Clusters of microcalcifications in mammograms are an important sign of breast cancer. This paper presents a complete Computer Aided Detection (CAD) scheme for automatic detection of clustered microcalcifications in digital mammograms. The proposed system, MammoScan μCaD, consists of three main steps. First, all potential microcalcifications are detected using a feature extraction method, VarMet, and adaptive thresholding. This step also produces a number of false detections. The goal of the second step, Classifier level 1, is to remove everything but microcalcifications. The last step, Classifier level 2, uses learned dictionaries and sparse representations as a texture classification technique to distinguish single, benign microcalcifications from clustered microcalcifications, and to remove some of the remaining false detections. The system is trained and tested on true digital data from Stavanger University Hospital, and the results are evaluated by radiologists. The overall results are promising, with a sensitivity > 90 % and a low false detection rate (approximately 1 unwanted detection per image, or 0.3 false detections per image).

A Unified Robust Algorithm for Detection of Human and Non-human Object in Intelligent Safety Application

This paper presents a general trainable framework for fast and robust detection and verification of upright human faces and non-human objects in static images. To enhance the performance of the detection process, the technique we develop is based on the combination of a fast neural network (FNN) and a classical neural network (CNN). In the FNN, a correlation between the input image and the weights of the hidden neurons is exploited to sustain a high level of detection accuracy. This enables the use of the Fourier transform, which significantly speeds up detection. The CNN is responsible for verifying the face region. A bootstrap algorithm is used to collect non-human objects by adding false detections to the training process for human and non-human objects. Experimental results on test images with both simple and complex backgrounds demonstrate that the proposed method achieves a high detection rate and a low false positive rate in detecting both human faces and non-human objects.

Night-Time Traffic Light Detection Based On SVM with Geometric Moment Features

This paper presents an effective traffic light detection method for the night-time. First, candidate blobs of traffic lights are extracted from the RGB color image. The input image is represented in the dominant color domain using the color transform proposed by Ruta, and the red- and green-dominant regions are then selected as candidates. After candidate blob selection, a shape filter is applied for noise reduction using blob information such as length, area, area of the bounding box, etc. A multi-class classifier based on SVM (Support Vector Machine) is applied to the candidates. Three kinds of features are used. We use basic features such as blob width, height, center coordinate, and area. Brightness-based stochastic features are also used. In particular, geometric moment values between the candidate region and the adjacent region are proposed and used to improve the detection performance. The proposed system is implemented on an Intel Core CPU at 2.80 GHz with 4 GB RAM and tested with urban and rural road videos. In the tests, we show that the proposed method using PF, BMF, and GMF reaches a detection rate of up to 93 % with an average computation time of 15 ms/frame.
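
A shape filter of the kind mentioned above might look like the sketch below. The abstract names only the blob measurements involved (length, area, bounding-box area), so the `Blob` type, the specific tests (area range, aspect ratio, fill ratio), and every threshold value here are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    width: int   # bounding-box width in pixels (at least 1)
    height: int  # bounding-box height in pixels (at least 1)
    area: int    # number of foreground pixels in the blob


def passes_shape_filter(blob, min_area=20, max_area=2000,
                        max_aspect=1.5, min_fill=0.6):
    """Keep roughly round, compact blobs; all thresholds are assumed values."""
    if not (min_area <= blob.area <= max_area):
        return False
    aspect = max(blob.width, blob.height) / min(blob.width, blob.height)
    fill = blob.area / (blob.width * blob.height)  # blob area / bounding-box area
    return aspect <= max_aspect and fill >= min_fill
```

A filled disc covers about 79 % of its bounding box, so a fill-ratio test of this form tends to keep lamp-like blobs and reject elongated or ragged regions.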

Research on Hybrid Neural Network in Intrusion Detection System

This paper presents an intrusion detection system built on a hybrid neural network model based on RBF and Elman networks. It is used for both anomaly detection and misuse detection. The model has a memory function, so it can effectively detect discrete and related aggressive behavior. The RBF network is a real-time pattern classifier, and the Elman network provides memory of former events. The intrusion detection system based on the hybrid model is evaluated on the DARPA data set, and ROC curves are used to display the test results intuitively. The experiments show that this hybrid-model intrusion detection system can effectively improve the detection rate and reduce the false alarm and failure rates.
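
As background on how an ROC curve relates detection rate to false alarm rate, the sketch below computes ROC points from detector scores. It is a generic illustration under the assumption that each record has a numeric anomaly score (higher means more suspicious) and a binary label; the function names are hypothetical, and the input is assumed to contain at least one attack and one normal record.

```python
def roc_points(scores, labels):
    """Return (false alarm rate, detection rate) pairs; labels: 1 = attack, 0 = normal."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Sweep the decision threshold from the highest score downwards.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Each point corresponds to one threshold, so the curve shows the whole trade-off between catching attacks and raising false alarms rather than a single operating point.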

Lung Nodule Detection in CT Scans

In this paper we describe a computer-aided diagnosis (CAD) system for automated detection of pulmonary nodules in computed-tomography (CT) images. After extracting the pulmonary parenchyma using a combination of image processing techniques, a region growing method is applied to detect nodules based on 3D geometric features. We applied the CAD system to CT scans collected in a screening program for lung cancer detection. Each scan consists of a sequence of about 300 slices stored in DICOM (Digital Imaging and Communications in Medicine) format. All malignant nodules were detected and a low false-positive detection rate was achieved.
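
The region growing step can be illustrated with a basic 3D flood fill over a stack of slices. This is a sketch under stated assumptions: the abstract says growth is based on 3D geometric features, whereas this example uses a simple intensity range (`low`, `high`) and 6-connectivity as stand-in criteria, and `region_grow` is a hypothetical name.

```python
from collections import deque


def region_grow(volume, seed, low, high):
    """Grow a 6-connected region from seed over voxels with low <= value <= high.

    volume is indexed as volume[z][y][x]; seed is a (z, y, x) tuple.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    z, y, x = seed
    if not (low <= volume[z][y][x] <= high):
        return set()
    region = {seed}
    queue = deque([seed])
    neighbours = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in neighbours:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            if inside and (nz, ny, nx) not in region \
                    and low <= volume[nz][ny][nx] <= high:
                region.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return region
```

Because growth crosses slice boundaries, a grown region can then be measured as a 3D object (for example by voxel count or extent along each axis) before deciding whether it is a nodule candidate.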

A Fast Sign Localization System Using Discriminative Color Invariant Segmentation

Building intelligent traffic guide systems has recently been a subject of interest. A good system should be able to observe all important visual information in order to analyze the context of the scene. To do so, signs in general, and traffic signs in particular, are usually taken into account, as they provide rich information to these systems. Therefore, many researchers have put effort into the field of sign recognition. Sign localization, or sign detection, is the most important step in the sign recognition process. This step filters out non-informative areas in the scene and locates candidates for later steps. In this paper, we apply a new approach to detecting sign locations using a new color invariant model. Experiments are carried out with different datasets introduced in other works whose authors reported difficulty in detecting signs under unfavorable imaging conditions. Our method is simple and fast and, most importantly, gives a high detection rate in locating signs.

Fragile Watermarking for Color Images Using Thresholding Technique

In this paper, we propose a block-wise watermarking scheme for color image authentication to resist malicious tampering of digital media. A thresholding technique is incorporated into the scheme so that the tampered region of the color image can be recovered with high quality while the verification result is obtained. The watermark for each block consists of its dual authentication data and the corresponding feature information. The feature information for recovery is computed by the thresholding technique. In the verification process, we propose a dual-option parity check method to verify the validity of image blocks. In the recovery process, the feature information of each block embedded in the color image is rebuilt for high-quality recovery. The simulation results show that the proposed watermarking scheme can effectively identify the tampered region with a high detection rate and can recover the tampered region with high quality.

New Features for Specific JPEG Steganalysis

We present in this paper a new approach for specific JPEG steganalysis and propose studying statistics of the compressed DCT coefficients. Traditionally, steganographic algorithms try to preserve statistics of the DCT and of the spatial domain, but they cannot preserve both and also control the alteration of the compressed data. We have noticed a deviation of the entropy of the compressed data after a first embedding. This deviation is greater when the image is a cover medium than when the image is a stego image. To observe this deviation, we identified new statistical features and combined them with the Multiple Embedding Method. This approach is motivated by the Avalanche Criterion of the JPEG lossless compression step. This criterion makes possible the design of detectors whose detection rates are independent of the payload. Finally, we designed a Fisher discriminant based classifier for well-known steganographic algorithms: Outguess, F5, and Hide and Seek. The experimental results we obtained show the efficiency of our classifier for these algorithms. Moreover, it is also designed to work with low embedding rates (< 10^-5), and according to the avalanche criterion of the RLE and Huffman compression step, its efficiency is independent of the quantity of hidden information.

Attacks Classification in Adaptive Intrusion Detection using Decision Tree

Recently, information security has become a key issue in information technology as computer systems are exposed to an increasing number of security threats. Over recent decades, a variety of intrusion detection systems (IDS) have been employed to protect computers and networks from malicious network-based or host-based attacks, using approaches ranging from traditional statistical methods to new data mining techniques. However, today's commercially available intrusion detection systems are signature-based and are not capable of detecting unknown attacks. In this paper, we present a new learning algorithm for anomaly-based network intrusion detection using a decision tree algorithm that distinguishes attacks from normal behaviors and identifies different types of intrusions. Experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that the proposed learning algorithm achieved a 98% detection rate (DR) in comparison with other existing methods.

Face Detection using Variance based Haar-Like feature and SVM

This paper proposes a new approach to real-time face detection. The proposed method combines the primitive Haar-Like feature with a variance value to construct a new feature, the so-called Variance based Haar-Like feature. With this new feature, a face in an image can be represented with a small number of features. We used SVM instead of AdaBoost for training and classification. We built a database containing 5,000 face samples and 10,000 non-face samples extracted from real images for learning purposes. The 5,000 face samples include many images captured under varying lighting conditions. Experiments showed that a face detection system using the Variance based Haar-Like feature and SVM can be much more efficient than one using the primitive Haar-Like feature and AdaBoost. We tested our method on two face databases and one non-face database. We obtained a correct detection rate of 96.17% on the YaleB face database, which is 4.21% higher than that obtained using the primitive Haar-Like feature and AdaBoost.

A Novel Approach towards Segmentation of Breast Tumors from Screening Mammograms for Efficient Decision Support System

This paper presents a novel approach to finding a priori interesting regions in mammograms. In order to delineate the regions of interest (ROIs) that appear prominent in mammograms, we propose a topographic representation called the iso-level contour map, which consists of iso-level contours at multiple intensity levels, together with thresholding-based region segmentation. The simulation results indicate that the computed boundary gives a detection accuracy of 99.5%.

Analysis of Testing and Operational Software Reliability in SRGM based on NHPP

Software reliability is one of the key factors in the software development process. It is estimated using reliability models based on the Non-Homogeneous Poisson Process (NHPP). In most of the literature, software reliability is predicted only for the testing phase, which can lead to wrong decisions. In this paper, two software reliability concepts, testing reliability and operational reliability, are studied in detail. Using the S-shaped Software Reliability Growth Model (SRGM) and the exponential SRGM, the testing and operational reliability values are obtained. Finally, the two reliability values are compared and the optimal release time is investigated.

Unsupervised Clustering Methods for Identifying Rare Events in Anomaly Detection

Increasing detection rates and reducing false positive rates are important problems in Intrusion Detection Systems (IDS). Although preventative techniques such as access control and authentication attempt to keep intruders out, these can fail, and intrusion detection has been introduced as a second line of defence. Rare events are events that occur very infrequently; detecting them is a common problem in many domains. In this paper we propose an intrusion detection method that combines rough sets and fuzzy clustering. Rough sets are used to decrease the amount of data and remove redundancy. Fuzzy c-means clustering allows objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also suspicious activity that may be the result of a new, unknown attack. The experimental results on the Knowledge Discovery and Data Mining (KDD Cup 1999) dataset show that the method is efficient and practical for intrusion detection systems.
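
The idea of graded membership in fuzzy c-means can be illustrated with the standard membership and center update formulas. This is a sketch of the textbook algorithm, not of the paper's combined rough-set method; the function names and the default fuzzifier `m = 2.0` are assumptions.

```python
import math


def memberships(point, centers, m=2.0):
    """Degrees of membership of one point in each cluster (they sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0 for d in dists]
    if any(zeros):
        # The point coincides with a center: give it full membership there.
        return [1 / sum(zeros) if z else 0.0 for z in zeros]
    power = 2 / (m - 1)
    return [1 / sum((di / dk) ** power for dk in dists) for di in dists]


def update_centers(points, u, m=2.0):
    """Recompute centers as membership-weighted means; u[i][j] is point i, cluster j."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                             for d in range(dim)))
    return centers
```

Alternating these two steps until the memberships stop changing gives the usual fuzzy c-means iteration; a record with no dominant membership in any cluster is a natural candidate for closer inspection.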

Scaling up Detection Rates and Reducing False Positives in Intrusion Detection using NBTree

In this paper, we present a new learning algorithm for anomaly-based network intrusion detection using an improved self-adaptive naïve Bayesian tree (NBTree), which induces a hybrid of a decision tree and a naïve Bayesian classifier. The proposed approach scales up balanced detection for different attack types and keeps false positives at an acceptable level. On complex and dynamic large intrusion detection datasets, the detection accuracy of the naïve Bayesian classifier does not scale up as well as that of the decision tree. It has been successfully shown in other problem domains that the naïve Bayesian tree improves classification rates on large datasets. In a naïve Bayesian tree, internal nodes split as in regular decision trees, but the leaves contain naïve Bayesian classifiers. The experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that this new approach scales up the detection rates for different attack types and reduces false positives in network intrusion detection.

Effective Traffic Lights Recognition Method for Real Time Driving Assistance System in the Daytime

This paper presents an effective traffic light recognition method for the daytime. First, the Potential Traffic Lights Detector (PTLD) uses the full color information of the YCbCr image to produce one binary image for green traffic lights and one for red traffic lights. After the PTLD step, a Shape Filter (SF) is used to remove noise such as traffic signs, street trees, vehicles, and buildings. The noise removal criteria are based on blob information from the binary image: length, area, area of the bounding box, etc. Finally, after an intermediate association step whose goal is to define relevant candidate regions from the previously detected traffic lights, the Adaptive Multi-class Classifier (AMC) is executed. The classification method uses Haar-like features and the AdaBoost algorithm. The method was implemented on an Intel Core CPU at 2.80 GHz with 4 GB RAM and tested on urban and rural roads. In the tests, we compared our method with standard object-recognition learning processes and showed that it reaches a detection rate of up to 94 %, which is better than the results achieved with cascade classifiers. The computation time of the proposed method is 15 ms.
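
The first step, producing red and green binary images from YCbCr data, might be sketched as follows. The RGB to YCbCr conversion uses the full-range BT.601 coefficients familiar from JPEG/JFIF; the paper's actual PTLD rules are not given in the abstract, so thresholding only the Cr channel and the threshold values `red_cr_min` and `green_cr_max` are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion (as used in JPEG/JFIF) for 8-bit RGB."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def color_masks(image, red_cr_min=170, green_cr_max=90):
    """Return (red_mask, green_mask) as nested lists of 0/1.

    image is a list of rows of (r, g, b) tuples; thresholds are assumed values.
    """
    red_mask, green_mask = [], []
    for row in image:
        red_row, green_row = [], []
        for r, g, b in row:
            _, _, cr = rgb_to_ycbcr(r, g, b)
            red_row.append(1 if cr >= red_cr_min else 0)      # strongly red chroma
            green_row.append(1 if cr <= green_cr_max else 0)  # strongly green chroma
        red_mask.append(red_row)
        green_mask.append(green_row)
    return red_mask, green_mask
```

Gray and black pixels have Cr close to 128, so they fall into neither mask; connected regions in each mask would then be passed to the shape filter.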