Unsupervised Clustering Methods for Identifying Rare Events in Anomaly Detection

Increasing detection rates and reducing false positive rates are important problems in Intrusion Detection Systems (IDS). Although preventative techniques such as access control and authentication attempt to keep intruders out, these can fail, and intrusion detection has been introduced as a second line of defence. Rare events are events that occur very infrequently, and their detection is a common problem in many domains. In this paper we propose an intrusion detection method that combines rough sets and fuzzy clustering. Rough set theory is used to reduce the amount of data and remove redundancy. Fuzzy c-means clustering allows objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us not only to recognize known attacks but also to detect suspicious activity that may be the result of a new, unknown attack. The experimental results on the Knowledge Discovery and Data Mining (KDD Cup 1999) dataset show that the method is efficient and practical for intrusion detection systems.
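
The fuzzy membership idea mentioned above can be illustrated with a minimal sketch of the standard fuzzy c-means updates. This is not the authors' implementation: the function names, the fuzzifier value m = 2 and the toy data are assumptions made for illustration, and the rough-set reduction step is not shown.

```python
import math

def memberships(points, centers, m=2.0):
    """Return u[k][i], the degree to which point k belongs to cluster i."""
    exponent = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centres: share membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            result.append([h / sum(hits) for h in hits])
            continue
        # Standard FCM membership: 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
        result.append([1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                       for d_i in dists])
    return result

def update_centers(points, u, m=2.0):
    """Recompute each centre as the mean of all points weighted by u^m."""
    dim = len(points[0])
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

# Hypothetical toy data: two tight groups plus one point lying between them.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
              (5.0, 5.0), (5.1, 4.9), (2.5, 2.5)]
final_centers, final_u = fuzzy_c_means(toy_points, [(0.0, 0.0), (5.0, 5.0)])
```

Each row of `final_u` sums to one. The point lying between the two groups receives a split membership rather than a hard label; how such memberships are turned into an intrusion decision is not specified in the abstract.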

Walsh-Hadamard Transform for Facial Feature Extraction in Face Recognition

This paper proposes a new facial feature extraction approach based on the Walsh-Hadamard Transform (WHT). The approach is based on the correlation between local pixels of the face image. Its primary advantage is the simplicity of its computation. The paper compares the proposed approach, WHT, which has traditionally been used in data compression, with two other known approaches, Principal Component Analysis (PCA) and the Discrete Cosine Transform (DCT), using the face database of the Olivetti Research Laboratory (ORL). In spite of its simple computation, the proposed algorithm (WHT) gave results very close to those obtained by PCA and DCT. This paper initiates research into WHT and the family of frequency transforms, and examines their suitability for feature extraction in face recognition applications.
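
The computational simplicity claimed above comes from the fact that the transform needs only additions and subtractions. The following is a minimal sketch of the standard fast Walsh-Hadamard transform in natural (Hadamard) order, without normalisation; the function name `fwht` is chosen for illustration, and the way the paper selects coefficients as features is not shown.

```python
def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform of a power-of-two-length sequence."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data

signal = [1, 0, 1, 0, 0, 1, 1, 0]
coefficients = fwht(signal)  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying `fwht` twice returns the input scaled by its length, so the same routine also serves as the inverse transform up to a factor of 1/n.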

Modelling and Analyzing a Hospital Procedure using a Petri-Net Approach

Hierarchical high-level Petri nets (HHPNs) with time extensions are a useful tool for modelling systems in a variety of application domains, ranging from logistics to complex workflows. This paper addresses an application domain that is receiving more and more attention: the procedure that arranges the final inpatient charge in the payment office, and its management. We show that Petri-net-based analysis can reduce the delays in the procedure, so that inpatient charges become more reliable and are produced on time.

Sounds Alike Name Matching for Myanmar Language

A personal name matching system is at the core of essential tasks in national citizen databases, text and web mining, information retrieval, online library systems, e-commerce and record linkage systems. This has necessitated extensive research in the area of name matching. Traditional name matching methods are suitable for English and other Latin-based languages. Asian languages that have no word boundaries, such as the Myanmar language, still require a sounds-alike matching system for Unicode-based applications. Hence we propose a matching algorithm that derives analogous sounds-alike (phonetic) patterns suited to Myanmar character spelling. Given the nature of Myanmar characters, we consider word boundary fragmentation and character collation. We therefore use a pattern conversion algorithm that builds word patterns from the fragmented and collated characters. We create Myanmar sounds-alike phonetic groups to help in the phonetic matching. The experimental results show a fragmentation accuracy of 99.32% and a processing time of 1.72 ms.

DODR: Delay On-Demand Routing

Originally designed for wired networks, the TCP (Transmission Control Protocol) congestion control mechanism is triggered when packet loss is detected. The implicit assumption that packet loss is mostly due to network congestion does not hold well in a Mobile Ad Hoc Network (MANET), where there is a comparatively high likelihood of packet loss due to channel errors, node mobility and similar causes. Such non-congestion packet loss, when handled by the congestion control mechanism, causes poor TCP performance in MANETs. In this study, we continue to investigate the impact of the interaction between transport protocols and on-demand routing protocols on the performance and stability of 802.11 multihop networks. We evaluate the important wireless networking events that cause routing changes, and propose a cross-layer method to delay unnecessary routing changes. The method only requires adding a sensitivity parameter α, which represents the on-demand routing protocol's reaction to link failures at the MAC layer. Our proposal is applicable to the plain 802.11 networking environment. The simulation results show that this method can remarkably improve the stability and performance of TCP without any modification to the TCP and MAC protocols.

A Fault Tolerant Token-based Algorithm for Group Mutual Exclusion in Distributed Systems

The group mutual exclusion (GME) problem is a variant of the mutual exclusion problem. In the present paper, a token-based group mutual exclusion algorithm capable of handling transient faults is proposed. The algorithm uses the concept of dynamic request sets. A timeout mechanism is used to detect token loss, and a distributed scheme is used to regenerate the token. The worst-case message complexity of the algorithm is n+1. The maximum concurrency and forum switch complexity of the algorithm are n and min(n, m) respectively, where n is the number of processes and m is the number of groups. The algorithm also satisfies another desirable property called smooth admission. The scheme can also be adapted to handle the extended group mutual exclusion problem.

A Modified Cross Correlation in the Frequency Domain for Fast Pattern Detection Using Neural Networks

Recently, neural networks have shown good results in detecting a given pattern in an image. In our previous papers [1-5], a fast algorithm for pattern detection using neural networks was presented. This algorithm was based on cross correlation in the frequency domain between the input image and the weights of the neural networks. The image was converted into a symmetric shape so that the fast neural networks give the same results as conventional neural networks. Another symmetry configuration was suggested in [3,4] to improve the speed-up ratio. In this paper, our previous algorithm for fast neural networks is developed further. The frequency domain cross correlation is modified in order to compensate for the symmetry condition required of the input image. Two new ideas are introduced to modify the cross correlation algorithm. Both methods accelerate the fast neural networks, as there is no longer any need to convert the input image into a symmetric one. Theoretical and practical results show that both approaches provide a higher speed-up ratio than the previous algorithm.

An Optimal Feature Subset Selection for Leaf Analysis

This paper describes an optimal approach to feature subset selection for classifying leaves, based on a Genetic Algorithm (GA) and Kernel-Based Principal Component Analysis (KPCA). Because selecting the optimal features is highly complex, classification has become a critical task in analysing leaf image data. Initially, shape, texture and colour features are extracted from the leaf images. These extracted features are optimized separately by the GA and by KPCA. The approach then performs an intersection operation over the subsets obtained from the optimization process. Finally, the most common matching subset is used to train a Support Vector Machine (SVM). Our experimental results show that applying GA and KPCA for feature subset selection, with an SVM as the classifier, is computationally effective and improves the accuracy of the classifier.

Array Data Transformation for Source Code Obfuscation

Obfuscation is a low-cost software protection methodology for preventing reverse engineering and re-engineering of applications. Source code obfuscation aims to obscure the source code in order to hide its functionality. This paper proposes an array data transformation for obfuscating source code that uses arrays. Applications that use the proposed data structures force the programmer to obscure the logic manually. This makes the resulting obscured code hard to reverse engineer and also protects its functionality.

Detecting Interactions between Behavioral Requirements with OWL and SWRL

High-quality requirements analysis is one of the most crucial activities for ensuring the success of a software project. Requirements verification for software systems is therefore becoming more and more important in Requirements Engineering (RE), and it is one of the most helpful strategies for improving the quality of a software system. Related work shows that requirements elicitation and analysis can be facilitated by ontological approaches and semantic web technologies. In this paper, we propose a hybrid method that aims to verify requirements with structural and formal semantics in order to detect interactions. The proposed method is twofold: one part models requirements with the semantic web language OWL to construct a semantic context; the other is a set of interaction detection rules that are derived from scenario-based analysis and represented in the Semantic Web Rule Language (SWRL). The SWRL-based rules work with rule engines such as Jess to reason over the semantic context for requirements and thus detect interactions. The benefits of the proposed method lie in three aspects: (i) it provides systematic steps for modeling requirements with an ontological approach, (ii) it offers synergy between requirements elicitation and domain engineering for knowledge sharing, and (iii) its rules can systematically assist in requirements interaction detection.

Wood Species Recognition System

The proposed system identifies the species of a wood using the textural features present in its bark. Each species of wood has its own unique patterns in its bark, which enables the proposed system to identify it accurately. Automatic wood recognition has not yet been well established, mainly due to the lack of research in this area and the difficulty of obtaining a wood database. In our work, a wood recognition system has been designed based on pre-processing techniques, feature extraction and correlation of the features of the wood species for their classification. Texture classification is a problem that has been studied and tested using different methods because of its value in various pattern recognition problems, such as wood recognition and rock classification. The most popular technique used for textural classification is the Gray-level Co-occurrence Matrix (GLCM). The features extracted from the enhanced images using the GLCM are correlated, which determines the classification among the various wood species. The results obtained show a high rate of recognition accuracy, indicating that the techniques used are suitable for commercial implementation.
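
As an illustration of the GLCM mentioned above, the sketch below counts how often a pixel with gray level i has a neighbour with gray level j at a fixed offset, and then derives one common texture feature (contrast). The function names, the offset and the tiny 4-level image are assumptions made for illustration; the pre-processing and the feature correlation step used by the proposed system are not shown.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def normalise(counts):
    """Turn raw counts into joint probabilities that sum to one."""
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]

def contrast(p):
    """Contrast feature: sum over (i, j) of p[i][j] * (i - j)^2."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))

# Hypothetical 4x4 image quantised to 4 gray levels.
texture = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
horizontal = glcm(texture, levels=4)          # right-hand neighbour, not symmetrised
texture_contrast = contrast(normalise(horizontal))
```

For this image the horizontal matrix has 12 pixel pairs and the contrast evaluates to 7/12. Real bark images would typically be quantised to more gray levels and analysed at several offsets.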

Local Image Descriptor using VQ-SIFT for Image Retrieval

In this paper, we present a local image descriptor using VQ-SIFT for more effective and efficient image retrieval. Instead of SIFT's weighted orientation histograms, we apply a vector quantization (VQ) histogram as an alternative representation for SIFT features. Experimental results show that SIFT features using VQ-based local descriptors achieve better image retrieval accuracy than the conventional algorithm, while the computational cost is significantly reduced.
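
The general idea of a VQ histogram can be sketched as follows: each input vector is assigned to its nearest codeword in a codebook, and the normalised counts per codeword form the descriptor. This is only a generic illustration under stated assumptions; the codebook, the two-dimensional vectors and the function name `vq_histogram` are hypothetical, and the abstract does not say which vectors the authors quantise or how their codebook is trained.

```python
import math

def vq_histogram(vectors, codebook):
    """Normalised histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for v in vectors:
        # Index of the codeword closest to v in Euclidean distance.
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(v, codebook[i]))
        counts[nearest] += 1
    total = len(vectors)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]

# Hypothetical codebook and sample vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
samples = [(0.1, 0.1), (0.9, 0.1), (0.8, -0.1), (0.1, 0.9)]
descriptor = vq_histogram(samples, codebook)  # [0.25, 0.5, 0.25]
```

Two such histograms can then be compared with any histogram distance when ranking retrieval candidates.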

The Role of Contextual Ontologies in Enterprise Modeling

Information sharing and exchange, rather than information processing, is what characterizes information technology in the 21st century. Ontologies, as a shared common understanding, are gaining increasing attention, as they appear to be the most promising solution for enabling information sharing both at a semantic level and in a machine-processable way. Domain ontology-based modeling has been exploited to provide shareability and information exchange among the diversified, heterogeneous applications of enterprises. Contextual ontologies are “an explicit specification of contextual conceptualization”. That is, the ontology is characterized by concepts that have multiple representations and may exist in several contexts. Hence, contextual ontologies are a set of concepts and relationships that are seen from different perspectives. Contextualization allows ontologies to be partitioned according to their contexts. The need for contextual ontologies in enterprise modeling has become crucial due to the nature of today's competitive market. Information resources in an enterprise are distributed and diversified, and need to be shared and communicated locally through the intranet and globally through the internet. This paper discusses the roles that ontologies play in enterprise modeling, and how ontologies assist in building a conceptual model in order to provide communicative and interoperable information systems. The issue of enterprise modeling based on contextual domain ontology is also investigated, and a framework is proposed for an enterprise model that consists of various applications.

Adaptive Car Safety System

Car accidents are one of the major causes of death in many countries. In recent years, many researchers have attempted to design and develop techniques to increase car safety. In spite of all these efforts, it is still challenging to design a system that adapts to the driver rather than to the characteristics of the vehicle. This paper explains an adaptive car safety system that attempts to find a balance.

Binary Classification Tree with Tuned Observation-based Clustering

There are several approaches for handling multiclass classification. Aside from one-against-one (OAO) and one-against-all (OAA), hierarchical classification techniques are also commonly used. A binary classification tree is a hierarchical classification structure that breaks down a k-class problem into binary sub-problems, each solved by a binary classifier. In each node, a set of classes is divided into two subsets. A good class partition should group similar classes together. Many algorithms measure similarity in terms of the distance between class centroids. Classes are grouped together by a clustering algorithm when the distances between their centroids are small. In this paper, we present a binary classification tree with tuned observation-based clustering (BCT-TOB) that finds a class partition by performing clustering on observations instead of class centroids. A merging step is introduced to merge any insignificant class split. The experiment shows that the performance of BCT-TOB is comparable to that of other algorithms.

Designing Ontology-Based Knowledge Integration for Preprocessing of Medical Data in Enhancing a Machine Learning System for Coding Assignment of a Multi-Label Medical Text

This paper discusses the design of knowledge integration of clinical information extracted from distributed medical ontologies in order to improve a machine learning-based multi-label coding assignment system. The proposed approach is implemented using a decision tree machine learning technique on university hospital data for patients with Coronary Heart Disease (CHD). The preliminary results indicate that the use of medical ontologies improves the overall system performance.

Multiple-Level Sequential Pattern Discovery from Customer Transaction Databases

Mining sequential patterns from large customer transaction databases has been recognized as a key research topic in database systems. However, previous work has focused mostly on mining sequential patterns at a single concept level. In this study, we introduce concept hierarchies into this problem and present several algorithms for discovering multiple-level sequential patterns based on the hierarchies. An experiment was conducted to assess the performance of the proposed algorithms. The performance of the algorithms was measured by the relative time spent on completing the mining tasks on two different datasets. The experimental results showed that the performance depends on the characteristics of the datasets and on the pre-defined minimum support threshold for each level of the concept hierarchy. Based on the experimental results, some suggestions are also given on how to select an appropriate algorithm for a given dataset.

Network Based High Performance Computing

In the past few years there has been a change in the view of high performance applications and parallel computing. Initially, such applications were targeted at dedicated parallel machines. Recently, the trend has been shifting towards building meta-applications composed of several modules that exploit heterogeneous platforms and employ hybrid forms of parallelism. The aim of this paper is to propose a model of virtual parallel computing. The virtual parallel computing system provides a flexible object-oriented software framework that makes it easy for programmers to write various parallel applications.

Analyzing Methods of the Relation between Concepts based on a Concept Hierarchy

Data objects are usually organized hierarchically, and the relations between them are analyzed based on a corresponding concept hierarchy. The relation between data objects, for example how similar they are, is usually analyzed based on the conceptual distance in the hierarchy. If a node is an ancestor of another node, it is enough to analyze how close they are by calculating the vertical distance. However, if there is no such relation between two nodes, the vertical distance cannot express their relation explicitly. This paper tries to fill this gap by improving the hierarchy-based analysis method for data objects. The contributions of this paper include: (1) proposing an improved method to evaluate the vertical distance between concepts; (2) defining the concept of horizontal distance and a method to calculate it; and (3) discussing methods to confine a range by the horizontal distance and the vertical distance, and evaluating the relation between concepts.
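
The baseline case described above, where one node is an ancestor of the other, can be sketched as a simple edge count along the parent chain. This is only the plain vertical distance under assumed names (`vertical_distance`, a `parent` dictionary and a toy hierarchy); it is neither the improved vertical measure nor the horizontal distance that the paper defines.

```python
def vertical_distance(parent, ancestor, node):
    """Number of edges from node up to ancestor, or None if ancestor is not above node."""
    steps = 0
    current = node
    while current is not None:
        if current == ancestor:
            return steps
        current = parent.get(current)  # the root maps to None
        steps += 1
    return None

# Hypothetical concept hierarchy, stored as child -> parent.
parent = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}
```

Here `vertical_distance(parent, "animal", "dog")` is 2, while `vertical_distance(parent, "bird", "dog")` is `None`, which is exactly the situation in which a purely vertical measure says nothing about how the two concepts relate.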

Learning Classifier Systems Approach for Automated Discovery of Censored Production Rules

In the recent past, Learning Classifier Systems have been successfully used for data mining. A Learning Classifier System (LCS) is a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. An LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. All LCS models comprise, more or less, four main components: a finite population of condition–action rules, called classifiers; the performance component, which governs the interaction with the environment; the credit assignment component, which distributes the reward received from the environment to the classifiers accountable for the rewards obtained; and the discovery component, which is responsible for discovering better rules and improving existing ones through a genetic algorithm (GA). The concatenation of the production rules in the LCS forms the genotype, and therefore the GA operates on a population of classifier systems. This approach is known as the 'Pittsburgh' Classifier System. Other LCSs that perform their GA at the rule level within a population are known as 'Michigan' Classifier Systems. The most predominant representation of the discovered knowledge is the standard production rule (PR) of the form IF P THEN D. PRs, however, are unable to handle exceptions and do not exhibit variable precision. Censored Production Rules (CPRs), an extension of PRs proposed by Michalski and Winston, exhibit variable precision and support an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form IF P THEN D UNLESS C, where the censor C is an exception to the rule. Such rules are employed in situations in which the conditional statement IF P THEN D holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions when the resources needed to establish their presence are tight or when there is simply no information available as to whether they hold or not. Thus, the IF P THEN D part of a CPR expresses important information, while the UNLESS C part acts only as a switch that changes the polarity of D to ~D. In this paper, a Pittsburgh-style LCS approach is used for the automated discovery of CPRs. An appropriate encoding scheme is suggested to represent a chromosome consisting of a fixed-size set of CPRs. Suitable genetic operators are designed for the set of CPRs and for individual CPRs, and an appropriate fitness function is proposed that incorporates basic constraints on CPRs. Experimental results are presented to demonstrate the performance of the proposed learning classifier system.
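
The semantics of a single rule of the form IF P THEN D UNLESS C can be sketched as below. This illustrates only how one CPR is evaluated, including the option of skipping the censor when resources are tight; the class name `CensoredRule`, the `fire` method, the `check_censor` flag and the bird example are assumptions for illustration. The chromosome encoding, genetic operators and fitness function proposed in the paper are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

Facts = Mapping[str, bool]

@dataclass(frozen=True)
class CensoredRule:
    premise: Callable[[Facts], bool]            # P
    decision: str                               # D
    censor: Callable[[Facts], Optional[bool]]   # C; None means "unknown"

    def fire(self, facts: Facts, check_censor: bool = True) -> Optional[str]:
        """Return D, ~D, or None when the premise does not hold."""
        if not self.premise(facts):
            return None
        # The censor flips the decision only when it is checked and known to hold.
        if check_censor and self.censor(facts) is True:
            return "~" + self.decision
        return self.decision

# Hypothetical rule: IF bird THEN flies UNLESS penguin.
flies_rule = CensoredRule(
    premise=lambda facts: facts.get("bird", False),
    decision="flies",
    censor=lambda facts: facts.get("penguin"),
)
```

With this sketch, `flies_rule.fire({"bird": True})` yields `"flies"` because nothing is known about the censor, `flies_rule.fire({"bird": True, "penguin": True})` yields `"~flies"`, and passing `check_censor=False` ignores the exception altogether.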