Mining Frequent Patterns with Functional Programming

Frequent patterns are patterns such as sets of features or items that appear in data frequently. Finding such frequent patterns has become an important data mining task because it reveals associations, correlations, and many other interesting relationships hidden in a dataset. Most of the proposed frequent pattern mining algorithms have been implemented with imperative programming languages such as C, Cµ, Java. The imperative paradigm is significantly inefficient when itemset is large and the frequent pattern is long. We suggest a high-level declarative style of programming using a functional language. Our supposition is that the problem of frequent pattern discovery can be efficiently and concisely implemented via a functional paradigm since pattern matching is a fundamental feature supported by most functional languages. Our frequent pattern mining implementation using the Haskell language confirms our hypothesis about conciseness of the program. The performance studies on speed and memory usage support our intuition on efficiency of functional language.

Meta-Search in Human Resource Management

In the area of Human Resource Management, the trend is towards online exchange of information about human resources. For example, online applications for employment become standard and job offerings are posted in many job portals. However, there are too many job portals to monitor all of them if someone is interested in a new job. We developed a prototype for integrating information of different job portals into one meta-search engine. First, existing job portals were investigated and XML schema documents were derived automated from these portals. Second, translation rules for transforming each schema to a central HR-XML-conform schema were determined. The HR-XML-schema is used to build a form for searching jobs. The data supplied by a user in this form is now translated into queries for the different job portals. Each result obtained by a job portal is sent to the meta-search engine that ranks the result of all received job offers according to user's preferences.

An Improved Preprocessing for Biosonar Target Classification

An improved processing description to be employed in biosonar signal processing in a cochlea model is proposed and examined. It is compared to conventional models using a modified discrimination analysis and both are tested. Their performances are evaluated with echo data captured from natural targets (trees).Results indicate that the phase characteristics of low-pass filters employed in the echo processing have a significant effect on class separability for this data.

Simulation of Robotic Arm using Genetic Algorithm and AHP

In this paper, we have proposed a low cost optimized solution for the movement of a three-arm manipulator using Genetic Algorithm (GA) and Analytical Hierarchy Process (AHP). A scheme is given for optimizing the movement of robotic arm with the help of Genetic Algorithm so that the minimum energy consumption criteria can be achieved. As compared to Direct Kinematics, Inverse Kinematics evolved two solutions out of which the best-fit solution is selected with the help of Genetic Algorithm and is kept in search space for future use. The Inverse Kinematics, Fitness Value evaluation and Binary Encoding like tasks are simulated and tested. Although, three factors viz. Movement, Friction and Least Settling Time (or Min. Vibration) are used for finding the Fitness Function / Fitness Values, however some more factors can also be considered.

A Fuzzy Classifier with Evolutionary Design of Ellipsoidal Decision Regions

A fuzzy classifier using multiple ellipsoids approximating decision regions for classification is to be designed in this paper. An algorithm called Gustafson-Kessel algorithm (GKA) with an adaptive distance norm based on covariance matrices of prototype data points is adopted to learn the ellipsoids. GKA is able toadapt the distance norm to the underlying distribution of the prototypedata points except that the sizes of ellipsoids need to be determined a priori. To overcome GKA's inability to determine appropriate size ofellipsoid, the genetic algorithm (GA) is applied to learn the size ofellipsoid. With GA combined with GKA, it will be shown in this paper that the proposed method outperforms the benchmark algorithms as well as algorithms in the field.

On the Noise Distance in Robust Fuzzy C-Means

In the last decades, a number of robust fuzzy clustering algorithms have been proposed to partition data sets affected by noise and outliers. Robust fuzzy C-means (robust-FCM) is certainly one of the most known among these algorithms. In robust-FCM, noise is modeled as a separate cluster and is characterized by a prototype that has a constant distance δ from all data points. Distance δ determines the boundary of the noise cluster and therefore is a critical parameter of the algorithm. Though some approaches have been proposed to automatically determine the most suitable δ for the specific application, up to today an efficient and fully satisfactory solution does not exist. The aim of this paper is to propose a novel method to compute the optimal δ based on the analysis of the distribution of the percentage of objects assigned to the noise cluster in repeated executions of the robust-FCM with decreasing values of δ . The extremely encouraging results obtained on some data sets found in the literature are shown and discussed.

Modeling of Pulping of Sugar Maple Using Advanced Neural Network Learning

This paper reports work done to improve the modeling of complex processes when only small experimental data sets are available. Neural networks are used to capture the nonlinear underlying phenomena contained in the data set and to partly eliminate the burden of having to specify completely the structure of the model. Two different types of neural networks were used for the application of Pulping of Sugar Maple problem. A three layer feed forward neural networks, using the Preconditioned Conjugate Gradient (PCG) methods were used in this investigation. Preconditioning is a method to improve convergence by lowering the condition number and increasing the eigenvalues clustering. The idea is to solve the modified problem where M is a positive-definite preconditioner that is closely related to A. We mainly focused on Preconditioned Conjugate Gradient- based training methods which originated from optimization theory, namely Preconditioned Conjugate Gradient with Fletcher-Reeves Update (PCGF), Preconditioned Conjugate Gradient with Polak-Ribiere Update (PCGP) and Preconditioned Conjugate Gradient with Powell-Beale Restarts (PCGB). The behavior of the PCG methods in the simulations proved to be robust against phenomenon such as oscillations due to large step size.

Moving Data Mining Tools toward a Business Intelligence System

Data mining (DM) is the process of finding and extracting frequent patterns that can describe the data, or predict unknown or future values. These goals are achieved by using various learning algorithms. Each algorithm may produce a mining result completely different from the others. Some algorithms may find millions of patterns. It is thus the difficult job for data analysts to select appropriate models and interpret the discovered knowledge. In this paper, we describe a framework of an intelligent and complete data mining system called SUT-Miner. Our system is comprised of a full complement of major DM algorithms, pre-DM and post-DM functionalities. It is the post-DM packages that ease the DM deployment for business intelligence applications.

Software Test Data Generation using Ant Colony Optimization

State-based testing is frequently used in software testing. Test data generation is one of the key issues in software testing. A properly generated test suite may not only locate the errors in a software system, but also help in reducing the high cost associated with software testing. It is often desired that test data in the form of test sequences within a test suite can be automatically generated to achieve required test coverage. This paper proposes an Ant Colony Optimization approach to test data generation for the state-based software testing.

AudioMine: Medical Data Mining in Heterogeneous Audiology Records

We report on the results of a pilot study in which a data-mining tool was developed for mining audiology records. The records were heterogeneous in that they contained numeric, category and textual data. The tools developed are designed to observe associations between any field in the records and any other field. The techniques employed were the statistical chi-squared test, and the use of self-organizing maps, an unsupervised neural learning approach.

Sensing Pressure for Authentication System Using Keystroke Dynamics

In this paper, an authentication system using keystroke dynamics is presented. We introduced pressure sensing for the improvement of the accuracy of measurement and durability against intrusion using key-logger, and so on, however additional instrument is needed. As the result, it has been found that the pressure sensing is also effective for estimation of real moment of keystroke.

Combining ILP with Semi-supervised Learning for Web Page Categorization

This paper presents a semi-supervised learning algorithm called Iterative-Cross Training (ICT) to solve the Web pages classification problems. We apply Inductive logic programming (ILP) as a strong learner in ICT. The objective of this research is to evaluate the potential of the strong learner in order to boost the performance of the weak learner of ICT. We compare the result with the supervised Naive Bayes, which is the well-known algorithm for the text classification problem. The performance of our learning algorithm is also compare with other semi-supervised learning algorithms which are Co-Training and EM. The experimental results show that ICT algorithm outperforms those algorithms and the performance of the weak learner can be enhanced by ILP system.

Adaptive Car Safety System

Car accident is one of the major causes of death in many countries. Many researchers have attempted to design and develop techniques to increase car safety in the past recent years. In spite of all the efforts, it is still challenging to design a system adaptive to the driver rather than the automotive characteristics. In this paper, the adaptive car safety system is explained which attempts to find a balance.

A Unified Framework for a Robust Conflict-Free Robot Navigation

Many environment specific methods and systems for Robot Navigation exist. However vast strides in the evolution of navigation technologies and system techniques create the need for a general unified framework that is scalable, modular and dynamic. In this paper a Unified Framework for a Robust Conflict-free Robot Navigation System that can be used for either a structured or unstructured and indoor or outdoor environments has been proposed. The fundamental design aspects and implementation issues encountered during the development of the module are discussed. The results of the deployment of three major peripheral modules of the framework namely the GSM based communication module, GIS Module and GPS module are reported in this paper.

Self-Organization of Clusters Having Locally Distributed Patterns for Highly Synchronized Inputs

Many experimental results suggest that more precise spike timing is significant in neural information processing. We construct a self-organization model using the spatiotemporal pat-terns, where Spike-Timing Dependent Plasticity (STDP) tunes the conduction delays between neurons. We show that, for highly syn-chronized inputs, the fluctuation of conduction delays causes globally continuous and locally distributed firing patterns through the self-organization.

An Evaluation of Algorithms for Single-Echo Biosonar Target Classification

A recent neurospiking coding scheme for feature extraction from biosonar echoes of various plants is examined with avariety of stochastic classifiers. Feature vectors derived are employedin well-known stochastic classifiers, including nearest-neighborhood,single Gaussian and a Gaussian mixture with EM optimization.Classifiers' performances are evaluated by using cross-validation and bootstrapping techniques. It is shown that the various classifers perform equivalently and that the modified preprocessing configuration yields considerably improved results.