Novelty as a Measure of Interestingness in Knowledge Discovery

Rule discovery is an important technique for mining knowledge from large databases. Using objective measures to discover interesting rules leads to another data mining problem, although one of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study the novelty of discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify the novelty of discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.

Clustering Protein Sequences with Tailored General Regression Model Technique

Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relations are valuable in rule discovery for given data, for example "if value X goes up by 1, value Y will go down by 3". Classical linear regression models the linear relation between two sequences well. However, if we need to cluster a large repository of protein sequences into groups whose sequences have strong linear relationships with each other, it is prohibitively expensive to compare the sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to handle the problem of clustering linearly related sequences. GRMT gives a measure, GR*, that indicates the degree of linearity of multiple sequences without having to compare each pair of them.
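The pairwise comparison that the abstract describes as too expensive at scale can be illustrated with ordinary least squares on two sequences. The sketch below is not GRMT or GR*, whose definitions are not given here. It only shows the classical two-sequence case, and it assumes the sequences have already been encoded as equal-length lists of numbers. The function names `slope` and `r_squared` are illustrative.

```python
def _centered_sums(x, y):
    """Return (Sxx, Syy, Sxy) for two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def slope(x, y):
    """Least-squares slope b of the line y = a + b*x."""
    sxx, _, sxy = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    return sxy / sxx


def r_squared(x, y):
    """Coefficient of determination: 1.0 means a perfectly linear relation."""
    sxx, syy, sxy = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined linear fit")
    return (sxy * sxy) / (sxx * syy)


# The example from the text: X goes up by 1, Y goes down by 3.
x = [1, 2, 3, 4]
y = [10, 7, 4, 1]
b = slope(x, y)        # -3.0
r2 = r_squared(x, y)   # 1.0
```

Applying this to every pair in a repository of N sequences requires on the order of N squared fits, which is the cost that a multi-sequence measure aims to avoid.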

Genetic Programming Approach to Hierarchical Production Rule Discovery

Automated discovery of hierarchical structures in large data sets has been an active research area in the recent past. This paper focuses on the issue of mining generalized rules with a crisp hierarchical structure using a Genetic Programming (GP) approach to knowledge discovery. The post-processing scheme presented in this work uses flat rules as the initial individuals of GP and discovers a hierarchical structure. Suitable genetic operators are proposed for the suggested encoding. Based on the Subsumption Matrix (SM), an appropriate fitness function is suggested. Finally, Hierarchical Production Rules (HPRs) are generated from the discovered hierarchy. Experimental results are presented to demonstrate the performance of the proposed algorithm.

A Hybrid Approach for Quantification of Novelty in Rule Discovery

Rule discovery is an important technique for mining knowledge from large databases. Using objective measures to discover interesting rules leads to another data mining problem, although one of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study the novelty of discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses objective and subjective measures to quantify the novelty of discovered rules in terms of their deviations from the known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are quite promising.
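The idea of scoring a discovered rule by its deviation from known rules, and then categorizing it against a user-specified threshold, can be sketched as follows. This is not the paper's measure, which the abstract does not define. It is a minimal illustration that assumes a rule is a pair of item sets (antecedent, consequent) and uses Jaccard distance as a stand-in deviation. The names `rule_deviation`, `novelty`, and `categorize`, and the two category labels, are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def rule_deviation(rule, known):
    """Assumed deviation: mean of antecedent and consequent distances."""
    return 0.5 * (jaccard_distance(rule[0], known[0])
                  + jaccard_distance(rule[1], known[1]))


def novelty(rule, known_rules):
    """Deviation from the closest known rule; 1.0 if nothing is known."""
    if not known_rules:
        return 1.0
    return min(rule_deviation(rule, k) for k in known_rules)


def categorize(rule, known_rules, threshold):
    """Label a rule using a user-specified novelty threshold in [0, 1]."""
    return "novel" if novelty(rule, known_rules) >= threshold else "known"


# Hypothetical rules, written as (antecedent items, consequent items).
known_rules = [({"age=young", "income=high"}, {"buys=yes"})]
same = ({"age=young", "income=high"}, {"buys=yes"})
general = ({"age=young"}, {"buys=yes"})
unrelated = ({"student=yes"}, {"buys=no"})

scores = [novelty(r, known_rules) for r in (same, general, unrelated)]
labels = [categorize(r, known_rules, 0.5) for r in (same, general, unrelated)]
```

Here an identical rule scores 0.0, a generalization of a known rule scores 0.25, and a rule that shares no items with any known rule scores 1.0.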

Discovery of Production Rules with Fuzzy Hierarchy

In this paper a novel algorithm is proposed that integrates the processes of fuzzy hierarchy generation and rule discovery for automated discovery of Production Rules with Fuzzy Hierarchy (PRFH) in large databases. The concept of a frequency matrix (Freq) is introduced to summarize a large database; it helps to minimize the number of database accesses and to identify and remove irrelevant attribute values and weak classes during fuzzy hierarchy generation. Experimental results have established the effectiveness of the proposed algorithm.
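The abstract does not give the layout of Freq. One plausible reading, assumed here purely for illustration, is a table of co-occurrence counts between attribute values and class labels that is built in a single pass over the data, so that later steps can consult the counts instead of the database. The function name `frequency_matrix` and the record format (one dict per row) are assumptions.

```python
from collections import Counter


def frequency_matrix(records, class_key):
    """Count how often each (attribute, value) occurs with each class.

    records   -- iterable of dicts, one per database row (assumed format)
    class_key -- name of the field that holds the class label
    Returns a Counter keyed by (attribute, value, class_label).
    """
    freq = Counter()
    for row in records:              # single pass over the data
        label = row[class_key]
        for attr, value in row.items():
            if attr != class_key:
                freq[(attr, value, label)] += 1
    return freq


# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
freq = frequency_matrix(rows, "play")
```

With such counts in memory, attribute values that almost never occur with a class can be spotted without re-reading the source rows.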

Analyzing the Relation of Community Group for Research Paper Bookmarking by Using Association Rule

Searching the internet is currently very popular, especially in academia. The huge volume of educational information, such as research papers, overloads users. Community-based web sites have therefore been developed to help users search for information more easily by customizing a web site to the needs of each specific user or set of users. In this paper we propose using association rules to analyze community groups in research paper bookmarking. A set of design goals for community group frameworks is developed and discussed. Additionally, we use association rule discovery to analyze the initial relations between the antecedent and the consequent of rules in the user groups, in order to generate ideas for improving the ranking of search results and developing a recommender system.
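The relation between the antecedent and the consequent of an association rule is usually measured with support and confidence. The sketch below applies these standard definitions to bookmarking data, under the assumption that each user's bookmarks form one transaction (a set of topic tags). The tags and the function names are illustrative and do not come from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0  # the rule never applies in this data
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / sup_a


# Hypothetical bookmark sets, one per user in a community group.
bookmarks = [
    {"data mining", "clustering"},
    {"data mining", "clustering", "fuzzy logic"},
    {"data mining"},
    {"fuzzy logic"},
]
sup = support(bookmarks, {"data mining", "clustering"})         # 2 of 4 users
conf = confidence(bookmarks, {"data mining"}, {"clustering"})   # 2 of 3 users
```

A rule such as "users who bookmark data mining papers also bookmark clustering papers" with high confidence is the kind of relation that could feed a ranking or recommendation step.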