A Hybrid Approach for Quantification of Novelty in Rule Discovery

Rule discovery is an important technique for mining knowledge from large databases. Using objective measures to discover interesting rules leads to a second data mining problem, albeit one of reduced complexity. Data mining researchers have therefore studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study the novelty of discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses both objective and subjective measures to quantify the novelty of discovered rules in terms of their deviations from the known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.
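
To make the idea of deviation-based novelty concrete, the following is a minimal sketch and not the framework proposed in this paper. It assumes that a rule is a pair of item sets (antecedent, consequent), that deviation is the average Jaccard distance of the two parts, and that novelty is the smallest deviation from any known rule. The function names, the equal weighting, and the category labels "known", "similar", and "novel" are all hypothetical choices made for illustration.

```python
from typing import FrozenSet, List, Tuple

# Assumption: a rule is (antecedent items, consequent items).
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets have distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def deviation(r1: Rule, r2: Rule) -> float:
    """Hypothetical deviation: mean distance of antecedents and consequents."""
    return 0.5 * jaccard_distance(r1[0], r2[0]) + 0.5 * jaccard_distance(r1[1], r2[1])


def novelty(rule: Rule, known: List[Rule]) -> float:
    """Novelty is the deviation from the closest known rule (1.0 if none are known)."""
    if not known:
        return 1.0
    return min(deviation(rule, k) for k in known)


def categorize(rule: Rule, known: List[Rule], threshold: float) -> str:
    """Label a discovered rule using a user-specified novelty threshold."""
    score = novelty(rule, known)
    if score == 0.0:
        return "known"
    return "novel" if score >= threshold else "similar"


known_rules: List[Rule] = [
    (frozenset({"age>60", "smoker"}), frozenset({"risk=high"})),
]
discovered: List[Rule] = [
    (frozenset({"age>60", "smoker"}), frozenset({"risk=high"})),  # identical to a known rule
    (frozenset({"age>60"}), frozenset({"risk=high"})),            # drops one condition
    (frozenset({"income=low"}), frozenset({"default=yes"})),      # shares nothing
]
labels = [categorize(r, known_rules, threshold=0.5) for r in discovered]
print(labels)
```

In this sketch the deviation is computed objectively from the rules themselves, while the threshold and the set of known rules carry the user's subjective input. Raising the threshold moves more rules from "novel" to "similar".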




