A Cumulative Learning Approach to Data Mining Employing Censored Production Rules (CPRs)

Knowledge is indispensable, but voluminous knowledge becomes a bottleneck for efficient processing. A major challenge in data mining is that the mining process generates a large number of potential rules; in fact, the size of the result is sometimes comparable to that of the original data. Traditional pruning measures such as minimum support do not sufficiently reduce this huge rule space. Moreover, many practical applications are characterized by continual change in data and knowledge, so the knowledge grows more voluminous with each change. The most common representation of discovered knowledge is the standard Production Rule (PR) of the form If P Then D. Michalski & Winston [7] proposed Censored Production Rules (CPRs) as an extension of production rules that exhibit variable precision and support an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form If P Then D Unless C, where C (the censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. Using a rule of this type, we are free to ignore the exception condition when the resources needed to establish its presence are limited, or when there is simply no information available as to whether it holds. Thus the 'If P Then D' part of a CPR expresses the important information, while the 'Unless C' part acts only as a switch that changes the polarity of D to ~D. In this paper, a scheme based on a Dempster-Shafer Theory (DST) interpretation of a CPR is suggested for discovering CPRs from already discovered flat PRs. Discovering CPRs from flat rules would result in a considerable reduction in the number of already discovered rules. The proposed scheme incrementally incorporates new knowledge and also considerably reduces the size of the knowledge base with each episode. Examples are given to demonstrate the behaviour of the proposed scheme.
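The following Python sketch is illustrative only: it encodes the logical reading of a CPR described above (fire D when P holds, flip to ~D when the censor is established, and ignore the censor when its status is unknown), not the DST-based discovery scheme of this paper. As our own simplifying assumption, premises and censors are named atomic conditions, and facts are a dict mapping a condition to True or False, with a missing key meaning "unknown". The names `CensoredRule` and `conclude` are hypothetical, and the bird/penguin rule is the classic textbook illustration of an exception, not data from this paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class CensoredRule:
    """Hypothetical CPR representation: If premise Then decision Unless censor."""
    premise: FrozenSet[str]
    decision: str
    censor: str

    def conclude(self, facts: Dict[str, bool]) -> Optional[str]:
        """Return the decision, its negation, or None if the premise does not hold.

        `facts` maps a condition name to True/False; a missing key means unknown.
        """
        # The rule fires only when every premise condition is known to be true.
        if not all(facts.get(p) is True for p in self.premise):
            return None
        # The censor acts as a switch: once established, the polarity flips to ~D.
        if facts.get(self.censor) is True:
            return "~" + self.decision
        # Censor false or unknown (e.g. too costly to check): assume the usual case D.
        return self.decision


bird_rule = CensoredRule(frozenset({"bird"}), "flies", censor="penguin")
print(bird_rule.conclude({"bird": True}))                   # flies
print(bird_rule.conclude({"bird": True, "penguin": True}))  # ~flies
print(bird_rule.conclude({"fish": True}))                   # None
```

Treating an unknown censor like a false one is what gives the rule its variable precision: a reasoner short of time or data still gets the usually correct answer D, and only pays for checking C when precision matters.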
The suggested cumulative learning scheme would be useful in mining data streams.
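One way to picture how cumulative learning can shrink a rule base across episodes is a purely syntactic merge, sketched here in Python under our own simplifying assumptions; it does not implement the paper's DST-based scheme. A flat rule is a (premise, decision) pair, a decision is negated by a leading "~", and a flat rule 'If P and c Then ~D' is folded into 'If P Then D Unless c' whenever the two premises differ by exactly one condition c. The censors of one CPR are read disjunctively. The class `CumulativeCPRBase` and its methods are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A flat production rule: (premise conditions, decision); "~" marks a negated decision.
FlatRule = Tuple[FrozenSet[str], str]


def negate(decision: str) -> str:
    """Flip the polarity of a decision: D <-> ~D."""
    return decision[1:] if decision.startswith("~") else "~" + decision


class CumulativeCPRBase:
    """Hypothetical knowledge base that folds flat rules into CPRs episode by episode.

    Each entry maps (premise, decision) to a set of censors, read disjunctively:
    If P Then D Unless (c1 or c2 ...). An entry with no censors is a plain PR.
    """

    def __init__(self) -> None:
        self.rules: Dict[Tuple[FrozenSet[str], str], Set[str]] = {}

    def add_episode(self, flat_rules: List[FlatRule]) -> None:
        """Incorporate one batch (episode) of newly discovered flat rules."""
        for premise, decision in flat_rules:
            self._absorb(premise, decision)

    def _absorb(self, premise: FrozenSet[str], decision: str) -> None:
        # Case 1: the new rule is an exception to a stored, more general rule
        # If P Then D, i.e. its premise is P plus exactly one extra condition c.
        for (base, base_decision), censors in self.rules.items():
            extra = premise - base
            if base_decision == negate(decision) and base < premise and len(extra) == 1:
                censors.add(next(iter(extra)))  # c becomes (another) censor
                return
        # Case 2: the new rule is general; fold stored flat exceptions into its censors.
        censors = self.rules.setdefault((premise, decision), set())
        for other, other_decision in list(self.rules):
            extra = other - premise
            is_flat = not self.rules[(other, other_decision)]
            if other_decision == negate(decision) and premise < other and len(extra) == 1 and is_flat:
                censors.add(next(iter(extra)))
                del self.rules[(other, other_decision)]

    def size(self) -> int:
        return len(self.rules)

    def describe(self) -> List[str]:
        lines = []
        for (premise, decision), censors in sorted(
            self.rules.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])
        ):
            text = f"If {' and '.join(sorted(premise))} Then {decision}"
            if censors:
                text += f" Unless {' or '.join(sorted(censors))}"
            lines.append(text)
        return lines


kb = CumulativeCPRBase()
kb.add_episode([(frozenset({"bird", "penguin"}), "~flies"),
                (frozenset({"bird", "ostrich"}), "~flies")])
print(kb.size())  # 2: exceptions only, no general rule yet
kb.add_episode([(frozenset({"bird"}), "flies")])
kb.add_episode([(frozenset({"bird", "kiwi"}), "~flies")])
print(kb.size())  # 1
print(kb.describe())  # ['If bird Then flies Unless kiwi or ostrich or penguin']
```

Whichever order the general rule and its exceptions arrive in, each exception ends up as a censor of the general rule, so four flat rules collapse into one CPR. A realistic scheme would also have to decide, for instance from the strength of evidence behind each rule, whether an exception is rare enough to serve as a censor; this sketch accepts every exception unconditionally.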





References:
[1] Han, J., Kamber, M.: "Data Mining: Concepts and Techniques", Academic Press, 2001.
[2] Bharadwaj, K.K., Jain, N.K.: "Hierarchical Censored Production Rules (HCPRs) System", Data and Knowledge Engineering, vol. 8, North-Holland, 1992.
[3] Bharadwaj, K.K., Neerja, Goel, G.C.: "Hierarchical Censored Production Rules (HCPRs) Systems Employing the Dempster-Shafer Uncertainty Calculus", Information and Software Technology, Butterworth-Heinemann Ltd. (U.K.), vol. 36, pp. 155-164, 1994.
[4] Jain, N.K., Bharadwaj, K.K.: "Some Learning Techniques in Hierarchical Censored Production Rules (HCPRs) System", International Journal of Intelligent Systems, John Wiley & Sons, Inc., vol. 13, pp. 319-344, 1997.
[5] Quinlan, J.R.: "Induction of Decision Trees", Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[6] Adriaans, P., Zantinge, D.: "Data Mining", Addison-Wesley, 1999.
[7] Michalski, R.S., Winston, P.H.: "Variable Precision Logic", Artificial Intelligence, vol. 29, pp. 121-146, 1986.
[8] Jain, N.K., Bharadwaj, K.K., Marranghello, N.: "Extended Hierarchical Censored Production Rules System", Journal of Intelligent Systems, UK, vol. 9, no. 3-4, 1999.
[9] Ananthanarayana, V.S., Murty, M.N., Subramanian, D.K.: "Dynamic Data Mining", Proceedings of the International Conference KBCS-2002.
[10] Thrun, S., Faloutsos, C., Mitchell, T., Wasserman, L.: "Automated Learning and Discovery: State of the Art and Research Topics in a Rapidly Growing Field", CMU-CALD-98-100, September 1998.
[11] Michalski, R.S., Brazdil, P.: "Introduction", Special Issue on Multistrategy Learning, Machine Learning, vol. 50, pp. 219-222, 2003.
[12] Liu, B., Hu, M., Hsu, W.: "Intuitive Representation of Decision Trees Using General Rules and Exceptions", American Association for Artificial Intelligence (AAAI), 2000.
[13] Kasabov, N.K.: "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", The MIT Press, 2001.
[14] Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: "Models and Issues in Data Stream Systems", Proceedings of the 21st ACM Symposium on Principles of Database Systems (PODS 2002).
[15] Dong, G., Han, J., Lakshmanan, L.V.S., Pei, J., Wang, H., Yu, P.S.: "Online Mining of Changes from Data Streams: Research Problems and Preliminary Results", Proceedings of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams.