Genetic Programming Approach to Hierarchical Production Rule Discovery

Automated discovery of hierarchical structures in large data sets has been an active research area in recent years. This paper addresses the mining of generalized rules with a crisp hierarchical structure, using a Genetic Programming (GP) approach to knowledge discovery. The post-processing scheme presented here takes flat rules as the initial individuals of the GP and discovers the hierarchical structure among them. Suitable genetic operators are proposed for the suggested encoding, and an appropriate fitness function based on the Subsumption Matrix (SM) is suggested. Finally, Hierarchical Production Rules (HPRs) are generated from the discovered hierarchy. Experimental results are presented to demonstrate the performance of the proposed algorithm.
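
To make the idea of a Subsumption Matrix over flat rules concrete, the following is a minimal sketch and not the paper's method. The abstract does not define the SM, so the sketch assumes crisp subsumption by premise-set inclusion: a rule subsumes another when its premises are a proper subset of the other's. The rule set `FLAT_RULES` and the helpers `subsumption_matrix` and `direct_parents` are hypothetical names introduced only for illustration, and no GP search or fitness function is shown.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical flat rules: class label -> set of premise conditions.
# Illustrative data only; not taken from the paper.
FLAT_RULES: Dict[str, FrozenSet[str]] = {
    "animal": frozenset({"breathes"}),
    "bird": frozenset({"breathes", "has_feathers"}),
    "penguin": frozenset({"breathes", "has_feathers", "swims"}),
    "fish": frozenset({"breathes", "has_gills"}),
}


def subsumption_matrix(
    rules: Dict[str, FrozenSet[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Return rule names and a 0/1 matrix where sm[i][j] == 1 means
    rule i subsumes rule j (assumed here: premises of i are a proper
    subset of the premises of j)."""
    names = list(rules)
    sm = [[1 if rules[a] < rules[b] else 0 for b in names] for a in names]
    return names, sm


def direct_parents(
    rules: Dict[str, FrozenSet[str]],
) -> Dict[str, Optional[str]]:
    """For each rule, pick the most specific rule that subsumes it
    (the subsumer with the largest premise set), or None for a root."""
    names, sm = subsumption_matrix(rules)
    parents: Dict[str, Optional[str]] = {}
    for j, child in enumerate(names):
        subsumers = [names[i] for i in range(len(names)) if sm[i][j]]
        parents[child] = max(
            subsumers, key=lambda n: len(rules[n]), default=None
        )
    return parents


if __name__ == "__main__":
    names, sm = subsumption_matrix(FLAT_RULES)
    for name, row in zip(names, sm):
        print(name, row)
    print(direct_parents(FLAT_RULES))
```

In this toy setting the hierarchy can be read directly off the matrix. A GP-based approach such as the one summarized above would instead search over candidate hierarchies and score them with an SM-based fitness.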



