DIFFER: A Propositionalization approach for Learning from Structured Data

Logic based methods for learning from structured data is limited w.r.t. handling large search spaces, preventing large-sized substructures from being considered by the resulting classifiers. A novel approach to learning from structured data is introduced that employs a structure transformation method, called finger printing, for addressing these limitations. The method, which generates features corresponding to arbitrarily complex substructures, is implemented in a system, called DIFFER. The method is demonstrated to perform comparably to an existing state-of-art method on some benchmark data sets without requiring restrictions on the search space. Furthermore, learning from the union of features generated by finger printing and the previous method outperforms learning from each individual set of features on all benchmark data sets, demonstrating the benefit of developing complementary, rather than competing, methods for structure classification.




References:
[1] Page, D. and Srinivasan, A., (2003), ILP: A Short Look back and a
Longer Look Forward, Journal of machine learning research,
4(Aug):415-430.
[2] Quinlan, J. R., Cameron-Jones, R. M., (1993), FOIL, Proceedings of the
6th ECML, Lecture Notes in AI, Vol. 667, 3-20. Springer-Verlag
[3] Muggleton S.H. and Feng C. (1990). Efficient induction of logic
programs, Proceedings of the First Conference on Algorithmic Learning
Theory, Tokyo.
[4] Srinivasan A,. King, R.D, and Muggleton S, (1999), The role of
background knowledge: using a problem from chemistry to examine the
performance of an ILP program, Technical Report PRG-TR-08-99,
Oxford University.
[5] Krogel, M-A., Rawles, S., Železn├¢, F., Flach, P. A., Lavra─ì, N., and
Wrobel, S., (2003), Comparative evaluation of approaches to
propositionalization, Proc.of the 13th International Conference on ILP,
Lecture Notes in CS, 197-214.
[6] Lavrac, N. and Flach P., (2000), "An extended transformation approach
to Inductive Logic Programming", University publication, University of
Bristol.
[7] Nattee, C., Sinthupinyo, S., Numao, M., Okada, T., (2005), Inductive
Logic Programming for Structure-Activity Relationship Studies on
Large Scale Data, SAINT Workshops 2005: 332-335.
[8] Inokuchi, A., Washio, T., and Motoda, H., (2003), Complete mining of
frequent patterns from graphs, Mining graph data, Machine Learning,
50:321-354.
[9] Debnath, A.K. Lopez de Compadre, R.L., Debnath, G., Shusterman,
A.J., and Hansch, C. (1991), Structure-activity relationship of mutagenic
aromatic and heteroaromatic nitro compounds: Correlation with
molecular orbital energies and hydrophobicity, Journal Med. Chem.
34:786-797.
[10] Michie,D., Muggleton,S., Page,D., and Srinivasan,A., (1994), "To the
international computing community: A new East-West challenge"
Oxford University Computing laboratory, Oxford, UK.
[11] US National Toxicology program,
http://ntp.niehs.nih.gov/index.cfm?objectid=3 2BA9724-F1F6-975E-
7FCE50709CB4C932
[12] Pearce D.A. (1988). The induction of fault diagnosis systems from
qualitative models, Proceedings 7th National Conference on AI, Saint
Paul, Minnesota.
[13] Ian H. Witten and Eibe Frank (2005) "Data Mining: Practical machine
learning tools and techniques", 2nd Edition, Morgan Kaufmann, San
Francisco, 2005.
[14] Gonzalez, J., Holder, L. B. and Cook, D. J. (2001), "Application of
Graph-Based Concept Learning to the Predictive Toxicology Domain",
Proceedings of the Predictive Toxicology Challenge Workshop.
[15] Bringmann, B., and Zimmermann, A., (2005), "Tree - Decision Trees for
Tree Structured Data", Proceedings of PKDD 2005, LNAI 3721, pp. 46-
58, Springer.