DIFFER: A Propositionalization approach for Learning from Structured Data
Logic based methods for learning from structured data
is limited w.r.t. handling large search spaces, preventing large-sized
substructures from being considered by the resulting classifiers. A
novel approach to learning from structured data is introduced that
employs a structure transformation method, called finger printing, for
addressing these limitations. The method, which generates features
corresponding to arbitrarily complex substructures, is implemented in
a system, called DIFFER. The method is demonstrated to perform
comparably to an existing state-of-art method on some benchmark
data sets without requiring restrictions on the search space.
Furthermore, learning from the union of features generated by finger
printing and the previous method outperforms learning from each
individual set of features on all benchmark data sets, demonstrating
the benefit of developing complementary, rather than competing,
methods for structure classification.
[1] Page, D. and Srinivasan, A., (2003), ILP: A Short Look back and a
Longer Look Forward, Journal of machine learning research,
4(Aug):415-430.
[2] Quinlan, J. R., Cameron-Jones, R. M., (1993), FOIL, Proceedings of the
6th ECML, Lecture Notes in AI, Vol. 667, 3-20. Springer-Verlag
[3] Muggleton S.H. and Feng C. (1990). Efficient induction of logic
programs, Proceedings of the First Conference on Algorithmic Learning
Theory, Tokyo.
[4] Srinivasan A,. King, R.D, and Muggleton S, (1999), The role of
background knowledge: using a problem from chemistry to examine the
performance of an ILP program, Technical Report PRG-TR-08-99,
Oxford University.
[5] Krogel, M-A., Rawles, S., Železn├¢, F., Flach, P. A., Lavra─ì, N., and
Wrobel, S., (2003), Comparative evaluation of approaches to
propositionalization, Proc.of the 13th International Conference on ILP,
Lecture Notes in CS, 197-214.
[6] Lavrac, N. and Flach P., (2000), "An extended transformation approach
to Inductive Logic Programming", University publication, University of
Bristol.
[7] Nattee, C., Sinthupinyo, S., Numao, M., Okada, T., (2005), Inductive
Logic Programming for Structure-Activity Relationship Studies on
Large Scale Data, SAINT Workshops 2005: 332-335.
[8] Inokuchi, A., Washio, T., and Motoda, H., (2003), Complete mining of
frequent patterns from graphs, Mining graph data, Machine Learning,
50:321-354.
[9] Debnath, A.K. Lopez de Compadre, R.L., Debnath, G., Shusterman,
A.J., and Hansch, C. (1991), Structure-activity relationship of mutagenic
aromatic and heteroaromatic nitro compounds: Correlation with
molecular orbital energies and hydrophobicity, Journal Med. Chem.
34:786-797.
[10] Michie,D., Muggleton,S., Page,D., and Srinivasan,A., (1994), "To the
international computing community: A new East-West challenge"
Oxford University Computing laboratory, Oxford, UK.
[11] US National Toxicology program,
http://ntp.niehs.nih.gov/index.cfm?objectid=3 2BA9724-F1F6-975E-
7FCE50709CB4C932
[12] Pearce D.A. (1988). The induction of fault diagnosis systems from
qualitative models, Proceedings 7th National Conference on AI, Saint
Paul, Minnesota.
[13] Ian H. Witten and Eibe Frank (2005) "Data Mining: Practical machine
learning tools and techniques", 2nd Edition, Morgan Kaufmann, San
Francisco, 2005.
[14] Gonzalez, J., Holder, L. B. and Cook, D. J. (2001), "Application of
Graph-Based Concept Learning to the Predictive Toxicology Domain",
Proceedings of the Predictive Toxicology Challenge Workshop.
[15] Bringmann, B., and Zimmermann, A., (2005), "Tree - Decision Trees for
Tree Structured Data", Proceedings of PKDD 2005, LNAI 3721, pp. 46-
58, Springer.
[1] Page, D. and Srinivasan, A., (2003), ILP: A Short Look back and a
Longer Look Forward, Journal of machine learning research,
4(Aug):415-430.
[2] Quinlan, J. R., Cameron-Jones, R. M., (1993), FOIL, Proceedings of the
6th ECML, Lecture Notes in AI, Vol. 667, 3-20. Springer-Verlag
[3] Muggleton S.H. and Feng C. (1990). Efficient induction of logic
programs, Proceedings of the First Conference on Algorithmic Learning
Theory, Tokyo.
[4] Srinivasan A,. King, R.D, and Muggleton S, (1999), The role of
background knowledge: using a problem from chemistry to examine the
performance of an ILP program, Technical Report PRG-TR-08-99,
Oxford University.
[5] Krogel, M-A., Rawles, S., Železn├¢, F., Flach, P. A., Lavra─ì, N., and
Wrobel, S., (2003), Comparative evaluation of approaches to
propositionalization, Proc.of the 13th International Conference on ILP,
Lecture Notes in CS, 197-214.
[6] Lavrac, N. and Flach P., (2000), "An extended transformation approach
to Inductive Logic Programming", University publication, University of
Bristol.
[7] Nattee, C., Sinthupinyo, S., Numao, M., Okada, T., (2005), Inductive
Logic Programming for Structure-Activity Relationship Studies on
Large Scale Data, SAINT Workshops 2005: 332-335.
[8] Inokuchi, A., Washio, T., and Motoda, H., (2003), Complete mining of
frequent patterns from graphs, Mining graph data, Machine Learning,
50:321-354.
[9] Debnath, A.K. Lopez de Compadre, R.L., Debnath, G., Shusterman,
A.J., and Hansch, C. (1991), Structure-activity relationship of mutagenic
aromatic and heteroaromatic nitro compounds: Correlation with
molecular orbital energies and hydrophobicity, Journal Med. Chem.
34:786-797.
[10] Michie,D., Muggleton,S., Page,D., and Srinivasan,A., (1994), "To the
international computing community: A new East-West challenge"
Oxford University Computing laboratory, Oxford, UK.
[11] US National Toxicology program,
http://ntp.niehs.nih.gov/index.cfm?objectid=3 2BA9724-F1F6-975E-
7FCE50709CB4C932
[12] Pearce D.A. (1988). The induction of fault diagnosis systems from
qualitative models, Proceedings 7th National Conference on AI, Saint
Paul, Minnesota.
[13] Ian H. Witten and Eibe Frank (2005) "Data Mining: Practical machine
learning tools and techniques", 2nd Edition, Morgan Kaufmann, San
Francisco, 2005.
[14] Gonzalez, J., Holder, L. B. and Cook, D. J. (2001), "Application of
Graph-Based Concept Learning to the Predictive Toxicology Domain",
Proceedings of the Predictive Toxicology Challenge Workshop.
[15] Bringmann, B., and Zimmermann, A., (2005), "Tree - Decision Trees for
Tree Structured Data", Proceedings of PKDD 2005, LNAI 3721, pp. 46-
58, Springer.
@article{"International Journal of Information, Control and Computer Sciences:59913", author = "Thashmee Karunaratne and Henrik Böstrom", title = "DIFFER: A Propositionalization approach for Learning from Structured Data", abstract = "Logic based methods for learning from structured data
is limited w.r.t. handling large search spaces, preventing large-sized
substructures from being considered by the resulting classifiers. A
novel approach to learning from structured data is introduced that
employs a structure transformation method, called finger printing, for
addressing these limitations. The method, which generates features
corresponding to arbitrarily complex substructures, is implemented in
a system, called DIFFER. The method is demonstrated to perform
comparably to an existing state-of-art method on some benchmark
data sets without requiring restrictions on the search space.
Furthermore, learning from the union of features generated by finger
printing and the previous method outperforms learning from each
individual set of features on all benchmark data sets, demonstrating
the benefit of developing complementary, rather than competing,
methods for structure classification.", keywords = "Machine learning, Structure classification,
Propositionalization.", volume = "2", number = "3", pages = "872-3", }