Model Discovery and Validation for the QSAR Problem using Association Rule Mining

There are several approaches to solving the Quantitative Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. The statistical methods include regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) and partial least squares. Predictive data mining techniques use neural networks, genetic programming or neuro-fuzzy knowledge representation. These approaches have little or no explanatory capability. This paper attempts to establish a new approach to solving QSAR problems using descriptive data mining, so that the relationship between the chemical properties and the activity of a substance is modeled in a comprehensible way.
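
As a rough illustration of what "comprehensible" means here, the sketch below mines association rules of the form "descriptor conditions -> activity class", each with its support and confidence in the usual association-rule sense. It assumes that every compound has already been reduced to a set of discretized descriptor items plus one activity item. The descriptors (logP, MW), their bins and the six compounds are hypothetical toy data, not results from this paper, and the level-wise Apriori-style search is a generic illustration, not necessarily the mining algorithm the paper uses.

```python
from itertools import combinations

# Hypothetical toy data (assumption, not from the paper): each compound is a
# set of discretized descriptor items plus exactly one activity item.
COMPOUNDS = [
    {"logP=high", "MW=low", "activity=active"},
    {"logP=high", "MW=high", "activity=active"},
    {"logP=high", "MW=low", "activity=active"},
    {"logP=low", "MW=low", "activity=inactive"},
    {"logP=low", "MW=high", "activity=inactive"},
    {"logP=high", "MW=high", "activity=inactive"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent


def activity_rules(transactions, min_support, min_confidence, target_prefix="activity="):
    """Rules 'descriptor items -> one activity item', best confidence first."""
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, supp in frequent.items():
        targets = [i for i in itemset if i.startswith(target_prefix)]
        if len(targets) != 1 or len(itemset) < 2:
            continue
        consequent = targets[0]
        antecedent = itemset - {consequent}
        # The antecedent is frequent by downward closure, so it is in `frequent`.
        confidence = supp / frequent[antecedent]
        if confidence >= min_confidence:
            rules.append((sorted(antecedent), consequent, supp, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))


if __name__ == "__main__":
    for antecedent, consequent, supp, confidence in activity_rules(COMPOUNDS, 0.3, 0.7):
        print(f"IF {' AND '.join(antecedent)} THEN {consequent} "
              f"(support={supp:.2f}, confidence={confidence:.2f})")
```

Each printed line is a readable statement linking descriptor ranges to an activity class, which is the kind of explanation that the predictive approaches listed above do not directly provide. The thresholds 0.3 and 0.7 are arbitrary choices for this toy example.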
