Model Discovery and Validation for the QSAR Problem using Association Rule Mining
There are several approaches to solving the
Quantitative Structure-Activity Relationship (QSAR) problem.
These approaches are based either on statistical methods or on
predictive data mining. The statistical methods include
regression analysis, pattern recognition (such as cluster
analysis, factor analysis and principal components analysis) and partial
least squares. Predictive data mining techniques use neural
networks, genetic programming or neuro-fuzzy knowledge. These
approaches have little explanatory capability, or none at all. This
paper attempts to establish a new approach to solving QSAR
problems using descriptive data mining, so that the relationship
between the chemical properties and the activity of a substance
is modeled comprehensibly.
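To make the idea of a descriptive, rule-based model concrete, the following is a minimal sketch of how association rules of the form "descriptor values -> activity class" could be scored by support and confidence. It is not the authors' algorithm: the toy compounds, the item names (`logP=high`, `MW=low`, `activity=toxic`), the thresholds and the helper functions `support`, `confidence` and `mine_rules` are all assumptions made for illustration, and the enumeration is deliberately brute force.

```python
from itertools import combinations

# Hypothetical toy data: each compound is a set of discretized descriptor
# items plus one activity label. All names and values are invented.
compounds = [
    {"logP=high", "MW=low", "activity=toxic"},
    {"logP=high", "MW=high", "activity=toxic"},
    {"logP=low", "MW=low", "activity=nontoxic"},
    {"logP=high", "MW=low", "activity=toxic"},
    {"logP=low", "MW=high", "activity=nontoxic"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


def mine_rules(transactions, min_support=0.4, min_confidence=0.8):
    """Brute-force search for rules 'descriptor items -> activity label'."""
    all_items = {item for t in transactions for item in t}
    labels = sorted(i for i in all_items if i.startswith("activity="))
    descriptors = sorted(i for i in all_items if not i.startswith("activity="))
    rules = []
    for size in (1, 2):  # antecedents with one or two descriptor items
        for antecedent in combinations(descriptors, size):
            for label in labels:
                s = support(set(antecedent) | {label}, transactions)
                c = confidence(antecedent, {label}, transactions)
                if s >= min_support and c >= min_confidence:
                    rules.append((antecedent, label, s, c))
    return rules


rules = mine_rules(compounds)
for antecedent, label, s, c in rules:
    print(f"{' & '.join(antecedent)} -> {label} "
          f"(support={s:.2f}, confidence={c:.2f})")
```

Each rule that survives the thresholds can be read directly as a statement about which descriptor values co-occur with which activity class, which is the kind of explanatory output that the predictive approaches listed above do not provide. A practical miner would replace the exhaustive enumeration with a frequent-itemset algorithm.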
[1] Agrawal, R., Imielinski, T. and Swami, A. (1993) "Mining association
rules between sets of items in large databases", in Proceedings of the
1993 ACM SIGMOD International Conference on Management of
Data, Washington D.C., pp. 207-216.
[2] Deshpande, M., Kuramochi, M., Wale, N. and Karypis, G.
(2005) "Frequent Substructure-Based Approaches for Classifying
Chemical Compounds", in IEEE Transactions on Knowledge and Data
Engineering, Vol. 17(8), pp. 1036-1050.
[3] Dumitriu, L. (2002) "Interactive mining and knowledge reuse for the
closed-itemset incremental-mining problem", Newsletter of the ACM
Special Interest Group on Knowledge Discovery and Data Mining,
ed. U. Fayyad, Vol. 3(2), pp. 28-36, Jan. 2002,
http://www.acm.org/sigkdd/explorations.
[4] Langdon, W. B. and Barrett, S. J. (2004) "Genetic Programming in
Data Mining for Drug Discovery", in Ashish Ghosh and Lakhmi C. Jain
(Eds.), Evolutionary Computing in Data Mining, Studies in Fuzziness
and Soft Computing, Vol. 163, Chapter 10, Springer, ISBN 3-540-22370-3,
pp. 211-235.
[5] Neagu, C.D., Benfenati, E., Gini, G., Mazzatorta, P. and Roncaglioni, A.
(2002) "Neuro-Fuzzy Knowledge Representation for Toxicity
Prediction of Organic Compounds", in Proceedings of the 15th
European Conference on Artificial Intelligence (ECAI'2002), Frank van
Harmelen (Ed.), Lyon, France, July 2002, IOS Press, pp. 498-502.
[6] Wang, Z., Durst, G., Eberhart, R., Boyd, D. and Ben-Miled, Z. (2004)
"Particle Swarm Optimization and Neural Network Application for
QSAR", in Proceedings of the 18th International Parallel and
Distributed Processing Symposium (IPDPS 2004), 26-30 April 2004,
Santa Fe, New Mexico, USA, IEEE Computer Society, ISBN 0-7695-2132-0.
[7] Wille, R. (1982) "Restructuring lattice theory: an approach based on
hierarchies of concepts", in Ordered Sets, Proceedings of NATO
Advanced Study Institute, D. Reidel Publisher Co., pp. 445-470.
[8] Zaki, M.J. and Ogihara, M. (1998) "Theoretical Foundations of
Association Rules", in Proceedings of the 3rd SIGMOD-98
Workshop on DMKD, Seattle, WA, pp. 7:1-7:8.
@article{"International Journal of Engineering, Mathematical and Physical Sciences:62122",
  author = "Luminita Dumitriu and Cristina Segal and Marian Craciun and Adina Cocu and Lucian P. Georgescu",
  title = "Model Discovery and Validation for the {QSAR} Problem using Association Rule Mining",
  abstract = "There are several approaches to solving the Quantitative Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. The statistical methods include regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) and partial least squares. Predictive data mining techniques use neural networks, genetic programming or neuro-fuzzy knowledge. These approaches have little explanatory capability, or none at all. This paper attempts to establish a new approach to solving QSAR problems using descriptive data mining, so that the relationship between the chemical properties and the activity of a substance is modeled comprehensibly.",
  keywords = "association rules, classification, data mining, Quantitative Structure-Activity Relationship",
  volume = "1",
  number = "11",
  pages = "560-5",
}