Mining Gene Relations in Microarray Data Combined with Ontology in Colon Cancer Automated Diagnosis System

The MATCH project [1] entails the development of an automatic diagnosis system that aims to support the treatment of colon cancer by discovering mutations that occur in tumour suppressor genes (TSGs) and contribute to the development of cancerous tumours. The system is built on a) colon cancer clinical data and b) biological information that will be derived by data mining techniques from genomic and proteomic sources. The core mining module will consist of popular, well-tested hybrid feature extraction methods and new combined algorithms designed especially for the project. Elements of rough sets, evolutionary computing, cluster analysis, self-organizing maps and association rules will be used to discover the relations between genes and their influence on tumours [2]-[11]. The methods used to process the data have to address its high complexity, potential inconsistency and missing values. They must integrate all the useful information necessary to answer the expert's question. For this purpose, the system has to learn from the data, or allow a domain specialist to specify interactively, the part of the knowledge structure it needs to answer a given query. The program should also take into account the importance (rank) of the particular parts of the data it analyses and adjust the algorithms it uses accordingly.
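
As an illustration of the rough-set element mentioned above, the following minimal sketch computes the lower and upper approximations of a decision class in the sense of Pawlak [2]. It is not the project's implementation: the function name `approximations`, the attribute names `gene_a`, `gene_b` and `tumour`, and the four-sample table are all hypothetical and chosen only to show how inconsistent records are handled.

```python
from collections import defaultdict


def approximations(rows, condition_attrs, decision_attr, target):
    """Return (lower, upper) rough-set approximations, as sets of row indices,
    of the set of rows whose decision attribute equals `target`."""
    # Rows with identical values on all condition attributes are indiscernible.
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in condition_attrs)
        classes[key].add(i)

    target_set = {i for i, row in enumerate(rows) if row[decision_attr] == target}

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target_set:   # class lies entirely inside the target set
            lower |= members
        if members & target_set:    # class overlaps the target set
            upper |= members
    return lower, upper


# Hypothetical, already discretized expression levels (illustrative data only).
samples = [
    {"gene_a": "high", "gene_b": "low",  "tumour": "yes"},
    {"gene_a": "high", "gene_b": "low",  "tumour": "no"},   # conflicts with sample 0
    {"gene_a": "low",  "gene_b": "low",  "tumour": "no"},
    {"gene_a": "high", "gene_b": "high", "tumour": "yes"},
]

lower, upper = approximations(samples, ["gene_a", "gene_b"], "tumour", "yes")
print("certainly tumour:", sorted(lower))
print("possibly tumour:", sorted(upper))
```

Samples 0 and 1 share the same condition values but differ in the decision, so they fall outside the lower approximation and inside the upper one; this is one way the potential inconsistency of the data can be made explicit.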




References:
[1] http://www.match-project.com/
[2] Pawlak Z. (1982) Rough sets. International Journal of Computer and
Information Sciences, 11(5):341-356.
[3] Pawlak Z. and Slowinski R. (1994) Rough set approach to multiattribute
decision analysis. European Journal of Operational Research,
72(3):443-459.
[4] Slezak D. (2005) Association Reducts: A Framework for Mining Multiattribute
Dependencies. ISMIS 2005: 354-363.
[5] Wroblewski J. (1996) Theoretical Foundations of Order-Based Genetic
Algorithms. Fundam. Inform. 28(3-4): 423-430.
[6] Wroblewski J., Slezak D. (2003) Order Based Genetic Algorithms for
the Search of Approximate Entropy Reducts. RSFDGrC 2003: 308-311.
[7] Yao H., Hamilton H.J., Butz C.J. (2004) A Foundational Approach to
Mining Itemset Utilities from Databases. SDM 2004.
[8] Yao J.T., Yao Y.Y., and Zhao, Y. (2005) Foundations of classification,
in: Lin, T.Y., Ohsuga, S., Liau, C.J. and Hu, X. (Eds), Foundations and
Novel Approaches in Data Mining, Springer, Berlin, pp. 75-97.
[9] Yao Y.Y., Zhong, N. and Zhao, Y. (2004) A three-layered conceptual
framework of data mining, Proceedings of ICDM'04 Workshop of
Foundation of Data Mining, 215-221.
[10] Ziarko, W. (1989) A technique for discovering and analysis of cause-effect
relationships in empirical data. International Joint Conference on
Artificial Intelligence, Proceedings of the Workshop on Knowledge
Discovery in Databases, Detroit, p.390-396.
[11] Ziarko, W. (1989) Determination of locally optimal set of features for
representation of implicit knowledge. Proceedings of International
Conference on Computing and Information, Toronto, North Holland,
p.433-438.
[12] Baskin C., García-Sastre A., Tumpey T. (2004) Integration of Clinical
Data, Pathology, and cDNA Microarrays in Influenza Virus-Infected
Pigtailed Macaques. Journal of Virology, October 2004, p. 10420-10432,
Vol. 78, No. 19
[13] Casey R. M. (2005) Bioinformatics Data Integration. Business
Intelligence Network
[14] Pasquier, C. et al. (2004) THEA: ontology-driven analysis of microarray data.
Bioinformatics 20(16), 2636-2643.
[15] Radetzki, U., Bode, T., Witterstein, G., Gnasa et al. (2003) A Service-
Centric Computing Environment for Heterogeneous Biological
Databases and Methods. In R. Spang, P. Beziat, and M. Vingron (eds.):
Currents in Computational Molecular Biology (RECOMB 2003), pp. 25-
26, April 2003, Berlin, Germany.
[16] Burger, M., Graepel, T., Obermayer, K.: Self-organizing maps:
Generalizations and new optimization techniques. Neurocomputing 20
(1998) pp. 173-190.
[17] Kohonen, T.: Self-organized formation of topologically correct feature
maps. Biological Cybernetics 43 (1982) pp. 59-69.
[18] Gruzdz, A., Ihnatowicz, A., Slezak, D.: Interactive gene clustering - A
case study of breast cancer microarray data. Information Systems
Frontiers (2006) 8:21-27.