Data Mining for Cancer Management in Egypt Case Study: Childhood Acute Lymphoblastic Leukemia

Data mining aims to discover knowledge in data and to present it in a form that humans can easily comprehend. One useful application in Egypt is cancer management, especially the management of Acute Lymphoblastic Leukemia (ALL), the most common type of cancer in children. This paper discusses the design of a prototype that can help in the management of childhood ALL, a problem of great significance in the health care field. The work may also have a social impact by helping to decrease the incidence of the disease among children in Egypt. It also provides valuable information about the distribution and segmentation of ALL in Egypt, which may be linked to possible risk factors. Undirected knowledge discovery is used because, in this research project, there is no target field: the data provided is mainly subjective. The subjective variables are therefore quantified, and the computer is asked to identify significant patterns in the provided medical data about ALL. This is achieved by collecting the data necessary for the system, determining the data mining technique to be used, and choosing the most suitable implementation tool for the domain. The research uses a data mining tool, Clementine, to apply the decision tree technique. The tool is fed with data extracted from real-life cases taken from specialized cancer institutes. Relevant case details, such as patient medical history and diagnosis, are analyzed, classified, and clustered in order to improve the management of the disease.
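As a rough illustration of undirected knowledge discovery on quantified subjective variables, the following minimal sketch encodes a subjective field as numbers and groups records without any target field. It is not the prototype described above, which was built with Clementine; plain k-means is swapped in here only as a simple stand-in for clustering. The field names, the `SEVERITY` encoding, and the records are all hypothetical.

```python
from math import dist

# Hypothetical ordinal encoding of a subjective field (an assumption, not from the paper).
SEVERITY = {"mild": 0.0, "moderate": 1.0, "severe": 2.0}

def encode(record):
    """Turn one (age_years, severity_label) record into a numeric point in [0, 1] x [0, 1]."""
    age, severity = record
    return (age / 18.0, SEVERITY[severity] / 2.0)

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids so the run is deterministic."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Made-up illustrative records, not real patient data.
records = [(2, "mild"), (3, "mild"), (4, "moderate"),
           (14, "severe"), (15, "severe"), (16, "moderate")]
points = [encode(r) for r in records]
labels, centroids = kmeans(points, k=2)
```

No target field is supplied anywhere in this sketch: the grouping emerges only from the encoded values, which is the sense in which the discovery is undirected. A real study would need a clinically justified encoding and far more records.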



