Data Mining for Cancer Management in Egypt Case Study: Childhood Acute Lymphoblastic Leukemia
Data mining aims at discovering knowledge in data and presenting
it in a form that humans can easily comprehend. One useful
application in Egypt is cancer management, especially the
management of Acute Lymphoblastic Leukemia (ALL), which is the
most common type of cancer in children.
This paper discusses the design of a prototype that can help in the
management of childhood ALL, a task of great significance to the
health care field. The prototype may also have a social impact by
helping to decrease the incidence of the disease among children in
Egypt. It also provides valuable information about the distribution
and segmentation of ALL in Egypt, which may be linked to possible
risk factors.
Undirected knowledge discovery is used because, in this research
project, there is no target field: the data provided is mainly
subjective, and the aim is to quantify the subjective variables.
The computer is therefore asked to identify significant patterns in
the provided medical data about ALL. This is achieved by collecting
the data necessary for the system, determining the data mining
technique to be used, and choosing the most suitable implementation
tool for the domain.
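As a rough illustration of what pattern discovery without a target field can look like, the sketch below groups records with a plain k-means loop. It is not the method used in the paper: the choice of k-means, the function name `kmeans`, the two numeric fields, and every value in `records` are assumptions invented for this example, not patient data.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means over numeric tuples; returns (centroids, labels)."""
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical, made-up records: (age in years, scaled lab value).
records = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.8),
           (11.0, 6.0), (12.0, 6.5), (10.5, 5.5)]
centroids, labels = kmeans(records, k=2)
print(labels)
```

No field is marked as the outcome here; the groups emerge from the records alone, which is the sense in which the discovery is undirected. In practice the fields would need comparable scales before distances between records mean anything.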
The research uses a data mining tool, Clementine, to apply the
decision tree technique. The tool is fed with data extracted from
real-life cases taken from specialized cancer institutes. Relevant
medical case details, such as patient medical history and diagnosis,
are analyzed, classified, and clustered in order to improve disease
management.
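The core step of a decision tree, choosing which field to split on, can be sketched with entropy and information gain. This is a generic textbook illustration and not a description of how Clementine builds its trees. The field names `age_band` and `sex`, the label `group` (imagined here as a segment assigned to each case), and all rows are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        # Weight each branch by the share of rows that fall into it.
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical, made-up cases; "group" is an assumed segment label.
rows = [
    {"age_band": "under_5", "sex": "M", "group": "A"},
    {"age_band": "under_5", "sex": "F", "group": "A"},
    {"age_band": "5_to_10", "sex": "M", "group": "B"},
    {"age_band": "5_to_10", "sex": "F", "group": "B"},
    {"age_band": "under_5", "sex": "M", "group": "A"},
    {"age_band": "5_to_10", "sex": "F", "group": "B"},
]

# The attribute with the highest gain becomes the root split.
best = max(["age_band", "sex"],
           key=lambda a: information_gain(rows, a, "group"))
print(best)
```

A full tree would repeat this selection on each branch until the cases in a branch share one label or no fields remain.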
@article{"International Journal of Medical, Medicine and Health Sciences:53088",
  author = "Nevine M. Labib and Michael N. Malek",
  title = "Data Mining for Cancer Management in Egypt Case Study: Childhood Acute Lymphoblastic Leukemia",
  keywords = "Data Mining, Decision Trees, Knowledge Discovery, Leukemia.",
  volume = "1",
  number = "8",
  pages = "481-6",
}