Using Data Mining Techniques for Finding Cardiac Outlier Patients

In this paper we use data mining techniques to identify outlier patients who consume large amounts of drugs over a long period of time. Any healthcare or health insurance system must manage the quantities of drugs used by patients with chronic diseases. In the Kingdom of Bahrain, about 20% of the health budget is spent on medications. Managers of healthcare systems lack sufficient information about how patients with chronic diseases use drugs, whether there is any misuse, and whether there are outlier patients. This work was done in cooperation with the information department of the Bahrain Defence Force hospital; we selected the records of cardiac patients for the period from 1 January 2008 to 31 December 2008 as the data for the model in this paper. We used three techniques to analyse drug utilization among cardiac patients. First we applied a clustering technique, then we measured the validity of the clustering, and finally we applied a decision tree as a classification algorithm. The clustering partitions the 1603 patients, who received 15,806 prescriptions during this period, into three groups according to drug utilization; 23 patients (2.59%), who received 1316 prescriptions (8.32%), are classified as outliers. The classification algorithm shows that the average drug utilization, the age, and the gender of the patient can be considered the main predictive factors in the induced model.
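
The first two steps described above (clustering patients by drug utilization, then checking cluster validity) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the study: the abstract does not name the clustering algorithm or the validity measure, so k-means and the Dunn index are assumed here, and the prescription counts are invented toy values.

```python
# Minimal sketch: 1-D k-means on prescriptions per patient, followed by a
# Dunn-index validity check. Algorithm choice and data are ASSUMPTIONS.

def kmeans_1d(values, k=3, max_iters=100):
    """Lloyd's k-means on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: k evenly spaced points of the sorted data (k >= 2).
    centroids = [float(ordered[i * (n - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * n
    for _ in range(max_iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members
                           else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def dunn_index(values, labels):
    """Smallest between-cluster gap divided by largest cluster diameter."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    groups = list(clusters.values())
    max_diameter = max(max(g) - min(g) for g in groups)
    min_gap = min(abs(a - b)
                  for i, gi in enumerate(groups)
                  for gj in groups[i + 1:]
                  for a in gi for b in gj)
    return min_gap / max_diameter if max_diameter > 0 else float("inf")


# Hypothetical prescriptions per patient over one year (toy data).
prescriptions = [2, 3, 3, 4, 5, 9, 10, 11, 12, 13, 55, 57, 60]

centroids, labels = kmeans_1d(prescriptions, k=3)
# Treat the cluster with the highest mean utilization as the outlier group.
outlier_label = max(range(3), key=lambda j: centroids[j])
outliers = [v for v, lab in zip(prescriptions, labels) if lab == outlier_label]
dunn = dunn_index(prescriptions, labels)

print("centroids:", centroids)
print("outlier patients (prescription counts):", outliers)
print("Dunn index:", dunn)
```

A higher Dunn index indicates compact, well-separated clusters, so it could be used to compare candidate numbers of clusters before labelling the highest-utilization group as outliers. The classification step would then train a decision tree on patient attributes with the cluster label as the target.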



