Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data
The statistical analysis of medical data often requires special
techniques because of the particularities of these data. Principal
components analysis and data clustering are two statistical data
mining methods that are very useful in the medical field: the first
reduces the number of studied parameters, and the second analyzes the
connections between the diagnosis and the data about the patient's
condition. In this paper we investigate the implications of a specific
data analysis technique: data clustering preceded by a selection of
the most relevant parameters, made using principal components
analysis. Our assumption was that applying principal components
analysis before data clustering, in order to select and classify only
the most relevant parameters, would improve the clustering accuracy.
The practical results showed the opposite: the clustering accuracy
decreases by a percentage approximately equal to the percentage of
information loss reported by the principal components analysis.
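
The pipeline described above can be sketched as follows. This is a
minimal illustration in Python, not the authors' implementation: the
synthetic data, the use of k-means as the clustering method, the purity
score as the accuracy measure, and all function names (`center`,
`covariance`, `top_components`, `kmeans`, `purity`, `make_data`,
`run_demo`) are assumptions made for the example. It reports the share
of variance discarded by the principal components analysis alongside
the clustering accuracy with and without the reduction, so that the two
quantities can be compared on a given data set.

```python
import random

def center(rows):
    """Subtract the column means so each parameter has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=300):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    d = len(cov)
    m = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # fixed, deterministic start vector
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, truth):
    """Fraction of cases that carry the majority diagnosis of their cluster."""
    hits = 0
    for c in set(labels):
        members = [t for l, t in zip(labels, truth) if l == c]
        hits += max(members.count(t) for t in set(members))
    return hits / len(labels)

def make_data(n_per_group=40, seed=1):
    """Synthetic stand-in for patient records: 4 parameters, 2 diagnoses (assumed)."""
    rng = random.Random(seed)
    rows, diagnosis = [], []
    for label, shift in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_group):
            rows.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0),
                         rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            diagnosis.append(label)
    return rows, diagnosis

def run_demo(n_components=2):
    rows, diagnosis = make_data()
    x = center(rows)
    cov = covariance(x)
    pairs = top_components(cov, n_components)
    total_var = sum(cov[i][i] for i in range(len(cov)))
    # Information loss: share of total variance not carried by the kept components.
    info_loss = 1.0 - sum(lam for lam, _ in pairs) / total_var
    # Project every case onto the kept components.
    scores = [[sum(a * b for a, b in zip(r, v)) for _, v in pairs] for r in x]
    return {
        "info_loss": info_loss,
        "purity_all": purity(kmeans(x, 2), diagnosis),
        "purity_pca": purity(kmeans(scores, 2), diagnosis),
    }

if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` returns the three numbers for the synthetic data
only; nothing in this sketch reproduces the measurements reported in
the paper.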
@article{"International Journal of Medical, Medicine and Health Sciences:64585",
  author = "Cristina G. Dascălu and Corina Dima Cozma and Elena Carmen Cotrutz",
  title = "Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data",
  abstract = "The statistical analysis of medical data often requires special
techniques because of the particularities of these data. Principal
components analysis and data clustering are two statistical data
mining methods that are very useful in the medical field: the first
reduces the number of studied parameters, and the second analyzes the
connections between the diagnosis and the data about the patient's
condition. In this paper we investigate the implications of a specific
data analysis technique: data clustering preceded by a selection of
the most relevant parameters, made using principal components
analysis. Our assumption was that applying principal components
analysis before data clustering, in order to select and classify only
the most relevant parameters, would improve the clustering accuracy.
The practical results showed the opposite: the clustering accuracy
decreases by a percentage approximately equal to the percentage of
information loss reported by the principal components analysis.",
  keywords = "Data clustering, medical data, principal components analysis.",
  volume = "2",
  number = "5",
  pages = "175-5",
}