Concepts Extraction from Discharge Notes using Association Rule Mining

A large amount of valuable information is available in plain-text clinical reports, and new techniques and technologies are being applied to extract it. In this study, we developed a domain-based software system that transforms 600 otorhinolaryngology discharge notes into a structured form so that clinical data can be extracted from them. To reduce processing time, the discharge notes were preprocessed and then transformed into a data table. Several word lists were compiled to identify the common sections of the discharge notes, including patient history, age, problems, and diagnosis. An n-gram method was used to discover term co-occurrences within each section. This method generated a dataset of candidate concepts for the validation step, in which the Predictive Apriori algorithm for Association Rule Mining (ARM) was applied to validate the candidate concepts.
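The candidate-generation and validation steps can be illustrated with a minimal Python sketch. It is not the system described in this study: the example notes, the function names (ngrams, candidate_concepts, confidence), and the min_count threshold are all assumptions made for illustration. Plain rule confidence is used here as a simple stand-in for the Predictive Apriori algorithm, which ranks rules by expected predictive accuracy instead.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_concepts(section_texts, n=2, min_count=2):
    """Count n-grams over the texts of one section and keep the frequent ones.

    min_count is a hypothetical threshold, not a value from the study.
    """
    counts = Counter()
    for text in section_texts:
        counts.update(ngrams(text.lower().split(), n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


def confidence(section_texts, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, measured per note.

    This is ordinary rule confidence, used as a stand-in; it is NOT
    the Predictive Apriori ranking.
    """
    token_sets = [set(text.lower().split()) for text in section_texts]
    with_antecedent = [s for s in token_sets if antecedent in s]
    if not with_antecedent:
        return 0.0
    return sum(consequent in s for s in with_antecedent) / len(with_antecedent)


# Invented diagnosis-section snippets, not taken from the study's data.
diagnosis_sections = [
    "chronic otitis media",
    "chronic otitis media with effusion",
    "acute otitis externa",
    "nasal septum deviation",
]

candidates = candidate_concepts(diagnosis_sections)
rule_confidence = confidence(diagnosis_sections, "chronic", "otitis")
```

In this toy input, only the bigrams that occur in at least two notes survive as candidates, and a rule such as "chronic" implies "otitis" can then be scored to decide whether the candidate is kept.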



