Automatic Real-Patient Medical Data De-Identification for Research Purposes

Our medicine-oriented research is based on a medical data set of real patients. Sharing private patient data with anyone other than clinicians or hospital staff is a security problem, so personally identifying information must be removed from the medical data. After de-identification, the data no longer contain private information and can be used for research purposes. In this paper, we introduce a universal, automatic, rule-based de-identification application that performs this process on heterogeneous medical data. A patient's private identification is replaced by a unique identification number, even in annotations burned into the pixel data. The same identification number is used across all of a patient's medical data, which preserves the relationships within the data. The hospital can in turn benefit from feedback based on the research results.
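The core idea of rule-based replacement with a consistent identification number can be illustrated with a minimal sketch. It is not the application described in this paper: the `Pseudonymizer` class, the `RULES` table, the field names, and the `P000001` number format are all hypothetical, and handling of burned-in annotations in pixel data is not covered.

```python
from itertools import count


class Pseudonymizer:
    """Maps each real patient identifier to one stable research number."""

    def __init__(self, start=1):
        self._next = count(start)   # source of new identification numbers
        self._table = {}            # real identifier -> research number

    def pseudonym(self, patient_id):
        # Reuse the number already assigned to this patient, if any,
        # so that all records of one patient stay linked.
        if patient_id not in self._table:
            self._table[patient_id] = f"P{next(self._next):06d}"
        return self._table[patient_id]


# Hypothetical rule set: field name -> action ("replace" or "remove").
# Fields without a rule are kept unchanged.
RULES = {
    "patient_id": "replace",
    "name": "remove",
    "birth_date": "remove",
}


def de_identify(record, pseudonymizer, rules=RULES):
    """Return a copy of `record` with the rules applied."""
    out = {}
    for field, value in record.items():
        action = rules.get(field, "keep")
        if action == "remove":
            continue
        if action == "replace":
            value = pseudonymizer.pseudonym(value)
        out[field] = value
    return out


if __name__ == "__main__":
    p = Pseudonymizer()
    imaging = {"patient_id": "7501011234", "name": "Jan Novak", "modality": "CT"}
    report = {"patient_id": "7501011234", "birth_date": "1975-01-01", "diagnosis": "I63"}
    print(de_identify(imaging, p))
    print(de_identify(report, p))
```

Because both records pass through the same `Pseudonymizer` instance, they receive the same identification number, so an imaging record and a clinical report of one patient remain linked after the private fields are gone.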
