On Identity Disclosure Risk Measurement for Shared Microdata

Probability-based identity disclosure risk measurement may give the same overall risk for different anonymization strategies applied to the same dataset. Some entities in the anonymized dataset may have higher identification risks than others. Individuals are more concerned about risks that exceed the average and want to know whether they might be exposed to such higher risk. The notion of overall risk used in this kind of measurement does not indicate whether some of the involved entities have a higher identity disclosure risk than others. In this paper, we introduce an identity disclosure risk measurement method that not only conveys the overall risk, but also indicates whether some members have a higher risk than others. The proposed method quantifies the overall risk based on the individual risk values, the percentage of records whose risk value is higher than the average, and how much larger those higher risk values are compared to the average. We analyze the disclosure risks for different disclosure control techniques applied to original microdata and present the results.
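The following minimal sketch illustrates the motivation rather than the paper's actual formula, which is not given in this abstract. It assumes, for illustration only, that a record's risk is 1 divided by the size of its equivalence class (the group of records that are indistinguishable on the quasi-identifiers). The function names `record_risks` and `risk_summary`, the class sizes, and the "excess ratio" (mean of above-average risks divided by the average) are hypothetical choices. Under this assumption, two anonymizations of the same 12 records yield the same average risk, while only one of them leaves some records at above-average risk.

```python
from statistics import mean

def record_risks(class_sizes):
    """Per-record risk, ASSUMING risk = 1 / size of the record's equivalence class."""
    return [1.0 / size for size in class_sizes for _ in range(size)]

def risk_summary(risks):
    """Summarize the three ingredients named in the abstract (illustrative only)."""
    avg = mean(risks)
    above = [r for r in risks if r > avg]
    # Share of records whose risk exceeds the average.
    share_above = len(above) / len(risks)
    # How much larger the above-average risks are, relative to the average.
    # Hypothetical definition; 1.0 means no record exceeds the average.
    excess_ratio = mean(r / avg for r in above) if above else 1.0
    return {"average": avg, "share_above": share_above, "excess_ratio": excess_ratio}

# Two hypothetical anonymizations of the same 12 records, given as class sizes.
strategy_a = risk_summary(record_risks([4, 4, 4]))   # uniform classes
strategy_b = risk_summary(record_risks([2, 2, 8]))   # two small classes, one large

print(strategy_a)
print(strategy_b)
```

Both strategies have an average risk of 0.25, because under the 1/size assumption the average equals the number of classes divided by the number of records. The second strategy, however, leaves a third of the records at twice the average risk, which an average-only measure cannot reveal.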



