Dimensional Modeling of HIV Data Using Open Source

The choice of data modeling technique for an information system depends on the purpose of the resulting data model. Dimensional modeling is the preferred technique for data destined for data warehouses and data mining: in contrast with entity-relationship modeling, it produces data models that ease analysis and querying. The establishment of data warehouses as components of the information system landscape in many organizations has driven the development of dimensional modeling. This development has been far more extensive, and more widely reported, for commercial database management systems than for open source ones, which makes dimensional modeling less affordable for those in resource-constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It is motivated by the fact that the regions most affected by HIV, notably sub-Saharan Africa, are heavily resource constrained yet hold large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts, which were then modeled using two open source dimensional modeling tools. Using open source tools would reduce the software cost of dimensional modeling and, in turn, make data warehousing and data mining more feasible for those in resource-constrained settings who nonetheless have data available.



