Using Perspective Schemata to Model the ETL Process

Data Warehouses (DWs) are repositories that hold the unified history of an enterprise for decision support. Data must be Extracted from information sources, Transformed and integrated, and then Loaded (ETL) into the DW using ETL tools. These tools focus on data movement, and models are used only as a means to that end. From a conceptual viewpoint, the authors aim to improve the ETL process in two ways: 1) by making the compatibility between models explicit in a declarative fashion, using correspondence assertions, and 2) by identifying the instances from different sources that represent the same real-world entity. This paper presents an overview of the proposed framework for modelling the ETL process, which is based on a reference model and perspective schemata. This approach gives the designer a better understanding of the semantics associated with the ETL process.
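
To make the two ideas concrete, the following is a minimal Python sketch, not the notation or implementation of the proposed framework. It assumes that a correspondence assertion can be approximated as a declared mapping from a source property to a reference-model property, and that instances are matched on a shared key. All names (`PropertyAssertion`, `to_reference`, `match_instances`, the sample sources and fields) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def identity(value: str) -> str:
    """Default conversion: leave the source value unchanged."""
    return value


@dataclass(frozen=True)
class PropertyAssertion:
    """Hypothetical stand-in for a correspondence assertion: states that a
    reference-model property corresponds to a source property."""
    reference_property: str
    source_property: str
    convert: Callable[[str], str] = identity


def to_reference(record: Record, assertions: List[PropertyAssertion]) -> Record:
    """Rewrite one source record in terms of the reference model."""
    return {
        a.reference_property: a.convert(record[a.source_property])
        for a in assertions
    }


def match_instances(
    left: List[Record], right: List[Record], key: str
) -> List[Tuple[Record, Record]]:
    """Pair records from two sources that agree on the matching key
    (assumed here to identify the same real-world entity)."""
    index = {rec[key]: rec for rec in right}
    return [(rec, index[rec[key]]) for rec in left if rec[key] in index]


# Two illustrative sources describing customers with different schemata.
crm = [{"cust_name": "Ana Silva", "tax_no": "pt-123"}]
billing = [
    {"client": "A. Silva", "nif": "PT-123"},
    {"client": "B. Costa", "nif": "PT-999"},
]

# Declarative description of how each source relates to the reference model.
crm_assertions = [
    PropertyAssertion("name", "cust_name"),
    PropertyAssertion("tax_id", "tax_no", str.upper),
]
billing_assertions = [
    PropertyAssertion("name", "client"),
    PropertyAssertion("tax_id", "nif"),
]

crm_ref = [to_reference(r, crm_assertions) for r in crm]
billing_ref = [to_reference(r, billing_assertions) for r in billing]

# Instances from both sources that share the same tax_id.
matched = match_instances(crm_ref, billing_ref, "tax_id")
```

In this sketch the mappings are data rather than procedural transformation code, which is the sense in which the compatibility between models is stated declaratively; the matching step is deliberately simplistic and stands in for whatever matching criterion a designer would specify.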




