Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources and for their cleansing, customization, and insertion into a data warehouse. In this paper, we focus on the problem of defining ETL activities and provide formal foundations for their conceptual representation. The proposed conceptual model is (a) customized for tracing inter-attribute relationships and the respective ETL activities in the early stages of a data warehouse project; (b) enriched with a 'palette' of frequently used ETL activities, such as the assignment of surrogate keys and the check for null values; and (c) constructed in a customizable and extensible manner, so that designers can enrich it with their own recurring patterns for ETL activities.

Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design - data models, schema and subschema.