Designing an ETL: Kimball’s 34 Subsystems
The Kimball Group has carefully restructured ETL best practices into 34 subsystems that represent the key ETL architecture components required in almost every dimensional data warehouse environment. The ETL process really has four major components: Extracting, Cleaning and Conforming, Delivering and Managing. Each of these components and all 34 subsystems contained therein are explained below.
EXTRACTING: GETTING DATA INTO THE DATA WAREHOUSE
To no surprise, the initial subsystems of the ETL architecture address the issues of understanding your source data, extracting the data and transferring it to the data warehouse environment where the ETL system can operate on it independent of the operational systems. While the remaining subsystems focus on the transforming, loading and system management within the ETL environment, the initial subsystems interface to the source systems to access the required data. The extract-related ETL subsystems include:
- Data Profiling (subsystem 1) – Explores a data source to determine its fit for inclusion as a source and the associated cleaning and conforming requirements.
- Change Data Capture (subsystem 2) – Isolates the changes that occurred in the source system to reduce the ETL processing burden.