Data Modelling in Lakehouse
Oct 4, 2024
In the Medallion Architecture, data modeling occurs across all three layers – Bronze, Silver, and Gold – but with different purposes and levels of refinement at each stage. Here’s how data modeling typically applies to each layer:
Bronze Layer
The Bronze layer is primarily for raw data ingestion, so minimal modeling occurs here:
- Create tables or directories to store raw data in its original format.
- Add metadata columns like ingestion timestamp, source system, and file names.
- Enable schema evolution to handle changes in source data structure[1].
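As a rough illustration of the Bronze steps above, the sketch below attaches ingestion metadata to raw rows and tracks an additive schema. It uses plain Python dictionaries as a stand-in for a lakehouse table; the function names, the underscore-prefixed column names, and the sample file names are assumptions made for this example, not part of any particular platform's API.

```python
from datetime import datetime, timezone


def to_bronze(raw_rows, source_system, file_name, ingested_at=None):
    """Attach ingestion metadata to raw rows; payload values stay untouched."""
    ingested_at = ingested_at or datetime.now(timezone.utc)
    return [
        {
            **row,  # raw payload, kept in its original (string) form
            "_ingested_at": ingested_at.isoformat(),
            "_source_system": source_system,
            "_file_name": file_name,
        }
        for row in raw_rows
    ]


def evolve_schema(existing_columns, new_rows):
    """Additive schema evolution: append unseen columns, never drop old ones."""
    columns = list(existing_columns)
    for row in new_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


# Hypothetical batches: the second file introduces a new "currency" column.
batch_1 = to_bronze([{"order_id": "1", "amount": "9.50"}], "erp", "orders_001.csv")
batch_2 = to_bronze(
    [{"order_id": "2", "amount": "3.00", "currency": "EUR"}], "erp", "orders_002.csv"
)

schema = evolve_schema([], batch_1)
schema = evolve_schema(schema, batch_2)
```

Rows from the first batch simply have no value for `currency`. In a real table format, that would typically surface as a null in the newly added column.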
Silver Layer
The Silver layer is where most of the data modeling and transformation take place:
- Design normalized or semi-normalized data models, often following third normal form (3NF) principles[5].
- Create tables for key business entities and relationships.
- Apply data quality rules, deduplication, and basic transformations[1].
- Keep data models aligned with their source systems to maintain traceability[1].
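The Silver steps above can be sketched with an in-memory SQLite database standing in for the lakehouse tables. Everything here is an assumption made for illustration: the `customer` and `sales_order` tables, the sample rows, the "keep the latest version per order" deduplication rule, and the non-negative amount check.

```python
import sqlite3

# Hypothetical Bronze rows:
# (order_id, customer_id, customer_name, amount, _source_system, _ingested_at)
bronze_orders = [
    ("o1", "c1", "Ada", "10.00", "erp", "2024-10-01T00:00:00"),
    ("o1", "c1", "Ada", "12.00", "erp", "2024-10-02T00:00:00"),  # later version of o1
    ("o2", "c2", "Bob", "-5.00", "erp", "2024-10-02T00:00:00"),  # fails quality rule
    ("o3", "c1", "Ada", "7.50", "erp", "2024-10-03T00:00:00"),
]


def build_silver(rows):
    """Load Bronze rows into normalized tables; return (connection, rejected ids)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(
        """
        CREATE TABLE customer (
            customer_id   TEXT PRIMARY KEY,
            customer_name TEXT NOT NULL
        );
        CREATE TABLE sales_order (
            order_id      TEXT PRIMARY KEY,
            customer_id   TEXT NOT NULL REFERENCES customer(customer_id),
            amount        REAL NOT NULL CHECK (amount >= 0),
            source_system TEXT NOT NULL  -- kept for traceability
        );
        """
    )

    # Deduplicate: keep the most recently ingested version of each order.
    # ISO-8601 timestamps in one format compare correctly as strings.
    latest = {}
    for row in rows:
        order_id, ingested_at = row[0], row[5]
        if order_id not in latest or ingested_at > latest[order_id][5]:
            latest[order_id] = row

    rejected = []
    for order_id, customer_id, name, amount, source, _ in latest.values():
        # Data quality rule: amounts must be non-negative.
        if float(amount) < 0:
            rejected.append(order_id)
            continue
        con.execute(
            "INSERT OR IGNORE INTO customer VALUES (?, ?)", (customer_id, name)
        )
        con.execute(
            "INSERT INTO sales_order VALUES (?, ?, ?, ?)",
            (order_id, customer_id, float(amount), source),
        )
    con.commit()
    return con, rejected


con, rejected = build_silver(bronze_orders)
orders = con.execute(
    "SELECT order_id, amount FROM sales_order ORDER BY order_id"
).fetchall()
customers = con.execute("SELECT customer_id, customer_name FROM customer").fetchall()
```

Splitting customers from orders keeps each customer attribute in one place, which is the point of the normalized design. Rejected rows are returned rather than silently dropped, so they can be routed to a quarantine table if needed.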
Gold Layer
The Gold layer focuses on optimizing data for specific use cases and consumption: