https://www.researchgate.net/figure/Data-warehouse-architecture_fig1_275068752
I worked as a data engineer for almost 8 years and, during that time, attended
many interviews where I was asked about data warehousing (DWH) concepts. So I
decided to jot down the DWH concepts that matter most from a data engineering
(DE) interview perspective, based on my own interview experience. To aid DE
interview prep, I have also listed a few high-difficulty SQL questions in my
previous post.
https://shobha-bhagwat.medium.com/data-warehousing-interview-questions-ab96c242f19c 1/9
15.10.2022, 11:58 Data Warehousing Interview Questions | by Shobha Bhagwat | Medium
DWH is useful for reporting, analysis, data mining, exploration of historical data
etc. A datamart is a subset of a data warehouse focused on a particular line of
business, department, or subject area.
designed in third normal form and loaded with data. Once these datamarts are
loaded and data is approved by business,
the DWH is backfilled from these
datamarts thus reducing implementation time.
C. Types of DWH —
1. Global warehouse (Enterprise DWH) — DWH is centralized and designed for
the entire enterprise. This type of DWH is time-consuming and costly to implement.
D. Facts —
Facts are quantitative measures or metrics of the business (e.g. sales,
revenue etc.). The different types of facts stored in DWH are —
1. Additive — facts that can be summed up through all the dimensions in the fact
table (e.g. sales fact)
2. Semi-additive — facts that can be summed up for some of the dimensions in the
fact table but not for others (e.g. daily balances fact can be summed up through
customer dimension but not through time dimension)
3. Non-additive — facts that can’t be summed up for any of the dimensions present
in the fact table (e.g. facts which are in ratios, percentage form)
4. Factless fact table — fact table that contains no measure or facts but can be used
to get aggregated counts. (e.g. fact table containing only sale_date and
product_key doesn’t contain any measure per se but can be used to get count of
products sold over various periods of time).
Fact tables that contain aggregated facts are called summary tables.
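To make the additive versus semi-additive distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name daily_balance_fact, its columns and the sample rows are hypothetical and exist only for illustration. Taking the closing balance across time is one common workaround, not the only one.

```python
import sqlite3

# Hypothetical daily-balance fact table: one row per customer per day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_balance_fact "
    "(balance_date TEXT, customer_key INTEGER, balance REAL)"
)
conn.executemany(
    "INSERT INTO daily_balance_fact VALUES (?, ?, ?)",
    [
        ("2022-10-01", 1, 100.0),
        ("2022-10-01", 2, 50.0),
        ("2022-10-02", 1, 120.0),
        ("2022-10-02", 2, 50.0),
    ],
)

# Meaningful: summing across the customer dimension gives the total held per day.
total_per_day = dict(
    conn.execute(
        "SELECT balance_date, SUM(balance) FROM daily_balance_fact "
        "GROUP BY balance_date"
    ).fetchall()
)

# Misleading: summing one customer's balance across days counts the same money twice.
summed_over_time = conn.execute(
    "SELECT SUM(balance) FROM daily_balance_fact WHERE customer_key = 1"
).fetchone()[0]

# One alternative across time (an assumption here): take the latest balance instead.
closing_balance = conn.execute(
    "SELECT balance FROM daily_balance_fact WHERE customer_key = 1 "
    "ORDER BY balance_date DESC LIMIT 1"
).fetchone()[0]

print(total_per_day, summed_over_time, closing_balance)
```

With this sample data, the per-day totals are 150.0 and 170.0, while the sum over time for customer 1 is 220.0 even though the customer never held more than 120.0.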
E. Dimensions —
Dimensions are attributes about facts. The various types of dimensions are-
1. Conformed dimension — dimension
which means the same thing with every
6. Static dimension — static dimension is not extracted from original data source
but is created within the context of the DWH and can be loaded manually. (e.g.
status codes)
Slowly changing dimensions (SCD) store both current and historical values for an
attribute. Since the dimensions don’t change frequently (as compared to facts), SCD
implementation deals with how history is maintained for dimensions (e.g. change
in marital status). The various types of SCD are —
One way to implement Type II SCD is to have start and end dates to indicate which is the latest value
Another way to implement Type II SCD is to have a flag to indicate which is the latest value
3. Type III — Original record is modified to include a new column which contains
previous value. Only the current and previous values are maintained.
Only current and previous value (limited history) is maintained in SCD Type III
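The two Type II approaches mentioned in the captions above (start and end dates, and a current-row flag) can be combined in one table. The following is a minimal sketch using Python's built-in sqlite3 module. The customer_dim table, its columns, the apply_scd2 helper and the sample values are all hypothetical, chosen to match the marital status example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_dim (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id INTEGER,                            -- business key
        marital_status TEXT,
        start_date TEXT,
        end_date TEXT,                                  -- NULL while the row is current
        is_current INTEGER
    )
""")

def apply_scd2(conn, customer_id, marital_status, change_date):
    """Close the current row if the attribute changed, then insert a new current row."""
    row = conn.execute(
        "SELECT customer_sk, marital_status FROM customer_dim "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is not None and row[1] == marital_status:
        return  # value unchanged, so no new version is needed
    if row is not None:
        # Expire the old version instead of overwriting it.
        conn.execute(
            "UPDATE customer_dim SET end_date = ?, is_current = 0 "
            "WHERE customer_sk = ?",
            (change_date, row[0]),
        )
    conn.execute(
        "INSERT INTO customer_dim "
        "(customer_id, marital_status, start_date, end_date, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, marital_status, change_date),
    )

apply_scd2(conn, 42, "single", "2020-01-01")
apply_scd2(conn, 42, "married", "2022-06-15")
apply_scd2(conn, 42, "married", "2022-09-01")  # unchanged, so no new row

history = conn.execute(
    "SELECT marital_status, start_date, end_date, is_current "
    "FROM customer_dim WHERE customer_id = 42 ORDER BY customer_sk"
).fetchall()
print(history)
```

After the three calls the table holds two rows for customer 42: the expired "single" row with an end date, and the current "married" row with a NULL end date and the flag set.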
F. Normalization —
Database normalization is the process of structuring a database in accordance with
a series of rules in order to reduce data redundancy and improve data integrity. The
various normal forms are —
1. First normal form (1NF) — Eliminate repeating groups. Make a separate table for
each set of related attributes and give each table a primary key (PK). Each field
contains at most one value.
Original data
Original table
3. Third normal form (3NF) — Eliminate columns not dependent on the primary
key.
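As a rough illustration of the 3NF rule, the sketch below splits a flat table in plain Python. The orders_flat rows and all column names are hypothetical. The assumption is that customer_city depends on customer_id rather than on the primary key order_id, so it is moved to its own table.

```python
# Hypothetical flat rows: order_id is the PK, but customer_city depends on
# customer_id, not on order_id.
orders_flat = [
    {"order_id": 1, "customer_id": 10, "customer_city": "Pune", "amount": 250},
    {"order_id": 2, "customer_id": 10, "customer_city": "Pune", "amount": 100},
    {"order_id": 3, "customer_id": 11, "customer_city": "Mumbai", "amount": 75},
]

# Move the column that does not depend on the PK into a table keyed by customer_id.
customers = {
    row["customer_id"]: {"customer_city": row["customer_city"]}
    for row in orders_flat
}

# The orders table keeps only columns that depend on order_id, plus the FK.
orders = [
    {"order_id": r["order_id"], "customer_id": r["customer_id"], "amount": r["amount"]}
    for r in orders_flat
]

print(customers)
print(orders)
```

Each city is now stored once per customer, so changing a customer's city means updating one row instead of every order.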
Online Analytical Processing (OLAP) systems process data in batches and store
history. Their tables are de-normalized to support the complex queries used for
reporting and analysis.
2. how to load large files into a DWH through a batch system (common data
cleaning steps, checking data integrity, null value handling, date formatting,
staging data and then moving it to the master table based on a specific DML
strategy, and maintaining PK-FK relationships while loading data)
3. various ETL strategies (for a source file placed on an FTP server, sourcing data
from transactional tables without locking, duplicate data handling etc.)
4. various data archival and staging strategies for different use cases
5. how to handle DDL operations while maintaining model integrity
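For the batch-load topic in the list above, one possible shape of an answer is sketched below with Python's built-in sqlite3 module. Everything here is an assumption made for illustration: the sales_staging and sales_master tables, the clean helper, the DD.MM.YYYY input date format, defaulting a missing amount to 0.0, and using INSERT OR REPLACE as the DML strategy. A real pipeline would pick these according to the business rules.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw rows as they might arrive from a source file.
raw_rows = [
    {"sale_id": "1", "sale_date": "15.10.2022", "amount": "100.5"},
    {"sale_id": "1", "sale_date": "15.10.2022", "amount": "100.5"},  # duplicate
    {"sale_id": "2", "sale_date": "16.10.2022", "amount": ""},       # missing amount
    {"sale_id": "", "sale_date": "16.10.2022", "amount": "20"},      # missing key
]

def clean(row):
    """Return a cleaned tuple, or None if the row fails the integrity check."""
    if not row["sale_id"]:
        return None  # reject rows without a key
    # Normalize the assumed DD.MM.YYYY input to ISO format.
    sale_date = datetime.strptime(row["sale_date"], "%d.%m.%Y").date().isoformat()
    # Null handling: default a missing amount to 0.0 (an assumption).
    amount = float(row["amount"]) if row["amount"] else 0.0
    return (int(row["sale_id"]), sale_date, amount)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_staging (sale_id INTEGER, sale_date TEXT, amount REAL)")
conn.execute(
    "CREATE TABLE sales_master "
    "(sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)

# Step 1: land the cleaned rows in staging, duplicates included.
cleaned = [c for c in map(clean, raw_rows) if c is not None]
conn.executemany("INSERT INTO sales_staging VALUES (?, ?, ?)", cleaned)

# Step 2: move de-duplicated rows to the master table, replacing existing keys.
conn.execute(
    "INSERT OR REPLACE INTO sales_master (sale_id, sale_date, amount) "
    "SELECT DISTINCT sale_id, sale_date, amount FROM sales_staging"
)

master = conn.execute("SELECT * FROM sales_master ORDER BY sale_id").fetchall()
print(master)
```

Keeping the staging table separate means bad or duplicate rows never touch the master table, and the final move is a single set-based statement.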
There are many more questions and topics under data warehousing and data
modelling. I have covered only the ones that I have been asked over the
years. Hope this is helpful!