
AdventureWorks2019 SSIS Exploration Document

Date: 3.01.2024

Version: 4.0 (After Production)

Idan Gadasin
CONTENT
1 DOCUMENT MANAGEMENT
1.1 Project Scope & Objective
1.2 Project Content

2 TECHNICAL SPECIFICATIONS
2.1 Prerequisites – the involved systems during the process
2.2 Solution Architecture

3 FUNCTIONAL SPECIFICATION
3.1 Source to Target and ERD models
3.2 Detailed description of ETL Process
3.3 Tables in the DWH + History tables

1 Document Management
1.1 Project Scope & Objective
The purpose of this project is to create a Data Warehouse solution from the AdventureWorks2019 database for SSIS_DB, a multinational company that is a leader in the manufacturing of non-food items.
This project aims to establish a comprehensive BI solution that transfers data from the AdventureWorks2019 database into SSIS_DB. The solution will contain summarized data tables, with a focus on sales data, employee records, customer information, and product details.
Moreover, the project will solve the following issues:
• Providing a unified approach to organized, well-suited data for significant decision-making
scenarios.

• Integrating data from different sources for further use by reporting activities. The data is
refreshed automatically on a fixed schedule (see section 3.2), which makes it convenient for automated actions.

1.2 Project Content

In this project, we will build a Data Mart that covers many aspects of real-world business order handling from an end-to-end perspective.

1. Data Cleaning and Preparation: Before the analysis starts, we clean and prepare the data to verify its quality and consistency.

2. The main summary tables that will be built to meet the company's demands:

҂ FactOrders – Contains information about all the orders, including dates, pricing, and quantity per
customer, taken from the transactions of the transactional database.

҂ DimEmployees – Information about the company's employees.

҂ DimCustomers – Information about the company's customers.

҂ DimProducts - Information about the products that are sold by the company.

3. The project will contain measures that support the achievement of the project's goal:

✓ Orders Department:
The Orders Department focuses on the sales-related data: when the order took place, the
customer that made the order, and the employee that handled the order. This department
contains a full description of the purchase, including the ProductID, its price, and the total
amount per product in the order.
The department gathers information every time a new order enters the system and helps
the user keep track of new transactions.

✓ Employees Department:
Each employee can create an order for a specific customer; the order is added to the FactOrders
table.
The information about the employees is stored in the DimEmployees table.
This department includes an identifier (emp_id) that is connected to the FactOrders table.

✓ Customer Department:
A customer can make a new order. All the data about the customer is stored in the DimCustomers
table, including an identifier (CustomerID) that is connected to the FactOrders table.

✓ Production Department:
A table that contains all the products manufactured by the company; every product has a description.
It includes an identifier (ProductID) that is connected to the FactOrders table and takes part in the
transaction process.

2 Technical Specifications

2.1 Prerequisites – the involved systems during the process

System / Process   | Explanation
SQL Server         | Operational DB – tables – data (SQL files)
SSIS               | ETL processes using SSIS in Visual Studio
Data Refreshing    | Refreshing processes through an attribute of Employees in SSMS

2.2 Solution Architecture

HLD:

Data Source (SQL Database) → ETL (SSIS) → DWH

3 Functional Specification

3.1 Source to Target and ERD models


Source to target documentation link | ERD link

3.2 Detailed description of ETL Process

҂ In SSIS, I have 12 packages for the required tables.

1. FactOrders Table:
• FactOrders:

The source is the StgOrders table. After loading the source, the process adds a calculated column
'Total', and a variable then counts the number of rows affected. The data in the table is then
transferred into 'FactOrders'.

After that, a record containing the package name, table name, Start_Date, End_Date, and the
number of transferred rows is inserted into the 'Transfertable' table.
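
As a rough sketch of this step, the derived 'Total' column might be computed as below. The expression is an assumption based on the column description in section 3.3 (unit price, quantity, discount, plus 17% VAT), and the staging column list is illustrative:

```sql
-- Hypothetical sketch: load staged orders into FactOrders while deriving
-- the 'Total' column (Discount is assumed to be a fraction, e.g. 0.10).
INSERT INTO FactOrders (Order_ID, Order_Date, Due_Date, Ship_Date,
                        Customer_ID, SalesPersonID, TerritoryID,
                        ProductID, Qty, Price, Discount, Total)
SELECT Order_ID, Order_Date, Due_Date, Ship_Date,
       Customer_ID, SalesPersonID, TerritoryID,
       ProductID, Qty, Price, Discount,
       -- price * quantity, minus discount, plus 17% VAT
       Qty * Price * (1 - Discount) * 1.17 AS Total
FROM stgOrders;
```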

• StgOrders:

The package populates the 'stgOrders' table with the unified result of a SQL query that joins
the different source tables into the desired shape. The number of rows affected by the transfer
is then counted, and the data is loaded into 'stgOrders'.

Every time the package runs, all the previous records are deleted from stgOrders, and ONLY
the new records from the mrr tables are inserted into 'StgOrders'. After that, a record containing
the package name, table name, Start_Date, End_Date, and the number of transferred rows is
inserted into the 'Transfertable' table.
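
A minimal sketch of the truncate-and-reload pattern described above; the join shape, mrr table names, and column list are assumptions for illustration:

```sql
-- Hypothetical sketch: clear the previous run, then rebuild stgOrders
-- from the mrr copies of the order header and detail tables.
TRUNCATE TABLE stgOrders;

INSERT INTO stgOrders (Order_ID, Order_Date, Customer_ID,
                       ProductID, Qty, Price, Discount)
SELECT h.SalesOrderID, h.OrderDate, h.CustomerID,
       d.ProductID, d.OrderQty, d.UnitPrice, d.UnitPriceDiscount
FROM mrrSalesOrderHeader AS h                 -- mrr table names are assumptions
JOIN mrrSalesOrderDetail AS d
    ON d.SalesOrderID = h.SalesOrderID;
```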

• mrrOrders:
Transferring the tables 'Sales.SalesOrderDetail' and 'Sales.SalesOrderHeader' into mrr tables.
The load action loads ONLY new records that did not appear in the 'FactOrders' table before,
to avoid loading duplicated data. All the new records are counted.

Every time the package runs, all the previous records are deleted from mrrOrders, and ONLY the
new records are inserted into 'mrrOrders'. After that, a record containing the package name,
table name, Start_Date, End_Date, and the number of transferred rows is inserted into the
'Transfertable' table.
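
A sketch of the 'only new records' filter; the mrr table name and the matching key against FactOrders are assumptions:

```sql
-- Hypothetical sketch: pull only order headers not yet present in
-- FactOrders, so a rerun never loads duplicated data.
INSERT INTO mrrSalesOrderHeader (SalesOrderID, OrderDate, DueDate, ShipDate, CustomerID)
SELECT h.SalesOrderID, h.OrderDate, h.DueDate, h.ShipDate, h.CustomerID
FROM Sales.SalesOrderHeader AS h
WHERE NOT EXISTS (
    SELECT 1
    FROM FactOrders AS f
    WHERE f.Order_ID = h.SalesOrderID         -- matching key is an assumption
);
```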

2. FactRetailOrders Table:
• FactRetailOrders:

The source is the StgRetailOrders table. After loading the source, the process adds a calculated
column 'Total', and a variable then counts the number of rows affected. The data in the table is
then transferred into 'FactRetailOrders'.

After that, a record containing the package name, table name, Start_Date, End_Date, and the
number of transferred rows is inserted into the 'Transfertable' table.

• StgRetailOrders:

The package populates the 'stgRetailOrders' table with the unified result of a SQL query that
unions the different transaction tables of every retailer into the desired shape. The number of
rows affected by the transfer is then counted, and the data is loaded into 'stgRetailOrders'.

Every time the package runs, all the previous records are deleted from 'StgRetailOrders', and
ONLY the new records from the mrr tables are inserted into 'StgRetailOrders'. After that, a
record containing the package name, table name, Start_Date, End_Date, and the number of
transferred rows is inserted into the 'Transfertable' table.
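
A sketch of the UNION step, assuming one mrr table per retailer file (the mrr table names follow the CSV files listed in the next bullet, and the column list mirrors FactRetailOrders in section 3.3):

```sql
-- Hypothetical sketch: unify each retailer's transactions into one
-- staging shape before the fact load.
INSERT INTO stgRetailOrders (OrderID, Store_ID, City, ProductID,
                             UnitPrice, Quantity, Discount)
SELECT OrderID, Store_ID, City, ProductID, UnitPrice, Quantity, Discount FROM mrrMashbir
UNION ALL
SELECT OrderID, Store_ID, City, ProductID, UnitPrice, Quantity, Discount FROM mrrMega
UNION ALL
SELECT OrderID, Store_ID, City, ProductID, UnitPrice, Quantity, Discount FROM mrrRetailers;
```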

• mrrRetailers:
Loading the CSV files Mashbir, Mega, and Retailers into mrr tables.
The load action loads ONLY new records that did not appear in the 'FactRetailOrders' table
before, to avoid loading duplicated data. All the new records are counted.

Every time the package runs, all the previous records are deleted from mrrRetailOrders, and
ONLY the new records are inserted into our mrr tables. The next step is to move the files that
were loaded into the Archive folder. After that, a record containing the package name,
table name, Start_Date, End_Date, and the number of transferred rows is inserted into the
'Transfertable' table.

3. DimEmployees Table:
• DimEmployees:

Loading all the data from StgEmployees. The data is loaded according to an Upsert MERGE
command that loads and handles the data in a few scenarios: 1. Load new data. 2. Update
existing data that had changes in the Online Transaction Processing tables
(HumanResources.Employee). 3. Data that has been deleted from the Online Transaction
Processing tables.
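
A minimal T-SQL sketch of the upsert MERGE described above, assuming emp_id is the matching key; the compared columns are illustrative, and handling deletions as a soft delete via isActive is an assumption:

```sql
-- Hypothetical sketch of the upsert MERGE: insert new employees, update
-- changed ones, and soft-delete rows that disappeared from the source.
MERGE DimEmployees AS tgt
USING StgEmployees AS src
    ON tgt.emp_id = src.emp_id
WHEN MATCHED AND (tgt.First_Name <> src.First_Name
               OR tgt.Job_Title  <> src.Job_Title) THEN
    UPDATE SET tgt.First_Name = src.First_Name,
               tgt.Job_Title  = src.Job_Title,
               tgt.UpdateDate = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (emp_id, First_Name, Last_Name, Job_Title, isActive, UpdateDate)
    VALUES (src.emp_id, src.First_Name, src.Last_Name, src.Job_Title, 1, GETDATE())
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET tgt.isActive = 0,              -- employee no longer in the source
               tgt.UpdateDate = GETDATE();
```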

• StgEmployees:

The package populates the 'stgEmployees' table with the unified result of a SQL query that joins
the different source tables into the desired shape. The number of rows affected by the transfer
is then counted, and the data is loaded into 'stgEmployees'.

Every time the package runs, all the previous records are deleted from stgEmployees, and
ONLY the new records from the mrr tables are inserted into 'stgEmployees'. After that, a record
containing the package name, table name, Start_Date, End_Date, and the number of transferred
rows is inserted into the 'Transfertable' table.

4. DimCustomers Table:
• DimCustomers:

Loading all the data from StgCustomers. The data is loaded according to an Upsert MERGE
command that loads and handles the data in a few scenarios: 1. Load new data. 2. Update
existing data that had changes in the Online Transaction Processing tables (Sales.Customer). 3.
Data that has been deleted from the Online Transaction Processing tables. After loading the
'dimCustomers' table, a record containing the package name, table name, Start_Date, End_Date,
and the number of transferred rows is inserted into the 'Transfertable' table.

• StgCustomers:

The package populates the 'stgCustomers' table with the unified result of a SQL query that joins
the different source tables into the desired shape. The number of rows affected by the transfer
is then counted, and the data is loaded into 'stgCustomers'.

Every time the package runs, all the previous records are deleted from stgCustomers, and
ONLY the new records from the mrr tables are inserted into 'stgCustomers'. After that, a record
containing the package name, table name, Start_Date, End_Date, and the number of transferred
rows is inserted into the 'Transfertable' table.

5. DimProducts Table:
• DimProducts:
Loading all the data from StgProducts step by step. Only records that had changes (new
records, updated records, or deleted records) are loaded into the 'DimProducts' table; after
that, the process produces the updated DimProducts.
In the first step, we handle records that have been deleted from the
source (Production.Product) but still appear in the target (DimProducts).

After that, we load all the new or updated data into the target (DimProducts).

After loading the 'dimProducts' table, a record containing the package name, table name, date,
and the number of transferred rows is inserted into the 'Transfertable' table.
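
A sketch of the first step (handling source deletions); marking the row via isActive and UpdateDate follows the DimProducts column list in section 3.3, though the exact mechanism is an assumption:

```sql
-- Hypothetical sketch: flag products that disappeared from the source
-- as inactive instead of physically deleting them from the dimension.
UPDATE d
SET d.isActive   = 0,
    d.UpdateDate = GETDATE()
FROM DimProducts AS d
WHERE d.isActive = 1
  AND NOT EXISTS (
      SELECT 1
      FROM stgProducts AS s
      WHERE s.productID = d.productID
  );
```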

• StgProducts:

The package populates the 'stgProducts' table with the unified result of a SQL query that joins
the different source tables into the desired shape. The number of rows affected by the transfer
is then counted, and the data is loaded into 'stgProducts'.

Every time the package runs, all the previous records are deleted from stgProducts, and ONLY
the new records from the mrr tables are inserted into 'stgProducts'. After that, a record
containing the package name, table name, Start_Date, End_Date, and the number of transferred
rows is inserted into the 'Transfertable' table.

6. DimStores:
• DimStores:
Loading all the data from mrrStores and the CSV file. The data is loaded according to an Upsert
MERGE command that loads and handles the data in a few scenarios: 1. Load new data. 2.
Update existing data that had changes in the Online Transaction Processing tables (Sales.Store).
3. Data that has been deleted from the Online Transaction Processing tables. After loading the
'dimStores' table, a record containing the package name, table name, Start_Date, End_Date,
and the number of transferred rows is inserted into the 'Transfertable' table.

• StgStores:
Every time the package runs, only new stores are loaded into the stgStores table. The store
table did not perfectly match the business rules that were demanded, so at this stage the data
from the two different sources is matched via a UNION between the CSV file table structure
and the store table structure.

After loading the 'stgStores' table, a record containing the package name, table name,
Start_Date, End_Date, and the number of transferred rows is inserted into the 'Transfertable'
table.

7. DimRetailers:
• DimRetailers:
Loading all the data from the CSV file. The data is loaded according to an Upsert MERGE
command that loads and handles the data in a few scenarios: 1. Load new data. 2. Update
existing data that had changes in the Online Transaction Processing tables (Sales.Store). 3. Data
that has been deleted from the Online Transaction Processing tables. After loading the
'dimRetailers' table, a record containing the package name, table name, Start_Date, End_Date,
and the number of transferred rows is inserted into the 'Transfertable' table.

8. MrrDimTables:
Note: All the mrr tables are created in a single package; they all share the same control flow,
which is why they are described here and not under each dim table's explanation.
All the tables are created via a data flow that builds each mrr table and counts how many rows
were transferred in the transaction.

mrrEmployees:
Loading the source tables from the Online Transaction Processing tables related to employees
into new mrr tables in our database. After loading each mrr table, the number of transferred
rows is counted to verify how many rows were transferred and to recognize what happened
during the process.

mrrCustomers:
Loading the source tables from the Online Transaction Processing tables related to customers
into new mrr tables in our database. After loading each mrr table, the number of transferred
rows is counted to verify how many rows were transferred and to recognize what happened
during the process.

mrrProducts:
Loading the source tables from the Online Transaction Processing tables related to products
into new mrr tables in our database. ONLY new data is loaded into the mrrProducts table; new
data is data that was modified or updated after the latest date that appears in DimProducts.
After loading each mrr table, the number of transferred rows is counted to verify how many
rows were transferred and to recognize what happened during the process.
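
A sketch of that incremental filter, assuming the source's ModifiedDate is compared against the latest UpdateDate recorded in DimProducts (the watermark logic and mrr column list are assumptions):

```sql
-- Hypothetical sketch: load only product rows modified after the most
-- recent date recorded in DimProducts.
DECLARE @LastLoad datetime =
    (SELECT MAX(UpdateDate) FROM DimProducts);

INSERT INTO mrrProducts (ProductID, Name, ModifiedDate)
SELECT p.ProductID, p.Name, p.ModifiedDate
FROM Production.Product AS p
WHERE p.ModifiedDate > @LastLoad;
```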

mrrStores:
Loading the source tables from the Online Transaction Processing tables related to stores into
new mrr tables in our database. After loading each mrr table, the number of transferred rows
is counted to verify how many rows were transferred and to recognize what happened during
the process.

mrrRetailStores:
Loading the source tables from the CSV files related to stores into new mrr tables in our
database. After loading each mrr table, the number of transferred rows is counted to verify
how many rows were transferred and to recognize what happened during the process.

The flow of the mrr tables:

Before every load of the mrr tables, we truncate them and then load each table in order,
inserting the data about each transaction into Transfertable as we go:
1. Load the mrrEmployees table, then insert the transaction data into Transfertable.
2. Load mrrCustomers, then insert the transaction data into Transfertable.
3. Load the mrrProduct tables.
4. Load mrrStores, then insert the transaction data into Transfertable.
5. Load mrrRetailerstore, then insert the transaction data into Transfertable.

9. Transfertable Table:
• Transfertable:

The table updates itself during the control flow of the packages shown in the previous
screenshots in this file.

The packages that include an 'Execute SQL' task performing the insert are:

mrrOrders, mrrDimTables, stgOrders, stgEmployees, stgCustomers, stgProducts,
FactOrders, DimEmployees, DimCustomers, DimProducts.

All the MRR, Stg, and Dim table loads are recorded in TransferTable, transaction after transaction,
automatically.
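
As a sketch, the logging insert each package issues could look like the statement below; the ? markers follow the OLE DB parameter convention of SSIS Execute SQL tasks, and the mapping to package variables is an assumption:

```sql
-- Hypothetical sketch: the audit insert run after each load, with the
-- parameters mapped to SSIS variables (package name, table name,
-- start/end timestamps, transferred row count).
INSERT INTO Transfertable (Packagename, Tablename, Start_Date, End_date, [Count])
VALUES (?, ?, ?, ?, ?);
```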

10. DimProductsHistory Table:
• DimProductsHistory:
The table contains all the data that is inside DimProducts, plus the change that happened per
product (update or deletion) or the insertion of a new record (meaning there is no previous
history for the specific product). The status of each record is taken from the 'StgProducts' table
and compared to the history table, which then applies the required changes.
The table is loaded according to the description above.

After that, the package handles records that have been deleted from the source but still exist
inside DimProducts.
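
Reading the section 3.3 column list as an SCD2-style history (End_Date NULL marks the current row), the maintenance could be sketched as below; the compared columns and the mechanism are assumptions:

```sql
-- Hypothetical sketch: close the current history row for products whose
-- attributes changed, then append the new version.
UPDATE h
SET h.End_Date = GETDATE()                    -- close the old version
FROM DimProductsHistory AS h
JOIN stgProducts AS s
    ON s.productID = h.productID
WHERE h.End_Date IS NULL
  AND (h.Product_Name <> s.Product_Name
       OR h.Category_Name <> s.Category_Name);

INSERT INTO DimProductsHistory
    (productID, Product_Name, Sub_Category_Name, Category_Name, Insert_Date, End_Date)
SELECT s.productID, s.Product_Name, s.Sub_Category_Name, s.Category_Name,
       GETDATE(), NULL                        -- NULL End_Date marks the active row
FROM stgProducts AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM DimProductsHistory AS h
    WHERE h.productID = s.productID
      AND h.End_Date IS NULL
);
```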

The next step is to deploy the project.

The project is deployed to the path /SSISDB/BI33/PRODPROJ_SSIS_DB in SSMS.
After that, I create a job called MrrLoadingJob, which calls the next job in line until all the jobs are done.
The jobs are separated into small parts so that, if an error occurs, we can detect it in the specific job and be
more efficient, instead of reloading a massive amount of data each time.

This is the chronological order of the jobs:


MrrLoadingJob → Cus_Stg2DimJob → Emp_Stg2DimJob → Prd_Stg2DimJob → Rtl_Stg2DimJob →
Str_Stg2DimJob → FactOrders_Stg2DWHJob → FactRetailOrders_Stg2DWHJob

The whole Extract-Transform-Load process runs every Sunday at 04:00 AM.
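
One plausible way to chain the jobs as described (an assumption; SQL Server Agent jobs can be chained in other ways) is for the final step of each job to start the next one:

```sql
-- Hypothetical sketch: the last step of MrrLoadingJob starts the next job
-- in the chain, so a failure stops the chain at the failing job.
EXEC msdb.dbo.sp_start_job @job_name = N'Cus_Stg2DimJob';
```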

[Screenshots: step lists for MrrLoadingJob, Cus_Stg2DimJob, Emp_Stg2DimJob, Prd_Stg2DimJob,
Rtl_Stg2DimJob, Str_Stg2DimJob, FactOrders_Stg2DWHJob, and FactRetailOrders_Stg2DWHJob.]

3.3 Tables in the DWH + History tables
҂ Tables in the DWH:
• FactOrders:
In the FactOrders table we have the following columns:
1. Order_ID: The ID of the order.
2. Order_Date: The date that the order was recorded.
3. Due_Date: The date that the order is due to the customer.
4. Ship_Date: The date that the order was shipped to the customer.
5. Customer_ID: The ID of the customer. Foreign Key to DimCustomers table.
6. SalesPersonID: The ID of the employee that created the order. Foreign Key to
DimEmployees table.
7. TerritoryID: The ID of the territory where the order took place.
8. ProductID: The ID of the purchased product, Foreign Key to DimProducts Table.
9. Qty: The quantity of the selected product related to the specific order.
10. Price: The price of each unit of the purchased product.
11. Discount: The percentage of the discount in the given order for the given product.
12. Total: The total cost per product within the order, derived from the unit price, quantity,
and discount, plus an additional VAT payment at a rate of 17% (for example, assuming the
discount applies before VAT, 3 units at a unit price of 100 with a 10% discount total
3 × 100 × 0.9 × 1.17 = 315.90).

• FactRetailOrders:
In the FactRetailOrders table we have the following columns:
1. OrderID: The ID of the order.
2. Store_ID: The ID of the store that made the order. Foreign Key to DimStores table
3. City: The city that the order took place at.
4. ProductID: The ID of the purchased product, Foreign Key to DimProducts Table.
5. UnitPrice: The price of each unit of the purchased product.
6. Quantity: The quantity of the selected product related to the specific order.
7. Discount: The percentage of the discount in the given order for the given product.
8. Total: The total cost per product within the order, derived from the unit price, quantity,
and discount, plus an additional VAT payment at a rate of 17%.
9. InsertDate: The date that the order was inserted into the system.

• DimProducts:
In the DimProducts table we have the following columns:
1. productID: Behaves as a primary key of the table. Represents the unique ID of the product.
2. Product_Name: The name of the product.
3. Sub_Category_Name: The name of the subcategory that the product belongs to.
4. Category_Name: The name of the category that the product belongs to.
5. UpdateDate: The date that the product's record was last updated (used to detect products that were deleted from the original database).
6. isActive: A field that shows if the product is still active for sales or not.

• DimEmployees:
In the DimEmployees table we have the following columns:
1. emp_id: Behaves as the primary key of the table; represents the unique ID of the employee.
2. First_Name: The first name of the employee.
3. Last_Name: The last name of the employee.
4. Job_Title: The job title of the employee.
5. Hire_Date: The date that the employee was hired by the company.
6. Phone_Number: The employee's phone number.
7. Email_Address: The employee's email address.
8. Territory_Name: The territory that the employee was assigned to.
9. isEngineer: A flag column that indicates whether the employee's job title includes 'Engineer' or not (see the sketch after this list).
10. isActive: A flag column that displays whether the employee is still in the company or not.
11. UpdateDate: The date that the record was last updated for the employee.
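
A sketch of how the isEngineer flag could be derived; the exact matching rule is an assumption:

```sql
-- Hypothetical sketch: flag employees whose job title mentions 'Engineer'.
SELECT emp_id,
       Job_Title,
       CASE WHEN Job_Title LIKE '%Engineer%' THEN 1 ELSE 0 END AS isEngineer
FROM DimEmployees;
```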

• DimCustomers:
In the DimCustomers table we have the following columns:
1. Customer_ID: Behaves as the primary key of the table; represents the unique ID of the customer.
2. Name: The full name of the customer.
3. Address: The address of the customer.
4. City: The city that the customer lives in.
5. Region: The region that the customer belongs to.
6. Country: The country of the customer.
7. UpdateDate: The date that the customer's details were most recently updated.

• TransferTable:

In the TransferTable table we have the following columns:
1. Packagename: The name of the recorded package.
2. Tablename: The name of the recorded table.
3. Start_Date: The date that the load action started.
4. End_date: The date that the transaction finished.
5. Count: The number of rows that were transferred in the specific transaction.

• DimProductsHistory:
In the DimProductsHistory table we have the following columns:
1. productID: Behaves as a primary key of the table. Represents the unique ID of the product.
2. Product_Name: The name of the product.
3. Sub_Category_Name: The name of the subcategory that the product belongs to.
4. Category_Name: The name of the category that the product belongs to.
5. Insert_Date: A field that shows when the record was inserted or updated in the DimProducts table.
6. End_Date: A field that shows whether a product has been deleted from the original database or not.
NULL = active product; not NULL = the product is no longer active.

• DimStores:
In the DimStores table we have the following columns:
1. storeID: Behaves as the primary key of the table; represents the unique ID of the store.
2. StoreName: The name of the store.
3. Emp_id: The ID of the employee related to the store; serves as a foreign key to DimEmployees.
4. RetailerID: The ID of the retailer related to the store; serves as a foreign key to DimRetailers.
5. City: The city that the store is located in.
6. isRetailer: A flag that indicates whether the party related to the store is an employee or a retailer.
7. UpdateDate: A field that shows whether the store's record has been updated in the original database or not.
8. isActive: A field that shows whether the store is still active for service or not.

** A store can't have both an employee and a retailer related to it.

• DimRetailers:
In the DimRetailers table we have the following columns:
1. Retailer_ID: Behaves as the primary key of the table; represents the unique ID of a retailer.
2. RetailerName: The name of the retailer.
3. RetailerType: The type of the retailer.
4. isActive: A field that shows whether the retailer is still active or not.
5. UpdateDate: The date that the retailer's details were most recently updated.
