
1st SIT COURSEWORK QUESTION PAPER: Autumn Semester 2022

© London Metropolitan University


CS7079 Data Warehousing and Big Data

The aim of this coursework is to design, implement, and test a data
warehouse-based application for a given business case scenario, then to
export data from it and ingest it for further processing on a Big Data platform.

Coursework Specification:

CASE STUDY of Boss Inc Retail Store

Aim: Boss Inc, headquartered in Charlotte, North Carolina, has around 110 stores
across the country. Each store comprises various departments, each offering a
variety of products to customers of various age groups and genders. The
departments offer a range of products such as clothing, home appliances,
jewelry and cosmetics. The store also runs several customer loyalty schemes for
its registered customers, who are divided into categories such as Platinum,
Silver and Gold.
Boss now wants to build a data warehouse to analyze the customer sales data of
all the stores and use it for improved, more effective promotions for customers.
In addition, top management wants to visualize the profit margin of all
departments on a monthly basis.

The warehousing system needs to be developed and tested on a big data and data
warehousing architecture.

Business scenarios:

Customers can buy products from any store. If a customer is buying for the first
time, the sales representative will ask whether s/he would like to go through the
registration process. Otherwise, the customer may sign up later through the
customer portal or on the next store visit. When a registered customer makes a
purchase, s/he earns points based on the amount spent in the store. Points start
at 1 and increase by 1 for each additional $10 spent (i.e. expenses <= $10,
points = 1; expenses between $10 and $20, points = 2; expenses between $20 and
$30, points = 3). These loyalty points are accumulated against the customer's
loyalty card id. A customer is classified as a Platinum customer if s/he spent
less than $500 in the last quarter, and is upgraded to Silver if s/he spent
between $500 and $1500 in the last quarter. S/he becomes a Gold customer if the
amount spent in the last quarter is more than $1500. The customer is downgraded
if s/he does not meet the buying criteria in two consecutive quarters. Promotions
on different items can be offered to Platinum, Silver and Gold customers
separately. The offers that Platinum customers get are also given to Silver and
Gold customers, but offers that Silver customers get may not be offered to
Platinum customers. The same applies to the remaining customer types. The
customer hierarchy is: Platinum 🡪 Silver 🡪 Gold.

Profit analysis is straightforward. Dashboards/reports need to be generated to
analyze which department is the most profitable and which is the least
profitable on a monthly basis.
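
The points and category rules in the business scenario can be sketched in Python as a transformation step. This is only an illustration under stated assumptions: the function names `loyalty_points` and `customer_category` are hypothetical, a purchase of exactly $20 is assumed to earn 2 points (upper bound of each $10 band inclusive), spends of exactly $500 and $1500 are assumed to fall in Silver, and the two-consecutive-quarter downgrade rule is not modelled.

```python
from decimal import Decimal


def loyalty_points(amount):
    """Points for one purchase: 1 point up to $10, +1 per extra $10 band.

    Assumption: the upper bound of each band is inclusive ($20.00 -> 2 points).
    """
    cents = int(Decimal(str(amount)) * 100)   # avoid float rounding issues
    bands = -(-cents // 1000)                  # ceiling division by $10
    return max(1, bands)                       # points start from 1


def customer_category(quarter_spend):
    """Category from last quarter's spend, per the thresholds in the scenario.

    Assumption: exactly $500 and exactly $1500 are treated as Silver.
    """
    if quarter_spend < 500:
        return "Platinum"
    if quarter_spend <= 1500:
        return "Silver"
    return "Gold"
```

For example, `loyalty_points(25)` gives 3 and `customer_category(800)` gives "Silver". In an ETL job the same logic could equally be written as SQL `CASE` expressions.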

Description of Data Source Information:

For this coursework, the data set (OLTP) should include:

1. Location: This table holds details of all 110 store locations. The
information stored in this table includes the id and description for country,
region, state, district/city and location.

2. Customer: This table holds details of the customers who bought products
from the store. It includes customer id, first name, middle name, last name,
customer category, address, phone number, age, gender, etc.

3. Loyalty Card: This table holds details of the Loyalty Card Type. The
information stored in this table includes loyalty id, loyalty scheme name
(Platinum, Silver, Gold) and threshold amount.

4. Sales: This table holds details of the purchases made by customers.
Data is held at the Item/Location/Day level. It includes Product (Item) ID,
Store/Location ID, Day ID, Customer ID, quantity of the product purchased,
amount paid by the customer, and discount if applicable.

5. Price Table: This table holds the cost price (WAC) of each item that
Boss Inc paid to its supplier. It includes Item ID, Location ID, Day ID and
Cost Price.

6. Promotion Scheme: This table holds details of the promotion schemes that
can be offered to customers based on Loyalty Card Type. The information stored in
this table includes loyalty id, item, promotion start date, promotion end date,
promotion type (e.g. clearance sale), and promotion scheme (e.g. Buy 2 get 30%
discount, Buy 3 get 1 free).

7. Time Table: This table holds details of time such as Year, Half Year, Quarter,
Month, Week, Day, Hour, Minute and Second.

Tasks:

A. Analysis and Design of DDM & DWH

1. Start by building an ER diagram for the data warehouse. Make sure the ER
diagram follows the Snowflake schema. In addition, differentiate
Dimensions and Facts in your ER diagram and add appropriate
primary key and foreign key constraints.
(10 Marks)

2. Create all the required databases, schemas, tables and views for the
source information as explained in the section above. Insert data into
those source tables as required.
(5 Marks)

3. Create all the required databases, schemas, tables and views for data
warehousing purposes. (5 Marks)

4. Demonstrate the ETL process of loading data from the source tables
into the corresponding target tables as per the business scenario
explained in the prior section. Transformation examples should be
shown clearly.
(25 Marks)

5. Create the fact tables and insert data as follows: (5 Marks)

● Base transaction-level sales table
● Sales aggregation table at Item, Location, Month level
● Positional fact entity (Price). It must have a price for
Item/Location/Day
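
As an illustration of how the aggregation table relates to the base sales table, the following minimal sketch uses Python's built-in `sqlite3` module with an in-memory database. The table names `fact_sales` and `fact_sales_monthly`, the column names, the `YYYY-MM-DD` day id format and the sample rows are all assumptions made for this example; the coursework leaves the choice of database and naming to you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and column names; day_id is assumed to be 'YYYY-MM-DD'.
conn.executescript("""
CREATE TABLE fact_sales (
    item_id INTEGER, location_id INTEGER, day_id TEXT,
    customer_id INTEGER, quantity INTEGER, amount REAL, discount REAL
);
CREATE TABLE fact_sales_monthly (
    item_id INTEGER, location_id INTEGER, month_id TEXT,
    total_quantity INTEGER, total_amount REAL,
    PRIMARY KEY (item_id, location_id, month_id)
);
""")

# Made-up sample rows for the base transaction-level table.
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "2022-10-03", 100, 2, 40.0, 0.0),
        (1, 10, "2022-10-17", 101, 1, 20.0, 0.0),
        (1, 10, "2022-11-02", 100, 3, 60.0, 0.0),
        (2, 11, "2022-10-05", 102, 1, 15.0, 0.0),
    ],
)

# Roll the day-level rows up to Item / Location / Month.
conn.execute("""
INSERT INTO fact_sales_monthly
SELECT item_id, location_id, substr(day_id, 1, 7),
       SUM(quantity), SUM(amount)
FROM fact_sales
GROUP BY item_id, location_id, substr(day_id, 1, 7)
""")

monthly_rows = conn.execute(
    "SELECT * FROM fact_sales_monthly "
    "ORDER BY item_id, location_id, month_id"
).fetchall()
```

In a full solution the month would normally come from a join to the time dimension rather than from string slicing; the slicing here only keeps the sketch short.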

B. Dashboards and Reports

1. Prepare business reports on the areas listed below. (10 Marks)

● Monthly/weekly report of the Boss Inc retail business.
● Profit analysis of each product department.
● Monthly analysis of the top 10 and bottom 10 customers. Find out
which product was purchased the maximum number of times.
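
The customer ranking and most-purchased-product report can be prototyped in a few lines of Python before it is built in a reporting tool. This is a sketch only: the row layout `(customer_id, item_id, day_id, amount)`, the function names and the sample data are hypothetical, customers are ranked by total amount spent in the month, and "purchased the maximum number of times" is interpreted as the number of sales rows rather than the quantity sold.

```python
from collections import Counter, defaultdict

# Hypothetical sales rows: (customer_id, item_id, day_id 'YYYY-MM-DD', amount)
sales = [
    (100, 1, "2022-10-03", 40.0),
    (101, 1, "2022-10-17", 20.0),
    (102, 2, "2022-10-05", 15.0),
    (100, 3, "2022-10-20", 5.0),
    (100, 1, "2022-11-02", 60.0),
]


def rank_customers(rows, month, n=10):
    """Return (top_n, bottom_n) customers by total spend in the given month."""
    totals = defaultdict(float)
    for customer_id, _item_id, day_id, amount in rows:
        if day_id.startswith(month):
            totals[customer_id] += amount
    top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    bottom = sorted(totals.items(), key=lambda kv: kv[1])[:n]
    return top, bottom


def most_purchased_item(rows):
    """Item id that appears in the largest number of sales rows."""
    counts = Counter(item_id for _cust, item_id, _day, _amt in rows)
    return counts.most_common(1)[0][0]


top, bottom = rank_customers(sales, "2022-10", n=2)
best_item = most_purchased_item(sales)
```

With the sample rows above and `n=2`, customer 100 leads the October ranking and item 1 is the most frequently purchased product.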

C. Big Data

Migrate test data from the data warehouse to an Apache Hadoop
platform for further analysis of Big Data using Apache Sqoop.
[Note: Please write scripts for loading and processing data.]

1. Export the data warehouse database data into a staging zone in Hive.
Demonstrate the use of Sqoop to export and import files to HDFS from the
database of your choice.
(10 Marks)
2. Load the final data file into the final Hive zone and demonstrate manipulation
of the loaded data. (10 Marks)
3. Demonstrate the use of file formats and other optimization techniques in
Sqoop and Hive.
(10 Marks)

D. Report Writing

The report you submit should be well written, well structured
and well presented, and it must include:
● An introduction section that summarizes the objectives of the
coursework and the business case scenario. (5 Marks)

● A personal reflective conclusion on what you have learnt
from your overall coursework. (5 Marks)

END
