SE140798 NguyenThongPhiHuynh
Chapter 8
1/ Question 1: Match the columns
Your answer 1:
(A ... J)
1 F
2 D
3 J
4 A
5 G
6 E
7 B
8 C
9 H
10 I
2/ Question 2: What are the platform options for the staging area? Compare the options and mention the
advantages and disadvantages.
Your answer 2:
The staging area sits between the data sources and the data warehouse repositories.
There are three platform options for the staging area:
Source data platform:
o Saves the time and effort of moving the data across platforms to the staging area
Data storage platform: the platform on which the data warehouse DBMS runs and the database exists
o Can eliminate a few intermediary sub-steps and apply data directly to the database from
some of the consolidated files in the staging area
Separate platform:
o Can be optimized for complex data transformations and data cleansing
o Can hold a tracking file or table that contains tracking entries; a separate environment is the
most conducive to managing the movement of data
o Makes it easy to have people specifically trained on these tools run the separate computing
equipment
Chapter 9
3/ Question 3: Why do you think metadata is important in a data warehouse environment? Give a
general explanation in one or two paragraphs.
Your answer 3:
Metadata is needed for using, building, and administering the data warehouse. IT professionals, power
users, and end users all need metadata to work with the data they retrieve from the data warehouse,
because metadata answers their questions about that data: it is data about the data, a table of contents
for the data, and a catalog for the data.
Metadata describes all the pertinent aspects of the data in the data warehouse fully and precisely. Metadata
is like a nerve center: it occupies a key position and enables communication among the various processes.
(T/F)
A F
B T
C F
D F
Chapter 10
5/ Question 5: Why is the entity-relationship modeling technique not suitable for the data warehouse? How is
dimensional modeling different?
Your answer 5:
The entity-relationship (ER) modeling technique is not suitable for the data warehouse for these reasons:
ER modeling mainly focuses on removing data redundancy, ensuring data consistency, and expressing
microscopic relationships
ER modeling is suitable for OLTP systems, which:
o capture details of events or transactions
o focus on individual events
o act as a window into micro-level transactions
o give a picture at the level of detail necessary to run the business
o are suitable only for questions at the transaction level
In contrast, dimensional modeling is the preferred practice for the data warehouse, for the reasons given below:
Dimensional modeling mainly focuses on capturing critical measures and viewing them along dimensions,
and it is intuitive to business users
In addition, the data warehouse:
o is meant to answer questions about the overall process
o focuses on how managers view the business
o reveals business trends
o centers its information around a business process
o gives answers that show how the business measures the process
o allows the measures to be studied in many ways along several business dimensions
Chapter 11
6/ Question 6: How does a snowflake schema differ from a STAR schema? Name two advantages and
two disadvantages of the snowflake schema.
Your answer 6:
A snowflake schema is a STAR schema in which the dimension tables are normalized. Compared with a STAR schema, a snowflake schema:
is a bottom-up model
uses less space
takes more time to execute queries
has a more complex design
has higher query complexity
is more difficult to understand
has more foreign keys
has lower data redundancy
Advantages:
Small savings in storage space
Normalized structures are easier to update and maintain
Disadvantages:
Browsing through the contents is more difficult
Degraded query performance because of additional joins
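The extra join mentioned above can be seen in a small sketch. It uses Python's built-in sqlite3 module with made-up tables (star_product, snow_product, snow_category, sales_fact) and made-up rows; none of these names come from the textbook. Both queries return the same totals, but the snowflake version needs one more join to reach the category name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: a denormalized STAR product dimension and its
# normalized snowflake equivalent (product -> category).
cur.executescript("""
CREATE TABLE star_product (product_key INTEGER PRIMARY KEY, name TEXT, category_name TEXT);
CREATE TABLE snow_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_product (product_key INTEGER PRIMARY KEY, name TEXT,
                           category_key INTEGER REFERENCES snow_category(category_key));
CREATE TABLE sales_fact (product_key INTEGER, amount REAL);

INSERT INTO star_product VALUES (1, 'Pen', 'Stationery'), (2, 'Pencil', 'Stationery'), (3, 'Mug', 'Kitchen');
INSERT INTO snow_category VALUES (10, 'Stationery'), (20, 'Kitchen');
INSERT INTO snow_product VALUES (1, 'Pen', 10), (2, 'Pencil', 10), (3, 'Mug', 20);
INSERT INTO sales_fact VALUES (1, 5.0), (2, 3.0), (3, 12.0), (1, 5.0);
""")

# STAR schema: one join from the fact table to the dimension table.
star_sql = """
SELECT p.category_name, SUM(f.amount)
FROM sales_fact f
JOIN star_product p ON f.product_key = p.product_key
GROUP BY p.category_name
ORDER BY p.category_name
"""

# Snowflake schema: an additional join to the normalized category table.
snow_sql = """
SELECT c.category_name, SUM(f.amount)
FROM sales_fact f
JOIN snow_product p ON f.product_key = p.product_key
JOIN snow_category c ON p.category_key = c.category_key
GROUP BY c.category_name
ORDER BY c.category_name
"""

star_result = cur.execute(star_sql).fetchall()
snow_result = cur.execute(snow_sql).fetchall()
print(star_result == snow_result)
```

In the snowflake tables the category name 'Stationery' is stored once instead of once per product, which is the small storage saving listed under the advantages.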
Chapter 12
7/ Question 7: When is a full data refresh preferable to an incremental load? Can you think of an example?
Your answer 7:
A full data refresh is preferable to an incremental load when a large share of the data has changed,
because refresh is a much simpler option than update. To use the update option, you have to devise the
proper strategy to extract the changes from each data source. Then you have to determine the best
strategy to apply the changes to the data warehouse. The refresh option simply involves the periodic
replacement of complete data warehouse tables.
Example: when more than 40% of the records have changed, we should do a full refresh instead of an incremental load.
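The difference between the two options can be sketched in plain Python. This is only an illustration under assumptions: the function names, the change-record layout with an "op" field, and the sample rows are all hypothetical, and the 40% threshold is simply the figure used in the answer above.

```python
def full_refresh(source_rows):
    """Refresh: rebuild the whole warehouse table from the current source rows."""
    return {row["id"]: dict(row) for row in source_rows}


def incremental_load(warehouse, changes):
    """Update: apply captured inserts, updates, and deletes to a copy of the table."""
    result = {key: dict(row) for key, row in warehouse.items()}
    for change in changes:
        if change["op"] == "delete":
            result.pop(change["id"], None)
        else:  # "insert" or "update"
            result[change["id"]] = dict(change["row"])
    return result


def choose_strategy(changed_count, total_count, threshold=0.40):
    """Pick a strategy from the share of changed records (threshold is an assumption)."""
    if total_count == 0:
        return "refresh"
    return "refresh" if changed_count / total_count > threshold else "incremental"


# Hypothetical sample data.
warehouse = {1: {"id": 1, "city": "Hanoi"}, 2: {"id": 2, "city": "Hue"}}
source = [
    {"id": 1, "city": "Hanoi"},
    {"id": 2, "city": "Da Nang"},
    {"id": 3, "city": "Can Tho"},
]
changes = [
    {"op": "update", "id": 2, "row": {"id": 2, "city": "Da Nang"}},
    {"op": "insert", "id": 3, "row": {"id": 3, "city": "Can Tho"}},
]

refreshed = full_refresh(source)
updated = incremental_load(warehouse, changes)
print(refreshed == updated)
```

Both paths end with the same table. The refresh needs only the current source rows, while the update path depends on someone having captured the list of changes correctly, which is the extra work the answer describes.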
Chapter 13
8/ Question 8: Give examples of four types of data quality problems.
Your answer 8:
Dummy values in fields: a user enters a temporary placeholder value for the Social Security number
Absence of data values: the value of a demographic field is omitted
Violation of business rules: in a payroll system, an obvious business rule is that the days worked in a
year plus the vacation days, holidays, and sick days cannot exceed 365 or 366.
Inconsistent values: the gender field is stored as True/False in one system and as Male/Female in another
system
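The four problem types above could be detected with simple checks like the following. This is a minimal sketch: the field names (ssn, zip_code, days_worked, and so on), the list of dummy Social Security numbers, and the accepted gender codes are assumptions made for illustration, not rules from the textbook.

```python
# Hypothetical reference values for the checks.
DUMMY_SSNS = {"000-00-0000", "999-99-9999", "123-45-6789"}
GENDER_CODES = {"M", "F"}
DAY_FIELDS = ("days_worked", "vacation_days", "holidays", "sick_days")


def find_quality_problems(record, days_in_year=365):
    """Return the data quality problem types found in one employee record."""
    problems = []
    # Dummy value: a known placeholder was typed into the field.
    if record.get("ssn") in DUMMY_SSNS:
        problems.append("dummy value")
    # Absence of data: a demographic field is empty or missing.
    if not record.get("zip_code"):
        problems.append("missing value")
    # Business rule: all counted days together cannot exceed the year length.
    if sum(record.get(field, 0) for field in DAY_FIELDS) > days_in_year:
        problems.append("business rule violation")
    # Inconsistent value: the gender is not in the agreed coding scheme.
    if record.get("gender") not in GENDER_CODES:
        problems.append("inconsistent value")
    return problems


bad_record = {
    "ssn": "999-99-9999", "zip_code": "", "gender": True,
    "days_worked": 300, "vacation_days": 40, "holidays": 20, "sick_days": 10,
}
clean_record = {
    "ssn": "501-23-4567", "zip_code": "70000", "gender": "F",
    "days_worked": 240, "vacation_days": 15, "holidays": 10, "sick_days": 5,
}

bad_problems = find_quality_problems(bad_record)
clean_problems = find_quality_problems(clean_record)
print(bad_problems)
print(clean_problems)
```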
Chapter 15
9/ Question 9: What is meant by slice-and-dice? Give an example.
Your answer 9:
Slice: an operation that selects a single value for one dimension of a given data cube and produces a new
sub-cube
Dice: an operation that selects values on two or more dimensions of a given data cube and produces a new
sub-cube
To slice-and-dice is to break a body of information down into smaller parts or to examine it from
different viewpoints so that you can understand it better.
Example: a company's sales data has three dimensions: product, time, and location. You can slice and dice
it, for instance by fixing one time period, and present the result as a two-dimensional table.
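The product, time, and location example can be sketched with a plain Python dictionary standing in for the data cube. The sales figures, the dimension values, and the helper names slice_cube and dice_cube are all hypothetical; a real OLAP tool would offer these operations itself.

```python
# A tiny cube: (product, time, location) -> units sold. All values are made up.
DIMENSIONS = ("product", "time", "location")
sales = {
    ("Pen", "Q1", "Hanoi"): 10,
    ("Pen", "Q2", "Hanoi"): 12,
    ("Pen", "Q1", "Hue"): 7,
    ("Mug", "Q1", "Hanoi"): 4,
    ("Mug", "Q2", "Hue"): 6,
}


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value and drop it from the keys."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


def dice_cube(cube, **allowed):
    """Dice: keep the cells whose values fall in the allowed sets on each named dimension."""
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[DIMENSIONS.index(dim)] in values for dim, values in allowed.items())
    }


# Slice on time = Q1: the result is a two-dimensional product-by-location table.
q1_sales = slice_cube(sales, "time", "Q1")

# Dice on two dimensions: only pens sold in Hanoi, across all time periods.
pen_hanoi = dice_cube(sales, product={"Pen"}, location={"Hanoi"})

print(q1_sales)
print(pen_hanoi)
```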
10/ Question 10: Discuss two reasons why feeding data into the OLAP system directly from the source
operational systems is not recommended.
Your answer 10:
The operational databases are updated every day, so feeding data into the OLAP system directly from the
source operational systems slows the OLAP system down
Business users must access the data continually, and a slow system makes it hard for the business
to plan its strategy quickly and accurately.