SE63205 HoangNhatThuan Test2
Chapter 8
1/ Question 1: Match the columns
Your answer 1:
(A ... J)
1 F
2 D
3 J
4 A
5 G
6 E
7 B
8 C
9 H
10 I
2/ Question 2: What are the platform options for the staging area? Compare the options and mention
the advantages and disadvantages.
Your answer 2:
The staging area sits between the data sources and the data warehouse repositories. There are
three platform options for the staging area:
Source data platform:
o Saves time and effort in moving the data across platforms to the staging area
Data storage platform: the platform on which the data warehouse DBMS runs and the database
exists
o Can eliminate a few intermediary sub-steps and apply data directly to the database
from some of the consolidated files in the staging area
Separate platform:
o Can optimize the separate platform for complex data transformations and data
cleansing
o A tracking file or table can hold the tracking entries; a separate environment is most
conducive to managing the movement of data
o Makes it easier to have people specifically trained on these tools run the separate
computing equipment
Chapter 9
3/ Question 3: Why do you think metadata is important in a data warehouse environment? Give a
general explanation in one or two paragraphs.
Your answer 3:
Metadata is needed for using the data warehouse, for building it, and for administering it. IT
professionals, power users, and end users all need metadata to work with the data they retrieve from
the data warehouse, because metadata answers their questions about that data: it is the data about
the data, a table of contents for the data, a catalog for the data, and so on. Metadata describes all
the pertinent aspects of the data in the data warehouse fully and precisely.
Metadata acts like a nerve center: it holds a key position in the data warehouse and enables
communication among the various processes.
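To make "data about the data" concrete, here is a minimal Python sketch of one metadata catalog entry and a lookup that answers an end user's question about a column. The table and column names (sales_fact, sale_amount), the fields of ColumnMetadata, and the describe() helper are all hypothetical and only illustrate the idea; real warehouses keep such entries in a metadata repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical metadata entry: 'data about the data' for one warehouse column."""
    table: str
    column: str
    description: str       # business meaning, mainly for end users
    source: str            # where the value comes from, mainly for builders
    transformation: str    # how the value was derived
    refresh_schedule: str  # when it changes, mainly for administrators


# A tiny catalog keyed by (table, column); the entry is illustrative only.
CATALOG = {
    ("sales_fact", "sale_amount"): ColumnMetadata(
        table="sales_fact",
        column="sale_amount",
        description="Net sale amount in US dollars",
        source="orders.order_total (order-entry system)",
        transformation="order_total minus discounts and returns",
        refresh_schedule="nightly",
    ),
}


def describe(table: str, column: str) -> str:
    """Answer 'what is this data and where did it come from?' from the catalog."""
    meta = CATALOG.get((table, column))
    if meta is None:
        return f"No metadata recorded for {table}.{column}"
    return (f"{table}.{column}: {meta.description}; "
            f"source: {meta.source}; derived as: {meta.transformation}; "
            f"refreshed {meta.refresh_schedule}")


print(describe("sales_fact", "sale_amount"))
```

The same entry serves all three groups: the description helps users, the source and transformation help builders, and the refresh schedule helps administrators.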
4/ Question 4: (T/F)
A F
B T
C F
D F
Chapter 10
5/ Question 5: Why is the entity-relationship modeling technique not suitable for the data
warehouse? How is dimensional modeling different?
Your answer 5:
The entity-relationship (ER) modeling technique is not suitable for the data warehouse because
ER modeling mainly focuses on removing data redundancy, ensuring data consistency, and
expressing microscopic relationships.
ER modeling is suitable for OLTP systems, which:
o capture the details of events or transactions
o focus on individual events
o serve as a window into micro-level transactions
o provide a picture at the level of detail necessary to run the business
o are suitable only for questions at the transaction level
Dimensional modeling, on the other hand, is the best practice for the data warehouse, for the
following reasons:
Dimensional modeling mainly focuses on capturing the critical measures and viewing them along
business dimensions, and it is intuitive to business users
In addition, a data warehouse:
o is meant to answer questions on the overall process
o focuses on how managers view the business
o reveals business trends
o centers its information around a business process
o gives answers that show how the business measures the process
o lets the measures be studied in many ways along several business dimensions
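The dimensional view can be sketched with Python's built-in sqlite3 module: a fact table holds the measure, and dimension tables supply the business dimensions along which managers view it. The schema (sales_fact, product_dim, store_dim) and the data are hypothetical, chosen only to show how a business question becomes one GROUP BY over the dimensions.

```python
import sqlite3

# Hypothetical star schema: one fact table holding the measure (units_sold)
# and two dimension tables used to view that measure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact  (product_key INTEGER REFERENCES product_dim,
                          store_key   INTEGER REFERENCES store_dim,
                          units_sold  INTEGER);
INSERT INTO product_dim VALUES (1, 'Laptop', 'Computers'), (2, 'Phone', 'Mobile');
INSERT INTO store_dim   VALUES (10, 'North'), (20, 'South');
INSERT INTO sales_fact  VALUES (1, 10, 5), (1, 20, 3), (2, 10, 7), (2, 20, 2);
""")

# A typical business question: total units sold by product category and region.
rows = conn.execute("""
SELECT p.category, s.region, SUM(f.units_sold) AS total_units
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN store_dim   s ON s.store_key   = f.store_key
GROUP BY p.category, s.region
ORDER BY p.category, s.region
""").fetchall()

for row in rows:
    print(row)
```

Adding another way of studying the measure (for example, by time) only means adding another dimension table and another GROUP BY column, which is why the model stays intuitive for business users.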
Chapter 11
6/ Question 6: How does a snowflake schema differ from a STAR schema? Name two advantages and
two disadvantages of the snowflake schema.
Your answer 6:
A snowflake schema is a STAR schema whose dimension tables are normalized. Compared with a STAR
schema, a snowflake schema:
is a bottom-up model
uses less space
takes more time to execute queries
has a more complex design
requires more complex queries
is harder to understand
has more foreign keys
has lower data redundancy
Advantages:
Small savings in storage space
Normalized structures are easier to update and maintain
Disadvantages:
Browsing through the contents becomes more difficult
Degraded query performance because of additional joins
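A small sqlite3 sketch can illustrate the trade-off. It builds the same product dimension twice, once as a STAR dimension and once snowflaked into a separate category table, then shows that the same question needs one extra join in the snowflake, while renaming a category touches one row instead of many. All table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- STAR: the category name is stored (redundantly) in every product row.
CREATE TABLE product_star (product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
INSERT INTO product_star VALUES (1, 'Laptop', 'Computers'), (2, 'Desktop', 'Computers'), (3, 'Phone', 'Mobile');

-- SNOWFLAKE: the category is moved into its own normalized table.
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_snow (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           category_key INTEGER REFERENCES category_dim);
INSERT INTO category_dim VALUES (100, 'Computers'), (200, 'Mobile');
INSERT INTO product_snow VALUES (1, 'Laptop', 100), (2, 'Desktop', 100), (3, 'Phone', 200);

CREATE TABLE sales_fact (product_key INTEGER, units_sold INTEGER);
INSERT INTO sales_fact VALUES (1, 5), (2, 4), (3, 7);
""")

# STAR: one join from the fact table to the dimension.
star_sql = """
SELECT p.category_name, SUM(f.units_sold)
FROM sales_fact f JOIN product_star p ON p.product_key = f.product_key
GROUP BY p.category_name ORDER BY p.category_name
"""

# SNOWFLAKE: the same question needs an extra join through category_dim.
snow_sql = """
SELECT c.category_name, SUM(f.units_sold)
FROM sales_fact f
JOIN product_snow p ON p.product_key = f.product_key
JOIN category_dim c ON c.category_key = p.category_key
GROUP BY c.category_name ORDER BY c.category_name
"""

star_rows = conn.execute(star_sql).fetchall()
snow_rows = conn.execute(snow_sql).fetchall()
print(star_rows)  # same answer from both designs
print(snow_rows)

# Maintenance: renaming a category updates one row in the snowflake...
conn.execute("UPDATE category_dim SET category_name = 'PCs' WHERE category_key = 100")
# ...but every matching product row in the STAR dimension.
cur = conn.execute("UPDATE product_star SET category_name = 'PCs' WHERE category_name = 'Computers'")
print(cur.rowcount)
```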
Chapter 12
7/ Question 7: When is a full data refresh preferable to an incremental load? Can you think of an
example?
Your answer 7:
A full data refresh is preferable to an incremental load when a large share of the data has changed,
because refresh is a much simpler option than update. To use the update option, you have to devise
a proper strategy to extract the changes from each data source, and then determine the best strategy
to apply those changes to the data warehouse. The refresh option simply involves the periodic
replacement of complete data warehouse tables.
Example: when more than 40% of the data has changed, we should refresh the table instead of
applying an incremental load.
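The decision and the two load styles can be sketched in plain Python, with a table modeled as a dict from primary key to row. The 0.40 cutoff comes from the example above and is an assumption to tune per table; the function names are hypothetical and this is not a real ETL tool's API.

```python
from typing import Dict, Set

Row = Dict[str, object]
Table = Dict[int, Row]  # primary key -> row

# Illustrative cutoff taken from the 40% example; tune it per table.
REFRESH_THRESHOLD = 0.40


def choose_strategy(changed_rows: int, total_rows: int,
                    threshold: float = REFRESH_THRESHOLD) -> str:
    """Pick a full refresh when the share of changed rows exceeds the threshold."""
    if total_rows == 0:
        return "full_refresh"  # empty table: nothing to update incrementally
    return "full_refresh" if changed_rows / total_rows > threshold else "incremental"


def full_refresh(source: Table) -> Table:
    """Refresh: replace the whole warehouse table with a fresh copy of the source."""
    return {key: dict(row) for key, row in source.items()}


def incremental_load(target: Table, upserts: Table, deletes: Set[int]) -> Table:
    """Update: apply only the captured changes to the existing warehouse table."""
    result = {key: dict(row) for key, row in target.items()}
    result.update({key: dict(row) for key, row in upserts.items()})
    for key in deletes:
        result.pop(key, None)
    return result


print(choose_strategy(changed_rows=50, total_rows=100))  # 50% changed
print(choose_strategy(changed_rows=5, total_rows=100))   # 5% changed
```

The incremental path needs the extra inputs (upserts and deletes) that a change-capture strategy must produce for each source, which is exactly the extra work the refresh option avoids.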
Chapter 13
8/ Question 8: Give examples of four types of data quality problems.
Your answer 8:
Dummy values in fields: a user may enter a temporary placeholder value for a Social Security
number that is not yet known
Absence of data values: the value of a demographic field is left empty
Violation of business rules: in a payroll system, an obvious business rule is that the days
worked in a year plus the vacation days, holidays, and sick days cannot exceed 365 or 366.
Inconsistent values: the gender field may be coded as True/False in one system and as
Male/Female in another system
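The four problem types can each be detected with a simple rule during data cleansing. This Python sketch checks one record; the field names (ssn, income_band, days_worked, ...), the set of dummy SSN values, and the standard gender codes are hypothetical choices for illustration, not rules taken from any particular system.

```python
from typing import Any, Dict, List

# Hypothetical placeholder values that users type when the real SSN is unknown.
DUMMY_SSNS = {"000-00-0000", "999-99-9999"}
# Hypothetical standard coding agreed for the warehouse.
STANDARD_GENDER = {"Male", "Female"}
DAY_FIELDS = ("days_worked", "vacation_days", "holidays", "sick_days")


def check_record(rec: Dict[str, Any]) -> List[str]:
    """Return a list of data quality problems found in one source record."""
    problems = []
    # 1. Dummy values in fields
    if rec.get("ssn") in DUMMY_SSNS:
        problems.append("dummy value in ssn")
    # 2. Absence of data values (a demographic field left empty)
    if rec.get("income_band") in (None, ""):
        problems.append("missing income_band")
    # 3. Violation of business rules: the days cannot exceed the length of the year
    total_days = sum(rec.get(field, 0) for field in DAY_FIELDS)
    if total_days > (366 if rec.get("leap_year") else 365):
        problems.append("days exceed year length")
    # 4. Inconsistent values: a code that differs from the warehouse standard
    if rec.get("gender") not in STANDARD_GENDER:
        problems.append(f"inconsistent gender value {rec.get('gender')!r}")
    return problems


record = {"ssn": "999-99-9999", "income_band": "", "days_worked": 300,
          "vacation_days": 40, "holidays": 20, "sick_days": 10, "gender": True}
print(check_record(record))
```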
Chapter 15
9/ Question 9: What is meant by slice-and-dice? Give an example.
Your answer 9:
Slice: an operation that fixes a single value on one dimension of a given data cube (for
example, time = Q1) and provides a new sub-cube
Dice: an operation that selects values on two or more dimensions of a given data cube and
provides a new sub-cube
To slice-and-dice is to break a body of information down into smaller parts or to examine it
from different viewpoints so that you can understand it better.
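Example: a company's sales data has three dimensions: product, time, and location. You can slice
and dice it and present the result as a two-dimensional table, for example with products as rows
and locations as columns for one chosen time period.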
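Using the product/time/location example, slice and dice can be sketched in plain Python with the cube stored as a dict keyed by (product, time, location). The quarters, regions, and unit counts are made-up data, and the helper names are hypothetical; an OLAP server would do this over far larger cubes.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str, str]  # (product, time, location)
DIMS = ("product", "time", "location")

# Small illustrative sales cube: (product, quarter, region) -> units sold.
cube: Dict[Cell, int] = {
    ("Laptop", "Q1", "North"): 10, ("Laptop", "Q1", "South"): 12,
    ("Laptop", "Q2", "North"): 8,  ("Laptop", "Q2", "South"): 15,
    ("Phone",  "Q1", "North"): 20, ("Phone",  "Q1", "South"): 25,
    ("Phone",  "Q2", "North"): 22, ("Phone",  "Q2", "South"): 30,
}


def slice_cube(cube: Dict[Cell, int], dim: str, value: str) -> Dict[Cell, int]:
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {cell: v for cell, v in cube.items() if cell[i] == value}


def dice_cube(cube: Dict[Cell, int], **selections) -> Dict[Cell, int]:
    """Dice: keep cells whose values fall in the chosen sets on several dimensions."""
    idx = {DIMS.index(d): set(vals) for d, vals in selections.items()}
    return {cell: v for cell, v in cube.items()
            if all(cell[i] in vals for i, vals in idx.items())}


def to_2d(sub: Dict[Cell, int], row_dim: str, col_dim: str) -> Dict[str, Dict[str, int]]:
    """Present a sub-cube as a two-way table, summing over the remaining dimension."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table: Dict[str, Dict[str, int]] = {}
    for cell, v in sub.items():
        row = table.setdefault(cell[r], {})
        row[cell[c]] = row.get(cell[c], 0) + v
    return table


q1 = slice_cube(cube, "time", "Q1")
print(to_2d(q1, "product", "location"))  # products x regions for Q1
print(dice_cube(cube, product=["Phone"], location=["South"]))
```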
10/ Question 10: Discuss two reasons why feeding data into the OLAP system directly from the source
operational systems is not recommended.
Your answer 10:
The operational databases are updated every day; feeding data into the OLAP system directly from
the source operational systems would slow the system down.
Business users must access the data continually and frequently; a slow system makes it hard for
the business to plan its strategy quickly and accurately.