
Student Code: SE130230

Student Name: Nguyễn Hữu Tuấn Nam


Class Name: SE1322

Chapter 8
1/ Question 1: Match the columns

1. operational infrastructure       A. shared-nothing architecture
2. preemptive multitasking          B. provides high concurrency
3. shared disk                      C. single memory address space
4. MPP                              D. operating system feature
5. SMP                              E. vertical parallelism
6. interquery parallelization       F. people, procedures, training
7. intraquery parallelization       G. easy administration
8. NUMA                             H. choice data warehouse platform
9. UNIX-based system                I. optimize for data transformation
10. data staging area               J. data movement option

Your answer 1:
(A ... J)
1 F
2 D
3 J
4 A
5 G
6 B
7 E
8 C
9 H
10 I

2/ Question 2: What are the platform options for the staging area? Compare the options and mention
the advantages and disadvantages.

Your answer 2:
The staging area sits between the data sources and the data warehouse repositories. There are
three platform options for it:
 • Source data platform:
o Saves the time and effort of moving data across platforms to the staging area
 • Data storage platform (the platform on which the data warehouse DBMS runs and the database
exists):
o Can eliminate a few intermediary sub-steps and apply data directly to the
database from some of the consolidated files in the staging area
 • Separate platform:
o Can be optimized for complex data transformations and data cleansing
o Is the most conducive environment for managing data movement, for example with a
tracking file or table that holds tracking entries
o Makes it easy to have people specifically trained on the tools run the separate
computing equipment

Chapter 9
3/ Question 3: Why do you think metadata is important in a data warehouse environment? Give a
general explanation in one or two paragraphs.

Your answer 3:
Metadata is needed for using, building, and administering the data warehouse. IT professionals,
power users, and end users all need metadata to work with the data they retrieve from the data
warehouse, because metadata contains the answers to their questions about that data: it is data
about the data, a table of contents for the data, a catalog for the data, and so on. Metadata
describes all the pertinent aspects of the data in the data warehouse fully and precisely. It acts
like a nerve center: it holds a key position and enables communication among the various processes.

4/ Question 4: Indicate if true or false


A. The importance of metadata is the same in a data warehouse as it is in an operational
system.
B. Metadata is needed by IT for data warehouse administration.
C. Technical metadata is usually less structured than business metadata.
D. Maintaining metadata in a modern data warehouse is just for documentation.
Your answer 4:

(T/F)
A F
B T
C F
D F

Chapter 10
5/ Question 5: Why is the entity-relationship modeling technique not suitable for the data
warehouse? How is dimensional modeling different?
Your answer 5:
The entity-relationship modeling technique is not suitable for the data warehouse because:
 • ER modeling mainly focuses on removing data redundancy, ensuring data consistency, and
expressing microscopic relationships
 • ER modeling is suited to OLTP systems, which:
o capture the details of events or transactions
o focus on individual events
o act as a window into micro-level transactions
o give a picture at the level of detail necessary to run the business
o are suitable only for questions at the transaction level
In contrast, dimensional modeling is the best practice for the data warehouse, for the reasons
given below:
 • Dimensional modeling mainly focuses on capturing critical measures and viewing them along
dimensions, and it is intuitive to business users
 • In addition, the data warehouse:
o is meant to answer questions about the overall process
o focuses on how managers view the business
o reveals business trends
o centers information around a business process
o gives answers that show how the business measures the process
o lets the measures be studied in many ways along several business dimensions
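
As an illustration of measures viewed along dimensions, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sales figures are all hypothetical and chosen only for this example; a real design would have more dimensions and attributes.

```python
import sqlite3

# Hypothetical dimensional model: one fact table holding the measure
# (sales_amount) and two dimension tables describing product and time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    time_key INTEGER REFERENCES dim_time(time_key),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_time VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 1, 80.0), (2, 2, 120.0);
""")


def sales_by_quarter(connection):
    """Summarize the sales measure along the time dimension."""
    return connection.execute("""
        SELECT t.quarter, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_time AS t ON f.time_key = t.time_key
        GROUP BY t.quarter
        ORDER BY t.quarter
    """).fetchall()


totals_by_quarter = sales_by_quarter(conn)
print(totals_by_quarter)
```

The query answers a question about the overall process (sales per quarter) rather than about any single transaction, which is the kind of question the answer above says a data warehouse is meant for.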

Chapter 11
6/ Question 6: How does a snowflake schema differ from a STAR schema? Name two advantages and
two disadvantages of the snowflake schema.

Your answer 6:
A snowflake schema is a STAR schema whose dimension tables are normalized. Compared with the STAR
schema, the snowflake schema:
 • is a bottom-up model
 • uses less space
 • takes more time to execute queries
 • has a more complex design
 • has higher query complexity
 • is harder to understand
 • has more foreign keys
 • has low data redundancy
Advantages:
 • Small savings in storage space
 • Normalized structures are easier to update and maintain
Disadvantages:
 • Browsing through the contents is more difficult
 • Query performance is degraded because of the additional joins
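
The extra join can be seen in a small sketch, again with Python's built-in sqlite3 module. The tables and data are hypothetical: the same product dimension is stored once in STAR form (category name repeated in every row) and once in snowflake form (category moved to its own table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- STAR form: the category name is repeated in every product row.
CREATE TABLE star_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
-- Snowflake form: the category is normalized into its own table.
CREATE TABLE snow_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT,
    category_key INTEGER REFERENCES snow_category(category_key));
CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL);

INSERT INTO star_product VALUES
    (1, 'Hammer', 'Hardware'), (2, 'Drill', 'Hardware'), (3, 'Editor', 'Software');
INSERT INTO snow_category VALUES (10, 'Hardware'), (20, 'Software');
INSERT INTO snow_product VALUES (1, 'Hammer', 10), (2, 'Drill', 10), (3, 'Editor', 20);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 150.0), (3, 80.0);
""")

# STAR query: one join from the fact table to the dimension table.
star_result = conn.execute("""
    SELECT p.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN star_product AS p ON f.product_key = p.product_key
    GROUP BY p.category_name ORDER BY p.category_name
""").fetchall()

# Snowflake query: same answer, but a second join is needed to reach the category.
snow_result = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN snow_product AS p ON f.product_key = p.product_key
    JOIN snow_category AS c ON p.category_key = c.category_key
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()

print(star_result, snow_result)
```

Both queries return the same totals; the snowflake version stores 'Hardware' only once but pays for it with the additional join.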

Chapter 12
7/ Question 7: When is a full data refresh preferable to an incremental load? Can you think of an
example?

Your answer 7:
 • A full data refresh is preferable to an incremental load when a large share of the data has
changed, because refresh is a much simpler option than update. To use the update option, you
have to devise a proper strategy to extract the changes from each data source, and then
determine the best strategy to apply those changes to the data warehouse. The refresh option
simply involves the periodic replacement of complete data warehouse tables.
 • Example: when more than 40% of the records have changed, we should refresh instead of doing
an incremental load.
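
The two options can be sketched in Python as follows. This is only an illustration under stated assumptions: tables are modeled as dictionaries keyed by record id, the function names are hypothetical, and the 0.40 threshold simply mirrors the 40% figure in the example above rather than any fixed rule.

```python
def full_refresh(source_rows):
    """Replace the warehouse table with a complete copy of the source."""
    return dict(source_rows)


def incremental_load(warehouse_rows, changed_rows, deleted_keys=()):
    """Apply only the extracted changes (inserts, updates, deletes)."""
    updated = dict(warehouse_rows)
    updated.update(changed_rows)      # inserts and updates
    for key in deleted_keys:          # deletes
        updated.pop(key, None)
    return updated


def choose_load_strategy(changed_count, total_count, threshold=0.40):
    """Pick 'refresh' when the changed share exceeds the assumed threshold."""
    if total_count == 0:
        return "refresh"
    share = changed_count / total_count
    return "refresh" if share > threshold else "incremental"


warehouse = {1: "A", 2: "B", 3: "C"}
source = {1: "A", 2: "B2", 4: "D"}  # record 2 changed, 3 deleted, 4 added

refreshed = full_refresh(source)
incremented = incremental_load(warehouse, {2: "B2", 4: "D"}, deleted_keys=[3])
print(refreshed == incremented, choose_load_strategy(50, 100))
```

Both paths end with the same table contents; the incremental path needs the list of changes and deletions to be extracted first, which is the extra work the answer describes.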

Chapter 13
8/ Question 8: Give examples of four types of data quality problems.

Your answer 8:
 • Dummy values in fields: a user may enter a temporary placeholder for the Social Security number
 • Absence of data values: the value of a demographic field is omitted
 • Violation of business rules: in a payroll system, an obvious business rule is that the days
worked in a year plus the vacation days, holidays, and sick days cannot exceed 365 or 366
 • Inconsistent values: the gender field may be coded as True/False in one system and
Male/Female in another
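
A minimal sketch of how these four problems might be detected in Python is shown below. The field names, the set of dummy Social Security numbers, and the choice of Male/Female as the standard gender coding are all assumptions made for illustration.

```python
# Assumed placeholder values that users type when the real number is unknown.
DUMMY_SSNS = {"999-99-9999", "000-00-0000"}
# Assumed standard coding for the gender field in the warehouse.
STANDARD_GENDERS = ("Male", "Female")
DAY_FIELDS = ("days_worked", "vacation_days", "holidays", "sick_days")


def find_quality_problems(record, days_in_year=365):
    """Return the list of data quality problems found in one record."""
    problems = []
    if record.get("ssn") in DUMMY_SSNS:
        problems.append("dummy value")
    if record.get("income_level") in (None, ""):
        problems.append("missing value")
    total_days = sum(record.get(field, 0) for field in DAY_FIELDS)
    if total_days > days_in_year:
        problems.append("business rule violation")
    if record.get("gender") not in STANDARD_GENDERS:
        problems.append("inconsistent value")
    return problems


bad_record = {
    "ssn": "999-99-9999", "income_level": None, "gender": True,
    "days_worked": 300, "vacation_days": 40, "holidays": 20, "sick_days": 10,
}
clean_record = {
    "ssn": "123-45-6789", "income_level": "middle", "gender": "Female",
    "days_worked": 230, "vacation_days": 15, "holidays": 10, "sick_days": 3,
}
bad_problems = find_quality_problems(bad_record)
clean_problems = find_quality_problems(clean_record)
print(bad_problems, clean_problems)
```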

Chapter 15
9/ Question 9: What is meant by slice-and-dice? Give an example.

Your answer 9:
 • Slice: an operation that fixes a single value on one dimension of a given data cube and
returns a new sub-cube
 • Dice: an operation that selects values on two or more dimensions of a given data cube and
returns a new sub-cube
 • To slice-and-dice is to break a body of information down into smaller parts or to examine it
from different viewpoints so that you can understand it better.
 • Example: a company's data has three dimensions: product, time, and location. You can slice
and dice it and present the result as a two-dimensional table.
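
The example can be sketched in Python with a tiny cube stored as a dictionary. The dimension values and the numbers are hypothetical, and the helper names slice_cube and dice_cube are made up for this illustration.

```python
DIMENSIONS = ("product", "time", "location")

# Hypothetical sales cube: (product, time, location) -> units sold.
cube = {
    ("Widget", "Q1", "North"): 10, ("Widget", "Q1", "South"): 20,
    ("Widget", "Q2", "North"): 30, ("Widget", "Q2", "South"): 40,
    ("Gadget", "Q1", "North"): 50, ("Gadget", "Q1", "South"): 60,
    ("Gadget", "Q2", "North"): 70, ("Gadget", "Q2", "South"): 80,
}


def slice_cube(data, dimension, value):
    """Fix one dimension to a single value and drop it from the keys."""
    i = DIMENSIONS.index(dimension)
    return {key[:i] + key[i + 1:]: measure
            for key, measure in data.items() if key[i] == value}


def dice_cube(data, selections):
    """Keep the cells whose values fall in the allowed set for each dimension."""
    def keep(key):
        return all(key[DIMENSIONS.index(dim)] in allowed
                   for dim, allowed in selections.items())
    return {key: measure for key, measure in data.items() if keep(key)}


# Slice on time = Q1: a two-dimensional product-by-location table.
q1_table = slice_cube(cube, "time", "Q1")
# Dice on product and location: a smaller three-dimensional sub-cube.
widget_north = dice_cube(cube, {"product": {"Widget"}, "location": {"North"}})
print(q1_table)
print(widget_north)
```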

10/ Question 10: Discuss two reasons why feeding data into the OLAP system directly from the source
operational systems is not recommended.
Your answer 10:
 • The operational databases are updated every day, so feeding data into the OLAP system directly
from the source operational systems slows the system down.
 • Business users must access the data continually and frequently; a slow system makes it hard
for the business to plan its strategy quickly and accurately.
