
Data warehousing

Exercise 1

Please answer all questions. The full score is 18 points.

1. Select only one answer for each of the following questions. (0.5 points per question)

1) Which of the following statements is NOT correct?


(A) At the operational level, day-to-day business decisions are made, typically in real time
or within a short period.
(B) At the tactical level, decisions are made by middle management with a medium-term
focus.
(C) At the strategic level, decisions are made by senior management with long-term
implications.
(D) A data warehouse provides a centralized, consolidated data platform by integrating data
from different sources. As such, it provides a separate and dedicated environment for
operational level data management.

2) Which of the following is not a data warehouse feature?


(A) Subject-oriented.
(B) Integrated.
(C) Time-variant.
(D) Volatile.

3) In terms of data processing, the data warehouse focuses on


(A) Add / update / delete / select statements.
(B) Add / select statements.
(C) Select / update statements.
(D) Delete statements.

4) Which statement is correct?


(A) A star schema has one large central dimension table which is connected to various
smaller fact tables.
(B) The dimension tables of a star schema contain the criteria for aggregating the
measurement data and will typically be used as constraints to answer queries.
(C) To speed up report generation and avoid time-consuming joins in a star schema, the
dimension tables need to be normalized.
(D) The dimension tables in a star schema are frequently updated.

5) Which statement is not correct?


(A) A snowflake schema normalizes the fact table of a star schema.
(B) A fact constellation schema has more than one fact table which can share dimension
tables.

(C) Surrogate keys essentially buffer the data warehouse from the operational environment
by making it immune to any operational changes.
(D) A factless fact table is a fact table that only contains foreign keys and no measurement
data.

6) Which statement about ETL is not correct?


(A) Some estimates state that the ETL step can consume up to 80% of all efforts needed to
set up a data warehouse.
(B) To decrease the burden on both the operational systems and the data warehouse itself,
it is recommended to start the ETL process by dumping the data in a staging area where all
the ETL activities can be executed.
(C) During the loading step, the data warehouse is populated by filling the fact and
dimension tables, thereby also generating the necessary surrogate keys to link it all up. Fact
rows should be inserted/updated before the dimension rows.
(D) The extraction strategy can be either full or incremental. In the latter case, only the
changes since the previous extraction are considered.

7) Which statement is not correct?


(A) Multidimensional OLAP (MOLAP) stores the multidimensional data using a
multidimensional DBMS (MDBMS) whereby the data are stored in a multidimensional
array-based data structure optimized for efficient storage and quick access.
(B) Relational OLAP (ROLAP) stores the data in a relational data warehouse, which can
be implemented using a star, snowflake, or fact constellation schema.
(C) Hybrid OLAP (HOLAP) tries to combine the best of both MOLAP and ROLAP. An
RDBMS can then be used to store the detailed data in a relational data warehouse, whereas
the pre-computed aggregated data can be kept as a multidimensional array managed by an
MDBMS.
(D) MOLAP scales better to more dimensions than ROLAP. The query performance may,
however, be inferior to ROLAP unless some of the queries are materialized or high-
performance indexes are defined.

8) Which statement is correct?


(A) Roll-up (or drill-up) refers to aggregating the current set of fact values within or across
one or more dimensions.
(B) Roll-down (or drill-down) de-aggregates the data by navigating from a lower level of
detail to a higher level of detail.
(C) Slicing represents the operation whereby one of the dimensions is set at a particular
value.
(D) Dicing corresponds to a range selection on one or more dimensions.
(E) All of the above are correct.

2. Define the following terms: OLAP (online analytical processing), ROLAP (relational OLAP),
MOLAP (multidimensional OLAP). (2 points)

3. Consider the following Students table and write the SQL statements requested below.

Student_ID  Age  Study_track  Score
1           21   CS           5.0
2           20   Math         4.0
3           21   Phy          3.0
4           20   CS           2.0
5           20   Math         4.0
6           20   Math         3.0
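
As a starting point for the questions on this table, the rows above can be loaded into a database. The following is a minimal sketch that uses Python's built-in sqlite3 module only so that it runs without any server. The column types and the helper name load_students are assumptions, not part of the exercise. The CREATE TABLE and INSERT statements should carry over to PostgreSQL with little change. The CUBE, ROLLUP, and GROUPING SETS queries themselves are not written here and should be run on a database that supports those keywords, such as PostgreSQL.

```python
import sqlite3

# Rows copied from the Students table in the exercise.
STUDENTS = [
    (1, 21, "CS", 5.0),
    (2, 20, "Math", 4.0),
    (3, 21, "Phy", 3.0),
    (4, 20, "CS", 2.0),
    (5, 20, "Math", 4.0),
    (6, 20, "Math", 3.0),
]


def load_students(conn):
    """Create the Students table and insert the six rows (hypothetical helper)."""
    # Column types are an assumption; the exercise only gives the column names.
    conn.execute(
        "CREATE TABLE Students ("
        "Student_ID INTEGER PRIMARY KEY, "
        "Age INTEGER, "
        "Study_track TEXT, "
        "Score REAL)"
    )
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)", STUDENTS)
    conn.commit()


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
load_students(conn)

# Sanity check: row count and overall average score.
row_count, overall_avg = conn.execute(
    "SELECT COUNT(*), AVG(Score) FROM Students"
).fetchone()
print(row_count, overall_avg)
```

The overall average printed by the sanity check is the value expected in the grand-total row of a cube or rollup result, which gives a quick way to check the output of the later queries.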

(1) Write a SQL statement that uses the CUBE keyword to build a cube over the Age and
Study_track dimensions and calculate the average score. Use a database you are familiar with
(e.g. PostgreSQL) to insert the above data, then run your SQL statement and copy the results
here. (2 points)

(2) Write a SQL statement that uses the ROLLUP keyword with the Age and Study_track
dimensions to calculate the average score. Use a database to run your SQL statement and copy
the results here. (2 points)

(3) Write a SQL statement that uses the GROUPING SETS keyword with the Age and Study_track
dimensions to calculate the average scores. Use a database to run your SQL statement and copy
the results here. (2 points)

4. Consider the following example of a relational table. What are the possible errors and
inconsistencies you can detect in this table? (3 points)

StudentID  Name   Birthday    Gender  Country  City
1          John   31/12/1991  M       Spain    Madrid
2          Mary   03/12/1990  F       Italy    Lisbon
3          Jussi  1992-1-13   M       Finland  Helsinki
3          Tom    02/02/1992  U       France
5. Write three SQL statements to perform the following schema mapping. Source schema:
Takes(course, student), Courses(prof, course), Time(course, time). Target schema:
Teaches(prof, course, time), Study(student, course, time), ST(prof, student). That is, write
SQL statements that define the three tables in the target schema using the three tables in the
source schema. (3 points)
