Comprehensive Examination SS G515 – Data Warehousing

NAME: IDNO:
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
I SEMESTER 2005-2006
SS G515 DATA WAREHOUSING
Comprehensive Examination
Date: 08th December 2005
Time: 3 Hours (2.00 – 5.00 pm)
Weightage: 35% [Part A (closed book) – 16 & Part B (open book) – 19]
Part A – Closed Book
Points to note:
• Answer multiple choice questions in the Question paper itself.
• Some questions may have more than one correct option. You will get credit only if you
mark all the correct options.
• There is NO NEGATIVE MARKING.
• ENCIRCLE the correct option(s).
• Short answer questions are to be answered in the supplementary answer sheet provided.
Multiple-Choice Questions (20*0.5=10)

1. The characteristic that indicates that a data warehouse is organized around key
high-level entities of the enterprise is:
(a) Time-variant
(b) Non-volatile
(c) Subject-oriented
(d) Integrated
2. An ODS contains data that is:
(a) Detailed
(b) Current-valued
(c) Integrated
(d) Subject-oriented
3. Class IV ODS is different from class I, II, III ODS because:
(a) It is supported by the data warehouse
(b) Its granularity is different
(c) It contains enriched profile data
(d) Its refresh cycle is ad hoc
4. The level of data transformations is highest in:
(a) Class I ODS
(b) Class II ODS
(c) Class III ODS
(d) Class IV ODS
5. The dimension that is not available in the operational systems:
(a) Product
(b) Store
(c) Customer
(d) Time
6. Which of the following operations differentiates HOLAP architecture from ROLAP
& MOLAP architectures:
(a) Drill-across
(b) Drill-through
(c) Drill-down
(d) Roll-up

7. Pick the correct statement(s) about fact tables
(a) Natural keys can appear in the fact table
(b) The same dimension can appear many times in a fact table
(c) Base level & summarized data can appear in the same fact table
(d) Null values can appear in a fact table
8. Pick the correct statement(s) about data marts
(a) Different data marts can have different granularities
(b) Data marts must be present in the data warehouse architecture
(c) Data marts cannot contain the finest level granularity data
(d) Data marts make sense only in a bottom-up design approach
9. System(s) with finest granularity:
(a) Data Marts
(b) ODS
(c) Operational System
(d) Super marts
10. The number of aggregated tables in a data mart for a particular business process
depends on:
(a) Number of facts
(b) Number of dimensions
(c) Levels of hierarchies in each dimension
(d) Method of storing aggregated records
11. Partitioning with respect to the time dimension is generally recommended because:
(a) It can be easily done using range partitioning
(b) It simplifies the refresh process
(c) It allows for incremental backups
(d) Definition of time never changes
12. Pick the odd one out:
(a) Inside-out queries
(b) Outside-in queries
(c) Fact-focused
(d) Dimension-focused
13. Pick the correct statement(s) about conformed dimensions:
(a) They must have same number of rows
(b) They must have same number of attributes
(c) They must have same granularity
(d) Original and shrunken dimensions are conformed
14. Advantages of look-up tables include:
(a) Faster loading of dimension tables
(b) Faster backups
(c) Faster loading of fact tables
(d) All of the above
15. The CUBE operator of SQL:
(a) Stores aggregate records in separate tables
(b) Stores aggregate records in multidimensional arrays
(c) Computes all possible aggregates
(d) Stores aggregate records in the same table
16. Pick the correct statement(s) about promotion coverage fact table:
(a) Contains information about products that did not sell
(b) Its grain is different from that of the sales fact table
(c) It is a factless fact table
(d) It has the same schema as that of the sales fact table

17. Finding out about products that were on promotion but did not sell requires:
(a) Roll-up
(b) Slicing & dicing
(c) Drill-through
(d) Drill-across
18. Dimensional modeling is more restrictive than ER modeling because:
(a) Data is always classified as fact or dimension
(b) Dimension tables must have single field primary keys
(c) Dimensional tables cannot be normalized
(d) Two dimension tables cannot be linked through foreign keys
19. In a data warehouse, bitmap indexes are created on:
(a) Fact tables
(b) Dimension tables
(c) Helper tables
(d) Minidimension tables
20. Snowflaking:
(a) Removes low cardinality columns
(b) Prohibits use of bitmap indexes
(c) Makes browsing difficult
(d) Saves space

Short Answer Questions (6*1=6)

1. List 3 benefits of having finest granularity data in the data warehouse.
2. Give situations under which an outrigger is a better option than a minidimension.
3. List the roles played by views in data warehousing.
4. What are the advantages & disadvantages of using flat files as data structures in
your ETL system?
5. In a real-time data warehouse, suggest a way of handling changes to dimensions
in real time.
6. Briefly explain Kimball’s approach to building data warehouses. Explain why
Kimball disagrees that his approach is purely bottom-up.

Part B – Open Book
Problem 1
INSURANCE COMPANY DATA WAREHOUSE
An insurance company, with branches all over the country, wants to develop a
data warehouse for effective decision-making about their insurance policies.
There are a number of different types of insurance like Auto insurance, Home
insurance, Industrial insurance, etc. The entire country is categorized into four
regions, namely, North, South, East and West. Each region consists of a set of
states. There may be different types of customers like individuals, institutions,
industries, etc. The data warehouse should record an entry for each policy issued
to each customer along with the premium paid.
With respect to the above business scenario, answer the following questions.
Clearly state any reasonable assumptions you make.
1. Design a star schema for the data warehouse clearly identifying the fact
table(s), dimensional table(s), their attributes and measures along with the
primary key and foreign key relationships.
2. Write an SQL query by which you can display region-wise, insurance-
type-wise, year-wise total premium collected from your schema.
3. Draw a cuboid that would display the result of the query specified in Q. 2
above.
4. From the cuboid of Q. 3 above, if we want to see the amount of
premium collected during the year 2001 for the state of Maharashtra for
each type of customer, which sequence of OLAP operations would you
need to perform?
5. Show the lattice of cuboids for the multi-dimensional data considering
all the dimensions in your schema using a single level of hierarchy for
each dimension.
6. Draw possible schema hierarchies for each dimension.
7. Based on the schema hierarchies drawn in Q. 6 above, determine the
total number of cuboids, considering all the aggregation levels.
8. Once your data warehouse is ready and operational, there is a new
requirement to maintain the amount of claim lodged at the same level of
granularity. Extend your star schema to a fact constellation schema to take
care of the new requirement.
[2+1+1+1+2+1+2]
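
The following is a minimal sketch of one possible shape for questions 1, 2, 4 and 7, not a model answer. Every table name, column name and sample row (including which state belongs to which region) is an assumption made for illustration; the problem statement fixes none of them. The sketch uses Python's built-in sqlite3 module so that the SQL can be run as written.

```python
import sqlite3
from math import prod

# Hypothetical star schema: all table and column names are assumptions.
SCHEMA = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, state TEXT, region TEXT);
CREATE TABLE dim_insurance_type (type_key INTEGER PRIMARY KEY, insurance_type TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_type TEXT);
CREATE TABLE fact_policy (
    date_key INTEGER REFERENCES dim_date(date_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    type_key INTEGER REFERENCES dim_insurance_type(type_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    premium REAL  -- measure: premium paid for one policy
);
"""

# Question 2: total premium by region, insurance type and year.
BY_REGION_TYPE_YEAR = """
SELECT l.region, t.insurance_type, d.year, SUM(f.premium) AS total_premium
FROM fact_policy AS f
JOIN dim_location AS l ON l.location_key = f.location_key
JOIN dim_insurance_type AS t ON t.type_key = f.type_key
JOIN dim_date AS d ON d.date_key = f.date_key
GROUP BY l.region, t.insurance_type, d.year
ORDER BY l.region, t.insurance_type, d.year;
"""

# Question 4 expressed as SQL: fix one state and one year, group by customer type.
BY_CUSTOMER_TYPE = """
SELECT c.customer_type, SUM(f.premium) AS total_premium
FROM fact_policy AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
JOIN dim_location AS l ON l.location_key = f.location_key
JOIN dim_date AS d ON d.date_key = f.date_key
WHERE l.state = ? AND d.year = ?
GROUP BY c.customer_type
ORDER BY c.customer_type;
"""


def build_demo():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, 2001), (2, 2002)])
    conn.executemany(
        "INSERT INTO dim_location VALUES (?, ?, ?)",
        [(1, "Maharashtra", "West"), (2, "Gujarat", "West"), (3, "Kerala", "South")],
    )
    conn.executemany(
        "INSERT INTO dim_insurance_type VALUES (?, ?)", [(1, "Auto"), (2, "Home")]
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)", [(1, "individual"), (2, "institution")]
    )
    conn.executemany(
        "INSERT INTO fact_policy VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 1, 100.0),  # 2001, Maharashtra, Auto, individual
            (1, 2, 1, 2, 50.0),   # 2001, Gujarat, Auto, institution
            (1, 1, 2, 2, 30.0),   # 2001, Maharashtra, Home, institution
            (2, 3, 2, 1, 70.0),   # 2002, Kerala, Home, individual
        ],
    )
    return conn


def premium_by_region_type_year(conn):
    return conn.execute(BY_REGION_TYPE_YEAR).fetchall()


def premium_by_customer_type(conn, state, year):
    return conn.execute(BY_CUSTOMER_TYPE, (state, year)).fetchall()


def count_cuboids(levels_per_dimension):
    """Total cuboids = product of (levels + 1); the +1 is the 'all' level."""
    return prod(levels + 1 for levels in levels_per_dimension)
```

With four dimensions and a single level each, `count_cuboids([1, 1, 1, 1])` gives the 16 cuboids of the lattice in question 5; adding a second level to one dimension (for example state and region) changes the argument to `[2, 1, 1, 1]`.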

Problem 2
ONLINE PLACEMENT COMPANY DATA WAREHOUSE
Itplacement.com is an online placement company. The portal allows companies
looking for IT professionals to publish/post their requirements on the portal. The
portal also allows the applicants (job seekers) to post their resumes for possible
placements.
Design a data warehouse for the placement company. The data warehouse
should help the job seekers in getting better placements and also the companies
that are hiring. The DW should also help Itplacement.com in doing more
business.
Identify the requirements and design star schema(s). Give details of the
dimensions created by you.
Identify the advanced dimensional modeling features used by you in the design.
For all the three categories of users, write a typical analytical query that they
would want to ask. Also show how your data warehouse would be able to answer
them efficiently.
[4+2]

Problem 3
SPARSITY FAILURE
Consider a factless fact table containing the finest granularity attendance data of
students at BITS. The table contains data from academic year 2000-2001
onwards. Queries requiring aggregated attendance data are quite common.
Suggest a suitable aggregation strategy. Will the phenomenon of sparsity failure
occur when you pre-compute the aggregates? Justify your answer.
[3]
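
As a rough illustration of the effect usually meant by sparsity failure, the sketch below compares how many rows survive a day-to-month roll-up for dense and for sparse base data. The tuple layout `(student, course, day)`, the helper names and the sample rows are all assumptions made for this example; it does not settle which case the BITS attendance data falls into.

```python
def to_month(fact):
    """Roll one (student, course, day) fact up to month level.

    Assumes day is an ISO string "YYYY-MM-DD"; the first 7 characters
    are then "YYYY-MM".
    """
    student, course, day = fact
    return (student, course, day[:7])


def rollup_row_count(facts, level_of):
    """Number of rows an aggregate table would hold at the given level."""
    return len({level_of(fact) for fact in facts})


def compression_ratio(facts, level_of):
    """Base rows per aggregate row; values near 1 mean the aggregate saves little."""
    return len(set(facts)) / rollup_row_count(facts, level_of)


# Made-up data: one student attending 20 classes of one course in a month.
dense = [("s1", "c1", f"2005-08-{day:02d}") for day in range(1, 21)]

# Made-up data: one attendance per month, so the roll-up removes nothing.
sparse = [
    ("s1", "c1", "2005-08-01"),
    ("s1", "c1", "2005-09-01"),
    ("s1", "c1", "2005-10-01"),
]
```

Here `compression_ratio(dense, to_month)` is 20.0 while `compression_ratio(sparse, to_month)` is 1.0: when the base data is sparse, the monthly aggregate is as large as the base table, which is the situation an aggregation strategy has to watch for.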
