
Final Question ITS472

Chapter 13 – Data Warehouse


Oct 2008
QUESTION 3
a) Who is the database administrator? List THREE managerial and technical skills
required for a DBA.
(5 marks)
The database administrator (DBA) is the person responsible for the control and
management of the shared database within an organization. The DBA controls the
database administration function within the organization.

b) Suppose you are selling the data warehouse idea to your users. How would you
explain to them what multidimensional data analysis is and explain its advantages?

Multidimensional data analysis refers to the processing of data in which data are
viewed as part of a multidimensional structure, one in which data are related in
many different ways. Business decision makers usually view data from a business
perspective. That is, they tend to view business data as they relate to other business
data. For example, a business data analyst might investigate the relationship
between sales and other business variables such as customers, time, product line,
and location. The multidimensional view is much more representative of a business
perspective. A good way to visualize the development and use of relationships is to
examine data pivot tables in MS Excel.
(6 marks)
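
The pivot-table idea in the answer above can be sketched in a few lines of Python. This is only an illustration, not part of the marking scheme: the sales rows and the `pivot` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (product_line, location, quarter, sales_amount)
sales = [
    ("Cars", "North", "Q1", 100),
    ("Cars", "South", "Q1", 80),
    ("Trucks", "North", "Q1", 50),
    ("Cars", "North", "Q2", 120),
    ("Trucks", "South", "Q2", 70),
]

def pivot(rows, row_dim, col_dim):
    """Sum the sales amount for each (row_dim, col_dim) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[3]
    return {r: dict(cols) for r, cols in table.items()}

# The same facts viewed two ways: product line by location, location by quarter.
by_product_location = pivot(sales, 0, 1)
by_location_quarter = pivot(sales, 1, 2)
print(by_product_location)
print(by_location_quarter)
```

The same fact rows answer both questions without being restructured, which is the point users usually find most convincing.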
c) The data warehouse project is in the design phase. Explain how you would use a star
schema in the design.
(9 marks)
The star schema is a data modeling technique that is used to map multidimensional
decision support data into a relational database. The reason for the star schema's
development is that existing relational modeling techniques, E-R and normalization,
did not yield a database structure that served the advanced data analysis
requirements well. Star schemas yield an easily implemented model for
multidimensional data analysis while still preserving the relational structures on
which the operational database is built.

The basic star schema has four components: facts, dimensions, attributes, and
attribute hierarchies. The star schemas represent aggregated data for specific
business activities. Using the schemas, we will create multiple aggregated data
sources that will represent different aspects of business operations. For example, the
aggregation may involve total sales by selected time periods, by products, by stores,
and so on. Aggregated totals can be total product units, total sales values by
products, etc.
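
As a rough sketch of the design described above, the following Python script builds a tiny star schema in an in-memory SQLite database and computes aggregated totals by product. All table names, column names and rows are assumptions made up for illustration; they do not come from the exam paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (hypothetical names and attributes)
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_city TEXT);
    CREATE TABLE time_period (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- Fact table: one row per product, store and time period, holding the facts
    CREATE TABLE sales (
        product_id INTEGER REFERENCES product(product_id),
        store_id INTEGER REFERENCES store(store_id),
        time_id INTEGER REFERENCES time_period(time_id),
        units INTEGER,
        sales_value REAL,
        PRIMARY KEY (product_id, store_id, time_id)
    );
    -- Made-up sample rows
    INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO store VALUES (1, 'Shah Alam'), (2, 'Arau');
    INSERT INTO time_period VALUES (1, 2008, 9), (2, 2008, 10);
    INSERT INTO sales VALUES
        (1, 1, 1, 10, 100.0), (1, 2, 1, 5, 50.0),
        (2, 1, 2, 3, 90.0), (1, 1, 2, 7, 70.0);
""")

# Aggregated totals by product: total units and total sales value.
totals_by_product = conn.execute("""
    SELECT p.product_name, SUM(s.units), SUM(s.sales_value)
    FROM sales AS s JOIN product AS p ON p.product_id = s.product_id
    GROUP BY p.product_name ORDER BY p.product_name
""").fetchall()
print(totals_by_product)
```

Swapping the joined dimension table (store or time_period) gives the other aggregations mentioned in the answer.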

Oct 2009
QUESTION 3

a) When is the three-tier data warehouse architecture more appropriate than two tier
data warehouse architecture?

A three-tier architecture is more appropriate than a two-tier architecture when an
organization deals with a large number of simultaneous clients. The two-tier
architecture can have performance problems for large data warehouses with
data-intensive applications for decision support. To overcome these difficulties,
many large organizations use a three-tier architecture.

(4 marks)

b) i. What is a star schema? Use diagrams to illustrate your answers.

A star schema is a data modeling representation for multidimensional
databases. In a relational database, a star schema has a fact table in the centre
related to multiple dimension tables in 1-M relationships.
(2 marks)

Star Schema:

Fact table (centre):
Sales (SalesNo, SalesUnits, SalesDollar, SalesCost)

Dimension tables (each related to Sales by a 1-M relationship):
Store (StoreId, StoreManager, StoreStreet, StoreCity, StoreState, StoreZip,
StoreNation, DivId, DivName, DivManager), relationship StoreSales
Item (ItemId, ItemName, ItemUnitPrice, ItemBrand, ItemCategory), relationship ItemSales
Customer (CustId, CustName, CustPhone, CustStreet, CustCity, CustState, CustZip,
CustNation), relationship CustSales
TimeDim (TimeNo, TimeDay, TimeMonth, TimeQuarter, TimeYear, TimeDayOfWeek,
TimeFiscalYear), relationship TimeSales
(5 marks)

Total : 7 marks

ii. How does a snowflake schema differ from a star schema?


The snowflake schema has multiple levels of dimension tables related to one or
more fact tables, whereas the star schema has a single level of dimension tables
around the fact table.

(3 marks)

c) A data cube consists of cells containing measures, dimensions and members. Explain
the terms measures, dimensions and members.

Measures: numeric values, such as unit sales amounts.
Dimensions: used to label or group numeric data (e.g., Product, Location, and Time).
Members: the values contained in each dimension. For instance, the Location
dimension has five members (California, Washington, Utah, Arizona, and Colorado).
(6 marks)
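
The three terms can be made concrete with a small Python sketch. The Location members are the ones named in the answer above; the Product members and every cell value are made up for illustration.

```python
# Dimensions and their members. Location members come from the answer above;
# the Product members and all cell values are hypothetical.
dimensions = {
    "Location": ["California", "Washington", "Utah", "Arizona", "Colorado"],
    "Product": ["Printer", "Scanner"],
}

# Each cell is addressed by one member per dimension and holds a measure (unit sales).
cube = {
    ("California", "Printer"): 80,
    ("California", "Scanner"): 30,
    ("Utah", "Printer"): 40,
}

def measure(location, product):
    """Return the unit-sales measure for a cell, or 0 if the cell is empty."""
    return cube.get((location, product), 0)

print(len(dimensions["Location"]))
print(measure("California", "Printer"))
print(measure("Colorado", "Scanner"))
```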

April 2010
-
April 2011
-
Jan 2012
QUESTION 5

a) What are the advantages of multidimensional data representation over relational data
representation?
Advantages:
- Provides an intuitive interface for business analysts
- Easy to understand and visualize as compared to a relational representation
- Provides increased retrieval speed
(Any TWO each 2 marks, total = 4 marks)

b) i) Name an industry that would have invested in data warehouse technology.

ii) List an application area that would use this technology

(i) Industry         (ii) Application
Airline              Yield management, route assessment
Telecommunication    Customer retention, network design
Insurance            Risk assessment, product design, fraud detection
Retail               Target marketing, supply-chain management

iii) Discuss the benefits of using data warehouse technology in the application area you
have mentioned above.

Benefit:
- Improved decision making
- Increased revenue and reduced expenses

(Any ONE answer for Industry: 2 marks
Any ONE answer for Application: 2 marks
Any TWO answers for Benefit: 2 marks
Total = 6 marks)
a) The AddValue Automobile Company wants to build a data warehouse to analyze sales of its
cars on a yearly, monthly or daily basis. The proposed schema for the data warehouse is as
follows:

SALES (SerialNo, Date, DealerName, Price)

AUTO (SerialNo, Model, Color)

DEALER (Name, City, State, Telephone)

As a data warehouse expert,

i) List a dimension that is not listed in the above schema


Time

(2 marks)

ii) Name a suitable data model (schema) to represent the multidimensional data.
Star schema

(2 marks)

iii) Design the proposed data model.

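SALES is the fact table in the centre, related to the dimension tables AUTO,
DEALER and TIME:

DEALER (1) ---- (M) SALES (M) ---- (1) TIME
AUTO ---- SALES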

(6 marks)
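
One way to see why the missing TIME dimension matters is to derive it from the Date attribute of SALES. The sketch below is an assumption-laden illustration: the TimeID format, the TIME attributes (Day, Month, Year) and the sample rows are hypothetical and are not given in the question.

```python
from datetime import date

def time_dimension_row(d):
    """Build one TIME dimension row (hypothetical attributes) from a sale date."""
    return {
        "TimeID": d.strftime("%Y%m%d"),  # key derived from the date
        "Day": d.day,
        "Month": d.month,
        "Year": d.year,
    }

# Hypothetical SALES rows: (SerialNo, Date, DealerName, Price)
sales = [
    ("A001", date(2011, 12, 30), "Dealer X", 50000),
    ("A002", date(2012, 1, 5), "Dealer Y", 62000),
    ("A003", date(2012, 1, 5), "Dealer X", 48000),
]

# One TIME row per distinct date; the fact rows then refer to it by TimeID.
time_dim = {}
for _, sale_date, _, _ in sales:
    row = time_dimension_row(sale_date)
    time_dim[row["TimeID"]] = row

# Yearly analysis becomes a simple roll-up over the Year attribute.
sales_by_year = {}
for _, sale_date, _, price in sales:
    year = time_dim[sale_date.strftime("%Y%m%d")]["Year"]
    sales_by_year[year] = sales_by_year.get(year, 0) + price
print(sales_by_year)
```

Monthly and daily analysis follow the same pattern by grouping on Month or Day instead of Year.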
Jun 2012
QUESTION 4

a) Consider the scenario below:

Encik Abdullah manages a small product distribution company. Because the business
is growing fast, Encik Abdullah recognizes that it is time to manage the vast
information pool to help guide the accelerating growth. Encik Abdullah, who is
familiar with spreadsheet software, currently employs a small sales force of four
people. He has asked you to develop a data warehouse application prototype that
enables him to study sales quantity by year, region, agent, and product. (This
prototype is to be used as the basis for a future data warehouse database.)

The following SALES ORDER table describes Encik Abdullah’s company sales
quantity according to year, region, agent, and product.
SALES ORDER

Year   Region   Agent    Product   Quantity
2009   East     Carlos   Erasers   50
2009   East     Tere     Erasers   12
2009   North    Carlos   Widgets   120
2009   North    Tere     Widgets   100
2009   North    Carlos   Widgets   30
2009   South    Victor   Balls     145
2009   South    Victor   Balls     34
2009   South    Victor   Balls     80
2009   West     Mary     Pencils   89
2009   West     Mary     Pencils   56
2010   East     Carlos   Pencils   45
2010   East     Victor   Balls     55
2010   North    Mary     Pencils   60
2010   North    Victor   Erasers   20
2010   South    Carlos   Widgets   30
2010   South    Mary     Widgets   75
2010   South    Mary     Widgets   50
2010   South    Tere     Balls     70
2010   South    Tere     Erasers   90
2010   West     Carlos   Widgets   25
2010   West     Tere     Balls     100

Using the data from the SALES ORDER table above:

i) Identify the appropriate fact table component.


Ans:
SALES ORDER is a fact table.
(1 mark)

ii) Identify the appropriate dimension tables.


Ans:
YEAR, REGION, AGENT, and PRODUCT are dimension tables.

(3 marks)
iii) Draw a star schema for the Data Warehouse.
Ans:

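SALES ORDER is the fact table in the centre; TIME, AGENT, REGION and PRODUCT
are the dimension tables around it.

TIME (TimeID, Day, Week)
AGENT (AgentID, AgentName, AgentAddress)
REGION (RegionID, RegionName)
PRODUCT (ProductID, ProductName)
SALES ORDER (SalesID, TimeID, RegionID, AgentID)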

(4 marks)

iv) Describe the data cube from the table above.


Ans:

Answers other than the one below can also be accepted
(e.g. the dimensions could be YEAR, PRODUCT and REGION, with measure = profit).
[Figure: chart of sales quantity (axis 0 to 300) by region (East, North, South,
West), with series for Carlos and Mary.]

(6 marks)
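
To check any data cube description against the SALES ORDER table, the rows can be loaded and summed over chosen dimensions. The sketch below uses the table's own rows; the `roll_up` helper is a hypothetical name, and the choice of Quantity as the measure follows the question.

```python
from collections import defaultdict

# Rows of the SALES ORDER table: (Year, Region, Agent, Product, Quantity)
sales_order = [
    (2009, "East", "Carlos", "Erasers", 50), (2009, "East", "Tere", "Erasers", 12),
    (2009, "North", "Carlos", "Widgets", 120), (2009, "North", "Tere", "Widgets", 100),
    (2009, "North", "Carlos", "Widgets", 30), (2009, "South", "Victor", "Balls", 145),
    (2009, "South", "Victor", "Balls", 34), (2009, "South", "Victor", "Balls", 80),
    (2009, "West", "Mary", "Pencils", 89), (2009, "West", "Mary", "Pencils", 56),
    (2010, "East", "Carlos", "Pencils", 45), (2010, "East", "Victor", "Balls", 55),
    (2010, "North", "Mary", "Pencils", 60), (2010, "North", "Victor", "Erasers", 20),
    (2010, "South", "Carlos", "Widgets", 30), (2010, "South", "Mary", "Widgets", 75),
    (2010, "South", "Mary", "Widgets", 50), (2010, "South", "Tere", "Balls", 70),
    (2010, "South", "Tere", "Erasers", 90), (2010, "West", "Carlos", "Widgets", 25),
    (2010, "West", "Tere", "Balls", 100),
]

def roll_up(rows, *dims):
    """Sum Quantity over the chosen dimensions (0=Year, 1=Region, 2=Agent, 3=Product)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[4]
    return dict(totals)

by_region = roll_up(sales_order, 1)            # keys are 1-tuples, e.g. ("East",)
by_region_agent = roll_up(sales_order, 1, 2)   # keys are (Region, Agent)
print(by_region)
print(by_region_agent[("South", "Mary")])
```

Each key of `by_region_agent` is one cell of a Region by Agent face of the cube, and the summed Quantity is its measure.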

b) What is the difference between ERD Snowflake Schema and ERD Constellation
Schema? Support your answers with diagrams.
Ans:
Snowflake schema: A data modeling representation for multidimensional databases. In
a relational database, a snowflake schema has multiple levels of dimension tables
related to one or more fact tables.
Page 570 (fig 16.11, Mannino)

ERD Constellation Schema contains multiple fact tables in the center related to
dimension tables. Typically, the fact tables share some dimension tables.
Page 569 (fig 16.10, Mannino)

(6 marks)
Jan 2013
QUESTION 1

a) Table 1.0 is part of a relational representation of FSKM student enrollment in 2012.


Table 1.0: FSKM student enrollment in 2012

Campus      Program   Total Student
Shah Alam   CS220     240
Shah Alam   CS221     300
Shah Alam   CS224     100
Shah Alam   CS231     150
Arau        CS220     110
Merbok      CS231     80
Machang     CS221     200
Dungun      CS224     70

i) Transform the table above into a multidimensional data cube.


ANSWER:

Year: 2012

Program   Shah Alam   Arau   Merbok   Machang   Dungun
CS231     150         0      80       0         0
CS224     100         0      0        0         70
CS221     300         0      0        200       0
CS220     240         110    0        0         0
(4 marks)
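
The transformation can be reproduced with a short Python sketch that starts from the rows of Table 1.0 and fills a Program by Campus grid for 2012. The variable names are illustrative only.

```python
# Rows of Table 1.0: (Campus, Program, Total Student), all for year 2012
enrollment = [
    ("Shah Alam", "CS220", 240), ("Shah Alam", "CS221", 300),
    ("Shah Alam", "CS224", 100), ("Shah Alam", "CS231", 150),
    ("Arau", "CS220", 110), ("Merbok", "CS231", 80),
    ("Machang", "CS221", 200), ("Dungun", "CS224", 70),
]

campuses = ["Shah Alam", "Arau", "Merbok", "Machang", "Dungun"]
programs = ["CS220", "CS221", "CS224", "CS231"]

# Start with an all-zero grid, then fill the cells that appear in the table.
cube = {p: {c: 0 for c in campuses} for p in programs}
for campus, program, total in enrollment:
    cube[program][campus] = total

# Print the rows top-down in the same order as the answer (CS231 first).
for p in reversed(programs):
    print(p, [cube[p][c] for c in campuses])
```

Combinations that do not occur in the relational table become explicit zero cells in the cube.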

ii) List TWO (2) advantages of multidimensional representation over relational
representation for FSKM management.
(4 marks)
ANSWER:
Advantages:
Better visualization of data
Easier and smoother decision making
(2 marks each)

b) Match phrases in column P with the correct term in column Q.

i) Process of discovering implicit patterns in data and using these patterns for
business advantage.

ii) Values in dimension.

iii) Allows users to navigate from a more general level to a more specific level.

iv) Retrieves a subset of a data cube similar to the restrict operator of relational
algebra.

v) Multiple levels of dimension tables surround the fact table.

vi) Stores numeric data such as sales results.


Q

Members

Drill-down

Data mining

Dice

Fact table

Data coupling

Measures

Slice

Snowflake schema

Constellation schema

Roll-up

Dimension table

(12 marks)
ANSWER:
i) Data mining
ii) Members
iii) Drill-down
iv) Slice
v) Snowflake schema
vi) Fact table
(2 marks each)

Dec 2013
QUESTION 5

Consider the following relational database for Grand Travel Airlines.

Grand Travel Airlines has to keep track of its flight and airplane history. A flight is uniquely
identified by the combination of a flight number and a date. Every passenger who has flown
on Grand Travel has a unique passenger number. For a particular passenger who has
taken a particular flight, the company wants to keep track of the fare that he paid and the
reservation. Clearly, a passenger may have taken many flights (he must have taken at least
one to be in the database) and every flight has had many passengers on it.

A pilot is identified by a unique pilot (or employee) number. A flight on a particular date has
exactly one pilot. Each pilot has typically flown many flights but a pilot may be new to the
company, is in training, and has not flown any flights yet. Each airplane has a unique serial
number. A flight on a particular date used one airplane. Each airplane has flown on many
flights and dates, but a new airplane may not have been used at all yet.

The relational schemas for Grand Travel Airlines are shown as follows:

PILOT (PilotNumber, PilotName, BirthDate, HireDate)


AIRPLANE (AirplaneNumber, Model, PassengerCapacity, YearBuilt, Manufacturer)
FLIGHT (FlightNumber, Date, DepartureTime, ArrivalTime, PilotNumber, AirplaneNumber)
PASSENGER (PassengerNumber, PassengerName, Address, TelephoneNumber)
RESERVATION (FlightNumber, PassengerNumber, Fare, ReservationDate)

a) Give any TWO (2) characteristics of a data warehouse.


(3 marks)
Answer:
- data is subject oriented
- data is integrated
- data is non-volatile
- data is time variant
- data must be high quality
- data may be aggregated
- data is often denormalized
- data is not necessarily absolutely current

b) Discuss any TWO (2) steps involved in building a data warehouse.


(4 marks)
Answer:
Data Extraction - Process of copying the data from the transactional databases in
preparation for loading it into the data warehouse.

Data Cleaning - Data warehouses are very sensitive to data errors which must be
“cleaned” or “cleansed” or “scrubbed” as the data is loaded into the data warehouse.

Data Transformation - As the data is extracted from the transactional databases, it must
go through several kinds of data transformations on its way to the data warehouse.

Data Loading - After all of the extracting, cleaning, and transforming, the data is ready to
be loaded into the data warehouse. A schedule for regularly updating the data
warehouse must be put in place.
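
The four steps above can be sketched as a small Python pipeline. Everything in it is an assumption for illustration: the source rows, the rule that a non-numeric fare counts as dirty data, and the `reservation_fact` table name are all hypothetical.

```python
import sqlite3

# Hypothetical rows as they might come out of a transactional database.
source_rows = [
    {"flight_no": "GT101", "fare": " 250.00", "reserve_date": "2013-11-01"},
    {"flight_no": "gt102", "fare": "300.50", "reserve_date": "2013-11-02"},
    {"flight_no": "GT103", "fare": "n/a", "reserve_date": "2013-11-03"},  # dirty row
]

def extract():
    """Extraction: copy rows out of the source (here, an in-memory list)."""
    return [dict(row) for row in source_rows]

def clean(rows):
    """Cleaning: drop rows whose fare is not a valid number."""
    good = []
    for row in rows:
        try:
            float(row["fare"])
        except ValueError:
            continue
        good.append(row)
    return good

def transform(rows):
    """Transformation: standardise codes and convert types."""
    return [(r["flight_no"].upper(), float(r["fare"]), r["reserve_date"]) for r in rows]

def load(conn, rows):
    """Loading: append the prepared rows to the warehouse table."""
    conn.executemany("INSERT INTO reservation_fact VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE reservation_fact (flight_no TEXT, fare REAL, reserve_date TEXT)"
)
load(warehouse, transform(clean(extract())))
loaded = warehouse.execute(
    "SELECT flight_no, fare FROM reservation_fact ORDER BY flight_no"
).fetchall()
print(loaded)
```

In practice the load step would run on the regular schedule mentioned above rather than once.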

c) Identify the fact and dimensional tables for the above scenario.
(4 marks)
Answer:
Fact table – RESERVATION
Dimensional table – TIME_PERIOD, PASSENGER, PILOT, FLIGHT

d) Design a snowflake schema for the Grand Travel Airlines data warehouse.

Answer:
TIME_PERIOD
- reserve_date

PASSENGER
- passenger_no (PK)
- passenger_name
- address
- tel_no

RESERVATION
- passenger_no (PK)
- flight_no (PK)
- time_period_no (PK)
- fare
- date
- reserve_date

FLIGHT
- depart_time
- pilot_no (FK)
- flight_no (PK)
- date
- arrival_time
- airplane_no

PILOT
- pilot_no (PK)
- pilot_name
- date_of_birth
- date_of_hire

Entity and Attributes 5 marks

Relationship 2 marks

Snowflake schema design – 1 mark

Total 8 marks

(8 marks)
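
What makes this design a snowflake is the second level of dimension tables: PILOT is reached from RESERVATION only through FLIGHT. The sketch below shows that two-step join in SQLite. It is a simplified illustration, not the model answer: it omits TIME_PERIOD and several attributes, keys FLIGHT by flight number and date as in the relational schema of the question, and uses made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pilot (pilot_no INTEGER PRIMARY KEY, pilot_name TEXT);
    CREATE TABLE flight (
        flight_no TEXT, date TEXT,
        pilot_no INTEGER REFERENCES pilot(pilot_no),
        PRIMARY KEY (flight_no, date)
    );
    CREATE TABLE passenger (passenger_no INTEGER PRIMARY KEY, passenger_name TEXT);
    CREATE TABLE reservation (
        passenger_no INTEGER REFERENCES passenger(passenger_no),
        flight_no TEXT, date TEXT, fare REAL,
        PRIMARY KEY (passenger_no, flight_no, date),
        FOREIGN KEY (flight_no, date) REFERENCES flight(flight_no, date)
    );
    -- Made-up sample rows
    INSERT INTO pilot VALUES (1, 'Aminah'), (2, 'Razak');
    INSERT INTO flight VALUES ('GT101', '2013-12-01', 1), ('GT102', '2013-12-01', 2);
    INSERT INTO passenger VALUES (10, 'Lim'), (11, 'Siti');
    INSERT INTO reservation VALUES
        (10, 'GT101', '2013-12-01', 200.0),
        (11, 'GT101', '2013-12-01', 250.0),
        (10, 'GT102', '2013-12-01', 300.0);
""")

# Fact (reservation) -> first-level dimension (flight) -> second-level dimension (pilot)
fare_by_pilot = conn.execute("""
    SELECT p.pilot_name, SUM(r.fare)
    FROM reservation AS r
    JOIN flight AS f ON f.flight_no = r.flight_no AND f.date = r.date
    JOIN pilot AS p ON p.pilot_no = f.pilot_no
    GROUP BY p.pilot_name ORDER BY p.pilot_name
""").fetchall()
print(fare_by_pilot)
```

In a star schema the pilot attributes would instead be folded into a single flight dimension, removing the second join.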
Jun 2014
QUESTION 4

e) Identify TWO (2) challenges in handling a data warehouse.


Answer:
- Data cleaning and finding more “dirty” data than expected.
- Problems associated with coordinating the regular appending of new data from the
transactional databases to the data warehouse.
- Difficulties in managing very large databases.
- The challenge of building and maintaining the data dictionary.
(Any 2 challenges, 2 marks each, total 4 marks)
f) A data cube, or hypercube, is a multidimensional format that consists of cells
containing measures, with dimensions and members to label the numeric data.
Describe the basic elements of a data cube.
Answer:
- Dimension: subject label for a row or column. E.g. time, types
- Member: value of a dimension. E.g. product, location
- Measure: quantitative data stored in cells. E.g. total sales
(Each basic 2 marks, total 6 marks)

g) The following tables show the sales data for 1st January 2013:

LOCATIONS
locid   city      state   country
1       Madison   WI      USA
2       Fresno    CA      USA
5       Chennai   TN      India

PRODUCTS
pid   pname       category     price
11    Lee Jeans   Apparel      25
12    Zord        Toys         18
13    Biro Pen    Stationery   2

SALES
pid   timeid   locid   sales
11    1        1       25
11    2        1       8
11    3        1       15
12    1        1       30
12    2        1       20
12    3        1       50
13    1        1       8
13    2        1       10
13    3        1       10
11    1        2       35
11    2        2       22
11    3        2       10
12    1        2       26
12    2        2       45
12    3        2       20
13    1        2       20
13    2        2       40
13    3        2       5

i) Draw a 3D picture of a data cube.
Answer:

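Front face of the cube, locid = 1 (pid on the vertical axis, timeid on the
horizontal axis, sales in the cells):

pid 13 |   8   10   10
pid 12 |  30   20   50
pid 11 |  25    8   15
timeid     1    2    3

Dimensions: 2 marks
Members: 2 marks
Measures: 2 marks
Total 6 marks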

Any relevant answers are accepted.
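
The cube can also be built directly from the SALES rows in the question. In the sketch below, each cell is keyed by (pid, timeid, locid) and one face is extracted per location; the `face` helper is a hypothetical name.

```python
# SALES rows from the question: (pid, timeid, locid, sales)
sales = [
    (11, 1, 1, 25), (11, 2, 1, 8), (11, 3, 1, 15),
    (12, 1, 1, 30), (12, 2, 1, 20), (12, 3, 1, 50),
    (13, 1, 1, 8), (13, 2, 1, 10), (13, 3, 1, 10),
    (11, 1, 2, 35), (11, 2, 2, 22), (11, 3, 2, 10),
    (12, 1, 2, 26), (12, 2, 2, 45), (12, 3, 2, 20),
    (13, 1, 2, 20), (13, 2, 2, 40), (13, 3, 2, 5),
]

# One cell per combination of members; the cell value is the sales measure.
cube = {(pid, timeid, locid): amount for pid, timeid, locid, amount in sales}

def face(locid):
    """Return the pid x timeid face of the cube for one location."""
    pids = sorted({p for p, _, loc in cube if loc == locid}, reverse=True)
    timeids = sorted({t for _, t, loc in cube if loc == locid})
    return {p: [cube[(p, t, locid)] for t in timeids] for p in pids}

front = face(1)
for pid, row in front.items():
    print(pid, row)
```

Calling `face(2)` gives the second layer of the cube, which a 3D drawing would show behind the first.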

ii) Describe TWO (2) benefits of applying a data warehouse application in an organization.


Answer:
- Helping managers in decision making
- Integrating data from multiple sources
- Performing new types of analyses
- Reducing cost to access historical data
- Standardizing data across the organization, a "single version of the truth"
- Improving turnaround time for analysis and reporting
- Sharing data and allowing others to easily access data
- Supporting ad hoc reporting and inquiry
- Reducing the development burden on IS/IT
(Any 2 benefits, 2 marks each, total 4 marks)

Dec 2015
QUESTION 5

The PeroTiga Automobile Company wants to build a data warehouse to analyze sales of its
cars on a yearly, monthly or daily basis. The proposed schema for the data warehouse is
as follows:
SALES (ModelID, TimeID, DealerCode, Quantity)
AUTO (ModelID, ModelName, Price)
DEALER (DealerCode, Name, City, State, Telephone)

a. Answer TRUE or FALSE for the following questions:

i. TIME_PERIOD relation is a dimension that is not listed in the above schema.

ii. SALES, AUTO and DEALER are also known as dimension tables.

iii. Data model (schema) to represent the multidimensional data above is called star
schema.

iv. A fact table is related to each dimension table in a one-to-many relationship.

v. Fact and dimension tables are related by primary keys.

ANSWER:
i. T
ii. F
iii. T
iv. F
v. F
(5 marks)

b. Online Analytic Processing (OLAP) tools create an advanced data analysis
environment that supports decision making, business modelling, and operations
research. Describe with example TWO (2) operations performed by OLAP using the
above data warehouse schema.
(5 marks)
ANSWER:

Drill-Down
Allows users to navigate from a more general level to a more specific level. Example:
PeroTiga can drill down from the sales of a state to the sales of the individual cities
in that state.

Roll-Up
Allows users to navigate from a specific level to a more general level of a hierarchical
dimension. Example: PeroTiga can roll up the sales of all cities in a state to obtain the
sales of that state.

Slice
A subset of the data that focuses on a single value of one of the dimensions. Example:
PeroTiga can see the sales of all car models sold by all the dealers at a specific time.

Dice
Replaces a dimension with a subset of values of the dimension. Example: PeroTiga can
see a specific dealer's sales of all car models at a specific time, or a specific car
model's sales for all dealers at a specific time.

Pivot or Rotation
Merely a matter of interchanging the data dimensions. Example: PeroTiga can display
the car models on the columns and the dealers on the rows, with sales data at the
intersection of rows and columns, and can then swap the rows and columns.
(Any TWO operations: each name 1 mark, description 1½ marks. Total 5 marks)
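
The operations above can be tried on a toy version of the PeroTiga schema. The sketch below is illustrative only: the fact rows, the dealer codes and the city and state values assigned to them are hypothetical, and `total_by` is a made-up helper.

```python
from collections import defaultdict

# Hypothetical fact rows: (ModelID, TimeID, DealerCode, Quantity)
sales = [
    ("M1", "2015-01", "D1", 4), ("M1", "2015-01", "D2", 2),
    ("M2", "2015-01", "D1", 1), ("M1", "2015-02", "D3", 5),
    ("M2", "2015-02", "D2", 3),
]
# Hypothetical DEALER dimension: DealerCode -> (City, State)
dealer = {
    "D1": ("Shah Alam", "Selangor"),
    "D2": ("Klang", "Selangor"),
    "D3": ("Arau", "Perlis"),
}

def total_by(key):
    """Aggregate Quantity by a key computed from each fact row."""
    totals = defaultdict(int)
    for row in sales:
        totals[key(row)] += row[3]
    return dict(totals)

# Roll-up: city-level detail summarised to the state level.
by_state = total_by(lambda r: dealer[r[2]][1])
# Drill-down: from state to the individual cities within it.
by_city = total_by(lambda r: dealer[r[2]])
# Slice: fix one dimension to a single member (TimeID = 2015-01).
slice_jan = [r for r in sales if r[1] == "2015-01"]
# Dice: restrict dimensions to subsets of their members.
dice = [r for r in sales if r[0] == "M1" and r[2] in {"D1", "D2"}]
# Pivot: swap the row and column dimensions of a model x dealer table.
model_by_dealer = total_by(lambda r: (r[0], r[2]))
dealer_by_model = {(d, m): q for (m, d), q in model_by_dealer.items()}

print(by_state)
print(by_city)
```

Roll-up and drill-down are the same aggregation run at two levels of the dealer hierarchy, which is why they are usually described as a pair.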

c. Explain data visualization and provide TWO (2) different techniques of data
visualization that can help PeroTiga in decision making.
(5 marks)
ANSWER:

Data visualization is the abstracting of data to provide information in a visual format
that enhances a user's ability to effectively comprehend the meaning of the data. The
goal of data visualization is to allow the user to see the big picture in the most efficient
way possible. Data visualization aggregates the data into a format that provides
at-a-glance insight into overall trends and patterns.

Data visualization techniques, which can range from simple to very complex, include pie
charts, line graphs, bar charts, scatter plots, Gantt charts, and heat maps.
(Any TWO techniques: each name 1 mark, description 1½ marks. Total 5 marks)
d. Discuss TWO (2) examples of mobile database applications that you would recommend
for PeroTiga.
(5 marks)
ANSWER:
- Salespersons can update sales records on the move.
- Staff can update news anytime.
- New marketing strategy for potential car buyers.

(Any 2 mobile apps, each description 2½ marks. Total 5 marks)
