b) Suppose you are selling the data warehouse idea to your users. How would you
explain to them what multidimensional data analysis is and explain its advantages?
Multidimensional data analysis refers to the processing of data in which data are
viewed as part of a multidimensional structure, one in which data are related in
many different ways. Business decision makers usually view data from a business
perspective. That is, they tend to view business data as they relate to other business
data. For example, a business data analyst might investigate the relationship
between sales and other business variables such as customers, time, product line,
and location. The multidimensional view is much more representative of a business
perspective. A good way to visualize the development and use of relationships is to
examine data pivot tables in MS Excel.
(6 marks)
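The pivot-table idea in the answer above can be sketched in a few lines of Python. This is only an illustration: the records, the field names, and the helper `pivot` are assumptions made for the example, not data from the question.

```python
from collections import defaultdict

# Hypothetical sales records: (product_line, location, quarter, amount).
SALES = [
    ("Toys", "East", "Q1", 100),
    ("Toys", "West", "Q1", 150),
    ("Apparel", "East", "Q1", 200),
    ("Apparel", "East", "Q2", 50),
    ("Toys", "East", "Q2", 75),
]

# Position of each dimension inside a record (assumed layout).
FIELDS = {"product_line": 0, "location": 1, "quarter": 2}


def pivot(records, row_field, col_field):
    """Sum the amount for every (row, column) pair, like a pivot table."""
    r, c = FIELDS[row_field], FIELDS[col_field]
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[r]][rec[c]] += rec[3]
    return {row: dict(cols) for row, cols in table.items()}


# The same facts viewed along two different pairs of dimensions.
by_product_and_location = pivot(SALES, "product_line", "location")
by_location_and_quarter = pivot(SALES, "location", "quarter")
print(by_product_and_location)
print(by_location_and_quarter)
```

The point of the sketch is that one set of sales facts can be regrouped along any pair of business dimensions without changing the underlying data.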
c) The data warehouse project is in the design phase. Explain how you would use a star
schema in the design.
(9 marks)
The star schema is a data modeling technique that is used to map multidimensional
decision support data into a relational database. The reason for the star schema's
development is that existing relational modeling techniques, E-R and normalization,
did not yield a database structure that served the advanced data analysis
requirements well. Star schemas yield an easily implemented model for
multidimensional data analysis while still preserving the relational structures on
which the operational database is built.
The basic star schema has four components: facts, dimensions, attributes, and
attribute hierarchies. The star schemas represent aggregated data for specific
business activities. Using the schemas, we will create multiple aggregated data
sources that will represent different aspects of business operations. For example, the
aggregation may involve total sales by selected time periods, by products, by stores,
and so on. Aggregated totals can be total product units, total sales values by
products, etc.
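As a rough illustration of the answer above, the following sketch builds a tiny star schema in an in-memory SQLite database and computes two aggregated totals. The table names, column names, and rows are all assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used to qualify the facts.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
    -- Fact table: numeric measures plus one foreign key per dimension.
    CREATE TABLE sales (
        product_id INTEGER REFERENCES product(product_id),
        store_id INTEGER REFERENCES store(store_id),
        time_id INTEGER REFERENCES time_dim(time_id),
        units INTEGER,
        sales_value REAL
    );
    -- Hypothetical sample rows.
    INSERT INTO product VALUES (1, 'Jeans', 'Apparel'), (2, 'Robot', 'Toys');
    INSERT INTO store VALUES (1, 'Madison', 'WI'), (2, 'Fresno', 'CA');
    INSERT INTO time_dim VALUES (1, 1, 2013), (2, 2, 2013);
    INSERT INTO sales VALUES
        (1, 1, 1, 10, 250.0), (1, 2, 1, 4, 100.0),
        (2, 1, 2, 5, 90.0), (2, 2, 2, 2, 36.0);
""")

# Total product units by product category (fact joined to one dimension).
units_by_category = dict(conn.execute("""
    SELECT p.category, SUM(s.units)
    FROM sales s JOIN product p ON p.product_id = s.product_id
    GROUP BY p.category
"""))

# Total sales value by store state (same fact, different dimension).
value_by_state = dict(conn.execute("""
    SELECT st.state, SUM(s.sales_value)
    FROM sales s JOIN store st ON st.store_id = s.store_id
    GROUP BY st.state
"""))
print(units_by_category, value_by_state)
```

Each aggregate is a join between the central fact table and one dimension table followed by a GROUP BY, which is why the star layout suits this kind of analysis.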
Okt 2009
QUESTION 3
a) When is the three-tier data warehouse architecture more appropriate than the two-tier
data warehouse architecture?
The three-tier architecture is more appropriate than the two-tier architecture when an
organization deals with a larger number of simultaneous clients. The two-tier
architecture can have performance problems for large data warehouses with
data-intensive applications for decision support. To overcome these difficulties, many
large organizations use a three-tier data warehouse architecture.
(4 marks)
Star Schema:
Fact table:
Sales (SalesNo, SalesUnits, SalesDollar, SalesCost)
Dimension tables:
Store (StoreId, StoreManager, StoreStreet, StoreCity, StoreState, StoreZip, StoreNation, DivId, DivName, DivManager)
Item (ItemId, ItemName, ItemUnitPrice, ItemBrand, ItemCategory)
Customer (CustId, CustName, CustPhone, CustStreet, CustCity, CustState, CustZip, CustNation)
TimeDim (TimeNo, TimeDay, TimeMonth, TimeQuarter, TimeYear, TimeDayOfWeek, TimeFiscalYear)
Relationships between each dimension table and Sales: StoreSales, ItemSales, CustSales, TimeSales
(5 marks)
Total : 7 marks
(3 marks)
c) A data cube consists of cells containing measures, dimensions and members. Explain
the term measures, dimensions and members.
April 2010
-
April 2011
-
Jan 2012
QUESTION 5
a) What are the advantages of multidimensional data representation over relational data
representation?
Advantages:
(i) (ii)
Industry Application
iii) Discuss the benefits of using data warehouse technology in the application area you
have mentioned above.
Benefit:
(Total = 6 marks)
a) The AddValue Automobile Company wants to build a data warehouse to analyze sales of its
cars on a yearly, monthly, or daily basis. The proposed schema for the data warehouse is as
follows:
(2 marks)
ii) Name a suitable data model (schema) to represent the multidimensional data.
Star schema
(2 marks)
[Star schema diagram: SALES at the center, linked to AUTO, DEALER (1:M), and TIME (1:M)]
(6 marks)
Jun 2012
QUESTION 4
Encik Abdullah manages a small product distribution company. Because the business
is growing fast, Encik Abdullah recognizes that it is time to manage the vast
information pool to help guide the accelerating growth. Encik Abdullah, who is
familiar with spreadsheet software, currently employs a small sales force of four
people. He has asked you to develop a data warehouse application prototype that
enables him to study sales quantity by year, region, agent, and product. (This
prototype is to be used as the basis for a future data warehouse database.)
The following SALES ORDER table describes Encik Abdullah’s company sales
quantity by year, region, agent, and product.
SALES ORDER
(3 marks)
iii) Draw a star schema for the Data Warehouse.
Ans:
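Fact table:
SALES ORDER (SalesID, TimeID, RegionID, AgentID)
Dimension tables:
TIME (TimeID, Day, Week)
AGENT (AgentID, AgentName, AgentAddress)
REGION (RegionID, RegionName)
PRODUCT (ProductID, ProductName)
Each dimension table is linked to SALES ORDER by an "in" relationship.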
(4 marks)
[Chart: agents Carlos and Mary by region (East, North, South, West); vertical axis from 0 to 250]
(6 marks)
b) What is the difference between ERD Snowflake Schema and ERD Constellation
Schema? Support your answers with diagrams.
Ans:
Snowflake schema: A data modeling representation for multidimensional databases. In
a relational database, a snowflake schema has multiple levels of dimension tables
related to one or more fact tables.
Page 570 (Fig. 16.11, Mannino)
Constellation schema: Contains multiple fact tables in the center related to
dimension tables. Typically, the fact tables share some dimension tables.
Page 569 (Fig. 16.10, Mannino)
(6 marks)
Jan 2013
QUESTION 1
Merbok CS231 80
Dungun CS224 70
CS231 150 0 80 0 0
CS224 100 0 0 0 70
240 110 0 0 0
CS220
2012
i) Process of discovering implicit patterns in data and using these patterns for
business advantage.
iii) Allows users to navigate from a more general level to a more specific level.
iv) Retrieves a subset of a data cube similar to the restrict operator of relational
algebra.
Members
Drill-down
Data mining
Dice
Fact table
Data coupling
Measures
Slice
Snowflake schema
Constellation schema
Roll-up
Dimension table
(12 marks)
ANSWER:
i) Data mining
ii) Members
iii) Drill-down
iv) Slice
v) Snowflake schema
vi) Fact table
(2 marks each)
Dec 2013
QUESTION 5
Grand Travel Airlines has to keep track of its flight and airplane history. A flight is uniquely
identified by the combination of a flight number and a date. Every passenger who has flown
on Grand Travel has a unique passenger number. For a particular passenger who has
taken a particular flight, the company wants to keep track of the fare that he paid and the
reservation. Clearly, a passenger may have taken many flights (he must have taken at least
one to be in the database) and every flight has had many passengers on it.
A pilot is identified by a unique pilot (or employee) number. A flight on a particular date has
exactly one pilot. Each pilot has typically flown many flights but a pilot may be new to the
company, is in training, and has not flown any flights yet. Each airplane has a unique serial
number. A flight on a particular date used one airplane. Each airplane has flown on many
flights and dates, but a new airplane may not have been used at all yet.
The relational schemas for Grand Travel Airlines are shown as follows:
Data Cleaning - Data warehouses are very sensitive to data errors which must be
“cleaned” or “cleansed” or “scrubbed” as the data is loaded into the data warehouse.
Data Transformation - As the data is extracted from the transactional databases, it must
go through several kinds of data transformations on its way to the data warehouse.
Data Loading - After all of the extracting, cleaning, and transforming, the data is ready to
be loaded into the data warehouse. A schedule for regularly updating the data
warehouse must be put in place.
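The cleaning, transformation, and loading steps listed above can be sketched as a small pipeline. Everything here is an assumption for illustration: the extracted rows, the cleaning rule (drop non-numeric fares), the derived columns, and the table name `reservation_fact` are hypothetical and not part of the Grand Travel schemas.

```python
import sqlite3

# Hypothetical rows extracted from an operational system.
extracted = [
    {"passenger_no": "P001", "fare": "120.50", "date": "2013-12-01"},
    {"passenger_no": " p002 ", "fare": "99", "date": "2013-12-01"},
    {"passenger_no": "P003", "fare": "n/a", "date": "2013-12-02"},  # bad fare
]


def clean(rows):
    """Drop rows whose fare is not numeric and normalise the passenger key."""
    cleaned = []
    for row in rows:
        try:
            fare = float(row["fare"])
        except ValueError:
            continue  # assumed rule: reject the row rather than guess a fare
        key = row["passenger_no"].strip().upper()
        cleaned.append({**row, "passenger_no": key, "fare": fare})
    return cleaned


def transform(rows):
    """Derive year and month columns from the ISO date string."""
    return [
        (r["passenger_no"], r["fare"], int(r["date"][:4]), int(r["date"][5:7]))
        for r in rows
    ]


def load(conn, rows):
    """Append the prepared rows to the (hypothetical) warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reservation_fact "
        "(passenger_no TEXT, fare REAL, year INTEGER, month INTEGER)"
    )
    conn.executemany("INSERT INTO reservation_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extracted)))
loaded = conn.execute(
    "SELECT passenger_no, fare, year, month FROM reservation_fact ORDER BY passenger_no"
).fetchall()
print(loaded)
```

In practice the load step would run on a schedule, as the answer notes; the sketch runs it once against an in-memory database.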
c) Identify the fact and dimensional tables for the above scenario.
(4 marks)
Answer:
Fact table – RESERVATION
Dimensional table – TIME_PERIOD, PASSENGER, PILOT, FLIGHT
d) Design a snowflake schema for the Grand Travel Airlines data warehouse.
Answer:
PASSENGER
- passenger_no (PK)
- passenger_name
- address
- tel_no
TIME_PERIOD
- reserve_date
RESERVATION
- passenger_no (PK)
- flight_no (PK)
- time_period_no (PK)
- fare
- date
- reserve_date
FLIGHT
PILOT
Relationship 2 marks
Total 8 marks
(8 marks)
Jun 2014
QUESTION 4
g) The following tables show the sales data for 1st January 2013:
LOCATIONS
locid city state country
1 Madison WI USA
2 Fresno CA USA
5 Chennai TN India
PRODUCTS
pid pname category price
11 Lee Jeans Apparel 25
12 Zord Toys 18
13 Biro Pen Stationery 2
SALES
pid timeid locid sales
11 1 1 25
11 2 1 8
11 3 1 15
12 1 1 30
12 2 1 20
12 3 1 50
13 1 1 8
13 2 1 10
13 3 1 10
11 1 2 35
11 2 2 22
11 3 2 10
12 1 2 26
12 2 2 45
12 3 2 20
13 1 2 20
13 2 2 40
13 3 2 5
i) Draw a 3D picture of a data cube.
Answer:
Face of the data cube for locid = 1 (cells contain sales):
pid   timeid 1   timeid 2   timeid 3
13    8          10         10
12    30         20         50
11    25         8          15
Dimensions : 2
Members :2
Measures :2
Total 6 marks
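As a check on the cube drawn above, the following sketch rebuilds it from the SALES rows given in the question. The dictionary representation and the helper name `face` are assumptions chosen for the example; only the data comes from the question.

```python
# SALES rows copied from the question: (pid, timeid, locid, sales).
SALES = [
    (11, 1, 1, 25), (11, 2, 1, 8), (11, 3, 1, 15),
    (12, 1, 1, 30), (12, 2, 1, 20), (12, 3, 1, 50),
    (13, 1, 1, 8), (13, 2, 1, 10), (13, 3, 1, 10),
    (11, 1, 2, 35), (11, 2, 2, 22), (11, 3, 2, 10),
    (12, 1, 2, 26), (12, 2, 2, 45), (12, 3, 2, 20),
    (13, 1, 2, 20), (13, 2, 2, 40), (13, 3, 2, 5),
]

# The cube maps each (pid, timeid, locid) cell to its measure (sales).
cube = {(pid, timeid, locid): sales for pid, timeid, locid, sales in SALES}


def face(cube, locid):
    """Return the pid x timeid face of the cube for one location."""
    pids = sorted({p for p, _, loc in cube if loc == locid}, reverse=True)
    timeids = sorted({t for _, t, loc in cube if loc == locid})
    return {p: [cube[(p, t, locid)] for t in timeids] for p in pids}


face_loc1 = face(cube, 1)
for pid, row in face_loc1.items():
    print(pid, row)
```

Here pid, timeid, and locid are the dimensions, their individual values (for example pid 11, 12, 13) are the members, and sales is the measure stored in each cell.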
Dec 2015
QUESTION 5
The PeroTiga Automobile Company wants to build a data warehouse to analyze sales of its
cars on a yearly, monthly, or daily basis. The proposed schema for the data warehouse is
as follows:
SALES (ModelID, TimeID, DealerCode, Quantity)
AUTO (ModelID, ModelName, Price)
DEALER (DealerCode, Name, City, State, Telephone)
iii. The data model (schema) used to represent the multidimensional data above is called
a star schema.
ANSWER:
i. T
ii. F
iii. T
iv. F
v. F
(5 marks)
Drill-Down
Allows users to navigate from a more general level to a more specific level. Example:
PeroTiga can break the sales for a state down into the sales for each individual city
in that state.
Roll-Up
Allows users to navigate from a specific level to a more general level of a hierarchical
dimension. Example: PeroTiga can summarize the sales of all cities in a state into the
sales for that state.
Slice
A subset of the data that focuses on a single value of one of the dimensions. Example:
PeroTiga can see the sales of all car models sold by all the dealers at a specific time.
Dice
Replaces a dimension with a subset of values of the dimension. Example: PeroTiga can
see a specific dealer's sales of all car models at a specific time, or a specific car
model's sales for all dealers at a specific time.
Pivot or Rotation
Interchanges the data dimensions. Example: PeroTiga can display car models on the
columns and dealers on the rows, with sales data at the intersection of rows and
columns, and can then swap the rows and columns.
(Any TWO operations: each name 1 mark, description 1½ marks. Total 5 marks)
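The operations described above can be sketched over a small in-memory fact list shaped like the SALES and DEALER relations. The dealer codes, cities, states, model IDs, and quantities below are hypothetical, and the helper `total_by` is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical dealers: DealerCode -> (City, State).
DEALER = {
    "D1": ("Shah Alam", "Selangor"),
    "D2": ("Klang", "Selangor"),
    "D3": ("Ipoh", "Perak"),
}
# Hypothetical facts: (ModelID, TimeID, DealerCode, Quantity).
SALES = [
    ("M1", 1, "D1", 5), ("M1", 1, "D2", 3), ("M2", 1, "D3", 4),
    ("M1", 2, "D1", 2), ("M2", 2, "D2", 6), ("M2", 2, "D3", 1),
]


def total_by(rows, key):
    """Sum Quantity for each distinct value of key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Roll-up: city-level detail summarised to the state level.
by_state = total_by(SALES, lambda r: DEALER[r[2]][1])

# Drill-down: open one state to see the totals of its cities.
selangor_rows = [r for r in SALES if DEALER[r[2]][1] == "Selangor"]
selangor_cities = total_by(selangor_rows, lambda r: DEALER[r[2]][0])

# Slice: fix one dimension to a single value (TimeID = 1).
slice_t1 = [r for r in SALES if r[1] == 1]

# Dice: keep only a subset of the values of a dimension (dealers D1 and D3).
dice = [r for r in SALES if r[2] in {"D1", "D3"}]

# Pivot: the same totals with the two dimensions interchanged.
model_by_dealer = total_by(SALES, lambda r: (r[0], r[2]))
dealer_by_model = {(d, m): q for (m, d), q in model_by_dealer.items()}

print(by_state, selangor_cities)
```

Roll-up and drill-down rely on the City to State hierarchy inside the DEALER dimension, while slice and dice only filter the facts and pivot only changes how the same totals are laid out.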
c. Explain data visualization and provide TWO (2) different techniques of data
visualization that can help PeroTiga in decision making.
(5 marks)
ANSWER: