Ques1: Name any three advantages of the STAR schema. Can you think of any
disadvantages of the STAR schema?
Answer:
Three advantages of the STAR schema are:-
1. Easy for Users to Understand-
The STAR schema reflects exactly how the users think and need data for querying and
analysis. They think in terms of significant business metrics. The fact table contains the
metrics. The users think in terms of business dimensions for analyzing the metrics. The
dimension tables contain the attributes along which the users normally query and analyze.
When you explain to the users that the units of product A are stored in the fact table and
point out the relationship of this piece of data to each dimension table, the users readily
understand the connections. That is because the STAR schema defines the join paths in
exactly the same way users normally visualize the relationships. The STAR schema is
intuitively understood by the users.
2. Optimizes Navigation-
A major advantage of the STAR schema is that it optimizes the navigation through the
database. Even when you are looking for a query result that is seemingly complex, the
navigation is still simple and straightforward.
3. Supports STARjoin and STARindex-
A STARjoin is a high-speed, single-pass, parallelizable, multi-table join. It can join more
than two tables in a single operation, which boosts query performance.
A STARindex is a specialized index that accelerates join performance. It is created on one
or more foreign keys of the fact table and speeds up joins between the dimension tables
and the fact table.
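The points above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module, and the table and column names (sales_fact, product_dim, time_dim, units_sold) are hypothetical, not taken from the text. SQLite has no dedicated STARjoin or STARindex feature, so an ordinary multi-table join and a composite index on the fact table's foreign keys stand in for them here.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: the attributes users query and analyze by.
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY,
        month    TEXT,
        year     INTEGER
    );
    -- Fact table: the metric (units_sold) plus one foreign key per dimension.
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim(product_key),
        time_key    INTEGER REFERENCES time_dim(time_key),
        units_sold  INTEGER
    );
    -- Ordinary composite index on the foreign keys (a stand-in for a STARindex).
    CREATE INDEX sales_fact_fk_idx ON sales_fact (product_key, time_key);
""")

conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "Product A"), (2, "Product B")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "Jan", 2024), (2, "Feb", 2024)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(1, 1, 10), (1, 2, 15), (2, 1, 7), (1, 1, 5)])


def units_by_product_and_month(connection):
    """Join the fact table to both dimensions and total the metric."""
    return connection.execute("""
        SELECT p.product_name, t.month, SUM(f.units_sold)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.product_key = f.product_key
        JOIN time_dim    AS t ON t.time_key    = f.time_key
        GROUP BY p.product_name, t.month
        ORDER BY p.product_name, t.month
    """).fetchall()


for row in units_by_product_and_month(conn):
    print(row)
```

Every join path runs from the fact table to exactly one dimension table, which is why the query stays simple even as more dimensions are added.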
Disadvantages of the STAR schema:-
1. Data integrity is not well enforced, because the schema is highly denormalized.
2. It is not as flexible for analytical needs as a normalized data model.
3. Star schemas do not readily support many-to-many relationships between business
entities.
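The first disadvantage can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The product_dim table, its columns, and the sample rows are hypothetical. Because the brand name is repeated in every product row rather than stored once in its own table, nothing prevents two rows from spelling the same brand differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized dimension: the brand is stored as free text on each product row.
conn.execute("""
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        brand        TEXT
    )
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
    (1, "Widget", "Acme"),
    (2, "Gadget", "ACME Corp."),  # same brand, inconsistent spelling; still accepted
])


def distinct_brands(connection):
    """Return the brand values as the database sees them."""
    cursor = connection.execute(
        "SELECT DISTINCT brand FROM product_dim ORDER BY brand")
    return [row[0] for row in cursor]


print(distinct_brands(conn))
```

A report grouped by brand would treat the two spellings as separate brands. A normalized model would hold each brand once and reference it by key.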
Ques2: In a STAR schema to track the shipments for a distribution company, the following
dimension tables are found: (1) time, (2) customer ship-to, (3) ship-from, (4) product, (5)
type of deal, and (6) mode of shipment. Review these dimensions and list the possible
attributes for each of the dimension tables. Also, designate a primary key for each table.
Answer: This schema has six dimension tables: time, customer ship-to, ship-from, product, type
of deal, and mode of shipment. The attributes and primary key of each table are:
1. Time: Time Key, Day, Month, Quarter, Year, Fiscal Year, and Day of Week. The primary key
is Time Key.
2. Customer ship-to: Customer Ship-to Key, Customer Ship-to ID, Customer Ship-to Name,
Customer Ship-to City, Customer Ship-to State, Customer Ship-to Zip, Customer Bill-to Name,
and Assigned Sales Rep Team Name. The primary key is Customer Ship-to Key.
3. Ship-from: Ship-from Key, Ship Type, and Ship Name. The primary key is Ship-from Key.
4. Product: Product Key, Product Name, Product Code, Product Line, and Brand. The primary
key is Product Key.
5. Type of deal: Type of Deal Key, Deal Time, and Product Key. The primary key is Type of
Deal Key.
6. Mode of shipment: Mode of Shipment Key and Shipment Type. The primary key is Mode of
Shipment Key.
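The six dimension tables in this answer could be declared as in the following sketch, which uses Python's built-in sqlite3 module. The table names, the snake_case column names, and all data types are assumptions made for illustration. Only the attributes and primary keys come from the answer above.

```python
import sqlite3

# DDL for the six dimensions; each table's surrogate key is its primary key.
DIMENSION_DDL = """
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,
    day INTEGER, month TEXT, quarter TEXT, year INTEGER,
    fiscal_year INTEGER, day_of_week TEXT
);
CREATE TABLE customer_ship_to_dim (
    customer_ship_to_key INTEGER PRIMARY KEY,
    customer_ship_to_id TEXT, customer_ship_to_name TEXT,
    customer_ship_to_city TEXT, customer_ship_to_state TEXT,
    customer_ship_to_zip TEXT, customer_bill_to_name TEXT,
    assigned_sales_rep_team_name TEXT
);
CREATE TABLE ship_from_dim (
    ship_from_key INTEGER PRIMARY KEY,
    ship_type TEXT, ship_name TEXT
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT, product_code TEXT, product_line TEXT, brand TEXT
);
CREATE TABLE deal_type_dim (
    type_of_deal_key INTEGER PRIMARY KEY,
    deal_time TEXT, product_key INTEGER
);
CREATE TABLE shipment_mode_dim (
    mode_of_shipment_key INTEGER PRIMARY KEY,
    shipment_type TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DIMENSION_DDL)


def primary_keys(connection):
    """Map each table name to the list of its primary-key column names."""
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    result = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = connection.execute(f"PRAGMA table_info({table})").fetchall()
        result[table] = [col[1] for col in columns if col[5] > 0]
    return result


for table, keys in primary_keys(conn).items():
    print(table, keys)
```

A shipment fact table would then carry one foreign key per dimension, each pointing at the primary keys declared here.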
Ques3: Discuss the major design issues that need to be addressed before proceeding with
the data design.
Answer: Major design issues that need to be addressed before proceeding with the data design
are:
1. Mining methodology and user interaction issues
2. Performance issues
3. Issues relating to the diversity of database types
2. Performance issues:
• Efficiency and scalability of data mining algorithms:
To effectively extract information from a huge amount of data in databases, data mining
algorithms must be efficient and scalable.