
DMBI SLS-2

Ques1: Name any three advantages of the STAR schema. Can you think of any
disadvantages of the STAR schema?
Answer:
Three advantages of the STAR schema are:-
1. Easy for Users to Understand-
The STAR schema reflects exactly how the users think and need data for querying and
analysis. They think in terms of significant business metrics. The fact table contains the
metrics. The users think in terms of business dimensions for analyzing the metrics. The
dimension tables contain the attributes along which the users normally query and analyze.
When you explain to the users that the units of product A are stored in the fact table and
point out the relationship of this piece of data to each dimension table, the users readily
understand the connections. That is because the STAR schema defines the join paths in
exactly the same way users normally visualize the relationships. The STAR schema is
intuitively understood by the users.

2. Optimizes Navigation-
A major advantage of the STAR schema is that it optimizes the navigation through the
database. Even when you are looking for a query result that is seemingly complex, the
navigation is still simple and straightforward.
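
The simple navigation described above can be sketched with a tiny in-memory example. This is a
minimal illustration using Python's built-in sqlite3 module; the table names (sales_fact,
product_dim, time_dim), the columns, and the sample rows are all hypothetical and do not come
from the assignment. Every dimension is reached from the fact table by a single join on its key.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    time_key INTEGER REFERENCES time_dim(time_key),
    units INTEGER
);
INSERT INTO product_dim VALUES (1, 'Product A'), (2, 'Product B');
INSERT INTO time_dim VALUES (10, 2023, 'Q1'), (11, 2023, 'Q2');
INSERT INTO sales_fact VALUES (1, 10, 5), (1, 11, 7), (2, 10, 3);
""")


def units_by_product_and_quarter(connection):
    """Sum the units metric along the product and time dimensions."""
    query = """
        SELECT p.product_name, t.quarter, SUM(f.units)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN time_dim t ON t.time_key = f.time_key
        GROUP BY p.product_name, t.quarter
        ORDER BY p.product_name, t.quarter
    """
    return connection.execute(query).fetchall()


rows = units_by_product_and_quarter(conn)
for row in rows:
    print(row)
```

The join path has the same shape no matter how many dimensions the question involves: start at
the fact table and step out once to each dimension that is needed.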

3. STARjoin and STARindex-
The STAR schema allows the query processor software to use better execution plans. It
enables specific performance schemes to be applied to queries. The STAR schema
arrangement is eminently suitable for special performance techniques such as the STARjoin
and the STARindex.

STARjoin is a high-speed, single-pass, parallelizable, multitable join. It can join more than
two tables in a single operation. This special scheme boosts query performance.

STARindex is a specialized index to accelerate join performance. These are indexes created
on one or more foreign keys of the fact table. These indexes speed up joins between the
dimension tables and the fact table.
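
The idea of indexing the foreign keys of the fact table can be sketched as follows. This is only
an illustration with Python's built-in sqlite3 module: SQLite has no feature called STARindex,
so ordinary single-column and composite indexes stand in for it here, and all table and index
names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales_fact (product_key INTEGER, time_key INTEGER, units INTEGER);

-- Indexes on single foreign keys of the fact table.
CREATE INDEX idx_fact_product ON sales_fact (product_key);
CREATE INDEX idx_fact_time ON sales_fact (time_key);
-- An index on more than one foreign key of the fact table.
CREATE INDEX idx_fact_product_time ON sales_fact (product_key, time_key);
""")

# PRAGMA index_list returns one row per index; the second column is the index name.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('sales_fact')"))
print(index_names)
```

Whether the query planner actually uses such indexes for a given join depends on the database
engine and the data, so the benefit should be checked against the engine's own query plans.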

Disadvantages of the STAR schema:-
1. Data integrity is not well enforced because the schema is highly denormalized.
2. It is not as flexible in meeting analytical needs as a normalized data model.
3. Star schemas do not readily support many-to-many relationships between business
entities.
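
The data integrity concern in the first disadvantage can be illustrated with a small sketch in
plain Python. The dimension rows, the brand name, and the brand_category attribute are all
hypothetical. Because a denormalized dimension repeats the same descriptive value on many rows,
an update that misses one copy leaves the table inconsistent, and nothing in the schema itself
prevents that.

```python
# Hypothetical denormalized product dimension: the brand's category is
# repeated on every row that belongs to that brand.
product_dim = [
    {"product_key": 1, "product_name": "Product A", "brand": "Acme", "brand_category": "Tools"},
    {"product_key": 2, "product_name": "Product B", "brand": "Acme", "brand_category": "Tools"},
]

# An update that touches only one row leaves the copies out of sync.
product_dim[0]["brand_category"] = "Hardware"


def categories_per_brand(rows):
    """Collect every category value recorded for each brand."""
    result = {}
    for row in rows:
        result.setdefault(row["brand"], set()).add(row["brand_category"])
    return result


# A brand with more than one recorded category signals an integrity problem.
conflicts = {
    brand: cats for brand, cats in categories_per_brand(product_dim).items() if len(cats) > 1
}
print(conflicts)
```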

Ques2: In a STAR schema to track the shipments for a distribution company, the following
dimension tables are found: (1) time, (2) customer ship-to, (3) ship-from, (4) product, (5)
type of deal, and (6) mode of shipment. Review these dimensions and list the possible
attributes for each of the dimension tables. Also, designate a primary key for each table.
Answer: This schema has six dimension tables: time, customer ship-to, ship-from, product, type
of deal, and mode of shipment. Possible attributes and the primary key of each table are:
1. Time: Time Key, Day, Month, Quarter, Year, Fiscal_year, and Day_of_week. The primary key
is Time Key.
2. Customer ship-to: Customer Ship-to Key, Customer Ship-to ID, Customer Ship-to Name,
Customer Ship-to City, Customer Ship-to State, Customer Ship-to Zip, Assigned Sales Rep Team
Name, and Customer Bill-to Name. The primary key is Customer Ship-to Key.
3. Ship-from: Ship-from Key, Ship Type, and Ship Name. The primary key is Ship-from Key.
4. Product: Product Key, Product Name, Product Code, Product Line, and Brand. The primary key
is Product Key.
5. Type of deal: Type of Deal Key, Deal Time, and Product Key. The primary key is Type of Deal
Key.
6. Mode of shipment: Mode of Shipment Key and Shipment Type. The primary key is Mode of
Shipment Key.
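
The six dimension tables in this answer can be written down as table definitions. The sketch
below uses Python's built-in sqlite3 module; the snake_case table and column names are
hypothetical renderings of the attributes listed in the answer, and storing every non-key
attribute as TEXT is a simplifying assumption.

```python
import sqlite3

# The first column of each list is the designated primary key.
DIMENSIONS = {
    "time_dim": ["time_key", "day", "month", "quarter", "year", "fiscal_year", "day_of_week"],
    "customer_ship_to_dim": [
        "customer_ship_to_key", "customer_ship_to_id", "customer_ship_to_name",
        "customer_ship_to_city", "customer_ship_to_state", "customer_ship_to_zip",
        "assigned_sales_rep_team_name", "customer_bill_to_name",
    ],
    "ship_from_dim": ["ship_from_key", "ship_type", "ship_name"],
    "product_dim": ["product_key", "product_name", "product_code", "product_line", "brand"],
    "type_of_deal_dim": ["type_of_deal_key", "deal_time", "product_key"],
    "mode_of_shipment_dim": ["mode_of_shipment_key", "shipment_type"],
}


def create_dimension_tables(connection, dimensions):
    """Create one table per dimension with its first column as the primary key."""
    for table, columns in dimensions.items():
        key, *attributes = columns
        column_sql = ", ".join(
            [f"{key} INTEGER PRIMARY KEY"] + [f"{name} TEXT" for name in attributes]
        )
        connection.execute(f"CREATE TABLE {table} ({column_sql})")


def primary_key_of(connection, table):
    """Return the primary key column names (sixth field of PRAGMA table_info is the pk flag)."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})") if row[5]]


conn = sqlite3.connect(":memory:")
create_dimension_tables(conn, DIMENSIONS)
for table in DIMENSIONS:
    print(table, primary_key_of(conn, table))
```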

Ques3: Discuss the major design issues that need to be addressed before proceeding with
the data design.
Answer: Major design issues that need to be addressed before proceeding with the data design
are:
1. Mining methodology and user interaction issues
2. Performance issues
3. Issues relating to the diversity of database types

1. Mining methodology and user interaction issues:
•Mining different kinds of knowledge in databases:
Different users are interested in different kinds of knowledge and look for it in different
ways. It is therefore difficult to cover the vast range of data needed to meet every client's
requirements.

•Interactive mining of knowledge at multiple levels of abstraction:
Interactive mining allows users to focus the search for patterns from different angles. The data
mining process should be interactive because it is difficult to know what can be discovered
within a database.

•Incorporation of background knowledge:
Background knowledge is used to guide the discovery process and to express the discovered
patterns.

•Query languages and ad hoc mining:
Relational query languages (such as SQL) allow users to pose ad hoc queries for data
retrieval. A data mining query language should be well matched with the query language of
the data warehouse.

•Handling noisy or incomplete data:
In a large database, many of the attribute values may be incorrect or missing. This may be due
to human error or instrument failure. Data cleaning methods and data analysis methods are
used to handle noisy data.
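
One simple data cleaning step for noisy or incomplete values can be sketched as follows. This is
an illustration under stated assumptions, not a prescribed method: the function clean_readings,
the plausible range, and the sample readings are hypothetical, and median replacement is just
one of many possible strategies.

```python
from statistics import median


def clean_readings(values, low, high):
    """Replace missing (None) or out-of-range values with the median of the valid ones.

    Raises statistics.StatisticsError if no value is valid.
    """
    valid = [v for v in values if v is not None and low <= v <= high]
    fill = median(valid)
    return [v if v is not None and low <= v <= high else fill for v in values]


# Hypothetical sensor readings: one missing value and one implausible spike.
readings = [21.0, None, 22.5, 999.0, 20.5]
cleaned = clean_readings(readings, low=-50.0, high=60.0)
print(cleaned)
```

The median is used here instead of the mean because a single extreme value distorts it less.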

2. Performance issues:
•Efficiency and scalability of data mining algorithms:
To effectively extract information from a huge amount of data in databases, data mining
algorithms must be efficient and scalable.

•Parallel, distributed, and incremental mining algorithms:
The huge size of many databases, the wide distribution of data, and the complexity of some data
mining methods are factors motivating the development of parallel and distributed data mining
algorithms. Such algorithms divide the data into partitions, which are processed in parallel.
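
The partition-then-merge structure described above can be sketched in Python. The example counts
how many transactions contain each item, which is a basic step in frequent-pattern mining. The
function names and the sample transactions are hypothetical. A thread pool keeps the sketch
simple; in CPython, threads do not speed up CPU-bound work, so a process pool or a distributed
framework would be the realistic choice for large data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split the records into n_parts roughly equal partitions."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_items(transactions):
    """Count, for one partition, how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts


def parallel_item_counts(transactions, n_parts=2):
    """Process each partition in its own worker, then merge the partial counts."""
    parts = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_counts = list(pool.map(count_items, parts))
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


transactions = [["milk", "bread"], ["milk"], ["bread", "eggs"], ["milk", "eggs"]]
totals = parallel_item_counts(transactions)
print(dict(totals))
```

Merging works here because per-partition counts simply add up; mining tasks whose partial
results cannot be combined this easily need a more careful merge step.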

3. Issues relating to the diversity of database types:
•Handling of relational and complex types of data:
There are many kinds of data stored in databases and data warehouses. It is not possible for one
system to mine all these kinds of data, so different data mining systems should be constructed
for different kinds of data.

•Mining information from heterogeneous databases and global information systems:
Data is fetched from different data sources across Local Area Networks (LAN) and Wide Area
Networks (WAN), so the discovery of knowledge from different sources of structured data is a
great challenge for data mining.
