B.E.
Seventh Semester Examination, Dec.-2009
Data Warehousing and Data Mining at-101-£)
Note: Attempt any five questions. All questions carry equal marks.
Q.1.(a) Differentiate between the following:
Database, Data Warehouse, Data Mining, KDD.
Ans. Database: A database is an organized body of related information. A database is a collection of data
for one or more uses. One way of classifying databases involves the type of content, for example
bibliographic, full-text, numeric, image.
Data Warehouse: A data warehouse is a repository of an organization's electronically stored data.
Data warehouses are designed to facilitate reporting and analysis.
This definition of a data warehouse focuses on data storage. However, the means to retrieve and analyze
data, to extract, transform and load data, and to manage the data dictionary are also considered essential
components of a data warehousing system.
Data Mining: Data mining is the process of extracting patterns from data. Data mining is becoming an
increasingly important tool to transform data into information. It is commonly used in a wide range of
profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
KDD: KDD stands for knowledge discovery in databases.
KDD is synonymous with large databases and automated discovery of patterns and relationships.
KDD is "the non-trivial process of identifying valid, novel, potentially useful and ultimately
understandable patterns in data."
KDD Process:
[Figure: KDD process. Stage labels recoverable from the figure: reduction and coding, transformed data, data mining, visualization, reports, knowledge.]
Q.1.(b) Explain the concept of star, snowflake and galaxy schema with the help of a suitable example.
Ans. Star Schema: A single fact table with n dimension tables linked to it.
(i) There is a central large fact table with no redundancy.
(ii) Each tuple in the fact table has a foreign key to a dimension table, which describes the details of that
dimension.
Sales fact table: Time-key, Item-key, Branch-key, Location-key, Dollars-sold, Units-sold
Time dimension table: Time-key, Day, Day-of-the-week, Month, Quarter, Year
Item dimension table: Item-key, Item-name, Brand, Type, Supplier-type
Branch dimension table: Branch-key, Branch-name, Branch-type
Location dimension table: Location-key, Street, City, Province-or-state, Country
Fig. Star schema (Data Mining Concepts and Tech
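The star layout in the figure above can be tried out with a small in-memory database. The following is only a minimal sketch: the column subset, the sample rows (cities, items, amounts) and the helper name `sales_by_city` are illustrative assumptions, and the branch dimension is left out to keep it short.

```python
import sqlite3

# Illustrative star schema: one fact table, each dimension one join away.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day INTEGER, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
-- Hypothetical sample rows, not taken from the exam paper.
INSERT INTO time_dim VALUES (1, 15, 'Oct', 'Q4', 2008), (2, 19, 'Oct', 'Q4', 2008);
INSERT INTO item_dim VALUES (1, 'Milk', 'BrandA', 'Dairy'), (2, 'Oil', 'BrandB', 'Grocery');
INSERT INTO location_dim VALUES (1, 'Delhi', 'India'), (2, 'Mumbai', 'India');
INSERT INTO sales_fact VALUES (1, 1, 1, 100.0, 10), (1, 2, 1, 50.0, 2), (2, 1, 2, 30.0, 3);
""")

def sales_by_city(connection):
    """Join the fact table to one dimension table and total the measure."""
    return connection.execute("""
        SELECT l.city, SUM(f.dollars_sold)
        FROM sales_fact AS f
        JOIN location_dim AS l ON l.location_key = f.location_key
        GROUP BY l.city
        ORDER BY l.city
    """).fetchall()

star_result = sales_by_city(conn)
```

Every dimension is reached with a single join from the fact table, which is the property that distinguishes the star layout from the normalized layouts.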
Q.2.(a) Explain in detail the three-tier Data Warehouse architecture. How is a query mapped between
the three tiers? Explain.
Ans.
Data warehouses adopt a three-tier architecture. The tiers are:
Bottom Tier (data warehouse server):
(i) This is the warehouse database server.
(ii) Data is fed into it using back-end tools and utilities.
(iii) Data is extracted using programs called gateways.
(iv) It also contains a metadata repository.
Middle Tier: The middle tier is an OLAP server that is typically implemented using either
(i) a relational OLAP (ROLAP) model, that is, an extended relational DBMS that maps operations on multidimensional data
to standard relational operations; or
(ii) a multidimensional OLAP (MOLAP) model, that is, a special-purpose server that directly implements
multidimensional data and operations.
Top Tier: The top tier is a front-end client layer, which contains query and reporting tools, analysis tools and/or data mining tools.
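How a request travels through the three tiers can be sketched for the ROLAP case, where the middle tier turns a multidimensional request into a standard relational query. This is a rough illustration under stated assumptions, not a real OLAP server: the function names `rolap_translate`, `olap_server` and `front_end_report`, the flat `sales` table and its rows are all hypothetical.

```python
import sqlite3

# Bottom tier: the warehouse database server (an in-memory SQLite stand-in).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (city TEXT, quarter TEXT, item TEXT, dollars_sold REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Delhi", "Q1", "Milk", 100.0),   # hypothetical sample rows
        ("Delhi", "Q2", "Milk", 80.0),
        ("Mumbai", "Q1", "Milk", 60.0),
        ("Mumbai", "Q1", "Oil", 40.0),
    ],
)

def rolap_translate(measure, group_by, filters):
    """Middle tier (ROLAP): map a cube request to a relational query.

    Column names are trusted here for brevity; only filter values are
    passed as bound parameters.
    """
    params = []
    clauses = []
    for column, value in sorted(filters.items()):
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM({measure}) FROM sales{where} GROUP BY {cols} ORDER BY {cols}"
    return sql, params

def olap_server(measure, group_by, filters):
    """Middle tier: translate the request, then run it on the bottom tier."""
    sql, params = rolap_translate(measure, group_by, filters)
    return warehouse.execute(sql, params).fetchall()

def front_end_report():
    """Top tier: a reporting tool asking for Q1 sales rolled up by city."""
    return olap_server("dollars_sold", ["city"], {"quarter": "Q1"})
```

The front end only names a measure, the dimensions to keep and a filter; the SQL text is produced in the middle tier and executed by the warehouse server.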
[Figure: three-tier data warehouse architecture. Labels recoverable from the figure: Analysis, Data Mining (Top Tier: Front-end Tools); Operational Databases; External Server.]
Snowflake Schema: A single fact table with n dimension tables organized as a hierarchy.
Some of the dimension tables are normalized, thus splitting data into additional tables.
Sales fact table: Time-key, Item-key, Branch-key, Location-key, Dollars-sold, Units-sold
Time dimension table: Time-key, Day, Day-of-the-week, Month, Quarter, Year
Item dimension table: Item-key, Item-name, Brand, Type, Supplier-key
Supplier dimension table: Supplier-key, Supplier-type
Branch dimension table: Branch-key, Branch-name, Branch-type
Location dimension table: Location-key, Street, City-key
City dimension table: City-key, City, Province-or-state, Country
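The effect of normalizing a dimension can be seen in a short sketch: reaching the supplier type now takes two joins (fact to item, item to supplier) instead of one. The table subset and sample rows below are illustrative assumptions only.

```python
import sqlite3

# Illustrative snowflake fragment: item_dim is normalized, so supplier
# details live in their own table reached through item_dim.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL
);
-- Hypothetical sample rows.
INSERT INTO supplier_dim VALUES (1, 'Wholesale'), (2, 'Retail');
INSERT INTO item_dim VALUES (1, 'Milk', 1), (2, 'Oil', 1), (3, 'Butter', 2);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 50.0), (3, 25.0), (1, 20.0);
""")

# Two joins are needed to group sales by supplier type.
snowflake_result = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_key = f.item_key
    JOIN supplier_dim AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
```

The trade-off is less redundancy in the dimension tables at the cost of extra joins at query time.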
Galaxy Schema: Also known as fact constellation schema.
* Multiple fact tables sharing dimension tables.
In the figure given below, the 'sales' fact table and the 'shipping' fact table share the dimension tables.
Sales fact table: Time-key, Item-key, Branch-key, Location-key, Dollars-sold, Units-sold
Shipping fact table: Item-key, Time-key, Shipper-key, From-location, To-location, Dollars-cost, Units-shipped
Time dimension table: Time-key, Day, Day-of-the-week, Month, Quarter, Year
Item dimension table: Item-key, Item-name, Brand, Type, Supplier-type
Branch dimension table: Branch-key, Branch-name, Branch-type
Location dimension table: Location-key, Street, City, Province-or-state, Country
Shipper dimension table: Shipper-key, Shipper-name
Q.2.(b) Discuss various OLAP operations which can be performed on a multidimensional data cube.
Ans. OLAP Operations: The analyst can understand the meaning contained in the databases using
multidimensional analysis. By aligning the data content with the analyst's mental model, the chances of
confusion and erroneous interpretations are reduced. The analyst can navigate through the database and
screen for a particular subset of the data, changing the data's orientations and defining analytical relations. The
user-initiated process of navigating by calling for page displays interactively, through the specification of
slices via rotations and drill down/up, is sometimes called "slice and dice". Common operations include slice and
dice, drill down, roll up and pivot.
Slice: A slice is a subset of a multidimensional array corresponding to a single value for one or more
members of the dimensions not in the subset.
Dice: The dice operation is a slice on more than two dimensions of a data cube.
Drill Down/Up: Drilling down or up is a specific analytical technique whereby the user navigates among
levels of data ranging from the most summarized (up) to the most detailed (down).
Roll up: A roll up involves computing all of the data relationships for one or more dimensions. To do this, a
computational relationship or formula might be defined.
Pivot: This operation is also called the rotate operation. It rotates the data in order to provide an alternative
presentation of the data, that is, it changes the dimensional orientation of a report or page display.
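The operations described above can be illustrated on a tiny cube held in a Python dictionary. This is only a sketch under assumptions: the cube contents, the dimension names and all function names are hypothetical, roll up is shown with a sum as the aggregation, and dice is shown in its common form of selecting a sub-cube by restricting several dimensions.

```python
from collections import defaultdict

# Hypothetical cube: (city, quarter, item) -> dollars sold.
DIMS = ("city", "quarter", "item")
cube = {
    ("Delhi", "Q1", "Milk"): 100,
    ("Delhi", "Q1", "Oil"): 40,
    ("Delhi", "Q2", "Milk"): 80,
    ("Mumbai", "Q1", "Milk"): 60,
    ("Mumbai", "Q2", "Oil"): 30,
}

def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def dice_cube(cells, allowed):
    """Dice: keep the cells whose coordinates lie in the allowed value sets."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }

def roll_up(cells, keep_dims):
    """Roll up: sum away every dimension that is not listed in keep_dims."""
    idx = [DIMS.index(d) for d in keep_dims]
    totals = defaultdict(int)
    for k, v in cells.items():
        totals[tuple(k[i] for i in idx)] += v
    return dict(totals)

def pivot(view_2d):
    """Pivot (rotate): swap the two axes of a two-dimensional view."""
    return {(b, a): v for (a, b), v in view_2d.items()}

by_city = roll_up(cube, ["city"])                      # most summarized level
by_city_quarter = roll_up(cube, ["city", "quarter"])   # drill down one level
rotated = pivot(by_city_quarter)                       # quarter first, city second
```

Drilling down is simply the move from `by_city` to `by_city_quarter`; rolling up is the move in the opposite direction.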
Q.3. Suppose a database has four transactions. Let min-support = 60%, min-confidence = 80%.
TID     Date        Items bought
T100    ...         {A, B, ...}
T200    15/10/08    {D, A, C, E, B}
T300    19/10/08    {C, A, B, E}
T400    22/10/08    {B, A, D}
(i) Find all frequent itemsets using the Apriori algorithm.
(ii) List all strong association rules matching the following meta-rule, where X is a variable representing
customers and item_i denotes variables representing items (e.g., A, B, etc.):
$\forall X \in \text{transaction},\ \text{buys}(X, \text{item}_1) \wedge \text{buys}(X, \text{item}_2) \Rightarrow \text{buys}(X, \text{item}_3)$
Ans. (i) The Apriori algorithm employs breadth-first search (BFS) and uses a hash tree structure to count candidate itemsets.
FI (Frequent Itemset) = {A, B, D}
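The result can be checked with a small level-wise Apriori sketch. Two things here are assumptions and not part of the paper: the first transaction is only partly legible in the question, so it is taken to be {A, B, D}, the smallest set consistent with the stated answer; and the sketch counts candidates with plain Python sets, not a hash tree. Confidence is computed as supp(body and head together) divided by supp(body).

```python
from itertools import combinations

# T100 is an ASSUMPTION: {A, B, D} is the smallest set that is
# consistent with the stated frequent itemset {A, B, D}.
transactions = {
    "T100": {"A", "B", "D"},
    "T200": {"D", "A", "C", "E", "B"},
    "T300": {"C", "A", "B", "E"},
    "T400": {"B", "A", "D"},
}
MIN_SUPPORT = 0.60
MIN_CONFIDENCE = 0.80

def support(itemset):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def apriori():
    """Level-wise search: frequent k-itemsets generate (k+1)-candidates."""
    items = sorted({i for t in transactions.values() for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        previous = set(level)
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
            and support(c) >= MIN_SUPPORT
        ]
    return frequent

def strong_rules(frequent):
    """Rules of the meta-rule shape item1 AND item2 => item3."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) != 3:
            continue
        for head in sorted(itemset):
            body = itemset - {head}
            confidence = supp / frequent[body]
            if confidence >= MIN_CONFIDENCE:
                rules.append((tuple(sorted(body)), head, confidence))
    return rules

frequent_itemsets = apriori()
rules = strong_rules(frequent_itemsets)
```

Under the assumption about T100, the largest frequent itemset is {A, B, D} with support 75%, and the rules B, D => A and A, D => B both reach 100% confidence, while A, B => D reaches only 75% and is therefore not strong.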
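(ii) Association Rules:
$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$$
$X \Rightarrow Y$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$
Meta Rules:
$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}$$
$$\mathrm{conv}(X \Rightarrow Y) = \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \Rightarrow Y)}$$
Q.4. Explain the concept of Query Language employed in data mining and standardization of data
mining. How can pattern presentation and visualization specification be carried out in Data Mining Query
Language?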
Ans. Data mining query languages are based on MINE RULE. MINE RULE has been designed at the University
of Torino and the Politecnico di Milano. It is an extension of SQL, which is coupled with a relational DBMS. Data
can be selected using the full power of SQL. Mined association rules are materialized into relational tables as
well. MINE RULE extracts association rules between values of attributes in a relational table. However, it is up
to the user to specify the form of the rules to be extracted. The user can specify the cardinality of the body and head
of the desired rules and the attributes on which rule components can be built.
An interesting aspect of MINE RULE is that it is possible to work on different levels of grouping during the
extraction. If there is one level of grouping, rule support will be computed w.r.t. the number of groups in the
table. Defining a second level of grouping leads to the definition of clusters.
Rule components can be taken in two different clusters, eventually ordered, inside the same group. It is
thus possible to extract some elementary sequential patterns (by clustering on a time-related attribute). For
instance, grouping purchases by customer, one can find that customers who first buy butter and milk tend to buy oil afterwards. Concerning
interestingness measures, MINE RULE enables the user to specify minimal frequency and confidence thresholds. The
general syntax of a MINE RULE query for extracting rules is:
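MINE RULE <TableName> AS
SELECT DISTINCT [<Cardinality>] <Attributes> AS BODY,
    [<Cardinality>] <Attributes> AS HEAD
    [, SUPPORT] [, CONFIDENCE]
FROM <Table> [WHERE <WhereClause>]
GROUP BY <Attributes> [HAVING <HavingClause>]
[CLUSTER BY <Attributes> [HAVING <HavingClause>]]
EXTRACTING RULES WITH SUPPORT: <real>, CONFIDENCE: <real>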
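The two grouping levels described in the answer can be imitated in a few lines of Python. This is an interpretation, not the MINE RULE engine: purchases are grouped by customer, clustered by date inside each group, and a rule is counted for a group when the body appears in one cluster and the head in a later one. The purchase rows and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical purchases: (customer, date, item).
purchases = [
    ("c1", "2008-10-01", "butter"), ("c1", "2008-10-01", "milk"), ("c1", "2008-10-05", "oil"),
    ("c2", "2008-10-02", "butter"), ("c2", "2008-10-02", "milk"), ("c2", "2008-10-09", "oil"),
    ("c3", "2008-10-03", "butter"), ("c3", "2008-10-03", "milk"),
    ("c4", "2008-10-04", "oil"),
]

def group_and_cluster(rows):
    """First level: group by customer. Second level: cluster by date."""
    groups = defaultdict(lambda: defaultdict(set))
    for customer, date, item in rows:
        groups[customer][date].add(item)
    return groups

def sequential_rule_stats(rows, body, head):
    """Support and confidence of 'body first, head later', counted per group."""
    groups = group_and_cluster(rows)
    body_groups = 0
    rule_groups = 0
    for clusters in groups.values():
        dates = sorted(clusters)  # ISO dates sort chronologically as strings
        body_dates = [d for d in dates if body <= clusters[d]]
        if not body_dates:
            continue
        body_groups += 1
        first = body_dates[0]
        if any(head <= clusters[d] for d in dates if d > first):
            rule_groups += 1
    support = rule_groups / len(groups)          # relative to the number of groups
    confidence = rule_groups / body_groups if body_groups else 0.0
    return support, confidence

rule_support, rule_confidence = sequential_rule_stats(purchases, {"butter", "milk"}, {"oil"})
```

With these made-up rows, two of the four customers buy butter and milk first and oil later, so the rule has support 0.5, and two of the three customers who bought butter and milk went on to buy oil.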