B.E. Seventh Semester Examination, Dec.-2009
Data Warehousing and Data Mining [paper code illegible]

Note : Attempt any five questions. All questions carry equal marks.

Q.1.(a) Differentiate between the following : Database, Data Warehouse, Data Mining, KDD.

Ans. Database : A database is an organized body of related information. It is a collection of data for one or more uses. One way of classifying databases involves the type of content, for example bibliographic, full-text, numeric, image.

Data Warehouse : A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. This definition of a data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system.

Data Mining : Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

KDD : KDD stands for knowledge discovery in databases. KDD is synonymous with large databases and the automated discovery of patterns and relationships. KDD is "the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data."

KDD Process :
[Fig. : KDD process. Stages legible in the source : reduction and coding, transformed data, data mining, visualization and reporting, knowledge.]

Q.1.(b) Explain the concept of star, snowflake and galaxy schema with the help of suitable example.

Ans. Star Schema : Single fact table with dimension tables linked to it.
(i) There is a central large fact table with no redundancy.
(ii) Each tuple in the fact table has a foreign key to a dimension table which describes the details of that dimension.
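The star schema described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names (time_dim, item_dim, sales_fact), the column choices and the sample rows are assumptions made for the example, not part of the original answer.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema (illustrative names and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: one row per time period / item.
        CREATE TABLE time_dim (
            time_key INTEGER PRIMARY KEY,
            day INTEGER, month TEXT, quarter TEXT, year INTEGER
        );
        CREATE TABLE item_dim (
            item_key INTEGER PRIMARY KEY,
            item_name TEXT, brand TEXT
        );
        -- Central fact table: foreign keys to each dimension plus measures.
        CREATE TABLE sales_fact (
            time_key INTEGER REFERENCES time_dim(time_key),
            item_key INTEGER REFERENCES item_dim(item_key),
            dollars_sold REAL,
            units_sold INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
        [(1, 15, "Oct", "Q4", 2008), (2, 19, "Oct", "Q4", 2008)],
    )
    conn.executemany(
        "INSERT INTO item_dim VALUES (?, ?, ?)",
        [(10, "milk", "BrandX"), (11, "butter", "BrandY")],
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(1, 10, 20.0, 10), (1, 11, 15.0, 5), (2, 10, 8.0, 4)],
    )
    return conn


def dollars_by_item(conn):
    """Join the fact table to one dimension and total the measure per item."""
    rows = conn.execute(
        """
        SELECT i.item_name, SUM(f.dollars_sold)
        FROM sales_fact AS f
        JOIN item_dim AS i ON f.item_key = i.item_key
        GROUP BY i.item_name
        ORDER BY i.item_name
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(dollars_by_item(build_star_schema()))
```

Every analytical query follows the same pattern: start from the fact table and join outward to whichever dimension tables supply the grouping attributes.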
[Fig. : Star schema of a sales data warehouse (from Data Mining : Concepts and Techniques). Tables and attributes legible in the source :
Sales fact table : time key, branch key, location key, dollars sold, units sold.
Time dimension table : time key, day, day-of-the-week, month, quarter, year.
Item dimension table : type, supplier type.
Branch dimension table : branch key, branch name, branch type.
Location dimension table : city, province or state, country.]

Q.2.(a) Explain in detail the three-tier Data Warehouse architecture. How a query is mapped between three tiers, explain.

Ans. Data warehouses adopt a three-tier architecture. The tiers are :

Bottom Tier (data warehouse server) :
(i) This is the warehouse database server.
(ii) Data is fed using back-end tools and utilities.
(iii) Data is extracted using programs called gateways.
(iv) It also contains a metadata repository.

Middle Tier : The middle tier is an OLAP server that is typically implemented using either a relational OLAP model, that is, an extended relational DBMS that maps operations on multidimensional data to standard relational operations; or a multidimensional OLAP model, that is, a special-purpose server that directly implements multidimensional data and operations.

Top Tier : The top tier is a front-end client layer, which contains query and reporting tools, analysis tools and/or data mining tools.

[Fig. : Three-tier data warehouse architecture. Labels legible in the source : top tier with front-end tools (analysis, data mining); operational databases and external sources at the bottom.]

Snowflake Schema : Single fact table with n dimension tables organized as a hierarchy. Some of the dimension tables are normalized, thus splitting data into additional tables.

[Fig. : Snowflake schema of a sales data warehouse. Tables and attributes legible in the source :
Sales fact table : branch key, location key, dollars sold, units sold.
Time dimension table : day, day-of-the-week, month, quarter, year.
Item dimension table : item name, with supplier details split out into a supplier dimension table (supplier type).
Branch dimension table : branch key, branch name, branch type.
Location dimension table : location key, street, with city details split out into a city dimension table.]

Galaxy Schema :
* Also known as fact constellation schema.
* Multiple fact tables sharing dimension tables.
In the fig. given below, the 'sales' fact table and the 'shipping' fact table share the dimension tables.

[Fig. : Galaxy (fact constellation) schema. Tables and attributes legible in the source :
Shipping fact table : item key, time key, shipper key, from location, to location, dollars cost, units shipped.
Time dimension table : time key, day, day-of-the-week.
Item dimension table : item key, item name, brand, type, supplier type.
Branch dimension table : branch key, branch name, branch type.
Location dimension table : location key, province or state, country.
Shipper dimension table : shipper key, shipper name.]

Q.2.(b) Discuss various OLAP operations which can be performed on a multidimensional data cube.

Ans. OLAP Operations : The analyst can understand the meaning contained in the databases using multidimensional analysis. By aligning the data content with the analyst's mental model, the chances of confusion and erroneous interpretations are reduced. The analyst can navigate through the database and screen for a particular subset of the data, changing the data's orientations and defining analytical relations. The user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down/up, is sometimes called "slice and dice". Common operations include slice and dice, drill down, roll up and pivot.

Slice : A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.

Dice : The dice operation is a slice on more than two dimensions of a data cube.

Drill Down/Up : Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down).

Roll up : A roll up involves computing all of the data relationships for one or more dimensions. A
computational relationship or formula might be defined.

Pivot : This operation is also called the rotate operation. It rotates the data in order to provide an alternative presentation of the data, that is, it changes the dimensional orientation of a report or page display.

Q.3. Suppose a database has four transactions. Let min-support = 60%, min-confidence = 80%.

TID     Date          Items bought
T100    [illegible]   {A, B, ...} (partly illegible)
T200    15/10/08      {D, A, C, E, B}
T300    19/10/08      {C, A, B, E}
T400    22/10/08      {B, A, D}

(i) Find all frequent itemsets using Apriori algorithm.
(ii) List all strong association rules matching the following meta-rule, where X is a variable representing customers and item_i denotes variables representing items (e.g., A, B, etc.) :
$\forall X \in \text{transaction},\ \text{buys}(X, \text{item}_1) \wedge \text{buys}(X, \text{item}_2) \Rightarrow \text{buys}(X, \text{item}_3)$

Ans. (i) The Apriori algorithm employs breadth-first search (BFS) and uses a hash tree structure.
FI (Frequent Itemset) = {A, B, D}

(ii) Association Rules : $X \Rightarrow Y$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$.
$\text{conf}(X \Rightarrow Y) = \dfrac{\text{supp}(X \cup Y)}{\text{supp}(X)}$

Meta Rules :
$\text{lift}(X \Rightarrow Y) = \dfrac{\text{supp}(X \cup Y)}{\text{supp}(X)\,\text{supp}(Y)}$
$\text{conv}(X \Rightarrow Y) = \dfrac{1 - \text{supp}(Y)}{1 - \text{conf}(X \Rightarrow Y)}$

Q.4. Explain the concept of Query Language employed in data mining and standardization of data mining. How pattern presentation and visualization specification can be carried out in Data Mining Query Language?

Ans. Data mining query languages can be illustrated by MINE RULE. MINE RULE was designed at the University of Torino and the Politecnico di Milano. It is an extension of SQL which is coupled with a relational DBMS. Data can be selected using the full power of SQL. Mined association rules are materialized into relational tables as well. MINE RULE extracts association rules between values of attributes in a relational table. However, it is up to the user to specify the form of the rules to be extracted. The user can specify the cardinality of the body and head of the desired rules and the attributes on which rule components can be built.

An interesting aspect of MINE RULE is that it is possible to work on different levels of grouping during the extraction. If there is one level of grouping, rule support will be computed w.r.t.
the number of groups in the table. Defining a second level of grouping leads to the definition of clusters. Rule components can be taken in two different clusters, eventually ordered, inside the same group. It is thus possible to extract some elementary sequential patterns (by clustering on a time-related attribute). For instance, grouping purchases by customer can yield rules such as : customers who first buy butter and milk tend to buy oil afterwards.

Concerning interestingness measures, MINE RULE enables the user to specify minimal frequency and confidence thresholds. The general syntax of a MINE RULE query for extracting rules is (parts lost in the source are shown as "...") :

MINE RULE ... AS
... AS BODY, ... AS HEAD [, SUPPORT] [, CONFIDENCE]
FROM <Table> [WHERE ...]
...
[CLUSTER BY ... [HAVING ...]]
... CONFIDENCE: ...
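As a cross-check on the answer to Q.3, the level-wise Apriori search and the confidence formula can be sketched in a few lines of Python. This is a hedged sketch, not the examiner's worked solution: the contents of T100 are partly illegible in the source, so it is assumed here to be {A, B, D} (the stated answer requires D to appear in it), and the garbled item letters in T200 and T300 are read as C. The function names are illustrative, and a plain scan replaces the hash tree for brevity.

```python
from itertools import combinations

# Assumption: T100 is partly illegible in the source; {A, B, D} is assumed.
TRANSACTIONS = {
    "T100": {"A", "B", "D"},
    "T200": {"D", "A", "C", "E", "B"},
    "T300": {"C", "A", "B", "E"},
    "T400": {"B", "A", "D"},
}
MIN_SUPPORT = 0.6
MIN_CONFIDENCE = 0.8


def support(itemset):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for items in TRANSACTIONS.values() if itemset <= items)
    return hits / len(TRANSACTIONS)


def apriori():
    """Return all frequent itemsets, found level by level (breadth first)."""
    all_items = sorted(set().union(*TRANSACTIONS.values()))
    level = [
        frozenset([item])
        for item in all_items
        if support(frozenset([item])) >= MIN_SUPPORT
    ]
    frequent = []
    k = 1
    while level:
        frequent.extend(level)
        k += 1
        previous = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are
        # frequent, then check its own support against the threshold.
        level = [
            c
            for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
            and support(c) >= MIN_SUPPORT
        ]
    return frequent


def strong_rules_two_to_one(frequent):
    """Rules of the form item1 AND item2 => item3 that meet min-confidence."""
    rules = []
    for itemset in frequent:
        if len(itemset) != 3:
            continue
        for head in itemset:
            body = itemset - {head}
            confidence = support(itemset) / support(body)
            if confidence >= MIN_CONFIDENCE:
                rules.append((tuple(sorted(body)), head, confidence))
    return sorted(rules)


if __name__ == "__main__":
    found = apriori()
    print(sorted(sorted(s) for s in found))
    print(strong_rules_two_to_one(found))
```

Under that assumption, the search returns {A}, {B}, {D}, {A, B}, {A, D}, {B, D} and {A, B, D}, which agrees with the stated answer FI = {A, B, D} as the largest frequent itemset. The rules matching the meta-rule are A and D => B, and B and D => A, each with confidence 100%; A and B => D reaches only 75% and is therefore not strong.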
