
Anna University
DEPARTMENT OF INFORMATION TECHNOLOGY
Data Warehousing and Data Mining
QUESTION BANK
2012 Edition

Sub Code: CS2032
Sub Name: Data Warehousing and Data Mining

UNIT-I DATA WAREHOUSING

PART A

1. Define the term ‘Data Warehouse’.
2. Write down the applications of data warehousing.
3. When is a data mart appropriate?
4. List out the functionality of metadata.
5. What are the nine decisions in the design of a data warehouse?
6. List out the two different types of reporting tools.
7. Why is data mining used in all organizations?
8. What are the technical issues to be considered when designing and implementing a data warehouse environment?
9. List out some examples of access tools.
10. What are the advantages of data warehousing?
11. Give the difference between horizontal and vertical parallelism.
12. Draw a neat diagram for the distributed memory shared disk architecture.
13. Define star schema.
14. What are the reasons for the very good performance achieved by SYBASE IQ technology?
15. What are the steps to be followed to store the external source into the data warehouse?
16. Define legacy data.
17. Draw the standard framework for metadata interchange.
18. List out the five main groups of access tools.
19. Define data visualization.
20. What are the various forms of data preprocessing?
21. How is a data warehouse different from a database? How are they similar?
22. What is data transformation? Give an example.
23. With an example, explain what metadata is.
24. What is a data mart?

PART-B

1. Enumerate the building blocks of a data warehouse. Explain the importance of metadata in a data warehouse environment. [16]
2. Explain various methods of data cleaning in detail. [8]
3. Diagrammatically illustrate and discuss the data warehousing architecture, and briefly explain the components of a data warehouse. [16]
4. (i) Distinguish between data warehousing and data mining. [8]
   (ii) Describe in detail data extraction and cleanup. [8]
5. Write short notes on
   (i) Transformation [8]
   (ii) Metadata [8]

6. List and discuss the steps involved in mapping the data warehouse to a multiprocessor architecture. [16]
7. Discuss in detail about Bitmapped Indexing. [16]
8. Explain in detail about different Vendor Solutions. [16]

UNIT-II BUSINESS ANALYSIS

PART A

1. What is meant by OLAP?
2. Classify OLAP tools.
3. Define ADF, SGF and DEF.
4. List out the applications that organizations use to build a query and reporting environment for the data warehouse.
5. Define Concept Hierarchy.
6. Distinguish between the window painter and the DataWindow painter.
7. List out any 5 OLAP guidelines.
8. Draw a neat sketch of the three-tiered client/server architecture.
9. List out the five categories of decision support tools.
10. Define Cognos Impromptu.
11. Define MQE.
12. What is the function of the PowerPlay administrator?
13. State the difference between OLAP and OLTP.
14. Draw a neat diagram for the web processing model.
15. Define ROLAP.
16. State the difference between OLAP and OLTP.
17. Distinguish between multidimensional and multi-relational OLAP.

PART-B

1. (i) Explain the Multidimensional Data model. [10]
   (ii) Discuss how computations can be performed efficiently on data cubes. [6]
2. List and discuss the basic features that are provided by reporting and query tools used for business analysis. [16]
3. Explain about OLAP in detail. [16]
4. Discuss about the OLAP tools and the Internet. [16]
5. With relevant examples, discuss multidimensional online analytical processing and multi-relational online analytical processing. [16]
6. Describe in detail about Cognos Impromptu. [16]
7. Discuss the typical OLAP operations with an example. [6]

UNIT-III DATA MINING

PART A

1. What are the various forms of data preprocessing?
2. What is the need for discretization in data mining?
3. State why data preprocessing is an important issue for data warehousing and data mining.
4. Define data.
5. What is a concept hierarchy? Give an example.

6. Define Data Mining.
7. Define patterns.
8. What do data mining functionalities include?
9. Mention the various tasks to be accomplished as part of data pre-processing.
10. List out any four data mining tools.
11. What are the various forms of data preprocessing?

PART-B

1. (i) Explain the various primitives for specifying a data mining task. [10]
   (ii) Describe the various descriptive statistical measures for data mining. [6]
2. Discuss about different types of data and functionalities. [16]
3. (i) Describe in detail about interestingness of patterns. [10]
   (ii) Explain in detail about data mining task primitives. [6]
4. (i) Discuss about different issues of data mining. [10]
   (ii) Explain in detail about data preprocessing. [6]
5. How are data mining systems classified? Discuss each classification with an example. [16]
6. How can a data mining system be integrated with a data warehouse? Discuss with an example. [16]

UNIT-IV ASSOCIATION RULE AND CLASSIFICATION

PART A

1. What is meant by market basket analysis?
2. Define conditional pattern base.
3. What is meant by pruning in decision tree induction?
4. List out the major strengths of the decision tree method.
5. What is tree pruning in decision tree induction?
6. In classification trees, what are surrogate splits, and how are they used?
7. What is the use of multilevel association rules?
8. Write the two measures of an association rule.
9. What is the use of multilevel association rules?
10. What is a support vector machine?
11. How are association rules mined from large databases?
12. With an example, explain correlation analysis.
13. Write the two measures of an association rule.
14. List out the major strengths of decision tree induction.
15. What are the Apriori properties used in the Apriori algorithm?
16. What is the frequent item set property?
17. What are the means to improve the performance of association rule mining algorithms?
18. How is prediction different from classification?
19. State the advantages of the decision tree approach over other approaches for performing classification.
20. The Naïve Bayes classifier makes what assumptions that motivate its name?
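The ID3 question in this unit asks for the information gain of Gender and of Height with Risk as the class label. A minimal Python sketch of that calculation follows. It assumes the 15-row training table reads as encoded in `ROWS` (the table is damaged in the source, so this reading is an assumption), and the names `ROWS`, `entropy` and `information_gain` are illustrative only.

```python
from collections import Counter
from math import log2

# Assumed reading of the training table: (gender, height range, risk).
ROWS = [
    ("F", "(1.5, 1.6)", "Low"),    ("M", "(1.9, 2.0)", "High"),
    ("F", "(1.8, 1.9)", "Medium"), ("F", "(1.8, 1.9)", "Medium"),
    ("F", "(1.6, 1.7)", "Low"),    ("M", "(1.8, 1.9)", "Medium"),
    ("F", "(1.5, 1.6)", "Low"),    ("M", "(1.6, 1.7)", "Low"),
    ("M", "(2.0, inf)", "High"),   ("M", "(2.0, inf)", "High"),
    ("F", "(1.7, 1.8)", "Medium"), ("M", "(1.9, 2.0)", "Medium"),
    ("F", "(1.8, 1.9)", "Medium"), ("F", "(1.7, 1.8)", "Medium"),
    ("F", "(1.7, 1.8)", "Medium"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=2):
    """Entropy of the class minus the weighted entropy after splitting on one attribute."""
    base = entropy([r[class_index] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


gain_gender = information_gain(ROWS, 0)
gain_height = information_gain(ROWS, 1)
print(f"Gain(Gender) = {gain_gender:.4f}")
print(f"Gain(Height) = {gain_height:.4f}")
```

Under this reading of the table, Height gives the larger gain (about 1.32 bits against about 0.32 bits for Gender), so ID3 would choose Height as the root test attribute.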

PART-B

1. Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The Height values have already been discretized into disjoint ranges.

   Gender  Height      Risk
   F       (1.5, 1.6)  Low
   M       (1.9, 2.0)  High
   F       (1.8, 1.9)  Medium
   F       (1.8, 1.9)  Medium
   F       (1.6, 1.7)  Low
   M       (1.8, 1.9)  Medium
   F       (1.5, 1.6)  Low
   M       (1.6, 1.7)  Low
   M       (2.0, ∞)    High
   M       (2.0, ∞)    High
   F       (1.7, 1.8)  Medium
   M       (1.9, 2.0)  Medium
   F       (1.8, 1.9)  Medium
   F       (1.7, 1.8)  Medium
   F       (1.7, 1.8)  Medium

   Calculate the information gain if Gender is chosen as the test attribute. Calculate the information gain if Height is chosen as the test attribute. Draw the final decision tree (without any pruning) for the training dataset. Generate all the “IF-THEN” rules from the decision tree. [16]
2. (a) Given the following transactional database

   1  C, B, H
   2  B, F, S
   3  A, F, G
   4  C, B, H
   5  B, F, G
   6  B, E, O

   (i) We want to mine all the frequent itemsets in the data using the Apriori algorithm. Assume the minimum support level is 30%. (You need to give the set of frequent item sets in L1, L2, … candidate item sets in C1, C2, …) [9]
   (ii) Find all the association rules that involve only B, C, H (in either left or right hand side of the rule). The minimum confidence is 70%. [7]
3. A database has four transactions. Let min sup = 60% and min conf = 80%.

   TID   DATE      ITEMS_BOUGHT
   T100  10/15/07  {K, A, D, B}
   T200  10/15/07  {D, A, C, E, B}
   T300  10/19/07  {C, A, B, E}
   T400  10/22/07  {B, A, D}

   Find all frequent itemsets using Apriori and FP growth, respectively. Compare the efficiency of the two mining processes. [16]
4. Describe the multi-dimensional association rule, giving a suitable example. [16]
5. (a) Explain the algorithm for constructing a decision tree from training samples. [12]
   (b) Explain Bayes theorem. [4]
6. Discuss the approaches for mining multilevel association rules from the transactional databases. Give a relevant example. [16]
7. Write and explain the algorithm for mining frequent item sets without candidate generation. Give a relevant example. [16]
8. How is attribute oriented induction implemented? Explain in detail. [16]
9. Decision tree induction is a popular classification method. Taking one typical decision tree induction algorithm, briefly outline the method of decision tree classification. [16]
10. Discuss in detail about Bayesian classification. [8]
11. Develop an algorithm for classification using Bayesian classification. Illustrate the algorithm with a relevant example. [16]
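For the four-transaction question (min sup = 60%), the level-wise Apriori search can be sketched as below. This is a minimal illustration, not a reference answer: it assumes the transactions read {K, A, D, B}, {D, A, C, E, B}, {C, A, B, E} and {B, A, D} (the table is damaged in the source), and the names `TRANSACTIONS` and `apriori` are hypothetical.

```python
from itertools import combinations

# Assumed reading of the four transactions T100..T400.
TRANSACTIONS = [
    frozenset("KADB"),
    frozenset("DACEB"),
    frozenset("CABE"),
    frozenset("BAD"),
]


def apriori(transactions, min_support):
    """Return {frequent itemset: support count} using level-wise candidate generation."""
    n = len(transactions)

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def is_frequent(itemset):
        return count(itemset) / n >= min_support

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if is_frequent(frozenset([i]))]  # L1
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        k += 1
        previous = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
            and is_frequent(c)
        ]
    return frequent


frequent = apriori(TRANSACTIONS, 0.6)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With four transactions, a 60% threshold means an itemset must occur in at least three of them. The prune step is where the Apriori property does its work: a candidate is discarded without scanning the data if any of its subsets is already known to be infrequent.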

UNIT-V CLUSTERING AND APPLICATION AND TRENDS IN DATA

PART A

1. What is cluster analysis?
2. What are the requirements of clustering?
3. What are the two data structures in cluster analysis?
4. Distinguish between classification and clustering.
5. What is the objective function of the K-means algorithm?
6. Mention the advantages of hierarchical clustering.
7. What is an outlier? Give an example.
8. List the requirements of clustering in data mining.
9. What are the requirements of clustering?
10. Distinguish between classification and clustering.
11. Define a spatial database.
12. What are the applications of spatial databases?
13. What are the applications of spatial databases?
14. What is text mining?
15. What is text mining?
16. What is web usage mining?
17. What is audio data mining?
18. List two applications of data mining.
19. List out any two commercial data mining tools.

PART-B

1. BIRCH and CLARANS are two interesting clustering algorithms that perform effective clustering in large data sets.
   (i) Outline how BIRCH performs clustering in large data sets. [10]
   (ii) Compare and outline the major differences of the two scalable clustering algorithms BIRCH and CLARANS. [6]
2. Describe K-means clustering with an example. [16]
3. Write a short note on web mining taxonomy. Explain the different activities of text mining. Discuss and elaborate the current trends in data mining. [6+5+5]
4. Discuss spatial databases and text databases. [16]
5. What is a multimedia database? Explain the methods of mining multimedia databases. [16]
6. Explain the following clustering methods in detail.
   (a) BIRCH
   (b) CURE [16]
7. Describe in detail about hierarchical methods. [16]
8. Write short notes on
   (i) Partitioning methods [8]
   (ii) Outlier analysis [8]
9. With a relevant example, discuss constraint based cluster analysis. [16]
10. Discuss in detail about any four data mining applications. [16]
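Several Unit V questions concern K-means and its objective function. The sketch below shows the usual assign-then-update loop and computes the sum of squared errors (SSE) that K-means tries to minimise. It is only an illustration: the sample points are hypothetical, the first k points are taken as initial centroids purely to keep the run deterministic, and `assign` and `kmeans` are made-up names.

```python
from math import dist


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iter=100):
    """Return (centroids, clusters, sse) after iterating until the centroids stop moving."""
    # Assumption: deterministic initialisation from the first k points.
    centroids = [tuple(map(float, p)) for p in points[:k]]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    clusters = assign(points, centroids)
    # Objective function: sum of squared distances from each point to its centroid.
    sse = sum(dist(p, centroids[j]) ** 2 for j, members in enumerate(clusters) for p in members)
    return centroids, clusters, sse


# Hypothetical sample data: two well-separated groups on a line.
POINTS = [(1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (9.0, 0.0), (9.5, 0.0), (10.0, 0.0)]
centroids, clusters, sse = kmeans(POINTS, k=2)
print(centroids, sse)
```

On this sample the centroids settle at (1.5, 0.0) and (9.5, 0.0) with an SSE of 1.0. A different initialisation can lead to a different local minimum, which is why practical implementations usually restart from several random seeds.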