DM QB PDF
QUESTION BANK
Module 1 and Module 2
Sl No. Questions Marks
1 Why do many enterprises need a data warehouse? 4
2 What are OLTP and OLAP database systems? 4
3 What is an ODS and what is it used for? 4
4 Explain why ETL must deal with dirty data when extracting information from the source systems. 8
5 List the major steps involved in the ETL process. 6
6 What is the need for a separate database for decision makers? 4
7 What is a data warehouse and how might it be defined? 4
8 What are the likely benefits of building an enterprise data warehouse? 6
9 What is the major difference between the star schema and the snowflake schema? 8
10 List some differences between an OLTP system and a data warehouse system. 7
11 Describe the features of a data warehouse. 6
12 What is an OLTP database system? 8
13 What is an ODS used for? How does it differ from an OLTP system? 7
14 Give the three most important guidelines for implementing a data warehouse for a large enterprise. 7
15 Give two major components of any data warehouse system. 8
16 What is ETL? 4
17 Give two reasons why dirty data is extracted from source systems. 7
18 List four steps of the ETL process. 8
19 Define the terms star schema and snowflake schema. 10
20 What types of queries do managers need to pose to the enterprise’s database systems? 8
21 Describe the type of metadata that is maintained in a data warehouse. 8
22 What are the major differences between OLTP and a data warehouse system? 10
23 Explain the star schema technique of modeling a data warehouse. 8
24 What types of metadata are maintained in a data warehouse? 8
25 What are dimensions, members, measures and fact tables? 7
26 What is OLAP? 4
27 List the characteristics of OLAP systems. 4
28 List some of the motivations for using OLAP. 6
29 Explain the multidimensional view and the data cube. 8
30 What are the different implementations of a data cube? 8
31 What are the differences between ROLAP and MOLAP? 10
32 Describe the operations roll-up, drill-down, slice and dice, and pivot. 10
33 List some guidelines for implementing OLAP. 8
34 What OLAP software is available in the market? 6
35 List four types of aggregate queries that are possible with two variables. 7
36 What are dimensions? 4
37 What is a measure? 4
38 What is a fact and a fact table? 6
39 Give a simple definition of OLAP. 7
40 List two major characteristics of OLAP. 5
41 Define data cube in your own words. 7
42 Show what a data cube of two dimensions looks like. 7
43 Give a simple data cube implementation. 8
44 Are all data cube entries non-zero? If not, why not? 8
45 What is the difference between roll-up and pivot? 10
B.E. 6th Semester Information Science
46 What is the difference between drill-down and slicing? 10
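Reference sketch for questions 32, 45 and 46: the OLAP operations can be illustrated on a tiny cube held as a list of fact rows. This is only a minimal sketch under stated assumptions; the dimension names, the sample rows and the helper names (`roll_up`, `slice_cube`, `dice_cube`) are hypothetical and not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical cube: three dimensions plus one measure (sales).
DIMS = ("region", "quarter", "product")
FACTS = [
    ("North", "Q1", "Pen", 10),
    ("North", "Q2", "Pen", 15),
    ("North", "Q1", "Ink", 5),
    ("South", "Q1", "Pen", 7),
    ("South", "Q1", "Ink", 3),
]

def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)

def slice_cube(facts, dim, value):
    """Slice: fix a single dimension to one value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]

def dice_cube(facts, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]

by_region = roll_up(FACTS, ("region",))                       # coarser view
by_region_quarter = roll_up(FACTS, ("region", "quarter"))     # drill-down
by_quarter_region = roll_up(FACTS, ("quarter", "region"))     # pivot of the above
q1_only = slice_cube(FACTS, "quarter", "Q1")
north_pens = dice_cube(FACTS, region={"North"}, product={"Pen"})
```

In this sketch drill-down is simply a roll-up that keeps more dimensions, and pivot only reorders the kept dimensions without changing any totals, which is the contrast questions 45 and 46 ask about.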
Module 2:
Sl No. Questions Marks
1. What is data mining? 5
2.
3. Mention the data mining functionalities: classification, prediction, clustering and evolution analysis. 5
4. What are the methodological challenges of data mining technology? 5
5. Discuss the issues to consider during data mining. 5
6. What defines a data mining task? Explain at least 5 primitives. 5
7. What is knowledge discovery? 5
8. Explain the motivating challenges in development of data mining. 5
9. Explain the data mining tasks with examples. 10
10 What is data? What do you mean by quality of data? 4
11 What is a data set? Explain the various types of data sets. 10
12 What is data preprocessing?
13 Explain the following, giving an example of each (5 marks each):
i. Aggregation
ii. Sampling
iii. Dimensionality reduction
iv. Feature subset selection
v. Feature creation
vi. Discretization and binarization
vii. Variable transformation
14 Explain the similarity and dissimilarity between two objects. 6
15 What is Euclidean distance? Write the generalized Minkowski distance metric for various values of r. 8
16 Explain the properties of Euclidean distance. 6
17 What are the simple matching coefficient and the Jaccard coefficient? Explain with examples. 8
18 What is meant by cosine similarity? Explain with an example. 6
19 What is Bregman divergence? 5
20 What are the issues related to proximity measures? 10
21 Discuss the selection of the right proximity measure. 7
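Reference sketch for questions 15 to 18: the proximity measures named above can be written in a few lines of plain Python. This is a minimal sketch, not a library implementation; the function names are hypothetical, and returning 0.0 from the Jaccard coefficient when both vectors are all zeros is an assumed convention.

```python
import math

def minkowski(x, y, r):
    """Minkowski distance; r=1 is city-block, r=2 is Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)

def simple_matching(x, y):
    """SMC for binary vectors: matching positions / all positions."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

def jaccard(x, y):
    """Jaccard for binary vectors: 1-1 matches / positions that are not 0-0."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    not_00 = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return f11 / not_00 if not_00 else 0.0

def cosine(x, y):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
euclid = minkowski((0, 0), (3, 4), 2)
city_block = minkowski((0, 0), (3, 4), 1)
smc_pq = simple_matching(p, q)
jac_pq = jaccard(p, q)
```

The pair `p`, `q` shows why SMC and Jaccard can disagree on sparse binary data: the seven shared zeros raise SMC, while Jaccard ignores them.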
Module 3:
1. What is the Apriori algorithm? 5
2. Explain association rule mining. 5
3. What is a more efficient method for generalizing association rules? Explain. 5
4. Suppose that the following table is derived by attribute-oriented induction. 10
class        birth_place   count
Programmer   Canada        180
             others        120
DBA          Canada        20
             others        80
a. Transform the table into a crosstab showing the associated t-weights and d-weights.
b. Map the class Programmer into a (bidirectional) quantitative descriptive rule, for example,
∀X, Programmer(X) ⇔ (birth_place(X) = "Canada" ∧ …) [t: x%, d: y%] … ∨ (…) [t: w%, d: z%].
5. Suppose that the data for analysis includes the attribute age. The age values for the 10
data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25,
25, 25, 30, 33, 33, 35, 35, 35, 36, 40, 45, 46, 52, 70.
a. Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the
efficiency of the two mining processes.
b. List all of the strong association rules (with support s and confidence c)
matching the following metarule, where X is a variable representing
customers, and item_i denotes variables representing items (e.g., "A", "B",
etc.):
∀X ∈ transactions, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]
7. Prove that each entry in the following table correctly characterizes its corresponding 10
rule constraint for frequent itemset mining.
Rule constraint      Antimonotonic   Monotonic     Succinct
a. v ∈ S             No              Yes           Yes
b. S ⊆ V             Yes             No            Yes
c. min(S) ≤ v        No              Yes           Yes
d. range(S) ≤ v      Yes             No            No
e. variance(S) ≤ v   Convertible     Convertible   No
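Reference sketch for question 4(a): the t-weights and d-weights can be computed directly from the four counts given there (Programmer: 180 Canada, 120 others; DBA: 20 Canada, 80 others). The sketch assumes the usual definitions, with the t-weight taken as a cell's share of its own class total and the d-weight as its share of the same birth_place value across all classes; the function name `weights` is hypothetical.

```python
# Counts from question 4, keyed by (class, birth_place).
COUNTS = {
    ("Programmer", "Canada"): 180,
    ("Programmer", "others"): 120,
    ("DBA", "Canada"): 20,
    ("DBA", "others"): 80,
}

def weights(counts):
    """Return {(class, value): (t_weight_percent, d_weight_percent)}."""
    class_totals = {}
    value_totals = {}
    for (cls, value), n in counts.items():
        class_totals[cls] = class_totals.get(cls, 0) + n
        value_totals[value] = value_totals.get(value, 0) + n
    return {
        (cls, value): (
            100 * n / class_totals[cls],    # t-weight: share within the class
            100 * n / value_totals[value],  # d-weight: share across classes
        )
        for (cls, value), n in counts.items()
    }

result = weights(COUNTS)
for (cls, value), (t, d) in result.items():
    print(f"{cls:<11} {value:<7} t={t:.0f}% d={d:.0f}%")
```

Under these assumptions the Programmer/Canada cell comes out as t = 60% and d = 90%, which are the numbers that would fill the x% and y% slots of the rule in part (b).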
Module-4 and 5