Code No: R05321204 Set No. 1
III B.Tech II Semester Regular Examinations, Apr/May 2008
DATA WAREHOUSING AND DATA MINING
(Information Technology)
Time: 3 hours Max Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
⋆⋆⋆⋆⋆

1. (a) Draw and explain the architecture for on-line analytical mining.
(b) Briefly discuss the data warehouse applications. [8+8]
2. Briefly discuss the role of data cube aggregation and dimension reduction in the
data reduction process. [16]
3. Write the syntax for the following data mining primitives:
(a) Task-relevant data.
(b) Concept hierarchies. [16]
4. Write short notes on the following:
(a) Measuring the central tendency
(b) Measuring the dispersion of data. [16]
5. (a) Write the FP-growth algorithm. Explain.
(b) What is an iceberg query? Explain with an example. [10+6]
6. (a) What is classification? What is prediction?
(b) What is Bayes' theorem? Explain Naive Bayesian classification.
(c) Discuss k-nearest neighbor classifiers and case-based reasoning. [4+6+6]
7. (a) Given the following measurements for the variable age:
18, 22, 25, 42, 28, 43, 33, 35, 56, 28
Standardize the variable by the following:
i. Compute the mean absolute deviation of age.
ii. Compute the Z-score for the first four measurements.
(b) What is a distance-based outlier? What are efficient algorithms for mining
distance-based outliers? How are outliers determined in this method?
[4+4+2+3+3]
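
For reference, the arithmetic in question 7(a) can be checked with a minimal Python sketch. It assumes the z-score is standardized by the mean absolute deviation computed in part i (rather than by the standard deviation), which is how the question's wording reads; the function names are illustrative and not part of the paper.

```python
# Sketch only: standardization by mean absolute deviation is an assumption
# drawn from the wording of question 7(a); the helper names are hypothetical.

ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def z_scores(values):
    """Standardize each value as (x - mean) / mean absolute deviation."""
    m = mean(values)
    s = mean_absolute_deviation(values)
    return [(x - m) / s for x in values]


if __name__ == "__main__":
    print(mean(ages))                     # 33.0
    print(mean_absolute_deviation(ages))  # 8.8
    # Z-scores of the first four measurements, rounded to three decimals.
    print([round(z, 3) for z in z_scores(ages)[:4]])
    # [-1.705, -1.25, -0.909, 1.023]
```

If the standard deviation is intended instead, only the divisor in z_scores changes.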
8. An e-mail database is a database that stores a large number of electronic mail
messages. It can be viewed as a semistructured database consisting mainly of text
data. Discuss the following.
(a) How can such an e-mail database be structured so as to facilitate
multidimensional search, such as by sender, by receiver, by subject, by time, and
so on?
(b) What can be mined from such an e-mail database?
(c) Suppose you have roughly classified a set of your previous e-mail messages as
junk, unimportant, normal, or important. Describe how a data mining system
may take this as the training set to automatically classify new e-mail messages
or unclassified ones. [5+5+6]

⋆⋆⋆⋆⋆

Code No: R05321204 Set No. 2
III B.Tech II Semester Regular Examinations, Apr/May 2008
DATA WAREHOUSING AND DATA MINING
(Information Technology)
Time: 3 hours Max Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
⋆⋆⋆⋆⋆

1. (a) Explain data mining as a step in the process of knowledge discovery.
(b) Differentiate operational database systems and data warehousing. [8+8]

2. (a) Briefly discuss data integration.
(b) Briefly discuss data transformation. [8+8]

3. (a) Explain the syntax for Task-relevant data specification.
(b) Explain the syntax for specifying the kind of knowledge to be mined. [8+8]

4. (a) Write the algorithm for attribute-oriented induction. Explain the steps
involved in it.
(b) How can concept description mining be performed incrementally and in a
distributed manner? [8+8]

5. Explain the Apriori algorithm with an example. [16]

6. Discuss classification by backpropagation. [16]

7. (a) Write algorithms for k-Means and k-Medoids. Explain.
(b) Discuss density-based methods. [8+8]

8. Suppose that a city transportation department would like to perform data analysis
on highway traffic for the planning of highway construction based on the city traffic
data collected at different hours every day.

(a) Design a spatial data warehouse that stores the highway traffic information so
that people can easily see the average and peak time traffic flow by highway, by
time of day, and by weekdays, and the traffic situation when a major accident
occurs.
(b) What information can we mine from such a spatial data warehouse to help
city planners?
(c) This data warehouse contains both spatial and temporal data. Propose one
mining technique that can efficiently mine interesting patterns from such a
spatio-temporal data warehouse. [5+5+6]

⋆⋆⋆⋆⋆

Code No: R05321204 Set No. 3
III B.Tech II Semester Regular Examinations, Apr/May 2008
DATA WAREHOUSING AND DATA MINING
(Information Technology)
Time: 3 hours Max Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
⋆⋆⋆⋆⋆

1. (a) Explain the major issues in data mining.
(b) Explain the three-tier data warehousing architecture. [8+8]

2. Discuss the role of data compression and numerosity reduction in the data
reduction process. [16]

3. Write the syntax for the following data mining primitives:

(a) The kind of knowledge to be mined.
(b) Measures of pattern interestingness. [16]

4. (a) What are the differences between concept description in large databases and
OLAP?
(b) Explain the graph displays of basic statistical class descriptions. [8+8]

5. Explain the Apriori algorithm with an example. [16]

6. (a) Describe the data classification process with a neat diagram.
(b) How does Naive Bayesian classification work? Explain.
(c) Explain classifier accuracy. [5+5+6]

7. (a) Given two objects represented by the tuples (22,1,42,10) and (20,0,36,8):
i. Compute the Euclidean distance between the two objects.
ii. Compute the Manhattan distance between the two objects.
iii. Compute the Minkowski distance between the two objects, using q=3.
(b) Explain statistical-based outlier detection and deviation-based outlier
detection. [3+3+4+3+3]
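
For reference, the three distances in question 7(a) are all instances of the Minkowski distance, with q = 1 giving Manhattan and q = 2 giving Euclidean. The following is a minimal Python sketch of that relationship; the function name `minkowski` is illustrative and not part of the paper.

```python
# Sketch only: one hypothetical helper covers all three parts of 7(a),
# since Manhattan (q=1) and Euclidean (q=2) are special cases of Minkowski.


def minkowski(a, b, q):
    """Minkowski distance of order q between two equal-length tuples."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1.0 / q)


if __name__ == "__main__":
    x = (22, 1, 42, 10)
    y = (20, 0, 36, 8)
    # Coordinate-wise absolute differences are 2, 1, 6, 2.
    print(round(minkowski(x, y, 2), 3))  # Euclidean: sqrt(45), about 6.708
    print(round(minkowski(x, y, 1), 3))  # Manhattan: 2 + 1 + 6 + 2 = 11.0
    print(round(minkowski(x, y, 3), 3))  # q = 3: cube root of 233, about 6.153
```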

8. Explain the following:

(a) Construction and mining of object cubes
(b) Mining associations in multimedia data
(c) Periodicity analysis
(d) Latent semantic indexing. [4+4+4+4]

⋆⋆⋆⋆⋆

Code No: R05321204 Set No. 4
III B.Tech II Semester Regular Examinations, Apr/May 2008
DATA WAREHOUSING AND DATA MINING
(Information Technology)
Time: 3 hours Max Marks: 80
Answer any FIVE Questions
All Questions carry equal marks
⋆⋆⋆⋆⋆

1. (a) Explain data mining as a step in the process of knowledge discovery.
(b) Differentiate operational database systems and data warehousing. [8+8]
2. (a) Briefly discuss data integration.
(b) Briefly discuss data transformation. [8+8]
3. (a) Describe why it is important to have a data mining query language.
(b) The four major types of concept hierarchies are: schema hierarchies,
set-grouping hierarchies, operation-derived hierarchies, and rule-based hierarchies.
Briefly define each type of hierarchy. [8+8]
4. Write short notes on the following:
(a) Measuring the central tendency
(b) Measuring the dispersion of data. [16]
5. (a) How can we mine multilevel association rules efficiently using concept
hierarchies? Explain.
(b) Can we design a method that mines the complete set of frequent itemsets
without candidate generation? If yes, explain with an example. [8+8]
6. (a) Explain the basic decision tree induction algorithm.
(b) Discuss Bayesian classification. [8+8]
7. (a) Given two objects represented by the tuples (22,1,42,10) and (20,0,36,8):
i. Compute the Euclidean distance between the two objects.
ii. Compute the Manhattan distance between the two objects.
iii. Compute the Minkowski distance between the two objects, using q=3.
(b) Explain statistical-based outlier detection and deviation-based outlier
detection. [3+3+4+3+3]
8. (a) Give an example of generalization-based mining of plan databases by divide-
and-conquer.
(b) What is sequential pattern mining? Explain.
(c) Explain the construction of a multilayered web information base. [8+4+4]

⋆⋆⋆⋆⋆
