
A 04000CS402052202 Pages: 4

Reg No.:_______________ Name:__________________________


APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
B.Tech Degree S8 (S, FE) / S6 (PT) (S, FE) Examination June 2023 (2015 Scheme)

Course Code: CS402


Course Name: DATA MINING AND WAREHOUSING
Max. Marks: 100 Duration: 3 Hours
PART A
Answer all questions, each carries 4 marks. Marks

1 Differentiate roll up and drill down operations with an example. (4)


2 List out the challenges and applications of data warehousing. (4)
3 Suppose a group of Sales Price records has been sorted as follows: (4)
5, 5, 10, 11, 11, 12, 13, 14, 14, 15, 20, 22, 25, 28, 30, 39
Plot histogram for the given data using equi-width and equi-depth concept.
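(For reference, a minimal Python sketch of the two binning concepts asked for above, assuming four bins; the bin count is an illustrative choice, not given in the question.)

```python
# Illustrative sketch: equi-width vs equi-depth binning of the sorted
# Sales Price records, assuming 4 bins.
prices = [5, 5, 10, 11, 11, 12, 13, 14, 14, 15, 20, 22, 25, 28, 30, 39]

# Equi-width: each bin spans an equal value range (width 10 here).
width = 10
equi_width = {}
for p in prices:
    low = (p // width) * width
    equi_width.setdefault((low, low + width - 1), []).append(p)

# Equi-depth: each bin holds an equal number of records (4 here).
depth = 4
equi_depth = [prices[i:i + depth] for i in range(0, len(prices), depth)]

print(equi_width)   # value range -> records in that bin
print(equi_depth)   # four bins of four records each
```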
4 Explain the relevance of attribute selection measures used in decision trees. (4)
How does information gain differ from gain ratio?
5 Differentiate eager learning and lazy learning techniques. Why is k-NN (4)
called a lazy learning technique?
6 Suppose a computer program for recognizing dogs in photographs identifies (4)
eight dogs in a picture containing 12 dogs and some cats. Of the eight dogs
identified, five actually are dogs while the rest are cats. Compute the precision
and recall of the program.
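(A short worked check of the figures in this question: 8 objects flagged as dogs, 5 of them truly dogs, out of 12 dogs in the picture.)

```python
# Precision/recall for the dog-recognition example.
tp = 5           # correctly identified dogs (true positives)
fp = 8 - tp      # cats mistaken for dogs (false positives)
fn = 12 - tp     # dogs the program missed (false negatives)

precision = tp / (tp + fp)   # 5/8  = 0.625
recall = tp / (tp + fn)      # 5/12 ~ 0.417

print(precision, recall)
```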
7 How do we measure the dissimilarity in clustering? Illustrate the various (4)
distance measures used for computing the dissimilarity.
8 Construct an FP tree using the following eight transactions. Set the minimum (4)
support as 30%.
T1 { E, A, D, B }, T2 { D, A, C, E, B }, T3 { C, A, B, E },
T4 { B, A, D }, T5 { D }, T6 { D, B }, T7 { A, D, E }, T8 { B,C }
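(For reference, a sketch of only the preparatory step of FP-tree construction: counting item supports over the eight transactions and reordering each transaction by descending frequency. A minimum support of 30% of 8 transactions means a count of at least 3. This is not the full tree build.)

```python
# Count item supports and order transactions, as needed before
# inserting them into the FP tree.
from collections import Counter

transactions = [
    ['E', 'A', 'D', 'B'], ['D', 'A', 'C', 'E', 'B'], ['C', 'A', 'B', 'E'],
    ['B', 'A', 'D'], ['D'], ['D', 'B'], ['A', 'D', 'E'], ['B', 'C'],
]

counts = Counter(item for t in transactions for item in t)
min_count = 3  # 30% of 8 transactions, rounded up

frequent = {i for i, c in counts.items() if c >= min_count}

# Reorder every transaction by descending support (ties broken alphabetically).
ordered = [
    sorted((i for i in t if i in frequent), key=lambda i: (-counts[i], i))
    for t in transactions
]
print(counts, ordered[0])
```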
9 Explain the major terms used in density based clustering. (4)
(a) Core Object (b) Density connected (c) Density Reachable
(d) Directly Density Reachable
10 Elucidate the characteristics of social networks in data mining. (4)


PART B
Answer any two full questions, each carries 9 marks.
11 a) Elaborate the different steps involved in knowledge discovery in databases. (5)
How do KDD and data mining differ?
b) Consider the following data (in increasing order) for the attribute age: (4)
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35,
35, 36, 40, 45, 46, 52, 70.
Use smoothing by bin means and bin boundary to smooth these data, using a bin
depth of 3. Illustrate your steps. Comment on the effect of this technique for the
given data.
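(An illustrative Python sketch of the two smoothing techniques asked for above, applied to the age data with a bin depth of 3.)

```python
# Smoothing by bin means and by bin boundaries, bin depth 3.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

depth = 3
bins = [ages[i:i + depth] for i in range(0, len(ages), depth)]

# Smoothing by bin means: every value becomes its bin's mean.
by_means = [[round(sum(b) / len(b), 2)] * len(b) for b in bins]

# Smoothing by bin boundaries: every value snaps to the nearer of
# its bin's minimum and maximum.
by_bounds = [
    [min(b) if v - min(b) <= max(b) - v else max(b) for v in b]
    for b in bins
]
print(by_means[0], by_bounds[0])
```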
12 a) Discuss the salient features of a data warehouse. Enumerate the main (5)
characteristics that make a data warehouse different from a database.
b) Redundant attributes may be present when data from multiple sources are (4)
integrated. Give any one method to identify such redundant attributes.
Suppose two stocks A and B have the following values in one week: (2, 5), (3,
8), (5, 10), (4, 11), (6, 14). If the stocks are affected by the same industry trends,
will their prices rise or fall together?
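(For reference, a sketch of one standard way to answer this: the Pearson correlation coefficient between the two weekly price series. A positive r means the prices tend to rise and fall together.)

```python
# Pearson correlation between the weekly prices of stocks A and B.
import math

a = [2, 3, 5, 4, 6]
b = [5, 8, 10, 11, 14]

mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
r = cov / math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
print(round(r, 4))  # strongly positive: the prices tend to move together
```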
13 a) Explain the data warehouse architecture and its components with a neat diagram. (5)
b) In real-world data, tuples with missing values for some attributes and noisy data (4)
are a common occurrence. Describe various methods for handling these two
kinds of issues in data mining.
PART C
Answer any two full questions, each carries 9 marks.
14 a) Consider the following dataset given below to predict the transport mode. Use (5)
ID3 algorithm and find the best attribute to split at the root level of a decision
tree.
Gender   Car Ownership   Travel Cost   Income Level   Transport Mode
Male     0               Cheap         Low            Bus
Male     1               Cheap         Medium         Bus
Female   0               Cheap         Low            Bus
Male     1               Cheap         Medium         Bus
Female   1               Expensive     High           Car
Male     2               Expensive     Medium         Car
Female   2               Expensive     High           Car
Female   1               Cheap         Medium         Train
Male     0               Standard      Medium         Train
Female   1               Standard      Medium         Train
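(An illustrative sketch of the ID3 root-split computation on this dataset: dataset entropy, then information gain for each candidate attribute, with the best attribute selected at the root.)

```python
# ID3 root split: entropy and information gain per attribute.
import math
from collections import Counter

rows = [
    ('Male', 0, 'Cheap', 'Low', 'Bus'),
    ('Male', 1, 'Cheap', 'Medium', 'Bus'),
    ('Female', 0, 'Cheap', 'Low', 'Bus'),
    ('Male', 1, 'Cheap', 'Medium', 'Bus'),
    ('Female', 1, 'Expensive', 'High', 'Car'),
    ('Male', 2, 'Expensive', 'Medium', 'Car'),
    ('Female', 2, 'Expensive', 'High', 'Car'),
    ('Female', 1, 'Cheap', 'Medium', 'Train'),
    ('Male', 0, 'Standard', 'Medium', 'Train'),
    ('Female', 1, 'Standard', 'Medium', 'Train'),
]
attrs = ['Gender', 'CarOwnership', 'TravelCost', 'IncomeLevel']

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

base = entropy([r[-1] for r in rows])  # entropy of Transport Mode
gains = {}
for i, name in enumerate(attrs):
    split = 0.0
    for v in {r[i] for r in rows}:
        subset = [r[-1] for r in rows if r[i] == v]
        split += len(subset) / len(rows) * entropy(subset)
    gains[name] = base - split

best = max(gains, key=gains.get)
print(round(base, 3), best)
```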

b) Illustrate the steps of back propagation algorithm in neural networks. (4)


15 a) Perform KNN classification on the following dataset and predict the class for the (5)
data point X (P1 = 3, P2 = 7), assuming K = 3 and using Manhattan
distance.

P1   P2   Class
7    7    False
7    4    False
3    4    True
1    4    True
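(An illustrative check for this question: Manhattan distance from X = (3, 7) to each training point, followed by a majority vote over the K = 3 nearest neighbours.)

```python
# k-NN with Manhattan (city block) distance, K = 3.
from collections import Counter

train = [((7, 7), False), ((7, 4), False), ((3, 4), True), ((1, 4), True)]
x = (3, 7)
k = 3

# Sort training points by Manhattan distance to X.
dists = sorted(
    (abs(p[0] - x[0]) + abs(p[1] - x[1]), label) for p, label in train
)
# Majority vote among the K nearest neighbours.
votes = Counter(label for _, label in dists[:k])
prediction = votes.most_common(1)[0][0]
print(dists, prediction)
```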

b) What is meant by the kernel trick in support vector machines? How is it utilised (4)
in an SVM classifier?
16 a) Find the least square regression line y = ax +b for the given set of points: (5)
{(-1, 0), (0, 2), (1, 4), (2, 5)}.
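(A worked sketch of the normal equations for the least-squares line y = ax + b through the four given points.)

```python
# Least-squares fit: a = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), b = (Sy - a*Sx) / n.
pts = [(-1, 0), (0, 2), (1, 4), (2, 5)]

n = len(pts)
sx = sum(x for x, _ in pts)        # sum of x
sy = sum(y for _, y in pts)        # sum of y
sxy = sum(x * y for x, y in pts)   # sum of x*y
sxx = sum(x * x for x, _ in pts)   # sum of x^2

a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
b = (sy - a * sx) / n                          # intercept
print(a, b)  # y = 1.7x + 1.9
```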
b) Differentiate Regression and Classification with an example. (4)
PART D
Answer any two full questions, each carries 12 marks.
17 a) A database has five transactions. Assume the minimum support and confidence (6)
to be 60%. Perform Apriori Algorithm on the given set and find the frequent
itemsets. Also check whether the rule {Milk, Diaper} → {Beer} is a strong
association rule or not. Justify your answer.

Trans id   Items Bought
1          Bread, Milk
2          Bread, Diaper, Beer, Egg
3          Milk, Diaper, Beer, Coke
4          Bread, Milk, Diaper, Beer
5          Bread, Milk, Diaper, Coke
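(For reference, a sketch of only the final rule check, not the full Apriori run: support and confidence of {Milk, Diaper} → {Beer} over the five transactions, with both thresholds at 60%.)

```python
# Support/confidence check for the rule {Milk, Diaper} -> {Beer}.
txns = [
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Egg'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in txns) / len(txns)

antecedent = {'Milk', 'Diaper'}
rule_support = support(antecedent | {'Beer'})      # support of {Milk, Diaper, Beer}
confidence = rule_support / support(antecedent)    # conditional support
print(rule_support, round(confidence, 3))
# Support 0.4 < 0.6, so the rule fails the support threshold even though
# its confidence (~0.667) exceeds 60%.
```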


b) Explain the steps and techniques involved in the text mining process. (6)
18 a) Suppose that the data mining task is to cluster the following seven points (with (6)
(x, y) representing location) into two clusters: A1(1, 1), A2(1.5, 2), A3(3, 4),
A4(5, 7), A5(3.5, 5), A6(4.5, 5), A7(3.5, 4.5). The distance function is city block
distance. Suppose initially we assign A1 and A5 as the centre of each cluster
respectively. Use the K-means algorithm to find the two clusters and their
centres after the first round of execution.
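(An illustrative first K-means round with city block (Manhattan) distance, starting from A1 and A5 as the two initial centres: assign each point to its nearest centre, then recompute each centre as the mean of its members.)

```python
# One K-means round with city block distance.
pts = {'A1': (1, 1), 'A2': (1.5, 2), 'A3': (3, 4), 'A4': (5, 7),
       'A5': (3.5, 5), 'A6': (4.5, 5), 'A7': (3.5, 4.5)}
centres = [pts['A1'], pts['A5']]

def cityblock(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Assignment step: each point joins the cluster of its nearest centre.
clusters = {0: [], 1: []}
for name, p in pts.items():
    nearest = min(range(2), key=lambda i: cityblock(p, centres[i]))
    clusters[nearest].append(name)

# Update step: each centre moves to the mean of its members.
new_centres = [
    tuple(sum(pts[n][d] for n in members) / len(members) for d in (0, 1))
    for members in clusters.values()
]
print(clusters, new_centres)
```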
b) Explain BIRCH clustering method. List out the advantages of BIRCH over other (6)
clustering techniques.
19 a) Explain k-medoids clustering algorithm with an example. (6)
b) Explain the Apriori-based approach for mining frequent subgraphs with an (6)
example.
****
