Page 1 of 4
04000CS402052202
PART B
Answer any two full questions, each carries 9 marks.
11 a) Elaborate the different steps involved in knowledge discovery in databases (KDD). How do KDD and data mining differ? (5)
b) Consider the following data (in increasing order) for the attribute age: (4)
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35,
35, 36, 40, 45, 46, 52, 70.
Use smoothing by bin means and by bin boundaries to smooth these data, using a bin
depth of 3. Illustrate your steps. Comment on the effect of this technique for the
given data.
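The binning asked for above can be sketched in a few lines (a quick check of the arithmetic, assuming the usual equal-depth partition of the already-sorted values):

```python
# Smoothing by bin means and by bin boundaries, bin depth 3,
# for the 27 sorted "age" values given in the question.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
depth = 3
bins = [ages[i:i + depth] for i in range(0, len(ages), depth)]

# Bin means: every value in a bin is replaced by the bin mean.
by_means = [[round(sum(b) / len(b), 2)] * len(b) for b in bins]

# Bin boundaries: every value moves to the nearer of the bin's min/max.
by_bounds = [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b]
             for b in bins]

print(by_means[0])   # first bin [13, 15, 16] -> mean 14.67
print(by_bounds[0])  # first bin -> boundaries [13, 16, 16]
```

With 27 values and depth 3 this gives 9 bins; smoothing by means flattens local noise, while boundary smoothing keeps each value at an observed extreme of its bin.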
12 a) Discuss the salient features of a data warehouse. Enumerate the main characteristics that make a data warehouse different from a database. (5)
b) Redundant attributes may be present when data from multiple sources are integrated. Give any one method to identify such redundant attributes. (4)
Suppose two stocks A and B have the following values in one week: (2, 5), (3,
8), (5, 10), (4, 11), (6, 14). If the stocks are affected by the same industry trends,
will their prices rise or fall together?
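One standard redundancy test between numeric attributes is the Pearson correlation coefficient; applied to the stock values given above, the computation can be sketched as:

```python
import math

# Pearson correlation as a redundancy check between two numeric attributes.
# Here: the weekly prices of stocks A and B from the question.
a = [2, 3, 5, 4, 6]
b = [5, 8, 10, 11, 14]

ma, mb = sum(a) / len(a), sum(b) / len(b)
cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
r = cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
print(round(r, 3))  # ~0.941: strongly positive, so prices rise together
```

A correlation near +1 indicates the two attributes move together, so one of them may be treated as redundant.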
13 a) Explain the data warehouse architecture and its components with a neat diagram. (5)
b) In real-world data, tuples with missing values for some attributes and noisy data (4)
are a common occurrence. Describe various methods for handling these two
kinds of issues in data mining.
PART C
Answer any two full questions, each carries 9 marks.
14 a) Consider the dataset given below to predict the transport mode. Use the ID3 algorithm to find the best attribute to split on at the root level of a decision tree. (5)
Gender | Car Ownership | Travel Cost | Income Level | Transport Mode
Male   | 0             | Cheap       | Low          | Bus
Male   | 1             | Cheap       | Medium       | Bus
Female | 0             | Cheap       | Low          | Bus
Male   | 1             | Cheap       | Medium       | Bus
Female | 1             | Expensive   | High         | Car
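A quick information-gain check at the root, using only the five rows shown above, might look like:

```python
import math
from collections import Counter

# ID3 information gain at the root for the five rows in the question.
rows = [
    ("Male",   0, "Cheap",     "Low",    "Bus"),
    ("Male",   1, "Cheap",     "Medium", "Bus"),
    ("Female", 0, "Cheap",     "Low",    "Bus"),
    ("Male",   1, "Cheap",     "Medium", "Bus"),
    ("Female", 1, "Expensive", "High",   "Car"),
]
attrs = ["Gender", "Car Ownership", "Travel Cost", "Income Level"]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

base = entropy([r[-1] for r in rows])  # entropy of Transport Mode

def gain(i):
    # Partition the class labels by the value of attribute i.
    by_val = {}
    for r in rows:
        by_val.setdefault(r[i], []).append(r[-1])
    rem = sum(len(ls) / len(rows) * entropy(ls) for ls in by_val.values())
    return base - rem

gains = {a: round(gain(i), 4) for i, a in enumerate(attrs)}
print(gains)  # Travel Cost and Income Level tie at ~0.7219
```

On these rows, Travel Cost and Income Level both split the classes perfectly, so either attains the maximum gain of 0.7219 bits.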
P1 | P2 | Class
7  | 7  | False
7  | 4  | False
3  | 4  | True
1  | 4  | True
b) What is meant by the kernel trick in support vector machines? How is it utilised in an SVM classifier? (4)
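A minimal illustration of the kernel trick, using the quadratic kernel K(x, z) = (x · z)^2 as an example: the kernel value equals an inner product in an explicit quadratic feature space, computed without ever constructing that space.

```python
import math

# Kernel trick demo: K(x, z) = (x . z)^2 equals the inner product of the
# explicit feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) in R^3,
# but the kernel never builds phi.
def kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, 0.5)
explicit = sum(p * q for p, q in zip(phi(x), phi(z)))
print(kernel(x, z), explicit)  # both 16.0
```

An SVM only ever needs inner products between training points, so replacing them with K(x, z) trains a linear separator in the (possibly very high-dimensional) feature space at the cost of kernel evaluations in the input space.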
16 a) Find the least-squares regression line y = ax + b for the given set of points: (5)
{(-1, 0), (0, 2), (1, 4), (2, 5)}.
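The normal-equation arithmetic for this fit can be checked directly:

```python
# Least-squares line y = a*x + b via the normal equations,
# for the four points given in the question.
pts = [(-1, 0), (0, 2), (1, 4), (2, 5)]
n = len(pts)
sx = sum(x for x, _ in pts)
sy = sum(y for _, y in pts)
sxy = sum(x * y for x, y in pts)
sxx = sum(x * x for x, _ in pts)

a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n
print(round(a, 4), round(b, 4))  # 1.7 1.9, i.e. y = 1.7x + 1.9
```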
b) Differentiate Regression and Classification with an example. (4)
PART D
Answer any two full questions, each carries 12 marks.
17 a) A database has five transactions. Assume the minimum support and confidence
to be 60%. Perform the Apriori algorithm on the given set and find the frequent
itemsets. Also check whether the rule {Milk, Diaper} → {Beer} is a strong
association rule or not. Justify your answer. (6)
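The transaction table for this question did not survive extraction, so the five transactions below are HYPOTHETICAL stand-ins, used only to illustrate the level-wise Apriori mechanics and the support/confidence check for {Milk, Diaper} → {Beer}:

```python
from itertools import combinations

# HYPOTHETICAL transactions (the original table is missing from the paper).
txns = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
min_sup = 0.6  # 60% => itemset must appear in at least 3 of 5 transactions

def support(itemset):
    return sum(itemset <= t for t in txns) / len(txns)

# Level-wise generation: L1, then join and prune into L2, L3, ...
items = sorted({i for t in txns for i in t})
frequent, k, level = {}, 1, [frozenset([i]) for i in items]
while level:
    level = [s for s in level if support(s) >= min_sup]
    frequent.update({s: support(s) for s in level})
    cands = {a | b for a in level for b in level if len(a | b) == k + 1}
    # Prune: every k-subset of a candidate must itself be frequent.
    level = [c for c in cands
             if all(frozenset(s) in frequent for s in combinations(c, k))]
    k += 1

lhs, rhs = frozenset({"Milk", "Diaper"}), frozenset({"Beer"})
conf = support(lhs | rhs) / support(lhs)
# Strong rule requires BOTH support and confidence above the thresholds.
print(support(lhs | rhs) >= min_sup and conf >= 0.6)
```

With these stand-in transactions the rule's confidence is 2/3 but its support is only 40%, so it would not qualify as strong; the exam's own table must be used for the actual answer.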
b) Explain the steps and techniques involved in the text mining process. (6)
18 a) Suppose that the data mining task is to cluster the following seven points (with
(x, y) representing location) into two clusters: A1(1, 1), A2(1.5, 2), A3(3, 4),
A4(5, 7), A5(3.5, 5), A6(4.5, 5), A7(3.5, 4.5). The distance function is city-block
(Manhattan) distance. Suppose we initially assign A1 and A5 as the centres of the
two clusters respectively. Use the k-means algorithm to find the two clusters and
their centres after the first round of execution. (6)
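The first round described above can be sketched as:

```python
# One round of k-means with city-block (Manhattan) distance on the seven
# points from the question, starting from A1 and A5 as the two centres.
pts = {"A1": (1, 1), "A2": (1.5, 2), "A3": (3, 4), "A4": (5, 7),
       "A5": (3.5, 5), "A6": (4.5, 5), "A7": (3.5, 4.5)}
centres = [pts["A1"], pts["A5"]]

def d(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Assignment step: each point goes to its nearest centre.
clusters = [[], []]
for name, p in pts.items():
    clusters[min((0, 1), key=lambda i: d(p, centres[i]))].append(name)

# Update step: each new centre is the mean of its assigned points.
new_centres = [tuple(sum(pts[n][k] for n in c) / len(c) for k in (0, 1))
               for c in clusters]
print(clusters)     # [['A1', 'A2'], ['A3', 'A4', 'A5', 'A6', 'A7']]
print(new_centres)  # [(1.25, 1.5), (3.9, 5.1)]
```

After the first round, cluster 1 holds {A1, A2} with centre (1.25, 1.5) and cluster 2 holds {A3, A4, A5, A6, A7} with centre (3.9, 5.1).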
b) Explain the BIRCH clustering method. List the advantages of BIRCH over other clustering techniques. (6)
19 a) Explain k-medoids clustering algorithm with an example. (6)
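A minimal PAM-style k-medoids sketch on small hypothetical 1-D data (not from the question): medoids must be actual data points, and a medoid is swapped with a non-medoid whenever the swap lowers the total distance cost.

```python
# PAM-style k-medoids on hypothetical 1-D toy data.
data = [1, 2, 3, 8, 9, 10]
k = 2

def cost(medoids):
    # Total distance of every point to its nearest medoid.
    return sum(min(abs(p - m) for m in medoids) for p in data)

medoids = data[:k]  # arbitrary initial medoids: [1, 2]
improved = True
while improved:
    improved = False
    for m in list(medoids):
        for p in data:
            if p in medoids:
                continue
            trial = [p if x == m else x for x in medoids]
            if cost(trial) < cost(medoids):  # keep only improving swaps
                medoids, improved = trial, True

print(sorted(medoids), cost(medoids))  # [2, 9] with total cost 4
```

Unlike k-means, the centres here never leave the dataset, which makes k-medoids robust to outliers and usable with arbitrary dissimilarity measures.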
b) Explain the Apriori based approach for mining frequent sub graphs with an (6)
example.
****