
Name

Roll No.
National Institute of Technology, Hamirpur
Department of Computer Science & Engineering
End Semester Examination - May, 2019
Course: B.Tech. IIITU
Semester: VIII
Subject: Data Warehouse and Data Mining
Code: CSD-421
Time: 03:00 hrs
Max. Marks: 60
Note: All questions are compulsory
Q1. Differentiate between the following pairs (give appropriate examples/diagrams): [10]
(a) Full materialization and Partial materialization
(b) Roll up and Drill down
(c) Snowflake and Star schema
(d) Closed frequent item set and Maximal frequent item set
(e) Data warehouse and Data mart

Q2. (a) Suppose a group of 12 sales price records has been sorted as follows: 5, 10, 11, 13, 15, [5]
35, 50, 55, 72, 92, 204, 215.
(a) Calculate mean, inter-quartile range and variance.
(b) Normalize maximum value of data in the range [0, 1].
(c) Partition the data using bin-median method.
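
A minimal Python sketch of the quantities asked for above, using the twelve prices as listed in the question. Several choices here are assumptions, not given in the paper: the quartile convention (the `statistics.quantiles` default, "exclusive"), reporting both population and sample variance, and using three equal-depth bins of four values each for the bin-median step.

```python
import statistics

# Twelve sorted sales prices from the question.
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

mean = statistics.mean(prices)

# Quartiles: the convention is an assumption (library default, "exclusive").
q1, q2, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1

# The question does not say which variance is meant, so compute both.
population_variance = statistics.pvariance(prices)
sample_variance = statistics.variance(prices)


def min_max(value, lo, hi, new_lo=0.0, new_hi=1.0):
    """Min-max normalization of value from [lo, hi] to [new_lo, new_hi]."""
    return (value - lo) / (hi - lo) * (new_hi - new_lo) + new_lo


normalized_max = min_max(max(prices), min(prices), max(prices))

# Equal-depth binning; a bin size of 4 (three bins) is an assumption.
BIN_SIZE = 4
bins = [prices[i:i + BIN_SIZE] for i in range(0, len(prices), BIN_SIZE)]
bin_medians = [statistics.median(b) for b in bins]
# Smoothing by bin medians replaces every value with its bin's median.
smoothed = [[m] * len(b) for b, m in zip(bins, bin_medians)]

print(mean, iqr, population_variance, sample_variance)
print(normalized_max, bin_medians)
```

A different quartile convention or bin count would change the inter-quartile range and the bin medians, so the convention used should be stated alongside the answer.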

(b) The table given below shows data obtained during the outbreak of smallpox. Test the [5]
effectiveness of the vaccination in preventing the disease with the help of χ² at 5% level
of significance (χ² value at 0.05 significance level for one degree of freedom = 3.841).

                  Attacked   Not Attacked   Total
Vaccinated            31          469         500
Not Vaccinated       185         1315        1500
Total                216         1784        2000
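
The test on this table can be sketched in Python as below. The sketch computes the Pearson chi-square statistic from expected counts (row total × column total / grand total) and compares it with the critical value 3.841 quoted in the question. Not applying Yates' continuity correction is an assumption; the paper does not say whether it is expected.

```python
# Observed counts: rows = vaccinated / not vaccinated,
# columns = attacked / not attacked.
observed = [[31, 469],
            [185, 1315]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence of vaccination and attack.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

CRITICAL_VALUE = 3.841  # given in the question (1 degree of freedom, 5% level)
reject_independence = chi2 > CRITICAL_VALUE

print(round(chi2, 3), reject_independence)
```

If the statistic exceeds the critical value, the hypothesis that vaccination and attack are independent is rejected at the 5% level.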
Q3. (a) What is Distributed Data Warehouse? Compare the two architectural approaches for [5]
distributed data warehouse.
(b) Write short notes on any two of the following: [5]
(i) World Wide Web Mining
(ii) Attribute Oriented Induction
(iii) Curse of Dimensionality
Q4. Consider the 6 transactions given below. If minimum support is 40% and minimum [10]
confidence is 60% then
(a) Determine the frequent itemsets using Apriori algorithm.
(b) List all closed and maximal frequent item sets.
(c) Determine the strong association rules.

Transaction   Items
T1            Bread, Jam, Milk, Butter
T2            Bread, Milk, Butter, Ketchup
T3            Jam, Milk, Ketchup
T4            Bread, Jam, Milk, Butter
T5            Jam, Milk
T6            Jam, Milk, Butter
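
The level-wise search for frequent itemsets can be sketched in Python as follows. The transaction list is the table above with its punctuation read as item separators, and the function names are illustrative. With six transactions, a 40% minimum support means an itemset must appear in at least three of them.

```python
from itertools import combinations

transactions = [
    {"Bread", "Jam", "Milk", "Butter"},      # T1
    {"Bread", "Milk", "Butter", "Ketchup"},  # T2
    {"Jam", "Milk", "Ketchup"},              # T3
    {"Bread", "Jam", "Milk", "Butter"},      # T4
    {"Jam", "Milk"},                         # T5
    {"Jam", "Milk", "Butter"},               # T6
]
MIN_SUPPORT = 0.40


def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(min_support):
    """Return {frequent itemset: support count}, built level by level."""
    n = len(transactions)
    frequent = {}
    level = [frozenset([item]) for item in sorted(set().union(*transactions))]
    k = 1
    while level:
        counts = {c: support_count(c) for c in level}
        kept = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


frequent = apriori(MIN_SUPPORT)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Closed and maximal itemsets, and the strong rules, can then be read off the returned support counts.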
Q5. Explain why and in what scenarios GA can be used for classification. Suppose a genetic [2+8=10]
algorithm uses chromosomes of the form x = abcdefgh with a fixed length of eight
genes. Each gene can be any digit between 0 and 9. Given the fitness function and the
initial chromosomes, evaluate the fitness of the new population using one-point
crossover method. Show all your working. Has the overall fitness improved? Let
f(x) and the set {x1, x2, x3, x4} represent fitness function and initial chromosomes
respectively.
f(x) = (a + b) - (c + d) + (e + f) - (g + h)
x1 = 65413532
x2 = 87126601
x3 = 23921285
x4 = 41852094
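
A small Python sketch of the fitness evaluation and one-point crossover follows. The first chromosome is taken to be x1 = 65413532. The question does not fix which parents are paired or where the crossover point lies, so the sketch assumes the two fittest chromosomes are crossed at the midpoint (after the fourth gene); both choices are illustrative only.

```python
def fitness(chromosome: str) -> int:
    """f(x) = (a + b) - (c + d) + (e + f) - (g + h) for an 8-digit string."""
    a, b, c, d, e, f, g, h = (int(ch) for ch in chromosome)
    return (a + b) - (c + d) + (e + f) - (g + h)


def one_point_crossover(parent1: str, parent2: str, point: int):
    """Swap the tails of two parents after the given gene position."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


population = {
    "x1": "65413532",
    "x2": "87126601",
    "x3": "23921285",
    "x4": "41852094",
}
scores = {name: fitness(ch) for name, ch in population.items()}

# Assumption: cross the two fittest chromosomes at the midpoint.
best, second = sorted(population, key=scores.get, reverse=True)[:2]
child1, child2 = one_point_crossover(population[best], population[second], 4)

print(scores)
print(child1, fitness(child1))
print(child2, fitness(child2))
```

Comparing the total or average fitness before and after replacing parents with offspring answers the final part of the question for whichever pairing scheme is chosen.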

Q6. (a) Cluster the following eight points (with (x, y) representing location) into three clusters: [6]
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9). The
distance function is Minkowski distance. Suppose initially we assign A1, B1, and C1
as the center of each cluster, respectively. Use the k-means algorithm to show:
(a) The three cluster centers after the first round execution.
(b) The final three clusters.
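
The k-means iterations can be sketched in Python as below, reading the last two points as C1(1, 2) and C2(4, 9). The question names Minkowski distance without giving its order p, so p = 2 (Euclidean distance) is an assumption; the `p` parameter can be changed to try other orders.

```python
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}


def minkowski(a, b, p=2):
    """Minkowski distance of order p (p = 2 is an assumption, not given)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def assign(centers):
    """Put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for name, pt in points.items():
        nearest = min(range(len(centers)),
                      key=lambda i: minkowski(pt, centers[i]))
        clusters[nearest].append(name)
    return clusters


def recompute(clusters):
    """Mean of each cluster's points (assumes no cluster becomes empty)."""
    return [tuple(sum(points[n][d] for n in c) / len(c) for d in range(2))
            for c in clusters]


initial_centers = [points["A1"], points["B1"], points["C1"]]

# (a) Centers after the first round.
first_round_centers = recompute(assign(initial_centers))

# (b) Iterate until the assignment stops changing.
centers, clusters = initial_centers, None
while True:
    new_clusters = assign(centers)
    if new_clusters == clusters:
        break
    clusters, centers = new_clusters, recompute(new_clusters)

print(first_round_centers)
print(clusters)
```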
(b) State advantages and disadvantages of the following data mining techniques. Also, [4]
mention an application scenario for each.
(a) SVM
(b) Neural Network
(c) Fuzzy Logic
(d) Decision Tree

***All the Best***
