Instructions for solving the assignment

1. Solve each assignment separately on white sheets using a BLUE PEN.
2. Write your name and registration number, and sign, on every page of each assignment.
3. The last date for submitting the assignment is 23/03/2021, on or before 11:30 P.M., on MS Teams.
Motilal Nehru National Institute of Technology, Allahabad.
Department of Computer Science & Engineering,
Information Technology
B. Tech. (IT) VI Semester
Subject:- Data Mining
Assignment-I

Q1. For each of the following applications, decide whether DM would offer a viable solution
and briefly justify your decision:

a) Predicting whether a particular chemical compound is carcinogenic (i.e., will induce
cancer) or not.
b) Determining which African-American applicants should be extended a home loan.
c) Discovering what grocery items Walmart customers tend to buy together.
d) Grouping Wells Fargo customers by socio-demographic attributes and banking habits.
e) Predicting tomorrow's value of Microsoft's stock.
f) Predicting whether a Netflix subscriber will rent a particular new release.
g) Sorting a list of numbers in ascending order.
h) Identifying terrorists.

Q2. Briefly outline a possible application of DM to some aspect of your life. This need not
have to do with you directly; you may think of companies you/your friends/your relatives
work for, schools you've attended, businesses you come in contact with regularly, etc.

Q3. Consider the following simple dataset.

A B C D T
0 1 1 0 1
0 0 1 1 1
0 1 0 0 0
1 1 0 1 1
0 0 0 1 0

All attributes are binary. T is the target attribute. Use ID3 to induce a decision tree from this
dataset. Show all of your calculations, intermediate and final trees.
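
The entropy and information-gain arithmetic that ID3 relies on can be cross-checked with a short script. The following is a minimal sketch, not the course's reference solution: it assumes entropy is measured in bits (log base 2), and the names `ROWS`, `ATTRS`, `entropy`, and `info_gain` are illustrative. It only evaluates the root split; the same functions can be reapplied to each subset when growing the tree by hand.

```python
from collections import Counter
from math import log2

# Rows of the Q3 dataset as (A, B, C, D, T).
ROWS = [
    (0, 1, 1, 0, 1),
    (0, 0, 1, 1, 1),
    (0, 1, 0, 0, 0),
    (1, 1, 0, 1, 1),
    (0, 0, 0, 1, 0),
]
ATTRS = ["A", "B", "C", "D"]  # column order; T is the last column


def entropy(rows):
    """Shannon entropy (in bits) of the target column T."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def info_gain(rows, col):
    """Entropy reduction obtained by splitting rows on column col."""
    n = len(rows)
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder


# Gain of every attribute at the root, and the attribute ID3 would pick.
gains = {name: info_gain(ROWS, i) for i, name in enumerate(ATTRS)}
root = max(gains, key=gains.get)

print(f"H(T) = {entropy(ROWS):.4f}")
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.4f}")
print("Attribute chosen at the root:", root)
```

To check a lower level of the tree, filter `ROWS` down to the branch of interest and call `info_gain` on that subset for the remaining attributes.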

Q4. Consider the following simple dataset. The target attribute is PlayTennis, which has
values Yes or No for different Saturday mornings, depending on several other attributes of
those mornings.

Day Outlook Temperature Humidity Wind PlayTennis
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rainy Mild High Weak Yes
5 Rainy Cool Normal Weak Yes
6 Rainy Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rainy Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rainy Mild High Strong No

I. k-Nearest Neighbors.

a) Design a simple distance metric for this space. Briefly justify your choice.
b) Perform 7-fold cross-validation with k=3 for this dataset. Show your work (there
should be 7 iterations with 7 corresponding predictions/accuracies on the held-out
folds, and a final result).
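
One possible way to organize parts a) and b) is sketched below. It rests on several assumptions that the assignment does not fix: the metric is the overlap (Hamming) distance, which counts the attributes on which two days differ; each fold holds out two consecutive days (1-2, 3-4, and so on); and ties in distance are broken by day order. The names `DAYS`, `distance`, `knn_predict`, and `cross_validate` are illustrative.

```python
from collections import Counter

# Q4 dataset, Days 1-14: (Outlook, Temperature, Humidity, Wind, PlayTennis).
DAYS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Strong", "No"),
]


def distance(x, y):
    """Overlap distance: number of the four attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x[:4], y[:4]))


def knn_predict(train, query, k=3):
    """Majority label among the k training rows closest to query."""
    # sorted() is stable, so rows at equal distance keep their day order.
    nearest = sorted(train, key=lambda row: distance(row, query))[:k]
    votes = Counter(row[-1] for row in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(rows, n_folds=7, k=3):
    """Hold out consecutive equal-sized folds; return the accuracy on each."""
    size = len(rows) // n_folds
    accuracies = []
    for f in range(n_folds):
        held_out = rows[f * size:(f + 1) * size]
        train = rows[:f * size] + rows[(f + 1) * size:]
        correct = sum(knn_predict(train, q, k) == q[-1] for q in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


fold_accuracies = cross_validate(DAYS)
for i, acc in enumerate(fold_accuracies, start=1):
    print(f"Fold {i}: accuracy = {acc:.2f}")
print(f"Mean accuracy = {sum(fold_accuracies) / len(fold_accuracies):.3f}")
```

A different fold assignment or tie-breaking rule can change individual fold results, so the choice made should be stated alongside the hand calculation.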

II. Naive Bayes.

a) Using the full dataset, induce the corresponding NB model. Show your result in the
form of probability tables as we did in class.
b) What would your model predict for the following two Saturday mornings: <Oct 1,
Overcast, Cool, High, Weak>, and <May 26, Sunny, Hot, Normal, Strong>? Show
your work.
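
The probability tables for part a) come from simple counting, which the following sketch illustrates. It assumes plain relative-frequency estimates with no smoothing (the version covered in class may use Laplace smoothing instead) and treats the date in each query as an identifier that is not used for prediction. The names `priors`, `cond`, and `posterior_scores` are illustrative.

```python
from collections import Counter, defaultdict
from fractions import Fraction

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

# Q4 dataset, Days 1-14: (Outlook, Temperature, Humidity, Wind, PlayTennis).
DAYS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Strong", "No"),
]

# Class priors P(PlayTennis = label), kept as exact fractions.
class_counts = Counter(row[-1] for row in DAYS)
priors = {c: Fraction(n, len(DAYS)) for c, n in class_counts.items()}

# cond[attr][value][label] = P(attr = value | PlayTennis = label), unsmoothed.
cond = defaultdict(lambda: defaultdict(dict))
for i, attr in enumerate(ATTRS):
    for value in sorted({row[i] for row in DAYS}):
        for label in class_counts:
            matches = sum(1 for row in DAYS if row[i] == value and row[-1] == label)
            cond[attr][value][label] = Fraction(matches, class_counts[label])


def posterior_scores(query):
    """Unnormalized P(label) * product of P(attr = value | label), per label."""
    scores = {}
    for label in class_counts:
        score = priors[label]
        for attr, value in zip(ATTRS, query):
            score *= cond[attr][value][label]
        scores[label] = score
    return scores


# Print the tables, then score one of the two query mornings.
print("Priors:", {c: str(p) for c, p in priors.items()})
for attr in ATTRS:
    for value, by_label in cond[attr].items():
        print(attr, value, {c: str(p) for c, p in by_label.items()})
print(posterior_scores(("Overcast", "Cool", "High", "Weak")))
```

The predicted class is the label with the larger score; dividing each score by their sum gives normalized posteriors. Without smoothing, a zero count in a table forces the whole product for that class to zero.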
Q5. Consider the following simple dataset.

A B T
1 0 1
0 1 0

T is the (binary) target attribute. Consider a 2-layer feedforward neural network with two
input units (one for A and one for B), a single hidden unit, and one output unit (for T).
Initialize all weights (there should be 3 of them) to 0.1. Assume a learning rate of 0.3. Using
incremental weight updates, show the values of the weights after each of the first three
training iterations. Show your results in the form of a table as we did in class.
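
The hand-computed table can be checked against a few lines of code. This is a sketch under stated assumptions rather than the method shown in class: both units use the sigmoid function, there are no bias weights (consistent with the three weights mentioned), the error is the squared error, one "iteration" means presenting a single example (cycling through the two rows in order), and the hidden-unit error term uses the hidden-to-output weight from before the update. The names `train_step`, `weights`, and `history` are illustrative.

```python
from math import exp

ETA = 0.3  # learning rate given in the question

# Q5 dataset: ((A, B), T).
EXAMPLES = [((1, 0), 1), ((0, 1), 0)]


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def train_step(w, x, t):
    """Apply one incremental backpropagation update and return the new weights."""
    a, b = x
    # Forward pass: inputs -> single hidden unit -> single output unit.
    h = sigmoid(w["a_h"] * a + w["b_h"] * b)
    o = sigmoid(w["h_o"] * h)
    # Backward pass: error terms for the output unit and the hidden unit.
    delta_o = o * (1 - o) * (t - o)
    delta_h = h * (1 - h) * w["h_o"] * delta_o  # uses the pre-update h_o
    return {
        "a_h": w["a_h"] + ETA * delta_h * a,
        "b_h": w["b_h"] + ETA * delta_h * b,
        "h_o": w["h_o"] + ETA * delta_o * h,
    }


weights = {"a_h": 0.1, "b_h": 0.1, "h_o": 0.1}
history = []
for step in range(3):
    x, t = EXAMPLES[step % len(EXAMPLES)]
    weights = train_step(weights, x, t)
    history.append(dict(weights))
    print(f"iteration {step + 1}: "
          + ", ".join(f"{k} = {v:.6f}" for k, v in weights.items()))
```

If an iteration is instead taken to mean a full pass over both examples, the loop body would apply `train_step` to each row before recording the weights.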

Q6. Assume that the units of a neural network are modified so they compute the squashing
function tanh (instead of the sigmoid function). What is the resulting backpropagation weight
update rule for the output layer? (Note: tanh'(x) = 1 - tanh^2(x).)
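
A sketch of the derivation follows, under assumptions not stated in the question: the error is the usual squared error, and the symbols $t_k$ (target), $o_k$ (output of unit $k$), $x_{kj}$ (the $j$-th input to unit $k$), $w_{kj}$ (its weight), and $\eta$ (learning rate) are notation chosen here for illustration.

```latex
\begin{align*}
E &= \tfrac{1}{2} \sum_{k} (t_k - o_k)^2,
  \qquad o_k = \tanh(\mathit{net}_k),
  \qquad \mathit{net}_k = \sum_{j} w_{kj}\, x_{kj} \\
\frac{\partial E}{\partial w_{kj}}
  &= \frac{\partial E}{\partial o_k}\,
     \frac{\partial o_k}{\partial \mathit{net}_k}\,
     \frac{\partial \mathit{net}_k}{\partial w_{kj}}
   = -(t_k - o_k)\,\bigl(1 - o_k^2\bigr)\, x_{kj} \\
\delta_k &= (t_k - o_k)\,\bigl(1 - o_k^2\bigr) \\
\Delta w_{kj} &= -\eta\, \frac{\partial E}{\partial w_{kj}}
   = \eta\, \delta_k\, x_{kj}
\end{align*}
```

Compared with the sigmoid case, the only change is that the derivative factor $o_k(1 - o_k)$ is replaced by $1 - o_k^2$.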
