
MENOUFIA UNIVERSITY Name:……….………………..

Faculty of Computer & information Section:……….………………


Subject: Data Mining, 4th Year-1st Term Examiner: Dr. Hayam Mousa
Exam: Midterm Date: 19/11/2019 Marks: 15 Time: 60 Min Pages: 2
Question 1
A) We will use the dataset below to learn a decision tree which predicts
if people pass machine learning (Yes or No), based on their previous
GPA (High, Medium, or Low) and whether or not they studied.

GPA  Studied  Passed
L    F        F
L    T        T
M    F        F
M    T        T
H    F        T
H    T        T

According to this dataset:
a) What is the entropy H(Passed)?
b) What is the entropy H(Passed | GPA)?
c) What is the entropy H(Passed | Studied)?
d) Draw the full decision tree that would be learned for this dataset.
e) What causes over-fitting in a decision tree classifier? Does over-fitting increase with the
number of training examples? Explain your answer.
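
The entropy quantities in parts a) to c) can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms and encoding T/F as Python booleans; the names `rows`, `entropy`, and `conditional_entropy` are illustrative and not part of the exam.

```python
from collections import Counter
from math import log2

# Dataset from the question: (GPA, Studied, Passed)
rows = [
    ("L", False, False),
    ("L", True, True),
    ("M", False, False),
    ("M", True, True),
    ("H", False, True),
    ("H", True, True),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, attr_index, label_index=2):
    """H(label | attribute): entropy of each attribute group, weighted by group size."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    return sum(len(g) / n * entropy(g) for g in groups.values())

h_passed = entropy([r[2] for r in rows])
h_given_gpa = conditional_entropy(rows, 0)
h_given_studied = conditional_entropy(rows, 1)

print(f"H(Passed)           = {h_passed:.3f}")         # 0.918
print(f"H(Passed | GPA)     = {h_given_gpa:.3f}")      # 0.667
print(f"H(Passed | Studied) = {h_given_studied:.3f}")  # 0.459
```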

B) Consider the following dataset. Calculate the best splitting position for the Age attribute.

ID  Age  Car Type  Class label
0   23   Family    High
1   17   Sport     High
2   43   Sport     High
3   68   Family    Low
4   32   Truck     Low
5   20   Family    High
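
One way to search for the split position is sketched below. The question does not name an impurity measure, so this sketch assumes the weighted Gini index and takes candidate thresholds at the midpoints between consecutive sorted ages; both choices, and the names `gini` and `best_split`, are assumptions made for illustration.

```python
from collections import Counter

# Age and class label columns from the table, in ID order
ages = [23, 17, 43, 68, 32, 20]
labels = ["High", "High", "High", "Low", "Low", "High"]

def gini(group):
    """Gini impurity of a list of class labels."""
    n = len(group)
    return 1.0 - sum((c / n) ** 2 for c in Counter(group).values())

def best_split(values, classes):
    """Return (threshold, weighted_gini) for the best binary split 'value <= threshold'."""
    pairs = sorted(zip(values, classes))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

threshold, score = best_split(ages, labels)
print(f"best split: Age <= {threshold} (weighted Gini = {score:.3f})")
```

Under these assumptions the lowest weighted Gini is obtained at Age <= 27.5. A different criterion, such as information gain, can be swapped in by replacing `gini`.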
Question 2:
A) Consider the three perceptrons below, which correspond to classes A, B, and C respectively.
For a given input x, the prediction of the group is the class whose perceptron has the highest
value of $\sum_i w_i x_i$.
For the following test set, find the prediction of this set of perceptrons on each example
and create the corresponding confusion matrix. The actual class label of each example is given
in the table below.

Example X1 X2 X3 Label
1 0 1 1 A
2 1 0 1 C
3 0 0 0 C
4 1 1 1 B

B) Given the vectors x = (0,1,0,1) and y = (1,0,1,0), calculate the cosine, correlation,
Euclidean, and Jaccard similarity and distance measures.
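
The measures in part B can be verified with a short script. This is a sketch under stated assumptions: correlation is taken to be the Pearson coefficient, Jaccard treats the vectors as binary and ignores 0-0 matches, and each distance is taken as one minus the corresponding similarity (one common convention). No Euclidean similarity is computed because there is no single standard definition. All variable names are illustrative.

```python
from math import sqrt

x = [0, 1, 0, 1]
y = [1, 0, 1, 0]

# Cosine similarity: dot product divided by the product of the vector norms
dot = sum(a * b for a, b in zip(x, y))
cosine = dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

# Pearson correlation: cosine similarity of the mean-centred vectors
mx, my = sum(x) / len(x), sum(y) / len(y)
dx = [a - mx for a in x]
dy = [b - my for b in y]
correlation = sum(a * b for a, b in zip(dx, dy)) / (
    sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
)

# Euclidean distance
euclidean = sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Jaccard similarity for binary vectors: f11 / (f11 + f10 + f01)
f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
mismatches = sum(1 for a, b in zip(x, y) if a != b)
jaccard = f11 / (f11 + mismatches)

print("cosine similarity :", cosine, " distance:", 1 - cosine)
print("correlation       :", correlation, " distance:", 1 - correlation)
print("Euclidean distance:", euclidean)
print("Jaccard similarity:", jaccard, " distance:", 1 - jaccard)
```

The two vectors share no positions where both are 1, so the cosine and Jaccard similarities are 0, while the correlation is -1 because each vector is the complement of the other.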
