In [1]: import numpy as np
np.random.seed(0)
import mltools as ml
import matplotlib.pyplot as plt # use matplotlib for plotting with inline plots
%matplotlib inline

In [2]: iris = np.genfromtxt("data/iris.txt", delimiter=None) # load the iris data set (file path assumed)
iris.shape
(148, 5)
In [4]: Y = iris[:,-1]
X = iris[:,:-1]
m, n = X.shape
For the histogram we will create a plot with 4 subplots and plot each feature in the loop.
In [5]: fig, ax = plt.subplots(1, 4, figsize=(12, 3))
for i in range(n):
    ax[i].hist(X[:,i], bins=20)
plt.show()
Plotting the different features of the iris data set using the code from the discussion.
https://www.coursehero.com/file/25892324/HW1s-finalpdf/
In [7]: fig, ax = plt.subplots(1, 3, figsize=(12, 3))
colors = ['b','g','r']
# Loop over the features starting from the second one, plotting each against
# the first feature.
for f in range(1, n):
    for i, c in enumerate(np.unique(Y)):
        mask = np.where(Y == c)[0]
        ax[f - 1].scatter(X[mask, 0], X[mask, f], s=80, c=colors[i], alpha=0.75)
plt.show()
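The inner loop above selects each class's rows with an index mask built by `np.where`; a minimal standalone sketch of that pattern (with made-up labels, not the iris data):

```python
import numpy as np

# Hypothetical class labels for five samples.
Y = np.array([0, 1, 0, 2, 1])

# np.where(Y == c)[0] returns the row indices belonging to class c,
# which can then index into the feature matrix for per-class plotting.
for c in np.unique(Y):
    mask = np.where(Y == c)[0]
    print(c, mask.tolist())  # class label, then its row indices
```

This prints `0 [0, 2]`, `1 [1, 4]`, `2 [3]`: one index list per class, in sorted class order.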
In [8]: # Alternative, "simpler" version using provided tools
fig, ax = plt.subplots(1, 3, figsize=(12, 3))
for i in range(1, n):
    ml.plotClassify2D(None, X[:,[0,i]], Y, axis=ax[i-1])
plt.show()
Now, let's plot the k-nearest neighbor decision boundary using only the first two features:

In [9]: # shuffle, then split the data 75/25 into training and validation sets
X, Y = ml.shuffleData(X, Y)
Xtr, Xva, Ytr, Yva = ml.splitData(X, Y, 0.75)
knn = ml.knn.knnClassify(Xtr[:,0:2], Ytr, 1)  # K = 1 model on the first two features
ml.plotClassify2D(knn, Xtr[:,0:2], Ytr)
plt.show()
Let's compute error rates at various values of K, and visualize the decision functions as well:

In [11]: fig, ax = plt.subplots(1, 4, figsize=(16, 4))
for i, k in enumerate([1, 5, 10, 50]):
    knn = ml.knn.knnClassify(Xtr[:,0:2], Ytr, k)
    print("K =", k, " Error rate (training):", knn.err(Xtr[:,0:2],Ytr),
          " Error rate (validation):", knn.err(Xva[:,0:2],Yva))
    ml.plotClassify2D(knn, Xtr[:,0:2], Ytr, axis=ax[i])
plt.show()
Now, compute the error rate at more values of K, and plot both the training error rate and validation error rates:

In [12]: K = [1, 2, 5, 10, 50, 100, 200]
errTrain = [ml.knn.knnClassify(Xtr[:,0:2],Ytr,k).err(Xtr[:,0:2],Ytr) for k in K]
errVal   = [ml.knn.knnClassify(Xtr[:,0:2],Ytr,k).err(Xva[:,0:2],Yva) for k in K]
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
ax.semilogx(K, errTrain, 'r-', label='Training')
ax.semilogx(K, errVal, 'g-', label='Validation')
ax.legend()
ax.set_xlim(0, 200)
ax.set_ylim(0, 1)
print("Training and validation error as a function of K:")
plt.show()
Based on this plot, k = 50 had the lowest validation error, so I would most likely choose that. You can also see
evidence of overfitting (k = 1..10: low training error but high validation error) and of underfitting (k = 100 or more:
similar but high training and validation errors).
Your plots may be a bit different, and may end up with slightly different values of k, but most likely the trend is
similar: at low k, training error is much lower than validation error, suggesting overfitting; at high k, they are
similar but high, suggesting underfitting.
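The selection rule described above (take the k with the lowest validation error) can be sketched as a small helper. The `best_k` function and the K grid and error values below are illustrative, not the notebook's actual numbers:

```python
import numpy as np

def best_k(K, err_val):
    """Return the candidate K with the lowest validation error."""
    return K[int(np.argmin(err_val))]

# Illustrative U-shaped validation curve: overfitting at small K,
# underfitting at large K, with the minimum in between.
K = [1, 5, 10, 50, 100, 200]
err_val = [0.25, 0.15, 0.10, 0.05, 0.20, 0.30]
print(best_k(K, err_val))  # prints 50
```

Note this picks only among the candidates actually evaluated; a finer K grid around the minimum could refine the choice.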
In [14]: K = [1, 2, 5, 10, 50]
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
errTrain, errVal = [], []
for k in K:
    knn = ml.knn.knnClassify(Xtr, Ytr, k)   # now using all four features
    errTrain.append(knn.err(Xtr, Ytr))
    errVal.append(knn.err(Xva, Yva))
ax.semilogx(K, errTrain, 'r-', label='Training')
ax.semilogx(K, errVal, 'g-', label='Validation')
ax.legend()
ax.set_xlim(0, 200)
ax.set_ylim(0, 1)
print("Training and validation error as a function of K:")
for k, et, ev in zip(K, errTrain, errVal):
    print(k, et, ev)
plt.show()

Training and validation error as a function of K:
1 0.0 0.0540540540541
2 0.027027027027 0.027027027027
5 0.018018018018 0.027027027027
10 0.018018018018 0.027027027027
50 0.117117117117 0.0540540540541
With all the features in the dataset, I would most likely choose k = 10.
In [15]: # (a)
p_y = 4.0/10    # p(y=+1) = 4/10
# p(x_i = 1 | y = -1)
p_x1_y0 = 3.0/6
p_x2_y0 = 5.0/6
p_x3_y0 = 4.0/6
p_x4_y0 = 5.0/6
p_x5_y0 = 2.0/6
# p(x_i = 1 | y = +1)
p_x1_y1 = 3.0/4
p_x2_y1 = 0.0/4
p_x3_y1 = 3.0/4
p_x4_y1 = 2.0/4
p_x5_y1 = 1.0/4
In [16]: # (b)
f_y1_00000 = p_y*(1-p_x1_y1)*(1-p_x2_y1)*(1-p_x3_y1)*(1-p_x4_y1)*(1-p_x5_y1)
print("f_y1_00000 = ", f_y1_00000)
f_y0_00000 = (1-p_y)*(1-p_x1_y0)*(1-p_x2_y0)*(1-p_x3_y0)*(1-p_x4_y0)*(1-p_x5_y0)
print("f_y0_00000 = ", f_y0_00000)
if (f_y1_00000 > f_y0_00000):
    print("Predict class +1")
else:
    print("Predict class -1")
print("\n")

f_y1_11010 = p_y*(p_x1_y1)*(p_x2_y1)*(1-p_x3_y1)*(p_x4_y1)*(1-p_x5_y1)
print("f_y1_11010 = ", f_y1_11010)
f_y0_11010 = (1-p_y)*(p_x1_y0)*(p_x2_y0)*(1-p_x3_y0)*(p_x4_y0)*(1-p_x5_y0)
print("f_y0_11010 = ", f_y0_11010)
if (f_y1_11010 > f_y0_11010):
    print("Predict class +1")
else:
    print("Predict class -1")

f_y1_00000 =  0.009375000000000001
f_y0_00000 =  0.0018518518518518515
Predict class +1

f_y1_11010 =  0.0
f_y0_11010 =  0.046296296296296315
Predict class -1
In [17]: # (c)
# p(y=1 | 11010):
print("p(y=1|11010) = ", f_y1_11010 / (f_y1_11010 + f_y0_11010))
print("\n")
# For the other pattern (not required), p(y=1 | 00000):
print("p(y=1|00000) = ", f_y1_00000 / (f_y1_00000 + f_y0_00000))

p(y=1|11010) =  0.0
p(y=1|00000) =  0.8350515463917526
(d) A Bayes classifier using a joint distribution model for p(x|y = c) would have 2^5 − 1 = 31 degrees of
freedom (independent probabilities) to estimate per class; here we have only 6 and 4 data points for the two classes
respectively. So such a model would be extremely unlikely to generalize well to new data.
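The parameter count works out as follows: a full joint model over d binary features needs one probability per input pattern, minus one for normalization, while the naive Bayes model needs only one Bernoulli parameter per feature. A quick arithmetic check:

```python
d = 5                   # number of binary features x1..x5
joint_dof = 2**d - 1    # joint model p(x|y=c): 2^d patterns, minus the sum-to-one constraint
nb_dof = d              # naive Bayes: one parameter p(x_i=1|y=c) per feature

print(joint_dof, nb_dof)  # prints 31 5 (per class)
```

With only 6 and 4 examples per class, even the 5-parameter naive Bayes model is stretched thin; the 31-parameter joint model is hopeless.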
(e) We don't need to re-train the model. Just ignore x1 and use the parameters of the remaining features. This is
because the probabilities have been computed independently for each feature, conditioned on the class, so the rest of
them remain the same.
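A minimal sketch of that point, reusing the part (a) estimates: to score the pattern 11010 without x1, we simply omit the x1 factor from each class's product and keep x2..x5 = (1,0,1,0). The `joint` helper below is hypothetical, introduced just for this illustration:

```python
p_y = 4.0/10                          # p(y=+1), from part (a)
p_y0 = [5.0/6, 4.0/6, 5.0/6, 2.0/6]   # p(x_i=1 | y=-1) for x2..x5
p_y1 = [0.0/4, 3.0/4, 2.0/4, 1.0/4]   # p(x_i=1 | y=+1) for x2..x5
x = [1, 0, 1, 0]                      # observed x2..x5 (pattern 11010 with x1 dropped)

def joint(prior, p, x):
    """Class prior times the product of per-feature Bernoulli likelihoods."""
    f = prior
    for pi, xi in zip(p, x):
        f *= pi if xi else (1 - pi)
    return f

f1 = joint(p_y, p_y1, x)
f0 = joint(1 - p_y, p_y0, x)
print("p(y=+1 | x2..x5=1010) =", f1 / (f1 + f0))  # prints 0.0
```

The posterior is still 0 here because the p(x2=1|y=+1) = 0 factor survives the feature removal; no retraining was needed, only dropping one term from each product.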