DS100-1
SUPERVISED MACHINE LEARNING IN PYTHON
APPLIED DATA SCIENCE
Name:
Write code in a Jupyter notebook as required by each problem. Copy both the code and its output as a screenshot and paste
them here.
1 Import house-votes-84 (edited).csv. Write the code necessary to import and examine this dataset. Which of the
following statements is not true? The target variable in this DataFrame is 'party'.
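import pandas as pd
housevotes = pd.read_csv('house-votes-84 (edited).csv')
print(housevotes.shape)   # (number of rows, number of columns)
housevotes.info()         # column dtypes and non-null counts
print(housevotes.head())  # first five rows
print(housevotes.keys())  # column names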
2 Perform graphical exploratory data analysis on the house votes dataset. Use Seaborn's countplot to visualize the votes on
the satellite testing bill, grouped by party. Include the following line before the call to show: plt.xticks([0, 1],
['No', 'Yes']). Do the same for the missile bill. Of the two bills, which do Democrats vote resoundingly in
favor of, compared to Republicans?
A. Missile bill
B. Satellite bill
C. Both Missile and Satellite bills
D. Neither Missile nor Satellite bill
Code and Output
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('house-votes-84 (edited).csv')
plt.figure()
# Top panel: votes on the satellite testing bill, split by party
plt.subplot(2, 1, 1)
sns.countplot(x='sat_test', data=df, hue='party')
plt.xticks([0, 1], ['No', 'Yes'])
plt.title('Satellite Bill')
plt.legend(loc='upper left')
# Bottom panel: votes on the missile bill, split by party
plt.subplot(2, 1, 2)
sns.countplot(x='mx_missile', data=df, hue='party')
plt.xticks([0, 1], ['No', 'Yes'])
plt.title('Missile Bill')
plt.legend(loc='upper right')
plt.tight_layout()  # keep the stacked titles and tick labels from overlapping
plt.show()
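The reading of the two count plots can be cross-checked numerically. The following is a minimal sketch, assuming the vote columns are coded 0 for No and 1 for Yes (as the xticks line suggests). The small DataFrame is made-up stand-in data, not the real house votes file, and the helper name vote_share_by_party is hypothetical.

```python
import pandas as pd

def vote_share_by_party(df, bill):
    """Return the fraction of 'Yes' (1) votes on `bill` for each party."""
    # With 0/1 coding, the mean of the column is the share of Yes votes.
    return df.groupby('party')[bill].mean()

# Made-up stand-in rows (assumption: same column names as the worksheet's file)
toy = pd.DataFrame({
    'party': ['democrat', 'democrat', 'democrat',
              'republican', 'republican', 'republican'],
    'sat_test': [1, 1, 0, 0, 0, 1],
    'mx_missile': [1, 1, 1, 0, 0, 0],
})

sat_share = vote_share_by_party(toy, 'sat_test')
missile_share = vote_share_by_party(toy, 'mx_missile')
print(sat_share)
print(missile_share)
```

Calling the same helper on the real DataFrame, once per bill, would give the Yes share for each party, which can be compared against the bar heights in the plots.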
3 Predict the party affiliation of the House member whose votes have been recorded in the file named x_new.csv. Write
the code here to achieve the following output:
Party Prediction: ['democrat'/'republican']
Code and Output
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
votes = pd.read_csv('house-votes-84 (edited).csv')
x_new = pd.read_csv('x_new.csv')
# Target is the party label; features are all remaining vote columns
y = votes['party'].values
x = votes.drop('party', axis=1).values
knn = KNeighborsClassifier(n_neighbors=16)
knn.fit(x, y)
# Pass a plain array so the input matches the array the model was fitted on
prediction = knn.predict(x_new.values)
print("Party Prediction: {}".format(prediction))
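The shape of the input to predict is easy to get wrong, so a small illustration may help. This is a minimal sketch on made-up 0/1 vote records (three hypothetical bills, six hypothetical members), not the real dataset, and it uses n_neighbors=3 only because the toy data is so small.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up 0/1 vote records: rows are members, columns are bills.
X = np.array([[1, 1, 0], [1, 1, 1], [1, 0, 0],
              [0, 0, 1], [0, 1, 1], [0, 0, 0]])
y = np.array(['democrat', 'democrat', 'democrat',
              'republican', 'republican', 'republican'])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# predict expects a 2-D array: one row per member, same columns as X.
x_new = np.array([[1, 1, 0]])
prediction = knn.predict(x_new)
print("Party Prediction: {}".format(prediction))
```

The result is an array with one label per input row, which is why the printed prediction appears inside square brackets.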
4 Use train_test_split from sklearn on your House votes data. Use 70% of the data for training and the rest for
testing. Add the following arguments to train_test_split: random_state = 21, stratify = y. Print out the
predictions for the test set and the model score.
Code and Output
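from sklearn.model_selection import train_test_split
# votes, y and KNeighborsClassifier come from the Problem 3 code
x = votes.drop('party', axis=1).values
# test_size=0.3 holds out 30% for testing, leaving 70% for training
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=21, stratify=y)
knn = KNeighborsClassifier(n_neighbors=16)
knn.fit(x_train, y_train)
print(knn.predict(x_test))
print(knn.score(x_test, y_test))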
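To see what the test_size and stratify arguments do, the split sizes and class proportions can be inspected directly. This is a minimal sketch on made-up labels (60 'democrat', 40 'republican') with placeholder features, not the real house votes data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 60 'democrat' and 40 'republican'
y = np.array(['democrat'] * 60 + ['republican'] * 40)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y)

# test_size is the held-out fraction, so 70% of the rows go to training.
print(len(y_train), len(y_test))

# stratify=y keeps the class proportions the same in both parts.
train_share = np.mean(y_train == 'democrat')
test_share = np.mean(y_test == 'democrat')
print(train_share, test_share)
```

Setting test_size=0.7 instead would reverse the split and train on only 30% of the rows.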
5 Import the gapminder file. Perform regression on the data (life expectancy as a function of fertility). Prepare a plot showing
the data points (in blue) and the linear model (in red). Print out the regression score.
Code and Output
import pandas as pd
gapminder = pd.read_csv('gapminder_p06.csv')
x = gapminder['fertility'].values
y = gapminder['life_exp'].values
print("x before reshaping: {}".format(x.shape))
print("y before reshaping: {}".format(y.shape))
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
print("x after reshaping: {}".format(x.shape))
print("y after reshaping: {}".format(y.shape))
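The code above stops after reshaping, while the problem also asks for the fit, the plot and the score. The following is a minimal sketch of those remaining steps, assuming scikit-learn's LinearRegression is the intended model. The helper name fit_and_plot is hypothetical, and the data is randomly generated stand-in data, not the gapminder file.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

def fit_and_plot(x, y):
    """Fit y ~ x by ordinary least squares; draw the data and the fitted line."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)  # one feature column
    y = np.asarray(y, dtype=float).ravel()
    reg = LinearRegression().fit(x, y)

    # Evaluate the model on an evenly spaced grid spanning the data range
    grid = np.linspace(x.min(), x.max(), 100).reshape(-1, 1)
    fig, ax = plt.subplots()
    ax.scatter(x, y, color='blue', label='data')
    ax.plot(grid, reg.predict(grid), color='red', linewidth=2,
            label='linear model')
    ax.set_xlabel('fertility')
    ax.set_ylabel('life expectancy')
    ax.legend()
    # score returns R^2, here computed on the same data used for fitting
    return reg, reg.score(x, y), ax

# Made-up stand-in data (assumption: a roughly linear, decreasing relation)
rng = np.random.default_rng(0)
fertility = rng.uniform(1.0, 7.0, size=50)
life_exp = 83.0 - 4.5 * fertility + rng.normal(0.0, 3.0, size=50)

reg, r2, ax = fit_and_plot(fertility, life_exp)
print("R^2 on the stand-in data: {:.3f}".format(r2))
```

For the worksheet's data, the same helper could be called with gapminder['fertility'] and gapminder['life_exp'], followed by plt.show() to display the figure in the notebook.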