Professional Documents
Culture Documents
This function just merges the OECD's life satisfaction data and the IMF's GDP per capita data. It's a bit too long and boring and it's not
specific to Machine Learning, which is why I left it out of the book.
The code in the book expects the data files to be located in the current directory. I just tweaked it here to fetch the files in datasets/lifesat.
import os
datapath = os.path.join("datasets", "lifesat", "")
Downloading oecd_bli_2015.csv
Downloading gdp_per_capita.csv
# Code example
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model
[[5.96242338]]
Note: you can ignore the rest of this notebook, it just generates
many of the figures in chapter 1.
np.random.seed(42)
Household
Dwellings Employees Household
Consultation net Long-term
Air Assault without Educational working Employment Homicide net
Indicator on rule- adjusted ... unemployment
pollution rate basic attainment very long rate rate financial
making disposable rate
facilities hours wealth
income
Country
Australia 13.0 2.1 10.5 1.1 76.0 14.02 72.0 0.8 31588.0 47657.0 ... 1.08
Austria 27.0 3.4 7.1 1.0 83.0 7.61 72.0 0.4 31173.0 49887.0 ... 1.19
2 rows × 24 columns
oecd_bli["Life satisfaction"].head()
Country
Australia 7.3
Austria 6.9
Belgium 6.9
Brazil 7.0
Canada 7.3
Name: Life satisfaction, dtype: float64
Country
Gross domestic product per capita, current U.S. See notes for: Gross domestic product,
Afghanistan Units 599.994 2013.0
prices dollars curren...
Gross domestic product per capita, current U.S. See notes for: Gross domestic product,
Albania Units 3995.383 2010.0
prices dollars curren...
full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
full_country_stats.sort_values(by="GDP per capita", inplace=True)
full_country_stats
Time
Household devoted
Dwellings Employees Household
Consultation net to
Air Assault without Educational working Employment Homicide net
on rule- adjusted ... leisure
pollution rate basic attainment very long rate rate financial turno
making disposable and
facilities hours wealth
income personal
care
Country
Brazil 18.0 7.9 4.0 6.7 45.0 10.41 67.0 25.5 11664.0 6844.0 ... 14.97
Mexico 30.0 12.8 9.0 4.2 37.0 28.83 61.0 23.4 13085.0 9056.0 ... 13.89
Russia 15.0 3.8 2.5 15.1 94.0 0.16 69.0 12.8 19292.0 3412.0 ... 14.97
Turkey 35.0 5.0 5.5 12.7 34.0 40.86 50.0 1.2 14095.0 3251.0 ... 13.42
Hungary 15.0 3.6 7.9 4.8 82.0 3.19 58.0 1.3 15442.0 13277.0 ... 15.04
Poland 33.0 1.4 10.8 3.2 90.0 7.41 60.0 0.9 17852.0 10919.0 ... 14.20
Chile 46.0 6.9 2.0 9.4 57.0 15.42 62.0 4.4 14533.0 17733.0 ... 14.41
Slovak
13.0 3.0 6.6 0.6 92.0 7.02 60.0 1.2 17503.0 8663.0 ... 14.99
Republic
Czech
16.0 2.8 6.8 0.9 92.0 6.98 68.0 0.8 18404.0 17299.0 ... 14.98
Republic
Estonia 9.0 5.5 3.3 8.1 90.0 3.30 68.0 4.8 15167.0 7680.0 ... 14.90
Greece 27.0 3.7 6.5 0.7 68.0 6.16 49.0 1.6 18575.0 14579.0 ... 14.91
Portugal 18.0 5.7 6.5 0.9 38.0 9.62 61.0 1.1 20086.0 31245.0 ... 14.95
Slovenia 26.0 3.9 10.3 0.5 85.0 5.63 63.0 0.4 19326.0 18465.0 ... 14.62
Spain 24.0 4.2 7.3 0.1 55.0 5.89 56.0 0.6 22477.0 24774.0 ... 16.06
Korea 30.0 2.1 10.4 4.2 82.0 18.72 64.0 1.1 19510.0 29091.0 ... 14.63
Italy 21.0 4.7 5.0 1.1 57.0 3.66 56.0 0.7 25166.0 54987.0 ... 14.98
Japan 24.0 1.4 7.3 6.4 94.0 22.26 72.0 0.3 26111.0 86764.0 ... 14.93
Israel 21.0 6.4 2.5 3.7 85.0 16.03 67.0 2.3 22104.0 52933.0 ... 14.48
New
11.0 2.2 10.3 0.2 74.0 13.87 73.0 1.2 23815.0 28290.0 ... 14.87
Zealand
France 12.0 5.0 3.5 0.5 73.0 8.15 64.0 0.6 28799.0 48741.0 ... 15.33
Belgium 21.0 6.6 4.5 2.0 72.0 4.57 62.0 1.1 28307.0 83876.0 ... 15.71
Germany 16.0 3.6 4.5 0.1 86.0 5.25 73.0 0.5 31252.0 50394.0 ... 15.31
Finland 15.0 2.4 9.0 0.6 85.0 3.58 69.0 1.4 27927.0 18761.0 ... 14.89
Canada 15.0 1.3 10.5 0.2 89.0 3.94 72.0 1.5 29365.0 67913.0 ... 14.25
Netherlands 30.0 4.9 6.1 0.0 73.0 0.45 74.0 0.9 27888.0 77961.0 ... 15.44
Austria 27.0 3.4 7.1 1.0 83.0 7.61 72.0 0.4 31173.0 49887.0 ... 14.46
United
13.0 1.9 11.5 0.2 78.0 12.70 71.0 0.3 27029.0 60778.0 ... 14.83
Kingdom
Sweden 10.0 5.1 10.9 0.0 88.0 1.13 74.0 0.7 29185.0 60328.0 ... 15.11
Iceland 18.0 2.7 5.1 0.4 71.0 12.25 82.0 0.3 23965.0 43045.0 ... 14.61
Australia 13.0 2.1 10.5 1.1 76.0 14.02 72.0 0.8 31588.0 47657.0 ... 14.41
Ireland 13.0 2.6 9.0 0.2 75.0 4.20 60.0 0.8 23917.0 31580.0 ... 15.19
Denmark 15.0 3.9 7.0 0.9 78.0 2.03 73.0 0.3 26491.0 44488.0 ... 16.06
United
18.0 1.5 8.3 0.1 89.0 11.30 67.0 5.2 41355.0 145769.0 ... 14.27
States
Norway 16.0 3.3 8.1 0.3 82.0 2.82 75.0 0.6 33492.0 8797.0 ... 15.56
Switzerland 20.0 4.2 8.4 0.0 86.0 6.72 80.0 0.5 33491.0 108823.0 ... 14.98
Luxembourg 12.0 4.3 6.0 0.1 78.0 3.47 66.0 0.4 38951.0 61765.0 ... 15.12
36 rows × 30 columns
sample_data.loc[list(position_text.keys())]
Country
import numpy as np
(4.853052800266436, 4.911544589158484e-05)
22587.49
5.96244744318815
Country
(5.1+5.7+6.5)/3
5.766666666666667
# Code example
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model
[[5.96242338]]
missing_data
Country
position_text2 = {
"Brazil": (1000, 9.0),
"Mexico": (11000, 9.0),
"Chile": (25000, 9.0),
"Czech Republic": (35000, 9.0),
"Norway": (60000, 3),
"Switzerland": (72000, 3.0),
"Luxembourg": (90000, 3.0),
}
lin_reg_full = linear_model.LinearRegression()
Xfull = np.c_[full_country_stats["GDP per capita"]]
yfull = np.c_[full_country_stats["Life satisfaction"]]
lin_reg_full.fit(Xfull, yfull)
save_fig('representative_training_data_scatterplot')
plt.show()
Country
New Zealand 7.3
Sweden 7.2
Norway 7.4
Switzerland 7.5
Name: Life satisfaction, dtype: float64
Country
Gross domestic product per capita, current U.S. See notes for: Gross domestic product,
Botswana Units 6040.957 2008.0
prices dollars curren...
Gross domestic product per capita, current U.S. See notes for: Gross domestic product,
Kuwait Units 29363.027 2014.0
prices dollars curren...
Gross domestic product per capita, current U.S. See notes for: Gross domestic product,
Malawi Units 354.275 2011.0
prices dollars curren...
New Gross domestic product per capita, current U.S. See notes for: Gross domestic product,
Units 37044.891 2015.0
Zealand prices dollars curren...
Gross domestic product per capita, current U.S. See notes for: Gross domestic product,
Norway Units 74822.106 2015.0
prices dollars curren...
plt.figure(figsize=(8,3))
ridge = linear_model.Ridge(alpha=10**9.5)
Xsample = np.c_[sample_data["GDP per capita"]]
ysample = np.c_[sample_data["Life satisfaction"]]
ridge.fit(Xsample, ysample)
t0ridge, t1ridge = ridge.intercept_[0], ridge.coef_[0][0]
plt.plot(X, t0ridge + t1ridge * X, "b", label="Regularized linear model on partial data")
plt.legend(loc="lower right")
plt.axis([0, 110000, 0, 10])
plt.xlabel("GDP per capita (USD)")
save_fig('ridge_model_plot')
plt.show()
[[5.76666667]]