
{

"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Welcome to the second hands-on exercise on linear regression\n",
"\n",
"In this exercise, you will try out multiple linear regression using statsmodels, which you learnt about in the course. This Python notebook contains everything needed to complete the exercise.\n",
"\n",
"To run the code in a cell, click on the cell and press **Shift + Enter**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Run the cell below to load the Boston dataset**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \\\n",
"0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0   \n",
"1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0   \n",
"2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0   \n",
"3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0   \n",
"4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0   \n",
"\n",
"   PTRATIO       B  LSTAT  target  \n",
"0     15.3  396.90   4.98    24.0  \n",
"1     17.8  396.90   9.14    21.6  \n",
"2     17.8  392.83   4.03    34.7  \n",
"3     18.7  394.63   2.94    33.4  \n",
"4     18.7  396.90   5.33    36.2  \n"
]
}
],
"source": [
"from sklearn.datasets import load_boston\n",
"import pandas as pd\n",
"boston = load_boston()\n",
"dataset = pd.DataFrame(data=boston.data, columns=boston.feature_names)\n",
"dataset['target'] = boston.target\n",
"print(dataset.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Follow the steps in sequence to extract the features and the target**\n",
"\n",
"- Create a dataframe named 'X' that includes all the feature columns, dropping the target column.\n",
"- Assign the 'target' column to the variable Y."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"X = dataset.drop(['target'], axis=1)\n",
"Y = dataset['target']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Follow the steps in sequence to find the correlation value**\n",
"- The dataframe X now contains only the features that influence the target.\n",
"- Print the correlation matrix for dataframe X, using the '.corr()' function to compute it.\n",
"- From the correlation matrix, note the correlation value between 'CRIM' and 'PTRATIO' and assign it to the variable 'corr_value', rounded to 2 decimal places."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"corr_value = round(X[['CRIM', 'PTRATIO']].corr()['CRIM']['PTRATIO'], 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Follow the steps in sequence to initialise and fit the model**\n",
"- Import statsmodels.api as sm.\n",
"- Initialise the OLS model with the target Y and the dataframe X (features).\n",
"- Fit the model and print the summary."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table class=\"simpletable\">\n",
"<caption>OLS Regression Results</caption>\n",
"<tr>\n",
" <th>Dep. Variable:</th> <td>target</td> <th> R-squared (uncentered):</th> <td> 0.959</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Model:</th> <td>OLS</td> <th> Adj. R-squared (uncentered):</th> <td> 0.958</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Method:</th> <td>Least Squares</td> <th> F-statistic: </th> <td> 891.1</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Date:</th> <td>Sun, 28 Aug 2022</td> <th> Prob (F-statistic):</th> <td> 0.00</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Time:</th> <td>00:42:40</td> <th> Log-Likelihood: </th> <td> -1523.8</td>\n",
"</tr>\n",
"<tr>\n",
" <th>No. Observations:</th> <td> 506</td> <th> AIC: </th> <td> 3074.</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Df Residuals:</th> <td> 493</td> <th> BIC: </th> <td> 3129.</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Df Model:</th> <td> 13</td> <th> </th> <td> </td> \n",
"</tr>\n",
"<tr>\n",
" <th>Covariance Type:</th> <td>nonrobust</td> <th> </th> <td> </td> \n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[0.025</th> <th>0.975]</th> \n",
"</tr>\n",
"<tr>\n",
" <th>CRIM</th> <td> -0.0916</td> <td> 0.034</td> <td> -2.675</td> <td> 0.008</td> <td> -0.159</td> <td> -0.024</td>\n",
"</tr>\n",
"<tr>\n",
" <th>ZN</th> <td> 0.0487</td> <td> 0.014</td> <td> 3.379</td> <td> 0.001</td> <td> 0.020</td> <td> 0.077</td>\n",
"</tr>\n",
"<tr>\n",
" <th>INDUS</th> <td> -0.0038</td> <td> 0.064</td> <td> -0.059</td> <td> 0.953</td> <td> -0.130</td> <td> 0.123</td>\n",
"</tr>\n",
"<tr>\n",
" <th>CHAS</th> <td> 2.8564</td> <td> 0.904</td> <td> 3.160</td> <td> 0.002</td> <td> 1.080</td> <td> 4.633</td>\n",
"</tr>\n",
"<tr>\n",
" <th>NOX</th> <td> -2.8808</td> <td> 3.359</td> <td> -0.858</td> <td> 0.392</td> <td> -9.481</td> <td> 3.720</td>\n",
"</tr>\n",
"<tr>\n",
" <th>RM</th> <td> 5.9252</td> <td> 0.309</td> <td> 19.168</td> <td> 0.000</td> <td> 5.318</td> <td> 6.533</td>\n",
"</tr>\n",
"<tr>\n",
" <th>AGE</th> <td> -0.0072</td> <td> 0.014</td> <td> -0.523</td> <td> 0.601</td> <td> -0.034</td> <td> 0.020</td>\n",
"</tr>\n",
"<tr>\n",
" <th>DIS</th> <td> -0.9680</td> <td> 0.196</td> <td> -4.947</td> <td> 0.000</td> <td> -1.352</td> <td> -0.584</td>\n",
"</tr>\n",
"<tr>\n",
" <th>RAD</th> <td> 0.1704</td> <td> 0.067</td> <td> 2.554</td> <td> 0.011</td> <td> 0.039</td> <td> 0.302</td>\n",
"</tr>\n",
"<tr>\n",
" <th>TAX</th> <td> -0.0094</td> <td> 0.004</td> <td> -2.393</td> <td> 0.017</td> <td> -0.017</td> <td> -0.002</td>\n",
"</tr>\n",
"<tr>\n",
" <th>PTRATIO</th> <td> -0.3924</td> <td> 0.110</td> <td> -3.571</td> <td> 0.000</td> <td> -0.608</td> <td> -0.177</td>\n",
"</tr>\n",
"<tr>\n",
" <th>B</th> <td> 0.0150</td> <td> 0.003</td> <td> 5.561</td> <td> 0.000</td> <td> 0.010</td> <td> 0.020</td>\n",
"</tr>\n",
"<tr>\n",
" <th>LSTAT</th> <td> -0.4170</td> <td> 0.051</td> <td> -8.214</td> <td> 0.000</td> <td> -0.517</td> <td> -0.317</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <th>Omnibus:</th> <td>204.050</td> <th> Durbin-Watson: </th> <td> 0.999</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Prob(Omnibus):</th> <td> 0.000</td> <th> Jarque-Bera (JB): </th> <td>1372.527</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Skew:</th> <td> 1.609</td> <th> Prob(JB): </th> <td>9.11e-299</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Kurtosis:</th> <td>10.399</td> <th> Cond. No. </th> <td>8.50e+03</td> \n",
"</tr>\n",
"</table><br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.<br/>[2] The condition number is large, 8.5e+03. This might indicate that there are<br/>strong multicollinearity or other numerical problems."
],
"text/plain": [
"<class 'statsmodels.iolib.summary.Summary'>\n",
"\"\"\"\n",
"                                 OLS Regression Results                                \n",
"=======================================================================================\n",
"Dep. Variable:                 target   R-squared (uncentered):                   0.959\n",
"Model:                            OLS   Adj. R-squared (uncentered):              0.958\n",
"Method:                 Least Squares   F-statistic:                              891.1\n",
"Date:                Sun, 28 Aug 2022   Prob (F-statistic):                        0.00\n",
"Time:                        00:42:40   Log-Likelihood:                         -1523.8\n",
"No. Observations:                 506   AIC:                                      3074.\n",
"Df Residuals:                     493   BIC:                                      3129.\n",
"Df Model:                          13                                                  \n",
"Covariance Type:            nonrobust                                                  \n",
"==============================================================================\n",
"                 coef    std err          t      P>|t|      [0.025      0.975]\n",
"------------------------------------------------------------------------------\n",
"CRIM          -0.0916      0.034     -2.675      0.008      -0.159      -0.024\n",
"ZN             0.0487      0.014      3.379      0.001       0.020       0.077\n",
"INDUS         -0.0038      0.064     -0.059      0.953      -0.130       0.123\n",
"CHAS           2.8564      0.904      3.160      0.002       1.080       4.633\n",
"NOX           -2.8808      3.359     -0.858      0.392      -9.481       3.720\n",
"RM             5.9252      0.309     19.168      0.000       5.318       6.533\n",
"AGE           -0.0072      0.014     -0.523      0.601      -0.034       0.020\n",
"DIS           -0.9680      0.196     -4.947      0.000      -1.352      -0.584\n",
"RAD            0.1704      0.067      2.554      0.011       0.039       0.302\n",
"TAX           -0.0094      0.004     -2.393      0.017      -0.017      -0.002\n",
"PTRATIO       -0.3924      0.110     -3.571      0.000      -0.608      -0.177\n",
"B              0.0150      0.003      5.561      0.000       0.010       0.020\n",
"LSTAT         -0.4170      0.051     -8.214      0.000      -0.517      -0.317\n",
"==============================================================================\n",
"Omnibus:                      204.050   Durbin-Watson:                   0.999\n",
"Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1372.527\n",
"Skew:                           1.609   Prob(JB):                    9.11e-299\n",
"Kurtosis:                      10.399   Cond. No.                     8.50e+03\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"[2] The condition number is large, 8.5e+03. This might indicate that there are\n",
"strong multicollinearity or other numerical problems.\n",
"\"\"\""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import statsmodels.api as sm\n",
"fitted_model = sm.OLS(Y, X).fit()\n",
"fitted_model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Find the r_squared value**\n",
"- From the summary report, note the R-squared value and assign it to the variable 'r_squared', rounded to 2 decimal places."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"###Start code here\n",
"r_squared = 0.96\n",
"###End code(approx 1 line)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run the cell below without modifying it to save your answers\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"246c0903b5a64b2a854ec1e7865f174f\n",
"e93bb0ef149f78aeae0eab58c5a28758\n"
]
}
],
"source": [
"import hashlib\n",
"import pickle\n",
"def gethex(ovalue):\n",
"    hexresult=hashlib.md5(str(ovalue).encode())\n",
"    return hexresult.hexdigest()\n",
"def pickle_ans1(value):\n",
"    hexresult=gethex(value)\n",
"    with open('ans/output1.pkl', 'wb') as file:\n",
"        hexresult=gethex(value)\n",
"        print(hexresult)\n",
"        pickle.dump(hexresult,file)\n",
"def pickle_ans2(value):\n",
"    hexresult=gethex(value)\n",
"    with open('ans/output2.pkl', 'wb') as file:\n",
"        hexresult=gethex(value)\n",
"        print(hexresult)\n",
"        pickle.dump(hexresult,file)\n",
"pickle_ans1(corr_value)\n",
"pickle_ans2(r_squared)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
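
The summary in this notebook reports `R-squared (uncentered)` because `sm.OLS(Y, X)` is fitted without an intercept: statsmodels does not add a constant column on its own, and for a model without one it measures the fit against the raw sum of squares of the target rather than against the variation around its mean. The sketch below is a minimal pure-Python illustration of that difference. The `x` and `y` values and the helper names are hypothetical and are not taken from the Boston dataset. It fits a single-feature regression through the origin and computes both versions of R-squared.

```python
# Hypothetical toy data (an assumption for illustration): y has a large
# baseline that a model without an intercept cannot capture.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 13.0, 15.0, 16.0, 18.0]


def fit_through_origin(x, y):
    """Least-squares slope b for the no-intercept model y = b * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def r_squared(x, y, slope, centered):
    """R-squared of y = slope * x, against a centered or uncentered total."""
    # Sum of squared residuals of the fitted line.
    ssr = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    if centered:
        # Total variation around the mean of y (the usual definition).
        mean_y = sum(y) / len(y)
        total = sum((yi - mean_y) ** 2 for yi in y)
    else:
        # Raw sum of squares of y (the "uncentered" definition).
        total = sum(yi ** 2 for yi in y)
    return 1.0 - ssr / total


slope = fit_through_origin(x, y)
r2_uncentered = r_squared(x, y, slope, centered=False)
r2_centered = r_squared(x, y, slope, centered=True)
print(round(r2_uncentered, 2), round(r2_centered, 2))
```

On this toy data the uncentered value is high while the centered value is negative, so an uncentered R-squared should not be compared directly with the R-squared of a model that has an intercept. In statsmodels, an intercept can be included by passing `sm.add_constant(X)` instead of `X`.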
