
Dummy Regression

January 30, 2021

[18]: import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

[19]: dumy = pd.read_excel(r'C:\Users\Nazakat ali\Desktop\python\Dumy.xlsx')
dumy

[19]: x1 x2 subjects y
0 3 3 Math 50
1 4 4 Math 42
2 6 5 Stat 60
3 3 9 Eco 24
4 6 5 Stat 56
5 2 1 Math 43
6 6 4 Eco 57
7 2 6 Eco 34
8 1 1 Stat 56
9 4 3 Math 43
10 1 6 Stat 35
11 3 3 Math 67

1 Add dummies to the data frame for subjects

[20]: dumies = pd.get_dummies(dumy['subjects'])
dumies

[20]: Eco Math Stat
0 0 1 0
1 0 1 0
2 0 0 1
3 1 0 0
4 0 0 1
5 0 1 0
6 1 0 0
7 1 0 0
8 0 0 1
9 0 1 0
10 0 0 1
11 0 1 0
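A note on the encoding: because the regression below also includes a constant, keeping all three dummies (Eco, Math, Stat) would make them sum to the intercept column, the dummy-variable trap; the fit below includes only Eco and Stat for that reason, leaving Math as the baseline. The `drop_first=True` option automates that choice. A minimal sketch, with the subjects column rebuilt inline from the table above:

```python
import pandas as pd

# Reconstruction of the subjects column shown in the table above
subjects = pd.Series(['Math', 'Math', 'Stat', 'Eco', 'Stat', 'Math',
                      'Eco', 'Eco', 'Stat', 'Math', 'Stat', 'Math'],
                     name='subjects')

# drop_first=True drops the first level alphabetically (Eco here), so it
# becomes the baseline and the dummy-variable trap is avoided automatically
dummies_k1 = pd.get_dummies(subjects, drop_first=True)
print(dummies_k1.columns.tolist())  # ['Math', 'Stat']
```

Whichever level is dropped, the fitted values are the same; only the interpretation of the dummy coefficients (differences from the baseline level) changes.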

[24]: dumy1 = pd.concat([dumy, dumies], axis=1)
dumy1

[24]: x1 x2 subjects y Eco Math Stat


0 3 3 Math 50 0 1 0
1 4 4 Math 42 0 1 0
2 6 5 Stat 60 0 0 1
3 3 9 Eco 24 1 0 0
4 6 5 Stat 56 0 0 1
5 2 1 Math 43 0 1 0
6 6 4 Eco 57 1 0 0
7 2 6 Eco 34 1 0 0
8 1 1 Stat 56 0 0 1
9 4 3 Math 43 0 1 0
10 1 6 Stat 35 0 0 1
11 3 3 Math 67 0 1 0

[25]: dumy1.drop(['subjects'], inplace=True, axis=1)
dumy1

[25]: x1 x2 y Eco Math Stat


0 3 3 50 0 1 0
1 4 4 42 0 1 0
2 6 5 60 0 0 1
3 3 9 24 1 0 0
4 6 5 56 0 0 1
5 2 1 43 0 1 0
6 6 4 57 1 0 0
7 2 6 34 1 0 0
8 1 1 56 0 0 1
9 4 3 43 0 1 0
10 1 6 35 0 0 1
11 3 3 67 0 1 0
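The concat-and-drop pair above can also be collapsed into a single call: `pd.get_dummies` accepts the whole data frame with a `columns=` list, encoding those columns and dropping the originals in one step (the new columns then carry a `subjects_` prefix). A sketch, with the data rebuilt inline since the Excel path is machine-specific:

```python
import pandas as pd

# Same data as the Dumy.xlsx sheet shown above, rebuilt inline
dumy = pd.DataFrame({
    'x1': [3, 4, 6, 3, 6, 2, 6, 2, 1, 4, 1, 3],
    'x2': [3, 4, 5, 9, 5, 1, 4, 6, 1, 3, 6, 3],
    'subjects': ['Math', 'Math', 'Stat', 'Eco', 'Stat', 'Math',
                 'Eco', 'Eco', 'Stat', 'Math', 'Stat', 'Math'],
    'y': [50, 42, 60, 24, 56, 43, 57, 34, 56, 43, 35, 67],
})

# columns= encodes the listed columns and removes the originals in one call,
# replacing the pd.concat + drop pair above
dumy1 = pd.get_dummies(dumy, columns=['subjects'])
print(dumy1.columns.tolist())
```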

[41]: plt.scatter(dumy1['x1'], dumy1['y'], color='g')

[41]: <matplotlib.collections.PathCollection at 0x2c0408da1f0>
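The bare `PathCollection` line above is just the unlabeled return value of `plt.scatter`. A sketch of the same plot with axis labels (data values copied from the table above; the non-interactive Agg backend is used so the sketch runs headless):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# x1 and y copied from the dumy1 table above
x1 = [3, 4, 6, 3, 6, 2, 6, 2, 1, 4, 1, 3]
y = [50, 42, 60, 24, 56, 43, 57, 34, 56, 43, 35, 67]

# Labeled axes make the scatter readable on its own
fig, ax = plt.subplots()
ax.scatter(x1, y, color='g')
ax.set_xlabel('x1')
ax.set_ylabel('y')
ax.set_title('y versus x1')
```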

2 OLS summary: fitting the regression


[36]: fit1 = sm.OLS(dumy1['y'],
sm.add_constant(dumy1[['x1', 'x2', 'Eco', 'Stat']])).fit()
print(fit1.summary())

OLS Regression Results


==============================================================================
Dep. Variable: y R-squared: 0.673
Model: OLS Adj. R-squared: 0.487
Method: Least Squares F-statistic: 3.607
Date: Sat, 30 Jan 2021 Prob (F-statistic): 0.0669
Time: 21:44:58 Log-Likelihood: -40.168
No. Observations: 12 AIC: 90.34
Df Residuals: 7 BIC: 92.76
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 48.2204 7.367 6.545 0.000 30.800 65.641
x1 3.6008 1.502 2.398 0.048 0.049 7.152
x2 -3.8367 1.598 -2.401 0.047 -7.615 -0.058

Eco 1.2094 8.643 0.140 0.893 -19.227 21.646
Stat 7.2330 6.467 1.118 0.300 -8.060 22.526
==============================================================================
Omnibus: 13.930 Durbin-Watson: 1.451
Prob(Omnibus): 0.001 Jarque-Bera (JB): 8.700
Skew: 1.562 Prob(JB): 0.0129
Kurtosis: 5.763 Cond. No. 21.9
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
C:\Users\Nazakat ali\anaconda3\lib\site-packages\scipy\stats\stats.py:1603:
UserWarning: kurtosistest only valid for n>=20 … continuing anyway, n=12
warnings.warn("kurtosistest only valid for n>=20 … continuing "
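The same model can also be fitted without building dummy columns at all, via the formula interface already imported above: `C(subjects)` generates the dummy coding automatically, absorbing one level into the intercept. A sketch with the data rebuilt inline (the Excel path is machine-specific); because the column space is identical, R-squared and the `x1`/`x2` coefficients should match the manual fit, while the dummy coefficients differ only by the choice of baseline level (Eco here, versus Math above):

```python
import pandas as pd
from statsmodels.formula.api import ols

# Same data as the Dumy.xlsx sheet shown above, rebuilt inline
dumy = pd.DataFrame({
    'x1': [3, 4, 6, 3, 6, 2, 6, 2, 1, 4, 1, 3],
    'x2': [3, 4, 5, 9, 5, 1, 4, 6, 1, 3, 6, 3],
    'subjects': ['Math', 'Math', 'Stat', 'Eco', 'Stat', 'Math',
                 'Eco', 'Eco', 'Stat', 'Math', 'Stat', 'Math'],
    'y': [50, 42, 60, 24, 56, 43, 57, 34, 56, 43, 35, 67],
})

# C(subjects) builds the dummy coding automatically, with the first level
# (Eco) absorbed into the intercept as the baseline
fit2 = ols('y ~ x1 + x2 + C(subjects)', data=dumy).fit()
print(fit2.summary())
```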

[37]: residual = fit1.resid
probplt = sm.ProbPlot(residual, stats.norm, fit=True)
fig = probplt.qqplot(line='45')
h = plt.title('qqplot of residuals')
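Given the kurtosistest warning above (n = 12 is small for the omnibus test), a Shapiro-Wilk test is a common small-sample complement to the QQ-plot. A sketch with placeholder residuals standing in for `fit1.resid`:

```python
import numpy as np
from scipy import stats

# Placeholder residuals standing in for fit1.resid in this sketch;
# with only 12 observations, Shapiro-Wilk is better suited than the
# kurtosis-based omnibus test that triggered the warning above
rng = np.random.default_rng(0)
resid = rng.normal(size=12)

# Null hypothesis: the residuals are drawn from a normal distribution;
# a small p-value would suggest non-normality
stat_w, p_value = stats.shapiro(resid)
print(f'W = {stat_w:.3f}, p = {p_value:.3f}')
```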
