DIT W16 Regression

Statistics For Data Science
Topic: Simple Linear Regression
Dr. Fode Tounkara
May 26, 2022
Dr. Fode Tounkara Statistics For Data Science May 26, 2022 1 / 58
This week: Regression and Correlation
Overview
Scatterplot and Correlation
Least-Squares Line
Statistical Inference about the Regression Slope
Analysis of Variance (ANOVA)
Regression Diagnostic
Introduction
Regression is a method for studying the relationship between two or more quantitative variables
Simple linear regression (SLR):
- One quantitative dependent variable
  - response variable
  - Y
- One quantitative independent variable
  - explanatory variable
  - predictor variable
  - X
SLR Examples
- Predict salary from years of experience
- Relationship between a child's height and their head circumference
- Predict car speed from gas mileage
Example: Height and Weight
Variables:
X: Height (measured in inches)
Y: Weight (measured in pounds)
Data: SRS of 9 students
Height Weight
60 84
62 95
64 140
66 155
68 119
70 175
72 145
74 197
76 150
Scatterplot and Correlation
Relationship among quantitative variables
Suppose we have two quantitative variables X (say, income) and Y (say, expense).
We want to check whether there is any relationship between them.
Let (x1, x2, ..., xn) and (y1, y2, ..., yn) be the two corresponding data vectors.
- x1 is the income of the first individual and y1 is his/her expense.
A visual display of these two vectors can be obtained by drawing a scatter plot.
Plotting the yi's against the xi's, for i = 1, 2, ..., n, gives the scatter plot.
Recall: a scatter plot suggests the direction and strength of the correlation between X and Y.
Scatterplot
A scatterplot, also called a scattergraph or scatter diagram, is a plot
of the data points in a set. It plots data that takes two variables into
account at the same time. Here are some examples of scatterplots:
Example : Student’s Heights and their Weights
R-code:
plot(Weight~Height, xlab="Height", pch=18, cex=3,
cex.lab=1.8, col="blue")
R-output:
[Figure: scatterplot of Weight (y-axis, 80 to 200) against Height (x-axis, 60 to 75)]
Pearson correlation coefficient
Source: http://www.pythagorasandthat.co.uk/scatter-graphs
Think of a hypothetical line that goes through the points.
Direction of the line:
- the line is going upward ⇒ the correlation is positive.
- the line is going downward ⇒ the correlation is negative.
Closeness of the points to the line suggests the strength of the correlation:
- points are closely clustered around the line ⇒ strong correlation
- points are not so close to the line ⇒ moderate/weak correlation
If the points look totally random ⇒ no relationship between X and Y
The correlation coefficient r measures the strength and direction of the linear relationship between two variables.
It is a unit-free number that ranges from −1 to 1.
r = −1 ⇒ perfect negative correlation (all the points lie exactly on a downward line)
r = 1 ⇒ perfect positive correlation (all the points lie exactly on an upward line)
r = 0 ⇒ zero correlation (no linear relationship).
Correlation coefficient calculated from a sample:
\[ r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}} \]
Example 1: Height and Weight
R-code:
cor(Height,Weight, method="pearson")
R-output:
[1] 0.7586069
Interpretation: There is a strong positive linear relationship between students' heights and their weights.
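The sample formula for r can also be checked by hand. The following is a minimal sketch in plain Python (standard library only), assuming the same 9 student measurements; the function name `pearson_r` is illustrative and not part of the original slides. Its result should agree with the value reported by R's cor() above, up to rounding.

```python
import math

# Height (inches) and Weight (pounds) of the 9 students in the example
height = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weight = [84, 95, 140, 155, 119, 175, 145, 197, 150]

def pearson_r(x, y):
    """Sample Pearson correlation, computed directly from the formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(height, weight)
print(f"r = {r:.4f}")
```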
Least square regression
Least square regression (1)
Let y = b1 + b2 x be the equation of the hypothetical line that we imagine going through the points.
(yi − b1 − b2 xi) is the deviation of yi from the line.
Source: https://medium.com/statistical-guess/least-square-regression-34aaef3f76ec
Least square regression (2)
Least square regression is the technique of finding the line (in other words, finding b1 and b2) that minimizes the sum of the squared deviations,
\[ \sum_{i=1}^{n} (y_i - b_1 - b_2 x_i)^2 \]
Differentiating this expression with respect to b1 and b2 and equating to zero gives us:
\[ b_1 = \bar{y} - b_2 \bar{x} \]
\[ b_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \]
Example:
   x       y    (x − x̄)  (x − x̄)²  (y − ȳ)  (y − ȳ)²  (x − x̄)(y − ȳ)
  3.9     8.9     2.9      8.41      5.51     30.360       15.979
  2.6     7.1     1.6      2.56      3.71     13.764        5.936
  2.4     4.6     1.4      1.96      1.21      1.464        1.694
  4.1    10.7     3.1      9.61      7.31     53.436       22.661
 −0.2     1.0    −1.2      1.44     −2.39      5.712        2.868
  5.4    12.6     4.4     19.36      9.21     84.824       40.524
  0.6     3.3    −0.4      0.16     −0.09      0.008        0.036
 −5.6   −10.4    −6.6     43.56    −13.79    190.164       91.014
 −1.1    −2.3    −2.1      4.41     −5.69     32.376       11.949
 −2.1    −1.6    −3.1      9.61     −4.99     24.900       15.469
x̄ = 1, ȳ = 3.39, Σ(x − x̄)² = 101.08, Σ(y − ȳ)² = 437.009, Σ(x − x̄)(y − ȳ) = 208.13
Therefore,
b2 = 208.13 / 101.08 = 2.059062 ≈ 2.059 and
b1 = 3.39 − 2.059062 × 1 = 1.330938 ≈ 1.331
The least square regression line is: y = 1.331 + 2.059x
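The hand calculation in the table can be reproduced in a few lines. This is a sketch in plain Python rather than R, assuming the ten (x, y) pairs listed in the table; the helper name `least_squares` is hypothetical. It applies the closed-form expressions for b1 and b2 directly.

```python
# The ten (x, y) pairs from the worked example
x = [3.9, 2.6, 2.4, 4.1, -0.2, 5.4, 0.6, -5.6, -1.1, -2.1]
y = [8.9, 7.1, 4.6, 10.7, 1.0, 12.6, 3.3, -10.4, -2.3, -1.6]

def least_squares(x, y):
    """Return (b1, b2): intercept and slope of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx           # slope
    b1 = y_bar - b2 * x_bar    # intercept
    return b1, b2

b1, b2 = least_squares(x, y)
print(f"y = {b1:.3f} + {b2:.3f} x")
```

The printed coefficients should match the rounded values 1.331 and 2.059 obtained in the table.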
Some comments:
Least square regression doesn't require any distributional assumption.
It is more like how we would fit a linear regression line if we had all the population-level data.
I found this page online which explains the concept of least squares interactively: https://setosa.io/ev/ordinary-least-squares-regression/
On the second graph of this page, try changing the intercept or slope value and see what happens graphically.
Least square estimates in R

R-code:
Height=c(60, 62, 64, 66, 68, 70, 72, 74, 76)
Weight=c(84, 95, 140, 155, 119, 175, 145, 197, 150)
model1=lm(Weight~Height)
coef(model1)
R-output:
(Intercept) Height
-200 5
The least-squares estimates are: b1 =-200 and b2 =5.
Thus, the least-squares equation is: y = −200 + 5x.
Scatterplot with the regression Line
[Figure: Scatterplot of Height vs. Weight with the fitted regression line; Weight on the y-axis (80 to 200), Height on the x-axis (60 to 75)]
Linear Regression under Normal distribution
Let's take a look at a hypothetical example (it may look a bit unrealistic).
X represents income category ($10K, $20K, etc.)
- For the sake of this example, say income can only be $10K or $20K or . . . and nothing in between.
For each category of X, we have the expenses of 10,000 different individuals.
In total we have 50,000 individuals in our population.
Some assumptions:
(Y | X = x) ∼ N(β1 + β2 x, σ²)
The conditional mean of Y is a linear function of X
The conditional variance of Y (σ²) is constant
The yi's are independent
Parameter estimation
The conditional distribution of Y is assumed to be Normal.
E[Yi | Xi = xi] = β1 + β2 xi
var[Yi | Xi = xi] = σ²
The likelihood function of (Y1 = y1, Y2 = y2, ..., Yn = yn) will be a function of (x1, x2, ..., xn), β1, β2 and σ²:
\[ L(\beta_1, \beta_2, \sigma^2 \mid \text{data}) = (2\pi\sigma^2)^{-n/2} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2 \right] \]
For any given σ², this likelihood is maximized when $\sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2$ is minimized.
Hence, the optimization becomes the same as least square regression (which does not involve any Normality assumption).
Therefore,
\[ \hat{\beta}_2 = b_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \quad \text{and} \quad \hat{\beta}_1 = b_1 = \bar{y} - b_2 \bar{x}. \]
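The link between the Normal likelihood and least squares can be illustrated numerically. The sketch below, in plain Python, evaluates the log of the likelihood above for the ten-point numeric example; the value sigma2 = 1.0 and the perturbation size 0.1 are arbitrary choices made only for illustration. For a fixed σ², the log-likelihood at the least-squares estimates should exceed its value at nearby coefficient pairs.

```python
import math

# The ten (x, y) pairs from the numeric example
x = [3.9, 2.6, 2.4, 4.1, -0.2, 5.4, 0.6, -5.6, -1.1, -2.1]
y = [8.9, 7.1, 4.6, 10.7, 1.0, 12.6, 3.3, -10.4, -2.3, -1.6]

def log_likelihood(beta1, beta2, sigma2):
    """Log of the Normal likelihood L(beta1, beta2, sigma2 | data)."""
    n = len(x)
    sse = sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)

b1, b2 = 1.331, 2.059   # rounded least-squares estimates from the example
sigma2 = 1.0            # arbitrary fixed variance, for illustration only

best = log_likelihood(b1, b2, sigma2)
nearby = [log_likelihood(b1 + d1, b2 + d2, sigma2)
          for d1 in (-0.1, 0.1) for d2 in (-0.1, 0.1)]

print(f"log-likelihood at (b1, b2):      {best:.3f}")
print(f"largest value at nearby points: {max(nearby):.3f}")
```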
Interpretation of regression parameters
β1 represents the expected value of Y when X = 0
β2 represents the change in the expected value of Y for a 1-unit increase in X
Using our example:
- β1 is the average expense when income = 0.
- β2 is the change in the average expense when income increases by 1 unit (the unit here is $1000).
Properties of estimators of regression parameters (1)
If we had the population-level data, we would be able to calculate the "true" intercept and slope
- Population parameters: β1 and β2
Instead we observe a sample and calculate estimates of those parameters.
- Estimates: b1 and b2
If we keep taking random samples and keep calculating the intercept and the slope, we will (likely) get different values
- Estimators: B1 and B2 ← these two are random variables.
Recall: µ is the parameter, X̄ is the estimator (a random variable) and x̄ is the value from our sample, i.e. the estimate.
Properties of estimators of regression parameters (2)
We can re-write the equations of the estimators
\[ B_1 = \bar{Y} - B_2 \bar{x} \]
\[ B_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Technical Note: Since we are dealing with a bunch of conditional distributions, Y is the random variable here and x is treated as a fixed constant.
B2 can be expressed as
\[ B_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) Y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Properties of estimators of regression parameters (3)
B2 is a linear combination of the Yi's (which are a bunch of Normal variables). So is B1.
Then both B1 and B2 follow a Normal distribution.
B1 and B2 are unbiased estimators of β1 and β2
- E[B1] = β1
- E[B2] = β2
var[B1] and var[B2] can be calculated and will be functions of σ²
\[ var[B_2] = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
We can write,
\[ B_2 \sim N\left( \beta_2, \; \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right) \]
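The sampling distribution of B2 can be explored by simulation. The following is a rough sketch in plain Python; the "true" parameters (beta1 = 1, beta2 = 2, sigma = 1.5), the fixed design x = 1, ..., 10, and the number of repetitions are all made-up values chosen only for illustration. It repeatedly draws Normal responses at the fixed x values, recomputes the slope, and compares the mean and variance of the simulated slopes with β2 and σ²/Σ(xi − x̄)².

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population parameters and a fixed design (assumptions)
beta1, beta2, sigma = 1.0, 2.0, 1.5
x = list(range(1, 11))
x_bar = sum(x) / len(x)
s_xx = sum((xi - x_bar) ** 2 for xi in x)

def slope(y):
    """Least-squares slope B2 for responses y observed at the fixed x."""
    y_bar = sum(y) / len(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx

# Draw many samples Y_i ~ N(beta1 + beta2 * x_i, sigma^2) and store each slope
slopes = []
for _ in range(20000):
    y = [beta1 + beta2 * xi + random.gauss(0, sigma) for xi in x]
    slopes.append(slope(y))

mean_b2 = statistics.mean(slopes)
var_b2 = statistics.variance(slopes)
theory_var = sigma ** 2 / s_xx

print(f"mean of simulated slopes: {mean_b2:.4f} (beta2 = {beta2})")
print(f"variance of simulated slopes: {var_b2:.5f} (theory: {theory_var:.5f})")
```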
Confidence interval/t-test for the slope
An unbiased estimator of σ² is
\[ S^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - b_1 - b_2 x_i)^2 \]
Then, using the definition of the t-distribution,
\[ \frac{B_2 - \beta_2}{\sqrt{S^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2}} \sim t_{(n-2)} \]
Then a 100(1 − α)% confidence interval for β2 is
\[ B_2 \pm t_{(1-\alpha/2),\,(df = n-2)} \times SE(B_2), \quad \text{where } SE(B_2) = \sqrt{\frac{S^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]
For the numeric example above (x̄ = 1, ȳ = 3.39), b2 = 2.06, SE(B2) = 0.1023, t0.975,(8) = 2.306
95% CI for β2: 2.06 ± 2.306 × 0.1023 = (1.824, 2.296)
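The interval can be recomputed step by step. This is a minimal sketch in plain Python, assuming the ten-point numeric example and taking the critical value t0.975,(8) = 2.306 from the text (the standard library has no t quantile function). Because it uses the unrounded slope instead of 2.06, the end points may differ from the ones above in the third decimal place.

```python
import math

# The ten (x, y) pairs from the numeric example
x = [3.9, 2.6, 2.4, 4.1, -0.2, 5.4, 0.6, -5.6, -1.1, -2.1]
y = [8.9, 7.1, 4.6, 10.7, 1.0, 12.6, 3.3, -10.4, -2.3, -1.6]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# S^2: residual sum of squares divided by n - 2
s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
se_b2 = math.sqrt(s2 / s_xx)

t_crit = 2.306  # t_{0.975, 8}, value quoted in the text
lower = b2 - t_crit * se_b2
upper = b2 + t_crit * se_b2

print(f"SE(B2) = {se_b2:.4f}")
print(f"95% CI for beta2: ({lower:.3f}, {upper:.3f})")
```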
Using R
x = c(3.9,2.6,2.4,4.1,-0.2,5.4,0.6,-5.6,-1.1, -2.1)
y = c(8.9,7.1,4.6,10.7,1.0,12.6,3.3,-10.4,-2.3,-1.6)
y_pred = 1.331 + 2.059*x
S2 = sum((y-y_pred)^2)/8
SE_B2 = sqrt(S2/sum((x-mean(x))^2))
SE_B2
## [1] 0.1022622
Using
\[ T = \frac{B_2 - \beta_2}{\sqrt{S^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2}} \]
as a test statistic, which follows a t(df = n−2) distribution, we can conduct any test of hypothesis like
H0 : β2 = 0
(or some other value)

Using R
summary(lm(y ~ x))
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6727 -0.3960 0.1155 0.6541 1.3931
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.3309 0.3408 3.905 0.00451 **
## x 2.0591 0.1023 20.135 3.86e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.028 on 8 degrees of freedom
## Multiple R-squared: 0.9806, Adjusted R-squared: 0.9782
## F-statistic: 405.4 on 1 and 8 DF, p-value: 3.864e-08
Significance test for the Slope
Hypothesis
H0 : β = 0 (x has no effect on Y) against HA : β ≠ 0
The test statistic
\[ T = \frac{b - 0}{se(b)} \sim t_{n-2} \quad \text{when } H_0 \text{ is true} \]
The rejection region: at significance level α, we reject H0 if |t_obs| > |t_{α/2}|
The p-value is: p-value = P(|T| > |t_obs|)
The test statistic and the p-value can be calculated using R.
R-code:
# Data set 1: Height versus Head Circumference of 11 children
Height = c(27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5)
Circ = c(17.5, 17.1, 17.1,17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5)
Dat1 = data.frame(Height,Circ)
# Head circumference is the response and Height the predictor, matching the output below
model1=lm(Circ~Height, data=Dat1)
coefficients(summary(model1))
R-output:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.4931689 0.72968486 17.121321 3.558749e-08
Height 0.1827324 0.02756116 6.630072 9.590386e-05
H0 : β = 0 (Height has no effect on Head Circumference).
⇒ b = 0.183; se(b) = 0.028; t_obs = 6.63; p-value = 9.590 × 10⁻⁵
At level α = 0.05, since the p-value = 9.590 × 10⁻⁵ < 0.05, we reject the null hypothesis that height has no effect on head circumference.
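The observed t statistic can be rebuilt from the raw data. The sketch below uses plain Python and assumes the 11 height and head circumference measurements from the R code, with head circumference as the response. The critical value 2.262 is the standard table value of t_{0.975} with 9 degrees of freedom and is not quoted in the slides; the exact p-value needs a t-distribution function, so it is left to R.

```python
import math

# Height and head circumference of the 11 children
height = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]
circ = [17.5, 17.1, 17.1, 17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5]
n = len(height)

h_bar = sum(height) / n
c_bar = sum(circ) / n
s_xx = sum((h - h_bar) ** 2 for h in height)
s_xy = sum((h - h_bar) * (c - c_bar) for h, c in zip(height, circ))

b = s_xy / s_xx           # slope of Circ on Height
a = c_bar - b * h_bar     # intercept

# Residual variance estimate and standard error of the slope
sse = sum((c - a - b * h) ** 2 for h, c in zip(height, circ))
se_b = math.sqrt((sse / (n - 2)) / s_xx)

t_obs = (b - 0) / se_b
t_crit = 2.262            # t_{0.975, 9} from a standard t table (alpha = 0.05)
reject_h0 = abs(t_obs) > t_crit

print(f"b = {b:.4f}, se(b) = {se_b:.4f}, t_obs = {t_obs:.2f}")
print("reject H0 at the 5% level:", reject_h0)
```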
Confidence Interval (C.I.)
To test whether there is no linear relationship between Y and x (H0 : β = 0), another approach consists of constructing a C.I. for β.
A confidence interval for β is computed using the following formula:
b ± t* se(b) = (b − t* se(b) ; b + t* se(b)), where
- t* is the t-critical value associated with the confidence level, determined from a t-distribution with df = n − 2;
- b is the least-squares estimate of the population slope, and se(b) is the standard error of b.
Since the alternative is two-sided (HA : β ≠ 0), we reject H0 if the C.I. doesn't contain zero.
We can use the R function confint() to compute the lower and upper limits of the C.I. for β.
1 Find the 95% C.I. for the regression slope (β).
R-code:
confint(model1, level=0.95)
R-output:
2.5 % 97.5 %
(Intercept) 10.8425070 14.1438307
Height 0.1203848 0.2450801
The 95% C.I. for the slope β is: (0.1204 ; 0.2451)
2 Test H0 : β = 0 at level 5 %.
Since the C.I. (0.1204 ; 0.2451) doesn't contain 0, we reject H0 .
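The limits reported by confint() follow directly from b ± t* se(b). As a quick check, the sketch below plugs the estimate and standard error from the R output into that formula in plain Python; the critical value t* = 2.262157 (t_{0.975} with 9 degrees of freedom) is a standard table value that is assumed here, not taken from the slides.

```python
# Estimate and standard error of the slope, copied from the R output
b = 0.1827324
se_b = 0.02756116

# Assumed critical value: t_{0.975} with df = 11 - 2 = 9
t_star = 2.262157

lower = b - t_star * se_b
upper = b + t_star * se_b
contains_zero = lower <= 0 <= upper

print(f"95% C.I. for the slope: ({lower:.4f} ; {upper:.4f})")
print("interval contains 0:", contains_zero)
```

Since the interval excludes zero, the conclusion agrees with the t-test: H0 is rejected at the 5% level.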
ANOVA can be used to assess the SLR goodness of fit.
The ANOVA method decomposes the total variability in the response variable (TSS) into two parts: variability that is explained by the regression (RSS) and unexplained variability (ESS).
- TSS = $\sum_{i=1}^{n} (y_i - \bar{y})^2$ with df = n−1
- RSS = $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ with df = 1
- ESS = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ with df = n−2
We can show that: TSS = RSS + ESS
To have a well-fitting model, ESS should be small and RSS close to TSS.
We can perform an F-test which compares MSReg = RSS/1 and MSE = ESS/(n−2).
ANOVA TABLE
The null hypothesis is: H0 : The regression line doesn't fit the data well (in SLR this is equivalent to β = 0)
The usual way of setting out this test is to use an ANOVA table

Source of     Degrees of      Sum of squares   Mean squares    F
Variation     Freedom (df)    (SS)             (MS)
Regression    1               RSS              RSS/1           F = (RSS/1) / (ESS/(n−2))
Residual      n−2             ESS              ESS/(n−2)
Total         n−1             TSS

By comparing the F-statistic to the appropriate F-distribution, we (or the computer) can get a P-value.
Using R, we can get all ANOVA table components.
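The entries of the ANOVA table can also be assembled by hand. The following sketch, in plain Python, assumes the ten-point numeric example used earlier and computes TSS, RSS, ESS, the two mean squares and the F statistic. The p-value is omitted because the standard library has no F distribution; the F value itself should match the one in the R summary for that example (405.4).

```python
# The ten (x, y) pairs from the numeric example
x = [3.9, 2.6, 2.4, 4.1, -0.2, 5.4, 0.6, -5.6, -1.1, -2.1]
y = [8.9, 7.1, 4.6, 10.7, 1.0, 12.6, 3.3, -10.4, -2.3, -1.6]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar
fitted = [b1 + b2 * xi for xi in x]

# Sums of squares
tss = sum((yi - y_bar) ** 2 for yi in y)                 # total, df = n - 1
rss = sum((fi - y_bar) ** 2 for fi in fitted)            # regression, df = 1
ess = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual, df = n - 2

# Mean squares and F statistic
ms_reg = rss / 1
mse = ess / (n - 2)
f_stat = ms_reg / mse
r2 = rss / tss

print(f"TSS = {tss:.3f}, RSS = {rss:.3f}, ESS = {ess:.3f}")
print(f"F = {f_stat:.1f} on 1 and {n - 2} df, R^2 = {r2:.4f}")
```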
Coefficient of Determination (or R²)
The proportion of total variability that is explained by the regression model is called the Coefficient of Determination (or R²):
- R² = RSS/TSS = 1 − ESS/TSS
- 0 ≤ R² ≤ 1
- R² near 1 suggests a good fit to the data.
- R² = 1: ALL points fall exactly on the line (you can predict without error).
In SLR, squaring the correlation coefficient (r²) gives us the Coefficient of determination R² for the SLR model.
Coefficient of determination (R²)
Total sum of squares (TSS) = $\sum_{i=1}^{n} (y_i - \bar{y})^2$
TSS can be written as the sum of two terms:
- Error/Residual sum of squares (ESS) = $\sum_{i=1}^{n} (y_i - b_1 - b_2 x_i)^2$
- Regression sum of squares (RSS) = $b_2^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$
It can be shown that
\[ TSS = RSS + ESS \]
Coefficient of determination (R²) is defined as
\[ R^2 = \frac{RSS}{TSS} \]
R² represents the proportion of variation in Y that can be explained by the model.
For simple linear regression (only one X variable),
\[ r^2 = R^2 \implies |r| = \sqrt{R^2} \]
with r taking the same sign as the slope b2.
Example
For our numeric example,
TSS = sum((y-mean(y))^2)
TSS
## [1] 437.009
ESS = sum((y-y_pred)^2)
ESS
## [1] 8.456399
RSS = TSS - ESS
R2= RSS/TSS
R2
## [1] 0.9806494
R² = 0.9806 ⇒ 98.06% of the variation in Y can be explained by the model, i.e. by the variation in X.
r = √R² = √0.9806 = 0.9903 (why should "r" be +ve?)
So, there is a strong +ve relationship between X and Y
Example: Heights (x) and Weights (y) of students
Find SSreg, SSE and R² for this example. (SSreg, SSE and SST are the quantities called RSS, ESS and TSS earlier.)

 xi    yi    ŷi   (ŷi − ȳ)  (ŷi − ȳ)²  (yi − ŷi)  (yi − ŷi)²
 60    84   100     −40       1600       −16        256
 62    95   110     −30        900       −15        225
 64   140   120     −20        400        20        400
 66   155   130     −10        100        25        625
 68   119   140       0          0       −21        441
 70   175   150      10        100        25        625
 72   145   160      20        400       −15        225
 74   197   170      30        900        27        729
 76   150   180      40       1600       −30        900
                          SSreg = 6000            SSE = 4426

SSreg = 6000, SSE = 4426 and SST = 10 426
R² = 6000 / 10426 = 0.58 ⇒ 58% of the variability in Weight is explained by Height.
The other 42% is "Error".
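The table can be verified with a short script. This is a sketch in plain Python, assuming the fitted line ŷ = −200 + 5x found earlier for the height and weight data; it recomputes SSreg, SSE and SST and also checks that the square root of R² recovers the correlation coefficient reported earlier (about 0.7586).

```python
import math

# Height (x) and Weight (y) of the 9 students
height = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weight = [84, 95, 140, 155, 119, 175, 145, 197, 150]

# Fitted values from the least-squares line y = -200 + 5x
fitted = [-200 + 5 * h for h in height]
y_bar = sum(weight) / len(weight)

ss_reg = sum((f - y_bar) ** 2 for f in fitted)               # explained
ss_err = sum((y - f) ** 2 for y, f in zip(weight, fitted))   # unexplained
ss_tot = sum((y - y_bar) ** 2 for y in weight)               # total

r2 = ss_reg / ss_tot
r = math.sqrt(r2)  # positive root, because the slope is positive

print(f"SSreg = {ss_reg:.0f}, SSE = {ss_err:.0f}, SST = {ss_tot:.0f}")
print(f"R^2 = {r2:.2f}, r = {r:.4f}")
```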
ANOVA TABLE in R: Height and Head Circumference data
anova(model1)
Analysis of Variance Table
Response: Circ
Df Sum Sq Mean Sq F value Pr(>F)
Height 1 0.39993 0.39993 43.958 9.59e-05 ***
Residuals 9 0.08188 0.00910
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that the F-statistic is equal to 43.958 with a p-value of 9.59e-05. Thus, there is strong evidence that β is not equal to zero, and hence the least-squares line fits the data well.
The coefficient of determination is R² = SSreg / (SSreg + SSE) = 0.39993 / (0.39993 + 0.08188) = 0.83, indicating that 83% of the variability in the response is explained by the explanatory variable.
Regression Diagnostic
Conditions for Linear Regression
The residuals êi = yi − ŷi can be used to check regression conditions.
Condition 1: A linear model is appropriate for the data; no curvature, no influential points, no group with different patterns.
- Check: Scatterplot; Residuals plot against x or ŷ
Condition 2: The variation in the error terms is constant.
- Check: Scatterplot; Residual plot
Condition 3: The errors are normally distributed.
- Check: Histogram, Boxplot, Normal quantile plot of residuals
independent errors (this essentially equates to independent

Let’s use Height and Circumference data in Example 1 to check
regression conditions.
Condition 1: Linearity Assumption
Graph: scatter plot of Height against head circumference (Circ):
[Figure: scatterplot with Height on the y-axis (24.5 to 27.5) and Circ on the x-axis (16.9 to 17.6)]
Condition 2: Constant variance Assumption
Graph: scatter plot of the residuals (the plot's x-axis is labelled Circ):
[Figure: Residuals on the y-axis (−0.15 to 0.10) against Circ on the x-axis (16.9 to 17.6)]
Condition 3: Normality assumption
Graph: QQ-plot of residuals.
# residuals of the fitted model
resid = residuals(model1)
qqnorm(resid, pch=18, cex=3, lwd=3, cex.lab=1.5, col="blue")
qqline(resid, col="red")
[Figure: Normal Q-Q Plot; Sample Quantiles on the y-axis (−0.15 to 0.10) against Theoretical Quantiles on the x-axis (−1.5 to 1.5)]
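The points of a Normal Q-Q plot can be computed without any plotting library. The sketch below, in plain Python, assumes the height and head circumference data with head circumference as the response, and uses plotting positions (i − 0.5)/n, which is an assumed convention (it is the form R's qqnorm is understood to use when n > 10). It pairs the sorted residuals with the corresponding standard Normal quantiles; points lying close to a straight line support the normality condition.

```python
from statistics import NormalDist

# Height and head circumference of the 11 children
height = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]
circ = [17.5, 17.1, 17.1, 17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5]
n = len(height)

# Least-squares fit of Circ on Height
h_bar = sum(height) / n
c_bar = sum(circ) / n
s_xx = sum((h - h_bar) ** 2 for h in height)
s_xy = sum((h - h_bar) * (c - c_bar) for h, c in zip(height, circ))
b = s_xy / s_xx
a = c_bar - b * h_bar

# Residuals: observed minus fitted values
residuals = [c - (a + b * h) for h, c in zip(height, circ)]

# Q-Q coordinates: sorted residuals against standard Normal quantiles
sample_q = sorted(residuals)
theo_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

for t, s in zip(theo_q, sample_q):
    print(f"{t:7.3f}  {s:8.4f}")
```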
Practice Questions
Consider the following Regression output
Call:
lm(formula = Circ ~ Height)
Residuals:
Min 1Q Median 3Q Max
-0.16148 -0.05842 -0.01831 0.06442 0.12989
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.49317 0.72968 17.12 3.56e-08 ***
Height 0.18273 0.02756 6.63 9.59e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09538 on 9 degrees of freedom

Multiple R-squared: 0.8301, Adjusted R-squared: 0.8112
F-statistic: 43.96 on 1 and 9 DF, p-value: 9.59e-05
Questions:
Using the above output, answer the following questions.
1 Find the correlation coefficient (r). Comment on the strength and direction of the linear relationship between the variables.
2 What are the estimated intercept and slope of the regression line? Write down the least-squares line equation.
3 Write in words the interpretation of the slope.
4 Test the hypothesis that there is no linear relationship between the child's height and their head circumference. What is the p-value?
5 Find and interpret the value of the coefficient of determination.
What have we learned?
Know how to use a scatterplot to analyse a relationship between two quantitative variables
Know how to use a regression line to describe a linear relationship between
an explanatory variable and a response variable.
Understand the meaning of the slope of the least squares regression line.
Know how to perform inference about the regression slope.
Understand how to use an ANOVA table to assess a regression goodness of
fit.
What the coefficient of determination is, how to compute it.
Understand how to use R to fit the regression line, perform t-test and
F-test, and check for regression assumptions.
SLR in R and Python
Example 1:Height and Weight of 9 students
Variables
y: Height
x: Weight
Data in R
Height=c(60, 62, 64, 66, 68, 70, 72, 74, 76)
Weight=c(84, 95, 140, 155, 119, 175, 145, 197, 150)
Data in Python
Height=[60, 62, 64, 66, 68, 70, 72, 74, 76]
Weight=[84, 95, 140, 155, 119, 175, 145, 197, 150]
Regression in R
Data in R
Height=c(60, 62, 64, 66, 68, 70, 72, 74, 76)
Weight=c(84, 95, 140, 155, 119, 175, 145, 197, 150)
Use lm(y~x) to perform a SLR in R

fit.lm=lm(Height~Weight)
fit.lm
##
## Call:
## lm(formula = Height ~ Weight)
##
## Coefficients:
## (Intercept) Weight
## 51.8864 0.1151
Regression in R
summary(fit.lm)
##
## Call:
## lm(formula = Height ~ Weight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0000 -2.0284 -0.8206 2.4170 6.8490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.88644 5.38322 9.639 2.73e-05 ***
## Weight 0.11510 0.03736 3.080 0.0178 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.815 on 7 degrees of freedom
## Multiple R-squared: 0.5755, Adjusted R-squared: 0.5148
## F-statistic: 9.489 on 1 and 7 DF, p-value: 0.0178
Regression in Python
Data in Python
Height=[60, 62, 64, 66, 68, 70, 72, 74, 76]
Weight=[84, 95, 140, 155, 119, 175, 145, 197, 150]
y=Height
x=Weight
Import statsmodels module
import statsmodels.api as sm
add constant to predictor variables
x = sm.add_constant(x)
fit linear regression model
model = sm.OLS(y, x).fit()
view model summary
print(model.summary())
## OLS Regression Results

## ========================================================================
## Dep. Variable: y R-squared:
## Model: OLS Adj. R-squared:
## Method: Least Squares F-statistic:
## Date: Thu, 26 May 2022 Prob (F-statistic):
## Time: 13:33:26 Log-Likelihood: -
## No. Observations: 9 AIC:
## Df Residuals: 7 BIC:
## Df Model: 1
## Covariance Type: nonrobust
## ========================================================================
## coef std err t P>|t| [0.025
## ------------------------------------------------------------------------
## const 51.8864 5.383 9.639 0.000 39.157
## x1 0.1151 0.037 3.080 0.018 0.027
## ========================================================================
## Omnibus: 1.604 Durbin-Watson:
## Prob(Omnibus): 0.448 Jarque-Bera (JB):
## Skew: 0.721 Prob(JB):
