
Simple Linear Regression
UNIT-1 and UNIT-2

Dr. Tina Dutta


INTRODUCTION TO SIMPLE LINEAR REGRESSION
• Simple linear regression is a statistical technique for establishing the existence of an association between a dependent variable (aka response variable or outcome variable) and an independent variable (aka explanatory variable or predictor variable).
• Simple linear regression implies that there is only one independent variable in
the model.
The functional relationship is of the form Y = β0 + β1X + ε, where β0 is the intercept, β1 is the slope, and ε is the random error term.
Linear in Parameters
• It is important to note that the linearity condition in linear regression is
defined with respect to the regression coefficients and not with
respect to the explanatory variables in the model.
• For example, the following two regression equations are linear regression models, since they are linear in the parameters β0 and β1:

Y = β0 + β1X² + ε
Y = β0 + β1 ln(X) + ε

• We do not need to impose a linearity condition on X and Y: substituting a new variable, e.g. Z = X² or Z = ln(X), converts each model into the standard linear form Y = β0 + β1Z + ε.
Non-Linear Regression Models
The following type of regression equation is a non-linear regression model, for example:

Y = β0 · e^(β1·X) + ε

Here the parameters are non-linear in nature (β1 appears inside the exponential), and the model cannot be made linear by substitution.
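A short worked note (in LaTeX, using the example above) on why no substitution helps here:

% With an additive error term, taking logarithms does not separate the parameters:
Y - \varepsilon = \beta_0 e^{\beta_1 X}
\;\Rightarrow\;
\ln(Y - \varepsilon) = \ln \beta_0 + \beta_1 X
% Y - \varepsilon is unobservable, and any substitution Z = f(X)
% still leaves \beta_1 inside the exponential term e^{\beta_1 X},
% so the model remains non-linear in the parameters.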
Method of Ordinary Least Squares (OLS)
Question: Use OLS to derive the regression formulas.

• The method of Ordinary Least Squares (OLS) is used to estimate the regression
parameters.
• OLS fits the regression line through a set of data points such that the sum of the squared vertical distances between the actual observations in the sample and the regression line is minimized.
• OLS provides the Best Linear Unbiased Estimator (BLUE).
Deriving the regression coefficients using OLS

• In ordinary least squares, the objective is to find the optimal values of β0 and β1 that minimize the Sum of Squared Errors (SSE), given by:

SSE = Σᵢ (Yᵢ − β̂0 − β̂1Xᵢ)²
To find the optimal values of β0 and β1 that minimize SSE, we equate the partial derivatives of SSE with respect to β0 and β1 to zero.
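Carrying out the differentiation gives the normal equations (written here in LaTeX notation):

\frac{\partial \text{SSE}}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0
\frac{\partial \text{SSE}}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0

% Rearranging yields the two normal equations:
\sum Y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum X_i
\sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2

% Solving the first equation for the intercept:
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}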

Substituting β̂0 into the second normal equation and solving for the slope gives:

β̂1 = (n ΣXᵢYᵢ − ΣXᵢΣYᵢ) / (n ΣXᵢ² − (ΣXᵢ)²) = Cov(X, Y) / Var(X)

Other expressions of β̂1

Since the correlation coefficient is r = Cov(X, Y) / (SDx · SDy), the slope can also be written as β̂1 = r · (SDy / SDx).
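As an illustration (not from the slides), a minimal Python sketch that computes these estimates directly from the formulas, using made-up data:

import numpy as np

# hypothetical sample data, for illustration only
X = np.array([2.0, 4.0, 5.0, 7.0, 8.0])
Y = np.array([3.1, 5.2, 6.0, 8.3, 9.1])

n = len(X)
# slope: beta1 = (n*Sum(XY) - Sum(X)*Sum(Y)) / (n*Sum(X^2) - (Sum(X))^2)
beta1 = (n * np.sum(X * Y) - np.sum(X) * np.sum(Y)) / (n * np.sum(X**2) - np.sum(X)**2)
# intercept: beta0 = Ybar - beta1 * Xbar
beta0 = Y.mean() - beta1 * X.mean()

# equivalent expression: beta1 = Cov(X, Y) / Var(X)
beta1_alt = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)

print(beta0, beta1, beta1_alt)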



Source: U Dinesh Kumar, Business Analytics: The Science of Data-Driven Decision Making (2017).
Assumptions of Simple Linear Regression
• The relationship between the response variable Y and the explanatory variable X is linear (in the parameters).
• The errors εᵢ have zero mean: E(ε) = 0.
• The errors have constant variance: Var(ε) = σ² (homoscedasticity).
• The errors are uncorrelated with one another.
• For inference (t-tests and F-tests), the errors are assumed to be normally distributed.
Gauss-Markov Theorem in Regression Analysis

The Gauss-Markov theorem states that:


• For a regression model with the assumptions E(ε) = 0, Var(ε) = σ², and uncorrelated errors,
• the least-squares estimators are unbiased and have minimum variance when compared with all other unbiased estimators that are linear combinations of the Yᵢ.
• Also, the least-squares estimators are the best linear unbiased estimators (BLUE), where “best” implies minimum variance.
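As an illustration (not part of the slides), a small Python simulation can make the unbiasedness claim concrete: repeatedly sampling data from a known model Y = 2 + 3X + ε and averaging the OLS slope estimates recovers the true slope.

import numpy as np

rng = np.random.default_rng(42)
true_b0, true_b1 = 2.0, 3.0
X = np.linspace(0, 10, 30)

slopes = []
for _ in range(5000):
    # errors satisfy the Gauss-Markov assumptions: zero mean, constant variance, uncorrelated
    eps = rng.normal(0.0, 2.0, size=X.size)
    Y = true_b0 + true_b1 * X + eps
    # OLS slope estimate: Cov(X, Y) / Var(X)
    slopes.append(np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1))

# unbiasedness: the mean of the estimates is close to the true slope 3.0
print(np.mean(slopes))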
VALIDATION OF THE SIMPLE LINEAR REGRESSION MODEL
It is important to validate the regression model to ensure its validity and goodness of fit before it can be used for practical applications.

Measures used to validate the simple linear regression models:


1. Coefficient of Determination (R-Square)
• The coefficient of determination (R²) measures the proportion of variation in Y that is explained by the variation in the independent variable X in the regression model.
• The range of R² is from 0 to 1; the greater the value, the more the variation in Y in the regression model is explained by the variation in X.
SSR = Σ(Ŷᵢ − Ȳ)²  (regression sum of squares)
SST = Σ(Yᵢ − Ȳ)²  (total sum of squares)
SSE = Σ(Yᵢ − Ŷᵢ)²  (error sum of squares)
SST = SSR + SSE,  so that  R² = SSR / SST = 1 − SSE / SST
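A short self-contained Python sketch (with made-up data) computing these quantities:

import numpy as np

X = np.array([2.0, 4.0, 5.0, 7.0, 8.0])   # hypothetical data
Y = np.array([3.1, 5.2, 6.0, 8.3, 9.1])

# fit the OLS line
b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
Y_pred = b0 + b1 * X

SSR = np.sum((Y_pred - Y.mean())**2)   # explained variation
SSE = np.sum((Y - Y_pred)**2)          # unexplained variation
SST = np.sum((Y - Y.mean())**2)        # total variation

r_squared = SSR / SST                  # equivalently 1 - SSE/SST
print(SSR + SSE, SST, r_squared)       # SSR + SSE equals SST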
2. Hypothesis test for slope (β1) of Regression

• The regression coefficient (β1) captures the existence of a linear relationship between the response variable and the explanatory variable.
• If β1 = 0, we can conclude that there is no statistically significant linear relationship between the two variables.
• The hypotheses are:

H0: β1 = 0 (no linear relationship)
HA: β1 ≠ 0 (a linear relationship exists)
The test statistic is t = β̂1 / Se(β̂1), where Se(β̂1) is the standard error of the slope estimate. This is a two-tailed t-test with (n − 2) degrees of freedom.
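An illustrative Python sketch of this test (made-up data; scipy supplies the t-distribution):

import numpy as np
from scipy import stats

X = np.array([2.0, 4.0, 5.0, 7.0, 8.0])   # hypothetical data
Y = np.array([3.1, 5.2, 6.0, 8.3, 9.1])
n = len(X)

b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
residuals = Y - (b0 + b1 * X)

# standard error of b1: sqrt(MSE / Sxx), where MSE = SSE / (n - 2)
mse = np.sum(residuals**2) / (n - 2)
se_b1 = np.sqrt(mse / np.sum((X - X.mean())**2))

t_stat = b1 / se_b1
# two-tailed p-value with (n - 2) degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_value)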
3. Analysis of Variance for Overall Model Validity (F-test)
• Using the Analysis of Variance (ANOVA), we can test whether the overall model is statistically significant.
• However, for a simple linear regression, the null and alternative hypotheses in the ANOVA F-test and the t-test are exactly the same.
• When using the least-squares method to determine the regression coefficients, one needs to compute three measures of variation.
• The first measure, the total sum of squares (SST), is a measure of variation of the Yᵢ values around their mean, Ȳ.
• The total variation, or total sum of squares, is subdivided into explained
variation and unexplained variation.
• The explained variation, or regression sum of squares (SSR), represents the
variation that is explained by the relationship between X and Y.
• The unexplained variation, or error sum of squares (SSE), represents
variation due to factors other than the relationship between X and Y.
Analysis of Variance: Computation of Sums of Squares (Variances)

Source of Variation | df | Sum of Squares | Mean Square | F
Regression | 1 | SSR | MSR = SSR / 1 | F = MSR / MSE
Residual (Error) | n − 2 | SSE | MSE = SSE / (n − 2) |
Total | n − 1 | SST | |
F-test for testing significance of the Regression Model

The test statistic F = MSR / MSE follows an F-distribution with (1, n − 2) degrees of freedom, i.e., df = (df regression, df residuals).
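A matching Python sketch (same made-up data as above) for the F-test:

import numpy as np
from scipy import stats

X = np.array([2.0, 4.0, 5.0, 7.0, 8.0])   # hypothetical data
Y = np.array([3.1, 5.2, 6.0, 8.3, 9.1])
n = len(X)

b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
Y_pred = b0 + b1 * X

SSR = np.sum((Y_pred - Y.mean())**2)
SSE = np.sum((Y - Y_pred)**2)

MSR = SSR / 1          # df regression = 1
MSE = SSE / (n - 2)    # df residuals = n - 2

F = MSR / MSE
p_value = stats.f.sf(F, dfn=1, dfd=n - 2)   # right-tailed F-test
print(F, p_value)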
Attempt this question for practice.

Ex: For the following data:

a) Analyze the relationship between years of education and salary (fit a regression line).
b) Test whether the regression model is significant (hint: carry out ANOVA and the F-test) at alpha = 0.05.
c) Explain the model’s goodness of fit (hint: compute R-square).
d) Test whether the slope of the regression is statistically significant at alpha = 0.05.
e) Calculate the interval estimate of the regression coefficient (β1).
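For checking hand computations, here is a sketch of the full workflow in Python using statsmodels; the data values are placeholders, since the exercise’s data table is not reproduced here.

import numpy as np
import statsmodels.api as sm

# placeholder data: replace with the years-of-education and salary values from the exercise
education = np.array([10, 12, 14, 16, 18, 20])
salary = np.array([30, 35, 42, 50, 58, 65])

Xmat = sm.add_constant(education)        # adds the intercept column
results = sm.OLS(salary, Xmat).fit()     # (a) fit the regression line

print(results.params)                          # (a) beta0 and beta1
print(results.fvalue, results.f_pvalue)        # (b) ANOVA F-test of overall significance
print(results.rsquared)                        # (c) goodness of fit (R-square)
print(results.tvalues[1], results.pvalues[1])  # (d) t-test for the slope
print(results.conf_int(alpha=0.05))            # (e) 95% interval estimates for beta0, beta1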
