Elements of Statistics and Probability STA 201 S M Rajib Hossain MNS, BRAC University Lecture-8

Regression Analysis
Regression analysis is a set of statistical methods used for the estimation of
relationships between a dependent variable and one or more independent
variables.
For example,
✓ It can be used to estimate the relationship between reckless driving and
the total number of road accidents.
✓ It can be used to measure the effect on sales of spending a certain
amount of money on advertising.
Purpose of regression analysis:
✓ Cause-and-effect relationship.
✓ Prediction.
Types of variables in regression analysis:
✓ Dependent variables.
✓ Independent variables.
Dependent variables: The variables whose values are influenced or are to be
predicted.
The dependent variable is also known as the response, regressand or explained
variable.
Independent variables: The variables which influence the values of the
dependent variables or are used for prediction.
The independent variable is also known as the explanatory variable, predictor,
covariate or regressor.
Types of regression equation
✓ Simple regression equation: A regression equation containing only one
independent variable is called a simple regression equation.
✓ Multiple regression equation: A regression equation containing more
than one independent variable is called a multiple regression equation.
Simple linear regression equation
Let Y be the response variable, which we want to explain by a single
explanatory variable X.
The basic model is 𝑦 = 𝛼 + 𝛽𝑥 + 𝜀
✓ y is the value of the dependent variable (the response) for any given
value of the independent variable (x).
✓ 𝛼 is the intercept, the expected value of y when x is 0.
✓ 𝛽 is the regression coefficient (slope): how much we expect y to
change when x increases by one unit.
✓ x is the independent variable (the variable we expect is influencing y).
Here, 𝛼 and 𝛽 are parameters that must be estimated. The symbol 𝜀
represents the random error term. This does not mean that a mistake is being
made. It is simply a symbol used to indicate the absence of an exact
relationship between x and y.
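To make the role of the error term concrete, the sketch below generates data from the model 𝑦 = 𝛼 + 𝛽𝑥 + 𝜀. The parameter values, the function name `simulate_simple_linear`, and the choice of normally distributed errors are illustrative assumptions, not part of the lecture.

```python
import random


def simulate_simple_linear(xs, alpha, beta, sigma, seed=0):
    """Generate y = alpha + beta * x + eps for each x.

    Assumption for illustration: eps is drawn from a normal
    distribution with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values chosen only for the demonstration.
xs = [2, 3, 4, 5, 6]
ys = simulate_simple_linear(xs, alpha=50.0, beta=6.0, sigma=2.0)

for x, y in zip(xs, ys):
    line_value = 50.0 + 6.0 * x          # exact linear part alpha + beta * x
    print(x, round(y, 2), round(y - line_value, 2))  # last column is eps
```

With `sigma=0.0` every point falls exactly on the line; with a positive `sigma` the points scatter around it, which is the "absence of an exact relationship" that 𝜀 stands for.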
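The estimated (fitted) equation is
$$\hat{y} = \hat{\alpha} + \hat{\beta}x$$
Here,
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$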
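The two estimator formulas above translate almost line for line into code. The following is a minimal sketch; the function name `fit_simple_linear` and the three-point dataset are hypothetical and serve only as a quick check.

```python
def fit_simple_linear(x, y):
    """Return (alpha_hat, beta_hat) using the formulas above."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Numerator: sum of x_i * y_i minus n * x_bar * y_bar
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    # Denominator: sum of x_i^2 minus n * x_bar^2
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical data lying exactly on the line y = 2x.
alpha_hat, beta_hat = fit_simple_linear([1, 2, 3], [2, 4, 6])
print(alpha_hat, beta_hat)
```

The guard on `sxx` reflects a limitation of the formula itself: if every x value is identical, the denominator is zero and no slope can be estimated.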
Example 1: Exam Scores and Study Hours

Suppose you want to determine if there is a relationship between the number
of hours a student studies (independent variable) and their exam scores
(dependent variable). You collect data from a sample of students and want to
fit a simple linear regression model to the data.
Study hours (X) 2 3 4 5 6
Exam scores (Y) 60 70 75 80 85
Solution:
𝑥𝑖       𝑦𝑖       𝑥𝑖𝑦𝑖     𝑥𝑖²
2        60       120      4
3        70       210      9
4        75       300      16
5        80       400      25
6        85       510      36
Total    ∑𝑥𝑖 = 20    ∑𝑦𝑖 = 370    ∑𝑥𝑖𝑦𝑖 = 1540    ∑𝑥𝑖² = 90
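Here,
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{20}{5} = 4, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{370}{5} = 74$$
We know,
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{1540 - 5 \times 4 \times 74}{90 - 5 \times 4^2} = \frac{60}{10} = 6$$
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 74 - 6 \times 4 = 50$$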
Fitted equation is 𝑦̂ = 50 + 6𝑥
Interpretation:
✓ The slope (𝛽̂ = 6) indicates that, on average, for each additional hour
of study, the exam score is expected to increase by approximately 6
points.
✓ The intercept (𝛼̂ = 50) suggests that if a student doesn't study at all (0
hours), their expected exam score is around 50.
For a study time of x = 8 hours (say),
𝑦̂ = 50 + 6 × 8 = 98
So, the predicted exam score is 98.
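The hand calculation for Example 1 can be checked with a few lines of code. This sketch recomputes the column totals, the two estimates, and the prediction at x = 8 from the data given in the example; the variable names and the `predict` helper are illustrative choices.

```python
# Data from Example 1
hours = [2, 3, 4, 5, 6]
scores = [60, 70, 75, 80, 85]
n = len(hours)

# Column totals from the solution table
sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)

x_bar = sum_x / n
y_bar = sum_y / n

beta_hat = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
alpha_hat = y_bar - beta_hat * x_bar


def predict(x):
    """Fitted value from the estimated equation."""
    return alpha_hat + beta_hat * x


# Residuals: observed score minus fitted score for each student
residuals = [y - predict(x) for x, y in zip(hours, scores)]

print(sum_x, sum_y, sum_xy, sum_x2)   # 20 370 1540 90
print(alpha_hat, beta_hat)            # 50.0 6.0
print(predict(8))                     # 98.0
print(residuals)                      # [-2.0, 2.0, 1.0, 0.0, -1.0]
```

The residuals sum to zero, which is a general property of a least-squares line with an intercept.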
Coefficient of Determination: In simple linear regression, the coefficient of
determination is the square of the Pearson correlation coefficient (r). The
coefficient of determination, 𝑟², is the proportion of variation in the
observed values of the dependent variable explained by the independent
variable. The coefficient of determination, 𝑟², always lies between 0 and 1.
A value of 𝑟² near 0 suggests that the regression equation is not very useful
for making predictions, whereas a value of 𝑟² near 1 suggests that the
regression equation is quite useful for making predictions.
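For the above example, with ∑𝑦𝑖² = 27750,
$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} = \frac{60}{\sqrt{10 \times 370}} \approx 0.9864$$
and 𝑟² ≈ 0.973, which means that about 97.3% of the variation in the exam
scores can be explained by the study hours.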
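As a sketch of how r and 𝑟² could be computed directly from the raw data of Example 1, the code below uses the standard Pearson formula and, as a cross-check, the "proportion of variation explained" form 1 − SSE/SST with the fitted equation 𝑦̂ = 50 + 6𝑥. The function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    syy = sum(yi * yi for yi in y) - n * y_bar ** 2
    return sxy / math.sqrt(sxx * syy)


# Data from Example 1
hours = [2, 3, 4, 5, 6]
scores = [60, 70, 75, 80, 85]

r = pearson_r(hours, scores)
r_squared = r ** 2

# Cross-check: share of the total variation explained by the fitted line
y_bar = sum(scores) / len(scores)
fitted = [50 + 6 * x for x in hours]                         # fitted equation from Example 1
sse = sum((y - f) ** 2 for y, f in zip(scores, fitted))      # unexplained variation
sst = sum((y - y_bar) ** 2 for y in scores)                  # total variation
r_squared_alt = 1 - sse / sst

print(round(r, 4), round(r_squared, 4), round(r_squared_alt, 4))
```

The two routes to 𝑟² agree, and the printed values can be compared with the figures quoted for the example.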
Problem: You are analyzing the relationship between the number of
visitors on a website and the time it takes for the website to load:
Number of visitors     100  150  200  250  300
Load time (seconds)     20   30   40   50   60
Problem: You want to predict customer satisfaction based on the response
time of a customer support system:
Response time (minutes)           10  20  30  40  50
Customer satisfaction (points)    80  70  60  50  40
Multiple regression equation

Suppose we have k independent variables 𝑥1 , 𝑥2 , … , 𝑥𝑘 and we want to
explain Y.
The basic model is 𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑘 𝑥𝑘 + 𝜀
where 𝛽0 is the intercept of the regression equation and 𝜀 is the random
error term.
The parameter 𝛽𝑗 (j = 1, 2, …, k) represents the expected change in the
response y per unit change in 𝑥𝑗 when all the remaining regressor variables
𝑥𝑖 (𝑖 ≠ 𝑗) are held constant. For this reason the parameters 𝛽𝑗 are called
the partial regression coefficients.
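The lecture does not give estimation formulas for the multiple regression model. As one possible sketch, the code below estimates 𝛽0, 𝛽1, …, 𝛽𝑘 by least squares, solving the normal equations (X'X)b = X'y with Gauss-Jordan elimination. The function name `fit_multiple_linear` and the dataset are assumptions: the data are synthetic and noise-free, generated from y = 1 + 2𝑥1 + 3𝑥2 so that the recovered coefficients can be checked.

```python
def fit_multiple_linear(rows, y):
    """Least-squares estimates [b0, b1, ..., bk].

    rows: list of observations, each a list [x1, ..., xk]
    y:    list of responses, one per observation
    """
    # Design matrix with a leading column of ones for the intercept
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])

    # Augmented matrix [X'X | X'y] of the normal equations
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]

    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(p):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]

    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic, noise-free data: y = 1 + 2*x1 + 3*x2 (hypothetical values)
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefficients = fit_multiple_linear(rows, y)
print([round(b, 6) for b in coefficients])
```

Here the second coefficient (about 2) is the partial regression coefficient of 𝑥1: the change in y per unit change in 𝑥1 with 𝑥2 held constant. In practice a numerical library would normally be used instead of hand-written elimination.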
