
BASICS OF STATISTICAL MODELLING
Marnielle A. Salig
Introduction to Linear Regression and Correlation Analysis
Linear Regression
Introduction to Regression Analysis
Regression analysis is used to:
- predict the value of a dependent variable based on the value of at least one independent variable
- explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable
Scatter Plots
A scatter plot (a.k.a. scatter diagram) shows the relationship between two variables.
Scatter Plot Examples
[Figure: four example scatter plots of y against x, two showing linear relationships and two showing nonlinear relationships]
Who coined the term ‘regression’?

Sir Francis Galton (1822–1911)
British statistician, sociologist, etc.
Simple Linear Regression Model
- Only one independent variable, x
- The relationship between x and y is described by a linear function
- Changes in y are assumed to be caused by changes in x
Types of Regression Models
[Figure: four scatter-plot panels titled Positive Linear Relationship, Negative Linear Relationship, Relationship NOT Linear, and No Relationship]
Linear Regression Assumptions
- X is not a random variable. The values of the independent variable X may be “fixed”, or the researcher may select the values of X in advance.
- The values of X are measured without error.
- The variance around the line is the same for all values of the independent variable X. This condition is called homoscedasticity.
- The subpopulations of the dependent variable Y, given different values of the independent variable X, are normally distributed.
- The means of the subpopulations of Y all lie on the same straight line. This is called the assumption of linearity.

Estimated Regression Model
The sample regression line provides an
estimate of the population regression line
$\hat{y} = a + bx$

where
- $\hat{y}$ is the estimated (or predicted) y value
- $a$ is the estimate of the regression intercept
- $b$ is the estimate of the regression slope
- $x$ is the independent variable

The individual random error terms $e_i$ have a mean of zero.
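Formula

$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$

$a = \bar{y} - b\bar{x}$

Interpretation of Slope b and Intercept a
- a is the estimated average value of y when the value of x is zero
- b is the estimated change in the average value of y for a one-unit change in x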
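A minimal Python sketch of how the slope b and intercept a might be computed from raw sums, assuming the usual least-squares computational formulas b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = ȳ − b·x̄. The function name `fit_line` and the toy data are hypothetical and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # estimated slope
    a = sum_y / n - b * sum_x / n              # estimated intercept: y-bar - b * x-bar
    return a, b


# Toy data lying exactly on y = 1 + 2x (illustrative only, not from the slides).
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because the toy points lie exactly on a line, the sketch should recover an intercept of 1 and a slope of 2.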
A sociologist wants to know whether the number of
children in the family is linearly dependent on the
age of the mother at her wedding. He interviewed 9
housewives and the results are shown below:

Age at wedding 21 15 22 22 21 25 30 18 24
No. of children 4 8 3 4 2 3 1 5 6

1. Construct a scatter plot. Interpret.


2. Obtain the estimated regression line equation.
3. Estimate the number of children if a mother’s age at
wedding is 21.
1. Plot the scatter diagram/scatter plot. Interpret.

[Figure: scatter plot of Number of Children (vertical axis) against Age at wedding (horizontal axis)]
2. Obtain the estimated regression line equation.

(X) Age at wedding     21 15 22 22 21 25 30 18 24
(Y) No. of children     4  8  3  4  2  3  1  5  6

$n = 9$

$\sum_{i=1}^{n} x_i = 21 + 15 + 22 + 22 + 21 + 25 + 30 + 18 + 24 = 198$

$\sum_{i=1}^{n} y_i = 4 + 8 + 3 + 4 + 2 + 3 + 1 + 5 + 6 = 36$

$\sum_{i=1}^{n} x_i^2 = 21^2 + 15^2 + 22^2 + 22^2 + 21^2 + 25^2 + 30^2 + 18^2 + 24^2 = 4500$

$\sum_{i=1}^{n} y_i^2 = 4^2 + 8^2 + 3^2 + 4^2 + 2^2 + 3^2 + 1^2 + 5^2 + 6^2 = 180$

$\sum_{i=1}^{n} x_i y_i = 21(4) + 15(8) + 22(3) + 22(4) + 21(2) + 25(3) + 30(1) + 18(5) + 24(6) = 739$

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{198}{9} = 22 \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{36}{9} = 4$
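The totals for this example can be double-checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and the data are the nine (age, children) pairs from the table.

```python
ages = [21, 15, 22, 22, 21, 25, 30, 18, 24]   # x: age at wedding
children = [4, 8, 3, 4, 2, 3, 1, 5, 6]        # y: number of children

n = len(ages)
sum_x = sum(ages)
sum_y = sum(children)
sum_xx = sum(x * x for x in ages)
sum_yy = sum(y * y for y in children)
sum_xy = sum(x * y for x, y in zip(ages, children))
mean_x = sum_x / n
mean_y = sum_y / n

print(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy)  # 9 198 36 4500 180 739
print(mean_x, mean_y)                           # 22.0 4.0
```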
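$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{9(739) - (198)(36)}{9(4500) - (198)^2} = \frac{-477}{1296} \approx -0.368$

$a = \bar{y} - b\bar{x} = 4 - (-0.368)(22) = 12.096$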
Thus, the estimated regression line equation is

$\hat{y} = a + bx = 12.096 - 0.368x$
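As a rough check, the coefficients can be recomputed in Python from the totals n = 9, Σx = 198, Σy = 36, Σx² = 4500 and Σxy = 739. The sketch assumes the usual least-squares formula for the slope and, like the slides, rounds b to three decimals before computing a. The variable names are illustrative.

```python
n, sum_x, sum_y, sum_xx, sum_xy = 9, 198, 36, 4500, 739

# Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2) = -477 / 1296
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b_rounded = round(b, 3)

# Intercept computed with the rounded slope, as in the slides: y-bar - b * x-bar
a = sum_y / n - b_rounded * (sum_x / n)

# Intercept computed with the unrounded slope, for comparison
a_exact = sum_y / n - b * (sum_x / n)

print(b_rounded)          # -0.368
print(round(a, 3))        # 12.096
print(round(a_exact, 3))  # 12.097
```

The small difference between 12.096 and 12.097 comes only from rounding b before it is used.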
 
3. Estimate the number of children if a mother’s age at wedding is 21.

If $x = 21$, then $\hat{y} = 12.096 - 0.368(21) = 4.368 \approx 4$.

This means that the estimated number of children in the family is four if the age of the mother at wedding is 21.
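The substitution can be sketched in Python using the fitted equation ŷ = 12.096 − 0.368x. The function name `predict_children` is hypothetical.

```python
def predict_children(age_at_wedding, a=12.096, b=-0.368):
    """Estimated number of children from the fitted line y-hat = a + b*x."""
    return a + b * age_at_wedding


estimate = predict_children(21)
print(round(estimate, 3))  # 4.368
print(round(estimate))     # 4
```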
Number of hours studying (X)    3  5  4 10  9  8  7  6  5  4 12  3
Scores in Final Exam (Y)       30 54 40 90 85 82 78 68 60 48 96 35

Estimate the exam score of a student who spent 4.5 hours studying.
LINEAR REGRESSION ANALYSIS ON SPSS
1. What is the effect of a one-unit increase in age on the maximum heart rate achieved by an individual?
For every one-unit increase in age, the estimated maximum heart rate achieved by an individual decreases by 1.023 units. This can be verified by looking at the beta, or coefficient, of age (-1.023). Since beta is less than 0, the maximum heart rate achieved decreases as age increases.

The regression equation can be presented as Y = bX + a:

Maximum Heart Rate = (-1.023)(Age) + 205.4
2. Given that an individual is 50 years old, what is the estimated maximum heart rate achieved by that person?

Y = bX + a, where b = -1.023 and a = 205.4

Y = (-1.023)(50) + 205.4 = 154.25
Therefore, the estimated maximum heart rate achieved by an individual who is 50 years old is 154.25.
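The same calculation can be written as a small Python function. This is a sketch only: the name `max_heart_rate` is hypothetical, and the coefficients are the ones reported above (b = -1.023, a = 205.4).

```python
def max_heart_rate(age, slope=-1.023, intercept=205.4):
    """Estimated maximum heart rate achieved from Y = bX + a."""
    return slope * age + intercept


print(round(max_heart_rate(50), 2))  # 154.25
```

Evaluating the function at other ages shows the effect of the negative slope: each additional year lowers the estimate by 1.023.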
ASSIGNMENT:
Show all the necessary solutions.
1. When buying items, it is sometimes advantageous to buy in large quantities because the unit price is usually lower for larger quantities. To test whether there is a linear relationship between the number of units of a particular item and the cost per unit, the following data were obtained.
a. Find the equation of the estimated regression line.
b. What is the expected cost per unit if we buy 2 dozen units of the item?

Number of Units (X)     1  3  5 10 12 15
Cost per Unit (Y)      55 52 48 36 32 30
