
12/6/2020

CHAPTER 5
BIVARIATE ANALYSIS

LEARNING OUTCOMES
At the end of this topic, students are expected to be able to:

o Perform a correlation and regression analysis manually as well as using SPSS,
o Interpret all results obtained from the SPSS pertaining to the
correlation and regression analysis,
o Make a prediction from the regression equation obtained from the
regression analysis.


INTRODUCTION
o Regression and correlation analysis are used to describe a relationship between a dependent variable and independent variable(s).

o Example:
a) To describe the relationship between household expenditure and
number of children.
b) To predict the number of complaints based on type of room.
c) To investigate the relationship between house price and house area,
number of bedrooms, number of bathrooms and location of the
house.

o Scatter Diagram / Scatter plot - An initial step to investigate the relationship between the dependent and independent variable.

o Correlation analysis - measures the strength of a relationship.

o Regression analysis - produces an equation that expresses the dependent variable (Y) as a function of independent variables (X).


SCATTER DIAGRAM
o In a scatter diagram, the independent variable is plotted along the horizontal X-axis and the dependent variable is plotted along the vertical Y-axis.

o The following information can be obtained from a scatter plot:
a) Type of a relationship (Linear / Nonlinear / No relationship)
b) Direction of a relationship (Positive / Negative)

o The less scattered the points in the scatter diagram, the higher the degree of relationship between the dependent and independent variable.

TYPES OF RELATIONSHIP
[Figure: five scatter plots of Y against X, labelled Positive Linear Relationship, Negative Linear Relationship, No Relationship, Nonlinear Relationship and Nonlinear Relationship.]


SCATTER DIAGRAM

Example 1
A dietician wants to check the association between height (in inches) and weight (in pounds) of a baseball player. A sample of 9 major league baseball players is selected at random. The data were recorded as follows:

Height (in)    73   69   72   70   72   66   72   72   74
Weight (lbs)   201  170  180  200  190  175  205  185  186

Construct a scatter plot for the above data. Comment on the relationship between the heights and weights.

Solution
[Figure: scatter plot of Weight (lbs) on the vertical axis against Height (in) on the horizontal axis for the 9 players.]

Comment: There is a positive linear relationship between height and weight.


CORRELATION ANALYSIS

o In a correlation analysis, the correlation between two variables is measured by using a linear correlation coefficient, r.

o The correlation coefficient tells us the strength and direction of a linear relationship between two variables.

o To measure a linear correlation between two quantitative variables, we use the Pearson correlation coefficient.

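o The Pearson correlation coefficient is calculated as follows:

$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}}$$

Where,

$$SS_{XY} = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}, \qquad SS_{XX} = \sum X^2 - \frac{\left(\sum X\right)^2}{n}, \qquad SS_{YY} = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}$$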

o r can only take a value between -1 and 1 inclusive, i.e. -1 ≤ r ≤ 1
o A positive value of r indicates that there is a positive correlation between Y and X (Y increases as X increases)
o A negative value of r indicates that there is a negative correlation between Y and X (Y decreases as X increases)
o How to determine the strength of a correlation based on the value of r?

|r| (value of r, ignoring the sign)    Interpretation
|r| = 0                                No correlation
0 < |r| < 0.5                          Weak/low correlation
0.5 ≤ |r| ≤ 0.7                        Moderate correlation
0.7 < |r| < 1                          Strong/high correlation
|r| = 1                                Perfect correlation
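The interpretation table above can be sketched as a small Python helper. This is only an illustration: the function names `correlation_strength` and `correlation_direction` are made up for this example, and the cut-offs are simply the ones listed in the table.

```python
def correlation_strength(r: float) -> str:
    """Label the strength of a correlation using the cut-offs in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign only gives the direction, so ignore it here
    if size == 0:
        return "No correlation"
    if size == 1:
        return "Perfect correlation"
    if size < 0.5:
        return "Weak/low correlation"
    if size <= 0.7:
        return "Moderate correlation"
    return "Strong/high correlation"


def correlation_direction(r: float) -> str:
    """Report the direction of the relationship from the sign of r."""
    if r > 0:
        return "Positive"
    if r < 0:
        return "Negative"
    return "None"


# Example usage with an illustrative value of r
print(correlation_direction(-0.65), "/", correlation_strength(-0.65))
```

Note that the boundary values 0.5 and 0.7 fall in the "Moderate" band, as in the table.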

Example 2
Recall the weight and height of 9 baseball players given in Example 1. Compute the Pearson coefficient of correlation and interpret the result.

Height (in)    73   69   72   70   72   66   72   72   74
Weight (lbs)   201  170  180  200  190  175  205  185  186

SOLUTION
Find the following summary of totals by using a calculator.

∑Y = _____   ∑Y² = _____   ∑XY = _____   ∑X = _____   ∑X² = _____   n = _____

Instructions:
1 Press MODE two times → REG (2) → LIN (1)
(A small indicator “REG” should appear at the top of the screen, indicating that you are currently in regression mode)

2 Key in the first pair of X and Y following the format: X, Y and then press M+
(For example: The first pair of X and Y in Example 1 is 73 and 201. Therefore, key in the data as “73,201”)

3 Repeat step 2 until all pairs of X and Y are entered

4 The summary of totals can be obtained as follows:

TOTAL   INSTRUCTIONS
∑X²     SHIFT → 1 → 1 → =
∑X      SHIFT → 1 → 2 → =
n       SHIFT → 1 → 3 → =
∑Y²     SHIFT → 1 → Right arrow → 1 → =
∑Y      SHIFT → 1 → Right arrow → 2 → =
∑XY     SHIFT → 1 → Right arrow → 3 → =

Note: Make sure you clear all data before you start keying in new data. To clear data, press SHIFT → MODE → ALL (3) → =

Cont...
From calculator,
∑X² = 45,558   ∑X = 640   n = 9   ∑Y² = 319,272   ∑Y = 1,692   ∑XY = 120,437

Interpretation:
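As a cross-check on the calculator totals, the computation can be sketched in Python using only the standard library. The function name `pearson_r` is an assumption made for this illustration; the data are the heights and weights from Example 1.

```python
from math import sqrt

height = [73, 69, 72, 70, 72, 66, 72, 72, 74]            # X, inches
weight = [201, 170, 180, 200, 190, 175, 205, 185, 186]   # Y, pounds


def pearson_r(x, y):
    """Pearson r computed from the summary totals via SSxy, SSxx and SSyy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    return ss_xy / sqrt(ss_xx * ss_yy)


r = pearson_r(height, weight)
print(f"r is approximately {r:.3f}")
```

With these data r comes out at roughly 0.498, a positive value just under the 0.5 cut-off used in the interpretation table.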

Example 3
Suppose we take a sample of seven households from a low to moderate income neighborhood and collect information on their incomes and food expenditures for the past month. Determine whether there is any correlation between income and expenditures. If yes, how strong is the correlation?

Income ($'00)    Food Expenditure ($'00)
35               9
49               15
21               7
39               11
15               5
28               8
25               9

Solution

∑X² = 7,222   ∑X = 212   n = 7   ∑Y² = 646   ∑Y = 64   ∑XY = 2,150

Interpretation:
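One way the solution could be completed from these totals is sketched below, with X as income and Y as food expenditure (both in $'00). The intermediate values are rounded to four decimal places, so the last digit of r may differ slightly from a calculator result.

```latex
\begin{aligned}
SS_{XY} &= \sum XY - \frac{(\sum X)(\sum Y)}{n} = 2150 - \frac{(212)(64)}{7} \approx 211.7143 \\
SS_{XX} &= \sum X^2 - \frac{(\sum X)^2}{n} = 7222 - \frac{212^2}{7} \approx 801.4286 \\
SS_{YY} &= \sum Y^2 - \frac{(\sum Y)^2}{n} = 646 - \frac{64^2}{7} \approx 60.8571 \\
r &= \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}} = \frac{211.7143}{\sqrt{(801.4286)(60.8571)}} \approx 0.959
\end{aligned}
```

Since r is positive and greater than 0.7, this would be read as a strong positive correlation between income and food expenditure.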


REGRESSION ANALYSIS

o Regression analysis – analyzes the relationship between dependent and independent variable(s).

o Simple linear regression – analyzes the relationship between one dependent variable and one independent variable.

o Multiple linear regression – analyzes the relationship between one dependent variable and more than one independent variable.

o Regression analysis produces an equation that expresses the dependent variable (Y) as a function of independent variables (X).

o The equation is called a linear regression model.

o A regression model describes the relationship between the dependent and independent variables.

o The regression line can be used to make a prediction about the value of y for a given value of x.

o In general, a simple linear regression model is written as:

$$y = A + Bx + \varepsilon$$

Where,
y = Dependent variable
x = Independent variable
A = Y-intercept
B = Slope of the regression line / Regression coefficient
ε = Random error term

ASSUMPTIONS OF MODEL

The following assumptions must be fulfilled in order to use a linear regression analysis:

1. The random error term, ε, has a zero mean.
2. The random error terms are independent.
3. The random error term is normally distributed.
4. The random error terms have a constant variance, σ².


ESTIMATED MODEL
o In the linear regression model, the values of A and B are unknown. Therefore, we have to estimate their values by using the least squares method.

o The regression model with the estimated values of A and B is called an estimated regression model/regression line and is written as follows:

ŷ = a + bx

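ESTIMATING THE COEFFICIENTS

o By using the least squares method, we can calculate the values of a and b as follows:

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \dfrac{\left(\sum x\right)\left(\sum y\right)}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$

$$a = \frac{\sum y}{n} - b\left(\frac{\sum x}{n}\right) \quad \text{or} \quad a = \bar{y} - b\bar{x}$$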
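The two formulas can be sketched as a short Python function. The name `least_squares_line` is hypothetical, and the data used to try it out are the heights and weights from Example 1; treat this as an illustration of the formulas rather than a replacement for the manual or SPSS calculation.

```python
def least_squares_line(x, y):
    """Return (a, b) for the estimated regression line y-hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(p * p for p in x) - sum_x ** 2 / n
    b = ss_xy / ss_xx                  # slope
    a = sum_y / n - b * (sum_x / n)    # intercept: y-bar minus b times x-bar
    return a, b


height = [73, 69, 72, 70, 72, 66, 72, 72, 74]            # x, inches
weight = [201, 170, 180, 200, 190, 175, 205, 185, 186]   # y, pounds

a, b = least_squares_line(height, weight)
print(f"y-hat = {a:.3f} + {b:.3f}x")
```

The slope is computed first because the intercept formula depends on it.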


INTERPRETING THE COEFFICIENTS


o a in the regression model represents the value of y when x = 0. a has no practical meaning if x = 0 is not in the range of the data.
o b in the regression model represents the change in Y for a one-unit change in X.

* A positive value of b means: when X increases by 1 unit, Y increases by b units.

* A negative value of b means: when X increases by 1 unit, Y decreases by |b| units.

Example 4
Refer to Example 1. Find the equation of the regression line and interpret the values of the regression coefficient.

Solution
From the previous calculation,

∑X = 640   ∑Y = 1,692   SSxy = 117   SSxx = 46.8889   n = 9


Cont…

Therefore, regression equation:

Example 5
Refer to Example 3. Find the equation of the regression line and interpret the values of the regression coefficient.

Solution
From the previous calculation,

∑X = 212   ∑Y = 64   SSxy = 211.7143   SSxx = 801.4286   n = 7


Cont…

Therefore, regression equation:
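A possible working for Example 5, using the least squares formulas and the totals quoted above, is sketched here. The values are rounded, so small differences from a calculator or SPSS are to be expected.

```latex
\begin{aligned}
b &= \frac{SS_{xy}}{SS_{xx}} = \frac{211.7143}{801.4286} \approx 0.2642 \\
a &= \frac{\sum y}{n} - b\left(\frac{\sum x}{n}\right) = \frac{64}{7} - 0.2642\left(\frac{212}{7}\right) \approx 1.14 \\
\hat{y} &\approx 1.14 + 0.2642\,x
\end{aligned}
```

Read in the units of Example 3, b suggests that food expenditure rises by about 0.2642 hundred dollars (roughly $26) for every additional $100 of income.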


MAKING PREDICTION
o Given a value of X, we can predict the value of Y by using the estimated regression equation.

Example 6
Using the equation of the regression line found in Example 4, predict the weight of a baseball player whose height is 72 inches.

Solution

Example 7
Using the equation of the regression line found in Example 5,
predict the food expenditure of a household with income $3500.

Solution
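The prediction step, including the unit conversion this example needs, can be sketched in Python. The function name `predict` is hypothetical. The slope b = 0.2642 is the value quoted for the Example 3 data in these notes, while the intercept a = 1.1422 is an assumption recomputed from the Example 3 totals rather than a figure given in the slides.

```python
def predict(a, b, x):
    """Evaluate the estimated regression line y-hat = a + b*x."""
    return a + b * x


# Assumed coefficients for the Example 3 data (both variables in $'00).
a, b = 1.1422, 0.2642

income_dollars = 3500
x = income_dollars / 100        # the model expects income in hundreds of dollars
y_hat = predict(a, b, x)        # predicted food expenditure, in $'00
food_dollars = y_hat * 100      # convert back to dollars

print(f"Predicted food expenditure: about ${food_dollars:.0f}")
```

The key point is that X must be entered in the same units as the data used to fit the line, so $3500 is entered as 35.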


COEFFICIENT OF DETERMINATION
o The coefficient of determination, r², measures the proportion of the total variation in Y that is explained by the independent variable, X.

$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = (\text{correlation coefficient})^2$$

o For instance, if r² = 0.80, this means that 80% of the total variation in Y is explained by X.

o If r² > 0.80, we say that the regression model fits the data well. Hence, the model is useful in predicting the dependent variable.

Example 8
Compute the coefficient of determination for the data given in
Example 1. Interpret the values.

Solution
From the previous calculation,
SSxy = 117   SSyy = 1,176   b = 2.4952
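The two equivalent routes to r² can be sketched in Python as a check. The variable names are chosen for this illustration; SSxy and SSyy are the values quoted for this example, and SSxx = 46.8889 is the value given for the same data in Example 4.

```python
from math import sqrt

ss_xy = 117
ss_xx = 46.8889
ss_yy = 1176

# Route 1: r^2 = b * SSxy / SSyy, with b = SSxy / SSxx
b = ss_xy / ss_xx
r2_from_slope = b * ss_xy / ss_yy

# Route 2: square the correlation coefficient
r = ss_xy / sqrt(ss_xx * ss_yy)
r2_from_r = r ** 2

print(f"r^2 via slope: {r2_from_slope:.3f}, r^2 via r: {r2_from_r:.3f}")
```

Both routes give about 0.248, i.e. roughly 24.8% of the variation in weight is explained by height, which is well below the 0.80 guideline for a good fit.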

Example 9
Compute the coefficient of determination for Example 3.
Interpret the values.

Solution
From the previous calculation,
SSxy = 211.7143   SSyy = 60.8571   b = 0.2642


Example 10
An observation was carried out to determine the relationship between the age of a chef and the time (in minutes) needed to prepare a dish. The table below shows the data recorded for eight randomly selected chefs.

Age (years)   Time (minutes)
23            63
45            52
34            55
50            54
44            50
29            60
36            57
52            50

An analysis was performed and the results are shown below.

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .901   .811       .780                2.193

Coefficients
               Unstandardized Coefficients   Standardized Coefficients
Model          B         Std. Error          Beta                        t        Sig.
1 (Constant)   71.133    3.246                                           21.914   .000
  Age          -0.409    0.081               -0.901                      -5.078   .002

a) Identify the independent and dependent variables.
b) Prove that the product moment correlation coefficient is -0.901 and explain its meaning.
c) What percentage of the variation in the time taken to prepare a dish is explained by the difference in age of chefs?
d) Determine the slope and y-intercept of the regression equation. Interpret the coefficients in the context of the problem.
e) Write the complete regression equation. Estimate the time needed for a chef who is 30 years old to prepare a dish.
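The SPSS figures can be cross-checked against the raw data with a short Python sketch. It assumes, as the Coefficients table indicates, that Age is the independent variable and Time the dependent variable; the variable names are illustrative only.

```python
from math import sqrt

age = [23, 45, 34, 50, 44, 29, 36, 52]        # x, years
minutes = [63, 52, 55, 54, 50, 60, 57, 50]    # y, preparation time

n = len(age)
sum_x, sum_y = sum(age), sum(minutes)
ss_xy = sum(p * q for p, q in zip(age, minutes)) - sum_x * sum_y / n
ss_xx = sum(p * p for p in age) - sum_x ** 2 / n
ss_yy = sum(q * q for q in minutes) - sum_y ** 2 / n

r = ss_xy / sqrt(ss_xx * ss_yy)     # correlation coefficient
b = ss_xy / ss_xx                   # slope
a = sum_y / n - b * (sum_x / n)     # y-intercept
r_squared = r ** 2                  # coefficient of determination

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"y-hat = {a:.3f} + ({b:.3f})x")
print(f"Estimated time at age 30: {a + b * 30:.1f} minutes")
```

The sign of r follows the sign of the slope. SPSS reports R in the Model Summary as a non-negative value, which is why the table shows .901 while the correlation itself is negative.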

Solution
a) Independent variable:
Dependent variable:

b)


Fiza owns a bakery shop. She has advertised the shop through newspaper and magazine in order to persuade potential customers to buy her products. In an attempt to analyze the relationship between advertising cost and sales, she recorded the monthly advertising cost and sales (RM'00) for a sample of twelve months. The data are listed in Table 1.

Table 1: Monthly Advertising Cost and Sales for One Year

Month              Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Sales (RM'00)      65   20   48   43   52   24   47   40   34   65   50   46
Advertising Cost   700  60   400  300  540  70   300  380  250  540  490  310

Fiza used the SPSS software to analyze the data. The following table and figure show the results of regression analysis performed on the data.

Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t       Sig.
1 (Constant)     19.707   3.179                                            6.199   .000
  Advertising    .069     .008                 .940                        8.726   .000


a) From the scatter plot provided, comment on the relationship between advertising cost and sales.

b) Calculate the correlation coefficient and interpret its value.

c) Find the equation of the regression line.

d) Interpret the slope of the regression line in the context of the problem.

e) Determine the coefficient of determination and interpret its value.

f) Does the simple linear model appear to be a useful tool in predicting sales from the advertising cost? If so, produce a prediction of the sales when RM500 of advertising cost is allocated. If not, explain why.

Solution


A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.

Driving Experience (years)   Monthly Auto Insurance Premium ($)
5                            64
2                            87
12                           50
9                            71
15                           44
6                            56
25                           42
16                           60


a) Construct a scatter diagram for the data.

b) Compute SSxx, SSyy, and SSxy.

c) Calculate r and r² and explain the values obtained.

d) Obtain a regression equation using the least squares method.

e) Interpret the meaning of a and b calculated in d).

f) Draw the regression line on the graph in a).

g) Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
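As a starting point for part b), the sums of squares could be worked out as sketched below, taking driving experience as x and monthly premium as y. The totals are computed from the table in the question and are worth re-checking with a calculator.

```latex
\begin{aligned}
n &= 8, \quad \sum x = 90, \quad \sum y = 474, \quad \sum x^2 = 1396, \quad \sum y^2 = 29642, \quad \sum xy = 4739 \\
SS_{xx} &= \sum x^2 - \frac{(\sum x)^2}{n} = 1396 - \frac{90^2}{8} = 383.5 \\
SS_{yy} &= \sum y^2 - \frac{(\sum y)^2}{n} = 29642 - \frac{474^2}{8} = 1557.5 \\
SS_{xy} &= \sum xy - \frac{(\sum x)(\sum y)}{n} = 4739 - \frac{(90)(474)}{8} = -593.5
\end{aligned}
```

The negative value of SSxy indicates that r and b will both be negative, i.e. premiums tend to fall as driving experience increases.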

Solution


The tables below show the results obtained in investigating the relationship between the amount (in millions of RM) spent on marketing and revenue (in millions of RM) for a given year of six different hotels:

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .992(a)   .985       .981                3.586
a. Predictors: (Constant), Marketing

Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   6.222    3.019                                            2.061    .108
  Marketing    18.309   1.148                .992                        15.951   .000
a. Dependent Variable: Revenue


a) State the estimated regression model for the above problem.

b) Is the model useful in explaining the revenue? Why?

c) Explain the meaning of the regression coefficients in the context of the problem.

d) What percentage of the total variation in revenue is explained by the cost of marketing?

e) Forecast the total revenue of a hotel that plans to spend RM5 million on marketing for the year 2010.

Solution


A regression analysis was performed to examine the relationship between the working experience (in years) of tourist guides and their level of knowledge regarding the local places of interest. The following are some of the results obtained:

Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta                        t        Sig.
1 (Constant)   60.981   1.909                                            31.948   .000
  Experience   1.196    .125                 .959                        9.558    .000
a. Dependent Variable: Scores

a) State the estimated regression model.
b) Explain the meaning of the slope in the context of the problem.
c) Estimate the knowledge score of a tourist guide with 20 years of working experience.

Solution


END OF SYLLABUS
