
Statistics 2

Regression and Correlation


Learning Outcomes
▪ Able to determine the purpose of observation for data processing using regression analysis.
▪ Able to identify the observed variables when carrying out regression analysis.

Correlation Analysis
SCATTER PLOT AND CORRELATION
• Scatter plot (or scatter diagram) is used to show the relationship
between two variables
• Correlation analysis is used to measure the strength of the association
(linear relationship) between two variables

→Only concerned with strength of the relationship

→No causal effect is implied

SCATTER PLOT EXAMPLES

[Figures: example scatter plots]

CORRELATION COEFFICIENT

✓ The population correlation coefficient ρ measures the strength of the relationship between two variables

✓ The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations

FEATURES OF CORRELATION COEFFICIENT
a. Unit free
b. A correlation coefficient of -1.00 or +1.00 indicates a perfect linear relationship
c. The closer to -1.00, the stronger the negative linear relationship
d. The closer to +1.00, the stronger the positive linear relationship
e. The closer to 0, the weaker the linear relationship
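The features above can be checked numerically. The sketch below is an illustration only: the height and weight figures are made-up values, and `pearson_r` is a helper name introduced here, implementing the standard sample formula r = Sxy / sqrt(Sxx · Syy).

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data (not from the slides): heights in cm, weights in kg
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 77]
r_metric = pearson_r(heights_cm, weights_kg)

# The same observations expressed in inches and pounds
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_imperial = pearson_r(heights_in, weights_lb)

# Points lying exactly on a rising or a falling straight line
r_perfect_positive = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
r_perfect_negative = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])

print(r_metric, r_imperial)                    # equal up to rounding: unit free
print(r_perfect_positive, r_perfect_negative)  # approximately +1 and -1
```

Changing the units rescales Sxy and sqrt(Sxx · Syy) by the same factor, which is why the ratio does not change.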

EXAMPLES OF APPROXIMATE r VALUES

[Figure: scatter plots illustrating approximate r values]

CALCULATION OF THE CORRELATION COEFFICIENT
• Sample correlation coefficient:

$$ r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} $$

With:

$$ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$

Where:
r : sample correlation coefficient
n : sample size
x_i : value of observation i in the independent variable
y_i : value of observation i in the dependent variable
x̄ : average value of the independent variable
ȳ : average value of the dependent variable

EXAMPLE OF CORRELATION COEFFICIENT CALCULATION
We want to evaluate the relationship between the number of
sales calls and the number of products sold.

Calls (x_i)   Sales (y_i)   x_i − x̄   (x_i − x̄)²   y_i − ȳ   (y_i − ȳ)²   (x_i − x̄)(y_i − ȳ)
20            30            -2         4            -15        225          30
40            60            18         324          15         225          270
20            40            -2         4            -5         25           10
30            60            8          64           15         225          120
10            30            -12        144          -15        225          180
10            40            -12        144          -5         25           60
20            40            -2         4            -5         25           10
20            50            -2         4            5          25           -10
20            30            -2         4            -15        225          30
30            70            8          64           25         625          200

Means: x̄ = 22, ȳ = 45
Sums: S_xx = 760, S_yy = 1,850, S_xy = 900

$$ r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} = \frac{900}{\sqrt{(760)(1850)}} = 0.759 $$
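The table can be reproduced step by step in a few lines. This is a minimal sketch using only the ten (calls, sales) pairs from the example; the variable names are chosen here for illustration.

```python
from math import sqrt

# Data from the sales-call example
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
sales = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

n = len(calls)
x_bar = sum(calls) / n  # average number of calls
y_bar = sum(sales) / n  # average number of products sold

# Sums of squared deviations and of cross products
s_xx = sum((x - x_bar) ** 2 for x in calls)
s_yy = sum((y - y_bar) ** 2 for y in sales)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(calls, sales))

r = s_xy / sqrt(s_xx * s_yy)

print(f"x_bar={x_bar}, y_bar={y_bar}")
print(f"Sxx={s_xx}, Syy={s_yy}, Sxy={s_xy}")
print(f"r={r:.3f}")
```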

Using Excel features: Data → Data Analysis → Correlation

         Calls      Sales
Calls    1
Sales    0.759014   1

r = 0.759 → strong positive linear relationship between the number of calls and the number of sales

Note: the correlation coefficient does not show any cause-and-effect relationship between the two variables.

SIGNIFICANCE TEST OF THE CORRELATION COEFFICIENT

In the previous example, we found that r = 0.759

→ It is based on a sample of 10 observations
→ What can we conclude about the relationship between the two variables in the population?

$$ H_0: \rho = 0 \quad \text{(no correlation)} $$
$$ H_1: \rho \neq 0 \quad \text{(correlation exists)} $$

Test statistic:

$$ T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$

with n − 2 degrees of freedom

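With significance level α, the critical region is:

$$ T < -t_{\alpha/2,\,v} \quad \text{or} \quad T > t_{\alpha/2,\,v} $$

where v = n − 2 is the number of degrees of freedom.

[Figure: t distribution centered at 0. Reject H0 in each tail (area α/2 beyond −t_{α/2} and t_{α/2}); do not reject H0 between them]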

EXAMPLE OF SIGNIFICANCE TEST OF r
1. State the hypotheses

$$ H_0: \rho = 0 \quad \text{(no correlation)} $$
$$ H_1: \rho \neq 0 \quad \text{(correlation exists)} $$

2. Define the critical value and critical region:

With α = 0.05 and v = n − 2 = 8 → t_{α/2, v} = t_{0.025, 8} = 2.306

Critical region: T < −2.306 or T > 2.306

3. Compute the T-statistic:

$$ T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.759\sqrt{8}}{\sqrt{1-0.759^2}} = 3.30 $$

4. Evaluate the hypothesis

T = 3.30 > t = 2.306 → reject H0

5. Conclusion
At the 5% significance level, there is a positive correlation between the number of calls and the number of sales in the population

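The five steps can be sketched in code as follows. This is an illustration under stated assumptions: `correlation_t_statistic` is a helper name introduced here, and the critical value 2.306 is taken from the slide rather than computed from the t distribution.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = 0.759          # sample correlation from the sales-call example
n = 10             # number of observations
t_critical = 2.306  # t_{0.025, 8}, as tabulated in the slides

t_stat = correlation_t_statistic(r, n)

# Two-sided test: reject H0 when T falls in either tail
reject_h0 = t_stat < -t_critical or t_stat > t_critical

print(f"T = {t_stat:.2f}, critical value = {t_critical}, reject H0: {reject_h0}")
```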
REGRESSION ANALYSIS

INTRODUCTION TO REGRESSION ANALYSIS
Regression analysis is used to:
▪ Predict the value of a dependent variable based on the value of at least one
independent variable
▪ Explain the impact of changes in an independent variable on the dependent variable

Analyse the causal relationship between independent and dependent variables

Independent variable:
• The variable we use to explain the dependent variable
• Predictor variable → used to predict the expected value of the dependent variable
Dependent variable:
• The variable we wish to explain
• The variable that is being predicted or estimated

SIMPLE LINEAR REGRESSION MODEL

• Only one independent variable (x) → one regressor
• Relationship between x and y is described by a linear function
• Changes in y are assumed to be caused by changes in x

TYPES OF REGRESSION MODELS

[Figure: types of regression models]

SIMPLE LINEAR REGRESSION MODEL
A linear relationship between the response Y and the regressor x has the form:

$$ Y = \alpha + \beta x $$

where α is the intercept and β is the slope.

However, the relationship between Y and x is not deterministic → there must be a random component in the equation that relates the variables.
Thus, the model is:

$$ Y = \alpha + \beta x + \epsilon $$

where ε is a random variable assumed to be distributed with E(ε) = 0 and Var(ε) = σ².

Interpretation:
✓ The quantity Y is a random variable because ε is random
✓ The value of the regressor x is not random
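A small simulation can make this interpretation concrete. The sketch below is illustrative only: the values of alpha, beta and sigma are hypothetical, and the error term is drawn from a normal distribution, which is an additional assumption beyond E(ε) = 0 and Var(ε) = σ².

```python
import random


def simulate_response(x, alpha, beta, sigma, rng):
    """Draw one Y = alpha + beta * x + epsilon, with epsilon ~ Normal(0, sigma^2)."""
    epsilon = rng.gauss(0.0, sigma)
    return alpha + beta * x + epsilon


rng = random.Random(42)               # fixed seed so the run is repeatable
alpha, beta, sigma = 19.0, 1.2, 5.0   # hypothetical parameter values
x_fixed = 20                          # the regressor is held fixed (not random)

# Repeated observations at the same x give different Y values
draws = [simulate_response(x_fixed, alpha, beta, sigma, rng) for _ in range(10_000)]

mean_y = sum(draws) / len(draws)
expected_y = alpha + beta * x_fixed   # E(Y | x) = alpha + beta * x, since E(epsilon) = 0

print(f"first three draws: {[round(d, 2) for d in draws[:3]]}")
print(f"sample mean of Y: {mean_y:.2f}, alpha + beta*x: {expected_y:.2f}")
```

Although x never changes, Y varies from draw to draw, and its average settles near α + βx.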

SIMPLE LINEAR REGRESSION ASSUMPTIONS

✓ Error values (ε) are statistically independent
✓ Error values are normally distributed for any given value of x and have constant variance
✓ The underlying relationship between the x variable and the y variable is linear

POPULATION AND SAMPLE REGRESSION MODEL

Population regression model (unknown relationship): y = α + βx
Sample regression model (estimated from data): ŷ = a + bx

ESTIMATED REGRESSION MODEL

The sample regression model provides an estimate of the population regression line

LINEAR REGRESSION MODEL
In regression analysis, the objective is to use the data to position a line
that best represents the relationship between the two variables
→ How do we find the best-fitting line for the data?

The first approach is to use a scatter diagram to position the line visually

SCATTER DIAGRAM
1. Plot of all (x_i, y_i) pairs
2. Suggests how well the model will fit

How would you draw a line through the points? How do you determine which line ‘fits best’?

LEAST SQUARES PRINCIPLE
• We would like to choose the line that minimizes the error between the actual data and the line → residual
Error in fit:
Given a set of regression data {(x_i, y_i); i = 1, 2, …, n} and a fitted model ŷ_i = a + b x_i, the i-th residual e_i is given by

$$ e_i = y_i - \hat{y}_i $$

Using the least squares method, we minimize the sum of squares of the vertical deviations from the points to the line → LEAST SQUARES VALUE

LEAST SQUARES EQUATION
Given a set of regression data {(x_i, y_i); i = 1, 2, …, n}, the least squares estimates a and
b of the regression coefficients α and β are computed from the formulas:

$$ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} $$

$$ a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} = \bar{y} - b\bar{x} $$

INTERPRETATION OF LEAST SQUARES MODEL
➢ a is the estimated average value of y when the value of x is zero
➢ b is the estimated change in the average value of y as a result of a one-unit
change in x

Note:
The regression equation should generally not be used for points outside the range of
the sample values

SIMPLE LINEAR REGRESSION EXAMPLE
Recall the previous example!
We want to evaluate whether the number of sales calls affects the number of
products sold.

Calls (x_i)   Sales (y_i)   x_i − x̄   (x_i − x̄)²   y_i − ȳ   (y_i − ȳ)²   (x_i − x̄)(y_i − ȳ)
20            30            -2         4            -15        225          30
40            60            18         324          15         225          270
20            40            -2         4            -5         25           10
30            60            8          64           15         225          120
10            30            -12        144          -15        225          180
10            40            -12        144          -5         25           60
20            40            -2         4            -5         25           10
20            50            -2         4            5          25           -10
20            30            -2         4            -15        225          30
30            70            8          64           25         625          200
x̄ = 22        ȳ = 45        0          S_xx = 760   0          S_yy = 1,850 S_xy = 900

$$ b = \frac{S_{xy}}{S_{xx}} = \frac{900}{760} = 1.18 $$

$$ a = \bar{y} - b\bar{x} = 45 - 1.1842\,(22) = 18.95 $$
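The same estimates can be obtained directly from the data. This is a minimal sketch of the least squares formulas b = Sxy / Sxx and a = ȳ − b·x̄; `least_squares_fit` is a helper name introduced here for illustration.

```python
def least_squares_fit(x, y):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Data from the sales-call example
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
sales = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

a, b = least_squares_fit(calls, sales)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Note that the intercept is computed from the unrounded slope; rounding b to 1.18 first would give a slightly different value of a.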

INTERPRETATION OF THE MODEL
Linear regression equation:

number of sales = 18.95 + 1.18 (number of calls)

a = 18.95 is the estimated average value of Y when the value of X is zero

However, x = 0 is not in the range of the sample values → it should not be used to estimate the
number of products sold
→ The number of calls ranged from 10 to 40, so estimates should be limited to that range
The value a = 18.95 can also be described as the portion of the number of products sold not
explained by the number of calls
→ Probably explained by other variables

b = 1.18 is the estimated change in the average value of Y as a result of a one-unit change in X
→ It tells us that the number of sales increases by 1.18 units, on average, for each additional call
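The interpretation of the fitted line, including the restriction to the observed range of calls, can be sketched as below. The function name `predict_sales` and the choice to raise an error outside the range 10 to 40 are illustrative assumptions, not part of the slides.

```python
A_HAT = 18.95           # estimated intercept from the example
B_HAT = 1.18            # estimated slope from the example
X_MIN, X_MAX = 10, 40   # observed range of the number of calls


def predict_sales(calls):
    """Estimated average number of sales for a given number of calls.

    Refuses to extrapolate outside the range of the sample values.
    """
    if not X_MIN <= calls <= X_MAX:
        raise ValueError(
            f"calls={calls} is outside the sampled range [{X_MIN}, {X_MAX}]"
        )
    return A_HAT + B_HAT * calls


pred_20 = predict_sales(20)
pred_21 = predict_sales(21)
increase = pred_21 - pred_20   # effect of one additional call

print(f"predicted sales at 20 calls: {pred_20:.2f}")
print(f"change for one additional call: {increase:.2f}")

try:
    predict_sales(0)           # x = 0 is outside the sampled range
except ValueError as err:
    print("not estimated:", err)
```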

End