
MATERIALS FOR LECTURE 3

Narbut Victoria V.
PhD, Associate Professor,
Accounting, Analyses and Audit Department

Unit 7. Correlation and Regression Analysis

Types of correlation
DIRECTION
• Positive / Direct (the dependent variable Y tends to increase as the independent variable X increases)
• Negative / Inverse (the dependent variable Y tends to decrease as the independent variable X increases)

FORM
• Linear (described by the equation of a straight line)
• Curvilinear (described by the equation of a parabola, hyperbola and so on)

NUMBER OF VARIABLES
• Simple (relationship between one dependent and one independent variable)
• Multiple (relationship between one dependent and several independent variables)

Scatterplot

[Figure: three scatterplots of y against x, showing positive correlation, negative correlation and no correlation]


Example 7.1
Determine the relationship between the length of service and the average monthly wage.
N   Length of service, years (X)   Monthly wage, thousand rubles (Y)
1 10 43
2 2 21
3 3 26
4 12 43
5 5 33
6 13 43
7 5 30
8 10 42
9 8 41
10 7 39

Calculation table
N X Y X² Y² XY
1 10 43 100 1849 430
2 2 21 4 441 42
3 3 26 9 676 78
4 12 43 144 1849 516
5 5 33 25 1089 165
6 13 43 169 1849 559
7 5 30 25 900 150
8 10 42 100 1764 420
9 8 41 64 1681 328
10 7 39 49 1521 273
Total: 75 361 689 13619 2961

Correlation coefficient

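Karl Pearson's coefficient of correlation and its short-cut forms (all three expressions are equivalent):

\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \cdot \sum (y - \bar{y})^2}} \]

\[ r = \frac{\sum xy - \frac{\sum x \cdot \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) \cdot \left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} \]

\[ r = \frac{\overline{xy} - \bar{x} \cdot \bar{y}}{\sigma_x \cdot \sigma_y} \]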
Interpretation of correlation coefficient value

−1 ≤ 𝑟 ≤ +1
r<0 – negative correlation
r>0 – positive correlation
r=0 – no linear correlation

Absolute value of r    Relationship
below 0.3              no linear relationship
0.3 to 0.5             weak
0.5 to 0.7             moderate
0.7 to 0.9             strong
0.9 and above          very strong
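
As an illustration, the following Python sketch computes Pearson's coefficient for the data of Example 7.1 from the column totals (the raw-sum form of the formula) and maps the result onto the verbal scale. The function names `pearson_r` and `describe_strength` are illustrative, and treating each band as including its lower bound is an assumption, since the scale does not say which band a boundary value belongs to.

```python
from math import sqrt

# Data from Example 7.1: length of service (years) and monthly wage (thousand rubles).
x = [10, 2, 3, 12, 5, 13, 5, 10, 8, 7]
y = [43, 21, 26, 43, 33, 43, 30, 42, 41, 39]


def pearson_r(xs, ys):
    """Pearson's r from the totals of the calculation table (raw-sum form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


def describe_strength(r):
    """Map a correlation coefficient to the verbal scale of this unit."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    # Assumption: each band includes its lower bound (0.3 counts as "weak").
    if size < 0.3:
        return "no linear relationship"
    if size < 0.5:
        return "weak"
    if size < 0.7:
        return "moderate"
    if size < 0.9:
        return "strong"
    return "very strong"


r = pearson_r(x, y)
print(f"r = {r:.3f}: {describe_strength(r)}")
```

With the totals 75, 361, 689, 13619 and 2961 this gives r ≈ 0.93, a very strong positive relationship between length of service and wage.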

Correlation ratio
2
= 0 ≤ η ≤ +1
 02
Types of variation

Variation                                  Russian equivalent                   Formula
Total variation                            Общая дисперсия                      \( \sigma_0^2 = \frac{\sum (y - \bar{y}_0)^2 \cdot f}{\sum f} \)
Variation between groups                   Межгрупповая дисперсия               \( \delta^2 = \frac{\sum (\bar{y}_j - \bar{y}_0)^2 \cdot n_j}{\sum n_j} \)
Average of the variation within groups     Средняя внутригрупповая дисперсия    \( \overline{\sigma_j^2} = \frac{\sum \sigma_j^2 \cdot n_j}{\sum n_j} \)
Rule of addition of variation              Правило сложения дисперсий           \( \sigma_0^2 = \delta^2 + \overline{\sigma_j^2} \)
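
The rule of addition and the correlation ratio can be checked numerically. The sketch below uses hypothetical grouped data (the three groups and their values are invented for illustration and do not come from the lecture). It computes the total variation, the variation between groups and the average variation within groups, then the correlation ratio as the square root of the between-group share of the total variation.

```python
from math import sqrt, isclose

# Hypothetical data (not from the lecture): values of y observed in three groups.
groups = {
    "group 1": [20, 22, 24],
    "group 2": [30, 32, 34, 36],
    "group 3": [40, 44, 48],
}


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: squared deviations from the mean, divided by n."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


all_values = [v for ys in groups.values() for v in ys]
n_total = len(all_values)
grand_mean = mean(all_values)

# Total variation: every observation against the overall mean.
total_var = variance(all_values)
# Variation between groups: group means against the overall mean, weighted by group size.
between_var = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values()) / n_total
# Average variation within groups: group variances weighted by group size.
within_var = sum(len(ys) * variance(ys) for ys in groups.values()) / n_total

# Correlation ratio.
eta = sqrt(between_var / total_var)

print(isclose(total_var, between_var + within_var))  # rule of addition holds
print(f"eta = {eta:.3f}")
```

The closer the between-group share is to the total variation, the closer the correlation ratio is to 1.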

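Simple Linear Regression Equation

\[ \hat{Y} = b_0 + b_1 \cdot X \]

Formulas for parameters b_0 and b_1:

Parameter b_0 (intercept):

\[ b_0 = \bar{Y} - b_1 \bar{X} \]

Parameter b_1 (coefficient of regression):

\[ b_1 = \frac{\sum XY - n \cdot \bar{X} \cdot \bar{Y}}{\sum X^2 - n \cdot (\bar{X})^2} \quad \text{or} \quad b_1 = r \cdot \frac{\sigma_Y}{\sigma_X} \]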
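
A minimal Python sketch of these parameter formulas, applied to the data of Example 7.1. The helper name `predict` is illustrative and not part of the lecture.

```python
# Data from Example 7.1: length of service (years) and monthly wage (thousand rubles).
x = [10, 2, 3, 12, 5, 13, 5, 10, 8, 7]
y = [43, 21, 26, 43, 33, 43, 30, 42, 41, 39]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Coefficient of regression: (sum XY - n * mean X * mean Y) / (sum X^2 - n * mean X^2).
b1 = (sum(a * b for a, b in zip(x, y)) - n * x_mean * y_mean) / (
    sum(a * a for a in x) - n * x_mean ** 2
)
# Intercept: mean Y - b1 * mean X.
b0 = y_mean - b1 * x_mean


def predict(x_value):
    """Theoretical wage for a given length of service."""
    return b0 + b1 * x_value


print(f"Y_hat = {b0:.2f} + {b1:.2f} * X")
```

For these data b_1 ≈ 2.00 and b_0 ≈ 21.07: each additional year of service is associated with an average wage increase of about 2 thousand rubles.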
Spearman rank correlation coefficient

\[ \rho = 1 - \frac{6 \cdot \sum_{i=1}^{n} d_i^2}{n \cdot (n^2 - 1)} \]

d_i – differences between paired ranks
d_i^2 – squared differences between paired ranks
n – number of pairs (number of observations)

−1 ≤ ρ ≤ +1
ρ < 0 – negative correlation
ρ > 0 – positive correlation
|ρ| > 0.5 – relationship is significant

Steps of calculating Spearman rank correlation coefficient

1. Rank x-values from lowest to highest
2. Rank y-values from lowest to highest
3. Compute differences between paired ranks
4. Compute squared differences between paired ranks
5. Compute Spearman rank coefficient
6. Interpret the value of Spearman correlation coefficient

Example 7.2
Experts assessed the technical and financial condition of 10 industrial enterprises. The total
scores were:
N        Technical condition (x)   Financial condition (y)   Rank Rx   Rank Ry   di = Rx − Ry   di²
1        27                        26
2        30                        25
3        38                        30
4        36                        32
5        33                        28
6        42                        37
7        35                        33
8        40                        36
9        39                        31
10       43                        40
Total:   -                         -                         -         -         -
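
The steps for Spearman's coefficient can be sketched in Python for the data of Example 7.2. The function names `ranks` and `spearman_rho` are illustrative, and the ranking helper assumes there are no tied scores, which holds for this example.

```python
# Total scores from Example 7.2 (technical and financial condition).
x = [27, 30, 38, 36, 33, 42, 35, 40, 39, 43]
y = [26, 25, 30, 32, 28, 37, 33, 36, 31, 40]


def ranks(values):
    """Rank values from lowest (1) to highest (n). Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    rx, ry = ranks(xs), ranks(ys)                # steps 1 and 2: rank x and y
    d = [a - b for a, b in zip(rx, ry)]          # step 3: differences between paired ranks
    sum_d2 = sum(di ** 2 for di in d)            # step 4: squared differences, summed
    n = len(xs)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # step 5: Spearman's formula


rho = spearman_rho(x, y)
print(f"rho = {rho:.3f}")
```

Here the squared rank differences sum to 20, so ρ = 1 − 120/990 ≈ 0.88: a positive relationship that exceeds the 0.5 threshold.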

Kendall rank correlation coefficient

\[ \tau = \frac{2S}{n \cdot (n - 1)}, \qquad S = P + Q \]

n – number of observations
P – the number of ranks following the current one and exceeding its value
Q – the number of ranks following the current one and less than its value (taken with a minus sign)

−1 ≤ τ ≤ +1
τ < 0 – negative correlation
τ > 0 – positive correlation
|τ| > 0.5 – relationship is significant
Steps of calculating Kendall rank correlation coefficient

1. Rank x-values from lowest to highest
2. Rank y-values from lowest to highest
3. With the observations arranged in ascending order of the x ranks, determine for each y rank the number of ranks following it that exceed its value. The resulting numbers are summed and denoted by P, which is taken with the sign "+"
4. Determine for each y rank the number of subsequent ranks less than its value. The resulting numbers are summed and denoted by Q, which is taken with the sign "−"
5. Determine S=P+Q
6. Compute Kendall rank coefficient
7. Interpret the value of Kendall rank correlation coefficient.
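
A minimal Python sketch of these steps, using the scores of Example 7.2 as input. The names `ranks` and `kendall_tau` are illustrative. The sketch assumes no tied scores and arranges the observations in ascending order of x before counting.

```python
# Total scores from Example 7.2 (technical and financial condition).
x = [27, 30, 38, 36, 33, 42, 35, 40, 39, 43]
y = [26, 25, 30, 32, 28, 37, 33, 36, 31, 40]


def ranks(values):
    """Rank values from lowest (1) to highest (n). Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def kendall_tau(xs, ys):
    n = len(xs)
    ry = ranks(ys)
    # Arrange the y ranks in ascending order of x (equivalently, of the x ranks).
    order = sorted(range(n), key=lambda i: xs[i])
    seq = [ry[i] for i in order]
    # P: for each y rank, count the following ranks that exceed it (sign "+").
    p = sum(1 for i in range(n) for j in range(i + 1, n) if seq[j] > seq[i])
    # Q: for each y rank, count the following ranks that are smaller (sign "-").
    q = -sum(1 for i in range(n) for j in range(i + 1, n) if seq[j] < seq[i])
    s = p + q
    return 2 * s / (n * (n - 1)), p, q


tau, p, q = kendall_tau(x, y)
print(f"P = {p}, Q = {q}, tau = {tau:.3f}")
```

For these data P = 39, Q = −6, S = 33 and τ = 66/90 ≈ 0.73.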

Coefficient of association and contingency coefficient


2x2 table
Variable x    Variable y         Total:
              y1        y2
x1            a         b        a+b
x2            c         d        c+d
Total:        a+c       b+d      n

Coefficient of association:

\[ K_a = \frac{ad - bc}{ad + bc}, \qquad -1 \le K_a \le +1 \]

|Ka| > 0.5 – relationship is significant

Contingency coefficient:

\[ K_c = \frac{ad - bc}{\sqrt{(a + b) \cdot (b + d) \cdot (a + c) \cdot (c + d)}}, \qquad -1 \le K_c \le +1 \]

|Kc| > 0.3 – relationship is significant

Example 7.3
A survey of 500 Moscow residents and 500 visitors working in Moscow asked whether they are satisfied with their wages. The answers were distributed as follows:
Category of population    Satisfied with wage    Total:
                          yes        no
Moscow visitors           380        120         500
Moscow residents          100        400         500
Total:                    480        520         1000
Calculate the coefficient of association and the contingency coefficient.
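
One way to work through Example 7.3 is the short Python sketch below. The cell names a, b, c, d follow the 2x2 table layout; the variable names `k_a` and `k_c` are illustrative.

```python
from math import sqrt

# Cells of the 2x2 table from Example 7.3.
a, b = 380, 120   # Moscow visitors: satisfied / not satisfied
c, d = 100, 400   # Moscow residents: satisfied / not satisfied

# Coefficient of association.
k_a = (a * d - b * c) / (a * d + b * c)
# Contingency coefficient: same numerator, square root of the product of the four margins.
k_c = (a * d - b * c) / sqrt((a + b) * (b + d) * (a + c) * (c + d))

print(f"Ka = {k_a:.3f}, Kc = {k_c:.3f}")
```

This gives Ka ≈ 0.85 and Kc ≈ 0.56. Both exceed their thresholds (0.5 and 0.3), so the relationship between population category and wage satisfaction is significant.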
Multiple regression

Linear equation of multiple regression:

\[ \hat{y}_{1,2,\ldots,k} = a_0 + a_1 x_1 + a_2 x_2 + \ldots + a_k x_k \]

a_1, a_2, ..., a_k – coefficients of regression
x_1, x_2, ..., x_k – independent variables
\hat{y}_{1,2,...,k} – theoretical values of the dependent variable

To analyse the influence of each factor on the dependent variable, we determine the coefficients of elasticity and the β-coefficients.

Coefficient of elasticity and β-coefficient


Coefficient of elasticity (Э_j)
Russian equivalent: Коэффициент эластичности
Formula: \( \text{Э}_j = b_j \cdot \frac{\bar{x}_j}{\bar{y}} \)
Characteristics: shows the average percentage change in the dependent variable when the independent variable changes by 1%

β-coefficient (β_j)
Russian equivalent: Бета-коэффициент
Formula: \( \beta_j = b_j \cdot \frac{\sigma_{x_j}}{\sigma_y} \)
Characteristics: shows by how many standard deviations the dependent variable changes when the independent variable changes by one standard deviation

Here b_j is the regression coefficient of x_j (written a_j in the multiple regression equation).
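
The two coefficients can be illustrated with a short Python sketch. All inputs below (regression coefficients, means and standard deviations for two factors) are hypothetical values chosen for illustration; they do not come from the lecture.

```python
# Hypothetical inputs (not from the lecture): a fitted model with two factors.
b = {"x1": 0.8, "x2": -1.5}        # regression coefficients b_j
x_mean = {"x1": 12.0, "x2": 4.0}   # means of the independent variables
x_std = {"x1": 3.0, "x2": 0.5}     # standard deviations of the independent variables
y_mean, y_std = 20.0, 4.0          # mean and standard deviation of the dependent variable

# Coefficient of elasticity: b_j * mean(x_j) / mean(y).
elasticity = {name: b[name] * x_mean[name] / y_mean for name in b}
# Beta-coefficient: b_j * sigma(x_j) / sigma(y).
beta = {name: b[name] * x_std[name] / y_std for name in b}

for name in b:
    print(f"{name}: elasticity = {elasticity[name]:.4f}, beta = {beta[name]:.4f}")
```

With these assumed values, a 1% increase in x1 changes y by about 0.48% on average, and an increase of one standard deviation in x1 changes y by 0.6 of its standard deviation. Comparing the absolute β-coefficients shows which factor has the stronger influence.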
