
Please ensure that your registered name is displayed on your Zoom device, e.g. Shanice Smith.
Introduction to Business & Economic Statistics: MTH201

Lecturer: Chrystal Rhone


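Refresher…
• Range = (Max. data entry) − (Min. data entry)

• Variance: the average of the squared differences from the mean (divide by N for a population, by n − 1 for a sample)
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \qquad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

• Standard deviation: the square root of the variance
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$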
Measures of Dispersion:
Population Variance and Standard Deviation

Example
You grow 5 carrots in your backyard and
measure the length of each carrot in
centimeters. Here is your data:
9 7 5 4 12

Calculate the variance, standard deviation and range.
Solution
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Data: 9, 7, 5, 4, 12

Calculate the mean:
$$\mu = \frac{\sum x}{N} = \frac{9 + 7 + 5 + 4 + 12}{5} = \frac{37}{5} = 7.4$$
Mean μ = 7.4

Length of carrot x (cm)   Deviation x − μ    Square (x − μ)² (cm²)
9                         9 − 7.4 = 1.6      (1.6)² = 2.56
7                         7 − 7.4 = −0.4     (−0.4)² = 0.16
5                         5 − 7.4 = −2.4     (−2.4)² = 5.76
4                         4 − 7.4 = −3.4     (−3.4)² = 11.56
12                        12 − 7.4 = 4.6     (4.6)² = 21.16
                          Σ(x − μ) = 0       Σ(x − μ)² = 41.2
Population variance:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{41.2}{5} = 8.24\ \text{cm}^2$$
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{8.24} \approx 2.87\ \text{cm}$$
Range:
Max value − Min value = 12 − 4 = 8 cm
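Python's standard-library `statistics` module keeps the population functions (`pvariance`, `pstdev`) separate from the sample ones, so the carrot results can be cross-checked in a few lines. This is a sketch; any statistics package or spreadsheet would serve equally well.

```python
import statistics

carrots_cm = [9, 7, 5, 4, 12]  # carrot lengths from the example

mu = statistics.mean(carrots_cm)            # population mean
sigma2 = statistics.pvariance(carrots_cm)   # population variance: divides by N
sigma = statistics.pstdev(carrots_cm)       # population standard deviation
spread = max(carrots_cm) - min(carrots_cm)  # range

print(mu, round(sigma2, 2), round(sigma, 2), spread)
```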
Measures of Dispersion:
Sample Variance and Standard Deviation
Example
Let's assume we have the following sample data: 12 18 7 10

Calculate the variance, standard deviation and range.
Solution
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
First find n and the sample mean x̄, then fill in the deviation (x − x̄) and the squared deviation (x − x̄)² for each value.
n = 4, n − 1 = 3, x̄ = (12 + 18 + 7 + 10)/4 = 47/4 = 11.75

x     x − x̄     (x − x̄)²
12    0.25      0.0625
18    6.25      39.0625
7     −4.75     22.5625
10    −1.75     3.0625
      Σ(x − x̄) = 0    Σ(x − x̄)² = 64.75

Variance: s² = Σ(x − x̄)²/(n − 1) = 64.75/3 = 21.583
Standard deviation: s = √21.583 ≈ 4.646
Range = 18 − 7 = 11
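The same `statistics` module provides the sample versions, `variance` and `stdev`, which divide by n − 1. A minimal check of the sample example:

```python
import statistics

sample = [12, 18, 7, 10]

x_bar = statistics.mean(sample)           # sample mean
s2 = statistics.variance(sample)          # sample variance: divides by n - 1
s = statistics.stdev(sample)              # sample standard deviation
sample_range = max(sample) - min(sample)  # range

print(x_bar, round(s2, 3), round(s, 3), sample_range)
```

Calling `pvariance` on this sample instead would divide by 4 rather than 3 and give 16.1875, understating the spread; that is the reason the n − 1 divisor is used for sample data.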
Unit 2
Correlation
• Correlation is an association or relationship between two quantitative variables. [NOTE: Correlation is not (does not imply) causation!]
• Two variables are said to be correlated if changes in one variable are associated with changes in the other variable.
• Correlation can be assessed both graphically and numerically.
Assessing Correlation Graphically – Scatterplots
A scatterplot is used to examine:
• Whether there is a linear (straight-line) relationship between the quantitative variables
• The direction of the relationship (positive or negative), if any
• The strength of the relationship (whether the points are closely clustered or dispersed), if any
• Whether there are any noticeable deviations (outliers) from the relationship
How to construct Scatterplots
Example:

The local ice cream shop keeps track of how much ice cream
they sell versus the noon temperature on that day. Here are
their figures for the last 12 days:
Ice Cream Sales vs Temperature
Temperature °C Ice Cream Sales
14.2° $215
16.4° $325
11.9° $185
15.2° $332
18.5° $406
22.1° $522
19.4° $412
25.1° $614
23.4° $544
18.1° $421
22.6° $445
17.2° $408
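A scatterplot would normally be drawn with a spreadsheet or a plotting library. As a dependency-free sketch, the Python below places each (temperature, sales) pair from the table on a character grid; `text_scatter` and its `width`/`height` parameters are hypothetical names chosen for illustration.

```python
# Ice cream data from the table above: noon temperature (°C) and sales ($)
temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]


def text_scatter(xs, ys, width=40, height=12):
    """Return a character-grid scatterplot as a list of strings.

    x runs left to right, y runs bottom to top; each point is drawn as '*'.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value linearly onto the available columns and rows.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # the first row is printed at the top, so flip y
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # x-axis
    return lines


for line in text_scatter(temps, sales):
    print(line)
```

The points climb from the lower left to the upper right, which is what a direct positive linear relationship looks like.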
[Figure: scatterplot of ice cream sales vs temperature]
Interpreting scatterplots
[Figure panels: Direct positive linear relationship; Direct negative linear relationship; No relationship; Outliers]
Assessing Correlation Numerically – Correlation Coefficient
• An objective numerical measure of correlation.
• It provides a quantitative measure of the strength (and, through its sign, the direction) of the linear relationship between two variables.
• The coefficient can take any value from −1 to 1.
The correlation coefficient, denoted by r, can take on any value from −1 to 1:
• r = −1 indicates a perfect negative linear relationship
• −1 < r < 0 indicates a negative linear relationship
• r = 0 indicates no linear relationship
• 0 < r < 1 indicates a positive linear relationship
• r = 1 indicates a perfect positive linear relationship
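The interpretation list maps directly onto a small helper. `describe_r` is a hypothetical name; bear in mind that an r computed from real data will almost never be exactly −1, 0 or 1, so in practice the middle categories are the ones that come up.

```python
def describe_r(r):
    """Label a correlation coefficient using the categories in the list above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == -1:
        return "perfect negative linear relationship"
    if r < 0:
        return "negative linear relationship"
    if r == 0:
        return "no linear relationship"
    if r < 1:
        return "positive linear relationship"
    return "perfect positive linear relationship"


print(describe_r(-0.4))  # negative linear relationship
```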


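Pearson Correlation Coefficient:
$$r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}}$$

Alternative formula:
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$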
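Both forms of Pearson's r can be written as short Python functions. This is a minimal sketch (the function names are illustrative), applied here to the ice cream data from the scatterplot example:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Computational form: uses the totals Σx, Σy, Σxy, Σx² and Σy²."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def pearson_r_deviations(xs, ys):
    """Alternative form: uses deviations from the means x̄ and ȳ."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


# Ice cream sales vs temperature, from the scatterplot example
temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]
print(round(pearson_r(temps, sales), 4), round(pearson_r_deviations(temps, sales), 4))
```

The two forms are algebraically identical. The computational form is convenient by hand, while the deviation form is usually preferred in software because subtracting large totals can lose floating-point precision.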
Let's talk direction and strength.
Example
Use the information in the table below to calculate the correlation coefficient.

x      y
17     150
15     154
19     169
17     172
21     175
Totals: Σx = 89, Σy = 820
Solution
Extend the table with columns for x × y, x² and y², then total each column (here N = 5).
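       x      y      x × y     x²       y²
       17     150    2,550     289      22,500
       15     154    2,310     225      23,716
       19     169    3,211     361      28,561
       17     172    2,924     289      29,584
       21     175    3,675     441      30,625
Σ      89     820    14,670    1,605    134,986

$$r = \frac{5(14{,}670) - (89)(820)}{\sqrt{\left[5(1{,}605) - 89^2\right]\left[5(134{,}986) - 820^2\right]}} = \frac{73{,}350 - 72{,}980}{\sqrt{(8{,}025 - 7{,}921)(674{,}930 - 672{,}400)}} = \frac{370}{\sqrt{104 \times 2{,}530}} \approx 0.7213$$
Since 0 < r < 1, there is a positive linear relationship between x and y.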
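The column totals and the final value of r can be reproduced with a few lines of Python. This is a sketch; the variable names are illustrative.

```python
from math import sqrt

x = [17, 15, 19, 17, 21]
y = [150, 154, 169, 172, 175]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))  # Σ(x × y)
sum_x2 = sum(a * a for a in x)             # Σx²
sum_y2 = sum(b * b for b in y)             # Σy²

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 5 89 820 14670 1605 134986
print(numerator, round(r, 4))                   # 370 0.7213
```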
Solution using alternative formula
x̄ = 89/5 = 17.8, ȳ = 820/5 = 164

x     y     x − x̄    y − ȳ    (x − x̄)(y − ȳ)    (x − x̄)²    (y − ȳ)²
17    150   −0.8     −14      11.2              0.64        196
15    154   −2.8     −10      28.0              7.84        100
19    169   1.2      5        6.0               1.44        25
17    172   −0.8     8        −6.4              0.64        64
21    175   3.2      11       35.2              10.24       121
Σ                             74.0              20.8        506.0

$$r = \frac{74.0}{\sqrt{20.8 \times 506.0}} = \frac{74.0}{\sqrt{10{,}524.8}} \approx 0.7213$$
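The deviation table can also be generated programmatically. The sketch below builds each row (x, y, x − x̄, y − ȳ, product, squares) and then applies the alternative formula:

```python
from math import sqrt

x = [17, 15, 19, 17, 21]
y = [150, 154, 169, 172, 175]
x_bar = sum(x) / len(x)  # 17.8
y_bar = sum(y) / len(y)  # 164.0

rows = []
for xi, yi in zip(x, y):
    dx, dy = xi - x_bar, yi - y_bar
    rows.append((xi, yi, dx, dy, dx * dy, dx ** 2, dy ** 2))

sum_dxdy = sum(row[4] for row in rows)  # Σ(x − x̄)(y − ȳ)
sum_dx2 = sum(row[5] for row in rows)   # Σ(x − x̄)²
sum_dy2 = sum(row[6] for row in rows)   # Σ(y − ȳ)²
r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)

for row in rows:
    print("  ".join(f"{v:7.2f}" for v in row))
print(f"totals: {sum_dxdy:.1f} {sum_dx2:.1f} {sum_dy2:.1f}   r = {r:.4f}")
```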
Facts About Correlation
• The order of variables in a correlation is not important.
• Correlations provide evidence of association, not causation.
• r has no units.
• Positive r values indicate a positive association between the variables, and negative r values indicate a negative association.
• The correlation r is always a number between −1 and 1.
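Several of these facts can be demonstrated numerically on the ice cream data. The sketch below uses the deviation form of r; converting °C to °F stands in for "changing units", since r is unchanged by any positive linear rescaling of a variable.

```python
from math import isclose, sqrt


def pearson_r(xs, ys):
    """Deviation form of Pearson's r."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


temps_c = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]
r = pearson_r(temps_c, sales)

# 1. Order does not matter: r(x, y) equals r(y, x).
print(isclose(r, pearson_r(sales, temps_c)))
# 2. No units: re-expressing temperature in °F leaves r unchanged.
temps_f = [c * 9 / 5 + 32 for c in temps_c]
print(isclose(r, pearson_r(temps_f, sales)))
# 3. Reversing the direction of one variable flips the sign of r.
print(isclose(-r, pearson_r([-c for c in temps_c], sales)))
# 4. r always lies between -1 and 1.
print(-1 <= r <= 1)
```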
Simple Linear Regression Analysis
• A regression model is a mathematical equation that describes the relationship between two or more variables. A simple regression model includes only two variables: one independent and one dependent.
  • The dependent variable is the one being explained/predicted; and
  • The independent variable is the one used to explain the variation in the dependent variable.
• Ordinary least squares (OLS) is the most common method of fitting a linear regression.
What factors or variables does a household consider when deciding how much money it should spend on food?
• Income
• Size of household
• Preferences and taste

Independent variable = ?
Dependent variable = ?
Simple Linear Regression: Equation
• Population:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
• Sample regression line:
$$\hat{y} = \alpha + b x$$

Where
Y = dependent variable (response)
X = independent variable (predictor or explanatory)
β₀ = intercept (value of Y when X = 0)
β₁ = slope of the regression line
ε = random error (unexplained variation)
ŷ = the estimated or predicted value of y based on the regression model
α, b = the sample estimates of the intercept and slope
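Simple Linear Regression: Estimating Alpha & Beta
$$\hat{y} = \alpha + b x$$
Slope coefficient:
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
Or
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
y-intercept:
$$\alpha = \bar{y} - b\bar{x}$$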
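The slope and intercept formulas translate into a small least-squares routine. `fit_line` is a hypothetical helper name; it uses the computational slope formula and α = ȳ − b x̄ exactly as written above.

```python
def fit_line(xs, ys):
    """Least-squares estimates (alpha, b) for the line y-hat = alpha + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope coefficient
    alpha = sy / n - b * (sx / n)                   # y-intercept: y-bar - b * x-bar
    return alpha, b


# Points lying exactly on y = 2 + 3x are recovered exactly.
print(fit_line([0, 1, 2, 3], [2, 5, 8, 11]))  # (2.0, 3.0)
```

A quick sanity check like the one above (data lying on a known line) is a cheap way to confirm a hand-coded formula before using it on real data.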
Example:
A sample of households was taken from a small city, and information on their incomes and food expenditures is displayed in the table below (in hundreds of dollars). Find the values of α and b for the regression model.
Income   Food Expenditure
55 14
83 24
38 13
61 16
33 9
49 15
67 17
Solution:
What is the x variable and the y variable? What is n?
Income is the independent variable x, food expenditure is the dependent variable y, and there are n = 7 households.
       Income x   Food Expenditure y   x × y    x²       y²
       55         14                   770      3,025    196
       83         24                   1,992    6,889    576
       38         13                   494      1,444    169
       61         16                   976      3,721    256
       33         9                    297      1,089    81
       49         15                   735      2,401    225
       67         17                   1,139    4,489    289
Σ      386        108                  6,403    23,058   1,792
$$b = \frac{7(6{,}403) - (386)(108)}{7(23{,}058) - 386^2} = \frac{44{,}821 - 41{,}688}{161{,}406 - 148{,}996} = \frac{3{,}133}{12{,}410} = 0.2525$$
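Python's `fractions.Fraction` keeps the slope exact, which avoids rounding until the very end. A minimal sketch of the computational formula on the household data:

```python
from fractions import Fraction

income = [55, 83, 38, 61, 33, 49, 67]  # x, in hundreds of dollars
food = [14, 24, 13, 16, 9, 15, 17]     # y, in hundreds of dollars

n = len(income)
sum_x, sum_y = sum(income), sum(food)
sum_xy = sum(x * y for x, y in zip(income, food))
sum_x2 = sum(x * x for x in income)

# Exact fraction: no rounding happens until the final print.
b = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)

print(n, sum_x, sum_y, sum_xy, sum_x2)  # 7 386 108 6403 23058
print(b, round(float(b), 4))             # 3133/12410 0.2525
```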
Solution: ALTERNATIVE FORMULA
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
Mean(x) = x̄ = 55.1429, Mean(y) = ȳ = 15.4286

x     y     x − x̄     y − ȳ     (x − x̄)(y − ȳ)    (x − x̄)²    (y − ȳ)²
55    14    −0.143    −1.429    0.204             0.020       2.041
83    24    27.857    8.571     238.776           776.020     73.469
38    13    −17.143   −2.429    41.633            293.878     5.898
61    16    5.857     0.571     3.347             34.306      0.327
33    9     −22.143   −6.429    142.347           490.306     41.327
49    15    −6.143    −0.429    2.633             37.735      0.184
67    17    11.857    1.571     18.633            140.592     2.469
Σ                               447.5714          1,772.9     125.7

$$b = \frac{447.6}{1{,}772.9} = 0.2525$$
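The alternative (deviation) formula gives the same slope. A short sketch that also reproduces the means and column totals shown in the table:

```python
income = [55, 83, 38, 61, 33, 49, 67]
food = [14, 24, 13, 16, 9, 15, 17]

x_bar = sum(income) / len(income)
y_bar = sum(food) / len(food)

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, food))  # Σ(x − x̄)(y − ȳ)
s_xx = sum((x - x_bar) ** 2 for x in income)                          # Σ(x − x̄)²
b = s_xy / s_xx

print(round(x_bar, 4), round(y_bar, 4), round(s_xy, 4), round(s_xx, 1), round(b, 4))
# 55.1429 15.4286 447.5714 1772.9 0.2525
```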
Solution: b = 0.2525
Next, find the intercept:
$$\alpha = \bar{y} - b\bar{x}$$
What is the mean of the X and Y variables?
$$\bar{x} = \frac{386}{7} = 55.1429 \qquad \bar{y} = \frac{108}{7} = 15.4286$$
$$\alpha = \bar{y} - b\bar{x} = 15.428571 - 0.2524577(55.142857) = 15.428571 - 13.921239 \approx 1.5073$$
Carry the unrounded slope b = 3,133/12,410 ≈ 0.2524577 into this step; rounding b to 0.2525 first would give α ≈ 1.5050.
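The effect of rounding the slope too early is easy to see with exact fractions. The sketch below computes α at full precision and, for comparison, with b rounded to 0.2525 before multiplying:

```python
from fractions import Fraction

income = [55, 83, 38, 61, 33, 49, 67]
food = [14, 24, 13, 16, 9, 15, 17]
n = len(income)

x_bar = Fraction(sum(income), n)  # 386/7
y_bar = Fraction(sum(food), n)    # 108/7
b = Fraction(
    n * sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food),
    n * sum(x * x for x in income) - sum(income) ** 2,
)  # 3133/12410

alpha_exact = y_bar - b * x_bar                               # full precision
alpha_from_rounded_b = float(y_bar) - 0.2525 * float(x_bar)  # slope rounded first

print(round(float(alpha_exact), 4), round(alpha_from_rounded_b, 4))  # 1.5073 1.505
```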
Solution: with α = 1.5073 and b = 0.2525, the estimated regression line is
$$\hat{y} = 1.5073 + 0.2525x$$

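Simple Linear Regression: Prediction
• We can find the predicted value of y for any specific value of x.

Let's assume we selected a household whose monthly income is $6,100. Forecasted food expenditure = ?
Using the full-precision coefficients α ≈ 1.507332 and b ≈ 0.2524577 (income in hundreds of dollars, so x = 61):
$$\widehat{\text{Food Expenditure}} = 1.507332 + 0.2524577(61) = 16.9073 \text{ hundred} \approx \$1{,}690.73$$
On average, households with a monthly income of $6,100 are predicted to spend approximately $1,690.73 per month on food. (Plugging in the rounded values 1.5073 and 0.2525 instead gives 16.9098, about $1,690.98.)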
Calculate the forecasted value for food expenditure when income is $3,200 and $9,800.
Using the full-precision coefficients α ≈ 1.507332 and b ≈ 0.2524577:
$$\widehat{\text{Food Expenditure}} = 1.507332 + 0.2524577(32) = 9.5860 \text{ hundred} \approx \$958.60$$
$$\widehat{\text{Food Expenditure}} = 1.507332 + 0.2524577(98) = 26.2482 \text{ hundred} \approx \$2{,}624.82$$
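The three forecasts can be produced by one small function. `forecast_food_spending` is a hypothetical helper; the coefficients are the exact fractions from the worked example (α = 9,353/6,205 ≈ 1.507332, b = 3,133/12,410 ≈ 0.2524577), and the function handles the conversion to and from hundreds of dollars.

```python
# Full-precision coefficients from the worked example (assumed, not rounded).
ALPHA = 9353 / 6205  # intercept, hundreds of dollars
B = 3133 / 12410     # slope


def forecast_food_spending(monthly_income_dollars):
    """Predicted monthly food expenditure in dollars (hypothetical helper)."""
    x = monthly_income_dollars / 100  # the model was fitted in hundreds of dollars
    y_hat = ALPHA + B * x             # prediction, in hundreds of dollars
    return round(y_hat * 100, 2)      # convert back to dollars, to the cent


for income in (6100, 3200, 9800):
    print(income, forecast_food_spending(income))
```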
Coefficient of Determination (R²)
• Measures the proportion (percentage) of the variation in the y variable that is explained by the x variable(s)
• Indicates how well the regression model fits the observed data
• In a simple linear model, R² is the square of r
• A high R² value indicates that the model fits the data well; a low R² indicates a poor fit
• The range is 0 to 1
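For the household example, R² can be computed two ways: as r², and from the standard definition R² = 1 − SSE/SST, where SSE is the sum of squared residuals around the fitted line and SST is the total sum of squares around ȳ (these labels are not used in the notes). For a least-squares line with an intercept the two agree, as this sketch shows:

```python
from math import sqrt

income = [55, 83, 38, 61, 33, 49, 67]
food = [14, 24, 13, 16, 9, 15, 17]
n = len(income)

# r from the computational formula
sx, sy = sum(income), sum(food)
sxy = sum(x * y for x, y in zip(income, food))
sxx = sum(x * x for x in income)
syy = sum(y * y for y in food)
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# R² as 1 - SSE/SST from the fitted least-squares line
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
alpha = sy / n - b * sx / n
y_bar = sy / n
sse = sum((y - (alpha + b * x)) ** 2 for x, y in zip(income, food))  # unexplained variation
sst = sum((y - y_bar) ** 2 for y in food)                             # total variation
r_squared = 1 - sse / sst

print(round(r, 4), round(r ** 2, 4), round(r_squared, 4))
```

Here R² is roughly 0.90, meaning income accounts for about 90% of the variation in food expenditure in this sample.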
Simple Linear Regression: Assumptions
• The regression model is linear in the parameters
• The residuals are independent (no autocorrelation); this primarily affects time-series data
• Constant variance of residuals (homoscedasticity)
• Normality: the residuals of the model are normally distributed


The End!
