
Introduction to Regression Analysis

Assoc. Prof. Boryana Bogdanova, PhD


Let’s start with a hypothetical example
Family consumption (Y) by income group (X); "-" marks groups with fewer families

Income group (X)         80  100  120  140  160  180  200  220  240  260
Family consumption (Y)   55   65   79   80  102  110  120  135  137  150
                         60   70   84   93  107  115  136  137  145  152
                         65   74   90   95  110  120  140  140  155  175
                         70   80   94  103  116  130  144  152  165  178
                         75   85   98  108  118  135  145  157  175  180
                          -   88    -  113  125  140    -  160  189  185
                          -    -    -  115    -    -    -  162    -  191
Let’s visualize the entire population
Conditional mean value of consumption

$E(Y \mid X = 80) = \frac{55 + 60 + 65 + 70 + 75}{5} = 65$

$E(Y \mid X = 100) = \frac{65 + 70 + 74 + 80 + 85 + 88}{6} = 77$
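The same calculation can be repeated for every income group. The sketch below is illustrative only: it assumes the consumption values belong to the income groups as read off the table (the two short bottom rows are assigned to the groups with six or seven families), and the names `population` and `conditional_means` are made up for this example.

```python
# Population from the hypothetical example:
# consumption values (Y) listed per income group (X).
# Assumption: the short bottom rows of the table belong to the
# income groups that have six or seven families.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def conditional_means(groups):
    """Return E(Y | X = x) for every income group x."""
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}


means = conditional_means(population)
for x, m in means.items():
    print(f"E(Y | X = {x}) = {m:g}")
```

Plotting these conditional means against X would trace out the population regression line.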
Population regression line (curve):
The locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s)

$E(Y \mid X_i) = \beta_1 + \beta_2 X_i$

(Linear Population Regression Function)
The Linear Population Regression Function

$\beta_1 = 17, \quad \beta_2 = 0.60$

Linearity is in parameters! A function can be non-linear in the variable X and still be a linear regression function, for example:

$E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$
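As a quick check, the stated values of the two parameters can be plugged into the specification that is linear in X, and the results compared with the conditional means computed earlier for X = 80 and X = 100:

```latex
\begin{align*}
E(Y \mid X = 80)  &= 17 + 0.60 \times 80  = 17 + 48 = 65 \\
E(Y \mid X = 100) &= 17 + 0.60 \times 100 = 17 + 60 = 77
\end{align*}
```

Both values agree with the averages obtained directly from the table.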
Let’s go back to the hypothetical example
Stochastic Specification of PRF

$u_i = Y_i - E(Y \mid X_i)$

$Y_i = E(Y \mid X_i) + u_i$

$Y_i = \beta_1 + \beta_2 X_i + u_i$
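To make the stochastic term concrete, the sketch below computes the disturbances for the families in the X = 80 income group, using the parameter values 17 and 0.60 given earlier. The function names `prf` and `disturbances` are illustrative only.

```python
# PRF parameters as stated for the hypothetical example.
BETA_1, BETA_2 = 17.0, 0.60


def prf(x):
    """Conditional mean E(Y | X = x) under the linear PRF."""
    return BETA_1 + BETA_2 * x


def disturbances(x, ys):
    """Return u_i = Y_i - E(Y | X = x) for each family in the group."""
    mean = prf(x)
    return [y - mean for y in ys]


# Consumption of the five families with income X = 80.
u_80 = disturbances(80, [55, 60, 65, 70, 75])
print([round(u, 6) for u in u_80])
print("sum of disturbances:", round(sum(u_80), 6))
```

Within an income group the disturbances average out to zero, because the PRF passes through the conditional mean of that group.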
The significance of the stochastic term

• Substitute for all the variables excluded/omitted from the model:
  - Lack of quantitative info
  - Small joint influence that might be treated as a random variable
  - Errors of measurement
  - KISS (keep the model as simple as possible)
The Sample Regression function

Goal: Estimate the PRF on the basis of the available sample information
Consider the following sample
X:   80  100  120  140  160  180  200  220  240  260
Y:   65   80   84   80  110  130  145  135  155  180
Its Sample Regression Function is…

$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$
Consider the following sample
X:   80  100  120  140  160  180  200  220  240  260
Y:   75   88   84  115  102  130  144  152  189  150
Its Sample Regression Function is…

$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$
Consider the following sample
X:   80  100  120  140  160  180  200  220  240  260
Y:   75   88   94   93  116  110  145  162  165  178
Its Sample Regression Function is…

$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$
Let’s compare PRF and SRFs
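One way to make this comparison numerically is to fit a line to each of the three samples and set the estimates next to the PRF values 17 and 0.60. The slides do not say which estimator is used; the sketch below assumes ordinary least squares, and the helper name `ols_fit` is made up for this example.

```python
# Income levels shared by all three samples.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

# One consumption value per income level, as listed in the three samples.
samples = {
    "sample 1": [65, 80, 84, 80, 110, 130, 145, 135, 155, 180],
    "sample 2": [75, 88, 84, 115, 102, 130, 144, 152, 189, 150],
    "sample 3": [75, 88, 94, 93, 116, 110, 145, 162, 165, 178],
}


def ols_fit(xs, ys):
    """Simple-regression OLS estimates (intercept, slope).

    Assumption: OLS is used as the estimator; the slides do not name one.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


fits = {name: ols_fit(X, ys) for name, ys in samples.items()}

print("PRF:      beta_1 = 17.000, beta_2 = 0.600")
for name, (b1, b2) in fits.items():
    print(f"{name}: beta_1_hat = {b1:.3f}, beta_2_hat = {b2:.3f}")
```

Each sample gives a different pair of estimates, so each SRF is only an approximation of the single underlying PRF.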
Reading

• Ch. 2 of the textbook.


Review of some core concepts in Statistics
Populations and samples

• Population: all members of a specified group
  What is a population parameter?
• Sample: a subset of a population
Descriptive statistics (DS)

• DS is the study of how data can be summarized effectively to describe the important aspects of large data sets.
• DS turns data into information.
Inference

• Inference involves making forecasts, estimates, or judgements about a larger group from the smaller group actually observed.
Consider the following data set
Summarize data: Visual representation
Consider transforming data
STATISTIC   ESTIMATE
MEAN        2.88E-04
STD         0.0097
MIN         -0.229
MAX         0.1096
RANGE       0.3386
SKEWNESS    -1.01
KURTOSIS    30.0781
Moments: definitions
Moments: estimation
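The formulas from these slides are not reproduced here, so the following is only a minimal sketch of how summary statistics of this kind might be estimated. It assumes the simple moment estimators that divide by n (other conventions divide by n - 1 or apply bias corrections), the function name `sample_moments` is made up, and the data are hypothetical values chosen for illustration.

```python
import math


def sample_moments(data):
    """Estimate basic descriptive statistics from a list of numbers.

    Assumption: central moments are estimated by dividing by n.
    Kurtosis is reported in non-excess form (a normal distribution gives 3).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Hypothetical observations, not the data set from the slides.
returns = [0.012, -0.008, 0.003, -0.021, 0.007, 0.001, -0.004, 0.010]
for name, value in sample_moments(returns).items():
    print(f"{name:>8}: {value:.4f}")
```

Under this convention a kurtosis far above 3, together with negative skewness, points to heavy tails and a longer left tail than a normal distribution would have.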
The Normal distribution
Chebyshev’s Inequality
• For any distribution with finite variance, the proportion of the observations within $k$ standard deviations of the arithmetic mean is at least $1 - 1/k^2$, for all $k > 1$.
Proportions from Chebyshev's Inequality
k Proportion (%)
1.25 36
1.50 56
2.00 75
2.50 84
3.00 89
4.00 94
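The table follows directly from the bound 1 - 1/k^2. A short sketch that recomputes it is given below; the function name `chebyshev_lower_bound` is illustrative, and the percentages are rounded to whole numbers as in the table.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the bound is not informative, so it is floored at zero.
    return max(0.0, 1.0 - 1.0 / k ** 2)


ks = [1.25, 1.50, 2.00, 2.50, 3.00, 4.00]
table = [(k, round(100 * chebyshev_lower_bound(k))) for k in ks]

print("k     Proportion (%)")
for k, pct in table:
    print(f"{k:.2f}  {pct}")
```

These are lower bounds that hold for any distribution with finite variance; a specific distribution, such as the normal, typically concentrates much more of its mass within the same number of standard deviations.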
Normal Distribution
T-distribution
Chi-square distribution
