
Introductory Econometrics Week 2

Muhammad Yudha Pratama

2022-09-17

Linear Regression Concepts


Best Linear Unbiased Estimator (BLUE)
• The OLS estimator is BLUE if the Gauss-Markov assumptions hold: linearity in parameters, random sampling, sample variation in the explanatory variables, zero conditional mean, and homoskedasticity. Normality of the error is an additional assumption used for exact inference.
• Best is another word for efficient: among linear unbiased estimators, the estimated β̂'s have the smallest standard errors.
• Unbiased means that the estimate is correct on average under repeated sampling:

E(β̂) = β

Biased Estimator
Biased estimation occurs when the estimated parameter β̂ is systematically different from the true parameter β; mathematically, E(β̂) ≠ β. In the figure below, the biased estimator is represented by the black line, while the unbiased estimator is depicted by the red line. To understand this concept, we must treat β̂ as a random variable: if we conduct sampling and estimation repeatedly, the resulting β̂'s will vary.

[Figure: densities of the estimated beta for the biased (black) and unbiased (red) estimators]

Efficient vs Inefficient Estimator
An estimator might be unbiased yet inefficient. An inefficient estimator has a higher standard error, which makes statistical decisions more difficult.

[Figure: densities of the estimated beta for an efficient (narrow) and an inefficient (wide) estimator]
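The difference can be seen in a small simulation: both estimators below are unbiased, but the one that throws away half of each sample has a visibly larger standard error. A sketch (the discard-half estimator is purely illustrative):

```r
set.seed(42)

full <- numeric(1000) # OLS slope using all 100 observations
half <- numeric(1000) # OLS slope using the first 50 only: unbiased but inefficient
for (i in 1:1000) {
  x <- 9 * rnorm(100)
  y <- 3 + 2 * x + 36 * rnorm(100)
  full[i] <- coef(lm(y ~ x))[["x"]]
  half[i] <- coef(lm(y[1:50] ~ x[1:50]))[["x"]]
}

c(mean(full), mean(half)) # both centered near the true slope of 2
c(sd(full), sd(half))     # the half-sample estimator spreads more widely
```

Both sampling distributions are centered at the truth, so unbiasedness alone does not distinguish them; the spread does.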

Monte Carlo Simulations


All assumptions hold
Let there be no violations of the OLS assumptions.

# Generate 1000 samples in a list. Each element contains 100 observations.
library(tibble)

all_sample <- list()
true_slope <- 2
for (i in 1:1000) {
  set.seed(5 * i)
  new_element <- tibble(
    err = 36 * rnorm(100), # normally distributed, homoskedastic error
    x   = 9 * rnorm(100),  # independent of the error term
    y   = 3 + true_slope * x + err
  )
  all_sample[[length(all_sample) + 1]] <- new_element
}

[Figure: histogram of the 1000 estimated slopes, centered at the true slope of 2]
We have a BLUE estimator.
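The histogram above can be reproduced by fitting the regression on each simulated sample and collecting the slope coefficients; a minimal sketch (the object name `betas` is illustrative):

```r
library(tibble)

# regenerate the 1000 samples exactly as in the chunk above
all_sample <- list()
true_slope <- 2
for (i in 1:1000) {
  set.seed(5 * i)
  all_sample[[i]] <- tibble(
    err = 36 * rnorm(100),
    x   = 9 * rnorm(100),
    y   = 3 + true_slope * x + err
  )
}

# fit OLS on every sample and collect the slope estimates
betas <- sapply(all_sample, function(d) coef(lm(y ~ x, data = d))[["x"]])
mean(betas) # averages out close to the true slope of 2
```

Plotting `betas` as a histogram reproduces the figure.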

Zero conditional mean assumption does not hold


# Generate 1000 samples in a list. Each element contains 100 observations.
library(tibble)

all_sample <- list()
true_slope <- 2
for (i in 1:1000) {
  set.seed(5 * i)
  new_element <- tibble(
    err = 36 * rnorm(100),
    x   = 9 * rnorm(100) + 0.1 * err, # the regressor is correlated with the error
    y   = 3 + true_slope * x + err
  )
  all_sample[[length(all_sample) + 1]] <- new_element
}

[Figure: histogram of the 1000 estimated slopes, now centered above the true slope of 2]
We have a biased estimator: because x is correlated with the error term, the zero conditional mean assumption fails and the slope estimates sit systematically above the true value of 2.
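The size of the bias can be anticipated analytically: with a single regressor, plim β̂ = β + Cov(x, err)/Var(x). A quick check using the variances implied by the simulation's data-generating process:

```r
# variances implied by x = 9 * z + 0.1 * err, with z and err independent
var_err <- 36^2                  # Var(err)
var_x   <- 9^2 + 0.1^2 * var_err # Var(x)
cov_xe  <- 0.1 * var_err         # Cov(x, err)

bias <- cov_xe / var_x
2 + bias # probability limit of the estimated slope, roughly 3.38
```

This matches where the histogram of slope estimates is centered.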

Goodness of Fit
• R-squared measures the goodness of fit of the model: how much of the variation in the dependent variable can be explained by the model?

R² = SSR / SST = 1 − SSE / SST

where
• SST: total sum of squares, Σ(yᵢ − ȳ)²
• SSR: regression (explained) sum of squares, Σ(ŷᵢ − ȳ)²
• SSE: error (residual) sum of squares, Σ(yᵢ − ŷᵢ)²
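These identities can be verified numerically on any fitted model; a sketch on simulated data:

```r
set.seed(1)
x <- rnorm(50)
y <- 3 + 2 * x + rnorm(50)
fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)           # total sum of squares
ssr <- sum((fitted(fit) - mean(y))^2) # explained sum of squares
sse <- sum(resid(fit)^2)              # residual sum of squares

ssr / sst              # equals 1 - sse / sst
summary(fit)$r.squared # matches the hand computation
```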

Interval Estimation and Hypothesis Testing


Recall the t-statistic:

t = (β̂ − β) / SE(β̂)

We can also construct the confidence interval:

β̂ ± tα/2 · SE(β̂)
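Both quantities can be computed by hand from a fitted model; a sketch, testing H0: β = 0 at the 5% level on simulated data:

```r
set.seed(1)
x <- rnorm(50)
y <- 3 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
s   <- coef(summary(fit))

b  <- s["x", "Estimate"]
se <- s["x", "Std. Error"]

t_stat <- (b - 0) / se                # t-statistic for H0: beta = 0
crit   <- qt(0.975, df.residual(fit)) # t_{alpha/2} with n - 2 degrees of freedom
ci     <- b + c(-1, 1) * crit * se    # 95% confidence interval
```

`t_stat` agrees with the t value reported by `summary(fit)`, and `ci` agrees with `confint(fit)["x", ]`.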

Exercises
One


(Potential) Answer
a. The data type is cross-section: each observation has a single measured value at one point in time.
b. β̂0 is the intercept, so its interpretation may or may not be economically meaningful. Here, β̂0 reflects TPTKW (Y) when the percentage of the female population (X) is zero. For β̂1: a 1 percentage point increase in the share of the female population increases TPTKW by 0.656 percentage points.
c. The significance test can be computed with a t-test: t = (β̂1 − 1) / SE(β̂1) = (0.656 − 1) / 0.1961 = −1.75, so β̂1 is not statistically different from 1 at the 95% confidence level. Required assumption: normality of the error.
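The arithmetic in part (c) is quick to check. The exercise does not report the sample size, so the critical value below uses 100 degrees of freedom as a purely illustrative assumption:

```r
# t-test of H0: beta1 = 1 using the reported estimate and standard error
t_stat <- (0.656 - 1) / 0.1961
crit   <- qt(0.975, df = 100) # df = 100 is hypothetical, not from the exercise

t_stat             # about -1.75
abs(t_stat) > crit # FALSE: fail to reject H0 at the 5% level
```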

Two


(Potential) Answer
• a. OLS Estimation
library(tibble)

data <- tibble(
  M = c(21, 24, 26, 27, 28, 29, 30, 33, 35, 37, 39),
  Y = c(81, 95, 103, 110, 114, 117, 121, 134, 139, 150, 156)
)
lm <- lm(M ~ Y, data) # note: this name shadows the lm() function
summary(lm)

##
## Call:
## lm(formula = M ~ Y, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5000 -0.2341 -0.1363 0.3023 0.5137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.000205 0.645326 1.55 0.156
## Y 0.240907 0.005289 45.55 5.93e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3863 on 9 degrees of freedom
## Multiple R-squared: 0.9957, Adjusted R-squared: 0.9952
## F-statistic: 2074 on 1 and 9 DF, p-value: 5.932e-12
• b. Hypothesis testing
The resulting t-statistic:
coefficients(lm)[2]/coef(summary(lm))[, "Std. Error"][2]

## Y
## 45.54533
Income Y has a statistically significant positive effect on money demand M.
• c. Rescaling effect: variable M is converted to billions of rupiah (i.e., multiplied by 1000). This multiplies the coefficients β0 and β1 by 1000 as well; the conclusion on statistical significance is unchanged.
data2 <- tibble(Y = data$Y, M = data$M*1000)
lm2 <- lm(M ~ Y, data2)
summary(lm2)

##
## Call:
## lm(formula = M ~ Y, data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -500.0 -234.1 -136.3 302.3 513.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1000.205 645.326 1.55 0.156
## Y 240.907 5.289 45.55 5.93e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 386.3 on 9 degrees of freedom
## Multiple R-squared: 0.9957, Adjusted R-squared: 0.9952
## F-statistic: 2074 on 1 and 9 DF, p-value: 5.932e-12

Stata Exercises
