
Introductory Econometrics Week 2

Muhammad Yudha Pratama

2022-09-17

Linear Regression Concepts


Best Linear Unbiased Estimator (BLUE)
• The OLS estimator is BLUE if the Gauss-Markov assumptions hold: linearity in parameters, random sampling, sample variation in the explanatory variables, zero conditional mean, and homoskedasticity. Normality of the error is an additional assumption used for exact inference.
• Best is another word for efficient: among linear unbiased estimators, the estimated β̂'s have the smallest standard errors.
• Unbiased means that the estimate is correct on average under repeated sampling:

E(β̂) = β

Biased Estimator
Biased estimation occurs when the estimated parameter β̂ is systematically different from the true parameter β; mathematically, E(β̂) ≠ β. In the figure below, the biased estimator is represented by the black line, while the unbiased estimator is depicted by the red line. To understand this concept, we must treat β̂ as a random variable: if we conduct sampling and estimation repeatedly, the resulting β̂'s will vary.

[Figure: densities of the estimated beta for the biased (black) and unbiased (red) estimators]

Efficient vs Inefficient Estimator
An estimator might be unbiased yet inefficient. An inefficient estimator has a higher standard error, which makes statistical decisions more difficult.

[Figure: densities of the estimated beta for an efficient (narrow) and an inefficient (wide) estimator]
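The difference can be seen in a small simulation: both estimators below are unbiased, but the one that throws away half of each sample has a visibly larger standard error. A sketch (the discard-half estimator is purely illustrative):

```r
set.seed(42)

full <- numeric(1000) # OLS slope using all 100 observations
half <- numeric(1000) # OLS slope using the first 50 only: unbiased but inefficient
for (i in 1:1000) {
  x <- 9 * rnorm(100)
  y <- 3 + 2 * x + 36 * rnorm(100)
  full[i] <- coef(lm(y ~ x))[["x"]]
  half[i] <- coef(lm(y[1:50] ~ x[1:50]))[["x"]]
}

c(mean(full), mean(half)) # both centered near the true slope of 2
c(sd(full), sd(half))     # the half-sample estimator spreads more widely
```

Both sampling distributions are centered at the truth, so unbiasedness alone does not distinguish them; the spread does.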

Monte Carlo Simulations


All assumptions hold
Let there be no violations of the OLS assumptions.

# Generate 1000 samples in a list. Each element contains 100 observations.
library(tibble)

all_sample <- list()
true_slope <- 2
for (i in 1:1000) {
  set.seed(5 * i)
  new_element <- tibble(
    err = 36 * rnorm(100), # normally distributed, homoskedastic error
    x   = 9 * rnorm(100),  # independent of the error term
    y   = 3 + true_slope * x + err
  )
  all_sample[[length(all_sample) + 1]] <- new_element
}

[Figure: histogram of the 1000 estimated slopes, centered at the true slope of 2]
We have a BLUE estimator.
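The histogram above can be reproduced by fitting the regression on each simulated sample and collecting the slope coefficients; a minimal sketch (the object name `betas` is illustrative):

```r
library(tibble)

# regenerate the 1000 samples exactly as in the chunk above
all_sample <- list()
true_slope <- 2
for (i in 1:1000) {
  set.seed(5 * i)
  all_sample[[i]] <- tibble(
    err = 36 * rnorm(100),
    x   = 9 * rnorm(100),
    y   = 3 + true_slope * x + err
  )
}

# fit OLS on every sample and collect the slope estimates
betas <- sapply(all_sample, function(d) coef(lm(y ~ x, data = d))[["x"]])
mean(betas) # averages out close to the true slope of 2
```

Plotting `betas` as a histogram reproduces the figure.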

Zero conditional mean assumption does not hold


# Generate 1000 samples in a list. Each element contains 100 observations.
library(tibble)

all_sample <- list()
true_slope <- 2
for (i in 1:1000) {
  set.seed(5 * i)
  new_element <- tibble(
    err = 36 * rnorm(100),
    x   = 9 * rnorm(100) + 0.1 * err, # the regressor is correlated with the error
    y   = 3 + true_slope * x + err
  )
  all_sample[[length(all_sample) + 1]] <- new_element
}

[Figure: histogram of the 1000 estimated slopes, now centered above the true slope of 2]
We have a biased estimator: because x is correlated with the error term, the zero conditional mean assumption fails and the slope estimates sit systematically above the true value of 2.
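The size of the bias can be anticipated analytically: with a single regressor, plim β̂ = β + Cov(x, err)/Var(x). A quick check using the variances implied by the simulation's data-generating process:

```r
# variances implied by x = 9 * z + 0.1 * err, with z and err independent
var_err <- 36^2                  # Var(err)
var_x   <- 9^2 + 0.1^2 * var_err # Var(x)
cov_xe  <- 0.1 * var_err         # Cov(x, err)

bias <- cov_xe / var_x
2 + bias # probability limit of the estimated slope, roughly 3.38
```

This matches where the histogram of slope estimates is centered.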

Goodness of Fit
• R-squared measures the goodness of fit of the model: how much of the variation in the dependent variable can be explained by the model?

R² = SSR / SST = 1 − SSE / SST

where
• SST: total sum of squares, Σ(yᵢ − ȳ)²
• SSR: regression (explained) sum of squares, Σ(ŷᵢ − ȳ)²
• SSE: error (residual) sum of squares, Σ(yᵢ − ŷᵢ)²
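These identities can be verified numerically on any fitted model; a sketch on simulated data:

```r
set.seed(1)
x <- rnorm(50)
y <- 3 + 2 * x + rnorm(50)
fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)           # total sum of squares
ssr <- sum((fitted(fit) - mean(y))^2) # explained sum of squares
sse <- sum(resid(fit)^2)              # residual sum of squares

ssr / sst              # equals 1 - sse / sst
summary(fit)$r.squared # matches the hand computation
```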

Interval Estimation and Hypothesis Testing


Recall the t-statistic:

t = (β̂ − β) / SE(β̂)

We can also construct the confidence interval:

β̂ ± tα/2 · SE(β̂)
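Both quantities can be computed by hand from a fitted model; a sketch, testing H0: β = 0 at the 5% level on simulated data:

```r
set.seed(1)
x <- rnorm(50)
y <- 3 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
s   <- coef(summary(fit))

b  <- s["x", "Estimate"]
se <- s["x", "Std. Error"]

t_stat <- (b - 0) / se                # t-statistic for H0: beta = 0
crit   <- qt(0.975, df.residual(fit)) # t_{alpha/2} with n - 2 degrees of freedom
ci     <- b + c(-1, 1) * crit * se    # 95% confidence interval
```

`t_stat` agrees with the t value reported by `summary(fit)`, and `ci` agrees with `confint(fit)["x", ]`.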

Exercises
One


(Potential) Answer
a. The data type is cross-section: each observation has a single measured value at one point in time.
b. β̂0 is the intercept, so its interpretation may or may not be economically meaningful. Here, β̂0 reflects TPTKW (Y) when the percentage of the female population (X) is zero. For β̂1: a 1 percentage point increase in the share of the female population increases TPTKW by 0.656 percentage points.
c. The significance test can be computed with a t-test: t = (β̂1 − 1) / SE(β̂1) = (0.656 − 1) / 0.1961 = −1.75, so β̂1 is not statistically different from 1 at the 95% confidence level. Required assumption: normality of the error.
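The arithmetic in part (c) is quick to check. The exercise does not report the sample size, so the critical value below uses 100 degrees of freedom as a purely illustrative assumption:

```r
# t-test of H0: beta1 = 1 using the reported estimate and standard error
t_stat <- (0.656 - 1) / 0.1961
crit   <- qt(0.975, df = 100) # df = 100 is hypothetical, not from the exercise

t_stat             # about -1.75
abs(t_stat) > crit # FALSE: fail to reject H0 at the 5% level
```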

Two


(Potential) Answer
• a. OLS Estimation
library(tibble)

data <- tibble(
  M = c(21, 24, 26, 27, 28, 29, 30, 33, 35, 37, 39),
  Y = c(81, 95, 103, 110, 114, 117, 121, 134, 139, 150, 156)
)
lm <- lm(M ~ Y, data) # note: this name shadows the lm() function
summary(lm)

##
## Call:
## lm(formula = M ~ Y, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5000 -0.2341 -0.1363 0.3023 0.5137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.000205 0.645326 1.55 0.156
## Y 0.240907 0.005289 45.55 5.93e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3863 on 9 degrees of freedom
## Multiple R-squared: 0.9957, Adjusted R-squared: 0.9952
## F-statistic: 2074 on 1 and 9 DF, p-value: 5.932e-12
• b. Hypothesis testing
The resulting t-statistic:
coefficients(lm)[2]/coef(summary(lm))[, "Std. Error"][2]

## Y
## 45.54533
Income Y has a statistically significant positive effect on money demand M.
• c. Rescaling effect: variable M is converted to billions of rupiah (i.e., multiplied by 1000). This multiplies the coefficients β0 and β1 by 1000 as well; the conclusion on statistical significance is unchanged.
data2 <- tibble(Y = data$Y, M = data$M*1000)
lm2 <- lm(M ~ Y, data2)
summary(lm2)

##
## Call:
## lm(formula = M ~ Y, data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -500.0 -234.1 -136.3 302.3 513.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1000.205 645.326 1.55 0.156
## Y 240.907 5.289 45.55 5.93e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 386.3 on 9 degrees of freedom
## Multiple R-squared: 0.9957, Adjusted R-squared: 0.9952
## F-statistic: 2074 on 1 and 9 DF, p-value: 5.932e-12

Stata Exercises
