
Workshop

Quantitative Methods for Economics and Finance I
Dr Dragana Radicic
DCB2207
dradicic@lincoln.ac.uk
Important concepts
• Central limit theorem
• Null hypothesis
• p-value and hypothesis testing
• Confidence interval
• Homoskedasticity and heteroskedasticity
• Gauss-Markov theorem
• Omitted variable bias
Central limit theorem
• The central limit theorem says that, under general conditions, the distribution of the sample mean Ȳ is well approximated by a normal distribution when the sample size n is large, even if the variable Y itself is not normally distributed.
Central limit theorem
• The central limit theorem plays a key role when hypotheses are tested
using the sample mean.
• Since the sample mean is approximately normally distributed when the sample size is large, critical values for hypothesis tests and p-values for test statistics can be computed using the normal distribution, whose statistical properties are well known.
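The theorem can be illustrated with a small simulation. The sketch below is purely illustrative: the exponential population, its scale parameter, and the sample size are arbitrary choices, not part of the workshop material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw many samples from a clearly non-normal (exponential) population
# and record each sample's mean.
n = 100            # sample size
reps = 10_000      # number of simulated samples
sample_means = rng.exponential(scale=2.0, size=(reps, n)).mean(axis=1)

# The population mean is 2.0 and the population variance is 4.0, so the
# CLT predicts the sample mean is approximately N(2.0, 4.0 / n).
print(sample_means.mean())       # close to 2.0
print(sample_means.std(ddof=1))  # close to sqrt(4.0 / 100) = 0.2
```

A histogram of `sample_means` would look bell-shaped even though each individual draw comes from a skewed distribution.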
Null hypothesis
• The problem facing the statistician is to use the evidence in a randomly selected sample of data to decide whether to accept the null hypothesis H₀ or to reject it in favour of the alternative hypothesis H₁.
• If the null hypothesis is “accepted,” this does not mean that the statistician declares it to be true; rather, it is accepted tentatively, with the recognition that it might be rejected later on the basis of additional evidence.
p value
• In any given sample, the sample mean Ȳ will rarely be exactly equal to the hypothesized value μ_Y,0. A difference between the two can arise because
a) the true mean does not equal μ_Y,0 (the null is false); or
b) the true mean equals μ_Y,0 (the null is true) but the Ȳ we obtained differs from μ_Y,0 because of random sampling.
• It is impossible to distinguish between these two possibilities
with certainty.
p value
• Suppose that in your sample of recent college graduates, the average wage is
£38.
• The p-value in this case is the probability of observing a value of 𝑌ത at least as
different from £35 (the population mean under the null) as the observed value
of £38 by pure random sampling variation, assuming that the null hypothesis is
true.
• If this p-value is small, say 0.5%, then it is very unlikely that a sample with mean £38 would have been drawn if the null hypothesis were true; it is therefore reasonable to conclude that the null hypothesis is false.
• By contrast, if this p-value is large, say 40%, then it is quite likely that the
observed sample average of £38 could have arisen just by random sampling
variation if the null hypothesis is true; accordingly, the evidence against the
null hypothesis is weak in this probabilistic sense, and it is reasonable not to
reject the null hypothesis.
p value
• Assume that we would like to test whether the population mean, E(Y), takes on a specific value, denoted μ_Y,0.
• Let Ȳ_act denote the value of the sample average actually computed in the data set at hand.
• The p-value is then the probability of drawing a value of Ȳ that differs from μ_Y,0 by at least as much as Ȳ_act does. This is expressed by the formula:

p-value = Pr_H₀( |Ȳ − μ_Y,0| > |Ȳ_act − μ_Y,0| ).
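Under the CLT, this probability can be computed from the standard normal distribution. The sketch below is a hedged illustration: the standard error of 1.5 is an assumed figure, not given in the wage example.

```python
import math

def p_value(sample_mean, mu_0, se):
    """Two-sided p-value under H0: E(Y) = mu_0, using the normal
    approximation to the distribution of the sample mean (CLT)."""
    t = (sample_mean - mu_0) / se
    # Pr(|Z| > |t|) for Z ~ N(0, 1), via the error function.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

# Numbers echoing the wage example: Ybar = 38, mu_0 = 35,
# with an assumed standard error of 1.5, giving t = 2.
print(p_value(38, 35, 1.5))   # about 0.0455
```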
Hypothesis Testing with a Pre-Specified Significance Level
• If you choose a pre-specified probability of rejecting the null hypothesis when it is true (for example, a 5% significance level, usually denoted α), then you reject the null hypothesis if and only if the p-value is less than 0.05.
• The critical value of the test statistic is the value of the statistic for which the test just rejects the null hypothesis at the given significance level.
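The two ways of stating the decision rule agree: at the 5% level, p < 0.05 exactly when the t-statistic exceeds the critical value 1.96 in absolute value. A minimal sketch (the function name and example t-statistics are invented for illustration):

```python
import math

def normal_p_value(t):
    """Two-sided p-value for a t-statistic under the standard normal."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

alpha = 0.05
critical_value = 1.96   # 5% two-sided critical value of N(0, 1)

for t in (1.5, 2.0):
    # The two decision rules agree: p < alpha iff |t| > critical value.
    print(t, normal_p_value(t) < alpha, abs(t) > critical_value)
```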
Confidence interval
• Due to random sampling error, it is impossible to learn the true value of the population mean of Y using only the information in a sample.
• However, it is possible to use data from a random sample to construct a set of values that contains the true population mean μ_Y with a certain pre-specified probability.
• Such a set is called a confidence set, and the pre-specified probability that μ_Y is contained in this set is called the confidence level.
Confidence interval
• A confidence interval contains all values of the parameter (for
example, the mean) that cannot be rejected when used as a null
hypothesis.
• Thus, it summarizes results from a very large number of hypothesis
tests, so it contains more information than the result of a single
hypothesis test.
Confidence Intervals for the Population Mean
• A 95% two-sided confidence interval for μ_Y is an interval constructed so that it contains the true value of μ_Y in 95% of all possible random samples.

90% confidence interval: Ȳ − 1.64 SE(Ȳ) ≤ μ_Y ≤ Ȳ + 1.64 SE(Ȳ)

95% confidence interval: Ȳ − 1.96 SE(Ȳ) ≤ μ_Y ≤ Ȳ + 1.96 SE(Ȳ)

99% confidence interval: Ȳ − 2.58 SE(Ȳ) ≤ μ_Y ≤ Ȳ + 2.58 SE(Ȳ)
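These intervals are straightforward to compute from a sample. The sketch below is illustrative only: the five wage observations are invented, and SE(Ȳ) is estimated as s/√n.

```python
import math

def confidence_interval(sample, z):
    """Interval Ybar +/- z * SE(Ybar), where SE(Ybar) = s / sqrt(n)
    and z is the normal critical value (1.64 -> 90%, 1.96 -> 95%,
    2.58 -> 99%)."""
    n = len(sample)
    y_bar = sum(sample) / n
    s = math.sqrt(sum((y - y_bar) ** 2 for y in sample) / (n - 1))
    se = s / math.sqrt(n)
    return (y_bar - z * se, y_bar + z * se)

# Invented data: wages (in thousands of pounds) of 5 graduates.
wages = [36, 39, 35, 41, 38]
print(confidence_interval(wages, 1.96))   # 95% interval around Ybar = 37.8
```

Note that with only 5 observations the normal critical values are a rough approximation; they are appropriate when n is large, as the CLT slides explain.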
Homoskedasticity and heteroskedasticity
• Homoskedasticity means that the variance of u, the random error term, is unrelated to the value of X.
• If the value of X is chosen using a randomized controlled experiment, then u is homoskedastic.
• Heteroskedasticity means that the variance of u, the random error term, is related to the value of X.
True or false
• The White matrix is robust against heteroskedasticity while the conventional OLS covariance matrix is not.
• Thus, a big difference in point estimates between regressions with and without the White matrix could be a signal of heteroskedasticity.

• False.
• A big difference between the White standard errors and the conventional standard errors is a signal of heteroskedasticity.
• The point estimates, however, will be the same whether or not the White matrix is used.
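This can be checked by computing both covariance matrices by hand. The simulation below is a sketch under assumed parameters (the data-generating process and all numbers are invented): the error standard deviation is made proportional to X, so the errors are heteroskedastic by construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a regression with heteroskedastic errors: Var(u) grows with X.
n = 2_000
x = rng.uniform(1, 10, n)
u = rng.normal(0, x)              # error s.d. proportional to X
y = 2.0 + 0.5 * x + u

X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS point estimates
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Conventional covariance matrix: s^2 (X'X)^-1, valid under homoskedasticity.
s2 = resid @ resid / (n - 2)
se_conv = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) covariance matrix: (X'X)^-1 X' diag(u_i^2) X (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(beta)       # point estimates: identical under either covariance matrix
print(se_conv)    # misleading here, because u is heteroskedastic
print(se_white)   # robust to the heteroskedasticity
```

Only the standard errors differ between the two approaches; the coefficient vector `beta` never depends on which covariance matrix is used.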
True or false
• We worry about heteroskedasticity because the estimated coefficient is biased when heteroskedasticity is present.

• False.
• We worry about heteroskedasticity because the estimated standard errors are biased.
• Since we need unbiased standard errors to construct the t-statistic, heteroskedasticity leads to misleading results in hypothesis testing.
• Heteroskedasticity is not a cause of bias in the coefficient estimates.
Gauss-Markov theorem
• What is the Gauss-Markov theorem for multiple regression?
• What are the conditions for the Gauss-Markov theorem to hold?
Gauss-Markov theorem
The Gauss-Markov theorem for multiple regression provides conditions under which the OLS estimator is efficient among the class of linear conditionally unbiased estimators, i.e. it is the best linear unbiased estimator (BLUE). The Gauss-Markov conditions for multiple regression are:

(1) The error term u has conditional mean zero given X: E(u|X) = 0.
(2) Conditional on X, the error terms uᵢ, i = 1, 2, ..., n, are i.i.d. draws.
(3) There is no perfect multicollinearity among the regressors.
Omitted variable bias
• Omitted variable bias occurs when both of the following hold:
i) the omitted variable is a determinant of the dependent variable; and
ii) the omitted variable is correlated with the regressor of interest.
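Both conditions can be made concrete in a simulation. The sketch below uses an invented data-generating process (all coefficients and the correlation between x and z are arbitrary choices for illustration): omitting z pushes its effect into the coefficient on x.

```python
import numpy as np

rng = np.random.default_rng(2)

# True model: y = 1 + 0.5*x + 0.8*z + e, where z is correlated with x.
n = 100_000
x = rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)                 # condition ii): Corr(x, z) != 0
y = 1 + 0.5 * x + 0.8 * z + rng.normal(size=n)   # condition i): z determines y

# Long regression (includes z): the slope on x is unbiased.
X_long = np.column_stack([np.ones(n), x, z])
b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]

# Short regression (omits z): the slope on x absorbs z's effect.
X_short = np.column_stack([np.ones(n), x])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

print(b_long[1])    # close to the true 0.5
print(b_short[1])   # close to 0.5 + 0.8 * 0.6 = 0.98 (biased)
```

The size of the bias, 0.8 × 0.6, is the omitted coefficient times the slope of z on x; if either condition fails (z does not affect y, or z is uncorrelated with x), that product is zero and no bias arises.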