
Week 3: Hypothesis testing and inference

Essence of hypothesis testing
• A hypothesis is a conjecture (claim) about an unknown population parameter(s).

• Hypothesis tests are procedures for comparing conjectures that we might have about the
regression parameters to the parameter estimates we have obtained from a sample of data.

• Hypothesis tests allow us to say that the data are compatible, or are not compatible, with a
particular conjecture or hypothesis.
• Testing a hypothesis means assessing whether the estimated coefficient (in a particular sample regression function) is sufficiently different from the claimed value(s) of the unknown population parameter(s), i.e. given empirical estimates of the regression parameters, we want to make inferences about the population from which the data were obtained
Essence of hypothesis testing
• A null hypothesis is the belief we will maintain until we are convinced by the sample evidence that it is not true, in which case we reject the null hypothesis (e.g. 'innocent until proven guilty')

• Paired with every null hypothesis is a logical alternative hypothesis H1 that we will accept if the

null hypothesis is rejected. The alternative hypothesis is flexible and depends to some extent
on economic theory or on general intuition or empirical consensus.

• Logically, inequality alternative hypotheses (e.g. β1 > 0, or β1 greater or less than some value) are widely used in economics because economic theory frequently provides information about the signs of relationships between variables.
Interval estimation
• The question of how much β̂1 and β̂2 actually tell us about the population parameters, β1 and β2, remains. This is the question of inference.
• We have yet to examine whether the clusters that we might expect for β̂1 and β̂2, as given by their variances, tell us enough about where β1 and β2 might be to be of practical value
• Inference is the name that we give to this examination, because we are attempting to
“infer” something relatively specific about the parameter values from the sample
information
• Confidence intervals give us ranges that contain the parameters with prespecified
degrees of certainty.
• They are more useful if they are narrower, but their width depends on the standard error of the estimate in question

Interval estimation
• Difference between point and interval estimators:
• Point estimates are individual values that are our single best guesses of β1 and β2
• Point estimates are almost surely wrong to some degree, because there is a high probability that they miss the true value of the parameter
• Interval estimation provides a fallback position against this fallibility of point estimation
• Inference is, therefore, the construction of interval estimators

Interval estimation
• Basic building blocks of interval estimation:
• Assume an unbiased sample statistic β̂1 for the population parameter β1; unbiasedness means E(β̂1) = β1
• In addition, suppose the true/theoretical variance of β̂1 is var(β̂1)
• The estimated standard deviation (the standard error) is thus se(β̂1)
• We can further assume that β̂1 is normally distributed with expected value β1 and variance var(β̂1), i.e. β̂1 ~ N(β1, var(β̂1))

Interval estimation
• The standardised statistic Z:
$$Z=\frac{\hat{\beta}_1-\beta_1}{\sqrt{\operatorname{var}(\hat{\beta}_1)}}\sim N(0,1)$$
• Normal random variables with expected values of zero and variances of one are called standard normal random variables
• We can use the normal distribution to make probabilistic statements about β1
• Z varies only because β̂1 and its standard error can vary from sample to sample
• We don't know the true value of var(β̂1) because it is a parameter of the distribution of β̂1; all we have is an estimate of it from our sample

Interval estimation (test of significance approach)
• To account for the two sources of variation just mentioned, we adjust the distribution of β̂1 from a normal distribution to a t distribution
• Hence,
$$t=\frac{\hat{\beta}_1-\beta_1}{se(\hat{\beta}_1)}\sim t_{(n-k)}$$
• A t-distribution describes the standardised distances of sample means to the population mean when the population standard deviation is not known, and the observations come from a normally distributed population
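As a quick illustration of how the t distribution relates to the normal, the sketch below (assuming Python with SciPy; not part of the original slides) shows two-tailed critical values converging to 1.96 as the degrees of freedom grow:

from scipy import stats

alpha = 0.05
for df in (10, 30, 48, 1000):
    # upper alpha/2 quantile of t(df): the two-tailed critical value
    print(f"df={df:5d}  t-critical={stats.t.ppf(1 - alpha / 2, df):.3f}")
# the corresponding standard normal critical value
print(f"normal z-critical={stats.norm.ppf(1 - alpha / 2):.3f}")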

Interval estimation (test of significance approach)
• When you calculate a t-test or t-score, you are checking if your test statistic is a
more extreme value than expected from the t-distribution.
• Here t means t-distribution and df means degrees of freedom (df=n-k)
• NB: β*1 is the value of β1 under H0
• We are now ready to engage in a test of significance.
• We can formally state it as follows:
$$t_{obs}=\frac{\hat{\beta}_1-\beta^*_1}{se(\hat{\beta}_1)}\sim t_{(df)},\qquad \text{reject } H_0 \text{ at LOS } \alpha \text{ if } |t_{obs}|>t^{(df)}_{\alpha/2}$$
Test of significance
• α is the probability that has to be excluded from our interval estimates
• t is a random variable which can take any value between +/− infinity
• The value of the t-distribution that leaves half of the omitted probability in the upper tail and half in the lower tail is t_{α/2}
• t_{α/2} is a specific value and not a random variable
• Thus, the symbol t represents a random variable. The same symbol with a subscript, as in t_{α/2}, represents a constant, a particular point in the distribution of the random variable
• NB: so, the probability of constructing an interval that contains β1 is 1 − α.

An example (test-of-significance approach):

      Source |       SS           df       MS      Number of obs =  51,100
-------------+----------------------------------   F(1, 51098)   = 3184.45
       Model |  4308.72289         1  4308.72289   Prob > F      =  0.0000
    Residual |  69138.1157    51,098  1.35304935   R-squared     =  0.0587
-------------+----------------------------------   Adj R-squared =  0.0586
       Total |  73446.8386    51,099  1.43734395   Root MSE      =  1.1632

  lnearnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lnedu |   .4952663   .0087765    56.43   0.000     .4780643    .5124684
       _cons |   6.950634   .0225446   308.31   0.000     6.906446    6.994821
Interpretation:
H0: β1 = β*1 = 0.0
H1: β1 ≠ 0.0

β̂1 = 0.4952663
se(β̂1) = 0.0087765
tobs = (0.4952663 − 0)/0.0087765 = 56.43

t critical value = 1.96 (α=0.05; two-tailed)

Since |tobs| = 56.43 > 1.96, we reject H0.
NB: the calculated t statistic can also be negative
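A minimal sketch of this calculation (assuming Python with SciPy; the numbers are taken from the regression table above):

from scipy import stats

beta_hat, beta_null, se = 0.4952663, 0.0, 0.0087765
df = 51_100 - 2                               # n - k

t_obs = (beta_hat - beta_null) / se           # = 56.43
t_crit = stats.t.ppf(1 - 0.05 / 2, df)        # ~1.96 for large df
print("reject H0" if abs(t_obs) > t_crit else "fail to reject H0")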
Interval estimation (Confidence intervals)

$$\Pr\left(\frac{\hat{\beta}_1-\beta_1}{se(\hat{\beta}_1)}>t^{(df)}_{\alpha/2}\right)=\alpha/2$$

$$\Pr\left(\frac{\hat{\beta}_1-\beta_1}{se(\hat{\beta}_1)}<-t^{(df)}_{\alpha/2}\right)=\alpha/2$$

$$\Pr\left(-t^{(df)}_{\alpha/2}<\frac{\hat{\beta}_1-\beta_1}{se(\hat{\beta}_1)}<t^{(df)}_{\alpha/2}\right)=1-\alpha$$

The last of these is the fundamental equation for interval estimation. This interval is actually a random interval and varies from sample to sample. Hence it is a long-run (repeated-sampling) probabilistic statement.
Interval estimation (Confidence intervals)
• Rearranging the fundamental equation gives
$$\Pr\left(\hat{\beta}_1-t^{(df)}_{\alpha/2}\,se(\hat{\beta}_1)\le\beta_1\le\hat{\beta}_1+t^{(df)}_{\alpha/2}\,se(\hat{\beta}_1)\right)=1-\alpha\qquad(1)$$
• This confidence interval (CI) has a probability of 1 − α of containing the true β1. In essence it states that if we repeatedly construct confidence intervals like this one 100 times, 100(1 − α) of them will contain the true β1
• Thus the sizes of α and of se(β̂1) are crucial for determining the width of the interval.
• The advantage of interval estimation is that we have the liberty of choosing an α that ranges from as low as 0.1%, 0.5%, 1%, 2%, 5%, etc.
• A trade-off between confidence and usefulness is inevitable in the choice of α.

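A minimal sketch of equation (1) in code (assuming Python with SciPy; estimates taken from the example on the next slide):

from scipy import stats

beta_hat, se, df, alpha = 0.4952663, 0.0087765, 51_100 - 2, 0.05
t_half = stats.t.ppf(1 - alpha / 2, df)       # t_{alpha/2} with df = n - k
lo, hi = beta_hat - t_half * se, beta_hat + t_half * se
print(f"95% CI: [{lo:.7f}, {hi:.7f}]")        # ~[0.4780643, 0.5124684]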
An example:

      Source |       SS           df       MS      Number of obs =  51,100
-------------+----------------------------------   F(1, 51098)   = 3184.45
       Model |  4308.72289         1  4308.72289   Prob > F      =  0.0000
    Residual |  69138.1157    51,098  1.35304935   R-squared     =  0.0587
-------------+----------------------------------   Adj R-squared =  0.0586
       Total |  73446.8386    51,099  1.43734395   Root MSE      =  1.1632

  lnearnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lnedu |   .4952663   .0087765    56.43   0.000     .4780643    .5124684
       _cons |   6.950634   .0225446   308.31   0.000     6.906446    6.994821
Interpretation:
• Given the confidence coefficient of 95 percent, in 95 out of 100 cases intervals constructed like 0.478 to 0.512 will contain the true β1
• t = 1.96 (α=0.05; two-tailed); se(β̂1) = 0.0087765; β̂1 = 0.4952663
• Upper limit: 0.4952663 + (1.96)(0.0087765) = 0.512
• Lower limit: 0.4952663 − (1.96)(0.0087765) = 0.478
Earnings example & confidence interval estimation

Earnings ($000)                     Model (n=50)         Model (n=300)        Model (n=526)
Intercept                             -3.4015              -1.2879              -0.9049
  Standard error (of estimate)         2.9765               1.1133               0.6850
  tobs                                -1.1428              -1.1569              -1.3210
  P-value                              0.2588               0.2482               0.1871
Education (years of schooling)         0.8164               0.5840               0.5414
  Standard error (of estimate)         0.2280               0.0857               0.0532
  tobs                                 3.5801               6.8125              10.1667
  P-value                              0.0008               0.0000               0.0000
n                                          50                  300                  526
R^2                                    0.2108               0.1348               0.1648
Fobs                              12.8173 (p=0.0008)  46.4097 (p=0.0000)  103.3627 (p=0.0000)
Standard error of regression           3.9587               3.7108               3.3784
Earnings example & confidence interval estimation
• Construct a confidence interval for the coefficient of education for n=50:
• 90% CI; 95% CI; 99% CI; 99.5% CI
• Let's do the 90% together. What do we need to construct the CI90?
• The coefficient of education, the standard error of that coefficient, alpha = 10%, and the critical value of t with df = n − k, where n = 50 and k = the number of coefficients estimated in the regression, so n − k = 50 − 2 = 48
• 816.4 − (1.684 × 228) < β1 < 816.4 + (1.684 × 228), i.e. 432.45 < β1 < 1200.35 (the table's coefficient of 0.8164 in $000 equals 816.4 in $)
• Given a 0.9 confidence coefficient, 90 in 100 confidence intervals like this one will contain the true β1.

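The remaining intervals can be generated the same way; a sketch (assuming Python with SciPy; note SciPy's exact quantile for df=48 is 1.677, slightly below the tabled 1.684 used above):

from scipy import stats

beta_hat, se, df = 816.4, 228.0, 48
for alpha in (0.10, 0.05, 0.01, 0.005):
    t_half = stats.t.ppf(1 - alpha / 2, df)
    lo, hi = beta_hat - t_half * se, beta_hat + t_half * se
    print(f"{100 * (1 - alpha):g}% CI: [{lo:.2f}, {hi:.2f}]")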
Confidence interval analysis: Hypothesis testing approach
• In practice, we need to formulate a hypothesis to be able to say something about whether the true beta really falls in the interval or not
• Generally economists are interested in testing the generic hypothesis that a variable has no effect at all. Therefore:
• H0: β1 = 0; H1: β1 ≠ 0 (for illustration purposes***); LOS = 10%
• The 90% CI is 432.45 < β1 < 1200.35
• It is clear that the point β1 = 0 lies outside the confidence interval; therefore we have evidence to conclude that education has an effect on earnings.
Hypothesis testing: point estimation
• A hypothesis test is identical to interval estimation in many ways
• The difference is that in hypothesis testing we have strong a priori convictions about the population parameter; such convictions can come from theory, empirical consensus, etc.
• In hypothesis testing we use the data not for instruction but for validation of our convictions
• We want to provide convincing evidence that our convictions are true.
• The objective here is to be very convincing. Therefore, we begin, essentially, by asserting the opposite of what we really expect. We refer to this assertion as the "null hypothesis," represented by H0
Hypothesis testing: point estimation
• The critical question here is: Is β̂1 close enough to the hypothesised value to be regarded as a validation of the claim in the null hypothesis?
• We develop a range wide enough to give a wide range of acceptable values for β̂1. We call that region the acceptance region.
• More generally, in economic analysis we are interested in testing
whether a variable has any effect at all or not. Rarely do we have
some non-zero values we have established a priori. Hence we are
interested in testing if the hypothesised value of the true parameter is
zero or not.

Hypothesis testing: point estimation
• Procedure for hypothesis testing (a code sketch follows after this list):
• State the null and alternative hypotheses

• State the level of significance (LOS) (i.e. )

• State critical values

• State decision criteria

• Compute the tobs

• Make a conclusion

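A minimal sketch of the six-step procedure as a reusable function (assuming Python with SciPy; the names are illustrative, not from the slides):

from scipy import stats

def t_test(beta_hat, beta_null, se, df, alpha=0.05, tail="two"):
    """Compute tobs, compare with the critical value, and conclude."""
    t_obs = (beta_hat - beta_null) / se
    if tail == "two":
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        reject = abs(t_obs) > t_crit
    elif tail == "upper":
        t_crit = stats.t.ppf(1 - alpha, df)
        reject = t_obs > t_crit
    else:                                    # lower-tail test
        t_crit = stats.t.ppf(alpha, df)
        reject = t_obs < t_crit
    return t_obs, t_crit, reject

# education coefficient, n=50 (next slide): reproduces the conclusion there
print(t_test(816.4, 0.0, 228.0, df=48, alpha=0.10))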
Hypothesis testing: point estimation
• Let's test the coefficient of education for n=50 again. We are going to test the generic hypothesis that education has no effect on earnings.
• Human Capital Theory tells us that education has a positive effect on earnings.
• H0: βEducation = 0
• H1: βEducation ≠ 0 (for illustration purposes***)
• LOS = 10% or just 0.10
• tcritical = +/-1.684
• Decision rule: if tobs < -1.684 or tobs > 1.684, reject H0; if -1.684 < tobs < 1.684, fail to reject
• (in reality the rule is |tobs| > 1.684; why?)
• Since tobs = 3.5801 > 1.684, we reject H0 and conclude that at the 10% LOS, there is evidence that education has an effect on earnings.
Hypothesis testing: point estimation
• Let's test the coefficient of education for n=50 again. We are going to test the generic hypothesis that education has no effect on earnings.
• Human Capital Theory tells us that education has a positive effect on earnings.
• H0: βEducation = 0
• H1: βEducation ≠ 0 (for illustration purposes***)
• LOS = 1% or just 0.01
• tcritical = +/-2.704
• Decision rule: if tobs < -2.704 or tobs > 2.704, reject H0; if -2.704 < tobs < 2.704, fail to reject
• (in reality the rule is |tobs| > 2.704; why?)
• Since tobs = 3.5801 > 2.704, we reject H0 and conclude that at the 1% LOS, there is evidence that education has an effect on earnings.
Hypothesis testing: point estimation
• Recall we state the hypotheses as follows:
• H0: Education = 0
• H1: Education ≠ 0 (for illustration purposes***)
• The way we stated the alternative hypothesis was quite mechanical, just to illustrate how a two-sided test is carried out. Where theory is clear that the effect of education is positive (i.e. the direction of the effect is unambiguous), we should not use "not equal to zero" in the alternative but rather "greater than zero"
• It does not make sense to say "not equal to" because this includes negative and positive values, yet theory restricts the effect to positive values ONLY!
• So let's look at a one-sided test
Hypothesis testing: point estimation
• One sided test:
• H0: Education = 0 (denies theory)
• H1 : Education > 0 (as theory predicts)
• LOS = 0.10
• tcritical = +1.303 (upper tail test)
• Decision rule: if tobs > 1.303, reject H0; if tobs < 1.303, fail to reject

• Since tobs = 3.5801 > 1.303, we reject H0 and conclude that at the 10% LOS there is evidence to suggest that education has a positive effect on earnings.
Test of significance approach: an alternative
• Here, we are seeking to check whether the estimated beta lies within a confidence interval constructed around the parameter value we specified in H0
• H0: βEducation = 0 (denies theory)
• H1: βEducation > 0 (as theory predicts)
• LOS = 0.10
• tcritical = +1.303 (upper-tail test)
• Upper limit of the 90% interval around H0: 0 + 1.303 × 228 = 297.08
• Since β̂ (= 816.4) > 297.08, we reject H0 and conclude that there is evidence to suggest that education positively influences earnings.
Hypothesis testing is like a judicial trial
• Suspect is assumed innocent until proven guilty (null hypothesis)
• Four things can happen:
• An innocent person is acquitted, which is morally good and just
• A guilty person is not convicted, which is morally terrible and unjust
• An innocent person is falsely convicted, which is by far the worst outcome
• A guilty person is convicted, which means justice is served
• To avoid convicting an innocent person (the worst outcome), the court sets a very strict standard of evidence called "beyond reasonable doubt" (this is the equivalent of the LOS); see next

Hypothesis testing is like a judicial trial
Because of this stringent test:
• Either, a guilty person is convicted, which is good
• Or, a guilty person is not convicted because the available evidence
failed to meet the appropriate standard. Sad news for offended
parties (this in statistics is called Type II error)
• Type I error is therefore convicting an innocent person. See next.

Type I and Type II Errors
• Example: α = 0.05
• If 1 − α is the probability that a random confidence interval contains the true β1 in repeated sampling, then α is the probability of (incorrectly) rejecting a true null hypothesis: 100α out of every 100 true hypotheses will be rejected.
• A Type I error (most worrisome) is rejecting a true null hypothesis. As the size of alpha increases, the probability of committing this error increases too, because the critical region becomes larger.
• A Type II error is accepting a false null hypothesis. This happens when we choose so small an alpha that almost the entire area under the curve is an acceptance region.
• There is a trade-off between Type I and Type II errors. As α increases, the risk of a Type I error also increases. At the same time, however, the risk of a Type II error decreases.
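An illustrative simulation (not from the slides; assuming Python with NumPy/SciPy) showing that with a true null and α = 0.05, about 5 in 100 tests reject:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, reps = 0.05, 30, 10_000
rejections = 0
for _ in range(reps):
    x = rng.normal(size=n)                   # H0: mean = 0 is true here
    t_obs = x.mean() / (x.std(ddof=1) / np.sqrt(n))
    if abs(t_obs) > stats.t.ppf(1 - alpha / 2, n - 1):
        rejections += 1
print(f"empirical Type I error rate: {rejections / reps:.3f}")   # ~0.05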
Exact Level of Significance
• Conventionally, we use 1%, 5% or 10% levels of significance
• But on what basis are these chosen? Arbitrarily!
• To overcome this we use the exact level of significance, also called the p-value: the exact probability of rejecting the null hypothesis, and hence the exact probability of committing a Type I error
• The p-value is the smallest significance level at which the null hypothesis can be rejected.

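A sketch of the exact level of significance for the n=50 education coefficient (assuming Python with SciPy):

from scipy import stats

t_obs, df = 3.5801, 48
p_two = 2 * (1 - stats.t.cdf(abs(t_obs), df))   # two-tailed p-value
print(f"p-value = {p_two:.4f}")                  # ~0.0008, matching the table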
Statistical significance vs economic (practical) significance
• Statistical significance asks whether we have enough statistical evidence to reject, or not reject, the claim we have made about the parameters
• Economic significance looks at the size of the estimated coefficients, i.e. whether they are large enough to suggest that a variable has a substantial effect on the dependent variable

Normality test
• Because CLRM theory is built around the normality assumption, we often have to test whether our estimated residuals behave normally, i.e. mimic a normal distribution.
• Several methods are used:
• A normal probability plot (if normality holds, the plot should approximate a straight line); this is an informal test (see the sketch after this list)
• The Jarque-Bera (JB) test of normality (a formal test), which strictly speaking is a large-sample test
• It combines two estimators: skewness (S) (whether the distribution is symmetrical around the mean) and kurtosis (K) (how flat or peaked a distribution is)
• A normal distribution is symmetrical (non-skewed), hence skewness is zero for normally distributed variables
• Kurtosis is 3 for normally distributed variables

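A minimal sketch of the informal check (assuming Python with SciPy and Matplotlib; the residuals here are simulated stand-ins for estimated residuals):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(size=500)       # stand-in for estimated residuals
# points close to the straight line suggest approximate normality
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal probability plot of residuals")
plt.show()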
Monthly earnings (ZAR) from the 2017 LMD
. reg Q54a_monthly Q17EDUCATION if Q54a_monthly ~=.

      Source |       SS           df       MS      Number of obs =  51,223
-------------+----------------------------------   F(1, 51221)   = 3235.28
       Model |  8.6121e+11         1  8.6121e+11   Prob > F      =  0.0000
    Residual |  1.3635e+13    51,221   266191817   R-squared     =  0.0594
-------------+----------------------------------   Adj R-squared =  0.0594
       Total |  1.4496e+13    51,222   282999818   Root MSE      =   16315

Q54a_monthly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Q17EDUCATION |   654.1419   11.50048    56.88   0.000     631.6008    676.6829
       _cons |  -442.9126   157.1887    -2.82   0.005    -751.0041   -134.8212
Graphically:
[Histogram of the estimated residuals, with density on the vertical axis; residuals run from about -20,000 to 40,000.]
Kernel density (probability density function)
[Kernel density estimate of the residuals (kernel = epanechnikov, bandwidth = 445.9490) overlaid on the normal density; axes: Density against Residuals from about -20,000 to 60,000.]
Normal Probability Plot
[Normal probability plot of the residuals: Empirical P[i] = i/(N+1) plotted against Normal F[(yhat-m)/s], both axes from 0.00 to 1.00.]
Estimating the residuals:

                          Residuals
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -14152.35      -17873.06
 5%    -10218.92      -17873.06
10%     -6606.79      -17862.06      Obs           51,223
25%    -4754.364      -17862.06      Sum of Wgt.   51,223
50%    -2718.506                     Mean        .0000103
                        Largest      Std. Dev.   16315.23
75%      1093.21       400442.9
90%      7759.21       400442.9      Variance    2.66e+08
95%     14593.21       400442.9      Skewness    14.52527
99%     40901.49       425926.2      Kurtosis    302.4129
Normality test
• It can be shown that, where a sample is taken from a population with a normal distribution, the possible values of Sk also have a normal distribution about a mean of 0 (which is the value of Sk for a normal distribution) and a variance of 6/n. We write this as:
Sk ~ N(0, 6/n)
• Similarly, the possible values of the standardised measure of kurtosis Kt, where a sample is taken from a population with a normal distribution, have a normal distribution with a mean of 3 (i.e. the value of Kt for a normal distribution) and a variance of 24/n:
Kt ~ N(3, 24/n)
Normality test
• The JB statistic combines these two moments: JB = n[S²/6 + (K − 3)²/24], which follows a χ²(2) distribution under H0
• The test procedure is as follows:
• H0: residuals are normally distributed
• H1: residuals are not normally distributed
• LOS: 5% (for example)
• Critical value: χ²(2, 0.05) = 5.99
• Decision rule: reject H0 if JB > 5.99
• Using the earnings-education example, we can demonstrate as follows:
Normality test
• H0: residuals are normally distributed
• H1: residuals are not normally distributed
• Critical value: χ²(2, 0.05) = 5.99
• From the residual summary: S = 14.52527, K = 302.4129, n = 51,223, so JB = 51,223 × [14.52527²/6 + (302.4129 − 3)²/24] ≈ 1.9 × 10⁸ > 5.99, and we reject H0
• Most software packages also have a built-in command for the JB test; see the next slide

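The same calculation as a sketch (assuming Python with SciPy; moments taken from the residual summary above):

from scipy import stats

n, S, K = 51_223, 14.52527, 302.4129
JB = n * (S**2 / 6 + (K - 3)**2 / 24)     # ~1.9e+08, matching the LM test
crit = stats.chi2.ppf(0.95, df=2)         # 5.99
print("reject H0 (non-normal)" if JB > crit else "fail to reject H0")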
Jarque-Bera Test
============================================
* Jarque-Bera Non Normality LM Test *
============================================

      Source |       SS           df       MS      Number of obs =  51,223
-------------+----------------------------------   F(1, 51221)   = 3235.28
       Model |  8.6121e+11         1  8.6121e+11   Prob > F      =  0.0000
    Residual |  1.3635e+13    51,221   266191817   R-squared     =  0.0594
-------------+----------------------------------   Adj R-squared =  0.0594
       Total |  1.4496e+13    51,222   282999818   Root MSE      =   16315

Q54a_monthly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Q17EDUCATION |   654.1419   11.50048    56.88   0.000     631.6008    676.6829
       _cons |  -442.9126   157.1887    -2.82   0.005    -751.0041   -134.8212

Lagrange Multiplier Jarque-Bera Normality Test
Ho: Normality in Error Distribution
Ha: Non Normality in Error Distribution
LM Test      = 1.9e+08
DF Chi2      = 2
Prob. > Chi2 = 0.00000
So now what?
• Give up and go home?
• Panic?

• 'It is important to know the asymptotic properties or large sample properties of estimators and test statistics. These properties are not defined for a particular sample size; rather, they are defined as the sample size grows without bound. Fortunately, under the assumptions we have made, OLS has satisfactory large sample properties. One practically important finding is that even without the normality assumption, t and F statistics have approximately t and F distributions, at least in large sample sizes.' (Wooldridge, p. 168)
• So: normality of the errors is a sufficient, but not a necessary, condition for getting good confidence intervals and hypothesis tests.
Or if you’re feeling a bit more energetic:
• Bootstrapping (repeated samples from the sample); see the sketch after this list
• Winsorize or trim the data (if outliers are the suspected problem)
• Transform the data (but sacrifice the ease of interpretation)
• Estimate a non-linear model

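An illustrative bootstrap sketch (assuming Python with NumPy; the names and data are hypothetical):

import numpy as np

def bootstrap_slope_ci(x, y, reps=2000, alpha=0.05, seed=0):
    """Percentile CI for the OLS slope via resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    n, slopes = len(y), np.empty(reps)
    for r in range(reps):
        idx = rng.integers(0, n, size=n)                   # resample observations
        slopes[r] = np.polyfit(x[idx], y[idx], deg=1)[0]   # OLS slope of y on x
    return np.percentile(slopes, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# usage (hypothetical arrays): lo, hi = bootstrap_slope_ci(education, earnings)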