
Political Studies 285

Lecture 8a:
Null hypothesis testing

Winter 2023

Hypothesis testing

Hypotheses are observable implications of a theory.

In statistics, they are also conjectural statements about the values
of population parameters.

Testing these hypotheses is a form of statistical inference.

Two types of hypotheses:

- Alternative hypothesis: What we expect to see if our theory
  is true; usually, it’s that the parameter isn’t 0, e.g., µ ≠ 0
- Null hypothesis: What we expect to see if our theory is false;
  usually, it’s that the parameter is 0, e.g., µ = 0

Rejecting the null

In statistics, null hypothesis tests are always performed on the
null; that’s why we call it null hypothesis testing.

The purpose of the test is to see whether we can reject the null.

Rejecting the null doesn’t mean...

- the theory is proven
- the alternative hypothesis is true
- or even the null hypothesis is false

It means

- our results would be unlikely if the null hypothesis (e.g., the
  parameter is 0) were true, and
- we favour the alternative hypothesis (e.g., the parameter is
  something other than 0)

P-values and null hypothesis testing

How do we know whether to reject the null?

It comes down to our p-value

- P-values measure the lack of fit between our null hypothesis
  and our sample data; a p-value is the probability of observing our
  results, or something even less likely (more extreme), in the event
  the null is true
- If p is sufficiently low, our data are hard to reconcile with the
  null hypothesis
- If p < α (our critical p-value, or threshold), we reject the
  null and say the result is statistically significant
- What is a sufficiently low p-value?
  - The choice is arbitrary
  - But the most common threshold, or α value, is 0.05
- How would you interpret this value? How would you interpret
  a p-value of .95?
- Note: I use a different definition of p-values than KW.
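
As a rough illustration of the decision rule above, the sketch below turns a test statistic into a two-tailed p-value and compares it with α. It assumes, purely for illustration, that the null sampling distribution is a standard normal; the function names `two_tailed_p` and `reject_null` are hypothetical and not part of the lecture.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a statistic at least as far from 0 as z, if the null is true.

    Assumes a standard normal null sampling distribution (illustrative only).
    """
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_null(p, alpha=0.05):
    """Decision rule from the slide: reject the null when p < alpha."""
    return p < alpha

for z in (0.06, 1.0, 1.96, 3.0):
    p = two_tailed_p(z)
    print(f"z = {z:4.2f}  p = {p:.3f}  reject null: {reject_null(p)}")
```

A statistic very close to 0 gives a p-value near .95: results like ours would be entirely ordinary if the null were true, so we fail to reject it.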

P-values and the null sampling distribution

How do we determine whether p < α?

1. Construct a null sampling distribution
   - A sampling distribution is the distribution of sample statistics
     from an infinite number of samples of the same size
   - A null sampling distribution is the distribution of sample
     statistics in the event the null hypothesis is true
   - The null sampling distribution is bell-shaped (we’ll work with
     t-distributions, which are similar to the standard normal)
2. Draw our sample, and calculate the sample statistic and
   something called a t-statistic
   - The t-statistic corresponds to a p-value: the bigger the t (in
     absolute value), the smaller the p
3. See where, within the null distribution, the t-statistic falls
   - The further away it is from the centre, the less consistent our
     sample is with the null
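
The three steps above can be sketched with a small simulation. This is only an illustration under stated assumptions: the population is taken to be normal, the data are made up, and the helper names `t_statistic` and `simulated_p_value` are hypothetical. Instead of an infinite number of samples, it draws a few thousand.

```python
import math
import random
import statistics

def t_statistic(sample, mu0=0.0):
    """Number of estimated standard errors between the sample mean and mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error

def simulated_p_value(sample, mu0=0.0, draws=5_000, seed=285):
    """Approximate the two-tailed p-value by simulating the null distribution.

    Assumption: the population is normal, so under the null each simulated
    sample comes from a normal distribution centred on mu0. The spread used
    (1.0) does not matter because the t-statistic rescales by the sample's
    own standard error.
    """
    rng = random.Random(seed)
    n = len(sample)
    observed = abs(t_statistic(sample, mu0))
    # Step 1: build the null sampling distribution of t from many samples of size n.
    null_ts = [
        t_statistic([rng.gauss(mu0, 1.0) for _ in range(n)], mu0)
        for _ in range(draws)
    ]
    # Step 3: see how far out in that distribution the observed t falls.
    return sum(abs(t) >= observed for t in null_ts) / draws

sample = [1, 2, 3, 4, 5]           # hypothetical data
print(t_statistic(sample))         # step 2: the observed t-statistic
print(simulated_p_value(sample))   # share of null t-values at least as extreme
```

The simulated share is an approximation of the p-value; more draws would make it more stable.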

P-values and the null sampling distribution: A visualization

Assume the following: our null hypothesis is µ = 0; our null
sampling distribution is a standard normal; and α = .05

The 68-95-99.7 rule: For the standard normal, about 95% of sample means
will fall within 2 standard deviations of µ (it’s 1.96 standard
deviations to be exact); 5% don’t

- If our sample mean falls within the white range (or non-rejection
  region), we fail to reject the null
- If our sample mean falls within either blue tail (or the rejection
  region), we reject the null
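
Under the assumptions stated on this slide (µ = 0, a standard normal null sampling distribution, α = .05), the boundary between the two regions can be computed directly. The sketch below uses Python's standard library; the name `in_rejection_region` is a hypothetical helper.

```python
from statistics import NormalDist

alpha = 0.05
null_distribution = NormalDist(mu=0, sigma=1)   # the slide's assumed null

# Two-tailed test: alpha is split evenly between the two tails.
critical_value = null_distribution.inv_cdf(1 - alpha / 2)

def in_rejection_region(z, critical=critical_value):
    """True if z lies in either tail beyond the critical value."""
    return abs(z) > critical

share_inside = null_distribution.cdf(critical_value) - null_distribution.cdf(-critical_value)
print(round(critical_value, 2))      # about 1.96
print(round(share_inside, 2))        # about 0.95
print(in_rejection_region(2.5), in_rejection_region(-1.0))
```

A sample mean 2.5 standard deviations above µ lands in a tail, so we would reject the null; one that is 1 standard deviation below µ does not.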

The t distribution

We’re not going to use the normal distribution to construct null
sampling distributions. We’ll use a t-distribution instead.

Like the normal distribution, t-distributions are bell-shaped, unimodal,
and symmetric. But they...

- have heavier tails, with more observations further from the mean
- don’t represent any one distribution but a family of them
- get taller and thinner, and converge on a normal distribution, with
  higher degrees of freedom (which are closely linked to sample size)
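
To see these properties in numbers, the sketch below evaluates the density of Student's t distribution at its peak (x = 0) and out in the tail (x = 3) for a few degrees of freedom, next to the standard normal. The choice of df values and of x = 3 is arbitrary, and `t_pdf` is a hypothetical helper written from the textbook density formula.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

normal = NormalDist()
for df in (2, 10, 1000):
    print(f"df = {df:4d}  peak = {t_pdf(0, df):.4f}  density at 3 = {t_pdf(3, df):.4f}")
print(f"normal     peak = {normal.pdf(0):.4f}  density at 3 = {normal.pdf(3):.4f}")
```

As df grows, the peak rises and the tail density falls, with both approaching the normal values.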

Implications

We use t distributions when the population parameters are
unknown (in particular, the population standard deviation) and the
sample size is small.

Why is the t distribution wider in small samples?

- Because our estimates are less certain
- The wider distribution accounts for this

What does this mean for statistical inference?

- Our non-rejection region is wider for small samples
- This makes the null hypothesis harder to reject
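
A small sketch of what "harder to reject" means in practice: the same t-statistic can fall inside the non-rejection region for a small sample and outside it for a large one. The critical values are the standard two-tailed .05 values for 10 and 1,000 degrees of freedom; the t-statistic of 2.1 and the `decision` helper are hypothetical.

```python
# Critical t-values for a two-tailed test at alpha = .05 (standard table values).
CRITICAL_T = {10: 2.228, 1000: 1.962}

def decision(t_stat, df):
    """Reject the null only if |t| exceeds the critical value for this df."""
    return "reject" if abs(t_stat) > CRITICAL_T[df] else "fail to reject"

t_stat = 2.1   # hypothetical t-statistic
for df in CRITICAL_T:
    print(f"df = {df:4d}: non-rejection region is +/-{CRITICAL_T[df]}, "
          f"so we {decision(t_stat, df)} the null")
```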

Critical t-table

- t is the number of standard errors a value is above or below µ
- df refers to degrees of freedom. Think of it as the sample size for now.
- The rectangle contains critical t-values for a two-tailed test at a .05
  level of statistical significance for different sample sizes
- As df increases, the critical t decreases, and the non-rejection region
  narrows...
  - ±12.706 for 1; ±2.228 for 10; ±1.962 for 1,000; ±1.96 for
    infinity
- Why does it narrow? What are the implications for hypothesis testing?
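
For readers who want to see where table values like these come from, here is a minimal sketch that recovers two-tailed critical values numerically using only Python's standard library. It integrates the t density with Simpson's rule and searches for the cutoff by bisection; the helper names and the search bound are assumptions of this sketch, and a statistics package would normally do this for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def area_from_zero(x, df, steps=1000):
    """Area under the t density between 0 and x (Simpson's rule, steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def critical_t(df, alpha=0.05, upper=100.0):
    """Two-tailed critical value: the t that leaves alpha/2 in each tail.

    Found by bisection; `upper` is an assumed search bound that is wide
    enough for alpha = .05 but not for much smaller alpha at df = 1.
    """
    target = 0.5 - alpha / 2      # area between 0 and the critical value
    lo, hi = 0.0, upper
    for _ in range(40):
        mid = (lo + hi) / 2
        if area_from_zero(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 10, 1000):
    print(df, round(critical_t(df), 3))
```

The printed values should sit very close to the corresponding table entries, and they shrink toward 1.96 as df grows.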

Limits of p-values

- p-values are only valid if we’re working with random or
  probability samples
- low p-values don’t necessarily imply causation
- statistical significance is often conflated with substantive
  significance
- rejection of the null is often based on strict and arbitrary
  cutoffs (e.g., a p-value of 0.05), leading to simplistic and
  binary conclusions about whether a relationship exists
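
The gap between statistical and substantive significance can be illustrated with two made-up scenarios. The sketch assumes a known population standard deviation of 1 and a standard normal null distribution (a z-test, used here only to keep the arithmetic simple); the effect sizes, sample sizes, and the `z_test_p_value` helper are all hypothetical.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, n, sigma=1.0, mu0=0.0):
    """Two-tailed p-value for a sample mean, assuming a known population sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical scenarios: a tiny effect in a huge sample, a large effect in a small one.
tiny_effect_big_sample = z_test_p_value(sample_mean=0.01, n=100_000)
large_effect_small_sample = z_test_p_value(sample_mean=0.5, n=10)

print(f"effect 0.01, n = 100,000: p = {tiny_effect_big_sample:.4f}")
print(f"effect 0.50, n = 10:      p = {large_effect_small_sample:.4f}")
```

The tiny effect clears the .05 threshold while the much larger one does not, which is why a low p-value by itself says little about how much an effect matters.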

Bivariate hypothesis tests

The remaining lectures discuss two different bivariate hypothesis
tests.

While they differ in their details, they both follow the same basic
process:

1. State the hypotheses
2. Construct a null sampling distribution and non-rejection region
3. Calculate the test statistic
4. Compare the critical and calculated values and interpret the
   results
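
The four steps can be sketched in code. Because the specific bivariate tests are left to later lectures, this example uses a simpler one-sample, two-tailed t-test of µ = 0 as a stand-in; it is not one of those bivariate tests. The data are made up, `one_sample_t_test` is a hypothetical helper, and the example relies on the fact (not stated in the slides) that a one-sample test with n = 11 has df = n − 1 = 10.

```python
import math
import statistics

def one_sample_t_test(sample, critical_t, mu0=0.0):
    """Walk through the four steps for a one-sample, two-tailed t-test."""
    # 1. State the hypotheses: null is mu = mu0, alternative is mu != mu0.
    # 2. The null sampling distribution is a t distribution; its non-rejection
    #    region is the interval from -critical_t to +critical_t.
    # 3. Calculate the test statistic.
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t_stat = (statistics.mean(sample) - mu0) / standard_error
    # 4. Compare the calculated and critical values.
    return {"t": t_stat, "critical_t": critical_t, "reject": abs(t_stat) > critical_t}

# Hypothetical data with n = 11, so df = 10 (critical t = 2.228 at alpha = .05).
sample = [0.8, 1.2, -0.3, 0.5, 1.9, 0.7, 1.1, -0.2, 0.9, 1.4, 0.6]
result = one_sample_t_test(sample, critical_t=2.228)
print(result)
```

Interpreting the result is the last part of step 4: if `reject` is true, the calculated t lies outside the non-rejection region and we reject the null at the .05 level.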
