
Inference for Numerical Data

Sections 4.1-4.3

Introductory Note
We now turn our attention to numerical data and to inferential procedures for their mean/average.
We will no longer be using the normal model like we did in Chapter 3. Rather, our inferential
procedures will involve the t-distribution. Our general approach will be to:
1. Determine which point estimate or test statistic is useful for a research question.
2. Identify the appropriate distribution for the point estimate or test statistic.
3. Apply confidence interval and hypothesis testing techniques using the distribution from Step 2.

Even so, we need to talk about the Central Limit Theorem for means, so we begin there.

Central Limit Theorem for Means


We saw in previous chapters that the sampling distribution for the sample proportion 𝑝̂ and the
sampling distribution for the difference between sample proportions 𝑝̂₁ − 𝑝̂₂ are approximately
normal provided certain conditions are met. In this chapter, we have an analogous result: the sampling
distribution associated with a sample mean or the difference of two sample means is nearly normal,
provided certain conditions are met.

Central Limit Theorem for the Sample Mean


When certain conditions are met, the sampling distribution of a single sample mean 𝑥̅ will be nearly
normal with mean 𝜇 and standard error 𝜎/√𝑛. The two conditions are
1. Independence: The observations must be independent of one another.
2. Nearly normal: The distribution of the population is nearly normal or the sample size is large
enough.
When these two conditions are met, the sampling distribution of the sample mean 𝑥̅ is the
𝑁(𝜇, 𝜎/√𝑛) distribution.

Section 4.1 One-sample Means with the t Distribution


The Normal Model for a Sample Mean
Unfortunately, we rarely know the value of the population standard deviation. By extension, we rarely
know the value of the standard error 𝜎/√𝑛. We can estimate the population standard deviation 𝜎 with
the sample standard deviation 𝑠 and therefore estimate the standard error with 𝑠/√𝑛.

We can identify some intuitive qualities by examining the formula for the standard error (𝑆𝐸 = 𝑠/√𝑛):
1. A large sample standard deviation 𝑠 corresponds to a larger standard error. This makes sense: if
the data are more variable, then we will be less certain of the location of the true mean, so the
standard error should be bigger. On the other hand, if the observations all fall very close
together, then 𝑠 will be smaller, and the sample mean should be a more precise estimate of the
true mean.

2. A larger sample size corresponds to a smaller standard error. This too makes sense: we expect
estimates to be more precise when we have more data, so the standard error SE should get
smaller when n gets bigger.

When we use 𝑠 to estimate 𝜎 in the standard error formula, the sampling distribution of 𝑥̅ has
mean 𝜇 and estimated standard error 𝑠/√𝑛, but the sampling distribution is no longer a normal
distribution. When we estimate the population standard deviation in the standard error formula, we
add more variability to the sampling distribution of 𝑥̅, so we need to use a new distribution called the
Student’s 𝑡 distribution.

Student’s t-Distribution
The Student’s 𝑡 distribution came from a desire to brew better beer. Yes, you read that right. William
Sealy Gosset “joined the Guinness Brewery on 1 October 1899 as a junior brewer. He was appointed
Brewer in charge of the newly established experimental brewery in 1907 and later established the
statistical department, which he ran until 1936.”1 It was in this role at Guinness that Gosset developed
his famous distribution. In 1908, Gosset published his work “The Probable Error of a Mean” in
Biometrika under the pseudonym Student. One result of that paper is that, in Gosset’s words, “it has
been shown that this curve represents the facts fairly well even when the distribution of the
population is not strictly normal.”2

The Student’s 𝑡 distributions, a few of which are shown in the figure below, have a bell shape that
looks very similar to the standard normal distribution (black dotted line). The tails of the 𝑡 distribution
are thicker than the tails of the 𝑁(0,1) distribution, which means observations are more likely to fall
beyond two standard deviations from the mean than under the normal distribution. When our sample
is small, the value s used to compute the standard error isn’t very reliable. The extra thick tails of the t
distribution are exactly the correction we need to resolve this problem.

The t distribution is always centered at zero and has only one parameter: the degrees of freedom (df),
which describes the precise form of the bell shape the t-distribution takes on.

1 https://www.guinness-storehouse.com/content/pdf/archive-factsheets/general-history/wsgosset-and-students-t-test.pdf
2 Student. (March 1908). “The Probable Error of a Mean.” Biometrika, 6(1), 1-25. https://doi-org.proxy.lib.umich.edu/10.2307/2331554
In general, t distributions...
• Are very much like the standard normal distribution 𝑁(0,1): symmetric, unimodal, centered at 0.
• Are flatter with heavier tails compared to the 𝑁(0,1) distribution.
• Approach the 𝑁(0,1) distribution as the degrees of freedom (df) increases.

On Your Own: What are degrees of freedom anyway?


A Minitab blog post3 provides the answer to this age-old question that I have dared not tackle until now. The
following example is from that blog post:
“Imagine you’re a fun-loving person who loves to wear hats. You couldn't care less what a degree of
freedom is. You believe that variety is the spice of life.
Unfortunately, you have constraints. You have only 7 hats. Yet you want to wear a different hat
every day of the week.

On the first day, you can wear any of the 7 hats. On the second day, you can choose from the 6
remaining hats, on day 3 you can choose from 5 hats, and so on.
When day 6 rolls around, you still have a choice between 2 hats that you haven’t worn yet that
week. But after you choose your hat for day 6, you have no choice for the hat that you wear on Day 7.
You must wear the one remaining hat. You had 7 – 1 = 6 days of ‘hat’ freedom—in which the hat you
wore could vary!
That’s kind of the idea behind degrees of freedom in statistics. Degrees of freedom are often
broadly defined as the number of "observations" (pieces of information) in the data that are free to vary
when estimating statistical parameters.”

Using R to Find Probabilities, Areas, and Percentiles


Consider a t distribution with df degrees of freedom. The following R code can be used to find areas:
• Area to the left 𝑃(𝑇 < 𝑐): pt(c, df)
• Area in the middle 𝑃(𝑎 < 𝑇 < 𝑏): pt(b, df) – pt(a, df)
• Area to the right 𝑃(𝑇 > 𝑐): pt(c, df, lower.tail = FALSE)
The following can be used to find percentiles:
• pth percentile 𝑃(𝑇 < result from R) = 𝑝: qt(p, df)
We will use these functions once we get to our examples.
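
For instance, here is a minimal R sketch (the value df = 29 and the cutoffs ±2 are chosen just for illustration):

df <- 29                          # degrees of freedom, chosen just for illustration
pt(2, df)                         # P(T < 2), area to the left of 2
pt(2, df) - pt(-2, df)            # P(-2 < T < 2), area in the middle
pt(2, df, lower.tail = FALSE)     # P(T > 2), area to the right of 2
qt(0.95, df)                      # 95th percentile: the value c with P(T < c) = 0.95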

Applying the t Distribution to the Single-mean Situation


As mentioned above, the t distribution is a little more accurate than the normal model for the sampling
distribution of 𝑥̅ . This is true for both small and large samples, although the benefits for larger samples
are limited.

Before using the t distribution for inference about a single mean, we must check two conditions.
• Independence: The observations must be independent of one another. Independence can be
assumed when we have a random sample from the population or when the data come from an
experiment where each subject was randomly assigned to a group and the subjects do not
interact. If the data were not collected in one of these two ways, we need to carefully check to
the best of our ability that the observations were independent.
• Nearly normal: The observations are from a population with a nearly normal distribution.

3 https://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-degrees-of-freedom-in-statistics
The nearly normal condition is difficult to verify with small data sets. We should
i. take a look at a plot of the data for obvious departures from the normal model, usually in the
form of prominent outliers, and
ii. consider whether any previous experiences alert us that the data may not be nearly normal.
When the sample size is somewhat large, we can relax the nearly normal condition. For example,
moderate skew is acceptable when the sample size is about 30 or more, and strong skew is acceptable
when the sample size is about 60 or more.4

When doing inference for a sample mean calculated from a sample of 𝑛 independent observations
from a nearly normal distribution, we will use the 𝑡 distribution with 𝑛 − 1 degrees of freedom. Other
than using a different distribution (the t instead of the z), our hypothesis tests and confidence intervals
are just like they were before.

Hypothesis Tests for A Single Mean


We have already been introduced to the logic and steps of hypothesis testing for learning about
categorical data. Now we will extend these ideas to testing about means, focusing first on hypothesis
testing about a single population mean.

A couple of notes:
• Hypotheses and conclusions apply to the population(s) represented by the sample(s).
• If the distribution of a quantitative variable is highly skewed, we should consider analyzing the
median rather than the mean. Methods for testing hypotheses about medians are a special case of
nonparametric methods; we will not cover these in this course, but they exist when the need arises.

Basic steps of a hypothesis test are:


1. Determine appropriate null and alternative hypotheses.
2. Check the conditions for performing the test.
3. Calculate the test statistic and determine the p-value.
4. Evaluate the p-value and the compatibility of the null model.
5. Make a conclusion in the context of the problem.

Remember that our hypotheses come in the form of two competing claims. To test a particular value of
a population mean, we have the following possible pairs of hypotheses:
𝐻₀: 𝜇 ≤ 𝜇₀ versus 𝐻ₐ: 𝜇 > 𝜇₀
𝐻₀: 𝜇 = 𝜇₀ versus 𝐻ₐ: 𝜇 ≠ 𝜇₀
𝐻₀: 𝜇 ≥ 𝜇₀ versus 𝐻ₐ: 𝜇 < 𝜇₀

What is this 𝜇₀ and where does it come from? This is the hypothesized value of the population mean 𝜇
that we will use to build the null model. We will then check to see if our sample results are compatible
with the null model.

4 These are just rough guidelines.
t Hypothesis Test for the Mean
Based on a sample of n independent observations from a nearly normal distribution, the test statistic is
𝑡 = (𝑥̅ − 𝜇₀)/(𝑠/√𝑛)
where 𝑥̅ is the sample mean, 𝑠 is the sample standard deviation, 𝑛 is the sample size, and 𝜇₀
corresponds to the null value of the population mean 𝜇. We use the t distribution with 𝑑𝑓 = 𝑛 − 1 to
calculate our p-values.

Example: How Long Was That?


Does it ever seem like time drags on (perhaps during one of your least favorite classes) or time flies by
(like a summer vacation)? Perception, including that of time, is one of the things that psychologists
study. Students in a statistics class collected data on other students' perception of time. They told their
subjects that they would be listening to some music and then after it was over they would be asked
some questions. They played 10 seconds of the Jackson 5's song “ABC.” Afterward, they simply asked
the subjects how long they thought the song snippet lasted. They wanted to see whether students
could accurately estimate the length of this short song segment.5

What is the research question?

The researchers asked 48 students on campus to be subjects in the experiment. The participants were
asked to listen to a 10-second snippet of a song, and after it was over, they were asked to estimate
how long the song snippet lasted. (The subjects did not know in advance that they would be asked this
question.) Their estimates of the song length are the data that will be analyzed. Because the true
length of the song is 10 seconds, we are interested in learning whether people, on average, correctly
estimate the song length as 10 seconds or if they, on average, over- or underestimate the song length.

Step 1: Determine the appropriate null and alternative hypotheses.

Step 2: Check the conditions for performing the test.

5 Example 2.2 from Tintle et al.’s Introduction to Statistical Investigations
Step 3: Calculate the test statistic and determine the p-value. The sample mean was 13.71 seconds,
and the sample standard deviation was 6.5 seconds.
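
Here is a minimal R sketch of this step, using the summary statistics above to test 𝐻₀: 𝜇 = 10 against the two-sided alternative:

xbar <- 13.71; s <- 6.5; n <- 48; mu0 <- 10
t_stat <- (xbar - mu0) / (s / sqrt(n))          # about 3.95
2 * pt(t_stat, df = n - 1, lower.tail = FALSE)  # two-sided p-value, well under 0.001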

Step 4: Evaluate the p-value and the compatibility of the null model with observed results.

Step 5: Make a conclusion in the context of the problem.

Confidence Intervals for a Single Mean


t Confidence Interval for the Mean
Based on a sample of n independent observations from a nearly normal distribution, a confidence
interval for the population mean 𝜇 is
𝑥̅ ± 𝑡* ∙ 𝑠/√𝑛
where 𝑥̅ is the sample mean, 𝑠 is the sample standard deviation, and 𝑡* corresponds to the confidence
level and degrees of freedom. In this case, we use 𝑑𝑓 = 𝑛 − 1.

• The confidence level tells us how confident we can be that the interval we construct contains
the true population mean.
• Typical choices for confidence levels are 90%, 95%, and 99%, but any value larger than 0 and
smaller than 100 can be chosen.
• We select 𝑡* so that the percentage of the t-distribution between –𝑡* and 𝑡* is equal to the
confidence level we’ve chosen for the interval.

Finding 𝑡*
For example, let’s consider finding the 𝑡* value for a 95%
confidence interval where our sample size is 30. We need to
look at the t-distribution with 𝑑𝑓 = 𝑛 − 1 = 30 − 1 = 29.
Since we want an area of 0.95 between –𝑡* and 𝑡*, that leaves
1 – 0.95 = 0.05 to be split between the two tails, 0.025 in each
tail. So 𝑡* is the 97.5th percentile of this t-distribution.
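
In R, this 𝑡* is a single qt() call:

qt(0.975, df = 29)    # t* for 95% confidence with df = 29; about 2.045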

Example: Readability of Road Signs
A researcher wanted to learn about the average maximum distance at which drivers are able to read
a particular road sign. The researcher took a random sample of n = 16 drivers and measured the
maximum distance (in feet) at which each could read the sign. The data are provided below.
440 490 600 540 540 600 380 440
360 600 490 400 490 540 440 490

Here is some R output for these data:
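One way to produce output like this is R’s t.test() function; a minimal sketch, where distance is an assumed name for a vector holding the 16 measurements:

distance <- c(440, 490, 600, 540, 540, 600, 380, 440,
              360, 600, 490, 400, 490, 540, 440, 490)
t.test(distance, conf.level = 0.95)
# the output includes the sample mean (490 feet) and a 95% CI of roughly
# 449 to 531 feet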

a. Verify the necessary conditions for computing a confidence interval for the population mean
distance.6

b. Use a 95% confidence interval to estimate the population mean maximum distance at which all
drivers can read the sign. Write a statement that interprets both the confidence interval and the
confidence level.

6 When checking whether the population is nearly normal, we give the population distribution the benefit of the doubt. However, if the histogram is clearly not normal and/or if the points on the normal q-q plot are clearly not following the line, then the nearly normal condition is not reasonable. See page 10 of the Section 2.5-2.8 notes for more details.
Note on interpretations:
• The confidence interval provides a range of reasonable values for the parameter with an associated
level of confidence.
• We cannot say that 95% of the sample will fall in the confidence interval.
• We cannot say that 95% of the population will fall in the confidence interval.
• The confidence level describes our confidence in the procedure we used to make the interval.
Suppose the confidence level was 95%. If we repeated the confidence interval construction
procedure many times, we would expect about 95% of the resulting intervals to contain the
corresponding population parameter of interest.

Effect Size, Another Way of Interpreting Test Results7


Introducing Effect Sizes
We are frequently interested in comparing a population parameter to a value specified by a null model
or (in future lectures) to a parameter value of another population. In many research situations, we
want to estimate the magnitude of the difference we are comparing. The test statistic and 𝑝-value
cannot help us here because they are both heavily influenced by the sample size. A confidence interval
may also not be useful because it is expressed in the units of the data, and we may want a measure
that does not depend on the unit of measurement, especially when comparing results across studies
that used different units.

The effect size provides an estimate of the magnitude of the difference/relationship in a study, while
taking into account the amount of variability in the scores.8,9

A common formula for the effect size when conducting a hypothesis test regarding a single
population mean 𝜇 is Cohen’s d. This value relates a null model to the true population mean, 𝜇. The
formula for Cohen’s d is given by:
𝑑 = (𝜇 − 𝜇₀)/𝜎
We estimate Cohen’s d with available sample statistics, namely
𝑑̂ = (𝑥̅ − 𝜇₀)/𝑠
In the context of tests for a single population mean, 𝜇, Cohen’s d has an important relationship with
the test statistic t:
𝑡 = (𝑥̅ − 𝜇₀)/(𝑠/√𝑛) = √𝑛 ∙ (𝑥̅ − 𝜇₀)/𝑠 = √𝑛 ∙ 𝑑̂
The consequence of the algebra above gives us a new interpretation of the test statistic:

test statistic = (square root of the sample size) × (size of the effect)10
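
As a quick illustration in R, using the time-perception example from earlier in these notes (𝑥̅ = 13.71, 𝑠 = 6.5, 𝜇₀ = 10, 𝑛 = 48):

xbar <- 13.71; s <- 6.5; mu0 <- 10; n <- 48
d_hat <- (xbar - mu0) / s    # about 0.57, a medium effect by the conventions below
sqrt(n) * d_hat              # recovers the t statistic, about 3.95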

7 Your future classes may go more in-depth into effect sizes.
8 Goodwin, K.A., and Goodwin, C.J. (2017). Research in Psychology (8th edition). Wiley.
9 We’ve already seen an effect size! 𝑅² gives us the effect size for regression.
10 You should be able to see how the sample size impacts the test statistic: as the sample size increases, the test statistic also increases.
Effect Size Conventions
What makes an effect “big” or “small”? Psychologist and statistician Jacob Cohen did seminal work in
statistical power11 and effect size. Cohen’s 1988 book Statistical Power Analysis for the Behavioral
Sciences is filled with formulas and tables that were extremely useful to researchers before we
could calculate statistical power and effect sizes with the touch of a button.

Cohen considered a small effect size to be around 0.2, a medium effect to be around 0.5, and a large
effect to be around 0.8:
𝑑 Effect Size
𝑑 = 0.2 small
𝑑 = 0.5 medium
𝑑 = 0.8 large

“Cohen’s effect size conventions provide a guide for deciding on the importance of the effect of a study
in relation to what is typical in psychology. However, they are only a guide. That is, when evaluating a
particular effect size, it is important to consider the magnitude of effect that is typically found in that
specific area of research, as well as the potential practical or clinical implications of such an effect.”12

So we must consider the context of a study when interpreting an effect size as “small,” “medium,” or
“large.” Sometimes, studies that show only a small effect size can have huge practical implications.

For instance, a school might find that a new teaching technique improves average student reading
scores among kindergarten students with 𝑑̂ = 0.03, a very small effect size, indeed! But this slight
shift at the beginning of a student’s educational career might give them a slight edge that increases
while in kindergarten and later snowballs in elementary school through to high school and college. The
cumulative impacts of the teaching technique over time could be hugely consequential, even if the
initial effect size seems small.

Similarly, studies with very large effect sizes might not be of practical importance. For instance,
assigning each kindergarten student a year-long personal tutor might improve average student reading
scores by a much larger effect size, say 𝑑̂ = 0.97, but the cost of providing a tutor to every student
at the school makes implementing it a non-starter.

Section 4.3: Intervals and Tests for a Difference Between Two Means 𝜇₁ − 𝜇₂
Now we turn our attention to inferential procedures for a difference between two independent
population means 𝜇₁ − 𝜇₂.13 Our key method of conducting inference on this parameter is by using the
sampling distribution of 𝑥̅₁ − 𝑥̅₂, the difference between two means of samples from these
independent populations.

11 Power is coming up in our return to Section 2.3.
12 Aron, A., Coups, E. J., and Aron, E. N. (2013). Statistics for Psychology (6th edition). Pearson Education.
13 We will come back to Section 4.2 after we discuss Section 4.3.
Sampling Distribution of a Difference Between Sample Means 𝑥̅₁ − 𝑥̅₂
The sampling distribution of 𝑥̅₁ − 𝑥̅₂ will be approximately normal when the following three conditions
are met:
1. Independence within each sample: The observations within each sample are independent (e.g.,
we have a random sample from each of the two populations).
2. Independence between the samples: The two samples are independent of one another such
that observations in one sample tell us nothing about the observations in the other sample
(and vice versa).
3. Nearly normal: The distributions of both populations are nearly normal or the sample sizes are
both large enough.

The mean of this normal distribution is the expected difference in the two means, 𝜇₁ − 𝜇₂.
The standard error of this distribution is
𝑆𝐸 = √(𝜎₁²/𝑛₁ + 𝜎₂²/𝑛₂)

Again, since we do not usually know the population standard deviations 𝜎₁ and 𝜎₂, the difference of
two sample means 𝑥̅₁ − 𝑥̅₂ can be modeled using a t distribution and the standard error
𝑆𝐸 = √(𝑠₁²/𝑛₁ + 𝑠₂²/𝑛₂) when each sample mean can itself be modeled using a t distribution and the
samples are independent. To calculate the degrees of freedom, use statistical software or the smaller
of 𝑛₁ − 1 and 𝑛₂ − 1.14

Confidence Intervals for 𝜇₁ − 𝜇₂

We continue to use the basic idea for a confidence interval of point estimate ± (a few) standard errors.
For this situation our formula becomes:
(𝑥̅₁ − 𝑥̅₂) ± 𝑡* √(𝑠₁²/𝑛₁ + 𝑠₂²/𝑛₂)

Example: Public vs. Private Universities


Since public universities are subsidized by state governments, it is usually less expensive to attend a
public university in your home state than a private university. Is this still true when you must pay out-
of-state tuition? A random sample of 75 private universities had an average tuition of $24,800 with an
SD of $11,800. A similar random sample of 80 public universities had an average out-of-state tuition of
$19,600 with an SD of $6,400.
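
A minimal R sketch for part (b), using the conservative degrees of freedom (the smaller of 𝑛₁ − 1 and 𝑛₂ − 1) rather than the software-calculated value:

xbar1 <- 24800; s1 <- 11800; n1 <- 75    # private universities
xbar2 <- 19600; s2 <- 6400;  n2 <- 80    # public universities, out-of-state
se <- sqrt(s1^2 / n1 + s2^2 / n2)        # about 1539
tstar <- qt(0.995, df = min(n1, n2) - 1) # 99% confidence, df = 74
(xbar1 - xbar2) + c(-1, 1) * tstar * se  # roughly (1130, 9270) dollars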

14 Use statistical software unless you must calculate degrees of freedom by hand.
a. Does it appear that the conditions for using a t distribution are met? Explain.

b. Use the summary data to calculate a 99% confidence interval for the true difference in mean
tuition.

c. Based on the interval computed in (b), does there appear to be a difference in mean tuition at
private universities and mean out-of-state tuition at public universities? How would you explain
this to someone who has not taken a statistics course?

Hypothesis Tests for 𝜇! − 𝜇"
Hypothesis tests for the difference in two means proceed in a similar way to the hypothesis tests we
have done so far in the course.

Step 1: Determine the appropriate null and alternative hypotheses. In the case where we are
testing hypotheses about the difference 𝜇₁ − 𝜇₂, the possible hypotheses are:
𝐻₀: 𝜇₁ = 𝜇₂ versus 𝐻ₐ: 𝜇₁ < 𝜇₂
𝐻₀: 𝜇₁ = 𝜇₂ versus 𝐻ₐ: 𝜇₁ ≠ 𝜇₂
𝐻₀: 𝜇₁ = 𝜇₂ versus 𝐻ₐ: 𝜇₁ > 𝜇₂
While it is possible to conduct a hypothesis test to determine if the difference in means is some value
other than 0, we will confine our discussion here to only the case where 𝐻₀: 𝜇₁ = 𝜇₂.

Step 2: Check the conditions for performing the test. The conditions are the same as they were for
the confidence interval.

Step 3: Calculate the test statistic and determine the p-value.


Again we measure the difference between the sample statistic 𝑥̅₁ − 𝑥̅₂ and the null value (0) using the
standardized statistic (sample statistic − null value)/(null standard error), which in this situation
becomes the t statistic:
𝑡 = (𝑥̅₁ − 𝑥̅₂ − 0)/√(𝑠₁²/𝑛₁ + 𝑠₂²/𝑛₂)
This standardized statistic follows a t distribution with degrees of freedom calculated from statistical
software (technology will do this automatically!).15

Step 4: Evaluate the p-value and the compatibility of the null model.
As we have done up to now, we will continue to use the p-value to measure the strength of the
evidence against the null hypothesis. An alternate approach is to decide on the level of
significance/Type 1 Error rate you are comfortable with and to reject the null hypothesis if the p-value
is less than that significance level, 𝛼.

Step 5: Make a conclusion in the context of the problem.

Using R
Because the formulas for this hypothesis test are complicated, we recommend that you use R when
performing hypothesis tests for the difference in means. You will learn how to use R for hypothesis
tests and confidence intervals in lab.

15 If you are calculating degrees of freedom by hand, you can use the smaller of 𝑛₁ − 1 and 𝑛₂ − 1.
Example: Sugar in Cereals16
Statistics students found the sugar content of breakfast cereals, in grams per serving, for those placed
on the high shelves of the store versus the low shelves.

If you want to compare the sugar content of cereals on high versus low shelves, what would be the
explanatory variable and what would be the response?

Based on just how the distributions look, do you think there is an association between where the
cereal is located and its sugar content? If so, in what way? Explain.

How do the following summary statistics support what you saw in the graphical displays?

Determine appropriate null and alternative hypotheses.

16 Exercise 6.1.27 from Tintle et al.’s Introduction to Statistical Investigations
Check the conditions for performing the test.

Calculate the test statistic and determine the p-value.

Evaluate the p-value and the compatibility of the null model.

Make a conclusion in the context of the problem.

Here is the R output for the hypothesis test we just performed:17
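
For reference, a sketch of the call that could produce output like this, assuming a data frame cereals with a quantitative variable sugar and a grouping variable shelf with levels "high" and "low" (these names are assumptions, not the actual data set):

t.test(sugar ~ shelf, data = cereals, alternative = "less")
# alternative = "less" matches Ha: mu_high < mu_low, since R orders the
# group labels alphabetically ("high" before "low")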

17 R puts the groups in order alphabetically, so R tested 𝐻₀: 𝜇_high = 𝜇_low versus 𝐻ₐ: 𝜇_high < 𝜇_low.
Section 4.2: Paired Data
Paired data sets “have a structure with two features:
1. There is one pair of response values for each observational unit.
2. There is a built-in comparison: One number in each pair is for Condition A, the other for
Condition B.
Studies leading to paired data are often more efficient—better able to detect differences between
conditions—because you have controlled for a source of variation (individual variability among the
observational units).”18

Paired Data
Two sets of observations are paired if each observation in one set has a special correspondence or
connection with exactly one observation in the other data set.

With paired data, we have two measurements on each pair. If you ask the question “is there
independence between the data sets?,” the answer will be “no.” Instead of being interested in the two
measurements separately, we are interested in the differences within each pair. What may originally
look like two data sets is actually just one data set of the differences.

By examining the differences instead of the individual measurements, we remove the variability
between the observational units and focus on the comparison of interest within each pair.

Example: Assessing Improvement in Statistics Content Knowledge


The Comprehensive Assessment of Outcomes in Statistics (CAOS) test was developed by statistics
education researchers to be “a reliable assessment consisting of a set of items that students
completing any introductory statistics course would be expected to understand.”19 When doing
research for the HyFlex model of teaching, Dr. Miller had students take the CAOS test at the beginning
of the semester (pretest) and at the end of the semester (posttest). Why does it make sense to treat
these measurements as paired data?

By examining the differences between the pretest and posttest measurements, we eliminate the
variability across different students and focus in on that improvement measure (posttest – pretest)
itself. As mentioned above, we have controlled for one source of variation (variability among students).

Once we compute the set of differences, we use our one-sample methods for inference. Our notation
changes to indicate that we are working with differences. Our parameter of interest is 𝜇_d, the
population mean of the differences. The point estimate for 𝜇_d is 𝑥̅_d, the sample mean of the
differences.

18 Tintle et al. (2016). Introduction to Statistical Investigations. Wiley.
19 delMas et al. (November 2007). “Assessing Students’ Conceptual Understanding after a First Course in Statistics.” Statistics Education Research Journal, 6(2), 28-58. https://www.stat.auckland.ac.nz/~iase/serj/SERJ6(2)_delMas.pdf
Sampling Distribution of the Sample Mean of the Differences 𝑥̅_d
The sampling distribution of 𝑥̅_d will be approximately normal when the following two conditions are
met:
1. Independence: The observations within the sample of differences are independent (e.g., the
differences are a random sample from the population of differences).
2. Nearly normal: The population of differences is nearly normal or the sample size (number of
pairs) is large enough.

Note: We do not need to check for independence between the samples because we already know that
the samples are not independent.

The mean of this normal distribution is 𝜇_d. The standard error of this distribution is
𝑆𝐸 = 𝜎_d/√𝑛
where 𝜎_d is the population standard deviation of the differences and 𝑛 is the number of pairs.

Again, since we do not usually know the population standard deviation 𝜎_d, the sample mean of the
differences 𝑥̅_d can be modeled using a t distribution with 𝑛 − 1 degrees of freedom and the
estimated standard error 𝑆𝐸 = 𝑠_d/√𝑛.

t Hypothesis Test for Paired Data


Based on a sample of n independent differences from a nearly normal distribution, the test statistic is
𝑡 = (𝑥̅_d − 𝜇₀)/(𝑠_d/√𝑛)
where 𝑥̅_d is the sample mean of the differences, 𝑠_d is the sample standard deviation of the
differences, 𝑛 is the sample size (number of pairs), and 𝜇₀ corresponds to the null value of the
population mean of the differences, 𝜇_d. We use the t distribution with 𝑛 − 1 degrees of freedom to
calculate our p-values.

t Confidence Interval for Paired Data


Based on a sample of n independent differences from a nearly normal distribution, a confidence
interval for the population mean of the differences 𝜇_d is
𝑥̅_d ± 𝑡* ∙ 𝑠_d/√𝑛
where 𝑥̅_d is the sample mean of the differences, 𝑠_d is the sample standard deviation of the
differences, 𝑛 is the sample size (number of pairs), and 𝑡* corresponds to the confidence level and
𝑛 − 1 degrees of freedom.

Example: Portion Size20


Does your bowl size affect how much you eat? Food psychologist Brian Wansink (www.mindlesseating.org)
ran an experiment with a group of undergraduates from the University of Illinois at Urbana-Champaign
to investigate whether bowl size would affect how many M&M’s candy pieces students took. In this
example, we will analyze some of the data collected as part of that study. The study was conducted

20 This example is Example 7.3 in Tintle et al.’s Introduction to Statistical Investigations.
over several days. At one of the sessions, some participants were assigned to receive either a small
bowl or a large bowl and allowed to take as many M&M’s as they planned on eating during the study
session. At a following session, the bowl sizes were switched, with those who received small bowls at a
previous session now receiving large bowls, and vice versa.

We will use the data from these undergraduates (the cases) to test whether bowl size (the explanatory
variable, categorical) affects how much we eat, as measured by the number of M&M’s taken (the
response variable, quantitative). Thus, we can state our hypotheses as:
• 𝐻) : On average, people take the same number of M&M’s from large and small bowls.
• 𝐻* : On average, people take more M&M’s from large bowls than from small bowls.

Data on the number of M&M’s taken by the 17 participants who attended both sessions is shown in
the table below. Note that the data are ordered by participant ID; that is, the number of M&M’s taken
from the small bowl is matched up with the number of M&M’s taken by that individual from the large
bowl, regardless of which bowl the student was given first.
Subject 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Small bowl 33 24 35 24 40 33 88 36 65 38 28 50 26 34 51 25 26
Large bowl 41 92 61 19 21 35 42 50 11 104 97 36 43 62 33 62 32

Because the data are paired and quantitative, we will be looking at the difference in the number of
M&M’s taken when a large bowl is used compared to when a small bowl is used (large – small):
Subject 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Difference 8 68 26 –5 –19 2 –46 14 –54 66 69 –14 17 28 –18 37 6

What is our parameter of interest?

Restate the hypotheses in symbols.

Check the conditions for performing the test:

Calculate the test statistic and determine the p-value:
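
A minimal R sketch of this step, entering the differences from the table above:

diffs <- c(8, 68, 26, -5, -19, 2, -46, 14, -54, 66, 69, -14, 17, 28, -18, 37, 6)
t.test(diffs, mu = 0, alternative = "greater")
# one-sided test of H0: mu_d = 0 vs. Ha: mu_d > 0 (large - small);
# t is about 1.2 with df = 16, and the one-sided p-value is about 0.12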

Evaluate the p-value and the compatibility of the null model.

Make a conclusion in the context of the problem. To what population can we generalize these results?

Can we make a causal conclusion here?

