Journal of Statistics and Data Science Education

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/ujse21

Teaching Statistical Inference Through a Conceptual Lens: A Spin on Existing Methods with Examples

Mortaza Jamshidian & Parsa Jamshidian

To cite this article: Mortaza Jamshidian & Parsa Jamshidian (2024) Teaching Statistical
Inference Through a Conceptual Lens: A Spin on Existing Methods with Examples, Journal of
Statistics and Data Science Education, 32:1, 54-72, DOI: 10.1080/26939169.2023.2190011

To link to this article: https://doi.org/10.1080/26939169.2023.2190011

© 2023 The Author(s). Published with license by Taylor and Francis Group, LLC.

Published online: 17 Apr 2023.

Mortaza Jamshidian^a and Parsa Jamshidian^b
^a Department of Mathematics, California State University, Fullerton, Fullerton, CA; ^b Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA

ABSTRACT
Using software to teach statistical inference in introductory courses opens the door for methods and practices that are more conceptually appealing to students. With an increasing number of fields requiring competency in statistics including data science, natural and social sciences, public health and more, it is crucial that we as instructors deliver the basic concepts of statistics effectively. In line with guidelines presented in the GAISE College Report, this article demonstrates intuitive approaches to teaching proportion and mean inference that take advantage of statistical software and emphasize conceptual understanding. The article recommends putting aside asymptotic-based methods for proportion inference and using the exact binomial method. Regarding mean inference, we propose a more contextualized and simplified process that uses the distribution of the sample mean directly and avoids standardized statistics such as z or t. In both the proportion and mean inference contexts, we discuss the benefits of the proposed approaches and provide detailed examples that demonstrate the methods using the Rguroo statistical software.

ARTICLE HISTORY
Received October 2022
Accepted March 2023

KEYWORDS
Binomial exact test; Constructing confidence intervals; Introductory statistics; Inverting a test of hypothesis; Rguroo; Statistical software; Teaching mean inference; Teaching proportion inference; Testing hypothesis

1. Introduction

Recognizing the importance of statistics as a general education subject, many departments require students to take at least one statistics course. The Conference Board of the Mathematical Sciences (CBMS) Report (Blair, Kirkman, and Maxwell 2018) states that 253,000 students enrolled in introductory level statistics courses in Fall 2015 in the mathematics and statistics departments in the United States, a 70% increase since 2005. Additionally, according to the College Board, 220,000 high school students enrolled in the Statistics Advanced Placement (AP) exam in 2018, a 69% growth in enrollment compared to 2005. There was a slight dip in the Statistics AP exam enrollments from 2019 to 2021, mainly attributed to the COVID-19 pandemic.

Recently, data science has emerged as a field that pays particular attention to its students’ statistical foundations. The National Academies of Science, Engineering and Medicine’s “Data Science for Undergraduates” consensus report (2018) puts forward various goals for data science students, including understanding the “basic statistical concepts of data analysis, data collection, modeling, and inference.” Data science is just one example of many areas of study in which competency in basic statistical concepts is essential. Other fields include the social sciences, natural sciences, business, and health sciences.

This increased attention to statistics has encouraged innovation and development of new pedagogy in the subject. The Guidelines for Assessment and Instruction in Statistics Education Report (GAISE 2005) provided an impetus for an exchange of ideas about teaching introductory statistics courses. This report was updated in 2016 (GAISE 2016) to account for significant spikes in enrollments and the availability of better technology and data. The GAISE College Report (2016) outlines six recommendations on what to teach and how to teach courses. Two of these recommendations are to “focus on conceptual understanding” and to “use technology to explore concepts and analyze data.” In this article, we aim to introduce methods for teaching proportion and mean inference to address both of these recommendations.

The GAISE College Report and others (e.g., see Cobb 2006, 2015 for other references) advocate using simulation-based methods for teaching inference. However, simulation is not currently at the forefront of methods used. We have searched 20 mainstream introductory statistics textbooks (Gould, Ryan, and Wong 2016; Rosner 2016; Agresti, Franklin, and Klingenberg 2017; McClave and Sincich 2017; De Veaux, Vellman, and Bock 2018; Bluman 2019; Diez, Cetinkaya-Rundel, and Barr 2019; Hawkes 2019; Johnson and Gouri 2019; Peck, Short, and Olsen 2020; Lock et al. 2021; Mann 2021; Tintle et al. 2021; Warren, Denley, and Atchley 2021; Moore, Not, and Fligner 2021a, 2021b; Navidi and Monk 2022a, 2022b; Quinton and Hawkes 2022; Triola 2022). Of these, 16 mainly use normal-theory-based methods, and only De Veaux, Vellman, and Bock (2018), Diez, Cetinkaya-Rundel, and Barr (2019), Lock et al. (2021), and Tintle et al. (2021) use simulation-based methods for inference.

CONTACT Mortaza Jamshidian mori@fullerton.edu Department of Mathematics, California State University, Fullerton, Fullerton, CA.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by
the author(s) or with their consent.
While this article’s methods for teaching proportion and mean inference do not include simulation, we give credence to ideas included in Cobb (2015) and GAISE (2016), which recommend using simulation-based methods to teach inference. However, adopting simulation-based methods in teaching introductory courses has been slow, perhaps because of the following reasons: (a) Some instructors of introductory courses may not have formal training in statistics or may be uncomfortable or unfamiliar with simulation-based methods. According to the CBMS 2015 report, over 60% of faculty who teach introductory statistics courses in the Mathematics and Statistics departments are not amongst tenured or tenure-eligible faculty. (b) Teaching simulation concepts to introductory students requires a good amount of class time. This time may not be available to instructors who must cover a set of topics within a semester or a quarter. (c) Simulation machinery may be difficult to learn for introductory students, especially if it involves writing computer code. A note on this point is that applets such as ArtofStat, Rossman/Chance, or StatKey provide great simulation tools to teach sampling distributions. However, statistical software where students can save and reproduce results should be used for exercises and assessment.

Although simulations are effective in teaching sampling variability, based on our experience, they are not as effective in introducing inference concepts. We recommend presenting the elements of inference using one-population proportion inference and propose the use of the exact binomial method. As we illustrate in Section 2, this approach allows us to put aside the required simulation machinery and, as a result, makes it simpler for students to focus on the underlying concepts of inference. Additionally, using the binomial distribution avoids unnecessary complications such as sample size restrictions that arise with approximating the binomial by the normal through the central limit theorem. This recommendation is in line with Chance and Rossman (2001), who state

“Important but peripheral concerns associated with inference for means should wait until students have an understanding of basic inferential principles. Studying proportions first also allows for exact calculations of p-values and power from the binomial distribution.”

As in proportion inference, understanding mean inference has its own challenges for students. Grasping the sampling distribution of the sample mean, which forms the foundation of mean inference, is arguably more complicated than that of the sample proportion. Again, using simulation to understand the sampling distribution of the sample mean is helpful; however, moving on with simulation-based methods to teach mean inference adds complications similar to those described previously in the proportion inference context. All textbooks that we researched, including Diez, Cetinkaya-Rundel, and Barr (2019), Lock et al. (2021), and Tintle et al. (2021), which emphasize simulation-based methods, include the normal and t-based methods for mean inference. A question then is whether there is room to improve the way we teach the normal- and t-based methods.

To our knowledge, every introductory statistics text transforms the sample mean to obtain the unitless standard z- or t-statistic. As we will show in Section 3, we can eliminate the standardization step in both the normal-theory and t-based inference. Two benefits of avoiding standardization are that the number of computation steps is reduced, and working in the original data units allows us to introduce the inferential methods in the units of the data and thus in a natural context.

A key to implementing the methods that we discuss here is the use of appropriate statistical software. While many textbooks use technology in their presentations, there is room for further modernization. Of the 20 introductory statistics textbooks that we looked at, 18 include z and t tables, and 14 provide TI calculator instructions. Considering the limitations of calculators and the availability of affordable statistical software, it is notable that many textbooks have not steered away from using probability tables and calculators. One of GAISE’s (2016) goals for students in introductory statistics courses is that “students should be able to interpret and draw conclusions from standard output from statistical software packages.” Moreover, there are indications that using software improves overall course success rates (Robinson 2020). The GAISE College Report (2016) lists ten considerations for teachers when selecting technology tools. In this article, we use the Rguroo statistical software (https://rguroo.com/) to present our examples. This software adheres to the ten GAISE technology guidelines.

2. Teaching Proportion Inference Using Exact Binomial

In teaching inference about a population proportion p, most introductory statistics texts use the sampling distribution of the sample proportion p̂ = X/n, where X is the number of successes in n trials of an experiment. Of these textbooks, most use the asymptotic distribution

\[ \hat{p} \sim N\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right), \tag{1} \]

and a few texts (e.g., Lock et al. 2021; Tintle et al. 2021) obtain the sampling distribution of p̂ by simulation.

If a student has understood the central limit theorem, then it is reasonable to assume that they would have a conceptual understanding of the asymptotic result in (1). From our experience, however, most students in an introductory course do not get a good grasp of the central limit theorem and tend to follow prescribed steps to solve proportion inference problems. Additional complexities in this approach include introducing the formula for the standard error of p̂, shown in (1), and limitations that are imposed on n and p by requiring large sample sizes. Some books introduce relatively simple methods to deal with small n, for example, the “plus four” method of Agresti and Coull (1998) for obtaining 95% confidence intervals; yet, this adds another layer of complexity.

Demonstrating the distribution of p̂ using simulation offers students a conceptual understanding of sampling variability and distribution. Since counts are generally easier for students to fathom than proportions, we prefer teaching sampling variability by simulating from the distribution of X, the number of successes in n trials, where

\[ X \sim \text{Binomial}(n, p). \tag{2} \]
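To make the sampling variability in (2) concrete, the idea of repeatedly drawing the count X can be sketched in a few lines of code. The following is only an illustrative Python sketch, not part of the Rguroo workflow used in this article; the function name `simulate_successes`, the seed, the number of repetitions, and the values n = 10 and p = 0.2 are all assumptions chosen for the illustration.

```python
import random
from collections import Counter


def simulate_successes(n, p, reps, seed=0):
    """Draw `reps` independent values of X ~ Binomial(n, p).

    Each draw counts how many of n independent trials succeed,
    where a single trial succeeds with probability p.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]


# Hypothetical classroom setting: samples of size 10, true proportion 0.2.
counts = simulate_successes(n=10, p=0.2, reps=5000, seed=1)
freq = Counter(counts)

# Text histogram of the simulated distribution of the count X.
for x in range(0, 11):
    share = freq.get(x, 0) / len(counts)
    print(f"X = {x:2d}  {share:6.3f}  {'#' * round(50 * share)}")
```

Each printed row is one possible number of successes together with the share of simulated samples that produced it, so students can see how the count varies from sample to sample before any proportion is introduced.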
Only after the idea of sampling variability is understood do we state the equivalence of the distribution of p̂ and X.

As noted in the Introduction, we recommend teaching proportion inference by computing binomial distribution probabilities directly through the exact binomial method. This only requires the knowledge of the binomial random variable, a topic that is covered early on in most introductory courses. Computation of binomial probabilities, which is generally difficult to do by hand, will be left to software.

The idea of using the exact binomial method for inference about a population proportion dates back about 90 years, as proposed by Clopper and Pearson (1934). But this method is not used as a main method in introductory textbooks. The older textbooks did not cover the exact binomial method, and reasonably so, because computational software was not widely available. Fortunately, calculating binomial probabilities for arbitrary values of n and p is now readily available via statistical software.

Three advantages of using the exact binomial method are as follows: (a) Count data has an elementary construct that helps students learn elements of inference in a context-friendly setting. (b) Knowledge of the central limit theorem, simulation methods, or formulas is not required. (c) Student learning is not impeded by adding assumptions on the sample size n and the population proportion p.

In the following two sections, we give details and examples of the exact binomial method for hypothesis testing and obtaining confidence intervals for a one-population proportion.

2.1. Hypothesis Testing

In teaching hypothesis testing about a population proportion, we start by introducing the following three types of null and alternative hypotheses:

\[
\begin{cases} H_0: p = p_0 \\ H_a: p < p_0 \end{cases}
\qquad
\begin{cases} H_0: p = p_0 \\ H_a: p > p_0 \end{cases}
\qquad
\begin{cases} H_0: p = p_0 \\ H_a: p \neq p_0, \end{cases}
\tag{3}
\]

where p_0 is a given fixed value. We explain to our students that prior to looking at the data, we should decide on a significance level α for our test, which determines the amount (probability) of Type I error that we are willing to tolerate. In using the exact binomial test, the decision on whether to reject or not reject the null hypothesis will depend on our test statistic X, the number of successes that we observe in a sample of size n. Having learned the binomial distribution, students can identify the distribution of X as binomial with the number of trials n and probability of success p_0. We refer to this distribution as the null distribution since p_0 is the value of p in the null hypothesis H_0: p = p_0. In the following two sections, we explain how to make a decision on rejecting or not rejecting the null hypothesis based on a critical region and a p-value.

2.1.1. Decision Based on a Critical Region

The critical region approach to testing a hypothesis about a population proportion at a significance level α involves determining a set of X-values that form the critical (rejection) region. These X-values favor the alternative hypothesis and must satisfy the following two conditions:

Condition 1: The probability that X belongs to the critical region is at most α under the null distribution X ∼ Binomial(n, p_0).

Condition 2: The X-values are determined so that the Type II error is minimized.

Because the binomial distribution is discrete, a limitation of the exact binomial test is that we are not guaranteed a size α test, where we would achieve the exact significance level; instead, we get a level α test, where we guarantee that the Type I error remains at or below the significance level α. The distinction between the size α test and level α test is not very important in introductory courses or generally in practice.

When explaining critical regions to students, we suggest starting with a one-sided test. Consider an alternative of the form H_a: p > p_0. With a bit of guidance, students can conclude that large values of X favor the alternative hypothesis H_a, and our critical region would consist of values in the right tail of the distribution of X. Now let’s go through an example.

Example 1 (Pineapple Example). In this example, we conduct a survey of the students in our classroom to answer the question: Do more than 20% of students in our class like to have pineapple on their pizza? The hypotheses here are

\[
\begin{cases} H_0: p = 0.20 \\ H_a: p > 0.20, \end{cases}
\tag{4}
\]

where p is the proportion of students in our class who like pineapple on their pizza. We take a random sample of size n = 10 from our class and ask each of the selected students if they like pineapple on their pizza.^1 Before revealing the result of our sample, we ask students to guess the critical region. We then use our probability calculator to test whether their guesses satisfy Condition 1. After this exercise, we use a probability calculator to find the critical region.

^1 Our classes usually have many more than 10 students, and this provides a good sampling exercise as well as a good discussion about the generalizability of our decision.

Figure 1(a) shows Rguroo’s probability calculator dialog box. In the dialog box we select the Probability ⇒ Values option to indicate that we are computing an inverse probability. We then select the Binomial distribution, specify the parameters No of Trials, n = 10, Prob of Success, p = 0.20, select the option Upper Tail, and type in our significance level 0.05. Figure 1(b) shows the resulting output. It shows that P(X ≥ 4) = 0.1209 > 0.05 and P(X ≥ 5) = 0.03279 < 0.05. Therefore, we conclude that our critical region consists of X-values greater than or equal to 5. In other words, if five or more students in our sample state that they like pineapple on their pizza, we reject the null hypothesis.

Figure 1. Using Rguroo’s probability calculator to determine the critical region for the test of hypothesis (4).

Usually, multiple student guesses satisfy Condition 1 and not Condition 2. We then pose the question of why X ≥ 5 is the correct critical region as opposed to other regions that satisfy Condition 1. To explain this to students, we take the example of X ≥ 6. We compute P(X ≥ 6) = 0.0064, which is less than 0.05 and therefore satisfies Condition 1. However, we note that Condition 2 requires a region consisting of X-values that minimizes the Type II error. The Type II error for the critical region X ≥ 5 is smaller than that of X ≥ 6. We can clarify this point to our students without computing the Type II error directly, by noting that the region X ≥ 5 consists of more X-values than the region X ≥ 6. Thus, we have a lower chance of failing to reject H_0 when the critical region is X ≥ 5 than when it is X ≥ 6, and therefore X ≥ 5 is the region that has a smaller Type II error. Students should understand this explanation since they would know that

\[ P(\text{Type II error}) = P(\text{Failing to reject } H_0 \mid H_a \text{ is true}). \]

At this point, we state that the critical region for the case H_a: p > p_0 consists of the values X ≥ k, where k is the smallest value such that P(X ≥ k) ≤ α, with X ∼ Binomial(n, p_0). Analogously, for an alternative hypothesis of the form H_a: p < p_0, the critical region consists of the values X ≤ k, where k is the largest value for which P(X ≤ k) ≤ α.

The critical region for two-sided tests involves both the lower and the upper tail of the null distribution. In this case, we need to determine values k_1 < k_2 such that P(X ≤ k_1) + P(X ≥ k_2) ≤ α while at the same time satisfying Condition 2. We leave the task of obtaining k_1 and k_2 to software because it is somewhat more complex than the one-sided case. The following is a two-sided example.

Example 2 (UCLA Example). UCLA’s website^2 states that 20% of its students are Hispanic. Since some students do not report their ethnicity, we do not know whether this reported value is an underestimate or an overestimate of the true proportion. To investigate, we test the hypotheses H_0: p = 0.20 versus H_a: p ≠ 0.20, where p denotes the true proportion of Hispanic students at UCLA. We took a random sample of size 25 from UCLA students and observed two Hispanic students in our sample. Perform a test of hypothesis at the α = 0.05 significance level.

^2 The first author uses data for his own university for this example and asks students to take a sample at their university. The x_obs value for UCLA in this example is hypothetical.

We use Rguroo’s One Population Proportion Inference function to obtain the critical region. Figure 2 shows the dialog boxes for this function. The left panel shows the Basics dialog, where we have labeled the Factor as “Student Ethnicity” and the Success as observing a “Hispanic.” The values of Sample Size, n = 25, # of Successes, x_obs = 2, Alternative Hypothesis, H_a: p ≠ 0.2, and the Significance Level, α = 0.05 are specified. To conduct the exact binomial test, we select the Binomial option. The right panel shows Rguroo’s Details dialog, where the Critical Region and P-Value checkboxes are selected to obtain the corresponding graphs.

Figure 2. Using Rguroo’s proportion Inference function to obtain the critical region and p-value graphs for the UCLA example.

Figure 3 includes the Rguroo output that shows the critical region graph for this example. As shown by the red bars in the graph, we would reject the null hypothesis if our x_obs is one of 0, 1, 10, 11, ..., 25. Our observed value is 2, as indicated by the green triangle, which does not fall in the critical region. Therefore, we do not reject the null hypothesis. As noted earlier, due to the discreteness of the binomial distribution, we cannot always obtain a critical region with the exact significance level of α = 0.05. The graph legend shows the exact significance level of α = 0.044722.

Figure 3. Critical region graph for the UCLA Example.

2.1.2. Decision Based on a p-Value

We begin by introducing the p-value as a measure of how strongly the evidence in our observed sample favors the alternative hypothesis, assuming that the null hypothesis is true. In teaching this concept, we go through the following steps:

Step 1: We explain to students that our evidence is x_obs, the observed number of successes in our sample of size n.

Step 2: The strength of the evidence is measured using probabilities. In particular, here we use the probabilities from a binomial distribution with number of trials n and probability of success p_0. We emphasize that we use p_0 since in our testing procedure we assume that the null hypothesis is true.

Step 3: Using a few examples, we have students think about what we call the H_a support, the set of values of X that are as favorable or more favorable than x_obs to the alternative hypothesis H_a under the null distribution. We prefer this wording of “as favorable or more favorable” to the commonly used wording of “as extreme or more extreme,” since favorability conveys a direction.

Step 4: We define the p-value as

\[ p\text{-value} = P\left(X \text{ belongs to the } H_a \text{ support} \mid H_0 \text{ is true}\right). \]

Step 5: We decide to reject H_0 if the p-value is smaller than our predetermined significance level α. Otherwise, we fail to reject H_0.

A key step here is determining the H_a support. In the case of one-sided tests, for example H_a: p > p_0, it is not difficult to explain to students that a larger number of successes X is more favorable to the alternative H_a. Thus, the H_a support for this case consists of x_obs and the values of X that are larger than x_obs. Similarly, it can be explained that the H_a support for H_a: p < p_0 consists of x_obs and the values of X that are less than x_obs.

Determining the H_a support for two-sided tests is somewhat more involved. We explain to our students that whether one value is more favorable to H_a than another is measured by comparing probabilities; specifically, the values of X that are more favorable to H_a: p ≠ p_0 than x_obs are those with probabilities less than that of X = x_obs under the null distribution. Fortunately, this concept can be illustrated using the probability bar graph of the null distribution X ∼ Binomial(n, p_0). On the graph, we locate our observed value x_obs. Then, the values of X with bars as tall as that of x_obs or shorter form our H_a support. We leave the computation of the p-value to software. Let’s consider an example.

Example 3 (UCLA Example, Continued). Recall the UCLA Example setting. To begin teaching the p-value approach, we ask students to determine the H_a support by looking at the null distribution graph. Figure 4(a) shows a bar graph of the probability mass function for Binomial(n = 25, p = 0.2), the null distribution, obtained using Rguroo’s probability calculator. This graph should be familiar to students from when they learned about the binomial distribution. In our example, x_obs = 2 with P(X = 2 | p = 0.2) = 0.07084. Students can see from the graph that the values X = 0, 1, 8, 9, ..., 25 have lower probabilities (shorter bars) than x_obs = 2 and are more favorable to H_a. Thus, the H_a support is the set of values {0, 1, 2, 8, 9, ..., 24, 25}, and the p-value is the sum of the probabilities for the values of X in the H_a support, which is 0.2073, as shown in Figure 4(b).

Figure 4. Determining the H_a support and computing the p-value for the UCLA Example.

Once students understand the concept and steps involved in computing the p-value, we suggest that they use standard proportion inference functions in statistical software to perform the computations. Figure 5 shows the output from the Rguroo One Population Proportion function resulting from the input shown in Figure 2. The output includes a table consisting of the p-value of 0.20735 and a graph that shows the null distribution, marking x_obs by a green triangle, with red bars corresponding to the H_a support. The legend includes a formula for how the p-value is computed.

Figure 6 shows how to use the binom.test() function in R to compute the p-value for the exact binomial test. This figure also includes the R output. The p-value in the R output agrees with that reported in Rguroo’s output in Figure 5.

2.2. Confidence Intervals

Most introductory statistics textbooks construct a 100(1 − α)% confidence interval for one population proportion using

\[ \hat{p} \pm z^{*} \times \mathrm{S.E.}(\hat{p}), \tag{5} \]

where p̂ is the sample proportion, and z* is the 1 − α/2 quantile of the standard normal distribution. Some approximate the standard error of p̂ by substituting p̂ for p in the formula √(p(1 − p)/n). A few textbooks use the bootstrap to estimate the standard error of p̂ or use the bootstrap distribution quantiles to obtain a confidence interval.
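As a worked illustration of formula (5), the sketch below evaluates the interval for the hypothetical UCLA sample (2 successes out of 25), plugging p̂ into the standard error formula. It is written in Python using only the standard library; the function name `wald_interval` is an assumption made for this illustration and does not correspond to any Rguroo or R function.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n, conf=0.95):
    """Evaluate p_hat +/- z* x S.E.(p_hat), with p_hat plugged into the S.E."""
    p_hat = x / n
    # z* is the 1 - alpha/2 quantile of the standard normal distribution.
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se


# Hypothetical UCLA sample from Example 2: x_obs = 2, n = 25.
lower, upper = wald_interval(2, 25)
print(f"95% interval from (5): ({lower:.4f}, {upper:.4f})")
```

With so few successes the lower limit comes out below zero, which is not a possible value for a proportion. This is one way to see the kind of sample size restriction that accompanies the normal approximation.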
Figure 5. Rguroo’s One-Population Proportion output for the UCLA example.

Figure 6. R’s binom.test() output for computing the p-value for the UCLA example.
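For readers without access to R or Rguroo, the exact binomial computations behind Examples 1 and 3 can be cross-checked with a short Python sketch using only the standard library. The function names below are hypothetical, and the small relative tolerance used when comparing probabilities is an assumption of this sketch, added to guard against floating-point ties. The two-sided rule follows the definition of the H_a support given in Section 2.1.2 (all values whose null probability is no larger than that of the observed count).

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(x, n, p) for x in range(k, n + 1))


def upper_critical_value(n, p0, alpha):
    """Smallest k with P(X >= k) <= alpha under Binomial(n, p0), for Ha: p > p0.

    Returns None if no such k exists.
    """
    for k in range(0, n + 1):
        if upper_tail(k, n, p0) <= alpha:
            return k
    return None


def two_sided_p_value(x_obs, n, p0):
    """Sum the null probabilities of all values whose probability is no larger
    than that of x_obs (the Ha support for Ha: p != p0)."""
    pmf = [binom_pmf(x, n, p0) for x in range(n + 1)]
    cutoff = pmf[x_obs] * (1 + 1e-7)  # tolerance for ties (sketch assumption)
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Example 1: n = 10, p0 = 0.20, alpha = 0.05 (article reports 0.1209 and 0.03279).
print(upper_tail(4, 10, 0.20), upper_tail(5, 10, 0.20))
print(upper_critical_value(10, 0.20, 0.05))  # critical region is X >= 5

# Example 3: x_obs = 2, n = 25, p0 = 0.20 (article reports a p-value of 0.20735).
print(two_sided_p_value(2, 25, 0.20))
```

Writing the tail probability and the H_a support as explicit sums keeps the code close to the classroom explanation: students can match each function to a step they carried out with the probability bar graph.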

We recommend introducing confidence intervals for one population proportion by inverting an exact binomial test (see, e.g., Casella and Berger 2002). The method of inverting a test of hypothesis for obtaining a 100(1 − α)% confidence interval was initially proposed by Clopper and Pearson (1934). To obtain the lower and upper confidence limits, they proposed finding all p_0’s for which we would not reject H_0: p = p_0 versus two one-sided alternatives, H_a: p > p_0 and H_a: p < p_0, at the α/2 significance level. Blyth and Still (1983) proposed an alternative approach based on inverting the two-sided test, where one finds all p_0’s such that H_0: p = p_0 is not rejected versus H_a: p ≠ p_0 at a significance level of α. They go on to show that their proposed interval is narrower and is less conservative than the Clopper-Pearson interval.

Triola (2022) mentions the use of the Clopper-Pearson confidence intervals for small samples and notes that they are too conservative. He goes on to say that their computation is beyond the scope of his textbook and does not offer any further details or heuristics about the method. Tintle et al. (2021) introduce the Blyth-Still method. To get around its computational complexity, they use a grid of plausible p_0 values and perform two-sided tests at each of the grid points, using simulation. The boundaries of the confidence interval are determined by the smallest and largest grid point values for which H_0 is not rejected.

We recommend using the Blyth-Still confidence interval because of its direct connection with the two-sided exact binomial hypothesis test presented in Section 2.1, as well as its more optimal properties. We explain to students that the interval is comprised of the plausible values p_0 for the population proportion p, that is, the values for which we do not reject the null hypothesis. To present confidence intervals based on the formula given in (5), some textbooks simply give the formula and others resort to deriving the interval using probability statements and algebraic inequalities. Regardless of whether or not derivation of the formulas is included, this method does not provide the same conceptual understanding and connection to hypothesis testing as the inversion of a hypothesis test.

To introduce the method, we like the idea of using a grid of values as in Tintle et al. (2021). Once students understand the idea behind inverting a test, we leave the computation of the interval to software. Most software reports the Clopper-Pearson interval as the “Exact Binomial” confidence interval. The reason for this may be that Blyth-Still intervals are more complex to implement. In the following example, we show how you can use Rguroo to obtain both the Clopper-Pearson and the Blyth-Still confidence intervals.

Example 4 (UCLA Example, Continued). As in Example 2, suppose that in a random sample of 25 students from UCLA, we observe two Hispanic students. Figure 7(a) shows Rguroo’s dialog for obtaining a 95% confidence interval using both the Blyth-Still and Clopper-Pearson methods. The Blyth-Still method is the default method and is calculated by checking the Binomial(Exact) option in the Basics dialog. To obtain the Clopper-Pearson interval, the option Binomial(Exact-CP) is selected in Rguroo’s Details dialog.

Figure 7(b) shows Rguroo’s output. The Blyth-Still confidence interval is (0.0144, 0.2559), which is narrower than the Clopper-Pearson interval (0.0098, 0.2603). It is worth noting that the binom.test() function in R computes the [...] computing the Blyth-Still confidence interval. The Clopper-Pearson interval is shown in the R output in Figure 6 and agrees with that computed by Rguroo.

3. Teaching Mean Inference without Standardized Statistics

In many introductory texts, inference about a population mean is often introduced using the z- and t-based methods. To make inference about a population mean μ, the sample mean X̄ from a sample of size n is used as the statistic of choice. Using the central limit theorem, for a sufficiently large sample, the sampling distribution of X̄ is

\[ \bar{X} \sim N\left(\mu, \sigma_{\bar{x}}\right), \tag{6} \]

where σ_x̄ = σ/√n denotes the standard error of X̄, with σ denoting the population standard deviation. Almost every textbook standardizes the statistic X̄ by using the transformation

\[ Z = \frac{\bar{X} - \mu}{\sigma_{\bar{x}}} \sim N(0, 1). \tag{7} \]

When σ is not known, the estimate of the standard error σ̂_x̄ = s/√n is used in place of the standard error in (7), where s is the sample standard deviation.

When the population distribution of the variable under study has a normal distribution and σ is unknown, the t-statistic

\[ T = \frac{\bar{X} - \mu}{\hat{\sigma}_{\bar{x}}} \sim t_{n-1}, \tag{8} \]

is used, where t_{n−1} denotes the standard Student t distribution with n − 1 degrees of freedom.

In teaching mean inference, we have two recommendations:

1. Begin by assuming that the population distribution of the variable under study is normal.
2. Avoid using standardized statistics such as T and Z, shown in (7) and (8), and use the distribution of X̄ directly.

Recommendation 1 is motivated by the idea that students would not need to have knowledge of the central limit theorem, nor would they need to be concerned with small or large sample sizes, while they begin to learn the basics of mean inference. We teach the central limit theorem only after introducing elements of hypothesis testing and confidence intervals for proportions and means assuming normality. In addition to reducing complexity, this delay lets us teach the central limit theorem when students have reached more maturity about statistics and sampling distributions.

Regarding the second recommendation, standardization is an age-old practice that is inherited from the times when probability calculators were not readily available, and we were forced to use tables for computing probabilities. As previously mentioned, surprisingly many textbooks continue to include probability tables. As we will show, putting aside this tradition and using a probability calculator from software reduces the steps needed to perform a hypothesis test or construct a confidence interval. Furthermore, avoiding standardization allows for a conceptual understanding of inferential methods, since
Clopper-Pearson interval and does not have an option for we work in the units of the data rather than the standardized

Figure 7. The binomial confidence intervals for the UCLA Ethnicity example.
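As a rough illustration of what "inverting a test" means computationally for Example 4, the following Python sketch finds the Clopper-Pearson limits for 2 successes in 25 trials by solving the two one-sided exact binomial tail equations. This is not Rguroo's or R's implementation; the helper names `binom_cdf`, `bisect`, and `clopper_pearson` are hypothetical, and the Blyth-Still interval is not attempted here because its construction is more involved.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Invert two one-sided exact binomial tests, each at level alpha/2."""
    # Lower limit: the p0 solving P(X >= x | p0) = alpha/2 (0 when x == 0).
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: the p0 solving P(X <= x | p0) = alpha/2 (1 when x == n).
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

lower, upper = clopper_pearson(2, 25)   # counts from Example 4
print(f"Clopper-Pearson 95% interval: ({lower:.4f}, {upper:.4f})")
```

The limits should agree, up to rounding, with the Clopper-Pearson interval quoted in Example 4. Every p0 strictly between the two limits is a value that neither one-sided test rejects at level α/2.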

Z and T statistics which are unit-less and cannot be directly interpreted in the context of a problem.

In the following two sections, we will outline the steps that we propose for hypothesis testing and developing confidence intervals for a population mean and include some examples.

3.1. Hypothesis Testing

We present the ideas in this section using a two-sided hypothesis test which has the form

$$\begin{aligned} H_0 &: \mu = \mu_0 \\ H_a &: \mu \neq \mu_0, \end{aligned} \tag{9}$$

where μ denotes the population mean and μ0 is a fixed hypothesized value. The methods that we present can be adapted to one-sided tests where Ha : μ < μ0 or Ha : μ > μ0. Although we find it easier to use one-sided tests when initially explaining the concepts of rejection regions and p-values, we choose to work with two-sided tests here since they are more commonplace in the real world.

3.1.1. Decision Based on a Critical Region: the Normal Distribution Case (σ Known)

Consider testing the hypotheses in (9) at a significance level α, where we assume that the population is normal and therefore, regardless of the sample size, the sample mean has the normal distribution shown in (6). Let x̄obs denote the observed sample mean. Table 1 compares the steps required to obtain the critical region when using the standardized Z statistic (left panel) versus using the distribution of X̄ directly (right panel).

As we see in Table 1, using the Z statistic involves five steps, whereas using the distribution of X̄ directly requires three steps. In both methods we would introduce the sampling distribution X̄ ∼ N(μ0, σX̄). Also, to obtain the critical values, both methods require the α/2 and 1 − α/2 quantiles of the normal distribution. In the standardized case, we obtain the upper critical value z∗ using P(Z > z∗) = α/2, where Z ∼ N(0, 1), and the lower critical value is −z∗ (Step 3). In the direct method, we obtain the lower critical value xL∗ using P(X̄ < xL∗) = α/2 and the upper critical value using P(X̄ > xU∗) = α/2, where X̄ ∼ N(μ0, σX̄) (Step 2). The two additional steps in using the Z statistic involve introducing the standardized statistic Z (Step 2) and computing

Table 1. Comparing steps to obtain the critical region using the standardized Z statistic versus using the distribution of X̄ directly for a two-sided test.

Using standardized Z:
Step 1: X̄ ∼ N(μ0, σX̄).
Step 2: Introduce Z = (X̄ − μ0)/σX̄ ∼ N(0, 1).
Step 3: Obtain the critical value z∗ such that P(Z > z∗) = α/2.
Step 4: Standardize x̄obs: zobs = (x̄obs − μ0)/σX̄.
Step 5: Reject H0 if zobs > z∗ or zobs < −z∗.

Using distribution of X̄ directly:
Step 1: X̄ ∼ N(μ0, σX̄).
Step 2: Obtain the lower and upper critical values such that P(X̄ < xL∗) = α/2 and P(X̄ > xU∗) = α/2.
Step 3: Reject H0 if x̄obs < xL∗ or x̄obs > xU∗.

Figure 8. Rguroo dialog for specifying parameters for the SAT Example.

zobs, the standardized value of the observed statistic x̄obs (Step 4).

When using the standardized Z statistic method, the required probabilities can be looked up in a probability table. While this was a major advantage in the not-too-distant past when probability calculators were not easily accessible, today spending the time to teach students how to use a probability table is no longer a good use of our class time. We can compute probabilities and inverse probabilities using probability calculators that are widely available in software. More importantly, the critical values xL∗ and xU∗ and the observed value x̄obs are in the units of the observed data. Contrast this with explaining the standardized values zobs and z∗, which are unit-less and do not directly relate to the context of the problem. Let’s consider an example to illustrate this point.

Example 5 (SAT Example). According to the College Board’s SAT Suite of Assessment Annual report, 1,509,133 high school students took the SAT exam in 2021. For these students, the mean for the math portion of the exam was 528 out of 800 with a standard deviation of 120. In Fall semester 2022, we asked our students at California State University, Fullerton (CSUF) to take a random sample of 50 first year students and use their sample data to investigate whether the mean math SAT score for students at the CSUF campus for that year was significantly different from the national average. The mean math SAT score for our sample was x̄obs = 565. Considering the standard deviation of 120 as the population standard deviation, we perform the following test

$$\begin{aligned} H_0 &: \mu = 528 \\ H_a &: \mu \neq 528, \end{aligned} \tag{10}$$

at the α = 0.05 level, where μ is the mean math SAT score for CSUF freshmen students who started in Fall 2022.

An option is to use a probability calculator to obtain the critical values. However, to get a more detailed output, we use Rguroo’s One Population Mean Inference function. The left image in Figure 8 shows Rguroo’s Basic dialog, where we specify the required parameters to perform the test. The right image in Figure 8 is the Details dialog, where we have checked the options of P-Value Graph and Critical Region Graph to obtain graphs that can be used to visually explain the p-value and critical region.

Figure 9 shows a portion of the resulting Rguroo output report. Above the table, the alternative hypothesis is stated in words “Mean of CSUF SAT Math Scores is not equal to 528,” and the 2.5% lower and upper critical values of xL∗ = 494.74 and xU∗ = 561.26 are shown, respectively. The table consists of lower and upper critical Z scores of −1.96 and 1.96, respectively, and the standardized observed value of zobs = 2.18.

Figure 9. Rguroo output showing critical region graph for the SAT Example.
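The two routes compared in Table 1 can be sketched for the SAT example with Python's standard-library `statistics.NormalDist`. This is only an illustration of the arithmetic, not the Rguroo computation; the variable names are our own, and the numbers (μ0 = 528, σ = 120, n = 50, x̄obs = 565, α = 0.05) are those given in Example 5.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 528, 120, 50, 0.05   # values from Example 5
x_obs = 565
se = sigma / sqrt(n)                         # standard error of the sample mean

# Direct method: quantiles of the null distribution X-bar ~ N(mu0, se)
null_dist = NormalDist(mu=mu0, sigma=se)
x_lower = null_dist.inv_cdf(alpha / 2)
x_upper = null_dist.inv_cdf(1 - alpha / 2)
reject_direct = x_obs < x_lower or x_obs > x_upper

# Standardized method: the same decision, expressed in unit-less z values
z_star = NormalDist().inv_cdf(1 - alpha / 2)
z_obs = (x_obs - mu0) / se
reject_z = abs(z_obs) > z_star

print(f"critical values (data scale): {x_lower:.2f}, {x_upper:.2f}")
print(f"z* = {z_star:.2f}, z_obs = {z_obs:.2f}")
print("reject H0:", reject_direct, reject_z)
```

Both routes necessarily reach the same decision; the difference is that `x_lower` and `x_upper` are SAT points, whereas `z_star` and `z_obs` are unit-less.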

Now let’s consider two possible explanations of these results to students:

Using standardized values: If zobs, the z-value corresponding to our observed sample mean, either exceeds the critical value 1.96 or is less than −1.96, we reject the null hypothesis. In this example, our observed z value is 2.18 which falls in the critical region. Therefore, we reject H0 and conclude that there is sufficient evidence at the 5% level that the mean math SAT scores of CSUF students is significantly different from the national average of 528.

Using data-scaled values: If our observed sample mean x̄obs of math SAT scores is less than the lower critical value of 495 or exceeds the upper critical value of 561, we would reject the null hypothesis. Since our sample mean of 565 is larger than 561 and falls in the critical region, we conclude that there is sufficient evidence at the 5% level that the mean Math SAT scores of CSUF students is significantly different from the nation’s average of 528.

The main idea here is to teach our students the concept of “significantly different” than the hypothesized value. In this example, it makes conceptual sense when we explain to our students that “if our observed sample mean for the Math SAT score is outside of the plausible range of 495 to 561, we would consider it significantly different than the hypothesized value of 528.” Compare this to stating the rule using the standardized values of 2.18 and the range −1.96 to 1.96, which don’t have a direct interpretation in the context of our problem.

Figure 9 shows two critical region graphs. The graph on the left shows the null distribution in the scale of the observed data, and that on the right shows the null distribution in the standardized z-scale. The red-shaded regions in both graphs show the critical region. The green triangle on the left graph points to the location of the observed sample mean 565, while that on the right graph shows the location of the standardized observed sample mean 2.18. Both green triangles fall in the critical (red) region, indicating that we should reject the null hypothesis. The graph on the left is helpful in explaining the concept of the critical region. Specifically, it shows the distribution of the sample mean X̄ in the scale of the data, centered at the hypothesized null value of μ = 528. By looking at this distribution, students can see plausible values for the sample mean and make comparisons to the observed sample mean directly. On the other hand, the standardized graph on the right

Table 2. Comparing steps to obtain the p-value using the standardized Z statistic versus using the distribution of X̄ directly for a two-sided test.

Using standardized Z:
Step 1: X̄ ∼ N(μ0, σX̄).
Step 2: Introduce Z = (X̄ − μ0)/σX̄ ∼ N(0, 1).
Step 3: Standardize x̄obs: zobs = (x̄obs − μ0)/σX̄.
Step 4: Compute p-value = 2 × P(Z ≤ zobs) if zobs < 0, or 2 × P(Z ≥ zobs) if zobs ≥ 0.
Step 5: Reject H0 if the p-value < α.

Using distribution of X̄ directly:
Step 1: X̄ ∼ N(μ0, σX̄).
Step 2: Compute p-value = 2 × P(X̄ ≤ x̄obs) if x̄obs < μ0, or 2 × P(X̄ ≥ x̄obs) if x̄obs ≥ μ0.
Step 3: Reject H0 if the p-value < α.

Figure 10. Rguroo output showing p-value graph for the SAT Example.

is centered at zero, which has no direct connection to the stated null hypothesis and fails to contextualize the variability of the SAT math scores.

3.1.2. Decision Based on a p-Value: The Normal Distribution Case (σ Known)

Table 2 shows the steps required in teaching and computing p-values when using the commonly used standardized statistic Z versus the un-standardized X̄. Similar to the critical region case, the standardized case involves more steps than directly using X̄. As in the exact binomial case, to introduce the p-value for testing hypothesis about μ, we define the Ha support as the X̄-values that are as favorable or more favorable to Ha than x̄obs, provided that the null hypothesis H0 is true; here, favorability is determined based on density values. Specifically, when looking at the probability density function of X̄ ∼ N(μ0, σX̄), X̄-values for which the density is less than or equal to the density at x̄obs are as favorable or more favorable than x̄obs to the alternative hypothesis. Then, as before, we define the p-value as P(X̄ belongs to the Ha support). As we show in the next example, it is very helpful to explain these concepts using a p-value graph.

Example 6 (SAT Example, Continued). Figure 10 shows the p-value graphs for Example 5, the SAT Example. The left panel shows the p-value graph in the scale of the data and the right panel shows the p-value graph in the z scale. Again, because the graph on the left panel is in the scale of the data it can be used to explain the p-value conceptually, whereas the graph on the right hand side, which is in the standard scale, is not quite as easily interpretable in the context of the data. So, we continue with the graph on the left. The green triangle points to the location of the observed sample mean of x̄obs = 565. The orange horizontal dashed-line is drawn at the height of the density at x̄obs. We explain to students that the Ha support (indicated by the red-shaded region) consists of all X̄ values (sample mean values) for which the density curve falls below the orange dashed-line; these values are as favorable or more favorable to the alternative hypothesis than x̄obs. Thus, computing P(X̄ belongs to the Ha support) amounts to finding the area of the red-shaded region. Note that by using this explanation, we justify the multiplication by 2 in the p-value formula for the two-sided test.

Figure 11 shows an R function for computing the p-value for a two-sided test with the normality assumption. We use this function to calculate the p-value for this example. This p-value

Figure 11. An R function for obtaining the p-value for the two-sided normal-theory-based test with the output for the SAT Example.
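The R function in Figure 11 is not reproduced in the text. As a stand-in, here is a minimal Python sketch of the same idea (our own, not the authors' code), written to mirror the Ha support explanation: because the normal density is symmetric about μ0, the X̄-values whose density is at or below the density at x̄obs are exactly those at least as far from μ0 as x̄obs, which gives two tails of equal area. The function name `two_sided_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(x_obs, mu0, se):
    """P(X-bar in Ha support) under H0, with X-bar ~ N(mu0, se)."""
    null_dist = NormalDist(mu=mu0, sigma=se)
    d = abs(x_obs - mu0)
    # The density is at or below its height at x_obs exactly when
    # |x - mu0| >= d, so the Ha support is the union of two tails.
    left_tail = null_dist.cdf(mu0 - d)
    right_tail = 1 - null_dist.cdf(mu0 + d)
    return left_tail + right_tail

# SAT example: mu0 = 528, sigma = 120, n = 50, observed mean 565
p_value = two_sided_p_value(x_obs=565, mu0=528, se=120 / sqrt(50))
print(f"two-sided p-value: {p_value:.4f}")
```

Since the two tails have equal area, the sum equals twice either tail, which is the factor of 2 in Step 2 of the direct method in Table 2.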

Table 3. Comparing steps to obtain the critical region using the standardized T statistic versus using the distribution of X̄ directly for a two-sided test.

Using standardization:
Step 1: Introduce the statistic T = (X̄ − μ0)/σ̂X̄ ∼ tn−1.
Step 2: Obtain the critical value t∗ such that P(T > t∗) = α/2.
Step 3: Standardize x̄obs: tobs = (x̄obs − μ0)/σ̂X̄.
Step 4: Reject H0 if tobs > t∗ or tobs < −t∗.

Not using standardization:
Step 1: Introduce the statistic X̄ ∼ tn−1(mean = μ0, scale = σ̂X̄).
Step 2: Obtain the lower and upper critical values such that P(X̄ < xL∗) = α/2 and P(X̄ > xU∗) = α/2.
Step 3: Reject H0 if x̄obs < xL∗ or x̄obs > xU∗.

agrees with that computed in Rguroo and shown in Figure 10. The z.test() function in the BSDA package in R can be used to perform a z-test if raw data is available.

3.1.3. The t-Distribution Case

In our introductory courses, in addition to the general normal distribution with arbitrary mean and standard deviation, we introduce the standard t-distribution through the unit-less T statistic given in (8). Like the general normal distribution, there is also a general t-distribution with an arbitrary location (mean) and scale (standard error). Let σ̂X̄ = s/√n denote the standard error of X̄, where s is the sample standard deviation based on a sample of size n. If T is a random variable which has a standard t-distribution with n − 1 degrees of freedom, then X̄ = σ̂X̄ T + μ has a t-distribution with location μ and scale σ̂X̄. The probability density of X̄ is given by fX̄(x) = (1/σ̂X̄) fT((x − μ)/σ̂X̄), where fT(x) denotes the probability density function of the standard t-distribution with n − 1 degrees of freedom. Using this location-scale family result, if our data come from a normal distribution with unknown population standard deviation, we propose making inference about a population mean μ using the distribution

$$\bar{X} \sim t_{n-1}(\mu, \hat{\sigma}_{\bar{X}}), \tag{11}$$

where tn−1(μ, σ̂X̄) denotes the t distribution with location μ, scale σ̂X̄, and n − 1 degrees of freedom.

The process of teaching the general t-distribution in (11) after students get familiar with the standard t-distribution is similar to that of transitioning from the standard normal distribution to the general normal distribution with an arbitrary mean and standard deviation. Based on our experience, introductory students have no problem understanding the location-scale family concept presented here. Much like using the general normal distribution instead of the standardized Z, using the general t-distribution instead of its standardized counterpart affords our students a better conceptual understanding by working with a statistic that is in the scale of the data. As we will show, computation of probabilities for the general t-distribution can be easily performed using software.

Again, consider conducting the two-sided test in (9) at a significance level α assuming that the population is normal with an unknown standard deviation. Table 3 compares the steps required for making a decision based on the distribution of standardized T statistics (left panel) and using the distribution of X̄ directly (right panel). Like in the normal distribution case, there are fewer steps involved when avoiding standardization. Moreover, the decision in the last step is more conceptual when stated using the xL∗ and xU∗ that are in the scale of the data rather than t∗ that is on the standardized scale. The following is an example.

Example 7 (SAT Example: the t-test). Consider the hypothesis test in Example 5, and assume that the population is normal with sample standard deviation s = 120 based on a sample of size 50. In this case, our null distribution will be

$$\bar{X} \sim t_{49}(\text{location} = 528,\ \text{scale} = 120/\sqrt{50}). \tag{12}$$

Figure 12 shows Rguroo’s probability calculator for computing the critical values xL∗ and xU∗. In the probability dialog shown in Figure 12(a), we select the option Probability ⇒

Figure 12. Calculating the critical region for Example 7.

Values to obtain an inverse probability, and fill in the values for DF (degrees of freedom = 49), Center (location = 528), and Scale (standard error = 120/√50 ≈ 16.971). In the drop-down menu, we select the option Outside Tails and specify a probability value of 0.025 for each of the left and right tails of the distribution. The probability calculator shows the critical values of xL∗ = 493.9 and xU∗ = 562.1. Figure 12(b) shows the density of X̄ ∼ tn−1(μ, σ̂X̄), with the critical values marked with small green triangles.

To get a more detailed output we can perform this test in Rguroo’s One Population Mean Inference function. The input for this case is exactly the same as that shown in Figure 8 with two exceptions: instead of filling in the population standard deviation we fill in the Sample S.d. with the value of 120, and we select the option t-statistic.

Figure 13 shows the resulting output. This output consists of the critical region, a critical region graph, the p-value, and a p-value graph. All of these quantities are reported both in standard

Figure 13. Rguroo’s mean inference output for Example 7.
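To make the location-scale idea in (11) and (12) concrete, the following Python sketch computes the data-scale critical values and the two-sided p-value for Example 7 using only the standard library. Python has no built-in t distribution, so the sketch integrates the standard t density numerically with Simpson's rule and finds quantiles by bisection; these numerical choices, and the helper names `t_pdf`, `t_cdf`, and `t_quantile`, are our own assumptions and are not how Rguroo or R compute these quantities.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, df):
    """Density of the standard Student t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), by Simpson's rule on [0, |t|] plus symmetry about zero."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(prob, df):
    """Standard t quantile by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example 7: null distribution t_49(location = 528, scale = 120/sqrt(50))
df, mu0, x_obs = 49, 528, 565
se_hat = 120 / sqrt(50)

# Location-scale relation X-bar = mu0 + se_hat * T: quantiles shift and rescale
x_lower = mu0 + se_hat * t_quantile(0.025, df)
x_upper = mu0 + se_hat * t_quantile(0.975, df)

# Two-sided p-value: both tails at least as far from mu0 as x_obs
d = abs(x_obs - mu0)
p_value = 2 * (1 - t_cdf(d / se_hat, df))

print(f"critical values: {x_lower:.1f}, {x_upper:.1f}")
print(f"p-value: {p_value:.4f}")
```

The critical values should match those reported by the probability calculator in Example 7 up to rounding. In practice a statistics library would replace the hand-rolled integration; it is spelled out here only to keep the sketch self-contained.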

form, using the T statistic, and in the scale of the data using the distribution of X̄ directly.

Figure 14 shows an R function for computing the p-value for a two-sided t-test. The p-value for this example is computed using this function, and it agrees with that computed in the Rguroo report, shown in Figure 13. The t.test() function in R can be used to perform a t-test if raw data is available.

Note that the process of obtaining p-values using the distribution in (11) is similar to that shown in Table 2. Changing Z

Figure 14. An R function for obtaining the p-value for the two-sided t-test with the output for the SAT Example.

to T and replacing the N(μ0, σX̄) with tn−1(μ0, σ̂X̄) in Table 2 forms the steps for obtaining the p-value in the t-distribution case.

3.2. Confidence Intervals

As in proportion inference, we recommend teaching confidence intervals for a population mean after teaching hypothesis testing. Assuming a normal population, the most commonly used method for obtaining a confidence interval is

$$\bar{X} \pm \text{margin of error}, \tag{13}$$

where X̄ is the sample mean, and the margin of error is z∗ × σX̄ if σ is known, or t∗ × σ̂X̄ if σ is unknown, with z∗ and t∗ denoting the 1 − α/2 quantile of the standard normal and the Student t-distributions, respectively. For the most part, this formula is simple for students to use, and confidence intervals for a population mean can be computed easily by hand or using software. To interpret a confidence interval, we typically teach students to use the template: “We are 100(1−α)% confident that the true population parameter lies between the lower bound and the upper bound.”

Beyond these formulas and routine interpretations, it is important for students to get a more thorough understanding of a confidence interval. For instance, students should be taught the classical interpretation that a constructed confidence interval would contain the true mean 100(1 − α)% of the time in repeated experiments. To demonstrate this, we often use applets and simulations. Applets are very useful in teaching the classical interpretation of confidence intervals, but beyond mechanically rerunning the applets, it is difficult to devise elementary-level exercises for introductory students that drill down the idea. Again, we recommend using the concept of inverting a test to obtain confidence intervals for a population mean. As we explain, introducing this inversion method provides students with a hands-on opportunity to construct confidence intervals and develop a conceptual understanding for them.

For a given value of x̄obs, we describe a 100(1−α)% confidence interval for the population mean μ as all values of μ0 for which the two-sided test in (9) is not rejected. As a hands-on activity, we ask students to construct a confidence interval using a given value of x̄obs and σX̄ by creating a grid of μ0-values around x̄obs and performing repeated tests at each value of the grid. A confidence interval for μ in this case would be the grid boundaries for which we do not reject the null hypothesis. The following is a specific example that we give to students to work on in groups.

Example 8 (Inverting a Test: Group Activity). Consider a situation where the sample size n = 100, x̄obs = 10, and the population standard deviation σ = 20, which leads to σX̄ = 20/√100 = 2. We ask our students to perform the test of hypothesis in (9) at a significance level α = 0.05 for the following values of μ0: 5, 6, 7, 8, 10, 12, 13, 14, and 15 and form a table of p-values for each μ0. Then, students would determine for which values of μ0 the hypothesis H0 : μ = μ0 is rejected. Based on this exercise, we ask our students to guess the greatest value less than x̄obs and the smallest value greater than x̄obs for which they would not reject the null hypothesis. We hint that the smallest and the largest values may not be one of the μ0 values that they have tested.

Students should obtain the following p-value table:

μ0        5       6       7       8       10     12      13      14      15
p-value   0.0124  0.0455  0.1336  0.3173  1.00   0.3173  0.1336  0.0455  0.0124

As seen from the table, H0 is not rejected at the α = 0.05 level for the values 7, 8, 10, 12, and 13, and it is rejected for the values 5, 6, 14, and 15. In obtaining a guess for the lower- and upper-bound of the confidence interval, most student groups in our classes are able to deduce that the lower-bound value should be between 6 and 7 and the upper-bound value should be between 13 and 14. At this stage, we ask groups to present their guesses. Then, we compute the confidence interval using the formula in (13). For the most part, the guesses should be fairly close to the actual confidence interval bounds. This gives us an opportunity to discuss the difference between the actual 1.96 multiplier and the values of μ0, 6 and 14, that were exactly 2 standard errors away from x̄obs. In our experience, this has been an engaging activity that helps students understand the relationship between confidence intervals and tests of hypotheses.

Depending on the level of students in your class, you may further exploit the duality between confidence intervals and

Figure 15. Relationship between confidence interval and acceptance region for one population mean inference.
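The grid-based inversion in Example 8 can be sketched in a few lines of Python. This is an illustrative script rather than classroom material from the authors: it first reproduces the coarse p-value table, then repeats the test on a finer grid (the step of 0.01 is an arbitrary choice of ours) and compares the resulting bounds with formula (13), taking z∗ as the 1 − α/2 quantile.

```python
from math import sqrt
from statistics import NormalDist

n, x_obs, sigma, alpha = 100, 10, 20, 0.05   # values from Example 8
se = sigma / sqrt(n)

def p_value(mu0):
    """Two-sided p-value for H0: mu = mu0 when X-bar ~ N(mu0, se)."""
    d = abs(x_obs - mu0)
    return 2 * (1 - NormalDist(mu=mu0, sigma=se).cdf(mu0 + d))

# Coarse grid used in the group activity
for mu0 in [5, 6, 7, 8, 10, 12, 13, 14, 15]:
    print(mu0, round(p_value(mu0), 4))

# Finer grid (step 0.01): keep the mu0 values that are not rejected
grid = [i / 100 for i in range(500, 1501)]
kept = [mu0 for mu0 in grid if p_value(mu0) >= alpha]
grid_interval = (min(kept), max(kept))

# Formula (13) with z* taken as the 1 - alpha/2 quantile
z_star = NormalDist().inv_cdf(1 - alpha / 2)
formula_interval = (x_obs - z_star * se, x_obs + z_star * se)

print("grid-based interval:", grid_interval)
print("formula interval:   ", formula_interval)
```

The grid-based bounds can differ from the formula by up to one grid step, which is a useful prompt for discussing why the tested values 6 and 14 (exactly 2 standard errors from x̄obs) are rejected while values just inside 1.96 standard errors are not.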

tests of hypotheses. The general idea is that the confidence interval fixes the sample value (observed sample mean x̄obs) and asks for what values of μ0 the hypothesis (9) is not rejected. On the other hand, the hypothesis test fixes the parameter μ0 and asks for what values of x̄obs we do not reject H0 (i.e., what is the acceptance region?). This concept is illustrated in Figure 15, where the blue-shaded region is the region where

$$\mu - z^{*} \sigma_{\bar{X}} \le \bar{X} \le \mu + z^{*} \sigma_{\bar{X}}.$$

Therefore, for a given μ0, as shown on the horizontal axis, we get the acceptance region, shown on the vertical axis, depicting all X̄ for which the null hypothesis H0 : μ = μ0 is not rejected.

Analogously, the blue-shaded region can be thought of as the region where

$$\bar{X} - z^{*} \sigma_{\bar{X}} \le \mu \le \bar{X} + z^{*} \sigma_{\bar{X}}.$$

Therefore, for a given x̄obs, as shown on the vertical axis, we get the confidence interval, shown on the horizontal axis, depicting all μ0 values for which the null hypothesis H0 : μ = μ0 is not rejected.

4. Summary and Discussion

Introductory statistics courses have adapted to technological advancements over the years by including instructions on how to perform statistical analyses using software. To use the full capabilities of statistical software, we should not simply use software as a means of avoiding by-hand computations but take advantage of its potential to demonstrate and teach statistical concepts. The following roadmap for teaching proportion and mean inference aims to achieve the latter in line with GAISE guidelines to place emphasis on conceptual understanding:

1. Teach discrete random variables, and in particular the binomial distribution.
2. Teach concepts of variability utilizing in-class activities and simulating from a binomial random variable.
3. Introduce elements of test of hypothesis (null and alternative hypotheses, Type I and Type II errors) in the context of the one-population proportion problem.
4. Use the exact binomial test to perform tests about a one population proportion.
5. Use the idea of inverting a test to obtain confidence intervals for a population proportion and use software to obtain the Blyth-Still confidence intervals.
6. Teach continuous random variables and in particular the general normal and t distributions.
7. Teach mean inference, assuming a normal population distribution, and use the distribution of the sample mean directly; avoid standardized statistics, the central limit theorem, or simulation-based methods.
8. Teach the central limit theorem using simulation and applets.
9. Adapt the normal-theory inference methods to the nonnormal case by utilizing the central limit theorem result.

It is important that students get familiar with the concept of sampling variability before learning statistical inference. As noted by Chance and Rossman (2001), the context of proportion inference is helpful in teaching sampling variability since simulation of binary data, whether through class activities or via computer software, is relatively easy to implement and understand.

Using the exact binomial method when beginning to teach hypothesis testing provides a smooth segue into learning

inference concepts since the binomial distribution is often introduced early on in introductory courses. Furthermore, using this method gives us the opportunity to present inference in the context-friendly setting of count data and allows us to put aside any assumptions on the sample size n and proportion p that one would have when using the asymptotic distribution of p̂.

Teaching the Blyth-Still confidence intervals naturally follows from using the exact binomial test in proportion inference. Although we need to use software to compute this interval, it allows us to talk about inverting a hypothesis test, which provides students with a conceptual view of a confidence interval. As a side note, since the distribution of p̂ is symmetric or nearly symmetric for large samples, half of the width of a Blyth-Still confidence interval gives a good measure of the margin of error. In small sample sizes, the distribution symmetry is not guaranteed, and one cannot obtain an interpretable margin of error.

Regarding mean inference, standardized statistics such as z and t obscure the context of inference problems. By teaching mean inference using the distribution of the sample mean X̄ directly, we allow students to work with a more tangible statistic that is in the units of the data. Furthermore, this reduces the number of steps required to perform a hypothesis test for a population mean. To teach mean inference, we begin by assuming that the population is normal. This prevents students from being distracted by unneeded concepts such as the central limit theorem or needing to understand simulation machinery and methods. Only when students have gained more maturity with sample variability through basic methods should we resort to simulation to introduce the central limit theorem.

It is fair to ask how these methods would extend to the two-population case where we compare proportions or means. We can resort to Fisher’s exact test to teach an exact method for the two-population proportion problem. However, we like the permutation test advocated by GAISE (2016) because it can be introduced nicely through class activities and is intuitive. Permutation tests are included in the Common Core State Standards and have been part of a number of states’ curricula for many years. As for mean inference, the one-population methods described in Section 3 can be generalized to the two-population case where we use the distribution of the difference of sample means for the two populations directly instead of standardizing them. This will have similar advantages to those we discussed for the one population case.

The methods discussed in this article emanate from the first author’s many years of experience in teaching introductory courses and the second author’s perspective as a student. The first author’s department offers a general education introductory statistics course that is taken by students majoring mainly in the natural sciences, in particular biology, chemistry, and geol-

students have been performing significantly better in answering conceptual questions given in their exams as compared to when the first author used more traditional approaches to teach inference.

References

Agresti, A., Franklin, C., and Klingenberg, B. (2017), Statistics: The Art and Science of Learning from Data (4th ed.), Boston: Pearson.
Agresti, A., and Coull, B. A. (1998), “Approximate is Better than Exact for Interval Estimation of Binomial Proportions,” The American Statistician, 52, 119–126. DOI:10.1080/00031305.1998.10480550.
Blair, R., Kirkman, E. E., and Maxwell, J. W. (2018), Statistical Abstract of Undergraduate Programs in the Mathematical Sciences in the United States, Fall 2015 CBMS Survey, The American Mathematical Society, available at https://www.ams.org/profession/data/cbms-survey/cbms2015-Report.pdf.
Bluman, A. G. (2019), Elementary Statistics, a Step by Step Approach (8th ed.), New York: McGraw Hill.
Blyth, C., and Still, H. (1983), “Binomial Confidence Intervals,” Journal of the American Statistical Association, 78, 108–116. DOI:10.1080/01621459.1983.10477938.
Casella, G., and Berger, R. L. (2002), Statistical Inference, Pacific Grove: Duxbury.
Chance, B., and Rossman, A. (2001), “Sequencing Topics in Introductory Statistics: A Debate on What to Teach When,” The American Statistician, 55, 140–144. DOI:10.1198/000313001750358626.
Clopper, C. J., and Pearson, E. S. (1934), “The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial,” Biometrika, 26, 404–413. DOI:10.1093/biomet/26.4.404.
Cobb, G. (2006), “Characterizations of Exemplary Statistics Instructions: Five Approaches,” Journal of Statistics Education, 14, 1–13.
Cobb, G. (2015), “Mere Renovation is Too Little Too Late: We Need to Rethink Our Undergraduate Curriculum from the Ground Up,” The American Statistician, 69, 266–282. DOI:10.1080/00031305.2015.1093029.
De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2018), Intro Stats (5th ed.), Boston: Pearson.
Diez, D., Cetinkaya-Rundel, M., and Barr, C. D. (2019), OpenIntro Statistics, Openintro.org, available at https://www.openintro.org/book/os.
GAISE. (2005), Guidelines for Assessment and Instructions in Statistics Education. PreK-12 and College Report. Alexandria, VA: American Statistical Association.
GAISE College Report ASA Revision Committee. (2016), Guidelines for Assessment and Instruction in Statistics Education College Report 2016. Available at http://www.amstat.org/education/gaise.
Gould, R., Ryan, C., and Wong, R. (2016), Essential Statistics: Exploring the World Through Data, Boston: Pearson.
Hawkes, J. S. (2019), Discovering Statistics and Data (3rd ed.), Mt Pleasant, SC: Hawkes Learning.
Johnson, R. A., and Bhattacharyya, G. K. (2019), Statistics: Principles and Methods, Hoboken, NJ: Wiley.
Lock, R. H., Lock, P. F., Lock, K., Lock, E. F., and Lock, D. F. (2021), Statistics: Unlocking the Power of Data, New York: Wiley.
Mann, P. S. (2021), Introductory Statistics (10th ed.), New York: Wiley.
McClave, J., and Sincich, T. (2017), Statistics (13th ed.), Boston: Pearson.
Moore, D. S., Notz, W. I., and Fligner, M. A. (2021a), The Basic Practice of Statistics (9th ed.), New York: W. H. Freeman and Company.
(2021b), Introduction to the Practice of Statistics (10th ed.), New
ogy. In teaching this course, he has implemented the approach York: W. H. Freeman and Company.
outlined in this article in the past few years. While no formal National Academies of Science, Engineering, and Medicine. (2018),
Data Science for Undergraduates: Opportunities and Options, National
study has been conducted to compare the proposed meth- Academies. Available at https://nap.nationalacademies.org/read/25104.
ods’ effectiveness to more commonly used or simulation-based Navidi, W., and Monk, B. (2022a), Elementary Statistics (4th ed.), New York:
methods, student feedback has been overwhelmingly positive. McGraw Hill.
Also, as an indication that the approaches proposed are effective, (2022b), Essential Statistics (3rd ed.), New York: McGraw Hill.
Peck, R., Short, T., and Olsen, C. (2020), Statistics & Data Analysis (6th ed.), Boston: Cengage Learning.
Quinton, N. S., and Hawkes, J. S. (2022), Discovering Business Statistics (2nd ed.), Mt Pleasant, SC: Hawkes Learning.
Robinson, S. (2020), “Mixing It up: The Impact of Resequencing Topics in an Undergraduate Introductory Statistics Course,” PRIMUS, 30, 415–446. DOI:10.1080/10511970.2019.1600178.
Rosner, B. (2016), Fundamentals of Biostatistics, Boston: Cengage Learning.
Tintle, N., Chance, B. L., Cobb, G. W., Rossman, A. J., Roy, S., Swanson, T., and VanderStoep, J. (2021), Introduction to Statistical Investigations, New York: Wiley.
Triola, M. F. (2022), Elementary Statistics (4th ed.), Boston: Pearson.
Warren, C., Denley, K., and Atchley, E. (2021), Beginning Statistics (3rd ed.), Mt Pleasant, SC: Hawkes Learning.
