
Mogull M247 F23 – Final Review

Final Exam Review


This exam covers Lectures 1-23, Chapters 1-11 in the textbook. The first part of this review is the
learning objectives of each lecture. The second part of the review discusses statistical inference, which
has been our most important topic.

You should be reviewing exams, quizzes, lecture notes, textbook, and homework in preparation for the
exam.

As you study, you should consider the following:

• Statistics is about interpretation, so expect me to ask ‘what does it mean?’ or to ask you to
interpret your answer.
• You will be asked conceptual questions to check your understanding of definitions, how
statistics works, and what it all means.
• You should be looking at the examples we did in the lectures for a sense of the types of
questions I write, and how I want them answered.
• You may use a calculator and notes at all times, but you may not work with others.

My lectures are a reorganization of the textbook, so if you read and understand both then you will get
two different perspectives on the same material. The review topics are listed in order of my lecture
notes.

Review by Lecture
Lecture 1: Introduction and Gathering Data (1.1-1.3, 1.5)
1. Discuss the difference between a population and a sample.
2. Discuss why a representative sample is important and how to obtain one.
3. Identify an observation and a variable in a dataset.
4. Identify a numerical variable vs a categorical variable.
5. Discuss and apply the Statistical Investigation Process (Data Cycle).
6. Discuss the differences between descriptive statistics and inferential statistics.
7. Describe information given in a dotplot graph.
8. Describe the difference between explained variability and random variability.
9. Identify a controlled experiment vs an observational study.
10. Discuss the purpose of performing a controlled experiment.
11. Describe how to perform a controlled experiment, and the role of each of the following:
randomization, treatment variable, response variable, confounding variable, treatment group,
control group, blinding, double blind, placebos, and placebo effect.
12. Discuss the purpose of performing an observational study.
13. Discuss whether a study can determine an association or causation between variables.

Lecture 2: Describing Categorical Data and Graphing Data (1.2, 1.4, 2.1-2.5)
1. Discuss the terms frequency, relative frequency, proportion, and percent. Convert between
these.
2. Create and discuss a frequency, relative frequency, and two-way (contingency) table.
3. Discuss the types of graphs that can be created for categorical variables, and the benefits (or
downfalls) of each type of graph. Do the same for numerical variables.
4. Recode categorical data as numerical data.
5. Discuss stacked vs unstacked data, and change the data format between the two.
6. Utilize Statcrunch to complete the following: upload a dataset, create frequency and relative
frequency tables, create contingency tables, unstack data, create a graph, and change settings
of a graph.
7. Discuss how graphs can be misleading, what to check for, and what to avoid.
8. Read information from a graph.
9. Discuss how to determine if an observation is unusual.
10. Discuss the shape, center, and spread of a numerical graph.

Lecture 3: Summary Statistics for a Numerical Variable (3.1, 3.3-3.5)


1. Discuss the different measurements of center for a numerical variable.
a. Which measures are preferred, and how do we identify them on a graph?
b. What notation do we use for the sample mean and population mean?
c. What does the shape of the graph tell us about the mean and median?
2. Discuss the different measures of variability for a numerical variable.
a. Which measures are preferred, and how do we identify them on a graph?
b. What notation do we use to represent these measures (distinguish between sample and
population)?
c. Interpret the standard deviation and IQR in context.
3. Describe the term: resistant (to outliers); discuss which statistics are resistant, and which are
not.
4. Discuss what a percentile is, and how it connects to Q1, the median, and Q3.
5. Discuss how the five-number summary is used, calculate it by hand (for small sets of data), and
identify it on a boxplot.
6. Sketch a boxplot by hand, and identify the parts to the boxplot.
7. Interpret a boxplot, determine if a variable is skewed right, skewed left, or symmetric using a
boxplot, and compare two groups using boxplots.
8. Compute the following by hand (for a small set of data), and be able to use the proper notation
(where applicable): mean, median, mode, range, variance, standard deviation, Q1, Q3, IQR, the
five-number summary.
9. Utilize Statcrunch to complete the following:
a. Calculate summary statistics for a numerical variable (for both stacked and unstacked
data):
i. The mean, median, mode, Q1, Q3, range, IQR, variance, standard deviation,
five-number summary, etc.
ii. Select only the summary statistics of interest for the output, and have them
printed in the same output.
b. Graph boxplots (be able to put multiple boxplots on the same set of axes).

Lecture 4: Measures of Relative Standing (3.2)


1. Describe, calculate, and interpret a percentile.
2. Describe, calculate, and interpret a z-score.
3. Discuss what standard units refer to.
4. Find the value of an observation, if given a z-score.
5. Compare two observations using z-scores.
6. Identify unusual z-scores.
7. Describe the meaning of the Empirical Rule.
8. Calculate intervals within 1, 2, and 3 standard deviations of the mean using the proper notation.
a. Use the Empirical Rule to discuss the percent of observations we expect in each of those
intervals.
b. Use Chebyshev's Rule to discuss the percent of observations that must be in each of
those intervals.
9. Discuss when to use the Empirical Rule vs Chebyshev's Rule.
10. Use the Empirical Rule to break up a unimodal symmetric distribution into percentages.
11. Find percentiles using the Empirical Rule.
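
The z-score, interval, and Chebyshev objectives above can be checked numerically. The following is a minimal sketch in Python, used here only as a stand-in for by-hand work (the course itself uses Statcrunch). The function names and the exam-score numbers (mean 70, standard deviation 10) are hypothetical. It assumes the usual definitions: z = (x − mean)/SD, Empirical Rule percentages of roughly 68%, 95%, and 99.7%, and Chebyshev's bound of at least 1 − 1/k².

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def value_from_z(z, mean, sd):
    """Recover the observation that corresponds to a given z-score."""
    return mean + z * sd

def interval_within(k, mean, sd):
    """Interval within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)

def chebyshev_min_fraction(k):
    """Chebyshev's Rule: at least this fraction lies within k SDs (any shape, k > 1)."""
    return 1 - 1 / k**2

# Empirical Rule: approximate fractions, for unimodal symmetric distributions only.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

# Hypothetical exam scores: mean 70, standard deviation 10.
mean, sd = 70, 10
print(z_score(85, mean, sd))          # 1.5
print(value_from_z(-2, mean, sd))     # 50
print(interval_within(2, mean, sd))   # (50, 90)
print(chebyshev_min_fraction(2))      # 0.75
```

For the interval (50, 90), the Empirical Rule says about 95% of scores fall inside if the distribution is unimodal and symmetric, while Chebyshev's Rule guarantees at least 75% for any shape.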

Lecture 5: Correlations Between Numerical Variables (4.1-4.2)


1. Create a scatterplot for two numerical variables.
2. Identify and interpret the trend of a scatterplot.
3. Identify linear vs nonlinear scatterplots.
4. Discuss the strength of an association in a scatterplot.
5. Identify outliers in scatterplots.
6. Calculate the correlation coefficient r.
7. Identify the appropriate value of r for a scatterplot.
8. Discuss the properties of r.

Lecture 6: Regression Lines (4.3-4.4)


1. Determine if it is appropriate to calculate a least squares regression line for two variables.
2. Create a least squares regression line using Statcrunch.
3. Identify the y-intercept of a regression line, and interpret its meaning in context.
4. Determine if it is appropriate to interpret the y-intercept for the context of a particular dataset.
5. Identify the slope of a least squares regression line, and interpret the slope in context.
6. Discuss the meaning of a 0 slope for a regression line.
7. Predict values of the response variable for given values of the predictor variable using the
equation of a regression line.
8. Discuss extrapolation with regression lines.
9. Discuss the meaning of residuals.
10. Describe where the name 'least squares regression line' comes from.
11. Calculate the coefficient of determination (𝑟²) using Statcrunch, or by squaring the value of 𝑟.
12. Interpret the meaning of the coefficient of determination in context.
13. Use r-squared to determine which regression line is better.
14. Discuss explained variability, unexplained variability, and random variability.
15. Discuss the potential effect of outliers on a regression line.
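
To see how the slope, intercept, 𝑟, 𝑟², and residuals fit together, here is a minimal sketch in Python that computes them from the standard least squares formulas. The data (hours studied and quiz score) and the function name are hypothetical, and this is only an illustration of the by-hand arithmetic, not a replacement for the Statcrunch output.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope, r) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

a, b, r = least_squares(hours, score)
predicted = [a + b * x for x in hours]                   # points on the line
residuals = [y - p for y, p in zip(score, predicted)]    # observed minus predicted

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"r = {r:.3f}, r-squared = {r**2:.3f}")
print(f"prediction at x = 3.5: {a + b * 3.5:.2f}")
```

For this made-up data the slope is 0.6 (each extra hour is associated with 0.6 more points, on average) and 𝑟² is 0.6 (about 60% of the variation in score is explained by the line). The residuals add up to zero apart from rounding, which is a property of the least squares line.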

Lecture 7: Probability Part I (5.1-5.2)


1. Discuss the importance of probability.
2. Discuss randomness and random experiments.
3. Discuss the three methods for computing probability.
4. Create the sample space for an experiment (using tree diagrams as necessary).
5. Find the size of a sample space.
6. Write the sample points in an event, find its complement, and find the union and intersection of events.
7. Discuss the probability rules.
8. Compute a theoretical probability with equally likely outcomes.

Lecture 8: Probability Part II (5.2-5.3)


1. Combine events using AND and OR.
2. Create a Venn Diagram.
3. Calculate probabilities of AND and OR events with equally likely outcomes.
4. Discuss mutually exclusive events.
5. Apply probability rules to compute the probability of an OR event.
6. Discuss the meaning of conditional probabilities.
7. Discuss the differences in wording "given that" vs "and."
8. Calculate conditional probabilities.
9. Discuss the meaning of independent events vs associated events.
10. Determine if two events are independent or associated by performing calculations.
11. Determine if two events are independent or associated by using reasoning.

Lecture 9: Probability Part III (5.3-5.4)


1. Discuss the differences in the multiplication rule for associated events vs independent events.
2. Calculate the probability of a sequence of associated events.
3. Calculate the probability of a sequence of independent events.
4. Calculate probabilities by using the complement.
5. Create a tree diagram for a sequence of events.
6. Use a tree diagram to calculate the probability of a sequence of events.
7. Discuss the importance of the Law of Large Numbers.
8. Use a simulation to approximate a probability.

Lecture 10: Discrete Probability Distributions (6.1, 6.3)


1. Describe what a random variable is.
2. Identify a random variable as either discrete or continuous.
3. Define a probability distribution.
4. Create a probability distribution for a discrete random variable.
5. Use a probability distribution to answer probability questions.
6. Calculate and interpret the expected value of a discrete random variable.
7. Determine if an experiment is modeled by the Binomial Random Variable.
8. Identify the components of a Binomial Random Variable.
9. Use Statcrunch to calculate probabilities in the Binomial Distribution.
10. Calculate and interpret the expected value of a Binomial Random Variable.
11. Calculate the standard deviation of a Binomial Random Variable.
12. Interpret the expected value of a Binomial Random Variable.
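
The Binomial calculations above can be sketched in a few lines of Python. This is a hedged illustration of the standard formulas (probability of exactly k successes, expected value np, and standard deviation √(np(1 − p))); the function names and the coin-flip example are hypothetical, and Statcrunch remains the tool the course uses.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k), found by adding the probabilities of 0 through k successes."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def binomial_mean(n, p):
    """Expected value of a Binomial random variable."""
    return n * p

def binomial_sd(n, p):
    """Standard deviation of a Binomial random variable."""
    return math.sqrt(n * p * (1 - p))

# Hypothetical example: 10 fair coin flips, X = number of heads.
n, p = 10, 0.5
print(binomial_pmf(5, n, p))   # P(X = 5)  = 252/1024, about 0.246
print(binomial_cdf(2, n, p))   # P(X <= 2) = 56/1024, about 0.055
print(binomial_mean(n, p))     # 5.0
print(binomial_sd(n, p))       # sqrt(2.5), about 1.58
```

The expected value of 5 is interpreted as a long-run average: over many sets of 10 flips, we expect about 5 heads per set on average.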


Lecture 11: Continuous Probability Distributions (6.1, 6.2)


1. Describe the differences between how we work with a continuous distribution vs a discrete
distribution.
2. Describe the characteristics of the Uniform Distribution.
3. Identify a histogram as roughly uniform.
4. Calculate the probability distribution function 𝑓(𝑥) for the Uniform Distribution.
5. Calculate and interpret probabilities in the Uniform Distribution.
6. Calculate and interpret percentiles in the Uniform Distribution.
7. Calculate and interpret the expected value of a Uniform Distribution.
8. Calculate the standard deviation of the Uniform Distribution.
9. Describe the characteristics of the Normal Distribution.
10. Describe the characteristics of the Standard Normal Distribution.
11. Convert values between any Normal Distribution and the Standard Normal Distribution.
12. Find probabilities in a real application of the Normal Distribution by computing z-scores,
drawing/shading sketches, and using Statcrunch.
13. Apply the inverse Normal Distribution problems to find percentiles (and similar types of
questions) by drawing and shading a sketch, and converting values of z to values of x.
14. Use Statcrunch to:
a. Calculate probabilities in the Standard Normal Distribution.
b. Calculate probabilities in any Normal Distribution.
c. Calculate percentiles and other inverse Standard Normal Distribution problems.
d. Calculate percentiles and other inverse Normal Distribution problems.
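
As a rough check on the Uniform and Normal objectives above, the following Python sketch uses the standard formulas and the standard library's `statistics.NormalDist`. The bus-wait and height numbers are hypothetical, and Python is only standing in for the sketches and Statcrunch calculations you would do in class.

```python
import math
from statistics import NormalDist

# Uniform Distribution on [a, b] (hypothetical: a bus arrives 0 to 20 minutes from now).
a, b = 0, 20
height = 1 / (b - a)                      # f(x) = 1/(b - a) between a and b
p_5_to_12 = (12 - 5) * height             # P(5 < X < 12) = base * height
pct_90 = a + 0.90 * (b - a)               # 90th percentile
uniform_mean = (a + b) / 2                # expected value
uniform_sd = (b - a) / math.sqrt(12)      # standard deviation

# Normal Distribution (hypothetical: heights with mean 64 inches and SD 3 inches).
heights = NormalDist(mu=64, sigma=3)
z = (70 - 64) / 3                         # convert x = 70 to a z-score
p_below_70 = heights.cdf(70)              # P(X < 70)
p_same_via_z = NormalDist().cdf(z)        # same area in the Standard Normal
x_90 = heights.inv_cdf(0.90)              # 90th percentile (inverse Normal)
z_90 = NormalDist().inv_cdf(0.90)         # 90th percentile of the Standard Normal
x_from_z = 64 + z_90 * 3                  # convert z back to x

print(f"Uniform: P(5 < X < 12) = {p_5_to_12:.2f}, 90th percentile = {pct_90:.1f}")
print(f"Normal:  P(X < 70) = {p_below_70:.4f}, 90th percentile = {x_90:.2f}")
```

Note that the two routes to the Normal answers agree: working directly with the heights distribution, or converting to z first and using the Standard Normal.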

Lecture 12: Survey Sampling and Inference (7.1-7.3)


1. Discuss the population, parameter, sample, and statistic for a particular question.
2. Apply the correct symbol to a parameter or statistic.
3. Discuss possible sources of measurement bias.
4. Discuss possible reasons for selection bias.
5. Discuss the idea of biased statistics.
6. Discuss the idea of a sampling distribution.
7. Discuss the meaning of accuracy and its connection to bias.
8. Discuss the meaning of precision and its connection to standard error.
9. Discuss the effect of changing sample size on accuracy and precision.
10. Calculate the standard error.
11. Discuss the importance of the Central Limit Theorem.
12. Discuss how to check the conditions of the Central Limit Theorem.
13. Apply the Central Limit Theorem to calculate probabilities in a sampling distribution.
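
Here is a minimal Python sketch of objectives 10-13 for a sample proportion. The population proportion (60%) and sample size (150) are hypothetical. The threshold of 10 expected successes and failures is an assumed rule of thumb for the large-sample condition; use whatever conditions we stated in lecture, and remember that the random-and-independent condition cannot be checked by arithmetic.

```python
import math
from statistics import NormalDist

def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def large_sample_ok(p, n, minimum=10):
    """Assumed rule of thumb: expect at least `minimum` successes and failures."""
    return n * p >= minimum and n * (1 - p) >= minimum

# Hypothetical: 60% of a population says Yes; we take a random sample of n = 150.
p, n = 0.60, 150
se = standard_error(p, n)                       # sqrt(0.24/150) = 0.04
sampling_dist = NormalDist(mu=p, sigma=se)      # CLT: p-hat is approximately Normal
prob_above_65 = 1 - sampling_dist.cdf(0.65)     # P(p-hat > 0.65)

print(large_sample_ok(p, n))
print(f"SE = {se:.3f}, P(p-hat > 0.65) = {prob_above_65:.3f}")
```

Quadrupling the sample size to 600 would cut the standard error in half (to 0.02), which is the sense in which larger samples improve precision.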

Lecture 13: Estimating Population Proportions with Confidence Intervals (7.4-7.5)


1. Discuss the idea behind confidence intervals, and terminology relating to confidence intervals.
2. Discuss and calculate values of z* for creating a confidence interval.
3. Calculate a confidence interval for one proportion using the steps: check conditions, calculate
the interval, interpret the interval.
4. Discuss the meaning of a confidence interval for the difference of two proportions.
5. Discuss and determine independent vs dependent samples.
6. Discuss random assignment vs random sampling.
7. Create a confidence interval for the difference of two proportions using the steps: check
conditions, calculate the interval (using Statcrunch), interpret the interval.

Lecture 14: The Four Step Process to Hypothesis Testing for One Proportion (8.1-8.2)
1. Write the null and alternate hypotheses in symbols and words, and discuss their meanings.
2. Describe the difference between a right tail test, a left tail test, and a two tail test. Determine
which test we have.
3. Check the conditions for a hypothesis test and discuss their importance.
4. Discuss significance level and its meaning.
5. Calculate the test statistic and discuss its meaning.
6. Discuss the meaning of a p-value, and calculate it using Statcrunch.
7. Discuss the possible outcomes from a hypothesis test and make the appropriate conclusion.
8. Write the conclusion to a hypothesis test.

Lecture 15: Hypothesis Testing for One Proportion (8.2-8.3)


1. Apply the four step process for a hypothesis test of one proportion.
2. Discuss the importance of Type I and Type II Errors.
3. Determine whether a Type I or a Type II Error could have been made.
4. Interpret a Type I and Type II Error in context.
5. Discuss the connection between a confidence interval and a hypothesis test.
6. Discuss statistical significance vs practical significance.

Lecture 16: Comparing Proportions from Two Populations (8.4)


1. Discuss and write the parameters of interest for a study.
2. Write the null and alternate hypotheses in symbols and in words.
3. Discuss and check the conditions for the test.
4. Use Statcrunch to calculate a hypothesis test for two proportions, and discuss the output.
5. Discuss the meaning of the p-value in context.
6. Make the appropriate conclusion, and interpret this conclusion in words relating to the original
question.
7. Discuss and write Type I and Type II Errors for a specific example.

Lecture 17: Sampling Distribution of the Sample Mean (9.1-9.2)


1. Discuss the type of data that's associated with the mean.
2. Discuss the estimators used for each parameter, and the symbols used for all statistics and
parameters.
3. Discuss the accuracy and precision of the sample mean.
4. Describe the sampling distribution of the sample mean.
5. Describe the shape, center, and spread of the sampling distribution of the sample mean.
6. Calculate the standard error.
7. Discuss the effects of increasing sample size on the sampling distribution.
8. Discuss the Central Limit Theorem for Sample Means.
9. Check the conditions of the CLT.
10. Describe the shape and center of the t-distribution.
11. Discuss why we need to use the t-distribution.
12. Discuss and calculate degrees of freedom.

Lecture 18: Answering Questions about the Mean of a Population (9.3)


1. Discuss the types of questions confidence intervals answer.
2. Discuss the types of questions hypothesis tests answer.
3. Describe a population and a parameter.
4. Check the conditions for a confidence interval of the mean.
5. Calculate a confidence interval for the mean using Statcrunch, and by hand.
6. Find critical values of 𝑡∗.
7. Discuss the two formats for reporting a confidence interval.
8. Interpret a confidence interval.
9. Discuss the effect of changing confidence level.

Lecture 19: Hypothesis Testing for a Population Mean (9.4)


1. Write the parameter in words for a one-sample t-test (hypothesis test about a mean).
2. Write the hypotheses for a one-sample t-test.
3. Check the conditions for a one-sample t-test.
4. Perform calculations for a one-sample t-test by hand and using Statcrunch.
5. Make the appropriate conclusion and interpretation for a one-sample t-test.
6. Discuss the differences between a one-tailed test and a two-tailed test.
7. Discuss the connection between a hypothesis test and a confidence interval, and use a
confidence interval to make a conclusion about a hypothesis test.
8. Decide whether a Type I or a Type II Error might have been made.
9. Write a Type I and Type II Error in words relating to a hypothesis test.

Lecture 20: Comparing Two Population Means (9.5-9.6)


1. Discuss and identify independent vs paired samples.
2. Estimate the difference of means with confidence intervals for independent samples, by
checking conditions, using Statcrunch to create the confidence interval, and interpreting the
interval.
3. Test a hypothesis about two means by using the four-step process.
4. Discuss and interpret Type I and Type II Errors in context.
5. Discuss statistical vs practical significance.
6. Discuss and apply the connection between confidence intervals and hypothesis tests.
7. Determine the correct process to use for a given question.

Lecture 21: Chi-Square Goodness-of-Fit Test (10.1-10.2)


1. Perform a chi-square goodness-of-fit test.
2. Determine if a question should be answered using the chi-square goodness-of-fit test.
3. Discuss the properties of the chi-square distribution.
4. Discuss and interpret errors associated with the chi-square goodness-of-fit test.
5. Discuss the connection between confidence intervals and the goodness-of-fit test.

Lecture 22: Chi-Square Test of Independence (10.1, 10.3)


1. Perform a chi-square test of independence.
2. Determine if a question should be answered using the chi-square test of independence.
3. Discuss the connection to the test of proportions.

Lecture 23: ANOVA (11.2-11.3)


1. Perform an ANOVA test.
2. Discuss explained variation vs unexplained variation.
3. Discuss the F distribution.
4. Discuss and complete the ANOVA table.
5. Determine if a question should be answered using the ANOVA test.

Review of Statistical Inference


Chapters 7-11 are about statistical inference. We want to make a conclusion about a population by using
sample data. The difference in these methods depends on your data type(s) (categorical or numerical),
the number of variables you have, and the number of populations you have.

Deciding Which Method to Use


Before starting any statistical inference, you should go through the following questions in your head to
determine the correct method to use.

1. What type of data do we have, numerical or categorical?
a. Numerical data – the parameter is a mean 𝜇 (or multiple means: 𝜇1, 𝜇2, …). Use
methods in Chapters 9 and 11.
b. Categorical data – the parameter is a proportion 𝑝 (or multiple proportions: 𝑝1, 𝑝2, …).
Use methods in Chapters 7, 8, and 10.
2. How many variables are there? Remember, a variable is basically a different question you
could answer on a survey. For example, your height and your weight are two different
variables.
3. How many populations are there? For example, if you have the heights of women and the
heights of men, that is still just one variable, but it’s being measured from two populations.
4. If you have categorical data: how many categories are there? Think about the question that
could be asked on a survey: how many possible answers does it have, only two or more than
two? For example, what is your ethnicity? That question has more than two possible answers.
On the other hand, will you vote Yes on Prop 212? That question only has two possible
answers, Yes or No.

Confidence Interval or Hypothesis Test?


We have two primary methods for statistical inference: confidence intervals and hypothesis tests.

What is the question asking of you?

Confidence Interval
• What is the value of the population parameter (mean or proportion)? (CI with 1 population)
• What is the value of the difference between two population parameters? (CI with 2
populations)


Hypothesis Test
• Is the parameter equal to some value, or unequal to that value? (HT with 1 population)
• Is the parameter from population one equal to the same parameter from population two, or
are they unequal? (HT with 2 populations)

Inference Methods by Textbook Sections


*Sample questions use hypothetical data we might gather.

• Section 7.4: Confidence Interval for 1 Population Proportion, 𝑝


o What is the value of a population proportion?
o Eg: What proportion of Cuesta students plan to transfer to a 4-year university?
• Section 7.5: Confidence Interval for the difference in two Population Proportions: 𝑝1 − 𝑝2
o Find an interval of numbers that we’re confident contains the difference: 𝑝1 − 𝑝2
o Eg: What is the difference in the proportions between males who plan to transfer to a 4-
year university and females who plan to transfer to a 4-year university?
• Section 8.3: Hypothesis Test for 1 Population Proportion, 𝑝
o Test whether the population proportion equals some number, or differs from that number
o Eg: Do more than 50% of college students prefer online courses?
• Section 8.4: Hypothesis Test for 2 Population Proportions, 𝑝1 and 𝑝2
o Test whether two population proportions are equal, or differ from each other
o Eg: Does the proportion of male college students with children differ from the proportion of
female college students with children?
• Section 9.3: Confidence Interval for 1 Population Mean, 𝜇
o What is the value of the population mean?
o Eg: What is the average rent of a 2 bedroom apartment in SLO?
• Section 9.4: Hypothesis Test for 1 Population Mean, 𝜇
o Test whether the population mean equals some number or differs from that number
o Eg: Do college students take fewer than 15 semester units, on average?
• Section 9.5:
o Part I - Confidence Interval for the difference in two population means: 𝜇1 − 𝜇2
▪ Find an interval of numbers that we’re confident contains the difference 𝜇1 − 𝜇2
▪ Eg: What’s the difference in the number of units that people who plan to transfer take
vs the people who don’t plan to transfer, on average?
o Part II – Hypothesis Test for 2 Population Means, 𝜇1 and 𝜇2
▪ Test whether 𝜇1 and 𝜇2 differ from each other
▪ Eg: Do people with children pay more for rent than people without children, on
average?
• Section 10.2: Hypothesis Test for Population Proportions – 1 variable with more than 2 categories
(Chi-Square Goodness of Fit Test). Eg: What color cars can you buy? There are more than 2 categories
(colors) to answer this question.
o Eg: Do people select the car colors white, black and grey in equal proportion?
• Section 10.3: Hypothesis Test for Population Proportions – two categorical variables, each of which
has 2 or more categories (this can be viewed in a two-way table) - Test of Independence
o Eg: Is the gender of someone associated with their political party?
• Sections 11.2-11.3: Hypothesis Test for 3+ Population Means (ANOVA). Numerical data with 3 or
more groups being compared.
o Eg: Does the average rent for a two bedroom apartment differ among SLO, 5-Cities, and
North County?
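
The questions under "Deciding Which Method to Use" and the section list above can be combined into a rough decision procedure. The Python sketch below is one way to write it down. The function name and its arguments are hypothetical, and it is a simplification: real exam questions still require you to read the context carefully.

```python
def choose_method(data_type, populations=1, variables=1, categories=2, goal="test"):
    """Suggest a textbook section and method, following the review list above.

    data_type:   "numerical" or "categorical"
    populations: number of populations (groups) being compared
    variables:   number of categorical variables (2 means a two-way table)
    categories:  number of possible answers for a single categorical variable
    goal:        "estimate" (confidence interval) or "test" (hypothesis test)
    """
    if data_type == "numerical":
        if populations >= 3:
            return "Sections 11.2-11.3: ANOVA"
        if populations == 2:
            if goal == "estimate":
                return "Section 9.5: confidence interval for mu1 - mu2"
            return "Section 9.5: hypothesis test for two means"
        if goal == "estimate":
            return "Section 9.3: confidence interval for one mean"
        return "Section 9.4: hypothesis test for one mean"

    # Categorical data from here on.
    if variables >= 2:
        return "Section 10.3: chi-square test of independence"
    if categories > 2:
        return "Section 10.2: chi-square goodness-of-fit test"
    if populations == 2:
        if goal == "estimate":
            return "Section 7.5: confidence interval for p1 - p2"
        return "Section 8.4: hypothesis test for two proportions"
    if goal == "estimate":
        return "Section 7.4: confidence interval for one proportion"
    return "Section 8.3: hypothesis test for one proportion"

# Rent in SLO, 5-Cities, and North County: numerical data, three populations.
print(choose_method("numerical", populations=3))
# Do more than 50% of college students prefer online courses?
print(choose_method("categorical", goal="test"))
# Is gender associated with political party? Two categorical variables.
print(choose_method("categorical", variables=2))
```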

Steps for Hypothesis Testing and Confidence Intervals


The following steps should always be present in hypothesis testing and confidence intervals. While the
specific conditions and calculations differ by variable type, the following steps are consistent.

Steps for Hypothesis Testing


For Hypothesis Testing, we need to be able to do the following things:

o Hypothesize: State the hypotheses in both symbols and in words. You may choose to write out
the null hypothesis as something like: 𝐻0 : 𝜇 = 6, but then you must tell us what 𝜇 represents
(eg: Where 𝜇 is the average number of hours it takes for all people to drive from SLO to San
Diego).

o Prepare:
Check the conditions for the test. You will always need random and independent samples from
your populations, and your samples will need to be large enough (what ‘large enough’ means
changes based on the test you’re doing). If you have multiple samples, they need to be
independent of each other. Some tests have additional conditions to consider; for example, for
ANOVA we need the populations to have approximately equal standard deviations.
▪ Be sure you explicitly check the conditions for the test you’re doing.

o Calculate: Choose a significance level 𝛼 (common values are .1, .05, and .01).
Perform the calculations either by hand or using Statcrunch (depending on what the problem
asks for).
▪ If you’re using a computer to perform the calculations, then make sure you include all of the
important information, like:
o What was the test statistic (𝑧, 𝑡, 𝐹, 𝜒²)? Write something like 𝑡 = 2.11.
o What are the degrees of freedom?
o What is the p-value?
▪ If you’re performing the calculations by hand, be sure to show your work.

o Interpret:
Determine whether to reject or fail to reject your null hypothesis. We reject 𝐻0 if the
𝑝-value ≤ 𝛼.

Interpret the decision in context of the problem. This statement always has the format:
“There (is/is not) evidence to conclude (the alternate hypothesis in words).”

This should be a clear statement that tells the reader:


▪ What was the test about?
▪ What was your conclusion?
▪ If you rejected the null hypothesis, then how significant was it (p-value)?
o Eg: There is evidence to conclude that the average time it takes for all people to drive
from San Luis Obispo to San Diego is more than 6 hours (p=.02).
o Eg 2: There is not enough evidence to conclude that the proportion of people whose
headache improves using Ibuprofen is greater than those using Aspirin.
o Notice how neither of the two statements above talks about things like ‘reject’ or ‘fail to
reject.’ These are both statements that you could publish in something like a
newspaper or a journal and expect the general public to understand what you’re talking
about.
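
To see the four steps in one place, here is a hedged Python sketch of a one-proportion z-test for the earlier sample question, "Do more than 50% of college students prefer online courses?" The sample counts (115 of 200) are hypothetical, the threshold of 10 in the large-sample check is an assumed rule of thumb, and the code only mirrors the by-hand calculation; it is not the Statcrunch procedure.

```python
import math
from statistics import NormalDist

# Hypothesize: H0: p = 0.50, Ha: p > 0.50, where p is the proportion of all
# college students who prefer online courses.
p0 = 0.50
alpha = 0.05

# Hypothetical sample: 115 of 200 randomly selected students prefer online courses.
x, n = 115, 200
p_hat = x / n

# Prepare: large-sample check (assumed threshold of 10 expected successes and failures).
# The random and independent sample condition must be judged from the study design.
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Calculate: test statistic and right-tail p-value.
se = math.sqrt(p0 * (1 - p0) / n)     # standard error computed under H0
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)

# Interpret: reject H0 when the p-value is at most alpha, then state it in context.
reject = p_value <= alpha
if reject:
    conclusion = ("There is evidence to conclude that more than 50% of college "
                  f"students prefer online courses (p={p_value:.3f}).")
else:
    conclusion = ("There is not enough evidence to conclude that more than 50% "
                  "of college students prefer online courses.")

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print(conclusion)
```

With these made-up counts the test statistic is about 2.12 and the p-value is about .017, so the null hypothesis is rejected at the .05 level. The printed conclusion follows the required format and never mentions "reject" or "fail to reject."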

Steps for Confidence Intervals


For Confidence Intervals, we need to include the following steps:

1. Check the conditions for the interval (see the notes in the Prepare step of Hypothesis Testing above).
2. Perform the calculations, and write out the confidence interval as either: (10, 20), or in the
form 15 ± 5 (for example).


3. Interpret the confidence interval in the context of the problem. This should be something like:
▪ Eg. We are 90% confident that the average score on Exam 4 for all students in Math 247 is
between 82% and 88%.
▪ Eg. We are 95% confident that Class 1 of Math 247 scored between 4 percentage points worse and
6 percentage points better than Class 2 of Math 247, on average. (In this example, the confidence
interval for μ1 − μ2 is (−.04, .06).)
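
The calculation in step 2 can be sketched for one population proportion as follows. This is a minimal Python illustration of the usual formula, p-hat ± z* · SE. The function name and the sample counts (115 of 200) are hypothetical, and the conditions from step 1 still have to be checked separately.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Confidence interval for one population proportion: p-hat ± z* · SE."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value z*
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)      # margin of error
    return p_hat - margin, p_hat + margin, margin

# Hypothetical sample: 115 of 200 randomly selected students plan to transfer.
low, high, margin = proportion_ci(115, 200, confidence=0.95)
p_hat = 115 / 200

print(f"({low:.3f}, {high:.3f})")        # interval format
print(f"{p_hat:.3f} ± {margin:.3f}")     # estimate ± margin of error format
```

Both printed lines describe the same interval, roughly .507 to .644, which matches the two reporting formats in step 2. Raising the confidence level makes z* larger and the interval wider.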

In addition to performing the steps, you need to make sure you interpret these intervals correctly –
especially if it’s an interval for the difference of two means (or proportions).

Eg: We have a confidence interval of (-2.1, 3.2) for the difference in the average number of words per
minute men can type vs women (𝜇𝑀 − 𝜇𝑊). What does this mean?
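
One way to reason about an interval for a difference is to look at the signs of its endpoints. The Python sketch below is a hedged illustration of that reasoning, not an official answer key; the function name and the wording of its messages are hypothetical. The confidence level is left out because the example does not state one.

```python
def describe_difference_interval(low, high, first="group 1", second="group 2"):
    """Describe what a confidence interval for (first - second) suggests."""
    if low > 0:
        return f"The whole interval is positive: {first} appears higher than {second}."
    if high < 0:
        return f"The whole interval is negative: {first} appears lower than {second}."
    return (f"The interval contains 0: {first} could be lower than, equal to, "
            f"or higher than {second}, so a difference is not established.")

# The interval from the example: mu_M - mu_W is between -2.1 and 3.2 words per minute.
print(describe_difference_interval(-2.1, 3.2, first="men", second="women"))
```

For (-2.1, 3.2), men could type up to 2.1 words per minute slower or up to 3.2 words per minute faster than women, on average. Because 0 is inside the interval, the data are consistent with no difference between the two population means.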
