
Unit 3: Collecting Data

AP Statistics
Topics
3.1: Introducing Statistics: Do the Data We Collected Tell the Truth?

3.2: Introduction to Planning a Study

3.3: Random Sampling and Data Collection

3.4: Potential Problems with Sampling

3.5: Introduction to Experimental Design

3.6: Selecting an Experimental Design

3.7: Inference and Experiments


3.1: Introducing Statistics: Do the Data We Collected Tell the Truth?

● Remember:
- Methods for data collection that do not rely on chance (randomness) result in
untrustworthy conclusions.

Image 1:
https://www.displayr.com/what-is-random-sampling/
Practice 1: albert.io
A large company has regional offices in Boston and Chicago. Recently, the personnel department at each location surveyed employees
to determine the modes of transportation that employees use to get to work. The results of the survey are shown in the side-by-side bar
graph below. Which of the following statements is NOT indicated by the graph?
A) The percentage of employees who get to work by car is greater in Chicago than in Boston.
B) The percentage of employees who take public transportation to work is lower in Chicago than in Boston.
C) In both cities a greater percentage of employees walk to work than bike to work.
D) In Chicago, the percentage of employees who get to work by public transportation is about half that of those who get to work by car.
E) The number of people who use other modes of transportation to get to work is just a little higher in Boston than in Chicago.
3.2: Introduction to Planning a Study
● Remember:
- A population consists of all items or subjects of
interest. A sample selected for study is a subset of
the population.
- Know the difference between an observational
study (prospective and retrospective) and an
experiment in terms of determining causality.
- It is only appropriate to make generalizations about
a population based on samples that are randomly
selected or otherwise representative of that
population.
- It is not possible to determine causal relationships
between variables using data collected in an
observational study. In an experiment, it may be
possible.
Image 2:
http://www.med.uottawa.ca/sim/data/Study_Design_1.jpg
Image 3:
https://www.stomponstep1.com/experimental-design-observational-study-research-methods-experiments/
APSTATS1999
3.3: Random Sampling and Data Collection
● Remember:
- You need to know how to identify sampling methods: simple random sample
(SRS), stratified random sample, cluster sample, systematic random sample,
and census.
- Know the difference between sampling with and without replacement.
(1) A simple random sample (SRS) is a sample in which every group of a given size has
an equal chance of being chosen.
ps: a simple random sample is not guaranteed to be
representative of the population; by chance, some
groups may be over- or under-represented.
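
The difference between sampling without and with replacement can be sketched in Python. This is a minimal illustration, not part of the original notes: the population of ID numbers 1 to 100, the sample size of 10, and the seed are all assumptions chosen for the example.

```python
import random

# Hypothetical population: ID numbers 1..100 (assumed for illustration).
population = list(range(1, 101))
rng = random.Random(0)  # seeded only so the sketch is repeatable

# Without replacement: no individual can be chosen twice.
# Every group of 10 IDs is equally likely, so this is an SRS of size 10.
srs = rng.sample(population, k=10)

# With replacement: each draw is made from the full population,
# so the same individual may appear more than once.
with_replacement = rng.choices(population, k=10)

print("SRS (without replacement):", srs)
print("With replacement:", with_replacement)
```

Running the sketch with different seeds gives different samples, which is the point of the note above: any single SRS may or may not resemble the population.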

Image:
https://www.statisticshowto.com/simple-random-sample/
(2) Stratified Random Sample: involves the division of a population into separate groups,
called strata, based on shared attributes or characteristics (homogeneous grouping).
Within each stratum a simple random sample is selected, and the selected units are
combined to form the sample.
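
The three steps (divide into strata, take an SRS within each stratum, combine) can be sketched as follows. The population of students, the use of grade level as the stratifying variable, and the choice of 5 students per stratum are hypothetical and only serve the illustration.

```python
import random

# Hypothetical population: 50 students in each of grades 9-12.
# Grade level is the assumed stratifying characteristic.
population = [
    (grade * 100 + n, grade)  # (student_id, grade)
    for grade in (9, 10, 11, 12)
    for n in range(50)
]
rng = random.Random(1)  # seeded only so the sketch is repeatable

# Step 1: divide the population into strata (homogeneous groups).
strata = {}
for student_id, grade in population:
    strata.setdefault(grade, []).append(student_id)

# Steps 2 and 3: take an SRS within each stratum, then combine.
per_stratum = 5  # assumed sample size per stratum
stratified_sample = []
for grade, members in strata.items():
    stratified_sample.extend(rng.sample(members, k=per_stratum))

print(stratified_sample)
```

Every stratum is guaranteed to appear in the final sample, which a plain SRS of the same size would not guarantee.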

Image 4:
http://www.datasciencemadesimple.com/stratified-random-sampling-in-r-dataframe-2/
(3) Cluster Sample: involves the division of a population into smaller groups, called
clusters. Ideally, there is heterogeneity within each cluster, and clusters are similar to one
another in their composition. A simple random sample of clusters is selected from the
population, and data are collected from every unit in the selected clusters.
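
A cluster sample can be sketched in Python as below. The sketch assumes that every individual in a chosen cluster is then included in the sample; the homerooms, their sizes, and the number of clusters chosen are hypothetical.

```python
import random

# Hypothetical clusters: 12 homerooms of 25 students each.
clusters = {
    room: [f"room{room}-student{n}" for n in range(1, 26)]
    for room in range(1, 13)
}
rng = random.Random(2)  # seeded only so the sketch is repeatable

# Take an SRS of clusters (not of individuals).
chosen_rooms = rng.sample(sorted(clusters), k=3)

# Assumption: everyone in a chosen cluster is included in the sample.
cluster_sample = [
    student for room in chosen_rooms for student in clusters[room]
]

print("Chosen homerooms:", chosen_rooms)
print("Sample size:", len(cluster_sample))
```

Note the contrast with stratified sampling: here the randomness selects whole groups, while in a stratified sample it selects individuals inside every group.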

Image 5:
https://faculty.elgin.edu/dkernler/statistics/ch01/1-4.html
(4) Systematic Random Sample: a method in which sample members from a population
are selected according to a random starting point and a fixed, periodic interval.
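
A random starting point combined with a fixed interval can be sketched like this. The ordered population of 100 IDs and the sample size of 10 are assumptions, so the interval works out to every 10th individual.

```python
import random

# Hypothetical ordered population (for example, a numbered roster).
population = list(range(1, 101))
sample_size = 10
interval = len(population) // sample_size  # fixed interval: every 10th unit
rng = random.Random(3)  # seeded only so the sketch is repeatable

# Random starting position within the first interval (index 0..9).
start = rng.randrange(interval)

# Take the starting unit and every `interval`-th unit after it.
systematic_sample = population[start::interval]

print("Start index:", start)
print(systematic_sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined by the interval.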

Image 6:
https://www.netquest.com/blog/en/systematic-sampling
(5) Census: selects all items/subjects in a population.

● Non-random sampling methods (for example, samples chosen by convenience or
voluntary response) introduce potential for bias because they do not use chance to
select the individuals.
3.4: Potential Problems with Sampling
● Remember:
- Know how to identify bias:
1) Voluntary response bias: When a sample is composed entirely of volunteers or people
who choose to participate, the sample will typically not be representative of the
population.
2) Undercoverage bias: When part of the population has a reduced chance of being
included in the sample, the sample will typically not be representative of the population.
3) Non-response bias: Individuals chosen for the sample for whom data cannot be obtained
(or who refuse to respond) may differ from those for whom data can be obtained.
4) Question-wording bias: Questions that are confusing or leading can push respondents
toward answers that do not reflect what they actually think.
APSTATS2004
APSTATS2011
3.5: Introduction to Experimental Design
● Components of an experiment:
(1) experimental units/individuals/participants/subjects
(2) variables (explanatory/manipulated,
response/measured)
(3) treatments (levels of the explanatory variable)

ps: A confounding variable in an experiment is a variable
that is related to the explanatory variable and influences the
response variable and may create a false perception of
association between the two. Random assignment tends to
balance the effects of uncontrolled (confounding) variables.
APSTATS2016
● In a single-blind experiment, subjects do not know which treatment they are
receiving, but members of the research team do, or vice versa.
● In a double-blind experiment neither the subjects nor the members of the research
team who interact with them know which treatment a subject is receiving.
● A control group is a collection of experimental units either not given a treatment of
interest or given a treatment with an inactive substance (placebo) in order to
determine if the treatment of interest has an effect. The placebo effect occurs when
experimental units have a response to a placebo.
● Completely Randomized Design versus Randomized Block Design
- Completely randomized design: treatments are assigned to the experimental
units entirely at random, so every unit has the same chance of receiving each
treatment.
- Randomized block design: experimental units are first grouped into blocks based
on a characteristic that is expected to influence their response to the
treatments. Then treatments are randomly assigned to the units within each
block.
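
The two designs can be compared in a short Python sketch. The 12 subjects, the age-group blocking variable, the two treatments "A" and "B", and the helper name randomize are all hypothetical; dealing treatments out in turn after a shuffle is just one simple way to get equal group sizes.

```python
import random

# Hypothetical experimental units: (name, age_group). Age group is the
# assumed characteristic expected to influence the response.
units = [(f"subject{i}", "younger" if i <= 6 else "older") for i in range(1, 13)]
treatments = ["A", "B"]
rng = random.Random(4)  # seeded only so the sketch is repeatable

def randomize(group, treatments, rng):
    """Shuffle the group, then deal treatments out in turn (equal group sizes)."""
    shuffled = group[:]
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

# Completely randomized design: randomize over all units at once.
crd = randomize([name for name, _ in units], treatments, rng)

# Randomized block design: form blocks first, then randomize within each block.
blocks = {}
for name, age_group in units:
    blocks.setdefault(age_group, []).append(name)

rbd = {}
for age_group, members in blocks.items():
    rbd.update(randomize(members, treatments, rng))

print("Completely randomized:", crd)
print("Randomized block:", rbd)
```

In the block design each age group is guaranteed to be split evenly between the treatments, whereas the completely randomized design leaves that balance to chance.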
● A matched pairs design is a special case of a randomized block design. Using a blocking variable, subjects
(whether they are people or not) are arranged in pairs matched on relevant factors. Matched pairs may be
formed naturally or by the experimenter. Each pair receives both treatments: one treatment is randomly
assigned to one member of the pair, and the remaining treatment goes to the other member.
Alternatively, each subject may receive both treatments.
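
Random assignment within matched pairs can be sketched as below. The pairs of twins and the labels "treatment" and "control" are hypothetical, and a simulated coin flip stands in for whatever chance device is actually used.

```python
import random

# Hypothetical matched pairs (for example, twins matched naturally).
pairs = [("twin1a", "twin1b"), ("twin2a", "twin2b"), ("twin3a", "twin3b")]
rng = random.Random(5)  # seeded only so the sketch is repeatable

assignment = {}
for first, second in pairs:
    # A simulated coin flip decides which member gets which treatment;
    # the other member automatically receives the remaining one.
    if rng.random() < 0.5:
        assignment[first], assignment[second] = "treatment", "control"
    else:
        assignment[first], assignment[second] = "control", "treatment"

for first, second in pairs:
    print(first, assignment[first], "|", second, assignment[second])
```

Each pair acts as its own block of size two, which is why the design counts as a special case of a randomized block design.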
APSTATS2002
Answers
● Practice 1: E
● FREE-RESPONSE QUESTIONS:
https://apcentral.collegeboard.org/courses/ap-statistics/exam/past-exam-questions
