
MATH& 146

Lesson 4
Section 1.3
Study Beginnings

Populations and Samples
The population is the complete collection of
individuals or objects that you wish to learn about.

To study larger populations, we select a sample. The
idea of sampling is to select a portion of the population
and study that portion to gain information about the
population.

Parameters and Statistics
A parameter is a value (usually a proportion or
average) that describes the population.

For every parameter there is a corresponding
sample statistic. The statistic is a numerical value
summarizing the sample data and describing the
sample the same way the parameter describes the
population.
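
The distinction can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the population of 10,000 students and its 70% satisfaction rate are hypothetical, not data from this lesson. In real studies the parameter is usually unknown, which is why we estimate it with a statistic.

```python
import random

# Hypothetical population: 10,000 students, 1 = satisfied, 0 = not satisfied.
# The 70% satisfaction rate is an invented number for illustration only.
rng = random.Random(146)  # fixed seed so the run is repeatable
population = [1] * 7000 + [0] * 3000

# Parameter: describes the whole population (normally unknown in practice).
parameter = sum(population) / len(population)

# Statistic: the same summary, computed from a random sample of 100.
sample = rng.sample(population, 100)
statistic = sum(sample) / len(sample)

print(f"population proportion (parameter): {parameter:.2f}")
print(f"sample proportion (statistic):     {statistic:.2f}")
```

The statistic will usually land near the parameter without matching it exactly, and a different sample would give a slightly different value.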

Example 1
Define the population, sample, parameter, and
statistic from the following study:

We want to know the proportion of new students
who were satisfied with New Student Orientation at
YVC. 100 first-year students at the college were
randomly sampled, and 72 said they were satisfied.

Example 2
Define the population, sample, parameter, and
statistic from the following study:

You want to determine the average number of
glasses of milk college students drink per day.
Suppose yesterday, in your English class, you
asked five of your friends how many glasses of milk
they drank the day before. The answers were 1, 0,
1, 3, and 12 glasses of milk.
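
As a quick check on the arithmetic, the five answers can be summarized with Python's standard library. This is a minimal sketch that assumes the statistic of interest is the sample mean; the median is shown only for comparison.

```python
from statistics import mean, median

glasses = [1, 0, 1, 3, 12]  # the five answers from Example 2

sample_mean = mean(glasses)      # 17 / 5 = 3.4
sample_median = median(glasses)  # middle value of 0, 1, 1, 3, 12 is 1

print(f"sample mean:   {sample_mean}")    # prints 3.4
print(f"sample median: {sample_median}")  # prints 1
```

The single answer of 12 pulls the mean well above what most of the five friends reported, which is one reason to ask whether so small a sample can describe all college students.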

Research Questions
The first step in conducting research is to identify
topics or questions that are to be investigated.

A clearly laid out research question is helpful in
identifying what subjects or cases should be
studied and what variables are important.

Research Questions
A research question should refer to a target
population. Often, however, it is too
expensive or difficult to collect data for every case
in a population. Instead, a sample is taken.

A sample represents a subset of the cases and is
often a small fraction (usually less than one-tenth)
of the population. Sample data are then used to
estimate the population parameter and answer the
research question.

Example 3
Consider the following research question: "Over
the last 5 years, what is the average time to
degree for Duke undergraduate students?"

a) What are the target population and parameter?


b) Suppose the researcher met two students who
took more than 7 years to graduate from Duke.
Does that prove it takes longer to graduate at
Duke than at other colleges? Why or why not?

Example 4
Consider the following research question: "Does a
new drug reduce the number of deaths in patients
with severe heart disease?"

a) What are the target population and parameter?


b) Suppose my friend's dad had a heart attack
and died after they gave him the new heart
disease drug. Does that prove that the drug
does not work? Why or why not?

Anecdotal Evidence
Both of the conclusions of the last two examples
were based on some data. However, there were
two problems.
First, the data only represent one or two cases.
Second, it is unclear whether these cases are
actually representative of the population.
Data collected in this haphazard fashion are called
anecdotal evidence.

Anecdotal Evidence
When anecdotal evidence is cited, there is no reason
to expect the individuals to be representative of
anyone but themselves. They can make nice stories,
but lousy statistics.

Bias
If someone were permitted to pick and choose
exactly which cases were included in a sample, it
is entirely possible that the sample could be
skewed to that person's interests. This introduces
bias into a sample.

A biased sample causes problems because any
statistic computed from that sample has the
potential to be consistently erroneous.

An Example of Bad Data
The 1936 presidential election between Franklin
Roosevelt and Alf Landon is notable for the Literary
Digest poll, which was based on over two million
returned postcards.
In its October 31 issue, Landon was predicted to
easily win with 370 electoral votes and 57% of the
popular vote.

1936 Election Results
Landon's electoral vote total of eight is a tie for the
record low for a major-party nominee since the
current U.S. two-party system began in the 1850s.
The Literary Digest was completely discredited
because of the poll and was soon discontinued.

Electoral votes (share of popular vote):

Candidate     Predicted Vote    Actual Vote
FDR           161 (~43%)        523 (60.8%)
Alf Landon    370 (57%)         8 (36.5%)

Why did the Literary Digest fail?

The first major problem with the poll was in the
selection process for the names on the mailing list,
which were taken from telephone directories, club
membership lists, lists of magazine subscribers,
etc.

Such a list is guaranteed to be slanted toward
middle- and upper-class voters, and by default to
exclude lower-income voters.
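
The effect of a slanted list can be sketched with a small simulation. Every number below (the income split and the support rates) is invented for illustration and is not 1936 data; `make_voter` and `share_for_a` are hypothetical helper names.

```python
import random

rng = random.Random(1936)  # fixed seed so the run is repeatable

# Hypothetical electorate (all numbers invented for illustration):
# 30% of voters are higher-income and 40% of them back candidate A;
# 70% of voters are lower-income and 70% of them back candidate A.
def make_voter():
    higher_income = rng.random() < 0.30
    backs_a = rng.random() < (0.40 if higher_income else 0.70)
    return higher_income, backs_a

voters = [make_voter() for _ in range(100_000)]

def share_for_a(group):
    """Proportion of the group that backs candidate A."""
    return sum(backs_a for _, backs_a in group) / len(group)

true_share = share_for_a(voters)

# Sampling frame built from lists that reach only higher-income voters.
frame = [voter for voter in voters if voter[0]]
biased_sample = rng.sample(frame, 2_000)

# Simple random sample drawn from the whole electorate.
random_sample = rng.sample(voters, 2_000)

print(f"true share for A:        {true_share:.3f}")
print(f"biased-frame estimate:   {share_for_a(biased_sample):.3f}")
print(f"random-sample estimate:  {share_for_a(random_sample):.3f}")
```

Under these assumptions the biased frame misses by a wide margin no matter how many names are drawn from it, while the random sample of the same size lands close to the true share. A larger sample does not repair a bad selection method.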

Why did the Literary Digest fail?

The second problem with the Literary Digest poll
was that out of the 10 million people whose names
were on the original mailing list, only about 2.4
million responded to the survey.

Thus, the size of the sample was about one-fourth
of what was originally intended. (In addition,
people who respond to surveys are different from
people who don't.)
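
The arithmetic behind "about one-fourth", and the reason differing response rates matter, can be sketched as follows. The mailing and response counts come from the slide; the support level and the two return rates are hypothetical values chosen only to show the mechanism.

```python
# Figures from the slide: about 10 million mailed, about 2.4 million returned.
mailed = 10_000_000
returned = 2_400_000
response_rate = returned / mailed
print(f"response rate: {response_rate:.2f}")  # prints 0.24, about one-fourth

# Hypothetical illustration of nonresponse bias (invented numbers):
# half of the people on the list support candidate A, but A's supporters
# return the postcard less often than everyone else.
support_a = 0.50
return_rate_a = 0.10
return_rate_other = 0.30

returned_a = support_a * return_rate_a
returned_other = (1 - support_a) * return_rate_other
observed_share_a = returned_a / (returned_a + returned_other)
print(f"share for A among respondents: {observed_share_a:.2f}")  # prints 0.25
```

In this made-up case the returned postcards show 25% support for A even though the true figure on the list is 50%, because the people who respond differ from the people who don't.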

Bias
In general, there are three common types of bias that
might occur in a sample:
Selection bias: The method for selection makes
the sample unrepresentative of the population.
Nonresponse bias: A sample is chosen, but a
subset cannot or will not respond.
Response bias: Participants in a survey provide
incorrect information, intentionally or unintentionally.

Bias
Bias is the bane of sampling: the one thing above
all to avoid.

Conclusions based on samples drawn with biased
methods are inherently flawed. There is usually no
way to fix bias after the sample is drawn and no
way to salvage useful information from it.

Example 5
Indicate whether the potential bias is a selection
bias, a nonresponse bias, or a response bias.

A survey question asked of unmarried men was
"What is the most important feature you consider
when deciding whether to date somebody?" The
results were found to depend on whether the
interviewer was male or female.

Example 6
For each situation, explain why selection bias could be
introduced, and how it could affect your results.
a) A cage has 1000 rats; you pick the first 20 you can
catch for your experiment.
b) A public opinion poll is conducted using the
telephone directory.
c) You are conducting a study of a new diabetes drug;
you advertise for participants in the newspaper and
TV.

Example 7
You need to conduct a study of longevity for
people who were born in the decade following the
end of World War II in 1945. If you were to visit
graveyards and use only the birth/death dates
listed on tombstones, would you get good results?
Why or why not?

Example 8
"If you had to do it over again, would you have
children?" This is the question that advice columnist
Ann Landers asked her readers back in 1976. It turns
out that nearly 70% of the 10,000 responses she
received were "No." A professional poll by Newsday
found that 91% of randomly chosen respondents
would have children again.

Explain the apparent contradiction between these two
surveys using what you have learned about sampling.

Types of Variables
In many studies more than one variable is
recorded per case or individual.
It is often the purpose of a study to determine if
and/or how one variable (called the explanatory
variable) affects another (called the response
variable).

Types of Variables
Response Variable: The outcome of a study. A
variable you would be interested in predicting or
forecasting.
Explanatory Variable: Any variable that explains
the response variable.
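
One common way to use an explanatory variable to predict a response is to fit a straight line to paired data. The sketch below is only an illustration: the data are invented, the least-squares line is one possible technique rather than a method prescribed by this lesson, and `predict` is a hypothetical helper name.

```python
# Invented paired data: x is the explanatory variable, y is the response.
x = [1, 2, 3, 4]
y = [5, 8, 11, 14]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept for the line y = intercept + slope * x.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(new_x):
    """Predicted response for a new value of the explanatory variable."""
    return intercept + slope * new_x

print(f"slope = {slope}, intercept = {intercept}")  # slope = 3.0, intercept = 2.0
print(f"predicted response at x = 5: {predict(5)}")  # 17.0
```

The roles are not interchangeable: the variable we already know (or can measure easily) goes in as x, and the variable we want to forecast comes out as y.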

Example 9
Pick out which variable you think should be the
explanatory variable and which variable should be the
response.
a) Weights of nuggets of gold (in ounces) and their
market value (in $) over the last few days are
provided, and you wish to use this to estimate the
value of a gold ring that weighs 4 ounces.

Example 9 continued
b) You have data collected on the amount of time
since chlorine was added to the public swimming
pool and the concentration of chlorine still in the
pool. Chlorine was added at 8 AM, and you wish to
know what the concentration is now, at 3 PM.
c) You have data on the circumference of oak trees
(measured 12 inches from the ground) and their
age (in years). An oak tree in the park has a
circumference of 36 inches, and you wish to know
approximately how old it is.

Example 10
Suppose you wanted to conduct a study to predict
a student's success. Using a student's GPA as the
response variable, what are some explanatory
variables that might be worth considering?
Determine the variable type (categorical,
numerical) of each explanatory variable. For each
numerical explanatory variable, guess whether the
association with the response will be positive,
negative, or none.
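
Once data are in hand, a guess about the direction of an association can be checked with a correlation coefficient. The sketch below uses invented numbers for five hypothetical students; the two explanatory variables (weekly study hours and class absences) are examples only, and `pearson_r` is a hand-written helper, not a required method.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data for five hypothetical students.
gpa = [2.0, 2.5, 3.0, 3.2, 3.8]   # response variable
study_hours = [2, 4, 6, 8, 10]    # numerical explanatory variable
absences = [10, 8, 5, 3, 1]       # numerical explanatory variable

r_hours = pearson_r(study_hours, gpa)
r_absences = pearson_r(absences, gpa)

print(f"study hours vs GPA: r = {r_hours:.2f}")     # positive association
print(f"absences vs GPA:    r = {r_absences:.2f}")  # negative association
```

A value of r above zero indicates a positive association, below zero a negative one, and near zero little or no linear association. Categorical explanatory variables need a different kind of comparison.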
