
The Nature of Probability and Statistics

Chapter 1

Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

Why Study Statistics?

Ch1: The Nature of Probability and Statistics Santorico - Page 1


Example: Archaeology
Four measurements were made of male Egyptian
skulls from five different time periods ranging from
4000 B.C. to 150 A.D. Are there differences in the
skull sizes between the time periods? The
researchers theorize that a change in skull size over
time is evidence of the interbreeding of the
Egyptians with immigrant populations over the
years. Thirty skulls are measured from each of the 5
time periods.
Measurements:
1. MB: Maximal Breadth of Skull
2. BH: Basibregmatic Height of Skull
3. BL: Basialveolar Length of Skull
4. NH: Nasal Height of Skull



Sec 1-1: Descriptive and Inferential Statistics

Main Areas of Statistics

Descriptive statistics consists of the collection, organization, summarization, and presentation of data.

Inferential statistics consists of generalizing from samples to populations, performing estimation and hypothesis tests, determining relationships among variables, and making predictions.

• Inferential statistics uses probability (the likelihood of an outcome occurring) to make conclusions and predictions.



A population consists of all subjects that are being studied.

A sample is a group of subjects selected from a population.

A variable is a characteristic or attribute that can assume different values.

• A random variable is a variable whose values are determined by chance.

Data are the values (measurements or observations) that the variables can assume.

A data set is a collection of data values.



MB    BH    BL    NH    Time Period
131   138    89   49    -4000
125   131    92   48    -4000
131   132    99   50    -4000
139   130   108   48    -4000
125   136    93   48    -4000
131   134   102   51    -4000
134   134    99   51    -4000
.......
138   136    92   46    150
131   129    97   44    150
132   127    97   52    150
137   125    85   57    150
129   128    81   52    150
140   135   103   48    150
147   129    87   48    150
136   133    97   51    150

Population?   Sample?   Variable?   Random variable?   Data?   Data set?
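As a minimal Python sketch (an illustration, not part of the original notes), the rows printed above can be stored as a data set: each record is one skull (a subject), each key is a variable, and the recorded numbers are the data. It assumes only the 15 rows shown (the full data set has 30 skulls per period) and reads Time Period -4000 as 4000 B.C. and 150 as 150 A.D.; the helper name `column` is hypothetical.

```python
from statistics import mean

# Only the rows printed in the table; the full study has 30 skulls per period.
# time_period: -4000 is read as 4000 B.C. and 150 as 150 A.D. (an assumption).
VARIABLES = ("MB", "BH", "BL", "NH", "time_period")
ROWS = [
    (131, 138, 89, 49, -4000), (125, 131, 92, 48, -4000),
    (131, 132, 99, 50, -4000), (139, 130, 108, 48, -4000),
    (125, 136, 93, 48, -4000), (131, 134, 102, 51, -4000),
    (134, 134, 99, 51, -4000),
    (138, 136, 92, 46, 150), (131, 129, 97, 44, 150),
    (132, 127, 97, 52, 150), (137, 125, 85, 57, 150),
    (129, 128, 81, 52, 150), (140, 135, 103, 48, 150),
    (147, 129, 87, 48, 150), (136, 133, 97, 51, 150),
]

# The data set: one record (dict) per skull, keyed by variable name.
data_set = [dict(zip(VARIABLES, row)) for row in ROWS]

def column(variable, period):
    """All recorded values of one variable for skulls from one time period."""
    return [rec[variable] for rec in data_set if rec["time_period"] == period]

# Descriptive statistics: summarize the values actually observed.
for period in (-4000, 150):
    print(period, round(mean(column("MB", period)), 2))
```

The printed means (about 130.86 and 136.25) describe only these rows. Concluding that maximal breadth changed for all Egyptian skulls of those periods would be an inferential step, which requires probability.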
Sec 1-2: Variables and Types of Data
Types of Variables

Data
    Qualitative
    Quantitative
        Discrete
        Continuous

Qualitative variables are variables that can be placed into categories according to some characteristic or attribute.
Example:

Quantitative variables are numeric in nature and can be ordered or ranked. Quantitative variables can be either discrete or continuous.
Example:



Qualitative variables can be nominal or ordinal.

The nominal level of measurement classifies the data into categories in which no meaningful order or ranking can be imposed on the data.

The ordinal level of measurement classifies the data into categories that can be meaningfully ranked or ordered.
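The practical difference between the two levels can be sketched in Python. The categories below (eye colors as nominal data, agreement ratings as ordinal data) are hypothetical examples chosen for illustration: nominal values can only be counted or checked for equality, while ordinal values can also be sorted by an assigned rank. The ranks say which category is higher, not how far apart two categories are.

```python
from collections import Counter

# Nominal (hypothetical): categories with no meaningful order.
eye_colors = ["brown", "blue", "brown", "green", "hazel", "brown"]
eye_color_counts = Counter(eye_colors)  # counting is meaningful; ranking is not

# Ordinal (hypothetical): categories with a meaningful order, encoded as ranks.
AGREEMENT_RANK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def sort_ordinal(responses):
    """Order responses by their rank rather than alphabetically."""
    return sorted(responses, key=AGREEMENT_RANK.__getitem__)

responses = ["agree", "strongly disagree", "neutral", "strongly agree"]
print(eye_color_counts.most_common(1))  # [('brown', 3)]
print(sort_ordinal(responses))
# ['strongly disagree', 'neutral', 'agree', 'strongly agree']
print(sorted(responses))  # alphabetical order ignores the ranking
```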



More on Quantitative variables:
A continuous variable is a quantitative variable that can assume ANY numerical value between any two specific values.
• Obtained by measuring.
• May include fractions and decimals.

A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values.
• Countable means that the values result from counting, such as 0, 1, 2, 3…


Examples: Determine if the variable would be discrete or continuous for the
following examples:
Weight (lbs):
Number of car accidents in which you’ve been involved:

Temperature (F):



Measurements of continuous variables must be rounded because of the limits of the measuring device. Typically, values are recorded to the nearest unit.

The boundaries of a measurement give the range of possible values that could have led to the recorded value after rounding; the range includes the lower boundary and goes up to, but does not include, the upper boundary.

Example:
Recorded Value    Boundaries
12 in             [11.5 – 12.5)
0.57 sec
3.8 g
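The boundary rule can be sketched in Python, assuming (as in the 12 in example) that each value was rounded to its last recorded digit, so each boundary lies half a unit of that digit away from the recorded value. The function name `boundaries` is hypothetical; the value is passed as a string so that `Decimal` keeps the number of recorded digits and avoids binary floating-point error.

```python
from decimal import Decimal

def boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) for a value rounded to its last recorded digit.

    The interval is half-open, [lower, upper): lower is included, upper is not.
    """
    value = Decimal(recorded)
    # Exponent of the last recorded digit: 0 for "12", -2 for "0.57".
    exponent = value.as_tuple().exponent
    # Half of one unit in that place: 0.5 for "12", 0.005 for "0.57".
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit

for recorded, unit in [("12", "in"), ("0.57", "sec"), ("3.8", "g")]:
    lower, upper = boundaries(recorded)
    print(f"{recorded} {unit}: [{lower} – {upper})")
```

The first line printed is `12 in: [11.5 – 12.5)`, matching the example above. Because a trailing zero counts as a recorded digit, `boundaries("12.0")` gives the narrower interval [11.95 – 12.05).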



The measurement scale/level of a variable describes how the
variable is categorized, counted, or measured.

Qualitative variables: nominal and ordinal

Quantitative variables: interval and ratio

The interval level of measurement ranks (quantitative) data, and differences between values are meaningful, but the “zero” value is arbitrary.

• The “zero” value is the point at which the variable equals zero. It is arbitrary if it does NOT mean a total absence of the quantity being measured (e.g., 0°F does not mean an absence of temperature).

The ratio level of measurement ranks (quantitative) data, and there is a true zero value. Additionally, true ratios exist: when the same variable is measured on two different members of the population, one value can meaningfully be said to be a multiple of the other.
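A short Python check illustrates why meaningful ratios need a true zero. Temperature in degrees Fahrenheit is a standard example of interval data and weight is a standard example of ratio data; the conversions used are the usual ones (°C = (°F − 32) × 5/9, and 1 lb = 0.45359237 kg), and the specific values are chosen only for illustration.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def pounds_to_kilograms(lb):
    return lb * 0.45359237  # the international pound is defined as exactly 0.45359237 kg

# Interval level: 80°F looks like "twice" 40°F, but in Celsius the same two
# temperatures have a ratio of about 6, so the ratio depends on where zero sits.
temp_ratio_f = 80 / 40
temp_ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio level: 20 lb is twice 10 lb in any unit, because zero means no weight.
weight_ratio_lb = 20 / 10
weight_ratio_kg = pounds_to_kilograms(20) / pounds_to_kilograms(10)

print(temp_ratio_f, round(temp_ratio_c, 6))
print(weight_ratio_lb, round(weight_ratio_kg, 6))
```

Changing units leaves the weight ratio at 2 but turns the temperature ratio from 2 into 6, which is why statements such as "twice as hot" are not meaningful on an interval scale.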

Sec 1-3: Data Collection and Sampling Techniques

Data are often collected via surveys.

• Telephone Surveys
• Mailed Questionnaire
• Personal Interview
• Internet Survey

What are the advantages and disadvantages of data collection through surveys?



Telephone Surveys
  Advantages:
  • Less costly than personal interview.
  • People may be more candid.
  Disadvantages:
  • Not everyone has a phone.
  • Cell phones typically not included.
  • Tone of interviewer’s voice may affect response.

Mailed Questionnaire
  Advantages:
  • Can cover wider geographic area than phone survey or personal interview.
  • Respondents can remain anonymous.
  • Less expensive.
  Disadvantages:
  • Low number of responses.
  • Inappropriate answers to questions.
  • Low reading abilities or not understanding questions may create useless responses.



Personal  In-depth responses to  Interviewers must be
Interview questions trained in asking
questions and recording
responses (which is
costly).
 Interviewer may be
biased in the selection of
participants.

Internet As with telephone and mail  Will miss demographics


survey surveys, without computer access
 Inexpensive, often free  May have inappropriate
 Candor answers if questions are
 Large geographic misunderstood

coverage
 Anonymity



Sampling Techniques
• Researchers use samples to collect data and information about a particular variable from a population.
• Samples save time and money, and may actually allow a researcher to collect better information.
• Samples need to be representative of the population, or they are meaningless in drawing conclusions about the population.
• Sampling must be done in a way that the samples are unbiased: each subject in the population has an equal chance of being in the sample.

Scenario: Suppose we are interested in studying how the University of Colorado Denver undergraduate population feels about the outcome of the presidential election.



Technique / Description / Example

Random sampling
  Uses chance methods or random numbers to select the sample. Everyone or everything in the population has the same chance of being selected for the sample, and it is the best way of obtaining a representative sample.
  Example:

Systematic sampling
  Numbers each subject of the population and then selects every kth subject.
  Example:

Convenience sampling
  Selects subjects that are convenient for the researcher. These samples are typically not of statistical value.
  Example:



Stratified sampling
  Divides the population into groups (called strata) according to some characteristic that is important to the study, then randomly samples subjects from each group.
  Example:

Cluster sampling
  Divides the population into groups called clusters by some means, such as geographic area or schools in a school district. Then randomly selects some of the clusters and uses ALL members of the selected clusters.
  Example:
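The four chance-based techniques in this table can be sketched with Python's standard `random` module. Everything here is hypothetical: 1,000 numbered undergraduates stand in for the CU Denver scenario, class levels serve as strata, 40 course sections of 25 students serve as clusters, and stratified sampling draws an equal number per stratum (proportional allocation is another common choice). Convenience sampling is left out because it uses no chance mechanism.

```python
import random

# Hypothetical sampling frame: undergraduates numbered 1..1000.
population = list(range(1, 1001))

# Hypothetical strata: the same IDs grouped by class level.
strata = {
    "freshman": population[:300],
    "sophomore": population[300:550],
    "junior": population[550:800],
    "senior": population[800:],
}

# Hypothetical clusters: 40 course sections of 25 students each.
clusters = [population[i:i + 25] for i in range(0, len(population), 25)]

def simple_random_sample(pop, n, rng):
    """Random sampling: each subject has the same chance; none is chosen twice."""
    return rng.sample(pop, n)

def systematic_sample(pop, n, rng):
    """Systematic sampling: a random start, then every k-th subject."""
    k = len(pop) // n          # sampling interval
    start = rng.randrange(k)   # random starting position, 0 .. k-1
    return pop[start::k][:n]   # trim in case len(pop) is not a multiple of n

def stratified_sample(groups, n_per_group, rng):
    """Stratified sampling: randomly sample n_per_group subjects from EVERY stratum."""
    return {name: rng.sample(members, n_per_group) for name, members in groups.items()}

def cluster_sample(groups, n_clusters, rng):
    """Cluster sampling: randomly choose whole clusters and keep ALL their members."""
    chosen = rng.sample(groups, n_clusters)
    return [subject for cluster in chosen for subject in cluster]

rng = random.Random(2024)  # fixed seed so the example is reproducible
srs = simple_random_sample(population, 50, rng)
every_kth = systematic_sample(population, 50, rng)   # k = 20
by_level = stratified_sample(strata, 10, rng)        # 10 from each class level
in_sections = cluster_sample(clusters, 4, rng)       # all 100 students in 4 sections
```

The contrast to notice is what gets randomized: stratified sampling draws at random inside every group, so each class level is represented, while cluster sampling draws whole groups at random and then keeps everyone in them.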



Sec 1-4: Observational and Experimental Studies

Observational study - the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.

Example:

Experimental study - the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.

Example:



Experiments have at least two groups:
Treatment Group – the group(s) in the sample that receive a treatment or experimental condition.
Control Group – the group in the sample that is treated identically in all respects to the treatment group EXCEPT that they don’t receive the active treatment.
• Using a control group allows us to see what would have happened to the response variable if treatments had not been applied.
Placebo – a treatment that looks like a real drug but has no active ingredient (meaning it doesn’t do anything!).
Placebo Effect – when people who take a placebo respond as if they had received the real treatment, sometimes as well as or better than with the treatment itself.
• This is usually because of psychological reasons. Our minds are powerful!
• Good experiments include a placebo group when humans are involved.



Independent Variable – the variable that is being manipulated by the
researcher (also called the explanatory variable).
Dependent Variable – the response to the independent variable or
the result of the explanatory variable (also called the response or
outcome variable).
Example: Taking a nicotine patch and smoking status.

Explanatory (independent) variable –

Response (dependent) variable –

Example: Completing homework and grades

Explanatory (independent) variable –

Response (dependent) variable –



Advantages of Experiments

• The effect of an explanatory variable can be studied more precisely.
• The researcher has (some) control over selecting participants, assigning them to groups, and manipulating the independent variable.
• Cause-and-effect relationships can be established using randomized experiments (e.g., smoking causes cancer in lab rats).
Note: In order to make cause-and-effect conclusions in an experiment, the subjects must be randomly assigned among the treatment groups.
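Random assignment, which the note above requires for cause-and-effect conclusions, can be sketched in Python: shuffle the participants, then split them into a treatment group and a control group. The participant labels, the function name `randomly_assign`, and the even 50/50 split are hypothetical choices for illustration.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle a copy of the subjects and split it into two equal-size groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participants, e.g., smokers enrolled in a nicotine-patch study.
participants = [f"subject_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, random.Random(7))
print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

Because chance alone decides who receives the treatment, other characteristics of the subjects tend to balance out between the groups, which is what allows a difference in the response to be attributed to the treatment.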

Disadvantages of Experiments

• May occur in unnatural settings (e.g., laboratories).
• Hawthorne Effect – when subjects know they are participating in an experiment and change their behavior in ways that affect the results of the study (e.g., weight loss studies).
• Not all variables can be controlled for in a study.



Advantages of Observational Studies

• Occur in natural settings.
• Allow us to study situations for which it would be illegal or unethical to conduct an experiment (e.g., rape, suicide, illegal drug use).

Disadvantages of Observational Studies

• Cannot make cause-and-effect conclusions because of confounding variables.
• Data quality may be poor if the researcher didn’t collect the data.
• Confounding variable – a variable that influences the dependent (outcome) variable but was not separated from the independent variable (e.g., vitamins and health, weight and income).
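How a confounding variable can create a misleading association is easiest to see with made-up numbers. In the hypothetical Python sketch below, the confounder is assumed to be regular exercise (an assumption for illustration, not a finding): exercisers are more likely to take vitamins and also have higher health scores, while by construction vitamins have no effect at all.

```python
from statistics import mean

# Hypothetical records: (takes_vitamins, exercises, health_score).
# By construction, health depends ONLY on exercise, never on vitamins.
records = (
    [(True, True, 80)] * 8 + [(False, True, 80)] * 2
    + [(True, False, 60)] * 2 + [(False, False, 60)] * 8
)

def mean_health(takes_vitamins, exercises=None):
    """Mean health score for vitamin takers or non-takers, optionally within one exercise group."""
    return mean(
        score
        for vitamins, exercise, score in records
        if vitamins == takes_vitamins and (exercises is None or exercise == exercises)
    )

# Ignoring the confounder, vitamin takers look healthier: 76 vs 64.
overall_gap = mean_health(True) - mean_health(False)

# Comparing like with like (within each exercise group), the gap vanishes.
gap_exercisers = mean_health(True, exercises=True) - mean_health(False, exercises=True)
gap_non_exercisers = mean_health(True, exercises=False) - mean_health(False, exercises=False)

print(overall_gap, gap_exercisers, gap_non_exercisers)
```

The overall gap of 12 points comes entirely from who exercises, which is why an observational comparison of vitamin takers and non-takers cannot by itself show that vitamins cause better health.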



Examples of Confounding Variables

• Age and income
• Vitamins and health
• Weight tends to be higher among lower socio-economic groups.

What are the confounding variables?

Sections 1-5 and 1-6 (Read on your own)

• Misuses of statistics.
• Computers and calculators.



We will come back to this for a longer discussion, but it is good to have a
look now and start thinking about it:

Should you believe the results of a study?


Eight Guidelines for Evaluating a Statistical Study

1. Identify the goal of the study, the population considered, and the type of study.
2. Consider the source, particularly with regard to whether the
researcher may be biased.
3. Look for bias that may prevent a sample from being
representative of the population.
a. Selection bias occurs whenever researchers select their sample
in a way that tends to make it unrepresentative of the
population.
b. Participation bias occurs primarily with surveys and polls; it
arises whenever people choose whether to participate.



4. Look for problems in defining or measuring the variables of
interest, which can make it difficult to interpret results.
5. Watch out for confounding variables that can invalidate the
conclusions of a study.
a. Are there viable alternate explanations of the results?
6. Consider the setting and the wording of questions in any survey,
looking for anything that might tend to produce inaccurate or
dishonest responses.
7. Check that the results are presented fairly in graphs and
concluding statements.
8. Stand back and consider the conclusions.
a. Did it achieve its goals?
b. Do conclusions make sense?
c. Do results have any practical significance?

