
Chapter 1

The Role of Statistics and the Data Analysis Process
What is statistics?

• the science of collecting, analyzing, and drawing conclusions from data
Why should one study statistics?
1. To be informed ...
a) Extract information from tables, charts and graphs
b) Follow numerical arguments
c) Understand the basics of how data should be gathered, summarized, and analyzed to draw statistical conclusions
Can dogs help patients with heart failure by reducing stress and anxiety?
When people take a vacation do they really leave work behind?
Why should one study statistics? (continued)
2. To make informed judgments
Many companies now require drug screening as a condition of employment. With these screening tests there is a risk of a false-positive reading. Is the risk of a false result acceptable?
If you choose a particular major, what are your chances of finding a job when you graduate?
3. To evaluate decisions that affect your life
What is variability?
Suppose you went into a convenience store to purchase a soft drink. Does every can on the shelf contain exactly 12 ounces?
NO – there may be a little more or less in the various cans due to the variability that is inherent in the filling process.
In fact, variability is almost universal!
It is variability that makes life interesting!!
The Data Analysis Process
1. Understand the nature of the problem
It is important to have a clear direction before gathering data.
2. Decide what to measure and how to measure it
It is important to carefully define the variables to be studied and to develop appropriate methods for determining their values.
3. Collect data
It is important to understand how data is collected because the type of analysis that is appropriate depends on how the data was collected!
4. Summarize data and perform preliminary analysis
This initial analysis provides insight into important characteristics of the data.
5. Perform formal analysis
It is important to select and apply the appropriate inferential statistical methods.
6. Interpret results
This step often leads to the formulation of new research questions.
Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from all high schools in the nation.
What term would be used to describe
“all high school graduates”?
Population
• The entire collection of
individuals or objects about which
information is desired

What do you call it when you collect data about the entire population?
• A census is performed to gather data about the entire population
GPA Continued:
Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from all high schools in the nation.
Why might we not want to use a census here?
If we didn’t perform a census,
what would we do?
Sample

• A subset of the population, selected for study in some prescribed manner

What would a sample of all high school graduates across the nation look like?

High school graduates from each state (region), ethnicity, gender, etc.
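One possible "prescribed manner" is a simple random sample, in which every individual has the same chance of being chosen. The sketch below is only an illustration: the population is a list of made-up GPAs, and the name `simple_random_sample` is hypothetical. It does not guarantee coverage of every state or group the way the slide suggests; that would call for a stratified design.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up GPAs standing in for "all graduates".
population = [round(rng.uniform(1.0, 4.0), 2) for _ in range(10_000)]

def simple_random_sample(values, n, rng):
    """Select n individuals without replacement, each equally likely."""
    return rng.sample(values, n)

sample = simple_random_sample(population, 1_000, rng)
print(len(population), len(sample))
```

The sample is a subset of the population, so every value in `sample` also appears in `population`.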
GPA Continued:
Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from a sample of high schools in the nation.
Once we have collected the data, what would we do with it?
Descriptive statistics
• the methods of organizing &
summarizing data

If the sample of high school GPAs contained 1,000 numbers, how could the data be organized or summarized?

• Create a graph
• State the range of GPAs
• Calculate the average GPA
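Two of the summaries listed above, the range and the average, can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; the eight GPAs are made up for illustration and are not real data.

```python
import statistics

# Made-up sample of GPAs (illustration only; not real data).
gpas = [2.1, 2.8, 3.0, 3.0, 3.4, 3.7, 3.9, 2.5]

lowest, highest = min(gpas), max(gpas)   # the range of GPAs
average = statistics.mean(gpas)          # the average GPA

print(f"range: {lowest} to {highest}")
print(f"average: {average:.2f}")
```

The same code works unchanged on a list of 1,000 values.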
GPA Continued:
Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from a sample of high schools in the nation.
Could we use the data from our sample to answer this question?
Inferential statistics
• involves making generalizations from
a sample to a population
Based on the sample, if the average GPA for high school
graduates was 3.0, what generalization could be made?

The average national GPA for this year’s high school graduates is approximately 3.0.
Could someone claim that the average GPA for graduates in your local school district is 3.0?
No. Generalizations based on the results of a sample can only be made back to the population from which the sample came.
Be sure to sample from the population of interest!!
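The idea of generalizing from a sample to its population can be sketched with a simulation. Everything here is an assumption made for illustration: the population is synthetic (roughly centered at 3.0), and in a real study the population average would be unknown unless a census were taken.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Synthetic population of GPAs, clipped to the 0.0-4.0 scale (not real data).
population = [min(4.0, max(0.0, rng.gauss(3.0, 0.5))) for _ in range(50_000)]

# Draw a random sample from the population of interest.
sample = rng.sample(population, 1_000)

estimate = statistics.mean(sample)     # what the sample lets us compute
truth = statistics.mean(population)    # normally unavailable without a census

print(f"sample mean: {estimate:.2f}, population mean: {truth:.2f}")
```

The sample mean says something about this population only. It gives no basis for a claim about a different group, such as one local school district.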
Variable
• any characteristic whose value may
change from one individual to
another

• Suppose we wanted to know the average GPA of high school graduates in the nation this year.
Define the variable of interest.
The variable of interest is the GPA of high school graduates
Is this a variable . . .
The number of wrecks per week at the intersection outside school?
YES
Data
• The values for a variable from
individual observations

For this variable . . .


The number of wrecks per week at
the intersection outside . . . What
could observations be?
0, 1, 2, …
Two types of variables
• categorical
• numerical
– discrete
– continuous
Categorical variables
• Qualitative

• Identifies basic differentiating characteristics of the population

Can you name any categorical variables?
Numerical variables
• quantitative

• observations or measurements take on numerical values

• makes sense to average these values
• two types - discrete & continuous

Can you name any numerical variables?
Discrete (numerical)
• Isolated points along a number line

• usually counts of items


Continuous (numerical)
• Variable that can be any value in a
given interval

• usually measurements of something


Identify the following variables:
1. the color of cars in the teacher’s lot
Categorical

2. the number of calculators owned by students at your school
Discrete numerical

3. the zip code of an individual
Categorical

4. the amount of time it takes students to drive to school
Continuous numerical

5. the appraised value of homes in your city
Is money a measurement or a count?
Discrete numerical
Classifying variables by the
number of variables in a data set
Suppose that the PE coach records the
height of each student in his class.

This is an example of univariate data

Univariate - data that describes a single characteristic of the population
Classifying variables by the
number of variables in a data set
Suppose that the PE coach records the
height and weight of each student in his
class.
This is an example of bivariate data

Bivariate - data that describes two characteristics of the population
Classifying variables by the
number of variables in a data set
Suppose that the PE coach records the
height, weight, number of sit-ups, and
number of push-ups for each student in
his class.
This is an example of multivariate data

Multivariate - data that describes more than two characteristics (beyond the scope of this course)
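The three cases differ only in how many values are recorded per student. The sketch below is a hypothetical illustration: the numbers are made up, and the helper names `variables_per_observation` and `classify` are not from the course.

```python
# Made-up records for three students (illustration only).
univariate = [64, 70, 67]                       # height
bivariate = [(64, 120), (70, 155), (67, 140)]   # (height, weight)
multivariate = [                                # (height, weight, sit-ups, push-ups)
    (64, 120, 30, 12),
    (70, 155, 42, 20),
    (67, 140, 35, 15),
]

def variables_per_observation(data):
    """Count how many characteristics are recorded for each individual."""
    first = data[0]
    return len(first) if isinstance(first, tuple) else 1

def classify(data):
    """Label a data set by its number of variables."""
    k = variables_per_observation(data)
    return {1: "univariate", 2: "bivariate"}.get(k, "multivariate")

for name, data in [("heights", univariate),
                   ("height/weight", bivariate),
                   ("fitness", multivariate)]:
    print(name, "->", classify(data))
```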
Frequency Distribution and Bar
Charts for Categorical Data
Frequency distribution for categorical data: a table that displays the possible categories along with the associated frequencies and/or relative frequencies.
Frequency: the frequency for a particular category is the number of times the category appears in the data set.
Relative frequency: the relative frequency for a particular category is calculated as follows:
Relative frequency = frequency / number of observations in the data set
The relative frequency for a particular category is the proportion of the observations that belong to that category.
Relative frequency distribution: a frequency distribution that includes relative frequencies.
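The definitions above translate directly into code. This is a minimal sketch using `collections.Counter` from the Python standard library; the car colors are made-up observations and the function name `frequency_distribution` is hypothetical.

```python
from collections import Counter

# Made-up categorical observations (illustration only).
colors = ["red", "blue", "red", "white", "blue", "red", "black", "white"]

def frequency_distribution(observations):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(observations)
    n = len(observations)  # number of observations in the data set
    return {category: (count, count / n) for category, count in counts.items()}

dist = frequency_distribution(colors)
for category, (freq, rel) in dist.items():
    print(f"{category:<6} {freq:>3} {rel:.3f}")
```

Because each relative frequency is a proportion of the same total, the relative frequencies add up to 1.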
Bar Chart
When to Use
Categorical data

How to construct
– Draw a horizontal line; write the categories or
labels below the line at regularly spaced
intervals
– Draw a vertical line; label the scale using
frequency or relative frequency
– Place equal-width rectangular bars above each
category label with a height determined by its
frequency or relative frequency
Bar Chart (continued)
What to Look For
Frequently or infrequently occurring categories

Collect the following data and then display the data in a bar chart:
What is your favorite ice cream flavor?
Vanilla, chocolate, strawberry, or other
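For a quick check before drawing the chart by hand, the construction can be sketched as a text chart. Note the assumptions: the responses are made up, the function name `text_bar_chart` is hypothetical, and the bars run horizontally (one row per category) rather than vertically as in the steps above. Each bar's length equals the category's frequency.

```python
from collections import Counter

# Made-up class responses (illustration only).
responses = ["vanilla", "chocolate", "chocolate", "other", "strawberry",
             "chocolate", "vanilla", "other", "chocolate", "vanilla"]
categories = ["vanilla", "chocolate", "strawberry", "other"]

def text_bar_chart(observations, categories):
    """Return one line per category; bar length equals the frequency."""
    counts = Counter(observations)
    width = max(len(c) for c in categories)  # align the category labels
    return [f"{c:<{width}} | {'#' * counts[c]} {counts[c]}" for c in categories]

chart = text_bar_chart(responses, categories)
print("\n".join(chart))
```

The longest and shortest bars point to the frequently and infrequently occurring categories.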


Dotplot
When to Use
Small numerical data sets

How to construct
– Draw a horizontal line and mark it with an
appropriate numerical scale
– Locate each value in the data set along the scale and represent it by a dot. If there are two or more observations with the same value, stack the dots vertically
Dotplot (continued)
What to Look For
– The representative or typical value
– The extent to which the data values spread out
– The nature of the distribution along the number line
– The presence of unusual values

Collect the following data and then display the data in a dotplot:

How many body piercings do you have?
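The construction steps for a dotplot can be sketched as a text plot. This is an illustration under stated assumptions: the responses are made up, the function name `text_dotplot` is hypothetical, and the column alignment only works for single-digit whole-number values.

```python
from collections import Counter

# Made-up responses (illustration only).
piercings = [0, 0, 1, 2, 2, 2, 3, 0, 4, 2, 1, 7]

def text_dotplot(values):
    """Return the lines of a dotplot: stacked dots above a numerical scale."""
    counts = Counter(values)
    scale = range(min(values), max(values) + 1)
    rows = []
    # Build from the tallest stack down so repeated values stack vertically.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in scale)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in scale))  # the horizontal scale
    return rows

plot = text_dotplot(piercings)
print("\n".join(plot))
```

In the printed plot, the tallest stack marks a typical value, the width of the scale shows the spread, and a dot far from the rest (here, 7) stands out as an unusual value.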
