
MATH& 146

Lesson 3
Sections 1.1 and 1.2
Understanding Data

1
So What Is Statistics?
Let us begin with the Course Description:
"Introduction to the basic principles of probability,
descriptive statistics, and inferential statistics.
Topics include properties of probability, graphical
and tabular summaries of data, measures of
central tendency and variability, probability
distributions, confidence intervals, hypothesis
testing, and linear regression."

2
So What Is Statistics?
The Course Description is a bit wordy, so let's
make it simpler:
"Statistics is a way to get information from data."

3
The Statistical Process
No matter what line of work you select, you will
find yourself faced with decisions where an
understanding of data analysis is helpful.
Careful use of statistical methods (called the
statistical process) will enable you to use the data
to make informed decisions.

4
The Statistical Process
The steps of the statistical process are:
1) Carefully defining the research question
2) Gathering data
3) Accurately summarizing the data (known as
descriptive statistics)
4) Deriving and communicating meaningful
conclusions (known as inferential statistics)

5
Descriptive and Inferential
Statistics
The last two parts (descriptive and inferential statistics)
make up the two main branches of statistics:
Descriptive statistics involves methods of organizing,
picturing and summarizing information from data.
Inferential statistics involves methods of using
information from a sample to draw conclusions about
the population.

6
Some Random Factoids
There are 293 ways to make change for a dollar.
Heart attacks are more likely to occur on a
Monday.
Polar bears can eat as many as 86 penguins in
a single sitting.
Over 2500 left-handed people a year are killed
from using products made for right-handed
people.
Banging your head against a wall uses 150
calories an hour.
7
Change for a Dollar
For the research question "How many ways are
there to make change for a dollar?" we could find
an answer by making a list of all possibilities:

Way  Dollars  Half-Dollars  Quarters  Dimes  Nickels  Pennies  Total
  1        1             0         0      0        0        0  $1.00
  2        0             1         1      0        5        0  $1.00
  3        0             0         1      2        4       35  $1.00
  4        0             0         0      5       10        0  $1.00
...
293        0             0         0      0        0      100  $1.00

8
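Listing every possibility by hand is slow, so here is a minimal sketch of how the count could be checked in Python. It assumes the six coin types in the table above (including the dollar coin); the names `COINS` and `count_ways` are made up for this illustration and are not part of the lesson.

```python
# Coin values in cents: dollar, half-dollar, quarter, dime, nickel, penny.
COINS = [100, 50, 25, 10, 5, 1]

def count_ways(total_cents, coins=COINS):
    """Count the combinations of coins that add up to total_cents."""
    # ways[a] holds the number of ways to make amount a
    # using only the coin types processed so far.
    ways = [1] + [0] * total_cents
    for coin in coins:
        for amount in range(coin, total_cents + 1):
            ways[amount] += ways[amount - coin]
    return ways[total_cents]

print(count_ways(100))  # 293
```

Each combination is counted once regardless of the order of the coins, which matches the way the table lists one row per combination.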
Change for a Dollar
A table like the one below is called a data matrix, and is
a common way to organize information.
Each row (called a case) represents a different
way to make change for a dollar.

Way  Dollars  Half-Dollars  Quarters  Dimes  Nickels  Pennies  Total
  1        1             0         0      0        0        0  $1.00
  2        0             1         1      0        5        0  $1.00
  3        0             0         1      2        4       35  $1.00
  4        0             0         0      5       10        0  $1.00
...
293        0             0         0      0        0      100  $1.00

9
Change for a Dollar
Each column (called a variable) represents a
specific piece of information about a case.
For example, Case 3 used 1 quarter, 2 dimes, 4
nickels, and 35 pennies to make change for a
dollar.
Way  Dollars  Half-Dollars  Quarters  Dimes  Nickels  Pennies  Total
  1        1             0         0      0        0        0  $1.00
  2        0             1         1      0        5        0  $1.00
  3        0             0         1      2        4       35  $1.00
  4        0             0         0      5       10        0  $1.00
...
293        0             0         0      0        0      100  $1.00

10
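As a rough sketch, a data matrix can be held in Python as a list of cases, with each case a dictionary keyed by variable name. This is only one possible layout; the names `VALUES`, `data_matrix`, and `total_cents` are invented for this illustration, and only the five rows shown on the slide are included.

```python
# Value in cents of one coin for each variable (column) in the data matrix.
VALUES = {"Dollars": 100, "Half-Dollars": 50, "Quarters": 25,
          "Dimes": 10, "Nickels": 5, "Pennies": 1}

# The five cases (rows) shown on the slide; the other 288 are omitted.
data_matrix = [
    {"Way": 1, "Dollars": 1, "Half-Dollars": 0, "Quarters": 0,
     "Dimes": 0, "Nickels": 0, "Pennies": 0},
    {"Way": 2, "Dollars": 0, "Half-Dollars": 1, "Quarters": 1,
     "Dimes": 0, "Nickels": 5, "Pennies": 0},
    {"Way": 3, "Dollars": 0, "Half-Dollars": 0, "Quarters": 1,
     "Dimes": 2, "Nickels": 4, "Pennies": 35},
    {"Way": 4, "Dollars": 0, "Half-Dollars": 0, "Quarters": 0,
     "Dimes": 5, "Nickels": 10, "Pennies": 0},
    {"Way": 293, "Dollars": 0, "Half-Dollars": 0, "Quarters": 0,
     "Dimes": 0, "Nickels": 0, "Pennies": 100},
]

def total_cents(case):
    """Add up the value of all coins in one case."""
    return sum(case[name] * cents for name, cents in VALUES.items())

# Look up one case, then read its variables.
case_3 = data_matrix[2]
print(case_3["Quarters"], case_3["Dimes"], case_3["Nickels"], case_3["Pennies"])  # 1 2 4 35

# Every listed case should total exactly one dollar.
print(all(total_cents(case) == 100 for case in data_matrix))  # True
```

Reading across one dictionary gives a case; reading the same key in every dictionary gives a variable.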
Mondays and Heart Attacks
For the research question "Are heart attacks more
likely on Mondays?" we could survey hospitals for
heart attack patients and determine the day of the
week they were admitted. The data matrix could
look something like:
Patient  Day of week admitted
001      Tuesday
002      Friday
003      Monday
004      Tuesday
...
999      Monday

11
Mondays and Heart Attacks
Each case (row) represents a different patient,
and the variable (column) represents the
day of the week admitted.

Patient  Day of week admitted
001      Tuesday
002      Friday
003      Monday
004      Tuesday
...
999      Monday

12
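Once the data matrix exists, a first summary is a count of admissions per day. The sketch below assumes Python and uses only the five patients shown on the slide, so it illustrates the bookkeeping and cannot answer the research question; the names `admissions` and `day_counts` are invented for this example.

```python
from collections import Counter

# Only the five patients shown on the slide; a real survey would have many more.
admissions = [
    ("001", "Tuesday"),
    ("002", "Friday"),
    ("003", "Monday"),
    ("004", "Tuesday"),
    ("999", "Monday"),
]

# Tally the categorical variable "day of week admitted".
day_counts = Counter(day for _patient, day in admissions)

print(day_counts["Monday"])   # 2
print(day_counts["Tuesday"])  # 2
print(day_counts["Sunday"])   # 0 (a Counter reports 0 for values it never saw)
```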
Variables and Data
Variables are the characteristics of interest for each
person or thing in a population or sample.
Data are the actual values of the variable. They may
be numbers or they may be words.
Cases are the different rows of a data matrix, with each
case representing a different person or object.

13
Types of Variables
Variables can be put into the following categories:

All Variables
    Categorical
        Regular Categorical
        Ordinal
    Numerical
        Discrete
        Continuous

14
Categorical Variables
Categorical variables (also known as qualitative
variables) are the result of categorizing or describing
attributes of a population. Hair color, blood type, ethnic
group, the car a person drives, and the day of the
week that a person has a heart attack are all examples
of categorical variables.
Categorical data are generally described by words or
letters. For instance, hair color might be black, dark
brown, light brown, blonde, gray, or red. Blood type
might be AB+, O, B+, or Vulcan.

15
Ordinal Variables
Categorical variables that have a natural ordering to
them are called ordinal. Letter grades are an example of an
ordinal variable, since there is a natural order from best
(an A) to worst (an F).
To simplify analyses, any ordinal variables in this class
will be treated as categorical variables.

16
Numerical Variables
Numerical variables (also known as quantitative
variables) are the result of either counting or
measuring attributes of a population or sample.
Weight, height, and the number of penguins eaten by
a polar bear are examples of numerical variables.
Numerical data are always numbers and are usually
the data of choice because there are many methods
available for analyzing these data.
Note, however, that numbers are not always
numerical.
17
Numbers can be categorical
or numerical
For instance, ZIP Codes are categorical because
they are neither counts nor measurements.
One way to test whether numbers are numerical or
categorical is to try adding them. If it doesn't make
sense, then the numbers are likely categorical.

43210 (Columbus, OH) + 56048 (Janesville, MN) = 99258 (Spokane, WA)?
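The adding test can be tried directly. In this small Python sketch the ZIP Codes from the slide are stored as text, which is one common way to signal that they are labels; the dictionary name `zip_codes` is invented for this example.

```python
# ZIP Codes from the slide, stored as text because they are labels, not quantities.
zip_codes = {
    "Columbus, OH": "43210",
    "Janesville, MN": "56048",
    "Spokane, WA": "99258",
}

# Arithmetic only "works" if we force the labels into integers.
total = int(zip_codes["Columbus, OH"]) + int(zip_codes["Janesville, MN"])
print(total)  # 99258

# The sum happens to equal Spokane's ZIP Code, but that is a coincidence:
# adding two cities' codes does not measure or count anything.
print(str(total) == zip_codes["Spokane, WA"])  # True
```

The computer will add the numbers without complaint, so the check for whether the result makes sense has to come from the analyst.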
18
Discrete Variables
Numerical variables in which data are the result of
counting are called discrete variables.
For example, if you count the number of quarters
in your change, you might have 0, 1, 2, 3, etc.

19
Continuous Variables
Numerical variables in which data are the result of
measuring are continuous variables, assuming
that we can measure accurately.
If you and your friends carry backpacks with books
in them to school, the number of books in each
backpack is a discrete variable and the weight of
each backpack is a continuous variable. The color
of each backpack is a categorical variable.
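The backpack example can be sketched in Python as follows. The backpack values are hypothetical, made up only for illustration, and `variable_type` is an invented helper that guesses from how each value is stored (text, whole number, or decimal). It is a rough heuristic, not a rule from the lesson: a ZIP Code stored as a whole number would be mislabeled as discrete.

```python
def variable_type(values):
    """Rough guess at a variable's type from the Python type of its values."""
    if all(isinstance(v, str) for v in values):
        return "categorical"   # words or letters
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete"      # whole-number counts
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"    # measurements
    return "unknown"

# Hypothetical backpacks, made up for illustration only.
backpacks = [
    {"color": "red", "books": 3, "weight_lb": 11.5},
    {"color": "black", "books": 5, "weight_lb": 17.2},
    {"color": "blue", "books": 2, "weight_lb": 8.0},
]

for name in ("color", "books", "weight_lb"):
    print(name, variable_type([pack[name] for pack in backpacks]))
# color categorical
# books discrete
# weight_lb continuous
```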

20
Example 1
Determine the correct variable type: regular
categorical, ordinal, discrete, or continuous.
a) The number of pairs of shoes you own.
b) The type of car you drive.
c) A soldier's military rank.
d) The distance from your home to school.
e) Your highest level of education.
f) Movie ratings. (Careful with this one.)

21
Relationships Between
Variables
Many analyses are motivated by a researcher
looking for a relationship between two or more
variables. A social scientist may like to answer
some of the following questions:
a) Do counties with higher poverty rates receive
more federal spending per capita? (Num-Num)
b) Which counties have a higher average income:
those that enact one or more smoking bans or
those that do not? (Num-Cat)

22
Relationships Between
Variables
Scatterplots are one type of graph used to study the
relationship between two numerical variables. For
instance, the figure below compares the federal
spending per capita and the poverty rate (as a
percent) for each of the 3,143 counties in the country.

23
Relationships Between
Variables
Each point on the plot represents a single county. For
instance, the highlighted dot corresponds to County
1088 in the county data set: Owsley County,
Kentucky, which had a poverty rate of 41.5% and
federal spending of $21.50 per capita.

24
Relationships Between
Variables
The scatterplot suggests a relationship between the
two variables: counties with a high poverty rate also
tend to have slightly more federal spending. We might
brainstorm why this relationship exists and investigate
each idea to determine the most reasonable
explanation.

25
Relationships Between
Variables
The federal spending and poverty variables are said to
be associated because the plot shows a discernible
pattern. Two variables that show some connection
with one another are called associated, or
dependent, variables.

26
Relationships Between
Variables
When a graph has an upward trend, we say there is a
positive association. In the graph below, counties
with higher poverty rates tend to receive more federal
spending per capita.

27
Relationships Between
Variables
Likewise, two variables whose graph shows a downward
trend are said to be negatively associated.
If two variables are not associated, then they are said
to be independent. That is, two variables are
independent if there is no evident relationship between
the two.
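The lesson judges association by eye from a scatterplot. As a rough numerical stand-in, the sketch below uses the sign of the sample covariance to label a straight-line trend as upward or downward. This is a swapped-in technique, not the method used in these slides. The function name `association_direction` and the toy numbers are invented for illustration and are not real county data.

```python
def association_direction(xs, ys):
    """Return 'positive', 'negative', or 'none' from the sign of the sample covariance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance: average product of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"

# Hypothetical toy values, made up for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]
falling = [9, 7, 6, 4, 2]

print(association_direction(x, rising))   # positive
print(association_direction(x, falling))  # negative
```

With real data the covariance is almost never exactly zero, and a value near zero only rules out a straight-line trend, so a scatterplot is still needed before calling two variables independent.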

28
Example 2
The graph below shows a scatterplot of
homeownership versus the percent of units that are in
multi-unit structures (e.g. apartments, condos) for all
3,143 counties in the country. Are these variables
associated?

29
