
9/8/2022

Descriptive Statistics
◦ Helps us to make sense out of large data sets
(words or numbers)
◦ Organizes data in meaningful ways
◦ Easier to find trends, highs, lows, etc.

Data
◦ 4 different measurement scales of data
Nominal, Ordinal, Interval, and Ratio

◦ Important to know which type of data you have so you can perform the correct analysis

Nominal (Categorical) Data (QUALITATIVE)
◦ Numbers are for ‘naming’ purposes
◦ Does not indicate quantity or rank
◦ Categories must be mutually exclusive
(each observation can only be a part of one class or group)
◦ Mathematical calculations are meaningless
(only counting for each category)
◦ Examples:
- eye colour of students in a classroom
- hockey jersey numbers
- what program students are in
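
Since counting per category is the only calculation that makes sense for nominal data, a minimal Python sketch of that counting might look like the following. The eye-colour values are made up for illustration and are not from the course notes.

```python
from collections import Counter

# Hypothetical eye-colour observations for a small classroom (made-up data).
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

# Counting how many observations fall in each category is the only
# meaningful calculation for nominal data.
counts = Counter(eye_colours)

# The most frequent category (the mode) is a valid summary; a mean is not.
mode_colour, mode_count = counts.most_common(1)[0]

print(counts)
print(mode_colour, mode_count)
```

Each observation lands in exactly one category, which is what "mutually exclusive" requires.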


Ordinal Data (QUALITATIVE)
◦ Numbers/letters used to rank or show relative order
◦ Some mathematical calculations are possible
◦ Intervals between the values are not necessarily equal
◦ Examples:
- how much pain a person is in (9/10) vs (3/10)
- relative/absolute: does not matter
- satisfaction surveys

Interval Data
◦ Includes same characteristics as ordinal data but the intervals between numbers are EQUAL
(the difference between the choices is the same)
◦ Zero does not indicate absence of measurement
(i.e., 0 degrees on Celsius scale)
◦ Cannot be multiplied or divided
◦ Can be negative!
◦ Ratio data is a subset of interval data
◦ Examples:
- temperature scale (C or F)


Ratio Data
◦ Special type of Interval data
◦ Numbers can be compared to each other
◦ Can be multiplied and divided
◦ Zero represents the complete absence of value or a characteristic
(i.e., $0 represents the complete absence of money)
◦ Examples:
- money
- age, height, weight
- Kelvin scale temperature
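
To illustrate why interval data cannot be meaningfully multiplied or divided while ratio data can, here is a small Python sketch comparing the same two temperatures on the Celsius (interval) and Kelvin (ratio) scales. The two readings and the helper name `celsius_to_kelvin` are chosen for illustration only.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale Celsius reading to the ratio-scale Kelvin."""
    return celsius + 273.15

# Two illustrative readings in degrees Celsius.
cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale: 20 C is 10 degrees warmer.
difference = warm_c - cool_c

# The Celsius ratio comes out as 2.0, but it is not meaningful because
# 0 C is an arbitrary zero, not the absence of temperature.
naive_ratio = warm_c / cool_c

# Kelvin has a true zero, so this ratio is meaningful (about 1.035).
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

In other words, 20 C is not "twice as hot" as 10 C, even though the subtraction is valid.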
Other source:
http://www.graphpad.com/faq/viewfaq.cfm?faq=1089

Nominal (Categorical) Data
◦ Data is only classified
◦ Examples include: Eye color

Ordinal (Ranked) Data
◦ Data is ranked but intervals between data are not necessarily considered equal
◦ Examples include: Weather condition warnings, Road conditions

Interval Data
◦ There is a meaningful difference between values. Intervals are considered equal.
◦ The presence of zero does not represent absence of measurement
◦ Examples include: Temperature, Dress size

Ratio Data
◦ 'Zero' represents the absence of value
◦ The ratio between two values is illustrated (can be multiplied and divided)
◦ Examples include: Income, Age

What are some problems with collecting the various forms of data:
◦ Nominal
easiest to collect
◦ Ordinal
easy to collect; able to make less significant calculations; put in a
range, not exact
◦ Interval

◦ Ratio
hardest to collect
more meaningful; able to make significant calculations

Indicate if the following variables are
◦ Qualitative or quantitative
◦ Discrete or continuous
◦ Nominal, ordinal, interval, or ratio

Gender
nominal, discrete (qualitative)

Income
ratio, continuous (quantitative)

Credit score (good, average, low)
ordinal, discrete (qualitative)

Difference between ordinal and nominal: ordinal has an order!


Ungrouped/Raw Data
◦ Data not grouped or categorized
list of data

Grouped Data
◦ Data is categorized or placed in classes
Easier to use and understand

Frequency Distribution or Data Table


◦ Data is grouped into mutually exclusive classes
◦ Data points are then placed in one group or class

3 steps to making a Data Table

1. Determining Size and Number of Classes

Approach A: Intuitive Approach


◦ Common sense approach
◦ Great for data with ages or grades
ex. size of range? (1-10? 0-100?)

Approach B: Sturges Method
◦ Method of estimation used when data is unusual
◦ Optimal # of classes (OOC) = 1 + 3.33 × log(N)
(log is the base-10 logarithm; N is the number of data values)
◦ Optimal class width (OCW) = Range / OOC
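
The Sturges calculation can be sketched in a few lines of Python. This is only an illustration: N = 40 echoes the "midterm marks of 40 students" example, but the lowest and highest marks (35 and 95) are assumed values, and rounding the results to convenient whole numbers is a judgment call rather than part of the formula.

```python
import math

def optimal_classes(n: int) -> float:
    """Optimal number of classes (OOC) = 1 + 3.33 * log10(N), as on the slide.

    The slide uses 3.33; the more precise constant is about 3.322,
    which comes from rewriting 1 + log2(N) in base 10.
    """
    return 1 + 3.33 * math.log10(n)

def optimal_width(data_range: float, n_classes: float) -> float:
    """Optimal class width (OCW) = Range / OOC."""
    return data_range / n_classes

n = 40                      # number of data values
lowest, highest = 35, 95    # assumed marks, for illustration only
data_range = highest - lowest

ooc = optimal_classes(n)
ocw = optimal_width(data_range, ooc)

print(f"OOC = {ooc:.2f} (about {round(ooc)} classes)")
print(f"OCW = {ocw:.2f} (a width of 10 would be a convenient choice)")
```

The formula gives a starting estimate; the final number of classes and the width are usually adjusted to round, readable values.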


Examples - pages 10 & 11 of course notes package

◦ Temperature of last 50 days

◦ Midterm marks of 40 students

◦ NHL 2009-2010 team statistics


2. Determining Class Boundaries

Class Boundaries:
◦ Set clear limits; no overlap (mutually exclusive)
◦ Every observation must be included (exhaustive)
◦ Boundaries may depend on data type
- e.g. 10 to 20: 10 is the lower class boundary, 20 is the upper class boundary


Set #1      Set #2      Set #3      Set #4
0 – 10      0 – 10      0 – 9       0 to under 10
10 – 20     11 – 20     10 – 19     10 to under 20
20 – 30     21 – 30     20 – 29     20 to under 30

Set #1: X (10 falls into more than 1 class)
Set #2: X (different widths in each class)
Set #3: Good (good for discrete values)
Set #4: Good (good for continuous data)
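
To show how "X to under Y" boundaries keep classes mutually exclusive, here is a small Python sketch that assigns values to classes of width 10 starting at 0. The function names and the test values are hypothetical, chosen only to show what happens exactly at a boundary.

```python
def class_index(value: float, lowest: float, width: float) -> int:
    """Index of the 'lower to under upper' class that contains value."""
    return int((value - lowest) // width)

def class_label(index: int, lowest: float, width: float) -> str:
    """Human-readable label for the class at the given index."""
    lower = lowest + index * width
    return f"{lower:g} to under {lower + width:g}"

# A boundary value such as 10 belongs to exactly one class.
for value in (0, 9.99, 10, 19.5, 20):
    index = class_index(value, lowest=0, width=10)
    print(value, "->", class_label(index, lowest=0, width=10))
```

Because the upper boundary is excluded from each class, a value of 10 falls only in "10 to under 20", which avoids the overlap problem of a scheme like 0 – 10, 10 – 20.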


3. Determining Class Frequencies

Absolute Frequencies (f)


◦ Count number of values in each class
◦ Sum of absolute frequencies must = N

Relative Frequencies
◦ Absolute Frequency divided by N
◦ Sum of relative frequencies always equals 1
Percent Frequencies
◦ Relative frequency multiplied by 100%
◦ Sum of percent frequencies = 100%
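
The three kinds of frequency can be computed together. The Python sketch below is one possible way to do it, using ten made-up values and three "X to under Y" classes of width 10; it assumes every value falls inside one of the classes (the classes are exhaustive).

```python
def frequency_table(values, lowest, width, n_classes):
    """Return rows of (class label, absolute f, relative f, percent f)."""
    n = len(values)
    counts = [0] * n_classes
    for value in values:
        # Assumes lowest <= value < lowest + width * n_classes.
        counts[int((value - lowest) // width)] += 1

    rows = []
    for i, f in enumerate(counts):
        lower = lowest + i * width
        label = f"{lower:g} to under {lower + width:g}"
        rows.append((label, f, f / n, 100 * f / n))
    return rows

# Hypothetical data values, for illustration only.
data = [5, 12, 18, 21, 25, 7, 14, 29, 10, 3]
table = frequency_table(data, lowest=0, width=10, n_classes=3)

for label, absolute, relative, percent in table:
    print(f"{label:<15} {absolute:>3} {relative:>6.2f} {percent:>6.1f}%")
```

The three checks from the slide apply directly: the absolute frequencies sum to N, the relative frequencies sum to 1, and the percent frequencies sum to 100%.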


Class exercise – pages 13 and 14

Class    Absolute Frequency    Relative Frequency    Percent Frequency


From the TV show “How I Met Your Mother”

The character is using a pie chart to show his favorite bars and a bar graph to show his favorite pies.

(one of the funniest moments in TV history)


1 & 2. Bar Graphs or Column Graphs
Vertical (Y) axis represents magnitude
◦ Nominal data along the x-axis
◦ Graph can be clustered or stacked to show 2 or more data sets


3. Pie Charts
Used when intent is to focus on one bar/age group
Emphasizes parts or clearly marks proportions


4. Line Graphs
Can be used when both axes contain interval scale data
◦ Typically x-axis represents time
◦ Good to show change over time
[Figure: line graph, “Edmonton Oilers Wins per Season”; y-axis: Wins; x-axis: seasons 2005/06 to 2014/15]


5. Scatter Plot Graphs
Typically uses interval scale numbers for both the x and y axes
◦ Plots one variable against the other
◦ Regression lines can be drawn to show any trend in the data
Do wins increase as salary increases?
How do wins change with a change in salary?


Scatter Plots – page 19

Major League Baseball Salaries
2010 MLB Salaries by Team

TEAM                    TOTAL PAYROLL    Wins
Arizona Diamondbacks       60,718,166      65
Atlanta Braves             84,423,666      91
Baltimore Orioles          81,612,500      66
Boston Red Sox            162,447,333      89
Chicago Cubs              146,609,000      75
Chicago White Sox         105,530,000      88

Complete data available in course package
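
As a rough illustration of how a regression (trend) line is obtained, the Python sketch below fits a least-squares line to the six teams listed in the table. It uses only those six rows, not the complete data set from the course package, so the resulting line is an illustration rather than the result for the full league.

```python
# (team, total payroll in dollars, wins): the six rows shown in the table.
teams = [
    ("Arizona Diamondbacks", 60_718_166, 65),
    ("Atlanta Braves", 84_423_666, 91),
    ("Baltimore Orioles", 81_612_500, 66),
    ("Boston Red Sox", 162_447_333, 89),
    ("Chicago Cubs", 146_609_000, 75),
    ("Chicago White Sox", 105_530_000, 88),
]

payroll_millions = [payroll / 1_000_000 for _, payroll, _ in teams]
wins = [w for _, _, w in teams]

n = len(teams)
mean_x = sum(payroll_millions) / n
mean_y = sum(wins) / n

# Ordinary least squares: slope = S_xy / S_xx, intercept = mean_y - slope * mean_x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(payroll_millions, wins))
s_xx = sum((x - mean_x) ** 2 for x in payroll_millions)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"wins = {intercept:.1f} + {slope:.3f} * payroll (millions)")
```

A positive slope means wins tend to rise with payroll in this small sample; the slope itself answers "how do wins change with a change in salary?" in wins per additional million dollars.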


Scatter Plot Graphs


◦ A regression (trend) line can be drawn to show how, as Salary increases, Wins also increase.
[Figure: scatter plot, “Baseball Salary v. Wins”; x-axis: Salary (Millions), 0 to 250; y-axis: Wins, 0 to 120]


6. Histograms
◦ Special bar graph that represents the frequency
distribution of continuous (interval or ratio)
variables.
◦ No gaps between the columns!
[Figure: two charts, “Incomplete Histogram” and “Histogram”; x-axis classes: 0 to under 50, 50 to under 100, 100 to under 150, 150 to under 200, 200 to under 250; y-axis: 0 to 20]

nominal data is distinct and separate no spacing


starts as a line graph, bar graph, etc
