
Presenting and summarizing data
Spori Goran, PhD.
http://kif.hr/predmet/mki
http://www.science4performance.com/
Statistics ?
The Science of collecting,
organizing, analyzing,
interpreting and presenting data

Topics we will review
Descriptive Statistics
Frequency Distributions and Histograms
Relative / Cumulative Frequency
Measures of Central Tendency
Mean, Median, Mode, Midrange

Topics (continued)
Measures of Dispersion (Variation)
Range, Standard Deviation,
Variance and Coefficient of variation
Shape
Symmetric, Skewed, using Box-and-Whisker Plots
Quartile
Statistical Relationships
Correlation, Covariance
Descriptive Statistics
A collection of quantitative measures and ways of describing data. This includes:
Frequency distributions & histograms,
measures of central tendency, and
measures of dispersion

Descriptive Statistics
Collect Data e.g. Survey
Present Data e.g. Tables and Graphs
Characterize Data e.g. Mean $\bar{x} = \frac{\sum x_i}{n}$

A Characteristic of a:
Population is a Parameter
Sample is a Statistic.
Collection of Data
Survey/questionnaires/interviews
Direct observation
Secondary data source (e.g., Medical charts)

Presenting Data
Graphics
The visual representation of data may be used not
only to present results/findings in the data, but
may also be used to learn about the data.
Summary Measures in Descriptive Statistics
Central Tendency
Mean, Median, Mode, Midrange
Quartile
Variation
Variance, Standard Deviation, Coefficient of Variation, Range
Measures of Central Tendency
Mean, Median, Mode, Midrange
The Mean (Arithmetic Average)
It is the Arithmetic Average of data values:
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
The Most Common Measure of Central Tendency
Affected by Extreme Values (Outliers)
[Figure: two dot plots, the first on a 0-10 axis with Mean = 5, the second on a 0-14 axis with Mean = 6]
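As a minimal sketch of the mean formula, the following Python snippet computes the arithmetic mean and shows how one extreme value pulls it upward. The two data lists are hypothetical values chosen for illustration; they are not taken from the slides.

```python
# Hypothetical data (an assumption for illustration, not from the slides).
data = [1, 3, 5, 7, 9]
data_with_outlier = [1, 3, 5, 7, 14]  # largest value replaced by an extreme one


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(arithmetic_mean(data))               # 5.0
print(arithmetic_mean(data_with_outlier))  # 6.0
```

Changing a single observation from 9 to 14 moves the mean from 5 to 6, which is what "affected by extreme values" means in practice.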
The Median
[Figure: two dot plots, the first on a 0-10 axis and the second on a 0-14 axis, both with Median = 5]
Important Measure of Central Tendency
In an ordered array, the median is the
middle number.
If n is odd, the median is the middle number.
If n is even, the median is the average of the 2
middle numbers.
Not Affected by Extreme Values
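The odd/even rule for the median can be sketched in Python as follows. The function name `median_of` and the sample lists are assumptions made for illustration only.

```python
def median_of(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n odd: the single middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average of the 2 middle numbers


# Hypothetical data (not from the slides).
odd_sample = [1, 3, 5, 7, 9]
odd_sample_with_outlier = [1, 3, 5, 7, 14]
even_sample = [1, 3, 5, 7]

print(median_of(odd_sample))               # 5
print(median_of(odd_sample_with_outlier))  # 5
print(median_of(even_sample))              # 4.0
```

The extreme value 14 leaves the median at 5, in contrast to the mean.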
The Mode
[Figure: dot plot on a 0-14 axis, Mode = 9]
A Measure of Central Tendency
Value that Occurs Most Often
Not Affected by Extreme Values
There May Not be a Mode
There May be Several Modes
Used for Either Numerical or Categorical Data
[Figure: dot plot on a 0-6 axis, No Mode]
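A small Python sketch of the mode, covering the three cases listed above: one mode, no mode, and several modes. The convention that data in which every value occurs once has no mode follows the "No Mode" example; the function name and the data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list if every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical data (not from the slides).
print(modes([0, 1, 3, 5, 7, 9, 9, 14]))     # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))         # []
print(modes(["A", "B", "B", "O", "O"]))     # ['B', 'O']
```

The last call uses categorical data, for which the mean and median are not defined but the mode still is.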
Midrange
A Measure of Central Tendency
Average of Smallest and Largest Observation:
Midrange $= \frac{x_{\text{largest}} + x_{\text{smallest}}}{2}$
Affected by Extreme Value
[Figure: two dot plots on a 0-10 axis, each with Midrange = 5]
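The midrange uses only the two end points of the data, as this short Python sketch shows. The data lists are hypothetical and serve only to illustrate the sensitivity to an extreme value.

```python
def midrange(values):
    """Average of the smallest and largest observations."""
    return (min(values) + max(values)) / 2


# Hypothetical data (not from the slides).
print(midrange([1, 3, 5, 7, 9]))   # 5.0
print(midrange([1, 3, 5, 7, 14]))  # 7.5
```

Because only the minimum and maximum enter the calculation, one extreme observation shifts the midrange more strongly than it shifts the mean.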
Quartiles
Not a Measure of Central Tendency
Split Ordered Data into 4 Quarters
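Position of i-th Quartile: position of $Q_i = \frac{i(n+1)}{4}$
[Figure: ordered data split by $Q_1$, $Q_2$, $Q_3$ into four parts of 25% each]
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of $Q_1 = \frac{1(9+1)}{4} = 2.50$, midway between the 2nd and 3rd values (12 and 13), so $Q_1 = 12.5$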
Quartiles (continued)
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of $Q_3 = \frac{3(9+1)}{4} = 7.50$, midway between the 7th and 8th values (18 and 21), so $Q_3 = 19.5$
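The position rule i(n+1)/4 can be sketched in Python as below, using the ordered array from the quartile example. Treating a fractional position by linear interpolation between the two neighbouring values is an assumption; it coincides with averaging the two neighbours when the position ends in .5, as it does here.

```python
def quartile(ordered, i):
    """i-th quartile (i = 1, 2, 3) at 1-based position i(n+1)/4 of already ordered data."""
    n = len(ordered)
    pos = i * (n + 1) / 4
    lower = int(pos)     # whole part of the position
    frac = pos - lower   # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # Assumption: interpolate linearly between the two neighbouring values.
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered array from the quartile example

print(quartile(data, 1))  # 12.5
print(quartile(data, 2))  # 16.0
print(quartile(data, 3))  # 19.5
```

Python's standard `statistics.quantiles(data, n=4)` with its default "exclusive" method uses the same (n+1)-based positions and returns the same three values for this array.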
Measures of Dispersion (Variation)
Range
Variance
Population Variance, Sample Variance
Standard Deviation
Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation
Understanding Variation
The more spread out or dispersed the data, the larger the measures of variation
The more concentrated or homogeneous the data, the smaller the measures of variation
If all observations are equal, all measures of variation = Zero
All measures of variation are Nonnegative


The Range
Measure of Variation
Difference Between Largest & Smallest Observations:
Range $= x_{\text{Largest}} - x_{\text{Smallest}}$
Ignores How Data Are Distributed:
[Figure: two dot plots on a 7-12 axis with differently distributed points, each with Range = 12 - 7 = 5]
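A short Python sketch of the range, illustrating that two differently distributed data sets can share the same range. Both lists are hypothetical; only their end points (7 and 12) match the example above.

```python
def value_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)


# Hypothetical data sets with the same end points but different distributions.
evenly_spread = [7, 8, 9, 10, 11, 12]
clustered = [7, 10, 10, 10, 10, 12]

print(value_range(evenly_spread))  # 5
print(value_range(clustered))      # 5
```

Both calls return 5 even though the second data set is concentrated around 10, so the range alone says nothing about how the values between the extremes are distributed.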


Variance
Important Measure of Variation
Shows Variation About the Mean:
For the Population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$ (use N in the denominator)
For the Sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$ (use n - 1 in the denominator)
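The difference between the two denominators can be sketched in Python as follows. The data list is hypothetical (chosen so that the arithmetic is easy to follow) and the function names are illustrative.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data (not from the slides): mean = 5, sum of squared deviations = 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))  # 4.0  (32 / 8)
print(sample_variance(data))      # 32 / 7, roughly 4.571
```

For the same data, the sample variance is always at least as large as the population variance because it divides the same sum by a smaller number.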


Standard Deviation
Most Important Measure of Variation
Shows Variation About the Mean:
For the Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$ (use N in the denominator)
For the Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$ (use n - 1 in the denominator)


Sample Standard Deviation
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
For the Sample: use n - 1 in the denominator.
Data $X_i$: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
Comparing Standard Deviations
Data $X_i$: 10 12 14 15 17 18 18 24
N = 8, Mean = 16
Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} = 4.0311$
Value for the Standard Deviation is larger for data considered as a Sample.
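To illustrate why the sample value exceeds the population value, here is a minimal Python sketch that computes both standard deviations from the same sum of squared deviations. The data list is hypothetical (not the data set on the slide), and the helper name `std_dev` is an assumption.

```python
import math


def std_dev(values, sample=True):
    """Standard deviation; divides by n - 1 for a sample, by N for a population."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    denominator = n - 1 if sample else n
    return math.sqrt(sum_sq / denominator)


# Hypothetical data: mean = 5, sum of squared deviations = 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = std_dev(data, sample=True)       # sqrt(32 / 7), roughly 2.138
sigma = std_dev(data, sample=False)  # sqrt(32 / 8) = 2.0

print(round(s, 4), round(sigma, 4))
```

The standard library functions `statistics.stdev` and `statistics.pstdev` implement the sample and population versions respectively and can be used to cross-check the result.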
Comparing Standard Deviations
[Figure: three dot plots on a shared 11-21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
Measure of Relative Variation
Always a %
Shows Variation Relative to Mean
Used to Compare 2 or More Groups
Formula (for Sample): $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Comparing Coefficient of Variation
Group A: Average Health Measure = 50, Standard Deviation = 5
Group B: Average Health Measure = 100, Standard Deviation = 5
$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Coefficient of Variation:
Group A: CV = (5 / 50) · 100% = 10%
Group B: CV = (5 / 100) · 100% = 5%
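The comparison of Group A and Group B can be reproduced with a few lines of Python. The function name is illustrative; the means and standard deviations are the ones given for the two groups.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Group A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Group B

print(cv_a)  # 10.0
print(cv_b)  # 5.0
```

Both groups have the same standard deviation, but relative to its mean Group A varies twice as much as Group B.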
Shape
Describes How Data Are Distributed
Measures of Shape:
Symmetric or skewed
Left-Skewed (Negatively Skewed): Mean < Median < Mode, skewness < -1
Symmetric: Mean = Median = Mode, -0.5 < skewness < 0.5
Right-Skewed (Positively Skewed): Mode < Median < Mean, skewness > 1
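Assuming the thresholds on this slide refer to a skewness coefficient, one way to put a number on shape is sketched below in Python. The slides do not say which coefficient is meant; the moment-based formula used here (third central moment divided by the 1.5 power of the second) is an assumption, and the three data lists are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed choice of formula)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data (not from the slides).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 20]      # long tail to the right: mean 6 > median 3
left_skewed = [-20, -4, -3, -2, -1]  # mirror image: mean -6 < median -3

print(skewness(symmetric))     # 0.0
print(skewness(right_skewed))  # positive, above 1
print(skewness(left_skewed))   # negative, below -1
```

The sign of the coefficient agrees with the ordering of mean and median: the mean is pulled toward the long tail.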
Box-and-Whisker Plot
Graphical Display of Data Using
5-Number Summary
[Figure: box-and-whisker plot on a 4-12 axis marking $X_{\text{smallest}}$, $Q_1$, Median, $Q_3$, $X_{\text{largest}}$]
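The five numbers behind a box-and-whisker plot can be computed as in the following Python sketch. It uses the ordered array from the quartile example rather than the data plotted in the figure, and it assumes linear interpolation at fractional quartile positions.

```python
def quartile(ordered, i):
    """i-th quartile at 1-based position i(n+1)/4, interpolating between neighbours."""
    n = len(ordered)
    pos = i * (n + 1) / 4
    lower = int(pos)
    frac = pos - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def five_number_summary(values):
    """(smallest, Q1, median, Q3, largest) of the data."""
    ordered = sorted(values)
    return (
        ordered[0],
        quartile(ordered, 1),
        quartile(ordered, 2),  # Q2 is the median
        quartile(ordered, 3),
        ordered[-1],
    )


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # array from the quartile example
summary = five_number_summary(data)
print(summary)  # (11, 12.5, 16.0, 19.5, 22)
```

The box spans Q1 to Q3 (12.5 to 19.5) with a line at the median (16), and the whiskers extend to the smallest and largest values (11 and 22).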
Distribution Shape &
Box-and-Whisker Plots
[Figure: box-and-whisker plots for Left-Skewed, Symmetric, and Right-Skewed distributions, each marking $Q_1$, Median, $Q_3$]
Summary
Discussed Measures of Central Tendency
Mean, Median, Mode, Midrange
Quartiles
Addressed Measures of Variation
The Range, Interquartile Range, Variance,
Standard Deviation, Coefficient of Variation
Determined Shape of Distributions
Symmetric, Skewed, Box-and-Whisker Plot
Symmetric: Mean = Median = Mode
Left-Skewed: Mean < Median < Mode
Right-Skewed: Mode < Median < Mean