Slides Chp03 Stats 20221

biometry – bio220
chapter 3
describing data
1
• descriptive statistics
• numeric data:
– mean & median = location
– standard deviation, variance & IQR = spread
– (coefficient of variation = spread)
• categorical data:
– proportion
2
mean
• measure of location (center)
• sample mean: arithmetic average of sample
values
• example: undulation rates of 8 gliding snakes,
measured in Hertz (cycles per second)
3
mean
0.9, 1.4, 1.2, 1.2

1.3, 2.0, 1.4, 1.6
4
mean
Yᵢ : i-th observation
Ȳ : sample mean
n : number of observations (sample size)
5
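As a small worked example of these definitions, the sketch below computes Ȳ = (sum of all Yᵢ) / n for the eight snake undulation rates listed on the earlier slide. The function name `sample_mean` is only illustrative.

```python
# Undulation rates (Hz) of the 8 gliding snakes from the slide
rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]


def sample_mean(values):
    """Arithmetic average: sum of the observations divided by n."""
    n = len(values)
    return sum(values) / n


mean_rate = sample_mean(rates)
print(f"n = {len(rates)}, mean = {mean_rate:.3f} Hz")
```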

mean
6
standard deviation
• measure of spread
• s.d. = s = typical (roughly average) distance of the observations from the mean
• variance = s²
7
standard deviation
sample variance
sum of squares
why square?
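The formula on this slide did not survive extraction. Assuming the usual definition, s² = Σ(Yᵢ − Ȳ)² / (n − 1) and s = √s², a minimal sketch with the snake data looks like this. It also hints at "why square?": the raw deviations always cancel out to zero, so they cannot measure spread on their own.

```python
import math

# Undulation rates (Hz) of the 8 gliding snakes from the slide
rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

n = len(rates)
mean_rate = sum(rates) / n

# Deviation of each observation from the sample mean
deviations = [y - mean_rate for y in rates]

# Raw deviations sum to (numerically almost) zero
sum_dev = sum(deviations)

# Squaring removes the sign, so the deviations no longer cancel
sum_of_squares = sum(d ** 2 for d in deviations)

variance = sum_of_squares / (n - 1)   # sample variance, s^2
sd = math.sqrt(variance)              # sample standard deviation, s

print(f"sum of deviations = {sum_dev:.10f}")
print(f"sum of squares    = {sum_of_squares:.3f}")
print(f"variance s^2      = {variance:.3f} Hz^2")
print(f"s.d. s            = {sd:.3f} Hz")
```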
8
standard deviation
9
standard deviation
10
standard deviation
11
standard deviation
sample variance
sample s.d.
both are never negative (zero only when all values are identical)
12

the unbiased sample estimate of standard
deviation
• why divide by n-1?
• if we divide by n, on average:
sample var < population var
• the smaller the sample → the larger the
difference
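The claim above can be checked with a small simulation. This is only a sketch under assumed settings (a normal population with variance 1, samples of size n = 5, 20000 repetitions, a fixed random seed); none of these numbers come from the slides.

```python
import random

random.seed(1)   # fixed seed so the run is repeatable (arbitrary choice)

POP_VAR = 1.0    # assumed population variance (standard normal)
n = 5            # assumed sample size
reps = 20000     # assumed number of simulated samples

biased = []      # sum of squares divided by n
unbiased = []    # sum of squares divided by n - 1

for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((y - mean) ** 2 for y in sample)
    biased.append(ss / n)
    unbiased.append(ss / (n - 1))

mean_biased = sum(biased) / reps
mean_unbiased = sum(unbiased) / reps

print(f"population variance     : {POP_VAR}")
print(f"average when dividing n  : {mean_biased:.3f}")
print(f"average when dividing n-1: {mean_unbiased:.3f}")
```

With n = 5 the divide-by-n average should land near (n − 1)/n = 0.8 of the population variance, while the divide-by-(n − 1) average should land near 1.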
13
the unbiased sample estimate of standard
deviation
https://en.wikipedia.org/wiki/Bias_of_an_estimator
Don't try to fully understand this slide. This is just here to show you
that there is a mathematical proof.
sample variance, pop variance
14

the unbiased sample estimate of standard
deviation
• if we divide by n → sample variance becomes a
biased estimate of pop var (toward smaller
var)
• if we divide by n-1 → sample variance is an
unbiased estimate of pop var
[figure: sample variances (estimates for pop var) compared with the population var; panels: dividing by n, dividing by n - 1]
15
the normal distribution's
standard deviation
• for normal
distributions:
• Ȳ ± 2 s.d. covers ~95%
of the data
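The ~95% figure can be checked against the theoretical normal curve. The sketch below uses Python's standard-library `statistics.NormalDist` with a standard normal (mean 0, s.d. 1), which is an idealized population rather than a sample.

```python
from statistics import NormalDist

# Idealized normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of falling within 2 standard deviations of the mean
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

# The exact multiplier for 95% is about 1.96, which slides round to 2
within_196sd = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

print(f"within 2 s.d.   : {within_2sd:.4f}")
print(f"within 1.96 s.d.: {within_196sd:.4f}")
```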
16
www.mathsisfun.com
the issue of rounding
• how to round the mean and s.d.?
– 0.346789452 vs 0.3 vs 0.35
• too many decimal places → difficult to interpret
• too much rounding → error in downstream
calculations
• one approach: one more decimal place than
the original data
– if original: 7.2, 4.3, ... → mean 5.62
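A minimal sketch of this approach, using the snake data (recorded to one decimal place): the full-precision values are kept for later calculations, and rounding is applied only when reporting. The variable names are illustrative.

```python
import statistics

# Snake undulation rates (Hz), recorded to 1 decimal place
rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]
decimals_in_data = 1

# Keep full precision for any downstream calculation
mean_full = statistics.mean(rates)
sd_full = statistics.stdev(rates)

# Report with one more decimal place than the original data
report_places = decimals_in_data + 1
mean_text = f"{mean_full:.{report_places}f}"
sd_text = f"{sd_full:.{report_places}f}"

print(f"mean = {mean_text} Hz, s.d. = {sd_text} Hz")
```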
17
coefficient of variation
• comparing variation among very different
populations with very different means
• e.g. weight of elephants vs weight of humans
– which one is more variable? which one has higher
variance?
18
• if means are different → s.d. tends to be
different
– sd elephant weight > sd human weight
– sd elephant weight > sd mouse weight
just because units / scales are larger
• how to normalize the effect of the mean, to
reflect variability?
19
• coefficient of variation (cv) → a standardized
measure for variability
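The formula for this slide is missing from the extracted text. Assuming the usual definition, CV = 100% × s / Ȳ, the sketch below shows why it is "standardized": expressing the same snake data in different units (Hz vs. mHz, a conversion added here purely for illustration) changes the s.d. a thousandfold but leaves the CV unchanged.

```python
import statistics

# Snake undulation rates in Hz (from the slide) and the same data in mHz
rates_hz = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]
rates_mhz = [y * 1000 for y in rates_hz]


def coefficient_of_variation(values):
    """CV in percent: sample s.d. divided by the sample mean, times 100."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


sd_hz = statistics.stdev(rates_hz)
sd_mhz = statistics.stdev(rates_mhz)
cv_hz = coefficient_of_variation(rates_hz)
cv_mhz = coefficient_of_variation(rates_mhz)

print(f"s.d.: {sd_hz:.3f} Hz vs {sd_mhz:.1f} mHz")
print(f"CV  : {cv_hz:.1f}% vs {cv_mhz:.1f}%")
```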
20
mean and sd from freq tables
21
• for the mean: multiply each value by its freq, sum, then divide by n
• for the sd: multiply each squared difference from the mean
by its freq, sum, divide by n - 1, then take the square root
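The two rules can be sketched as follows. The frequency table here is not from the slides; it is simply the snake data tallied by value, so the results can be compared with the direct calculation.

```python
import math

# Frequency table: value -> how many times it occurs
# (the 8 snake undulation rates, tallied by value)
freq_table = {0.9: 1, 1.2: 2, 1.3: 1, 1.4: 2, 1.6: 1, 2.0: 1}

# n is the sum of the frequencies
n = sum(freq_table.values())

# Mean: each value weighted by its frequency
mean = sum(value * freq for value, freq in freq_table.items()) / n

# Sum of squares: each squared difference weighted by its frequency
sum_of_squares = sum(freq * (value - mean) ** 2
                     for value, freq in freq_table.items())

variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)

print(f"n = {n}, mean = {mean:.3f}, s.d. = {sd:.3f}")
```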
22
mean
standard
deviation
23
median and interquartile range
• median = middle point = 50th percentile = 0.5
quantile
• Y_(i) = i-th value after sorting from smallest to largest
• if n is odd:
• median = Y_((n+1)/2)
• if n is even:
• median = ( Y_(n/2) + Y_(n/2+1) ) / 2
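A minimal sketch of the odd/even rule. Note that Python lists are indexed from 0, so the slide's position k corresponds to index k - 1.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2, which is zero-based index n // 2
        return ordered[mid]
    # Even n: average of positions n/2 and n/2 + 1 (indices mid-1 and mid)
    return (ordered[mid - 1] + ordered[mid]) / 2


# Snake undulation rates (Hz) from the slide: n = 8, so the even rule applies
rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]
median_rate = median(rates)
print(f"median = {median_rate:.2f} Hz")
```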
24
• first quartile = 25th percentile = 0.25 quantile
• third quartile = 75th percentile = 0.75 quantile
25
• quartiles: divide data into 4 pieces
• interquartile range (IQR) = third – first quartile
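Quartiles can be computed with `statistics.quantiles` from Python's standard library. There are several conventions for interpolating quartiles, and the slides do not say which one the course uses, so the values below (from the library's default "exclusive" method) may differ slightly from the textbook's.

```python
import statistics

# Snake undulation rates (Hz) from the slide
rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

# n=4 cuts the data into 4 pieces and returns the 3 cut points
q1, q2, q3 = statistics.quantiles(rates, n=4)

iqr = q3 - q1   # interquartile range: third minus first quartile

print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}")
print(f"IQR = {iqr:.2f} Hz")
```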
26
boxplot
Normally the whiskers extend to the min and max values. If there are values outside the
upper and lower limits (Q3 + 1.5*IQR and Q1 - 1.5*IQR), they are plotted as outliers and the whiskers
extend only to the limits.
https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
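The 1.5*IQR rule can be sketched without any plotting. The helper names `box_limits` and `find_outliers` are illustrative, and the quartiles use the default convention of `statistics.quantiles`, which may differ from the plotting software used in the course.

```python
import statistics


def box_limits(values):
    """Lower and upper limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Values that fall outside the 1.5*IQR limits."""
    lower, upper = box_limits(values)
    return [y for y in values if y < lower or y > upper]


# Snake undulation rates (Hz) from the slide
rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

lower_limit, upper_limit = box_limits(rates)
outliers = find_outliers(rates)

print(f"limits: {lower_limit:.3f} to {upper_limit:.3f}")
print(f"outliers: {outliers}")
```

For the snake data no value falls outside the limits, so under this rule the whiskers would simply run to the min and max.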
27
boxplot
28
boxplot
29
mean vs median
• we can guess the distribution shape just by
studying some statistics
• when the distribution is symmetric → mean ~
median; the mean is preferred
• when asymmetric → mean ≠ median; we usually use the
median, and each represents different properties
of the data
• their difference indicates the direction of the tail
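As a rough illustration of the last bullet, the sketch below compares the mean and the median and reports which side the tail is likely on. This is a rule of thumb rather than a formal test, and the function name `tail_direction` is illustrative.

```python
import statistics


def tail_direction(values):
    """Guess the tail direction: the mean is pulled toward the tail."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none (roughly symmetric)"


# Snake undulation rates (Hz) from the slide
rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]

direction = tail_direction(rates)
print(f"mean = {statistics.mean(rates):.3f}, "
      f"median = {statistics.median(rates):.3f}, tail: {direction}")
```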
30
comparing 3 distributions
31
comparing 3 distributions
32
mean vs median:
the mean will be closer to the tail
33
mean vs median:
the mean will be closer to the tail
34
proportions
• proportion: location for categorical variables
35
proportions
36
proportions
same as mean,
assuming
one category = 1,
others = 0
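A small sketch of this equivalence. The category labels below are made up purely for illustration and are not data from the course.

```python
# Hypothetical categorical sample (illustrative labels only)
flowers = ["red", "white", "red", "red", "white"]

# Proportion computed directly: count in the category divided by n
n = len(flowers)
proportion_red = flowers.count("red") / n

# Same thing as a mean: code the category of interest as 1, others as 0
indicators = [1 if f == "red" else 0 for f in flowers]
mean_of_indicators = sum(indicators) / n

print(f"proportion red     = {proportion_red}")
print(f"mean of 0/1 coding = {mean_of_indicators}")
```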
37
summary
38
summary
39
summary
If add a constant C to every value:
Mean: increases by C
SD: unchanged (every value shifts by C, so the spread stays the same)
Variance: unchanged
Median: increases by C
IQR: unchanged (the box shifts by C, its length stays the same)
If multiply every value by C (e.g. C = 2):
Mean: multiplied by C (doubled)
SD: multiplied by |C| (doubled)
Variance: multiplied by C² (4 times)
Median: multiplied by C (doubled)
IQR: multiplied by |C| (the box moves and its length doubles)
40
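These rules can be checked numerically. The sketch below uses the snake data with C = 2 (chosen to match the "doubled" notes) and the default quartile convention of `statistics.quantiles`; the helper name `summarize` is illustrative.

```python
import statistics

# Snake undulation rates (Hz) from the earlier slide
rates = [0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6]
C = 2.0   # constant used for both the shift and the scaling


def summarize(values):
    """Location and spread statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "variance": statistics.variance(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


original = summarize(rates)
shifted = summarize([y + C for y in rates])   # add C to every value
scaled = summarize([y * C for y in rates])    # multiply every value by C

for name in original:
    print(f"{name:9s} original={original[name]:.3f} "
          f"plus C={shifted[name]:.3f} times C={scaled[name]:.3f}")
```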
exercise
• please solve all problems at the end of
chapter 3
• ask your questions at office hours
41
