2 Distributions
ioc.pdf
It provides simple summaries about the observations that have been made. Such
summaries may be either quantitative, i.e. summary statistics, or visual, i.e.
simple-to-understand graphs.
A central tendency
... is a typical or central value for a probability distribution.
Median
Mode
But also
Geometric mean
Trimean
etc.
Definition
The arithmetic mean is defined as the sum of the numerical values of all observations
divided by the total number of observations.
Symbolically, if we have a data set with values $x_1, x_2, \ldots, x_n$, then the arithmetic mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
If the data set is a statistical population (i.e., consists of every possible observation and
not just a subset of them), then the mean of that population is called the population
mean and denoted by µ = ∑ xi /n.
If the data set is a statistical sample (a subset of the population), we call the statistic
resulting from this calculation a sample mean and denote it by M = ∑ xi /N , where N is
the size of the sample.
Example
Let X = {2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400} be the set of monthly salaries
of the employees of a firm. The population mean is
$$\mu = \frac{2500 + 2700 + 2400 + 2300 + 2550 + 2650 + 2750 + 2450 + 2600 + 2400}{10} = 2530.$$
Selecting a sample of four values, say 2500, 2300, 2550 and 2450, we can compute the mean of
this sample:
$$M = \frac{2500 + 2300 + 2550 + 2450}{4} = \frac{9800}{4} = 2450.$$
Note that the mode may be very different from the mean and the median.
Example
Let X = {2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400}
Then
Mean: 2530
Median: 2525
Mode: 2400
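The three values above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean, median, mode

# Monthly salaries from the example above.
salaries = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]

salary_mean = mean(salaries)      # sum of the values divided by their count
salary_median = median(salaries)  # n is even, so the two middle values are averaged
salary_mode = mode(salaries)      # the most frequent value

print(salary_mean, salary_median, salary_mode)
```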
The mean, median, and mode are by far the most commonly used measures of central
tendency.
We consider here three additional measures of central tendency: the trimean, the geometric
mean, and the trimmed mean.
$$TM = \frac{Q_1 + 2Q_2 + Q_3}{4}$$
Example
Let X = {2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400}
Then
$$TM = \frac{2412.5 + 2 \cdot 2525 + 2637.5}{4} = \frac{10100}{4} = 2525$$
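A minimal sketch of the trimean computation follows. It assumes that the quartiles are obtained by linear interpolation between data points (the `"inclusive"` method of `statistics.quantiles`), which happens to reproduce the quartile values 2412.5, 2525 and 2637.5 used in the example; the lecture does not state which quartile convention it uses. The function name `trimean` is illustrative.

```python
from statistics import quantiles

def trimean(data):
    """Weighted average of the quartiles: (Q1 + 2*Q2 + Q3) / 4."""
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4

salaries = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]
print(trimean(salaries))
```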
margarita.spitsakova@taltech.ee ICY0006: Lecture 2 13 / 59
Geometric mean
The geometric mean is defined as the n-th root of the product of n numbers.
Formally,
$$G(x_1, \ldots, x_n) = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \sqrt[n]{x_1 \cdot \ldots \cdot x_n}$$
Example
Let X = {4, 1, 1/32}
Then
$$G(X) = \sqrt[3]{4 \cdot 1 \cdot 1/32} = \sqrt[3]{1/8} = 1/2$$
The geometric mean can be computed as the arithmetic mean of the data on a logarithmic scale,
followed by exponentiation to return the result to the original scale:
$$\left( \prod_{i=1}^{n} x_i \right)^{1/n} = \exp\left[ \frac{1}{n} \sum_{i=1}^{n} \ln x_i \right]$$
For example, working in base 2 (where $\exp_2(y) = 2^y$),
$$G(4, 1, 1/32) = \exp_2\left( \frac{1}{3} \left( \log_2 4 + \log_2 1 + \log_2 2^{-5} \right) \right) = \exp_2\left( \frac{1}{3} (2 + 0 - 5) \right) = \exp_2(-1) = 2^{-1} = \frac{1}{2}$$
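Both routes to the geometric mean can be compared numerically. The sketch below assumes strictly positive inputs (the logarithm is undefined otherwise) and uses the natural logarithm; the two function names are illustrative.

```python
import math

def geometric_mean_root(values):
    """n-th root of the product of the n values."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_log(values):
    """Arithmetic mean on the log scale, mapped back with exp."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

data = [4, 1, 1 / 32]
# Both results agree with 1/2 up to floating-point rounding.
print(geometric_mean_root(data), geometric_mean_log(data))
```

The logarithmic form is usually preferable for long data sets, because the running product can overflow or underflow while the sum of logarithms stays well scaled.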
Trimmed mean
A truncated mean or trimmed mean is the mean calculated after discarding given parts of a
probability distribution at the high and low ends.
Terminology:
A mean trimmed 10% is a mean computed with 10% of the scores trimmed off: 5% from
the bottom and 5% from the top.
A mean trimmed 50% is computed by trimming the upper 25% of the scores and the
lower 25% of the scores and computing the mean of the remaining scores.
Example
Consider the data set consisting of:
{92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41}
(n = 20, mean = 101.5)
The 5th percentile (−6.75) lies between −40 and −5, while the 95th percentile (148.6) lies
between 101 and 1053. Discarding the two values that fall outside these percentiles, −40 and
1053, gives the 10% trimmed mean:
{92, 19, 101, 58, 91, 26, 78, 10, 13, 101, 86, 85, 15, 89, 89, 28, −5, 41} (n = 18, mean = 56.5)
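The example can be reproduced with the sketch below. It trims by count rather than by interpolated percentile: for a trim of p percent it drops the lowest and highest `int(n * p / 200)` sorted values. That is an assumption made for simplicity; it gives the same result as the percentile-based description for this data set. The function name and its `percent` parameter are illustrative.

```python
def trimmed_mean(data, percent):
    """Mean after dropping percent/2 % of the values from each end.

    Assumes 0 <= percent < 100 so that at least one value remains.
    """
    ordered = sorted(data)
    k = int(len(ordered) * percent / 200)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

scores = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
          -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]

print(trimmed_mean(scores, 0))   # ordinary mean of all 20 values
print(trimmed_mean(scores, 10))  # mean after removing -40 and 1053
```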
Variability
... refers to how spread out a group of scores is.
The terms variability, spread, and dispersion are synonyms, and refer to how spread out a
distribution is.
Measures of variability:
There are four frequently used measures of variability:
range
interquartile range
variance
standard deviation
But also
etc.
Range
... is simply the highest score minus the lowest score.
Examples: On Quiz 1, the lowest score is 5 and the highest score is 9. Therefore,
the range is 4.
The range on Quiz 2 was larger: the lowest score was 4 and the highest
score was 10. Therefore the range is 6.
The range of the group of numbers 10, 2, 5, 6, 7, 3, 4 is 10 − 2 = 8.
The range of the 10 numbers 99, 45, 23, 67, 45, 91, 82, 78, 62, 51 is 99 − 23 = 76.
Interquartile range
The interquartile range (IQR) is the range of the middle 50% of the scores in a
distribution.
It is computed as follows:
IQR = Q3 − Q1
Using terminology of box plots, the interquartile range is referred to as the H-spread.
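The range and the interquartile range can be sketched as follows. The quartile convention is an assumption (linear interpolation, the `"inclusive"` method of `statistics.quantiles`); other conventions can give slightly different values of Q1 and Q3. The salary data set is reused from the earlier examples, and the function names are illustrative.

```python
from statistics import quantiles

def value_range(data):
    """Highest value minus lowest value."""
    return max(data) - min(data)

def interquartile_range(data):
    """IQR = Q3 - Q1, the spread of the middle 50% of the values."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

print(value_range([10, 2, 5, 6, 7, 3, 4]))
print(value_range([99, 45, 23, 67, 45, 91, 82, 78, 62, 51]))

salaries = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]
print(interquartile_range(salaries))
```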
Variability can also be defined in terms of how close the scores in the distribution are to
the middle of the distribution.
Using the mean as the measure of the middle of the distribution, the variance is defined as
the average squared difference of the scores from the mean.
Averages:
Scores: 7.0
Deviations from mean: 0.0
Absolute deviations: 1.0
Squared deviations: 1.5
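If you are computing variances with a hand calculator, the following formula is easier to
use:
$$\sigma^2 = \frac{1}{n} \left( \sum X^2 - \frac{(\sum X)^2}{n} \right)$$
For example, with $\sum X = 140$ and $n = 20$:
$$\frac{(\sum X)^2}{n} = \frac{140^2}{20} = 980$$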
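Property
$$\frac{1}{n} \sum (X - \mu)^2 = \frac{1}{n} \left( \sum X^2 - \frac{(\sum X)^2}{n} \right)$$
Proof
$$\begin{aligned}
\sum (X - \mu)^2 &= \sum \left( X - \frac{\sum X}{n} \right)^2 \\
&= \sum X^2 - \frac{2}{n} \sum X \sum X + \frac{1}{n^2} \sum \left( \sum X \right)^2 \\
&= \sum X^2 - \frac{2}{n} \left( \sum X \right)^2 + \frac{n}{n^2} \left( \sum X \right)^2 = \sum X^2 - \frac{1}{n} \left( \sum X \right)^2
\end{aligned}$$
Q.E.D.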
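The identity can also be checked numerically. The sketch below computes the population variance both ways on the salary data from the earlier examples; the function names are illustrative.

```python
import math

def variance_definition(data):
    """Average squared deviation from the mean."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

def variance_shortcut(data):
    """Computational form: (sum of squares - (sum)^2 / n) / n."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / n

salaries = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]
print(variance_definition(salaries), variance_shortcut(salaries))
```

In floating-point arithmetic the shortcut form subtracts two large, nearly equal numbers, so for data with a large mean and a small spread the definitional form is the numerically safer choice.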
If the variance in a sample is used to estimate the variance in a population, then the
previous formula underestimates the variance, and the following formula should be used instead:
$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$
where $s^2$ is the estimate of the variance, M is the sample mean, and N is the size of
the sample.
There is also an alternate formula to compute the estimate of the variance:
$$s^2 = \frac{1}{N - 1} \left( \sum X^2 - \frac{(\sum X)^2}{N} \right)$$
Definitions:
The sample standard deviation of a set of N sample data is the number s given by the formula
$$s = \sqrt{\frac{\sum (X - M)^2}{N - 1}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$$
The population standard deviation of a set of n population data is the number σ given by the
formula
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}$$
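The difference between the two divisors can be seen in a short sketch using Python's standard `statistics` module, where `pvariance` and `pstdev` divide by n and `variance` and `stdev` divide by N − 1. The data are the salary population and the four-value sample from the earlier mean example; treating them this way here is only for illustration.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

population = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]
sample = [2500, 2300, 2550, 2450]

# Population formulas: divide by n.
sigma2 = pvariance(population)
sigma = pstdev(population)

# Sample estimates: divide by N - 1.
s2 = variance(sample)
s = stdev(sample)

print(sigma2, sigma)
print(s2, s)
```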
A distribution is skewed if one tail extends out further than the other.
A distribution has a positive skew (is skewed to the right) if the tail to the right is longer.
It has a negative skew (skewed to the left) if the tail to the left is longer.
Pearson's measure:
$$\frac{3(\text{Mean} - \text{Median})}{\sigma}$$
Third moment:
$$\sum \frac{(X - \mu)^3}{\sigma^3}$$
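Both skew measures can be sketched in a few lines. Two assumptions are made here: σ is taken to be the population standard deviation, and the third-moment measure is averaged over the n observations, which is the usual convention, whereas the formula as it appears above shows only the sum. The function names and the two small data sets are illustrative.

```python
from statistics import mean, median, pstdev

def pearson_skew(data):
    """Pearson's measure: 3 * (mean - median) / sigma."""
    return 3 * (mean(data) - median(data)) / pstdev(data)

def moment_skew(data):
    """Standardized third moment, averaged over n (assumed convention)."""
    mu, sigma = mean(data), pstdev(data)
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)

right_tailed = [1, 2, 2, 3, 10]   # long tail to the right
left_tailed = [-10, -3, -2, -2, -1]  # mirror image, long tail to the left

print(pearson_skew(right_tailed), moment_skew(right_tailed))
print(pearson_skew(left_tailed), moment_skew(left_tailed))
```

Both measures are positive for the right-tailed data and negative for the left-tailed data, matching the definitions of positive and negative skew.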
Comparison of mean, median and mode of two log-normal distributions with different skewness.
The kurtosis measure shows how fat or thin the tails of a distribution are relative to a normal
distribution.
Square root
Power
Logarithmic
Cosine
$$10^2 = 100 \qquad \log_{10} 100 = 2$$
A series of numbers that increase proportionally will increase in equal amounts when
converted to logs.
For example
$$\log_{10}(10 \cdot 100) = \log_{10} 10 + \log_{10} 100 = 1 + 2 = 3$$
$$\log_{10}(100/10) = \log_{10} 100 - \log_{10} 10 = 2 - 1 = 1$$
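A small sketch illustrates the statement about proportional growth: a series in which each value is ten times the previous one becomes an evenly spaced series after taking base-10 logarithms. The series itself is an illustrative choice.

```python
import math

series = [1, 10, 100, 1000]            # each value is 10 times the previous one
logs = [math.log10(x) for x in series]

# Differences between consecutive logs: all (approximately) equal to 1.
steps = [b - a for a, b in zip(logs, logs[1:])]
print(logs, steps)

# Product and quotient rules from the example above.
product_log = math.log10(10 * 100)
quotient_log = math.log10(100 / 10)
print(product_log, quotient_log)
```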
$$Y = bX + A$$
will have
a mean of $b\mu + A$,
a standard deviation of $b\sigma$, and
a variance of $b^2 \sigma^2$.
C = 0.55556F − 17.7778
is used.
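The conversion formula above is a linear transformation with b = 0.55556 and A = −17.7778, so its effect on the mean, standard deviation and variance can be checked directly. The Fahrenheit readings below are hypothetical values chosen only for illustration, and population formulas are assumed.

```python
import math
from statistics import mean, pstdev, pvariance

b, A = 0.55556, -17.7778  # coefficients of C = 0.55556 F - 17.7778

fahrenheit = [32.0, 50.0, 68.0, 86.0, 104.0]  # hypothetical readings
celsius = [b * f + A for f in fahrenheit]

# The mean is scaled by b and shifted by A.
print(mean(celsius), b * mean(fahrenheit) + A)
# The standard deviation is scaled by b (b is positive here); the shift A has no effect.
print(pstdev(celsius), b * pstdev(fahrenheit))
# The variance is scaled by b squared.
print(pvariance(celsius), b ** 2 * pvariance(fahrenheit))
```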
The leaf contains the last digit of the number and the stem contains all of the other digits.
In the case of very large numbers, the data values may be rounded to a particular place
value.
key: 6|3=63
leaf unit: 1.0
stem unit: 10.0
4 |4 6 7 9
5 |
6 |3 4 6 8 8
7 |2 2 5 6
8 |1 4 8
9 |
10 |6
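A display like the one above can be generated with a short sketch. It assumes non-negative integers with a leaf unit of 1 and a stem unit of 10, and it prints stems that have no leaves so that gaps in the data stay visible. The data values are read off the display above; the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display (leaf = last digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # all digits but the last, and the last digit
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} |{row}")
    return rows

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
for line in stem_and_leaf(data):
    print(line)
```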
Sturges' rule: number of intervals = 1 + log2(N); gives 11 classes for 1000 observations.
Rice rule: twice the cube root of the number of observations; gives 20 classes for 1000
observations.
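The two rules can be sketched as follows. Rounding to the nearest integer is an assumption: the text only gives the results for 1000 observations, which are the same whether the value is rounded or rounded up. The function names are illustrative.

```python
import math

def sturges_bins(n):
    """Sturges' rule: 1 + log2(n), rounded to the nearest integer (assumed)."""
    return round(1 + math.log2(n))

def rice_bins(n):
    """Rice rule: twice the cube root of n, rounded to the nearest integer (assumed)."""
    return round(2 * n ** (1 / 3))

print(sturges_bins(1000), rice_bins(1000))
```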
Example: a task in which the goal is to move a computer cursor to a target on the screen
as fast as possible. There are two data sets for targets of different sizes:
In an experiment, 16 men and 31 women were asked to name as quickly as possible all the colours
of the 30 coloured rectangles shown.
Their times (in seconds) were recorded and, for comparison, plotted for each
gender.
14 17 18 19 20 21 29
15 17 18 19 20 22
16 17 18 19 20 23
16 17 18 20 20 24
17 18 18 20 21 24
For the women's times listed above, the 25th percentile is 17, the 50th percentile is 19, and
the 75th percentile is 20.
For the men (whose data are not shown), the 25th percentile is 19, the 50th percentile is
22.5, and the 75th percentile is 25.5.
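The percentiles quoted for the women can be reproduced from the 31 listed times. The sketch assumes the `"inclusive"` (linear interpolation) quartile method of `statistics.quantiles`; the lecture does not say which percentile definition it uses, though for this data set the common definitions agree. The rows are copied from the table above.

```python
from statistics import quantiles

# Women's times in seconds, row by row as listed in the table.
rows = [
    [14, 17, 18, 19, 20, 21, 29],
    [15, 17, 18, 19, 20, 22],
    [16, 17, 18, 19, 20, 23],
    [16, 17, 18, 20, 20, 24],
    [17, 18, 18, 20, 21, 24],
]
women = sorted(t for row in rows for t in row)

q1, q2, q3 = quantiles(women, n=4, method="inclusive")
h_spread = q3 - q1  # interquartile range, the H-spread of the box plot

print(len(women), q1, q2, q3, h_spread)
```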
The terminology used for proceeding further (using data for women as an example):
Percent increase in three stock indexes from May 24th 2000 to May 24th 2001.
The number of people playing various card games on a Sunday and on a Wednesday.