
SUMMARIZING DISTRIBUTIONS

David M. Lane et al. Introduction to Statistics: pp. 132–181

margarita.spitsakova@taltech.ee ICY0006: Lecture 2 1 / 59


Descriptive statistics
Descriptive statistics quantitatively describes the main features of a collection of
information.

It provides simple summaries of the observations that have been made. Such
summaries may be either quantitative (summary statistics) or visual
(simple-to-understand graphs).

It involves two kinds of analysis:

Univariate analysis: describing the distribution of a single variable, including

- central tendency (mean, median, and mode)
- dispersion (range and quantiles of the data set, measures of
  spread such as the variance and standard deviation)
- shape of the distribution (skewness and kurtosis)

Bivariate analysis: describing the relationship between pairs of variables when more
than one variable is involved. In this case, descriptive statistics include:

- cross-tabulations and contingency tables
- graphical representation via scatterplots
- quantitative measures of dependence
- descriptions of conditional distributions


Contents
1 Central Tendency
2 Variability
3 Shape
4 Linear Transformations
5 Logarithms
6 Effects of Linear Transformations
7 Graphing Quantitative Variables


Next section: 1 Central Tendency


What is central tendency?
A central tendency
... is a typical or central value for a probability distribution.

It is also called a measure of central tendency, a centre, or a location of the distribution.

Colloquially, it is called an average.

Measures of central tendency


The following may be applied to one-dimensional data:

Arithmetic mean (or simply, mean)

Median

Mode

But also

Geometric mean

Trimean

Truncated mean (or trimmed mean)

etc.



Arithmetic mean

Definition
The arithmetic mean is defined as the sum of the numerical values of all observations
divided by the total number of observations.

Symbolically, if we have a data set with values $x_1, x_2, \ldots, x_n$, then the arithmetic mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

If the data set is a statistical population (i.e., consists of every possible observation and
not just a subset of them), then the mean of that population is called the population
mean and is denoted by $\mu = \sum x_i / n$.
If the data set is a statistical sample (a subset of the population), we call the statistic
resulting from this calculation a sample mean and denote it by $M = \sum x_i / N$, where $N$ is
the size of the sample.

Example
Let X = {2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400} be the set of monthly salaries
of the employees of a firm. The population mean is

$$\mu = \frac{2500 + 2700 + 2400 + 2300 + 2550 + 2650 + 2750 + 2450 + 2600 + 2400}{10} = 2530.$$

Selecting a sample of four values, say 2500, 2300, 2550 and 2450, we can compute the mean of
this sample:

$$M = \frac{2500 + 2300 + 2550 + 2450}{4} = \frac{9800}{4} = 2450.$$
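The two means in this example can be reproduced with a short sketch. It assumes Python's standard `statistics` module; the variable names `salaries`, `sample`, `mu` and `M` are illustrative only.

```python
from statistics import mean

# Monthly salaries of all employees (treated as the whole population).
salaries = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]
# The four values selected as a sample in the example.
sample = [2500, 2300, 2550, 2450]

mu = mean(salaries)  # population mean
M = mean(sample)     # sample mean
print(mu, M)
```

The computation is the same in both cases; only the interpretation (population versus sample) differs.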


Balance Scale

The arithmetic mean is said to be the point at which the distribution is in balance.

[Figure: (a) An asymmetric distribution balanced on the tip of a triangle; (b) The distribution is not balanced]


Median

The median is the 50th percentile of a distribution.

To find the median of a number of values:

1. order them,
2. find the observation in the middle: the median of 5, 2, 7, 9, and 4 is 5. (Note that
   if there is an even number of values, one takes the average of the middle two: the
   median of 4, 6, 8, and 10 is 7.)


Mode

The mode is the most frequent value in a distribution: the mode of 3, 4, 4, 5, 5, 5, 8 is 5.

Note that the mode may be very different from the mean and the median.


Mean, Median and Mode

Example
Let X = {2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400}
Then

Mean: 2530

Median: 2525

Mode: 2400
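As a minimal sketch of the median and mode procedures described earlier (order the values, take the middle one or the average of the middle two; pick the most frequent value), the following assumes plain Python. The function names `median` and `modes` are illustrative; `modes` returns a list because a data set may have more than one most frequent value.

```python
from collections import Counter

def median(values):
    """Middle value of the ordered data (average of the middle two if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All values that occur with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

salaries = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]
print(median(salaries), modes(salaries))
```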



Other central tendencies

The mean, median, and mode are by far the most commonly used measures of central
tendency, but they are by no means the only ones.

We consider here three additional measures of central tendency: the trimean, the geometric
mean, and the trimmed mean.


Trimean

The trimean is a robust measure of central tendency; it is a weighted average of the
25th, 50th, and 75th percentiles.

The trimean is computed as follows:

$$TM = \frac{Q_1 + 2Q_2 + Q_3}{4}$$

Example
Let X = {2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400}
Then

$$TM = \frac{2412.5 + 2 \cdot 2525 + 2637.5}{4} = \frac{10100}{4} = 2525$$
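The trimean example can be checked with the sketch below. Quartiles can be defined in several ways; this sketch assumes the "inclusive" (linear interpolation) method of `statistics.quantiles`, which happens to reproduce the quartiles 2412.5 and 2637.5 used above. The function name `trimean` is illustrative.

```python
from statistics import quantiles

def trimean(values):
    """Weighted average of the quartiles: (Q1 + 2*Q2 + Q3) / 4."""
    # method="inclusive" interpolates linearly between order statistics;
    # other quartile conventions may give slightly different results.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4

salaries = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]
print(trimean(salaries))
```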



Geometric mean
The geometric mean is defined as the n-th root of the product of n numbers.

Formally,

$$G(x_1, \ldots, x_n) = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 \cdot \ldots \cdot x_n}$$

Example
Let X = {4, 1, 1/32}
Then

$$G(X) = \sqrt[3]{4 \cdot 1 \cdot 1/32} = \sqrt[3]{1/8} = 1/2$$

The geometric mean can also be computed as the arithmetic mean of the data on a logarithmic
scale, followed by exponentiation to return to the original scale:

$$\left(\prod_{i=1}^{n} x_i\right)^{1/n} = \exp\left[\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right]$$

For example, using base-2 logarithms and base-2 exponentiation,

$$G(4, 1, 1/32) = 2^{\frac{1}{3}\left(\log_2 4 + \log_2 1 + \log_2 2^{-5}\right)} = 2^{\frac{1}{3}(2 + 0 - 5)} = 2^{-1} = \frac{1}{2}$$
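The log-scale route to the geometric mean can be sketched as follows, assuming natural logarithms from Python's `math` module (any base works as long as the same base is used for exponentiation). The function name `geometric_mean` is illustrative, and all values are assumed to be positive.

```python
import math

def geometric_mean(values):
    """Arithmetic mean of the logs, mapped back with exp; values must be > 0."""
    logs = [math.log(x) for x in values]
    return math.exp(sum(logs) / len(logs))

g = geometric_mean([4, 1, 1 / 32])
print(g)  # close to 0.5, up to floating-point rounding
```

Working on the log scale avoids overflow or underflow that a direct product of many values could cause.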
Trimmed mean
A truncated mean or trimmed mean is the mean calculated after discarding given parts of a
probability distribution or sample at the high and low ends, typically discarding an equal
amount at both.

Terminology:
A mean trimmed 10% is a mean computed with 10% of the scores trimmed off: 5% from
the bottom and 5% from the top.
A mean trimmed 50% is computed by trimming the upper 25% of the scores and the
lower 25% of the scores and computing the mean of the remaining scores.

Example
Consider the data set consisting of:

{92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41}
(n = 20, mean = 101.5)

The 5th percentile (−6.75) lies between −40 and −5, while the 95th percentile (148.6) lies
between 101 and 1053. A 10% trimmed mean therefore discards −40 and 1053 and is computed
from the following:

{92, 19, 101, 58, 91, 26, 78, 10, 13, 101, 86, 85, 15, 89, 89, 28, −5, 41} (n = 18, mean = 56.5)
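A minimal sketch of the trimming procedure, in plain Python. The function name `trimmed_mean` is illustrative, and rounding the number of discarded scores down to a whole number is an assumption of this sketch; other conventions exist when the trimmed share is not a whole number of scores.

```python
def trimmed_mean(values, percent):
    """Mean after discarding percent/2 % of the scores from each end."""
    ordered = sorted(values)
    # Number of scores dropped at each end (rounded down; an assumption).
    k = int(len(ordered) * percent / 100 / 2)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]
print(trimmed_mean(data, 0), trimmed_mean(data, 10))
```

With 20 scores, a 10% trim removes one score from each end, which matches the example above.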



Comparing Measures of Central Tendency
For symmetric distributions, the mean, median, trimean, and trimmed mean are equal, as
is the mode except in bimodal distributions.

Differences among the measures occur with skewed distributions.

For example, consider the distribution of 642 scores on a test:

[Figure: (a) Distribution; (b) Measures of central tendency]


Comparing Measures of Central Tendency (2)
Example 2. The salaries of major league baseball players (in thousands of dollars).

[Figure: (a) Distribution; (b) Measures of central tendency]


Next section: 2 Variability


What is Variability?
Variability
... refers to how spread out a group of scores is.

The terms variability, spread, and dispersion are synonyms, and refer to how spread out a
distribution is.

Measures of variability:
There are four frequently used measures of variability:

range

interquartile range

variance

standard deviation

But also

Absolute deviation from central tendencies

Squared deviation from central tendencies

etc.



Example: relative variability of two distributions

Bar charts of two quizzes:

[Figure: (a) Quiz 1; (b) Quiz 2]


Range

Range
... is simply the highest score minus the lowest score.

Examples:
- On Quiz 1, the lowest score is 5 and the highest score is 9. Therefore, the range is 4.
- The range on Quiz 2 is larger: the lowest score is 4 and the highest score is 10.
  Therefore, the range is 6.
- The range of the group of numbers 10, 2, 5, 6, 7, 3, 4 is 10 − 2 = 8.
- The range of the 10 numbers 99, 45, 23, 67, 45, 91, 82, 78, 62, 51 is 99 − 23 = 76.


Interquartile Range

Interquartile range
The interquartile range (IQR) is the range of the middle 50% of the scores in a
distribution.

It is computed as follows:

$$IQR = Q_3 - Q_1$$

In box plot terminology, the interquartile range is referred to as the H-spread.

Examples: For Quiz 1, Q3 = 8 and Q1 = 6. The interquartile range is therefore 2.
For Quiz 2, which has greater spread, Q3 = 9 and Q1 = 5, and the
interquartile range is 4.
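Both measures are easy to sketch in Python. Because the individual quiz scores are not listed in the text, the sketch reuses the two number lists from the range examples. The function names are illustrative, and the quartiles assume the "inclusive" interpolation method of `statistics.quantiles`; other quartile conventions can give a slightly different IQR.

```python
from statistics import quantiles

def value_range(values):
    """Highest score minus lowest score."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range Q3 - Q1 (quartiles by linear interpolation)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

group = [10, 2, 5, 6, 7, 3, 4]
scores = [99, 45, 23, 67, 45, 91, 82, 78, 62, 51]
print(value_range(group), value_range(scores), iqr(scores))
```

The range depends only on the two most extreme scores, while the IQR ignores the lowest and highest quarters of the data.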



Variance
Variability can also be defined in terms of how close the scores in the distribution are to
the middle of the distribution.

Using the mean as the measure of the middle of the distribution, the variance is defined as
the average squared difference of the scores from the mean.

Example: the data from Quiz 1 (the mean is 7.0):

Averages:
- Scores: 7.0
- Deviations from mean: 0.0
- Absolute deviations: 1.0
- Squared deviations: 1.5


Variance of a population
Suppose a population of numbers X consists of n members, and the mean of these
numbers is µ. The variance of these numbers is the average squared deviation from the
mean µ. The variance is typically designated as $\sigma^2$.
Formally,

$$\sigma^2 = \frac{1}{n}\sum (X - \mu)^2$$

If you are computing variances with a hand calculator, the following formula is easier to
use:

$$\sigma^2 = \frac{1}{n}\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)$$

For the Quiz 1 example:

- $\sum X^2 = 1010$
- $\frac{(\sum X)^2}{n} = \frac{140^2}{20} = 980$
- $\sigma^2 = (1010 - 980)/20 = 30/20 = 1.5$
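The two formulas can be compared with the sketch below. The individual Quiz 1 scores are not listed in the text, so the shortcut is applied directly to the stated sums (n = 20, ΣX = 140, ΣX² = 1010). The list `demo` is a hypothetical data set added only to show that both formulas agree, and the function names are illustrative.

```python
def variance_definition(values):
    """Average squared deviation from the mean (population variance)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def variance_shortcut(n, sum_x, sum_x2):
    """Calculator formula: (sum of squares - (sum)^2 / n) / n."""
    return (sum_x2 - sum_x ** 2 / n) / n

# Quiz 1, using only the sums given in the text.
quiz1 = variance_shortcut(n=20, sum_x=140, sum_x2=1010)

# Hypothetical data, used only to compare the two formulas.
demo = [2, 4, 4, 4, 5, 5, 7, 9]
by_definition = variance_definition(demo)
by_shortcut = variance_shortcut(len(demo), sum(demo), sum(x * x for x in demo))
print(quiz1, by_definition, by_shortcut)
```

In floating-point arithmetic the shortcut can lose precision when the mean is large relative to the spread, so the definition is usually the safer choice in code.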

Property

$$\frac{1}{n}\sum (X - \mu)^2 = \frac{1}{n}\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)$$

Proof

$$\sum (X - \mu)^2 = \sum \left(X - \frac{\sum X}{n}\right)^2 =$$

$$= \sum X^2 - \frac{2}{n}\sum X \sum X + \frac{1}{n^2}\sum \left(\sum X\right)^2 =$$

$$= \sum X^2 - \frac{2}{n}\left(\sum X\right)^2 + \frac{n}{n^2}\left(\sum X\right)^2 = \sum X^2 - \frac{1}{n}\left(\sum X\right)^2$$

Q.E.D.


Variance of a sample

If the variance in a sample is used to estimate the variance in a population, then the
previous formula underestimates the variance, and the following formula should be used:

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

where $s^2$ is the estimate of the variance, M is the sample mean, and N is the size of
the sample.

The use of N − 1 instead of N in the denominator is called Bessel's correction.


Bessel's correction
To demonstrate the effect of Bessel's correction, recall that Quiz 1 has the mean µ = 7
and the variance $\sigma^2 = 1.5$.
Suppose that we have a sample of N = 5 Quiz 1 scores.

The mean of this sample is M = 7.6

The variance of the sample computed without the correction term is 5.2/5 = 1.04.
The estimate of the population variance with the correction term is

$$s^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{1.96 + 0.16 + 0.16 + 0.36 + 2.56}{5 - 1} = \frac{5.2}{4} = 1.3$$

There is also an alternate formula for computing the estimate of the variance:

$$s^2 = \frac{1}{N - 1}\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)$$
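The effect of the correction can be sketched with Python's `statistics` module, where `pvariance` divides by N and `variance` divides by N − 1. The sampled scores themselves are not listed in the text; the list 9, 8, 8, 7, 6 is an assumption, chosen as the whole-number scores consistent with the stated mean of 7.6 and the squared deviations 1.96, 0.16, 0.16, 0.36 and 2.56.

```python
from statistics import mean, pvariance, variance

# Assumed sample: whole-number scores consistent with the stated
# mean (7.6) and squared deviations; not given explicitly in the text.
sample = [9, 8, 8, 7, 6]

M = mean(sample)
uncorrected = pvariance(sample)  # divides by N
corrected = variance(sample)     # divides by N - 1 (Bessel's correction)
print(M, uncorrected, corrected)
```

The corrected estimate (about 1.3) is closer to the population variance of 1.5 than the uncorrected value (about 1.04).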



Standard Deviation
The standard deviation is simply the square root of the variance.
This makes the standard deviation of Quiz 1 equal to $\sqrt{1.5} \approx 1.225$.
The standard deviation is an especially useful measure of variability when the distribution
is normal or approximately normal, because the proportion of the distribution within a
given number of standard deviations from the mean can be calculated.
For example, approximately 68% of the distribution is within one standard deviation of the
mean and approximately 95% of the distribution is within two standard deviations of the mean.
Therefore, if you had a normal distribution with a mean of 50 and a standard deviation of
10, then approximately 68% of the distribution would be between 50 − 10 = 40 and 50 + 10 = 60.

Definitions:
The sample standard deviation of a set of N sample data is the number s given by the formula

$$s = \sqrt{\frac{\sum (X - M)^2}{N - 1}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$$

The population standard deviation of a set of n population data is the number σ given by the
formula

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}$$
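The figures quoted above can be checked with a short sketch that assumes `statistics.NormalDist` from Python's standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Standard deviation of Quiz 1 from its variance of 1.5.
sigma_quiz1 = math.sqrt(1.5)

# Normal distribution with mean 50 and standard deviation 10.
dist = NormalDist(mu=50, sigma=10)
within_one = dist.cdf(60) - dist.cdf(40)  # share within one standard deviation
within_two = dist.cdf(70) - dist.cdf(30)  # share within two standard deviations
print(round(sigma_quiz1, 3), round(within_one, 4), round(within_two, 4))
```

The proportions come out near 0.68 and 0.95, which is why these two percentages are usually quoted as approximations.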



Variance Sum Law I

If X and Y are independent (and therefore uncorrelated) variables, then

$$\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$$
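The law can be illustrated exactly with two independent fair dice, an example chosen for this sketch and not taken from the lecture. All 36 combinations are enumerated and the variances are computed as exact fractions; the function name `variance` is illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # outcomes of one fair die (hypothetical example)

def variance(outcomes):
    """Population variance of equally likely outcomes, as an exact Fraction."""
    outcomes = list(outcomes)
    n = len(outcomes)
    mean = Fraction(sum(outcomes), n)
    return sum((x - mean) ** 2 for x in outcomes) / n

var_x = variance(faces)
var_y = variance(faces)
# Independence: every pair (x, y) is equally likely.
var_sum = variance(x + y for x, y in product(faces, faces))
var_diff = variance(x - y for x, y in product(faces, faces))
print(var_x, var_sum, var_diff)
```

Note that the variance of the difference is also the sum of the variances, not their difference.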



Next section: 3 Shape


Skewness

Skewness is a measure of the asymmetry of a probability distribution.

A distribution is skewed if one tail extends out further than the other.

A distribution has a positive skew (is skewed to the right) if the tail to the right is longer.

It has a negative skew (skewed to the left) if the tail to the left is longer.

There are two measures of skewness:

Pearson's measure:

$$\frac{3(\text{Mean} - \text{Median})}{\sigma}$$

Third moment (N is the number of scores):

$$\frac{1}{N}\sum \frac{(X - \mu)^3}{\sigma^3}$$


Skewness (2)
Example. The baseball players' salaries data (in thousands of dollars).

[Figure: (a) Distribution; (b) Measures of central tendency]

The standard deviation of this data set is 133.

Pearson's measure of skew is therefore 3(122.4 − 75)/133 ≈ 1.07.
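The arithmetic for Pearson's measure can be reproduced with a one-line function. The name `pearson_skew` is illustrative; the inputs are the mean, median and standard deviation quoted above.

```python
def pearson_skew(mean, median, sd):
    """Pearson's measure of skew: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

skew = pearson_skew(mean=122.4, median=75, sd=133)
print(round(skew, 2))
```

A positive result indicates that the mean lies above the median, as expected for a distribution with a long right tail.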




Skewness (3)

Comparison of mean, median and mode of two log-normal distributions with different skewness.


Kurtosis

Kurtosis is a measure of the sharpness of the peak of a frequency-distribution curve.

A kurtosis measure shows how fat or thin the tails of a distribution are relative to a normal
distribution.

It is commonly defined as (N is the number of scores):

$$\frac{1}{N}\sum \frac{(X - \mu)^4}{\sigma^4} - 3$$

The value 3 is subtracted so that a normal distribution has a kurtosis of 0 ("no kurtosis").
Otherwise, a normal distribution would have a kurtosis of 3.
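A sketch of the moment-based shape measures follows. It assumes that the standardized third and fourth powers are averaged over the N scores and that σ is the population standard deviation. The two small data sets are hypothetical, and the function name `shape_measures` is illustrative.

```python
import math

def shape_measures(values):
    """Return (skewness, excess kurtosis) based on standardized moments."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    skew = sum(((x - mu) / sigma) ** 3 for x in values) / n
    kurt = sum(((x - mu) / sigma) ** 4 for x in values) / n - 3
    return skew, kurt

symmetric = [1, 2, 3, 4, 5]       # hypothetical, symmetric about 3
right_skewed = [1, 1, 1, 2, 10]   # hypothetical, long right tail
print(shape_measures(symmetric))
print(shape_measures(right_skewed))
```

The symmetric set gives a skewness of essentially zero, while the set with one large value gives a positive skewness.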



Next section: 4 Linear Transformations


Linear Transformations

Sometimes, it is necessary to transform data from one measurement scale to another.

Inches   Metres
60       1.52
75       1.91
66       1.68
69       1.75
71       1.80

1 metre = 39.37007874 inches
1 inch = 0.0254 metres

City       Degrees Centigrade   Degrees Fahrenheit
Paris      12                   53.6
New York   16                   60.8
Moscow     −6                   21.2
Delhi      26                   78.8
Tallinn    1                    33.8

F = 1.8C + 32
C = 0.5556F − 17.778


Linear Transformations (2)
A linear transformation is any transformation of a variable that can be achieved by
multiplying it by a constant, and then adding a second constant.

If Y is the transformed value of X, then Y = aX + b .



Some Nonlinear Transformations

Square root
Power
Logarithmic
Cosine



Next section: 5 Logarithms


Logarithms
The log transformation reduces positive skew.

This can make the data more interpretable.

Basics of Logarithms (Logs)


Logs are the inverse of exponents:

$$10^2 = 100 \qquad \log_{10} 100 = 2$$

A series of numbers that increase proportionally will increase in equal amounts when
converted to logs. In both columns below the number doubles, and the log increases by the
same amount:

$\log_{10} 100 = 2.000$    $\log_{10} 150 = 2.176$
$\log_{10} 200 = 2.301$    $\log_{10} 300 = 2.477$
Difference: 0.301          Difference: 0.301


Arithmetic Operations

Rules for logs of products and quotients:

$$\log(AB) = \log(A) + \log(B)$$

$$\log(A/B) = \log(A) - \log(B)$$

For example,

$$\log_{10}(10 \cdot 100) = \log_{10} 10 + \log_{10} 100 = 1 + 2 = 3$$

$$\log_{10}(100/10) = \log_{10} 100 - \log_{10} 10 = 2 - 1 = 1$$
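The product and quotient rules, and the observation that doubling a number always adds the same amount on the log scale, can be checked numerically. This sketch assumes `math.log10` from the standard library; the variable names are illustrative.

```python
import math

# Doubling adds log10(2), whatever the starting value.
step_a = math.log10(200) - math.log10(100)
step_b = math.log10(300) - math.log10(150)

# Product and quotient rules.
product_rule = math.log10(10 * 100)    # log10(10) + log10(100)
quotient_rule = math.log10(100 / 10)   # log10(100) - log10(10)
print(round(step_a, 3), round(step_b, 3), product_rule, quotient_rule)
```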



Linear vs Logarithmic



Next section: 6 Effects of Linear Transformations


Effects of Linear Transformations

If a variable X has a mean of µ, a standard deviation of σ, and a variance of $\sigma^2$, then a
new variable Y created using the linear transformation

Y = bX + A

will have

- a mean of bµ + A,
- a standard deviation of |b|σ (which equals bσ for b > 0), and
- a variance of $b^2\sigma^2$.
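These three effects can be verified on the city temperatures from the earlier table, using F = 1.8C + 32 as the linear transformation (b = 1.8, A = 32). The sketch assumes the population versions `pstdev` and `pvariance` from Python's `statistics` module; the same relations hold for the sample versions.

```python
import math
from statistics import mean, pstdev, pvariance

celsius = [12, 16, -6, 26, 1]  # Paris, New York, Moscow, Delhi, Tallinn
b, A = 1.8, 32
fahrenheit = [b * c + A for c in celsius]

# Compare statistics of the transformed data with the predicted values.
print(mean(fahrenheit), b * mean(celsius) + A)
print(pstdev(fahrenheit), abs(b) * pstdev(celsius))
print(pvariance(fahrenheit), b ** 2 * pvariance(celsius))
```

Each pair of printed values agrees up to floating-point rounding; the additive constant A shifts the mean but leaves both measures of spread unchanged.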



Fahrenheit degrees vs Centigrade
To transform degrees Fahrenheit to degrees Centigrade, the formula

C = 0.55556F − 17.7778

is used.


Next section: 7 Graphing Quantitative Variables


Stem and Leaf Displays
The sorted set of data values:
44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106

The leaf contains the last digit of the number and the stem contains all of the other digits.

In the case of very large numbers, the data values may be rounded to a particular place
value.

key: 6|3 = 63
leaf unit: 1.0
stem unit: 10.0

 4 | 4 6 7 9
 5 |
 6 | 3 4 6 8 8
 7 | 2 2 5 6
 8 | 1 4 8
 9 |
10 | 6
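The display above can be generated with a short sketch. It assumes non-negative whole-number data with a leaf unit of 1 and a stem unit of 10; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display (stem unit 10, leaf unit 1)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # last digit is the leaf, the rest is the stem
        leaves[stem].append(leaf)
    rows = []
    # Include stems without leaves so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
rows = stem_and_leaf(data)
print("\n".join(rows))
```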



Histograms
Histograms are based on grouped frequency distributions.

Example: scores of a psychology test (642 scores divided into 13 groups):

[Figure: histogram of the psychology test scores]

Suggested numbers of class intervals (which in turn determine the bin width):

Sturges' rule: number of intervals = $1 + \log_2 N$, rounded to a whole number; gives 11
classes for 1000 observations.

Rice rule: twice the cube root of the number of observations; gives 20 classes for 1000
observations.
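Both rules are simple to compute. The sketch below rounds each result to the nearest whole number, which is an assumption of this sketch; some texts round up instead. The function names `sturges` and `rice` are illustrative.

```python
import math

def sturges(n):
    """Sturges' rule: 1 + log2(n) class intervals, rounded to the nearest integer."""
    return round(1 + math.log2(n))

def rice(n):
    """Rice rule: twice the cube root of n, rounded to the nearest integer."""
    return round(2 * n ** (1 / 3))

print(sturges(1000), rice(1000))
```

The two rules can disagree considerably, so they are best treated as starting points for choosing the number of classes.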



Frequency Polygons
Similar to histograms: a point is placed in the middle of each class interval at the height
corresponding to its frequency, and the points are connected by line segments.
One class interval below the lowest value in the data and one above the highest value
should be included.
Example: frequency polygon for the psychology test scores:


Frequency Polygons (2)
Cumulative frequency polygon for the psychology test scores:
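The points of a frequency polygon and of its cumulative version can be computed as sketched below. The psychology test scores are not listed in the text, so the sketch reuses the 17 values from the stem-and-leaf slide; the class intervals of width 10 starting at 40 are an assumption, and the function name `frequency_polygon` is illustrative.

```python
from itertools import accumulate

def frequency_polygon(values, low, width, n_bins):
    """Return (midpoint, frequency) points, padded with one empty class on each side."""
    counts = [0] * (n_bins + 2)
    for v in values:
        if not low <= v < low + n_bins * width:
            raise ValueError(f"{v} lies outside the class intervals")
        counts[int((v - low) // width) + 1] += 1
    # Index 0 is the extra class below the data, the last index the one above.
    midpoints = [low + (i - 0.5) * width for i in range(n_bins + 2)]
    return list(zip(midpoints, counts))

scores = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
points = frequency_polygon(scores, low=40, width=10, n_bins=7)
cumulative = list(accumulate(freq for _, freq in points))
print(points)
print(cumulative)
```

Plotting `points` and joining them with line segments gives the frequency polygon; plotting the midpoints against `cumulative` gives the cumulative frequency polygon.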



Frequency Polygons (3)
Distributions can be compared by overlaying the frequency polygons drawn for different data
sets.

Example: a task in which the goal is to move a computer cursor to a target on the screen
as fast as possible. There are two data sets for targets of different sizes:


Frequency Polygons (4)
The same example: overlaid cumulative frequency polygons:



Box plots
There are different versions of box plots.

To introduce box plots, the following example is used:

In an experiment, 16 men and 31 women were asked to name as quickly as possible the colours
of 30 coloured rectangles.

Their times (in seconds) were recorded and, for comparison, plotted separately for each
gender.

The data for the women in our sample are as follows:

14 17 18 19 20 21 29
15 17 18 19 20 22
16 17 18 19 20 23
16 17 18 20 20 24
17 18 18 20 21 24

For these data, the 25th percentile is 17, the 50th percentile is 19, and the 75th percentile
is 20.

For the men (whose data are not shown), the 25th percentile is 19, the 50th percentile is
22.5, and the 75th percentile is 25.5.


Box plots (2)

For the women, the 25th percentile is 17, the 50th percentile is 19, and the 75th
percentile is 20.

For the men (whose data are not shown), the 25th percentile is 19, the 50th percentile is
22.5, and the 75th percentile is 25.5.


Box plots (3)

The terminology used from here on (with the women's data as an example):

Name                Meaning                                                        Value
Upper Hinge         75th percentile                                                20
Lower Hinge         25th percentile                                                17
H-Spread            Upper Hinge − Lower Hinge                                      3
Step                1.5 × H-Spread                                                 4.5
Upper Inner Fence   Upper Hinge + 1 Step                                           24.5
Lower Inner Fence   Lower Hinge − 1 Step                                           12.5
Upper Outer Fence   Upper Hinge + 2 Steps                                          29
Lower Outer Fence   Lower Hinge − 2 Steps                                          8
Upper Adjacent      Largest value below Upper Inner Fence                          24
Lower Adjacent      Smallest value above Lower Inner Fence                         14
Outside Value       A value beyond an Inner Fence but not beyond an Outer Fence    29
Far Out Value       A value beyond an Outer Fence                                  None
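The quantities in this table can be derived from the women's data with the sketch below. It assumes the hinges are the quartiles returned by `statistics.quantiles` with the "inclusive" method, which gives 17 and 20 for these data; other hinge definitions may differ slightly. The function name `box_plot_summary` and the dictionary keys are illustrative.

```python
from statistics import quantiles

women = [14, 15, 16, 16, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18,
         19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 22, 23, 24, 24, 29]

def box_plot_summary(values):
    """Fences, adjacent values and outliers as defined in the table above."""
    lower_hinge, _, upper_hinge = quantiles(values, n=4, method="inclusive")
    step = 1.5 * (upper_hinge - lower_hinge)
    inner = (lower_hinge - step, upper_hinge + step)
    outer = (lower_hinge - 2 * step, upper_hinge + 2 * step)
    inside = [v for v in values if inner[0] <= v <= inner[1]]
    return {
        "inner_fences": inner,
        "outer_fences": outer,
        "adjacent": (min(inside), max(inside)),
        # Beyond an inner fence but not beyond an outer fence.
        "outside": [v for v in values
                    if not inner[0] <= v <= inner[1] and outer[0] <= v <= outer[1]],
        # Beyond an outer fence.
        "far_out": [v for v in values if not outer[0] <= v <= outer[1]],
    }

summary = box_plot_summary(women)
print(summary)
```

The value 29 lies exactly on the upper outer fence, so it counts as an outside value and not as a far out value.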



Box plots (4)
The box plot for the women's data with detailed labels:



Box plots (5)
The completed box plots:



Box plots (6)
Statistical analysis programs may offer options on how box plots are created. For example, the
following box plots differ from the previous box plots in several ways:
1. They do not mark outliers.
2. The means are indicated by green lines rather than plus signs.
3. The mean of all scores is indicated by a grey line.
4. Individual scores are represented by dots. Dots for subjects with the same score are
   jittered (the exact horizontal position of a dot is determined randomly).


Bar Charts

Percent increase in three stock indexes from May 24th 2000 to May 24th 2001.



Line Graphs
A line graph of the percent change in five components of the CPI (Consumer Price Index) over
time.


Dot Plots

The number of people playing various card games on a Sunday and on a Wednesday.
