L02 ECO220 Print

Histograms, Central Tendency, and Variability for Describing a Single Interval Variable
Lecture 2
Reading: Sections 5.1 – 5.6
Review: Data Types
The Economist, September 6, 2014

Lecture 2 Slides, ECO220Y1Y, 1

Histogram
• Histogram graphically describes how a single variable containing interval data is distributed
• Range of data divided into non-overlapping and equal width classes (bins) that cover range of values
How many bins? Width of bins?
[Figure: relative frequency histogram of Inflation Rate, 2011; n = 174 countries; vertical axis: Fraction]
Source: http://data.worldbank.org/indicator/FP.CPI.TOTL.ZG
[Figure: three histograms of Inflation Rate, 2011; n = 174 countries; vertical axes: Frequency, Fraction, Density]
Frequency histogram: Bar height = number of observations in bin
Relative frequency histogram: Bar height = fraction of obs. in bin
Density histogram: Bar area measures the fraction of observations in bin
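A minimal sketch of how the three bar heights relate, assuming equal-width bins. The data values and the helper name `histogram_heights` are made up for illustration; they are not the World Bank inflation data.

```python
def histogram_heights(data, lo, width, n_bins):
    """Return (frequency, relative frequency, density) bar heights per bin."""
    counts = [0] * n_bins
    for x in data:
        # Bins are [lo + k*width, lo + (k+1)*width); the top edge joins the last bin.
        k = min(int((x - lo) // width), n_bins - 1)
        counts[k] += 1
    n = len(data)
    fractions = [c / n for c in counts]          # bar height = fraction of obs. in bin
    densities = [f / width for f in fractions]   # bar area (height * width) = fraction
    return counts, fractions, densities

# Hypothetical inflation rates (percent), for illustration only.
rates = [0.5, 1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 3.8, 4.6, 5.9]
counts, fractions, densities = histogram_heights(rates, lo=0.0, width=2.0, n_bins=3)

print(counts)     # observations per bin
print(fractions)  # fractions sum to 1
print(densities)  # areas (density * width) sum to 1
```

All three versions have the same shape; only the scale of the vertical axis changes.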

[Figure: density histogram of Inflation Rate, 2011; n = 34 OECD countries]
Can this histogram tell us the exact number of countries with inflation between 2 and 4 percent? Is it definitely above 40%?
Number of bins changes the appearance of the histogram
[Figure: three density histograms of Inflation Rate, 2011, drawn with different numbers of bins; n = 34 OECD countries]
One suggestion: # of bins ≈ $\sqrt{n}$
OECD inflation: $\sqrt{34} = 5.83$ and STATA picked 5
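The square-root suggestion is easy to compute for any sample size. A small sketch follows; the function name `suggested_bins` is hypothetical, and statistical software may round the result or apply a different default rule.

```python
import math

def suggested_bins(n):
    """Square-root rule of thumb: roughly sqrt(n) bins for n observations."""
    return math.sqrt(n)

print(round(suggested_bins(34), 2))    # OECD sample, n = 34
print(round(suggested_bins(174), 2))   # all-countries sample, n = 174
```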

Shape of Things
• Histogram gives overview of a variable with a single picture
– Can make informal inferences about the shape of population
• Symmetric: If draw an imaginary line at center, have mirror image on each side
• Bell/Normal/Gaussian
• Positively skewed: long tail to right (aka right skewed)
• Negatively skewed: long tail to left (aka left skewed)
• Modality: # major peaks
– Most distributions are unimodal: one major peak
Four Perfectly Symmetric Histograms
[Figure: four perfectly symmetric density histograms]
Two Perfectly Bell Shaped Histograms
[Figure: two perfectly bell shaped density histograms]
But histograms of real data will never be perfect: we always mean approximately
For example, we’d describe the histogram to the right as Normal (Bell) shaped
Four Positively Skewed Histograms
[Figure: four positively skewed density histograms]
Alternatively, these are right skewed

Four Negatively Skewed Histograms
[Figure: four negatively skewed density histograms]
Alternatively, these are left skewed
[Figure: two relative frequency histograms; n = 157 countries; in 2017 (or most recent year)]
Left: Percent of Population Living Below International Poverty Line (horizontal axis: % Below Poverty Line)
Right: Percent of Population Living Above International Poverty Line (horizontal axis: % Above Poverty Line)
Data retrieved “Proportion of population below the international poverty line of US$1.90 per day (%)” from the World Health Organization on June 6, 2022:
https://www.who.int/data/gho/data/indicators/indicator-details/GHO/proportion-of-population-below-the-international-poverty-line-of-us$1-90-per-day-(-)
In Canada in 2013 (the most recent year of data), 0.5% of the population lives below the international poverty line.
In Malawi in 2016 (the most recent year of data), 70.3% of the population lives below the international poverty line.

Four Bimodal Histograms
[Figure: four bimodal density histograms]
Figure 3: Violation Scores at Initial Inspection

Source: Farronato and Zervas (2022)
“Consumer Reviews and Regulation:
Evidence from NYC Restaurants”
https://www.nber.org/papers/w29715
Notes: This shows the distribution of violation scores that restaurants obtain during
the initial inspection. The vertical lines correspond to the score thresholds that
would assign A-B-C letter grades. Scores of 13 or less automatically give an A-grade,
while higher scores imply that a restaurant will be reinspected within a few weeks.
For the purpose of this plot, inspection scores are capped at 50.

[Figure: Ages of first-time mothers in the U.S. in 1980 and in the U.S. in 2016]
Source: The New York Times, August 4, 2018, “The Age That Women Have Babies: How a Gap Divides America”
Samples vs. Populations

• Sample is a random subset of population
– Sampling noise: Chance differences between
population and a random sample
• Driven by the sample size, not sample size relative to
the population size, which is assumed infinite (pp. 30 –
31, “The Sample Size is What Matters”)
– Informal inference: consider sample size (𝑛)
• Never see the perfect forms (Plato): statements about
shape always approximate
• “Nearly Normal Condition”
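A small simulation can make sampling noise concrete. This is a sketch under stated assumptions: the population is modeled as an infinite Normal distribution with mean 100 and s.d. 15 (illustrative values, not taken from the slides), and the helper name `mean_abs_error` is hypothetical. It measures how far a sample mean typically lands from the population mean at several sample sizes.

```python
import random
import statistics

MU, SIGMA = 100.0, 15.0    # assumed population mean and s.d. (illustrative)
rng = random.Random(220)   # fixed seed so the run is reproducible

def mean_abs_error(n, reps=200):
    """Average |sample mean - MU| over many random samples of size n."""
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - MU))
    return statistics.fmean(errors)

noise = {n: mean_abs_error(n) for n in (10, 30, 1000)}
for n, err in noise.items():
    print(f"n = {n:4d}: typical gap between sample mean and population mean = {err:.2f}")
```

The gap shrinks as n grows even though the population is treated as infinite, which matches the point that the sample size itself drives sampling noise.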

[Figure: four density histograms of IQ. Panels: Population, N = 10,000,000; Sample 1; n = 10; Sample 2; n = 10; Sample 3; n = 10]
Sample 1 is a LIE!!
How many samples would you have in real life?
Why aren’t these samples perfectly Bell shaped?

[Figure: four density histograms of IQ. Panels: Population, N = 10,000,000; Sample 1; n = 30; Sample 2; n = 30; Sample 3; n = 30]
Why are there more bins than last slide?

[Figure: four density histograms of IQ; the labeled panels are Sample 2; n = 1000 and Sample 3; n = 1000]
What to Conclude About Shape?
[Figure: two density histograms. Left: X, n: 30. Right: Y, n: 500]
Is the graph on the left symmetric? Bell shaped?
Is the graph on the right symmetric? Bell shaped? Bi-modal?

Hsieh and Olken (2014) JEP “The Missing ‘Missing Middle’” Summer 2014 http://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.28.3.89
“There is a clear bimodality in the distribution of value-added/capital for the large firms. However, the capital questionnaire for large firms was ambiguous as to whether the results were to be entered in thousands or millions of Rupiah. Our best guess is that approximately half the firms used thousands and half used millions.” http://www.aeaweb.org/jep/app/2803/28030089_app.pdf
If the real distribution of value-added/capital for large firms is Normal, is the bimodal shape caused by sampling error or non-sampling error?

Summary Statistics
• Statistics (i.e. summary statistics) give a
concise idea of what data “look like”
– For a single variable, statistics can give numeric
measures of:
• Central tendency: mean and median
• Variability: range, variance, standard deviation,
coefficient of variation, IQR
• Relative standing: percentiles
– For two variables, also measure relationship
Mean and Median
• Population mean, a parameter: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
• Sample mean, a statistic: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Which is subject to sampling error?
• Median is the middle obs. after sorting
– if even # of obs., average 2 middle ones
[Figure: relative frequency histogram; n = 34 OECD countries; mean = 3.1, median = 3.3]
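A short sketch of both calculations using Python's standard `statistics` module. The four data values are made up; with an even number of observations the median averages the two middle ones.

```python
import statistics

x = [10.0, 1.0, 3.0, 2.0]            # hypothetical sample, n = 4 (even)

x_bar = statistics.mean(x)           # sum of the observations divided by n
median = statistics.median(x)        # sorts first, then averages the 2 middle obs.

print(sorted(x))
print(x_bar, median)
```

Here one large observation pulls the mean above the median, which is the pattern seen in right skewed data.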

Two Symmetric Distributions: Normal and Uniform
[Figure: density histograms of IQ. Population: mu=100.0, med=100.0. Sample, n=49: X-bar=103.9, med=103.0]
[Figure: density histograms of Book Rating. Population: mu=50.0, med=50.0. Sample, n=41: X-bar=55.3, med=62.1]
Why does $\bar{X}$, which is a statistic, differ from $\mu$, which is a parameter?
Why does the population median exactly equal $\mu$ in both distributions?
[Figure: relative frequency histogram; n = 174 countries; mean = 6.6, median = 5.0]
Why is the mean greater than the median?

Figure 2. Distribution of Local Business Tax Changes
Source: Fuest, C., A. Peichl, and S. Siegloch. 2018. “Do Higher Corporate Taxes Reduce Wages? Micro Evidence from Germany.” American Economic Review, 108 (2): 393-418. DOI:10.1257/aer.20130570
What is the variable?
What is the unit of observation?
Notes: The histogram shows the distribution of changes in the local business tax rate. The sample consists of 17,999 tax rate changes in 10,001 municipalities. We omit 0.1 percent of the observations with absolute changes larger than 5 percentage points for illustrative purposes.
Measures of Variability (Spread)
• Range: max – min
• Variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}$
• Standard deviation: $s = \sqrt{\text{variance}}$
• Coefficient of variation (textbook)
[Figure: density histogram of Inflation Rate, 2011; n = 34 OECD countries; min = -0.3, max = 6.5; var = 1.6, sd = 1.3]
For all 174 countries, is the range bigger or smaller than 6.8?
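The measures above can be sketched with the standard `statistics` module, where `pvariance` divides by N and `variance` divides by n − 1. The data are made up. The slide defers the coefficient of variation to the textbook; the code uses the usual definition, s.d. divided by the mean, which is an assumption here.

```python
import statistics

x = [2.0, 4.0, 4.0, 6.0]                # hypothetical observations

data_range = max(x) - min(x)            # range: max - min
sigma2 = statistics.pvariance(x)        # treats x as a whole population: divide by N
s2 = statistics.variance(x)             # treats x as a sample: divide by n - 1
s = statistics.stdev(x)                 # sample s.d. = square root of s2
cv = s / statistics.mean(x)             # coefficient of variation (assumed definition)

print(data_range, sigma2, round(s2, 3), round(s, 3), round(cv, 3))
```

The sample variance is larger than the population-formula variance for the same numbers because its denominator is smaller.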

Breaking Down Variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}$
• Numerator: “total sum of squares” (TSS): $TSS = \sum_{i=1}^{n} (x_i - \bar{X})^2$
– If all sampled countries have 3% inflation ($x_i = 3$ for all $i$), what would TSS & $s^2$ be?
• Denominator: $\nu$ (“nu”)
– Only n – 1 free obs. left after calculating the mean
– Degrees of freedom: $\nu = n - 1$
• Units of variance?
– How about s.d.?
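One way to see why only n − 1 observations are free: deviations from the sample mean always sum to zero, so the last deviation is implied by the others. The sketch below uses made-up data to illustrate this and to build the sample variance from TSS.

```python
import statistics

x = [1.0, 4.0, 2.0, 3.0, 5.0]            # hypothetical sample, n = 5
x_bar = statistics.fmean(x)
deviations = [xi - x_bar for xi in x]

# Deviations from the sample mean sum to zero, so once n - 1 of them are
# known the last one is pinned down: only n - 1 are free to vary.
implied_last = -sum(deviations[:-1])

tss = sum(d ** 2 for d in deviations)    # total sum of squares (numerator)
s2 = tss / (len(x) - 1)                  # divide by degrees of freedom, nu = n - 1

print(deviations, implied_last)
print(tss, s2)
```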
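Empirical Rule (Normal/Bell)
• If a random sample is drawn from a Normal population then about:
– 68.3% of observations will lie within 1 s.d. of the mean (i.e. between $\bar{X} - s$ and $\bar{X} + s$)
– 95.4% of observations will lie within 2 s.d. of the mean (i.e. between $\bar{X} - 2s$ and $\bar{X} + 2s$)
– 99.7% of observations will lie within 3 s.d. of the mean (i.e. between $\bar{X} - 3s$ and $\bar{X} + 3s$)
• “Empirical Rule” only applies if Normal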
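The three percentages come from the Normal distribution itself. As a quick check, the share of a Normal distribution within k s.d. of its mean equals erf(k/√2), which the standard `math` module can evaluate. The function name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Fraction of a Normal distribution lying within k s.d. of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} s.d.: {100 * share_within(k):.1f}%")
```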

SAT Scores Distributions: Normal
• SAT score mean is:
– 1230 for students with HH income > $200,000
– 970 for students with HH income < $20,000
For the random sample (right):
about 68.3% of students have scores between 775.4 and 1173
about 95.4% of students have scores between 576.6 and 1371.8
about 99.7% of students have scores between 377.8 and 1570.6
Source: Douglas Belkin, May 16, 2019, “SAT to Give Students ‘Adversity Score’ to Capture Social and Economic Background,” The Wall Street Journal https://www.wsj.com/articles/sat-to-give-students-adversity-score-to-capture-social-and-economic-background-11557999000
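The Empirical Rule can also be read backwards to recover the mean and s.d. from a stated interval. The sketch below back-solves them from the 68.3% interval quoted above (the slide does not state the mean and s.d. directly, so these are implied values) and then rebuilds the 2 s.d. and 3 s.d. intervals.

```python
# Endpoints of the "about 68.3%" interval, i.e. mean - s and mean + s.
lo_1sd, hi_1sd = 775.4, 1173.0

x_bar = (lo_1sd + hi_1sd) / 2      # midpoint of the interval = implied sample mean
s = (hi_1sd - lo_1sd) / 2          # half-width of the interval = implied s.d.

intervals = {k: (round(x_bar - k * s, 1), round(x_bar + k * s, 1)) for k in (1, 2, 3)}
for k, (lo, hi) in intervals.items():
    print(f"within {k} s.d.: {lo} to {hi}")
```

The same reasoning gives a rough s.d. from any approximately Normal histogram: find a range that holds about 95% of the observations and divide its width by 4.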
[Figure: four density histograms of X. Panels: Histogram #1, n = 94; Histogram #2, n = 154; Histogram #3, n = 298; Histogram #4, n = 521]
Noticing Normality, can we approximate the s.d. of X in each?

Chebysheff’s Theorem
• At least $100(1 - 1/k^2)\%$ of observations lie within k s.d.’s of the mean for k > 1
– At least 75% of obs. lie within 2 s.d. of mean
• $1 - 1/2^2 = 3/4$
– At least 89% of obs. lie within 3 s.d. of mean
• $1 - 1/3^2 = 8/9$
– Can be applied to all samples no matter how population is distributed
– What about within one s.d.?
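A brief sketch of the bound and a check against a deliberately skewed, made-up sample. The function names are hypothetical. The observed share within k s.d. can be well above the bound, since the theorem only guarantees a minimum.

```python
import statistics

def chebysheff_bound(k):
    """Minimum fraction of observations within k s.d. of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Observed fraction of data within k sample s.d. of the sample mean."""
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)
    return sum(abs(x - x_bar) <= k * s for x in data) / len(data)

skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10, 40]   # hypothetical right skewed sample

for k in (2, 3):
    print(k, round(chebysheff_bound(k), 3), share_within(skewed, k))
```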
[Figure: relative frequency histogram of GDP per capita (PPP), 2012 est.; n = 185 countries; mean = 14955.1, sd = 16243.0]
How to describe the shape of the distribution of this variable?

Recap
• Started to describe a single interval variable
– The histogram is a powerful visual summary tool
• Three types – frequency, relative frequency, and
density – but all give same big picture
• Describe shape with well-known terms, if appropriate
– Sometimes terms don’t work and sentences are needed
– Summary stats: mean, median, s.d., range, etc.
• For an important measure of variability – s.d. – the
Empirical Rule (special case) and Chebysheff’s Theorem
(general) help us get a grasp on the meaning of the s.d.
