
ECON 225: Data and Statistics for Economics


Lecture 4
Review
How would you describe the shape of this distribution?
Price paid for last haircut [histogram; x-axis: price ($), 25 to 300; y-axis: frequency, 0 to 70]

Answer: right-skewed (positively skewed)


Plan for today
• Stemplots
• Time plots
• Mean and median
• Excel: time plot

Stemplots
• Stemplots are like quick-and-dirty histograms that can easily be done by hand
• Rarely found in scientific publications
• Only work well for small datasets
• Can see the actual data values
NOTE (figure comparing a histogram and a stemplot): the histogram is oriented vertically, while the stemplot is oriented horizontally.
How to make a stemplot
Example: 9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70
1. Separate each observation into a stem (everything except the last digit) and a leaf (the last digit). For two-digit numbers, the stem is the first digit and the leaf is the last digit.
2. Write the stems in a vertical column with the smallest value at the top. NOTE: you can't skip a stem; list every stem in the range, even one with no leaves.
3. Draw a vertical line to the right of the stems.
4. Write each leaf in the row to the right of its stem, in increasing order.
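The four steps above can be sketched in Python. This is a minimal sketch, assuming non-negative whole-number data and stems of width 10 (stem = value // 10, leaf = value % 10); the function name `stemplot` is hypothetical. Note how empty stems (1 and 6 in the example) are still listed, as step 2 requires.

```python
def stemplot(values):
    """Return stemplot rows as strings, assuming non-negative integers.

    Stem = everything except the last digit (value // 10);
    leaf = the last digit (value % 10).
    """
    stems = {}
    # Sorting first means leaves are appended in increasing order (step 4).
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    rows = []
    # Walk every stem from smallest to largest, so no stem is skipped (step 2).
    for s in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in stems.get(s, []))
        rows.append(f"{s} | {leaves}".rstrip())
    return rows


for row in stemplot([9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]):
    print(row)
```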
Time plots
• A time plot of a variable plots each observation against the time at
which it was measured
• In a time plot, time is always on the horizontal axis
• E.g., retail price of fresh oranges

NOTE: characteristics of a time plot:
- time on the horizontal axis (x)
- the variable on the vertical axis (y)
Describing time plots
A trend is a rise or fall that persists over time, despite small irregularities.
NOTE: the line represents the overall increasing trend.

A pattern that repeats itself at regular intervals is called seasonal variation.
NOTE: we can make a histogram from this data, but it would not show how the series evolves over time.
Outliers
• Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
NOTE: outliers may reveal mistakes in the observations.

NOTE (example histogram): roughly symmetric / bell-shaped; mode: 12-14
A large gap in the distribution is typically a sign of an outlier.
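The "large gap" rule above can be checked mechanically by sorting the data and looking at the distance between neighbouring values. This is only a sketch, not a formal outlier test: the helper `large_gaps`, the data, and the `min_gap` threshold are all hypothetical, and what counts as "large" is a judgment call for each dataset.

```python
def large_gaps(values, min_gap):
    """Return pairs of neighbouring sorted values that are more than min_gap apart."""
    s = sorted(values)
    # zip(s, s[1:]) pairs each value with the next larger one.
    return [(a, b) for a, b in zip(s, s[1:]) if b - a > min_gap]


# Hypothetical data: most values sit between 11 and 15, one sits far away.
print(large_gaps([12, 13, 13, 14, 12, 11, 15, 30], min_gap=5))
```

The value on the far side of a flagged gap (here 30) is a candidate outlier to investigate, not automatically an error.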
Scales matter
• How you stretch the axes and choose your scales can give a different
impression
• Always look at the scales

NOTE: the two graphs use different scales for time on the x-axis.


Example 1: What is this graph called?

a) Pareto chart
b) Histogram
c) Bar chart
d) Pie chart
e) Time plot

NOTE: Pareto chart (a Pareto chart doesn't have to have the line)
Many categorical variables
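A Pareto chart is a bar chart of a categorical variable with the bars sorted from most to least frequent, often with a line showing the cumulative percentage. The ordering and the values behind that optional line can be sketched as follows; the category names and counts are made up, and `pareto_table` is a hypothetical helper.

```python
def pareto_table(counts):
    """Sort categories by count (largest first) and add cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        # Cumulative percentage: the height of the optional Pareto line.
        rows.append((category, n, round(100 * running / total, 1)))
    return rows


# Hypothetical counts for four categories.
for row in pareto_table({"A": 5, "B": 20, "C": 15, "D": 10}):
    print(row)
```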
Example 2: Stemplot
This stemplot is of the days required to complete the procedures to start a business.
Note: the stems represent tens of days (days × 10) and the leaves represent single days (days × 1).

Stem | Leaf
0 | 2455556678
1 | 01236799
2 | 45
3 | 28
4 | 9
5 | 3

How would you describe the shape of this distribution?
a) right skewed
b) left skewed
c) symmetric
NOTE: the largest observation shows that 53 days were needed to start the business.
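Because each stem stands for tens of days and each leaf for single days, every data value can be read back as stem × 10 + leaf. A minimal sketch of that conversion, with the rows copied from the stemplot above (`values_from_stemplot` is a hypothetical name):

```python
def values_from_stemplot(rows):
    """Rebuild data values from (stem, leaves) rows: value = stem * 10 + leaf."""
    values = []
    for stem, leaves in rows:
        for leaf in leaves:
            values.append(stem * 10 + int(leaf))
    return values


rows = [(0, "2455556678"), (1, "01236799"), (2, "45"), (3, "28"), (4, "9"), (5, "3")]
days = values_from_stemplot(rows)
print(len(days), min(days), max(days))
```

Most values sit on the first two stems and a few stretch out to 53, which is the long right tail visible in the stemplot.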
Describing Distributions with Numbers
1. Measures of central tendency: mean, median
2. Measures of spread: quartiles, standard deviations
3. Five-number summary and boxplots
Measure of central tendency: the mean
• To calculate the average, or
mean, add all values, then divide
by the number of individuals
• It is the “center of mass” of a
distribution
• Sum is 391
• Divided by 24 observations =
16.292
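The mean of a variable in a sample of n observations x_1, x_2, …, x_n:
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$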
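The mean formula translates directly into code. The 24 values below are assumed to be the business start-up days from the Example 2 stemplot; they do sum to 391 with n = 24, matching the slide, but that link is an inference. `sample_mean` is a hypothetical helper name.

```python
def sample_mean(xs):
    """x-bar = (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)


# Assumed data: values read off the Example 2 stemplot (days to start a business).
days = [2, 4, 5, 5, 5, 5, 6, 6, 7, 8,
        10, 11, 12, 13, 16, 17, 19, 19,
        24, 25, 32, 38, 49, 53]

print(sum(days), len(days), round(sample_mean(days), 3))
```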

What is the mean salary for Black females?


Example 3: A change in the mean
The mean age of five people in a room is 30 years.
One of the people, whose age is 50 years, leaves the room.
Will the mean age of the remaining four people in the room
increase or decrease? Answer: decrease
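The answer follows from the fact that mean × n = total. The five ages total 5 × 30 = 150 years; removing the 50-year-old leaves 100 years shared by 4 people, a mean of 25. A small sketch of that arithmetic (`mean_after_removal` is a hypothetical helper):

```python
def mean_after_removal(n, mean, removed):
    """New mean after one observation leaves: (total - removed) / (n - 1)."""
    total = n * mean  # 5 * 30 = 150 years in total
    return (total - removed) / (n - 1)


print(mean_after_removal(5, 30, 50))
```

Removing any value above the current mean lowers the mean; removing a value below it raises the mean.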
Median
• The median is the midpoint of a distribution – the number such
that half of the observations are smaller and half are larger
Example 4: Median
Consider the following sample data:
20, 35, 10, 20, 50, 30
What is the MEDIAN of the sample?
Sorted: 10, 20, 20, 30, 35, 50

Q2 = (20 + 30) / 2 = 25

Steps:
1. Arrange the observations in ascending order.
2. Because the number of observations is even (n = 6), the median is the average of the two middle values (the 3rd and 4th).
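The two steps above, plus the odd-n case, fit in a few lines. This is a sketch for illustration; Python's standard library `statistics.median` gives the same result and is used here only as a cross-check.

```python
from statistics import median as stdlib_median


def median(xs):
    """Midpoint of the data: half the observations below, half above."""
    s = sorted(xs)  # step 1: ascending order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even n: average the two middle values


data = [20, 35, 10, 20, 50, 30]
print(median(data), stdlib_median(data))
```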
Median vs Mean
• The mean and median are close together if the distribution is
symmetrical

NOTE: the mean and median should be close to each other.


Mean vs Median
• The mean and the median are farther apart the more skewed
the distribution
• The median is resistant to skew and outliers, while the mean is
not

NOTE: the mean and median get farther apart as the observations become more skewed.
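A tiny made-up example shows why the median resists skew and outliers while the mean does not: replacing the largest value in a symmetric dataset with an extreme one moves the mean a lot but leaves the median unchanged. Both datasets are hypothetical; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]  # hypothetical, symmetric around 3
skewed = [1, 2, 3, 4, 50]    # same data, largest value replaced by an outlier

print(mean(symmetric), median(symmetric))  # mean 3, median 3
print(mean(skewed), median(skewed))        # mean 12, median 3
```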
Mean and median of a distribution with outliers

The mean is pulled far to the right by the outliers (the ones dying after 13 years), from 3.4 to 4.2. Meanwhile, the median only increases from 3.4 to 3.6 when the outliers are included.
Summary of mean vs. median
• Mean is not resistant to outliers or skewness
• Any extreme value will have a big effect on the mean
• Median is very resistant to outliers and skewness
• Both mean and median are useful measures of central tendency
• Report both and let the reader decide
Example 3
• A realtor selling homes calculates two measures of central tendency
for the price of a home in her area. She gets $127,312 and $105,100
• One is the mean, and one is the median. Can you guess which is
which?

NOTE: if the data contain some high-priced homes, then I would suggest:
the mean is $127,312
the median is $105,100
Example 3 solution
• The variable for house price is likely to be skewed to the right
• There are likely to be a few very high-priced homes in the area
compared to the typical home. They drive the mean upward.
• Mean = $127,312, Median = $105,100
• What measure is most appropriate?
• Show both and let the client decide
Example 4
• Middletown is considering imposing an income tax on citizens. City
Hall wants a numerical summary of its citizens’ income to estimate
the total tax base
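• Should we choose the mean or the median to present to City Hall?
Answer: mean
NOTE: the total tax base is the total income, which equals the mean income times the number of citizens. Rich families have very high incomes, so they show up as outliers in the data. The mean, which is sensitive to outliers, includes their contribution; the median does not.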
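The reasoning rests on total income = mean × number of citizens. A small sketch with made-up incomes (five hypothetical citizens, one of them very rich) shows that mean × n recovers the total exactly, while median × n badly underestimates it.

```python
from statistics import mean, median

incomes = [30_000, 40_000, 50_000, 60_000, 1_000_000]  # hypothetical citizens
n = len(incomes)

print(mean(incomes) * n)    # equals sum(incomes), the true tax base
print(median(incomes) * n)  # ignores how large the top income is
```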
Example 5
• In a study of standard of living of typical families in Middletown, a
sociologist makes a numerical summary of family income in that city.
• Would the mean or the median be more appropriate?
Answer: median
NOTE: we want to know the living standard of the typical (majority of) families. Including the incomes of extremely wealthy families would pull the mean up and give misleading information about the typical living standard, so the median conveys a more accurate result.
For next class
• Read: Alwan 1.4, Krauth chapter 6
• For practice: Alwan 1.48, 1.49
