
ECON 225: Data and Statistics for Economics


Lecture 3
Recap: Key terms

• Case or observation
• Identifier
• Categorical variable
• Ordinal: categories with a natural order or level (e.g. grades)
• Quantitative variable
• Discrete
• Continuous

[Figure: Canadian unemployment rate over time]

Note: population is a discrete variable; it is written with decimals only because it is reported in thousands.
Recap: Types of datasets
• Cross-sectional dataset: one time period, varying units


• The identifier variable is not a time variable
• Time series dataset: varying time periods, same unit
• There is a time variable that can serve as the identifier
• Panel dataset: variation over both time periods and units
• An observation can only be identified by a combination of a time variable and a unit-level variable
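The panel-data point can be checked mechanically: a set of columns works as an identifier only if no two rows share the same values in those columns. The sketch below uses a hypothetical toy panel (made-up provinces, years, and values, for illustration only) and a helper `is_unique_key` that is an illustrative name, not a library function.

```python
# Hypothetical toy panel: made-up values for illustration only, not real statistics.
panel = [
    {"province": "ON", "year": 2020, "value": 1.0},
    {"province": "ON", "year": 2021, "value": 2.0},
    {"province": "QC", "year": 2020, "value": 3.0},
    {"province": "QC", "year": 2021, "value": 4.0},
]


def is_unique_key(rows, key_columns):
    """Return True if the given columns identify every row uniquely."""
    keys = [tuple(row[c] for c in key_columns) for row in rows]
    return len(keys) == len(set(keys))


print(is_unique_key(panel, ["year"]))              # time alone repeats across units
print(is_unique_key(panel, ["province"]))          # unit alone repeats across years
print(is_unique_key(panel, ["province", "year"]))  # the combination is unique
```

In a cross-sectional dataset the unit column alone would pass this check, and in a time series the time column alone would.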
Recap: Collecting and preparing data
• Population: all the subjects that we are interested in
• Sample: the subset of the population from which we actually collect data
• Non-probability samples
• Convenience sample: we cannot use statistics to draw conclusions about the population from this kind of sample

• voluntary response sample


• Probability samples
• Simple Random Sample: every possible sample of the same size has the same chance of being chosen, so every element has the same probability of selection
• Stratified Random Sample
• Microdata: collected at the level of individual respondents
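The two probability samples can be sketched with Python's standard `random` module. This is a minimal illustration, assuming the population is a plain list and, for the stratified case, that the strata are already defined and we draw the same number from each (proportional allocation is another common choice). The function names are hypothetical.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct elements; every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def stratified_random_sample(strata, k_per_stratum, seed=None):
    """Draw a simple random sample of k_per_stratum from each stratum.

    Assumption: equal allocation across strata, for illustration only.
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}


srs = simple_random_sample(list(range(100)), 10, seed=1)
strat = stratified_random_sample({"A": list(range(10)), "B": list(range(10, 20))}, 3, seed=1)
print(srs)
print(strat)
```

Passing a seed only makes the draw reproducible; it does not change the fact that each element has the same selection probability.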
Recap: Visualizing data
• The purpose is to show the distribution of the variable: the values the variable takes and how often it takes them
• For categorical variables:
• Pie chart
• Bar chart
• Pareto chart
• For quantitative variables
• Histograms
• Stemplots
• Time plots
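A stemplot splits each value into a stem and a leaf and lists the leaves next to their stem. The sketch below assumes non-negative integers with stem = tens digit and leaf = units digit; other data need a different split (for example, rounding or using hundreds as stems). The data are made-up numbers for illustration.

```python
def stemplot(values):
    """Return stemplot rows for non-negative integers (stem = tens, leaf = units)."""
    leaves = {}
    for v in sorted(values):
        leaves.setdefault(v // 10, []).append(v % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Empty stems are kept so that gaps in the distribution stay visible.
        row_leaves = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row_leaves}")
    return rows


rows = stemplot([23, 12, 38, 15, 21, 23, 52])
for row in rows:
    print(row)
```

Like a histogram turned on its side, the length of each row shows how many values fall in that range, but the individual values are still readable.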
Review
Below are some of the variables from a survey conducted by the U.S. Postal Service. Which of these variables is categorical?
a) Household income (quantitative)
b) Age of respondent (quantitative)
c) Number of persons in household (quantitative)
d) Zip code
Answer: (d). A zip code is categorical because it identifies a specific geographic area; its digits are labels, not measured quantities.
Review
Suppose Loblaws asks 57 customers in the store the following questions:
1. How likely is it that you will purchase potato chips in the next week? (definitely, probably, not sure, definitely not) Categorical (ordinal)
2. How much do you plan to spend on your snack purchases today? Quantitative

• What kind of sample is this? Convenience sample (a non-probability sample)
• What kind of dataset would be produced? Cross-sectional
• What is the unit of observation? The customer
• Would this be an example of microdata? Yes, because the data are collected at the level of individual respondents
• What type of variables do you have in the dataset? Categorical and quantitative
Customer    Question 1    Question 2
1
2
3
...
Review: What type of dataset is represented here?
The graph is a time plot. Answer: panel data, because it varies over both time periods and units.

Graph from an article in The Economist


Plan for today
• How to make histograms
• Describing histograms
• Excel: frequency table; histogram
How to make a histogram
1. Divide the range of values into classes/bins of equal width
2. Count the number of cases that fall into each bin
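The two steps above can be sketched in Python as a frequency table. This is a minimal sketch, assuming equal-width bins from the minimum to the maximum value, each bin including its left edge, and the maximum value placed in the last bin. The function name and the data are illustrative.

```python
def frequency_table(values, num_bins):
    """Split [min, max] into num_bins equal-width bins and count values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Bins are [left, right); the maximum value goes in the last bin.
        # Note: floating-point rounding can misplace values sitting exactly on an edge.
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, num_bins - 1)] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return edges, counts


edges, counts = frequency_table([2, 3, 3, 5, 6, 7, 8, 8, 9, 10], num_bins=4)
print(edges)
print(counts)
```

Software differs in how it treats values on a boundary: Excel's FREQUENCY counts values less than or equal to each upper limit, so a value exactly on an edge can land in a different bin than in this sketch.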
How to make a histogram
• Mark off the x-axis using the class/bin widths. Mark an appropriate scale on the y-axis
• Draw a bar over each interval on the x-axis. The height of the bar will show the number of cases in each interval.
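To see how the bars follow from a frequency table, here is a rough text-only sketch that draws one bar per bin, with bar length equal to the count. It assumes the bin edges and counts are already known (the values below are an illustrative table), and it draws the bars horizontally, so the bins run down the page instead of along the x-axis. The function name is hypothetical.

```python
def text_histogram(edges, counts, symbol="#"):
    """Return one text row per bin: the bin interval, then a bar of length = count."""
    lines = []
    for left, right, count in zip(edges, edges[1:], counts):
        lines.append(f"{left:>5.1f} - {right:<5.1f} | {symbol * count}")
    return lines


lines = text_histogram([2.0, 4.0, 6.0, 8.0, 10.0], [3, 1, 2, 4])
for line in lines:
    print(line)
```

A plotting tool would draw the same thing with vertical bars that touch, since adjacent bins share an edge.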

[Figure: example histogram; x-axis: the variable, in its units]


Note on histograms
• Depending on the scale on the vertical axis, we have different types of histograms
• Frequency histogram: bar height is the number of observations in the bin
Other types of histograms
• Relative frequency histogram: bar height is the fraction of observations in the bin
• Density histogram: bar area is the fraction of observations in the bin (so bar height = fraction ÷ bin width)
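The three vertical scales differ only in how the counts are rescaled. A minimal sketch, assuming equal-width bins and an illustrative frequency table:

```python
def histogram_heights(counts, bin_width):
    """Bar heights for frequency, relative frequency, and density histograms."""
    n = sum(counts)
    frequency = list(counts)                            # number of observations
    relative = [c / n for c in counts]                  # fraction of observations
    density = [c / (n * bin_width) for c in counts]     # height * width = fraction
    return frequency, relative, density


freq, rel, dens = histogram_heights([3, 1, 2, 4], bin_width=2)
print(freq, rel, dens)
```

The relative frequencies add up to 1, and in the density version the bar areas (height × width) add up to 1, which is why only the density scale is unaffected by the choice of bin width.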
What we can tell from histograms
• Some questions cannot be answered from a histogram.
• Some cannot be answered for certain, although the histogram suggests a likely answer.
• We cannot tell the exact percentage.
Choosing the number of bins
• We mentioned last class that the number of bins changes the appearance of the histogram
• Too few: smooths over important features of the distribution
• Too many: many bins with only 1 observation
• One suggestion: # of bins = √n (n = number of observations)
Choosing the number of bins
• There is no unique or perfect solution
• Experiment with different bin sizes and number of bins
• Start with √n bins. Check the distribution and refine as needed
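The √n starting point is easy to compute. Since √n is rarely a whole number, this sketch rounds up; rounding up rather than to the nearest integer is an assumption, and either is a reasonable starting value to refine.

```python
import math


def sqrt_rule_bins(n):
    """Starting number of bins by the square-root rule, rounded up (assumption)."""
    return max(1, math.ceil(math.sqrt(n)))


for n in (10, 57, 100):
    print(n, sqrt_rule_bins(n))
```

For the 57 Loblaws customers this suggests about 8 bins; from there, try a few more and a few fewer and compare the shapes.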
Examining a distribution
• First, look for the overall pattern, which consists of
• Shape (e.g. symmetric or skewed; one peak or two)
• Center
• Spread (e.g. from the minimum to the maximum value)
• Then look for deviations from the overall pattern
• Outlier - an individual value that falls outside the overall pattern
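Center, spread, and outliers can also be summarized numerically. The sketch below uses the median for the center and the minimum and maximum for the spread. To flag outliers it applies the common 1.5 × IQR rule of thumb, which is not from these slides; quartiles come from Python's `statistics.quantiles` with its default method, and other software may compute slightly different quartiles. The data are made-up numbers.

```python
import statistics


def describe(values):
    """Median, min, max, and outliers flagged by the 1.5 * IQR rule (assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "center_median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


summary = describe([2, 3, 3, 5, 6, 7, 8, 8, 9, 40])
print(summary)
```

A flagged value is only a candidate: whether it is a genuine outlier still depends on the context of the data.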
Describing a histogram (to be discussed later)

• Is it approximately Normal (bell-shaped and symmetric)?
• Is it positively skewed (long right tail) or negatively skewed (long left tail)?
• Is it symmetric?
• Is it bimodal (two peaks rather than one)?
• Are there any outliers?
• What is the context?
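The direction of skew can be backed up with a number. The sketch below computes the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the average squared and cubed deviations from the mean. This measure is not introduced in these slides and is shown only as a supplement: a positive value points to a long right tail, a negative value to a long left tail, and a value near 0 to symmetry. It says nothing about bimodality. The data are made-up.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


right = [1, 1, 2, 2, 3, 10]
left = [-v for v in right]
symmetric = [1, 2, 3, 4, 5]
print(skewness(right), skewness(left), skewness(symmetric))
```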
Examples of Normal histograms
The first two histograms here are perfectly Normally distributed. The third is a histogram of real-life data that we would describe as approximately Normal.
Positively skewed histograms
Right-skewed (long right tail); one example is almost bell-shaped.
Negatively skewed histograms
Left-skewed (long left tail)
(Perfectly) symmetric histograms
Equal bar heights across all bins indicate a uniform distribution.

Bimodal histograms
A histogram is bimodal when it has two peaks.
Example: describe this distribution
Answer: slightly right-skewed and bimodal
For next class
• Read: Alwan 1.2, 1.3
• For practice: Alwan question 1.31
