
ECON 225: Data and Statistics for Economics


Lecture 2
ECON 225: Lecture 2
• How data is collected
• Graphical representations of data
• Introduction to data analysis in Excel
Sources of data
Available data are data that were produced in the past for some other
purpose but that may help answer a present question inexpensively.
The library and the Internet are sources of available data.
• Government statistical offices are the primary source for demographic, economic, and social data

Some questions require data produced specifically to answer them.


This leads to designing observational or experimental studies.
Census or sample
• We often wish to find information or answer a question about a
group or population. For example, a business might want to know
something about all Canadian consumers.
• Census: There is an attempt to contact every individual in a
population in order to answer some question(s) of interest, e.g. with a
questionnaire.
• Sampling: A subset of the members of the population is selected in
order to gain information about the population of interest.
• The sample is selected randomly from the population.

NOTE: A random sample is an example of an observational study.


Observational studies and experiments
• Observational study: Record data on individuals without attempting
to influence the responses. You simply observe.
• Example: Watch the behavior of consumers looking at store displays or the
interaction between managers and employees.
• Experimental study: Treatment is deliberately imposed on individuals
and their responses are recorded. Influential factors can be
controlled.
• Example: In order to answer the question “Which TV ad will sell more
toothpaste?,” each ad is shown to a separate group of consumers. The
number that buy the toothpaste is recorded for each group
• Note: Sample surveys are a type of observational study. Experimental
studies can also be performed on the sample group.
Poor sampling designs
Voluntary response sampling:
• Individuals choose to participate. These samples are very susceptible
to being biased because different people are motivated to respond or
not. Often called “public opinion polls.” These are not considered
valid or scientific, because the resulting observations are biased.
• Bias: Sample design systematically favors a particular outcome.
• E.g. People who write letters to newspapers usually have polarized
opinions, so they are not representative of the population
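The arithmetic behind voluntary response bias can be sketched in a few lines of Python. All numbers below (group sizes and response rates) are hypothetical and chosen only for illustration: a small dissatisfied group that is far more motivated to respond ends up dominating the poll.

```python
# Hypothetical population: group sizes and response rates are made up.
groups = {
    "dissatisfied": {"size": 100, "response_rate": 0.50},
    "satisfied": {"size": 900, "response_rate": 0.05},
}

population = sum(g["size"] for g in groups.values())

# True share of dissatisfied people in the whole population.
true_share = groups["dissatisfied"]["size"] / population

# Expected number of people in each group who choose to respond.
expected_respondents = {
    name: g["size"] * g["response_rate"] for name, g in groups.items()
}

# Share of dissatisfied people among those who responded.
poll_share = expected_respondents["dissatisfied"] / sum(expected_respondents.values())

print(f"True share dissatisfied: {true_share:.1%}")
print(f"Share among respondents: {poll_share:.1%}")
```

Under these assumed rates, the poll would suggest that roughly half of the population is dissatisfied, when the true share is one in ten.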
Poor sampling designs
Convenience sampling: Simply ask whoever is around
• E.g. Journalists interview people on the street
• When you ask about corporate regulation or unemployment on a
street on Wall Street or in a small town, you would get different
answers
• Surveying people near a high school would skew the sample towards
high school students
• There is bias because opinions are limited to the individuals who
happen to be present
Probability or random sampling
• In random sampling, individuals are randomly selected. No one group
should be represented more than the others.
• Simple Random Sample
• Stratified Random Sample: sample from each subgroup (stratum)
• A series of simple random samples collected from defined subgroups of a population
• E.g. Male/female, by province or school district
• How to define the strata? Group individuals that share the same
characteristic.

Source: Anderson et al. (2014)
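The difference between a simple random sample and a stratified random sample can be sketched in Python as follows. This is only an illustration: the population, the province labels, and the helper name `stratified_sample` are hypothetical, and a fixed seed is used so the result is repeatable.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units by shared characteristic
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample


# Hypothetical population of 300 people spread evenly over three provinces.
people = [{"id": i, "province": ["BC", "AB", "ON"][i % 3]} for i in range(300)]

# Simple random sample: every person has the same chance of selection.
srs = random.Random(0).sample(people, 30)

# Stratified random sample: 10 people drawn at random within each province.
strat = stratified_sample(people, lambda p: p["province"], 10)

per_province = defaultdict(int)
for person in strat:
    per_province[person["province"]] += 1
print(dict(per_province))
```

The stratified sample always contains exactly 10 people from each province, while the simple random sample of 30 only does so by chance.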


Example 1
In 2015, there were 1,768 Gap stores in the United States. Suppose
that managers are interested in the average dollar amount of all sales
transactions at Gap stores in the US in 2015. What is the "population"
being studied?
a) All customers
b) All sales transactions
c) All managers
d) All stores
Example 1, continued
Suppose an analyst divides all 1,768 Gap stores into five geographic
regions. She then randomly selects 24 stores from each region for sales
transaction measurement. What type of sample is she collecting?
a) A simple random sample.
b) A stratified random sample.
c) A convenience sample.
Data suitable for your project
• The dataset you choose for your project should be collected through
some form of random sampling
• This is important so that you’ll be able to use the statistical inference
techniques covered in the course
• It should also consist of microdata: where the data is collected at the
level of individual respondents
• The unit of observation can be person, household, business, farm, or school
• We’ll talk more about the project requirements and where you can
find suitable datasets later in the course

NOTE: For the project, do not choose aggregate data (such as province-level averages).
Types of quantitative variables
• Reminder: A quantitative variable is one that takes on numerical
values
• Quantitative variables can be discrete or continuous
• Discrete: Takes a countable number of values
• E.g. Number of cats in your household (exact counts such as 2, 3, 4)
• Continuous: Takes an uncountable (and infinite) number of values
• E.g. Length of your cat’s tail (decimal values such as 1.4, 3.1, etc.)

NOTE: Even if a variable has an unlimited number of possible values, it is
still discrete as long as those values are exact (rounded) numbers.
Exploring distributions
• Exploratory data analysis: examining and looking at the features of a
set of data
• Distribution of a variable: the values the variable takes and how often
it takes them
• Categorical variable: list each category and show a count or percent of the
cases that fall in each category
• Quantitative variable: give ranges of values for the variable and show how
often cases have values falling in each range
Displaying distributions with graphs
• Ways to display categorical data
• Bar graphs
• Pie charts
• Ways to display quantitative data
• Histograms
• Stemplots
• Time plots
Categorical variables - Distributions
• Example: marital status
• Values of the variable: Married,
Never married, Divorced,
Widowed
• We can summarize the data in
table form
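As a rough sketch of how such a summary table could be built, the Python snippet below counts each category and converts the counts to percentages. The ten responses are hypothetical, made up purely for illustration.

```python
from collections import Counter

# Hypothetical survey responses (not real data).
responses = [
    "Married", "Never married", "Married", "Divorced", "Married",
    "Widowed", "Never married", "Married", "Divorced", "Never married",
]

counts = Counter(responses)
total = len(responses)

# Each row: (category, count, percent of all cases).
table = [(category, n, 100 * n / total) for category, n in counts.most_common()]

print(f"{'Marital status':<15}{'Count':>6}{'Percent':>9}")
for category, n, percent in table:
    print(f"{category:<15}{n:>6}{percent:>8.1f}%")
```

The percent column is what a bar graph or pie chart of the same data would display.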
Bar graphs for categorical variables
• Categories are listed on the
horizontal axis
• The height of the bar above each
category represents the count
(or sometimes percentage) of
observations in that category
• The categories in the graph can
be ordered any way we want
(alphabetical, by increasing
value, chronological, etc.)
Pie charts for categorical variables
• The area of the pie dedicated to
each category represents the
proportion of observations
falling in that category
• The categories in the pie chart
are exhaustive – the percentages
add up to 100%
Pareto charts
• A Pareto chart is a bar chart that is sorted by frequency
• Example: accidents per day of the week

The Pareto chart on the left is easier to read than the chronologically ordered
bar chart on the right.
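The only difference between the two charts is the order of the bars. A minimal Python sketch, using hypothetical accident counts (not the data behind the slide's figure), sorts the categories from most to least frequent and prints a text version of the bars.

```python
# Hypothetical accident counts per day of the week (illustrative only).
accidents = {"Mon": 12, "Tue": 7, "Wed": 9, "Thu": 5, "Fri": 15, "Sat": 3, "Sun": 2}

# Pareto order: sort the categories by frequency, largest first.
pareto_order = sorted(accidents.items(), key=lambda item: item[1], reverse=True)

for day, count in pareto_order:
    print(f"{day:<4}{'#' * count} {count}")
```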
Displaying distributions – quantitative variables
• Histograms and stemplots
• These graphs summarize the distribution of a single quantitative variable.
• Time plots
• These are graphs of a single variable measured at multiple points in time. A
line connecting the points emphasizes the changes occurring over time.
Histograms
A range of data is divided into
non-overlapping and equal width
classes (bins) that cover the full
range of values.

The histogram shows the number


(or fraction) of individual data
points that fall in each bin.

Source: World Bank
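The binning step can be sketched in plain Python. The function name `histogram_counts` and the price values are hypothetical; the sketch also assumes the last bin includes its upper edge so that the maximum value is counted.

```python
def histogram_counts(values, num_bins):
    """Split the range of values into equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it to the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical prices in dollars (illustrative only).
prices = [15, 20, 22, 25, 25, 30, 35, 40, 45, 60, 80, 115]

edges, counts = histogram_counts(prices, 4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:>5.0f} to {right:<5.0f} {'#' * count}")
```

Dividing each count by `len(prices)` would give the fraction of data points in each bin instead of the number.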


Example 2
Is this graph a bar chart, a pie chart, a Pareto chart, or a histogram?

[Figure: “Price paid for last haircut”; x-axis: $ (25 to 300); y-axis: frequency (0 to 70)]

ANSWER: Histogram. It represents the quantity of a single quantitative
variable.

NOTE: It is a histogram because the x-axis shows quantitative data. A Pareto
chart orders the categories from the most to the least frequent; the other
charts just show the categories.
Number of bins changes the appearance of the histogram

How do we pick the number of bins and the width of bins? This will be covered next class.
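To see why the choice matters, the sketch below bins the same hypothetical data set with two and then three equal-width bins. The values and the helper name `bin_counts` are made up for illustration; the data has two clusters with a gap between them.

```python
def bin_counts(values, num_bins):
    """Count how many values fall in each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        counts[min(int((v - lo) / width), num_bins - 1)] += 1
    return counts


# Hypothetical data with two clusters (illustrative only).
values = [10, 11, 12, 12, 13, 14, 30, 31, 32, 32, 33, 34]

two_bins = bin_counts(values, 2)
three_bins = bin_counts(values, 3)

for label, counts in (("2 bins", two_bins), ("3 bins", three_bins)):
    print(label)
    for count in counts:
        print("  " + "#" * count + f" ({count})")
```

With two bins the distribution looks flat; with three bins the empty middle bin reveals the gap between the two clusters.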
For next class
• Complete the survey, if you haven’t done it yet
• Reading: Alwan 3.1, 1.2; the rest of Krauth Chapter 2
• For practice: Alwan Questions 1.7, 1.13 , 1.27, 1.28
