
Statistics for Economists

Lecture 1
Lecturer: Dr Omid Mazdak
Email: omid.mazdak@kcl.ac.uk

Lecture 1 – Descriptive Statistics

• Basic Concepts in Statistics


• Types of economic data
➢ Cross sectional data
➢ Time series data
➢ Panel data
• Key summary statistics:
➢ Measures of central tendency: mean, median, mode
➢ Percentiles, e.g. quartiles, box plots
➢ Measures of dispersion: variance, standard deviation

Purpose of Statistics

Essential purpose of statistics:


• Existing knowledge and theory on which decisions are based are
often incomplete. Thus empirical observation, data collection and
statistics can aid in the decision making process.
• It is also possible to make theoretical inference from data, drawing
upon measures such as absolute frequencies, relative frequencies,
averages, dispersion and correlation.
• Sample statistics can be used to estimate population parameters.
• Statistics are used both to test existing hypotheses and for inductive
analysis.

Concept: Random Variable

• A random variable (RV) is any variable whose value/outcome is
non-constant and cannot be predicted exactly.
- E.g. Total exports from the UK is a random variable which varies
over time.

• A discrete random variable takes on only a finite (or countably
infinite) number of values. E.g. the number of workers or
students.
• A continuous random variable is a random variable that can take
on any value in some interval of values. E.g. height or weight of
individuals, travel distances etc.

Concept: Sample Vs. Population
• A Sample: a subset of observations on the variable(s) of interest,
drawn from the population.
- E.g. An election exit poll is drawn from a sample of the voter
population.

• The Population: all possible observations on the variable(s)
of interest.
- E.g. the final election result is from the population of
voters.

Concept: Sample Vs. Population (2)
• Sample size is usually indicated by n and the population size by N,
with n < N.
• A parameter: a numerical measure that describes a specific
characteristic of a population.
• A sample statistic/estimator: a numerical measure that describes a
specific characteristic of a sample and is used to estimate the
population parameter. The sample mean is an example of a sample
statistic and is used to estimate the population mean.
Random Samples

Simple random sampling is a procedure in which

• each member of the population is chosen strictly by chance,
• each member of the population is equally likely to be chosen,
• every possible sample of n objects is equally likely to be
chosen.

The resulting sample is called a random sample. Ideally, all
samples are purely random samples so that they give an
unbiased representation of the population.
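
The procedure above can be sketched in Python. This is a minimal illustration, not part of the lecture: the voter IDs, the sample size of 50 and the helper name `simple_random_sample` are all assumptions made for the example. It relies on the standard library's `random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator so the draw can be reproduced
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: voter IDs 1..1000 (made up for illustration).
voters = list(range(1, 1001))
exit_poll = simple_random_sample(voters, n=50, seed=1)

print(len(exit_poll))        # number of voters polled
print(len(set(exit_poll)))   # same number: no voter is drawn twice
```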
Why use samples?
• In economics and the social sciences in general, data on the whole
population may not be available, or may be too costly and time consuming to collate.
• We analyse samples and obtain statistical information from samples
as a way of estimating characteristics of the population.
• In general, the larger the sample, the better the estimate of the
population parameters becomes.
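
A rough simulation can make the last point concrete. The sketch below assumes a made-up population of 10,000 uniformly distributed values; the population, the seed and the function name `estimation_error` are illustrative choices, not from the lecture.

```python
import random
from statistics import mean

# Hypothetical population of N = 10,000 values (made up for illustration).
rng = random.Random(0)
population = [rng.uniform(0, 100) for _ in range(10_000)]
population_mean = mean(population)   # the parameter we want to estimate

def estimation_error(n):
    """Absolute gap between the mean of a random sample of size n and the population mean."""
    sample = rng.sample(population, n)
    return abs(mean(sample) - population_mean)

for n in (10, 100, 1_000, 10_000):
    print(n, round(estimation_error(n), 3))
```

The errors tend to shrink as n grows, although any single draw can go against the trend. When n equals N the sample is the whole population and the error is zero.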

Types of Economic Data

• Variables
➢ Categorical variables (defined categories or groups, e.g. male/female)
➢ Numerical variables
   - Discrete variables (counted items)
   - Continuous variables (measured characteristics)
• Data
➢ Cross-sectional data
➢ Time series data
➢ Panel data

Types of Economic Data – Cross Sectional Data

Cross-sectional data:
Observations on multiple units (e.g. countries, firms or
individuals), possibly for several variables, at a given moment
in time.

E.g. of cross-sectional data:
GDP of different countries at a given time, e.g. 2019.

Types of Economic Data – Cross Sectional Data (2)
Example 2 of cross-sectional data: Countries with Largest Trade Surpluses (2019)

[Figure: bar chart of trade balance (2019) in US$ billions, axis from 0 to 500. Countries shown: China; Germany; Russian Federation; Saudi Arabia; Ireland; Netherlands; Italy; Australia; United Arab Emirates; Brazil; Qatar; Taipei, Chinese; Iraq. Source: International Trade Centre]
Types of Economic Data – Time Series Data

Time-series data: a set of observations of a single variable over a
period of time. E.g. UK GDP 1998-2019.

[Figure: UK GDP (1998-2019), current US$ trillions, by year; vertical axis from 0 to 3.5. Source: World Bank]

Types of Economic Data – Panel Data
Panel data: a set of observations on multiple units (e.g. countries)
over a period of time. E.g. UK, US and China GDP 1998-2019.

[Figure: UK, China and US GDP (1998-2019), current US$ trillions, by year; vertical axis from 0 to 25.]
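
One way to see how the three data types relate is to treat a panel as values indexed by (unit, year): fixing the year gives a cross-section, and fixing the unit gives a time series. The sketch below uses two hypothetical units, "A" and "B", with made-up values. They are not real GDP figures, and the function names are assumptions for the example.

```python
# Made-up values for two hypothetical units "A" and "B"; not real GDP figures.
panel = {
    ("A", 2018): 1.0, ("A", 2019): 1.2,
    ("B", 2018): 3.1, ("B", 2019): 3.4,
}

def cross_section(panel, year):
    """All units observed at one point in time."""
    return {unit: value for (unit, y), value in panel.items() if y == year}

def time_series(panel, unit):
    """One unit observed over time, ordered by year."""
    return {y: value for (u, y), value in sorted(panel.items()) if u == unit}

print(cross_section(panel, 2019))   # units A and B in 2019
print(time_series(panel, "A"))      # unit A in 2018 and 2019
```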

Measures of Central Tendency
The mean: arithmetic mean = $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Add up all observations of variable x, from 1 to n, and divide by the
number of observations, n.

The median: the numerical value corresponding to the
middle observation in a dataset.

The mode: the value of the most frequent (most common)
observation in a dataset.

Measures of Central Tendency: Arithmetic Mean

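Sample Arithmetic Mean: summing all values from all
observations from the sample and dividing by the sample size n:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Population Arithmetic Mean: summing all values from
all observations from the population and dividing by the population size N:

$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$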

Measures of Central Tendency: Median

The median is also known as the 50th Percentile. In other words, 50% of
the observations are below or equal to this value.

• To find the median of a distribution:
1) Arrange all the observations in order from smallest to largest.
2) The location of the median is 0.5(n + 1) observations up from the bottom of
the list.
- If the number of observations n is odd, the median is the centre observation.
- If the number of observations n is even, the median is the mean of the two
centre observations.
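
The two steps above can be sketched as a short Python function. The function name `median_by_position` and the two small input lists are made up for illustration.

```python
def median_by_position(values):
    """Median via the 0.5(n + 1) position rule on the ordered data."""
    ordered = sorted(values)            # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = 0.5 * (n + 1)            # step 2: 1-based position in the ordered list
    lower = int(position)               # whole part of the position
    if position == lower:               # n odd: the position is a whole number
        return ordered[lower - 1]
    # n even: the position ends in .5, so average the two centre observations
    return (ordered[lower - 1] + ordered[lower]) / 2

print(median_by_position([3, 1, 2]))       # odd n: centre observation
print(median_by_position([4, 1, 3, 2]))    # even n: mean of the two centre observations
```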
Measures of Central Tendency: Simple Example Question:

Xi = 1, 2, 5, 5, 6, 9, 11, 15

Find the a) mean, b) median and c) mode of the variable X.

a) Mean = $\bar{x}$ = (1 + 2 + 5 + 5 + 6 + 9 + 11 + 15)/8 = 54/8 = 6.75

b) Median = 0.5(n + 1) observations up from the bottom of the list. In this
case, the median is the 0.5(8 + 1) = 4.5th observation. So, the median is midway
between observation 4 and observation 5, which equals (5 + 6)/2 = 5.5.

c) Mode = 5 (since 5 is the most common observation)
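
The same answers can be checked with Python's standard `statistics` module. This is only a quick verification of the worked example, using the values of X given above.

```python
from statistics import mean, median, mode

x = [1, 2, 5, 5, 6, 9, 11, 15]

x_mean = mean(x)       # (1 + 2 + ... + 15) / 8
x_median = median(x)   # mean of the 4th and 5th ordered observations
x_mode = mode(x)       # most frequent value

print(x_mean, x_median, x_mode)   # 6.75 5.5 5
```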


Measures of Central Tendency – Grouped data (1)

Example – calculating the mean from grouped data

$\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i}$

UK income survey:

Class in £    Mid income point x (£k)   Number in thousand f   fx
0-10k         5                         2448                   12240
10-25k        17.5                      1823                   31902.5
25-40k        32.5                      1375                   44687.5
40-50k        45                        480                    21600
50-60k        55                        665                    36575
60-80k        70                        1315                   92050
80-100k       90                        1640                   147600
100-150k      125                       2151                   268875
150-200k      175                       2215                   387625
200-300k      250                       1856                   464000
300-500k      400                       1057                   422800
500-1000k     750                       439                    329250
1000-2000k    1500                      122                    183000
2000k+        3000                      50                     150000
total                                   17636                  2592205

Mean = 2592205 / 17636 = 146.983726 (£ thousand) = £146983.726, so Mean ≈ 147k
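
The grouped-mean calculation can be reproduced in a few lines of Python. The midpoints and frequencies are copied from the table. The function name `grouped_mean` is an assumption for the sketch.

```python
# Class midpoints (in £ thousand) and frequencies (in thousands) from the table.
midpoints = [5, 17.5, 32.5, 45, 55, 70, 90, 125, 175, 250, 400, 750, 1500, 3000]
frequencies = [2448, 1823, 1375, 480, 665, 1315, 1640, 2151, 2215, 1856, 1057, 439, 122, 50]

def grouped_mean(x, f):
    """Weighted mean: sum of f_i * x_i divided by sum of f_i."""
    total_fx = sum(fi * xi for fi, xi in zip(f, x))
    total_f = sum(f)
    return total_fx / total_f

total_f = sum(frequencies)
total_fx = sum(fi * xi for fi, xi in zip(frequencies, midpoints))
mean_income = grouped_mean(midpoints, frequencies)

print(total_f, total_fx, round(mean_income, 3))
```

Because every observation in a class is replaced by the class midpoint, the result is an approximation to the mean of the underlying incomes.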
Measures of Central Tendency – Grouped data (2)

UK income survey:

Class in £    Number in thousand   Relative frequency   Cumulative freq.
0-10k         2448                 13.88%               13.88%
10-25k        1823                 10.34%               24.22%
25-40k        1375                 7.80%                32.01%
40-50k        480                  2.72%                34.74%
50-60k        665                  3.77%                38.51%
60-80k        1315                 7.46%                45.96%
80-100k       1640                 9.30%                55.26%
100-150k      2151                 12.20%               67.46%
150-200k      2215                 12.56%               80.02%
200-300k      1856                 10.52%               90.54%
300-500k      1057                 5.99%                96.54%
500-1000k     439                  2.49%                99.02%
1000-2000k    122                  0.69%                99.72%
2000k+        50                   0.28%                100.00%
total         17636                100%

Mode ≈ 5k (midpoint of the most frequent class, 0-10k)
Median ≈ 80k (the cumulative frequency passes 50% in the 80-100k class)
Mean ≈ 147k
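
The relative and cumulative frequencies in the table, and the classes that contain the mode and the median, can be recomputed as follows. This is a sketch using the class labels and counts from the table. The variable names are illustrative.

```python
from itertools import accumulate

classes = ["0-10k", "10-25k", "25-40k", "40-50k", "50-60k", "60-80k", "80-100k",
           "100-150k", "150-200k", "200-300k", "300-500k", "500-1000k",
           "1000-2000k", "2000k+"]
frequencies = [2448, 1823, 1375, 480, 665, 1315, 1640, 2151, 2215, 1856, 1057, 439, 122, 50]

total = sum(frequencies)
relative = [f / total for f in frequencies]   # share of observations in each class
cumulative = list(accumulate(relative))       # running total of the shares

# Modal class: the class with the highest frequency.
modal_class = classes[frequencies.index(max(frequencies))]
# Median class: the first class where the cumulative share reaches 50%.
median_class = next(c for c, cum in zip(classes, cumulative) if cum >= 0.5)

for c, r, cum in zip(classes, relative, cumulative):
    print(f"{c:<11} {r:7.2%} {cum:8.2%}")
print(modal_class, median_class)
```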

Measures of Central Tendency – Grouped data (3)
UK Income Survey:

[Figure: histogram of the UK income survey with the mode, the median and the mean marked; vertical axis from 0 to 0.25, horizontal axis ticks from 10 to 260.]

• The mode < the median < the mean, so the distribution is skewed to
the right. (If the reverse were true, it would be skewed to the left.)
• If the mode = the median = the mean, the distribution would be
symmetrical.
Percentiles
The pth percentile is the value at or below which p percent of the observations
lie.
To calculate the pth percentile (for any percentile, p), the observations first need to be
ordered from lowest to highest.
Pth percentile = value located in the (P/100)(n + 1)th ordered position
For example, the 25th percentile (also known as the first, or lower, quartile, Q1):
Q1 = the value in the 0.25(n + 1)th ordered position.

The 50th percentile (also known as the median):
Median = the value in the 0.5(n + 1)th ordered position.

The 75th percentile (also known as the third, or upper, quartile, Q3):
Q3 = the value in the 0.75(n + 1)th ordered position.
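
The position rule can be sketched in Python. The slides do not say what to do when the position is fractional, other than taking the midpoint for the median, so the linear interpolation and the clamping of positions to the range 1..n below are assumptions, as is the function name `percentile`. The data are the values of X from the earlier worked example.

```python
def percentile(values, p):
    """Value at the (p/100)(n + 1)th ordered position, interpolating linearly."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of empty data is undefined")
    position = (p / 100) * (n + 1)          # 1-based ordered position
    position = min(max(position, 1), n)     # assumption: clamp to the available range
    lower = int(position)                   # whole part of the position
    fraction = position - lower             # how far towards the next observation
    if fraction == 0:
        return ordered[lower - 1]
    # assumption: move the same fraction of the way between neighbouring values
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

x = [1, 2, 5, 5, 6, 9, 11, 15]
q1, med, q3 = percentile(x, 25), percentile(x, 50), percentile(x, 75)
print(q1, med, q3)   # 2.75 5.5 10.5
```

With n = 8, Q1 sits at position 2.25 (a quarter of the way from 2 to 5), the median at 4.5 and Q3 at 6.75 (three quarters of the way from 9 to 11).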
BOX PLOT

A box plot can be used to summarize key percentiles and the
total range of the data. The range is the difference between the
maximum and minimum values.
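
The numbers a box plot summarizes can be computed without drawing anything. The sketch below assumes the plot is built from the minimum, Q1, the median, Q3 and the maximum. It uses `statistics.quantiles` with `method="exclusive"`, which places the quartiles at the same (n + 1)-based positions as the percentile rule in this lecture, and the values of X from the earlier worked example.

```python
from statistics import quantiles

x = [1, 2, 5, 5, 6, 9, 11, 15]

# Quartiles at the 0.25(n + 1), 0.5(n + 1) and 0.75(n + 1) ordered positions.
q1, q2, q3 = quantiles(x, n=4, method="exclusive")

summary = {"min": min(x), "Q1": q1, "median": q2, "Q3": q3, "max": max(x)}
data_range = max(x) - min(x)   # range = maximum minus minimum

print(summary)
print(data_range)   # 14
```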

Measures of dispersion: Variance, Standard Deviation, Coefficient of Variation
Variance is a measure of the dispersion of the data from the mean. For a given mean, the larger the
variance, the larger the standard deviation and the larger the coefficient of variation.

• The (sample) variance:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

• The (population) variance:
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, where N is the population size and $\mu$ is the population mean

• Sample standard deviation: $s = \sqrt{s^2}$
• (Population) standard deviation: $\sigma = \sqrt{\sigma^2}$

• Sample coefficient of variation: $s / \bar{x}$
• (Population) coefficient of variation: $\sigma / \mu$
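
The formulas above can be worked through in Python for the values of X used in the earlier example. This is a sketch: the sample version divides by n - 1, while the population version divides by the number of observations and so treats the eight values as if they were the whole population.

```python
from math import sqrt

x = [1, 2, 5, 5, 6, 9, 11, 15]
n = len(x)
x_bar = sum(x) / n                                  # mean = 6.75

squared_devs = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations = 153.5

sample_variance = squared_devs / (n - 1)            # divide by n - 1
population_variance = squared_devs / n              # divide by the number of observations

sample_sd = sqrt(sample_variance)
population_sd = sqrt(population_variance)

sample_cv = sample_sd / x_bar                       # coefficient of variation
population_cv = population_sd / x_bar

print(round(sample_variance, 4), round(population_variance, 4))
print(round(sample_sd, 4), round(population_sd, 4))
print(round(sample_cv, 4), round(population_cv, 4))
```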
Small vs Large Standard Deviation

When the variance and standard deviation are relatively small, most
observations are relatively close to the mean.

When the variance and standard deviation are relatively large,
observations tend to be further away from the mean.
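
A small comparison illustrates the contrast. The two datasets below are made up for the example: they share the same mean but differ in spread. `statistics.pstdev` computes the population standard deviation.

```python
from statistics import mean, pstdev

# Two hypothetical datasets with the same mean but different spread.
tight = [9, 10, 11]
wide = [0, 10, 20]

tight_sd = pstdev(tight)   # observations sit close to the mean
wide_sd = pstdev(wide)     # observations sit far from the mean

print(mean(tight), round(tight_sd, 3))
print(mean(wide), round(wide_sd, 3))
```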

Summary
• Statistics can be used both to test theoretical hypotheses and to
create new theory from empirical observation.
• Data can be in the form of cross-sectional, time series and panel
data.
• There are three main measures of central tendency: the mean,
median and mode.
• Observations from a variable can be divided into percentiles to give
an idea of the dispersion, and the data distribution can be summarized
using a box plot.
• The variance and the standard deviation of a variable can be used to
give a formal measure of the dispersion of the observations from the
mean.
• Next lecture, some additional descriptive statistics (covariance,
correlation) are covered, and probability theory is introduced.
