Lecture Topic 2

Econ6034-Econometrics and Business Statistics
Topic 2: Descriptive Statistics for Data
1 / 25
Measures of Relative Standing
- We discussed measures of dispersion (variability) and measures of relative standing of data in Topic 1.
- Measures of relative standing are designed to provide information about the position of particular values relative to the entire data set.
Percentile: divide the data into hundredths and report the observations at the boundaries of the subsets.
- Suppose you scored in the 60th percentile on the GMAT. That means 60% of the other scores were below yours, while 40% of scores were above yours.
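A minimal sketch of the GMAT example: the percentile rank of a score is taken here as the percentage of scores strictly below it. The function name `percentile_rank` and the list of scores are hypothetical, chosen only for illustration.

```python
def percentile_rank(scores, value):
    """Percentage of observations strictly below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100.0 * below / len(scores)


# Hypothetical test scores (illustration only, not real GMAT data)
scores = [480, 510, 530, 560, 590, 610, 640, 660, 700, 730]

rank = percentile_rank(scores, 640)
print(f"A score of 640 sits at percentile rank {rank:.0f}")
```

Six of the ten hypothetical scores lie below 640, so that score is at the 60th percentile of this small data set.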
2 / 25
Measures of Relative Standing
- We have special names for the 25th, 50th, and 75th percentiles, namely quartiles.
  - The first (lower) quartile: Q1 = 25th percentile.
  - The second quartile (median): Q2 = 50th percentile.
  - The third (upper) quartile: Q3 = 75th percentile.
3 / 25
Interquartile Range
The quartiles can be used to create another measure of variability, the interquartile range, which is defined as follows:
Interquartile Range (IQR) = Q3 − Q1
- The interquartile range measures the spread of the middle 50% of the observations.
- Large values of this statistic mean that the 1st and 3rd quartiles are far apart, indicating a high level of variability.
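The quartiles and the IQR can be sketched as follows. This assumes percentiles are found by linear interpolation between sorted observations; statistical packages use several slightly different conventions, so results may differ a little across software. The helper `percentile` and the data are hypothetical.

```python
def percentile(data, p):
    """Return the p-th percentile (0 <= p <= 100) by linear interpolation.

    Assumption: positions are interpolated between sorted observations;
    other software may use a slightly different rule.
    """
    xs = sorted(data)
    pos = (len(xs) - 1) * p / 100.0   # fractional index into the sorted data
    lo = int(pos)                     # observation at or just below that index
    hi = min(lo + 1, len(xs) - 1)     # next observation, capped at the last one
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


# Hypothetical data (illustration only)
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]

q1 = percentile(data, 25)
q2 = percentile(data, 50)   # the median
q3 = percentile(data, 75)
iqr = q3 - q1

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

For this data set Q1 = 4, Q2 = 7 and Q3 = 9, so the middle 50% of the observations span an IQR of 5.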
4 / 25
Measures of Linear Relationship
These measures provide information about the strength and direction of a linear relationship between two variables (if one exists).
- Covariance
- Coefficient of correlation
5 / 25
Covariance
- Covariance measures the linear association between two random variables. That is, it measures how much two random variables vary together.
- Examples:
  - Traffic accidents and the number of P-plate drivers.
  - Teacher-to-student ratio and student performance.
6 / 25
Covariance
- Population covariance:
$$ \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} $$
- Sample covariance:
$$ s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} $$
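The two covariance formulas can be sketched directly in code. The function names and the paired observations are hypothetical. The only difference between the two functions is the divisor: N for the population, n − 1 for the sample.

```python
def population_covariance(x, y):
    """Average product of deviations from the means, dividing by N."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def sample_covariance(x, y):
    """Same sum of products, but dividing by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical paired observations (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(population_covariance(x, y))
print(sample_covariance(x, y))
```

Here the products of deviations sum to 6, so the population formula gives 6/5 = 1.2 and the sample formula gives 6/4 = 1.5.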
7 / 25
Covariance
- When two variables tend to move in the same direction (both increase or both decrease), the covariance is positive.
- When two variables tend to move in opposite directions, the covariance is negative.
- When there is no particular pattern, the covariance will be close to zero.
However, it is often difficult to determine whether a particular covariance is large or small, because its magnitude depends on the units in which the variables are measured. In other words, it is hard to determine the strength of the relationship based on covariance.
The next parameter/statistic addresses this problem.
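To see why the size of a covariance is hard to interpret, consider this small sketch: measuring one variable in different units changes the covariance even though the relationship itself is unchanged. The data and the dollars-to-cents conversion are hypothetical.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations over n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data: x measured in dollars
x_dollars = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# The same x values expressed in cents
x_cents = [100 * v for v in x_dollars]

cov_dollars = sample_covariance(x_dollars, y)
cov_cents = sample_covariance(x_cents, y)

print(cov_dollars, cov_cents)
```

The covariance is 100 times larger when x is recorded in cents, although the data describe exactly the same relationship.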
8 / 25
Correlation Coefficient
Correlation measures the direction and strength of the linear relationship between two random variables.
- Population correlation coefficient:
$$ \rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} $$
- Sample correlation coefficient:
$$ r = \frac{s_{xy}}{s_x s_y} $$
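A minimal sketch of the sample correlation coefficient, built from the sample covariance and the sample standard deviations. The function names and data are hypothetical.

```python
import math


def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations over n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(x):
    """Sample standard deviation: square root of the covariance of x with itself."""
    return math.sqrt(sample_covariance(x, x))


def sample_correlation(x, y):
    """r = s_xy / (s_x * s_y)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


# Hypothetical paired observations (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = sample_correlation(x, y)
print(round(r, 4))
```

For this data s_xy = 1.5, s_x² = 2.5 and s_y² = 1.5, so r = 1.5 / √3.75 ≈ 0.7746.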
9 / 25
Correlation Coefficient
A correlation coefficient standardises the covariance of two variables such that its value will lie between -1 and 1 (inclusive).
- If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).
- If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).
- No straight-line relationship is indicated by a coefficient close to zero.
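The three cases can be illustrated with a small sketch on hypothetical data. Note the third case: a coefficient of zero rules out a straight-line relationship, but the variables may still be related in a non-linear way.

```python
import math


def correlation(x, y):
    """Correlation coefficient; the (n - 1) divisors cancel, so sums suffice."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (illustration only)
x = [-2, -1, 0, 1, 2]
y_up = [2 * a + 1 for a in x]    # exact increasing line
y_down = [-a for a in x]         # exact decreasing line
y_curve = [a ** 2 for a in x]    # exact relationship, but not a straight line

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_curve = correlation(x, y_curve)

print(r_up, r_down, r_curve)
```

The two exact lines give coefficients of +1 and -1, while the parabola gives 0 even though y is fully determined by x.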
10 / 25
Using Excel
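- Covariance
  =COVAR(range of X, range of Y)
  Note: COVAR returns the population covariance (divides by N). For the sample covariance (divides by n − 1), use =COVARIANCE.S(range of X, range of Y).
- Correlation
  =CORREL(range of X, range of Y)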
11 / 25
Parameters and Statistics
                     Population   Sample
Size                 N            n
Mean                 µ            x̄
Variance             σ²           s²
Standard deviation   σ            s
Covariance           σxy          sxy
Correlation          ρ            r
12 / 25
Summary of Data
- Numerical Descriptive Techniques
  - Middle – mean, median
  - Spread – variance, range
  - Relative standing – quartiles, IQR
  - Linear relationship – covariance, correlation coefficient
- Graphical Descriptive Techniques
  - Bar chart, pie chart, line chart
  - Scatter plot
  - Boxplots
  - Histograms
13 / 25
Bar Chart
14 / 25
Pie Chart
15 / 25
Line Chart
- A line chart is useful when the progression of a numerical quantity through time (historical record) is of interest.
16 / 25
Scatter Plots
- A scatter plot shows the relationship between two numerical variables.
- Example: income vs. food consumption
17 / 25
Scatter plots
18 / 25
Box Plot
- A box plot shows the positions of the quartiles, the largest and smallest values that are not outliers, and any outliers.
- Example: food consumption expenditure (US 1941 Family Budget Survey data)
- Whiskers: maximum length is 1.5*IQR; each whisker stretches from the box to the furthest data point within this range.
- Points further out from the box are marked with circles and are called outliers.
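The whisker rule can be sketched numerically. This assumes quartiles are computed by linear interpolation (plotting software may use a slightly different rule). The helper `percentile` and the data are hypothetical and are not the Family Budget Survey figures.

```python
def percentile(data, p):
    """p-th percentile by linear interpolation between sorted observations."""
    xs = sorted(data)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


# Hypothetical data with one unusually large value
data = [2, 4, 4, 5, 7, 8, 9, 11, 30]

q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1

# Whiskers may extend at most 1.5 * IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Each whisker ends at the furthest data point inside the fences
inside = [v for v in data if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Points beyond the fences are drawn individually as outliers
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(whisker_low, whisker_high, outliers)
```

With Q1 = 4 and Q3 = 9, the fences are at -3.5 and 16.5, so the whiskers run from 2 to 11 and the value 30 is plotted as an outlier.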
19 / 25
Histogram
- To provide a graphical summary of a data set with a large number of observations, we often have to reduce the amount of information to be absorbed, generally by grouping the observations. Sometimes only grouped data are available.
- The grouping procedure involves:
  - subdividing the range of the data into subintervals
  - counting the number of observations in each subinterval
- Definition of terms:
  - classes: the subintervals into which the data are broken down
  - frequencies: the number of observations in each class
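The grouping procedure can be sketched as follows, assuming equal-width classes that include their lower boundary, with the last class also including its upper boundary. The function name, the class boundaries, and the incomes are hypothetical.

```python
def class_frequencies(data, lower, width, num_classes):
    """Count observations in equal-width classes starting at `lower`."""
    upper = lower + width * num_classes
    counts = [0] * num_classes
    for v in data:
        idx = int((v - lower) // width)   # which class the value falls into
        if v == upper:                    # put the top boundary in the last class
            idx = num_classes - 1
        if 0 <= idx < num_classes:        # ignore values outside the range
            counts[idx] += 1
    return counts


# Hypothetical household incomes in $'000 (illustration only)
incomes = [23, 35, 41, 47, 52, 58, 63, 67, 71, 88, 95, 100]

freqs = class_frequencies(incomes, lower=20, width=20, num_classes=4)

# A crude text histogram: one asterisk per observation
for i, f in enumerate(freqs):
    start = 20 + 20 * i
    print(f"{start:>3}-{start + 20:<3} {'*' * f}")
```

The four classes (20-40, 40-60, 60-80, 80-100) receive 2, 4, 3 and 3 observations respectively. These frequencies are the bar heights of the histogram.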
20 / 25
Example
Household incomes in a hypothetical Sydney suburb.
21 / 25
Other Characteristics of a “Distribution”
Skewness (S): a measure of asymmetry
- Symmetric: skewness = 0, mean = median
- Skewed to the right (long tail to the right): skewness > 0, typically median < mean
- Skewed to the left (long tail to the left): skewness < 0, typically mean < median
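The slides do not state a formula for S. As an assumption, the sketch below uses the common moment-based definition, the third central moment divided by the second central moment raised to the power 3/2. The data sets are hypothetical.

```python
import statistics


def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in data) / n   # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets (illustration only)
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]      # long tail to the right
left_skewed = [-v for v in right_skewed]      # mirror image: long tail to the left

for name, d in [("symmetric", symmetric),
                ("right-skewed", right_skewed),
                ("left-skewed", left_skewed)]:
    print(name, round(skewness(d), 3), statistics.mean(d), statistics.median(d))
```

In the right-skewed set the single large value pulls the mean (3.5) above the median (3) and makes the skewness positive. The mirrored set shows the opposite pattern.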
22 / 25
Kurtosis (K): a measure of the weight in the tails
- Mesokurtic: K = 3 (“Normal distribution”)
- Platykurtic: K < 3
- Leptokurtic: K > 3
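The slides do not state a formula for K. As an assumption, the sketch below uses the moment-based definition, the fourth central moment divided by the squared second central moment, which is the version that equals 3 for a normal distribution. The data sets are hypothetical.

```python
def kurtosis(data):
    """Moment-based kurtosis: m4 / m2**2 (an assumed definition; normal = 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in data) / n   # fourth central moment
    return m4 / m2 ** 2


# Hypothetical data sets (illustration only)
flat = [1, 2, 3, 4, 5]                            # evenly spread, light tails
heavy_tails = [-10, -1, 0, 0, 0, 0, 0, 1, 10]     # most mass at the centre, a few extremes

k_flat = kurtosis(flat)
k_heavy = kurtosis(heavy_tails)

print(round(k_flat, 3), round(k_heavy, 3))
```

The evenly spread data give K = 1.7 (platykurtic), while the data with a few extreme values give K of about 4.41 (leptokurtic).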
23 / 25
Skewness and Kurtosis
[Figure: (a) Skewness; (b) Kurtosis]
24 / 25

Summary
- Key Statistical Concepts
  - Population, Sample, Parameters, Statistics
- Numerical Descriptive Techniques
  - Middle – mean, median
  - Spread – variance, range
  - Relative standing – quartiles, IQR
  - Linear relationship – covariance, correlation coefficient
- Graphical Descriptive Techniques
  - Bar chart, pie chart, line chart
  - Scatter plot
  - Boxplots
  - Histograms
- Other Characteristics of a “Distribution”
  - Skewness
  - Kurtosis
25 / 25
