
Econ6034 - Econometrics and Business Statistics
Topic 2: Descriptive Statistics for Data

Measures of Relative Standing

- We discussed measures of dispersion (variability) and measures of relative standing of data in Topic 1.
- Measures of relative standing are designed to provide information about the position of particular values relative to the entire data set.

Percentile: divide the ordered data into hundredths and report the observations at the boundaries of the subsets. The Pth percentile is the value below which P% of the observations fall.

- Suppose you scored in the 60th percentile on the GMAT. That means 60% of the other scores were below yours, while 40% of scores were above yours.
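The percentile idea can be sketched in a few lines of Python. This is a minimal illustration only: the function name `percent_below` and the scores are made up for the example (they are not real GMAT data), and it simply counts the share of observations strictly below a given value.

```python
def percent_below(scores, value):
    """Return the percentage of scores that lie strictly below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)


# Hypothetical test scores (made-up data for illustration only).
scores = [480, 510, 530, 550, 580, 600, 620, 650, 690, 720]

# Six of the ten scores are below 620, so 620 sits at the 60th percentile here.
print(percent_below(scores, 620))  # 60.0
```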


- We have special names for the 25th, 50th, and 75th percentiles, namely quartiles.

The first (lower) quartile: Q1 = 25th percentile.

The second quartile (median): Q2 = 50th percentile.

The third (upper) quartile: Q3 = 75th percentile.

Interquartile Range

The quartiles can be used to create another measure of variability, the interquartile range, which is defined as follows:

Interquartile Range (IQR) = Q3 – Q1

- The interquartile range measures the spread of the middle 50% of the observations.
- Large values of this statistic mean that the 1st and 3rd quartiles are far apart, indicating a high level of variability.
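As a rough sketch of how the quartiles and the IQR could be computed, the example below uses `statistics.quantiles` from the Python standard library with `method="inclusive"`. Both data sets are made up, and the helper name `quartiles_and_iqr` is an assumption for illustration. Software packages use slightly different interpolation rules for quartiles, so other tools may report slightly different values on other data.

```python
import statistics


def quartiles_and_iqr(data):
    """Return (Q1, Q2, Q3, IQR) using the 'inclusive' quartile convention."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


# Two hypothetical data sets with the same median but different spread.
narrow = [46, 47, 48, 49, 50, 51, 52, 53, 54]
wide = [10, 20, 30, 40, 50, 60, 70, 80, 90]

print(quartiles_and_iqr(narrow))  # (48.0, 50.0, 52.0, 4.0)
print(quartiles_and_iqr(wide))    # (30.0, 50.0, 70.0, 40.0)
```

Both data sets have a median of 50, but the larger IQR of the second reflects the wider spread of its middle 50% of observations.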

Measures of Linear Relationship

Provide information about the strength and direction of a linear relationship between two variables (if one exists).

- Covariance
- Coefficient of correlation

Covariance

- Covariance measures the linear association between two random variables. That is, it measures how much two random variables vary together.
- Examples:
  Traffic accidents and the number of P-plate drivers.
  Teacher-to-student ratio and student performance.


- Population covariance:

$$\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

- Sample covariance:

$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
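The two covariance formulas differ only in the divisor (N for a population, n − 1 for a sample). A minimal Python sketch, with hypothetical function names and made-up data, makes the difference concrete:

```python
def population_covariance(x, y):
    """Sum of cross-deviations from the means, divided by N."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def sample_covariance(x, y):
    """Sum of cross-deviations from the sample means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Made-up paired observations for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# The cross-deviations sum to 6, so the results are 6/5 and 6/4.
print(population_covariance(x, y))  # 1.2
print(sample_covariance(x, y))      # 1.5
```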

- When two variables move in the same direction (both increase or both decrease), the covariance will be a large positive number.
- When two variables move in opposite directions, the covariance will be a large negative number.
- When there is no particular pattern, the covariance will be close to zero.

However, it is often difficult to determine whether a particular covariance is large or small, because its magnitude depends on the units in which the variables are measured. In other words, it is hard to determine the strength of the relationship based on covariance alone.

The next parameter/statistic addresses this problem.
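To illustrate why the size of a covariance is hard to judge, the sketch below computes the sample covariance of the same hypothetical income and food-spending data twice: once measured in thousands of dollars and once in dollars. The data and the function name are made up for the example.

```python
def sample_covariance(x, y):
    """Sample covariance: cross-deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical household data, in thousands of dollars.
income_thousands = [30, 40, 50, 60, 70]
food_thousands = [6, 7, 9, 10, 13]

# The same observations expressed in dollars.
income_dollars = [1000 * v for v in income_thousands]
food_dollars = [1000 * v for v in food_thousands]

cov_thousands = sample_covariance(income_thousands, food_thousands)
cov_dollars = sample_covariance(income_dollars, food_dollars)

print(cov_thousands)  # 42.5
print(cov_dollars)    # 42500000.0
```

The relationship between the two variables is identical in both cases, yet the covariance is a million times larger when both are measured in dollars.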

Correlation Coefficient

Correlation measures the direction and strength of the linear relationship between two random variables.

- Population correlation coefficient:

$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

- Sample correlation coefficient:

$$r = \frac{s_{xy}}{s_x s_y}$$
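The sample formula can be sketched directly in Python: compute the sample covariance and the two sample standard deviations (each with divisor n − 1), then take the ratio. The function name and data are hypothetical.

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y), all with divisor n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)


# Made-up paired observations for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Here s_xy = 1.5, s_x^2 = 2.5 and s_y^2 = 1.5, so r is roughly 0.77.
r = sample_correlation(x, y)
print(round(r, 4))
```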


A correlation coefficient standardises the covariance of two variables so that its value lies between -1 and 1 (inclusive).

- If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).
- If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).
- A coefficient close to zero indicates no straight-line relationship.
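The three cases above can be checked on small made-up data sets. In this sketch (the function name and data are hypothetical), the third data set follows an exact curve, y = x², yet its correlation is zero: a coefficient near zero rules out a straight-line relationship, not every kind of relationship.

```python
import math


def sample_correlation(x, y):
    """Sample correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


x = [-2, -1, 0, 1, 2]
y_up = [2 * v + 1 for v in x]      # exact increasing line
y_down = [-3 * v + 10 for v in x]  # exact decreasing line
y_curve = [v * v for v in x]       # exact parabola, symmetric about zero

r_up = sample_correlation(x, y_up)
r_down = sample_correlation(x, y_down)
r_curve = sample_correlation(x, y_curve)

# Rounded to absorb floating-point noise in the two straight-line cases.
print(round(r_up, 6), round(r_down, 6), round(r_curve, 6))
```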

Using Excel

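- Covariance

  =COVAR(range of X, range of Y)

  Note that COVAR divides by N (population covariance). In Excel 2010 and later, =COVARIANCE.S(range of X, range of Y) returns the sample covariance.

- Correlation

  =CORREL(range of X, range of Y)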

Parameters and Statistics

                      Population    Sample
Size                  N             n
Mean                  µ             x̄
Variance              σ²            s²
Standard deviation    σ             s
Covariance            σxy           sxy
Correlation           ρ             r

Summary of Data

- Numerical Descriptive Techniques
  Middle – mean, median
  Spread – variance, range
  Relative standing – quartiles, IQR
  Linear relationship – covariance, correlation coefficient

- Graphical Descriptive Techniques
  Bar chart, pie chart, line chart
  Scatter plot
  Boxplots
  Histograms

Bar Chart

Pie Chart

Line Chart

- A line chart is useful when the progression of a numerical quantity through time (historical record) is of interest.

Scatter Plots
- A scatter plot shows the relationship between two numerical variables.
- Example: income vs. food consumption

Box Plot
- It shows the positions of the quartiles, the outliers, and the largest and smallest values excluding outliers.
- Example: food consumption expenditure (US 1941 Family Budget Survey data).
- Whiskers: maximum length is 1.5 × IQR; each stretches from the box to the furthest data point within this range.
- Points further out from the box are marked with circles and are called outliers.
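The whisker rule described above can be sketched as follows. This is an illustration under stated assumptions: the data are made up (they are not the Family Budget Survey figures), the helper name `boxplot_summary` is hypothetical, and the quartiles use the "inclusive" convention of Python's `statistics.quantiles`, which may differ slightly from what a given plotting package does.

```python
import statistics


def boxplot_summary(data):
    """Return quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr  # furthest a lower whisker may reach
    upper_limit = q3 + 1.5 * iqr  # furthest an upper whisker may reach
    inside = [v for v in ordered if lower_limit <= v <= upper_limit]
    outliers = [v for v in ordered if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": outliers,
    }


# Hypothetical expenditure data with one unusually large value.
spending = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = boxplot_summary(spending)
print(summary)
```

Here Q1 = 3 and Q3 = 7, so the IQR is 4 and the whiskers may extend at most 6 units beyond the box. The upper whisker stops at 8, and the value 30 is drawn as an outlier.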
Histogram

- To provide a graphical summary of a data set with a large number of observations, we often have to reduce the amount of information to be absorbed, in general by grouping the observations. Sometimes only grouped data are available.
- The grouping procedure involves:
  subdividing the range of the data into subintervals
  counting the number of observations in each subinterval
- Definition of the terms:
  classes: the subintervals into which the data are broken down
  frequencies: the numbers of observations in each class
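The grouping procedure can be sketched in Python as below. The incomes are made up for the example, the function name `class_frequencies` is hypothetical, and equal-width classes that include their lower bound but exclude their upper bound are an assumed convention.

```python
def class_frequencies(data, lower, width, num_classes):
    """Count observations in classes [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * num_classes
    for value in data:
        k = int((value - lower) // width)  # index of the class containing value
        if 0 <= k < num_classes:           # values outside all classes are ignored
            counts[k] += 1
    return counts


# Hypothetical household incomes in thousands of dollars.
incomes = [12, 25, 31, 38, 44, 47, 52, 55, 58, 63, 71, 96]
counts = class_frequencies(incomes, lower=0, width=20, num_classes=5)

# A crude text histogram: one star per observation in each class.
for k, count in enumerate(counts):
    start = 20 * k
    print(f"{start:>3} to under {start + 20:<3} | {'*' * count}")
```

With these data the five classes (0 to under 20, 20 to under 40, and so on) have frequencies 1, 3, 5, 2 and 1.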

Example
Household incomes in a hypothetical Sydney suburb.

Other Characteristics of a “Distribution”

Skewness (S): a measure of asymmetry

- Symmetric: skewness = 0, mean = median
- Skewed to the right (long tail to the right): skewness > 0, median < mean
- Skewed to the left (long tail to the left): skewness < 0, mean < median
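One common way to compute skewness is the moment-based ratio m3 / m2^1.5, where m2 and m3 are the second and third central moments. The slides do not say which formula is intended, so this choice is an assumption, and some software applies small-sample adjustments that give slightly different numbers. The data sets are made up.

```python
import statistics


def skewness(data):
    """Moment-based skewness m3 / m2**1.5, using population moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 12]            # one large value stretches the right tail
left_skewed = [-v for v in right_skewed]   # mirror image, long left tail

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, round(skewness(data), 3),
          "mean:", statistics.mean(data), "median:", statistics.median(data))
```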

Kurtosis (K): a measure of the weight in the tails

- Mesokurtic: K = 3 (“Normal distribution”)
- Platykurtic: K < 3
- Leptokurtic: K > 3
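Kurtosis on this scale (where the normal distribution has K = 3) can be computed as m4 / m2², the fourth central moment divided by the squared variance. The sketch below assumes that population-moment definition, which the slides do not spell out, and uses two small made-up data sets: one flat and evenly spread, one with most values at the centre and two extreme values in the tails.

```python
def kurtosis(data):
    """Moment-based kurtosis m4 / m2**2 (a normal distribution gives 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return m4 / m2 ** 2


flat = [1, 2, 3, 4, 5]                             # evenly spread, light tails
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # mostly central, two extremes

k_flat = kurtosis(flat)
k_heavy = kurtosis(heavy_tails)

print(round(k_flat, 2))   # 1.7, below 3: platykurtic
print(round(k_heavy, 2))  # 5.0, above 3: leptokurtic
```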

Skewness and Kurtosis

(a) Skewness; (b) Kurtosis


Summary

- Key Statistical Concepts
  Population, Sample, Parameters, Statistics
- Numerical Descriptive Techniques
  Middle – mean, median
  Spread – variance, range
  Relative standing – quartiles, IQR
  Linear relationship – covariance, correlation coefficient
- Graphical Descriptive Techniques
  Bar chart, pie chart, line chart
  Scatter plot
  Boxplots
  Histograms
- Other Characteristics of a “Distribution”
  Skewness
  Kurtosis
