
CHAPTER 3

DESCRIPTIVE STATISTICS
Learning Goals
Shape (Skewness, symmetry, modality)
Location (mean, median, mode)
Variability/Spread (variance, standard deviation, range, IQR)
Relative Location (z-score)
Empirical Rule and Chebyshev’s Theorem
Boxplots
Outliers
Descriptive Statistics
The purpose of descriptive statistics is to convert data into
information using meaningful charts (graphs) and numerical
reports.

Most statistical software packages can routinely generate all types of
descriptive statistics.
In this chapter, we give some practical guides on proper
interpretation of the commonly used descriptive statistics
tools.
Relevant features of your data
To use descriptive statistics tools effectively, it is important to understand
the following three features of data (our focus will be on quantitative data):

1) Shape of the data

2) Location of the data

3) Variability/Spread of the data

Do not see these aspects as separate; they complement each other. But at
the same time, treating them one at a time helps you see things better.
1) Shape of data
To understand the shape of your data, display its HISTOGRAM,
which is simply a graphical summary of the data.
Most statistical software can automatically generate
histograms.
Although you may also create histograms in Excel, you need
to do a fair amount of formatting to make them look right. So, to save
time and for brevity, we will always provide software-generated
histograms when needed.
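To make concrete what a histogram summarizes, here is a minimal Python sketch, not part of the course software. It counts how many values fall into each equal-width bin and prints one row of stars per bin. The function name `text_histogram`, the bin width, and the sample values are all assumptions made up for illustration.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Count how many values fall into each bin of equal width."""
    # Bin k covers the interval from k * bin_width up to,
    # but not including, (k + 1) * bin_width.
    counts = Counter(int(v // bin_width) for v in values)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        low = k * bin_width
        rows.append((low, low + bin_width, counts.get(k, 0)))
    return rows

# Hypothetical sample, made up for illustration only.
sample = [52, 55, 58, 61, 63, 64, 66, 67, 68, 71, 72, 74, 79, 83]

for low, high, count in text_histogram(sample, bin_width=10):
    print(f"{low:>3} to {high:<3} | {'*' * count}")
```

Each row of stars plays the role of one histogram bar, so the overall shape of the data can be read from the row lengths.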
What to look for in a histogram
When trying to understand the shape of your data, look for
these key features:

1) Symmetry
2) Skewness

3) Modality

Your data will exhibit one or more of the above features.


Data examples

The following data exhibit different shapes, as shown in
their respective histograms.
Weights of tomatoes (grown in an experimental setting) for a sample of 240
Histogram left, Boxplot right
Summary
This data is symmetric.

It is also bell (mound) shaped.

Many business/economic data exhibit this feature.

The data resembles what we call the Normal distribution.


CEO annual pay at 500 largest US firms in 2008
(histogram left, Boxplot - right)
Summary
This data is skewed to the right OR is positively skewed.

A few very large salaries stretch the data to the right!

Data such as income are often skewed to the right.


Number of books shipped out daily by Amazon.com for
100 selected days
Histogram left, Boxplot right
Summary
This is skewed to the left OR negatively skewed.

A few small values stretch the data to the left!

This data shape is rare in business and economic applications.


Grades of Economics Midterm
Histogram left, Boxplot right
Summary
This has two features: skewed to the left and bimodal.

We say this data is bimodal (has two modes) because there are
two distinct clusters of grades in the class.

Modality will tell you if there are distinct (unique) groups in
your data.
You can imagine how wrong our analysis is going to be if we
assume our data is homogeneous while the histogram
indicates otherwise.
VIDEO is available on shapes
On Titanium, an exercise video on shapes is
available.
VIDEO: HISTOGRAM
Please watch video immediately after this section
has been covered in class.
2) Location of data
A location measure is a typical numerical value (summary) of the data.

We present 3 location measures.

These measures are also called measures of central tendency, i.e.
the data tend to cluster around those measures!

Thus, they may help summarize the data with a single numerical value.


Location Measures
◦ Mean -- the simple average

◦ Median -- the middle observation after the data has been ordered

>> Our focus in this course will be on mean and median

◦ Mode -- the observation that occurs most often

>> For continuous variables, the definition of mode given above is not meaningful.
Median
(Even Number of Data Points)
The average of the two middle observations

First put the data in either ascending or descending order

DATA: 4,2,3,3,2,2,1,4,3,2

There is an even number of data points (10)

ASCENDING ORDER: 1,2,2,2,2,3,3,3,4,4

Middle two observations: 2,3

Median = Average of the middle two = (2+3)/2 = 2.5
Median
(Odd Number of Data Points)

Suppose we extend the previous data by one more observation which has a
value of 4.
Then, we will have an odd number of data points (11)

ASCENDING ORDER: 1,2,2,2,2,3,3,3,4,4,4

Middle observation: 3

The median is the middle observation

Median = 3
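The two median cases can be sketched in a few lines of Python. This is an illustrative sketch rather than the course's method. The helper name `median` is an assumption; the data are the two examples above. The last line cross-checks the helper against the standard library's `statistics.median`.

```python
import statistics

def median(values):
    """Sort the data, then take the middle observation
    (or the average of the two middle observations)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of data points: the single middle observation.
        return ordered[mid]
    # Even number of data points: average of the middle two.
    return (ordered[mid - 1] + ordered[mid]) / 2

even_data = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]   # 10 data points
odd_data = even_data + [4]                   # 11 data points

print(median(even_data))  # 2.5
print(median(odd_data))   # 3

# Cross-check against the standard library.
assert median(even_data) == statistics.median(even_data)
```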
Mode
The observation that occurs in a dataset most often

DATA: 4,2,3,3,2,2,1,4,3,2
Mode = 2

We can have data with more than one mode


◦ Bimodal, multi-modal
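As a rough illustration, the mode can be found by counting how often each value occurs. The helper name `modes` and the second data set are assumptions made up for this sketch. It returns every value tied for the highest count, so data with more than one mode are handled too.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

data = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]
print(modes(data))          # [2]

# Hypothetical extension: one more 3 makes the values 2 and 3 tie.
two_modes = data + [3]
print(modes(two_modes))     # [2, 3]
```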
Computing location in Excel
Suppose data are in cells A2 to A11
Mean -- = AVERAGE(A2:A11)
Median -- = MEDIAN(A2:A11)
Mode -- = MODE(A2:A11)

You can also use the Descriptive Statistics option from Data Analysis in the
Tools menu
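Descriptive Statistics dialog (screenshot callouts):
◦ Specify where the data values are stored
◦ Check Labels
◦ Enter the name of the output worksheet
◦ Check both Summary Statistics and Confidence Level

Output worksheet (screenshot callouts):
◦ Drag to make column A wider
◦ The output reports the sample mean, sample median, and sample mode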
Which location measure to use?
This depends on the SHAPE of your data.

When the histogram is unimodal and symmetric, all the location measures
are close to each other, and use of the mean is recommended due to its simplicity.

When the histogram shows a great deal of skewness, the median is a better
measure than the mean, as it still represents the center of the data.

The mode is used rarely; it serves as an indicator of groups within the data.
Skewness, mean and median
The mean always goes in the direction of the skew!! Never forget
this.

Therefore, when data is skewed to the right, the mean is
greater than the median.

When data is skewed to the left, the mean is less than the median.
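A small numerical check of this rule, using Python's standard `statistics` module. Both data sets are hypothetical and made up for illustration: one has a single very large value (skewed to the right) and the other has a single very small value (skewed to the left).

```python
import statistics

# Hypothetical annual pay in thousands of dollars: one very large
# value stretches the data to the right.
right_skewed = [40, 45, 50, 55, 60, 65, 70, 900]

# Hypothetical exam scores: one very small value stretches the
# data to the left.
left_skewed = [5, 70, 75, 80, 85, 90, 95, 100]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"skewed {name}: mean = {mean}, median = {median}")

# The mean is pulled toward the extreme value; the median barely moves.
assert statistics.mean(right_skewed) > statistics.median(right_skewed)
assert statistics.mean(left_skewed) < statistics.median(left_skewed)
```

In the right-skewed sample the mean is 160.625 while the median is 57.5; in the left-skewed sample the mean is 75 while the median is 82.5.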


Something to think about?
Suppose I gave you the following two summary values for single-family
home prices in Orange County:

A) 900,000
B) 350,000

Can you tell which is more likely the mean and which is median?
VIDEO is available on skewness, mean and median
On Titanium, an exercise video is available.
VIDEO: Mean and Median
Please watch video immediately after this section
has been covered in class.
