Chapter 3 Slides #1 Shape and Location

CHAPTER 3
DESCRIPTIVE STATISTICS
Learning Goals
Shape (Skewness, symmetry, modality)
Location (mean, median, mode)
Variability/Spread (variance, standard deviation, range, IQR)
Relative Location (z-score)
Empirical Rule and Chebyshev’s Theorem
Boxplots
Outliers
Descriptive Statistics
The purpose of descriptive statistics is to convert data into
information using meaningful charts (graphs) and numerical
reports.
Most statistical software can routinely generate all types of
descriptive statistics.
In this chapter, we give some practical guidance on the proper
interpretation of commonly used descriptive statistics
tools.
Relevant features of your data
To effectively use descriptive statistics tools, it is important to understand
the following three features of data (our focus will be on quantitative data):
1) Shape of the data
2) Location of the data
3) Variability/Spread of the data
Do not see these aspects as separate; they complement each other. But at
the same time, treating them one at a time helps you see things better.
1) Shape of data
To understand the shape of your data, display its HISTOGRAM,
which is simply a graphical summary of the data.
Most statistical software can automatically generate
histograms.
Although you may also create histograms in Excel, you need
to do several formatting steps to make them look right. So, to save
time and for brevity, we will always provide software-generated
histograms when needed.
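To make the idea of a histogram concrete, here is a minimal sketch that groups values into equal-width bins and prints one star per observation. The function name `text_histogram`, the bin width, and the exam scores are all made up for illustration; statistical software will do this (and draw a proper chart) for you.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Group integer values into equal-width bins; return (bin_start, count) pairs."""
    # Map each value to the lower edge of its bin, e.g. 37 -> 30 when bin_width is 10.
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    # Include empty bins so that gaps in the data stay visible.
    return [(start, counts.get(start, 0))
            for start in range(first, last + bin_width, bin_width)]

# Hypothetical exam scores, made up for illustration only.
scores = [52, 58, 61, 64, 67, 68, 71, 72, 73, 75, 76, 78, 81, 83, 85, 92]

for start, count in text_histogram(scores, bin_width=10):
    print(f"{start:3d}-{start + 10 - 1:3d} | {'*' * count}")
```

The row of stars for each bin plays the role of a histogram bar, so the printed output can be read for shape in the same way as a chart.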
What to look for in a histogram
When trying to understand the shape of your data, look for
these key features:
1) Symmetry
2) Skewness
3) Modality
Your data will exhibit one or more of the above features.

Data examples
The following data exhibit different shapes, as shown in
their respective histograms.
Weights of tomatoes (grown experimentally) for a sample of 240
(histogram left, boxplot right)
Summary
This data is symmetric.
It is also bell (mound) shaped.
Many business/economic data exhibit this feature.
The data resembles what we call the Normal distribution.

CEO annual pay at 500 largest US firms in 2008
(histogram left, boxplot right)
Summary
This data is skewed to the right OR is positively skewed.
A few very large salaries stretch the data to the right!
Data such as income are often skewed to the right.

Number of books shipped out daily by Amazon.com for
100 selected days
Summary
This is skewed to the left OR negatively skewed.
A few small values stretch the data to the left!
This data shape is rare in business and economic applications.

Grades of Economics Midterm
Summary
This has two features: skewed to the left and bimodal.
We say this data is bimodal (has two modes) because there are two distinct groups of grades in the class.
Modality will tell you if there are distinct (unique) groups in your data.
You can imagine how wrong our analysis is going to be if we
assume our data is homogeneous while the histogram
indicates otherwise.
VIDEO is available on shapes
On Titanium, an exercise video on shapes is
available.
VIDEO: HISTOGRAM
Please watch video immediately after this section
has been covered in class.
2) Location of data
A location measure is a typical numerical value (summary) of the data.
We present 3 location measures.
These measures are also called measures of central tendency, i.e., the data tends to cluster around them!
Thus, they may help summarize the data with a single numerical value.

Location Measures
◦ Mean -- the simple average
◦ Median -- the middle observation after the data has been ordered
>> Our focus in this course will be on mean and median
◦ Mode -- the observation that occurs most often
>> For continuous variables, the definition of mode given above is not meaningful.
Median
(Even Number of Data Points)
The average of the two middle observations
First put the data in either ascending or descending order

DATA: 4,2,3,3,2,2,1,4,3,2
There is an even number of data points (10)
ASCENDING ORDER: 1,2,2,2,2,3,3,3,4,4
Middle two observations: 2, 3

Median = average of the middle two = (2+3)/2 = 2.5
Median
(Odd Number of Data Points)
Suppose we extend the previous data by one more observation which has a
value of 4.
Then, we will have an odd number of data points (11)
ASCENDING ORDER: 1,2,2,2,2,3,3,3,4,4,4
Middle observation: 3 (the 6th of 11)

Median is the middle observation
Median = 3
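The two median rules (even and odd number of data points) can be sketched in a few lines of Python. This is only an illustration of the steps on these slides, using a hypothetical helper named `median`; it assumes the list is not empty.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)   # step 1: put the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:             # odd count: a single middle observation
        return ordered[mid]
    # even count: average the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

even_data = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]  # the 10 data points from the slides
odd_data = even_data + [4]                  # extended by one observation of 4

print(median(even_data))  # 2.5
print(median(odd_data))   # 3
```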
Mode
The observation that occurs in a dataset most often
DATA: 4,2,3,3,2,2,1,4,3,2
Mode = 2
We can have data with more than one mode
◦ Bimodal, multi-modal
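As a quick check of the mode, and of what more than one mode looks like, here is a small sketch using `multimode` from Python's standard `statistics` module (available in Python 3.8 and later). The second data set is made up for illustration.

```python
from statistics import multimode

data = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]   # data from the slide
print(multimode(data))                   # [2]

# Hypothetical data in which two values tie for most frequent.
two_modes = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6]
print(multimode(two_modes))              # [2, 5]
```

`multimode` returns every value that ties for the highest count, so a list with two entries signals two equally common values.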
Computing location in Excel
Suppose data are in cells A2 to A11
Mean -- =AVERAGE(A2:A11)
Median -- =MEDIAN(A2:A11)
Mode -- =MODE(A2:A11)
You can also use the Descriptive Statistics option from Data Analysis in the
Tools menu
[Screenshots: the worksheet formulas above entered in Excel, and the Descriptive Statistics dialog with these callouts]
◦ Where data values are stored
◦ Check Labels
◦ Enter name of output worksheet
◦ Check both: Summary Statistics, Confidence Level
◦ Drag to make Column A wider
◦ Output shows the Sample Mean, Sample Median, and Sample Mode
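If you are working outside Excel, the same three measures can be computed with Python's standard `statistics` module. This is a sketch, assuming that cells A2 to A11 hold the ten data points used in the median and mode examples; the slides do not say which values are in those cells.

```python
import statistics

# Assumed contents of cells A2 to A11 (the example data from the median slides).
data = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

print(statistics.mean(data))     # counterpart of =AVERAGE(A2:A11)
print(statistics.median(data))   # counterpart of =MEDIAN(A2:A11)
print(statistics.mode(data))     # counterpart of =MODE(A2:A11)
```

For this data the mean is 26/10 = 2.6, the median is 2.5, and the mode is 2.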
Which location measure to use?
This depends on the SHAPE of your data.
When the histogram is unimodal and symmetric, all the location measures
are close to each other and use of the mean is recommended due to its simplicity.
When the histogram shows a great deal of skewness, the median is a better
measure than the mean as it still represents the center of the data.
The mode is rarely used and is meant to be an indicator of groups within the
data.
Skewness, mean and median
The mean always goes in the direction of the skew! Never forget
this.
Therefore, when data is skewed to the right, the mean is
greater than the median.
When data is skewed to the left, the mean is less than the median.
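The pull of the skew on the mean can be seen with two tiny made-up data sets. Both are hypothetical and chosen only to show the pattern: one has a single very large value (a long right tail), the other a single very small value (a long left tail).

```python
from statistics import mean, median

# Hypothetical values; one very large observation creates a long right tail.
right_skewed = [40, 45, 50, 55, 60, 65, 400]
# Hypothetical values; one very small observation creates a long left tail.
left_skewed = [5, 340, 345, 350, 355, 360, 365]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, "mean =", round(mean(data), 1), "median =", median(data))
```

In the first set the mean (about 102.1) is well above the median (55); in the second the mean (about 302.9) is well below the median (350). In both cases the median stays near the bulk of the data while the mean moves toward the tail.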

Something to think about?
Suppose I gave you the following two summary values for single-family home prices in Orange
County:
A) 900,000
B) 350,000
Can you tell which is more likely the mean and which is the median?
VIDEO is available on skewness, mean and median
On Titanium, an exercise video is available.
VIDEO: Mean and Median
Please watch video immediately after this section
has been covered in class.
