
Lecture 2.

Chapter 4
Graphical descriptive
techniques - Numerical data
4.1 Graphical techniques to describe numerical
data
4.2 Describing time-series data
4.3 Describing the relationship between two
numerical variables
Optional reading:
4.4 Graphical excellence and deception
4.1 Graphical Techniques to
Describe Numerical Data
There are several graphical methods that are
used when the data are numerical (i.e.
quantitative, non-categorical).

The most important of these graphical methods
is the histogram.

The histogram is not only a powerful graphical
technique used to summarize interval data, but
it is also used to help explain probabilities.
Example 4.1, page 85
As part of a larger study, an electricity provider
wanted to acquire information about the monthly
electricity bills of new subscribers in the first
month after signing with the company. The
company’s marketing manager conducted a
survey of 200 new residential subscribers wherein
the first month’s bills were recorded. The general
manager planned to present his findings to senior
executives. What information can be extracted
from these data?
– Collect data
– Prepare a frequency distribution
– Draw a histogram.
We have chosen nine classes defined in such a way
that each observation falls into one and only one
class.
- Number of classes: K = 1 + 3.3 log10 n ≈ 8.6 (where n = 200).
For the sake of convenience, we choose K = 9.
- Range = Largest observation – Smallest observation
= $470.5 – $59.5 = $411
- Equal width of classes = Range ÷ (# classes) = $411 ÷ 9 ≈ $45.67,
rounded up to $50
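The arithmetic behind the number of classes and the class width can be checked with a short sketch. It uses only the figures quoted in the example (n = 200, largest bill $470.5, smallest bill $59.5). Rounding K up to a whole number and rounding the width up to a multiple of 50 are assumptions chosen to match the classes used in the example, not a general rule.

```python
import math

n = 200                            # number of observations in Example 4.1
largest, smallest = 470.5, 59.5    # extreme bills quoted in the example

# Rule of thumb used above: K = 1 + 3.3 * log10(n)
k_exact = 1 + 3.3 * math.log10(n)
k = math.ceil(k_exact)             # assumption: round up to a whole number of classes

data_range = largest - smallest
raw_width = data_range / k

# Assumption: round the width up to a convenient multiple of 50,
# which reproduces the class limits 50, 100, ..., 500.
step = 50
width = math.ceil(raw_width / step) * step

print(f"K = {k_exact:.2f}, rounded up to {k} classes")
print(f"range = {data_range}, raw width = {raw_width:.2f}, chosen width = {width}")
```

Nine classes of width 50 starting at 50 cover the interval from 50 to 500, which contains both the smallest and the largest observation.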
Classes
Amounts that are less than or equal to 100 fall into class: [50, 100];
Amounts that are more than 100 but less than or equal to 150 fall into
class: (100, 150]; and so on, the classes (150, 200], (200, 250], (250,
300], (300, 350], (350, 400], (400, 450] are determined similarly;
Amounts that are more than 450 but less than or equal to 500 fall into
class: (450, 500].
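The class definitions above can be expressed as a small function. This is a minimal sketch: the function name `class_of` and its default arguments are hypothetical, chosen to mirror the nine classes of width 50 starting at 50. Each class is closed on the right, and the first class also includes its lower limit.

```python
import math

def class_of(amount, lower=50, width=50, k=9):
    """Return the (low, high) limits of the class that contains `amount`.

    Classes are right-closed, (low, high], except the first class,
    which also includes its lower limit: [50, 100].
    """
    upper = lower + k * width
    if not lower <= amount <= upper:
        raise ValueError(f"{amount} is outside [{lower}, {upper}]")
    # ceil() sends a boundary value such as 150 to the class that ends at 150
    index = max(math.ceil((amount - lower) / width) - 1, 0)
    low = lower + index * width
    return low, low + width

# 59.5 and 470.5 are the extreme bills from the example;
# 100 and 150 are boundary values used to check the class limits.
for bill in (59.5, 100, 150, 470.5):
    print(bill, "->", class_of(bill))
```

Because every amount between 50 and 500 maps to exactly one class, each observation falls into one and only one class, as required.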
Building a Histogram…
Frequency distribution of bills and histogram

Class limits     Frequency
50 up to 100          8
100 up to 150        24
150 up to 200        36
200 up to 250        60
250 up to 300        28
300 up to 350        16
350 up to 400        10
400 up to 450         8
450 up to 500        10
Total               200

[Figure: Histogram of electricity bills; vertical axis: Frequency; horizontal axis: Bin (upper class limits 100 to 500)]


Histogram & Interpretation

• Sixteen percent (32) of the bills are ‘small’, i.e. less than $150.
• Seventy percent (140) of the electricity bills are in the ‘middle range’, i.e. between $150 and $350.
• Fourteen percent (28) of the electricity bills are ‘large’, i.e. $350 or more.
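The three percentages in this interpretation follow directly from the class frequencies of Example 4.1. The sketch below recomputes them; the helper name `share` is hypothetical.

```python
# Upper class limits and frequencies from the Example 4.1 frequency distribution
upper_limits = [100, 150, 200, 250, 300, 350, 400, 450, 500]
frequencies = [8, 24, 36, 60, 28, 16, 10, 8, 10]
total = sum(frequencies)

def share(low, high):
    """Count and percentage of bills in classes whose upper limit u satisfies low < u <= high."""
    count = sum(f for u, f in zip(upper_limits, frequencies) if low < u <= high)
    return count, 100 * count / total

small = share(50, 150)     # classes up to $150
middle = share(150, 350)   # classes between $150 and $350
large = share(350, 500)    # classes above $350

for label, (count, pct) in [("small", small), ("middle", middle), ("large", large)]:
    print(f"{label:<6} {count:>3} bills  {pct:.0f}%")
```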

Stem and Leaf Display…


• Retains information about individual observations
that would normally be lost in the creation of a
histogram.
• How to draw a stem and leaf display: Split each
observation into two parts, a stem and a leaf.
• e.g. observation values: 3.8, 4.1
• There are several ways to split it up…
• We could split it at the decimal point:

Stem | Leaf
3    | 8
4    | 1
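Splitting at the decimal point can be sketched in a few lines. The function names `stem_and_leaf` and `render` are hypothetical, and the sketch assumes non-negative values recorded to one decimal place. Only 3.8 and 4.1 come from the slide; the other sample values are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Split one-decimal values at the decimal point: 3.8 -> stem 3, leaf 8."""
    stems = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # work in whole tenths to avoid float noise
        stem, leaf = divmod(tenths, 10)
        stems[stem].append(leaf)
    return dict(stems)

def render(display):
    """Format the display with one 'stem | leaves' row per stem."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(display.items())
    )

# 3.8 and 4.1 are from the slide; the remaining values are hypothetical.
sample = [3.8, 4.1, 3.2, 4.7, 4.1, 5.0]
print(render(stem_and_leaf(sample)))
```

Each row lists every leaf, so the individual observations can be read back from the display, which is the information a histogram discards.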

Continue this process for all the observations. Then,
use the ‘stems’ for the classes and each leaf becomes
part of the histogram (based on Table 4.6 data).

[Figure: stem and leaf display of the Table 4.6 data]

Thus, we still have access to our original data point’s value!

Histogram and Stem and Leaf Display…


[Figure: Histogram of House prices; vertical axis: Frequency; horizontal axis: Bins ('0 000), 2 to 6 and More]

Compare the overall shapes of the figures…


Shapes of Histograms…
Symmetry
A histogram is said to be symmetric if, when
we draw a vertical line down the center of the
histogram, the two sides are identical in shape
and size:
[Figure: three symmetric histograms; vertical axis: Frequency; horizontal axis: Variable]
Shapes of Histograms…

Skewness
A skewed histogram is one with a long tail
extending to either the right or the left:
[Figure: two histograms (Frequency against Variable), one positively skewed and one negatively skewed]
Shapes of Histograms…
Modality
A unimodal histogram is one with a single peak, while
a bimodal histogram is one with two peaks:

[Figure: a unimodal and a bimodal histogram (Frequency against Variable)]

A modal class is the class with the largest number of observations.
Shapes of Histograms…
Bell Shape
A special type of symmetric unimodal
histogram is one that is bell shaped.

Many statistical techniques require that the population be
bell shaped.

Drawing the histogram helps verify the shape of the
population in question.

[Figure: bell-shaped histogram (Frequency against Variable)]
Histogram comparison
Compare and contrast the following histograms based on
data from Example 4.3, page 97:

[Figure: two histograms, Marks (computer course) and Marks (manual course)]

The marks from the computer-based statistics course and the
manual statistics course have very different histograms. Compare:
• Modality (unimodal vs. bimodal)
• Spread of the marks (narrower vs. wider)
Relative Frequencies
• Example 4.1 (contd.): There are 8 observations in the first class
(electricity bills from $50 to $100). Thus, the relative frequency
for this class is 8 ÷ 200 (the total number of electricity bills) =
0.04 (or 4%). The relative frequencies for the remaining
classes can be calculated in the same way, as shown in the table below.
Cumulative Relative Frequencies…
Cumulative relative frequencies for Example 4.1
Classes          Relative Frequency   Cumulative Relative Frequency
50 up to 100            4.0%                    4.0%
100 up to 150          12.0%                   16.0%
150 up to 200          18.0%                   34.0%
200 up to 250          30.0%                   64.0%
250 up to 300          14.0%                   78.0%
300 up to 350           8.0%                   86.0%
350 up to 400           5.0%                   91.0%
400 up to 450           4.0%                   95.0%
450 up to 500           5.0%                  100.0%
Calculate the cumulative relative frequencies by adding the current class’ relative frequency to
the previous class’ cumulative relative frequency. (For the first class, its cumulative relative
frequency is just its relative frequency.)
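The relative and cumulative relative frequencies for Example 4.1 can be reproduced from the class frequencies. This is a minimal sketch. It accumulates the integer counts first and divides afterwards, a design choice that avoids floating-point drift in the running total.

```python
from itertools import accumulate

classes = [
    "50 up to 100", "100 up to 150", "150 up to 200",
    "200 up to 250", "250 up to 300", "300 up to 350",
    "350 up to 400", "400 up to 450", "450 up to 500",
]
frequencies = [8, 24, 36, 60, 28, 16, 10, 8, 10]   # Example 4.1 class frequencies
total = sum(frequencies)

# Relative frequency: class frequency divided by the number of observations
relative = [f / total for f in frequencies]

# Cumulative relative frequency: running total of the counts, divided by the total
cumulative = [c / total for c in accumulate(frequencies)]

for label, rel, cum in zip(classes, relative, cumulative):
    print(f"{label:<14} {rel:6.1%} {cum:7.1%}")
```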

Ogive…
The ogive can be used to answer questions like:
What electricity bill value is at the 50th percentile?

[Figure: ogive of the electricity bills]

We can estimate the electricity bill value that is at the
50th percentile as approximately $224.
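Reading a percentile off the ogive amounts to linear interpolation between two neighbouring points on the curve. The sketch below assumes the ogive is drawn as straight segments through the upper class limits and starts at 0% at the lower limit of the first class ($50). The function name `percentile_from_ogive` is hypothetical. Under these assumptions the 50th percentile comes out at about $226.67, in line with the approximate value of $224 read from the chart.

```python
# Points on the ogive: class limits and cumulative relative frequencies (%)
limits = [50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
cum_pct = [0, 4, 16, 34, 64, 78, 86, 91, 95, 100]

def percentile_from_ogive(p):
    """Bill value at the p-th percentile, by linear interpolation on the ogive."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    points = list(zip(limits, cum_pct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1:
            # Move along the segment in proportion to how far p lies between y0 and y1
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)

median_estimate = percentile_from_ogive(50)
print(f"Estimated 50th percentile: ${median_estimate:.2f}")
```

The 50th percentile lies in the class from $200 to $250 because the cumulative relative frequency rises from 34% to 64% across that class.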
4.2 Describing Time Series Data
• Observations measured at the same point in
time across individual units are called cross-
sectional data.
• Observations measured at successive points in
time on a single unit are called time-series
data.
• Time-series data are graphed on a line chart,
which plots the value of the variable on the
vertical axis against the time periods on the
horizontal axis.
• A line chart of time-series data is also known as a
time-series chart.
Time Series Data: Example 4.4, page 105
Line chart showing change in Queensland’s overseas exports and
imports over time, 1989 - 2015 (see also pages 105 - 107).
[Figure: line chart with two series, Exports and Imports; vertical axis: Total value ($m), 0 to 60,000; horizontal axis: year, 1989 to 2015]
Queensland’s exports increased slowly but steadily from 1989 to
2004. After 2004, exports increased at a much higher rate, but
with a number of peaks and falls. Queensland’s imports increased
steadily for most of the period but have been declining since 2013.
4.3 Describing the relationship
between two numerical variables
Now we will investigate the relationship
between two numerical variables using
graphical techniques.
A cross-classification table (or cross-
tabulation table) is used to describe the
relationship between two (or more) nominal
variables.
A cross-classification table lists the frequency
of each combination of the values of the two
(or more) variables…
• Often we are interested in the relationships between two numerical
variables.

Example 4.5, page 109
A real estate agent wanted to know to what extent the selling price
of a house is related to the size (number of squares) of the house.
He took a sample of 15 houses that had recently sold in a suburb and
recorded the price and the size of each. These data are listed in
Table 4.9. Draw a scatter diagram for these data, and describe the
relationship between house size and its selling price.

House size   Selling price
20.0         219
14.8         190
20.5         199
12.5         121
18.0         150
14.3         198
24.9         334
16.5         188
24.3         310
20.2         213
22.0         288
19.0         312
12.3         186
14.0         173
16.7         174
Scatter diagram
• A scatter diagram can describe the relationship
between two numerical variables, e.g. between advertising
expenditure and sales.

[Figure: Excel output, scatter diagram ‘Selling price vs House size’; horizontal axis: House size, 10 to 30; vertical axis: Selling price, 0 to 350]
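A numerical check can accompany the scatter diagram for Example 4.5. The sketch below is an added illustration rather than part of the example, which asks only for a scatter diagram. It fits a least-squares line to the 15 observations and computes the correlation coefficient, using the standard textbook formulas and the standard library only.

```python
import math

# (house size in squares, selling price) pairs from Example 4.5
houses = [
    (20.0, 219), (14.8, 190), (20.5, 199), (12.5, 121), (18.0, 150),
    (14.3, 198), (24.9, 334), (16.5, 188), (24.3, 310), (20.2, 213),
    (22.0, 288), (19.0, 312), (12.3, 186), (14.0, 173), (16.7, 174),
]

sizes = [s for s, _ in houses]
prices = [p for _, p in houses]
n = len(houses)
mean_size = sum(sizes) / n
mean_price = sum(prices) / n

# Sums of squares and cross-products around the means
sxy = sum((s - mean_size) * (p - mean_price) for s, p in houses)
sxx = sum((s - mean_size) ** 2 for s in sizes)
syy = sum((p - mean_price) ** 2 for p in prices)

slope = sxy / sxx                          # change in price per extra square
intercept = mean_price - slope * mean_size
r = sxy / math.sqrt(sxx * syy)             # correlation coefficient

print(f"price = {slope:.2f} * size + {intercept:.2f}, r = {r:.3f}")
```

A positive slope and a positive correlation indicate that larger houses tend to sell for higher prices, which is the pattern to look for in the scatter diagram.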
Scatter Diagram: Petrol prices in Australia vs
World crude oil prices (Example 4.6, page 114)
[Figure: scatter diagram of Melbourne petrol price (vertical axis, 0 to 180) against world crude oil price (horizontal axis, 0 to 160), with fitted line y = 0.8622x + 50.833, R² = 0.9414]

The scatter diagram reveals: the two prices are strongly related linearly.
- As the world crude oil price increases, petrol price also increases.
- When the price of crude oil was below A$120, the relationship between
the two variables was stronger than when the price of oil exceeded A$120.
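The fitted line reported on the chart can be used to illustrate what "strongly related linearly" means. The sketch below assumes that x is the world crude oil price and y is the Melbourne petrol price, which the chart residue does not state explicitly. The function name `predicted_petrol_price` is hypothetical.

```python
import math

SLOPE, INTERCEPT = 0.8622, 50.833   # fitted line reported on the chart
R_SQUARED = 0.9414                  # coefficient of determination reported on the chart

def predicted_petrol_price(crude_price):
    """Petrol price on the fitted line for a given crude oil price (assumed x)."""
    return SLOPE * crude_price + INTERCEPT

# The slope is positive, so the correlation is the positive square root of R-squared
r = math.sqrt(R_SQUARED)

print(f"Predicted petrol price at crude = 100: {predicted_petrol_price(100):.3f}")
print(f"Correlation coefficient: {r:.3f}")
```

An R² of 0.9414 means that about 94% of the variation in petrol prices is accounted for by the fitted line, which corresponds to a correlation of about 0.97.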
Typical patterns of Scatter Diagrams
[Figure: one example scatter diagram for each of the following patterns]
• Positive linear relationship
• No relationship
• Negative linear relationship
• Negative nonlinear relationship
• Nonlinear (concave) relationship
Review on graphical techniques to describe
nominal data and numerical data

                       Numerical data     Nominal data
Single set of data     Histogram          Frequency and relative frequency tables, bar and pie charts
Relationship between   Scatter diagram    Cross-classification table, bar charts
two variables
Summary: page 124
Home assignment:

- Section 4.1 Exercises pages 101 - 102: 4.4, 4.7
- Section 4.2 Exercises pages 107 - 108: 4.25, 4.26
- Section 4.3 Exercises pages 115 - 116: 4.37, 4.40
