
Describing Data: Displaying and Exploring Data
Chapter 4

4-1 Copyright 2019 by McGraw-Hill Education. All rights reserved.


Measures of Position
Quartiles divide a set of observations into four equal parts. The three dividing points are denoted Q1, Q2, and Q3.

• The first quartile (Q1) separates the smallest 25% of the values from the other 75%, which are larger. The second quartile (Q2) is the median: 50% of the values are less than or equal to the median, and 50% are greater than or equal to it. The third quartile (Q3) separates the smallest 75% of the values from the largest 25%.
Measures of Position
• Deciles divide a set of observations into 10 equal parts.
• For example, if your statistics score were at the 8th decile of your class, you could conclude that about 80% of the students had a lower score than yours and about 20% had a higher score.

• Percentiles divide a set of observations into 100 equal parts.
• If your statistics score were at the 92nd percentile, then about 92% of the students had a lower score than yours and only about 8% had a higher score.
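In the examples that follow, the position of the P-th percentile in the sorted data is found with the location formula below, where n is the number of observations. When L_P is not a whole number, the percentile is interpolated between the two neighboring ordered values. The symbols x_(k) (the k-th ordered value), k, and f are introduced here only for convenience.

```latex
L_P = (n + 1)\,\frac{P}{100}, \qquad k = \lfloor L_P \rfloor, \qquad f = L_P - k

\text{P-th percentile} = x_{(k)} + f\,\bigl(x_{(k+1)} - x_{(k)}\bigr)
```

For example, with n = 10 values, L_25 = 11 × 25/100 = 2.75, so Q1 = x_(2) + 0.75(x_(3) - x_(2)).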



Example
• For the data provided below, locate the median, the first quartile, and the third quartile.

9 14 5 16 12 11 10 8 15 13 17 20 8 19 15

• First, sort the data from smallest to largest:

5 8 8 9 10 11 12 13 14 15 15 16 17 19 20



Example
• Next, find the median. Its location is L50 = (15+1)*50/100 = 8, so the median is the 8th ordered value in the data set: 13.
• The first and third quartiles are located the same way: L25 = (15+1)*25/100 = 4 and L75 = (15+1)*75/100 = 12. Therefore, Q1 is the 4th ordered value and Q3 is the 12th: Q1 = 9 and Q3 = 16.
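The location arithmetic in this example can be reproduced in a few lines of Python. This is a minimal sketch: the helper name `location` is our own, and the lookup relies on every location being a whole number, which holds here because n + 1 = 16 is divisible by 4.

```python
# Locate Q1, the median, and Q3 for the 15 values in this example.
data = [9, 14, 5, 16, 12, 11, 10, 8, 15, 13, 17, 20, 8, 19, 15]


def location(n, p):
    """1-based position of the P-th percentile: L_P = (n + 1) * P / 100."""
    return (n + 1) * p / 100


ordered = sorted(data)
n = len(ordered)
results = {}
for p in (25, 50, 75):
    loc = location(n, p)
    # With n = 15 every location is a whole number, so no interpolation is needed.
    assert loc.is_integer()
    results[p] = ordered[int(loc) - 1]  # shift the 1-based position to a 0-based index

print(results)  # {25: 9, 50: 13, 75: 16}
```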
Example
• For the data provided below, locate the first quartile.

76 34 65 42 74 22 83 59 90 44

First, sort the data from smallest to largest:

22 34 42 44 59 65 74 76 83 90

L25 = (10+1)*25/100 = 2.75, so the first quartile lies between the second and third ordered values (34 and 42). We move 0.75 of the distance from the second value to the third:
42 - 34 = 8, 0.75*8 = 6, so Q1 = 34 + 6 = 40.
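When the location is not a whole number, as here, the value is interpolated between its two neighbors. A minimal sketch of that procedure follows. `percentile` is a hypothetical helper, and its behavior when the location falls before the first or after the last value (returning the smallest or largest value) is our own assumption, since the slides do not cover that case.

```python
import math


def percentile(values, p):
    """P-th percentile using the location L_P = (n + 1) * P / 100,
    interpolating linearly between neighboring ordered values."""
    ordered = sorted(values)
    n = len(ordered)
    loc = (n + 1) * p / 100
    k = math.floor(loc)  # whole-number part: 1-based position of the lower neighbor
    frac = loc - k       # fractional part: how far to move toward the next value
    # Assumption: clamp locations that fall outside the data to the extremes.
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    lower, upper = ordered[k - 1], ordered[k]
    return lower + frac * (upper - lower)


data = [76, 34, 65, 42, 74, 22, 83, 59, 90, 44]
print(percentile(data, 25))  # 40.0
```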
Exercise
• For the data provided below, locate the first quartile and the third quartile.
91 75 61 101 43 104 87 49 71 88

43, 49, 61, 71, 75, 87, 88, 91, 101, 104

The first quartile, Q1: L25 = (10+1)*25/100 = 2.75, so Q1 lies 0.75 of the way from the 2nd value (49) to the 3rd value (61).
61 - 49 = 12, 0.75*12 = 9, 49 + 9 = 58, so Q1 = 58.

The third quartile, Q3 (the 75th percentile): L75 = (10+1)*75/100 = 8.25, so Q3 lies 0.25 of the way from the 8th value (91) to the 9th value (101).
101 - 91 = 10, 0.25*10 = 2.5, 91 + 2.5 = 93.5, so Q3 = 93.5.
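Python's standard library offers a cross-check for hand calculations like these. `statistics.quantiles` (Python 3.8 and later) with `method="exclusive"`, which is also its default, places the quartile cut points at positions i(n + 1)/4 of the sorted data and interpolates linearly; as far as we can tell from its documentation, this matches the (n + 1)P/100 location used in these slides.

```python
import statistics

# Exercise data; statistics.quantiles sorts internally, but sorting here makes it easy to read.
data = sorted([91, 75, 61, 101, 43, 104, 87, 49, 71, 88])

# n=4 asks for the three quartile cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 58.0 81.0 93.5
```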
Interquartile Range
• The interquartile range is the difference between the third quartile and the first quartile.
• It is the range of the middle 50% of the data values.

Interquartile Range = Q3 – Q1



Example
• Calculate the interquartile range for the following data.
15, 13, 6, 5, 12, 50, 22, 18
Answer:
First, arrange the data in ascending order:
5, 6, 12, 13, 15, 18, 22, 50

Q1: L25 = (8+1)*25/100 = 2.25; 12 - 6 = 6, 0.25*6 = 1.5, 6 + 1.5 = 7.5

Q3: L75 = (8+1)*75/100 = 6.75; 22 - 18 = 4, 0.75*4 = 3, 18 + 3 = 21
Interquartile range = 21 - 7.5 = 13.5
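Putting the two quartiles together, a small hypothetical helper `interquartile_range` can compute Q3 - Q1 directly. As a sketch, it relies on `statistics.quantiles` with `method="exclusive"`, which we assume matches the (n + 1)P/100 location rule used in these slides.

```python
import statistics


def interquartile_range(values):
    """IQR = Q3 - Q1, with quartiles located at (n + 1) * P / 100."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [15, 13, 6, 5, 12, 50, 22, 18]
print(interquartile_range(data))  # 13.5
```

Applied to the data 3, 9, 7, 8, 5, 21, 15, 16, 20, 11, the same helper returns 10.5.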



Exercise
• Calculate the interquartile range for the following data.
3, 9, 7, 8, 5, 21, 15, 16, 20, 11

Answer:
First, arrange the data in ascending order:
3, 5, 7, 8, 9, 11, 15, 16, 20, 21

Q1: L25 = (10+1)*25/100 = 2.75; 7 - 5 = 2, 0.75*2 = 1.5, 5 + 1.5 = 6.5

Q3: L75 = (10+1)*75/100 = 8.25; 20 - 16 = 4, 0.25*4 = 1, 16 + 1 = 17
Interquartile range = 17 - 6.5 = 10.5



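Outlier
• An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.
• A common rule treats any value below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR) as an outlier, where IQR = Q3 - Q1 is the interquartile range.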



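Example
• Check the following data set for outliers.
-10, 9, 7, 8, 5, 21, 15, 16, 20, 40
Step 1: Arrange the data in ascending order:
-10, 5, 7, 8, 9, 15, 16, 20, 21, 40

Step 2: Find the quartiles and the interquartile range.
Q1: L25 = (10+1)*25/100 = 2.75; 7 - 5 = 2, 0.75*2 = 1.5, 5 + 1.5 = 6.5
Q3: L75 = (10+1)*75/100 = 8.25; 21 - 20 = 1, 0.25*1 = 0.25, 20 + 0.25 = 20.25
Interquartile range = 20.25 - 6.5 = 13.75
Step 3: 13.75*1.5 = 20.625
Step 4: 6.5 - 20.625 = -14.125 and 20.25 + 20.625 = 40.875
Step 5: Check the data set for any data values that fall outside the interval from -14.125 to 40.875. The values -10 and 40 both fall inside this interval; hence, by the 1.5(IQR) rule, neither is considered an outlier, even though they are the most extreme values in the data set.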
Exercise
• Check the following data set for outliers.
22, 6, 50, 13, 15, 18, 5, 12
• Answer
5, 6, 12, 13, 15, 18, 22, 50
Q1 = 7.5, Q3 = 21, IQR = 21 - 7.5 = 13.5
Q1 - 1.5(IQR) = 7.5 - 1.5(13.5) = -12.75
Q3 + 1.5(IQR) = 21 + 1.5(13.5) = 41.25

• Check the data set for any data values that fall outside the interval from -12.75 to 41.25. The value 50 is outside this interval; hence, it can be considered an outlier.
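The fence calculation can be sketched as follows. The helpers `outlier_fences` and `find_outliers` are hypothetical names; `k = 1.5` is the multiplier used in these slides, and the quartiles come from `statistics.quantiles` with `method="exclusive"`, which we assume matches the (n + 1)P/100 location rule.

```python
import statistics


def outlier_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """List the values that fall strictly outside the fences."""
    lower, upper = outlier_fences(values, k)
    return [v for v in values if v < lower or v > upper]


data = [22, 6, 50, 13, 15, 18, 5, 12]
print(outlier_fences(data))  # (-12.75, 41.25)
print(find_outliers(data))   # [50]
```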
Box Plots
• A box plot is a graphical display, based on quartiles, that shows the general shape of a variable’s distribution.

• A box plot is based on five statistics:
   - Minimum value
   - 1st quartile
   - Median
   - 3rd quartile
   - Maximum value
• The interquartile range is Q3 – Q1.
Example
• Domino’s Pizza offers free delivery of its pizza within 15 kilometers. Using a sample of 20 deliveries, Domino’s determined the following:
   - Minimum value = 13 minutes
   - Q1 = 15 minutes
   - Median = 18 minutes
   - Q3 = 22 minutes
   - Maximum value = 30 minutes
• Develop a box plot for the delivery times.



Box Plot Example Continued
• Begin by drawing a number line using an appropriate scale.
• Next, draw a box that begins at Q1 (15 minutes) and ends at Q3 (22 minutes).
• Draw a vertical line at the median (18 minutes).
• Extend a horizontal line out from Q3 to the maximum value (30 minutes) and out from Q1 to the minimum value (13 minutes).
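The drawing steps above can be imitated in plain text. The sketch below uses a hypothetical helper, `ascii_box_plot`, that maps each of the five statistics to a character column of a number line (here one column per minute from 10 to 30) and then draws the whiskers, the box from Q1 to Q3, and the median line. It is only an illustration, not a substitute for a plotted graph.

```python
def ascii_box_plot(minimum, q1, median, q3, maximum, lo, hi, width=21):
    """Draw a one-line text box plot on a number line running from lo to hi."""

    def col(v):
        # Map a value to a character column on the number line.
        return round((v - lo) * (width - 1) / (hi - lo))

    chars = [" "] * width
    for c in range(col(minimum), col(q1)):          # left whisker
        chars[c] = "-"
    for c in range(col(q3) + 1, col(maximum) + 1):  # right whisker
        chars[c] = "-"
    chars[col(minimum)] = "|"  # whisker ends at the minimum ...
    chars[col(maximum)] = "|"  # ... and at the maximum
    chars[col(q1)] = "["       # the box begins at Q1 ...
    chars[col(q3)] = "]"       # ... and ends at Q3
    chars[col(median)] = "|"   # vertical line at the median
    return "".join(chars)


# Domino's delivery times in minutes, one column per minute from 10 to 30.
print("10        20        30")
print(ascii_box_plot(13, 15, 18, 22, 30, lo=10, hi=30))
```

In the output, the right whisker (22 to 30 minutes) is visibly longer than the left one (13 to 15 minutes).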



Exercise
• The following box plot shows the assets, in millions of dollars, of credit unions in Taipei City, Taiwan.

• What are the smallest and largest values, the first and third quartiles, and the median? Would you agree that the distribution is symmetrical?
• Estimate the interquartile range.



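Answer
• The smallest value is 10 and the largest is 85; the first quartile is 25 and the third quartile is 60.
• About 50% of the values are between 25 and 60, so the interquartile range is about 60 - 25 = 35.
• The median value is 40.
• The distribution is positively skewed, not symmetrical: the upper whisker (60 to 85) is longer than the lower whisker (10 to 25), and the median is closer to Q1 than to Q3.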



Exercise
• In a study of the gasoline mileage of model year 2017 automobiles, the mean was 28 miles per gallon and the median was 27. The smallest value in the study was 13 miles per gallon, and the largest was 50. The first and third quartiles were 18 and 35 miles per gallon, respectively.
• Develop a box plot and comment on the distribution. Is it a symmetric distribution?



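Solution
• The distribution is somewhat positively skewed. Note that the dashed line (whisker) above 35 is longer than the one below 18: it runs from 35 to 50, while the lower one runs only from 13 to 18. The mean (28) is also slightly larger than the median (27), which is consistent with positive skewness.

(Figure: box plot of gasoline mileage, in miles per gallon, on a scale from 10 to 50)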



Common Shapes of Data

Skewness
• The coefficient of skewness is a measure of the symmetry of a distribution.
• There are two formulas for the coefficient of skewness: Pearson’s method and the software method.

• Pearson’s coefficient of skewness can range from -3 to +3.
   - A value near -3 indicates considerable negative skewness.
   - A value such as 1.63 indicates moderate positive skewness.
   - A value of 0 means the distribution is symmetrical.
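For reference, the standard forms of the two methods named in the examples below are the following, where x̄ is the sample mean, s is the sample standard deviation, and n is the number of observations. In both cases a negative value indicates negative skewness and a positive value indicates positive skewness.

```latex
\text{Pearson: } sk = \frac{3(\bar{x} - \text{Median})}{s}

\text{Software: } sk = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x - \bar{x}}{s} \right)^{3}
```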
Example
• Following are the earnings per share for a sample of 15 software companies for the year 2020, arranged from smallest to largest.

a) Find the mean, median, and standard deviation.
b) Compute the coefficient of skewness using Pearson’s method.
c) Compute the coefficient of skewness using the software method.
d) What is your conclusion regarding the skewness of the data?



Exercise
For the sample of five values listed below:

73, 98, 60, 92, 84

a) Find the mean, median, and standard deviation.
b) Compute the coefficient of skewness using Pearson’s method.
c) Compute the coefficient of skewness using the software method.
d) What is your conclusion regarding the skewness of the data?



Answer
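The requested quantities can be checked with a short Python sketch that applies the Pearson and software skewness formulas directly, using the standard library’s `statistics` module. The helper name `skewness_summary` is our own. Note that `statistics.stdev` returns the sample standard deviation (divisor n - 1), which is the s used in both formulas.

```python
import statistics


def skewness_summary(values):
    """Mean, median, sample standard deviation, and both skewness coefficients."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    pearson = 3 * (mean - median) / s
    software = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)
    return mean, median, s, pearson, software


mean, median, s, pearson, software = skewness_summary([73, 98, 60, 92, 84])
print(f"mean={mean}, median={median}, s={s:.2f}")           # mean=81.4, median=84, s=15.19
print(f"Pearson sk={pearson:.2f}, software sk={software:.2f}")  # Pearson sk=-0.51, software sk=-0.55
```

Both coefficients are negative (about -0.5), which points to some negative skewness: the mean (81.4) lies below the median (84).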

Describing the Relationship Between Two Variables
• When we study a single variable, we refer to the data as univariate.
• When we study the relationship between two variables, we refer to the data as bivariate.
• Would it be reasonable to conclude that the more expensive vehicles are purchased by older buyers?
• Is there a relationship between the profit earned on a vehicle sale and the age of the purchaser?
• Do tall fathers tend to have tall children?
• One graphical technique we use to show the relationship between two variables is called a scatter diagram.



Scatter Diagram
• To draw a scatter diagram, we need two variables.
• Step 1: Scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis).
• Step 2: Plot each point on the graph.
• Step 3: Determine the type of relationship (if any) that exists between the variables.

• Both variables should be measured on an interval or ratio scale.



Types of Relationships Depicted by Scatter Diagrams

How Do We Describe the Relationship in a Scatter Diagram?

• If the points run from the lower left to the upper right, the variables are directly or positively related (for example, temperature and ice cream sales).
• If the points run from the upper left to the lower right, the variables are inversely or negatively related.
• If there is no meaningful pattern, there is no apparent relationship between the two variables.



Example
Construct a scatter diagram for the data obtained in a
study on the number of absences and the final grades of
seven randomly selected students from a statistics class.
The data are shown below.



Answer
• Step 1: Draw and label the x and y axes.
• Step 2: Plot each point on the graph, as shown in the figure below.

• In this example, there appears to be a negative relationship between the number of absences and the final grade: students with more absences tend to have lower final grades.
Exercise
Consider the advertising/sales relationship for an
electronics store in Taipei City. On 10 occasions during
the past three months, the store used weekend television
commercials to promote sales at its stores.
The managers want to investigate whether a relationship
exists between the number of commercials shown and
sales at the store during the following week.



Exercise



Answer

• The scatter diagram indicates a positive relationship between the number of commercials and sales. Higher sales are associated with a higher number of commercials.
Contingency Table: An Example
• There are four dealerships in the Applewood Auto Group. Suppose we want to compare the profit earned on each vehicle sold by each dealership. In other words, is there a relationship between the amount of profit earned and the dealership? The table below is the cross-tabulation of the raw data for the two variables.



Example

From the contingency table, we observe the following:

1. From the Total column on the right, 90 of the 180 cars sold had a profit above the median and the other half had a profit below it. This is expected from the definition of the median.
2. For the Kane dealership, 25 of the 52 cars sold, or 48 percent, were sold for a profit above the median.
3. The percentages of cars sold for a profit above the median at the other dealerships are 50 percent for Olean, 42 percent for Sheffield, and 60 percent for Tionesta.
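The percentages in these observations are row proportions of the contingency table. The sketch below, built around a hypothetical helper `percent_above_median`, reproduces the two that can be computed from counts stated in the text (Kane: 25 of 52; all dealerships: 90 of 180). The counts for the other dealerships appear only in the table, so they are not computed here.

```python
def percent_above_median(above, total):
    """Share of a row's vehicles sold for a profit above the median, in percent."""
    return 100 * above / total


print(f"Kane: {percent_above_median(25, 52):.0f}%")              # Kane: 48%
print(f"All dealerships: {percent_above_median(90, 180):.0f}%")  # All dealerships: 50%
```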
Exercise



