
4-1
STATISTICS for the Utterly Confused, 2nd ed.

SLIDES PREPARED By
Lloyd R. Jaisingh Ph.D.
Morehead State University
Morehead KY
4-2
Chapter 4

Data Description – Numerical


Measures of Position for
Ungrouped Univariate Data
4-3 Outline
• Do I Need to Read This Chapter?
• 4-1 The z-Score or Standard Score
• 4-2 Percentiles
• It’s a Wrap
4-4 Objectives
• Introduction of some basic statistical measurements of position.
• Introduction of some graphical displays to explain these measures of position.
4-5 Introduction
• A measure of location or position for a collection of data values is a number
that is meant to convey the idea of the relative position of a data value in
the data set.
• The most commonly used measures of location for sample data are the
z-score and percentiles.
4-6 4-1 The z-Score
• Explanation of the term – z-score: The
z-score for a sample value in a data set
is obtained by subtracting the mean of
the data set from the value and dividing
the result by the standard deviation of
the data set.
• NOTE: When computing the value of
the z-score, the data values can be
population values or sample values.
• Hence we can compute either a
population z-score or a sample z-score.
4-7 4-1 The z-Score
• The Sample z-score for a value x is
given by the following formula:

$$ z\text{-score} = \frac{x - \bar{x}}{s} $$

• Where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
4-8 4-1 The z-Score
• The Population z-score for a value x is
given by the following formula:

$$ z\text{-score} = \frac{x - \mu}{\sigma} $$

• Where $\mu$ is the population mean and $\sigma$ is the population standard deviation.
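
The two formulas differ only in which mean and standard deviation are used. The sketch below is a minimal illustration using Python's standard `statistics` module. The function names `sample_z` and `population_z` and the four data values are assumptions made up for this illustration; they do not come from the slides.

```python
from statistics import mean, pstdev, stdev


def sample_z(x, data):
    """z-score of x using the sample standard deviation s (divisor n - 1)."""
    return (x - mean(data)) / stdev(data)


def population_z(x, data):
    """z-score of x using the population standard deviation sigma (divisor n)."""
    return (x - mean(data)) / pstdev(data)


# Made-up illustrative values, not taken from the slides.
values = [2, 4, 6, 8]

print(round(sample_z(8, values), 4))      # 1.1619
print(round(population_z(8, values), 4))  # 1.3416
```

For the same data, s is never smaller than σ, so the sample z-score is never farther from 0 than the population z-score.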
4-9 Quick Tip:

• The z-score is the number of standard deviations the data value falls
above (positive z-score) or below (negative z-score) the mean for the data set.
4-10 Quick Tip:

• The z-score is affected by an outlying value in the data set, since
the outlier (very small or very large value relative to the size of the
other values in the data set) directly affects the value of the
mean and the standard deviation.
4-11
The z-Score -- Example

• Example: What is the z-score for the value of 14 in the
following sample values?

3 8 6 14 4 12 7 10
4-12
The z-score -- Example (Continued)
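• Solution: The sample mean is 8 and the sample standard deviation is
s ≈ 3.8173, so z-score = (14 – 8)/3.8173 ≈ 1.57.
• Thus, the data value of 14 is 1.57 standard deviations above the mean of 8,
since the z-score is positive.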
4-13 The z-Score – Why do we use the z-
score as a measure of relative position?

Dot Plot of the data points with the location of the mean and the
data value of 14.
4-14 The z-score
• Observe that the distance between
the mean of 8 and the value of 14 is
1.57s = 5.99 ≈ 6.
• Observe that if we add the mean of 8
to this value of 6, we will get 8 + 6 =
14, the data value.
• Thus, this shows that the value of 14
is 1.57 standard deviations above the
mean value of 8.
4-15
The z-score

• That is, the z-score gives us an idea of how far away the data
value is from the mean, and so it gives us an idea of the position of
the data value relative to the mean.
4-16
The z-Score -- Example

• Example: What is the z-score for the value of 95 in the following
sample values?

96 114 100 97 101 102 99 95 90
4-17 The z-Score -- Example (Continued)

• Solution: First compute the sample mean and sample standard deviation.
These values are, respectively, 99.3333 and 6.5955. Verify.
• Thus, z-score =
(95 – 99.3333)/6.5955 = -0.6570 ≈ -0.66.
• Thus, the data value of 95 is located 0.66 standard deviations below the
mean value of 99.3333, since the z-score is negative.
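
The "Verify" step can be carried out with a few lines of Python. This is only a sketch using the standard `statistics` module; the helper name `sample_z` and the variable names are assumptions for illustration. It recomputes the mean, the sample standard deviation, and the z-scores for both worked examples.

```python
from statistics import mean, stdev


def sample_z(x, data):
    """Sample z-score: (x - sample mean) / sample standard deviation."""
    return (x - mean(data)) / stdev(data)


first = [3, 8, 6, 14, 4, 12, 7, 10]                 # first example
second = [96, 114, 100, 97, 101, 102, 99, 95, 90]   # second example

print(round(mean(second), 4), round(stdev(second), 4))  # 99.3333 6.5955
print(round(sample_z(14, first), 2))                    # 1.57
print(round(sample_z(95, second), 2))                   # -0.66
```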
4-18 4-2 Percentiles
• Explanation of the term – percentiles:
Percentiles are numerical values that
divide an ordered data set into 100
groups of values with at most 1% of
the data values in each group.
• When we discuss percentiles, we
generally present the discussion
through the kth percentile.
• Let the kth percentile be denoted by Pk.
4-19 4-2 Percentiles
• Explanation of the term – kth
percentile: the kth percentile for an
ordered array of numerical data is a
numerical value Pk (say) such that at
most k% of the data values are
smaller than Pk, and at most
(100 – k)% of the data values are
larger than Pk.
• The idea of the kth percentile is
illustrated on the next slide.
4-20 The kth Percentile

Illustration of the kth percentile.
4-21 Quick Tip:

• In order for a percentile to be determined, the data set first must
be ordered from the smallest to the largest value.
• There are 99 percentiles in a data
set.
4-22 Display of the 99th Percentile

Illustration of the 99th percentile.
4-23 Percentile Corresponding to a Given
Data Value

• The percentile corresponding to a given data value, say x, in a set is obtained
by using the following formula.

$$ \text{Percentile} = \frac{(\text{Number of values below } x) + 0.5}{\text{Number of values in data set}} \times 100\% $$
4-24 Percentile Corresponding to a Given
Data Value

• Example: The shoe sizes, in whole numbers, for a sample of 12 male
students in a statistics class were as follows: 13, 11, 10, 13, 11, 10, 8,
12, 9, 9, 8, and 9.
• What is the percentile rank for a shoe size of 12?
4-25 Percentile Corresponding to a Given
Data Value

• Solution: First, we need to arrange the values from smallest to largest.
• The ordered array is given below: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
• Observe that the number of values below the value of 12 is 9.
4-26 Percentile Corresponding to a Given
Data Value

• Solution (continued): The total number of values in the data set is 12.
• Thus, using the formula, the corresponding percentile is:
Percentile = (9 + 0.5)/12 × 100% ≈ 79.17%.
• The value of 12 corresponds to approximately the 79th percentile.
4-27 Percentile Corresponding to a Given
Data Value
• Example: In the previous example, what is the percentile rank for a shoe
size of 10?
• Recall, the ordered array was: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
• Observe that the number of values below the value of 10 is 5.
4-28 Percentile Corresponding to a Given
Data Value

• Solution (continued): Recall, the total number of values in the data set was 12.
• Thus, using the formula, the corresponding percentile is:
Percentile = (5 + 0.5)/12 × 100% ≈ 45.83%.
• The value of 10 corresponds to approximately the 46th percentile.
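
Both shoe-size results can be reproduced with a short function. This is a minimal sketch of the percentile formula used above (number of values below x, plus 0.5, divided by the number of values, times 100). The function name `percentile_rank` is an assumption for illustration.

```python
def percentile_rank(x, data):
    """Percentile of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for value in data if value < x)
    return (below + 0.5) / len(data) * 100


shoe_sizes = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]

print(round(percentile_rank(12, shoe_sizes)))  # 79
print(round(percentile_rank(10, shoe_sizes)))  # 46
```

Counting the values below x does not require sorting, although ordering the data first makes the count easy to check by hand.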
4-29 Procedure for Finding a Data Value
for a Given Percentile

• Assume that we want to determine what data value falls at some general
percentile Pk.
• The following steps will enable you to find a general percentile Pk for a
data set.
• Step 1: Order the data set from smallest to largest.
• Step 2: Compute the position c of the percentile. To compute the value of c,
use the following formula:

$$ c = \frac{n \times k}{100} $$

where n is the number of values in the data set and k is the required percentile.
4-30 Procedure for Finding a Data Value
for a Given Percentile

• Step 3.1: If c is not a whole number, round up to the next whole number.
• Locate this position in the ordered set.
• The value in this location is the required percentile.
4-31 Procedure for Finding a Data Value
for a Given Percentile

• Step 3.2: If c is a whole number, find the average of the values in the c and
c+1 positions in the ordered set.
• This average value will be the required percentile.
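
Steps 1 through 3.2 can be sketched as a small function. This is only an illustration of the procedure in these slides, with the position computed as c = (n × k)/100. The function name `percentile_value` is an assumption, and the sketch assumes k is a whole number from 1 to 99. Statistical software often uses other interpolation rules and may return slightly different values. The two data sets are the ones used in the worked examples that follow.

```python
def percentile_value(data, k):
    """Return Pk using the slide procedure (k assumed to be an integer, 1..99)."""
    ordered = sorted(data)                        # Step 1: order the data
    whole, remainder = divmod(len(ordered) * k, 100)  # Step 2: c = n * k / 100
    if remainder:                                 # Step 3.1: c is not whole
        # Rounding c up gives position whole + 1, i.e. 0-based index `whole`.
        return ordered[whole]
    # Step 3.2: c is whole, so average the values in positions c and c + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17,
          17, 20, 19, 22, 15, 15, 15, 15]
scores = [6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14]

print(percentile_value(medals, 65))  # 27   (c = 12.35, rounded up to 13)
print(percentile_value(scores, 25))  # 10.0 (c = 4, average of 4th and 5th)
```

Using integer arithmetic (`divmod`) to test whether c is whole avoids floating-point rounding surprises.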
4-32 Percentile Corresponding to a Given
Data Value
• Example: The data given below represent the 19 countries with the largest
numbers of total Olympic medals – excluding the United States, which had
101 medals – for the 1996 Atlanta games. Find the 65th percentile for the
data set.
• 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15.
4-33 Percentile Corresponding to a Given
Data Value
• Solution: First, we need to arrange the data set in order. The ordered
set is:
• 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63, 65.
• Next, compute the position of the percentile.
• Here n = 19, k = 65.
• Thus, c = (19 × 65)/100 = 12.35.
• We need to round up to a value of 13.
4-34 Percentile Corresponding to a Given
Data Value

• Solution (continued): Thus, the 13th value in the ordered data set will
correspond to the 65th percentile.
• That is, P65 = 27.
• Question: Why does a percentile measure relative position?
4-35 Question: Why does a percentile
measure Relative Position?

Display of the 65th Percentile along with the data values.
4-36 Question: Why does a percentile measure Relative Position?
• Referring to the diagram on the previous page, observe that the value of 27
is such that at most 65% of the data values are smaller than 27 and at most
35% of the values are larger than 27.
• This shows that the percentile value of 27 is a measure of location.
• Thus, the percentile gives us an idea of the relative position of a value in
an ordered data set.
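
The "at most 65% below, at most 35% above" claim can be checked by direct counting. The sketch below is only a verification aid; the variable names are assumptions for illustration, and the data are the 19 medal counts from the example.

```python
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17,
          17, 20, 19, 22, 15, 15, 15, 15]
p65 = 27  # the 65th percentile found in the example

n = len(medals)
below = sum(1 for value in medals if value < p65)  # values smaller than 27
above = sum(1 for value in medals if value > p65)  # values larger than 27

print(f"{below} of {n} below ({below / n:.1%}), "
      f"{above} of {n} above ({above / n:.1%})")
# 12 of 19 below (63.2%), 6 of 19 above (31.6%)
```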
4-37 Percentile Corresponding to a Given
Data Value
• Example: Find the 25th percentile for the following data set:
• 6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14.
• Solution: First, we need to arrange the data set in order. The ordered set is:
• 2, 6, 8, 10, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 16, 18.
4-38 Percentile Corresponding to a Given
Data Value
• Solution (continued):
• Next, compute the position of the
percentile.
• Here n = 16, k = 25.
• Thus, c = (16 × 25)/100 = 4.0.
• Thus, the 25th percentile will be the average of the values located at the
4th and 5th positions in the ordered set.
• Thus, P25 = (10 + 10)/2 = 10.
4-39 Special Percentiles – Deciles and
Quartiles
• Deciles and quartiles are special percentiles.
• Deciles divide an ordered data set into 10 equal parts.
• Quartiles divide the ordered data set into 4 equal parts.
• We usually denote the deciles by D1, D2, D3, … , D9.
• We usually denote the quartiles by Q1, Q2, and Q3.
4-40 Deciles

• Nine deciles.
• At most 10% of the values are in each group.
4-41 Quartiles

• Three quartiles.
• At most 25% of the values are in each group.
4-42 Quick Tip:

• There are 9 deciles and 3 quartiles.


• Q1 = first quartile = P25
• Q2 = second quartile = P50
• Q3 = third quartile = P75
• D1 = first decile = P10
• D2 = second decile = P20 . . .
• D9 = ninth decile = P90
4-43 Quick Tip:

• P50 = D5 = Q2 = median
• i.e. the 50th percentile, the 5th
decile, and the 2nd quartile, and the
median are all equal to one another.
• Finding deciles and quartiles is equivalent to finding the
equivalent percentiles.
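
Because deciles and quartiles are just named percentiles, they can be computed by delegating to a percentile routine. The sketch below assumes the percentile procedure from these slides (c = (n × k)/100, round up if c is not whole, otherwise average positions c and c + 1). The function names `percentile_value`, `quartile`, and `decile` are assumptions for illustration, and the data are the 16 values from the 25th-percentile example.

```python
from statistics import median


def percentile_value(data, k):
    """Pk by the slide procedure (k assumed to be an integer from 1 to 99)."""
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * k, 100)
    if remainder:                       # c not whole: round up
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2  # c whole: average


def quartile(data, i):
    """Qi = P(25 * i), for i = 1, 2, 3."""
    return percentile_value(data, 25 * i)


def decile(data, i):
    """Di = P(10 * i), for i = 1, ..., 9."""
    return percentile_value(data, 10 * i)


scores = [6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14]

print(quartile(scores, 1), quartile(scores, 2), quartile(scores, 3))
# 10.0 11.5 13.0
print(decile(scores, 5) == quartile(scores, 2) == median(scores))  # True
```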
4-44
OUTLIERS

• Recall that an outlier is an extremely small or extremely large
data value when compared with the rest of the data values.
• The following procedure allows us to check whether a data value can be
considered as an outlier.
4-45
Procedure to Check for OUTLIERS

• The following steps will allow us to check whether a given value in a data
set can be classified as an outlier.
• Step 1: Arrange the data in order from smallest to largest.
• Step 2: Determine the first quartile Q1 and the third quartile Q3. (Recall
Q1 = P25 and Q3 = P75.)
4-46
Procedure to Check for OUTLIERS

• Step 3: Find the interquartile range (IQR). IQR = Q3 – Q1.
• Step 4: Compute (Q1 – 1.5IQR) and (Q3 + 1.5IQR).
4-47
Procedure to Check for OUTLIERS

• Step 5: Let x be the data value that is being checked to determine whether
it is an outlier.
• (a) If the value of x is smaller than (Q1 – 1.5IQR), then x is classified as an
outlier.
• (b) If the value of x is larger than (Q3 + 1.5IQR), then x is classified as an
outlier.
4-48
Procedure to Check for OUTLIERS
4-49
• Example: The data below represent the 20
countries with the largest number of total
Olympic medals, including the United States,
which had 101 medals for the 1996 Atlanta
games. Determine whether the number of
medals won by the United States is an outlier
relative to the numbers for the other
countries.
• The data is given on the next slide.
4-50
• Example (continued): Data values – 63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
17, 17, 20, 19, 22, 15, 15, 15, 15, 101.
• Solution: First, we need to arrange the data set in order. The ordered set
is – 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63,
65, 101.
• Next we need to determine the first and third quartiles.
• Verify that Q1 = P25 = 17 and Q3 = P75 = 39.
4-51
• Example (continued): Thus the IQR = 39 – 17
= 22.
• Now, Q1 – 1.5IQR = 17 – (1.5 × 22) = -16.
• and, Q3 + 1.5IQR = 39 + (1.5 × 22) = 72.
• Since 101 > 72, the value of 101 is an
outlier relative to the rest of the values in
the data set (based on the procedure
presented here).
• That is, the number of medals won by the
United States is an outlier relative to the
numbers won by the other 19 countries for
the 1996 Atlanta Olympic Games.
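
The five-step outlier check can be sketched in code and applied to the medal data. This is a minimal illustration, not a definitive implementation: it assumes the quartiles are found with the percentile procedure from these slides, and the names `percentile_value`, `outlier_limits`, and `is_outlier` are assumptions for illustration.

```python
def percentile_value(data, k):
    """Pk by the slide procedure (k assumed to be an integer from 1 to 99)."""
    ordered = sorted(data)                            # Step 1
    whole, remainder = divmod(len(ordered) * k, 100)
    if remainder:
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2


def outlier_limits(data):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    q1 = percentile_value(data, 25)                   # Step 2
    q3 = percentile_value(data, 75)
    iqr = q3 - q1                                     # Step 3
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr             # Step 4


def is_outlier(x, data):
    """Step 5: x is an outlier if it falls outside the two limits."""
    lower, upper = outlier_limits(data)
    return x < lower or x > upper


medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17,
          20, 19, 22, 15, 15, 15, 15, 101]

print(outlier_limits(medals))   # (-16.0, 72.0)
print(is_outlier(101, medals))  # True
print(is_outlier(65, medals))   # False
```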
4-52 Pictorial Representation for the OUTLIER of the Number of Olympic
Medals Won by the United States in 1996 Atlanta Games.

[Figure: the values -16 and +72 are marked, with 101 labeled OUTLIER.]
4-53 BOX PLOTS

• Explanation of the term – box plot: A box plot is a graphical
display that involves a five-number summary of a distribution
of values, consisting of the minimum value, the first quartile,
the median, the third quartile, and the maximum value.
4-54 BOX PLOTS

• A horizontal box plot is constructed by drawing a box between the
quartiles Q1 and Q3.
• Horizontal lines are then drawn from the middle of the sides of the box to
the minimum and maximum values.
4-55 BOX PLOTS

• These horizontal lines are called whiskers.
• A vertical line inside the box marks
the median.
• Outliers are usually indicated by a dot
or an asterisk.
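
The five numbers behind a box plot can be computed before anything is drawn. The sketch below is an illustration only: it assumes the quartiles and median come from the percentile procedure in these slides, and the function names `percentile_value` and `five_number_summary` are assumptions. It uses the 20 medal counts from the outlier example.

```python
def percentile_value(data, k):
    """Pk by the slide procedure (k assumed to be an integer from 1 to 99)."""
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * k, 100)
    if remainder:
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    return (min(data),
            percentile_value(data, 25),
            percentile_value(data, 50),
            percentile_value(data, 75),
            max(data))


medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17,
          20, 19, 22, 15, 15, 15, 15, 101]

print(five_number_summary(medals))  # (15, 17.0, 22.5, 39.0, 101)
```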
4-56 Example of a Box Plot for the Olympic (1996) Medal Count Data
4-57 Information That Can Be Obtained From
a Box Plot
4-58 Information That Can Be Obtained From
a Box Plot – Looking at the Median

• If the median is close to the center of the box, the distribution of the
data values will be approximately symmetrical.
• If the median is to the left of the center of the box, the distribution of
the data values will be positively skewed.
• If the median is to the right of the center of the box, the distribution of
the data values will be negatively skewed.
4-59 Information That Can Be Obtained From a Box
Plot – Looking at the Length of the Whiskers

• If the whiskers are approximately the same length, the distribution of the
data values will be approximately symmetrical.
• If the right whisker is longer than the left whisker, the distribution of
the data values will be positively skewed.
• If the left whisker is longer than the right whisker, the distribution of
the data values will be negatively skewed.
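
The two reading rules (median position and whisker length) can be written as simple checks on a five-number summary. This is a rough sketch under stated assumptions: the slides do not say how close counts as "close" or "approximately the same", so the 10% tolerance `tol` is an arbitrary choice, and the function names are hypothetical. The input values are the five-number summary of the 20 Olympic medal counts, with the median 22.5 computed by the percentile procedure in these slides.

```python
def skew_from_median(q1, med, q3, tol=0.1):
    """Classify shape from where the median sits inside the box."""
    if q3 == q1:                      # degenerate box: nothing to compare
        return "approximately symmetrical"
    center = (q1 + q3) / 2
    offset = (med - center) / (q3 - q1)   # offset as a fraction of box width
    if abs(offset) <= tol:                # tol is an assumed threshold
        return "approximately symmetrical"
    # Median left of center means positive skew; right means negative skew.
    return "positively skewed" if offset < 0 else "negatively skewed"


def skew_from_whiskers(minimum, q1, q3, maximum, tol=0.1):
    """Classify shape from the relative lengths of the two whiskers."""
    left, right = q1 - minimum, maximum - q3
    longest = max(left, right)
    if longest == 0 or abs(right - left) / longest <= tol:
        return "approximately symmetrical"
    return "positively skewed" if right > left else "negatively skewed"


minimum, q1, med, q3, maximum = 15, 17, 22.5, 39, 101

print(skew_from_median(q1, med, q3))               # positively skewed
print(skew_from_whiskers(minimum, q1, q3, maximum))  # positively skewed
```

The two rules need not agree on every data set; here both point to positive skewness.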
4-60
Box Plot Displaying Positive Skewness
4-61 Box Plot Displaying a Symmetrical
Distribution
4-62 Box Plot Displaying a Negative Skewness
