Introductory of Statistics - Chapter 3

Pepito, Riezel V.
BSA 4A
Introductory of Statistics
(Introduction of Statistics)
Chapter III. Averages and Variation
Objectives for this lesson:

 Understand why center and variation are important in describing a
distribution or dataset
 Grasp the different ways to describe the center of a distribution or dataset: mode,
median, mean, trimmed mean, and weighted mean
 Example: mean heights of males in US (5’9”) versus Netherlands (6’0”)
Mode
 Mode: for discrete data, the mode is the value that occurs most often
o Example 1: 1,1,2,2,2,3,4,5,6,6
o Example 2: -1,-1,0,0,0,1,2,3,3,4,4,4
o Example 3: 5,6,8,10,12,15,20
o Example 4: 8,8,9,9,10,10,11,11,12,12
o For continuous data, it is (are) the peak(s) of the distribution
Advantages:
1. Easy to find
2. Not sensitive to extreme values
3. Only measure of central tendency for categorical data
Disadvantage: only uses some of the data
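The four mode examples above can be checked with a short Python sketch. It uses `statistics.multimode` from the standard library, which returns every value tied for the highest count. Reading "all values tied" as "no single mode" is an interpretation, not something these notes state.

```python
from statistics import multimode

# Data sets copied from Examples 1-4 above
examples = {
    "Example 1": [1, 1, 2, 2, 2, 3, 4, 5, 6, 6],
    "Example 2": [-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4],
    "Example 3": [5, 6, 8, 10, 12, 15, 20],
    "Example 4": [8, 8, 9, 9, 10, 10, 11, 11, 12, 12],
}

# multimode returns a list of the most frequent value(s)
modes = {name: multimode(values) for name, values in examples.items()}

for name, found in modes.items():
    print(name, "->", found)
```

Example 1 has one mode (2) and Example 2 has two (0 and 4). In Examples 3 and 4 every value occurs equally often, so the function returns all of them.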
Mean
The average value. For discrete data:
mean = (sum of all values) / (number of values)
Sample statistic: $\bar{x} = \frac{\sum x}{n}$. Population parameter: $\mu = \frac{\sum x}{N}$.
Where
 n is the sample size
 N is the population size
Example 1: 1,1,1,2,2,3,3,4,5,5
Example 2: 1,1,1,2,2,3,3,4,5,100
Advantages:
1. Every data value is used
2. Reliable: means of samples from the same population do not vary much (relatively
speaking)
Disadvantage:
1. Sensitive to extreme values
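As a rough check of Examples 1 and 2, the sketch below computes both means with Python's standard `statistics.mean`. The two data sets differ only in the last value (5 versus 100), which illustrates the sensitivity to extreme values.

```python
from statistics import mean

example_1 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]    # sum = 27
example_2 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]  # sum = 122

mean_1 = mean(example_1)  # 27 / 10
mean_2 = mean(example_2)  # 122 / 10

print(mean_1, mean_2)
```

Changing one value out of ten moves the mean from 2.7 to 12.2.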
Trimmed Mean
We trim k% from both “ends” of the data: remove extreme values.

Procedure:
1. Put data in order from smallest to largest
2. Calculate how many values make up k% of the data: (k/100) × n
3. Discard that many values from both the top and the bottom of the data
4. Calculate the mean of the remaining values, the trimmed mean $\bar{x}_t$
Example: Calculate a 5% trimmed mean 1,1,2,3,4,4,5,5,5,5,6,6,6,6,7,7,8,9,10,18
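The procedure can be sketched in Python as follows. The function name `trimmed_mean` is made up for illustration. Truncating (k/100) × n down to a whole number when it is not an integer is an assumption, since the notes do not say how to round.

```python
def trimmed_mean(values, k_percent):
    """Mean after dropping k% of the values from each end of the sorted data."""
    data = sorted(values)                    # step 1: order the data
    cut = int(len(data) * k_percent / 100)   # step 2: count per end (truncated, an assumption)
    kept = data[cut:len(data) - cut]         # step 3: discard from bottom and top
    return sum(kept) / len(kept)             # step 4: mean of what remains

data = [1, 1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 10, 18]
result = trimmed_mean(data, 5)
print(result)
```

With n = 20 and k = 5, one value is removed from each end (the lowest 1 and the 18). The remaining 18 values sum to 99, so the trimmed mean is 5.5, compared with an untrimmed mean of 5.9.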
Weighted Mean
Gives more weight or importance to some values: like grades
Example: you want to know your grade in statistics before the final exam. You currently have
a homework grade (20%) of 92, three test grades (12% each) of 100, 85, and 96, and a
participation grade (20%) of 98.
     Homework   Test 1   Test 2   Test 3   Participation   Total
X    92         100      85       96       98
W    20%        12%      12%      12%      20%             76%
XW   18.4       12.0     10.2     11.52    19.6            71.72
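A small sketch of the calculation, assuming the usual weighted-mean formula (sum of X times W, divided by the sum of W). The notes do not show the formula, so this is an assumption. The weights add up to only 76% because the final exam has not been taken, so the division by the total weight matters here.

```python
# Grades and weights from the example: homework, three tests, participation
grades = [92, 100, 85, 96, 98]
weights = [0.20, 0.12, 0.12, 0.12, 0.20]

weighted_sum = sum(x * w for x, w in zip(grades, weights))  # the XW total
total_weight = sum(weights)                                 # the W total
weighted_mean = weighted_sum / total_weight

print(round(weighted_sum, 2), round(total_weight, 2), round(weighted_mean, 2))
```

Under that formula the current grade is 71.72 / 0.76, or about 94.4.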
Range
 Range: the overall spread of the data between the minimum and maximum values
R = max – min
 Example 1: -1,-1,0,0,0,1,2,3,3,4,4,4
 Example 2: 5,6,8,10,12,15,100
Advantage:
o Easy to find
Disadvantage:
o Does not provide information about the shape
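The two range examples can be worked out with a one-line helper. The name `data_range` is made up for illustration.

```python
def data_range(values):
    """Range R = max - min."""
    return max(values) - min(values)

example_1 = [-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]
example_2 = [5, 6, 8, 10, 12, 15, 100]

range_1 = data_range(example_1)  # 4 - (-1)
range_2 = data_range(example_2)  # 100 - 5

print(range_1, range_2)
```

Example 1 gives R = 5 and Example 2 gives R = 95. In Example 2 almost all of the range comes from the single value 100.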
Standard Deviation
The standard deviation measures the variation of all values from the mean.
Advantages:
o Uses all values
o Same units as the data
Disadvantages:
o Difficult to calculate
o Sensitive to extreme values
Note: the variance is the square of the standard deviation
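The notes do not show the formula, so the sketch below assumes the usual definitions: the sample standard deviation divides the sum of squared deviations by n − 1, and the population standard deviation divides by N. The data is Example 1 from the Mean section. The function names are made up for illustration.

```python
import math
import statistics

data = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]

def sample_sd(values):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )"""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

def population_sd(values):
    """sigma = sqrt( sum((x - mu)^2) / N )"""
    big_n = len(values)
    mu = sum(values) / big_n
    squared_deviations = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_deviations / big_n)

s = sample_sd(data)
sigma = population_sd(data)
variance = s ** 2  # the variance is the square of the standard deviation

print(round(s, 3), round(sigma, 3), round(variance, 3))
```

In practice the standard library gives the same results through `statistics.stdev` (sample) and `statistics.pstdev` (population), which avoids the hand calculation.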

Coefficient of Variation (CV)
The coefficient of variation (CV) is a measure of relative variation. We use it to compare the
variation in two or more samples or populations.
Note: It is always better to have less variation.
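A minimal sketch, assuming the common sample definition CV = (s / x̄) × 100%, which the notes do not spell out. It reuses the two data sets from the Mean section and assumes the mean is not zero.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV in percent: standard deviation relative to the mean."""
    return stdev(values) / mean(values) * 100

example_1 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
example_2 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]

cv_1 = coefficient_of_variation(example_1)
cv_2 = coefficient_of_variation(example_2)

print(round(cv_1, 1), round(cv_2, 1))
```

Because the CV has no units, it allows the two samples to be compared even though their means are very different.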

Chebyshev’s Theorem
 In chapter 7, we learn how to determine the precise proportion of data that lie within a
certain number of standard deviations on either side of the mean, when dealing with a
bell-shaped/symmetrical distribution called the normal distribution.
 What if the distribution is skewed, symmetric, or another shape?
 Answer: we can use Chebyshev’s Theorem to determine the minimum proportion of
data (or the population) that must lie within k standard deviations on either side of
the mean, for any k greater than 1.
 Chebyshev’s Theorem applies to any distribution as long as the mean and standard
deviation are defined (finite).
 It tells us that at least $1 - \frac{1}{k^2}$ of the data (or the population) falls within k
standard deviations on either side of the mean.
o For k = 3, at least 1 − 1/9 ≈ 88.9% of the data fall within 3 standard deviations
of the mean. This implies that a maximum of 11.1% of data fall beyond 3 standard
deviations of the mean.
o Such values might be suspect outliers, particularly for a mound-shaped
symmetric distribution.
Example 2: We want to know the minimum proportion of the population of male college
students that have heights between 5’4” and 6’2” tall, if the mean height is 5’9” and the
standard deviation is 2.5”.
Example 3: We want to know the central values that have a minimum of 88.9% of the
heights of college males between them if the mean height is 5’9”, and the standard deviation
is 2.5”.
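Examples 2 and 3 can be worked through with the bound 1 − 1/k². The sketch below converts heights to inches. The helper name `chebyshev_min_proportion` is made up for illustration.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

mean_height = 5 * 12 + 9  # 5'9" = 69 inches
sd_height = 2.5           # inches

# Example 2: 5'4" (64 in) to 6'2" (74 in) is 5 inches on either side of the mean
k_example_2 = (74 - mean_height) / sd_height          # 5 / 2.5 = 2
p_example_2 = chebyshev_min_proportion(k_example_2)   # 1 - 1/4

# Example 3: 88.9% is the bound for k = 3, so take 3 standard deviations each way
k_example_3 = 3
low = mean_height - k_example_3 * sd_height
high = mean_height + k_example_3 * sd_height

print(p_example_2, low, high)
```

So at least 75% of heights lie between 5’4” and 6’2”, and at least 88.9% lie between 61.5” and 76.5” (5’1.5” and 6’4.5”).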
Percentiles, Quartiles, Five-Number Summary
Percentiles (pth percentile)
The Pth percentile (1 ≤ P ≤ 99) of a distribution is a value such that P% of the data fall below
it and (100-P)% of the data fall above it.
Example 1:
If you are in the 89th percentile of math scores, what % of students have scores
a) Below yours? 89%
b) Above yours? (100-89)% = 11%
Why is there no 100th percentile?
Quartiles
There are three quartiles which split the data into four parts. The first quartile (Q1)
corresponds to the 25th percentile. The second quartile (Q2) corresponds to the 50th
percentile and is hence also known as the median. The third quartile (Q3) corresponds to the
75th percentile.
Example: Given the data set {2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49}. Identify the first, second,
and third quartiles.
Solution:
Step 1: Find median → 13 is the median because there are five data points to the right and
left of 13, and thus it splits the data 50/50.
Step 2: Group the data left and right of the median → {2, 6, 8, 9, 12} 13 {18, 20, 22, 23, 49}
Step 3: Repeat the process of step 1 for each group →
Group 1: {2, 6, 8, 9, 12} → 8 is the first quartile because there are 2 data points on either side
of 8, and thus it divides the first half of data in half.
Group 2: {18, 20, 22, 23, 49} → 22 is the third quartile because there are 2 data points on
either side of 22, and thus it divides the second half of data in half.
Final Answer: Q1 = 8, Q2 = 13, Q3 = 22
Q1 = 25th percentile
Q2 = 50th percentile (also known as the median)
Q3 = 75th percentile
Procedure:
1. Put data in order from smallest to largest
2. Find the median (Q2)
3. Find the median of the values below (not equal to) the median – Q1
4. Find the median of the values above (not equal to) the median – Q3
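The median-of-halves procedure can be sketched as below. The function name `quartiles` is made up for illustration. The sketch reads "not equal to the median" as leaving the middle element out of both halves when n is odd. Other textbooks and software use slightly different conventions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of each half of the sorted data."""
    data = sorted(values)          # step 1: order the data
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    # For odd n the middle element is excluded from both halves
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

data = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

For the worked example this reproduces Q1 = 8, Q2 = 13, Q3 = 22.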
The Five Number Summary
The Five-Number Summary = (Minimum, Q1, Median, Q3, Maximum)
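As an illustration, the five-number summary of the quartile example data can be assembled with the standard library. `statistics.quantiles` interpolates between data points, so it will not always match the median-of-halves procedure. On this data set its default "exclusive" method gives the same quartiles.

```python
from statistics import quantiles

data = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = quantiles(data, n=4)

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```

The result is (2, 8, 13, 22, 49), with the quartiles returned as floats.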
Box & Whisker Plots (Boxplots)

 A useful technique from exploratory data analysis for describing data
Procedure
1. Draw a horizontal scale
2. Above the scale draw a box from Q1 to Q3 (height of box can vary)
3. Draw a solid vertical line from the top to the bottom of the box at Q2
4. Draw horizontal lines (whiskers) from the left end of the box (Q1) to the minimum
(lowest) value (located vertically near the center of the box) and from the right end of the box
(Q3) to the maximum (highest) value.
Shape of Distribution
1. Symmetric: if the line for Q2 is approximately at the center of the box, the
distribution is symmetric.
2. Skewed to the left: the line is closer to Q3; the left (horizontal) or lower (vertical) side of
the box is bigger.
3. Skewed to the right: the line is closer to Q1; the right (horizontal) or upper (vertical) side of
the box is bigger.
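The shape rule can be expressed as a comparison of the two parts of the box. It is read here as: a bigger left side (Q1 to Q2) means skewed left, and a bigger right side (Q2 to Q3) means skewed right. The function name `boxplot_shape` and the `tolerance` parameter, which stands in for "approximately at the center", are made up for illustration.

```python
def boxplot_shape(q1, q2, q3, tolerance=0.1):
    """Classify shape from where the median line sits inside the box."""
    left = q2 - q1    # width of the left (lower) part of the box
    right = q3 - q2   # width of the right (upper) part of the box
    box = q3 - q1
    # Treat the line as centered if the two parts differ by at most
    # tolerance * box width (an arbitrary illustrative threshold)
    if box == 0 or abs(left - right) <= tolerance * box:
        return "symmetric"
    return "skewed left" if left > right else "skewed right"

shape = boxplot_shape(8, 13, 22)  # quartiles from the worked example
print(shape)
```

For the quartile example (Q1 = 8, Q2 = 13, Q3 = 22) the right part of the box is wider, which is consistent with the large value 49 pulling the distribution to the right.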
