
Pepito, Riezel V.

BSA 4A

Introduction to Statistics

Chapter III. Averages and Variation

Objectives for this lesson:


• Understand why center and variation are important in describing a
distribution or dataset
• Learn the different ways to describe the center of a distribution or dataset: mode,
median, mean, trimmed mean, and weighted mean
• Example: mean heights of males in the US (5’9”) versus the Netherlands (6’0”)

Mode
• Mode: for discrete data, the mode is the value that occurs most often
o Example 1: 1,1,2,2,2,3,4,5,6,6
o Example 2: -1,-1,0,0,0,1,2,3,3,4,4,4
o Example 3: 5,6,8,10,12,15,20
o Example 4: 8,8,9,9,10,10,11,11,12,12
o For continuous data, it is (are) the peak(s) of the distribution
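The four examples above can be checked with a short Python sketch. The helper name `modes` is hypothetical, and returning every tied value is an assumption: when all values occur equally often (Examples 3 and 4), many textbooks instead say the data have no mode.

```python
from collections import Counter

def modes(data):
    """Return every value that ties for the highest count, in sorted order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

example_1 = [1, 1, 2, 2, 2, 3, 4, 5, 6, 6]
example_2 = [-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]
example_3 = [5, 6, 8, 10, 12, 15, 20]
example_4 = [8, 8, 9, 9, 10, 10, 11, 11, 12, 12]

for name, data in [("Example 1", example_1), ("Example 2", example_2),
                   ("Example 3", example_3), ("Example 4", example_4)]:
    # Examples 3 and 4 return every distinct value because all counts tie.
    print(name, modes(data))
```

Example 1 has a single mode (2) and Example 2 has two modes (0 and 4).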

Advantages:
1. Easy to find
2. Not sensitive to extreme values
3. Only measure of central tendency for categorical data

Disadvantage: only uses some of the data

Mean
The average value. For discrete data:

mean = (sum of all values) / (number of values)

Sample statistic: $\bar{x} = \frac{\sum x}{n}$, Population parameter: $\mu = \frac{\sum x}{N}$

Where
• n is the sample size
• N is the population size

Example 1: 1,1,1,2,2,3,3,4,5,5
Example 2: 1,1,1,2,2,3,3,4,5,100
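A minimal sketch of the two examples, using Python's standard `statistics.mean`. The two data sets differ only in the last value, which makes the effect of a single extreme value on the mean visible.

```python
from statistics import mean

example_1 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
example_2 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]

mean_1 = mean(example_1)  # sum of 27 over 10 values
mean_2 = mean(example_2)  # sum of 122 over 10 values

print("Example 1 mean:", mean_1)
print("Example 2 mean:", mean_2)
```

Replacing one 5 with 100 moves the mean from 2.7 to 12.2, even though nine of the ten values are unchanged.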

Advantages:
1. Every data value is used
2. Reliable: means of samples from the same population do not vary much (relatively
speaking)
Disadvantage:
1. Sensitive to extreme values

Trimmed Mean

We trim k% from both “ends” of the data: remove extreme values.


Procedure:
1. Put data in order from smallest to largest
2. Calculate how many values make up k% of the data: (k/100) × n
3. Discard that number of values from both the top and the bottom of the data
4. Calculate the mean of the remaining values (the trimmed mean)

Example: Calculate a 5% trimmed mean of the data 1,1,2,3,4,4,5,5,5,5,6,6,6,6,7,7,8,9,10,18
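The procedure can be sketched in Python as follows. The function name `trimmed_mean` is hypothetical, and rounding the number of discarded values down to a whole number is an assumption, since the notes do not say how to handle a fractional count.

```python
def trimmed_mean(data, k_percent):
    """Mean after discarding k% of the values from each end of the sorted data."""
    ordered = sorted(data)               # step 1: order the data
    n = len(ordered)
    cut = int(n * k_percent / 100)       # step 2: k% of n (assumption: round down)
    kept = ordered[cut:n - cut]          # step 3: drop `cut` values from each end
    return sum(kept) / len(kept)         # step 4: mean of what remains

data = [1, 1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 10, 18]

plain = sum(data) / len(data)
trimmed = trimmed_mean(data, 5)
print("ordinary mean:", plain)
print("5% trimmed mean:", trimmed)
```

With n = 20, 5% is one value, so the smallest value (1) and the largest value (18) are dropped, and the remaining 18 values sum to 99.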

Weighted Mean

Gives more weight or importance to some values, such as grades.

Example: you want to know your grade in statistics before the final exam. You currently have
a homework grade (20%) of 92, three test grades (12% each) of 100, 85, and 96, and a
participation grade (20%) of 98.
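      Homework   Test 1   Test 2   Test 3   Participation   Total
X     92         100      85       96       98
W     20%        12%      12%      12%      20%             76%
XW    18.4       12.0     10.2     11.52    19.6            71.72

Weighted mean = (sum of XW) / (sum of W) = 71.72 / 0.76 ≈ 94.4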
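The grade example can be reproduced with a short sketch. Dividing by the sum of the weights (76%) rather than by 100% is the step that matters here, because the final exam has not been graded yet. The variable names are illustrative only.

```python
# Scores and weights (in percent) from the example: homework, three tests, participation.
scores = [92, 100, 85, 96, 98]
weights = [20, 12, 12, 12, 20]

weighted_sum = sum(x * w for x, w in zip(scores, weights))  # sum of XW, in percent points
total_weight = sum(weights)                                 # 76, not 100

weighted_mean = weighted_sum / total_weight
print("total weight (%):", total_weight)
print("weighted mean:", round(weighted_mean, 2))
```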

Range
• Range: the overall spread of the data between the minimum and maximum values

R = max - min

• Example 1: -1,-1,0,0,0,1,2,3,3,4,4,4

• Example 2: 5,6,8,10,12,15,100
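A minimal sketch of the range for both examples. The name `data_range` is hypothetical (chosen to avoid shadowing Python's built-in `range`).

```python
def data_range(data):
    """Range R = max - min."""
    return max(data) - min(data)

example_1 = [-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]
example_2 = [5, 6, 8, 10, 12, 15, 100]

print("Example 1 range:", data_range(example_1))
print("Example 2 range:", data_range(example_2))
```

In Example 2 the single value 100 determines almost the entire range, since the other six values lie between 5 and 15.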

Advantage:
o Easy to find

Disadvantage:
o Does not provide information about the shape

Standard Deviation
The standard deviation measures the variation of all values from the mean.
Advantages:
o Uses all values
o Same units as the data

Disadvantages:
o Difficult to calculate
o Sensitive to extreme values

Note: the variance is the square of the standard deviation
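The notes do not give the formula, so the sketch below assumes the usual sample definition, s = sqrt( Σ(x - x̄)² / (n - 1) ), and checks it against Python's standard `statistics.stdev`. The data are borrowed from Example 1 of the Mean section, and the function name `sample_std` is hypothetical.

```python
from math import sqrt
from statistics import stdev

def sample_std(data):
    """Sample standard deviation, assuming the n - 1 denominator."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in data)
    return sqrt(squared_deviations / (n - 1))

data = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]  # Example 1 from the Mean section

s = sample_std(data)
variance = s ** 2  # the variance is the square of the standard deviation

print("standard deviation:", round(s, 3))
print("variance:", round(variance, 3))
```

For a whole population the denominator would be N instead of n - 1 (`statistics.pstdev` in the standard library).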


Coefficient of Variation (CV)

The coefficient of variation (CV) is a measure of relative variation. We use it to compare the
variation in two or more samples or populations.
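The formula is not stated in these notes. The sketch below assumes the common definition CV = (s / x̄) × 100% for a sample and compares the two data sets from the Mean section. The function name is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV as a percentage: sample standard deviation divided by the mean, times 100."""
    return stdev(data) / mean(data) * 100

sample_a = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]    # Example 1 from the Mean section
sample_b = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]  # Example 2 from the Mean section

cv_a = coefficient_of_variation(sample_a)
cv_b = coefficient_of_variation(sample_b)
print("CV of sample A (%):", round(cv_a, 1))
print("CV of sample B (%):", round(cv_b, 1))
```

Because the CV is unit-free, it allows a comparison even when the samples have different means or are measured in different units.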

Note: It is always better to have less variation.


Chebyshev’s Theorem

• In chapter 7, we learn how to determine the precise proportion of data that lie within a
certain number of standard deviations on either side of the mean when dealing with a
bell-shaped, symmetric distribution called the normal distribution.
• What if the distribution is skewed, symmetric, or another shape?
• Answer: we can use Chebyshev’s Theorem to determine the minimum proportion of
data (or the population) that must lie within k standard deviations (k > 1) on either
side of the mean.

• Chebyshev’s Theorem applies to any distribution, as long as the mean and standard
deviation are defined (finite).
• It tells us the minimum proportion of data (or the population) that falls within k
standard deviations on either side of the mean: at least 1 - 1/k².
o For k = 3, at least 1 - 1/9 ≈ 88.9% of the data fall within 3 standard deviations of
the mean, so a maximum of 11.1% of data fall beyond 3 standard deviations of the mean.
o Such values might be suspect outliers, particularly for a mound-shaped
symmetric distribution.
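The bound 1 - 1/k² can be tabulated for a few values of k with a short sketch. The function name is hypothetical.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    inside = chebyshev_min_proportion(k)
    # `inside` is a guaranteed minimum; 1 - inside is the most that can lie beyond.
    print(f"k = {k}: at least {inside:.1%} within, at most {1 - inside:.1%} beyond")
```

These are guaranteed minimums for any distribution. A particular distribution, such as the normal distribution, usually has a much larger proportion within the same distance.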

Example 2: We want to know the minimum proportion of the population of male college
students that have heights between 5’4” and 6’2”, if the mean height is 5’9” and the
standard deviation is 2.5”.
Example 3: We want to know the central values that have a minimum of 88.9% of the
heights of college males between them, if the mean height is 5’9” and the standard deviation
is 2.5”.
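One way to work Examples 2 and 3 is sketched below, with all heights converted to inches (5’9” = 69”). Rounding k to 3 in Example 3 is an assumption, based on 88.9% being the rounded value of 1 - 1/3².

```python
from math import sqrt

mean_height = 69.0  # 5'9" in inches
sd = 2.5

# Example 2: the interval 5'4" to 6'2" is 64" to 74".
low, high = 64.0, 74.0
k2 = min(mean_height - low, high - mean_height) / sd  # standard deviations to each end
min_proportion = 1 - 1 / k2 ** 2

# Example 3: solve 1 - 1/k^2 = 0.889 for k, then build the interval.
target = 0.889
k3 = round(sqrt(1 / (1 - target)))  # assumption: 88.9% is 1 - 1/9 rounded, so k = 3
lower, upper = mean_height - k3 * sd, mean_height + k3 * sd

print("Example 2: k =", k2, "-> at least", min_proportion)
print("Example 3: k =", k3, "-> between", lower, "and", upper, "inches")
```

Under these assumptions, Example 2 gives k = 2 and a minimum of 75%, and Example 3 gives the interval 61.5” to 76.5” (5’1.5” to 6’4.5”).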

Percentiles, Quartiles, Five-Number Summary

Percentiles (Pth percentile)

The Pth percentile (1 ≤ P ≤ 99) of a distribution is a value such that P% of the data fall below
it and (100-P)% of the data fall above it.

Example 1:
If you are in the 89th percentile of math scores, what % of students have scores
a) Below yours? 89%
b) Above yours? (100-89)% = 11%

Why is there no 100th percentile?

Quartiles

There are three quartiles which split the data into four parts. The first quartile (Q1)
corresponds to the 25th percentile. The second quartile (Q2) corresponds to the 50th
percentile and is hence also known as the median. The third quartile (Q3) corresponds to the
75th percentile.

Example: Given the data set {2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49}, identify the first, second,
and third quartiles.
Solution:

Step 1: Find median → 13 is the median because there are five data points to the right and
left of 13, and thus it splits the data 50/50.

Step 2: Group the data left and right of the median → {2, 6, 8, 9, 12}, 13, {18, 20, 22, 23, 49}

Step 3: Repeat the process of step 1 for each group →

Group 1: {2, 6, 8, 9, 12} → 8 is the first quartile because there are 2 data points on either side
of 8, and thus it divides the first half of data in half.

Group 2: {18, 20, 22, 23, 49} → 22 is the third quartile because there are 2 data points on
either side of 22, and thus it divides the second half of data in half.

Final Answer: Q1 = 8, Q2 = 13, Q3 = 22

Q1 = 25th percentile
Q2 = 50th percentile → what else is this? The median
Q3 = 75th percentile

Procedure:
1. Put data in order from smallest to largest
2. Find the median (Q2)
3. Find the median of the values below (not equal to) the median – Q1
4. Find the median of the values above (not equal to) the median – Q3
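The four steps can be sketched in Python as follows. The function name `quartiles` is hypothetical, and the split is done by position, so that for an odd number of values the middle value is left out of both halves. Note that library routines such as `statistics.quantiles` use interpolation rules that can give slightly different answers.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method described above."""
    ordered = sorted(data)                # step 1: smallest to largest
    n = len(ordered)
    lower = ordered[: n // 2]             # values below the median position
    upper = ordered[(n + 1) // 2 :]       # values above the median position
    return median(lower), median(ordered), median(upper)  # steps 3, 2, 4

data = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]
q1, q2, q3 = quartiles(data)
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
```

For the example data set this reproduces Q1 = 8, Q2 = 13, Q3 = 22.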

The Five Number Summary

The Five-Number Summary = (Minimum, Q1, Median, Q3, Maximum)
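As an illustration, the five-number summary of the quartile example data set can be computed with the sketch below. The function name is hypothetical, and the quartiles are found as the medians of the lower and upper halves, excluding the middle value when the count is odd.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])         # median of the lower half
    q3 = median(ordered[(n + 1) // 2 :])   # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

data = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]
summary = five_number_summary(data)
print(summary)
```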

Box & Whisker Plots (Boxplots)


• A useful technique from exploratory data analysis for describing data

Procedure
1. Draw a horizontal scale
2. Above the scale draw a box from Q1 to Q3 (height of box can vary)
3. Draw a solid vertical line from the top to the bottom of the box at Q2
4. Draw horizontal lines (whiskers) from the left end of the box (Q1) to the minimum
(lowest) value (located vertically near the center of the box) and from the right end of the box
(Q3) to the maximum (highest) value.

Shape of Distribution
1. Symmetric: if the line for Q2 is approximately at the center of the box, the
distribution is symmetric.
2. Skewed to the left: the line is closer to Q3; the left (horizontal) or lower (vertical) side of
the box is bigger.

3. Skewed to the right: the line is closer to Q1; the right (horizontal) or upper (vertical) side
of the box is bigger.
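The box-based rule can be sketched as a small function. The name `box_shape` and the 10% tolerance used to decide what counts as "approximately at the center" are assumptions for illustration only. The sketch looks only at the box, while in practice the whisker lengths are also worth checking.

```python
def box_shape(q1, q2, q3, tolerance=0.1):
    """Classify shape from where the Q2 line sits inside the Q1-to-Q3 box."""
    left = q2 - q1    # width of the box to the left of the median line
    right = q3 - q2   # width of the box to the right of the median line
    box = q3 - q1
    # Assumption: sides differing by at most `tolerance` of the box width count as equal.
    if box == 0 or abs(left - right) <= tolerance * box:
        return "approximately symmetric"
    return "skewed left" if left > right else "skewed right"

# Quartiles from the example data set {2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49}.
shape = box_shape(8, 13, 22)
print(shape)
```

For the quartile example, the median line (13) is closer to Q1 (8) than to Q3 (22), so the right side of the box is bigger.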
