BSA 4A
Introduction to Statistics
Mode
Mode: for discrete data, the mode is the value that occurs most often
o Example 1: 1,1,2,2,2,3,4,5,6,6
o Example 2: -1,-1,0,0,0,1,2,3,3,4,4,4
o Example 3: 5,6,8,10,12,15,20
o Example 4: 8,8,9,9,10,10,11,11,12,12
o For continuous data, the mode is the peak (or peaks) of the distribution
Advantages:
1. Easy to find
2. Not sensitive to extreme values
3. Only measure of central tendency for categorical data
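A minimal Python sketch that finds the mode(s) of the four discrete examples above. It assumes the common textbook convention that a data set in which every value occurs equally often has no mode; the function name `modes` is illustrative only.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s) in sorted order; [] means no mode."""
    counts = Counter(values)
    if len(set(counts.values())) == 1:
        # Every value occurs equally often (assumed convention: no mode).
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


examples = {
    "Example 1": [1, 1, 2, 2, 2, 3, 4, 5, 6, 6],
    "Example 2": [-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4],
    "Example 3": [5, 6, 8, 10, 12, 15, 20],
    "Example 4": [8, 8, 9, 9, 10, 10, 11, 11, 12, 12],
}
for name, data in examples.items():
    print(name, modes(data))
# Example 1 [2]
# Example 2 [0, 4]
# Example 3 []
# Example 4 []
```

Example 2 is bimodal (0 and 4 each occur three times), while Examples 3 and 4 have no single value that occurs more often than the others.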
Mean
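The average value. For discrete data, the mean is the sum of all values divided by the number of values:
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Population mean: $\mu = \frac{\sum x}{N}$
Where
n is the sample size
N is the population size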
Example 1: 1,1,1,2,2,3,3,4,5,5
Example 2: 1,1,1,2,2,3,3,4,5,100
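A short sketch applying the mean formula to both examples. Changing a single value from 5 to 100 moves the mean from 2.7 to 12.2, which previews the disadvantage listed below. The function name `mean` is illustrative.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    return sum(values) / len(values)


example_1 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
example_2 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]

print(mean(example_1))  # 2.7
print(mean(example_2))  # 12.2
```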
Advantages:
1. Every data value is used
2. Reliable: means of samples from the same population do not vary much (relatively
speaking)
Disadvantage:
1. Sensitive to extreme values
Trimmed Mean
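The notes give only the heading here. One common definition, assumed for this sketch: sort the data, remove the same percentage of values from each end, and average what remains. The 10% trim and rounding the number of removed values down are illustrative choices, and `trimmed_mean` is a hypothetical helper name. Applied to Example 2 of the mean, the trim removes the extreme value 100.

```python
import math


def trimmed_mean(values, percent):
    """Mean after cutting `percent` % of the sorted values from each end."""
    data = sorted(values)
    cut = math.floor(len(data) * percent / 100)  # values removed per end (rounded down)
    if 2 * cut >= len(data):
        raise ValueError("trimming would remove every value")
    kept = data[cut:len(data) - cut]
    return sum(kept) / len(kept)


example_2 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]
print(trimmed_mean(example_2, 10))  # 2.625: drops 1 and 100, averages the middle 8
```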
Weighted Mean
Example: You want to know your grade in statistics before the final exam. You currently have
a homework (20%) grade of 92, three test grades (12% each) of 100, 85, and 96, and a
participation grade (20%) of 98.
Component     Homework  Test 1  Test 2  Test 3  Participation  Total
Weight (w)    20%       12%     12%     12%     20%            76%
Grade (x)     92        100     85      96      98
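A sketch of the weighted mean $\bar{x}_w = \frac{\sum w x}{\sum w}$ for this example. Dividing by the total weight of the components graded so far (76%) gives the current average before the final exam; the notes do not state the final exam's weight, so that reading is an assumption. The function name is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Homework, Test 1, Test 2, Test 3, Participation (weights in percent)
grades = [92, 100, 85, 96, 98]
weights = [20, 12, 12, 12, 20]

print(round(weighted_mean(grades, weights), 2))  # 94.37 (= 7172 / 76)
```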
Range
Range: the overall spread of the data between the minimum and maximum values
R= max – min
Example 1: -1,-1,0,0,0,1,2,3,3,4,4,4
Example 2: 5,6,8,10,12,15,100
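A one-line sketch of R = max − min applied to both examples. Example 2 shows how a single extreme value (100) inflates the range. The name `data_range` is illustrative and avoids shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range R = max - min."""
    return max(values) - min(values)


print(data_range([-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]))  # 5
print(data_range([5, 6, 8, 10, 12, 15, 100]))                # 95
```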
Advantage:
o Easy to find
Disadvantage:
o Does not provide information about the shape of the distribution
Standard Deviation
The standard deviation measures the variation of all values from the mean.
Advantages:
o Uses all values
o Same units as the data
Disadvantages:
o Difficult to calculate
o Sensitive to extreme values
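The notes do not state the formula; this sketch assumes the standard definitions, the sample standard deviation $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ and the population standard deviation $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$. It applies them to Example 1 of the range and cross-checks against Python's `statistics` module. The helper names are illustrative.

```python
import math
import statistics


def sample_sd(values):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )"""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def population_sd(values):
    """sigma = sqrt( sum((x - mu)^2) / N )"""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)


data = [-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]
# Cross-check the hand-written formulas against Python's statistics module.
print(sample_sd(data), statistics.stdev(data))
print(population_sd(data), statistics.pstdev(data))
```

Both functions keep the data's units because the square root undoes the squaring of the deviations.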
The coefficient of variation (CV) is a measure of relative variation. We use it to compare the
variation in two or more samples or populations.
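A sketch assuming the usual sample definition $CV = \frac{s}{\bar{x}} \times 100\%$ (the notes do not state it). Because the CV is a ratio, it is only meaningful for data with a positive mean; the sketch rejects other inputs. The two samples reuse the mean examples, and the function name is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV in percent: s / x-bar * 100 (assumes a positive mean)."""
    x_bar = statistics.mean(values)
    if x_bar <= 0:
        raise ValueError("CV is only meaningful here for data with a positive mean")
    return statistics.stdev(values) / x_bar * 100


sample_a = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
sample_b = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]
for name, data in (("A", sample_a), ("B", sample_b)):
    print(name, round(coefficient_of_variation(data), 1), "%")
```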
In Chapter 7, we will learn how to determine the precise proportion of data that lie within a
certain number of standard deviations on either side of the mean for a bell-shaped, symmetric
distribution called the normal distribution.
What if the distribution is skewed, symmetric but not bell-shaped, or some other shape?
Answer: we can use Chebyshev’s Theorem to determine the minimum proportion of the
data (or the population) that must lie within k standard deviations on either side of the
mean, for any k greater than 1.
Chebyshev’s Theorem applies to any distribution, as long as the mean and standard
deviation are defined (finite).
It tells us the minimum proportion of the data (or the population) that falls within k
standard deviations on either side of the mean:
$$1 - \frac{1}{k^2}, \quad k > 1$$
o This implies that at most 11.1% of the data fall more than 3 standard deviations from
the mean, since at least $1 - 1/3^2 = 8/9 \approx 88.9\%$ fall within 3 standard deviations.
o Such values might be suspect outliers, particularly for a mound-shaped
symmetric distribution.
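A sketch of Chebyshev's bound $1 - 1/k^2$ for k = 2 and k = 3, together with the complementary maximum share lying farther out. Clamping at 0 for k ≤ 1 reflects that the bound gives no information there. The function name is illustrative.

```python
def chebyshev_min_proportion(k):
    """Lower bound on the share of data within k standard deviations of the mean."""
    # 1 - 1/k^2 is negative for k < 1, so the guaranteed share is then just 0.
    return max(0.0, 1 - 1 / k ** 2)


for k in (2, 3):
    within = chebyshev_min_proportion(k)
    print(f"k={k}: at least {within:.1%} within, at most {1 - within:.1%} beyond")
# k=2: at least 75.0% within, at most 25.0% beyond
# k=3: at least 88.9% within, at most 11.1% beyond
```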
Example 2: We want to know the minimum proportion of the population of male college
students that have heights between 5’4” and 6’2” tall, if the mean height is 5’9” and the
standard deviation is 2.5”.
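A worked sketch of Example 2: convert the heights to inches, check that the interval is centered on the mean, find k, and apply $1 - 1/k^2$. The helper `to_inches` is hypothetical.

```python
def to_inches(feet, inches):
    """Convert a height given as feet and inches to inches."""
    return 12 * feet + inches


mean = to_inches(5, 9)  # 69 inches
sd = 2.5                # inches
low, high = to_inches(5, 4), to_inches(6, 2)  # 64 and 74 inches

# Chebyshev's bound needs an interval centred on the mean: check both sides.
k_low, k_high = (mean - low) / sd, (high - mean) / sd
if k_low != k_high:
    raise ValueError("interval is not symmetric about the mean")

k = k_high
print(k, 1 - 1 / k ** 2)  # 2.0 0.75 -> at least 75% of the heights
```

So at least 75% of male college students have heights between 5’4” and 6’2”.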
Example 3: We want to find the two values, centered on the mean, between which at least 88.9%
of the heights of male college students lie, if the mean height is 5’9” and the standard deviation
is 2.5”.
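A sketch of Example 3, which runs Chebyshev's Theorem in reverse: solve $1 - 1/k^2 = p$ for k, then take the mean ± k standard deviations. It assumes 88.9% is the rounded value of 8/9, so k = 3; the rounding step only removes floating-point noise.

```python
import math

mean, sd = 69.0, 2.5  # 5'9" and 2.5", in inches
p = 8 / 9             # 88.9% is 1 - 1/9 rounded

# Solve 1 - 1/k^2 = p for k; rounding removes floating-point noise (k is 3).
k = round(1 / math.sqrt(1 - p), 6)
low, high = mean - k * sd, mean + k * sd
print(k, low, high)  # 3.0 61.5 76.5

for h in (low, high):
    feet, inches = divmod(h, 12)
    print(f"{int(feet)}'{inches}\"")
# 5'1.5"
# 6'4.5"
```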
The Pth percentile (1 ≤ P ≤ 99) of a distribution is a value such that P% of the data fall below
it and (100 − P)% of the data fall above it.
Example 1:
If your math score is at the 89th percentile, what percentage of students have scores
a) Below yours? 89%
b) Above yours? (100-89)% = 11%
Quartiles
There are three quartiles which split the data into four parts. The first quartile (Q1)
corresponds to the 25th percentile. The second quartile (Q2) corresponds to the 50th
percentile and is hence also known as the median. The third quartile (Q3) corresponds to the
75th percentile.
Example: Given the data set {2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49}. Identify the first, second,
and third quartiles.
Solution:
Step 1: Find median → 13 is the median because there are five data points to the right and
left of 13, and thus it splits the data 50/50.
Step 2: Group the data left and right of the median → {2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49}
Group 1: {2, 6, 8, 9, 12} → 8 is the first quartile because there are 2 data points on either side
of 8, and thus it divides the first half of data in half.
Group 2: {18, 20, 22, 23, 49} → 22 is the third quartile because there are 2 data points on
either side of 22, and thus it divides the second half of data in half.
Q1 = 25th percentile
Q2 = 50th percentile (also known as the median)
Q3 = 75th percentile
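A sketch of the median-of-halves method used in the example: Q2 is the median, and Q1 and Q3 are the medians of the values to the left and right of it (the median itself is excluded when n is odd). For even n the sketch assumes the data split into two equal halves. Other conventions exist, so software such as `statistics.quantiles` may give slightly different quartiles on some data sets. The function names are illustrative.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """(Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values left of the median
    upper = data[(n + 1) // 2:]   # values right of the median
    return median(lower), median(data), median(upper)


print(quartiles([2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]))  # (8, 13, 22)
```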
Box Plot Procedure
1. Draw a horizontal scale
2. Above the scale draw a box from Q1 to Q3 (height of box can vary)
3. Draw a solid vertical line from the top to the bottom of the box at Q2
4. Draw horizontal lines (whiskers), located vertically near the center of the box, from the left
end of the box (Q1) to the minimum (lowest) value and from the right end of the box (Q3) to the
maximum (highest) value.
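A rough sketch of the procedure: compute the five values the plot needs (minimum, Q1, Q2, Q3, maximum), then draw a one-line text box plot on a horizontal scale. The character set ('-' whiskers, '=' box, '[' Q1, '|' Q2, ']' Q3) and the 50-character width are hypothetical choices, and quartiles follow the median-of-halves method from the quartile example.

```python
def median(xs):
    """Median of an already-sorted list."""
    m = len(xs) // 2
    return xs[m] if len(xs) % 2 else (xs[m - 1] + xs[m]) / 2


def five_number_summary(values):
    """(min, Q1, Q2, Q3, max), with Q1 and Q3 as medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    return data[0], median(data[: n // 2]), median(data), median(data[(n + 1) // 2:]), data[-1]


def text_box_plot(values, width=50):
    """Render a rough one-line box plot on a horizontal character scale."""
    lo, q1, q2, q3, hi = five_number_summary(values)
    if hi == lo:
        raise ValueError("all values are equal; nothing to draw")

    def pos(v):
        # Map a data value onto a character column of the horizontal scale.
        return round((v - lo) * (width - 1) / (hi - lo))

    row = [" "] * width
    for i in range(pos(lo), pos(hi) + 1):
        row[i] = "-"                   # whiskers span min..max
    for i in range(pos(q1), pos(q3) + 1):
        row[i] = "="                   # box spans Q1..Q3
    row[pos(lo)] = row[pos(hi)] = "|"  # whisker ends at min and max
    row[pos(q1)], row[pos(q3)] = "[", "]"
    row[pos(q2)] = "|"                 # median line inside the box
    return "".join(row)


data = [2, 6, 8, 9, 12, 13, 18, 20, 22, 23, 49]
print(five_number_summary(data))  # (2, 8, 13, 22, 49)
print(text_box_plot(data))
```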
Shape of Distribution
1. Symmetric: if the line for Q2 is approximately at the center of the box, the distribution is
symmetric.
2. Skewed to the left: the line is closer to Q3, so the left (horizontal) or lower (vertical) side of
the box is bigger.
3. Skewed to the right: the line is closer to Q1, so the right (horizontal) or upper (vertical) side
of the box is bigger.
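A sketch of these rules: compare the two halves of the box on either side of the Q2 line. How close counts as "approximately at the center" is not defined in the notes, so the `tolerance` parameter (10% of the box width) is a hypothetical choice.

```python
def box_shape(q1, q2, q3, tolerance=0.1):
    """Classify shape from where the Q2 line sits inside the box.

    `tolerance` (hypothetical default) is the fraction of the box width
    within which the two halves count as approximately equal.
    """
    left, right = q2 - q1, q3 - q2
    if abs(left - right) <= tolerance * (q3 - q1):
        return "approximately symmetric"
    # Larger left half means Q2 is closer to Q3 (skewed left), and vice versa.
    return "skewed left" if left > right else "skewed right"


print(box_shape(8, 13, 22))  # skewed right
```

With the quartiles from the earlier example (8, 13, 22), the Q2 line is closer to Q1, so the right side of the box is bigger.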