
Chapter 4: Statistics

Measures of Central Tendency

The Arithmetic Mean
The mean of n numbers is the sum of the numbers divided by n.
\[ \text{Mean} = \frac{\sum x}{n} \]

The Median
The median of a ranked list of n numbers is:
• the middle number if n is odd.
• the mean of the two middle numbers if n is even.

The Mode
The mode of a list of numbers is the number that occurs most frequently.

The Weighted Mean
The weighted mean of the n numbers $x_1, x_2, x_3, \ldots, x_n$ with the respective assigned weights $w_1, w_2, w_3, \ldots, w_n$ is
\[ \text{Weighted mean} = \frac{\sum (x \cdot w)}{\sum w} \]
where $\sum (x \cdot w)$ is the sum of the products formed by multiplying each number by its assigned weight, and $\sum w$ is the sum of all the weights.

Illustration:
Scores on MMW first long exam    Frequency
 7                               2
 9                               4
10                               6
11                               6
14                               8
15                               7
17                               4
18                               3
19                               2
20                               1

Measures of Dispersion

Consider the table below.

                             Machine A    Machine B
                             9.52         7.95
                             5.85         8.03
                             8.15         8.02
                             10.07        8.01
                             6.41         7.99
Mean                         8            8
Median                       8.15         8.01
Sample standard deviation    1.8552897    0.0316228
Sample variance              3.4421       0.001

The Range
The range of a set of data values is the difference between the greatest data value and the least data value.

The Standard Deviation
If $x_1, x_2, x_3, \ldots, x_n$ is a population of n numbers with mean $\mu$, then the standard deviation of the population is
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} \]
If $x_1, x_2, x_3, \ldots, x_n$ is a sample of n numbers with mean $\bar{x}$, then the standard deviation of the sample is
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

Procedure for Computing a Standard Deviation
1. Determine the mean of the n numbers.
2. For each number, calculate the deviation (difference) between the number and the mean of the numbers.
3. Calculate the square of each deviation and find the sum of the squared deviations.
4. If the data is a population, then divide the sum by n. If the data is a sample, then divide the sum by n - 1.
5. Find the square root of the quotient in Step 4.

Illustration:
Find the standard deviation of the sample scores in the MMW first long exam. The sample scores are: 7, 9, 11, 15, 19.
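The weighted-mean formula and the five-step standard deviation procedure can be sketched in Python as follows. This is a minimal illustration, not part of the original handout: the function names are hypothetical, and treating the frequencies in the exam-score table as the weights is an assumption about what that illustration intends.

```python
import math

def weighted_mean(values, weights):
    """Sum of the value-times-weight products divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

def standard_deviation(data, sample=True):
    """Five-step procedure: divide by n - 1 for a sample, by n for a population."""
    n = len(data)
    mean = sum(data) / n                           # Step 1: mean of the n numbers
    deviations = [x - mean for x in data]          # Step 2: deviation of each number
    sum_of_squares = sum(d ** 2 for d in deviations)  # Step 3: sum of squared deviations
    quotient = sum_of_squares / (n - 1 if sample else n)  # Step 4
    return math.sqrt(quotient)                     # Step 5: square root of the quotient

# Exam-score frequency table (assumption: frequencies serve as the weights).
scores = [7, 9, 10, 11, 14, 15, 17, 18, 19, 20]
frequencies = [2, 4, 6, 6, 8, 7, 4, 3, 2, 1]
exam_mean = weighted_mean(scores, frequencies)

# Sample scores from the standard deviation illustration.
sample_scores = [7, 9, 11, 15, 19]
s = standard_deviation(sample_scores, sample=True)

print(round(exam_mean, 2), round(s, 2))
```

Passing `sample=False` switches the divisor in Step 4 from n - 1 to n, which is the only difference between the sample and population formulas.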
Challenge:
A consumer testing agency has tested the lifespans of 3 lightbulbs. Which of the 3 lightbulbs is the most consistent?

Lightbulb    Lifespan (in hours)
X            127  148  164   97  128  109  137
Y            121  138  131  134  139  112  135
Z            141  151  114  108  149  125  122

Measures of Relative Position

z-Score or Standard Score
The z-score for a given data value $x$ is the number of standard deviations that $x$ is above or below the mean of the data.
\[ \text{Population: } z_x = \frac{x - \mu}{\sigma} \qquad \text{Sample: } z_x = \frac{x - \bar{x}}{s} \]

Percentiles
A value $x$ is called the $p$th percentile of a data set provided $p\%$ of the data values are less than $x$.

Illustration:
The median annual salary for a police officer is PhP 44,528.00. If the 25th percentile for the annual salary is PhP 32,761.00, find the percent of the police force whose annual salaries were
a. less than PhP 44,528.00
b. more than PhP 32,761.00
c. between PhP 32,761.00 and PhP 44,528.00.

Percentile for a Given Data Value
Given a set of data and a data value $x$,
\[ \text{Percentile of score } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \cdot 100 \]

Illustration:
In the MMW examination given to 450 students, Angela’s score of 57 was higher than the scores of 360 of the students who took the examination. What is the percentile for Angela’s score?

Box-and-Whisker Plots
A box-and-whisker plot (also known as a box plot) is often used to provide a visual summary of a given set of data. It shows the median, the first and the third quartiles, and the minimum and maximum values of a data set.

[Figure: a box-and-whisker plot with the minimum, Q1, median (Q2), Q3, maximum, box, and whisker labeled]

Construction of a Box-and-Whisker Plot
1. Draw a horizontal scale that extends from the minimum data value to the maximum data value.
2. Above the scale, draw a rectangle (box) with its left side at Q1 and its right side at Q3.
3. Draw a vertical line segment across the rectangle at the median, Q2.
4. Draw a horizontal line segment, called a whisker, that extends from Q1 to the minimum and another whisker that extends from Q3 to the maximum.

Illustration:
Construct a box-and-whisker plot for the following data.
Number of Rooms Occupied in a Hotel during an 18-day period
86 77 58 45 94 96 83 76 75
65 68 72 78 85 87 92 55 61

Normal Distributions

Download Time    Number of      Percent of
(in seconds)     Subscribers    Subscribers
 0 - 5             6             0.6
 5 - 10           17             1.7
10 - 15           43             4.3
15 - 20           92             9.2
20 - 25          151            15.1
25 - 30          192            19.2
30 - 35          190            19.0
35 - 40          149            14.9
40 - 45           90             9.0
45 - 50           45             4.5
50 - 55           15             1.5
55 - 60           10             1.0
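The percent column of the download-time table is each class count divided by the total number of subscribers, times 100. The Python sketch below rebuilds that column from the counts and adds a helper for the percent of data in a run of whole classes. The names `relative_frequencies` and `percent_in_range` are hypothetical, chosen only for this illustration.

```python
# Class intervals (in seconds) and subscriber counts copied from the table.
classes = [(low, low + 5) for low in range(0, 60, 5)]
counts = [6, 17, 43, 92, 151, 192, 190, 149, 90, 45, 15, 10]

def relative_frequencies(counts):
    """Convert raw class counts to percents of the total."""
    total = sum(counts)
    return [100 * c / total for c in counts]

def percent_in_range(classes, percents, low, high):
    """Add the percents of every class that lies entirely within [low, high)."""
    return sum(p for (lo, hi), p in zip(classes, percents)
               if lo >= low and hi <= high)

percents = relative_frequencies(counts)

# Print the rebuilt table, one class per line.
for (lo, hi), count, pct in zip(classes, counts, percents):
    print(f"{lo:>2} - {hi:<2}  {count:>4}  {pct:5.1f}")

# Example: percent of subscribers with download times from 30 s up to 45 s.
print(round(percent_in_range(classes, percents, 30, 45), 1))
```

Dividing a percent by 100 gives the probability that a subscriber chosen at random falls in that class, so the same helper can be used for probability questions that span whole classes.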
[Figure: histogram of Number of Subscribers versus Download Time (in seconds)]

The table, without the third column, is called a grouped frequency distribution. For the class 15 - 20, 15 is the lower class boundary and 20 is the upper class boundary. The graph of a frequency distribution is called a histogram. The table, with the third column, is a relative frequency distribution.

Use the relative frequency distribution to determine the
a. percent of subscribers who required at least 25 s to download the file.
b. probability that a subscriber chosen at random will require at least 5 s but less than 20 s to download the file.

Properties of a Normal Distribution
Every normal distribution has the following properties.
• The graph is symmetric about a vertical line through the mean of the distribution.
• The mean, median, and mode are equal.
• The y-value of each point on the curve is the percent (expressed as a decimal) of the data at the corresponding x-value.
• Areas under the curve that are symmetric about the mean are equal.
• The total area under the curve is 1.

Empirical Rule for a Normal Distribution
In a normal distribution, approximately
• 68% of the data lie within 1 standard deviation of the mean.
• 95% of the data lie within 2 standard deviations of the mean.
• 99.7% of the data lie within 3 standard deviations of the mean.

Example
A vegetable distributor knows that during the month of August, the weights of its tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb.
a. What percent of the tomatoes weigh less than 0.76 lb?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to weigh more than 0.31 lb?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to weigh from 0.31 lb to 0.91 lb?

The Standard Normal Distribution
The standard normal distribution is the normal distribution that has a mean of 0 and a standard deviation of 1.
It is often helpful to convert data values $x$ to $z$-scores using the $z$-score formulas:
\[ z_x = \frac{x - \mu}{\sigma} \qquad z_x = \frac{x - \bar{x}}{s} \]

Example
1. Find the area of the standard normal distribution between $z = -1.44$ and $z = 0$.
2. Find the area of the standard normal distribution to the right of $z = 0.82$.

The Standard Normal Distribution, Areas, Percentages, and Probabilities
In the standard normal distribution, the area of the distribution from $z = a$ to $z = b$ represents
• the percentage of $z$-values that lie in the interval from $a$ to $b$
• the probability that $z$ lies in the interval from $a$ to $b$

Illustration
A soda machine dispenses soda into 12-ounce cups. Tests show that the actual amount of soda dispensed is normally distributed, with a mean of 11.5 oz and a standard deviation of 0.2 oz.
a. What percent of cups will receive less than 11.25 oz of soda?
b. What percent of cups will receive between 11.2 oz and 11.55 oz of soda?
c. If a cup is filled at random, what is the probability that the machine will overflow the cup?
Challenges:
1. A highway study of 8000 vehicles that passed by a checkpoint found that their speeds were normally distributed, with a mean of 61 mph and a standard deviation of 7 mph.
   a. How many of the vehicles had a speed of more than 68 mph?
   b. How many of the vehicles had a speed of less than 40 mph?
2. Find the area, to the nearest thousandth, of the indicated region of the standard normal distribution where:
   a. $z < -2.22$
   b. $z > -1.45$
   c. $z$ is between $z = 1$ and $z = 1.9$
   d. $z$ is between $z = -1.47$ and $z = 1.64$
3. A biologist measured the lengths of hundreds of cuckoo bird eggs. Use the relative frequency distribution table below to answer the questions that follow.

   Length (in mm)     Percent of eggs
   18.75 - 19.75       0.8
   19.75 - 20.75       4.0
   20.75 - 21.75      17.3
   21.75 - 22.75      37.9
   22.75 - 23.75      28.5
   23.75 - 24.75      10.7
   24.75 - 25.75       0.8

   a. What percent of the group of eggs was less than 21.75 mm long?
   b. What is the probability that one of the eggs selected at random was at least 20.75 mm long but less than 24.75 mm long?

Linear Regression and Correlation

Linear Regression
Consider the table below showing the time between two eruptions and the duration of the second eruption for 10 eruptions of the Old Faithful, a cone geyser located in Yellowstone National Park in Wyoming, United States. It is a highly predictable geothermal feature, and has erupted every 44 to 125 minutes since 2000.

Time between eruptions (in seconds), x    272  227  237  238  203  270  218  226  250  245
Duration of eruption (in seconds), y       89   79   83   82   81   85   78   81   85   79

Draw a scatter diagram or scatter plot.

The Least-Squares Regression Line (line of best fit)
The least-squares regression line for a set of bivariate data is the line that minimizes the sum of the squares of the vertical deviations from each data point to the line.

The Formula for the Least-Squares Line
The equation of the least-squares line for the n ordered pairs $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ is $\hat{y} = ax + b$, where
\[ a = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} \quad \text{and} \quad b = \bar{y} - a\bar{x} \]

Illustration: Old Faithful

Linear Correlation Coefficient
To determine the strength of a linear relationship between two variables, a statistic called the linear correlation coefficient is used.

Linear Correlation Coefficient
For the ordered pairs $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$, the linear correlation coefficient $r$ is given by
\[ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2} \cdot \sqrt{n(\sum y^2) - (\sum y)^2}} \]

If $r$ is positive, the relationship between the variables has a positive correlation. In this case, if one variable increases, the other variable tends to increase. If $r$ is negative, the linear relationship between the variables has a negative correlation. In this case, if one variable increases, the other variable tends to decrease.

Illustration
MMW test scores vs. Hours Spent on Cellphones

Hours per week spent on CPs     4    5    7    9   10
MMW Score                      89   79   83   82   81
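The least-squares and correlation formulas above translate directly into a few sums. The Python sketch below applies them to the Old Faithful table. The function names are hypothetical, and the code is only a plain transcription of the two formulas, not a substitute for drawing the scatter diagram.

```python
import math

def least_squares_line(xs, ys):
    """Return slope a and y-intercept b of the least-squares line y-hat = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * (sum_x / n)   # b = y-bar - a * x-bar
    return a, b

def correlation_coefficient(xs, ys):
    """Return the linear correlation coefficient r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Old Faithful data from the table (both variables in seconds).
time_between = [272, 227, 237, 238, 203, 270, 218, 226, 250, 245]
duration = [89, 79, 83, 82, 81, 85, 78, 81, 85, 79]

a, b = least_squares_line(time_between, duration)
r = correlation_coefficient(time_between, duration)
print(f"y-hat = {a:.3f}x + {b:.3f}, r = {r:.2f}")
```

The same two functions can be called on the cellphone-hours and MMW-score lists to explore the second illustration.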
Challenge
1. Given the bivariate data:
x 3 5 6 4 7
y 2 3 5 3 5
a. Draw a scatter diagram for the data.
b. Find 𝑛, ∑ 𝑥, ∑ 𝑦, ∑ 𝑥 2 , (∑ 𝑥)2 , ∑ 𝑥𝑦 .
c. Find 𝑎, the slope of the least-squares line, and 𝑏, the y-intercept of
the least-squares line.
d. Draw the least-squares line on the scatter diagram from part a.
e. Is the point (𝑥̅ , 𝑦̅) on the least-squares line?
f. Use the equation of the least-squares line to predict the value of 𝑦
when 𝑥 = 7.3.
g. Find, to the nearest hundredth, the linear correlation coefficient.
