0 10-July-2020
MODULE 4
MODULE OVERVIEW
This module consists of five lessons: Measures of Central Tendency, Measures of Dispersion, Measures of
Relative Position, Normal Distribution, and Regression and Correlation. Each lesson is designed as a
self-teaching guide, with definitions of terms and worked examples. Answering the problems in each "Your
turn" will check your progress. You may compare your answers with the solutions provided in the later part of
this module; in that way you will be able to measure your achievement as well as the effectiveness of the
module. Exercises are provided as assignments to measure your understanding of the topics.
Discussion
Statistics involves the collection, organization, summarization, presentation, and interpretation of data.
The branch of statistics that involves the collection, organization, summarization, and presentation of
data is called descriptive statistics. The branch that interprets and draws conclusions from the data is
called inferential statistics.
Arithmetic Mean
The arithmetic mean, or simply the mean, is the sum of the values of the observations in a data set
divided by the number of observations. The traditional symbol used to indicate a summation is the Greek
letter sigma, $\Sigma$. Thus the notation $\sum x$, called summation notation, denotes the sum of all the
numbers in a given set.
The definition is the same for both a sample (a portion of the whole population) and a population (the
collection of all possible observations under a particular study), although we use a different symbol to
refer to each.
The symbol for the sample mean is x-bar ($\bar{x}$), and the symbol for the population mean is the Greek letter mu ($\mu$).
Mean
The mean of $n$ numbers is the sum of the numbers divided by $n$.
Mean $= \dfrac{\sum x}{n}$
The mean of a sample, $\bar{x}$, or any other measure based on sample data is called a statistic. Any
measurable characteristic of a population is called a parameter. The mean of a population, $\mu$, is a parameter.
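Example 1
Six friends in a Mathematics in the Modern World class of 20 students received
test grades of 92, 84, 65, 76, 88, and 90.
a. Find the mean of these test scores.
b. Is the mean computed a statistic or a parameter? Why?
Solution
a. The six friends are a sample of the population of 20 students. Use $\bar{x}$ instead of $\mu$ to represent the mean.
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{92 + 84 + 65 + 76 + 88 + 90}{6} = \dfrac{495}{6} = 82.5$
b. The mean is a statistic, because it is computed from a sample rather than from the whole population.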
Your turn 1 The daily wages of 10 employees of Home Depot are: 500, 750, 430, 630,
450, 440, 700, 350, 80, 630.
Median
The median is the middle number or the mean of the two middle numbers in a list of numbers that
have been arranged in numerical order from smallest to largest or largest to smallest. Any list of numbers
arranged in numerical order from smallest to largest or largest to smallest is a ranked list.
Median
The median of a ranked list of 𝑛 numbers is:
The middle number if 𝑛 is odd
The mean of two middle numbers if 𝑛 is even
Solution
a. The list contains 7 numbers. The median of a list with an odd number of entries is found by
ranking the numbers and finding the middle number.
Ranking the numbers from smallest to largest gives
b. The list 46, 23, 92, 89, 77, 108 contains 6 numbers. The median of a list of data with an even number of
entries is found by ranking the numbers and computing the mean of the two middle numbers. Ranking the
numbers from smallest to largest gives 23, 46, 77, 89, 92, 108. The two middle numbers are 77 and 89, and
their mean is (77 + 89)/2 = 83. Thus 83 is the median.
a. A sample of senior citizens in Lingayen, Pangasinan receiving Social Security payments revealed these
monthly benefits : , , , , , , , .
b. The scores in a quiz of nine students in an MMW class are: 2, 4, 10, 7, 8, 0, 5, 8, and 2.
Mode
The mode is another type of average.
Mode
The mode is the value of the observation that appears most frequently.
Some lists of numbers do not have a mode. For instance, in the list 1, 6, 8, 10, 32, 15, 49, each number
occurs exactly once. No number occurs more often than the other numbers. Thus, there is no mode.
A list of numerical data can have more than one mode. For instance, in the list 4, 2, 6, 2, 7, 9, 2, 4, 9,
8, 9, 7, the numbers 2 and 9 each occur three times. Thus, 2 and 9 are both modes of the data.
Solution
a. In the list 18, 15, 21, 16, 15, 14, 15, 21, the number 15 occurs more often than the other numbers. Thus 15
is the mode.
b. Each number in the list 2, 5, 8, 9, 11, 4, 7, 23 occurs only once. No number occurs more often than
the others. Therefore, there is no mode.
Your turn 3 Find the mode of the data in the following lists.
The mean, median, and mode are all averages. However, they are generally not equal. The mean of
a set of data is the most sensitive of the averages. A change in any of the numbers changes the mean, and
the mean can be changed drastically by changing an extreme value.
In contrast, the median and the mode of a set of data are usually not changed by changing an
extreme value.
When a data set has one or more extreme values that are very different from the majority of values,
the mean will not necessarily be a good indicator of an average value. In the following example, we compare
the mean, median, and mode for the salaries of five employees of a small company.
Salaries :
The mean is
The median is and the mode is . The data contain one extreme value that is much larger
than the others. This extreme value makes the mean considerably larger than the median. So, you would
probably agree that the median better represents the average of the salaries than does either the mean or the mode.
Computer Solution
We can use a spreadsheet to find the mean, median, and mode of a data set. Consider the
following satisfaction level ratings of 35 people.
9 12 10 8 9 12 12
11 14 12 10 8 10 9
12 8 14 13 7 9 10
12 8 12 14 9 8 13
10 9 9 11 10 11 10
The following screenshot shows the mean, median, and mode for the 35 ratings (occupying cells A2 to A36),
as calculated by the spreadsheet's built-in statistical functions.
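The same three averages can also be checked outside a spreadsheet. The following is a minimal sketch in Python, assuming the 35 ratings are read row by row from the table above; it uses the standard library's `statistics` module rather than the spreadsheet functions shown in the screenshot.

```python
from statistics import mean, median, multimode

# The 35 satisfaction ratings from the table above, read row by row.
ratings = [
    9, 12, 10, 8, 9, 12, 12,
    11, 14, 12, 10, 8, 10, 9,
    12, 8, 14, 13, 7, 9, 10,
    12, 8, 12, 14, 9, 8, 13,
    10, 9, 9, 11, 10, 11, 10,
]

mean_rating = mean(ratings)      # sum of the ratings divided by 35
median_rating = median(ratings)  # middle (18th) value of the ranked list
modes = multimode(ratings)       # every value tied for the highest frequency

print(f"mean   = {mean_rating:.2f}")
print(f"median = {median_rating}")
print(f"modes  = {sorted(modes)}")
```

In this data set three ratings are tied for the highest frequency, so `multimode` returns all of them; a spreadsheet function that reports a single mode may show only one.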
Weighted mean $= \dfrac{\sum (x \cdot w)}{\sum w}$
where:
$w$ = weight of each item
$x$ = value of each item
Example 4 Table 1.1 shows Janet's first semester course grades. Use the weighted mean
formula to find Janet's GPA for the semester.
Course        Grade    Units
Physics       1.75     4
Statistics    2.25     3
Psychology    2.75     3
P.E.          1.5      2
Solution
Weighted mean
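As an illustration of the weighted mean formula, here is a minimal Python sketch. It assumes that the four rows shown in Table 1.1 are the complete table, that the second column is the grade (the value $x$) and the third column is the number of units (the weight $w$); the function name `weighted_mean` is chosen only for this example.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

# Rows as they appear in Table 1.1: (course, grade, units).
# Assumption: these four rows are the whole table.
courses = [
    ("Physics", 1.75, 4),
    ("Statistics", 2.25, 3),
    ("Psychology", 2.75, 3),
    ("P.E.", 1.5, 2),
]
grades = [grade for _, grade, _ in courses]
units = [unit for _, _, unit in courses]

gpa = weighted_mean(grades, units)
print(f"GPA = {gpa:.2f}")
```

Under these assumptions the weighted sum is 25 grade points over 12 units. The same function applies to any exercise where the weights are units, percentages, or quantities.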
Your turn 4 A man bought 10 liters of premium gasoline at P11.50 per liter, 12 liters at P12.01
per liter, and 18 liters at P11.78 per liter from three different gasoline stations. Find the
mean price per liter.
LEARNING POINTS
A measure of central tendency is a summary measure that attempts to describe a whole set of data
with a single value that represents the middle or center of the data set. The most commonly used measures of
central tendency, or types of averages, are the arithmetic mean, the median, and the mode.
LEARNING ACTIVITY 1
In numbers 1 to 5, find the mean, the median, and the mode(s), if any, for the given data. Round
non-integer means to the nearest tenth.
3.
4.
5.
6. The final grades of a student in six courses were taken and are shown below. Compute the student’s
weighted mean grade. Round off your answer to the nearest hundredth.
Courses No. of Units Final Grade
Math 112 3 2.5
English 101 6 2.0
PS 25 3 1.5
Fil 1 3 1.4
Chem 1 5 2.4
PE 1 2 1.1
7. A professor grades students on 4 tests, a term paper, and a final examination. Each test counts as 15% of
the course grade. The term paper counts as 20% of the course grade. The final examination counts as 20% of
the course grade. Alan has test scores of 80, 78, 92, and 84. Alan received an 84 on his term paper. His final
examination score was 88. Use the weighted mean formula to find Alan's average for the course. Hint: The
sum of all the weights is 100% = 1.
8. After 6 math tests, Zia has a mean score of 88. What score does Zia need on the next test to raise his
average (mean) to 90?
9. After 4 algebra tests, Alisa has a mean score of 82. One more 100-point test is to be given in this class. All
of the test scores are of equal importance. Is it possible for Alisa to raise her average (mean) to 90? Explain.
For instance, consider a soft-drink dispensing machine that should dispense 8 oz of your selection
into a cup. Table 2.1 shows data for two of these machines. The mean data value for each machine is 8 oz.
Table 2.1 Soda Dispensed (ounces)
Machine 1          Machine 2
9.52               8.01
6.41               7.99
10.07              7.95
5.85               8.03
8.15               8.02
$\bar{x} = 8.0$    $\bar{x} = 8.0$
However, look at the variation in data values for Machine 1. The quantity of soda dispensed is very
inconsistent—in some cases the soda overflows the cup, and in other cases too little soda is dispensed. The
machine obviously needs adjustment. Machine 2, on the other hand, is working just fine. The quantity
dispensed is very consistent, with little variation.
This example shows that average values do not reflect the spread or dispersion of data. To measure
the spread or dispersion of data, we must introduce statistical values known as the range, mean deviation,
standard deviation, and the variance.
The Range
The simplest measure of dispersion is the range. It is the difference between the largest and the
smallest values in a data set.
Range
Range = Largest value – Smallest value
Mean Deviation
A defect of the range is that it is based on only two values, the highest and the lowest; it does not take
into consideration all of the values. The mean deviation does. It measures the mean amount by which the
values in a population, or sample, vary from their mean. In terms of a definition: Mean Deviation is the
arithmetic mean of the absolute values of the deviations from the arithmetic mean.
$\text{MD} = \dfrac{\sum |x - \bar{x}|}{n}$
where
$x$ = the value of each observation
$\bar{x}$ = the arithmetic mean of the values
$n$ = the number of observations
$|\;|$ indicates the absolute value
The mean deviation has two advantages. First, it uses all the values in the computation, while the range
uses only the highest and lowest values. Second, it is easy to understand: it is the average amount by which
the values deviate from the mean.
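Example 1 The weights of some containers being shipped to China are (in thousands of pounds):
95, 103, 105, 110, 104, 105, 112, 90. Find (a) the range, (b) the arithmetic mean, and (c) the mean deviation
of the weights.
Solution
a. Range = Highest value – Lowest value $= 112 - 90 = 22$, or 22,000 pounds
b. $\bar{x} = \dfrac{\sum x}{n} = \dfrac{824}{8} = 103$, or 103,000 pounds
c. $\text{MD} = \dfrac{|95-103| + |103-103| + |105-103| + |110-103| + |104-103| + |105-103| + |112-103| + |90-103|}{8}$
$= \dfrac{8 + 0 + 2 + 7 + 1 + 2 + 9 + 13}{8} = \dfrac{42}{8} = 5.25$, or 5,250 pounds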
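The range and the mean deviation of the container weights can be checked with a short Python sketch. The function names `data_range` and `mean_deviation` are chosen for this illustration only; the data are the eight weights listed in Example 1.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)

# Container weights from Example 1, in thousands of pounds.
weights = [95, 103, 105, 110, 104, 105, 112, 90]

weight_range = data_range(weights)
weight_mean = sum(weights) / len(weights)
weight_md = mean_deviation(weights)

print("range          =", weight_range)
print("mean           =", weight_mean)
print("mean deviation =", weight_md)
```

Because every observation enters the sum of absolute deviations, changing any one weight changes the mean deviation, while the range changes only if the largest or smallest weight changes.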
Your turn 1 Find the following using the number of ounces dispensed by Machines 1 and 2 in Table 2.1.
a. the range of the amounts dispensed by each machine
b. the mean deviation of the amounts, in ounces, dispensed by each machine
Because the sum of all the deviations of the data values from the mean is always 0, we cannot use
the sum of the deviations as a measure of dispersion for a set of data. Instead, the standard deviation uses
the sum of the squares of the deviations.
You may question why a denominator of $n - 1$ is used instead of $n$ when we compute a sample
standard deviation. The reason is that a sample standard deviation is often used to estimate the population
standard deviation, and it can be shown mathematically that the use of $n - 1$ tends to yield better estimates.
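Solution:
Step 1: Determine the mean.
$\bar{x} = \dfrac{2 + 4 + 7 + 12 + 15}{5} = \dfrac{40}{5} = 8$
Step 2: For each number, calculate the deviation between the number and the mean.
$x$      $x - \bar{x}$
2        2 − 8 = −6
4        4 − 8 = −4
7        7 − 8 = −1
12       12 − 8 = 4
15       15 − 8 = 7
Step 3: Calculate the square of each of the deviations in Step 2, and find the sum of these squared
deviations.
$x$      $x - \bar{x}$      $(x - \bar{x})^2$
2        −6                 36
4        −4                 16
7        −1                 1
12       4                  16
15       7                  49
$\sum (x - \bar{x})^2 = 118$ (sum of the squared deviations)
Step 4: Because we have a sample of $n = 5$ values, divide the sum 118 by $n - 1$, which is 4.
$\dfrac{118}{4} = 29.5$
Step 5: The standard deviation of the sample is $s = \sqrt{29.5}$. Thus, to the nearest hundredth, the
standard deviation is $s \approx 5.43$.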
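The five steps above can be sketched in Python as follows. This is only an illustration of the procedure for the sample 2, 4, 7, 12, 15 used in this example; the variable names are chosen for readability and are not part of the module.

```python
import math

# Sample from the worked example above.
sample = [2, 4, 7, 12, 15]

# Step 1: determine the mean.
x_bar = sum(sample) / len(sample)

# Step 2: deviation of each value from the mean.
deviations = [x - x_bar for x in sample]

# Step 3: square each deviation and add the squares.
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 4: divide by n - 1 because the data are a sample.
quotient = sum_of_squares / (len(sample) - 1)

# Step 5: the standard deviation is the square root of that quotient.
s = math.sqrt(quotient)

print("mean               =", x_bar)
print("deviations         =", deviations)
print("sum of squares     =", sum_of_squares)
print("standard deviation =", round(s, 2))
```

For a population rather than a sample, Step 4 would divide by `len(sample)` instead of `len(sample) - 1`.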
Your turn 2 A student has the following quiz scores: 5, 8, 16, 17, 18, 20. Find the standard
deviation for this population of quiz scores.
Variance
A statistic known as the variance is also used as a measure of dispersion. The variance for a given
set of data is the square of the standard deviation of the data.
Variances
If $x_1, x_2, x_3, \ldots, x_n$ is a population of $n$ numbers with a mean of $\mu$, then the variance of the
population is $\sigma^2 = \dfrac{\sum (x - \mu)^2}{n}$.
Solution
In Example 2, we found $s = \sqrt{29.5}$. The variance is the square of the standard deviation. Thus the variance
is $s^2 = (\sqrt{29.5})^2 = 29.5$.
Computer Solution
We can use a spreadsheet to find the range, mean deviation, standard deviation, and variance of a
data set.
Let us use the same list of data as in Example 2. The data are: 2, 4, 7, 12, 15.
Range: =MAX(A : A) − MIN(A : A)
Mean deviation: =AVEDEV(A : A)
Sample standard deviation: =STDEV(A : A)
Population standard deviation: =STDEVP(A : A)
Sample variance: =VAR(A : A)
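For readers without a spreadsheet, the same measures can be obtained with Python's standard `statistics` module. The sketch below is an assumed equivalent, not the module's own method: the comments pair each line with the spreadsheet function it is meant to mirror, and the mean deviation is computed directly because the `statistics` module has no built-in counterpart of AVEDEV.

```python
import statistics

data = [2, 4, 7, 12, 15]  # the list used in Example 2

value_range = max(data) - min(data)            # mirrors MAX(...) - MIN(...)
x_bar = statistics.mean(data)
mean_dev = sum(abs(x - x_bar) for x in data) / len(data)  # mirrors AVEDEV
sample_sd = statistics.stdev(data)             # mirrors STDEV  (divides by n - 1)
population_sd = statistics.pstdev(data)        # mirrors STDEVP (divides by n)
sample_var = statistics.variance(data)         # mirrors VAR    (divides by n - 1)

print("range                         =", value_range)
print("mean deviation                =", mean_dev)
print("sample standard deviation     =", round(sample_sd, 2))
print("population standard deviation =", round(population_sd, 2))
print("sample variance               =", sample_var)
```

The sample and population standard deviations differ only in the divisor, which is why the population value is the smaller of the two.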
LEARNING POINTS
Measures of dispersion are important for describing the spread of the data, or its variation around a
central value. To measure the spread or dispersion of data, we compute statistical values known as
the range, mean deviation, standard deviation, and variance.
LEARNING ACTIVITY 2
In exercises 1 to 6, compute the (a) range, (b) mean deviation, (c) standard deviation, and (d) variance for the
following samples.
1. 6, 8, 3, 5, 6, 2, 7
2. 2, 8, 4, 2, 5, 8, 10, 1, 8, 12
3. 3, 4, 4, 6, 7, 9, 10, 13, 15, 16, 18, 21
4. 5.2, 11.7, 19.1, 3.7, 8.2, 16.3
5. 93, 67, 49, 55, 92, 87, 77, 66, 73, 96, 54
6.
7. The study described in the In the News article presented the data on flu-related deaths in several age
categories. Here is the complete set of data for one category.
Find the range, the mean, and the population standard deviation of the data.
In exercises 8 to 10, refer to the following tables, which list the ages of female and male actors when they
starred in their Oscar-winning Best Actor performances.
8. Find the mean and the sample standard deviation of the ages of the female recipients. Round each result
to the nearest tenth.
9. Find the mean and the sample standard deviation of the ages of the male recipients. Round each result to
the nearest tenth.
10. Which of the two data sets has the larger mean? Which of the two data sets has the larger standard
deviation?
Quartiles divide a set of observations into four equal parts. To explain further, think of any set of
values arranged from smallest to largest. The first quartile, usually labeled $Q_1$, is the value below which 25
percent of the observations occur, and the third quartile, usually labeled $Q_3$, is the value below which 75
percent of the observations occur. Logically, $Q_2$ is the median.
Deciles divide a set of observations into 10 equal parts. If you found that your GPA was in the 8th
decile ($D_8$) of your class, you could conclude that 80 percent of your classmates had a GPA lower than
yours and 20 percent had a higher GPA.
Percentiles divide a set of observations into 100 equal parts. For instance, a GPA
at the 33rd percentile means that 33 percent of the students have a lower GPA and 67 percent have a higher
GPA.
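Quartiles, Deciles, Percentiles
Quartile: $Q_i = \dfrac{i(n+1)}{4}$      Decile: $D_i = \dfrac{i(n+1)}{10}$      Percentile: $P_i = \dfrac{i(n+1)}{100}$
where
$n$ = number of observations
$i$ = desired location (for example, $i = 3$ for the third quartile)
Each formula gives the position of the measure in the ranked list of observations.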
After arranging the data in ascending order, count to find the number that falls in the 9.75th position.
Because 9.75 is not a whole number, we have to interpolate from the given data: the value at the 9.75th
position is the value in the 9th position plus 0.75 of the difference between the values in the 10th and 9th
positions. The value of the third quartile is equal to 18.5.
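The interpolation step can be sketched in Python. This sketch assumes the position rule $i(n+1)/4$ for quartiles, $i(n+1)/10$ for deciles, and $i(n+1)/100$ for percentiles; other textbooks and software use slightly different rules. The function names and the data set of 12 values are hypothetical and are not taken from this module.

```python
def value_at_position(ranked, position):
    """Interpolate the value at a 1-based, possibly fractional, position."""
    n = len(ranked)
    if position <= 1:
        return ranked[0]
    if position >= n:
        return ranked[-1]
    whole = int(position)          # e.g. 9 for the 9.75th position
    fraction = position - whole    # e.g. 0.75
    lower = ranked[whole - 1]      # value in the 9th position
    upper = ranked[whole]          # value in the 10th position
    return lower + fraction * (upper - lower)

def fractile(data, i, parts):
    """i-th quartile (parts=4), decile (parts=10) or percentile (parts=100).

    Assumes the position rule i * (n + 1) / parts.
    """
    ranked = sorted(data)
    position = i * (len(ranked) + 1) / parts
    return value_at_position(ranked, position)

# Hypothetical data set of 12 values (not taken from the module).
scores = [3, 5, 7, 8, 10, 12, 13, 15, 16, 18, 20, 22]

q3 = fractile(scores, 3, 4)    # position 3 * 13 / 4 = 9.75
d4 = fractile(scores, 4, 10)   # position 4 * 13 / 10 = 5.2
print("third quartile =", q3)
print("fourth decile  =", round(d4, 2))
```

With 12 observations the third quartile sits at position 9.75, so the result lies three quarters of the way from the 9th ranked value to the 10th.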
z-score
The following formulas show how to calculate the z-score for a data value $x$ in a population
and in a sample.
Population: $z_x = \dfrac{x - \mu}{\sigma}$        Sample: $z_x = \dfrac{x - \bar{x}}{s}$
A negative z-score represents a value less than the mean. A positive z-score represents a value
greater than the mean. When $z = 0$, the data value is equal to the mean.
A z-score equal to 1 represents a value that is 1 standard deviation above the mean; a z-score
equal to $-1$ represents a value that is 1 standard deviation below the mean. If the number of
elements in the data set is large, about 68% of the elements have z-scores between $-1$ and 1. About 95%
have z-scores between $-2$ and 2, and about 99% have z-scores between $-3$ and 3.
Example 2 Andrew gets a score of 64 in a Mathematics test where the class mean is 50 with a
standard deviation of 8. Belle gets a score of 74 in a Physics test where the mean is
58 and the standard deviation is 10. Find out who actually performed better.
Solution
Find the z-score for each test.
Andrew: $z_{64} = \dfrac{64 - 50}{8} = 1.75$        Belle: $z_{74} = \dfrac{74 - 58}{10} = 1.6$
So although Belle's score is higher, Andrew's score is farther above the mean in terms of standard deviations,
and we may say that Andrew performed better.
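The comparison of Andrew and Belle can be written as a small Python sketch. The function name `z_score` is chosen for this illustration; it applies the sample formula $z_x = (x - \bar{x})/s$ with the means and standard deviations given in the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / std_dev

andrew = z_score(64, mean=50, std_dev=8)   # Mathematics test
belle = z_score(74, mean=58, std_dev=10)   # Physics test

# The larger z-score marks the better performance relative to the class.
better = "Andrew" if andrew > belle else "Belle"
print(andrew, belle, better)
```

Because z-scores are expressed in standard deviations, they allow scores from tests with different means and spreads to be compared on one scale.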
Your turn 2 Cheryl has taken two quizzes in her history class. She scored 15 on the first quiz, for
which the mean of all scores was 12 and the standard deviation was 2.4. Her score
on the second quiz, for which the mean of all scores was 11 and the standard deviation
was 2.0, was 14. In comparison to her classmates, did Cheryl do better on the first quiz or the second quiz?
Example 3 A consumer group tested a sample of 100 light bulbs. It found that the mean life
expectancy of the bulbs was 842 h, with a standard deviation of 90 h. One particular
light bulb from the DuraBright Company had a z-score of 1.2. What was the life span of this light bulb?
Solution
Substitute the given values into the z-score equation and solve for $x$.
$z_x = \dfrac{x - \bar{x}}{s}$
$1.2 = \dfrac{x - 842}{90}$
$108 = x - 842$
$x = 950$
The light bulb had a life span of 950 h.
Your turn 3 Roland received a score of 70 on a test for which the mean score was 65.5. Roland
has learned that the z-score for his test is 0.6. What is the standard deviation for this
set of test scores?
LEARNING POINTS
A measure of relative position shows where a given value stands in relation
to the other values in the same set of data. The most common measures of
relative position are quartiles, percentiles, and standard scores.
LEARNING ACTIVITY 3
In exercises 1 to 4. A data set has a mean of ̅ and a standard deviation of . Find the score for
each of the following.
1.
2.
3.
4.
A data set has a mean of ̅ and a standard deviation of 115. Find the z-score for each of the
following.
5.
6.
7.
8.
In exercises 9 to 10. A random sample of 1000 oranges showed that the mean amount of juice per orange
was 7.4 fluid ounces, with a standard deviation of 1.1 fluid ounces.
9. Determine the z-score, to the nearest hundredth, of an orange that produced 6.6 fluid ounces of juice.
10. The z-score for one orange was 3.15. How much juice was produced by this orange? Round to the
nearest tenth of a fluid ounce.
11. Which of the following fitness scores is the highest relative score?
a. A score of 42 on a test with a mean of 31 and a standard deviation of 6.5
b. A score of 1140 on a test with a mean of 1080 and a standard deviation of 68.2
c. A score of 4710 on a test with a mean of 3960 and a standard deviation of 560.4
In exercises 12 to 14, the following scores were received by 20 accounting students in a short quiz: 10, 9, 15,
20, 13, 15, 18, 11, 7, 12, 15, 13, 18, 19, 12, 8, 10, 13, 17, and 15. Find the following:
12. the third quartile,
13. the eighth decile, and
14. the fortieth percentile.
15. Rene scored at the 84th percentile on a test given to 12,600 students. How many students scored
higher than Rene?
Data that has not been organized or manipulated in any manner is called raw data. A large collection
of raw data may not provide much pertinent information that can be readily observed. A frequency
distribution, which is a table that lists observed events and the frequency of occurrence of each observed
event, is often used to organize raw data. For instance, consider the following table, which lists the number of
laptop computers owned by families in each of 40 homes in a subdivision.
Table 4.1
The frequency distribution in Table 4.2 below was constructed using the data from Table 4.1. The first
column of the frequency distribution consists of the numbers 0, 1, 2, 3, 4, 5, 6, and 7. The corresponding
frequency of occurrence, f, of each of the numbers in the first column is listed in the second column.
Table 4.2
In the normal distribution shown below, the area of the shaded region is 0.159 unit. This region
represents the fact that 15.9% of the data are greater than or equal to 10. Because the area under the curve is
1, the unshaded region under the curve has area 1 − 0.159, or 0.841, representing the fact that 84.1% of the
data are less than 10.
The following rule, called the Empirical Rule, describes the percent of data that lie within 1, 2, and 3
standard deviations of the mean.
Solution
a. Converting $2.74 into a z-score, z = (2.74 − 3.10)/0.18 = −2, means that the $2.74 per gallon price is 2
standard deviations below the mean. For the $3.46 price, z = (3.46 − 3.10)/0.18 = 2, so the $3.46 price is 2
standard deviations above the mean. In a normal distribution, 95% of all data lie within 2 standard deviations
of the mean. See Figure 4.3. Therefore, approximately (95%)(1000) = (0.95)(1000) = 950 of the stations charge
between $2.74 and $3.46 for a gallon of regular gas.
Figure 4.3
b. Converting the $3.28 price into a z-score, z = (3.28 − 3.10)/0.18 = 1, we can say that the $3.28 price is
1 standard deviation above the mean. See Figure 4.4. In a normal distribution, 34% of all data lie between the
mean and 1 standard deviation above the mean. Thus, approximately (34%)(1000) = (0.34)(1000) = 340 of the
stations charge between $3.10 and $3.28 for a gallon of regular gasoline. Half of the 1000 stations, or 500
stations, charge less than the mean. Therefore, about 340 + 500 = 840
of the stations charge less than $3.28 for a gallon of regular gas.
Figure 4.4
c. Converting the $3.46 price into a z-score, z = (3.46 − 3.10)/0.18 = 2, gives a result of 2 standard
deviations above the mean. In a normal distribution, 95% of all data are within 2 standard deviations of the
mean. This means that the other 5% of the data lie either more than 2 standard deviations above the mean or
more than 2 standard deviations below the mean. We are interested only in the data that are more than 2
standard deviations above the mean, which is 1/2 of 5%, or 2.5%, of the data. See Figure 4.5. Thus about
(2.5%)(1000) = (0.025)(1000) = 25 of the stations charge more than $3.46 for a gallon of regular gas.
Figure 4.5
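The Empirical Rule figures used in this solution are rounded. As a rough cross-check, the sketch below evaluates the same three questions with Python's `statistics.NormalDist`. It assumes a mean of $3.10 and a standard deviation of $0.18 (the standard deviation is inferred from the solution, where $3.28 is 1 standard deviation above the mean) and a total of 1000 stations.

```python
from statistics import NormalDist

# Assumed model: mean $3.10, standard deviation $0.18 (inferred), 1000 stations.
prices = NormalDist(mu=3.10, sigma=0.18)
stations = 1000

share_a = prices.cdf(3.46) - prices.cdf(2.74)   # between $2.74 and $3.46
share_b = prices.cdf(3.28)                      # less than $3.28
share_c = 1 - prices.cdf(3.46)                  # more than $3.46

for label, share in [("a", share_a), ("b", share_b), ("c", share_c)]:
    print(f"{label}. share = {share:.4f}, about {share * stations:.0f} stations")
```

The counts from the normal model are close to, but not exactly, the 950, 840, and 25 stations obtained with the rounded percentages of the Empirical Rule.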
Your turn 2 A vegetable distributor knows that during the month of August, the weights of its
tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation
of 0.15 lb.
a. What percent of the tomatoes weigh less than 0.76 lb?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to weigh more than 0.31 lb?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to weigh from 0.31 lb to 0.91 lb?
Figure 4.6
Tables and calculators are often used to determine the area under a portion of the standard normal
curve. We will refer to this type of area as an area of the standard normal distribution. Table 4.4 gives the
approximate areas of the standard normal distribution between the mean 0 and z standard deviations from the
mean. (See Figure 4.7.) Table 4.4 indicates that the area A of the standard normal distribution from the mean
0 up to z = 1.34 is 0.410 square unit.
Figure 4.7
TABLE 4.4
Area Under the Standard Normal Curve
z A z A z A z A z A z A
0.00 0.000 0.56 0.212 1.12 0.369 1.68 0.454 2.24 0.487 2.80 0.497
0.01 0.004 0.57 0.216 1.13 0.371 1.69 0.454 2.25 0.488 2.81 0.498
0.02 0.008 0.58 0.219 1.14 0.373 1.70 0.455 2.26 0.488 2.82 0.498
0.03 0.012 0.59 0.222 1.15 0.375 1.71 0.456 2.27 0.488 2.83 0.498
0.04 0.016 0.60 0.226 1.16 0.377 1.72 0.457 2.28 0.489 2.84 0.498
0.05 0.020 0.61 0.229 1.17 0.379 1.73 0.458 2.29 0.489 2.85 0.498
0.06 0.024 0.62 0.232 1.18 0.381 1.74 0.459 2.30 0.489 2.86 0.498
0.07 0.028 0.63 0.236 1.19 0.383 1.75 0.460 2.31 0.490 2.87 0.498
0.08 0.032 0.64 0.239 1.20 0.385 1.76 0.461 2.32 0.490 2.88 0.498
0.09 0.036 0.65 0.242 1.21 0.387 1.77 0.462 2.33 0.490 2.89 0.498
0.10 0.040 0.66 0.245 1.22 0.389 1.78 0.462 2.34 0.490 2.90 0.498
0.11 0.044 0.67 0.249 1.23 0.391 1.79 0.463 2.35 0.491 2.91 0.498
Because the standard normal distribution is symmetrical about the mean of 0, we can also use Table 4.4 to
find the area of a region that is located to the left of the mean.
Example 3 Find the area of the standard normal distribution between $z = -1.44$ and $z = 0$.
Solution
Because the standard normal distribution is symmetrical about the center line $z = 0$, the area of the standard
normal distribution between $z = -1.44$ and $z = 0$ is equal to the area between $z = 0$ and $z = 1.44$. The entry
in Table 4.4 associated with $z = 1.44$ is 0.425. Thus the area of the standard normal distribution between
$z = -1.44$ and $z = 0$ is 0.425 square unit. See Figure 4.8.
Your turn3 Find the area of the standard normal distribution between and
In Figure 4.9, the region to the right of $z = 0.82$ is called a tail region. A tail region is a region of the standard
normal distribution to the right of a positive z-value or to the left of a negative z-value. To find the area of a
tail region, we subtract the entry in Table 4.4 from 0.500. This procedure is illustrated in the next example.
Example 4 Find the area of the standard normal distribution to the right of $z = 0.82$.
Solution
Table 4.4 indicates that the area from $z = 0$ to $z = 0.82$ is 0.294 square unit. The area to the right of $z = 0$ is
0.500 square unit. Thus the area to the right of $z = 0.82$ is $0.500 - 0.294 = 0.206$ square unit. See Figure 4.9.
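Entries like those in Table 4.4 can be reproduced with Python's `statistics.NormalDist`, which models the standard normal distribution when created with no arguments. The helper names below are chosen for this sketch, and the value z = 0.82 in the last line is inferred from the tabulated area 0.294 quoted in Example 4.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def area_from_mean(z):
    """Area between the mean (z = 0) and z; the quantity tabulated as A."""
    return abs(standard_normal.cdf(z) - 0.5)

def tail_area(z):
    """Area beyond z, on the side away from the mean."""
    return 0.5 - area_from_mean(z)

print(round(area_from_mean(1.34), 3))    # area from z = 0 up to z = 1.34
print(round(area_from_mean(-1.44), 3))   # by symmetry, same as z = 1.44
print(round(tail_area(0.82), 3))         # tail to the right of z = 0.82 (assumed z)
```

Taking the absolute value in `area_from_mean` is what encodes the symmetry argument: a negative z gives the same area as the positive z of equal size.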
Your turn4 Find the area of the standard normal distribution to the left of
Because the area of a portion of the standard normal distribution can be interpreted as a percentage
of the data or as a probability that the variable lies in an interval, we can use the standard normal distribution
to solve many application problems.
Example 5 A soda machine dispenses soda into 12-ounce cups. Tests show that the
actual amount of soda dispensed is normally distributed, with a mean of 11.5
oz and a standard deviation of 0.2 oz.
a. What percent of the cups will receive less than 11.25 oz of soda?
b. What percent of the cups will receive between 11.2 oz and 11.55 oz of soda?
c. If a cup is chosen at random, what is the probability that the machine will overflow the cup?
Solution
a. Recall that the formula for the z-score for a data value $x$ is $z_x = \dfrac{x - \bar{x}}{s}$. The z-score for 11.25 oz is
$z_{11.25} = \dfrac{11.25 - 11.5}{0.2} = -1.25$
Table 4.4 indicates that 0.394 (39.4%) of the data in a normal distribution are between $z = 0$ and
$z = 1.25$. Because the data are normally distributed, 39.4% of the data are also between $z = 0$ and
$z = -1.25$. The percent of data to the left of $z = -1.25$ is 50% − 39.4% = 10.6%. See Figure 4.9. Thus
10.6% of the cups filled by the soda machine will receive less than 11.25 oz of soda.
b. The z-score for 11.55 oz is
$z_{11.55} = \dfrac{11.55 - 11.5}{0.2} = 0.25$
Table 4.4 indicates that 0.099 (9.9%) of the data in a normal distribution are between $z = 0$ and $z = 0.25$.
The z-score for 11.2 oz is
$z_{11.2} = \dfrac{11.2 - 11.5}{0.2} = -1.5$
Table 4.4 indicates that 0.433 (43.3%) of the data in a normal distribution are between $z = 0$ and
$z = 1.5$. Because the data are normally distributed, 43.3% of the data are also between $z = 0$ and
$z = -1.5$. See Figure 4.10. Thus the percent of the cups that the vending machine will fill with between 11.2
oz and 11.55 oz of soda is 43.3% + 9.9% = 53.2%.
c. A cup will overflow if it receives more than 12 oz of soda. The z-score for 12 oz is
$z_{12} = \dfrac{12 - 11.5}{0.2} = 2.5$
Table 4.4 indicates that 0.494 (49.4%) of the data in the standard normal distribution are between $z = 0$ and
$z = 2.5$. The percent of data to the right of $z = 2.5$ is determined by subtracting 49.4% from 50%. See Figure
4.11. Thus 0.6% of the time the machine produces an overflow, and the probability that a cup chosen at
random will overflow is 0.006.
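The three parts of this example can be checked without a table. The sketch below models the amount dispensed with `statistics.NormalDist`, using the mean of 11.5 oz and standard deviation of 0.2 oz stated in the example; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

# Amount dispensed, in ounces: mean 11.5, standard deviation 0.2.
dispensed = NormalDist(mu=11.5, sigma=0.2)

less_than_11_25 = dispensed.cdf(11.25)                   # part a
between = dispensed.cdf(11.55) - dispensed.cdf(11.2)     # part b
overflow = 1 - dispensed.cdf(12)                         # part c: more than 12 oz

print(f"a. less than 11.25 oz:        {less_than_11_25:.1%}")
print(f"b. between 11.2 and 11.55 oz: {between:.1%}")
print(f"c. probability of overflow:   {overflow:.3f}")
```

The cumulative distribution function `cdf(x)` returns the area to the left of `x`, so a left tail is read directly, an interval is a difference of two values, and a right tail is 1 minus the value.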
Your turn 5 A study of the careers of professional football players shows that the lengths
of their careers are nearly normally distributed, with a mean of 6.1 years and
a standard deviation of 1.8 years.
a. What percent of professional football players have a career of more than 9 years?
b. If a professional football player is chosen at random, what is the probability that the player will have a
career of between 3 and 4 years?
LEARNING POINTS
A normal distribution forms a bell-shaped curve that is symmetric about a vertical line through the mean
of the data.
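Empirical Rule for a Normal Distribution
In a normal distribution, approximately
68% of the data lie within 1 standard deviation of the mean,
95% of the data lie within 2 standard deviations of the mean, and
99.7% of the data lie within 3 standard deviations of the mean.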
LEARNING ACTIVITY 4
In exercises 1 to 3. Use the Empirical Rule to answer each question. In a normal distribution, what percent of
the data lie
1. within 3 standard deviations of the mean?
2. below 2 standard deviations of the mean?
3. between 2 standard deviations below the mean and 3 standard deviations above the mean?
In exercises 4 to 5. Use the Empirical Rule to answer each question. A baseball franchise finds that the
attendance at its home games is normally distributed, with a mean of 16,000 and a standard deviation of
4000.
4. What percent of the home games have an attendance between 12,000 and 20,000 people?
5. What percent of the home games have an attendance of fewer than 8000 people?
In exercises 6 to 9, find the area, to the nearest thousandth, of the standard normal distribution between the
given z-scores.
6. The region where
7. The region where
8. The region where
9. The region where
In exercises 10 to 13 . Find the area, to the nearest thousandth, of the standard normal distribution between
the given z-scores.
10. and
11. and
12. and
13. and
In exercises 14 to 15. A psychologist finds that the intelligence quotients of a group of patients are normally
distributed, with a mean of 102 and a standard deviation of 16. Find
the percent of the patients with IQs
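The linear correlation coefficient $r$ (Pearson $r$) is computed as
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
where:
$n$ = total number of pairs
$\sum x$ = sum of the values of $x$
$\sum y$ = sum of the values of $y$
$\sum xy$ = sum of the products of $x$ and $y$
$\sum x^2$ = sum of the squares of the values of $x$
$\sum y^2$ = sum of the squares of the values of $y$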
If the linear correlation coefficient $r$ is positive, the relationship between the variables has a positive
correlation. In this case, if one variable increases, the other variable also tends to increase. If $r$ is negative,
the linear relationship between the variables has a negative correlation. In this case, if one variable
increases, the other variable tends to decrease.
Figure 5.1 shows some scatter diagrams along with the type of linear correlation that exists between
the $x$ and $y$ variables. The closer $|r|$ is to 1, the stronger the linear relationship between the variables.
Example 1 Below are the scores of 12 college students in Mathematics and Physics tests of 80
items each.
Table 5.1
Mathematics (x)   65 63 67 64 68 62 70 66 68 67 69 71
Physics (y)       68 66 68 65 69 66 68 65 71 67 68 70
Solution
Step 1: Draw a scatter plot. If the scatter plot does not show any (linear) trend, stop the analysis and
conclude "no relationship". Otherwise proceed to Step 2.
[Scatter plot: Physics score (vertical axis, 64 to 72) against Mathematics score (horizontal axis, 60 to 72)]
The scatter plot indicates an upward linear trend between Mathematics and Physics proficiency. Thus, "there
is a reason to believe that they are related."
Step 2: Compute Pearson $r$ by arranging the given data in columns.
Table 5.2
Number    Mathematics (x)    Physics (y)    x²       y²       xy
1         65                 68             4225     4624     4420
2         63                 66             3969     4356     4158
3         67                 68             4489     4624     4556
4         64                 65             4096     4225     4160
5         68                 69             4624     4761     4692
6         62                 66             3844     4356     4092
7         70                 68             4900     4624     4760
8         66                 65             4356     4225     4290
9         68                 71             4624     5041     4828
10        67                 67             4489     4489     4489
11        69                 68             4761     4624     4692
12        71                 70             5041     4900     4970
N = 12    Σx = 800           Σy = 811       Σx² = 53418    Σy² = 54849    Σxy = 54107
$r = \dfrac{12(54107) - (800)(811)}{\sqrt{\left[12(53418) - (800)^2\right]\left[12(54849) - (811)^2\right]}}$
$r \approx 0.70$
Referring to the arbitrary scale for the interpretation of $r$, there is a strong (high) positive
relationship between the scores of the students in Mathematics and Physics.
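The computation of Pearson $r$ from the column totals can be sketched in Python. The sketch uses the 12 pairs of scores from Table 5.1 and the same computational formula as the worked solution; the function name `pearson_r` is chosen for this illustration.

```python
import math

# Scores from Table 5.1 (x = Mathematics, y = Physics).
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

def pearson_r(xs, ys):
    """Linear correlation coefficient computed from the column totals."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

r = pearson_r(math_scores, physics_scores)
print(round(r, 2))
```

Working from the totals keeps the code parallel to Table 5.2: each `sum_...` variable corresponds to one entry in the last row of the table.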
Your turn 1 Find the linear correlation coefficient for stride length versus speed of a camel as given
in Table 5.3 and interpret the result. Round your result to the nearest hundredth.
Table 5.3
Stride length(m) 2.5 3.0 3.2 3.4 3.5 3.8 4.0 4.2
Speed (m/s) 2.3 3.9 4.4 5.0 5.5 6.2 7.1 7.6
LINEAR REGRESSION
Regression is a term used to describe the process of estimating the relationship between two
variables. The relationship is estimated by fitting a straight line through the given data. The method of least
squares permits us to find a line of best fit called regression line which keeps the errors of prediction to a
minimum.
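$Y = a + bx$
where
$Y$ = predicted value
$a$ = y-intercept
$b$ = slope of the regression line
$x$ = the value of $x$ used for the prediction
The slope and the y-intercept are computed as
$b = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ and $a = \bar{y} - b\bar{x}$
where:
$\bar{y}$ = mean of the values of $y$
$\bar{x}$ = mean of the values of $x$
$\sum x$ = sum of the values of $x$
$\sum y$ = sum of the values of $y$
$\sum x^2$ = sum of the squares of the values of $x$
$\sum xy$ = sum of the products of $x$ and $y$
$n$ = total number of pairs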
Example 2 Find the regression line equation of Table 5.2 and predict the score in Physics if
the score in Mathematics of the student is 75.
Solution
Formulate the regression line equation by first solving for the values of $b$ and $a$.
Solving for $b$:
$b = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \dfrac{12(54107) - (800)(811)}{12(53418) - (800)^2} \approx 0.48$
Solving for $a$:
$a = \bar{y} - b\bar{x} \approx 35.59$
$Y = a + bx$
$y = 35.59 + 0.48x$ (regression line equation)
We can now estimate scores in Physics by substituting a score in Mathematics for $x$ in the regression
line equation. For instance, if $x$ is equal to 75, then solving for $y$ gives 71.59.
$y = 35.59 + 0.48(75)$
$y = 71.59$
Therefore, if the score in Mathematics is 75, the estimated score in Physics is 71.59, or approximately 72.
The regression line equation may now be used to estimate $y$ by substituting any
value of $x$.
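The slope, the y-intercept, and the prediction can be sketched in Python from the same 12 pairs of scores. This sketch keeps full precision throughout, so its intercept and prediction differ slightly from the hand computation, which rounds the slope to 0.48 before finding the intercept. The function name `least_squares_line` is chosen for this illustration.

```python
# Scores from Table 5.1 (x = Mathematics, y = Physics).
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

def least_squares_line(xs, ys):
    """Return (a, b) for the regression line Y = a + b*x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    spread = n * sum_x2 - sum_x ** 2
    if spread == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / spread
    intercept = sum_y / n - slope * (sum_x / n)   # a = y-bar - b * x-bar
    return intercept, slope

a, b = least_squares_line(math_scores, physics_scores)
predicted = a + b * 75   # estimated Physics score for a Mathematics score of 75

print(f"y = {a:.2f} + {b:.2f}x")
print(f"predicted Physics score at x = 75: {predicted:.2f}")
```

Both versions round to an estimated Physics score of about 72; the small gap in the intercept illustrates how rounding an intermediate value carries into later steps.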
Your turn 2 Find the regression line equation of Table 5.3 and predict the speed of a camel if
the stride length of the camel is 5.0.
Computer Solution
Using the data on the scores of 12 college students in Mathematics and Physics tests of 80 items
(Table 5.1), the following screenshot shows for the 12 paired values (occupying cells and cells
) as calculated by the spreadsheet's built-in PEARSON(), INTERCEPT(), and SLOPE() functions.
Note that the value of $r$ is slightly different from the value of $r$ in Example 1
because of rounding error.
LEARNING POINTS
Correlation is the degree of relationship between variables; it seeks to determine how well a linear or
other equation describes or explains the relationship between variables. It also implies "association"
between two variables.
Regression is a term used to describe the process of estimating the relationship between two
variables. The relationship is estimated by fitting a straight line through the given data. The method of least
squares permits us to find a line of best fit, called the regression line, which keeps the errors of prediction
to a minimum.
LEARNING ACTIVITY 5
Trigonometry 43 41 50 47 35 33 50 33 54
Geometry 48 45 47 43 33 28 48 31 57
3. The number of hours spent per week viewing television ( ) and the number of years of education ( ) were
recorded for ten randomly selected individuals. The results are given below;
12 14 11 16 16 18 12 20 10 12
10 9 15 8 5 4 20 4 16 15
REFERENCES
Blay et al., Mathematical Trips in the Modern World Outcomes-Based Approach
Nocon et al., Essential Mathematics for the Modern World
Baltazar et al., Mathematics in the Modern World
Aufmann, Richard, et al., Mathematics in the Modern World
Mathematics in the World book from RBSI
Paguio et al., Statistics with Computer Based Discussion
Photo credits:
Population vs sample, keydifference.com
Figure 4.1 A histogram for the frequency distribution, Aufmann, Richard, et al., Mathematics in the Modern World
3. a. In the list 3, 3, 3, 4, 4, 4, 5, 5, 5, 8, the numbers 3, 4, and 5 each occur three times, more often
than any other number. Thus 3, 4, and 5 are the modes.
b. In the list 12, 34, 12, 71, 48, 93, 71, the numbers 12 and 71 each occur twice, more often than the
others. Thus 12 and 71 are the modes.
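A short Python sketch shows how several modes can be found by counting. The helper name `modes` is an assumption introduced for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([3, 3, 3, 4, 4, 4, 5, 5, 5, 8]))  # [3, 4, 5]
print(modes([12, 34, 12, 71, 48, 93, 71]))    # [12, 71]
```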
4.
Machine 1 Machine 2
a. Range
Range = Range =
| ̅|
b.
2.
Data values: 5, 8, 16, 17, 18, 20
3. In Your turn 2, we found √ . Variance is the square of the standard deviation. Thus the variance
is (√ )
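The relationship between the standard deviation and the variance can be illustrated on the data values 5, 8, 16, 17, 18, 20 that appear in these answers. The sketch below prints both the population and the sample versions, since which one applies depends on how the exercise is worded. It is an illustration under that assumption, not a reproduction of the original computation.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [5, 8, 16, 17, 18, 20]

print(mean(data))                  # 14

# Treating the data as a whole population (divide by n)
print(round(pvariance(data), 2))   # 30.33
print(round(pstdev(data), 2))      # 5.51

# Treating the data as a sample (divide by n - 1)
print(round(variance(data), 2))    # 36.4
print(round(stdev(data), 2))       # 6.03
```

In both cases the variance is the square of the corresponding standard deviation.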
After arranging the data in ascending order, find the value that falls in the 5.2th position. Because this
position lies between two data values, interpolate: take the value in the 5th position and add 0.2 times the
difference between the values in the 6th and 5th positions. The value of the fourth decile is equal to 9.4.
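The interpolation step can be sketched in Python. The position rule k(n + 1)/10 is an assumption (it gives 5.2 for the fourth decile of 12 values), the function name `decile` is illustrative, and the data set is hypothetical rather than the one from the exercise.

```python
def decile(sorted_data, k):
    """k-th decile at position k(n + 1)/10, with linear interpolation."""
    n = len(sorted_data)
    position = k * (n + 1) / 10      # 1-based position, may be fractional
    lower = int(position)            # whole-number part of the position
    fraction = position - lower      # decimal part of the position
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    low_value = sorted_data[lower - 1]   # value in the lower position
    high_value = sorted_data[lower]      # value in the next position
    return low_value + fraction * (high_value - low_value)


# Hypothetical data, already in ascending order (12 values)
values = [3, 5, 6, 8, 10, 15, 16, 18, 20, 21, 23, 25]
d4 = decile(values, 4)   # position 5.2: 5th value plus 0.2 of the gap to the 6th
print(round(d4, 2))      # 11.0
```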
2.
These indicate that in comparison to her classmates, Cheryl did better on the second quiz than
she did on the first quiz.
3.
2. a. 0.76 lb is 1 standard deviation above the mean of 0.61 lb. In a normal distribution, 34% of all data lie
between the mean and 1 standard deviation above the mean, and 50% of all data lie below the mean. Thus
34% + 50% = 84% of the tomatoes weigh less than 0.76 lb.
b. 0.31 lb is 2 standard deviations below the mean of 0.61 lb. In a normal distribution, 47.5% of all data lie
between the mean and 2 standard deviations below the mean, and 50% of all data lie above the mean. This
gives a total of 47.5% + 50% = 97.5% of the tomatoes that weigh more than 0.31 lb.
Therefore
(97.5%)(6000) = (0.975)(6000) = 5850 of the tomatoes can be expected to weigh more than 0.31 lb.
c. 0.31 lb is 2 standard deviations below the mean of 0.61 lb and 0.91 lb is 2 standard deviations above the
mean of 0.61 lb. In a normal distribution, 95% of all data lie within 2 standard deviations of the mean.
Therefore (95%)(4500) = (0.95)(4500) = 4275 of the tomatoes can be expected to weigh from 0.31 lb to
0.91 lb.
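The three tomato answers follow one pattern: add the rounded empirical-rule percentages, then multiply by the shipment size. A minimal Python sketch of that arithmetic follows; the dictionary and variable names are assumptions introduced for illustration.

```python
# Share of data between the mean and k standard deviations on ONE side,
# using the rounded empirical-rule values quoted in the answers.
HALF_BAND = {1: 0.34, 2: 0.475}

below_one_sd_above = 0.50 + HALF_BAND[1]   # part a: weigh less than 0.76 lb
above_two_sd_below = 0.50 + HALF_BAND[2]   # part b: weigh more than 0.31 lb
within_two_sd = 2 * HALF_BAND[2]           # part c: from 0.31 lb to 0.91 lb

print(round(below_one_sd_above * 100, 1))  # 84.0 (percent)
print(round(above_two_sd_below * 6000))    # 5850 tomatoes
print(round(within_two_sd * 4500))         # 4275 tomatoes
```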
3. The area of the standard normal distribution between and is equal to the area between
and . The entry in Table 4.4 associated with is 0.249. Thus the area of the standard
normal distribution between and is 0.249 square unit.
4. Table 4.4 indicates that the area from to is 0.429 square unit. The area to the left of
is 0.500 square unit. Thus the area to the left of is 0.500 + 0.429 = 0.929 square unit.
5. Round z-scores to the nearest hundredth so you can use Table 4.4 .
a.
Table 4.4 indicates that 0.446 (44.6%) of the data in the standard normal distribution are between and
. The percent of the data to the right of is 50% - 44.6% = 5.4%. Approximately 5.4% of
professional football players have careers of more than 9 years.
b.
The probability that a professional football player chosen at random will have a career of between 3 and 4
years is about 0.078.
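The table lookups in answers 3 to 5 can be approximated in code. The sketch below assumes that Table 4.4 lists the area under the standard normal curve between 0 and a positive z, and computes that area with the error function from Python's `math` module. The function name `area_zero_to` and the z-score used are hypothetical.

```python
import math


def area_zero_to(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))


z = 1.0  # hypothetical z-score, not taken from the exercises

print(round(area_zero_to(z), 3))        # 0.341  area between 0 and z
print(round(0.5 + area_zero_to(z), 3))  # 0.841  area to the left of z
print(round(0.5 - area_zero_to(z), 3))  # 0.159  area to the right of z
```

By symmetry, the area between -z and 0 equals the area between 0 and z, which is why a table of positive z-scores is enough.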
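$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{8(151.66) - (27.6)(42.0)}{\sqrt{\left[8(97.38) - (27.6)^2\right]\left[8(241.72) - (42.0)^2\right]}}$$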
The linear correlation coefficient, rounded to the nearest hundredth, is 1.00. Referring to the arbitrary scale
for the interpretation of r, there is a perfect relationship between the stride length and the speed
of a camel.
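The correlation formula translates directly into code. The following Python sketch evaluates it for the Table 5.3 data; the function name `pearson_r` is an assumption introduced for illustration.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Table 5.3: stride length (m) and speed (m/s)
stride = [2.5, 3.0, 3.2, 3.4, 3.5, 3.8, 4.0, 4.2]
speed = [2.3, 3.9, 4.4, 5.0, 5.5, 6.2, 7.1, 7.6]

print(round(pearson_r(stride, speed), 2))  # 1.0
```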
2. Formulate the regression line equation by first solving for the values of b and a, using the sums from
Table 5.3: n = 8, Σx = 27.6, Σy = 42.0, Σxy = 151.66, and Σx² = 97.38.
Solving for b:
$$b = \frac{8(151.66) - (27.6)(42.0)}{8(97.38) - (27.6)^2} = \frac{54.08}{17.28} \approx 3.13$$
Solving for a:
$$a = \bar{y} - b\bar{x} = 5.25 - 3.13(3.45) \approx -5.55$$
Y = a + bx
y = -5.55 + 3.13x (regression line equation)
We can now estimate the speed of a camel by substituting its stride length into the regression line
equation. For instance, if x is equal to 5.0, then solving for y gives 10.10.
y = -5.55 + 3.13(5.0)
y = 10.10
Therefore, the estimated speed of a camel is about 10.10 m/s if its stride length is 5.0 m. The regression
line equation may now be used to estimate y for any value of x.