
Statistics 2025

The document outlines the course objectives and structure for a statistics course at the Estuary Academic and Strategic Institute, focusing on data collection, analysis, and interpretation. It includes a detailed course outline covering topics such as graphical representation, measures of central tendency, and hypothesis testing. Additionally, it provides specific objectives for students to effectively communicate data insights and utilize various data presentation methods.

Uploaded by

kerryntekie
Copyright
© © All Rights Reserved
Institut Universitaire et Stratégique de l'Estuaire
Estuary Academic and Strategic Institute (IUEs/Insam)
Under the academic supervision of the Universities of Buéa, Douala and Dschang

STATISTICS

NGA KEVIN

General objectives
At the end of this course the students shall be able to:
• Understand statistical concepts and terminology
• Develop skills in collecting, analyzing and interpreting data
• Apply statistical methods to real-world problems

COURSE OUTLINE
Statistics: 2 credits (30 hours); L, T, SPW
1. Graphical representation;
2. Central tendency and dispersion (mean, mode, median, variance and standard deviation, deciles, interquartile range);
3. Covariance;
4. Correlation coefficients and regression;
5. Least squares methods;
6. Estimation of mean and standard deviation;
7. Test of hypothesis;
8. Descriptive statistics.
CHAPTER 1
DATA PRESENTATION

Specific Objectives
At the end of this chapter the students shall be able to:
• Effectively communicate insights and findings
• Convey complex data in a clear and concise manner

DATA PRESENTATION
When data is collected, it has to be sorted, reorganized, and put in a readable form. Statistics makes use of tables, percentages, graphs, polygons, etc. to present data for easy interpretation.

Methods of data presentation
Grouped Frequency Distribution
When the data are too widely spread to fit in a simple frequency distribution table, a grouped frequency distribution table becomes necessary. For example, the results of 2000 students in an exam marked on 100 items cannot conveniently be analyzed with a simple (ungrouped) frequency distribution.

CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION
1. Identify the highest score (Hx) and the lowest score (Lx).
2. Range [extent]: the difference between the highest and the lowest score. $\text{Range} = H_x - L_x$.
3. Absolute frequency: the number of times a particular event occurs, i.e., its rate of occurrence.
4. Class limits: the extremes of a class interval, which must never be exceeded. There are two types, the lower and the upper class limits. The lower limits (LL) are the scores on the left of each class, while the upper limits (UL) border the class interval on the right. The lower limit of the first class is the lowest score, and the upper limit of the first class is obtained by $UL = LL + (i - 1)$, where LL = lower limit and i = class width, e.g., 20 + (10 - 1) = 29.

5. Class width or class size [a or i]
This is the difference between the upper true limit and the lower true limit of any class. It is calculated using either of the following formulae: $i = utl - ltl$ or $i = \frac{\text{range}}{\text{number of classes}}$, so that $\text{number of classes} = \frac{\text{range}}{i}$.

6. The class boundaries or true limits
Class boundaries lie 0.5 units below and above the class limits. The lower class boundary is obtained by subtracting 0.5 units from the lower class limit, and the upper class boundary is obtained by adding 0.5 units to the upper class limit.

7. The midpoints
The midpoint is the value that lies midway between the lower and upper limits of a class. It is often taken to represent the class interval as a single value. It is calculated as $\text{midpoint} = \frac{LL + UL}{2}$.

8. Class interval
This is the difference between the upper and lower class limits. In our case above, it can be calculated from any class using the formula $\text{class interval} = ucl - lcl$. The second class interval begins one unit above the upper limit of the preceding class interval. In our case above, the lower limit of the second class interval is 30, while the upper limit is obtained in the same manner as for the first: LL + (class width - 1) = 30 + (10 - 1) = 39.

9. Cumulative frequency
This is the accumulation of successive frequencies up to a certain level. It is calculated by adding the frequencies of all preceding classes to that of the current class (see the table below). There are two types: increasing and decreasing cumulative frequency. With increasing cumulative frequency, the successive addition runs from the lowest to the highest values, usually from the top of the table to the bottom; with decreasing cumulative frequency, it runs from the highest to the lowest values, usually from the bottom up. $cf = f + pf$, where cf is the cumulative frequency, f is the frequency of the current class and pf is the cumulative frequency of the preceding class.
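The construction steps above can be sketched in Python. This is only an illustrative sketch: the function name `build_grouped_table` and the dictionary keys are assumptions made for the example, and the class limits (20-29 up to 70-79) and frequencies are those of the chapter's running example. It assumes integer scores, so that the boundaries lie 0.5 units outside the class limits.

```python
def build_grouped_table(lower_limits, freqs, width):
    """Derive limits, boundaries, midpoints and cumulative frequencies per class.

    Hypothetical helper for illustration; assumes integer scores and equal widths.
    """
    total = sum(freqs)
    rows, running = [], 0
    for ll, f in zip(lower_limits, freqs):
        ul = ll + (width - 1)            # UL = LL + (i - 1)
        decreasing = total - running     # this class plus every class after it
        running += f                     # cf = f + pf
        rows.append({
            "limits": (ll, ul),
            "boundaries": (ll - 0.5, ul + 0.5),
            "midpoint": (ll + ul) / 2,
            "freq": f,
            "cum_increasing": running,
            "cum_decreasing": decreasing,
        })
    return rows


table = build_grouped_table([20, 30, 40, 50, 60, 70], [3, 6, 11, 8, 10, 7], width=10)
for row in table:
    print(row)
```

Keeping each class as one record makes it straightforward to add further columns, such as relative frequencies, by dividing by the total frequency.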
10. Relative frequency
This is the frequency of an event, incident or class expressed in relation to the total frequency. It is calculated as $RF = \frac{\text{frequency}}{\text{total absolute frequency}}$.

11. Percentage relative frequency
Sometimes the relative frequency is expressed as a percentage of the total absolute frequency. It is calculated as $\%RF = \frac{\text{frequency}}{\text{total absolute frequency}} \times 100$.

12. Relative cumulative frequency
This is the cumulative frequency expressed in relation to the total absolute frequency. It is calculated as $\frac{\text{cumulative frequency}}{\text{total absolute frequency}}$.

13. Relative percentage cumulative frequency
This is the cumulative frequency expressed as a percentage of the total absolute frequency. It is calculated using the formula $RCF = \frac{\text{cumulative frequency}}{\text{total absolute frequency}} \times 100$.

Following our example above, the following additional information can be obtained.

Class interval   Absolute frequency   Class boundaries   Midpoints   % Relative frequency   Increasing cum. freq.   Decreasing cum. freq.   % RCF
20-29            3
30-39            6
40-49            11
50-59            8
60-69            10
70-79            7
Total            45
Table 4: additional information on group frequency distribution

PICTORIAL PRESENTATION
Pictorial presentations are the most widely used in statistics because they are attractive, stylish, and easy to interpret. They include:
i. Line graph
ii. Bar chart
iii. Histogram
iv. Frequency polygon
v. Pictogram
vi. Pie chart, etc.

LINE GRAPH [ALSO CALLED STICK DIAGRAM]
A line graph is used to present discrete data on rectangular coordinate axes, with frequency on the vertical axis and scores on the horizontal axis.
Example: Construct a line graph with the following information.

Marks   Frequency
10      8
11      5
12      7
13      4
14      6
15      3
16      4
17      3
Total   40

BAR CHART
A bar chart is a graph of rectangular bars of different heights representing the frequencies or categories of data.
Bar charts are of three types: the simple, component, and compound bar charts, but statistics in education is only concerned with the simple bar chart.

The simple bar chart
Consider our example above.

Compound bar chart
A compound bar chart is one that allows us to compare changes between two or more variables, e.g., the table below gives the marks of students in an exam distributed between males and females.

Scores    Frequency
          Male   Female
0-20      40     20
21-40     50     30
41-60     40     50
61-80     40     70
81-100    10     40

A component bar chart, for its part, shows some characteristics of the variable in question. The table below gives the results of students in an HND exam according to their respective divisions of origin.
Scores    Frequency
          Momo   Mezam   Boyo   Donga-Mantung
0-20      10     20      2      5
21-40     10     10      5      8
41-60     15     10      5      5
61-80     20     10      5      12
81-100    5      5       0      0

HISTOGRAM
It is a graphical display of numerical data in the form of upright adjoining bars, whose width represents the class size and whose length represents the frequency. The histogram is only used for grouped data.

Scores   True limits   Frequency
0-4      -0.5-4.5      20
5-9      4.5-9.5       16
10-14    9.5-14.5      26
15-19    14.5-19.5     18
20-24    19.5-24.5     12

[Figure: histogram]

FREQUENCY POLYGON
A frequency polygon is a locus of points linking all the midpoints plotted against the frequencies. To construct a frequency polygon, either of the following methods can be used:
• Plot the midpoints against the frequencies, with frequency on the vertical axis and the midpoints on the horizontal axis.
• Construct a histogram first, then superimpose a frequency polygon on the same diagram by linking the midpoints of the tops of the bars with a line.

PIE CHART
This is a commonly used data presentation method whereby characteristics or attributes of variables are presented in terms of degrees of a circle. To construct a pie chart, we must first calculate, in degrees, the frequency proportion of each variable using the formula $\frac{\text{frequency}}{\sum f} \times 360$.

Scores   Frequency   Proportion in degrees
0-19     20          $\frac{20}{92} \times 360 = 78.26$
20-29    16          $\frac{16}{92} \times 360 = 62.61$
30-39    26          $\frac{26}{92} \times 360 = 101.74$
40-49    18          $\frac{18}{92} \times 360 = 70.43$
50-59    12          $\frac{12}{92} \times 360 = 46.96$
Total    92          360

Cumulative frequency curve / ogive
The ogive differs from the line graph and the frequency polygon in that, instead of the frequency, it uses the cumulative frequency values plotted against the scores for ungrouped data. For grouped data, the cumulative frequency is plotted against the class boundaries.

Cumulative frequency curve of ungrouped data
Using information from the table below, draw a cumulative frequency curve.

Marks   Freq   Cum. freq
10      8      8
11      5      13
12      7      20
13      4      24
14      6      30
15      3      33
16      4      37
17      3      40
Total   40

Cumulative frequency curve for grouped frequency
Use the information in the table below to draw a cumulative frequency curve. Use midpoints.

Scores   Frequency   Cumulative frequency
0-19     20          20
20-29    16          36
30-39    26          62
40-49    18          80
50-59    12          92
Total    92

END OF CHAPTER EXERCISES
HND 2021

Marks     No. of Candidates
0-9       10
10-19     25
20-29     45
30-39     65
40-49     80
50-59     70
60-69     55
70-79     30
80-89     15
90-99     5

(i) Compile the cumulative frequency table and draw the cumulative frequency curve.

CHAPTER 2
MEASURE OF CENTRAL TENDENCY

General objectives
At the end of this chapter the students shall be able to:
• Present a brief picture of data
• Perform data comparisons
• Support decision making
• Formulate policies

A measure of central tendency is a single score or attribute, calculated from a group of scores, that can be used to describe the group, e.g., the mean, mode and median.

The arithmetic mean
Sometimes called the average, it is a measure of central tendency obtained by dividing the sum total of the scores by the number of scores in the group.

THE MEAN OF UNGROUPED FREQUENCY DISTRIBUTION
The formula for a frequency distribution is slightly different from that for raw data. The formula is $\bar{x} = \frac{\sum fx}{\sum f}$, where $\bar{x}$ is the mean, f is the frequency, x is the score and fx is the frequency times the score. Find the mean of the scores below.

Marks (x)   Frequency (f)   fx
10          8               80
11          5               55
12          7               84
13          4               52
14          6               84
15          3               45
16          4               64
17          3               51
Total       40              515
Table 13: calculating the mean of ungrouped frequency distribution

$\sum f = 40$, $\sum fx = 515$
Using the formula, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{515}{40} = 12.88$

THE MEAN OF GROUPED FREQUENCY DISTRIBUTION
To obtain the mean of a grouped frequency distribution, the formula remains the same as in the ungrouped case, but we are required to calculate the midpoints (x) in order to use them to find fx, e.g., calculate the mean from the table below.
Scores   Frequency (f)   Midpoints (x)   fx
0-19     20              9.5             190
20-29    16              24.5            392
30-39    26              34.5            897
40-49    18              44.5            801
50-59    12              54.5            654
Total    92                              2934
Table 13.1: calculating the mean of grouped frequency distribution

$\sum fx = 2934$, $\sum f = 92$
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2934}{92} = 31.89$

MODE
The mode of a group of scores is the score that appears the highest number of times. In a frequency distribution table, it is the score with the highest frequency.

THE MODE OF UNGROUPED FREQUENCY DISTRIBUTION
The mode of an ungrouped frequency distribution is obtained by observation. It is the score(s) with the highest frequency, e.g., consider the table below.

Marks   Frequency   Cumulative frequency
10      8           8
11      5           13
12      7           20
13      4           24
14      6           30
15      3           33
16      4           37
17      3           40
Total   40
Table 14: estimating the mode of ungrouped data

By observation the mode is 10, because it has the highest frequency, 8. NB: the mode is not 8 but the score 10. Remember that a distribution can also be bimodal or multimodal, depending on how many scores share the highest frequency.

THE MODE OF GROUPED FREQUENCY
Unlike raw data and ungrouped frequency distributions, where the mode is observable, in a grouped frequency distribution the mode is obtained by use of the formula
$\text{Mode} = L + \left(\frac{f_z - f_i}{(f_z - f_i) + (f_z - f_t)}\right) i$, where
• L = lower class boundary of the modal class
• i = class width
• fz = frequency of the modal class
• fi = frequency of the class before the modal class
• ft = frequency of the class after the modal class

Using information from the table below, estimate the mode of the distribution.

Scores   True limits   Frequency   Midpoints   fx
0-19     -0.5-19.5     20          9.5         190
20-29    19.5-29.5     16          24.5        392
30-39    29.5-39.5     26          34.5        897
40-49    39.5-49.5     18          44.5        801
50-59    49.5-59.5     12          54.5        654
Total                  92                      2934
Table 14.1: estimating the mode of grouped data

The first thing to do is to identify the modal class, which is the class with the highest frequency. By observation it is 30-39, with a frequency of 26. Next, identify the values in the formula: L = 29.5, i = 39.5 - 29.5 = 10, fz = 26, fi = 16 and ft = 18.
$\text{Mode} = 29.5 + \left(\frac{26 - 16}{(26 - 16) + (26 - 18)}\right) \times 10 = 29.5 + \left(\frac{10}{10 + 8}\right) \times 10 = 35.06$

MEDIAN
The median score is the middle score in a distribution when it is arranged in ascending or descending order. In other words, it is the score that has 50% of the scores below it and 50% above it when the scores are ordered. The median of raw and ungrouped data can be obtained by observation or simple calculation, but the median of a grouped frequency distribution takes more work to obtain.

THE MEDIAN OF UNGROUPED FREQUENCY DISTRIBUTION
The median of an ungrouped frequency distribution is determined by use of the formula $\frac{1}{2}(\sum f + 1)$th score.
Example
Find the median from the distribution below.
Marks   Frequency   Cumulative frequency
10      8           8
11      5           13
12      7           20
13      4           24
14      6           30
15      3           33
16      4           37
17      3           40
Total   40
Table 15.1: the median of ungrouped data

$\sum f = 40$, so the median is the $\frac{1}{2}(\sum f + 1)$th score $= \frac{1}{2}(40 + 1) = 20.5$th score. From the cumulative frequency, the 20.5th score falls between 12 and 13. The median score is therefore $\frac{12 + 13}{2} = 12.5$.

THE MEDIAN OF GROUPED FREQUENCY
The median of a grouped frequency distribution is estimated using the formula
$\text{Median} = L + \left(\frac{\frac{N}{2} - cfb}{f_m}\right) i$
Where:
1. L = lower true limit of the median class
2. N = $\sum f$ = sum of the frequencies
3. i = class width or class size
4. cfb = cumulative frequency of the class before the median class
5. fm = frequency of the median class
The median class is located with the help of the formula $\frac{1}{2}(\sum f + 1)$th score.

Steps to follow
I. Determine the median class.
II. From the median class, pick out the unknowns in the formula.
III. Substitute the values in the formula.

Example
From the table below, find the median.

Scores   True limits   Frequency   Cum. frequency   Midpoints   fx
0-19     -0.5-19.5     20          20               9.5         190
20-29    19.5-29.5     16          36               24.5        392
30-39    29.5-39.5     26          62               34.5        897
40-49    39.5-49.5     18          80               44.5        801
50-59    49.5-59.5     12          92               54.5        654
Total                  92                                       2934
Table 15.2: the median of grouped data

Solution
Median class: $\frac{1}{2}(\sum f + 1) = \frac{1}{2}(92 + 1) = 46.5$. Going by the table above, the 46.5th position falls in the cumulative frequency 62, corresponding to the class 30-39. All the information needed in the formula is obtained in relation to the median class. If you fail to identify the median class, your answer cannot be correct.
I. L = 29.5
II. Class width (i) = 39.5 - 29.5 = 10
III. Frequency of the median class (fm) = 26
IV. Cumulative frequency of the class before the median class (cfb) = 36
Substitute the values above in the formula:
$\text{Median} = L + \left(\frac{\frac{N}{2} - cfb}{f_m}\right) i = 29.5 + \left(\frac{\frac{92}{2} - 36}{26}\right) \times 10 = 33.35$

END OF CHAPTER EXERCISES
HND 2020
2.1 A national examination in Mathematics was taken by 8479 candidates and the result is summarized in the grouped frequency distribution below.

Marks               0-10   11-20   21-30   31-40   41-50   51-60   61-70   71-80   81-90   91-100
No. of candidates   46     216     822     1057    1492    1683    1522    1011    522     108

It is decided that 62.2% of the candidates should pass.
a) Calculate the necessary pass mark, regarding the marks as a continuous variable.
b) Draw a cumulative frequency graph and hence estimate the range of marks
obtained by the central 80% of candidates.
c) If the pass mark had been fixed at 55%, how many candidates would pass?
(3 + 7 + 2 marks)

HND 2022
The monthly earnings of computer engineers on part-time employment in a certain year in a firm are shown in the following table:

Earnings (millions)     1-3   4-6   7-9   10-12   13-15
Number of workers (f)   2     4     8     5       1

a) Calculate the mean, variance and standard deviation, giving your answers correct to 2 decimal places.
(15 marks)
b) Calculate the median and mode of this data, leaving your answers correct to 2 decimal places.
(5 marks)

HND 2023
STATISTICS (30 marks)
1. An inquiry into the salaries of workers of an enterprise is concerned with a sample of 58 workers. The results of the inquiry are shown in the table below:

Hourly salary (CFA)   1440-1620   1620-1800   1800-1980   1980-2160   2160-2340   2340-2520   2520-2700
Number of workers     X           2           4           8           11          15          Y

a) Knowing that the average hourly salary of the 58 workers of the enterprise is 2302.76 CFA, determine the missing frequency values X and Y. (9 marks)
b) Determine the median salary of the 58 workers of the enterprise. (4 marks)

CHAPTER III
RELATIVE POSITION INDICATORS
(QUARTILES, DECILES, RANGE AND PERCENTILES)

Specific Objectives
At the end of this chapter, the student shall be able to:
• Learn the concept of the relative position of an element of a data set.
• Learn the meaning of each of two measures, the percentile rank and the decile.
• Learn the meaning of the three quartiles associated with a data set and how to compute them.

QUARTILES
Quartiles are three points that divide a ranked distribution into four equal parts, each containing a quarter of the distribution. In other words, when a distribution is divided into four equal parts, the three points of division are known as the first quartile (Q1), second quartile (Q2) and third quartile (Q3) respectively.

First quartile (Q1) or 25th percentile
The first quartile is the point in a set of ordered scores that has one quarter (1/4) or 25% of the scores below it and three quarters (3/4) or 75% of the scores above it.

Second quartile (Q2), 50th percentile, or median
The second quartile is the position in a set of ordered scores at which we have half (1/2) or 50% of the scores below it and half (1/2) or 50% above it.

Third quartile (Q3) or 75th percentile
The third quartile is the position in a group of ordered scores where we have three quarters (3/4) or 75% of the scores below it and one quarter (1/4) or 25% above it.

DETERMINING THE QUARTILES IN UNGROUPED FREQUENCY DISTRIBUTION
$Q_1 = \frac{N}{4} + \frac{1}{2}$ or $25\% \, N + \frac{1}{2}$
$Q_2 = \frac{N}{2} + \frac{1}{2}$ or $50\% \, N + \frac{1}{2}$
$Q_3 = \frac{3N}{4} + \frac{1}{2}$ or $75\% \, N + \frac{1}{2}$
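The position formulae above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, the data are the marks 10 to 17 used in the earlier chapters, and the rule for a fractional position (averaging the scores at the two neighbouring whole positions) is an assumption modelled on how the worked examples in these notes treat positions such as 20.5.

```python
from bisect import bisect_left
from itertools import accumulate
from math import ceil, floor


def score_at(position, scores, cum_freqs):
    """Score occupying a whole-numbered (1-based) position in the ordered data."""
    return scores[bisect_left(cum_freqs, position)]


def quartile(k, scores, freqs):
    """Return (position, value) of the k-th quartile of an ungrouped distribution."""
    cum = list(accumulate(freqs))          # increasing cumulative frequencies
    n = cum[-1]                            # N = sum of frequencies
    pos = k * n / 4 + 0.5                  # Qk position = kN/4 + 1/2
    low = score_at(floor(pos), scores, cum)
    high = score_at(ceil(pos), scores, cum)
    return pos, (low + high) / 2           # average the neighbouring scores


marks = [10, 11, 12, 13, 14, 15, 16, 17]
freqs = [8, 5, 7, 4, 6, 3, 4, 3]
for k in (1, 2, 3):
    print(k, quartile(k, marks, freqs))
```

For these marks the second quartile comes out as 12.5, the same value obtained for the median of this distribution in Chapter 2.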
The formulae remain the same but the procedure changes. N here stands for $\sum f$, and the cumulative frequency becomes very important. Consider the example below.

Example 1
Find the first, second and third quartiles in the distribution below.

Score   Frequency   Cumulative frequency
2       2           2
4       3           5
6       6           11
8       5           16
10      3           19
12      4           23
14      2           25
Total   ∑f = 25
Table 16: determining the quartiles of ungrouped data

Solution
$Q_1 = \frac{N}{4} + \frac{1}{2} = \frac{25}{4} + \frac{1}{2} = 6.75$th position. The 6.75th position lies in cumulative frequency 11, therefore the first quartile Q1 = 6.
$Q_2 = \frac{N}{2} + \frac{1}{2} = \frac{25}{2} + \frac{1}{2} = 13$th position. The 13th score lies inside cumulative frequency 16, therefore Q2 = 8.
$Q_3 = \frac{3N}{4} + \frac{1}{2} = \frac{75}{4} + \frac{1}{2} = 19.25$th position. The 19.25th position lies between cumulative frequencies 19 and 23, therefore Q3 lies between the scores 10 and 12.
$Q_3 = \frac{10 + 12}{2} = 11$

DETERMINING THE QUARTILES IN A GROUPED FREQUENCY DISTRIBUTION
To obtain the values of the quartiles in a grouped frequency distribution, the formulae and procedure are modified. The first thing to do is to determine the quartile class using the formula $\frac{1}{4}(\sum f + 1)$ for the first quartile, $\frac{1}{2}(\sum f + 1)$ for the second quartile and $\frac{3}{4}(\sum f + 1)$ for Q3. After determining the quartile class, the formulae below are used to get the exact quartile values.
$Q_x = L + \left(\frac{\frac{x}{4}N - cfb}{f_m}\right) i$
Where:
1. L = lower true limit of the quartile class
2. N = $\sum f$ = sum of the frequencies
3. cfb = cumulative frequency of the class before the quartile class
4. fm = frequency of the quartile class

$Q_1 = L + \left(\frac{\frac{1}{4}N - cfb}{f_m}\right) i$
$Q_2 = L + \left(\frac{\frac{2}{4}N - cfb}{f_m}\right) i$
$Q_3 = L + \left(\frac{\frac{3}{4}N - cfb}{f_m}\right) i$

NB: Q2 is also the median, thus whenever you are asked to calculate Q2 you can always use the value of the median. For this reason, we will dwell more on the first and third quartiles, having in mind that the second quartile has been extensively treated under the median.

Example 1
Using information from the table below, find the first, second and third quartiles of the distribution.

Marks   f         cf   True limit   Midpoints   fx
5-10    2         2    4.5-10.5     7.5         15
11-16   8         10   10.5-16.5    13.5        108
17-22   14        24   16.5-22.5    19.5        273
23-28   16        40   22.5-28.5    25.5        408
29-34   7         47   28.5-34.5    31.5        220.5
35-40   3         50   34.5-40.5    37.5        112.5
Total   ∑f = 50                                 ∑fx = 1137
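The grouped-quartile procedure (locate the quartile class, then interpolate inside it) can be sketched as below. The function name `grouped_quantile` and its parameters are assumptions made for illustration; the sketch uses the marks table above, assumes equal class widths and integer class limits (so that the lower true limit is the lower limit minus 0.5), and follows the convention of these notes of locating the class with the position based on (∑f + 1) while interpolating with N.

```python
from itertools import accumulate


def grouped_quantile(fraction, lower_limits, freqs, width):
    """Interpolated quantile of a grouped distribution (illustrative sketch).

    fraction is 1/4, 1/2 or 3/4 for Q1, Q2 and Q3 respectively.
    """
    cum = list(accumulate(freqs))
    n = cum[-1]
    position = fraction * (n + 1)                     # locates the quartile class
    idx = next(i for i, c in enumerate(cum) if c >= position)
    lower_true_limit = lower_limits[idx] - 0.5        # L
    cfb = cum[idx - 1] if idx > 0 else 0              # cum. freq. before the class
    fm = freqs[idx]                                   # frequency of the class
    return lower_true_limit + (fraction * n - cfb) / fm * width


lower_limits = [5, 11, 17, 23, 29, 35]
freqs = [2, 8, 14, 16, 7, 3]
q1 = grouped_quantile(0.25, lower_limits, freqs, width=6)
q2 = grouped_quantile(0.50, lower_limits, freqs, width=6)
q3 = grouped_quantile(0.75, lower_limits, freqs, width=6)
print(round(q1, 3), round(q2, 3), round(q3, 3))
```

Because Q2 is the median, calling the same function with a fraction of 1/2 also gives the grouped median.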
Table 16.1: determining the quartiles of grouped data

Solution
FIRST QUARTILE, LOWER QUARTILE OR 25TH PERCENTILE
This is the score below which lie 1/4 or 25% of the scores in an ordered distribution. It is obtained using the formula
$Q_1 = L + \left(\frac{\frac{1}{4}N - cfb}{f_m}\right) i$
1. The first thing to do is to identify the class in which the first quartile Q1 lies, using the formula $\frac{1}{4}(\sum f + 1) = \frac{1}{4} \times 51 = 12.75$th position. The 12.75th position lies in the cumulative frequency 24, which corresponds to the class 17-22.
2. The second thing to do is to identify the values in the formula:
• L = lower true limit of the quartile class = 16.5
• N = $\sum f$ = sum of the frequencies = 50
• cfb = cumulative frequency of the class before the quartile class = 10
• The class width (i) = 6
• fm = frequency of the quartile class = 14
3. The third thing to do is to substitute into the formula:
$Q_1 = 16.5 + \left(\frac{\frac{1}{4} \times 50 - 10}{14}\right) \times 6 = 16.5 + \left(\frac{2.5}{14}\right) \times 6 = 16.5 + 1.071 = 17.571$

The second quartile, median, or 50th percentile
The first thing to do is to identify the second quartile class using the formula Q2 class $= \frac{1}{2}(\sum f + 1) = \frac{1}{2}(50 + 1) = 25.5$th position. The 25.5th position lies in the cumulative frequency 40, corresponding to the class 23-28.
The median class is 23-28
Lower true limit (L) = 22.5
Class width (i) = 6
Frequency of the median class (fm) = 16
Cumulative frequency before (cfb) = 24
$Q_2 = L + \left(\frac{\frac{1}{2}N - cfb}{f_m}\right) i = 22.5 + \left(\frac{\frac{1}{2} \times 50 - 24}{16}\right) \times 6 = 22.5 + \left(\frac{1}{16}\right) \times 6 = 22.875$

The third quartile, upper quartile or the 75th percentile
The third quartile is the score below which lie 75 percent or 3/4 of the scores when ordered. To determine it, we start by identifying the class in which this value falls, using the formula $\frac{3}{4}(\sum f + 1) = \frac{3}{4} \times 51 = 38.25$th position, which lies in the cumulative frequency 40; the class is 23-28.
The second thing to do is to identify the values in the formula
$Q_3 = L + \left(\frac{\frac{3}{4}N - cfb}{f_m}\right) i$
• L = lower true limit of the quartile class = 22.5
• N = $\sum f$ = sum of the frequencies = 50
• cfb = cumulative frequency of the class before the quartile class = 24
• The class width (i) = 6
• fm = frequency of the quartile class = 16
$Q_3 = 22.5 + \left(\frac{\frac{3}{4} \times 50 - 24}{16}\right) \times 6 = 22.5 + \left(\frac{13.5}{16}\right) \times 6 = 22.5 + 5.0625 = 27.56$

PERCENTILES
When the number of scores becomes so large that dividing them using the quartiles or deciles no longer gives the desired proportions, finer position indicators like percentiles become necessary. The first percentile, denoted by P1, is the point or score below which lie 1% of the scores and above which lie 99% of the scores; the second percentile (P2) is the position or score below which lie 2% of the scores and above which lie 98% of the scores. In the same vein, the 99th
percentile (P99) is the score below which lie 99% of the scores and above which lie 1% of the scores.

DETERMINING THE PERCENTILES OF UNGROUPED FREQUENCY DISTRIBUTION
To determine the percentiles of ungrouped data, the following formula gives the position:
$P_x = \frac{x}{100} \sum f + \frac{1}{2}$
Where x can take any value from 1, 2, 3, 4, ..., 99 and $\sum f$ = sum of the frequencies.

Example
Estimate the 1st, 5th, 15th, 50th and 75th percentiles of the scores below.

Score   Frequency   Cumulative frequency
2       12          12
4       13          25
6       26          51
8       25          76
10      33          109
12      34          143
14      27          170
Total   ∑f = 170

Solution
P1 is the point or score below which lie 1% of the scores and above which lie 99% of the scores.
$P_1 = \frac{1}{100} \sum f + \frac{1}{2} = \frac{1}{100} \times 170 + \frac{1}{2} = 1.7 + \frac{1}{2} = 2.2$th position.
The 2.2th position falls within the cumulative frequency 12, corresponding to the score 2. P1 = 2.

P5 is the score or point below which lie 5% of the scores and above which lie 95% of the scores.
$P_5 = \frac{5}{100} \sum f + \frac{1}{2} = \frac{5}{100} \times 170 + \frac{1}{2} = 8.5 + \frac{1}{2} = 9$th position,
which falls in the cumulative frequency 12, corresponding to the score 2.

P15 is the point below which lie 15% of the scores and above which lie 85% of the scores.
$P_{15} = \frac{15}{100} \times 170 + \frac{1}{2} = 26$th position. The 26th position lies in the cumulative frequency 51, corresponding to the score 6.

At P50 there are 50% of the scores below it and 50% of the scores above it.
$P_{50} = \frac{50}{100} \sum f + \frac{1}{2} = \frac{50}{100} \times 170 + \frac{1}{2} = 85.5$th position. The 85.5th position falls in the cumulative frequency 109, corresponding to the score 10.

At P75 there are 75% of the scores below it and 25% of the scores above it.
$P_{75} = \frac{75}{100} \sum f + \frac{1}{2} = \frac{75}{100} \times 170 + \frac{1}{2} = 128$th position. The 128th position falls in the cumulative frequency 143, corresponding to the score 12.

DETERMINING THE PERCENTILES OF GROUPED DATA
To determine the percentiles of a grouped frequency distribution, the following formula is necessary:
$P_x = L + \left(\frac{\frac{x}{100}N - cfb}{f_m}\right) i$
Where x can take any value from 1, 2, 3, ..., 99
• L = lower true limit of the percentile class
• N = $\sum f$ = sum of the frequencies
• cfb = cumulative frequency of the class before the percentile class
• i = the class width
• fm = frequency of the percentile class

Example
Find P4, P20, P45 and P90 from the scores below.

Marks   Frequency   Cf    True limit   Midpoints   fx
5-10    20          20    4.5-10.5     7.5         150
11-16   36          56    10.5-16.5    13.5        486
17-22   44          100   16.5-22.5    19.5        858
23-28   60          160   22.5-28.5    25.5        1530
29-34   37          197   28.5-34.5    31.5        1165.5
35-40   33          230   34.5-40.5    37.5        1237.5
Total   ∑f = 230                                   ∑fx = 5427

Steps to follow
1. Determine the class in which the percentile falls by using the formula $P_x \text{ class} = \frac{x}{100} \sum f$ (position).
2. Identify all the values in the formula $P_x = L + \left(\frac{\frac{x}{100}N - cfb}{f_m}\right) i$.
3. Substitute the values in the formula to obtain the exact value.

Solution
$P_4 \text{ class} = \frac{4}{100} \times 230 = 9.2$th position, which falls in the cumulative frequency 20, corresponding to the class 5-10.
• L = lower true limit of the percentile class = 4.5
• N = $\sum f$ = sum of the frequencies = 230
• cfb = cumulative frequency of the class before the percentile class = 0
• The class width (i) = 6
• fm = frequency of the percentile class = 20
$P_4 = 4.5 + \left(\frac{\frac{4}{100} \times 230 - 0}{20}\right) \times 6 = 4.5 + 2.76 = 7.26$

At P20 there are 20% of the scores below and 80% above.
$P_{20} \text{ class} = \frac{20}{100} \times 230 = 46$th position. The 46th position falls in the cumulative frequency 56, corresponding to the class 11-16.
• L = lower true limit of the percentile class = 10.5
• N = $\sum f$ = sum of the frequencies = 230
• cfb = cumulative frequency of the class before the percentile class = 20
• The class width (i) = 6
• fm = frequency of the percentile class = 36
$P_{20} = 10.5 + \left(\frac{\frac{20}{100} \times 230 - 20}{36}\right) \times 6 = 10.5 + 4.33 = 14.83$

The 45th percentile has 45% of the scores below it and 55% of the scores above it. It is calculated thus:
$P_{45} \text{ class} = \frac{45}{100} \times 230 = 103.5$th score. The cumulative frequency 100 is reached at the end of the class 17-22, so the 103.5th score lies in the cumulative frequency 160, corresponding to the class 23-28.
• L = lower true limit of the percentile class = 22.5
• N = $\sum f$ = sum of the frequencies = 230
• cfb = cumulative frequency of the class before the percentile class = 100
• The class width (i) = 6
• fm = frequency of the percentile class = 60
$P_{45} = 22.5 + \left(\frac{\frac{45}{100} \times 230 - 100}{60}\right) \times 6 = 22.5 + 0.35 = 22.85$

At P90, 90% of the scores lie below while 10% lie above.
$P_{90} \text{ class} = \frac{90}{100} \times 230 = 207$th score, which lies in the cumulative frequency 230, corresponding to the class 35-40.
• L = lower true limit of the percentile class = 34.5
• N = $\sum f$ = sum of the frequencies = 230
• cfb = cumulative frequency of the class before the percentile class = 197
• The class width (i) = 6
• fm = frequency of the percentile class = 33
90 28−22.5
𝑁 − 𝑐𝑓𝑏 100 + ( ) 60
6
𝑃90 = 𝐿 + (100 )𝑖 𝑃𝑅28 = ( ) 100
𝑓𝑚 230
90
230 − 197 PR28 = 67.39
𝑃90 = 34.5 + ( 100
)6 PR40 class =35-40
33  L =lower true limit of the percentile class = 34.5
𝑃90 = 34.5 + 1.81 = 36.31  N = ∑𝑓 = 𝑠𝑢𝑚 𝑜𝑓 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 230
The percentile ranks  Cfb cumulative frequency of the class before the percentile class = 197
The scores in a distribution may be transformed from its original scale to a  The class width (i) = 6
percentile range. When this is done, scores are no longer weighted according to their original scale. E.g., a score of 19 out of 20 is no longer read on that original scale but on a percentile scale. To find the percentile rank of a score, we use the following formula

$$PR_x = \left(\frac{cf_b + \left(\frac{x - L}{i}\right) f_m}{N}\right) \times 100$$

Where:
• x is the score whose percentile rank you wish to establish
• cf_b is the cumulative frequency before the class containing the score
• L is the lower true limit of the class that contains the score
• i is the class width
• N is the sum of frequencies, i.e. the total number of scores
• f_m is the frequency of the class containing the score

For example, determine the percentile rank of 28 and 40 in the example above.
Solution
For the score 28, the percentile class is 23–28:
• L = lower true limit of the percentile class = 22.5
• N = ∑f = sum of frequencies = 230
• cf_b = cumulative frequency of the class before the percentile class = 100
• i = class width = 6
• f_m = frequency of the percentile class = 60

$$PR_{28} = \left(\frac{100 + \left(\frac{28 - 22.5}{6}\right) 60}{230}\right) \times 100 = 67.4$$

For the score 40, the percentile class is 35–40:
• L = 34.5
• N = 230
• cf_b = 197
• i = 6
• f_m = frequency of the percentile class = 33

$$PR_{40} = \left(\frac{197 + \left(\frac{40 - 34.5}{6}\right) 33}{230}\right) \times 100 = 98.8$$

Relationship between quartiles, deciles and percentiles
The quartiles correspond to particular decile and percentile positions. For instance, the first quartile Q1 equals the 2.5th decile and the 25th percentile; the second quartile Q2 equals the 5th decile, the median and the 50th percentile; the third quartile Q3 equals the 7.5th decile and the 75th percentile.

END OF CHAPTER EXERCISES
2. STATISTICS (30 Marks)
2.1 The table shows the marks, collected into groups, for 400 candidates in an HND examination. The maximum mark was 99.

Marks               0–9   10–19   20–29   30–39   40–49   50–59   60–69   70–79   80–89   90–99
No. of candidates   10    25      45      65      80      70      55      30      15      5

(i) Compile the cumulative frequency table and draw the cumulative frequency curve.
Use your curve to estimate:
(ii) The median.
(iii) The 30th percentile.
(iv) If the minimum mark for Grade A was fixed at 74, estimate from your curve the percentage of candidates who will obtain Grade A.
(6 + 4 + 4 + 4 marks)
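The percentile-rank formula used earlier in this chapter can be checked with a short script. This is a minimal sketch, not part of the original notes: the function name `percentile_rank` and its parameter names are illustrative, and the inputs are the class values quoted in the worked example (N = 230, class width 6).

```python
def percentile_rank(x, lower_limit, width, cum_freq_before, class_freq, total):
    """Percentile rank of score x located inside one class of a grouped table.

    Implements PR_x = ((cf_b + ((x - L) / i) * f_m) / N) * 100.
    """
    # Share of the class frequency that lies below x (linear interpolation)
    within_class = (x - lower_limit) / width * class_freq
    return (cum_freq_before + within_class) / total * 100


# Score 28 lies in class 23-28 (L = 22.5, cf_b = 100, f_m = 60)
pr28 = percentile_rank(28, 22.5, 6, 100, 60, 230)
# Score 40 lies in class 35-40 (L = 34.5, cf_b = 197, f_m = 33)
pr40 = percentile_rank(40, 34.5, 6, 197, 33, 230)

print(round(pr28, 1), round(pr40, 1))
```

The interpolation step assumes scores are spread evenly inside each class, which is the same assumption the hand formula makes.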
CHAPTER 4
MEASURES OF DISPERSION
Specific objectives
At the end of this chapter the student shall be able to
• Judge the reliability of measures of central tendency
• Make a comparative study of the variability of two variables
• Identify the causes of variability with a view to controlling it

Measures of central tendency give us a single value that lets us describe and compare groups of scores. Measures of dispersion show how spread out, scattered or clustered the scores in a group are, that is, how far scores lie from one another or from the central point. There are many measures of dispersion, but for the purposes of this course we shall look at:
• The range
• The interquartile range
• The semi-interquartile range
• The mean deviation
• The variance
• The standard deviation

THE RANGE
The range is the simplest measure of dispersion. For raw data and ungrouped data it is highest score − lowest score; for grouped data it is highest midpoint − lowest midpoint.
Example 1: find the range of the distribution below: 19, 18, 20, 11, 12, 21, 22, 13, 15, 16, 14, 27, 18, 25, 23, 24, 24, 23, 26, 26
Highest score = 27
Lowest score = 11
Range = 27 – 11 = 16

Example 2: find the range of the following distribution.

Score    Frequency    Cumulative frequency
2        2            2
4        3            5
6        6            11
8        5            16
10       3            19
Total    19
Table 19: the range of ungrouped data

Range = Hx – Lx
Highest score (Hx) = 10
Lowest score (Lx) = 2
Range = 10 – 2 = 8

Example 3: find the range of the following grouped distribution.

Marks    f     cf     True limits    Midpoint (x)    fx
5–10     20    20     4.5–10.5       7.5             150
11–16    36    56     10.5–16.5      13.5            486
17–22    44    100    16.5–22.5      19.5            858
23–28    60    160    22.5–28.5      25.5            1530
29–34    37    197    28.5–34.5      31.5            1165.5
35–40    33    230    34.5–40.5      37.5            1237.5
Total    ∑f = 230                                    ∑fx = 5427
Table 19.1: the range of grouped data

Highest midpoint = 37.5
Lowest midpoint = 7.5
Range = highest midpoint – lowest midpoint
Range = 37.5 – 7.5 = 30

THE INTERQUARTILE RANGE
The interquartile range gives the range within which the middle 50% of the scores in an ordered distribution are found. In other words, it is the spread of the central block of scores, which has 25% of the ordered scores below it and 25% above it. It is calculated as Q3 − Q1, where Q1 is the first (lower) quartile and Q3 is the third (upper) quartile. In our earlier example, Q3 = 27.56 and Q1 = 17.57, so the interquartile range is IQR = 27.56 − 17.57 = 9.99.

SEMI-INTERQUARTILE RANGE
This is simply half of the interquartile range. It is calculated as

$$SIQR = \frac{1}{2}(Q_3 - Q_1)$$

In our example above, SIQR = ½ × 9.99 = 4.995.

MEAN DEVIATION
The mean deviation measures the average distance of the scores in a distribution from the mean of that distribution. In other words, it measures how far, on average, each score lies from the mean.

MEAN DEVIATION OF RAW DATA
Example
Find the mean deviation of the following scores:
6, 6, 7, 8, 8, 9, 10, 12, 12, 13, 14, 14, 14
Step 1: calculate the mean = (6+6+7+8+8+9+10+12+12+13+14+14+14)/13 = 133/13 = 10.23
Step 2: find the deviation of each score from the mean and take its absolute value

Score (X)   Mean (x̄)   X − x̄     |X − x̄|
6           10.23      −4.23     4.23
6           10.23      −4.23     4.23
7           10.23      −3.23     3.23
8           10.23      −2.23     2.23
8           10.23      −2.23     2.23
9           10.23      −1.23     1.23
10          10.23      −0.23     0.23
12          10.23      1.77      1.77
12          10.23      1.77      1.77
13          10.23      2.77      2.77
14          10.23      3.77      3.77
14          10.23      3.77      3.77
14          10.23      3.77      3.77
Total                            ∑|X − x̄| = 35.23
Table 20: determining the mean deviation of ungrouped data

$$\text{mean deviation} = \frac{\sum |X - \bar{x}|}{N} = \frac{35.23}{13} = 2.71$$

MEAN DEVIATION FROM UNGROUPED FREQUENCY DISTRIBUTION
The data above can be summarized in a frequency table as follows

Score (X)   Frequency (f)   X − x̄     |X − x̄|   f|X − x̄|
6           2               −4.23     4.23      8.46
7           1               −3.23     3.23      3.23
8           2               −2.23     2.23      4.46
9           1               −1.23     1.23      1.23
10          1               −0.23     0.23      0.23
11          0               0.77      0.77      0
12          2               1.77      1.77      3.54
13          1               2.77      2.77      2.77
14          3               3.77      3.77      11.31
Total       13                                  ∑f|X − x̄| = 35.23
Table 20.1: mean deviation of ungrouped frequency distribution

$$\text{mean deviation} = \frac{\sum f|X - \bar{x}|}{\sum f} = \frac{35.23}{13} = 2.71$$

MEAN DEVIATION OF GROUPED FREQUENCY
To obtain the mean deviation of a grouped frequency distribution we use the same formula, but the procedure differs slightly from the ungrouped case: the class midpoint stands in for every score in the class.
For example, the scores above can be grouped as follows

Scores   Frequency (f)   Midpoint (X)   fX     |X − x̄|   f|X − x̄|
6–8      5               7              35     3.23      16.15
9–11     2               10             20     0.23      0.46
12–14    6               13             78     2.77      16.62
Total    13                             133              33.23
Table 20.2: mean deviation of grouped data

$$\bar{x} = \frac{\sum fX}{\sum f} = \frac{133}{13} = 10.23$$
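The mean-deviation calculations in this section (raw scores, ungrouped frequency table, grouped table) can be cross-checked with one small function. This is a sketch under stated assumptions: the helper name `mean_deviation` is illustrative, and the raw list is the thirteen scores as they appear in the worked table (6, 6, 7, 8, 8, 9, 10, 12, 12, 13, 14, 14, 14), which give the mean 10.23.

```python
def mean_deviation(values, freqs=None):
    """Mean absolute deviation from the mean: sum(f * |x - mean|) / sum(f).

    If freqs is omitted, every value counts once (raw data).
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


# Raw scores, taken from the rows of the worked table (assumption noted above)
raw = [6, 6, 7, 8, 8, 9, 10, 12, 12, 13, 14, 14, 14]
md_raw = mean_deviation(raw)

# The same data as an ungrouped frequency table
md_table = mean_deviation([6, 7, 8, 9, 10, 11, 12, 13, 14],
                          [2, 1, 2, 1, 1, 0, 2, 1, 3])

# Grouped into classes 6-8, 9-11, 12-14, represented by their midpoints
md_grouped = mean_deviation([7, 10, 13], [5, 2, 6])

print(round(md_raw, 2), round(md_table, 2), round(md_grouped, 2))
```

The raw and ungrouped results agree because the frequency table is only a compact way of writing the same scores; the grouped result differs because each score is replaced by its class midpoint.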
$$\text{mean deviation} = \frac{\sum f|X - \bar{x}|}{\sum f} = \frac{33.23}{13} = 2.56$$

NB: the value may not be exactly the same as for the ungrouped frequency distribution, because in a grouped distribution the midpoints are used to represent the scores. The difference is usually small.

VARIANCE
This is a measure of dispersion that gives the average squared distance of the scores from the mean of a distribution. In other words, the variance is the mean (average) of the squared deviations. To obtain it, we square the deviations so that all values are positive, sum them, and divide by the total frequency. It is calculated using the formulae below.

Formula 1: the deviation formula
a) Formula for raw data

$$S^2 = \frac{\sum (x - \bar{x})^2}{N}$$

Where x = scores
x̄ = the mean
N = ∑f = sum of frequencies
S² = the variance

b) Formula for ungrouped and grouped frequency distributions

$$S^2 = \frac{\sum f(x - \bar{x})^2}{\sum f}$$

Formula 2: the raw score formula
a) Formula for raw data

$$S^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2, \quad \text{and since } \frac{\sum x}{N} = \bar{x}, \quad S^2 = \frac{\sum x^2}{N} - \bar{x}^2$$

b) Formula for ungrouped and grouped frequency distributions

$$S^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2, \quad \text{and since } \frac{\sum f x}{\sum f} = \bar{x}, \quad S^2 = \frac{\sum f x^2}{\sum f} - \bar{x}^2$$

STANDARD DEVIATION
This is a measure of dispersion that shows how spread out the scores in a set are around the mean. The standard deviation is obtained by taking the square root of the variance:

$$S = \sqrt{\text{variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$

To obtain the standard deviation we must therefore first find the variance and then take its square root.

Calculating the variance and standard deviation of raw scores
To calculate the variance and standard deviation of raw scores, we use either the deviation formula or the raw score formula. E.g., find the variance and standard deviation of the scores below:
6, 6, 7, 8, 8, 9, 10, 12, 12, 13, 14, 14, 14

Using the deviation formula
Step 1: calculate the mean = (6+6+7+8+8+9+10+12+12+13+14+14+14)/13 = 133/13 = 10.23
Step 2: calculate and sum the squares of the deviations

Score (X)   Mean (x̄)   X − x̄     (X − x̄)²
6           10.23      −4.23     17.89
6           10.23      −4.23     17.89
7           10.23      −3.23     10.43
8           10.23      −2.23     4.97
8           10.23      −2.23     4.97
9           10.23      −1.23     1.51
10          10.23      −0.23     0.05
12          10.23      1.77      3.13
12          10.23      1.77      3.13
13          10.23      2.77      7.67
14          10.23      3.77      14.21
14          10.23      3.77      14.21
14          10.23      3.77      14.21
Total                            ∑(X − x̄)² = 114.27
Table 21.0: determining the variance and standard deviation of raw data

$$S^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{114.27}{13} = 8.79$$

S = √8.79 = 2.96

Using the raw score formula
Step 1: calculate the mean = (6+6+7+8+8+9+10+12+12+13+14+14+14)/13 = 133/13 = 10.23
Step 2: square each score and sum the squares

Score (x)   x²
6           36
6           36
7           49
8           64
8           64
9           81
10          100
12          144
12          144
13          169
14          196
14          196
14          196
Total       ∑x² = 1475
Table 21.1: using the raw score formula

$$S^2 = \frac{\sum x^2}{N} - \bar{x}^2 = \frac{1475}{13} - \left(\frac{133}{13}\right)^2 = 113.46 - 104.67 = 8.79$$

$$S = \sqrt{\frac{\sum x^2}{N} - \bar{x}^2} = \sqrt{\text{variance}} = \sqrt{8.79} = 2.96$$

Calculating the variance and standard deviation for ungrouped frequency distribution
Find the variance and standard deviation from the table below.

Score (X)   f     X − x̄     (X − x̄)²   f(X − x̄)²
6           2     −4.23     17.89      35.78
7           1     −3.23     10.43      10.43
8           2     −2.23     4.97       9.94
9           1     −1.23     1.51       1.51
10          1     −0.23     0.05       0.05
11          0     0.77      0.59       0
12          2     1.77      3.13       6.26
13          1     2.77      7.67       7.67
14          3     3.77      14.21      42.63
Total       13                         ∑f(X − x̄)² = 114.27
Table 21.2: calculating the variance and standard deviation for ungrouped frequency distribution

Using the deviation formula

$$S^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{114.27}{13} = 8.79$$

$$S = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\text{variance}} = \sqrt{8.79} = 2.96$$

Using the raw score formula

Score (x)   Frequency (f)   x²     fx²
6           2               36     72
7           1               49     49
8           2               64     128
9           1               81     81
10          1               100    100
11          0               121    0
12          2               144    288
13          1               169    169
14          3               196    588
Total       ∑f = 13                ∑fx² = 1475
Table 21.3: using the raw score formula

$$S^2 = \frac{\sum f x^2}{\sum f} - \bar{x}^2 = \frac{1475}{13} - \left(\frac{133}{13}\right)^2 = 113.46 - 104.67 = 8.79$$

S = √8.79 = 2.96

Calculating the variance and standard deviation for grouped frequency
With information from the table below, find the variance and standard deviation of the distribution.

Marks    Frequency (f)   Midpoint (X)   fX        X − x̄     (X − x̄)²   f(X − x̄)²
5–10     20              7.5            150       −16.10    259.21     5184.20
11–16    36              13.5           486       −10.10    102.01     3672.36
17–22    44              19.5           858       −4.10     16.81      739.64
23–28    60              25.5           1530      1.90      3.61       216.60
29–34    37              31.5           1165.5    7.90      62.41      2309.17
35–40    33              37.5           1237.5    13.90     193.21     6375.93
Total    ∑f = 230                       ∑fX = 5427                     ∑f(X − x̄)² = 18497.90
Table 21.4: calculating the variance and standard deviation for grouped frequency

Using the deviation formula

$$\bar{x} = \frac{\sum fX}{\sum f} = \frac{5427}{230} = 23.60$$

$$S^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{18497.90}{230} = 80.43$$

S = √variance
S = √80.43 = 8.97

Using the raw score formula

Marks    Frequency (f)   Midpoint (x)   fx        x²         fx²
5–10     20              7.5            150       56.25      1125
11–16    36              13.5           486       182.25     6561
17–22    44              19.5           858       380.25     16731
23–28    60              25.5           1530      650.25     39015
29–34    37              31.5           1165.5    992.25     36713.25
35–40    33              37.5           1237.5    1406.25    46406.25
Total    ∑f = 230                       ∑fx = 5427           ∑fx² = 146551.5
Table 21.5: using the raw score formula

Mean = 5427/230 = 23.60

$$S^2 = \frac{\sum f x^2}{\sum f} - \bar{x}^2 = \frac{146551.5}{230} - \left(\frac{5427}{230}\right)^2 = 637.18 - 556.75 = 80.43$$

S = √variance
S = √80.43 = 8.97

END OF CHAPTER QUESTIONS
HND 2021
2.2 The following table summarizes the masses, measured to the nearest microgram (μg), of 200 microchips of the same type.

Mass (μg)    70–79   80–84   85–89   90–94   95–99   100–109
Frequency    7       30      66      57      27      13

(i) Calculate estimates of the median and upper quartile of the distribution.
(ii) Estimate the number of microchips whose actual masses are less than 81 μg.
(iii) Calculate estimates of the mean and the standard deviation of the distribution.
(5 + 2 + 5 marks)
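The chapter states the deviation formula and the raw score formula for the variance side by side without showing why they agree. The short derivation below fills in that step for the frequency form, using the same symbols as the chapter; setting every f = 1 gives the raw-data case.

```latex
\begin{aligned}
S^2 &= \frac{\sum f (x - \bar{x})^2}{\sum f}
     = \frac{\sum f x^2 - 2\bar{x}\sum f x + \bar{x}^2 \sum f}{\sum f} \\
    &= \frac{\sum f x^2}{\sum f} - 2\bar{x}\,\frac{\sum f x}{\sum f} + \bar{x}^2
     = \frac{\sum f x^2}{\sum f} - 2\bar{x}^2 + \bar{x}^2 \\
    &= \frac{\sum f x^2}{\sum f} - \bar{x}^2,
    \qquad \text{since } \bar{x} = \frac{\sum f x}{\sum f}.
\end{aligned}
```

Because the two forms are algebraically identical, any difference between hand results obtained with them comes from rounding or from an arithmetic slip in one of the tables.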
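As a cross-check on the variance and standard deviation work in this chapter, the sketch below computes the variance with both the deviation formula and the raw score formula. The function names are illustrative, not from the notes; the inputs are the ungrouped frequency table (scores 6 to 14) and the grouped table represented by its class midpoints and frequencies. Hand results that round intermediate values may differ in the last decimal place.

```python
import math


def variance_deviation(values, freqs):
    """Deviation formula: sum(f * (x - mean)^2) / sum(f)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n


def variance_raw_score(values, freqs):
    """Raw score formula: sum(f * x^2) / sum(f) - mean^2."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * x * x for x, f in zip(values, freqs)) / n - mean ** 2


# Ungrouped frequency table (scores and their frequencies)
scores = [6, 7, 8, 9, 10, 11, 12, 13, 14]
score_freqs = [2, 1, 2, 1, 1, 0, 2, 1, 3]

# Grouped table: class midpoints for 5-10, 11-16, ..., 35-40
midpoints = [7.5, 13.5, 19.5, 25.5, 31.5, 37.5]
class_freqs = [20, 36, 44, 60, 37, 33]

var_ungrouped = variance_deviation(scores, score_freqs)
var_grouped = variance_deviation(midpoints, class_freqs)

for label, xs, fs in (("ungrouped", scores, score_freqs),
                      ("grouped", midpoints, class_freqs)):
    v_dev = variance_deviation(xs, fs)
    v_raw = variance_raw_score(xs, fs)
    # Both formulas should print the same variance; sd is its square root
    print(label, round(v_dev, 2), round(v_raw, 2), round(math.sqrt(v_dev), 2))
```

The raw score formula needs only one pass over the table, which is why it is convenient by hand; the deviation formula is less sensitive to rounding when the mean is large compared with the spread.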
CHAPTER 5
BIVARIATE DATA ANALYSIS
END OF CHAPTER QUESTIONS
HND 2024
1. The following table summarizes the masses, measured to the nearest microgram (μg), of 200 microchips of the same type.

Mass (μg)    70–79   80–84   85–89   90–94   95–99   100–109
Frequency    7       30      66      57      27      13

(i) Calculate estimates of the median and upper quartile of the distribution.
(ii) Estimate the number of microchips whose actual masses are less than 81 μg.
(iii) Calculate estimates of the mean and the standard deviation of the distribution.
(5 + 3 + 5 = 13 marks)

2. The marks obtained by 8 SWE students in Discrete Mathematics and Digital Electronics at a given HND examination are tabulated below:

Student                     A    B    C    D    E    F    G    H
Maths marks (x)             45   23   27   33   18   0    8    14
D. Electronics marks (y)    31   20   18   33   19   1    13   9

a) Find the equation of the least squares regression line of y on x.
b) Calculate the product moment correlation coefficient.
c) Calculate the minimum sum of squares of residuals of y on x.

HND 2025

Temperature (x)   250     270     290     310     330     350
Density (y)       1.955   1.935   1.890   1.920   1.895   1.865

(a) Plot a graph of y against x showing these points (the graph is called a scatter diagram). (4 marks)
(b) Determine the equation of the regression line of y on x. (6 marks)
(c) Determine the equation of the regression line of x on y. (2 marks)
(d) Plot the regression lines in (b) and (c) on the same graph as the scatter diagram in (a). (4 marks)
(e) Calculate the product moment correlation coefficient for the distribution. (6 marks)
(f) Calculate Spearman's rank correlation coefficient. (4 marks)
(g) Calculate Kendall's rank correlation coefficient. (4 marks)