
CHAPTER ONE

INTRODUCTION TO STATISTICS
1.1 Definition and classification of Statistics
The word statistics is defined in different ways depending on its use in the plural and singular
sense.
In the plural sense: - statistics is defined as the collection of numerical facts or figures (or the raw
data themselves).
E.g. 1. Vital statistics (numerical data on marriages, births, deaths, etc.).
2. "The average mark of students in the statistics course is 70%" is considered statistics,
whereas "Abebe got 90% in the statistics course" is not.
Remark: Statistics are aggregates of facts. Single, isolated figures are not statistics because they
cannot be compared and are unrelated.
In its singular sense:- the word Statistics is the subject that deals with the methods of collecting,
organizing, presenting, analyzing and interpreting statistical data.
Classification of Statistics
Statistics is broadly divided into two categories based on how the collected data are used.
Descriptive Statistics: - deals with describing the collected data without drawing any conclusions beyond them.
Example 1.1: Suppose that the marks of 6 students in the Statistics course for Mathematics are
40, 45, 50, 60, 70 and 80. The average mark of these 6 students is 57.5, and it is considered a
descriptive statistic.
Inferential Statistics:- It deals with making inferences or conclusions about a population based
on data obtained from a sample of observations. It consists of performing hypothesis testing,
determining relationships among variables and making predictions.
Example 1.2: In the above example, if we treat the 6 students as a sample and say that the average
mark of all Mathematics students in the Statistics course is 57.5, then we are using inferential
statistics (drawing a conclusion about the population based on the sample observations).

1.2 Stages of Statistical Investigation


A statistical investigation passes through five stages: collection, organization,
presentation, analysis and interpretation of data.
Collection of data: This is the process of obtaining measurements or counts or obtaining raw data.

Data can be collected in a variety of ways; one of the most common is a sample or census
survey. Surveys can be conducted by different methods; three of the most common
are:
 Telephone survey
 Mailed questionnaire
 Personal interview.
Organization of data: - Data collected from published sources are generally in organized form.
However if an investigator has collected data through a survey, it is necessary to edit these data in
order to correct any apparent inconsistencies, ambiguities, and recording errors.
This phase also includes correcting the data for errors, grouping data into classes and tabulating.
Presentation of data:- After the data have been collected and organized they can be presented in
the form of tables, charts, diagrams and graphs. This presentation in an orderly manner facilitates
the understanding as well as analysis of data.
Analysis of data: - the basic purpose of data analysis is to dig out useful information for decision
making. This analysis may simply be a critical observation of data to draw some meaningful
conclusions about it or it may involve highly complex and sophisticated mathematical techniques.
Interpretation of data: - Interpretation means drawing conclusions from the data collected and
analyzed. Correct interpretation will lead to a valid conclusion of the study & thus can aid in
decision making.
1.3 Definition of some statistical terms
Population: - It is the totality of objects that are being studied
Examples:
 All clients of Telephone Company
 All students of Mettu University (MeU)
 Population of families, etc.
The population could be finite or infinite (an imaginary collection of units).
Sample: - is a part or subset of the population under study.
Sampling frame: - is the list of all possible units of the population from which the sample can be
drawn.
E.g. List of all students of MeU, list of all residential houses in Mettu town, etc.

Survey: - is an investigation of a certain population to assess its characteristics. It may be a census
survey or a sample survey.
Census survey: a complete enumeration of the population under study.
Sample survey: the process of collecting data covering a representative part or portion of a
population.
Parameter: - is a statistical measure of a population, or summary value calculated from a
population. Examples: Average, Range, proportion, variance, etc
Statistic: - is a descriptive measure of a sample, or it is a summary value calculated from a sample.
Sampling: - The process or method of sample selection from the population.
Sample size: - The number of elements or observation to be included in the sample.
An element: - is a member of sample or population. It is specific subject or object (for example a
person, firm, item, etc.) about which the information is collected.
Variable: - It is an item of interest that can take numerical or non-numerical values for different
elements. It may be qualitative or quantitative. Example: age, weight, sex, marital status, etc.
Observation (measurement):- is the value of a variable for an element.
Qualitative variables:- are variables that assume non-numerical values. They can be categorized
and they are usually called attributes. Example: - Sex, marital status, ID number, etc.
Quantitative variables: - are variables which assume numerical values. eg. Age, weight, etc.
1.4 Applications of Statistics in Economics:
Statistics can be applied in any field of study which seeks quantitative evidence. For instance,
engineering, economics, natural science, etc.
In Economics: Statistics are widely used in economics study and research.
 To measure and forecast Gross National Product (GNP)
 Statistical analyses of population growth, inflation rate, poverty, unemployment figures,
rural or urban population shifts and so on influence much of the economic policy making.
 Financial statistics are necessary in the fields of money and banking including consumer
savings and credit availability.
1.5 Levels of Measurement
Proper knowledge about the nature and type of data to be dealt with is essential in order to specify
and apply the proper statistical method for their analysis and inferences.
Scale Types

Measurement is the assignment of values to objects or events in a systematic fashion. Four levels
of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio. Each
possesses different properties of measurement systems. The first two are qualitative while the last
two are quantitative.
Nominal scale: The values of a nominal attribute are just different names, i.e., nominal attributes
provide only enough information to distinguish one object from another. They are qualities with no ranking
or ordering and no numerical or quantitative value. Data of this type consist of names, labels
and categories. This is a scale for grouping individuals into different categories.
Example 1.3: Eye color: brown, black, etc.; sex: male, female.
 In this scale, one value is only different from another.
 Arithmetic operations (+, -, *, ÷) are not applicable, and order comparisons (<, >) are impossible; only equality or difference (=, ≠) can be checked.
Ordinal scale: - defined as nominal data that can be ordered or ranked.
 The values can be arranged in some order, but the differences between the data values are
meaningless.
 Data consisting of an ordering or ranking of measurements are said to be on an ordinal
scale of measurement. That is, the values of an ordinal scale provide enough information
to order objects.
 One value is different from and greater/better/less than another.
 Arithmetic operations (+, -, *, ÷) are impossible; comparison (<, >, ≠, etc.) is possible.
Example 1.4: Letter grading (A, B, C, D, F), -Rating scales (excellent, very good, good, fair,
poor), military status (general, colonel, lieutenant, etc).
Interval Level: data are defined as ordinal data for which the differences between data values are
meaningful. However, there is no true zero, or starting point, and ratios of data values are
meaningless. For example, IQ tests do not measure people who have no
intelligence. For temperature, 0°F does not mean no heat at all.
In this measurement scale:-
 One value is different from, and better/greater than, another by a certain amount.
 It is possible to add and subtract. For example: 80°C – 50°C = 30°C, 70°C – 40°C = 30°C.
 Multiplication and division are not meaningful. For example, 60°C = 3(20°C), but this does
not imply that an object at 60°C is three times as hot as an object at 20°C.
The most common examples are IQ and temperature.

Ratio scale: Similar to interval, except there is a true zero (absolute absence), or starting point,
and the ratios of data values have meaning.
 Arithmetic operations (+, -, *, ÷) are applicable. For ratio variables, both differences and
ratios are meaningful.
 One value is different/larger/taller/better/less than another by a certain amount, and is so many
times the other.
 This measurement scale provides better information than interval scale of measurement.
Example 1.5: weight, age, number of students.

CHAPTER TWO
2. Methods of data collection and presentation
2.1 Methods of Data Collection
Data: - is the raw material of statistics. It can be obtained either by measurement or counting.
Sources of data
There are two types of source of data:
1. Primary source
2. Secondary source
Statistical data may be classified into two categories depending on the source.
1. Primary data: - Data collected by the investigator himself for the purpose of a specific inquiry
or study. Such data are original in character & are mostly generated by surveys conducted by
individuals or research institutions.
It is more reliable & accurate since the investigator can extract the correct information by
removing doubts, if any, in the minds of the respondents regarding certain questions.
2. Secondary data: - When an investigator uses data, which have already been collected by
others, such data are called secondary data. Such data are primary data for the agency that
collected them, and become secondary for someone else who uses these data for his own
purposes. Example of secondary data: books, reports, magazines, etc.
When our source is secondary data, check that:
 The type and objective of the original study are known.
 The purpose for which the data were collected is compatible with the present problem.
 The nature and classification of the data are appropriate to our problem.
 There are no biases and misreporting in the published data.
Note: Data which are primary for one may be secondary for the other.
2.2 Methods of Data Presentation
Having collected and edited the data, the next important step is to organize them, that is, to present them
in a readily comprehensible, condensed form that helps in drawing inferences from them. It is also
necessary that like items be separated from unlike ones.
The presentation of data is broadly classified into the following two categories:
 Tabular presentation/ Frequency distribution
 Diagrammatic and Graphic presentation.

The process of arranging data into classes or categories according to their similarities is
called classification. It eliminates inconsistency and also brings out the points of similarity or
dissimilarity of the collected items/data.
Classification is necessary because it would not be possible to draw inferences and conclusions from
a large set of collected [raw] data.
2.2.1 Frequency distribution
Frequency: - is the number of times a certain value or class of values occurs.

Frequency distribution (FD):- is the organization of raw data in table form using classes and
frequencies.

There are three types of FD and there are specific procedures for constructing each type.

The three types are:-

I. Categorical FD
II. Ungrouped FD and
III. Grouped FD
I. Categorical FD: Used for data that can be placed in specific categories, such as nominal or
ordinal level data. Each category of the variable represents a single class, and the number of
times each category repeats represents the frequency of that class (category).
Example 2.1: Twenty five patients were given a blood test to determine their blood type. The
data is as shown below: A B B AB O A O O B AB B B B O A O O O AB AB A O O B A.
Solution: since the data are categorical by taking the four blood types as classes we can
construct a FD as shown below.
Step 1: Make a table as shown below

CLASS   TALLY   FREQUENCY   PERCENT
A
B
AB
O

Step 2: Tally data and place the result under the column Tally
Step 3: Count the tallies and place the result under the column Frequency.
Step 4: Find the percentage of values in each class using the formula % = (f/n) × 100%, where
f is the class frequency and n is the total number of observations.

CLASS   TALLY        FREQUENCY   PERCENT
A       /////        5           5/25 × 100 = 20%
B       ///// //     7           7/25 × 100 = 28%
AB      ////         4           4/25 × 100 = 16%
O       ///// ////   9           9/25 × 100 = 36%
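
The tally-and-count steps above can be sketched in Python. This is only an illustration using the standard library's `collections.Counter`; the variable names are assumptions and not part of the original procedure.

```python
from collections import Counter

# Blood types of the 25 patients in Example 2.1.
blood_types = "A B B AB O A O O B AB B B B O A O O O AB AB A O O B A".split()

n = len(blood_types)                      # total number of observations
frequency = Counter(blood_types)          # Steps 2-3: tally and count
percent = {cls: f / n * 100 for cls, f in frequency.items()}  # Step 4

for cls in ["A", "B", "AB", "O"]:
    print(f"{cls:<3} {frequency[cls]:>2} {percent[cls]:.0f}%")
```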

II. Ungrouped Frequency Distribution (UFD)


A FD of numerical data (quantitative) in which each value of a variable represents a single class.
The values of the variable are not grouped and the number of times each value repeats represents
the frequency of that class.
Constructing ungrouped frequency distribution:
 First find the smallest and largest raw score in the collected data.
 Arrange the data in order of magnitude and count the frequency.
 To facilitate counting one may include a column of tallies.

Example 2.2: The following data represent the mark of 20 students.

80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Construct a frequency distribution, which is ungrouped.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data.
Step 4: Compute the frequency.

Mark        60   62   63   65   70     74   75   76   80    85    90
Tally       //   /    /    /    ////   /    /    //   ///   ///   /
Frequency   2    1    1    1    4      1    1    2    3     3     1

Each individual value is presented separately, that is why it is named ungrouped frequency
distribution.
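
As a hedged sketch, the ungrouped frequency distribution of Example 2.2 can be recomputed by counting how often each mark occurs. The names `marks`, `frequency` and `data_range` are illustrative only.

```python
from collections import Counter

# Marks of the 20 students in Example 2.2.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

data_range = max(marks) - min(marks)   # Step 1: Range = Max - Min
frequency = Counter(marks)             # Steps 3-4: tally and count

# Each distinct mark is its own class, listed in order of magnitude.
for mark in sorted(frequency):
    print(mark, frequency[mark])
```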
III. Grouped Frequency Distribution (GFD): A FD of numerical data in which several values of a
variable are grouped into one class. When the range of the data is large, the data must be
grouped into classes that are more than one unit in width. The number of observations
belonging to a class is the frequency of that class.
Definition of some basic terms
 Grouped frequency distribution: is a FD when several numbers are grouped into one class.
 Class limits (CL): It separates one class from another. The limits could actually appear in
the data and have gaps between the upper limits of one class and the lower limit of the next
class.
 Unit of measure (U): This is the possible smallest difference between successive values.
E.g. 1, 0.1, 0.01, 0.001……
 Class boundaries: Separate one class in a grouped frequency distribution from the other.
The boundary has one more decimal place than the raw data. There is no gap between the
upper boundaries of one class and the lower boundaries of the succeeding class. Lower class
boundary is found by subtracting half of the unit of measure from the lower class limit and
upper class boundary is found by adding half unit measure to the upper class limit.
 Class width (W): The difference between the upper and lower boundaries of any
class. The class width is also the difference between the lower limits (or the upper limits) of two
consecutive classes.
 Class mark (Mid point): It is found by adding the lower and upper class limits (or boundaries)
and dividing the sum by two.
 Cumulative frequency (CF): It is the number of observations less than the upper class
boundary, or greater than the lower class boundary, of a class.
 CF (Less than type): it is the number of values less than the upper class boundary of a given
class.
 CF (Greater than type): it is the number of values greater than the lower class boundary of
a given class.

 Relative frequency (Rf): The frequency of a class divided by the total frequency. This gives the
proportion of values falling in that class (multiply by 100 for the percentage).
$Rf_i = f_i / n = f_i / \sum f_i$

 Relative cumulative frequency (RCf): The running total of the relative frequencies, or the
cumulative frequency divided by the total frequency. It gives the proportion of the values which
are less than the upper class boundary (or, for the greater than type, greater than the lower class boundary).

$RCf_i = Cf_i / n = Cf_i / \sum f_i$

STEPS IN CONSTRUCTING A GFD


1. Find the highest and the smallest value
2. compute the range; R = H – L
3. Select the number of classes desired (K), either by
I. choosing arbitrarily between 5 and 15, or
II. using Sturges' formula
K = 1 + 3.322 log10(n); n = total frequency
4. Find the class width (W) by dividing the range by the number of classes and rounding to the
nearest integer.
W = R/K
5. Identify the unit of measure usually as 1, 0.1, 0.01,…..
6. Pick a suitable starting point less than or equal to the minimum value. Your starting point
is lower limit of the first class.
- Then continue to add the class width to get the rest lower class limits.
7. Find the upper class limits UCLi = LCLi+ w-U. then continue to add width to get the rest
upper class limits
8. find class boundaries
LCBi = LCLi – ½ U, UCBi = UCLi + ½ U
9. Find class mark
CMi = (UCLi + LCLi) / 2 or CMi = (UCBi + LCBi) / 2.
10. Tally the data

11. Find the frequencies
12. Find the cumulative frequencies. Depending on what you are trying to accomplish, it may
be necessary to find the cumulative frequency.
13. If necessary find RF and RCF.
When grouping data the following rules are important:
 The groups must not overlap, otherwise there is confusion concerning in which group a
measurement belongs.
 There must be continuity from one group to the next, which means that there must be no gaps.
Otherwise some measurements may not fit in a group.
 The groups must range from the lowest measurement to the highest measurement so that all
of the measurements have a group to which they can be assigned.
 The groups should normally be of an equal width, so that the counts in different groups can
easily be compared.
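
Steps 3 to 7 above can be sketched in Python as follows. This is a minimal illustration, not a definitive procedure: the function name `class_limits` is hypothetical, and rounding both K and W up with `math.ceil` is an assumption chosen so that the classes are sure to cover the whole range.

```python
import math

def class_limits(minimum, data_range, n, unit=1):
    """Return (K, W, list of (lower, upper) class limits)."""
    # Step 3: Sturges' formula, rounded up (assumption).
    k = math.ceil(1 + 3.322 * math.log10(n))
    # Step 4: class width, rounded up (assumption).
    width = math.ceil(data_range / k)
    limits = []
    lower = minimum                      # Step 6: start at the minimum value
    for _ in range(k):
        upper = lower + width - unit     # Step 7: UCL = LCL + W - U
        limits.append((lower, upper))
        lower += width
    return k, width, limits

# Values taken from a data set with minimum 6, range 33 and n = 20.
k, width, limits = class_limits(minimum=6, data_range=33, n=20)
print(k, width)
print(limits)
```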
Example 2.3: Construct FD for the following data.
11 29 33 22 27 19 22 21 18 17 22 26 39 27 6 34 13 20
Solution:-
1) Highest value = 39, Lowest value = 6
2) Range = 39 – 6 = 33
3) K = 1 + 3.322 log10(20) = 1 + 3.322(1.301) = 5.32 ≈ 6 (rounded up)
4) W = R / K = 33/6 = 5.5 ≈ 6
5) U = 1
6) LCL1= 6
7) Find the upper class limits.
8) Find class boundaries
9) Find class mark
10) Tally the data
Class     Class         Class   Tally      Frequency   CF(<)   CF(>)   RF            RCF(>)
limit     boundary      mark
6 – 11    5.5 – 11.5    8.5     //         2           2       20      2/20 = 0.10   1.00
12 – 17   11.5 – 17.5   14.5    //         2           4       18      2/20 = 0.10   0.90
18 – 23   17.5 – 23.5   20.5    ///// //   7           11      16      7/20 = 0.35   0.80
24 – 29   23.5 – 29.5   26.5    ////       4           15      9       4/20 = 0.20   0.45
30 – 35   29.5 – 35.5   32.5    ///        3           18      5       3/20 = 0.15   0.25
36 – 41   35.5 – 41.5   38.5    //         2           20      2       2/20 = 0.10   0.10
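
The derived columns of the table above follow mechanically from the class limits and frequencies. The sketch below recomputes them in Python; the variable names are illustrative, and the limits and frequencies are copied from the table.

```python
from itertools import accumulate

limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
freqs = [2, 2, 7, 4, 3, 2]
unit = 1                     # unit of measure U
n = sum(freqs)               # total frequency

# LCB = LCL - U/2 and UCB = UCL + U/2
boundaries = [(lo - unit / 2, hi + unit / 2) for lo, hi in limits]
# Class mark = (LCL + UCL) / 2
class_marks = [(lo + hi) / 2 for lo, hi in limits]
# Less-than CF: running total from the first class downwards.
cf_less = list(accumulate(freqs))
# Greater-than CF: running total from the last class upwards.
cf_greater = list(accumulate(reversed(freqs)))[::-1]
# Relative frequency and greater-than relative cumulative frequency.
rel_freq = [f / n for f in freqs]
rcf_greater = [c / n for c in cf_greater]

for row in zip(limits, boundaries, class_marks, freqs, cf_less, cf_greater):
    print(row)
```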

2.2.2 Diagrammatic presentation of data: Bar charts, Pie-chart, Cartograms


The most convenient and popular way of describing data is using graphical presentation. It is easier
to understand and interpret data when they are presented graphically than using words or a
frequency table. A graph can present data in a simple and clear way. Also it can illustrate the
important aspects of the data. This leads to better analysis and presentation of the data. In this
section, we discuss the most commonly used diagrammatic and graphical methods,
such as the bar chart, pie chart, histogram, frequency polygon and cumulative frequency polygon.
The three most commonly used diagrammatic presentation for discrete as well as qualitative data
are:
 Pie chart
 Bar chart
 Pictogram
A) Pie chart

A pie chart is a circle that is divided into sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of the sector is obtained using:

\[ \text{Angle of a sector} = \frac{\text{Value of the part}}{\text{The whole quantity}} \times 360^\circ \]

Example 2.4: Draw a suitable diagram to represent the following population in a town.

Men    Women   Girls   Boys
2500   2000    4000    1500
Solutions:

Step 1: Find the percentage.

Step 2: Find the number of degrees for each class.

Step 3: Using a protractor and compass, graph each section and write its name with corresponding
percentage.

Class   Frequency   Percent (%)   Degree (°)
Men     2500        25            90
Women   2000        20            72
Girls   4000        40            144
Boys    1500        15            54
Total   10000       100           360
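
Steps 1 and 2 of the pie chart construction can be sketched in Python as below. The dictionary and variable names are illustrative assumptions; the counts are those of Example 2.4.

```python
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

# Step 1: percentage of each class; Step 2: sector angle in degrees.
percent = {group: count * 100 / total for group, count in population.items()}
angle = {group: count * 360 / total for group, count in population.items()}

for group in population:
    print(f"{group:<6} {percent[group]:5.1f}% {angle[group]:6.1f} degrees")
```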

[Figure: pie chart of the town population. Men 25%, Women 20%, Girls 40%, Boys 15%]

B) Bar Charts
 Used to represent & compare the frequency distribution of discrete variables and attributes
or categorical series.
 Bars can be drawn either vertically or horizontally.

In presenting data using bar diagram,

 All bars must have equal width and the distance between bars must be equal.
 The height or length of each bar indicates the size (frequency) of the figure represented.

There are different types of bar charts. The most common being:

 Simple bar chart


 Component or sub divided bar chart.
 Multiple bar charts.

I. Simple bar chart
 Are used to display data on one variable.
 They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height /length of the bar.

Example 2.5: The number of students in the four departments of the Science College is given as follows:

Department           Physics   Maths   Chemistry   Biology
Number of students   200       400     450         600
Male                 170       350     250         200
Female               30        50      200         400

Draw a simple bar chart of the number of students by department.

Solution:
[Figure: simple bar chart. X axis: Department (Phys, Maths, Chem, Bio); Y axis: Frequency; bar heights 200, 400, 450, 600]

II. Component Bar chart


 When there is a desire to show how a total (or aggregate) is divided in to its component
parts, we use component bar chart.
 The bars represent total value of a variable with each total broken in to its component parts
and different colors or designs are used for identifications

Example 2.6: Draw a component (sub-divided) bar chart of the number of students by department
given in Example 2.5.

Solution:
[Figure: sub-divided bar chart. X axis: Department (Phys, Maths, Chem, Bio); Y axis: Frequency; each bar is divided into Male and Female components]

III. Multiple Bar charts


 These are used to display data on more than one variable.
 They are used for comparing different variables at the same time.

Example 2.7: The following data represent sales by product, 1957- 1959 of a given company for
three products A, B, C.

Product   Sales in ($)
          1957   1958   1959
A         12     14     18
B         24     21     18
C         24     35     54
Draw a multiple bar chart to represent the sales by product from 1957 to 1959.

Solution:

C) Pictograph

In this diagram, we represent data by means of some picture symbols. We decide about a suitable
picture to represent a definite number of units in which the variable is measured.

2.2.4 Graphical Presentation of data

The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly
applied graphical representations for continuous data.

Procedures for constructing statistical graphs:

 Draw and label the X and Y axis.


 Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
 Represent the class boundaries for the histogram or ogive or the mid points for the frequency
polygon on the X axis.
 Plot the points.
 Draw the bars or lines to connect the points.
Histogram
A graph which displays the data by using vertical connected bars of various heights to represent
frequencies. Class boundaries are placed along the horizontal axis. Class marks and class limits
are sometimes used as quantity on the X axis.
Example 2.8: Construct a histogram to represent the following data.

Class limits   15-24   25-34   35-44   45-54   55-64   65-74   75-84
Frequency      3       4       10      15      12      4       2

Solution:

[Figure: histogram. X axis: Class boundaries; Y axis: Frequency; bar heights 3, 4, 10, 15, 12, 4, 2]

Frequency polygon

If we join the mid-points of the tops of the adjacent rectangles of the histogram with line segments
a frequency polygon is obtained. When the polygon is continued to the x-axis just outside the range
of the lengths the total area under the polygon will be equal to the total area under the histogram.
Example 2.9: Construct a frequency polygon to represent the previous data in example 2.8.
Solution:
Class     Frequency   Class   Class         R.F.   % R.F.   Less than   More than
limits                marks   boundaries                    C.F.        C.F.
15 - 24   3           19.5    14.5 - 24.5   0.06   6%       3           50
25 - 34   4           29.5    24.5 - 34.5   0.08   8%       7           47
35 - 44   10          39.5    34.5 - 44.5   0.20   20%      17          43
45 - 54   15          49.5    44.5 - 54.5   0.30   30%      32          33
55 - 64   12          59.5    54.5 - 64.5   0.24   24%      44          18
65 - 74   4           69.5    64.5 - 74.5   0.08   8%       48          6
75 - 84   2           79.5    74.5 - 84.5   0.04   4%       50          2
Total     50                                1.00   100%

Adding two class marks with $f_i = 0$, namely 9.5 at the beginning and 89.5 at the end, the
following frequency polygon is plotted:

[Figure: frequency polygon. X axis: Class mark (9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5); Y axis: Frequency]

Ogive (cumulative frequency polygon)

An Ogive (pronounced as “oh-jive”) is a line that depicts cumulative frequencies, just as the
cumulative frequency distribution lists cumulative frequencies. Note that the Ogive uses class
boundaries along the horizontal scale, and graph begins with the lower boundary of the first class
and ends with the upper boundary of the last class. An Ogive is useful for determining the number of
values below or above some particular value. There are two types of Ogive, namely the less than Ogive
and the more than Ogive. The difference is that the less than Ogive uses the less than cumulative frequency
and the more than Ogive uses the more than cumulative frequency on the y axis.

Example 2.10: Draw both types of ogives for the F.D. of Example 2.8.

Solutions:

[Figure: two panels, "The Less than Ogive" and "The More than Ogive". X axis: Class Boundaries (14.5 to 84.5); Y axis: Cumulative Frequency]

Note: For both ogives, one class with frequency zero is added, for the same reason as with the
frequency polygon.
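
The points plotted on the two ogives can be computed from the class boundaries and frequencies of Example 2.8. The sketch below is illustrative only (the variable names are assumptions); it pairs each class boundary with the number of values below it and the number of values above it.

```python
from itertools import accumulate

boundaries = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
freqs = [3, 4, 10, 15, 12, 4, 2]
n = sum(freqs)

# Less than ogive: 0 values lie below the first lower boundary,
# then the running total at each upper boundary.
less_than = [0] + list(accumulate(freqs))
# More than ogive: whatever is not below a boundary lies above it.
more_than = [n - c for c in less_than]

less_than_points = list(zip(boundaries, less_than))
more_than_points = list(zip(boundaries, more_than))
print(less_than_points)
print(more_than_points)
```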

CHAPTER THREE
3. Measures of Central Tendency
Objectives

• To condense a mass of data into one single value.

• To facilitate comparison.

Desirable Properties of Good MCT

• It should be calculated based on all observations.

• It should not be affected by extreme values.

• It should be unique.

• It should always exist.

• It should be easy to understand and calculate.

Measures of Central Tendency:- give us information about the location of the center of the
distribution of data values. A single value that describes the characteristics of the entire mass of
data is called a measure of central tendency.

The following are types of Central Tendency which are suitable for a particular type of data. These
are
 Arithmetic Mean
 Geometric Mean
 Harmonic Mean
 Median
 Mode or modal value
3.3.1 Arithmetic Mean:- Arithmetic mean is defined as the sum of the measurements of the
items divided by the total number of items. It is usually denoted by 𝑥̅ .
Arithmetic Mean for individual series
Suppose 𝑥1 , 𝑥2 , … , 𝑥𝑛 are observed values in a sample of size n from a population of size N, n<N
then the arithmetic mean of the sample, denoted by $\bar{x}$, is given by
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

If we take an entire population, the mean is denoted by μ and is given by:

\[ \mu = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N} \]

Where N stands for the total number of observations in the population.


Example 3.2: Consider the samples given below:
i. 46 54 21 35
ii. 10.5 2.4 3.6 5.9 8.7
Find the arithmetic mean
Solution:
i. The sample values are: 46 54 21 35
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{46 + 54 + 21 + 35}{4} = \frac{156}{4} = 39 \]

The arithmetic mean for sample value is 39.
ii. The sample values are: 10.5 2.4 3.6 5.9 8.7
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{10.5 + 2.4 + 3.6 + 5.9 + 8.7}{5} = \frac{31.1}{5} = 6.22 \]

The arithmetic mean for sample value is 6.22.
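
The two calculations of Example 3.2 can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

sample_i = [46, 54, 21, 35]
sample_ii = [10.5, 2.4, 3.6, 5.9, 8.7]

mean_i = arithmetic_mean(sample_i)
mean_ii = arithmetic_mean(sample_ii)
print(mean_i, round(mean_ii, 2))
```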

Arithmetic mean for discrete data arranged in frequency distribution

When the numbers 𝑥1 , 𝑥2 , … , 𝑥𝑘 occur with frequencies 𝑓1 , 𝑓2 , … , 𝑓𝑘 , respectively, then the mean
can be expressed in a more compact form as:
\[ \bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]

Example 3.3: Calculate the arithmetic mean of the sample of numbers of students in 10 classes:
50 42 48 60 58 54 50 42 50 42
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50 + 42 + 48 + 60 + 58 + 54 + 50 + 42 + 50 + 42}{10} = \frac{496}{10} = 49.6 \approx 50 \]

In this case there are three 42’s, one 48, three 50’s, one 54, one 58 and one 60. The number of
times each number occurs is called its frequency and the frequency is usually denoted by f. The
information in the sentence above can be written in a table, as follows.
Value, xi       42    48   50    54   58   60
Frequency, fi   3     1    3     1    1    1
xifi            126   48   150   54   58   60

The formula for the arithmetic mean for data of this type is
\[ \bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]

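In this case we have:

\[ \bar{x} = \frac{42 \times 3 + 48 \times 1 + 50 \times 3 + 54 \times 1 + 58 \times 1 + 60 \times 1}{3 + 1 + 3 + 1 + 1 + 1} = \frac{126 + 48 + 150 + 54 + 58 + 60}{10} = \frac{496}{10} = 49.6 \approx 50 \]

The mean number of students in the ten classes is approximately 50.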
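
As a sketch, the frequency form of the mean gives the same result as averaging the ten raw values of Example 3.3. The function name `frequency_mean` is illustrative and not from the original text.

```python
def frequency_mean(values, freqs):
    """Mean of values x_i occurring with frequencies f_i."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

raw = [50, 42, 48, 60, 58, 54, 50, 42, 50, 42]
values = [42, 48, 50, 54, 58, 60]
freqs = [3, 1, 3, 1, 1, 1]

mean_from_table = frequency_mean(values, freqs)
mean_from_raw = sum(raw) / len(raw)
print(mean_from_table, mean_from_raw)
```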

Arithmetic Mean for Grouped Continuous Frequency Distribution

If data are given in the form of a continuous frequency distribution, the sample mean can be
computed as
\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} \]
where $x_i$ is the class mark of the ith class, i = 1, 2, ..., k, $f_i$ is the frequency of the ith class
and k is the number of classes.

Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.
Example 3.4: The following frequency table gives the height (in inches) of 100 students in a
college.
Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)         5       18      42      20      8       7       100

Calculate the mean

Solution:
The formula to be used for the mean is as follows:

\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]
Let us calculate these values and make a table for these values for the sake of convenience.

Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)         5       18      42      20      8       7       100
Mid-Point (xi)        61      63      65      67      69      71
fixi                  305     1134    2730    1340    552     497     6558

Substituting these values with $\sum_{i=1}^{6} f_i = 100$, we get

\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58 \]

The mean height of the students is 65.58 inches.
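
The grouped-data calculation of Example 3.4 can be sketched as follows: each class interval is replaced by its class mark and the frequency form of the mean is applied. Variable names are illustrative.

```python
intervals = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
freqs = [5, 18, 42, 20, 8, 7]

# Class mark (mid-point) of each interval.
class_marks = [(lo + hi) / 2 for lo, hi in intervals]
# f_i * x_i for each class, then the grouped mean.
products = [f * x for f, x in zip(freqs, class_marks)]
grouped_mean = sum(products) / sum(freqs)
print(class_marks, sum(products), grouped_mean)
```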

Properties of the Arithmetic Mean


• The algebraic sum of the deviations of a set of numbers $x_1, x_2, \dots, x_n$ from their mean $\bar{x}$ is
always zero, i.e.
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
• The sum of squares of deviations from the mean is the least. That is, $\sum_{i=1}^{n} (x_i - A)^2$ is minimum
when $A = \bar{x}$.

 If the mean of 𝑥1 , 𝑥2 , … , 𝑥𝑛 is 𝑥̅ , then


a) The mean of 𝑥1 ± k, 𝑥2 ± k ,..., 𝑥𝑛 ± k will be 𝑥̅ ± k
b) The mean of 𝑘𝑥1 , 𝑘𝑥2 , … , 𝑘𝑥𝑛 will be k 𝑥̅ .
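
These properties can be checked numerically. The sketch below uses the first sample of Example 3.2; the helper names and the constant k = 5 are illustrative assumptions, and comparing against A = 38 and A = 40 only illustrates the least-squares property rather than proving it.

```python
def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from a point a."""
    return sum((x - a) ** 2 for x in values)

data = [46, 54, 21, 35]
x_bar = mean(data)                          # 39.0
k = 5

sum_dev = sum(x - x_bar for x in data)      # property 1: equals zero
ss_at_mean = sum_sq_dev(data, x_bar)        # property 2: smallest value
ss_nearby = [sum_sq_dev(data, a) for a in (38, 40)]
shifted_mean = mean([x + k for x in data])  # property 3a: x_bar + k
scaled_mean = mean([k * x for x in data])   # property 3b: k * x_bar
print(sum_dev, ss_at_mean, ss_nearby, shifted_mean, scaled_mean)
```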
3.3.2 Geometric Mean
The geometric mean like arithmetic mean is calculated average. It is used when observed values are
measured as ratios, percentages, proportions, indices or growth rates.

Geometric mean for individual series: The geometric mean, G.M. of an individual series of
positive numbers 𝑥1 , 𝑥2 , … , 𝑥𝑛 is defined as the nth root of their product.

\[ G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \text{antilog}\left( \frac{1}{n} \sum_{i=1}^{n} \log x_i \right) \]

Example 3.7: Find the G.M. of
a) 3 and 12
b) 2, 4 and 8

Solution: a) $G.M. = \sqrt{3 \times 12} = \sqrt{36} = 6$; b) $G.M. = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$

Properties of geometric mean


 It is less affected by extreme values than the arithmetic mean.
 It takes each and every observation into consideration.
 If the value of any one observation is zero, the geometric mean becomes zero.
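Geometric mean for discrete data arranged in FD: When the numbers $x_1, x_2, \dots, x_m$ occur with
frequencies $f_1, f_2, \dots, f_m$, respectively, then the geometric mean is obtained by

\[ G.M. = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_m^{f_m}} = \text{antilog}\left( \frac{1}{n} \sum_{i=1}^{m} f_i \log x_i \right), \quad \text{where } n = \sum_{i=1}^{m} f_i \]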

Example 3.8: Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.
Solution
Values      3   4   5   6
Frequency   2   3   1   2

\[ G.M. = \sqrt[8]{3^2 \times 4^3 \times 5^1 \times 6^2} = 4.236 \]

The geometric mean for the given data is 4.236.

Geometric mean for continuous grouped FD:- The above formula can also be used whenever
the frequency distribution is grouped continuous, class marks of the class intervals are considered
as xi.
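
The antilog form of the geometric mean can be sketched in Python as below. The function name `geometric_mean` and its optional `freqs` argument are illustrative; the sketch assumes all values are positive, since the logarithm is undefined otherwise.

```python
import math

def geometric_mean(values, freqs=None):
    """antilog((1/n) * sum(f_i * log10(x_i))); assumes every x_i > 0."""
    if freqs is None:
        freqs = [1] * len(values)       # individual series: each f_i = 1
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)

gm_a = geometric_mean([3, 12])                       # Example 3.7 (a)
gm_b = geometric_mean([2, 4, 8])                     # Example 3.7 (b)
gm_fd = geometric_mean([3, 4, 5, 6], [2, 3, 1, 2])   # Example 3.8
print(round(gm_a, 3), round(gm_b, 3), round(gm_fd, 3))
```

For a grouped continuous distribution the same function could be called with the class marks as `values`.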

3.3.3 Harmonic Mean


It is a suitable measure of central tendency when the data pertain to speed, rate and time. The
harmonic mean of n values is defined as n divided by the sum of their reciprocals.
Harmonic mean for individual series: If $x_1, x_2, \ldots, x_n$ are n observations, then the harmonic mean
is given by the following formula:

$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$

Example 3.9 A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75 mph. Find
the harmonic mean of the three velocities.
Solution

$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = 40.9$
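The calculation in Example 3.9 can be reproduced with a few lines of code. This is a minimal sketch; the function name harmonic_mean is an illustrative choice. Because the three legs cover equal distances, the harmonic mean also equals total distance divided by total time, which the last lines verify.

```python
def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


# Speeds from Example 3.9 (mph), each held over 25 miles
speeds = [25, 50, 75]
hm = harmonic_mean(speeds)

# Cross-check: average speed = total distance / total time
total_distance = 25 * len(speeds)
total_time = sum(25 / s for s in speeds)
average_speed = total_distance / total_time

print(round(hm, 1), round(average_speed, 1))
```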

Harmonic mean for discrete data arranged in FD: If the data are arranged in the form of a
frequency distribution, the harmonic mean is given by

$H.M. = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_m}{x_m}}$, where $n = \sum_{k=1}^{m} f_k$
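The frequency form should agree with the individual-series formula applied to the expanded data. The sketch below illustrates this by reusing the values and frequencies of Example 3.8 (the notes give no separate worked example here); harmonic_mean_fd is a hypothetical helper name, and statistics.harmonic_mean from the Python standard library is used only as a cross-check.

```python
from statistics import harmonic_mean


def harmonic_mean_fd(values, freqs):
    """Harmonic mean of values x_i that occur with frequencies f_i."""
    n = sum(freqs)  # total number of observations
    return n / sum(f / x for x, f in zip(values, freqs))


# Values and frequencies borrowed from Example 3.8
values = [3, 4, 5, 6]
freqs = [2, 3, 1, 2]
hm_fd = harmonic_mean_fd(values, freqs)

# The same data written out as an individual series
raw = [3, 3, 4, 4, 4, 5, 6, 6]
hm_raw = harmonic_mean(raw)

print(round(hm_fd, 4), round(hm_raw, 4))
```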
Harmonic mean for continuous grouped FD: Whenever the frequency distribution is grouped and
continuous, the class marks of the class intervals are taken as $x_i$ and the above formula can be
used as

$H.M. = \frac{n}{\sum_{i=1}^{m} \frac{f_i}{x_i}}$, where $n = \sum_{k=1}^{m} f_k$

and $x_i$ is the class mark of the ith class.
3.3.4 Median
The median is, as its name indicates, the middlemost value in an ordered arrangement of the data; it
divides the data into two equal parts. It is obtained by arranging the data in increasing or decreasing
order of magnitude and is denoted by $\tilde{x}$.
Median for individual series
We arrange the sample in ascending order of the variable of interest. Then the median is the middle
value (if the sample size n is odd) or the average of the two middle values (if the sample size n is
even).
For individual series the median is obtained by

a/ $\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value if n is odd, and

b/ $\tilde{x} = \frac{\left(\frac{n}{2}\right)^{th} \text{value} + \left(\frac{n}{2}+1\right)^{th} \text{value}}{2}$ if n is even

Example 3.10: Find the median for the following data.


a/ -5 15 10 5 0 2 1 4 6 and 8
b/ 5 2 2 3 1 8 4

Solution:
a/ The data in ascending order are:

-5 0 1 2 4 5 6 8 10 15

Here n = 10, which is even. The two middle values are the 5th and 6th observations, so the median
is

$\tilde{x} = \frac{\left(\frac{10}{2}\right)^{th} \text{value} + \left(\frac{10}{2}+1\right)^{th} \text{value}}{2} = \frac{5^{th} \text{value} + 6^{th} \text{value}}{2} = \frac{4 + 5}{2} = 4.5$

b/ The data in ascending order are:

1 2 2 3 4 5 8

Here n = 7, which is odd. The middle value is the 4th observation, so the median is 3.
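The two-case rule for an individual series can be sketched in code as follows. The function name median is an illustrative choice; note that Python lists are indexed from 0, so the kth ordered value sits at index k - 1.

```python
def median(values):
    """Median of an individual series using the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n+1)/2)th value, i.e. index mid
        return ordered[mid]
    # n even: average of the (n/2)th and (n/2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2


# The two data sets of Example 3.10
median_a = median([-5, 15, 10, 5, 0, 2, 1, 4, 6, 8])
median_b = median([5, 2, 2, 3, 1, 8, 4])
print(median_a, median_b)
```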

Note: The median is easy to calculate for small samples and is not affected by an "outlier".
Median for discrete data arranged in a frequency distribution: In this case also, the median
is obtained by the above formula. After arranging the values in increasing order, compute the position
given by formula a/ or b/ above, find the smallest cumulative frequency (CF) greater than or equal to
that position, and take the corresponding value as the median.

Median for grouped continuous data: For continuous data, the median is obtained by the following
formula.

$\text{Median} = \tilde{x} = L + \frac{w}{f_{med}}\left(\frac{n}{2} - CF\right)$

Where: L = the lower class boundary of the median class; w = the class width of the median
class; $f_{med}$ = the frequency of the median class; and CF = the cumulative frequency corresponding to
the class preceding the median class, that is, the sum of the frequencies of all classes lower than the
median class. The median class is the class which contains the (n/2)th observation, whether n is odd or
even, since the items have already lost their individual identity once they are grouped into continuous
classes.

Example 3.11: Calculate the median for the following frequency distribution.

C.I      1-5   6-10   11-15   16-20   21-25   26-30   31-35   Total
Freq.    4     8      12      6       3       4       3       40

Solution: Construct the less than cumulative frequency distribution, then:

C.I           1-5   6-10   11-15   16-20   21-25   26-30   31-35   Total
Freq.         4     8      12      6       3       4       3       40
Cuml. Freq.   4     12     24      30      33      37      40

Since n = 40, n/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the median class
is the third class. For this class, L = 10.5, w = 5, $f_{med}$ = 12 and CF = 12. Applying the formula,
we get:

$\tilde{x} = 10.5 + \frac{5}{12}(20 - 12) = 13.8$
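The steps of Example 3.11 (accumulate frequencies, locate the median class, interpolate) can be sketched as below. The function name grouped_median and its argument layout (a list of lower class boundaries plus a common class width) are assumptions made for illustration.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of a grouped continuous distribution: L + (w / f_med) * (n/2 - CF)."""
    n = sum(freqs)
    target = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cf + f >= target:
            # first class whose cumulative frequency reaches n/2: the median class
            return lower + (width / f) * (target - cf)
        cf += f
    raise ValueError("frequency table is empty")


# Example 3.11: class limits 1-5, 6-10, ..., 31-35 give boundaries 0.5, 5.5, ..., 30.5
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
frequencies = [4, 8, 12, 6, 3, 4, 3]
med = grouped_median(boundaries, 5, frequencies)
print(round(med, 1))
```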

3.3.5 The Mode or modal value
The mode or the modal value is the value with the highest frequency and is denoted by $\hat{x}$. A data set
may not have a mode or may have more than one mode. A distribution is called a bimodal
distribution if it has two data values that appear with the greatest frequency. If a distribution has
more than two modes, then the distribution is multimodal. If no value occurs more often than the
others, the distribution is said to have no mode.

Mode of individual series:- The mode or the modal value of individual series (raw data) is simply
obtained by locating the observation with the maximum frequency.

Example 3.12: Consider the following data:


a. 30 45 69 70 32 18 32. The mode (𝑥̂ ) = 32.
b. 10 20 30 10 40 30. The mode (𝑥̂ ) = 10 and 30.
c. 10 40 30 20 50 60. No mode.
Note that in some samples there may be more than one mode or there may not be a mode. The
mode is not a suitable measure of central tendency in these cases. We use the mode as a measure
of central tendency if we require a measure that takes on one of the sample values. The mode can
be used for variables that are measured on a category (nominal) scale, e.g. the most popular
computer type.
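The three cases of Example 3.12 can be reproduced with a frequency count. This is a sketch under one stated assumption: when every distinct value occurs equally often (as in case c), the series is treated as having no mode and an empty list is returned. The function name modes is illustrative.

```python
from collections import Counter


def modes(values):
    """Return the list of modal values, or [] when no value stands out."""
    counts = Counter(values)
    top = max(counts.values())  # highest frequency in the series
    # Assumption: if all distinct values share the same frequency, there is no mode
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


# The three series of Example 3.12
mode_a = modes([30, 45, 69, 70, 32, 18, 32])
mode_b = modes([10, 20, 30, 10, 40, 30])
mode_c = modes([10, 40, 30, 20, 50, 60])
print(mode_a, mode_b, mode_c)
```

Because the function only counts occurrences, it would also work for nominal data such as a list of computer types.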

Mode for discrete data arranged in a frequency distribution: In the case of discrete grouped data,
the mode is determined simply by finding the value(s) having the highest frequency.

Mode for Grouped Continuous Frequency Distribution


For grouped continuous data, the individual values are not available, so one can only directly
determine the modal class: the class with the highest frequency.

After locating this class, the mode is interpolated using:

$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$

where L = the lower class boundary of the modal class; $\Delta_1 = f_{mod} - f_1$;
$\Delta_2 = f_{mod} - f_2$; w = the common class width; $f_1$ = frequency of the class immediately
preceding the modal class; $f_2$ = frequency of the class immediately succeeding the modal class;
and $f_{mod}$ = frequency of the modal class.
Example 3.13: Calculate the mode for the frequency distribution of data of example 3.11.
Solution: By inspection, the mode lies in the third class, where L = 10.5, $f_{mod}$ = 12, $f_1$ = 8, $f_2$ = 6 and w = 5.

Using the formula, the mode is:

$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w = 10.5 + \frac{(12 - 8)}{(12 - 8) + (12 - 6)} \times 5 = 12.5$

3.5 Measures of Non-central Locations


They are averages of position (non-central tendency). Some of these are quartiles, deciles and
percentiles.
Quartiles are values which divide the data set into approximately four equal parts, denoted by
$Q_1$, $Q_2$ and $Q_3$. The first quartile ($Q_1$) is also called the lower quartile and the third quartile ($Q_3$)
is the upper quartile. The second quartile ($Q_2$) is the median.
• Quartiles for individual series:

Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith quartile ($Q_i$) is the value of the item
corresponding to the [i(n+1)/4]th position, i = 1, 2, 3.

That is, after arranging the data in ascending order, $Q_1$, $Q_2$ and $Q_3$ are obtained by:

$Q_1 = \left(\frac{1(n+1)}{4}\right)^{th}$ value, $Q_2 = \left(\frac{2(n+1)}{4}\right)^{th}$ value and $Q_3 = \left(\frac{3(n+1)}{4}\right)^{th}$ value.

• Quartiles for discrete data arranged in a frequency distribution: In this case also, we
follow the same procedure as for the median. That is, we construct the
less than cumulative frequency distribution and apply the formula of quartiles for individual series.

• Quartiles in continuous data: For continuous data, use the following formula:

$Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{in}{4} - CF\right)$

where i = 1, 2, 3, and L, w, $f_{Q_i}$ and CF are defined in the same way as for the median, i.e.

$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right)$, $Q_2 = L + \frac{w}{f_{Q_2}}\left(\frac{2n}{4} - CF\right)$ and $Q_3 = L + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - CF\right)$

The quartile class is the one that contains the (in/4)th value. That is, the class with the smallest
cumulative frequency greater than or equal to in/4 is the class of the ith quartile.

Deciles are values which divide the data into approximately ten equal parts, denoted by $D_1, D_2, \ldots, D_9$.

• Deciles for individual series:

Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith decile ($D_i$) is the value of the item
corresponding to the [i(n+1)/10]th position, i = 1, 2, ..., 9.

That is, after arranging the data in ascending order, $D_1$, $D_2$, ..., $D_9$ are obtained by:

$D_1 = \left(\frac{1(n+1)}{10}\right)^{th}$ value, $D_2 = \left(\frac{2(n+1)}{10}\right)^{th}$ value, ..., and $D_9 = \left(\frac{9(n+1)}{10}\right)^{th}$ value.

• Deciles for discrete data arranged in a frequency distribution: In this case also, we
follow the same procedure as for the median. That is, we construct
the less than cumulative frequency distribution and apply the formula of deciles for individual
series.

• Deciles for continuous data: Apply the following formula and follow the procedure used for quartiles
of continuous data.

$D_i = L + \frac{w}{f_{D_i}}\left(\frac{in}{10} - CF\right)$, i = 1, 2, ..., 9

The symbols are defined in the same way as in the case of quartiles for continuous data.
Percentiles are values which divide the data into approximately one hundred equal parts, and are
denoted by $P_1, P_2, \ldots, P_{99}$.
• Percentiles for individual series:

Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith percentile ($P_i$) is the value of the item
corresponding to the [i(n+1)/100]th position, i = 1, 2, ..., 99.

That is, after arranging the data in ascending order, $P_1$, $P_2$, ..., $P_{99}$ are obtained by:

$P_1 = \left(\frac{1(n+1)}{100}\right)^{th}$ value, $P_2 = \left(\frac{2(n+1)}{100}\right)^{th}$ value, ..., and $P_{99} = \left(\frac{99(n+1)}{100}\right)^{th}$ value.

• Percentiles for discrete data arranged in a frequency distribution: In this case also, we
follow the same procedure as for the median. That is, we construct
the less than cumulative frequency distribution and apply the formula of percentiles for individual
series.

• Percentiles for continuous data: Apply the following formula:

$P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right)$, i = 1, 2, ..., 99

The symbols are defined in the same way as in the case of quartiles or deciles for continuous data.
Interpretations
1. 𝑄𝑖 is the value below which ( i × 25) percent of the observations in the series are found
(where i = 1, 2,3). For instance 𝑄3 means the value below which 75 percent of observations in
the given series are found.
2. 𝐷𝑖 is the value below which ( i ×10) percent of the observations in the series are found (where
i = 1, 2,...,9 ). For instance 𝐷4 is the value below which 40 percent of the values are found in the
series.
3. 𝑃𝑖 is the value below which i percent of the total observations are found (where i = 1, 2,3,...,99
). For example 60 percent of the observations in a given series are below 𝑃60 .
Example 3.15: Calculate 𝑄1 , 𝑄2 , 𝑄3, 𝐷4, 𝐷9, 𝑃40 & 𝑃90 for the following data given on the table
below.
x   10   11   12   13   14   15   16   17   18
f   2    8    25   48   65   40   20   9    2

Solution: The data are already arranged in increasing order, so we only need to construct the
cumulative frequency table before calculating the required values.

x            10   11   12   13   14    15    16    17    18
f            2    8    25   48   65    40    20    9     2
Cum. Freq.   2    10   35   83   148   188   208   217   219

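The total number of observations is 219, which is odd. Clearly then the median is 14, i.e.

$\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value $= \left(\frac{219+1}{2}\right)^{th}$ value = 110th value = 14

$Q_1 = \left(\frac{1(n+1)}{4}\right)^{th}$ value $= \left(\frac{1(219+1)}{4}\right)^{th}$ value = 55th value = 13

$Q_2 = \left(\frac{2(n+1)}{4}\right)^{th}$ value $= \left(\frac{2(219+1)}{4}\right)^{th}$ value = 110th value = 14 = $\tilde{x}$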

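$Q_3 = \left(\frac{3(n+1)}{4}\right)^{th}$ value $= \left(\frac{3(219+1)}{4}\right)^{th}$ value = 165th value = 15

$D_4 = \left(\frac{4(n+1)}{10}\right)^{th}$ value $= \left(\frac{4(219+1)}{10}\right)^{th}$ value = 88th value = 14

$D_9 = \left(\frac{9(n+1)}{10}\right)^{th}$ value $= \left(\frac{9(219+1)}{10}\right)^{th}$ value = 198th value = 16

$P_{40} = \left(\frac{40(n+1)}{100}\right)^{th}$ value $= \left(\frac{40(219+1)}{100}\right)^{th}$ value = 88th value = 14

$P_{90} = \left(\frac{90(n+1)}{100}\right)^{th}$ value $= \left(\frac{90(219+1)}{100}\right)^{th}$ value = 198th value = 16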
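The lookups in Example 3.15 all follow one pattern: compute the position i(n+1)/parts, then find the first value whose cumulative frequency reaches that position. A minimal sketch is given below; the function name fd_quantile and the parts argument (4 for quartiles, 10 for deciles, 100 for percentiles) are illustrative. How to treat a fractional position is not spelled out in the notes, so this sketch simply applies the same "smallest CF greater than or equal to the position" rule, which is an assumption.

```python
def fd_quantile(values, freqs, i, parts):
    """ith quantile of a discrete frequency distribution.

    parts is 4 for quartiles, 10 for deciles and 100 for percentiles.
    """
    n = sum(freqs)
    position = i * (n + 1) / parts  # the [i(n+1)/parts]th position
    cf = 0  # running (less than) cumulative frequency
    for x, f in zip(values, freqs):
        cf += f
        if cf >= position:
            return x
    raise ValueError("position lies beyond the last observation")


# Data of Example 3.15
x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
f = [2, 8, 25, 48, 65, 40, 20, 9, 2]

quartiles = [fd_quantile(x, f, i, 4) for i in (1, 2, 3)]
d4, d9 = fd_quantile(x, f, 4, 10), fd_quantile(x, f, 9, 10)
p40, p90 = fd_quantile(x, f, 40, 100), fd_quantile(x, f, 90, 100)
print(quartiles, d4, d9, p40, p90)
```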

Example 3.16: The marks of 50 students (out of 85) are given below. Based on the data, find $Q_1$,
$D_4$ and $P_7$.

Marks   46-50   51-55   56-60   61-65   66-70   71-75   76-80
fi      4       8       15      5       9       5       4

Solution: First find the class boundaries and the cumulative frequency distribution.

Marks            46-50       51-55       56-60       61-65       66-70       71-75       76-80
Class boundary   45.5-50.5   50.5-55.5   55.5-60.5   60.5-65.5   65.5-70.5   70.5-75.5   75.5-80.5
fi               4           8           15          5           9           5           4
Cum. frequency   4           12          27          32          41          46          50

$Q_1$: the (n/4)th value = 12.5th value, which lies in the class 55.5 - 60.5.

$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.7$

$D_4$: the (4n/10)th value = 20th value, which lies in the class 55.5 - 60.5.

$D_4 = L + \frac{w}{f_{D_4}}\left(\frac{4n}{10} - CF\right) = 55.5 + \frac{5}{15}(20 - 12) = 58.2$

$P_7$: the (7n/100)th value = 3.5th value, which lies in the class 45.5 - 50.5.

$P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$
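The quartile, decile and percentile formulas for continuous data differ only in the divisor (4, 10 or 100), so a single function can cover all three. The sketch below checks the results of Example 3.16; the function name grouped_quantile and its argument layout are assumptions made for illustration.

```python
def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """ith quantile of a grouped continuous distribution: L + (w / f) * (i*n/parts - CF).

    parts is 4 for quartiles, 10 for deciles and 100 for percentiles.
    """
    n = sum(freqs)
    target = i * n / parts
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cf + f >= target:
            # first class whose cumulative frequency reaches the target position
            return lower + (width / f) * (target - cf)
        cf += f
    raise ValueError("frequency table is empty")


# Example 3.16: lower class boundaries and frequencies
boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
frequencies = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(boundaries, 5, frequencies, 1, 4)
d4 = grouped_quantile(boundaries, 5, frequencies, 4, 10)
p7 = grouped_quantile(boundaries, 5, frequencies, 7, 100)
print(round(q1, 1), round(d4, 1), round(p7, 3))
```

Calling the same function with i = 1 and parts = 2 gives the grouped median.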
