
Islamic Republic of Afghanistan

Ministry of Higher Education

Dunya University

Post Graduate Department

Master of Business Administration (MBA)

HANDOUT OF STATISTICS

By:
Mr. Attaullah Hazrat

Year 2021
Business Statistics Basic Concepts

Chapter One
Basic Concepts of Statistics
Learning Objectives:
After studying this chapter, you will be able to:
 Define and explain the meaning of statistics.
 State the fields of statistics.
 Discuss the functions and scope of statistics.
 Discuss variables and their types.
 Understand basic terms used in statistics.
 Describe data and its types.
 Discuss sources of data collection.


Basic Concepts of Statistics


Introduction
Modern-day life is extremely complex and full of uncertainties. The aspiration to lead a
high-quality, happy and economically prosperous life is growing with every passing day. This has
led to a desire to excel in every walk of life. This desire can be fulfilled to a large extent if one
takes appropriate decisions. But the pressure of uncertainties relating to business and employment,
declining interest incomes, state policies and their interference, taxation, pollution, health,
family commitments, security, etc., is so intense that one often fails to take the correct
decisions. In such situations, the knowledge of statistics and its application comes to one’s
rescue. With its help, one may take informed decisions that account for all the dimensions of the
uncertainties and, to a large extent, lead a better quality of life.
Statistics is one of the oldest branches of knowledge. Initially, it was applied by those involved
with the collection and study of data for the state. In ancient days statisticians were engaged by
the state for collecting data on the ownership and use of land. Gradually people realized that
statistics cannot be restricted to the use of the state only; rather it is a very useful subject for
people working in every walk of life, such as business, economics, academics, public administration,
international relations, labor welfare, health and medicine.

Meaning & Definitions of Statistics:


The word statistics is understood in two ways – plural and singular. Statistics, in plural sense,
means a set of numerical figures or data. In the singular sense, it represents a method of study
and therefore, refers to statistical principles and methods developed for analysis and
interpretation of data. Different authors have defined statistics as follows:
“Statistics or statistical method may be defined as collection, presentation, analysis and
interpretation of numerical data.” — Croxton and Cowden
“Statistics is the science which deals with the methods of collecting, classifying, presenting,
comparing and interpreting numerical data.” — Seligman
Techniques of data description, presentation, analysis and interpretation are also called
statistics.

Fields of Statistics
Every student of statistics should know about the different branches of statistics in order to
understand the subject from a more holistic point of view. Often, the kind of job or work one is
involved in hides the other aspects of statistics, but it is very important to know the overall idea
behind statistical analysis to fully appreciate its importance and beauty. The two most
important fields or branches of statistics are:
Descriptive Statistics: It deals with the description of data, that is, how data are collected,
edited, arranged, presented and analyzed. The population of Afghanistan, the number of firms in
the manufacturing sector, and the number of beverage firms in Kabul, Herat, Mazar-e-Sharif and
Nangarhar together with their sales revenue are examples of descriptive statistics.
Inferential Statistics: As the name suggests, it involves drawing the right conclusions from the
statistical analysis that has been performed using descriptive statistics. For example, you might
be interested in the exam marks of all students in Dunya University. It may not be feasible to
measure the exam marks of all students in the whole of the University, so you have to measure a
smaller sample of students (e.g., 100 BBA students only), which is used to represent the larger
population of all Dunya University students. Inferential statistics are techniques that allow us
to use these samples to make generalizations about the populations from which the samples
were drawn.

Functions of Statistics
To comprehend the concepts involved in the applications of statistics, it would be better to
understand the functions of statistics.
(1) Collection: Since statistics is based on the study of data, data should be collected
very carefully. Insufficient data, misleading data, and unreliable data should not be the
basis of statistical analysis. Accurate data should be carefully observed and recorded.
(2) Organization: After the collection of data, they must be organized in such a manner
that further statistical analysis can be carried out. This includes editing of data. While
collecting data, inconsistencies or factual errors may occur. By editing, suitable
corrections may be incorporated. Data should also be classified and arranged in a manner
that allows the necessary analysis to be conducted. For example, students of Dunya University
may be classified based on gender: male and female.
(3) Presentation: Data can be presented in the form of graphs, diagrams or statistical
tables. Appropriate presentation is extremely important for the correct understanding of
data.
(4) Analysis: Analysis, in which various techniques of statistics are applied, is the most
important function of statistics. There are a large number of techniques through which data
may be analyzed, for example measures of central tendency and measures of dispersion.
(5) Interpretation: After the analysis of data, intelligent interpretation is undertaken. On the
basis of interpretation, businessmen, managers, politicians, leaders, and other users draw
conclusions for the purpose of decision-making.
Scope of Statistics
The history of statistics shows that in ancient times the scope of statistics was limited. Censuses
of population and wealth were conducted in those days to determine the manpower
and material wealth available for the purpose of waging wars. With the passage of time the scope
of statistics became wider and wider. Statistical methods began to be used in the physical
sciences and then in the social sciences.
“Today, there is hardly a phase of human activity which does not find statistical devices at
least occasionally useful. Economics, anthropology, psychology, agriculture, business and
education lean heavily upon statistics. The medical research worker often must rely upon
statistics to determine the significance of his results. It should of course be added that the
musician, the artist, the actor and the writer of fiction would rarely have occasion to use
statistics, but even here certain data on sales, box-office receipts and trends of popular taste
might be appropriate.”
Statistical methods are applied to the results of physical, chemical and biological experiments
and observations, as well as to the results obtained in social and economic investigations.


Variables:
A variable is a criterion of interest that is measured or observed on each individual. In other
words, a variable is a characteristic under study that assumes different values for different
elements, such as gender, age group, ethnicity and marital status.

Types of Variables

Variables are of two types: qualitative variables and quantitative variables.

Qualitative Variables:
A qualitative variable, also called a categorical variable, is a variable that is not numerical. In
other words, its values are expressed not in terms of numbers but by means of a natural
language description. It describes data that fit into categories. For example: province
(categories include Kabul, Herat, Nangarhar, etc.).
a) Nominal Variable: A variable which has no intrinsic ordering to its categories. For
example, gender is a categorical variable having two categories (male and female) with no
intrinsic ordering to the categories.
b) Ordinal Variable: A variable whose categories have a clear ordering. For example,
temperature recorded in three ordered categories (low, medium and high).
Quantitative Variables:
A quantitative variable is a variable which takes numerical values, i.e. it can be
represented in numbers. Arithmetic operations such as addition, subtraction,
multiplication or division can be performed on these variables. Examples: height, age, crop
yield, GPA, salary, temperature.
a) Discrete variables: These are obtained by counting, have gaps between successive values and
are restricted to whole numbers, e.g. number of children, number of students, etc.
b) Continuous variables: These are obtained by measuring and may take on any value in a
certain range or class, e.g. the height of a person, age, size and profit.
Basic Terms of Statistics:
In order to understand statistics in detail, we first need to look at some basic terms used in
statistics.
1. Population: In statistics, the term population describes the subjects of a particular
study: everything or everyone who is the subject of a statistical observation. For example,
all Afghan citizens who are currently registered to vote, or all students who study at Dunya
University.
2. Sample: In statistics, a sample refers to a set of observations drawn from a population. In
simple words, a sample is a smaller group of members of a population selected to represent the
population. For example, the registered voters selected to participate in a recent survey
concerning their intention to vote in the next election, or the students of Dunya University
pursuing a BBA.

Data and its Types:


“Data is the collection of facts and figures from which conclusions can be drawn”. Data can be
of two types:
1. Primary Data: Data collected by the decision maker for the first time for the present
investigation are referred to as original or primary data. For example, heights of students,
marks of students, etc.
2. Secondary Data: Data which were collected earlier for a purpose other than the present
investigation are known as secondary data. For example, Habib Gulzar Company’s balance
sheet data are primary data for the company but secondary data for other parties.
Sources of Data:
Primary Sources: Data collected from primary sources are known as primary data. These
sources are as follows.
1. Observation: It means observation of present behavior. It is a very old technique of data
collection. A time and motion study is a typical example of observation.
2. Interview: It is a very popular technique of data collection. Sometimes the interviewer
himself asks the questions and records the responses. This is called a personal interview.
3. Questionnaire: It refers to a set of printed questions along with space for answers. This
set is handed over to the individual, who is asked to record his/her responses to the questions.
Secondary Sources: Data collected from secondary sources are known as secondary data.
Following are the secondary sources of data.
1. International organizations: The United Nations, OPEC, SAARC and ASEAN are major
sources of data at the international level. The UN Statistical Yearbook, annual reports of the
IMF and WHO, and OECD country studies are prominent sources of data. These data are
available in the form of published books or on CDs.
2. Journals: Many journals in the fields of finance, commerce, domestic trade, international
trade and industry are published at regular intervals. Some of these are published by
universities and others by private companies. They publish research studies
that provide a huge and varied amount of secondary data.
3. Newspapers: Every day hundreds of newspapers are published in different countries.
Some of these newspapers specialize in economics and trade; others are not subject-specific.
Their data are widely used as secondary data.
Statistics as a Science or an Art:
We have seen above that statistics is a science. Now we shall examine whether it is an art or
not. We know that science is a body of systematized knowledge. How this knowledge is to be
used for solving a problem is the work of an art. In addition to this, art also helps in achieving
certain objectives and in identifying the merits and demerits of methods that could be used. Since
statistics possesses all these characteristics, it is reasonable to say that it is also an art.
Thus, we conclude that since statistical methods are systematic and have general applications,
statistics is a science. Further, since the successful application of these methods
depends, to a considerable degree, on the skill and experience of a statistician,
statistics is also an art.

Summary
 Statistics or statistical method may be defined as collection, presentation, analysis and
interpretation of numerical data.
 Descriptive Statistics deals with the description of data, that is, how data are collected,
edited, arranged, presented and analyzed.
 Inferential Statistics, as the name suggests, involves drawing the right conclusions from the
statistical analysis that has been performed using descriptive statistics.
 A qualitative variable is a variable that is not numerical; its values are expressed not in
terms of numbers but by means of a natural language description.
 A quantitative variable is a variable which can have some numerical value i.e. it can be
represented in numbers.
 Data collected by the decision maker for the first time for the present investigation are
referred to as an original or primary data.
 Data, which were collected earlier for the purpose other than the present investigation, are
known as secondary data.

Review Questions:
1. What do you understand by Statistics?
2. Explain the functions of Statistics.
3. With the help of an example, explain the fields of Statistics.
4. Explain the scope of Statistics.
5. Differentiate between a sample and a population.
6. Explain the types of variables.

Business Statistics Frequency Distribution

Chapter Two
Frequency Distribution
Learning Objectives
After studying this chapter, you will be able to:
 Explain the types of frequency distributions.
 Understand construction of frequency distribution.
 Understand relative frequency distribution.
 Elaborate cumulative frequency distribution.


Frequency Distribution
Frequency distribution refers to the arrangement of quantitative data, either individually or in
different groups, on the basis of their magnitude. In other words, it is an arrangement of an
ordered array in which each value or class is written along with its frequency.
The formation of a frequency distribution is essential for the proper analysis of data.
Types of frequency distribution
The arrangement of data is done in the form of three series:
 Individual Series
 Discrete Series
 Continuous Series

Individual Series: When the data set consists of only a few values, it is arranged in an ordered
array. An ordered array is the arrangement of data in either ascending or descending order. For
example, the amounts collected by a pathological laboratory from the first seven patients on a
certain day were AFS 340, 70, 180, 550, 490, 210 and 230. These are called raw data, and their
ordered array is 70, 180, 210, 230, 340, 490 and 550. This arrangement is known as an individual
series. In an individual series, the observations are generally few and non-repetitive.

Discrete Series: Another arrangement of data is known as a discrete series. It is made from
discrete data. Discrete data are data which jump by finite values; for example, the numbers of
jeeps sold to different candidates during the last elections are 15, 9, 24 or 11, but not 12.4 or
23.6. In a discrete series, values are generally repeated. The number of times a value is repeated
is called its frequency, denoted by f. Example of a discrete series:
No. of burgers sold (x)     1    2    3    4    5
No. of customers (f)       13   25   23   33   18

Continuous Series: A continuous series is formed from continuous variables. As opposed to a
discrete variable, a continuous variable can take any value in an interval. Measurements
like height, age, income, time, etc., are some examples of continuous variables. A continuous
series is an arrangement of data into classes, with the frequency mentioned against each class.
Since a continuous variable can take any value in a given interval, the
frequency distribution of a continuous variable is always a grouped frequency distribution.
Construction of frequency distribution:
1) Number of classes: The number of classes in a frequency distribution depends upon the
number of observations. This number should be neither too small nor too big. Often a
frequency distribution is made with between 5 and 20 classes. Ultimately, the personal judgement
of the statistician is the most important factor in deciding the number of classes. Sturges’
rule may be applied for deciding the number of classes. It is given below:
k = 1 + 3.322 log n
Where
k = number of classes
n = number of observations
log = logarithm to base 10
2) Class width or Interval: After determining the number of classes, one has to
determine their width. The approximate size of a class interval can be decided by the use of
the following formula:
Class interval = (Largest value − Smallest value) / Number of classes
A constant may be added to make this value convenient. Generally, the class interval may be
taken as 5 or 10 for all the classes. This is called the common class interval.
Illustration
The following are the daily wages of 30 employees working at AGC Ltd. Construct a frequency
distribution table.
12 18 27 31 40 42
14 20 27 32 40 51
14 20 27 32 40 56
14 21 29 34 40 60
16 23 31 36 40 65

Solution:
Following the steps for constructing a frequency distribution, let us solve the question step by step.
Step 1: n = 30
Step 2: Range = Largest value − Smallest value
Range = 65 − 12 = 53
Step 3: k = 1 + 3.322 log n
k = 1 + 3.322 (log 30)
k = 1 + 3.322 × 1.477
k = 1 + 4.91
k = 5.91 ≈ 6 classes
Step 4: Class interval
Class interval = (Largest value − Smallest value) / Number of classes
Class interval = 53 / 6
Class interval = 8.833 ≈ 9
Step 5: Frequency distribution table
Daily Wages (X) Number of employees (f)
12 - 21 8
21 - 30 6
30 - 39 6
39 - 48 6
48 - 57 2
57 - 66 2
Total 30
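
The steps of this illustration can also be traced in a few lines of Python. The following is only a minimal sketch: the variable names are chosen here for illustration, the class width is rounded up, and the classes are counted by the exclusive method (lower limit included, upper limit excluded), which is how the table above was tallied.

```python
import math

# Daily wages of the 30 employees from the illustration.
wages = [12, 18, 27, 31, 40, 42,
         14, 20, 27, 32, 40, 51,
         14, 20, 27, 32, 40, 56,
         14, 21, 29, 34, 40, 60,
         16, 23, 31, 36, 40, 65]

n = len(wages)                          # Step 1: number of observations
data_range = max(wages) - min(wages)    # Step 2: range

# Step 3: Sturges' rule, rounded to the nearest whole number of classes.
k = round(1 + 3.322 * math.log10(n))

# Step 4: class width, rounded up so that k classes cover the whole range.
width = math.ceil(data_range / k)

# Step 5: exclusive method, a value v belongs to class i when
# lower + i*width <= v < lower + (i+1)*width.
lower = min(wages)
frequencies = [0] * k
for v in wages:
    frequencies[(v - lower) // width] += 1

classes = [(lower + i * width, lower + (i + 1) * width) for i in range(k)]
for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo} - {hi}: {f}")
print("Total:", sum(frequencies))
```

Because the width is rounded up, the last class (57 - 66) extends slightly beyond the largest wage, so every observation falls into one of the k classes.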

3) Class Limits: Each class has two limits, known as the lower limit and the upper limit. For a
frequency distribution, it is necessary to designate these class limits
unambiguously, because the mid-value of a class is obtained by using these limits. The
class mid-point or mid-value of a class is calculated as follows:
Mid-point = (Lower limit + Upper limit) / 2
Illustration: Consider the above example;
Daily Wages (X) Mid-point (X)
12 - 21 (12+21)/2= 16.5
21 - 30 (21+30)/2= 25.5
30 - 39 (30+39)/2= 34.5
39 - 48 (39+48)/2= 43.5
48 - 57 (48+57)/2= 52.5
57 - 66 (57+66)/2= 61.5

Designation of Class Limits:


The designation of class limits for various class intervals can be done in two ways:
 Exclusive Method
 Inclusive Method
Exclusive Method: In this method the upper limit of a class is taken to be equal to the lower
limit of the following class. To keep the various class intervals mutually exclusive, the
observations with magnitude greater than or equal to the lower limit but less than the upper limit
of a class are included in it. For example, if the lower limit of a class is 10 and its upper limit
is 20, then this class, written as 10-20, includes all the observations which are greater than or
equal to 10 but less than 20. An observation with magnitude 20 will be included in the next
class.
Class Interval 10-20 20-30 30-40 40-50 Total
Frequency 5 2 4 3 14

Inclusive Method: Here all observations with magnitude greater than or equal to the lower
limit and less than or equal to the upper limit of a class are included in it.
Class Interval 10-19 20-29 30-39 40-49 Total
Frequency 5 2 4 3 14

Class Boundaries:
A class boundary is the dividing point between one class, or group, of numbers in a
distribution and the next class. For overlapping class intervals (exclusive method), the class
limits are also called class boundaries or actual class limits. In the case of non-overlapping
class intervals (inclusive method), the class limits are different from the class boundaries.
Illustration: Given below are the weights (in pounds) of 70 students.
61, 80, 91, 113, 100, 106, 109, 73, 88, 92, 101, 106, 107, 97, 93, 96, 102, 114, 87, 62, 74, 107,
109, 91, 72, 89, 94, 98, 112, 103, 101, 77, 92, 73, 67, 76, 84, 90, 118, 107, 108, 82, 78, 84, 77,
95, 111, 115, 104, 69, 106, 105, 63, 76, 85, 88, 96, 90, 95, 99, 83, 98, 88, 72, 75, 86, 82, 86,
93, 92.
Construct a frequency distribution when class intervals are inclusive, taking the lowest class as
60-69. Also construct class boundaries.
Solution:
Class Interval Frequency Class Boundaries
60-69 5 59.5-69.5
70-79 11 69.5-79.5
80-89 14 79.5-89.5
90-99 18 89.5-99.5
100-109 16 99.5-109.5
110-119 6 109.5-119.5
Total 70

To determine the class boundaries, we note that measured weights are approximated to the
nearest pound. Therefore, a measurement less than 69.5 is approximated as 69 and included in
the class interval 60 - 69. Similarly, a measurement greater than or equal to 69.5 is
approximated as 70 and is included in the class interval 70 - 79. Thus, the class boundaries are
obtained by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit of various
classes. These boundaries are shown in the last column of the above table.
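
As a check on this illustration, the counting and the boundary calculation can be sketched in Python. This is only an illustrative sketch with names chosen here; it assumes, as in the text, that the gap between the upper limit of one class and the lower limit of the next is the same for all classes, and it takes half of that gap as the adjustment.

```python
# Weights (in pounds) of the 70 students from the illustration.
weights = [
    61, 80, 91, 113, 100, 106, 109, 73, 88, 92, 101, 106, 107, 97, 93, 96,
    102, 114, 87, 62, 74, 107, 109, 91, 72, 89, 94, 98, 112, 103, 101, 77,
    92, 73, 67, 76, 84, 90, 118, 107, 108, 82, 78, 84, 77, 95, 111, 115,
    104, 69, 106, 105, 63, 76, 85, 88, 96, 90, 95, 99, 83, 98, 88, 72, 75,
    86, 82, 86, 93, 92,
]

# Inclusive class limits, starting from the lowest class 60-69.
class_limits = [(60, 69), (70, 79), (80, 89), (90, 99), (100, 109), (110, 119)]

# Inclusive method: both the lower and the upper limit belong to the class.
frequencies = [sum(lo <= w <= hi for w in weights) for lo, hi in class_limits]

# Half of the gap between consecutive classes (here (70 - 69) / 2 = 0.5).
half_gap = (class_limits[1][0] - class_limits[0][1]) / 2

# Class boundaries: lower limit minus half the gap, upper limit plus half the gap.
boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in class_limits]

for (lo, hi), f, (b_lo, b_hi) in zip(class_limits, frequencies, boundaries):
    print(f"{lo}-{hi}: f = {f}, boundaries {b_lo}-{b_hi}")
```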
Relative or Percentage Frequency Distribution:
If instead of frequencies of various classes their relative or percentage frequencies are written,
we get a relative or percentage frequency distribution.
Relative frequency of a class = Frequency of the class / Total frequency
Percentage frequency of a class = Relative frequency × 100
These frequencies are shown in the following table.
Class Interval    Frequency    Relative Frequency    Percentage Frequency
10-20             6            6/50 = 0.12           0.12 × 100 = 12%
20-30             18           18/50 = 0.36          0.36 × 100 = 36%
30-40             11           11/50 = 0.22          0.22 × 100 = 22%
40-50             11           11/50 = 0.22          0.22 × 100 = 22%
50-60             3            3/50 = 0.06           0.06 × 100 = 6%
60-70             1            1/50 = 0.02           0.02 × 100 = 2%
Total             50           1                     100%
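
The same table can be reproduced with a short Python sketch. The variable names are illustrative only; the frequencies are those given in the table.

```python
classes = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70"]
frequencies = [6, 18, 11, 11, 3, 1]

total = sum(frequencies)

# Relative frequency = class frequency / total frequency.
relative = [f / total for f in frequencies]

# Percentage frequency = relative frequency * 100
# (computed as 100 * f / total to limit floating-point rounding).
percentage = [100 * f / total for f in frequencies]

for c, f, r, p in zip(classes, frequencies, relative, percentage):
    print(f"{c}: f = {f}, relative = {r:.2f}, percentage = {p:.0f}%")
```

The relative frequencies always add up to 1 and the percentage frequencies to 100, which is a quick way to check such a table.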
Cumulative Frequency Distribution:
The total frequency of all classes less than the upper class boundary of a given class is called
the cumulative frequency of that class. “A table showing the cumulative frequencies is called
a cumulative frequency distribution”. There are two types of cumulative frequency
distributions.
Less than cumulative frequency distribution: It is obtained by successively adding the
frequencies of all the previous classes, including the class against which it is written. The
cumulation starts from the lowest class and proceeds to the highest.
More than cumulative frequency distribution: It is obtained by finding the cumulative total
of frequencies starting from the highest class and proceeding to the lowest.
Illustration:
The following table shows the marks of 23 students in statistics. Calculate the less than and
more than cumulative frequency distributions.
Marks 30-40 40-50 50-60 60-70 70-80
No of Students 4 5 6 5 3


Solution:
Less than cumulative frequency distribution
Marks of Students Cumulative Frequency
Less Than 40 4
Less Than 50 9
Less Than 60 15
Less Than 70 20
Less Than 80 23

More than cumulative frequency distribution


Marks of Students Cumulative Frequency
More Than 30 23
More Than 40 19
More Than 50 14
More Than 60 8
More Than 70 3
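
Both cumulative distributions are running totals, so they can be sketched in Python with `itertools.accumulate` from the standard library. The names below are chosen for illustration.

```python
from itertools import accumulate

classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [4, 5, 6, 5, 3]

# Less than type: running total from the lowest class upwards,
# written against the upper limit of each class.
less_than = list(accumulate(frequencies))

# More than type: running total from the highest class downwards,
# written against the lower limit of each class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for (lo, hi), lt, mt in zip(classes, less_than, more_than):
    print(f"Less than {hi}: {lt}    More than {lo}: {mt}")
```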

Frequency Density:
Frequency density in a class is defined as the number of observations per unit of its width.
Frequency density gives the rate of concentration of observations in a class.
Frequency density = Frequency of the class / Class interval (width)
Illustration:
Table showing Frequency Density of Various Classes:
Class Interval Frequency Frequency Density
10-20 3 0.3
20-30 4 0.4
30-40 5 0.5
40-50 6 0.6
50-60 2 0.2
60-70 6 0.6
70-90 8 0.4
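
The densities in this table can be verified with a small Python sketch (names are illustrative). Note that the last class, 70-90, is twice as wide as the others, which is why its density is 0.4 although its frequency is the largest.

```python
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 90)]
frequencies = [3, 4, 5, 6, 2, 6, 8]

# Frequency density = class frequency / class width.
densities = [f / (hi - lo) for (lo, hi), f in zip(classes, frequencies)]

for (lo, hi), f, d in zip(classes, frequencies, densities):
    print(f"{lo}-{hi}: f = {f}, width = {hi - lo}, density = {d:.1f}")
```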

Summary
 Frequency distribution is an arrangement of ordered array in a manner so that data are
written along with their frequency/ frequencies.
 Each class has two limits – known as the lower limit and upper limit.
 Class boundaries is a concept in statistics that refers to the boundary between one class, or
group, of numbers in a distribution, and the next class.
 If instead of frequencies of various classes their relative or percentage frequencies are
written, we get a relative or percentage frequency distribution.
 The total frequency of all classes less than the upper class boundary of a given class is
called the cumulative frequency of that class.

Review Questions:
1. Explain in detail the types of frequency distribution.
2. With the help of an example, explain the steps in constructing a frequency distribution.
3. What do you understand by cumulative frequency distribution?
4. Discuss the methods for the designation of class limits.
5. What do you mean by relative frequency distribution?

Business Statistics Graphical Representation

Chapter Three
Graphical Representation
Learning Objectives:
After studying this chapter, you will be able to:
 Understand the meaning and concept of Graphical Representation.
 Understand the ways of presenting frequency distribution.
 Observe trends of data through graphical representation.


Graphical Representation
The graphical presentation is a simple and effective method of presenting the information
contained in statistical data. The construction of a diagram is an art, which can be acquired
only through practice. However, the following rules should be observed in the construction of
diagrams to make them more effective and useful.
1) Appropriate title and footnote: Every diagram must have a suitable title written at its top.
The title should be able to convey the subject matter in brief and unambiguous manner. The
details about the title, if necessary, should be provided below the diagram in the form of a
footnote.
2) Attractive presentation: A diagram should be constructed in such a way that it has an
immediate impact on the viewer. It should be neatly drawn and an appropriate proportion
should be maintained between its length and breadth. The size of the diagram should neither
be too big nor too small. Different aspects of the problem may be emphasized by using
various shades or colors.
3) Accuracy: Diagrams should be drawn accurately by using proper scales of measurement.
Accuracy should not be compromised for the sake of attractiveness.
4) Selection of an appropriate diagram: There are various types of geometrical figures and
pictures which can be used to present statistical data.
5) Index: When a diagram depicts various characteristics distinguished by various shades
and colors, an index explaining these should be given for clear identification and
understanding.
6) Source-Note: As in case of tabular presentation, the source of data must also be indicated
if the data have been acquired from some secondary source.
7) Simplicity: As far as possible, the constructed diagram should be simple so that even a
layman can understand it without any difficulty.
Graphical Representation of Frequency Distribution:
Frequency distribution can be presented graphically through the following diagrams.
 Histogram
 Frequency Polygon
 Frequency Curve
 'Ogive' or Cumulative Frequency Curve
 Pie Chart
 Simple Bar Diagram
 Multi Bar Diagram


Histogram: A histogram is a graph of a frequency distribution in which the class intervals are
plotted on the X-axis and their respective frequencies on the Y-axis. On each class, a rectangle is
erected with its height proportional to the frequency density of the class.
When all the classes are of equal width, the height of each rectangle is taken to be equal to the
frequency of the corresponding class. The construction of such a histogram is illustrated by the
following example.
Illustration:
The frequency distribution of marks obtained by 60 students of a class in a college is given
below.
Marks 30-35 35-40 40-45 45-50
No of Students 4 6 5 8

Frequency Polygon: A frequency polygon is another method of representing a frequency
distribution on a graph. Frequency polygons are more suitable than histograms whenever two
or more frequency distributions are to be compared.
A frequency polygon is drawn by joining the mid-points of the tops of adjacent
rectangles of the histogram of the data with straight lines. Two hypothetical class intervals,
one at the beginning and the other at the end, are created. The ends of the polygon are extended
down to the base line by joining them with the mid-points of the hypothetical classes. This step is
necessary for making the area under the polygon approximately equal to the area under the
histogram. A frequency polygon can also be constructed without drawing rectangles. The points
of the frequency polygon are obtained by plotting the mid-points of classes against the heights of
the various rectangles, which will be equal to the frequencies if all the classes are of equal width.
Illustration:
The daily profits (in AFS) of 100 shops are distributed as follows.
Profit 0-100 100-200 200-300 300-400 400-500 500-600
No. Shops 12 18 27 20 17 6

Solution

Illustration:
Represent the following data by a frequency polygon.
Class 5-15 15-25 25-35 35-45 45-55 55-65 65-75
f 10 16 18 15 12 5 2
Solution: Here the frequency polygon is drawn by plotting mid-points of class intervals against
their respective frequencies.
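
The points to be plotted for this polygon can be listed with a short Python sketch. It is only an illustration with names chosen here; it assumes equal class widths and adds the two hypothetical classes of zero frequency described earlier, so that the polygon closes on the base line. The resulting points can then be drawn by hand or with any plotting tool.

```python
classes = [(5, 15), (15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
frequencies = [10, 16, 18, 15, 12, 5, 2]

width = classes[0][1] - classes[0][0]   # common class width (10)

# Mid-point of each class = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Vertices of the polygon: (mid-point, frequency) for every class.
points = list(zip(midpoints, frequencies))

# Hypothetical classes before the first and after the last class,
# each with frequency 0, bring the polygon down to the base line.
points = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]

for x, f in points:
    print(f"mid-point {x}: frequency {f}")
```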

Frequency curve: When the vertices of a frequency polygon are joined by a smooth curve, the
resulting figure is known as a frequency curve. As the number of observations increases, there
is a need for more and more classes to accommodate them, and hence the width of each
class becomes smaller and smaller. In such a situation the variable under consideration
tends to become continuous and the frequency polygon of the data tends to acquire the shape of
a frequency curve. Thus, a frequency curve may be regarded as a limiting form of the frequency
polygon as the number of observations becomes large. The construction of a frequency curve
should be done very carefully, avoiding sharp and sudden turns as far as possible.
Smoothing should be done so that the area under the curve is approximately equal to the area
under the histogram. A frequency curve can be used for estimating the rate of increase
or decrease of the frequency at a given point. It can also be used to determine the frequency
of a value (or of values in an interval) of the variable. This method of determining frequencies
is popularly known as the interpolation method.

Cumulative Frequency Curve or Ogive: The curve obtained by representing a cumulative
frequency distribution on a graph is known as a cumulative frequency curve or ogive. Since a
cumulative frequency distribution can be of the ‘less than’ or the ‘more than’ type, there are
accordingly two types of ogive: the ‘less than ogive’ and the ‘more than ogive’.
An ogive is used to determine certain positional averages like the median, quartiles, deciles,
percentiles, etc. We can also determine the percentage of cases lying between certain limits.
Various frequency distributions can be compared on the basis of their ogives.
Illustration
Draw ‘less than’ and ‘more than’ ogives for the following distribution of monthly income
of 250 families of a certain locality.

Income            0-500       500-1000    1000-1500   1500-2000
No of Families    50          80          40          25
Income            2000-2500   2500-3000   3000-3500   3500-4000
No of Families    25          15          10          5
Solution: First we construct ‘less than’ and ‘more than’ type cumulative frequency
distributions.

Income (Less Than) Cumulative Frequency Income (More Than) Cumulative Frequency
500 50 0 250
1000 130 500 200
1500 170 1000 120
2000 195 1500 80
2500 220 2000 55
3000 235 2500 30
3500 245 3000 15
4000 250 3500 5

We note that the two ogives intersect at the median.
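The two cumulative columns above can be reproduced with running totals. The following is a minimal sketch, assuming the class frequencies implied by the cumulative table (50, 80, 40, 25, 25, 15, 10, 5); the variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies implied by the cumulative frequency table above.
upper_limits = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]
lower_limits = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]
frequencies = [50, 80, 40, 25, 25, 15, 10, 5]

# 'Less than' cumulative frequencies: running total from the first class.
less_than_cf = list(accumulate(frequencies))

# 'More than' cumulative frequencies: running total from the last class,
# then reversed back into ascending class order.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for upper, lt, lower, mt in zip(upper_limits, less_than_cf, lower_limits, more_than_cf):
    print(f"less than {upper}: {lt:3d}    more than {lower}: {mt:3d}")
```

Plotting `less_than_cf` against the upper limits and `more_than_cf` against the lower limits gives the two ogives.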



Pie Chart: In a pie chart, a circle (pie) is drawn to represent the data. In order to show the
proportions of various components, the circle is partitioned into sectors in a similar manner
as in component bar diagrams. Since there are 360° at the center of a circle, these are divided
in proportion to the magnitude of the values of the different items. The diagram thus obtained
is known as an Angular Sector Diagram or, more popularly, as a Pie Diagram. The construction of
a pie diagram is explained by the following example.
Illustration:
Show the following data of expenditure of an average working class family by a suitable
diagram.
Item of Expenditure Percent of Total Expenditure
Food 65
Clothing 10
Housing 12
Fuel and Lighting 5
Miscellaneous 8

Solution:
The angles of different sectors are calculated as shown below:
Item of Expenditure Angle
Food 65/100*360 = 234°
Clothing 10/100*360 = 36°
Housing 12/100*360 = 43.2°
Fuel and Lighting 5/100*360 = 18°
Miscellaneous 8/100*360 = 28.8°
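The same angle calculation can be sketched in a few lines of Python. This is only an illustration; the function name `pie_angles` is an assumption, not part of the handout.

```python
# Percentage shares from the expenditure illustration.
shares = {
    "Food": 65,
    "Clothing": 10,
    "Housing": 12,
    "Fuel and Lighting": 5,
    "Miscellaneous": 8,
}

def pie_angles(percent_shares):
    """Convert shares into sector angles (degrees) that add up to 360."""
    total = sum(percent_shares.values())
    return {item: value / total * 360 for item, value in percent_shares.items()}

angles = pie_angles(shares)
for item, angle in angles.items():
    print(f"{item}: {angle:.1f} degrees")
```

Dividing by the total rather than by 100 means the sketch also works when the data are given as raw amounts instead of percentages.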

Simple Bar Diagram: In a simple bar diagram, vertical or horizontal bars, with height
proportional to the value of the item, are constructed. The width of a bar is chosen
arbitrarily and is kept constant for every bar. The bars are drawn so that the gap between
successive bars is the same. Bar diagrams are particularly suitable for presenting individual
series, such as time and spatial series. Through a simple bar diagram one can show quantitative
data as well as qualitative attributes.
Illustration:
Represent the following data by a suitable diagram.
Years 2006 2007 2008 2009 2010
Enrolments 7300 9400 12100 14600 16700

Solution:

Multiple Bar Diagram: This type of diagram, also known as a compound bar diagram, is used
when comparisons are to be shown between two or more sets of data. A set of bars for a period
or a related phenomenon is drawn side by side without gaps, while the various sets of bars are
separated by some arbitrarily chosen constant gap. Different bars are distinguished by
different shades or colors. In order that the various bars are comparable, it is necessary to
draw them on the same scale.
Illustration:
The following table gives the figures of trade between, say, Afghanistan and India during 2007
to 2010. The figures of Afghanistan’s exports and imports are in $ billion.
Year 2007 2008 2009 2010
Exports 2.529 2.952 3.314 3.191
Imports 1.460 2.484 2.463 2.486

Solution:


Summary:
 Graphical presentation is a simple and effective method of presenting the information
contained in statistical data.
 A histogram is a graph of a frequency distribution in which the class intervals are plotted
on the X-axis and their respective frequencies on the Y-axis.
 A frequency polygon is drawn by joining, with straight lines, the mid-points of the tops of
adjacent rectangles of the histogram of the data.
 The curve obtained by representing a cumulative frequency distribution on a graph is
known as a cumulative frequency curve or ogive.
 In a pie chart, a circle (pie) is divided into sectors proportional to the values represented.

Review Questions:

1. What do you understand by graphical representation?
2. Explain in detail the rules for effective diagrammatic representation.
3. With the help of an example, explain the frequency polygon.

Business Statistics Central Tendency

Chapter Four
Measure of Central Tendency
Learning Objectives:
After studying this chapter, you will be able to:
 Understand the concept of central tendency.
 Discuss the meaning of the mean.
 Explain the concepts of mode and median.
 Discuss the meaning of quartiles.


Measure of Central Tendency


A measure of central tendency is a summary statistic that represents the center point or typical
value of a dataset. These measures indicate where most values in a distribution fall and are also
referred to as the central location of a distribution. You can think of it as the tendency of data
to cluster around a middle value. In statistics, the three most common measures of central
tendency are the arithmetic mean, the median and the mode. Quartiles, which are positional
measures closely related to the median, are also discussed in this chapter:
 Arithmetic Mean
 Median
 Mode
 Quartile
Each of these measures locates a central or positional value using a different method.
Arithmetic Mean:
Arithmetic Mean is defined as the sum of observations divided by the number of observations.
Calculating Mean for Individual Series:
Let there be n observations X1, X2, ..., Xn. Their arithmetic mean can be calculated as follows.
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
Illustration:
The following figures relate to the monthly output of cloth (in '000 meters) of a factory in six
months:
Months Jan Feb Mar Apr May Jun
Outputs 80 88 92 84 96 92
Solution:
\[ \bar{X} = \frac{80 + 88 + 92 + 84 + 96 + 92}{6} = \frac{532}{6} = 88.67 \text{ ('000 meters)} \]
Calculating Mean for an Ungrouped Frequency Distribution:
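If the values X1, X2, ..., Xn occur with frequencies f1, f2, ..., fn respectively, the arithmetic
mean can be calculated as follows.
\[ \bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \]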
Illustration:
The following is the frequency distribution of age of 670 students of a school. Compute the
arithmetic mean of the data.
Age 5 6 7 8 9 10 11 12 13 14
f 25 45 90 165 112 96 81 26 18 12
Solution:
X (in years) f fX
5 25 5*25=125
6 45 6*45=270
7 90 7*90=630
8 165 8*165=1320
9 112 9*112=1008
10 96 10*96=960
11 81 11*81=891
12 26 12*26=312
13 18 13*18=234
14 12 14*12=168
Total ∑f = 670 ∑fX = 5918

\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{5918}{670} = 8.83 \text{ years} \]
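The frequency-weighted mean can be checked with a short script. This is a minimal sketch; the helper name `frequency_mean` is an assumption introduced for illustration.

```python
def frequency_mean(values, frequencies):
    """Arithmetic mean of a frequency distribution: sum(f * X) / sum(f)."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(f * x for x, f in zip(values, frequencies))
    return weighted_sum / total_frequency

# Age distribution of the 670 students from the illustration.
ages = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
students = [25, 45, 90, 165, 112, 96, 81, 26, 18, 12]

mean_age = frequency_mean(ages, students)
print(round(mean_age, 2))
```

The same function applies to a grouped distribution if the class mid-points are passed as `values`.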

Calculating Mean for a Grouped Frequency Distribution:


In a grouped frequency distribution, there are classes along with their respective frequencies.
The mid-point of each class is taken as the value X that represents the class. To understand
the calculation, let’s consider the following illustration.
Illustration:
X 0-10 10-20 20-30 30-40 40-50 50-60 60-70
f 3 8 12 15 18 16 11
Solution:
Class Mid-Point (X) f f*X
0-10 5 3 15
10-20 15 8 120
20-30 25 12 300
30-40 35 15 525
40-50 45 18 810
50-60 55 16 880
60-70 65 11 715
Total ∑f = 83 ∑fX = 3365

\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{3365}{83} = 40.54 \]

Median:


Median of a distribution is that value of the variate which divides it into two equal parts. In
terms of the frequency curve, the ordinate drawn at the median divides the area under the curve
into two equal parts. Median is a positional average because its value depends upon the position
of an item and not on its magnitude.
When Individual Observations are given:
The following steps are involved in the determination of median:
1. The given observations are arranged in either ascending or descending order of magnitude.
2. Given that there are n observations, the median is given by:
a) The size of the \( \frac{n+1}{2} \)th observation, when n is odd.
b) The mean of the sizes of the \( \frac{n}{2} \)th and \( \left( \frac{n}{2} + 1 \right) \)th observations, when n is even.
Illustration:
Find median of the following observations:
20, 15, 25, 28, 18, 16, 30.
Solution:
Writing the observations in ascending order, we get 15, 16, 18, 20, 25, 28, 30.
Since n = 7, i.e., odd, the median is the size of the \( \frac{7+1}{2} \)th, i.e., 4th observation.
Hence the median, denoted by Md, is 20.

Illustration:
Find median of the data: 245, 230, 265, 236, 220, 250.
Solution:
Arranging these observations in ascending order of magnitude, we get 220, 230, 236, 245, 250,
265. Here n = 6, i.e., even.
The median will be the arithmetic mean of the sizes of the \( \frac{6}{2} \)th, i.e., 3rd, and the
\( \left( \frac{6}{2} + 1 \right) \)th, i.e., 4th observations.
Hence \( Md = \frac{236 + 245}{2} = 240.5 \)
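The two rules (odd and even n) can be sketched as a single function. This is an illustrative implementation only; the function name is an assumption.

```python
def median(observations):
    """Median of individual observations using the odd/even position rules."""
    ordered = sorted(observations)      # step 1: arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th observation (index `middle`, 0-based)
        return ordered[middle]
    # even n: mean of the (n / 2)th and (n / 2 + 1)th observations
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([20, 15, 25, 28, 18, 16, 30]))     # 20
print(median([245, 230, 265, 236, 220, 250]))   # 240.5
```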
When ungrouped frequency distribution is given:
In this case, the data are already arranged in the order of magnitude. Here, cumulative
frequency is computed and the median is determined in a manner similar to that of individual
observations.
Illustration:
Locate median of the following frequency distribution:


Variable (X) 10 11 12 13 14 15 16
Frequency (f) 8 15 25 20 12 10 5

Solution:
(X) 10 11 12 13 14 15 16
(f) 8 15 25 20 12 10 5
(c.f.) 8 23 48 68 80 90 95

Here N = 95, which is odd. Thus, the median is the size of the \( \frac{95+1}{2} \)th, i.e., 48th
observation. From the table, the 48th observation is 12. Hence Md = 12.

Illustration:
Locate median of the following frequency distribution:
X 0 1 2 3 4 5 6 7
f 7 14 18 36 51 54 52 20
Solution:
X 0 1 2 3 4 5 6 7
f 7 14 18 36 51 54 52 20
c.f. 7 21 39 75 126 180 232 252

Here N = 252, i.e., even.

The median is the mean of the sizes of the 126th and 127th observations. From the table we note
that the 126th observation is 4 and the 127th observation is 5. Hence Md = (4 + 5)/2 = 4.5.
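Locating an observation through the cumulative frequency column can be sketched as follows. The helper names are assumptions; the search simply finds the first cumulative frequency that reaches the required position.

```python
from bisect import bisect_left
from itertools import accumulate

def value_at_position(values, frequencies, position):
    """Value of the observation at a 1-based position in the ordered data."""
    cumulative = list(accumulate(frequencies))
    # First class whose cumulative frequency is >= position.
    return values[bisect_left(cumulative, position)]

def frequency_median(values, frequencies):
    """Median of an ungrouped frequency distribution."""
    total = sum(frequencies)
    if total % 2 == 1:
        return value_at_position(values, frequencies, (total + 1) // 2)
    lower = value_at_position(values, frequencies, total // 2)
    upper = value_at_position(values, frequencies, total // 2 + 1)
    return (lower + upper) / 2

print(frequency_median([10, 11, 12, 13, 14, 15, 16], [8, 15, 25, 20, 12, 10, 5]))
print(frequency_median([0, 1, 2, 3, 4, 5, 6, 7], [7, 14, 18, 36, 51, 54, 52, 20]))
```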

When grouped frequency distribution is given:


The determination of median, in this case, will be explained with the help of the following
example.
Illustration:
The following table shows the daily sales of 230 footpath sellers.
Sales 0-500 500-1000 1000-1500 1500-2000
Sellers 12 18 35 42
Sales 2000-2500 2500-3000 3000-3500 3500-4000
Sellers 50 45 20 8

Locate the median of the above data using both types of ogives.
Solution: To draw ogives, we need to have a cumulative frequency distribution.
Class Interval Frequency Less Than c.f. More Than c.f.
0-500 12 12 230
500-1000 18 30 218
1000-1500 35 65 200
1500-2000 42 107 165
2000-2500 50 157 123
2500-3000 45 202 73
3000-3500 20 222 28
3500-4000 8 230 8
Total 230

Illustration: Taking the above example, let’s solve the problem through the formula. The
following is the formula for calculating the median of a grouped frequency distribution.
\[ \text{Median} = L + \left( \frac{\frac{N}{2} - cf}{f} \times CI \right) \]
L = lower limit of the median class
N = total number of observations
cf = cumulative frequency of the class preceding the median class
f = frequency of the median class
CI = class interval (width) of the median class
The median class is the class that contains the \( \frac{N}{2} \)th observation:
\[ \frac{N}{2} = \frac{230}{2} = 115 \]
The median class is 2000-2500.
\[ \text{Median} = 2000 + \left( \frac{115 - 107}{50} \times 500 \right) \]
\[ \text{Median} = 2000 + (0.16 \times 500) \]
\[ \text{Median} = 2000 + 80 \]
\[ \text{Median} = 2080 \]
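The interpolation formula can be sketched in code as below. This is a minimal sketch assuming equal class widths; the function name `grouped_median` is illustrative.

```python
def grouped_median(lower_limits, class_width, frequencies):
    """Interpolated median: L + ((N/2 - cf) / f) * CI, for equal-width classes."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the distribution has no observations")
    target = total / 2          # position N/2
    cumulative_before = 0       # cf: cumulative frequency before the current class
    for lower, frequency in zip(lower_limits, frequencies):
        if cumulative_before + frequency >= target:
            # This is the median class.
            return lower + (target - cumulative_before) / frequency * class_width
        cumulative_before += frequency

# Daily sales of the 230 footpath sellers.
lower_limits = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]
sellers = [12, 18, 35, 42, 50, 45, 20, 8]
sales_median = grouped_median(lower_limits, 500, sellers)
print(round(sales_median, 2))
```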
Mode:
Mode is that value of the variate which occurs the maximum number of times in a distribution and
around which other items are densely distributed. In the words of Croxton and Cowden, “The
mode of a distribution is the value at the point around which the items tend to be most heavily
concentrated. It may be regarded as the most typical of a series of values.” Further, according
to A.M. Tuttle, “Mode is the value which has the greatest frequency density in its immediate
neighborhood.”
If the frequency distribution is regular, then the mode is determined by the value corresponding
to the maximum frequency. There may be a situation where the concentration of observations
around the value having maximum frequency is less than the concentration of observations around
some other value. In such a situation, the mode cannot be determined by the maximum frequency
criterion. Further, there may be concentration of observations around more than one value of
the variable and, accordingly, the distribution is said to be bimodal or multimodal depending
upon whether it is around two or more than two values.
The concept of mode, as a measure of central tendency, is preferable to mean and median when
it is desired to know the most typical value, e.g., the most common size of shoes, the
most common size of a ready-made garment, the most common size of income, the most
common size of pocket expenditure of a college student, the most common size of a family in
a locality, the most common duration of cure of viral-fever, the most popular candidate in an
election, etc.
When Individual Observations are given:
For a given data set, there can be one or more than one mode. As long as those elements all
have the same frequency and that frequency is the highest, they are all the modal elements of
the data set.
Illustration:
Find the Mode of the following data set.
3, 12, 15, 3, 15, 8, 20, 19, 3, 15, 12, 19, 9
Solution:
Mode = 3 and 15
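Python's standard library can list every value that shares the highest frequency. The sketch below uses `statistics.multimode` (available from Python 3.8) on the data of the illustration.

```python
from statistics import multimode

data = [3, 12, 15, 3, 15, 8, 20, 19, 3, 15, 12, 19, 9]

# multimode returns all values with the highest frequency,
# in the order in which they first appear in the data.
modes = multimode(data)
print(modes)  # [3, 15]
```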
When Ungrouped Frequency Distribution is given:
In order to find the mode of an ungrouped frequency distribution, we look at the frequency of
each value in the given data set and choose the value having the highest frequency as the mode.
Illustration:
The following table shows the marks of students in statistics. Find the mode of this data set.
X 60 61 62 63 64 65 66 67 71
f 2 4 6 7 12 18 20 7 1

Solution:
Mode = 66
The maximum frequency is 20, and it corresponds to the value X = 66.
When Grouped Frequency Distribution is given:
We have defined mode as the element which has the highest frequency in a given data set. In
grouped data, we can find two kinds of mode: the modal class, i.e., the class with the highest
frequency, and the mode itself, which we calculate from the modal class using the formula
below.
\[ \text{Mode} = L + \left( \frac{f_1}{f_1 + f_2} \times CI \right) \]
L = lower limit of the modal class
f1 = frequency of the class succeeding the modal class
f2 = frequency of the class preceding the modal class
CI = class interval
Illustration:
From the following data set, find the modal class and calculate the mode of the frequency
distribution.
Class Interval 0-10 10-20 20-30 30-40 40-50
f 4 5 8 3 2

Solution:
The modal class is 20-30, since it has the highest frequency (8). Here f1 = 3 (class 30-40) and
f2 = 5 (class 10-20).
\[ \text{Mode} = 20 + \left( \frac{3}{3 + 5} \times 10 \right) \]
\[ \text{Mode} = 20 + (0.375 \times 10) \]
\[ \text{Mode} = 20 + 3.75 \]
\[ \text{Mode} = 23.75 \]
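The formula can be sketched in code by following the definitions of f1 (succeeding class) and f2 (preceding class) exactly as stated with the formula. The function name is illustrative, and treating a missing neighbouring class as frequency 0 is an assumption of this sketch, not something the text specifies.

```python
def grouped_mode(lower_limits, class_width, frequencies):
    """Mode = L + f1 / (f1 + f2) * CI, with f1 succeeding and f2 preceding."""
    modal_index = max(range(len(frequencies)), key=frequencies.__getitem__)
    last = len(frequencies) - 1
    # Assumption: a missing neighbour (first or last class) counts as 0.
    succeeding = frequencies[modal_index + 1] if modal_index < last else 0
    preceding = frequencies[modal_index - 1] if modal_index > 0 else 0
    if succeeding + preceding == 0:
        # Assumption: fall back to the mid-point of the modal class.
        return lower_limits[modal_index] + class_width / 2
    return lower_limits[modal_index] + succeeding / (succeeding + preceding) * class_width

mode_value = grouped_mode([0, 10, 20, 30, 40], 10, [4, 5, 8, 3, 2])
print(mode_value)
```

With f1 = 3 and f2 = 5 the interpolation term is 3/8 of the class interval, so the mode is pulled towards the side with the larger neighbouring frequency.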

Quartile:


The values of a variable that divide a distribution into four equal parts are called quartiles.
Since three values are needed to divide a distribution into four parts, there are three quartiles,
viz. Q1, Q2 and Q3, known as the first, second and the third quartile respectively.
For a discrete distribution, the first quartile (Q1) is defined as that value of the variate such that
at least 25% of the observations are less than or equal to it and at least 75% of the observations
are greater than or equal to it.
For a continuous or grouped frequency distribution, Q1 is that value of the variate such that the
area under the histogram to the left of the ordinate at Q1 is 25% and the area to its right is 75%.
When Individual Observations are given:
When there are n items arranged in order of magnitude, the quartiles are given by:
\[ Q_1 = \frac{n+1}{4}\text{th item} \]
\[ Q_2 = \frac{2(n+1)}{4}\text{th item} \]
\[ Q_3 = \frac{3(n+1)}{4}\text{th item} \]
Illustration:
Find the quartiles of the following data: 3, 5, 6, 7, 9, 22, and 33.
Solution:
Here the numbers are arranged in increasing order and n = 7.
\[ Q_1 = \frac{7+1}{4}\text{th item} = 2\text{nd item} = 5 \]
\[ Q_2 = \frac{2(7+1)}{4}\text{th item} = 4\text{th item} = 7 \]
\[ Q_3 = \frac{3(7+1)}{4}\text{th item} = 6\text{th item} = 22 \]
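The standard library function `statistics.quantiles` (Python 3.8+) places the k-th quartile at position k(n + 1)/4 when called with `method="exclusive"`, which matches the rule above. When the position is fractional it interpolates linearly between neighbouring items, which goes beyond what the text states, so treat the sketch below as one possible convention.

```python
from statistics import quantiles

data = [3, 5, 6, 7, 9, 22, 33]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 5.0 7.0 22.0
```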
When Ungrouped Frequency Distribution is given:
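In the case of a discrete series or ungrouped frequency distribution, we first find the
cumulative frequency. The last cumulative frequency will be N. Each quartile is the value of
the variable whose cumulative frequency first reaches or exceeds the required position.
\[ Q_1 = \frac{N+1}{4}\text{th item} \]
\[ Q_2 = \frac{2(N+1)}{4}\text{th item} \]
\[ Q_3 = \frac{3(N+1)}{4}\text{th item} \]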
Illustration: Find the Quartiles of the following marks.
Marks 2 4 6 8 10
No. of Students 5 4 3 2 4

Solution:
Marks 2 4 6 8 10
No. of Students 5 4 3 2 4
c.f. 5 9 12 14 18
Here, N = 18.
\[ Q_1 = \frac{18+1}{4}\text{th item} = 4.75\text{th item} = 2 \]
\[ Q_2 = \frac{2(18+1)}{4}\text{th item} = 9.5\text{th item} = 6 \]
\[ Q_3 = \frac{3(18+1)}{4}\text{th item} = 14.25\text{th item} = 10 \]
When Grouped Frequency Distribution is given:
In the case of a continuous series or grouped frequency distribution, we find the cumulative
frequency first and then use the interpolation formula.
\[ Q_1 = L + \left( \frac{\frac{N}{4} - cf}{f} \times CI \right) \]
\[ Q_2 = L + \left( \frac{\frac{2N}{4} - cf}{f} \times CI \right) \]
\[ Q_3 = L + \left( \frac{\frac{3N}{4} - cf}{f} \times CI \right) \]
Where,
L = lower limit of the Q1, Q2 and Q3 classes respectively,
cf = cumulative frequency of the class just preceding the corresponding class,
f = frequency of the Q1, Q2 and Q3 classes respectively, and
CI = class interval of the corresponding class.
Illustration:
Find the Quartiles of the following data:
Class 0-10 10-20 20-30 30-40 40-50 50-60
F 4 3 2 1 5 6
Solution:
Class 0-10 10-20 20-30 30-40 40-50 50-60
f 4 3 2 1 5 6
c.f. 4 7 9 10 15 21

For quartile one,
\[ \frac{N}{4} = \frac{21}{4} = 5.25 \]
Hence, the quartile one class is 10-20.
\[ Q_1 = 10 + \left( \frac{5.25 - 4}{3} \times 10 \right) \]
\[ Q_1 = 10 + (0.4167 \times 10) \]
\[ Q_1 = 10 + 4.167 \]
\[ Q_1 = 14.167 \]
For quartile two,
\[ \frac{2N}{4} = \frac{2 \times 21}{4} = 10.5 \]
Here the quartile two class is 40-50.
\[ Q_2 = 40 + \left( \frac{10.5 - 10}{5} \times 10 \right) \]
\[ Q_2 = 40 + (0.1 \times 10) \]
\[ Q_2 = 40 + 1 \]
\[ Q_2 = 41 \]
For quartile three, let’s find the quartile three class first.
\[ \frac{3N}{4} = \frac{3 \times 21}{4} = 15.75 \]
Hence, the quartile three class is 50-60.
\[ Q_3 = 50 + \left( \frac{15.75 - 15}{6} \times 10 \right) \]
\[ Q_3 = 50 + 1.25 \]
\[ Q_3 = 51.25 \]
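All three quartiles use the same interpolation step, so they can be sketched with one function. This is a minimal sketch assuming equal class widths; the name `grouped_quartile` is illustrative.

```python
def grouped_quartile(k, lower_limits, class_width, frequencies):
    """k-th quartile (k = 1, 2, 3): L + ((k*N/4 - cf) / f) * CI."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the distribution has no observations")
    target = k * total / 4      # position k*N/4
    cumulative_before = 0       # cf of the class preceding the quartile class
    for lower, frequency in zip(lower_limits, frequencies):
        if cumulative_before + frequency >= target:
            return lower + (target - cumulative_before) / frequency * class_width
        cumulative_before += frequency

lower_limits = [0, 10, 20, 30, 40, 50]
frequencies = [4, 3, 2, 1, 5, 6]
quartiles = [grouped_quartile(k, lower_limits, 10, frequencies) for k in (1, 2, 3)]
print([round(q, 3) for q in quartiles])
```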
Summary:
 A measure of central tendency is a summary statistic that represents the center point or
typical value of a dataset.
 Arithmetic Mean is defined as the sum of observations divided by the number of
observations.
 Median of distribution is that value of the variate which divides it into two equal parts.
 Mode is that value of the variate which occurs maximum number of times in a distribution
and around which other items are densely distributed.
 The values of a variable that divide a distribution into four equal parts are called quartiles.

Review Questions:
1. What do you understand by Central Tendency?
2. Explain the Concept of Arithmetic Mean?
3. What do you mean by median?
4. Explain the concept of mode and quartiles?

Business Statistics Measure of Dispersion

Chapter Five
Measure of Dispersion
Learning Objectives:
After studying this chapter, you will be able to:
 Understand the concept of dispersion.
 Discuss the measures of dispersion.
 Explain the meaning of interquartile range.
 Explain the concept of standard deviation.


Measure of Dispersion
A measure of central tendency summarizes the distribution of a variable into a single figure
which can be regarded as its representative. This measure alone, however, is not sufficient to
describe a distribution because there may be a situation where two or more different
distributions have the same central value. Conversely, it is possible that the pattern of
distribution in two or more situations is same but the values of their central tendency are
different. Hence, it is necessary to define some additional summary measures to adequately
represent the characteristics of a distribution. One such measure is known as the measure of
dispersion or the measure of variation.
The concept of dispersion is related to the extent of scatter or variability in observations. The
variability, in an observation, is often measured as its deviation from a central value. A suitable
average of all such deviations is called the measure of dispersion.
Some important definitions of dispersion are given below:
“Dispersion is the measure of extent to which individual items vary.”
—L.R. Connor
“The measure of the scatteredness of the mass of figures in a series about an average is called
the measure of variation or dispersion.” — Simpson and Kafka
Measures of Dispersion:
Various measures of dispersion can be classified into two broad categories:
1. The measures which express the spread of observations in terms of distance between the
values of selected observations. These are also termed as distance measures, e.g., range,
interquartile range, interpercentile range, etc.
2. The measures which express the spread of observations in terms of the average of
deviations of observations from some central value. These are also termed as the averages
of second order, e.g., mean deviation, standard deviation, etc.
The following are some important measures of dispersion
a) Range
b) Inter-Quartile Range
c) Quartile Deviation
d) Standard Deviation
Range:
The range of a distribution is the difference between its two extreme observations, i.e.,
the difference between the largest and smallest observations. Symbolically, R = L – S, where
R denotes the range, and L and S denote the largest and smallest observations, respectively. R
is the absolute measure of range. A relative measure of range, also termed the coefficient
of range, is defined as:
\[ \text{Coefficient of Range} = \frac{L - S}{L + S} \]
Illustration:
Find range and coefficient of range for each of the following data:
1. Weekly wages of 10 workers of a factory are: 310, 350, 420, 105, 115, 290, 245, 450, 300,
375.
2. The distribution of marks obtained by 50 students:
Marks 0-10 10-20 20-30 30-40
No of Students 8 12 14 16

3. The age distribution of 60 school going children.


Age (in Year) 5-7 8-10 11-13 14-16
Frequency 20 18 10 12

Solution:

1. Range = 450 – 105 = 345 AFS
\[ \text{Coefficient of Range} = \frac{450 - 105}{450 + 105} = 0.62 \]
2. Range = 40 – 0 = 40 marks
\[ \text{Coefficient of Range} = \frac{40 - 0}{40 + 0} = 1 \]
3. Range = 16 – 5 = 11 years
\[ \text{Coefficient of Range} = \frac{16 - 5}{16 + 5} = 0.52 \]
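The range and its coefficient can be sketched as a small helper. The function name is an assumption; for the grouped cases the extreme class limits are passed in place of the largest and smallest observations, as in the solution above.

```python
def range_and_coefficient(largest, smallest):
    """Return (R, coefficient of range) with R = L - S and coefficient = (L - S) / (L + S)."""
    value_range = largest - smallest
    return value_range, value_range / (largest + smallest)

# 1. Weekly wages of 10 workers.
wages = [310, 350, 420, 105, 115, 290, 245, 450, 300, 375]
wage_range, wage_coefficient = range_and_coefficient(max(wages), min(wages))
print(wage_range, round(wage_coefficient, 2))  # 345 0.62

# 3. Age distribution: extreme class limits 16 and 5.
age_range, age_coefficient = range_and_coefficient(16, 5)
print(age_range, round(age_coefficient, 2))    # 11 0.52
```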
Uses of Range:
In spite of its many serious demerits, the range is useful in the following situations:
1. It is used in the preparation of control charts for controlling the quality of manufactured
items.
2. It is also used in the study of fluctuations of, say, price of a commodity, temperature of a
patient, amount of rainfall in a given period, etc.


Interquartile Range:
Interquartile Range is an absolute measure of dispersion given by the difference between third
quartile (Q3) and first quartile (Q1).
Symbolically, Interquartile range = Q3 – Q1
Illustration:
Determine the interquartile range of the following distribution:
C.I 11-13 13-15 15-17 17-19 19-21 21-23 23-25
f 8 10 15 20 12 11 4

Solution:
C.I 11-13 13-15 15-17 17-19 19-21 21-23 23-25
f 8 10 15 20 12 11 4
c.f. 8 18 33 53 65 76 80

Calculation of Interquartile Range:
Calculation of Q1:
\[ \frac{N}{4} = \frac{80}{4} = 20 \]
Hence, the quartile one class is 15-17.
\[ Q_1 = 15 + \left( \frac{20 - 18}{15} \times 2 \right) = 15.27 \]
Calculation of Q3:
\[ \frac{3N}{4} = \frac{3 \times 80}{4} = 60 \]
Hence, the quartile three class is 19-21.
\[ Q_3 = 19 + \left( \frac{60 - 53}{12} \times 2 \right) = 20.17 \]
Thus, the interquartile range = 20.17 – 15.27 = 4.90
Quartile Deviation or Semi-Interquartile Range
Half of the interquartile range is called the quartile deviation or semi-interquartile
range.
Symbolically,
\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
The value of Q.D. gives the average magnitude by which the two quartiles deviate from the median.
If the distribution is approximately symmetrical, then Md ± Q.D. will include about 50% of the
observations and, thus, we can write Q1 = Md – Q.D. and Q3 = Md + Q.D.
Further, a low value of Q.D. indicates a high concentration of the central 50% of observations
and vice versa. Quartile deviation is an absolute measure of dispersion. The corresponding
relative measure is known as the coefficient of quartile deviation, defined as
\[ \text{Coefficient of Q.D.} = \frac{\frac{Q_3 - Q_1}{2}}{\frac{Q_3 + Q_1}{2}} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
Illustration:
Find the quartile deviation, and its coefficients from the following data:
Age( In Year) 15 16 17 18 19 20 21
No of Students 4 6 10 15 12 9 4

Solution:
Age( In Year) 15 16 17 18 19 20 21
No of Students 4 6 10 15 12 9 4
c.f. 4 10 20 35 47 56 60

Calculation of Q1:
\[ \frac{N}{4} = \frac{60}{4} = 15 \]
Q1 = 17 (by inspection: the first cumulative frequency that reaches 15 is 20, against age 17)
Calculation of Q3:
\[ \frac{3N}{4} = \frac{3 \times 60}{4} = 45 \]
Q3 = 19 (by inspection)
The quartile deviation is
\[ Q.D. = \frac{Q_3 - Q_1}{2} = \frac{19 - 17}{2} = 1 \text{ year} \]
Now let’s calculate the coefficient of quartile deviation.
\[ \text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{19 - 17}{19 + 17} = 0.056 \]
Merits and Demerits of Quartile Deviation:
Merits
1. It is rigidly defined.
2. It is easy to understand and easy to compute.
3. It can be calculated even for a distribution with open ends.
Demerits
1. Since it is not based on all the observations, it is not a reliable measure of dispersion.
2. It is very much affected by the fluctuations of sampling.
Standard Deviation
From the mathematical point of view, the practice of ignoring the minus sign of the deviations
while computing the mean deviation is very inconvenient, and this makes the formula for mean
deviation unsuitable for further mathematical treatment. Further, if the signs are taken into
account, the sum of deviations taken from their arithmetic mean is zero. This would suggest that
there is no dispersion in the observations, although the observations are in fact different
from each other. In order to escape this problem, the squares of the deviations from the
arithmetic mean are taken, and the positive square root of the arithmetic mean of the squares
of these deviations is taken as a measure of dispersion. This measure of dispersion is
known as standard deviation or root-mean square deviation. The square of the standard deviation
is known as variance. The concept of standard deviation was introduced by Karl Pearson in 1893.
The standard deviation is denoted by the Greek letter ‘σ’, which is called ‘small sigma’ or
simply sigma. In terms of symbols,
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}} \]
for individual observations X1, X2, ..., Xn, and
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i}} \]
for a grouped or ungrouped frequency distribution, where an observation Xi occurs with
frequency fi for i = 1, 2, ..., n. It should be noted here that the units of σ are the same as
the units of X.


In case of Individual Series:


If there are n observations X1, X2, ..., Xn, the various steps in the calculation of standard
deviation are:
1. Find \( \bar{X} = \frac{\sum X_i}{n} \).
2. Obtain the deviation \( (X_i - \bar{X}) \) for each i = 1, 2, ..., n.
3. Square these deviations and add them to obtain \( \sum_{i=1}^{n} (X_i - \bar{X})^2 \).
4. Compute the variance, i.e., \( \sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \).
5. Obtain σ as the positive square root of \( \sigma^2 \).
Illustration:
Calculate variance and standard deviation of the weights of ten persons. Weights (in kgs): 45,
49, 55, 50, 41, 44, 60, 58, 53, 55
Solution:
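First let’s calculate the mean of the series.
\[ \bar{X} = \frac{\sum X}{n} = \frac{510}{10} = 51 \]
Weights (X) (X − X̄) (X − X̄)²
45 -6 36
49 -2 4
55 4 16
50 -1 1
41 -10 100
44 -7 49
60 9 81
58 7 49
53 2 4
55 4 16
Total 510 356

\[ \sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} = \frac{356}{10} = 35.6 \text{ kg}^2 \]
\[ \sigma = \sqrt{\frac{356}{10}} = 5.97 \text{ kgs} \]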
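The calculation can be cross-checked with the standard library. The sketch below uses `statistics.pvariance` and `statistics.pstdev`, which divide by n as in the formula above (not by n − 1).

```python
from statistics import pstdev, pvariance

weights = [45, 49, 55, 50, 41, 44, 60, 58, 53, 55]

variance = pvariance(weights)   # sum of squared deviations / n
std_dev = pstdev(weights)       # positive square root of the variance
print(round(variance, 1), round(std_dev, 2))  # 35.6 5.97
```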
In case of Ungrouped or Grouped Frequency Distributions:


Let the observations X1, X2, ..., Xn appear with respective frequencies f1, f2, ..., fn,
where ∑fi = N. As before, if the distribution is grouped, then X1, X2, ..., Xn will denote the
mid-values of the first, second, ..., nth class intervals respectively. The formula for the
calculation of standard deviation can be written as
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i}} \]
and the variance is its square, \( \sigma^2 \).

Illustration:
Calculate standard deviation of the following data.
X 10 11 12 13 14 15 16 17 18
f 2 7 10 12 15 11 10 6 3

Solution:
First let’s calculate the mean of the series.
\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{1064}{76} = 14 \]
X f f*X (X − X̄) (X − X̄)² f(X − X̄)²
10 2 20 -4 16 32
11 7 77 -3 9 63
12 10 120 -2 4 40
13 12 156 -1 1 12
14 15 210 0 0 0
15 11 165 1 1 11
16 10 160 2 4 40
17 6 102 3 9 54
18 3 54 4 16 48
Total 76 1064 300

\[ \sigma = \sqrt{\frac{300}{76}} = 1.99 \]
Coefficient of Variation
The standard deviation is an absolute measure of dispersion and is expressed in the same units
as the variable X. A relative measure of dispersion based on the standard deviation is
known as the coefficient of variation and is given by
\[ C.V. = \frac{\sigma}{\bar{X}} \times 100 \]
This measure, introduced by Karl Pearson, is used to compare the variability or homogeneity
or stability or uniformity or consistency of two or more sets of data. The data having a higher
value of the coefficient of variation is said to be more dispersed or less uniform, etc.
Illustration:
Calculate the standard deviation and the coefficient of variation from the following data.
Measurements 0-5 5-10 10-15 15-20 20-25
Frequency 4 1 10 3 2
Solution:
Let u = (X − X̄), where X is the mid value of each class.
Class f Mid value (X) f*X u u² f*u²
0-5 4 2.5 10 -9.5 90.25 361
5-10 1 7.5 7.5 -4.5 20.25 20.25
10-15 10 12.5 125 0.5 0.25 2.5
15-20 3 17.5 52.5 5.5 30.25 90.75
20-25 2 22.5 45 10.5 110.25 220.5
Total 20 240 695

\[ \bar{X} = \frac{\sum fX}{\sum f} = \frac{240}{20} = 12 \]
\[ \sigma = \sqrt{\frac{695}{20}} = \sqrt{34.75} = 5.89 \]
Let’s calculate the coefficient of variation.
\[ C.V. = \frac{\sigma}{\bar{X}} \times 100 = \frac{5.89}{12} \times 100 = 49.08\% \]
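The mean, standard deviation and coefficient of variation of a frequency distribution can be sketched together. The function name is an assumption introduced for illustration; class mid-points are used as the values of X.

```python
from math import sqrt

def frequency_std_and_cv(values, frequencies):
    """Return (mean, standard deviation, coefficient of variation in %)."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total
    std_dev = sqrt(variance)
    return mean, std_dev, std_dev / mean * 100

midpoints = [2.5, 7.5, 12.5, 17.5, 22.5]
frequencies = [4, 1, 10, 3, 2]
mean, std_dev, cv = frequency_std_and_cv(midpoints, frequencies)
print(round(mean, 2), round(std_dev, 2), round(cv, 2))
```

The script carries σ at full precision, so its coefficient of variation comes out at about 49.12, slightly above the 49.08 obtained in the worked solution, where σ is rounded to 5.89 before dividing.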
Summary:
 The concept of dispersion is related to the extent of scatter or variability in observations.
 The range of a distribution is the difference between its two extreme observations,
i.e., the difference between the largest and smallest observations.
 Interquartile Range is an absolute measure of dispersion given by the difference between
third quartile (Q3) and first quartile (Q1).
 Another measure of dispersion is the standard deviation or root-mean square
deviation. The square of the standard deviation is known as variance.
Review Questions

1. What do you understand by dispersion?
2. Explain the measures of dispersion.
3. What is the interquartile range?
4. What is the concept of standard deviation?

Business Statistics Correlation & Regression

Chapter Six
Correlation and Regression
Learning Objectives:
After studying this chapter, you will be able to:
 Understand the concept of correlation.
 Explain the measures of coefficient of correlation.
 Calculate Karl Pearson’s coefficient of correlation.
 Calculate the rank coefficient of correlation.
 Explain the concept of regression analysis.


Correlation Analysis
Various experts have defined correlation in their own words and their definitions, broadly
speaking, imply that correlation is the degree of association between two or more variables.
Some important definitions of correlation are given below:
“If two or more quantities vary in sympathy so that movements in one tend to be accompanied
by corresponding movements in other(s) then they are said to be correlated.”
— L.R. Connor
“Correlation is an analysis of covariation between two or more variables.”
– A.M. Tuttle
“When the relationship is of a quantitative nature, the appropriate statistical tool for discovering
and measuring the relationship and expressing it in a brief formula is known as correlation.”
— Croxton and Cowden
Measures of coefficient of correlation:
Of the numerous techniques of measuring the coefficient of correlation, we will discuss the
following techniques in detail:
 Scatter diagram
 Karl Pearson’s Coefficient of correlation
 Rank coefficient of correlation
Scatter Diagram:
Let the bivariate data be denoted by (Xi, Yi), where i= 1, 2 ...... n. In order to have some idea
about the extent of association between variables X and Y, each pair (Xi, Yi), i= 1, 2......n, is
plotted on a graph. The diagram, thus obtained, is called a Scatter Diagram. Each pair of values
(Xi, Yi) is denoted by a point on the graph. The set of such points (also known as dots of the
diagram) may cluster around a straight line or a curve or may not show any tendency of
association. Various possible situations are shown with the help of following diagrams:


If all the points or dots lie exactly on a straight line or a curve, the association between
the variables is said to be perfect. This is shown below:

A scatter diagram of the data helps in having a visual idea about the nature of
association between two variables. If the points cluster along a straight line, the association
between variables is linear. Further, if the points cluster along a curve, the corresponding
association is non-linear or curvilinear. Finally, if the points neither cluster along a straight line
nor along a curve, there is absence of any association between the variables.
It is also clear from a scatter diagram that when low (high) values of X are associated with
low (high) values of Y, the association between them is said to be positive. Contrary to this,
when low (high) values of X are associated with high (low) values of Y, the association
between them is said to be negative. This part deals only with linear association between the
two variables X and Y. We shall measure the degree of linear association by the Karl
Pearson’s formula for the coefficient of linear correlation.
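A scatter diagram can also be sketched directly in text. The following Python sketch is only an illustration: the function name `ascii_scatter` and the grid size are assumptions made for this example, and the data are the price and demand figures from the illustration later in this chapter. Each pair (Xi, Yi) is drawn as a `*`, so a band of stars falling from left to right suggests a negative association.

```python
def ascii_scatter(xs, ys, width=30, height=10):
    """Return a text scatter diagram of the pairs (xs[i], ys[i]).

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid[0] is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


price = [22, 30, 25, 20, 15, 8]      # X axis
demand = [10, 12, 15, 20, 23, 28]    # Y axis
diagram = ascii_scatter(price, demand)
print(diagram)
```

The highest star appears at the far left (lowest price, highest demand) and the lowest star towards the right, which is the pattern described above for a negative association.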
Karl Pearson’s coefficient of correlation:
Karl Pearson’s coefficient of correlation is a widely used mathematical method in which a
numerical expression measures the degree and direction of the relationship between
linearly related variables. Pearson’s method, popularly known as the Pearsonian coefficient of
correlation, is the most extensively used quantitative method in practice. The coefficient of
correlation is denoted by “r”. If the relationship between two variables X and Y is to be
ascertained, then the following formula is used:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} $$
Properties of Coefficient of Correlation
 The value of the coefficient of correlation (r) always lies between -1 and +1:
r = +1, perfect positive correlation
r = -1, perfect negative correlation
r = 0, no linear correlation
 The coefficient of correlation is independent of origin and scale. Independence of origin
means that adding or subtracting any constant from the given values of X and Y leaves the
value of “r” unchanged. Independence of scale means that the value of “r” is unaffected if the
values of X and Y are multiplied or divided by any positive constant.
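The origin and scale property can be checked numerically. The sketch below is illustrative only: `pearson_r` is a helper written for this example, and the data are hypothetical. The last line also shows that multiplying one variable by a negative constant reverses the sign of r while leaving its magnitude unchanged.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sqrt(sxx) * sqrt(syy))


# Hypothetical data, used only to illustrate the property.
xs = [2, 4, 5, 7, 9]
ys = [1, 3, 4, 8, 9]

r_original = pearson_r(xs, ys)
r_shifted = pearson_r([x - 100 for x in xs], [y + 50 for y in ys])  # change of origin
r_scaled = pearson_r([x * 10 for x in xs], [y / 4 for y in ys])     # change of scale
r_flipped = pearson_r([-x for x in xs], ys)                         # negative multiplier

print(round(r_original, 4), round(r_shifted, 4), round(r_scaled, 4), round(r_flipped, 4))
```

The first three printed values agree, and the fourth is the same number with the opposite sign.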
Assumptions of Karl Pearson’s Coefficient of Correlation
 The relationship between the variables is “linear”, which means that when the two variables
are plotted, the points lie along a straight line.
 A large number of independent causes affect the variables under study, so that each variable
follows a normal distribution. For example, variables such as price, demand and supply are
affected by so many factors that a normal distribution results.
 The paired observations are independent of one another.
Illustration: The following table gives the price and demand figures for a commodity over 6
days. Calculate the coefficient of correlation between price and demand.
Days 1 2 3 4 5 6
Price 22 30 25 20 15 8
Demand 10 12 15 20 23 28

Solution
Let $X = x - \bar{x}$ and $Y = y - \bar{y}$, where x is the price and y is the demand.
Days    Price (x)   Demand (y)     X      Y      XY     X²     Y²
1          22          10          2     -8     -16      4     64
2          30          12         10     -6     -60    100     36
3          25          15          5     -3     -15     25      9
4          20          20          0      2       0      0      4
5          15          23         -5      5     -25     25     25
6           8          28        -12     10    -120    144    100
Total     120         108          0      0    -236    298    238


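$$ \bar{x} = \frac{\sum x}{n} = \frac{120}{6} = 20 $$
$$ \bar{y} = \frac{\sum y}{n} = \frac{108}{6} = 18 $$
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} $$
$$ r = \frac{-236}{\sqrt{298}\,\sqrt{238}} $$
$$ r = \frac{-236}{266.32} = -0.886 $$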
Price and demand are negatively correlated in this example.
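The hand calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are chosen for this example and are not part of the handout.

```python
from math import sqrt

price = [22, 30, 25, 20, 15, 8]
demand = [10, 12, 15, 20, 23, 28]

n = len(price)
x_bar = sum(price) / n    # mean price
y_bar = sum(demand) / n   # mean demand

# Deviations from the means (the X and Y columns of the solution table).
dx = [x - x_bar for x in price]
dy = [y - y_bar for y in demand]

sum_xy = sum(a * b for a, b in zip(dx, dy))  # sum of products of deviations
sum_xx = sum(a * a for a in dx)              # sum of squared X deviations
sum_yy = sum(b * b for b in dy)              # sum of squared Y deviations

r = sum_xy / (sqrt(sum_xx) * sqrt(sum_yy))
print(sum_xy, sum_xx, sum_yy)  # -236.0 298.0 238.0
print(round(r, 3))             # -0.886
```

The three sums match the Total row of the table, and the result agrees with r = -0.886.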
Spearman’s Rank Correlation
This is a crude method of computing correlation between two characteristics. In this method,
various items are assigned ranks according to the two characteristics and a correlation is
computed between these ranks. This method is often used in the following circumstances:
1. When the quantitative measurements of the characteristics are not possible, e.g., the results
of a beauty contest where various individuals can only be ranked.
2. Even when the characteristics are measurable, it may be desirable to avoid such measurements
because of shortage of time or money, the complexity of calculations with large data sets, etc.
3. When the given data consist of some extreme observations, the value of Karl Pearson’s
coefficient is likely to be unduly affected. In such a situation the computation of the rank
correlation is preferred because it will give less importance to the extreme observations.
4. It is used as a measure of the degree of association in situations where the nature
of population, from which data are collected, is not known.
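Point 3 above can be illustrated with a small experiment. The sketch below is an assumption-laden example, not part of the handout: the helper names `pearson_r`, `ranks` and `spearman_r` and the data are hypothetical, and the ranking helper assumes there are no tied values. One extreme observation pulls Karl Pearson’s coefficient well below 1, while the rank correlation is unaffected because the order of the values has not changed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank 1 for the largest value; assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_r(xs, ys):
    """Rank correlation from the sum of squared rank differences."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical data: y rises steadily with x except for one extreme last value.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 100]

print(round(pearson_r(xs, ys), 2))  # 0.71
print(spearman_r(xs, ys))           # 1.0
```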
Spearman’s rank correlation coefficient is a non-parametric statistical measure used to
study the strength of association between two ranked variables. The method is applied to
ordinal data, i.e. values that can be arranged in order, one after the other, so that a rank
can be given to each.
The formula to calculate the rank correlation coefficient is:
$$ r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} $$
where D is the difference in rank between paired values, so that $D^2$ is its square, and n is
the number of pairs. The value of r lies in the range -1 ≤ r ≤ +1:
1. r = +1: there is complete agreement in the order of the ranks, which move in the same
direction.
2. r = -1: there is complete disagreement in the order of the ranks, which run in exactly opposite directions.
3. r = 0: there is no association between the ranks.
Illustration:
A pickle manufacturer uses different varieties of mangoes for his mango pickles. Two judges
ranked his pickles as follows.
Pickle Code A B C D E F G H
Judge 1 5 3 1 6 2 4 8 7
Judge 2 3 4 1 8 2 6 7 5

The pickle manufacturer is interested in knowing if there is any correlation between the
judgments of the two judges.
Solution:
Pickle Code R1 R2 D 𝑫𝟐
A 5 3 2 4
B 3 4 -1 1
C 1 1 0 0
D 6 8 -2 4
E 2 2 0 0
F 4 6 -2 4
G 8 7 1 1
H 7 5 2 4
$$ \sum D^2 = 18, \qquad n = 8 $$
$$ r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} $$
$$ r = 1 - \frac{6 \times 18}{8 \times 63} = 1 - \frac{108}{504} $$
$$ r = 0.786 $$
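The same calculation can be written as a short Python sketch. The dictionary layout and variable names are choices made for this example only; the ranks are those given by the two judges in the illustration.

```python
judge1 = {"A": 5, "B": 3, "C": 1, "D": 6, "E": 2, "F": 4, "G": 8, "H": 7}
judge2 = {"A": 3, "B": 4, "C": 1, "D": 8, "E": 2, "F": 6, "G": 7, "H": 5}

n = len(judge1)

# D = R1 - R2 for each pickle, then the sum of the squared differences.
d = {code: judge1[code] - judge2[code] for code in judge1}
sum_d2 = sum(diff ** 2 for diff in d.values())

r = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(sum_d2, round(r, 3))  # 18 0.786
```

A value of 0.786 indicates a fairly strong agreement between the two judges.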
Illustration:
The table below shows the respective heights in inches of 10 fathers and their eldest sons.
Father 67 63 66 71 69 65 62 70 61 72
Son 68 66 65 70 69 67 64 71 60 68

Find the coefficient of rank correlation.


Solution:
Since the heights are not given as ranks, ranks are obtained by assigning rank 1 to
the greatest height, rank 2 to the next lower height, and so on down to rank 10 for the least
height. The same rule applies to both series.
Serial No    R1    R2     D     D²
1             5     4     1     1
2             8     6     2     4
3             6     7    -1     1
4             2     2     0     0
5             4     3     1     1
6             7     5     2     4
7             9     8     1     1
8             3     1     2     4
9            10    10     0     0
10            1     9    -8    64
$$ n = 10, \qquad \sum D^2 = 80 $$
$$ r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} $$
$$ r = 1 - \frac{6 \times 80}{10 \times 99} = 1 - \frac{480}{990} = 0.52 $$
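The ranking rule stated in the solution (rank 1 for the greatest height) can be sketched in Python as follows. The helper `rank_descending` is a name invented for this example and assumes there are no tied values. It is applied here to the fathers' heights to reproduce the R1 column, while R2 is copied from the solution table rather than recomputed.

```python
def rank_descending(values):
    """Rank 1 for the greatest value, 2 for the next, and so on (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


father = [67, 63, 66, 71, 69, 65, 62, 70, 61, 72]
r1 = rank_descending(father)
print(r1)  # [5, 8, 6, 2, 4, 7, 9, 3, 10, 1]

# R2 as tabulated in the solution above.
r2 = [4, 6, 7, 2, 3, 5, 8, 1, 10, 9]

n = len(r1)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
r = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(sum_d2, round(r, 2))  # 80 0.52
```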
Regression Analysis:
Regression analysis is a statistical tool used to determine the probable change in one
variable for a given amount of change in another. This means that the value of an unknown
variable can be estimated from the known value of another variable.
How closely the plotted points cluster around the regression line reflects the degree to which
the variables are correlated. The regression line is the single line that best fits the data, i.e.
the line drawn through the plotted points in such a manner that the sum of the squared
distances from the points to the line is the smallest. Since regression also describes the
relationship between two or more variables, how does it differ from correlation? There are
two important points of difference between correlation and regression:

 The correlation coefficient measures the “degree of relationship” between variables, say
X and Y, whereas regression analysis studies the “nature of relationship” between the
variables.
 The correlation coefficient does not indicate a cause-and-effect relationship between
the variables, i.e. it cannot be said with certainty that one variable is the cause and the other
is the effect. Regression analysis, in contrast, treats one variable as dependent and the other
as independent, so the assumed direction of the relationship is stated explicitly, although the
fitted line by itself does not prove causation.
Regression analysis is widely used in all scientific disciplines. In economics, it plays a
significant role in measuring or estimating the relationship among economic variables. For
example, the two variables price (X) and demand (Y) are closely related to each other, so we
can find the probable value of X from a given value of Y and, similarly, the probable value
of Y from a given value of X.
There are as many regression lines as there are variables. With two variables,
say X and Y, there are two regression lines:
 Regression line of Y on X: This gives the most probable values of Y from the given values
of X.
 Regression line of X on Y: This gives the most probable values of X from the given values
of Y.
Regression Equation of Y on X:
This is used to describe the variations in the value Y from the given changes in the values of
X. It can be expressed as follows:
$$ Y = a + bX $$
Where Y is the dependent variable, X is the independent variable, and a and b are the two
unknown constants that determine the position of the line. The parameter “a” gives the
level of the fitted line, i.e. its height above or below the origin at X = 0 (the intercept), and the
parameter “b” gives the slope of the line, i.e. the change in the value of Y for one unit of change in X.
The values of ‘a’ and ‘b’ are obtained by the method of least squares, according to which the
line is drawn through the plotted points in such a manner that the sum of the
squares of the vertical deviations of the actual values of Y from the estimated values $Y_c$ is
the least; that is, the best-fitted line is obtained when $\sum (Y - Y_c)^2$ is a minimum.
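Solving the normal equations of the least-squares method gives the following expressions for
the parameters ‘b’ and ‘a’ (written $b_{yx}$ and $a_{yx}$ to mark the regression of Y on X):
$$ b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
$$ a_{yx} = \frac{\sum y - b_{yx} \sum x}{n} $$
The regression equation of Y on X can also be written as:
$$ y - \bar{y} = b_{yx}(x - \bar{x}) $$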
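For reference, the expressions for the slope and intercept of the regression of Y on X follow from minimising the sum of squared vertical deviations. The derivation below is a standard sketch; the symbol S for that sum is introduced here and does not appear elsewhere in the handout, and x, y denote the observed values of X and Y.

```latex
\begin{aligned}
S &= \sum (Y - a - bX)^2 \\
\frac{\partial S}{\partial a} = 0 &\;\Rightarrow\; \sum Y = na + b \sum X \\
\frac{\partial S}{\partial b} = 0 &\;\Rightarrow\; \sum XY = a \sum X + b \sum X^2 \\
\text{Solving:}\quad b &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2},
\qquad a = \bar{y} - b\bar{x} = \frac{\sum y - b \sum x}{n}
\end{aligned}
```

The second and third lines are the two normal equations; the last line is their simultaneous solution.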
Regression Equation of X on Y:
This is used to describe the variations in the value of X from the given changes in the value
of Y. It can be expressed as follows:
$$ X = a + bY $$
Where X is the dependent variable and Y is the independent variable. The parameters ‘a’ and
‘b’ are the two unknown constants. Again, ‘a’ gives the level of the fitted line and ‘b’ gives
the slope, i.e. the change in the value of X for a unit change in the value of Y. Solving the
two normal equations gives the following expressions for the parameters ‘b’ and ‘a’:
$$ b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} $$
$$ a_{xy} = \frac{\sum x - b_{xy} \sum y}{n} $$
The regression equation of X on Y can also be written as:
$$ x - \bar{x} = b_{xy}(y - \bar{y}) $$
Illustration:
A gynecologist recorded the blood pressure (BP) of her pregnant patients and collected the
following data.
Age 23 24 25 26 28 29 31 35 40
BP 65 60 62 70 70 73 75 83 90

Assuming age as X and BP as Y, calculate the two regression equations. Also estimate the BP
if the age of the patient is 38 years.
Solution:
Let $V = x - \bar{x}$ and $U = y - \bar{y}$.
$$ \bar{x} = \frac{\sum x}{n} = \frac{261}{9} = 29 $$
$$ \bar{y} = \frac{\sum y}{n} = \frac{648}{9} = 72 $$


S.N       X      Y      V      U     VU     V²     U²
1        23     65     -6     -7     42     36     49
2        24     60     -5    -12     60     25    144
3        25     62     -4    -10     40     16    100
4        26     70     -3     -2      6      9      4
5        28     70     -1     -2      2      1      4
6        29     73      0      1      0      0      1
7        31     75      2      3      6      4      9
8        35     83      6     11     66     36    121
9        40     90     11     18    198    121    324
Total   261    648      0      0    420    248    756

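Regression equation of Y on X:
$$ Y = a + bX $$
Writing $b_{yx}$ and $a_{yx}$ for the slope and intercept of the regression of Y on X:
$$ b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{420}{248} = 1.694 $$
And
$$ a_{yx} = \frac{\sum y - b_{yx} \sum x}{n} = \frac{648 - (1.694 \times 261)}{9} = 22.874 $$
Hence the regression equation of Y on X is:
$$ Y = 22.874 + 1.694X $$
The estimated BP for a patient aged 38 years is:
$$ Y_c = 22.874 + 1.694(38) = 87.246 $$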
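Regression equation of X on Y:
$$ X = a + bY $$
$$ b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{420}{756} = 0.556 $$
And
$$ a_{xy} = \frac{\sum x - b_{xy} \sum y}{n} = \frac{261 - (0.556 \times 648)}{9} = -11.032 $$
Hence the regression equation of X on Y is:
$$ X = -11.032 + 0.556Y $$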
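Both regression lines can be checked with a short Python sketch. The variable names are chosen for this example only. Because the code keeps the slopes unrounded, the intercepts differ slightly in the later decimals from the hand calculation, which uses slopes rounded to three places (22.887 against 22.874, and -11.000 against -11.032). With the rounded coefficients the estimate at age 38 is 22.874 + 1.694 × 38 = 87.246; the unrounded calculation gives about 87.24.

```python
age = [23, 24, 25, 26, 28, 29, 31, 35, 40]  # X
bp = [65, 60, 62, 70, 70, 73, 75, 83, 90]   # Y

n = len(age)
x_bar = sum(age) / n   # 29.0
y_bar = sum(bp) / n    # 72.0

# Sums of products and squares of the deviations V = x - x_bar, U = y - y_bar.
sum_vu = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, bp))  # 420.0
sum_v2 = sum((x - x_bar) ** 2 for x in age)                       # 248.0
sum_u2 = sum((y - y_bar) ** 2 for y in bp)                        # 756.0

# Regression of Y on X: Y = a_yx + b_yx * X
b_yx = sum_vu / sum_v2
a_yx = y_bar - b_yx * x_bar

# Regression of X on Y: X = a_xy + b_xy * Y
b_xy = sum_vu / sum_u2
a_xy = x_bar - b_xy * y_bar

# Estimated BP for a patient aged 38, using the Y-on-X line.
bp_at_38 = a_yx + b_yx * 38

print(round(b_yx, 3), round(a_yx, 3))  # 1.694 22.887
print(round(b_xy, 3), round(a_xy, 3))  # 0.556 -11.0
print(round(bp_at_38, 2))              # 87.24
```

Note that the Y-on-X line is the one to use for estimating BP from age, since BP is the dependent variable in that equation.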


Summary:
 Correlation is an analysis of covariation between two or more variables.
 Karl Pearson’s coefficient of correlation is a widely used mathematical method in which a
numerical expression measures the degree and direction of the relationship
between linearly related variables.
 Rank correlation is used when quantitative measurement of the characteristics is not
possible, e.g. the results of a beauty contest, where individuals can only be ranked.
 The Regression Analysis is a statistical tool used to determine the probable change in one
variable for the given amount of change in another.

Review Questions:
1. What do you understand by correlation?
2. Explain the measures of correlation.
3. Explain the meaning of the rank coefficient of correlation.
4. Elaborate on the concept of regression.
5. Explain the equations of regression analysis.

