
PART A

STATISTICS FOR ECONOMICS


UNIT 1
INTRODUCTION
CHAPTER 1: MEANING, SCOPE AND IMPORTANCE OF STATISTICS IN ECONOMICS

1.1 INTRODUCTION
Historically, all countries had a system for collecting data. There are records to show that the
governments of ancient Egypt collected data on population and wealth as early as 3050 B.C. Similarly,
Rome conducted a census as early as 435 B.C. We also find evidence of data collection in ancient
India, for example, in the Manusmriti, Chanakya's Arthashastra, etc. Such data collection was
necessary for running the affairs of the state. In the beginning, data collection was mainly confined
to population, land and revenue. But with the passage of time, as the intensity of administration
increased, the scope of collection also increased. Modern governments collect a large variety of
data. They not only collect data but also publish it for the knowledge of the public and for use by
researchers.
It seems that the word 'statistics' originated from the words status (Latin), stato (Italian), Statistik
(German), etc. All these words mean 'political state'. The meaning itself suggests that data was
collected mainly to run the political affairs of the country. For example, in ancient India Kautilya
collected data about population and wealth during the reign of Chandragupta Maurya and published
it in the book "Kautilya's Arthashastra". Since this is most probably the only written evidence,
Kautilya can be considered the father of ancient statistics. Today, however, the scope is much wider.
Collection of statistics is very old, but the development of 'statistics' as a science is only about
400 years old. Some famous names who contributed to developing 'statistics' into a science are
Galileo, Abraham De Moivre, Marquis De Laplace, Francis Galton, Karl Pearson and so on. Of these,
Karl Pearson (1857-1936) is regarded as the father of modern statistics. Among the noteworthy
Indian scholars who contributed to the development of statistics as a science are P.C. Mahalanobis,
V.K.R.V. Rao, R.C. Desai, etc.
Statistics are needed to make an empirical study. The term empirical means relating to direct
observation. Empirical analysis means analysis of knowledge acquired as a result of direct
observation. Suppose we go to the market and make a survey. We collect information about prices
and quantities traded and record this information in a systematic manner. This is called collection
of data. We then work on these data: we classify them into groups and calculate averages, trends,
etc. We analyse the results so obtained and draw conclusions, say, about the relation between price
and quantity sold. The type of conclusions we draw is determined by the purpose of collecting data.
Data are collected by private individuals, government institutions, etc. A private individual may
collect data about the business he is running in order to know how the business is progressing.
Government institutions collect data on their own income, expenditure, progress of the activities
they undertake, etc. There are also many government agencies engaged purely in collecting data
about the country, such as national income, national expenditure, saving, investment, employment,
production and so on. Data about the economy as a whole are called macroeconomic data. The
analysis of these data with the aim of drawing conclusions about the economy is termed
quantitative analysis.
For an empirical study we require data. We also require methods to give some meaning to
these data. Both these aspects are studied in statistics as a discipline.

1.2 WHAT IS ECONOMICS?


We are studying "Statistics" as a part of the economics curriculum. Therefore, before we study how
statistics are important for the study of economics, we must have some idea of what "economics" is.
There is no unique way of describing what economics is about. Different people think differently.
Some think that economics is a study of money. Some think that economics deals with the stock
market. Some others think that economics deals with problems like price rise, unemployment, etc.
In fact, economics deals with all these and many other things, but none of these individually
conveys what economics is about.
The simplest way of describing economics is that it is a study of the behaviour of people in earning
and spending income. People earn income by participating in the production of goods and services.
The sum of the incomes so earned is called national income. People spend income on purchasing
goods and services to satisfy the wants of their family members. This is called consumption
expenditure. People also spend on production units, i.e., farms, factories, offices, etc., for producing
goods and services. This is called investment expenditure. Production, consumption and investment
are the three basic economic activities people perform throughout life.
Economics is not only about how much people earn and spend. It is also about the basis on which
people decide how much to work, how much to consume and how much to invest. Economists
arrive at conclusions in two stages.
In the first stage, economists apply logic. They try to predict logically how people are expected to
behave. For example, they try to predict how a consumer is expected to behave when the price of
the good he buys changes.
In the next stage, they watch actual behaviour. The purpose is to check their conclusions based on
logic. It is possible that the actual behaviour does not tally with the behaviour the economists
predicted. If it tallies, the theory is confirmed. If it does not, there is something wrong with the
logic, with the assumptions behind the logic, or with the recording of actual behaviour.
This calls for reformulating the theory until it matches actual behaviour.
Watching actual behaviour amounts to collecting data, summarising it into tables, calculating
averages and deviations from averages, preparing index numbers, correlating trends of related
variables, and so on. Representing these data in diagrams makes them easier to understand.
To sum up, examination of data is a crucial step in formulating an economic theory. This makes
"Statistics" (both data and methods) an integral branch of economics.

1.3 MEANING OF STATISTICS


For a common man statistics means 'data'. By data we mean a collection of any number of related
observations. National income of India during the year 1997-98 was ₹10,05,617 crore. Production of
wheat in India during that year was 71.3 million tonnes. These are all examples of data.
There is another meaning of statistics. Besides data, statistics is also a subject like the subjects of
Economics, Commerce, Accountancy, Physics, Chemistry, etc. As a subject statistics is taken to mean
'statistical methods'. It is defined as a science of collection, organisation, presentation, analysis and
interpretation of data.
In brief, the word statistics can be looked at from two angles. In the plural sense, the term simply
means data. Data in the plural sense must have the following characteristics:
1. They must be aggregates of numerical facts.
2. The facts must be related to one another so that they are comparable.
3. The data must be collected and enumerated in a systematic manner with a reasonable degree
of accuracy.
4. There must be some predetermined purpose behind the collection of data.
On the basis of the above, statistics in the plural sense can be defined as aggregates of numerical
facts, related to one another, estimated in a systematic manner with a reasonable standard of
accuracy for a predetermined purpose. This definition is given by H. Secrist.
In the singular sense, statistics means statistical methods. As statistical methods, it deals with the
methods of collection, presentation, analysis and interpretation of numerical data with the objective
of drawing conclusions and taking decisions. This definition is given by Spiegel. In this book, we will
study statistics in the second sense.
We will study methods of collection (Ch 3), organisation (Ch 4), presentation (Ch 5) and analysis
of data.

1.4 SCOPE OF STATISTICS IN ECONOMICS


Scope here means the range of applicability of statistics in different fields. The range is very wide
for two reasons. First, statistics help in drawing conclusions from the facts already collected.
Second, statistics help in predicting the course of future events on the basis of past experience
converted into statistics. For example, today we can predict fairly accurately the likely growth of
population in different states.
Statistics are indispensable for all branches of knowledge, be it business, economics, history, civics,
physics, chemistry, and so on. Activities of business, trade, banking, etc. all make extensive use of
statistics.
Statistics are of great relevance in business. Planning a business requires choosing among the
available alternatives, which is possible only by using statistical techniques. It also requires
conducting consumer surveys to determine tastes and preferences. Manufacturing, marketing, etc.
all require the use of statistics and statistical methods.
No government can function effectively without the help of statistics. For example, the education
department of Nagaland must know the number of eligible students before deciding how many
schools, and of what type, to open. This can be determined only by conducting surveys. Population
censuses, manufacturing censuses and farm surveys are routine activities of modern governments.
Statistics and economics are inseparable. Statistics are of great help in formulating economic
theories, as explained in section 1.5. They are also of great help in making empirical and
quantitative analysis, as explained in section 1.1.

1.5 IMPORTANCE OF STATISTICS IN ECONOMICS


To make empirical and quantitative analysis, we require data and scientific methods to collect,
organise and present the required data. Besides, we require statistics for many other purposes. Let
us explain some of the purposes for which statistics are required.
(i) Needed for systematic presentation of facts
Simply collecting data is not sufficient. The data must be organised into groups, tables, etc. For
example, suppose we collect data about 1000 school-going children. The data may relate to their
sex, age, income of parents, etc. But these data will not be useful unless organised into groups,
tables, etc. Only then can we give some meaning to the data. Statistics as a subject has evolved
scientific methods for organising data.
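As a minimal illustration of organising raw observations into groups, the Python sketch below counts a few hypothetical records of school-going children by sex and by age group. The records, the field layout and the 5-year age bands are assumptions made only for this example; they are not data from the text.

```python
from collections import Counter

# Hypothetical raw records: (sex, age in years, monthly income of parents in rupees)
records = [
    ("F", 7, 12000), ("M", 9, 8000), ("F", 12, 25000),
    ("M", 14, 15000), ("M", 6, 9000), ("F", 10, 30000),
]

def age_group(age, width=5):
    """Return a label such as '5-9' for the class interval that contains age."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Count children by sex and by age group: the raw list becomes two small tables.
by_sex = Counter(sex for sex, _, _ in records)
by_age = Counter(age_group(age) for _, age, _ in records)

print(dict(by_sex))
print(sorted(by_age.items()))
```

With 1000 real records the same counting step would turn a long, unreadable list into a compact table that can be compared and interpreted.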
(ii) Needed for forecasting future trends
What is likely to be the price level after one year? How much is likely to be the population of India
after 10 years? We are all interested in forecasting, but government has special interest in it. It is
government's duty to make provision of certain common facilities for the population, like drinking
water, education, medical care, etc. Therefore, the government must have some idea in advance
about the quantum of facilities to be provided. Statisticians calculate the trend on the basis of data
for past years and use it to estimate values for future years. People in all walks of life can gain from
forecasts. Businessmen can plan their production. Consumers can plan their budgets.
(iii) Needed for making comparisons
India adopted the path of planning to develop its economy. The Planning Commission prepared
Five-Year Plans and the government implemented them; in 2015 the Planning Commission was
replaced by NITI Aayog. What was the success rate of each plan? The overall success rate is
measured by the rate of growth of national income at constant prices, in brief called the 'growth
rate'. By comparing growth rates we can know the rate of success of each plan and make suitable
changes in future plans accordingly. Similar comparisons can be attempted in other areas like
production, consumption, investment, saving, share prices, general price level, inequalities,
poverty, unemployment, etc. Such comparisons are very useful for learning lessons for the future.
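The growth rate mentioned above is the percentage change in national income at constant prices from one year to the next:

growth rate = (income this year - income last year) / income last year × 100

A minimal Python sketch follows. The income figures are hypothetical and are assumed to be already expressed at constant prices, so that the rate reflects real growth rather than price changes.

```python
def growth_rate(previous, current):
    """Percentage change from the previous year's value to the current year's value."""
    return (current - previous) / previous * 100

# Hypothetical national income at constant prices (₹ crore) for three consecutive years
income = [100_000, 106_000, 113_420]

# Pair each year with the one before it and compute the year-on-year growth rate
rates = [growth_rate(p, c) for p, c in zip(income, income[1:])]
print([round(r, 2) for r in rates])
```

Comparing the resulting rates period by period is the kind of comparison the paragraph describes.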
(iv) Useful for formulation of policies
Government makes changes in policies from time to time. For making changes or for formulating
new policies, the government must have sufficient data. Suppose the government wants to make
education free for poor children. For this the government must have some idea of how many
children fall in this category. It has to make provision in the budget and also make other
arrangements. Statistics are of great help to the government in administering the country
effectively. Similarly, businessmen, industrialists, exporters and importers can all plan their future
operations in advance on the basis of data.
(v) Needed for construction of an economic theory
A theory is simply a generalisation about the behaviour of certain variables. An economic theory generalises the behaviour
of economic variables. For example, the famous law of demand generalises the relation between change in
price and the consequent change in demand. For evolving the law of demand it is necessary that we collect
information about price and demand of a commodity, put them together and study the relation. An
economic theory can also be constructed on the basis of logical reasoning. But a theory is accepted to be
true only when it is supported by facts.
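As a rough sketch of "putting price and demand together and studying the relation", the Python code below computes Karl Pearson's coefficient of correlation for a small hypothetical price-demand table. The figures are invented for illustration. A negative coefficient is consistent with the law of demand, although correlation alone does not prove the law.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical observations: price of a commodity (₹ per kg) and quantity demanded (kg)
price = [10, 12, 14, 16, 18]
demand = [50, 46, 41, 37, 30]

r = pearson_r(price, demand)
print(round(r, 3))  # close to -1: demand falls as price rises in these figures
```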
(vi) Needed for planning
NITI Aayog (formerly the Planning Commission) cannot take even a step forward without statistics.
The full form of NITI Aayog is National Institution for Transforming India. It must know the position
of each sector of the economy in statistical terms before it can advise on the allocation of funds
among these sectors. Similarly, businessmen also require sufficient data to plan their future
operations. In this way, statistics play a useful role in planning.

Need for Statistics (diagram): for systematic presentation of facts; for forecasting; for making
comparisons; for formulation of policies; for construction of economic theory; for planning.

1.6 LIMITATIONS OF STATISTICS


There are many limitations of statistics. Some of these are :
(i) Statistics do not deal with qualitative aspects
Statistics deal only with the aspects which can be expressed in numerical terms. It is because only the
numerical values can be analysed and interpreted. Qualitative aspects, like happiness, sorrow, utility, etc.
cannot be expressed in numerical terms. Therefore, the qualitative aspects cannot be organised into tables,
charts, diagrams, etc. and no statistical method can be applied on these.
(ii) Statistics do not deal with individuals
Statistics deal with groups, not individuals. The collective behaviour pattern of a group need not mean that
every individual in the group has the same behaviour pattern. Statistics deal with groups of consumers and
producers, a country, a village, and so on.
(iii) Statistical average of a group may not hold for each individual of that group
For example, the average age of a group of persons aged 1, 3, 6 and 18 years is 7 years. It does not
mean that each person's age is 7 years. Take another example: the per capita income (at current
prices) of India during 2009-10 was about ₹44,000 per year, i.e., about ₹3,670 per month. It cannot be
taken to mean that each individual in the country earns ₹3,670 per month. There are people earning
many times more and others earning much less.
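The arithmetic behind both examples can be checked with a few lines of Python: the mean of the four ages is 7, yet no one in the group is actually 7 years old, and ₹44,000 a year works out to about ₹3,667 a month. The per capita figure is the one quoted in the text; the code only repeats the division.

```python
from statistics import mean

ages = [1, 3, 6, 18]
average_age = mean(ages)
print(average_age)            # the group average
print(average_age in ages)    # does any individual actually have that age?

annual_per_capita_income = 44_000         # ₹ per year, the figure quoted in the text
monthly = annual_per_capita_income / 12    # ₹ per month
print(round(monthly))
```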
(iv) Most statistical results are mere approximations
Most statistical enquiries are based on a small sample taken from a group, called the universe. The
sample may not truly represent the group. There may be a bias in the selection of the sample. A
sample can at most give an approximation of the behaviour of the group.
(v) Statistics can be misused
Statistics can be misused in many ways. For example, the investigator may first decide what he
wants to prove and then manipulate the collection of data accordingly. There are many alternative
methods to derive a result, and the investigator may choose the method that helps prove the result
he has already decided on.

POINTS TO REMEMBER
• Need for empirical and quantitative analysis establishes the need for statistics.
• Statistics, in the plural sense, means data, i.e., a collection of related observations.
• Statistics, in the singular sense, means 'statistical methods', i.e., the science of collection,
organisation, presentation, analysis and interpretation of data.
• Statistics, as a subject, is needed for many purposes. Some main purposes are:
-Needed for systematic presentation of facts.
-Needed for forecasting future trends.
-Needed for making comparisons.
-Useful for formulation of policies.
-Needed for construction of economic theory.
-Needed for planning.

EXERCISES
MULTIPLE CHOICE QUESTIONS [1 Mark]
(Answers at the end of exercise)
Choose the correct alternative in the following questions
1. Statistics in the plural sense is:
(a) Numerical data (b) Presentation of numerical data
(c) Analysis of numerical data (d) All the above
2. Statistics deal with
(a) Qualitative aspects (b) Quantitative aspects
(c) Both (a) and (b) (d) None of the above
3. Economists arrive at conclusions:
(a) By applying logic
(b) By collecting facts
(c) Applying logic followed by collection of facts.
(d) Collection of facts followed by applying logic.
4. Study of economics is concerned with
(a) Earning of income
(b) Spending of income on satisfaction of wants
(c) Spending of income on investment
(d) All the above.

SHORT ANSWER QUESTIONS-I [3 Marks]


Answer the following questions in about 60 words.
1. What is the meaning of statistics?
2. Why do we need statistics? Give three reasons.
3. Explain how statistics are useful (a) for forecasting and (b) for planning.
4. Explain with the help of an example the use of statistics in construction of an economic theory.
5. Explain with the help of an example the use of statistics in formulation of a government policy.
6. Describe three basic economic activities.
7. Explain briefly any two limitations of statistics.

SHORT ANSWER QUESTIONS-II [4 Marks]


Answer the following questions in about 70 words.
1. Explain any two points of importance of statistics.
2. Explain any two limitations of statistics in detail.
3. Explain statistics in the 'plural' sense.
4. Explain statistics in the 'singular' sense.

LONG ANSWER QUESTIONS [6 Marks]


Answer the following questions in about 100 words.
1. Explain the meaning of statistics.
2. Explain any three uses of statistics.
3. Explain any three limitations of statistics.

VALUE BASED QUESTION


How important can statistics be in the formation of policies intended to raise the quality of life of
the people? Explain.
Answers to the Multiple Choice Questions
1. (a) 2. (b) 3. (c) 4. (d)
UNIT 2
COLLECTION, ORGANISATION AND
PRESENTATION OF DATA
Chapter 2: SOURCES OF DATA

Chapter 3: COLLECTION OF PRIMARY DATA: CENSUS AND SAMPLING METHODS

Chapter 4: FREQUENCY DISTRIBUTION

Chapter 5: TABLES

Chapter 6: DIAGRAMS

CHAPTER 2: SOURCES OF DATA


2.1 INTRODUCTION
The purpose of this chapter is to explain the different steps to be taken in the collection of data in
the course of a statistical enquiry. A statistical study is normally undertaken to analyse the
behaviour of some variables, to make comparisons among variables, to make policy decisions, etc.
The person or organisation that needs data need not always collect the data itself. Collection of
data is not child's play. It involves time and money. This is why there are specialised agencies
engaged in data collection. In all countries of the world, most of these agencies are government
owned. The data so collected are published periodically (daily, weekly, monthly, yearly and so on).
For example, in India, data on the growth of population are collected by the Government of India
every ten years and published as the 'Population Census'. Data on national income are collected by
the Central Statistics Office (C.S.O.), a government agency, and published in the annual publication
'National Accounts Statistics'. The collection of data on national variables is so costly and
time-consuming that no single individual or private organisation can afford to undertake it. So,
most users of data depend on published sources. Very few are in a position to collect the data they
require themselves. This leads us to a distinction between primary sources and secondary sources
of data.

2.2 PRIMARY AND SECONDARY SOURCES OF DATA

2.2.1 Meanings of the Two Sources


The word 'primary' stands for original or first rank. The word 'secondary' stands for second rank.
These meanings should give you some idea about the difference between primary and secondary
sources of data.
Before we study the difference, let us take an example. There is an important annual publication in
India called the Economic Survey, published every year before the presentation of the budget in
Parliament. It contains many statistical tables about the Indian economy. The data in these tables
are not collected by the organisation that publishes the Economic Survey but are taken from other
published and unpublished sources like the Census of India, National Accounts Statistics, Reserve
Bank of India Bulletin, NITI Aayog (formerly Planning Commission), etc.
It is clear from the above example that the Economic Survey uses and publishes data which already
exist. For example, the population data published in the Economic Survey already existed in another
publication, the 'Census of India'. On the other hand, the data published by the Census of India are
also collected and compiled by the Census of India itself. The Economic Survey is therefore a
secondary source, and the Census of India is a primary source. We can now define the two sources.
Primary source of data refers to the data (published or unpublished) collected and issued by
the same agency. For example, National Accounts Statistics (NAS) is a primary source because the
data contained in NAS are collected as well as published by the same organisation, i.e., the Central
Statistics Office.
Secondary source of data refers to the data published or used by one agency but originally
collected and compiled by some other agency. For example, data on national income published in
the Economic Survey are secondary data because they were collected and compiled by the Central
Statistics Office. In other words, the Economic Survey simply republishes already existing data.
There are a large number of primary sources in India. Important among these are the Census of
India, Central Statistics Office, Directorates of Economics and Statistics of state governments and of
various ministries at the centre, Reserve Bank of India, NITI Aayog, Department of Public
Enterprises, Labour Bureau, etc.
2.2.2 Reasons for Preferring a Primary Source
When users of data have a choice between a primary source and a secondary source, which of the
two should they prefer? If there is a choice, it is always better to refer to the primary source rather
than a secondary source. There are many reasons for preferring a primary source.

(i) The primary source includes definitions of terms and units used.
Take, for example, data on national income published in the Economic Survey (a secondary
source) and in the National Accounts Statistics (a primary source). The Economic Survey simply
picks up statistical tables from National Accounts Statistics without reproducing the notes on
methodology, which are given either in the primary source itself or in a separate publication. Users
who want to be sure about the data they intend to use have no option but to go through the notes
on methodology given in the primary source. These notes give detailed definitions of the terms and
units used in compiling the data.

(ii) The primary source is often accompanied by the step-by-step methodology used in the
collection of data.
The step-by-step methodology discloses various aspects of data collection, such as a specimen of
the questionnaire used, the procedure used in selecting the sample and the methods used in
approaching the respondents. These details make users more confident in using the data.

(iii) The primary source gives data in more details.


The secondary source often publishes data in condensed form. For example, the Economic Survey
(a secondary source) publishes population data only on a state-wise basis, whereas the Population
Census (a primary source) also publishes data on a district-wise basis. For an investigator who
wants more detail, the primary source is more useful.

(iv) Mistakes and errors may creep into the secondary source while copying data from the
primary source.
Errors may creep into the secondary source when data are copied from the primary source. For
example, instead of 5666 one may copy 566. Intelligent users often discover such mistakes, and
they have no option but to refer to the primary source for the correct figure.
From the above, it is clear that an intelligent user of data can be sure of the reliability, accuracy and
applicability of the data he is using only after examining the primary source of the data.
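The check an intelligent user performs can be sketched in Python: compare each figure in a secondary table with the corresponding figure in the primary source and flag any that differ, such as 566 copied for 5666. The district names and figures below are hypothetical; in practice the comparison would be against the actual primary publication.

```python
def find_copying_errors(primary, secondary):
    """Return {item: (primary_value, secondary_value)} for every mismatch.

    Items present in the secondary table but missing from the primary one
    are reported with None as the primary value.
    """
    errors = {}
    for item, value in secondary.items():
        original = primary.get(item)
        if original != value:
            errors[item] = (original, value)
    return errors

# Hypothetical district-level figures as published in a primary source
primary = {"District A": 5666, "District B": 4210, "District C": 3987}
# The same figures as reproduced in a secondary source, with one copying error
secondary = {"District A": 566, "District B": 4210, "District C": 3987}

print(find_copying_errors(primary, secondary))
```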

2.3 STEPS IN COLLECTION OF DATA


2.3.1 Steps in a Statistical Enquiry
A statistical enquiry (study) refers to the collection, presentation, analysis and interpretation
of numerical data. The enquiry may be conducted just to measure the magnitude of one or more
variables. But normally the purpose is more than this. Once the magnitudes are obtained the
investigator may also study the behaviour of these variables and also the inter-relationship among
those variables. For example, suppose an investigator conducts a statistical enquiry about
agricultural production. In the process he ends up collecting data on the outputs produced and the
inputs used to produce them. He can use these data to study the composition of inputs and outputs
of various crops and changes in these over time. He may also study the relationship between
inputs and outputs. The inferences drawn from these studies may be very useful for the Planning
Commission, government, farmers, etc., for making future plans about agriculture.
A statistical enquiry is thus not limited to the collection of data; it also covers the presentation,
analysis and interpretation of data.
Collection of data refers to gathering information by contacting those who can supply this
information or those about whom this information is to be collected. There are many aspects of
collection of data: what statistics should be obtained, from whom, and by what methods. These
aspects are studied in section 2.3.2 of this chapter.
Presentation of data refers to organising data in the form of statistical tables, graphs or any other
suitable form. The methods of presentation are explained in Chapters 5 and 6.
Analysis of data refers to the classification of data into categories. This categorisation must be in
accordance with the purpose of the statistical enquiry.
Interpretation of data refers to the inferences and conclusions drawn from the analysis.
Interpretation of data answers the question: what do the figures tell us? This is the final step in any
statistical investigation.
The remaining sections of this chapter are confined to only one aspect of a statistical enquiry, i.e.,
collection of data.

2.3.2 Steps in Collection of Data


Collection of data is only one of the many exercises involved in conducting a statistical enquiry.
Presentation, analysis and interpretation are the other exercises involved. Although collection of
data is only one of the exercises, it is the most important one, because the outcome of all the
exercises that follow depends on how accurately the data have been collected.
The agency or individual undertaking the statistical enquiry has to plan all stages of data collection
well in advance and proceed step by step.
The following are some of the major steps in data collection.

(i) Draw a complete plan.


It is necessary to draw as complete a plan as possible before the actual collection begins. The
statistician has to plan three things. First, what data are to be obtained? Second, from whom are the
data to be obtained? Third, by what methods are the data to be obtained?
The statistician also has to plan the expenditure to be incurred and the time period during which
the data are to be collected. The expenditure and the time period will determine most of the
subsequent steps he has to take.

(ii) Decide whether a sample or a census is to be adopted.


In the year 2000, India's population reached the 100 crore mark. Suppose the government is
interested in collecting data about the occupational structure of India's population. There are two
alternative ways of doing so. One way is to contact each and every individual and record their
occupational status. Such a method of collecting data is called the Census method.
Another way is to contact a part of the total population, say one crore people (out of a total of 100
crore), record their occupations and apply the result to the entire population. This is called the
Sample method because we collect data only about a carefully selected sample from the total
population. Clearly, the sample method involves less money and time than the census method.
The choice between the census and sample methods will largely depend on the budget of the
data-collecting agency and on the time within which the data must be collected, processed and
made available to prospective users. (A detailed study of census and sample methods is given in
chapter 3.)
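The idea of the sample method can be illustrated with a small simulation: record occupations for everyone (census), then record them for a 1% sample only and scale the sample counts up to the whole population. The Python sketch below uses a simulated population of 100,000 people as a stand-in for 100 crore; the occupational shares are assumptions, and simple random sampling is assumed for simplicity.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the illustration is reproducible

# Simulated population; the occupational shares are assumed for illustration only
occupations = ["agriculture", "industry", "services"]
weights = [0.5, 0.2, 0.3]
population = random.choices(occupations, weights=weights, k=100_000)

# Census method: record every individual
census_counts = Counter(population)

# Sample method: record only a 1% simple random sample, then scale up
sample = random.sample(population, k=1_000)
sample_counts = Counter(sample)
scale = len(population) / len(sample)
estimated_counts = {occ: round(n * scale) for occ, n in sample_counts.items()}

for occ in occupations:
    print(occ, census_counts[occ], estimated_counts.get(occ, 0))
```

The sample estimates come close to the census counts while recording only one person in a hundred, which is why the sample method saves money and time; the gap between the two columns is the approximation error the text warns about.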
(iii) Prepare the design of the form/questionnaire.
Designing the questionnaire is influenced by many considerations.
First, how many questions are to be asked of the respondents? If there are too many
questions, the respondent may lose interest and may not cooperate. He may start giving
vague answers. The number of questions must be decided keeping in mind the psychology of
the respondents.
Second, each question must be clearly phrased so that a clear answer comes from the
respondent. The questions should be phrased in such a manner that only short answers are
required. For example, the answers may be of the yes/no type. If feasible, all possible alternative
answers may be given on the questionnaire itself so that the respondent only has to tick the
relevant answer. The purpose is to make the respondent as comfortable as possible and to take as
little of his time as possible.
Third, whether the form is to be filled in by an enumerator, or left with the respondent to
complete on his own, or a combination of the two. This will primarily depend on the resources
available to the statistician and on the time constraint. Approaching each respondent
individually and having the enumerator fill in the form involves a lot of expense and time but
produces a better response and qualitatively better results.
While preparing a questionnaire the statistician has to keep in mind the content, wording and
sequence of each question.
(iv) Decide the mode of distribution of the questionnaire.
There are two alternative ways of distributing questionnaires. One way is to send the forms
by post, and the respondents are expected to return the forms by post. Another way is to
interview the respondents personally. The interview may be either face to face or by
telephone. Each method has its own advantages.
The main advantages of a mail questionnaire vis-a-vis a personal interview are:
(a) Lower cost: Mailing costs are lower than the costs of personal interviews.
(b) Wider reach: It is possible to reach a larger number of people spread over a wider
geographical area, which may not be possible through personal interviews.
(c) Independent views by respondents: Since the enumerator is not present, the respondent is
not influenced by the enumerator and expresses his views freely.
The main advantages of a personal interview are:
(a) Spontaneous response: When questions are asked personally, the response of the
respondent is spontaneous. The respondent has no time to dilute his answer.
(b) Higher response rate: In personal interviews there is little chance of 'no response' from
respondents. With a mail questionnaire, the respondents may not care to respond, so the
response rate is lower.
(c) Clarifications sought by respondents are answered: If a respondent is not able to
understand a question, the interviewing enumerator can quickly clarify it. With a mail
questionnaire such clarification is not possible.
(v) Check the filled in forms for completeness and consistency.
This is the last stage in data collection. This stage comes when duly filled in forms are
returned by the respondents. Each return, i.e., questionnaire, must be examined and wherever
necessary edited. Alterations may be made during editing.
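When returns are entered into a computer, part of step (v) can be mechanised. The sketch below is purely illustrative: the field names, the three sample returns and the age-for-class consistency rule are hypothetical assumptions, not part of any official editing procedure. It flags unanswered questions (completeness) and answers that contradict each other (consistency), so that the editor knows which returns need attention.

```python
# Hypothetical questionnaire returns; field names and values are made up for illustration.
REQUIRED_FIELDS = ("name", "age", "class_studying", "family_size")

def check_return(form):
    """Return a list of problems found in one filled-in questionnaire."""
    problems = []
    # Completeness: every required question must have an answer.
    for field in REQUIRED_FIELDS:
        if form.get(field) in (None, ""):
            problems.append(f"missing: {field}")
    # Consistency (assumed rule): age minus class should lie between 4 and 8 years,
    # so a 6-year-old reported to be in class 9 is flagged for editing.
    age, cls = form.get("age"), form.get("class_studying")
    if isinstance(age, int) and isinstance(cls, int) and not 4 <= age - cls <= 8:
        problems.append(f"inconsistent: age {age} does not fit class {cls}")
    return problems

returns = [
    {"name": "A", "age": 14, "class_studying": 9, "family_size": 5},
    {"name": "B", "age": None, "class_studying": 7, "family_size": 4},
    {"name": "C", "age": 6, "class_studying": 9, "family_size": 3},
]
for form in returns:
    print(form["name"], check_return(form) or "OK")
```

The computer only points out doubtful returns; deciding what alteration to make is still the editor's job.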
After the above steps are taken the data recorded in the questionnaire is ready for
summarisation. One method of summarisation is tabulation of data. This is the first step
towards presentation followed by other steps like graphic representation, etc. The methods of
presentation are explained in Chapter 5.
POINTS TO REMEMBER
• There are two sources of data : primary and secondary sources.
• Primary source refers to the data collected and issued by the same agency.
• Secondary source refers to the data published or used by one agency but originally collected
and compiled by some other agency.
• Primary source is preferred to the secondary source for the following
reasons:
- The primary source includes definitions of terms and units used.
- The primary source gives data in more detail.
- Mistakes and errors may creep into the secondary sources while copying data from
the primary sources.
• A statistical enquiry (study) refers to the collection, presentation, analysis and
interpretation of numerical data.
• The major steps undertaken in data collection are:
- Draw a complete plan.
- Decide whether a sample or a census is to be adopted.
- Prepare the design of the form/questionnaire.
- Decide the mode of distribution of the questionnaire.
- Check the filled-in forms for completeness and consistency.

EXERCISES
MULTIPLE CHOICE QUESTIONS [1 Mark]
(Answers at the end of exercises)
Choose the correct alternative in the following questions:

1. Government of India's 'Economic Survey':
(a) Publishes only primary data
(b) Publishes only secondary data
(c) Publishes both primary and secondary data
(d) Does not publish any data
2. A statistical enquiry is limited to:
(a) Only collection of data (b) Only presentation of data
(c) Only analysis of data (d) All the above
3. Census of India :
(a) Publishes only primary data (b) Publishes both primary and secondary data
(c) Publishes only secondary data (d) Does not publish any data
4. Primary source of data is preferred to secondary source because
(a) It explains definitions (b) It explains methodology
(c) It gives more details (d) All the above
5. One of the following is a characteristic of a mail questionnaire:
(a) Higher cost (b) Wider reach
(c) Large response rate (d) None of the above

SHORT ANSWER QUESTIONS-I [3 Marks]
Answer the following questions in about 60 words.
1. Distinguish between primary and secondary sources of data.
2. What is primary source of data? Give one example of primary source in India.
3. What is secondary source of data? Give two examples of secondary source in India.
4. Give three reasons for preferring primary sources in comparison to secondary sources.
5. Distinguish between the census and sample methods of collection of data.
6. Explain briefly the two modes of distribution of questionnaire.

SHORT ANSWER QUESTIONS-II [4 Marks]
Answer the following questions in about 70 words.
1. Explain the distinction between primary and secondary sources of data.
2. Explain any two reasons for preferring the primary sources of data.
3. What is a statistical enquiry? Explain.
4. State briefly the steps taken in collection of data.
5. Explain the distinction between the census and sample methods of collection of data.
6. Explain the main advantages of personal interview.
7. Explain the main advantages of mail questionnaire.
LONG ANSWER QUESTIONS [6 Marks]
Answer the following questions in about 100 words.
1. Explain four reasons for preferring primary data.
2. Explain four purposes for which statistics are required.
3. Explain the steps taken in the collection of data.
VALUE BASED QUESTION
Conducting a census of population and its characteristics is a good thing. But what is good about it?
Explain.
Answer to the Multiple Choice Questions
1. (b) 2. (d) 3. (a) 4. (d) 5. (b)
CHAPTER 3: COLLECTION OF PRIMARY DATA: CENSUS AND
SAMPLING METHODS
3.1 INTRODUCTION
To explain the difference between the census and sample methods of collecting data, let us take a
highly simplified example. Suppose, it is planned to make a study of home environment of students
of a particular school. Suppose, the total number of students in that school is 2000. There are two
alternative ways of conducting the study. One way is to send investigators to contact the parents
and family members of each of the 2000 students and collect the required information from them.
The second way is to select some students, say 10 per cent of the students, and contact only
their parents and family members. It means that we contact the homes of only 200 students instead of
2000 students. Whatever inferences we draw from these 200 students can then be applied to all
the 2000 students.
Let us now assign technical names to certain words used in our example. The total number of
students of the school is the population or universe. It can be defined as the whole set of individual
objects having some common characteristics about which we are interested in gaining information. Each
student is one element in the population. It means there are 2000 'elements' in the 'population'. When a
study is made of all the elements in a population it is called a Census Survey.
When instead of total population only some selected elements of the population are studied, the set
of selected elements is called a population sub-set or a sample. The study of this sample is called
Sample Survey. In our example the 200 (out of a total of 2000) students of the school form a sample
and the study of their home environment is a Sample Survey.
In brief, in a census survey we obtain information about all elements in the
population and then arrive at inferences. In a sample survey we obtain information about selected
elements, i.e. a subset of the population, and apply the inferences drawn from this study to the total
population.
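The difference between the two approaches can be sketched in a few lines. The "home environment score" below is a made-up number between 0 and 50 generated for each of the 2000 students, purely an assumption for illustration; real survey answers would take its place. The census result uses every element, while the sample result uses only 200 elements and is then applied to the whole population.

```python
import random

rng = random.Random(42)   # fixed seed so the run is repeatable

# Hypothetical universe: a made-up home-environment score (0 to 50) for each of 2000 students.
universe = [rng.randint(0, 50) for _ in range(2000)]

# Census survey: every element of the population is studied.
census_mean = sum(universe) / len(universe)

# Sample survey: only 10 per cent of the elements (200 students) are studied,
# and the result is taken to hold for all 2000 students.
sample = rng.sample(universe, 200)
sample_mean = sum(sample) / len(sample)

print(f"census result (all 2000 students): {census_mean:.2f}")
print(f"sample result (200 students):      {sample_mean:.2f}")
```

The two figures are usually close but not identical; how close depends on how the sample is chosen, which is the subject of the rest of this chapter.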

3.2 SAMPLE SURVEY VS. CENSUS SURVEY


Which of the two methods is better? There is no doubt that a census survey is more
comprehensive than a sample survey. We should get better results when we survey the
homes of all the 2000 students than only the homes of 200 students. But there are many things to
be considered before we make a choice. The things to be considered are :
(i) Time involved : It takes less time to conduct a sample survey than a census. It will
take about one tenth of the time to contact 200 homes as compared to contacting 2000 homes.
Therefore, a sample survey is less time consuming. It means that we can draw inferences more
quickly from a sample survey than from a census survey. This is one merit which
a sample survey enjoys.
(ii) Cost involved : The expenditure incurred on conducting a sample survey is much
lower than on conducting a census survey. The cost incurred on contacting 200 homes would be
about one tenth of the cost incurred on 2000 homes. Therefore, a sample survey is less costly.
This is another merit of a sample survey.
(iii) Information to be collected : A questionnaire is prepared while conducting a
survey. It contains a certain number of questions about the elements to be surveyed. Asking
ten questions and recording the answers takes more time than asking only five questions.
More questions can therefore be incorporated in the questionnaire used in a sample survey. But if the
same survey is to be conducted on a census scale, the number of questions may have to be
considerably reduced to save time. In a sample survey 'time' is less of a problem: by spending
a little more time, more questions can be asked. In this way more information can be
collected from a sample survey than from a census survey.
(iv) Level of accuracy: By accuracy we mean accuracy in collecting data and also
accuracy in analysing data. The manpower involved in collecting and analysing data is much
less in a sample survey than in a census survey. For efficiency it is necessary that this
manpower is properly trained to carry out the job. At the same time a proper supervision
over the manpower is also necessary. It is obvious that in a sample survey since less
manpower is involved more time and money can be spent on training and supervision. In a
census survey since many times more manpower is involved the investigating authority may
not be in a position to spend that much time and money. Better training and supervision mean fewer
chances of errors during data collection, as is the case in a sample survey. Therefore, the
results obtained through a sample survey are likely to be more accurate than those obtained
in a census survey.
3.3 METHODS OF CHOOSING A SAMPLE
3.3.1 The Terms Used
What constitutes an appropriate sample? The statistician has to take many decisions in
choosing a sample. The decisions relate to the kind, the design and the size of sample.
Before we explain the different sampling methods and designs, let us note a few terms used
in sampling practice. These terms are:
(i) Universe: The alternative name of the term universe is population. A universe or
population is defined as an aggregate of items possessing a common characteristic or
characteristics. In our illustration of the school, all 2000 students of the school form a
universe. The purpose of conducting the survey is to seek knowledge about this universe or
population. The meaning of the term population in statistical sampling differs from the
ordinary meaning of this term. In the ordinary sense population means the number of persons.
In statistics population means the number of items. In our illustration we spoke of a
population of students. Similarly, we can speak of a population of prices, of factories, etc.
(ii) Sample : A sample is that part of universe which we select for the purpose of
investigation. In our illustration of the school the 200 students selected for survey out of
total universe (of 2000 students) is the sample. A sample is that portion of the universe
which represents the universe.
(iii) Sampling unit : There are two kinds of sampling units (a) primary and (b) elementary.
When the universe is divided into groups these groups are called primary sampling units.
In our illustration of the school, suppose the students are divided into groups
on the basis of classes (standard I, II, III, etc.) or sections (A, B, C, etc.); these classes or
sections will be called 'primary sampling units'.
In the universe of 2000 students each student is a sampling unit called an elementary
sampling unit. It is a single-item sampling unit. There are as many elementary sampling
units in a universe as there are items in it. In our illustration there are 2000 elementary
sampling units.
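The two kinds of sampling units can be made concrete with a small sketch. It assumes, hypothetically, that the 2000 students are numbered 1 to 2000 and split into 50 sections of 40 students each; each section is then a primary sampling unit and each student an elementary sampling unit.

```python
# Hypothetical layout: 2000 students numbered 1 to 2000, split into 50 sections of 40.
students = list(range(1, 2001))          # elementary sampling units: one per student

# Primary sampling units: section number -> the students in that section.
sections = {s: students[(s - 1) * 40 : s * 40] for s in range(1, 51)}

print("primary sampling units:", len(sections))                               # 50
print("elementary sampling units:", sum(len(v) for v in sections.values()))   # 2000
```

Whatever the grouping, the number of elementary sampling units always equals the number of items in the universe.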
3.3.2 Kinds of Sampling
Sampling means selecting items from a universe for investigation to draw conclusions
about the whole universe. There are many types of sampling methods. Of these, the two
major types are : (a) random sampling and (b) purposive sampling. In most statistical
investigations it is random sampling which is used. The main difference
between the two is summed up as : random sampling is by 'chance' while purposive
sampling is by 'choice'. The difference will become clearer as we explain the two major
types of sampling in detail.
3.4 RANDOM SAMPLING
3.4.1 Introduction
Random means chance. Random sampling, also called probability sampling, refers to the selection
of items from a universe based on chance. A clarification is necessary before we proceed further.
Chance here means equal chance of being selected. In our illustration of 2000 students in a school,
random sampling would mean that each of the 2000 students has an equal chance, or opportunity, of
being selected for the sample. There is no bias of any type against any student. The bias may be
intentional or unintentional. For example, the person who has been assigned the task of selecting the
sample may intentionally select students of a particular class, or a section, or male, or female, or of
particular height, and so on. Bias may also creep in unintentionally because every individual has
likes and dislikes of his own. He may have a liking for a particular category of students, and even
though he has no intention of favouring one category or another, he may still unintentionally
show favouritism in selecting a sample. The gist of the matter is that in random sampling there
should be no bias, as far as possible.
Let us now come to actual sample selection. Suppose you are assigned the task of selecting a
sample. How will you proceed? One simple way is that all the 2000 students are asked to assemble
on the ground. You then just pick and choose 200 students out of them. Suppose some students are
absent on that day. It means that these students have no chance of being selected. It also means that
your selection is not a true random selection. Another way is that you are given a list of all the 2000
students and you pick and choose from that list. Still another way is to prepare a card for each
student, shuffle the cards and pick them up one by one till you have picked up 200 cards, as is
done to draw prize winners in a lottery.
There is still another way. Suppose there are 50 sections of 40 students each. You select at random 4
students from each section. So, there are many ways of selecting a random sample. Each way has its
own advantages and disadvantages. Let us describe these methods one by one.
Random sampling is further sub-divided into (i) unrestricted random sampling and (ii) restricted
random sampling. There are several types of restricted random sampling (a) stratified, (b) cluster
and (c) systematic.
Sampling
  - Random
      - Unrestricted
      - Restricted
          - Stratified
          - Cluster
          - Systematic
  - Purposive

Let us now explain the meaning of each type of random sampling.
3.4.2 Unrestricted (or Simple) Random Samples
In this approach the entire universe or population is treated as one unit and then items are selected
on random basis from this unit. For example, in our illustration, all the 2000 students of the school
are taken as one unit, as one single group.
The sample may be selected on 'draw of lots basis'. A chit or a card is prepared for each student.
These cards are then shuffled and thoroughly mixed. Then one by one the required number of cards
are picked up at random.
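The draw of lots can be imitated on a computer. In this sketch each "card" is simply a roll number from 1 to 2000 (an assumed numbering); shuffling the list plays the part of mixing the cards, and taking the first 200 plays the part of picking cards one by one. Every student has an equal chance of being picked.

```python
import random

# One card per student, numbered 1 to 2000 (assumed roll numbers).
cards = list(range(1, 2001))

random.shuffle(cards)       # shuffle and thoroughly mix the cards
sample = cards[:200]        # pick up 200 cards one by one from the top of the pile

print(sorted(sample)[:10])  # the first few selected roll numbers, in order
```

Python's `random.sample(range(1, 2001), 200)` performs the same draw in one call; the shuffle version is shown because it mirrors the physical procedure.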
Statisticians have prepared random number tables as an alternative to the draw of lots method.
One such random number table is given below :
The table consists of columns of numbers carefully arranged on the basis of practical tests of
randomness. How do we use this table?
As a first step, we number all the items of the universe. In our illustration our universe consists of
2000 students. The items in the universe must be arranged in a sequence, say alphabetically, or
according to the date of admission in the school, or on any other basis. Each student must be
assigned a number ranging from 1 to 2000. In our illustration we wish to draw a sample of 200
students. We select these students in the following manner.
Random Number Table
Line/Col.  (1)    (2)    (3)    (4)    (5)
101        13284  16834  74151  92027  24670
102        21224  00370  30420  03883  94648
103        99052  47887  81085  64933  66279
104        00199  50993  98603  38452  87890
105        60578  06483  28733  37867  07936
106        91240  18312  17441  01929  18163
107        97458  14229  12063  59611  32249
108        35249  38646  34475  72417  60514
109        38980  46600  11759  11900  46743
110        10750  52745  38749  87365  58959
111        36247  27850  73958  20673  37800
112        66986  99744  72438  01174
113        99638  94702  11463  18148  81386
114        72055  15774  43857  99805  10419
115        24038  65541  85788  55835  38835
116        74976  14631  35908  28221  39470
117        35553  71628  70189  26436  63407
118        35676  12797  51434  82976  42010
119        74815  67523  72985  23183  02446
120        45246  88048  65173  50989  91060
121        76509  47069  86378  41797  11910
122        19689  90332  04315  21358  97248
123        42751  35318  97513  61537  54955
124        11946  22681  45045  13964  57517
125        96518  48688  20996  11090  48396
126        35726  58643  76869  84622  39098
127        39737  42750  48968  70536  84864
128        97025  66492  56177  04049  80312
129        62814  08075  09788  56350  76787
130        25578  22950  15227  83291  41737
131        68763  69576  88991  49662  46704
132        17900  00813  64361  60725  88974
133        71944  60227  63551  71109  05624
134        54684  93691  85132  64399  29182
135        25946  27623  11258  65204  52832
136        01353  39318  44961  44972  91766
137        99083  88191  27662  99113  57174
138        52021  37945  75234  24327
139        78755  47744  43776  83098  03225
140        25282  69106  59180  16257  22810
141        11959  94202  02743  86847  79725
142        11644  13792  98190  01424  30078
143        06307  97912  68110  59812  95448
144        76285  75714  89585  99296  52640
145        55322  07598  39600  60866  63007
146        78017  90928  90220  92503  83375
147        44768  43342  20696  26331  43140
148        25100  19336  14605  86603  51680
149        83612  46623  62876  85197  07824
150        41347  81666  82961  60413  71020

First, we choose the point from where to start. Suppose we decide to start at the top of column (1)
and proceed vertically down the column. Two points must be noted at this stage. First, we can start from
any column, and not necessarily from the top; we can start even in the middle or at any other point of the
column. Second, we can even proceed horizontally. The crux is that there is no fixed starting
point. We can start from anywhere in the table and proceed horizontally or vertically.
Since we have decided to start at the top of column (1) and proceed vertically, we will start from the
number 13284. But this is a five-digit number while the size of our universe, 2000, is a four-digit number.
What do we do here? We take only the first four digits of the number 13284. It means we take the
number as 1328 (and not 13284). The second number in column (1) is 21224. The
first four digits of this number are 2122. But this is greater than 2000. What do we do now? The
practice is to disregard any number which is higher than 2000, since we have only 2000 students.
So, we disregard 2122. We have to disregard the next number also, i.e. 9905. Next to this is 0019.
We select this number because it is lower than 2000. In this way we go on selecting numbers,
recording only those first four digits which are equal to or lower than 2000 and rejecting those
which are above 2000. The first few numbers selected in this manner from column (1) are :
1328, 0019, 1075, 1968, 1194, 1790, 0135
Once we finish column (1) we can move to column (2), column (3) and so on. We stop when we have
recorded 200 numbers. If in this process any number is repeated, it must also be disregarded. (Note
that the Random Number Table given above is only a part; the actual table is much bigger.)
The 'draw of lots' and 'random number tables' are examples of unrestricted random sampling. It is
called unrestricted because there are no 'restrictions' or conditions attached during the process of
selection. Each item in the universe has an equal chance of being selected during the entire process
of selection. The whole universe or population is taken as one group. There are no sub-groups.
The implication becomes more clear when we explain restricted random sampling below.
3.4.3 Restricted Random Samples
Restricted random sampling is one in which conditions or restrictions are attached during the
process of selection. When, instead of a pick-and-choose method (like draw of lots, random number
table, etc.), the sample is selected by following a particular selection method, it is called restricted
random sampling. In this approach some restrictions are placed before the items are selected for
sampling, based on supplementary information about the universe. The aim is to bring greater
precision to the results. There are many kinds of restricted random sampling, depending on the
particular methodology used in selecting a sample.
Out of these we will explain the three most important ones :
(i) Stratified sampling (ii) Cluster sampling (iii) Systematic sampling.

(i) Stratified random sampling


In this sampling the universe is subdivided into sub-groups, called strata, and a simple random
sample is then taken from each stratum. Strata means groups. Stratified random sampling means
dividing the entire universe (or population) into groups (i.e. strata). Two things are kept in mind
while dividing the universe into groups. First, there should be as great homogeneity as possible
within each stratum. Second, there should be as marked a difference as possible between the strata.
In our illustration of a universe of 2000 students, the population consists of students from 10 classes
(Class I to X). We can conveniently divide this universe into 10 strata, or groups, class I forming one
group, class II forming another group and so on. Such a grouping meets both the requirements.
First, there is likely to be a greater degree of homogeneity among students within a class. Second,
students of one class differ from those of another in many respects.
Once the universe is divided into strata, the next step is to select items for sampling from each
stratum. There are two ways of doing so. One way is to take a fixed percentage of items from each
stratum. It is called proportionate sampling. Another way is to take a fixed number of items from
each stratum irrespective of the size of the stratum. Suppose it is decided to select 20 students from each
class irrespective of whether one class consists of 50 students, another of 40
students, and so on. It is called disproportionate sampling.
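The two ways of deciding how many items to take from each stratum can be compared side by side. The class strengths below are hypothetical figures chosen so that they add up to 2000; the 10 per cent rate and the fixed 20 per class follow the text. Simple rounding is used for the proportionate case, which may not always add up exactly to the intended total when strata sizes are awkward.

```python
# Hypothetical class strengths (classes I to X) adding up to 2000 students.
class_sizes = {"I": 220, "II": 210, "III": 200, "IV": 200, "V": 200,
               "VI": 200, "VII": 200, "VIII": 190, "IX": 190, "X": 190}

def proportionate(sizes, fraction):
    """Same percentage from every stratum, so larger strata contribute more items."""
    return {name: round(n * fraction) for name, n in sizes.items()}

def disproportionate(sizes, per_stratum):
    """Same number from every stratum, whatever its size."""
    return {name: per_stratum for name in sizes}

prop = proportionate(class_sizes, 0.10)   # 10 per cent of each class
disp = disproportionate(class_sizes, 20)  # 20 students from each class

print(prop)
print(disp)
```

Here both plans give a total of 200 students, but under proportionate sampling class I contributes 22 students and class X only 19, whereas under disproportionate sampling every class contributes 20.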
Once the strata are determined a random sample is drawn from each stratum in the same way as
unrestricted random sampling. Now, you must have understood the difference between restricted
random sampling and unrestricted random sampling. In the former the universe is divided into
groups and then sample items drawn from each group. In the latter, there is no division of the
universe into groups and the entire universe is taken as one group.
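Once the strata and the number to be taken from each are fixed, a simple random sample is drawn inside every stratum. The sketch assumes, for simplicity, 10 classes of 200 students each with roll numbers running on from class to class, and 20 students drawn per class; `stratified_sample` is an illustrative name, not a standard function.

```python
import random

def stratified_sample(strata, allocation, rng):
    """Draw a simple random sample (without replacement) from each stratum."""
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}

# Hypothetical strata: 10 classes of 200 students, roll numbers 1-200, 201-400, and so on.
strata = {f"Class {c}": list(range((c - 1) * 200 + 1, c * 200 + 1)) for c in range(1, 11)}
allocation = {name: 20 for name in strata}       # 20 students from every class

rng = random.Random(7)                           # fixed seed so the run is repeatable
sample = stratified_sample(strata, allocation, rng)
print({name: len(chosen) for name, chosen in sample.items()})
```

Unlike unrestricted random sampling, every class is guaranteed a place in the sample.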

(ii) Cluster random sampling:


Cluster sampling is similar to the stratified sampling with a difference in the method of grouping.
In cluster sampling the universe is also divided into groups. But instead of drawing a sample from
each group, some of the groups are chosen as sample groups and then items are chosen from these
sample groups. Suppose the universe is divided into 10 groups or clusters. Two steps are taken to
draw a sample. First, out of the total number of clusters some clusters are selected as sample
clusters. Suppose two clusters are chosen as sample. Then from each of the two selected sample
clusters, a random sample of items is drawn.
To explain cluster sampling we take a different illustration. Suppose a survey is to be undertaken
to examine the working conditions of workers in an industry comprising 100 factories. There
are all types of workers, from the lower level to the higher level. Since all factories belong to one
industry, the types of workers employed by each factory may not differ considerably. In this
situation, instead of drawing samples from all factories, some of these factories may be selected as
sample factories and then a sample drawn from each selected factory. The sample so drawn may be
100 per cent or less, depending on the requirements of the survey. (In stratified sampling, on the
other hand, items for sampling would be selected from every factory.)
Cluster sampling is suitable when two conditions are fulfilled. First, there is as great a
heterogeneity as possible within a cluster. For example, within a factory there is great
heterogeneity as far as workers are concerned because the workers are of different types. Second,
there should be as small a difference as possible between the clusters. For example, the different
factories must be employing nearly the same types of workers because all these factories are
producing similar products.
(On the other hand, stratified random sampling is more suitable when (1) there is as great a
homogeneity as possible within strata and (2) there is as big a difference as possible between the
strata.)
To sum up, in cluster random sampling three steps are taken in selecting a sample. First, the
universe is divided into groups, called clusters. Second, sample clusters, called primary sampling
units, are selected from these clusters. Third, sample items called elementary sampling units, are
drawn from each sample cluster on random basis (In stratified sampling the second step is not
taken).
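The three steps of cluster sampling can be sketched for the factory illustration. The number of workers per factory (50), the number of sample factories (5) and the 10 workers drawn from each are hypothetical choices for the sketch, and the worker IDs are made up.

```python
import random

rng = random.Random(3)   # fixed seed so the run is repeatable

# Step 1: the universe is divided into clusters, one per factory.
# Hypothetical: 100 factories with 50 workers each, identified by made-up IDs.
factories = {f: [f"F{f}-W{w}" for w in range(1, 51)] for f in range(1, 101)}

# Step 2: choose some clusters at random as sample clusters (primary sampling units).
sample_factories = rng.sample(sorted(factories), 5)

# Step 3: draw workers (elementary sampling units) at random from each chosen cluster;
# 10 of the 50 workers here, though 100 per cent coverage is also possible.
sample = {f: rng.sample(factories[f], 10) for f in sample_factories}

for f, workers in sorted(sample.items()):
    print(f"factory {f}: {workers[:3]} ...")
```

The other 95 factories are not visited at all, which is what separates this from stratified sampling.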

(iii) Systematic random sampling


The word systematic stands for 'ordered'. In this sampling, all the items in the universe are first
ordered and then every n-th item is selected to constitute a sample. The ordering may be
alphabetical, geographical, or any other. For example, in our earlier illustration the 2000 students may
be ordered alphabetically, or on the basis of their age, or on the basis of the income of their parents, or
on any other basis. After the ordering is done, every tenth item may be chosen for inclusion in the sample.
Two points must be kept in mind while ordering items in the universe, once we decide the
characteristics on the basis of which the ordering is to be done. First, there should be as great a
similarity as possible among immediate neighbours (i.e. items). Second, the differences should go
on increasing between distant neighbours (items). It means that if there are 100 items to be
ordered, the difference between item number 1 and item number 21 must be greater than the
difference between item number 1 and item number 10. Similarly, the difference between item
numbers 1 and 31 must be greater than the difference between items 1 and 21.
Systematic random sampling involves two steps. The first step is to order the items in the universe.
The second step is to select every n-th item. In the second step the statistician has to take two
decisions. First, at what interval is the n-th item to be selected, i.e. what is the sampling interval?
Second, from which item is this interval to be started, i.e. what is the sampling start?
The sampling interval depends on the size of the sample: it is the size of the universe divided by
the size of the sample. For example, if we have to survey 200 students out of a universe of 2000
students, the sampling interval would be 2000 / 200 = 10. It means we have to select every 10th
student from the list. Once the sampling interval is determined, the range of choice for the
sampling start is also determined. The sampling start must be somewhere between 1 and 10.
Suppose we select 5; we then take the 5th, 15th, 25th items, and so on. If we select 9, we take the
9th, 19th, 29th items, and so on. The sampling start is chosen on a random basis. Either the
statistician himself decides on a random basis or he takes the help of a random number table by
picking the first random number between 1 and 10.
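The two decisions, sampling interval and sampling start, translate directly into code. The sketch assumes the list has already been ordered and that the size of the universe is an exact multiple of the sample size, as in the 2000 / 200 example; `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(ordered_items, sample_size, start=None, rng=random):
    """Select every k-th item from a list that has already been ordered.

    Assumes len(ordered_items) is an exact multiple of sample_size.
    """
    interval = len(ordered_items) // sample_size      # sampling interval k
    if start is None:
        start = rng.randint(1, interval)              # random start from 1 to k
    # Positions start, start + k, start + 2k, ... counted from 1 as in the text.
    return [ordered_items[pos - 1]
            for pos in range(start, len(ordered_items) + 1, interval)]

# Hypothetical ordered list: roll numbers 1 to 2000 after alphabetical ordering.
students = list(range(1, 2001))
print(systematic_sample(students, 200, start=5)[:4])   # → [5, 15, 25, 35]
print(systematic_sample(students, 200, start=9)[:4])   # → [9, 19, 29, 39]
```

Leaving `start` out lets the routine pick the sampling start at random, which corresponds to the random-number-table approach described above.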

3.5 PURPOSIVE SAMPLING


Purposive sampling is different from random sampling. Purposive sampling is by choice while
random sampling is by chance.
In purposive sampling the choice of sample is left to the judgement of the sampler. The statistician
chooses the sample on the basis of the characteristics under investigation. Suppose a study of the fee
structure of schools is to be undertaken. In a city like Delhi there are hundreds of schools. Instead of
choosing the sample on a random basis, the selection of sample schools is left to the judgement of the
investigator. He, on the basis of his knowledge about schools, chooses a certain number of schools
which he thinks represent all schools in Delhi.

3.6 SAMPLING ERRORS VS NON-SAMPLING ERRORS


Two types of errors or inaccuracies may creep in during a sample survey investigation.
Such errors are classified into sampling errors and non-sampling errors.
(i) Sampling errors
To conduct a sample survey we take a sample from a given universe and arrive at a result. Suppose
we take another sample from the same universe and arrive at a result. If on comparing we find some
difference in the two results from the two sample surveys from the same universe, the difference is
termed as sampling error. In practice the results are bound to be different due to the presence of
biased and unbiased errors. Biased errors occur due to the prejudice of the person or the team
conducting the survey like faulty selection, faulty method etc. Unbiased errors arise because of
chance difference between members of population included in the sample and those not included.
For example, one sample may contain more rich people and the other sample may contain
more poor people.
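The chance element in sampling error can be seen by drawing two samples from the same universe. The household incomes below are randomly generated figures, purely a hypothetical universe for illustration. Because both samples are picked without any bias, the gap between their results reflects only unbiased (chance) error; biased errors from faulty selection do not appear in this sketch.

```python
import random

def mean(values):
    return sum(values) / len(values)

# Hypothetical universe: monthly incomes (in rupees) of 2000 households, generated at random.
rng = random.Random(11)
universe = [rng.randint(5_000, 80_000) for _ in range(2000)]

# Two samples of 200 from the same universe, each chosen purely by chance.
sample_a = rng.sample(universe, 200)
sample_b = rng.sample(universe, 200)

result_a, result_b = mean(sample_a), mean(sample_b)
print(f"sample A mean income: {result_a:.0f}")
print(f"sample B mean income: {result_b:.0f}")
# The gap between the two results is the sampling error as described above.
print(f"difference between the two results: {abs(result_a - result_b):.0f}")
```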

(ii) Non-sampling errors


Errors not specific to sampling but common to both census (i.e. complete enumeration) and sampling
are called non-sampling errors. These may be due to a vague questionnaire, vague answers by the
informants, wrong responses from informants, wrong statistical methods and so on.

3.7 SOME IMPORTANT SOURCES OF SECONDARY DATA

3.7.1 Introduction
India has adopted 'planning' (Five Year Plans) as a strategy to achieve economic goals. For planning
we need data. There is a central body called NITI Aayog, which replaced the Planning Commission, the
body that prepared the Five Year Plans. Then there is the Finance Commission, which allocates financial
resources of the Central Government between the centre and the states. Similarly, there are a large number of
ministries, at the centre and state level, looking after different sectors of the Indian economy. For
the proper functioning of all such departments and institutions, relevant data is necessary.
Keeping in view the huge data requirement, the Government of India has created a separate ministry to
coordinate the functioning of various organisations engaged in collecting data. The ministry is
known as the Ministry of Statistics and Programme Implementation. It consists of two wings :
Statistics wing and the Programme Implementation Wing. The Ministry is the apex body in the
official statistical system of the country. The ministry includes Central Statistics Office (CSO), the
National Sample Survey Office (NSSO), the Computer Centre and Accounts Office.
There are many other agencies. The most important among these is the office of the Registrar General
of India, which conducts the population census of India. Examples of other agencies are : Directorate of
Economics and Statistics, Department of Agriculture and Cooperation; Ministry of Commerce and
Industry; Office of the Textile Commissioner; Ministry of Railways; Ministry of Finance; NITI
Aayog; Reserve Bank of India; Labour Bureau; DGCI&S; and others.
We will study in this section two important sources of secondary data : the Census of India and the
NSSO.

3.7.2 The Census of India


A Brief History
More appropriately, it is the Population Census of India. There is scattered evidence in
ancient Indian manuscripts and scriptures that there had been some type of population census in
ancient India. Kautilya's "Arthashastra" details elaborate methods of
conducting population censuses. During the Mughal period there are hints, but there are no records
of census procedure. With the advent of British rule in India, systematic efforts were made in this
direction. Till 1881, censuses covered only some parts of India, not the whole of India.
It was in 1881 that the census was conducted for the first time for the whole of India. The census
provided a comprehensive record of demographic data. Since then, the census has been conducted in
India uninterruptedly once every 10 years.
Another important year in the history of census in India was 1948, after Independence, when the
Census Act was passed. In 1949, the Government of India took two important decisions : (i) to
initiate steps for improvement of registration of vital statistics, and (ii) to establish a single
organisation at the centre under the Registrar General to deal with vital statistics and census. The
first post-independence census was conducted in 1951. The last
census was conducted in 2011.

What does Census Provide?


The census provides data on the state of human resources available in the country. The data relates
to demography, culture and economic structure. A population census is not a mere head count. It is
also a census of various features of the population, like age structure, sex structure, caste,
religion, education, employment status, rural-urban status, languages, housing conditions, vehicles
owned, banking habits, etc. The data so collected is useful in determining trends in population
density, sex ratio, literacy rates, urbanisation, infant mortality rates, rates of growth of
population, and so on. These then become the basis of planning.

Phases in Conducting Census


In the first phase, house-listing operations are conducted about one year before the actual census.
The aim is to prepare a framework for the actual census. The questions asked of households in this
exercise are given in Appendix 3A.
In the second phase, information is obtained about each member of the household separately. The
questionnaire is called the household schedule (see Appendix 3B). The schedule has two parts,
which are:
Household Schedule
Part I: Location particulars
Part II: Individual particulars with following sub sections.
(i) General and socio-cultural characteristics.
(ii) Characteristics of workers and non-workers.
(iii) Migration characteristics.
(iv) Fertility particulars.

Preparations for Conducting Census


The staff involved in the census operations are given intensive training before the commencement
of both phases.
The officers and staff are introduced to the history of census-taking, the need for and relevance of
the Census, and the various kinds of studies made and reports brought out. They are also made to
understand that they can legitimately feel proud of being associated with such a gigantic task, the
results of which form the basis for various decisions taken at different levels for the next decade.

Publicity
In order to ensure that people are well informed about the various aspects of the census, extensive
publicity is given using different strategies and modes. Apart from the conventional methods of
radio talks, press advertisements, posters, handbills, stickers, cinema slides and cable network, the
message is conveyed to small groups of youth and women, and through quiz competitions and
phone-in programmes on the radio.
New strategies are also adopted, such as carrying the message through 'Villupattu', a traditional
art form, and making appeals over public address systems from auto-rickshaws that fanned out
into every nook and corner of the Union Territory. Non-governmental organisations engaged in the
rehabilitation of disabled persons actively co-operate with the census machinery in sensitising
personnel as well as in publicity measures.

3.7.3 National Sample Survey Office (NSSO)

History in Brief
The NSSO is an apex institution of India entrusted with the task of collecting, processing and
publishing data relating to different aspects of the economy. It was set up in 1950 under the name
National Sample Survey. Its objective was to meet the data needs of the country for the
estimation of national income and other aggregates. It was reorganised in 1970 and given the new
name National Sample Survey Office (NSSO). Its area of operation was widened with the
objective of bringing together all aspects of survey work under a single agency.

Structure
The NSSO works under the overall technical guidance of a Governing Council. The council consists
of statisticians, economists and social scientists. The executive head of the NSSO is the Director
General and Chief Executive Officer (DG & CEO), who is responsible for conducting and
supervising the activities of the NSSO. The NSSO has four divisions:
1. Survey Design and Research Division (SDRD)
2. Field Operations Division (FOD)
3. Data Processing Division (DPD)
4. Coordination and Publication Division (CPD)
SDRD and DPD have their headquarters at Kolkata.
FOD and CPD have their headquarters at Delhi. The DG & CEO also operates from CPD. The
divisions have regional and sub-regional offices located in different parts of India.
The Subject Coverage
The NSSO conducts surveys on:
1. Consumer expenditure
2. Employment and unemployment
3. Social consumption (health, education, etc.)
4. Manufacturing enterprises
5. Service sector enterprises in the unorganised sector
6. Land holdings
7. Livestock holdings
8. Debt
9. Investment
Surveys on subjects 1 to 5 are conducted once every 5 years, and surveys on subjects 6 to 9 once
every 10 years. The results of the surveys are brought out in the form of NSS Reports, which are
available for sale.
The NSSO also publishes a technical journal named Sarvekshana, in which the summary results of
surveys are published. The journal is published biannually.
Other Activities
In addition to its main job of conducting the above surveys, the NSSO also undertakes the
following activities:
1. It undertakes the fieldwork of the Annual Survey of Industries (ASI), covering factories
registered under certain sections of the Factories Act and other Acts. (However, the
processing of data and publication of results are done by the CSO.)
2. It provides technical guidance to states in the field of agricultural statistics for conducting
crop estimation surveys, and keeps a continuous watch on the quality of crop statistics.
3. It collects data on retail prices on a monthly basis from shops/outlets in selected markets
located in villages and urban centres for the computation of Consumer Price Index
numbers.
4. It conducts Urban Frame Surveys to provide a sampling frame of first-stage units in the
urban sector.

POINTS TO REMEMBER
• In a census survey we obtain information about all the elements of a population.
• Population is defined as the whole set of observations of some common characteristic
about which we are interested in gaining information.
• In a sample survey we obtain information about selected elements of the population, i.e., a
subset of the population.
• The choice between a census and a sample survey is influenced by (i) the time involved, (ii)
the cost involved, (iii) the information to be collected and (iv) the level of accuracy required.
• A sample is that part of the universe (or population) which we select for the purpose of
investigation.
• Sampling means selecting items from a universe for investigation in order to draw inferences
about the whole population.
• The two main methods of sampling are: (i) random sampling and (ii) purposive sampling.
• Random sampling refers to selecting items from a universe on the basis of chance. It is of
two types: restricted and unrestricted.
• Unrestricted random sampling is one in which no conditions, or restrictions, are
attached during the process of selection. It is also called simple random sampling.
In it, each item in the universe has an equal chance of being selected.
• Restricted random sampling is one in which conditions, or restrictions, are attached
during the process of selection. Its three main types are: (i) stratified sampling, (ii) cluster
sampling and (iii) systematic sampling.
• In stratified random sampling the universe is sub-divided into sub-groups, called strata,
and a simple random sample is then taken from each stratum.
• In cluster sampling the universe is first divided into groups. Some of these are
selected as sample groups, called sample clusters. Then a random selection of items
is made from each selected cluster.
• In systematic sampling the first member of the sample is selected in a random manner,
and then every n-th unit is included in the sample. Here 'n' is the quotient obtained by
dividing the size of the universe by the size of the sample.
• The first comprehensive census of India was conducted in 1881. Since then it has been
conducted once every 10 years. The first census after Independence was conducted in
1951.
• Two types of errors may creep in while conducting a sample survey: sampling errors and
non-sampling errors. Sampling error refers to the difference between the result obtained from a
sample and the true value for the whole universe; this is also why two samples taken from the
same universe can give different results. Non-sampling error refers to errors common to both
census and sample surveys that arise from faults in conducting a survey.
• The Census provides data on the state of human resources available in the country. It
covers aspects like age, sex, caste, religion, education, banking habits, etc. These
aspects then become the basis of planning.
• The Census of India is conducted in two phases: (1) the first phase is 'house listing',
containing questions relating to (a) the structure of the house, (b) the head of the household,
and (c) living conditions; (2) the second phase collects information about each member of
the household.
• The staff involved in the census operations are given intensive training.
• In order to ensure that people are well informed about the various aspects of the census,
extensive publicity is given.
• The NSSO was set up in 1950 under the name National Sample Survey. The new
name NSSO was given in 1970.
• The NSSO has four divisions: Survey Design and Research Division, Field
Operations Division, Data Processing Division, and Coordination and Publication
Division.
• The NSSO conducts surveys on consumer expenditure, employment and unemployment,
social consumption (health, education, etc.), manufacturing enterprises, service sector
enterprises in the unorganised sector, land holdings, livestock holdings, debt and investment.
• The NSSO publishes a technical journal, 'Sarvekshana'.
• The NSSO also (1) undertakes the fieldwork of the ASI; (2) provides technical
guidance to states in the field of agricultural statistics; (3) collects data on retail
prices; and (4) conducts Urban Frame Surveys.
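The random sampling methods summarised above can be illustrated with a short Python sketch that uses only the standard `random` module. It is a teaching aid under simplifying assumptions (the universe is a small in-memory list, and a fixed seed makes the draw repeatable); it is not the procedure followed by the Census or the NSSO, and the function names are hypothetical. For systematic sampling the interval is taken as the size of the universe divided by the size of the sample, as stated above.

```python
import random

def simple_random_sample(universe, n, rng):
    """Unrestricted (simple) random sampling: every item has an equal
    chance of selection; items are drawn without replacement."""
    return rng.sample(universe, n)

def stratified_sample(strata, sizes, rng):
    """Stratified sampling: a simple random sample is drawn from each stratum.
    strata maps stratum name -> list of items; sizes maps name -> sample size."""
    return {name: rng.sample(items, sizes[name]) for name, items in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Cluster sampling as summarised above: pick some clusters at random,
    then select items at random from each chosen cluster."""
    chosen = rng.sample(list(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

def systematic_sample(universe, n, rng):
    """Systematic sampling: interval k = N // n, a random start among the
    first k items, then every k-th item after it."""
    k = len(universe) // n
    start = rng.randrange(k)          # 0-based position of the first unit
    return universe[start::k][:n]

# Usage with a hypothetical universe of 1000 items numbered 1 to 1000.
rng = random.Random(42)               # fixed seed so the draw can be repeated
universe = list(range(1, 1001))
sample = systematic_sample(universe, 100, rng)
print(len(sample), sample[:3])        # 100 items, spaced 10 apart
```

With a universe of 1000 and a sample of 100, the interval is 10, so the first item selected under systematic sampling must be one of items 1 to 10.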
EXERCISES
MULTIPLE CHOICE QUESTIONS [1 Mark]
(Answers at the end of exercises)
Choose the correct alternative in the following questions:
1. Which one of the following is not a characteristic of a sample survey?
(a) Less time (b) Less cost (c) Less information (d) Less accuracy
2. Random sampling is done:
(a) By choice (b) By chance (c) Both (a) and (b) (d) None of the above
3. The part of the universe selected for investigation is called:
(a) Population (b) Sample
(c) Primary sampling unit (d) Elementary sampling unit
4. Simple random sampling is also called:
(a) Stratified sampling (b) Cluster sampling
(c) Systematic sampling (d) Unrestricted sampling
5. Suppose the size of the universe is 1000 and the size of the sample is 100. Under systematic sampling, which of the following could be the first item selected?
(a) 7 (b) 11 (c) 15 (d) 20
6. The Census of India provides data on:
(a) Number of persons (b) Culture of the people
(c) Economic structure of the people (d) All the above

SHORT ANSWER QUESTIONS-I [3 Marks]


Answer the following questions in about 60 words.
1. What is the difference between a census survey and a sample survey?
2. What is the difference between universe and sample?
3. What is sampling? Give meaning of random sampling.
4. Define unrestricted random sample. Name the two bases on which such a sample can be selected.
5. Define restricted random sampling. Name its types.
6. Differentiate between stratified and cluster random sampling.
7. Distinguish between primary and elementary sampling units.
8. Give in brief the history of the Census of India up to 1881.
9. Give in brief the history of the Census of India from 1881 onwards.
10. Describe the first phase of Census of India 2011.
11. Name the four divisions of the NSSO and their headquarters.
SHORT ANSWER QUESTIONS-II [4 Marks]
Answer the following questions in about 70 words.
1. Explain briefly any three factors on which the choice between a census and a sample survey depends.
2. Explain briefly different types of restricted random sampling.
3. Explain briefly the method of unrestricted (or simple) random sampling.
4. Explain briefly stratified random sampling.
5. Explain briefly cluster random sampling.
6. Explain briefly systematic random sampling.
7. Explain purposive sampling.
8. Explain what the Census of India provides.
9. Explain the second phase of Census of India 2011.
10. Write a note on the training provided to the Census staff.
11. Write a note on the 'publicity' about conducting Census of India.
12. List the subjects covered under the NSSO surveys.
LONG ANSWER QUESTIONS [6 Marks]
Answer the following questions in about 100 words.
1. Explain the four things to be considered in making a choice between sample survey and census survey.
2. Explain unrestricted random sampling.
3. Explain cluster random sampling. Also explain conditions under which it is suitable.
4. Explain the phases in conducting a census.
5. Explain the structure and subject coverage of NSSO.
Answers to the Multiple Choice Questions

1. (c) 2. (b) 3. (b) 4. (d) 5. (a) 6. (d)

Source: Office of the Registrar General, India


CHAPTER 4: FREQUENCY DISTRIBUTION
4.1 INTRODUCTION
Collection of data is the most important stage of a statistical investigation. But the mass of data
so collected is raw data. To make such data usable for analysis, it must be arranged in some
order. An orderly arrangement of data ultimately takes the shape of a frequency distribution. The
arrangement of data into a frequency distribution is therefore an equally important stage of a
statistical enquiry. How is a frequency distribution prepared? What are its different types? What
must we keep in mind while preparing it? This chapter deals with all such questions. The answers
will greatly facilitate the understanding of the chapters that follow.
Before we study how to construct a frequency distribution let us explain a few concepts
associated with it.
Variables
The quantity which varies in magnitude in a frequency distribution is called a variable. For
example, age, wages, prices and quantity produced are variables forming the basis of classification.
There are two types of variables: continuous and discontinuous (or discrete). Variables which
are arranged in 'classes' and can take any numerical value within a class, whether integral or
fractional, are called continuous variables. Suppose there is a class of ₹10 to ₹20 per hour
wage. Within the class, the wage rate can be ₹10, 11, 12 and so on, or ₹10.1, 10.2, 10.3 and so on.
Variables which can take only certain values, whether integral or fractional, and do not take any
other value between two such values are called discrete variables. Suppose we classify workers
earning ₹10 per hour, ₹11 per hour, ₹12 per hour and so on. In the case of discrete variables we
do not take any value between ₹10 and ₹11, or between ₹11 and ₹12, and so on. We take only
₹10, or ₹11, or ₹12, etc. Now suppose we classify workers earning ₹10.5, ₹11.5, ₹12.5, etc. In
this case we do not take any value between ₹10.5 and ₹11.5, between ₹11.5 and ₹12.5, etc.
Normally, continuous variables are expressed in weights and measures and arrived at through
measuring. Discrete variables are arrived at through counting, like number of employees,
number of machines, etc.
Frequency
The number of times a value of a variable, or of its subgroup, appears in a data series is called the
"frequency" of that value or subgroup. The frequency indicates the concentration of items in
a series around certain values. For example, if out of data relating to 100 workers, 30 workers
earn ₹80 per day, then 30 is the frequency of workers earning ₹80 per day.
Tallying
There are two ways of arranging raw data: tallying and the array. Tallying is a method of
distributing a mass of raw data over different classes. In this method, the classes are set up first.
Then each item is marked against the class in which it falls by a sloping or a vertical stroke. When
four such strokes have been made, a fifth horizontal stroke is drawn through them to represent the
fifth item. This makes a bundle of five items, which makes counting convenient and expedites the
totalling of frequencies in each class.
An alternative to tallying is to arrange the raw data into an array. The method of arranging an
array is explained in Section 4.2.2 below.
4.2 STEPS IN CONSTRUCTION OF A FREQUENCY DISTRIBUTION
4.2.1 Raw Data
To understand the various aspects of constructing a frequency distribution, let us take a simple
illustration. Suppose there is a factory in which 20 workers are employed. Further, suppose that
information is collected about the daily wages received by these workers. The data so collected is
called raw data and is given in Table 4.1.
TABLE 4.1
Raw data of daily wages (in ₹) of 20 workers employed in a factory
110 95 94 91 111
100 100 102 96 110
120 105 122 100 107
90 125 101 103 105

4.2.2 Raw Data Arranged into an Array


Raw data is unorganised data. It is extremely difficult to draw any conclusion from it. It is,
therefore, necessary to arrange the data in some orderly manner. We can give some order to this
data by placing the highest magnitude (i.e., wage rate) first, followed by the other wage rates in
descending order. Or it can be the other way round: put the lowest magnitude first, followed by
the other magnitudes in ascending order. Such an arrangement of data is called an array. The raw
data in Table 4.1 is arranged into an array in Table 4.2.
TABLE 4.2
Arrays of daily wages (in ₹) of 20 workers employed in a factory
Array in descending order     Array in ascending order
125  110  102  96             90   100  103  110
122  107  101  95             91   100  105  111
120  105  100  94             94   100  105  120
111  105  100  91             95   101  107  122
110  103  100  90             96   102  110  125
(Each array is read down the columns, from left to right.)
From the above arrays, we get information about the lowest daily wage (₹90) and the highest
daily wage (₹125). We can also get some idea of the concentration of magnitudes.
In our illustration we are dealing with only 20 items. Suppose the number of items is 200 or 2,000
or 20,000 or even more. If the number of items is very large, preparing an array may itself be a
difficult task (unless we take the help of a computer). Even if we are able to prepare an array, can
we really draw any conclusion about the concentration of items? It is, therefore, necessary to
condense the data. This leads us to the frequency array.
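As noted above, a computer makes arranging data into an array easy even when the number of items is very large. A minimal Python sketch: the built-in `sorted()` produces both arrays of Table 4.2 from the raw data of Table 4.1 (the variable names are illustrative).

```python
# Raw daily wages (₹) of 20 workers, copied from Table 4.1.
wages = [110, 95, 94, 91, 111, 100, 100, 102, 96, 110,
         120, 105, 122, 100, 107, 90, 125, 101, 103, 105]

ascending = sorted(wages)                    # lowest magnitude first
descending = sorted(wages, reverse=True)     # highest magnitude first

print("Ascending: ", ascending)
print("Descending:", descending)
print("Lowest daily wage:", ascending[0], "Highest daily wage:", ascending[-1])
```

The same two calls work unchanged for 20,000 items; only the input list grows.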

4.2.3 Array Rearranged into a Frequency Array


Frequency means the number of times a value appears in a series. For example, in Table 4.2,
the daily wage of ₹100 appears thrice. So, the frequency of ₹100 is 3. By listing each value once
and noting the number of times each value occurs, we can prepare a frequency array in the
following manner (Table 4.3).
TABLE 4.3
Frequency array of data in Table 4.2
Daily Wage (₹) Frequency
90 1
91 1
94 1
95 1
96 1
100 3
101 1
102 1
103 1
105 2
107 1
110 2
111 1
120 1
122 1
125 1
Total frequency 20
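Listing each value once and noting how often it occurs is exactly what Python's `collections.Counter` does. A minimal sketch that rebuilds Table 4.3 from the raw data of Table 4.1:

```python
from collections import Counter

# Raw daily wages (₹) of 20 workers, copied from Table 4.1.
wages = [110, 95, 94, 91, 111, 100, 100, 102, 96, 110,
         120, 105, 122, 100, 107, 90, 125, 101, 103, 105]

freq = Counter(wages)                 # value -> number of times it occurs

for value in sorted(freq):            # list each value once, in ascending order
    print(value, freq[value])
print("Total frequency", sum(freq.values()))
```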

A frequency array gives a somewhat better idea of the concentration of items than a simple array.
But it has a limitation. It gives an idea of the concentration of individual values, say the number of
workers earning a daily wage of ₹100, but it fails to give an idea of the concentration of items in a
group, say the number of workers earning a daily wage between ₹100 and ₹105. This limitation
leads us to the construction of a frequency distribution.

4.2.4 A Frequency Array Arranged into a Frequency Distribution


In a frequency distribution, instead of listing a single daily wage, we list a range of daily wages. For
example, instead of listing how many workers earn ₹100 per day, or ₹101 per day, etc., we list how
many workers earn a daily wage between, say, ₹100 and ₹109. In statistics such a range is called a
"class". For example, ₹100-₹109 is one class, ₹110-₹119 is another class, and so on. Tables 4.4
and 4.5 are examples of frequency arrays converted into frequency distributions. Let us note a few
things about classes.

4.3 CLASS
4.3.1 Meaning of Class
A class represents a range of values. A daily wage between ₹100 and ₹109 is a class. We write it as
"100-109". Like this, there can be many classes of daily wage rate. Into how many classes should the
data be divided? What should be the size of each class? Should every class have the same size?
There is no unique answer to these questions. It all depends on the nature and composition of the
data. We will not go into the technical details of all the questions relating to classes, but we will
explain the meanings of the terms associated with classes.
Let us organise the data in Table 4.3 into classes and the frequencies associated with these classes.
We will then use this frequency distribution as an illustration to explain the meanings of the terms
associated with classes. These classes are exclusive classes, as distinguished from inclusive classes.
The distinction is explained in Section 4.3.3 of this chapter.
TABLE 4.4
Frequency distribution of daily wages of 20 workers in a factory
Daily Wages (in ₹) No. of Workers
90-95 3
95-100 2
100-105 6
105-110 3
110-115 3
115-120 0
120-125 2
125-130 1
Total 20
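Grouping the data into exclusive classes can be sketched in Python as follows. The helper name `frequency_distribution` is hypothetical; it counts an item in a class when the item is at least the lower limit and below the upper limit. Because ₹125 equals the upper limit of class 120-125, it is counted in 125-130, which is why Table 4.4 needs an eighth class.

```python
# Raw daily wages (₹) of 20 workers, copied from Table 4.1.
wages = [110, 95, 94, 91, 111, 100, 100, 102, 96, 110,
         120, 105, 122, 100, 107, 90, 125, 101, 103, 105]

def frequency_distribution(data, start, width, n_classes):
    """Count items in consecutive exclusive classes of equal width.
    Each class includes its lower limit and excludes its upper limit,
    so an item equal to an upper limit is counted in the next class."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        frequency = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, frequency))
    return table

for lower, upper, f in frequency_distribution(wages, start=90, width=5, n_classes=8):
    print(f"{lower}-{upper}", f)
```

Calling the same function with `width=10` and `n_classes=4` gives the four classes of Table 4.5.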

4.3.2 Number of Classes


How many classes should a frequency distribution have? The basic principle is that the number of
classes should neither be very large nor very small. If it is very large, we tend to lose simplicity. If it
is very small, we lose details which may be very important for drawing inferences. Another factor
to keep in mind while deciding the number of classes is that there should not be too many big
jumps in the frequencies. For example, as we move from class "110-115" to class "115-120", the
frequency suddenly falls from 3 to 0.
We can remove these jumps by lumping the classes. Instead of 8 classes (as in Table 4.4) we can
have 4 classes in the following manner (as in Table 4.5).
TABLE 4.5
Frequency distribution of daily wages of 20 workers in a factory
Daily Wages (₹) No. of Workers
90 – 100 5
100 – 110 9
110 – 120 3
120 - 130 3

Total 20
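Lumping classes means merging adjacent classes and adding their frequencies. A minimal Python sketch with a hypothetical helper `lump`, applied to Table 4.4, merges each pair of width-5 classes into one width-10 class and reproduces Table 4.5:

```python
# Table 4.4: (lower limit, upper limit, frequency) for exclusive classes of width 5.
table_4_4 = [(90, 95, 3), (95, 100, 2), (100, 105, 6), (105, 110, 3),
             (110, 115, 3), (115, 120, 0), (120, 125, 2), (125, 130, 1)]

def lump(table, k):
    """Merge every k consecutive classes into one: the new class runs from the
    first class's lower limit to the last class's upper limit, and its
    frequency is the sum of the merged frequencies."""
    merged = []
    for i in range(0, len(table), k):
        group = table[i:i + k]
        merged.append((group[0][0], group[-1][1], sum(f for _, _, f in group)))
    return merged

for lower, upper, f in lump(table_4_4, 2):
    print(f"{lower}-{upper}", f)
```

The total frequency is unchanged by lumping; only the detail within each merged class is given up.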

4.3.3 Inclusive Classes vs. Exclusive Classes


There are two types of classes : Inclusive and exclusive. Let us explain the difference between the
two.
(i) Exclusive Classes:
An illustration of frequency distribution with exclusive classes is as under:
Frequency distribution with exclusive classes
Class Frequency

50 – 100 5
100 - 150 7
150 - 200 8
200 - 250 10

How do we identify exclusive classes? Note that the upper limit of a given class is equal to the lower
limit of the next class. This is the identifying feature. Also note that an item with value 100, for
example, would fall both in class '50-100' and in class '100-150'. This would be double counting. To
avoid double counting, an item with value 100 is not counted in class '50-100' and is counted only in
class '100-150'. So any item equal to the upper limit of a class is excluded from that class. This is
why such classes are called exclusive classes. Now we can define an exclusive class.

The class which includes items with values greater than or equal to the lower limit
but less than the upper limit is called an Exclusive Class.
For example, class 50-100 includes items with value 50 or greater than 50 but less than 100.
(ii) Inclusive Classes:
An illustration of frequency distribution with inclusive classes is as under:
Frequency distribution with inclusive classes
Class Frequency
10 - 19 3
20 - 29 4
30 - 39 5
40 - 49 3

Note that the upper limit of a class differs from the lower limit of the next class. This is
how you can identify an inclusive class. There is no double counting of items. Unlike an
exclusive class, an inclusive class includes the item with value equal to the upper limit of
the class. This is why such classes are called inclusive classes. Now we can define an
inclusive class.
The class which includes items with values equal to or greater than the lower limit
but 'less than or equal to' the upper limit is called an Inclusive Class.
Although inclusive classes do not lead to double counting, they lead to a discontinuity between
one class and the next. For example, take classes 10-19 and 20-29. There is a jump from 19 in
the first class to 20 in the second class. No information is given about what happens to a value
between 19 and 20, or how such a value is rounded. This creates problems in calculating certain
averages, like the median and the mode, and in drawing certain types of diagrams, like the
histogram. For these we need continuous classes, as in exclusive classes, i.e., the upper limit of a
class must equal the lower limit of the next class. To make inclusive classes usable for calculating
such averages and drawing such diagrams, they need to be converted into exclusive classes. The
method of doing so is explained below.
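The two membership rules can be written as one-line tests. In this minimal Python sketch the function names are hypothetical and the class lists are the two illustrations of this section. The value 100 lands only in the exclusive class 100-150, while a fractional value such as 19.6 fits no inclusive class at all, which is the discontinuity described above.

```python
def in_exclusive_class(x, lower, upper):
    # Exclusive class: lower limit included, upper limit excluded.
    return lower <= x < upper

def in_inclusive_class(x, lower, upper):
    # Inclusive class: both the lower and the upper limit included.
    return lower <= x <= upper

exclusive = [(50, 100), (100, 150), (150, 200), (200, 250)]
inclusive = [(10, 19), (20, 29), (30, 39), (40, 49)]

print([c for c in exclusive if in_exclusive_class(100, *c)])   # [(100, 150)]
print([c for c in inclusive if in_inclusive_class(19, *c)])    # [(10, 19)]
print([c for c in inclusive if in_inclusive_class(19.6, *c)])  # []
```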
Method of converting inclusive classes into exclusive classes.
The main steps are:
(1) Determine the adjustment value, which equals:
$$\text{Adjustment value} = \frac{(\text{lower limit of the second class}) - (\text{upper limit of the first class})}{2}$$

(2) Subtract the adjustment value, so obtained, from the lower limits of all the classes. After
adjustment, the lower limits of the inclusive classes become 'true lower limits'.
(3) Add the adjustment value to the upper limits of all the classes. After adjustment, the upper
limits of the inclusive classes become 'true upper limits'.
Example
Refer to the following table.
Serial No. of class   Inclusive classes   Derived exclusive classes (adjustment value = 0.5)
A 10-19 9.5-19.5
B 20-29 19.5-29.5
C 30-39 29.5-39.5
D 40-49 39.5-49.5

The left hand side of the table records inclusive classes. The right hand side records derived
exclusive classes after making adjustment. The derivation is done in three steps:
(a) Find the adjustment value:
$$\text{Adjustment value} = \frac{(\text{lower limit of the second class}) - (\text{upper limit of the first class})}{2} = \frac{20 - 19}{2} = 0.5$$
(b) Deduct the adjustment value 0.5 from the lower limits of all the classes.
(c) Add the adjustment value 0.5 to the upper limits of all the classes.
[Note: If the lower limit of an inclusive class is zero and the adjustment value is 0.5, the derived
true lower limit will be (−0.5), shown in brackets. The brackets indicate that the derived value is to
be ignored and taken as zero for the purpose of calculating averages, etc.]

Effect of Conversion
(a) The effect of converting inclusive classes into exclusive classes is that the lower and upper
limits of each class change. For example, class 10-19 becomes class 9.5-19.5.
(b) There is no effect on the class interval. It was $20 - 10 = 10$ before adjustment and
$19.5 - 9.5 = 10$ after adjustment.
(c) There is no effect on the mid-value of a class. It was $\frac{10 + 19}{2} = 14.5$ before adjustment
and $\frac{9.5 + 19.5}{2} = 14.5$ after adjustment.
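The three-step conversion and its effects can be checked with a minimal Python sketch. The helper `to_exclusive` is hypothetical; it computes the adjustment value from the first two classes, subtracts it from every lower limit and adds it to every upper limit. It does not apply the bracket convention of the note for a zero lower limit; a derived −0.5 would still have to be read as zero.

```python
def to_exclusive(inclusive):
    """Convert inclusive classes [(lower, upper), ...] into exclusive classes
    with true limits. Needs at least two classes to find the adjustment value:
    (lower limit of 2nd class - upper limit of 1st class) / 2."""
    adjustment = (inclusive[1][0] - inclusive[0][1]) / 2
    return [(lower - adjustment, upper + adjustment) for lower, upper in inclusive]

inclusive = [(10, 19), (20, 29), (30, 39), (40, 49)]
exclusive = to_exclusive(inclusive)
print(exclusive)  # [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]

# Effect of conversion: the limits change, but mid-values and class intervals do not.
for (a, b), (c, d) in zip(inclusive, exclusive):
    print(f"{a}-{b} -> {c}-{d}: mid-value {(a + b) / 2} -> {(c + d) / 2}, interval {d - c}")
```

After conversion, the upper limit of each class equals the lower limit of the next, so the classes are continuous.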

Effect on Calculation of Averages


Only the class limits change when inclusive classes are converted into exclusive classes. Therefore,
only the calculation of those averages which use class limits as a variable is affected. For example,
the calculation of the median and the mode uses class limits as a variable. Therefore, inclusive
classes must be converted into exclusive classes before calculating the median and the mode. The
same applies to histograms, etc.
The calculation of the mean does not use class limits as a variable; it uses mid-values, which, as
shown above, do not change on conversion. Therefore, there is no need to convert inclusive
classes into exclusive classes for calculating the mean.
4.3.4 Class Limit
Class limits represent the lowest and the highest values of items included in a class. An inclusive
class includes all values equal to or greater than the lower limit but less than or equal to the upper
limit. An exclusive class includes all values equal to or greater than the lower limit but less than
the upper limit. For example, the exclusive class 10-20 should be read as "10 and under 20".
If a frequency distribution is in inclusive classes, it may become necessary sometimes to convert
it into exclusive classes by making adjustment in the manner explained above (Section 4.3.3).
Such a conversion changes the limit of the classes. This necessity arises in calculating certain
averages like median, mode, quartiles, etc. where class limits are an essential part of method of
calculation.
4.3.5 Mid-point of a Class
Why do we need to calculate the mid-point? We can explain with the help of an example. In our
illustration (Table 4.5), the class "100-110" has a frequency of 9 workers. It means that 9 workers
get a daily wage of ₹100 or more but less than ₹110. Suppose we do not possess any other detail
about this range, i.e., how many are getting ₹100, how many are getting ₹101, and so on. In this
situation there is no alternative but to assume that the workers are evenly spread over the range
₹100 to ₹110. As a representative wage rate we take the mid-point of the class "100-110". It is also
called the 'mid-value'. The mid-point is needed for drawing graphs, calculating averages, etc.
How do we calculate the mid-value of a class? We obtain it by adding the true lower limit and the
true upper limit and dividing the sum by 2.

$$\text{Mid-point of a class} = \frac{\text{True lower limit} + \text{True upper limit}}{2}$$

For example, the mid-point of class "100-110" is $\frac{100 + 110}{2} = 105$.
Now we can point out the significance of the mid-point. In making further computations it is
assumed that the mid-point (or mid-value) is the value of each item in a class. For example,
105 is taken as the daily wage of each of the 9 workers in the class "100-110". This
mid-point assumption is used for calculating averages, dispersion, etc., and for drawing
frequency graphs. Remember, the mid-point is calculated on the basis of the true limits
and not the actual limits. Also remember that the actual limits of an exclusive class are its true
limits, but the actual limits of an inclusive class are not. In the case of an inclusive
class, the actual limits must first be converted into true limits (as explained in Section 4.3.3)
before being used in the calculation of averages, etc.
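The mid-point assumption can be made concrete with a minimal Python sketch. It computes the mid-points of Table 4.5 and then, purely as an illustration, an average that treats every worker in a class as earning the class mid-value. For comparison it also averages the raw wages of Table 4.1. The raw-data average is ₹104.35 while the grouped figure is ₹107, showing that results based on mid-points are approximations.

```python
# Table 4.5: (true lower limit, true upper limit, frequency).
table_4_5 = [(90, 100, 5), (100, 110, 9), (110, 120, 3), (120, 130, 3)]

def mid_point(true_lower, true_upper):
    return (true_lower + true_upper) / 2

mids = [mid_point(lower, upper) for lower, upper, _ in table_4_5]
print(mids)  # [95.0, 105.0, 115.0, 125.0]

# Mid-point assumption: every item in a class is taken to have the class mid-value.
n = sum(f for _, _, f in table_4_5)
estimated_average = sum(m * f for m, (_, _, f) in zip(mids, table_4_5)) / n
print(estimated_average)  # 107.0

# For comparison: the average of the raw wages in Table 4.1.
wages = [110, 95, 94, 91, 111, 100, 100, 102, 96, 110,
         120, 105, 122, 100, 107, 90, 125, 101, 103, 105]
print(sum(wages) / len(wages))
```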
4.3.6 Open-end Classes
An open-end class is a class lacking one limit. Either the lowest or the highest class, or both, can
be open-end classes. For example, in Table 4.5, the class "90-100", if expressed as "below 100",
becomes an open-end class. Similarly, the highest class, i.e., "120-130", if expressed as "120 and
above", also becomes an open-end class.
As far as possible, open-end classes must be avoided. This is because it is impossible to establish
the missing limit, and consequently the mid-value, of an open-end class. This creates problems in
further computations and in diagrammatic representation, such as calculating averages and drawing
frequency graphs. To avoid these problems, the statistical investigator has no alternative but to
make some assumption about the missing limit of the class. For example, for the class "120 and
above", the statistician may assume the same class interval as that of the preceding class, i.e.,
treat it as 120-130.
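One common assumption for closing an open-end class is that it has the same class interval as its neighbouring class. This minimal Python sketch uses a hypothetical helper `close_open_end`, with `None` marking a missing limit. It applies that assumption to an open-ended version of Table 4.5; other assumptions are possible, and the choice is the investigator's.

```python
def close_open_end(classes):
    """classes: list of (lower, upper); None marks a missing limit.
    Assumption (one common convention, not the only one): an open-end class is
    given the same class interval as its neighbouring class. Needs at least
    two classes, and the neighbour itself must have both limits."""
    closed = list(classes)
    first_lower, first_upper = closed[0]
    if first_lower is None:                       # e.g. "below 100"
        width = closed[1][1] - closed[1][0]
        closed[0] = (first_upper - width, first_upper)
    last_lower, last_upper = closed[-1]
    if last_upper is None:                        # e.g. "120 and above"
        width = closed[-2][1] - closed[-2][0]
        closed[-1] = (last_lower, last_lower + width)
    return closed

open_table = [(None, 100), (100, 110), (110, 120), (120, None)]
print(close_open_end(open_table))  # [(90, 100), (100, 110), (110, 120), (120, 130)]
```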

4.4 CUMULATIVE FREQUENCY DISTRIBUTION


4.4.1 Refer to Table 4.5
Suppose we are interested in knowing how many workers get a daily wage of less than ₹110. To get
the answer we have to add the frequencies of the first two classes,
i.e., "90-100" and "100-110". The total comes to 14 (= 5 + 9). It means that 14 workers are
getting a daily wage of less than ₹110. Frequencies expressed in this way are called
"cumulative frequencies". If we go on adding frequencies in this manner, what we get is called a
cumulative frequency distribution. Let us convert Table 4.5 into a cumulative frequency distribution.
TABLE 4.6
Workers in a factory earning specified daily wage or more, and earning
less than specified daily wage
Daily wage (₹)   Number of workers earning indicated daily wage or more   Number of workers earning less than indicated daily wage
90 20 0
100 15 5
110 6 14
120 3 17
130 0 20
Source Table 4.5.
A cumulative frequency distribution can be prepared on "more than" basis or on "less than" basis.
4.4.2 On "more than" Basis
The distribution on "more than" basis is presented in the second column of Table 4.6. Here the
frequencies are cumulated starting from the highest class, so the figures fall as we go down the
column. It answers questions like: How many workers are earning a daily wage of ₹110 or more?
The answer is 6. How many workers are earning a daily wage of ₹120 or more? The answer is 3.
4.4.3 On "less than" Basis
It is presented in the third column of Table 4.6. In this, frequencies are cumulated from the first
class downwards, so the cumulative frequencies increase as we move down the column. It answers
questions like: How many workers are earning a daily wage of less than ₹120? The answer is 17.
How many workers are earning a daily wage of less than ₹130? The answer is all 20 workers.
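Both columns of Table 4.6 can be reproduced with a short Python sketch. The class frequencies 5, 9, 3 and 3 are recovered from Table 4.6 and the text above (5 and 9 for the first two classes, and the successive differences 17 - 14 and 20 - 17 for the last two); the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies of Table 4.5, recovered from Table 4.6:
# 90-100: 5, 100-110: 9, 110-120: 3, 120-130: 3
wages = [90, 100, 110, 120, 130]   # lower limits, plus 130 closing the last class
frequencies = [5, 9, 3, 3]

# "Less than" basis: cumulate from the first class downwards.
# Nobody earns less than 90, hence the leading 0.
less_than = [0] + list(accumulate(frequencies))

# "More than" basis: cumulate from the last class upwards, then restore
# top-to-bottom order. Nobody earns 130 or more, hence the trailing 0.
or_more = list(accumulate(reversed(frequencies)))[::-1] + [0]

for wage, more, less in zip(wages, or_more, less_than):
    print(wage, more, less)
```

The printed rows match Table 4.6: 90 20 0, 100 15 5, 110 6 14, 120 3 17 and 130 0 20.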
4.4.4 On Percentage Basis
The frequencies in Table 4.6 are expressed on an absolute basis. These can also be presented on a
percentage basis. Each cumulative frequency is expressed as a percentage of the total frequency,
i.e., 20 in our illustration. Therefore, the cumulative frequency 15 becomes 75% (= 15/20 × 100).
In this way we can convert all cumulative frequencies into percentages. See Table 4.7.
TABLE 4.7
Percent of workers earning "specified or more"
and earning "less than specified" daily wage

Daily wage (₹)   Percent of workers earning
                 specified wage or more   less than specified wage
90               100                      0
100              75                       25
110              30                       70
120              15                       85
130              0                        100
Source: Derived from Table 4.6.
In fact, presentation on a percentage basis is more useful for making comparisons, particularly when
the total frequency is large. It also makes the construction of frequency graphs more
convenient.
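The conversion to a percentage basis divides every cumulative frequency by the total frequency and multiplies by 100. A minimal Python sketch using the figures of Table 4.6 (the helper `to_percent` is illustrative):

```python
# Cumulative frequencies from Table 4.6 (total frequency = 20)
wages = [90, 100, 110, 120, 130]
or_more = [20, 15, 6, 3, 0]
less_than = [0, 5, 14, 17, 20]
total = less_than[-1]  # the last "less than" entry equals the total frequency


def to_percent(cumulative, total):
    """Express each cumulative frequency as a percentage of the total frequency."""
    return [round(100 * cf / total, 1) for cf in cumulative]


pct_more = to_percent(or_more, total)
pct_less = to_percent(less_than, total)
for wage, more, less in zip(wages, pct_more, pct_less):
    print(wage, more, less)
```

The output reproduces Table 4.7, e.g. the row for ₹110 prints 110 30.0 70.0.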

POINTS TO REMEMBER
• The main steps in the construction of a frequency distribution are:
  (i) Arrange the raw data into an array in ascending or descending order.
  (ii) Arrange the array into a frequency array.
  (iii) Arrange the frequency array into a frequency distribution, i.e., into classes and the
  frequencies of these classes.
• A class represents a range of values.
• The number of classes should neither be very large nor very small. If it is very large,
we lose simplicity. If it is very small, we lose details.
• The class limits are the lowest and highest values that can be included in the class.
Interpretation of a class depends on whether the data are rounded off or not.
• The size of the class interval equals the distance from one lower limit to the next
lower limit.
• Mid-point of a class = (True lower limit + True upper limit) / 2
• An open-end class is a class lacking one limit.
• A cumulative frequency distribution records frequencies on an "or more" basis and on a "less than"
basis.
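The three construction steps listed above, together with the mid-point formula, can be sketched in Python as follows. The marks are hypothetical, the classes are assumed to be exclusive (the upper limit is not included in the class) with a width of 10, and all variable names are illustrative.

```python
from collections import Counter

# Hypothetical raw data: marks of 12 students (illustration only)
raw = [43, 65, 32, 65, 56, 71, 22, 47, 56, 38, 61, 25]

# Step 1: arrange the raw data into an array (ascending order).
array = sorted(raw)

# Step 2: arrange the array into a frequency array (value, frequency).
frequency_array = sorted(Counter(array).items())

# Step 3: group into exclusive classes of width 10 (20-30, 30-40, ...).
# A value equal to an upper limit is counted in the next class.
width = 10
lower = (array[0] // width) * width
distribution = {}
while lower <= array[-1]:
    upper = lower + width
    distribution[(lower, upper)] = sum(1 for x in array if lower <= x < upper)
    lower = upper

for (lo, hi), f in distribution.items():
    # For exclusive classes the stated limits are also the true limits.
    mid = (lo + hi) / 2
    print(f"{lo}-{hi}: frequency {f}, mid-point {mid}")
```

For these marks the classes 20-30 to 70-80 get frequencies 2, 2, 2, 2, 3 and 1, which add up to the 12 observations.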
EXERCISE
MULTIPLE CHOICE QUESTIONS [1 Mark]
(Answers at the end of exercises)
Choose the correct alternative in the following questions :
1. How many classes should a frequency distribution have?
(a) Large number (b) Small number
(c) Neither very large nor very small number (d) Not more than 10.
2. An open end class is the class which lacks
(a) Lower limit (b) Higher limit
(c) Either lower limit or higher limit (d) None of the above
3. The size of class interval is the distance between:
(a) Lower limit and higher limit of the same class
(b) Higher limit of the class and lower limit of the next class
(c) Lower limit of the class and lower limit of the next class
(d) None of the above
4. 'Array' gives some order to data by placing the
(a) Highest magnitude first followed by other magnitudes in descending order
(b) Lowest magnitude first followed by other magnitudes in ascending order
(c) Both (a) and (b)
(d) Neither (a) nor (b)

SHORT ANSWER QUESTIONS-I [3 Marks]


Answer the following questions in about 60 words.
1. State the steps in construction of a frequency distribution.
2. Differentiate between a frequency array and a frequency distribution.
3. Differentiate between class-limits and class-interval.
4. Find out the mid-value of the class 50 — 60. Assume that the class is exclusive.
5. Find out the mid-point of the class '100 — 109'. Assume that the class is inclusive.
6. What is an open-end class? Why should it be avoided?
SHORT ANSWER QUESTIONS-II [4 Marks]
Answer the following questions in about 70 words.
1. Arrange the following into an array, both in descending and in ascending order.
Marks obtained by students
40, 59, 80, 92, 33, 75, 63, 74, 81, o.
2. Arrange the following into a frequency array in descending order :
35, 43, 65, 35, 65, 41, 43, 54, 59, 60.
3. Arrange the following into a frequency distribution with classes 60-69, etc.
Daily wage (₹) No. of workers
60 2
65 3
70 1
75 1
80 2
85 0
90 1
4. Prepare a cumulative frequency distribution from data in Q. 3.
5. Explain briefly the steps taken in construction of a frequency distribution.
6. What must be kept in view in deciding about the number of classes in a frequency distribution?
7. Explain with the help of examples the calculation of mid-point of a class.
8. Arrange the following marks of 20 students into a frequency distribution with classes 0-20,
21-40, etc.
Marks obtained by students
43 65 92 32 65 56 71 22 20 11
94 63 75 81 46 52 63 o 9 56
9. Arrange the frequency distribution obtained in Q. 8 into a cumulative frequency distribution
(i) on "more than" basis and (ii) on "less than" basis.
10. Express the cumulative frequency distributions obtained in Q. 9 on percentage basis.
LONG ANSWER QUESTIONS [6 Marks]
Answer the following questions in about 100 words.
1. Explain the steps in construction of a frequency distribution.
2. Explain 'class limits' by giving examples.
3. Explain the inclusive method and exclusive method of determining class intervals by giving
numerical examples.
4. Why do we need the mid-point of a class, and how do we calculate it? Explain.
5. Explain the two bases on which a frequency distribution can be prepared.
6. Explain the steps taken in converting inclusive classes into exclusive classes.
Answer to the Multiple Choice Questions
1. (c) 2. (c) 3. (c) 4. (c)
CHAPTER 5: PRESENTATION OF DATA: TABLES

5.1 INTRODUCTION
Mere collection of data is not enough. It is also necessary to organise and present data in
such a manner that they can be used for analysis, for comparison and for highlighting
significant findings. In other words, a formal presentation of data is necessary. By formal
presentation we mean systematic organisation of data.
There are two ways in which data can be formally presented: (i) tables and (ii) graphs. In
this chapter we will deal with tabular presentation.

5.2 WHAT IS A TABLE?


5.2.1 Meaning of a Table
A table is a systematic organisation of data in columns and rows. Take, for example, the following
table about the Indian economy (Table 5.1):
TABLE 5.1
Milk Production and Per Capita Availability

Year       Milk production      Per capita availability
           (million tonnes)     (grams/day)
1950-51    17.5                 124
1960-61    20.0                 124
1970-71    22.0                 112
1980-81    31.6                 128
1990-91    53.9                 176
2000-01    80.6                 220
2001-02    84.4                 225
2010-11    121.8                227
2014-15    146.3                322
Source: Economic Survey, 2015-16
There are three columns in the table. Column 1 gives the year. Column 2 gives milk
production during each year. Column 3 gives per capita availability of milk during each year.
There are nine rows. Row 1 gives information about the year 1950-51, row 2 about 1960-61,
and so on.
5.2.2 Use of a Table
In what ways is the table useful to us? The data in a table is so systematically presented that we can
quickly locate the desired information. Suppose we want the latest position of milk production in
India. We simply look at the last row and find that total milk production in 2014-15 was
146.3 million tonnes and per capita availability was 322 grams per day. Now suppose we are
interested in knowing about the change in position over the 64 years from 1950-51 to 2014-15. For
this, we simply compare the first and the last rows. We find that production increased from 17.5 to
146.3 million tonnes, i.e., it became more than 8 times. We also find that per capita availability
increased from 124 to 322 grams per day, i.e., it became more than two and a half times. In this way,
tables are very useful for making comparisons.
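The comparison of the first and last rows is simple arithmetic: divide the 2014-15 figure by the 1950-51 figure. A short Python check using the values of Table 5.1 (the dictionary names are illustrative):

```python
# First and last rows of Table 5.1 (Source: Economic Survey, 2015-16)
production = {"1950-51": 17.5, "2014-15": 146.3}   # million tonnes
per_capita = {"1950-51": 124, "2014-15": 322}      # grams/day

# How many times the 2014-15 figure is of the 1950-51 figure
production_ratio = production["2014-15"] / production["1950-51"]
per_capita_ratio = per_capita["2014-15"] / per_capita["1950-51"]

print(f"Milk production became {production_ratio:.2f} times")
print(f"Per capita availability became {per_capita_ratio:.2f} times")
```

It prints ratios of 8.36 and 2.60, which is why production is described as more than 8 times and per capita availability as more than two and a half times.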
5.3 TYPES OF TABLES
Broadly, there are two types of tables: (a) the reference tables and (b) the text tables. The
reference tables are also called general tables. A reference table is a storehouse of information.
Such tables give detailed information arranged for ready reference. Reference tables are usually
very extensive and spread over many pages. These tables are often placed in an appendix or form a
separate part of the statistical report. For example, Economic Survey (Government of India) has a
separate section of more than 80 tables spread over 120 pages.
Text tables are also called summary tables. Summary tables are comparatively smaller in size.
Such tables are usually derived from the reference tables and are confined to only one or a few
findings of the statistical study. Text tables are simple and easy to understand. They are called text
tables because they form an essential part of the text discussion. Table 5.1 is an example of a text table.
5.4 PARTS OF A TABLE
5.4.1 Introduction
There are many parts in a table. Some parts must be present in all tables. Other parts
may or may not be present in a table.
The parts which must be present in a table are: (1) title, (2) stub, (3) box head or caption and (4)
body or field. The parts which may or may not be present are: (5) table number, (6) head note, (7)
footnote and (8) source note. Table 5.2 below is a table in which all eight parts are present. The stub in
this table has been subdivided into stub head and stub entries.
5.4.2 Format of a Table
The format of a table with all its parts is shown in Table 5.3.
5.4.3 Explanation of Parts of a Table
Let us now describe each part of the format.
(1) Table Number: A statistical report may contain many tables. Each table must be clearly
marked so that it can easily be identified. Table number is essentially an identification mark of
the table.
(2) Title: The title of the table must be complete in all respects. It must answer the questions
what, where and when in that sequence. For example, in Table 5.2 the title answers the three
questions in the following manner.
What: Population covered with drinking water and sanitation facilities.
Where: In India.
When: During 1985 to 1998.
(3) Head note: A head note is a statement below the title which clarifies the contents of the
table. For example, the head note in Table 5.2 clarifies that for each year the figure relates to
31st March.
TABLE 5.2
Population Covered with Drinking Water and
Sanitation Facilities in India,
during 1985 to 1998
(Percentage coverage as on March 31)

Item/Area                  Year
                           1985    1990    1995    1998 (Estimated)
Drinking Water Supply
   Rural                   56.3    73.9    82.8    92.5
   Urban                   72.9    83.8    84.3    90.2@
Sanitation Facilities
   Rural                    0.7     2.4     3.6     8.1*
   Urban                   28.4    45.9    49.9#   49.3@

# As on 31.3.1993.
* With government initiative under CRSP, MNP, JRY and IAY.
@ As on 31.3.1997.
Note: (i) Figures for rural water supply and sanitation are based on census population.
(ii) Figures for urban water supply and sanitation are based on current population.
Source: Ministries of Rural Development and Urban Development (Government of India), as quoted
in Economic Survey 1999-2000, Economic Division, Ministry of Finance, Government of India, Ch. 8,
Page 177, Table 10.8.

TABLE 5.3: Format of a Table

Table No.
Title
Head note
Stub Head       Box Head (Caption)
                Column Head    Column Head    Column Head    Column Head
Stub Entries    Body (field)
Footnote
Source note
(4) Stub: The stub is further subdivided into two parts: stub head and stub entries.
The stub head describes the nature of stub entries. A stub entry labels the data found in the
row of the table.
(5) Box head: It is also called caption. It labels the data found in the columns of
the table. For example, the box head in Table 5.2 is 'Year'. Under the box head there may
be many column heads. Each column head may be further subdivided into sub-heads.
(6) Body: It is also called field. It contains the numerical information and
occupies most of the space in a table.
(7) Footnote: A footnote is placed at the bottom of the table. It clarifies some specific
item or part of the table. Its reference mark also appears in the main table. For example,
the footnote mark # appears against the last entry in the column for the year 1995. This footnote
clarifies that the figure of 49.9 actually relates to 1993 (as on 31.3.1993). There can be more than one
footnote in a table. In Table 5.2, there are as many as 5 footnotes. These footnotes are very
helpful for the actual users of the data.
(8) Source note: A source note states where the data were obtained from. For
example, the data in Table 5.2 were obtained from the Ministries of Rural Development and
Urban Development of the Government of India, and the entire table is taken from Economic
Survey, 1999-2000. A statement of the source permits the user of the data to check the figures
from the primary source and possibly gather additional information. The statement of the source
must be complete in all respects, giving title, edition, publisher, chapter, page, table number,
etc. This helps the user to locate the primary source quickly. A simple format of a
table is given in Table 5.3.

5.4.4 Precautions in Construction of a Table


There is no strictly fixed format for a table. The statistician can exercise a lot of flexibility,
particularly in the arrangement of stub, box heads, column heads and body. The particular
arrangement followed will depend on, first, the nature of data and, second, the purpose of
presentation. The main criterion is that the table must be easy for the user of data to follow. To meet
this requirement, the statistician must keep certain points in mind while constructing a table. These
are as follows.
(i) State clearly and place appropriately the title: The title of the table is the first
thing which the user of a table reads. The title conveys what information is contained in the
table. Therefore, it must be clearly stated. There should be no ambiguity. It must state
what (the nature of data), where (about whom) and when (time period). It must be placed
at the top of the table and centred.
(ii) Avoid abbreviations: Do not use abbreviations especially in titles and
headings. For example, "pop." should not be used for "population" in the title of Table
5.2. Also, "Yr." should not be used for "year" in the box head in the same table.
(iii) Use the singular in headings, whenever possible: For example, in the box
head of Table 5.2 use "year" and not "years". Similarly, in the stub head use "item" and
not "items".
(iv) Do not use zero to indicate that information is not available: When
information is not available either use short form "n.a." or simply dash (—). Zero should
be used only for zero quantity. Do not leave the space blank if the quantity is zero, use 0
(zero).
(v) Be consistent in ruling: If ruling is used to close a table on one side, use it also
on the other side. For example, Table 5.2 is closed vertically and horizontally from
both sides. It does not mean that both the sides must necessarily be closed. The rule is
that whether 'closed' or 'open', both the sides must be either closed or open. For
example, Table 5.1 is vertically closed from both sides while horizontally open from
both sides.
(vi) Plan the size of a table according to the page size of publication, as far as
possible: There is no hard and fast rule in this regard. Avoid carrying table to the next
page as far as possible. It may cause inconvenience to the user of the table.
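To see how the parts of a table and some of the precautions above fit together, here is a minimal Python sketch that stores the eight parts and prints them in plain text, top to bottom. The `Table` class and the `format_cell` and `render` functions are hypothetical helpers, not a standard tool; only the drinking-water rows of Table 5.2 are used, and column alignment stands in for ruling.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical container holding the eight parts of a table."""
    number: str                 # (5) table number
    title: str                  # (1) title: what, where, when
    head_note: str              # (6) head note
    stub_head: str              # stub head, part of (2) stub
    column_heads: list          # column heads under (3) box head
    rows: list                  # (stub entry, body cells) pairs: (2) and (4)
    footnotes: list = field(default_factory=list)  # (7) footnotes
    source: str = ""            # (8) source note


def format_cell(value):
    """Precaution (iv): 'n.a.' when information is not available; 0 only for a true zero."""
    return "n.a." if value is None else str(value)


def render(table, stub_width=14, col_width=9):
    """Lay the parts out top to bottom: number, title, head note, heads, body, notes."""
    lines = [table.number, table.title, f"({table.head_note})", ""]
    lines.append(table.stub_head.ljust(stub_width)
                 + "".join(h.rjust(col_width) for h in table.column_heads))
    for stub, cells in table.rows:
        lines.append(stub.ljust(stub_width)
                     + "".join(format_cell(c).rjust(col_width) for c in cells))
    lines.append("")
    lines.extend(table.footnotes)
    if table.source:
        lines.append(f"Source: {table.source}")
    return "\n".join(lines)


water = Table(
    number="TABLE 5.2 (drinking water rows only)",
    title="Population Covered with Drinking Water and Sanitation Facilities in India, "
          "during 1985 to 1998",
    head_note="Percentage coverage as on March 31",
    stub_head="Item/Area",
    column_heads=["1985", "1990", "1995", "1998"],
    rows=[("Rural", [56.3, 73.9, 82.8, 92.5]),
          ("Urban", [72.9, 83.8, 84.3, "90.2@"])],
    footnotes=["@ As on 31.3.1997."],
    source="Ministries of Rural Development and Urban Development, Government of India",
)
print(render(water))
print(format_cell(None), format_cell(0))  # n.a. 0
```

Keeping every part as a named field makes it hard to forget the title, head note or source note, and routing every cell through one formatting function applies the "n.a." rule consistently.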

POINTS TO REMEMBER
• There are two ways of presenting data: tables and graphs.
• A table is a systematic organisation of data in columns and rows.
• A table is useful to us in many ways. We can quickly locate the desired information.
We can make comparisons.
• Broadly, there are two types of tables: (a) reference tables and (b) text tables.
• A reference table gives detailed information arranged for ready reference.
• A text table is a summary table and is comparatively smaller in size.
• A table has 8 parts. The parts (1) title, (2) stub, (3) box head or caption and (4)
body must be present in a table. The parts (5) table number, (6) head note,
(7) footnote and (8) source note may or may not be present in a table.
• Table number is the identification mark of the table.
• The title of the table answers the questions what, where and when.
• Head note is a statement below the title which clarifies the contents of the table.
• Stub labels the data found in the rows of the table.
• Box head labels the data found in the columns of the table.
• Body contains numerical information.
• Footnote clarifies some specific item or part of the table.
• Source note states where the data were obtained.
• The precautions required to be taken in construction of a table are:
