
CHAPTER 2

Collection of Data

Studying this chapter should enable you to:
• understand the meaning and purpose of data collection;
• distinguish between primary and secondary sources;
• know the mode of collection of data;
• distinguish between Census and Sample Surveys;
• be familiar with the techniques of sampling;
• know about some important sources of secondary data.

1. INTRODUCTION

In the previous chapter, you have read about what is economics. You also studied about the role and importance of statistics in economics. In this chapter, you will study the sources of data and the mode of data collection. The purpose of collection of data is to collect evidence for reaching a sound and clear solution to a problem.

In economics, you often come across a statement like,

“After many fluctuations the output of food grains rose to 176 million tonnes in 1990–91 and 199 million tonnes in 1996–97, but fell to 194 million tonnes in 1997–98. Production of food grains then rose continuously and touched 212 million tonnes in 2001–02.”

In this statement, you can observe that the food grains production in different years does not remain the same. It varies from year to year and from crop to crop. As these values
1 0 STATISTICS FOR ECONOMICS

vary, they are called variables. The variables are generally represented by the letters X, Y or Z. The values of these variables are the observations. For example, suppose the food grain production in India varies between 100 million tonnes in 1970–71 and 220 million tonnes in 2001–02 as shown in the following table. The years are represented by variable X and the production of food grain in India (in million tonnes) is represented by variable Y:

TABLE 2.1
Production of Food Grain in India (Million Tonnes)

X (Year)    Y (Production)
1970–71     108
1978–79     132
1979–80     108
1990–91     176
1996–97     199
1997–98     194
2001–02     212

Here, these values of the variables X and Y are the ‘data’, from which we can obtain information about the trend of the production of food grains in India. To know the fluctuations in the output of food grains, we need the ‘data’ on the production of food grains in India. ‘Data’ is a tool, which helps in understanding problems by providing information.

You must be wondering where ‘data’ come from and how we collect them. In the following sections we will discuss the types of data, method and instruments of data collection and sources of obtaining data.

2. WHAT ARE THE SOURCES OF DATA?

Statistical data can be obtained from two sources. The enumerator (person who collects the data) may collect the data by conducting an enquiry or an investigation. Such data are called Primary Data, as they are based on first hand information. Suppose you want to know about the popularity of a film star among school students. For this, you will have to enquire from a large number of school students, by asking them questions to collect the desired information. The data you get are an example of primary data.

If the data have been collected and processed (scrutinised and tabulated) by some other agency, they are called Secondary Data. Generally, the published data are secondary data. They can be obtained either from published sources or from any other source, for example, a web site. Thus, the data are primary to the source that collects and processes them for the first time and secondary for all sources that later use such data. Use of secondary data saves time and cost. For example, after collecting the data on the popularity of the film star among students, you publish a report. If somebody uses the data collected by you for a similar study, it becomes secondary data.

3. HOW DO WE COLLECT THE DATA?

Do you know how a manufacturer decides about a product or how a political party decides about a candidate? They conduct a survey by

asking questions about a particular product or candidate from a large group of people. The purpose of surveys is to describe some characteristics like price, quality, usefulness (in case of the product) and popularity, honesty, loyalty (in case of the candidate). The purpose of the survey is to collect data. Survey is a method of gathering information from individuals.

Preparation of Instrument

The most common type of instrument used in surveys is questionnaire/interview schedule. The questionnaire is either self administered by the respondent or administered by the researcher (enumerator) or trained investigator. While preparing the questionnaire/interview schedule, you should keep in mind the following points:

• The questionnaire should not be too long. The number of questions should be as few as possible. Long questionnaires discourage people from completing them.

• The series of questions should move from general to specific. The questionnaire should start from general questions and proceed to more specific ones. This helps the respondents feel comfortable. For example:
Poor Q
(i) Is increase in electricity charges justified?
(ii) Is the electricity supply in your locality regular?
Good Q
(i) Is the electricity supply in your locality regular?
(ii) Is increase in electricity charges justified?

• The questions should be precise and clear. For example,
Poor Q
What percentage of your income do you spend on clothing in order to look presentable?
Good Q
What percentage of your income do you spend on clothing?

• The questions should not be ambiguous, to enable the respondents to answer quickly, correctly and clearly. For example:
Poor Q
Do you spend a lot of money on books in a month?
Good Q
How much do you spend on books in a month?
(i) Less than Rs 200
(ii) Between Rs 200–300
(iii) Between Rs 300–400
(iv) More than Rs 400

• The question should not use double negatives. The questions starting with “Wouldn’t you” or “Don’t you” should be avoided, as they may lead to biased responses. For example:
Poor Q
Don’t you think smoking should be prohibited?
Good Q
Do you think smoking should be prohibited?

• The question should not be a leading question, which gives a clue about how the respondent should answer. For example:
Poor Q
How do you like the flavour of this high-quality tea?
Good Q
How do you like the flavour of this tea?

• The question should not indicate alternatives to the answer. For example:
Poor Q
Would you like to do a job after college or be a housewife?
Good Q
Would you like to do a job, if possible?

The questionnaire may consist of closed-ended (or structured) questions or open-ended (or unstructured) questions.

Closed-ended or structured questions can either be a two-way question or a multiple choice question. When there are only two possible answers, ‘yes’ or ‘no’, it is called a two-way question.

When there is a possibility of more than two options of answers, multiple choice questions are more appropriate. Example,
Q. Why did you sell your land?
(i) To pay off the debts.
(ii) To finance children’s education.
(iii) To invest in another property.
(iv) Any other (please specify).

Closed-ended questions are easy to use, score and code for analysis, because all the respondents respond from the given options. But they are difficult to write as the alternatives should be clearly written to represent both sides of the issue. There is also a possibility that the individual’s true response is not present among the options given. For this, the choice of ‘Any Other’ is provided, where the respondent can write a response, which was not anticipated by the researcher. Moreover, another limitation of multiple-choice questions is that they tend to restrict the answers by providing alternatives, without which the respondents may have answered differently.

Open-ended questions allow for more individualised responses, but they are difficult to interpret and hard to score, since there are a lot of variations in the responses. Example,
Q. What is your view about globalisation?

Mode of Data Collection

Have you ever come across a television show in which reporters ask questions from children, housewives or general public regarding their examination performance or a brand of soap or a political party? The purpose of asking questions is to do a survey for collection of data. There are three basic ways of collecting data: (i) Personal Interviews, (ii) Mailing (questionnaire) Surveys, and (iii) Telephone Interviews.

Personal Interviews

This method is used when the researcher has access to all the members. The researcher (or investigator) conducts face to face interviews with the respondents.

Personal interviews are preferred due to various reasons. Personal contact is made between the respondent and the interviewer. The interviewer has the opportunity of explaining the study and answering any query of the respondents. The interviewer can request the respondent to expand on answers that are particularly important. Misinterpretation and misunderstanding can be avoided. Watching the reactions of the respondents can provide supplementary information.

Personal interview has some demerits too. It is expensive, as it requires trained interviewers. It takes longer time to complete the survey. Presence of the researcher may inhibit respondents from saying what they really think.

Mailing Questionnaire

When the data in a survey are collected by mail, the questionnaire is sent to each individual by mail with a request to complete and return it by a given date. The advantages of this method are that it is less expensive. It allows the researcher to have access to people in remote areas too, who might be difficult to reach in person or by telephone. It does not allow influencing of the respondents by the interviewer. It also permits the respondents to take sufficient time to give thoughtful answers to the questions. These days online surveys or surveys through short messaging service, i.e. SMS, have become popular. Do you know how an online survey is conducted?

The disadvantages of mail survey are that there is less opportunity to provide assistance in clarifying instructions, so there is a possibility of misinterpretation of questions. Mailing is also likely to produce low response rates due to certain factors such as returning the questionnaire without completing it, not returning the questionnaire at all, loss of questionnaire in the mail itself, etc.

Telephone Interviews

In a telephone interview, the investigator asks questions over the telephone. The advantages of telephone interviews are that they are cheaper than personal interviews and can be conducted in a shorter time. They allow the researcher to assist the respondent by clarifying the questions. Telephone interview is better in the cases where the respondents are reluctant to answer certain questions in personal interviews.

The disadvantage of this method is limited access to people, as many people may not own telephones. Telephone interviews also obstruct visual reactions of the respondents, which are helpful in obtaining information on sensitive issues.

Activities
• You have to collect information from a person, who lives in a remote village of India. Which mode of data collection will be the most appropriate for collecting information from him?
• You have to interview the parents about the quality of teaching in a school. If the principal of the school is present there, what types of problems can arise?

Personal Interview
Advantages:
• Highest response rate
• Allows use of all types of questions
• Better for using open-ended questions
• Allows clarification of ambiguous questions.
Disadvantages:
• Most expensive
• Possibility of influencing respondents
• More time taking.

Mailed Questionnaire
Advantages:
• Least expensive
• Only method to reach remote areas
• No influence on respondents
• Maintains anonymity of respondents
• Best for sensitive questions.
Disadvantages:
• Cannot be used by illiterates
• Long response time
• Does not allow explanation of ambiguous questions
• Reactions cannot be watched.

Telephone Interview
Advantages:
• Relatively low cost
• Relatively less influence on respondents
• Relatively high response rate.
Disadvantages:
• Limited use
• Reactions cannot be watched
• Possibility of influencing respondents.

Pilot Survey

Once the questionnaire is ready, it is advisable to conduct a try-out with a small group which is known as Pilot Survey or Pre-Testing of the questionnaire. The pilot survey helps in providing a preliminary idea about the survey. It helps in pre-testing of the questionnaire, so as to know the shortcomings and drawbacks of the questions. Pilot survey also helps in assessing the suitability of questions, clarity of instructions, performance of enumerators and the cost and time involved in the actual survey.

4. CENSUS AND SAMPLE SURVEYS

Census or Complete Enumeration

A survey, which includes every element of the population, is known as Census or the Method of Complete Enumeration. If certain agencies are interested in studying the total population in India, they have to obtain information from all the households in rural and urban India.

The essential feature of this method is that this covers every individual unit in the entire population. You cannot select some and leave out others. You may be familiar with the Census of India, which is carried out every ten years. A house-to-house enquiry is carried out, covering all households in India. Demographic data on birth and death rates, literacy, workforce, life expectancy, size and composition of population, etc. are collected and published by the Registrar General of India. The last Census of India was held in February 2001.

[Figure. Source: Census of India, 2001.]

According to the Census 2001, population of India is 102.70 crore. It was 23.83 crore according to Census 1901. In a period of hundred years, the population of our country increased by 78.87 crore. Census 1981 indicated that the rate of population growth during 1960s and 1970s remained almost same. 1991 Census indicated that the annual growth rate of population during 1980s was 2.14 per cent, which came down to 1.93 per cent during 1990s according to Census 2001.

“At 00.00 hours of first March, 2001 the population of India stood at 1,027,015,247 comprising of 531,277,078 males and 495,738,169 females. Thus, India becomes the second country in the world after China to cross the one billion mark.”

Source: Census of India, 2001.

Sample Survey

Population or the Universe in statistics means totality of the items under study. Thus, the Population or the Universe is a group to which the results of the study are intended to apply. A population is always all the individuals/items who possess certain characteristics (or a set of characteristics), according to the purpose of the survey. The first task in selecting a sample is to identify the population. Once the population is identified, the researcher selects a Representative Sample, as it is difficult to study the entire population. A sample refers to a group or section of the population from which information is to be obtained. A good sample (representative sample) is generally smaller than the population and is capable of providing reasonably accurate information about the population at a much lower cost and shorter time.

Suppose you want to study the average income of people in a certain region. According to the Census method, you would be required to find out the income of every individual in the region, add them up and divide by number of individuals to get the average income of people in the region. This method would require huge expenditure, as a large number of enumerators have to be employed. Alternatively, you select a representative sample, of a few individuals, from the region and find out their income. The average income of the selected group of individuals is used as an estimate of average income of the individuals of the entire region.

Example
• Research problem: To study the economic condition of agricultural labourers in Churachandpur district of Manipur.
• Population: All agricultural labourers in Churachandpur district.
• Sample: Ten per cent of the agricultural labourers in Churachandpur district.

Most of the surveys are sample surveys. These are preferred in statistics because of a number of reasons. A sample can provide reasonably reliable and accurate information at a lower cost and shorter time. As samples are smaller than population, more detailed information can be collected by conducting intensive enquiries. As we need a smaller team of enumerators, it is easier to train them and supervise their work more effectively.

Now the question is how do you do the sampling? There are two main types of sampling, random and non-random. The following description will make their distinction clear.

Activities
• In which years will the next Census be held in India and China?
• If you have to study the opinion of students about the new economics textbook of class XI, what will be your population and sample?
• If a researcher wants to estimate the average yield of wheat in Punjab, what will be her/his population and sample?

Random Sampling

As the name suggests, random sampling is one where the individual units from the population (samples) are selected at random. The government wants to determine the

impact of the rise in petrol price on the household budget of a particular locality. For this, a representative (random) sample of 30 households has to be taken and studied. The names of all the 300 households of that area are written on pieces of paper and mixed well, then 30 names to be interviewed are selected one by one.

[Figure: A Population of 20 Kuchha and 20 Pucca Houses, with A Representative Sample and A non Representative Sample.]

In the random sampling, every individual has an equal chance of being selected and the individuals who are selected are just like the ones who are not selected. In the above example, all the 300 sampling units (also called sampling frame) of the population got an equal chance of being included in the sample of 30 units and hence the sample, such drawn, is a random sample. This is also called lottery method. The same could be done using a Random Number Table also.

How to use the Random Number Tables?

Do you know what are the Random Number Tables? Random number tables have been generated to guarantee equal probability of selection of every individual unit (by their listed serial number in the sampling frame) in the population. They are available either in a published form or can be generated by using appropriate software packages (See Appendix B). You can start using the table from anywhere, i.e., from any page, column, row or point. In the above example, you need to select a sample of 30 households out of 300 total households. Here, the largest serial number is 300, a three digit number and therefore we consult three digit random numbers in sequence. We will skip the random numbers greater than 300 since there is no household number greater than 300. Thus, the 30 selected households are with serial numbers: 149, 219, 111, 165, 230, 007, 089, 212, 051, 244, 300, 051, 244, 155, 300, 051, 152, 156, 205, 070, 015, 157, 040, 243, 479, 116, 122, 081, 160, 162.

Exit Polls

You must have seen that when an election takes place, the television networks provide election coverage. They also try to predict the results. This is done through exit polls, wherein a random sample of voters who exit the polling booths are asked whom they voted for. From the data of the sample of voters, the prediction is made.
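
The random number table procedure described above can be sketched in Python, with the software generating the three-digit numbers instead of a printed table. This is only an illustrative sketch: the function name `draw_sample` and the `seed` parameter are assumptions, and skipping a serial number that has already been drawn is an added assumption (each household is interviewed once) that the text itself does not state.

```python
import random


def draw_sample(population_size, sample_size, seed=None):
    """Pick serial numbers by reading random numbers in sequence."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)
    digits = len(str(population_size))  # 300 households -> three-digit numbers
    selected = []
    while len(selected) < sample_size:
        number = rng.randrange(10 ** digits)  # next table entry, 000 to 999
        if number == 0 or number > population_size:
            continue  # skip: no household carries this serial number
        if number in selected:
            continue  # assumption: a household is selected only once
        selected.append(number)
    return selected


# 30 households out of 300, as in the petrol price example.
sample = draw_sample(population_size=300, sample_size=30, seed=1)
print([f"{n:03d}" for n in sample])
```

The lottery method corresponds more directly to `random.sample(range(1, 301), 30)` from the standard library, which draws 30 distinct serial numbers in one step, much like drawing slips of paper.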

Activity
• You have to analyse the trend of foodgrains production in India for the last fifty years. As it is difficult to include all the years, you have to select a sample of production of ten years. Using the Random Number Tables, how will you select your sample?

Non-Random Sampling

There may be a situation that you have to select 10 out of 100 households in a locality. You have to decide which household to select and which to reject. You may select the households conveniently situated or the households known to you or your friend. In this case, you are using your judgement (bias) in selecting 10 households. This way of selecting 10 out of 100 households is not a random selection. In a non-random sampling method, not all the units of the population have an equal chance of being selected, and convenience or judgement of the investigator plays an important role in selection of the sample. They are mainly selected on the basis of judgement, purpose, convenience or quota and are non-random samples.

5. SAMPLING AND NON-SAMPLING ERRORS

Sampling Errors

The purpose of the sample is to take an estimate of the population. Sampling error refers to the difference between the sample estimate and the actual value of a characteristic of the population (that may be the average income, etc.). It is the error that occurs when you make an observation from the sample taken from the population. Thus, the difference between the actual value of a parameter of the population (which is not known) and its estimate (from the sample) is the sampling error. It is possible to reduce the magnitude of sampling error by taking a larger sample.

Example

Consider a case of incomes of 5 farmers of Manipur. The variable x (income of farmers) has measurements 500, 550, 600, 650, 700. We note that the population average is (500 + 550 + 600 + 650 + 700) ÷ 5 = 3000 ÷ 5 = 600.

Now, suppose we select a sample of two individuals where x has measurements of 500 and 600. The sample average is (500 + 600) ÷ 2 = 1100 ÷ 2 = 550.

Here, the sampling error of the estimate = 600 (true value) - 550 (estimate) = 50.

Non-Sampling Errors

Non-sampling errors are more serious than sampling errors because a sampling error can be minimised by taking a larger sample. It is difficult to minimise non-sampling error, even by taking a large sample. Even a Census can contain non-sampling errors. Some of the non-sampling errors are described below.
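
The farmers' income example in the Sampling Errors discussion can be checked, and extended a little, with a short Python sketch. It lists every possible sample of a given size from the five incomes and computes the sampling error of each sample average. The function name `sampling_errors` is an assumption of this sketch. Comparing samples of size 2 and size 4 is an added illustration of the statement that a larger sample reduces the magnitude of sampling error.

```python
from itertools import combinations
from statistics import mean

incomes = [500, 550, 600, 650, 700]  # the five farmers in the example
population_mean = mean(incomes)      # 3000 / 5 = 600


def sampling_errors(values, sample_size):
    """Return (sample, true mean - sample mean) for every possible sample."""
    true_mean = mean(values)
    return [
        (sample, true_mean - mean(sample))
        for sample in combinations(values, sample_size)
    ]


# The sample used in the text: incomes 500 and 600.
errors_of_two = dict(sampling_errors(incomes, 2))
print("Error for sample (500, 600):", errors_of_two[(500, 600)])

# Largest possible error for small and for larger samples.
for size in (2, 4):
    worst = max(abs(error) for _, error in sampling_errors(incomes, size))
    print(f"Sample size {size}: largest sampling error = {worst}")
```

With two farmers the estimate can be off by as much as 75, while with four farmers the largest possible error falls to 25.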

Errors in Data Acquisition

This type of error arises from recording of incorrect responses. Suppose the teacher asks the students to measure the length of the teacher’s table in the classroom. The measurement by the students may differ. The differences may occur due to differences in measuring tape, carelessness of the students etc. Similarly, suppose we want to collect data on prices of oranges. We know that prices vary from shop to shop and from market to market. Prices also vary according to the quality. Therefore, we can only consider the average prices. Recording mistakes can also take place as the enumerators or the respondents may commit errors in recording or transcribing the data, for example, he/she may record 13 instead of 31.

Non-Response Errors

Non-response occurs if an interviewer is unable to contact a person listed in the sample or a person from the sample refuses to respond. In this case, the sample observation may not be representative.

Sampling Bias

Sampling bias occurs when the sampling plan is such that some members of the target population could not possibly be included in the sample.

6. CENSUS OF INDIA AND NSSO

There are some agencies both at the national and state level, which collect, process and tabulate the statistical data. Some of the major agencies at the national level are Census of India, National Sample Survey Organisation (NSSO), Central Statistical Organisation (CSO), Registrar General of India (RGI), Directorate General of Commercial Intelligence and Statistics (DGCIS), Labour Bureau etc.

The Census of India provides the most complete and continuous demographic record of population. The Census is being regularly conducted every ten years since 1881. The first Census after Independence was held in 1951. The Census collects information on various aspects of population such as the size, density, sex ratio, literacy, migration, rural-urban distribution etc. Census in India is not merely a statistical operation; the data is interpreted and analysed in an interesting manner.

The NSSO was established by the government of India to conduct nation-wide surveys on socio-economic issues. The NSSO does continuous surveys in successive rounds. The data collected by NSSO surveys, on different socio-economic subjects, are released through reports and its quarterly journal Sarvekshana. NSSO provides periodic estimates of literacy, school enrolment, utilisation of educational services, employment, unemployment, manufacturing and service sector enterprises, morbidity, maternity, child care, utilisation of the public distribution system etc. The NSS 59th round survey (January–December

2003) was on land and livestock holdings, debt and investment. The NSS 60th round survey (January–June 2004) was on morbidity and health care. The NSSO also undertakes the fieldwork of Annual survey of industries, conducts crop estimation surveys, collects rural and urban retail prices for compilation of consumer price index numbers.

7. CONCLUSION

Economic facts, expressed in terms of numbers, are called data. The purpose of data collection is to understand, explain and analyse a problem and causes behind it. Primary data is obtained by conducting a survey. Survey includes various steps, which need to be planned carefully. There are various agencies which collect, process, tabulate and publish statistical data. These can be used as secondary data. However, the choice of source of data and mode of data collection depends on the objective of the study.

Recap
• Data is a tool which helps in reaching a sound conclusion on any
problem by providing information.
• Primary data is based on first hand information.
• Survey can be done by personal interviews, mailing questionnaires
and telephone interviews.
• Census covers every individual/unit belonging to the population.
• Sample is a smaller group selected from the population from which
the relevant information would be sought.
• In a random sampling, every individual is given an equal chance of
being selected for providing information.
• Sampling error is the difference between the actual value of a
population characteristic and its estimate from the sample.
• Non-sampling errors can arise in data acquisition, by non-response
or by bias in selection.
• Census of India and National Sample Survey Organisation
are two important agencies at the national level, which collect,
process and tabulate data.

EXERCISES

1. Frame at least four appropriate multiple-choice options for the following
questions:
(i) Which of the following is the most important when you buy a new
dress?

(ii) How often do you use computers?


(iii) Which of the newspapers do you read regularly?
(iv) Rise in the price of petrol is justified.
(v) What is the monthly income of your family?
2. Frame five two-way questions (with ‘Yes’ or ‘No’).
3. (i) There are many sources of data (true/false).
(ii) Telephone survey is the most suitable method of collecting data, when
the population is literate and spread over a large area (true/false).
(iii) Data collected by investigator is called the secondary data (true/false).
(iv) There is a certain bias involved in the non-random selection of samples
(true/false).
(v) Non-sampling errors can be minimised by taking large samples (true/
false).
4. What do you think about the following questions? Do you find any problem
with these questions? If yes, how?
(i) How far do you live from the closest market?
(ii) If plastic bags are only 5 percent of our garbage, should it be banned?
(iii) Wouldn’t you be opposed to increase in price of petrol?
(iv) (a) Do you agree with the use of chemical fertilizers?
(b) Do you use fertilizers in your fields?
(c) What is the yield per hectare in your field?
5. You want to research on the popularity of Vegetable Atta Noodles among
children. Design a suitable questionnaire for collecting this information.
6. In a village of 200 farms, a study was conducted to find the cropping
pattern. Out of the 50 farms surveyed, 50% grew only wheat. Identify the
population and the sample here.
7. Give two examples each of sample, population and variable.
8. Which of the following methods gives better results and why?
(a) Census (b) Sample
9. Which of the following errors is more serious and why?
(a) Sampling error (b) Non-Sampling error
10. Suppose there are 10 students in your class. You want to select three out
of them. How many samples are possible?
11. Discuss how you would use the lottery method to select 3 students out of
10 in your class?
12. Does the lottery method always give you a random sample? Explain.
13. Explain the procedure of selecting a random sample of 3 students out of
10 in your class, by using random number tables.
14. Do samples provide better results than surveys? Give reasons for your
answer.
