
Unity University College

BLOCK 2: COLLECTION OF DATA


UNIT 2: COLLECTION OF DATA

INTRODUCTION

Nowadays, most executives and other decision makers base their decisions on research findings. Research in most areas of study requires data in order to generate information that facilitates the decision-making process. Data are the raw material of research. Moreover, the quality of the collected data largely determines the precision of the results obtained from an investigation. It is therefore extremely important to know the basics of data collection.

Creating Opportunity Through Education 15



CONTENTS:

2.0 Aims and Objectives


2.1 Introduction
2.2 Meaning of Collection of data
2.3 Primary and Secondary data
2.4 Scope of statistical Investigation
2.5 Advantages and Limitations of census and sample survey
2.6 Sampling Techniques
2.7 Summary
2.8 Answers for check your progress questions (CYP)
2.9 Model examination questions
2.10 Glossary
2.11 References

2.0 AIMS AND OBJECTIVES

This unit has two aims. The first is to discuss the different sources of data. The methods of collecting data from primary sources and the precautions to take when using data from secondary sources are the core topics.
The second is to present the scope of statistical investigation and the different sampling techniques, together with their merits and demerits.
At the end of this unit, you will be able to:
• Identify sources of data together with their advantages and limitations.
• Explain the various methods of collecting primary data.
• Distinguish between random and non-random sampling techniques.
• Discuss the various types of random and non-random sampling methods.

2.1 INTRODUCTION
The term “data collection” covers all issues related to data sources, the scope of investigation, and sampling techniques. This unit begins with the meaning of data collection. It then turns to the two sources of data, namely primary and secondary sources, and to the methods of collecting data from primary sources. Next, it addresses the meaning, advantages, and limitations of the census and the sample survey, which are the two scopes of statistical investigation. Finally, the unit closes with a discussion of the two families of sampling techniques, namely random and non-random sampling.


2.2 MEANING OF COLLECTION OF DATA

Collection of data means the systematic and meaningful assembly of information to accomplish the objective of a statistical investigation. It refers to the methods used to gather the required information from the units under investigation.
The quality of data greatly affects the final output of an investigation. Hence, utmost care should be given to the data collection process, and every possible precaution should be taken to ensure accuracy. Otherwise, with inaccurate or inadequate data, the whole analysis is likely to be faulty and the resulting decisions misleading.

2.3 PRIMARY AND SECONDARY DATA

2.3.1 Meaning and distinction between primary and secondary data


Statistical data may be obtained from either a primary or a secondary source. A primary source is one from which first-hand information is gathered; primary sources are the original sources of data. A secondary source, on the other hand, makes available data that were collected by some other agency. Clearly, a source that is not primary is necessarily secondary.
Data obtained from a primary source are called primary data. Likewise, data gathered from a secondary source are known as secondary data. For example, suppose a simple study is to be conducted on the age distribution of citizens living with HIV/AIDS. The variable of study is age. Data on age may be obtained by interviewing the people concerned directly. In this case, those people are the primary sources, and the data collected from them are primary data. Alternatively, one may use the records of hospitals and other related agencies to obtain the ages without tracing each person individually. The hospital records are then secondary sources, and the data copied from them are secondary data.

In most cases, secondary data are obtained from sources such as census and survey reports, books, official records, reported experimental results, previous research papers, bulletins, magazines, newspapers, web sites, and other publications. Different organizations and government agencies publish data in the form of reports, periodicals, journals, etc. In Ethiopia, the Central Statistics Authority (CSA) is the foremost publisher of such secondary data.

2.3.2 Advantages and Disadvantages of Primary and Secondary data

The following are the major advantages of primary data over secondary data.
• Primary data give more reliable, accurate and adequate information that suits the objective and purpose of the investigation.
• A primary source usually shows data in greater detail.
• Primary data are free from the errors that may arise when figures are copied from publications, as happens with secondary data.

The disadvantages of primary data are:
• The process of collecting primary data is time consuming and costly.
• Primary data may be misleading when investigators lack integrity or when respondents are unwilling to answer certain delicate questions.

Advantages of secondary data:
• They are readily available and hence more convenient and much quicker to obtain than primary data,
• They reduce time, cost and effort as compared to primary data,
• Secondary data may be available in cases where it is impossible to collect primary data, for example in regions where there is war.

Some of the disadvantages of secondary data are:
• The data obtained may not be sufficiently accurate,
• Data that exactly suit our purpose may not be found,
• Errors may be made while copying figures.

The choice between primary and secondary data is determined by factors such as the nature and scope of the enquiry, the availability of financial resources, the availability of time, the degree of accuracy desired, and the collecting agency. Primary data are often used where secondary data do not provide an adequate basis for analysis, that is, when the secondary data do not suit the specific investigation. Except in such cases, most statistical investigations rest upon secondary data, since this minimizes cost and saves time. Nevertheless, the following points should be carefully considered when using secondary data in an investigation.
• One should closely examine whether or not the data are suitable for the intended study,
• The source of the data should be checked for reliability. If there is any doubt about the reliability of the data, they should not be used,
• One should make sure that the data are not obsolete,
• If the data are based on a sample, one should check whether the sample properly represents the population,
• The original (primary) data should have been collected and handled carefully by skilled persons.

Finally, it should be clear that primary data in the hands of one person might be secondary in the
hands of another. That is why it is often said, “the difference between primary and secondary
data is largely one of degree.”


2.3.3 Methods of collecting primary data

After discussing the two sources of data, primary and secondary, it is logical to say a few words
about the methods employed in collecting data from its original or primary source.

Many authors commonly state three methods of collecting primary data. These are:
a. Personal Enquiry Method (Interview method)
b. Direct Observation
c. Questionnaire method

a. Personal Enquiry Method (Interview method)


In the personal enquiry method, a question sheet called a schedule is prepared. The schedule contains all the questions needed to extract a complete report from a respondent. Schedules are usually pre-tested to remove defects such as ambiguous or irrelevant questions. This pre-testing process is called a pilot survey. It is worth mentioning that the schedule is not given directly to the respondent. Rather, the interviewer asks the questions on the schedule and jots down the interviewee’s (respondent’s) responses.
Depending on the nature of the interview, the personal enquiry method is further classified into two types.
• Direct Personal Interview: A type of personal enquiry in which there is face-to-face contact with the persons from whom the information is to be obtained. In other words, the investigator contacts each respondent personally, without the involvement of a third party, asks the questions in the schedule one by one, and notes the respondent’s replies on the schedule.
• Indirect Personal Enquiry (Interview): The second type of personal enquiry, in which the investigator contacts third parties, called witnesses, who are capable of supplying the necessary information. Here, the information is collected not from the respondent but from a third person who knows the respondent well. This approach is useful where the respondent is expected to conceal information about himself or herself. For example, if an enquiry about the habit of using condoms is conducted in a village, most of the villagers may not provide correct information. It would therefore be wiser to get the required information from other parties, such as a nearby shop that sells condoms.

b. Direct Observation

In this approach, the investigator stays at the place of survey and records the observations himself/herself. There are no enquiries in the case of direct observation. For example, an investigator studying the nutritional status of children may directly (physically) measure weight, height, and other required parameters. Direct observation is more experimental in nature and is usually applied in scientific studies. It is time consuming and also costly.


c. Questionnaire Method

Under this method, a list of questions related to the survey is prepared and sent to the respondents by post, web site, e-mail, etc. This method cannot be used if the respondent is illiterate. Nevertheless, it is used in many statistical investigations.
The following are the major points to take into account when preparing a questionnaire.
• The number of questions should be small. Respondents are not comfortable with lengthy questionnaires, which usually bore them. Hence, fifteen to twenty-five questions per questionnaire is optimal. If a lengthy questionnaire is unavoidable, it should preferably be divided into two or more parts.

• The questions should be short, clear, simple and unambiguous. Moreover, the questions must be arranged in a logical order so that a natural and spontaneous reply to each is induced. For instance, it is not appropriate to ask a person how many packets of cigarettes he/she smokes before asking whether he/she smokes at all.

• Questions of a sensitive nature should be avoided. Sensitive questions are those that are too personal or pecuniary, such as “Sources of income”, “Drinking habit”, etc. The logic here is that respondents do not willingly answer sensitive questions. Such information, if necessary, may be gathered through an interview or through indirect questions.

• Questions should be capable of objective answers. As much as possible, avoid subjective questions and keep to questions of fact. To this end, multiple-choice questions can be used.

• Mail questionnaires should be accompanied by a covering letter, which should state the purpose of the questionnaire, promise confidentiality of responses, etc.
Furthermore, questions should preferably be designed so that they can be answered with a simple yes or no.

A Sample Questionnaire
Suppose that we want to identify the factors that affect the performance of students in their first (freshman) year of college. The following questionnaire may be used to collect the information needed to achieve the objective of the study.
i. General (Personal Background)
• Name ____________________
• Age _________
• Sex: [ ] M  [ ] F
• Marital status: [ ] Single  [ ] Married  [ ] Other
• Are you living with your family? [ ] Yes  [ ] No
  If yes, what is the size of your family? _______
• Educational level of parents
  - Father ______________
  - Mother _____________
• Monthly income of your household ____________
ii. Education related questions
• Are you satisfied with your field of study? [ ] Yes  [ ] No
• What is your E.S.L.C.E grade point average (GPA)? ____________________
• Do you have any relative or friend who has attended college? [ ] Yes  [ ] No
• Are you satisfied with the college’s service in relation to:
  - Instructors’ quality: [ ] Yes  [ ] No
  - Reading materials: [ ] Yes  [ ] No
  - Guidance service: [ ] Yes  [ ] No
• How far is your residence from the college?
  [ ] Less than 5 km  [ ] Between 5 km and 10 km  [ ] Greater than 10 km
• How do you rate the total credit hours you take?
  [ ] Minimum  [ ] Fair (optimal)  [ ] Maximum (overloaded)
• Do you attend class regularly? [ ] Yes  [ ] No
• What was your first semester GPA? ________________
• How do you rate the content of the exams you take?
  [ ] Fair  [ ] Below standard  [ ] Difficult
• Do you have the habit of studying in a group? [ ] Yes  [ ] No
• Do you have the habit of reading in libraries? [ ] Yes  [ ] No
• Is the time allotted for examinations enough? [ ] Yes  [ ] No

2.4 SCOPE OF STATISTICAL INVESTIGATION

Depending on their coverage, statistical investigations are usually carried out either in the form
of census or sample survey. These two approaches constitute the scope of statistical
investigation.

A census is an investigation in which all the units connected with the problem are taken into account. Complete enumeration is the basic characteristic of a census. In the census approach, data are gathered from each and every member of the population (universe).

On the other hand, in a sample survey only some selected representative units are studied. A sample survey collects information about a variable of interest from only a part (subset) of the population, called a sample. Sample elements are selected from the population through one of several alternative sampling techniques. For example, suppose a quality controller wants to know the average number of defective items in 10 batches of bottles, each batch containing 1000 bottles. One possible approach is to check every one of the 1000 bottles in all 10 batches. The controller then checks 10 × 1000 = 10,000 bottles. This is the census approach. Alternatively, the controller may take a sample of two batches out of the available 10. In this case, he/she checks only 2 × 1000 = 2000 bottles.
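
The contrast between the two approaches in the bottle example can be sketched in a few lines of Python. This is only an illustration: the defect rate, the random seeds and the function names are assumptions made here, and the batches are simulated rather than real data.

```python
import random

def simulate_batches(num_batches=10, batch_size=1000, defect_rate=0.03, seed=1):
    """Create hypothetical batches; each bottle is True if it is defective."""
    rng = random.Random(seed)
    return [[rng.random() < defect_rate for _ in range(batch_size)]
            for _ in range(num_batches)]

def average_defectives(batches):
    """Return (average defectives per batch, number of bottles inspected)."""
    inspected = sum(len(batch) for batch in batches)
    defectives = sum(sum(batch) for batch in batches)
    return defectives / len(batches), inspected

batches = simulate_batches()

# Census: inspect every bottle in all 10 batches.
census_avg, census_inspected = average_defectives(batches)

# Sample survey: inspect only 2 batches chosen at random.
rng = random.Random(2)
sample_avg, sample_inspected = average_defectives(rng.sample(batches, 2))

print(census_inspected, sample_inspected)  # 10000 2000
print(census_avg, sample_avg)
```

The sample average will generally differ somewhat from the census average; what the sample survey buys is a five-fold reduction in the number of bottles inspected.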

2.5 ADVANTAGES AND LIMITATIONS OF CENSUS AND SAMPLE SURVEY

Advantages of census
• Information is available for each separate part of the universe,
• The results obtained are likely to be more representative, accurate and reliable,
• It is free from sampling error and can therefore serve as a basis for various surveys,
• It is easier to check and reduce coverage error.

Limitations of census
• It requires very large effort, money and time,
• Where the population is infinite, the census approach cannot be applied.

Advantages of sample survey
• It reduces cost,
• It saves labor. A smaller staff is required both for fieldwork and for data processing,
• A sample survey requires a smaller scale of operations and hence speeds up data collection and processing,
• It enables advanced tabulation of selected topics,
• Sometimes a sample survey is the only option for a study. For instance, observation or experimentation could be destructive, or the required information may be of a technical nature that requires highly trained personnel and specialized equipment,
• A sample survey may be used to test census procedures and to update census results.

Limitations of sample survey
• A sample survey requires trained personnel for data collection. Otherwise, the data collected will be affected by human bias,
• A sample survey does not give reliable results without careful planning and design,
• It involves the additional tasks of sample size determination and sample selection,
• A sample survey involves sampling error.
There is no hard and fast rule for choosing between the census and sample survey approaches. The selection depends on factors such as cost, time, trained manpower, accuracy, etc. The following points, to some extent, facilitate the choice between sample survey and census.


When to use census?
• When the population is small in size,
• Whenever complete enumeration is the only option,
• When more accurate results are needed,
• When the population contains heterogeneous (different) elements,
• When sufficient resources (financial and others) are available,
• When results are not required soon, that is, when there is no time constraint.

When to use sample survey?
• When the population is very large or infinite,
• When there is a resource limitation (constraint),
• When results are needed within a relatively short period of time,
• When the process is destructive (e.g. testing the lifetime of bulbs).

CYP 1
i) Discuss the difference between primary data and secondary data.
ii) What are parameters and Statistics?

2.6 SAMPLING TECHNIQUES

2.6.1 Definition of some important terms

In real-life problems, a complete census, that is, the enumeration of all units followed by analysis of their characteristics, may be impractical for several reasons. The population could be too large to handle with the available funds, time, and trained personnel. In addition, the members of the population may be dispersed across remote locations where transportation, communication and other necessary facilities are not available. Furthermore, some areas may be inaccessible, such as areas affected by war or epidemic disease. In such cases, complete enumeration is not applicable, and the only option is a sample survey. In the sample survey approach, members that represent the whole population are selected using an appropriate sampling technique. We first define some important terms.

Sampling
Sampling is the process of taking a sample and making inferences about the whole population.

Elementary Units
Elementary units are the elements or groups of elements in the population about which information is required.

Sampling Units
A sampling unit is the unit in terms of which the enumerator collects the data. This unit may be a geographical unit, a construction unit, a social group or an individual.


The following table presents some examples of sampling units and corresponding possible
elementary units.

Sampling Units          Elementary Units
Farmer associations     Farmers
Households              Household members
Colleges                Students in a selected college
Students in a class     Students

Note that there are a number of cases where the sampling unit and the elementary unit turn out to be the same. The last example is a case in point.
Generally speaking, sampling units are made up of elementary units. That is, a sampling unit is a collection of one or more elementary units. Whenever that collection contains only one elementary unit, the sampling unit and the elementary unit are identical. Borrowing terms from set theory, we can say that elementary units are subsets, but not necessarily proper subsets, of sampling units.

Sample Size
Sample size is the number of elements or observations in a sample.

Sampling Frame
A sampling frame is the list of all sampling units in the population from which the sample is to be selected at any stage of sampling.
The following are characteristics of a good sampling frame.
• It should be exhaustive, covering the whole population,
• It should be non-repetitive,
• The units should be mutually exclusive,
• There should be a clear and unambiguous demarcation between sampling units,
• It should be up to date,
• The units in the list must be traceable in the field.
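
Two of these characteristics, exhaustiveness and non-repetition, can be checked mechanically when the frame is held as a list. The following is a minimal sketch; the function name `check_frame` and the unit identifiers are hypothetical, and it assumes that a separate list of all population unit identifiers is available for comparison.

```python
from collections import Counter

def check_frame(frame, population_ids):
    """Report units listed more than once and population units missing from the frame."""
    counts = Counter(frame)
    repeated = sorted(unit for unit, c in counts.items() if c > 1)
    missing = sorted(set(population_ids) - set(frame))
    return {"repeated": repeated, "missing": missing}

# Hypothetical frame: unit 2 is listed twice and unit 3 is missing.
report = check_frame(frame=[1, 2, 2, 4], population_ids=[1, 2, 3, 4])
print(report)  # {'repeated': [2], 'missing': [3]}
```

The remaining characteristics (clear demarcation, being up to date, traceability in the field) cannot be verified from the list alone and need checking against the field situation.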

Sampling error
Sampling error is the difference between the result obtained from a sample study and the result that would have been obtained from an equivalent complete coverage (census), i.e., from an investigation of the entire population conducted in exactly the same manner as the sample study.
Sampling errors are errors that occur only because we take a sample instead of the whole population. Their magnitude increases when the sample is not a good representative of the population. Unfortunately, sampling errors are unavoidable in sample surveys, so the whole effort in sampling is to minimize them. Taking a larger sample reduces sampling error. Bias of the enumerator, bias in sample selection, bias in data collection, bias in analysis and interpretation, and heterogeneity of the population are the major causes of sampling errors.


Non-sampling errors
Non-sampling errors are errors that can arise even in a census (complete enumeration). Non-sampling errors often arise from, among others, the following factors:
• Questions that are not worded properly and clearly,
• Biases or mistakes on the part of the interviewers,
• Inaccuracy of the information furnished by respondents,
• Non-response, meaning that respondents may decline to give information, etc.

Law of large numbers


Also called the law of inertia, the law of large numbers states that as the sample size gets larger and larger, the distribution of the sample tends to resemble the distribution of the population from which the sample is taken. In other words, other things being equal, the larger the sample, the more accurate the results obtained.
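
The tendency described by the law can be observed in a small simulation. The sketch below is illustrative only: the population of 10,000 ages, the seed and the function name are assumptions made here, and the sample mean is used as the result of interest. It reports the average absolute difference between the sample mean and the population mean for three sample sizes.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population: 10,000 ages drawn uniformly between 18 and 60.
population = [rng.randint(18, 60) for _ in range(10_000)]
population_mean = statistics.mean(population)

def mean_abs_sampling_error(sample_size, repetitions=200):
    """Average |sample mean - population mean| over repeated simple random samples."""
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)  # drawn without replacement
        errors.append(abs(statistics.mean(sample) - population_mean))
    return statistics.mean(errors)

errors = {n: mean_abs_sampling_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, round(err, 3))
```

Under these assumptions the printed error shrinks as the sample size grows from 10 to 1000, which is the behavior the law describes.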

2.6.2 Types of Sampling Techniques

In the theory of sampling, there are two distinct types of sampling techniques. These are:
• Probability sampling and
• Non-probability sampling

a) Probability (Random) sampling

Probability or random sampling is a kind of sampling technique in which each elementary unit in the population has a known (pre-calculated) probability or chance of being included in the sample. In probability sampling, items are chosen strictly at random. The selection process is such that chance alone determines which items are included in the sample. Probability sampling is further classified into two types, namely restricted and unrestricted random sampling.

i) Unrestricted (Simple) Random Sampling


If the sampling plan calls for elementary units to be selected from a population viewed as a single pool, it is called unrestricted or simple random sampling. In other words, in simple random sampling the population is not subdivided into groups. Rather, it is treated as a single set without any categorization. In simple random sampling, each and every elementary unit of the population has an equal opportunity of being selected as a sample element. Note that the phrase “equal opportunity” is essential to the definition of simple random sampling.

How to select samples?


Coming to the “how to” part, one obvious question arises: how do we pick our samples from the population while preserving randomness? Two approaches are commonly suggested, namely the lottery method and the use of a table of random numbers. We now discuss each of them separately, as they are the ones commonly used in practice.


Lottery Method
Most of us use the lottery method in our day-to-day activities. In the lottery method, each member of the population is represented by an identifiable disk. These disks are placed in an urn or bowl and mixed well. Small pieces of paper may be used instead of disks, folded in such a way that one cannot be distinguished from another. Disks are then drawn one at a time until a sample of the required size is selected. Utmost care should be given to the process so that it generates random sample elements.
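
On a computer, the urn can be imitated by shuffling a list and taking the first few entries. The following is a minimal sketch of that idea; the function name `lottery_sample`, the unit labels and the seed are hypothetical and serve only as an illustration.

```python
import random

def lottery_sample(population, n, seed=None):
    """Draw n distinct members: mix the 'disks', then take them one by one."""
    rng = random.Random(seed)
    urn = list(population)   # one disk per member of the population
    rng.shuffle(urn)         # mix the urn well
    return urn[:n]           # draw n disks without replacement

members = [f"unit-{i}" for i in range(1, 21)]  # hypothetical population of 20 units
sample = lottery_sample(members, n=5, seed=7)
print(sample)
```

Because the list is shuffled before drawing, every member has the same chance of ending up among the first n entries, which matches the equal-opportunity requirement of simple random sampling.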

Table of random numbers


A sample is said to be a true or satisfactory representative of the population from which it is taken if it is selected at random. To ensure randomness, the best approach is to use random numbers.
A random number table is simply a list of randomly generated 5-digit numbers arranged in rows and columns. A perfect random number table would be one in which every digit has been entered at random. That is, no matter where you start in the table and in which direction you read, the probability of encountering any one of the ten digits (0-9) would be the same. Most authors agree that it is difficult, if not impossible, to produce a perfect table of random numbers.
The steps in selecting a simple random sample using the table of random numbers are as follows.
• Prepare a list of all units in the population of size N, numbering the sampling units from 0 to N-1. In other words, prepare the sampling frame.
• Let N-1 be a d-digit number. Choose any row or column of the random number table arbitrarily, and decide whether to read d-digit numbers row-wise or column-wise.
• Let the selected d-digit number be w. If 0 ≤ w ≤ N-1, select the unit with roll number w. If w ≥ N, reject (ignore) w and read the next d-digit number. Repeat this step until n elements have been selected.

Remark:
• When all the numbers in the selected row or column are exhausted, continue with the next row or column.
• If d is greater than five, merge successive rows or columns to obtain 10-digit, 15-digit, and longer numbers in each column or row.

Illustrative example
Consider a population of 180 units from which a simple random sample of size 5 is to be taken using the table of random numbers.
From the random number table (see the table at the end of this textbook), the 4th column and 21st row are selected; the number at that point is 30193.
In our case,
N = 180, n = 5, N-1 = 179
Since N-1 = 179 is a three-digit number, we set d = 3.

Thus, we start from the number 30193 and read the last 3 digits of each number, either row-wise or column-wise. Let us read column-wise, as shown in the table below.


Random number   w (last 3 digits)   w ≤ 179?   Total number of sample units selected
30193           193                 No         0
37430           430                 No         0
88312           312                 No         0
98995           995                 No         0
51734           734                 No         0
...             ...                 ...        ...
07929           929                 No         0
09030           30                  Yes        1
56670           670                 No         1
48140           140                 Yes        2
...             ...                 ...        ...
50064           64                  Yes        3
93126           126                 Yes        4
...             ...                 ...        ...
53523           523                 No         4
23156           156                 Yes        5

Therefore, the units with roll numbers 30, 140, 64, 126 and 156 are taken as the sample elements.
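
The selection rule used in this example can be sketched in Python as follows. The function name `select_from_table` is hypothetical, and the list contains only the table entries shown in the example; the rows elided in the table are not reproduced. Skipping a number that has already been selected is an added assumption for sampling without replacement, which the example itself never needs.

```python
def select_from_table(random_numbers, N, n):
    """Select n distinct unit numbers in 0..N-1 from a list of table entries."""
    d = len(str(N - 1))                # number of digits to read (3 when N-1 = 179)
    selected = []
    for entry in random_numbers:
        w = int(entry[-d:])            # last d digits of the table entry
        # Accept w only if it is a valid roll number; repeats are skipped (assumption).
        if w <= N - 1 and w not in selected:
            selected.append(w)
        if len(selected) == n:
            break
    return selected

# Entries read column-wise in the example (elided rows omitted).
column = ["30193", "37430", "88312", "98995", "51734", "07929", "09030",
          "56670", "48140", "50064", "93126", "53523", "23156"]
print(select_from_table(column, N=180, n=5))  # [30, 140, 64, 126, 156]
```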

ii) Restricted Random Sampling


There are three types of restricted random sampling methods. They are discussed in turn below.

Stratified Sampling
Stratified sampling is a procedure that divides (stratifies) a population by partitioning the sampling frame into a number of homogeneous groups, or strata, on the basis of certain characteristic(s) of the sampling units. A number of variables, including geographical, demographic, social, economic, ethnic and political ones, may be used for stratification. The choice of the appropriate variable(s) depends on the nature of the study. Sampling can then be performed separately within each stratum.
For example, in an opinion survey, the population may be divided into homogeneous groups according to qualification, age, sex, size of family, etc. Specifically, if the population is stratified according to the variable sex, we obtain two separate strata, namely males and females.
Each individual group formed by stratification is called a stratum (singular). The plural of stratum is strata.


The following are the major steps in stratified random sampling.
• The population is partitioned into a number of parts called strata on the basis of a variable of interest. Units within a stratum should be similar (homogeneous) with respect to the stratification variable.
• The number of elements to be taken from each stratum is determined. This is often called the allocation problem. The basic question is what sample size should be taken from each stratum so as to make the variance as small as possible. Two types of allocation are usually considered:
- Proportional allocation and
- Optimal allocation.
Here, we consider only the first approach. The second is beyond the scope of this material.

Proportional allocation method: The idea of proportional allocation is to fix the sample size of each stratum according to its share of the population. Suppose a population of size $N$ is divided into $k$ strata, the $i$th stratum containing $N_i$ elements. Let $n$ be the total sample size required and $n_i$ the sample size to be taken from the $i$th stratum. According to the proportional allocation method, each stratum's share of the sample should equal its share of the population. That is,
\[ \frac{N_i}{N} = \frac{n_i}{n} \quad\Longrightarrow\quad n_i = \frac{N_i\, n}{N} \]
where $N = N_1 + N_2 + \dots + N_k$ and $n = n_1 + n_2 + \dots + n_k$.
Observe that
\[ \sum_{i=1}^{k} n_i = \sum_{i=1}^{k} \frac{N_i\, n}{N} = \frac{n}{N} \sum_{i=1}^{k} N_i = \frac{n}{N}\, N = n \]
In general, according to the proportional allocation method, the sample size to be taken from the $i$th stratum is given by
\[ n_i = \frac{N_i\, n}{N} \]
N

Example (Hypothetical)
Dinkinesh Ethiopia Tour has 1000 employees in 4 departments: Accounting and Finance, Personnel, Operations, and Marketing. A student wants to conduct a study in this company and decides to take a stratified sample of 100 employees. It is known that there are 50 employees in Personnel, 500 in Operations, 300 in Accounting and Finance, and 150 in Marketing. How many employees should be taken from each of these 4 departments using the proportional allocation method?

Solution:

Given $N = 1000$, $n = 100$, $k = 4$.

Let the strata be 1: Accounting and Finance, 2: Personnel, 3: Marketing, 4: Operations.

Thus, $N_1 = 300$, $N_2 = 50$, $N_3 = 150$, $N_4 = 500$.

\[ n_1 = \frac{N_1 n}{N} = \frac{300 \times 100}{1000} = 30, \qquad n_2 = \frac{N_2 n}{N} = \frac{50 \times 100}{1000} = 5 \]
\[ n_3 = \frac{N_3 n}{N} = \frac{150 \times 100}{1000} = 15, \qquad n_4 = \frac{N_4 n}{N} = \frac{500 \times 100}{1000} = 50 \]

or $n_4 = n - n_1 - n_2 - n_3 = 100 - 30 - 5 - 15 = 50$
Note: Rounding of figures to an integer value is a customary action in determining sample sizes.
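
The same calculation can be sketched as a short Python function. The function name `proportional_allocation` is hypothetical; the department sizes are those of the example above. Rounding each stratum separately is an assumption made here, and in general the rounded values may not add up exactly to n, in which case a small manual adjustment is needed.

```python
def proportional_allocation(stratum_sizes, n):
    """Return n_i = N_i * n / N for each stratum, rounded to the nearest integer."""
    N = sum(stratum_sizes.values())    # total population size
    return {name: round(N_i * n / N) for name, N_i in stratum_sizes.items()}

departments = {"Accounting and Finance": 300, "Personnel": 50,
               "Marketing": 150, "Operations": 500}
allocation = proportional_allocation(departments, n=100)
print(allocation)
# {'Accounting and Finance': 30, 'Personnel': 5, 'Marketing': 15, 'Operations': 50}
```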

Advantages of stratified sampling
• Stratification makes it possible to employ different sampling designs in different strata, thereby enabling effective use of the available auxiliary information.
• It may produce a gain in precision in the estimates of population parameters.
• A sample taken through the stratified sampling technique is more representative of the population, since sample elements are taken from all groups, each containing homogeneous members.
• It reduces the cost and time of interviewing as compared to simple random sampling.

Limitations of Stratified Sampling


 It is very difficult to form internally homogeneous and externally heterogeneous groups.
Whenever the homogeneity of a given stratum decreases, the results obtained
will be biased.
 Cost per unit is higher than in simple random sampling.
 It requires skilled manpower.

Cluster Sampling
In simple random sampling and stratified random sampling, we have been considering the
smallest well-identifiable units of the population, called elementary units. That is, the
observations have been taken on these elementary units. For several reasons, however, such an
approach may sometimes not be applicable. Some of these reasons are:
 The sampling frame may not be available, or it may be prohibitively expensive to construct
in relation to resources like money, time and labor.
 Elementary units may be situated far apart from one another, so that surveying the selected
units would consume a lot of time and money.
 The elementary units may not be well identifiable and easily locatable. Specifically,
migratory elementary units, like birds, are not easily identifiable and locatable.
Thus, to cope with the above and other related problems in sampling elementary units, a
sampling plan known as cluster sampling or area sampling is used.
Cluster sampling is a sampling technique that is preferred when the population is subdivided into
groups or clusters. In most cases, the clusters are formed by location.

Examples of clusters

Cluster                    Elementary units
Kebeles                    Households within a given kebele
Farmer associations        Households within the given farmer association
Apartment blocks           Households within a given apartment
Hospitals                  Patients
Schools (colleges)         Students

The following points should be considered while forming clusters.

 Each elementary unit should be located in one and only one cluster. That is, there should be
no overlapping.
 The clusters should be collectively exhaustive. That is, there should be no omission.

Single-stage and multi-stage cluster sampling are the two types or plans of the cluster sampling
technique. However, the scope of this material is limited to the single-stage sampling plan.

In a single-stage sampling plan, clusters are chosen using simple random sampling, and within each
sampled cluster all the elementary units are taken as sample units.
To illustrate the idea, suppose we want to take a sample of 1000 college students in Addis
Ababa. Our clusters can be the different colleges in Addis Ababa. According to the cluster
sampling procedure, we take a simple random sample of one college and consider the students
in that college. To that end, let college A be selected from 20 available colleges using the table
of random numbers. Assume also that there are 5000 students in each one of these 20
(hypothetical) colleges. Since college A has more students than the 1000 required, we then take a
simple random sample of 1000 students out of the 5000 students in college A. (Strictly speaking,
this subsampling is a second stage of selection; under a pure single-stage plan all 5000 students
of college A would be included.)
Note that where the size of a cluster is less than the required sample size, we will be
forced to select two or more clusters at random instead of one.
The basic premise of cluster sampling is that clusters are internally heterogeneous and externally
homogeneous.
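
The single-stage plan described above can be sketched in Python as follows. This is an illustration under stated assumptions only: the function name `single_stage_cluster_sample`, the college names and the student labels are all hypothetical, and Python's `random` module stands in for the table of random numbers.

```python
import random


def single_stage_cluster_sample(clusters, num_clusters, seed=None):
    """Pick whole clusters by simple random sampling and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster names
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical frame: 20 colleges with 5000 students each
colleges = {
    f"College {c}": [f"C{c}-S{s}" for s in range(1, 5001)]
    for c in range(1, 21)
}
chosen, students = single_stage_cluster_sample(colleges, num_clusters=1, seed=1)
print(chosen, len(students))  # one college name and 5000 students
```

Only the list of clusters is needed to draw the sample; a list of individual students is required only for the selected college.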

Advantages of cluster sampling


 It does not require a sampling frame of all the elementary units.
 It is the most economical technique. Particularly, sampling frame preparation and traveling
costs are greatly reduced.

Limitation of cluster sampling


 It results in less accurate estimates of the population parameters.

CYP 2
i) What is the difference between stratified and cluster sampling techniques?
ii) What are the two methods of taking simple random samples? Which one is more reliable?

iii) Systematic Sampling

Systematic Sampling is a method of selecting units at a fixed interval from a list, starting from a
randomly selected point. It follows a step-by-step procedure.


Suppose a sample of size n is to be selected from a population of size N using the systematic
sampling technique. Thus, the following steps are followed in the sample selection
process.

Step 1. Calculate the interval size as $k = \frac{N}{n}$; k must be an integer, so round the result if necessary.
Step 2. Select any random number between 1 and k inclusive. Let that random number be S.
Step 3. Select every kth element from the sampling frame, starting from S. That is,
        Index of the 1st sample element is S
        Index of the 2nd sample element is S + k
        Index of the 3rd sample element is S + 2k, etc.
Generally, the index of the ith sample element is
        S + (i – 1)k
In particular, the index of the nth sample element is S + (n – 1)k
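
The three steps can be sketched in Python as shown below. This is a minimal illustration, not a prescribed implementation: the function name `systematic_indices` is hypothetical, the interval is rounded half up when N/n is not an integer (an assumption), and any index that would fall beyond the end of the frame is dropped (also an assumption). The figures N = 200, n = 10 and S = 7 are made up for the demonstration.

```python
import random


def systematic_indices(N, n, start=None, seed=None):
    """Return the interval k and the 1-based indices S, S + k, ..., S + (n - 1)k."""
    k = max(1, int(N / n + 0.5))  # Step 1: interval, rounded half up (assumption)
    if start is None:
        start = random.Random(seed).randint(1, k)  # Step 2: random start S in 1..k
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k inclusive")
    # Step 3: the index of the ith sample element is S + (i - 1)k
    indices = [start + (i - 1) * k for i in range(1, n + 1)]
    return k, [idx for idx in indices if idx <= N]  # drop indices past the frame


k, indices = systematic_indices(N=200, n=10, start=7)
print(k, indices)
# 20 [7, 27, 47, 67, 87, 107, 127, 147, 167, 187]
```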

Examples

1. It is required to take a sample of 10 households from woreda 8 using the systematic sampling
technique. Generate the samples assuming there are 1055 households in woreda 8.

Solution: Given n = 10 and N = 1055

Step 1. $K = \frac{1055}{10} = 105.5 \approx 106$
Step 2. Select a random number between 1 and 106. The selected random number is 60. Therefore,
we have S = 60.
Step 3. Identifying indices
        Index of the first sample element (i = 1):
            S + (i – 1)k = 60 + 0 = 60
        Index of the second sample element (i = 2):
            60 + (2 – 1)106 = 166
            ...
        Index of the 10th sample element (i = 10):
            60 + (10 – 1)106 = 1014

Therefore, from our sampling frame of households, we will take the 60th, 166th, 272nd, 378th,
484th, 590th, 696th, 802nd, 908th and 1014th elements.

It is extremely important to note that these numbers are not our observations. They are simply
serial numbers (indices) at which our sample observations are located. In the first case, for
instance, the household located at the 60th position in the frame will be included in our sample.

2. In a systematic sample, it was found that the 2nd and 7th sample elements correspond to the
indices 8 and 33 respectively. Find
a. the value of k (interval) and the index of the first sample element.
b. the index of the 10th sample element.
c. the sample size if the population size is 100.


Solution:
a. Given that the index of the 2nd sample element is 8 and that of the 7th sample element is 33,
we have the following two equations from the formula for the indices with i = 2 and i = 7:
        S + (2 – 1)k = 8
   and  S + (7 – 1)k = 33

Solving the equations simultaneously, we get

          S + k = 8
      – (S + 6k = 33)
            –5k = –25

      ⇒ k = 25 / 5 = 5

and S + k = 8 ⇒ S = 8 – k = 8 – 5 = 3
Therefore, we have the values k = 5 and S = 3, where S = 3 is the index or serial number of the
first sample element.

b. For i = 10, the index of the 10th sample element is
        S + (10 – 1)k = 3 + (9)(5) = 48

c. Given N = 100, we need to find the value of n.
$$\frac{N}{n} = k \;\Rightarrow\; n = \frac{N}{k} = \frac{100}{5} = 20$$
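
The same elimination can be written as a short Python check. This is only a sketch: the helper name `interval_and_start` is hypothetical, and it assumes the two given positions are different and that the interval is a whole number.

```python
def interval_and_start(i, index_i, j, index_j):
    """Solve S + (i - 1)k = index_i and S + (j - 1)k = index_j for k and S."""
    k, remainder = divmod(index_j - index_i, j - i)  # subtracting the equations
    if remainder:
        raise ValueError("the two indices do not give an integer interval")
    return k, index_i - (i - 1) * k  # back-substitute to get S


k, S = interval_and_start(2, 8, 7, 33)   # part a
tenth = S + (10 - 1) * k                 # part b
n = 100 // k                             # part c, from N / n = k
print(k, S, tenth, n)  # 5 3 48 20
```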

Advantages of Systematic sampling


 It is very simple to apply
 The sample is evenly distributed over the whole population and hence all contiguous parts of
the population are represented in the sample.
 It requires fewer resources, such as time and money.

Limitation(s) of systematic sampling


 If the variation in units is periodic, i.e. if the units at regular intervals are correlated, then the
sample will not be satisfactorily representative of the population.

b) Non-Probability Sampling Techniques

Unlike the probability (random) sampling techniques discussed in part “a” of this section, in
non-probability sampling techniques there is no predetermined probability or chance for a given
elementary unit of the population to be included in the sample. Non-probability sampling
techniques do not use randomization. In non-probability sampling, sample elements are selected
based on such factors as judgment, convenience, preference, intuition, quota, etc.
There are three common types of non-random (non-probability) sampling techniques. These are:
-Judgment Sampling
-Convenience Sampling
-Quota Sampling


i) Judgment Sampling
In judgment sampling, samples are selected merely on the basis of the judgment of the
investigator, who is believed to be skilled and experienced in such a practice. In simple words,
the investigator’s preference alone determines whether an element is included in or excluded
from the sample. Clearly, the investigator is primarily responsible for how well or poorly the
resulting sample represents the population.

Advantages
-It saves time and reduces cost,
-Sometimes it is used in solving economic and business problems.

Limitations
-It is highly subject to human bias,
-There is no objective (scientific) way of evaluating sample results.

Overall, the success of judgment sampling entirely depends on the excellence of the investigator
in terms of knowing the population well.

ii) Convenience Sampling

According to the convenience sampling approach, sample elements are obtained by selecting
population units that are convenient for the investigator. It is a relatively easy way to select a
sample, but the sample will hardly be representative of the population. Put more simply, in
convenience sampling we take as a sample any elements of the population that are convenient
(preferable) to us in terms of cost, time, accessibility, suitability for interview, etc.
Nevertheless, it should be kept in mind that a poorly representative sample results in less
accurate outcomes. In other words, if convenience sampling is used without assuring its
appropriateness, the whole process will be “garbage in, garbage out”.

Advantages
 It is simple and cost effective,
 It requires relatively little time,
 It is often used for pilot surveys

Limitations
 Hardly representative of the population,
 It is extremely exposed to human bias.

iii) Quota Sampling

Quota sampling is a type of judgment sampling where quotas are set up according to given
criteria and the selection of sample units within the prescribed quota is made according to the
personal judgment of the investigator.


Of course, quota sampling can be viewed as a combination of the concepts of stratified and
judgment sampling. The stratification concept arises when we divide the population
into parts and assign an individual quota to each one of them. In other words, the quota assigned
to a given group may be viewed as the sample size allocated to a given stratum in
stratified sampling. On the other hand, the judgment sampling concept comes in when we
take the sample from each group: in quota sampling, samples are taken from each group using the
judgment of the investigator. Recall, however, that in stratified sampling the samples
from each stratum (group) are taken at random, not by judgment.
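
The contrast between the two techniques can be illustrated with a small Python sketch. Everything in it is hypothetical: the population, the groups, the quotas and the helper name `fill_quotas` are made up for illustration, and the investigator's judgment is mimicked (a loud assumption) by simply taking the first people encountered in each group.

```python
import random


def fill_quotas(units, quotas, choose):
    """Take quotas[group] units from every group, using `choose` to pick them."""
    sample = {}
    for group, quota in quotas.items():
        members = [unit for unit, g in units if g == group]
        sample[group] = choose(members, quota)
    return sample


# Hypothetical population of (person, group) pairs and hypothetical quotas
units = [(f"P{i}", "female" if i % 2 else "male") for i in range(1, 41)]
quotas = {"female": 6, "male": 4}

# Quota sampling: selection inside each group is left to the investigator.
# Here that judgment is mimicked by taking the first people met (assumption).
quota_sample = fill_quotas(units, quotas, lambda members, q: members[:q])

# Stratified sampling with the same group sizes: selection is random instead.
rng = random.Random(0)
stratified_sample = fill_quotas(units, quotas, lambda members, q: rng.sample(members, q))

print(quota_sample["female"])  # ['P1', 'P3', 'P5', 'P7', 'P9', 'P11']
```

Both samples respect the same group sizes; they differ only in how the units inside each group are chosen.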

Advantages
 Mostly applicable in social studies,
 It occasionally provides satisfactory results if the remaining processes, like interviewing, are
carried out with utmost care.

Limitations
 Slight negligence on the part of the interviewer may seriously distort the final
results.

2.7 SUMMARY

Data may be collected or obtained from two major sources: primary and secondary sources. Data
obtained from a primary (original) source are called primary data. On the other hand, data taken
from secondary sources, like magazines, newspapers and other publications, are called
secondary data. Primary data are collected through different methods including interview,
questionnaire, and direct or physical observation.

Census and sample survey are the two distinct scopes of statistical investigation. In census
approach, data is collected from each and every member of the population. In sample survey,
however, data is recorded or taken only from some portion of the population called the sample.
The quality of the results to be obtained from a sample survey mainly relies on the
appropriateness of the selected sample as a representative of the population.

In order to assure the representativeness of a sample, one should carefully select the sampling
technique best suited to the prevailing conditions. There are random (probability) and
non-random (non-probability) sampling techniques. Within random sampling we have four
sub-categories, namely Simple Random Sampling (SRS), Stratified Sampling, Cluster Sampling and
Systematic Sampling. Within non-random sampling we have three sub-categories, namely
Judgment, Convenience, and Quota Sampling.

2.8 ANSWERS TO CHECK YOUR PROGRESS QUESTIONS


CYP 1
i) Primary data is data obtained or collected from the original sources using interview,
questionnaire, or direct observation (physical measurement) methods. When data is collected or
obtained for the first time, it is called primary data.
On the other hand, secondary data is data obtained from secondary sources like web sites,
magazines, newspapers, annual reports, etc. Whenever we make use of data collected by another
agency for some other purpose, we are said to be secondary users, as we are not the owners of
the data.

ii) Parameters are results that refer to the population as a whole. That is, parameters are values
that describe the characteristics of a population. They include the population average ($\mu$),
standard deviation ($\sigma$), correlation coefficient ($\rho$), etc. A population is characterized
by its parameters.
Statistics, by contrast, are results that refer not to the whole population but only to a part of
it, called the sample. Any result calculated or obtained from a sample is called a statistic. We
can calculate a number of results, like the average ($\bar{x}$), standard deviation (s), etc., from
a single sample. These results are collectively called statistics. In short, parameters are
population values while statistics are sample values.

CYP 2
i) Both stratified and cluster sampling are random sampling techniques.
Nevertheless, they differ at least in the following points.

Stratified                                        Cluster
* Stratification is based on some variable        * Categorization (partitioning) of the population
  of interest.                                      is mainly based on area (location).
* Samples are taken from each part or stratum.    * Samples are taken from a few clusters only.
* Only a few elements from a single stratum       * All the elements within a selected cluster
  are included in the sample.                       are included in the sample.
* Strata are internally homogeneous and           * Clusters are internally heterogeneous and
  externally heterogeneous.                         externally homogeneous.
* It requires a full sampling frame.              * It does not require a full sampling frame.
* Costly.                                         * Less costly.
* More accurate results could be obtained.        * Results are less accurate as compared with
                                                    those of stratified sampling.

ii) The two most commonly used methods of taking simple random samples are the lottery method
and the method of random numbers. In most cases, the use of random numbers for taking simple
random samples is believed to be more reliable. On the other hand, the use of the lottery method
is relatively simple.

2.9 MODEL EXAMINATION QUESTIONS


1) In a systematic random sampling, the 10th and 15th sample elements correspond to the indices
(serial numbers) 68 and 103 respectively. Find the index for the 5th systematic sample.

2) Discuss the difference between random and non-random sampling techniques.

3) Classify each of the following samples as random, systematic, stratified or cluster.
a. Every fifth teenager entering an amusement park is asked to select his or her favorite ride.
b. All police officers of a small town are interviewed to determine whether they feel the
crime rate has changed over the past year.

4) Unity University College has registered 12,000 students over the last four years. The college
administration would like to know the number of students who have participated in
co-curricular activities. For the purpose of the study, the administrator collected the names of
400 students from the files by taking a proportional number of students from each of the years
(batches) for interview.
Based on the above information, find
a. The variable of interest
b. The source of data (primary or secondary)
c. The population
d. The sample
e. The sampling technique used

2.10 GLOSSARY

Data Collection: Gathering of information on which a study is going to be made.
Elementary units: Elements or groups of elements of the population on which information is
required.
Sampling Units: The units in terms of which the enumerator collects the data.
Sampling: The process of taking a sample and making inferences about the population.
Primary source: The original source of data.
Secondary source: A source that makes available data collected by some other agency.
Sampling Frame: The listing of all units in the population under study.
Sampling Error: The difference between the results obtained from a sample study and the
results that would have been obtained from an equal complete coverage.
Non-Sampling Error: Errors that can arise even in census or complete enumeration. They
mainly arise at the stage of acquiring, recording and processing of data.
Parameters: Values obtained from a population and used to describe or summarize population
characteristics.
Statistics: Values obtained from samples and used to describe sample characteristics (behavior).

2.11 REFERENCES

 Agarwal B.L (1991); “Basic Statistics,” Second Edition; Wiley Eastern Limited; India.
 Bluman Allan G. (1992); “Elementary Statistics: A Step-by-Step Approach,” Second
Edition; Wm. C. Brown Communications, Inc.; USA.


 Cochran W.G (1977); “Sampling Techniques,” Third Edition; John Wiley and Sons Inc;
India.
 Gupta C.B (1995); “An Introduction to Statistical Methods,” Nineteenth Edition; Vikas
Publishing House PVT. LTD; New Delhi.
 Gupta S.P (1991); “Statistical Methods,” Twenty-sixth Edition; Sultan Chand and Sons
Publishers; New Delhi.
