
Chapter 6: Data collection techniques

6.1. Types and sources of data

6.2. Data collection methods

6.3. Sampling Technique

6.4. Sample size and its determination

6.5. Sampling Error and Non-Sampling Error

6.6. Central Limit Theorem and standard error


6.1. Types and sources of data

• Any research is based on various types of information.

• The more information the researcher has about the phenomenon, the better the investigation will be.

• Just as a building needs bricks and mortar for its construction, research requires relevant information.

• In order to carry out any research activity, information should be gathered from proper sources.

• The more valid the sources of information, the more reliable the information received, which in turn leads to correct and reliable conclusions.

• According to W. A. Bagley, the sources of information in the field of social science may be classified into:

i. Primary sources: These include the actual information received directly from individuals concerning the problem of the study.

• The information obtained from primary sources is often called primary data.

• Primary data are gathered by the researcher himself or herself for the first time, and are thus original.

ii. Secondary sources: The information obtained from these sources is called “secondary data”.

• Secondary data are those which have already been collected by someone else and have already passed through the statistical process.


6.2. Data collection methods

• The methods of collecting primary and secondary data differ, since primary data have to be collected originally, while in the case of secondary data the collection work is merely one of compilation.

6.2.1. Collection of primary data

• In experimental research, primary data are collected in the course of conducting the experiment.

• In non-experimental research, the researcher conducts a survey to obtain primary data.

• There are several methods of collecting primary data, particularly in survey and descriptive research.

• The commonly used methods of collecting primary data are discussed below:

i. Observation method

ii. Interview

iii. Questionnaires

iv. Focus Group Discussion (FGD)


1. Observation method of data collection

• Observation is the primary source of information, especially in studies related to behavioral science.

• Observation becomes a scientific tool and method of data collection:

o When it serves a formulated research purpose

o When it is systematically planned and recorded

o When it is subject to checks and controls on validity and reliability.

• Under this method the researcher personally and directly observes the conditions and incidents in his or her field of study.

• The researcher does not ask the respondents anything.

• Direct observation is the most reliable method for gathering information related to lifestyle, status, conduct, language and similar phenomena.


Types of Observation

A. Participant vs. non-participant observation

Participant observation: The researcher lives in the group or community as a member of it and participates in its life.

Non-participant observation: The researcher does not participate in the life of the group but observes it as an external spectator.

• Under this approach the presence of the researcher is unknown to the people.

B. Controlled vs. uncontrolled observation

Uncontrolled observation: Observation that takes place in a natural setting. No attempt is made to use precision instruments.

Controlled observation: Observation that takes place according to definite pre-arranged plans, including experimental procedures, and is generally done in a laboratory under controlled conditions.

C. Structured vs. unstructured observation

Structured observation: A structured and pre-planned observation technique.

• It is characterized by a careful definition of the units to be observed, the style of recording the observed information, standardized conditions of observation and the selection of pertinent observations.

• The behavior to be observed, the unit of observation, the subject of observation (women, adults, etc.), the conditions of observation (time, place, approach, etc.), the style of recording the observed information and the like are predetermined.

• In general, such observation has a standardized format and is pre-planned.

Unstructured observation: The observation takes place without the characteristics mentioned above, i.e., without a standardized format and plan. There is no specification of the subject of observation or the behavior to be observed, and no standard format for recording the observed information.

• Such observation is very flexible and is commonly used in exploratory research.
2. Interview method

• The interview method of collecting data involves the presentation of oral-verbal stimuli and replies in terms of oral-verbal responses.

• The interviewer asks the respondent questions aimed at obtaining the information required for the study.

• This method can be used through personal interviews and, if possible, through telephone interviews.

A. Personal interview

• This method requires the interviewer to ask questions in face-to-face contact with the respondent.

• This approach is suitable for intensive investigation.

• The personal interview can be of two types: structured and unstructured.

Structured interview: A structured interview involves the use of a set of predetermined questions and highly standardized techniques of recording.

• The interviewer cannot change even the sequence of the questions. The recording formats are also standardized.

Unstructured interview: Such an interview is characterized by flexibility in the approach to questioning.

• It does not follow a system of predetermined questions and standardized techniques of recording information.

• The researcher is allowed much greater freedom to ask supplementary questions if needed, or at times to omit certain questions.

B. Telephone interview

• This method of collecting information involves contacting respondents by telephone.
Prerequisites of interview

• Interviewers should be carefully selected, trained and briefed.

• Interviewers should be honest, sincere, hardworking and impartial, and must possess the technical competence and necessary practical experience.

• Occasional field checks should be made, and provision should be made in advance so that appropriate action can be taken if some of the selected respondents refuse to cooperate or are not available when an interviewer calls on them.

• The interviewer has to try to create a friendly atmosphere of trust and confidence so that the respondent feels at ease while talking with the interviewer.

• The interviewer must ask questions properly and completely.

• The interviewer should not show surprise or disapproval at a respondent's answer.


Basic principles of interviewing

• Interviewers should follow these principles while conducting an interview:

o Ask only one question at a time

o Repeat the question if necessary

o Listen carefully to the subject's answer

o Observe the subject's facial expression, gestures and tone of voice

o Allow the subject sufficient time to answer the question


3. Collection of data through questionnaires

• A questionnaire is a list of structured questions, which is presented, mailed or e-mailed to selected respondents to obtain reliable responses from them.

• The objective is to find out what a selected group of respondents do, think or feel.

• This method of data collection is used when the subject of study is very wide and direct observation is not possible.

• It is also used for things that cannot be known through direct observation (ideas, preferences, motives and so on).


Types of Questionnaires

• Questionnaires can be of the following types:

o Interview questionnaires (schedules): Schedules are like questionnaires, but they are filled in by enumerators who are specially appointed for the purpose.

o Mail questionnaires (self-administered questionnaires)

o Questionnaires through the Internet (through electronic media)

A. Forms of questions

• Questions in a questionnaire can be either open-ended or closed-ended.

Open-ended questions: The respondent is asked to provide his or her own answer to the question.

• The answer is not limited in any way. E.g., the respondent might be asked, “What do you feel is the most important issue facing your community?”

• The problem with this form of questioning is that the answers are not uniform and are hence difficult to process.

Closed-ended questions: The respondent is asked to select an answer from a list provided by the researcher (yes, no, I don't know, etc.).

• Closed-ended questions are very popular in survey research since they provide greater uniformity of responses and are easy to process.
4. Focus Groups

• A type of qualitative research in which small homogeneous groups of people are brought together to informally discuss specific topics under the guidance of a moderator.

• Purpose: to identify issues and themes, not just interesting information, and not “counts”.

• Focus groups are inappropriate when:

o language barriers are insurmountable

o the evaluator has little control over the situation

o trust cannot be established

o free expression cannot be ensured

o confidentiality cannot be assured


IPDET © 2009

Focus Group Process

Phase          Action

1 Opening      Ice-breaker; explain purpose; ground rules; introductions

2 Warm-up      Relate experience; stimulate group interaction; start with least threatening and simplest questions

3 Main body    Move to more threatening or sensitive and complex questions; elicit deep responses; connect emergent data to complex, broad participation

4 Closure      End with closure-type questions; summarize and refine; present theories, etc.; invite final comments or insights; thank participants
6.2.2. Collection of secondary data

• Secondary data are data that are already available.

• Published data are usually available in:

a. various publications of the central, state or local governments

b. various publications of foreign governments or of international bodies and their subsidiary organizations

c. technical and trade journals

d. books, magazines and newspapers

e. reports and publications of various associations connected with business and industry, banks, stock exchanges, etc.

f. reports prepared by research scholars, universities, economists, etc. in different fields

g. public records and statistics, historical documents, and other sources of published information.

6.3. Sampling Design

6.3.1. Different concepts in sampling design

• Census and sample survey: All items in any field of inquiry constitute a ‘universe’ or ‘population’. A complete enumeration of all items in the population is known as a census inquiry.

• When field studies are undertaken in practical life, considerations of time and cost almost invariably lead to the selection of only a few items that are representative of the total. The items selected are technically called a ‘sample’, and the selection process is called the ‘sampling technique’.


STEPS IN SAMPLE DESIGN

i. Type of universe: The first step in developing any sample design is to clearly define the set of objects, technically called the universe, to be studied.

ii. Sampling unit: This can be a geographical unit such as a state, district or village, a construction unit such as a house or flat, or a social unit such as a family, club or school.

iii. Source list: Also known as the ‘sampling frame’, this is the list from which the sample is to be drawn. It contains the names of all items of the universe.

iv. Size of sample: This refers to the number of items to be selected from the universe to constitute a sample.

v. Parameters of interest: These are the specific population parameters that are of interest.

• For instance, we may be interested in estimating the proportion of persons with some characteristic in the population.

vi. Budgetary constraint: Cost considerations have a major impact on decisions relating not only to the size of the sample but also to the type of sample.

vii. Sampling procedure: Finally, the researcher must decide on the technique to be used in selecting the items for the sample.

viii. Sampling errors: These are the random variations in the sample estimates around the true population parameters.

• Such errors should be kept to a minimum.

• Sampling error decreases as the size of the sample increases, and it is of smaller magnitude in the case of a homogeneous population.


6.3.2. Types of sample designs/Sampling techniques

• Sample designs are basically of two types, viz., non-probability sampling and probability sampling.

                        Representation
Element selection       Probability sampling                     Non-probability sampling
Unrestricted sampling   Simple random sampling                   Convenience sampling
Restricted sampling     Complex random sampling (such as         Purposive sampling (such as
                        cluster sampling, systematic sampling,   quota sampling, judgment
                        stratified sampling, etc.)               sampling)

1. Non-probability sampling: Also known as deliberate sampling, purposive sampling, judgment sampling or non-random sampling; it includes quota sampling.

• In such a design, the personal element has a great chance of entering into the selection of the sample, and there is no assurance that every element has some specifiable chance of being included.

i. Judgmental sampling: The sampling units or elements in the population are purposely selected.

ii. Convenience sampling: Selection is based on the convenience of the person selecting the sample. It is also known as accidental sampling, as the respondents are included in the sample merely because they happen to be available on the spot where the survey is in progress.

iii. Quota sampling: Involves the fixation of certain quotas.

• Under quota sampling the interviewers are simply given quotas to be filled from the different strata, with some restrictions on how they are to be filled.

2. Probability sampling: Also known as ‘random sampling’ or ‘chance sampling’. Under this sampling design, every item of the universe has an equal chance of inclusion in the sample.

• How is a random sample selected?

• There are a number of probability sampling techniques; some of them are discussed below:

i. Simple Random Sampling

ii. Systematic Sampling

iii. Stratified Sampling

iv. Cluster Sampling

v. Multi-stage Sampling

i. Simple Random Sampling

• Each element in the population has an equal chance of being included in the sample. The sample is drawn by a random procedure from a sampling frame.

• Each element in the sampling frame is assigned a number, each number is written on a separate piece of paper, the papers are properly mixed and one is selected.

• If, say, the sample size is 45, the selection procedure is repeated 45 times.

• When the population consists of a large number of elements, a table of random digits or computer-generated random numbers is used.
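
The computer-generated alternative to drawing slips of paper can be sketched as follows. This is only an illustration: the frame of 500 numbered elements, the function name `simple_random_sample` and the seed are assumptions, not part of the chapter.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame: elements numbered 1..500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 45, seed=1)
print(len(sample))
```

Each call without a seed gives a different sample, which mirrors remixing the slips before every draw.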


ii. Systematic sampling: First, a sampling interval k is calculated.

• Suppose we have to select a sample of 50 out of 500 units. We calculate the sampling interval k = N/n, where N is the total number of units in the population and n is the size of the sample. Here k = 500/50 = 10.

• Second, a number between 1 and 10 is chosen at random. If 9 is selected, the sample will comprise units 9, 19, 29, …, 499.
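
The two steps of systematic sampling can be written as a short sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of n, as in the 500/50 example.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers selected by systematic sampling."""
    k = N // n                                      # sampling interval k = N/n
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start between 1 and k
    return [start + i * k for i in range(n)]

units = systematic_sample(500, 50, start=9)
print(units[:3], units[-1])   # [9, 19, 29] 499
```

Passing `start=9` reproduces the example above; leaving it out picks the starting unit at random.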

iii. Stratified sampling: Under this technique the population is divided into several sub-populations (strata) that are individually more homogeneous than the total population, and then simple random sampling is used to select a sample within each stratum.

• If a population does not constitute a homogeneous group, stratified sampling is applied to obtain a representative sample.

• The following three questions are highly relevant in the context of stratified sampling:

o How to form strata? On the basis of common characteristic(s) of the items, in such a way that elements are most homogeneous within each stratum and most heterogeneous between the different strata.

o How should items be selected from each stratum? By simple random or systematic sampling.

o How many items should be selected from each stratum, i.e., how should the sample size be allocated to each stratum? Proportionately or disproportionately.

• Example: Proportionate stratified sample size determination for 4 strata, where $N_i$ is the size of stratum $i$ and $n_i$ is the sample drawn from it:

$$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \frac{n_3}{N_3} = \frac{n_4}{N_4} = \frac{n}{N}$$

• In case of disproportionate sample is that the sample size should be more in stratum
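
Proportionate allocation means each stratum receives the share n × N_i / N of the total sample. A minimal sketch follows; the stratum sizes and the function name `proportionate_allocation` are made up for illustration.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate a total sample of n to strata in proportion to their sizes."""
    N = sum(stratum_sizes)
    # n_i = n * N_i / N, rounded to whole units
    return [round(n * N_i / N) for N_i in stratum_sizes]

# Hypothetical population of 10,000 split into 4 strata
sizes = [4000, 3000, 2000, 1000]
alloc = proportionate_allocation(sizes, 200)
print(alloc)   # [80, 60, 40, 20]
```

With less convenient numbers the rounded allocations may not add up exactly to n, so a small adjustment in one stratum can be needed.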
iv. Cluster sampling

• In cluster sampling, instead of selecting individual units from the population, entire groups or clusters are selected at random.

• If the total area of interest is large, a convenient way to take a sample is to divide the area into a number of smaller non-overlapping areas (usually called clusters) and then to randomly select a number of these smaller areas.
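
As a rough sketch of the idea, the code below selects whole clusters at random and takes every unit inside them. The area and household labels and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Pick m clusters at random and return them with all units they contain."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)              # select clusters, not units
    units = [u for c in chosen for u in clusters[c]]      # keep every unit in them
    return chosen, units

# Hypothetical population: 10 non-overlapping areas with 5 households each
areas = {f"area_{i}": [f"hh_{i}_{j}" for j in range(5)] for i in range(10)}
chosen, units = cluster_sample(areas, 3, seed=0)
print(chosen, len(units))
```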

v. Multistage sampling: Involves the selection of units in more than one stage.

• Example: Suppose a sample of 5000 urban households is to be selected from all over the country. In the first stage we select districts; in the second we may select cities from each district; then sub-cities/kebeles from each city; and finally households.

6.4. Sample size and its determination

• In sampling analysis the most ticklish question is: what should be the size of the sample, or how large or small should ‘n’ be? If the sample size (‘n’) is too small, it may not serve to achieve the objectives, and if it is too large, we may incur huge costs and waste resources.

• A researcher is concerned about sample size because sample size (the number of elements in the sample) and the precision of the study are directly related.

• The larger the sample size, the higher the accuracy.

• Sample size determination is a purely statistical activity, which requires statistical knowledge.

• There are a number of sample size determination methods.

i. Personal judgment: The personal judgment and subjective decision of the researcher can in some cases be used as a basis for determining the size of the sample.

ii. Budgetary approach: Under this approach the sample size is determined by the funds available for the proposed study.

• E.g., if the cost of surveying one individual or unit is 30 birr and the total fund available for the survey is 1800 birr, the sample size is determined as:

• Sample size (n) = total budget of survey / cost per unit surveyed. Accordingly, the sample size will be 1800 / 30 = 60 units.
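
The budgetary calculation is a single division; a small sketch makes the rounding explicit. The function name `budget_sample_size` is hypothetical, and rounding down is an assumption made so that the budget is never exceeded.

```python
def budget_sample_size(total_budget, cost_per_unit):
    """Largest number of units that can be surveyed within the budget."""
    return int(total_budget // cost_per_unit)   # round down: no partial interviews

n = budget_sample_size(1800, 30)
print(n)   # 60
```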


iii. Traditional inference: To estimate the sample size using this approach we need information about the estimated variance of the population, the magnitude of acceptable error and the confidence level.

a. Variance or heterogeneity of the population: This is measured by the standard deviation of the population.

o The sample size depends on the variance of the population.

o If the population is similar (homogeneous), a small sample size can be enough.

• How to determine the standard deviation of a population?

• The standard deviation is calculated from the sample or the population. So how can the standard deviation be determined in advance?


Sources:

1. Similar studies conducted in the past can be used as a basis.

2. Researchers without prior information can conduct a pilot survey to estimate the population parameters.

3. A rule of thumb may be used to estimate the standard deviation: the standard deviation is expected to be one sixth of the range.

o E.g., if households' average yearly income is expected to range between 1500 and 24000 birr, the range is 24000 - 1500 = 22500 birr, and by the rule of thumb the standard deviation is (1/6)(22500) = 3750 birr.
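
The rule of thumb can be checked with a short calculation. The function name `sd_rule_of_thumb` is hypothetical; the figures are those of the income example.

```python
def sd_rule_of_thumb(low, high):
    """Rough standard deviation estimate: one sixth of the expected range."""
    return (high - low) / 6

sd = sd_rule_of_thumb(1500, 24000)
print(sd)   # 3750.0
```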


b. Magnitude of acceptable error: This is the range of possible random error (E), i.e., the potential difference between the population mean and the value estimated from the sample. It indicates how precise the estimate must be. For a good estimate, the range of error has to be small. A small range of error can be obtained if the sample size is large.

• How to determine the magnitude of acceptable error?

• The researcher makes a subjective judgment about the desired magnitude of error.

c. Confidence level: This is a percentage or decimal value that tells how confident a researcher can be about being correct. In most research a 95% confidence level is used.

o That is, it is assumed that 95 times out of 100 the interval estimated from the sample will include the population parameter.


Sample size determination

• Once the above concepts are understood and their values determined, computing the sample size is quite simple.

o It is determined from the following relationship.

i) For a mean: $n = (ZS/E)^2$, where E = magnitude of acceptable error, Z = standard normal value corresponding to the confidence level, S = standard deviation and n = sample size.

• Ex: Suppose that a researcher has the following data:

1. standard deviation = 18

2. confidence level of 95%

3. range of error of 3

• To determine the sample size: $n = (ZS/E)^2$

• Note: for a 95% confidence level (0.95/2 = 0.475), Z = 1.96 (from normal tables).

• $n = ((1.96)(18)/3)^2 = (11.76)^2 \approx 138.3$, i.e., about 138 (139 if rounded up so that the error does not exceed 3).
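
The worked example can be reproduced in a few lines. This sketch looks up Z with the standard library's `statistics.NormalDist` instead of a printed normal table; the function name `sample_size_for_mean` and the choice to round up are assumptions.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sd, error, confidence=0.95):
    """Sample size n = (Z*S/E)^2 for estimating a mean."""
    # Two-sided Z value: 0.5 + 0.95/2 = 0.975 quantile of the standard normal
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = (z * sd / error) ** 2
    return z, n, math.ceil(n)   # round up so the error bound is still met

z, n_exact, n_needed = sample_size_for_mean(sd=18, error=3)
print(round(z, 2), round(n_exact, 1), n_needed)
```

Halving the acceptable error to 1.5 roughly quadruples the required sample, which is the practical meaning of the squared term.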


6.5. Sampling Error and Non-Sampling Error

• A sampling study is subject to sampling and non-sampling errors, which may be random and/or constant in nature.

• Errors that arise due to sampling and whose average magnitude can be determined are called sampling errors, while the others are called sampling bias.

Sampling Error

• Sampling error is the difference between the result of a sample and the result of a census.

• It is the difference between the sample estimate and the actual value for the population.

• These are errors that arise by chance alone.

• Even when the sample is properly selected, there will be some difference between the sample statistic and the actual value (the population parameter).

• The mean of the sample might differ from the population mean by chance alone. The standard deviation of the sample might also differ from the population standard deviation.

• Therefore, we can expect some difference between the sample statistics and the population parameters. This difference is known as sampling error.

• To illustrate this, let us take a very simple example.

• Suppose a student has scored the following grades in 10 subjects (consider these subjects as the population): 55, 60, 65, 90, 55, 75, 88, 45, 85, 82.

• Say a sample of four grades, 55, 65, 82 and 90, is selected at random from this population to estimate the student's average grade. The mean of this sample is 73.

• But the population mean is 70.

• The sampling error is therefore 73 - 70 = 3.

• However, the variation due to random fluctuation (sampling error) decreases as the sample size increases, though it is not possible to avoid sampling error completely.
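
The grade example can be verified directly, and the claim that sampling error shrinks with sample size can be illustrated by enumerating every possible sample. The helper `rms_sampling_error` is a hypothetical name, and using the root mean square of the errors as the summary measure is a choice made for this sketch.

```python
from itertools import combinations
from math import sqrt

population = [55, 60, 65, 90, 55, 75, 88, 45, 85, 82]
mu = sum(population) / len(population)          # population mean

sample = [55, 65, 82, 90]
sampling_error = sum(sample) / len(sample) - mu
print(mu, sampling_error)   # 70.0 3.0

def rms_sampling_error(n):
    """Root-mean-square sampling error over all possible samples of size n."""
    errors = [sum(s) / n - mu for s in combinations(population, n)]
    return sqrt(sum(e * e for e in errors) / len(errors))

print(rms_sampling_error(4), rms_sampling_error(8))
```

The second figure is smaller than the first: samples of 8 grades are, on average, closer to the population mean than samples of 4.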
Systematic Error (non-sampling error)

• Systematic error is also called sampling bias. It can arise from errors in the sampling procedure, and it cannot be reduced or eliminated by increasing the sample size.

• Such error occurs because of human mistakes and not chance variation.

• The possible factors that contribute to such error include:

1. Inappropriate sampling: If the sample units misrepresent the population, the result is sample bias.

• It occurs when not all units in the population have some probability of being selected for the sample.

2. Accessibility bias: When not all members of the population are equally accessible, the researcher must provide some control mechanism to ensure that no respondents are over- or under-represented.

3. Non-response bias: This is incomplete coverage of the sample, or the inability to get a complete response from all individuals initially included in the sample.

• It is due to failure to locate some of the individuals in the sample or to their refusal to respond.

• In some cases, respondents may intentionally give false information in response to sensitive questions. For instance, people may not tell the truth about their bad habits or income.

• Total error = sampling error + non-sampling error

• Total error is usually measured as total error variance, also known as mean squared error (MSE): $(TE)^2 = (SE)^2 + (NE)^2$.

6.6. Central Limit Theorem and standard error

• The sampling distribution of the mean of samples taken from a normal population shows two important properties. First, the sampling distribution has a mean equal to the population mean ($\mu_{\bar{x}} = \mu$).

• Second, the sampling distribution has a standard deviation equal to the population standard deviation divided by the square root of the sample size ($\sigma_{\bar{x}} = \sigma/\sqrt{n}$).

• The central limit theorem states that as the sample size increases, the sampling distribution of the mean approaches a normal distribution, regardless of the distribution of the population from which the random sample is drawn.

• Symbolically, the mean of the sampling distribution of $\bar{x}$ is $\mu_{\bar{x}} = \mu$ and its standard deviation is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, where:

$\mu$ = the mean of the population

$\sigma$ = the standard deviation of the population

$\sigma_{\bar{x}}$ = the standard deviation of the sampling distribution of the mean (the standard error)

n = the number of items in the sample

• The significance of the central limit theorem lies in the fact that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the frequency distribution of that population other than what we get from the sample.

• When sampling is from a normal population, the means of samples drawn from it are themselves normally distributed.

• But when sampling is not from a normal population, the size of the sample plays a critical role.

• When n is small, the shape of the sampling distribution will depend largely on the shape of the parent population, but as n gets large (n > 30), the shape of the sampling distribution will become more and more like a normal distribution, irrespective of the shape of the parent population.
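
A small simulation can make the two properties concrete. The sketch below is illustrative only: the skewed (exponential) population, its size, the sample size of 50, the number of repeated samples and the seed are all assumptions chosen for the demonstration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)

# Hypothetical, clearly non-normal population (right-skewed)
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu, sigma = mean(population), pstdev(population)

def sample_means(n, draws=2000):
    """Means of repeated random samples of size n (drawn with replacement)."""
    return [mean(rng.choices(population, k=n)) for _ in range(draws)]

n = 50
means = sample_means(n)
print("mean of sample means:", mean(means), "population mean:", mu)
print("sd of sample means:  ", pstdev(means), "sigma/sqrt(n):", sigma / n ** 0.5)
```

The two printed pairs should be close to each other, and a histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.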


Standard error, confidence level and significance level

• The standard error (S.E.) gives an idea of the reliability and precision of a sample.

• The S.E. helps in testing whether the difference between observed and expected frequencies could arise due to chance.

• The smaller the S.E., the greater the uniformity of the sampling distribution and hence the greater the reliability of the sample.

• Conversely, the greater the S.E., the greater the difference between observed and expected frequencies. In such a situation the unreliability of the sample is greater.

• The standard error enables us to specify the limits within which the parameters of the population are expected to lie with a specified degree of confidence. Such an interval is usually known as a confidence interval.

• The confidence level, or reliability, is the expected percentage of times that the actual value will fall within the stated precision limits.

• Thus, if we take a confidence level of 95%, we mean that there are 95 chances in 100 (or .95 in 1) that the sample results represent the true condition of the population within a specified precision range, against 5 chances in 100 (or .05 in 1) that they do not. This 5% is the significance level.
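
The link between standard error and confidence interval can be sketched as mean ± Z × S.E. The figures below (sample mean 73, standard deviation 18, n = 36) are hypothetical, as is the function name `confidence_interval`; the sketch also assumes the normal approximation is adequate.

```python
from statistics import NormalDist

def confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Interval sample_mean ± Z * (sd / sqrt(n)) for a population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    se = sd / n ** 0.5                               # standard error of the mean
    return sample_mean - z * se, sample_mean + z * se

low, high = confidence_interval(sample_mean=73, sd=18, n=36)
print(round(low, 2), round(high, 2))   # 67.12 78.88
```

A larger n gives a smaller standard error and therefore a narrower interval at the same confidence level.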
