
STEP- 5

COLLECTION OF DATA
DATA COLLECTION

Facts expressed in quantitative form are termed data.

The task of data collection begins after the research problem has been defined and the research design has been developed.

While deciding on the method of data collection for the study, the researcher should keep in mind the two types of data, viz. primary and secondary.
Primary data are those which are collected afresh and for the first time, and thus happen to be original in character.

Data collected for a statistical study under the control and supervision of the investigator are referred to as primary data.

Primary data are information developed or gathered by the researcher specifically for the research project at hand.

Secondary data are those which have already been collected by someone else and which have already been passed through the statistical process.
or
When the data are not collected by the investigator but are derived from other sources, such data are referred to as secondary data.

Secondary data are information that has previously been gathered by someone other than the researcher and/or for some purpose other than the research project at hand.
Differences between primary and secondary data:
1. Difference in originality
2. Difference in the suitability of objectives
3. Difference in the cost of collection
METHODS OF COLLECTING
PRIMARY DATA

1. Observation
2. Interview
3. Questionnaire
4. Schedule
METHODS OF COLLECTING
SECONDARY DATA

Secondary data may either be published or


unpublished data.

Sources of unpublished data

1. Diaries
2. Letters
3. Unpublished biographies
4. Autobiographies
Sources of published data

1. Publications of central, state and local governments.
2. Publications of foreign governments or international bodies and their subsidiary organizations.
3. Technical and trade journals
4. Books, magazines and newspapers
5. Reports and publications of various associations connected
with business and industry, banks, stock exchanges etc.
6. Reports prepared by research scholars, universities,
economists etc.
7. Public records and statistics
8. Historical documents
METHODS OF COLLECTING
PRIMARY DATA
1. OBSERVATION METHOD

In the observation method the investigator asks no questions but simply observes the participants or activities in action and records the necessary data.

The information obtained under this method relates to what is currently happening and is collected through the investigator’s own direct observation, without asking the respondent.
OBSERVATION METHOD

Types of observations

1. Structured vs Unstructured observation

2. Participant vs Non-participant observation

3. Controlled vs Uncontrolled observation

4. Disguised vs Non-disguised observation


OBSERVATION METHOD

Advantages of observation method

1. Data are more reliable and free from respondent bias

2. Easy to note the effects of environmental influences on specific outcomes

3. Easy to observe certain groups of individuals, e.g. very young children, busy executives, etc.
OBSERVATION METHOD

Disadvantages of observation method

1. Physical presence is a must

2. Tedious and expensive

3. Due to long periods of observation, observer exhaustion could easily set in

4. Moods, feelings and attitude may affect the observation

5. Training is necessary
METHODS OF COLLECTING PRIMARY
DATA

2. INTERVIEW METHOD

An interview is a face-to-face interaction between two or more persons for a particular purpose.

According to Scott, “Interview is a purposeful exchange of ideas, answering of questions and communication between two or more persons.”
Types of interviews

1. Structured and unstructured interviews: structured interviews are used in descriptive and causal studies, unstructured interviews in exploratory studies.
2. Focused interview: it is meant to focus attention on a given experience of the respondent and its effects.
3. Clinical interview: it is concerned with broad underlying feelings or motivations, or with the course of an individual’s life experience.
4. Directive and non-directive interviews: in a directive interview the respondents are directed towards a given topic. In a non-directive interview the interviewer's function is simply to encourage the respondent to talk about the given topic with a bare minimum of direct questioning.
1. Faster and Cheaper method
2. Easy and Flexible
3. Replies can be recorded
4. No field staff is required

3. QUESTIONNAIRE METHOD

• It consists of a number of questions printed or typed in a definite order on a form.
• Used in case of big enquiries.
• Free from the bias of the interviewer.
• Generally sent through mail.


What are you trying to find out?
1. A good questionnaire is designed so that your results will tell you what you want to find out.
2. Start by writing down what you are trying to do in a few clear sentences, and design your questionnaire around this.

How are you going to use the information?
1. There is no point conducting research if the results aren’t going to be used: make sure you know why you are asking the questions in the first place. Make sure you cover everything you will need when it comes to analyzing the answers.
e.g. Maybe you want to compare answers given by men and women. You can only do this if you’ve remembered to record the gender of each respondent on each questionnaire.
Keep it short. In fact, quite often the shorter the better.

1. We are all busy, and as a general rule people are less likely to answer a long questionnaire than a short one.
2. If you are going to be asking your customers to answer
your questionnaire in-store, make sure the interview is no
longer than 10 minutes maximum (this will be about 10 to
15 questions).
3. If your questionnaire is too long, try to remove some
questions. Read each question and ask, "How am I going
to use this information?" If you don’t know, don’t include
it!
Use simple and direct language.
1. The questions must be clearly understood by the
respondent. The wording of a question should be simple
and to the point. Do not use uncommon words or long
sentences.
Start with something general.
Respondents will be put off and may even refuse to complete your questionnaire if you ask questions that are too personal at the start (e.g. questions about financial matters, age, even whether or not they are married).
Do locate personal or confidential questions at the end of the questionnaire. The early appearance of unsettling questions may result in respondents discontinuing the questionnaire.

Place the most important questions in the first half of the questionnaire.
Respondents sometimes only complete part of a questionnaire. By putting the most important items near the beginning, the partially completed questionnaires will still contain important information.
Leave enough space to record the answers.
If you are going to include questions which may require a
long answer e.g. ask someone why they do a particular
thing, then make sure you leave enough room to write in
the possible answers. It sounds obvious, but it’s so often
overlooked!

Test your questionnaire on your colleagues (Pilot Study).


No matter how much time and effort you put into
designing your questionnaire, there is no substitute for
testing it. Complete some interviews with your colleagues
BEFORE you ask the real respondents. This will allow
you to time your questionnaire, make any final changes,
and get feedback from your colleagues.
Do order categories.
When response categories represent a progression between a
lower level of response and a higher one, it is usually better
to list them from the lower level to the higher in left-to-right
order, for example,

1) Never 2) Seldom 3) Occasionally 4) Frequently

Do ask respondents to rate both positive and negative stimuli.
There is sometimes a difficulty when respondents are asked to rate items for which the general level of approval is high (the "apple pie" problem). There is a tendency for respondents to mark every item at the same end of the scale.
Avoid category increase.
A typical question is the following:
Marital Status: 1) Single (Never married) 2) Married 3) Widowed 4) Divorced 5) Separated

Avoid responses at the scale mid-point and neutral responses.


The use of neutral response positions had a basis in the past
when crude computational methods were unable to cope with
missing data. In such cases, non-responses were actually
replaced with neutral response values to avoid this problem.
The need for such a makeshift solution has long been
supplanted by improved computational methods. Consider the
following questionnaire item:
The instructor grades fairly. 1) Agree 2) Tend to agree 3) Undecided 4) Tend to disagree 5) Disagree

• Response categories provided for each close-ended question should be mutually exclusive and exhaustive.
1. Mutually exclusive: response categories must be such that the same respondent cannot be classified into more than one category; e.g. the categories SR1,000-5,000 and SR5,000-10,000 are not mutually exclusive, because a respondent at exactly SR5,000 falls into both.
2. Exhaustive: response categories should include all possible response options. Sometimes this is achieved by including a response option like “Other (Please specify)….”
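For numeric categories, both requirements can be checked mechanically before the questionnaire is printed. The following is only a sketch: the function name `check_categories`, the inclusive whole-number bounds and the sample ranges are assumptions made for illustration, not part of the original material.

```python
def check_categories(bins, lowest, highest):
    """Check inclusive integer ranges for overlaps and gaps.

    bins    -- list of (low, high) tuples, both ends inclusive
    lowest  -- smallest value any respondent could report (assumed)
    highest -- largest value any respondent could report (assumed)
    Returns (overlaps, gaps), each a list of (low, high) ranges.
    """
    overlaps, gaps = [], []
    covered_up_to = lowest - 1  # highest value covered so far
    for low, high in sorted(bins):
        if low <= covered_up_to:
            # This category re-covers values already in an earlier one.
            overlaps.append((low, min(high, covered_up_to)))
        elif low > covered_up_to + 1:
            # Values between the categories belong to no category.
            gaps.append((covered_up_to + 1, low - 1))
        covered_up_to = max(covered_up_to, high)
    if covered_up_to < highest:
        gaps.append((covered_up_to + 1, highest))
    return overlaps, gaps


# The categories from the example above share the value 5,000.
flawed = [(1000, 5000), (5000, 10000)]
# One possible repair: end the first category at 4,999.
repaired = [(1000, 4999), (5000, 10000)]

flawed_result = check_categories(flawed, 1000, 10000)
repaired_result = check_categories(repaired, 1000, 10000)
```

An empty overlap list means the categories are mutually exclusive; an empty gap list means they are exhaustive over the assumed range.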
• Avoid complexity: use simple, conversational language.
• Avoid leading and loaded questions.
• Avoid ambiguity: be as specific as possible.
• Avoid double-barreled items.
• Avoid making assumptions.
• Avoid burdensome questions.
Leading question: a question that suggests or implies a certain
answer.
Causes:
• The bandwagon effect, e.g. Most Saudis have stopped eating junk food. Do you eat junk food?
• Partially mentioning some alternatives, e.g. Which fast food restaurant do you prefer, Al-Tazaj or others?
• Questions with the phrase: “Don’t you think that ...”
• Phrasing the question to reflect only the negative or positive aspect of an issue (remedy: use the split-ballot technique).
Loaded question: a question that is designed to suggest a socially desirable answer. Usually it is emotionally charged.
Causes:
• Choice of words, e.g. using emotionally charged words, as in: In your opinion, is it fair that the Security Dept should be harassing students with parking tickets?
• Framing the question such that an honest answer is painful or embarrassing (remedy: use a counter-biasing statement).
Two related issues:
1. Order of questions in questionnaire.
2. Order of answer alternatives for specific questions.
Both can lead to order bias.
• Use simple, interesting opening questions, e.g. asking for the respondent’s opinion on an issue.
• Ask general questions before specific questions (funnel approach).
• Use branching questions (filter & pivot) with care.
• Ask for classification information last.
• Place difficult or sensitive questions late in the questionnaire.
• Finish asking questions on one topic before moving to another.
• Keep the questionnaire short if possible, but not so short that you sacrifice needed information
• Do not overcrowd the questionnaire
• Provide decent margin space
• Use a multiple-grid layout for questions with similar responses
• Use good quality print paper
• Use booklet form if possible
• Carefully craft the questionnaire title:
1. Captures respondent’s interest.
2. Shows importance of the study.
3. Shows interesting nature of the study.
Pretesting Process
• Seeks to determine whether respondents have any difficulty understanding the questionnaire and whether there are any ambiguous or biased questions.
Preliminary Tabulation
• A tabulation of the results of a pretest to help determine whether the questionnaire will meet the objectives of the research.

4. SCHEDULE METHOD

• A schedule is like a questionnaire.
• It is a proforma containing a set of questions, filled in by enumerators.
• Enumerators put the questions to the respondent from the proforma and record the replies in the space provided.

Example: Population census


1. A questionnaire can be sent by mail, but a schedule can only be administered in person.
2. A questionnaire is a cheaper method than a schedule (for a schedule, enumerators have to travel to the respondents).
3. A questionnaire can be returned without all the questions answered, but with a schedule the enumerator ensures that all the questions are filled in.
4. A questionnaire can be filled in by anyone, but a schedule is always filled in by an enumerator.
5. The respondent must be literate and co-operative for a questionnaire, but a schedule can be used with illiterate respondents.
6. The risk of incomplete and wrong information is greater with a questionnaire.
7. The physical appearance of a questionnaire has to be attractive, but this is not necessary for a schedule.
8. The success of a questionnaire depends on its design, whereas the success of a schedule depends on the honesty and competence of the enumerator.
Measurement in research consists of assigning numbers to empirical events, objects, properties or activities in compliance with a set of rules.

Scale of Measurement

Scale of measurement refers to the units in which a variable is measured.
• Nominal
• Ordinal
• Interval
• Ratio
(Figure label: Transformation possible)
Scale values are used only as labels; they just classify sample units into categories
No order or distance relationship
No arithmetic origin
Only count numbers in categories

e.g. Gender: male = 1, female = 2

type: qualitative

statistics: frequencies/percentage and mode
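
Since nominal codes are labels only, counting is the only meaningful operation on them. A minimal sketch, using the gender coding above with made-up responses (the data and variable names are hypothetical):

```python
from collections import Counter

# Hypothetical coded responses: 1 = male, 2 = female.
# The numbers are labels, so averaging them would be meaningless.
gender_codes = [1, 2, 2, 1, 2, 2, 1, 2]
labels = {1: "male", 2: "female"}

# Frequencies per category.
counts = Counter(labels[code] for code in gender_codes)

# Percentages per category.
total = sum(counts.values())
percentages = {label: 100 * n / total for label, n in counts.items()}

# Mode: the most frequent category.
mode_label, mode_count = counts.most_common(1)[0]

print(counts, percentages, mode_label)
```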

Nominal scale

Which of the following media influences your purchasing decisions the most?
–1 Television
–2 Radio
–3 Newspapers
–4 Magazines
Other examples
• Religion
• Social status
• Marital status
• Days of the week (months)
• Geographic location
• Seasons
• Ethnic group
• Types of restaurants
• Brand choice
• Job type: executive, technical, clerical

Coded as “0” Coded as “1”


Scale values indicate an order of magnitude (ranking) with
respect to the variable of interest

e.g. How do you evaluate the classrooms?
poor = 1, no opinion = 2, good = 3

type: qualitative

statistics: median and range
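
Ordinal codes carry order but not distance, so the median and the range are the natural summaries. A minimal sketch using the classroom coding above with made-up responses (the data are hypothetical; `median_low` is chosen here so that the result is always an actual category):

```python
import statistics

# Hypothetical responses coded as in the example: 1 = poor, 2 = no opinion, 3 = good.
ratings = [3, 1, 2, 3, 3, 1, 2, 3, 2]

# median_low returns a value that occurs in the data, never an average of two codes.
median_rating = statistics.median_low(ratings)

# The range is reported as the lowest and highest category observed.
rating_range = (min(ratings), max(ratings))

print(median_rating, rating_range)
```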

Please rank the news programs offered in the following four
networks based on your preference.(1 for most preferred, 4 for
least preferred).

_____ TVP1
_____ TVP2
_____ Rzeszow TV
_____ TVN
• Scale values indicate orders of magnitude as well as distance (for most behavioral research, interval scales are the highest form of measurement).
• Assumes that the measurements are made in equal units, i.e. gaps between whole numbers on the scale are equal.
e.g. Average grade when entering next semester
type: quantitative

statistics: mean and std. deviation/variance
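
With equal units between scale points, the mean and standard deviation become meaningful. A minimal sketch with made-up average grades (the data are hypothetical, and the sample rather than the population formulas are assumed):

```python
import statistics

# Hypothetical average grades for five students.
grades = [3.0, 3.5, 4.0, 4.5, 5.0]

mean_grade = statistics.mean(grades)
# Sample variance and standard deviation (divide by n - 1).
variance = statistics.variance(grades)
std_dev = statistics.stdev(grades)

print(mean_grade, variance, round(std_dev, 4))
```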

Interval scale
How likely are you to buy a new automobile within the next six months? (Please check the most appropriate category)

Definitely will not buy ___ 1
Probably will not buy ___ 2
May or may not buy ___ 3
Probably will buy ___ 4
Definitely will buy ___ 5
Ratio scales have absolute quantities. Money and weight
are ratio scales, as they have an absolute zero as well as
interval properties.

Allows you to compare differences between numbers.

e.g. Sales revenue in a sample of companies

type: quantitative
statistics: geometric mean, coefficient of variation
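
Because ratio data have a true zero, statistics built on ratios are permissible. A minimal sketch with made-up sales revenues (the figures are hypothetical; `statistics.geometric_mean` requires Python 3.8 or later):

```python
import statistics

# Hypothetical sales revenue for three companies (same currency unit).
revenue = [100.0, 200.0, 400.0]

# Geometric mean: the n-th root of the product of the values.
geo_mean = statistics.geometric_mean(revenue)

# Coefficient of variation: sample standard deviation relative to the mean.
coeff_variation = statistics.stdev(revenue) / statistics.mean(revenue)

print(round(geo_mean, 2), round(coeff_variation, 4))
```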

Examples
height, weight, age, length, time, income, market share

1. What is your annual income before taxes? $ _______
2. How far is your workplace from home? _______ kilometres
Classification of Scaling Techniques
SCALING TECHNIQUES
• Comparative scales: paired comparison, rank order, constant sum
• Nominal scales: multiple choice single response scales, multiple choice multiple response scales
• Non-comparative scales: continuous rating scales, itemized rating scales (semantic differential, Likert, Stapel)
Types of Scaling Techniques
COMPARATIVE SCALES
• Involve the respondent directly comparing stimulus objects.
• e.g. How does Pepsi compare with Coke on sweetness?

NON-COMPARATIVE SCALES
• Respondent scales each stimulus object independently of other objects.
• e.g. How would you rate the sweetness of Pepsi on a scale of 1 to 10?
Comparative Scales:
Paired Comparison Items
If we have brands A, B, C and D, we would have
respondents compare
• A and B
• A and C
• A and D
• B and C
• B and D
• C and D

–usually limited to N < 15
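
The number of pairs grows as n(n-1)/2, which is why the method is kept to a small number of objects. A minimal sketch that generates the pairs for the four brands above (the helper name `paired_comparisons` is an illustrative assumption):

```python
from itertools import combinations


def paired_comparisons(objects):
    """Return every unordered pair of objects to be compared."""
    return list(combinations(objects, 2))


brands = ["A", "B", "C", "D"]
pairs = paired_comparisons(brands)

for first, second in pairs:
    print(f"{first} and {second}")

# With 15 objects the respondent would already face 15 * 14 / 2 comparisons.
pairs_for_15 = len(paired_comparisons(range(15)))
```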


Comparative Scales:
Paired Comparison Items
Please indicate which of the following airlines you
prefer by circling your more preferred airline in
each pair:
Air Canada WestJet
Air Transat Air Canada
Horizon Air WestJet
WestJet Air Transat
Air Canada Horizon Air
Horizon Air Air Transat
Comparative Scales: Constant
Sum Scales
Allocate a total of 100 points among the following soft-drinks
depending on how favorable you feel toward each; the more highly you
think of each soft-drink, the more points you should allocate to it.
(Please check that the allocated points add to 100.)
Coca-Cola _____ points
7-Up _____ points
Mirinda _____ points
Fanta _____ points
Pepsi-Cola _____ points

100 points
Comparative Scales: Constant
Sum Scales
Please divide 100 points among the following characteristics so the
division reflects the relative importance of each characteristic to you in
the selection of a bank
Hours of service ________________
Friendliness _______________
Distance from home ________________
Investment vehicles ________________
Parking facilities __________________
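
A constant sum response is usable only if the points add up to the stated total. A minimal sketch of that check, using the bank characteristics above with made-up allocations (the function names and the point values are hypothetical):

```python
def is_valid_allocation(allocation, total=100):
    """True if no item has negative points and the points add up to total."""
    if any(points < 0 for points in allocation.values()):
        return False
    return sum(allocation.values()) == total


def relative_importance(allocation):
    """Convert points into shares of the total (0.0 to 1.0)."""
    total = sum(allocation.values())
    return {item: points / total for item, points in allocation.items()}


# Hypothetical response from one respondent.
response = {
    "Hours of service": 30,
    "Friendliness": 25,
    "Distance from home": 20,
    "Investment vehicles": 10,
    "Parking facilities": 15,
}

valid = is_valid_allocation(response)
shares = relative_importance(response)
```

Responses that fail the check would normally be returned to the respondent or excluded, depending on the study's rules.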
Comparative scales:
rank order scales
Rank the following soft-drinks from 1 (best) to 5 (worst) according to your
taste preference:
Coca-Cola _____
7-Up _____
Fanta _____
Pepsi-Cola _____
Mountain Dew _____

√ Top and bottom rank choices are ‘easy’


√ Middle ranks are usually most ‘difficult’
A classification of scaling techniques
SCALING TECHNIQUES
• Comparative scales: paired comparison, rank order, constant sum, Guttman scale
• Non-comparative scales: continuous rating scales, itemized rating scales (semantic differential, Likert, Stapel)
Non comparative scale
Continuous scale
• How would you rate Stat. Analysis compared to other courses this term?
The worst X X The Best
0 10 20 30 40 50 60 70 80 90 100


Non-comparative scale
Itemized Rating Scales
• Likert scale
• Semantic differential scale
• Stapel scale
• Thurstone-type scale
Non-Comparative Scales
• A unique bipolar ordinal scale format that captures a person’s attitudes or feelings about a given object.
• This type of scale is unique in its use of bipolar adjectives and adverbs (good/bad, like/dislike, competitive/noncompetitive, helpful/unhelpful, etc.)
Semantic differential scale
Here are a number of statements that could be used to describe Tesco. For each statement tick ( ) the box that best describes your feelings about Tesco.

Modern store                 Old-fashioned store
Low prices                   High prices
Unfriendly staff             Friendly staff
Narrow product range         Wide product range
Sophisticated customers      Unsophisticated customers
Likert scale
1 = Strongly disagree, 2 = Disagree, 3 = Neither agree nor disagree, 4 = Agree, 5 = Strongly agree

Cost is the most important consideration when buying a new car: 1 2 3 4 5
AGREEMENT: To what extent do you agree or disagree that Nokia is a high quality brand?

Alternative response sets:
• Strongly agree / Agree / Undecided / Disagree / Strongly disagree
• Agree strongly / Agree moderately / Agree slightly / Disagree slightly / Disagree moderately / Disagree strongly
• Agree / Undecided / Disagree
• Agree / Disagree
• Disagree strongly / Disagree / Tend to disagree / Tend to agree / Agree / Agree strongly
• Agree very strongly / Agree strongly / Agree / Disagree / Disagree strongly / Disagree very strongly
• Completely agree / Mostly agree / Slightly agree / Slightly disagree / Mostly disagree / Completely disagree
FREQUENCY: How frequently do you go walking?

Alternative response sets:
• Very frequently / Frequently / Occasionally / Rarely / Very rarely / Never
• Always / Very frequently / Occasionally / Rarely / Very rarely / Never
• Always / Usually / About half the time / Seldom / Never
• Almost always / To a considerable degree / Occasionally / Seldom
• A great deal / Much / Somewhat / Little / Never
• Often / Sometimes / Seldom / Never
• Always / Very often / Sometimes / Rarely / Never
IMPORTANCE: How important is price to you when purchasing jeans?

Alternative response sets:
• Very important / Important / Moderately important / Of little importance / Unimportant
• Very important / Moderately important / Unimportant

QUALITY: How would you rate the quality of Toshiba laptops?

Alternative response sets:
• Very good / Good / Barely acceptable / Poor / Very poor
• Extremely poor / Below average / Average / Above average / Excellent
• Good / Fair / Poor
LIKELIHOOD: How likely will you be to purchase a car in the next 6 months?

Alternative response sets:
• Likely / Unlikely
• To a great extent / Somewhat / Very little / Not at all
• True / False
• Definitely / Very probably / Probably / Possibly / Probably not / Very probably not
• Almost always true / Usually true / Often true / Occasionally true / Sometimes but infrequently true / Usually not true / Almost never true
• True of myself / Mostly true of myself / About halfway true of myself / Slightly true of myself / Not at all true of myself
Unipolar scale with 10 categories numbered from -5 to +5, without a neutral point (zero)
Values progress from positive to negative and measure direction and intensity simultaneously
Usually presented vertically
Respondents are asked to indicate how accurately each term describes the object by selecting a numerical response category
Stapel scale
Rank university BAREK on the quality of its food and service.

  +5         +5
  +4         +4
  +3         +3
  +2         +2
  +1         +1
quality    service
  -1         -1
  -2         -2
  -3         -3
  -4         -4
  -5         -5
A Stapel scale for measuring a store’s image
Select a plus number for words that you think describe the store accurately. The more accurately you think the word describes the store, the larger the plus number you should choose. Select a minus number for words you think do not describe the store accurately. The less accurately you think the word describes the store, the larger the minus number you should choose. Therefore, you can select any number from +5 for words that you think are very accurate all the way to -5 for words that you think are very inaccurate.

   +5          +5          +5
   +4          +4          +4
   +3          +3          +3
   +2          +2          +2
   +1          +1          +1
  HIGH        POOR        WIDE
 QUALITY     SERVICE     VARIETY
   -1          -1          -1
   -2          -2          -2
   -3          -3          -3
   -4          -4          -4
   -5          -5          -5
Thurstone-type scale
• An initial set of indicators is collected, designed to range from one extreme to the other in relation to the topic (e.g. attitudes to management).
• A large group of judges (up to 50) is asked to judge the items objectively. Each judge is asked to sort them into categories representing degrees of favourableness or unfavourableness towards the topic, using 7, 9 or even 11 categories.
• Then, scoring each category from 0 to 10, a median value is calculated for each item across all the judges: this is the value for which half the judges gave the item a lower score and half gave it a higher score.
• The spread of the scores, usually the interquartile range, is calculated for the judgements made on each item. High-scatter items, which imply little agreement between judges, are then discarded. A set of items is then selected which covers the whole range of categories from 0 to 10.
• But this is rather laborious, and the attitudes of the judges may not be representative of the attitudes of survey respondents.
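
The median and interquartile-range steps described above can be sketched as follows. The judges' scores, the item names and the cut-off `MAX_IQR` are all made up for illustration; the source does not state what counts as high scatter.

```python
import statistics

# Hypothetical scores (0 to 10) given by seven judges to three candidate items.
judge_scores = {
    "item_a": [2, 2, 3, 3, 3, 4, 4],
    "item_b": [0, 1, 3, 5, 6, 8, 10],  # judges disagree widely
    "item_c": [7, 8, 8, 8, 9, 9, 9],
}

MAX_IQR = 3.0  # assumed threshold for "high scatter"


def interquartile_range(scores):
    """Difference between the third and first quartiles."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return q3 - q1


# Keep low-scatter items; the median of the judges' scores is the scale value.
scale_values = {
    item: statistics.median(scores)
    for item, scores in judge_scores.items()
    if interquartile_range(scores) <= MAX_IQR
}

print(scale_values)
```

In a real study the retained items would then be chosen so that their scale values spread over the whole 0 to 10 range.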
Guttman Scale
• Also known as scalogram analysis or cumulative scaling
• Purpose:
  - Establish a one-dimensional continuum
  - Perfectly predict item responses from total score
  - Seldom perfect in practice
• A more general equivalent of a social distance scale, based on the fact that some items may prove to be more extreme indicators than others
• Note: People do not always respond to this scale in the “correct” order, so you sometimes have to choose between scoring it as a Guttman scale or as an index!
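
Predicting item responses from the total score can be sketched as below. It assumes items are ordered from least to most extreme and answers are coded 1 (agree) or 0 (disagree); the function names and response patterns are hypothetical.

```python
def predicted_pattern(total_score, n_items):
    """Pattern a perfect cumulative scale implies for a given total score."""
    return [1] * total_score + [0] * (n_items - total_score)


def guttman_errors(responses):
    """Count answers that differ from the pattern predicted by the total score."""
    predicted = predicted_pattern(sum(responses), len(responses))
    return sum(1 for actual, expected in zip(responses, predicted) if actual != expected)


# Items ordered from least to most extreme (assumed).
consistent = [1, 1, 0, 0]    # agrees with the two mildest items only
inconsistent = [1, 0, 1, 0]  # skips a milder item but agrees with a stronger one

errors_consistent = guttman_errors(consistent)
errors_inconsistent = guttman_errors(inconsistent)
```

A respondent with zero errors answered in the "correct" cumulative order; respondents with errors are the cases where the choice between a Guttman scale and a simple index arises.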
SOME BASIC CONSIDERATIONS
WHEN SELECTING A SCALE

• Number of scale categories
• Odd or even number of categories
• Forced versus non-forced choice
• Balanced versus non-balanced alternatives
Odd versus even
If neutral responses are likely, use an odd number of categories.

Odd                         Even
Strongly agree _____        Strongly agree _____
Agree _____                 Agree _____
Neutral _____               Disagree _____
Disagree _____              Strongly disagree _____
Strongly disagree _____
Balanced vs. Unbalanced

Balanced Unbalanced
Very good ______ Excellent ______
Good ______ Very good ______
Fair ______ Good ______
Poor ______ Fair ______
Very poor ______ Poor ______
Forced vs. Unforced

Forced                        Unforced
Extremely reliable ___        Extremely reliable ___
Very reliable ___             Very reliable ___
Somewhat reliable ___         Somewhat reliable ___
Somewhat unreliable ___       Somewhat unreliable ___
Very unreliable ___           Very unreliable ___
Extremely unreliable ___      Extremely unreliable ___
                              Don’t know ___
Labeled vs. End anchored

Labeled                 End Anchored
Excellent _____         Excellent _____
Very good _____                   _____
Fair _____                        _____
Poor _____                        _____
Very poor _____         Poor      _____


Intervals may not reflect the semantic meaning of the adjectives

[Figure: two labeled scales (Excellent, Very good, Fair, Poor, Very poor) drawn with unequal spacing between the labels; annotation: "Intervals are not equal"]

Number of scale points

[Figure: a 5-point scale and a 10-point scale, each end-anchored from Excellent to Poor]
1. Reliability
• The degree to which a measure accurately captures a true outcome without error.
• Synonymous with repetitive consistency.
2. Validity
• The degree to which a measure faithfully represents the
underlying concept (it asks the right questions).
3. Sensitivity
• The ability to discriminate meaningful differences
between attitudes. The more categories the more sensitive
(but less reliable).
• Reliability can be more easily determined than validity
• If it is reliable, it may or may not be valid
• If a measure is valid, it must also be reliable
• If it is not reliable, it cannot be valid
• If it is not valid, it may or may not be reliable
Reliability and Validity

[Figure: three panels titled "Neither Reliable Nor Valid", "Reliable But Not Valid" and "Reliable And Valid"]
Example of low validity, high
reliability
• Scale is perfectly accurate, but is capturing
the wrong thing; for example, it measures
consumers’ interest in creative writing rather
than preference for kinds of stationery.
Example of modest validity,
low reliability
• Scale genuinely measures consumers’ interest in kinds of stationery, but poorly worded items, sloppy administration and data entry errors lead to random errors in the data.
