5 - Types of Errors PDF
V. Types of errors
Content
1. Inference in surveys
2. Sampling errors
2.1 Coverage error
2.2 Sampling error
2.3 Nonresponse error
2.4 Adjustment error
1. Inference in surveys
Inference I: We use an answer to a question from a respondent to draw inferences about the characteristics of that person.
Associated errors:
• Measurement side: NON-SAMPLING (MEASUREMENT) ERRORS
• Representation side: SAMPLING ERRORS
[Figure: sources of error along the measurement and representation paths]
Measurement: Construct (µi) → [Validity] → Measurement (Yi) → [Measurement error] → Response (yi) → [Processing error] → Edited response (yip)
Representation: Target population (Y) → [Coverage error] → Sampling frame (YC) → [Sampling error] → Sample → [Nonresponse error] → Respondents (Yr) → [Adjustment error] → Postsurvey adjustments (Yrw)
Both paths end in the survey statistic (yprw).
In both cases, errors can be:
• RANDOM
• SYSTEMATIC (bias)
2. Sampling errors
2.1 Coverage error: discrepancy between the target population and the sampling frame.
[Figure: target population vs. covered population; the part of the target population that the frame misses is undercoverage]
Example: in the United States there is no up-to-date list of residents that can be used as a sampling
frame of people. Sample surveys whose target population is all US residents therefore often use sampling
frames of telephone numbers. But people with lower incomes and those in remote rural areas are less
likely to have telephones in their homes, while young urban residents tend to have only mobile
phones.
– Telephone directories
• People without telephones and cell-phone-only users are missing
• Several telephone numbers may belong to one person
– Customers, employees, members of an organization:
• Up to date?
• Duplicates, temporary absences
• Addresses that describe a role rather than an individual (e.g. secretary)
• Freelancers included?
– Frames for web surveys
• For certain populations (e.g. UdG students): everyone has internet access and a known
e-mail address
• For general population:
– Coverage problems always arise (does everyone have internet access? are the addresses known?)
– Panels: lists of e-mail addresses (+incentives)
Coverage bias? It depends on the variable: people who use cell phones exclusively may not differ
significantly in vote choice but might differ greatly in attitudes toward
technology.
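The cell-phone example can be made concrete with a small sketch. It assumes the usual decomposition of coverage bias for a mean: the covered-population mean minus the target-population mean, which equals the uncovered share times the difference between covered and uncovered means. The function name and all figures below are hypothetical illustrations, not data from any survey.

```python
def coverage_bias(n_covered, mean_covered, n_uncovered, mean_uncovered):
    """Covered-population mean minus target-population mean."""
    n_target = n_covered + n_uncovered
    target_mean = (n_covered * mean_covered + n_uncovered * mean_uncovered) / n_target
    return mean_covered - target_mean

# Hypothetical frame covering 8,000 of 10,000 people in the target population.
# Vote choice: covered and uncovered groups are nearly alike.
vote = coverage_bias(8000, 0.52, 2000, 0.51)
# Attitude toward technology: the uncovered (cell-phone-only) group differs a lot.
tech = coverage_bias(8000, 0.40, 2000, 0.80)

print(f"bias on vote choice:         {vote:+.3f}")
print(f"bias on technology attitude: {tech:+.3f}")
```

With the same 20% undercoverage, the bias is negligible for the first variable and substantial for the second, because only the second variable differs between covered and uncovered people.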
2.2 Sampling error: gap between the sampling frame and the sample.
• Not everyone in the sampling frame is measured, so this error is introduced
deliberately.
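A minimal simulation can illustrate this random component, assuming simple random sampling without replacement from a fully known frame. The frame values, sample sizes, and helper name are invented for the illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame of 10,000 values (e.g. a score for each person).
frame = [random.gauss(50, 10) for _ in range(10_000)]

def sample_means(frame, n, draws=500):
    """Means of repeated simple random samples of size n drawn from the frame."""
    return [statistics.fmean(random.sample(frame, n)) for _ in range(draws)]

# The spread of the sample means around the frame mean is the sampling error.
spread_small = statistics.stdev(sample_means(frame, 100))
spread_large = statistics.stdev(sample_means(frame, 400))

print(f"frame mean: {statistics.fmean(frame):.2f}")
print(f"spread of sample means, n=100: {spread_small:.2f}")
print(f"spread of sample means, n=400: {spread_large:.2f}")
```

Quadrupling the sample size roughly halves the spread of the estimates, which is why sampling error, unlike coverage error, can be controlled through the sample design.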
2.3 Nonresponse error: gap between the sample and the respondents.
Causes:
[Figure: factors that influence the initial and final decision to respond]
Elements shown: contactability; pre-notification; incentives; burden; initial decision; respondent/interviewer match; interviewer behavior; persuasion letters; mode switch; interviewer switch; two-phase sampling; post-survey adjustment; final decision.
(!) Nonresponse bias exists when the causes of nonresponse are linked to the
survey statistics being measured.
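The condition above can be sketched numerically. The sketch assumes the usual deterministic expression for the nonresponse bias of a mean: the nonresponse rate times the difference between respondent and nonrespondent means. The function name and the figures are hypothetical.

```python
def nonresponse_bias(n_resp, mean_resp, n_nonresp, mean_nonresp):
    """Respondent mean minus full-sample mean, for a sample split into two groups."""
    nonresponse_rate = n_nonresp / (n_resp + n_nonresp)
    return nonresponse_rate * (mean_resp - mean_nonresp)

# Hypothetical sample of 1,000 with 530 respondents.
# Case 1: the reason for not responding is linked to the statistic
# (nonrespondents have a lower value on the survey variable).
linked = nonresponse_bias(530, 0.12, 470, 0.06)
# Case 2: nonresponse is unrelated to the statistic.
unlinked = nonresponse_bias(530, 0.12, 470, 0.12)

print(f"bias when nonresponse is linked to the variable:    {linked:+.4f}")
print(f"bias when nonresponse is unrelated to the variable: {unlinked:+.4f}")
```

A low response rate alone does not produce bias; the bias appears only when respondents and nonrespondents differ on the variable of interest.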
2.4 Adjustment error
Post-survey adjustments are efforts to improve the sample estimate in the face of
coverage, sampling, and nonresponse error.
If adjustments are not carried out properly, they become an error source themselves: they
can increase rather than reduce error.
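One common adjustment is post-stratification weighting, sketched below under the assumption that the population share of each group is known from an external source. The groups, shares, and responses are invented, and the function is an illustration rather than a prescribed procedure.

```python
def poststratified_mean(respondents, population_shares):
    """Weight each group's respondent mean by its known population share.

    respondents: list of (group, value) pairs
    population_shares: dict mapping group -> share of the target population
    """
    by_group = {}
    for group, value in respondents:
        by_group.setdefault(group, []).append(value)
    missing = set(population_shares) - set(by_group)
    if missing:
        raise ValueError(f"no respondents in groups: {sorted(missing)}")
    return sum(
        share * sum(by_group[group]) / len(by_group[group])
        for group, share in population_shares.items()
    )

# Hypothetical respondents: young people are under-represented (20 of 100).
respondents = (
    [("young", 1)] * 16 + [("young", 0)] * 4
    + [("old", 1)] * 24 + [("old", 0)] * 56
)
unweighted = sum(value for _, value in respondents) / len(respondents)
weighted = poststratified_mean(respondents, {"young": 0.5, "old": 0.5})

print(f"unweighted estimate: {unweighted:.2f}")
print(f"weighted estimate:   {weighted:.2f}")
```

The weighted estimate is only as good as its inputs: if the population shares are wrong, or if respondents within a group differ from that group's nonrespondents, the adjustment can move the estimate further from the truth.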
3. Non-sampling (measurement) errors
3.1 Construct validity: the extent to which the measure (one or more questions)
reflects the true value of the construct of interest for each individual.
Example:
• Construct: price sensitivity with respect to ecologically grown foods
• We ask: “Would you be willing to pay 10% more for an ecologically grown apple?”
• Those with a tendency to give favourable answers will systematically overstate µi
• Those wishing to present themselves as ecologically minded or health conscious will systematically
overstate µi
• Those who do not like apples at all will systematically understate µi
CFA: Convergent and Discriminant Validity
[Figure: two-factor CFA path diagram; the two factors are correlated (φ12)]
Factor 1: Satisfaction with the establishment
• λ11, e1: My expectations of the establishment have been met at all times
• λ21, e2: I have always felt satisfied with the establishment
• λ31, e3: The level of satisfaction attained was high compared to that of other similar establishments
Factor 2: Satisfaction with the product
• λ42, e4: I am satisfied with the tiles acquired
• λ52, e5: My expectations of the tiles purchased have been fulfilled
• λ62, e6: Compared to other tiles that I have seen the degree of satisfaction is high
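The path diagram can be read as a set of measurement equations. The sketch below assumes the standard CFA notation, with x_1 to x_6 for the six items in the order shown and ξ_1, ξ_2 for the two satisfaction factors; these symbols do not appear in the diagram and are introduced only for illustration.

```latex
\begin{aligned}
x_1 &= \lambda_{11}\,\xi_1 + e_1, & x_4 &= \lambda_{42}\,\xi_2 + e_4,\\
x_2 &= \lambda_{21}\,\xi_1 + e_2, & x_5 &= \lambda_{52}\,\xi_2 + e_5,\\
x_3 &= \lambda_{31}\,\xi_1 + e_3, & x_6 &= \lambda_{62}\,\xi_2 + e_6,\\
\operatorname{Cov}(\xi_1,\xi_2) &= \phi_{12}.
\end{aligned}
```

Under this reading, convergent validity is usually judged from the loadings (each item should load strongly on its own factor), and discriminant validity from φ12 (in standardized form, the correlation between the two factors should be clearly below 1).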
3.2 Measurement error: gap between the ideal measurement and the response obtained.
Sources of error:
• Social desirability
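Measurement error can be random or systematic, and a short simulation may help separate the two. The sketch assumes a hypothetical true score per respondent, purely random response noise in one case, and in the other case the same kind of noise plus a constant upward shift standing in for social desirability. All values are invented.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical true values of the construct for 5,000 respondents.
true_values = [random.gauss(5.0, 1.0) for _ in range(5000)]

# Random error: noise centred on zero around each true value.
random_error = [t + random.gauss(0, 1.0) for t in true_values]
# Systematic error: the same kind of noise plus a constant shift of 0.8,
# standing in for socially desirable over-reporting.
systematic_error = [t + 0.8 + random.gauss(0, 1.0) for t in true_values]

true_mean = statistics.fmean(true_values)
bias_random = statistics.fmean(random_error) - true_mean
bias_systematic = statistics.fmean(systematic_error) - true_mean

print(f"bias of the mean, random error only:     {bias_random:+.3f}")
print(f"bias of the mean, with systematic error: {bias_systematic:+.3f}")
print(f"spread of true values: {statistics.stdev(true_values):.2f}")
print(f"spread of responses:   {statistics.stdev(random_error):.2f}")
```

Random error leaves the mean close to the truth but inflates the variability of the responses; systematic error shifts the mean itself and does not average out as the sample grows.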
3.3 Processing error: gap between the variable used in estimation and the
response provided by a respondent.
Some examples:
– Data entry errors.
– Misinterpretation of an answer.
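Some data entry errors can be caught with a simple range check during editing. The sketch below assumes a hypothetical item coded on a 5-point scale; the field name, the valid codes, and the records are invented for the illustration.

```python
# Hypothetical valid codes for a 5-point satisfaction item.
VALID_CODES = {1, 2, 3, 4, 5}

def flag_invalid(records, field, valid_codes):
    """Return the positions of records whose value for `field` is not a valid code."""
    return [
        index
        for index, record in enumerate(records)
        if record.get(field) not in valid_codes
    ]

records = [
    {"id": 1, "satisfaction": 4},
    {"id": 2, "satisfaction": 44},    # keying error: 4 typed twice
    {"id": 3, "satisfaction": None},  # value lost during data entry
    {"id": 4, "satisfaction": 2},
]

flagged = flag_invalid(records, "satisfaction", VALID_CODES)
print("records to review:", [records[i]["id"] for i in flagged])
```

A range check only catches values outside the valid set. A 4 keyed as a 5 passes unnoticed, so checks like this reduce processing error without eliminating it.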
Exercises
For each of the following design decisions, identify which error sources might be
affected.
e) The decision to increase the number of questions about assets and income in a
survey of income dynamics, resulting in a lengthening of the interview.
Exercises
• A recent newspaper article reported that "sales of handheld digital devices (e.g.,
ebooks, PDAs) are up by nearly 10% in the last quarter, while sales of laptops and
desktop PCs have remained stagnant."
• This report was based on the results of an on-line survey in which 9.8% of the more
than 126,000 respondents said that they had "purchased a handheld digital device
between January 1 and March 30 of this year."
• E-mails soliciting participation in this survey were sent to individuals using an e-mail
address frame from the five largest commercial Internet service providers (ISPs) in
the United States.
• Data collection took place over a 6-week period beginning May 1, 2002. The overall
response rate achieved in this survey was 53%.
• Assume that the authors of this study wanted to infer something about the expected
purchases of US adults (aged 18 and over).
b) How might the design of this survey affect the following sources of error: coverage
error, nonresponse error, and measurement error?
c) Without changing the duration or the mode of this survey (i.e., computer assisted,
self-administration), what could be done to reduce the errors you outlined in (b)?
d) To lower the cost of this survey in the future, researchers are considering cutting
the sample in half, using an e-mail address frame from only the two largest ISPs.
What effect (if any) will these changes have on sampling error and coverage error?