Measuring Validity, Reliability and Threats to Research Studies

 Joppe (2000) provides the following
explanation of what validity is:
◦ Validity determines whether the research truly
measures that which it was intended to measure or
how truthful the research results are.
◦ In other words, does the research instrument allow
you to hit "the bull’s eye" of your research object?
◦ Researchers generally determine validity by asking
a series of questions, and will often look for the
answers in the research of others.
 What is External Validity?
 In 1966, Campbell and Stanley proposed the
commonly accepted definition of external validity.
◦ “External validity asks the question of
generalizability: To what populations, settings,
treatment variables and measurement variables
can this effect be generalized?”
 External Validity is usually split into two distinct
types, population validity and ecological validity,
and they are both essential elements in judging
the strength of an experimental design.
 Population validity is a type of external validity
which describes how well findings from the sample can be
extrapolated to the population as a whole.
 It evaluates whether the sample population
represents the entire population, and also whether
the sampling method is acceptable.
 For example, an educational study that looked at a
single school could not be generalised to cover
children at every school.
 On the other hand, a federally commissioned study
that tested every pupil of a certain age group would
have exceptionally strong population validity.
 Ecological validity is a type of external validity which
looks at the testing environment and determines
how much it influences behavior.
 In the school test example, if the pupils are used to
regular testing, then the ecological validity is high
because the testing process is unlikely to affect
behavior.
 On the other hand, taking each child out of class and
testing them individually, in an isolated room, will
dramatically lower ecological validity. The child may
be nervous or ill at ease and is unlikely to perform in
the same way as they would in a classroom.
 What is Internal Validity?
 Put simply, internal validity is the
confidence that we can place in the
cause-and-effect relationship in a study.
 The key question that you should ask in any
experiment is:
◦ “Could there be an alternative cause, or causes,
that explain my observations and results?”
 To take an extreme example, a physics
experiment into the effect of heat on the
conductivity of a metal has high internal
validity, because few alternative causes could
plausibly explain the result.
 Internal Validity is a measure which ensures
that a researcher's experiment design closely
follows the principle of cause and effect.
 Test Validity is an indicator of how much meaning
can be placed upon a set of test results.
 It is the degree to which a test procedure
accurately measures what it was designed
to measure.
 Criterion Validity assesses whether a test reflects a
certain set of abilities.
 The validity of a selection test, for instance, is assessed by
comparing the test scores with a non-test
criterion. For example, a test
for leadership skills is validated by matching its scores against
the traits displayed by known leaders.
 Concurrent validity is a measure of how well a particular
test correlates with a previously validated measure. It is
commonly used in social science, psychology and
education.
 The tests are for the same, or very closely related,
constructs and allow a researcher to validate new
methods against a tried and tested stalwart.
 IQ, Emotional Quotient, and most school
grading systems are examples of established tests
that are regarded as having high validity. One
common way of looking at concurrent validity is as
measuring a new test or procedure against a
gold-standard benchmark.
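One way to make "correlates with a previously validated measure" concrete is to compute a correlation coefficient between the two sets of scores. The sketch below is only an illustration: the slides do not name a statistic, so the choice of Pearson's r is an assumption, and the scores and the names `new_test` and `benchmark` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same six participants (invented for illustration).
new_test = [12, 15, 9, 20, 17, 11]      # scores on the new test
benchmark = [48, 55, 40, 70, 62, 45]    # scores on the established test

r = pearson_r(new_test, benchmark)
print(f"correlation with benchmark: {r:.2f}")
```

A coefficient close to 1 would suggest the new test ranks participants much as the established one does. What counts as "high enough" depends on the field and is not specified in the slides.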
 Predictive validity is a measure of how well a test
predicts abilities.
 It involves testing a group of subjects for a certain
construct and then comparing them with results
obtained at some point in the future.
 Most educational and employment tests are used to
predict future performance, so predictive validity is
regarded as essential in these fields.
 Content Validity is the estimate of how much a
measure represents every single element of a
construct.
 For example, an educational test with strong content
validity will represent the subjects actually taught to
students, rather than asking unrelated questions.
 Face Validity is a measure of how
representative a research project is "at face
value," and whether it appears to be a good
project.
 In other words, a test can be said to have face
validity if it "looks like" it is going to measure
what it is supposed to measure.
 For instance, if you prepare a test to measure
whether students can perform multiplication,
and the people you show it to all agree that it
looks like a good test of multiplication ability,
you have shown the face validity of your test.
 What is Reliability?
 The idea behind reliability is that any
significant results must be more than a
one-off finding and be inherently repeatable.
 Other researchers must be able to perform
exactly the same experiment, under the same
conditions and generate the same results.
 This will reinforce the findings and ensure
that the wider scientific community will
accept the hypothesis.
 Reliability is an essential component of
validity but, on its own, is not a sufficient
measure of validity. A test can be reliable but
not valid, whereas a test cannot be valid yet
unreliable.
 Reliability, in simple terms, describes the
repeatability and consistency of a test.

 Validity defines the strength of the final
results and whether they can be regarded as
accurately describing the real world.
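The claim that a test can be reliable but not valid can be illustrated with a measuring instrument that gives consistent but systematically wrong readings. The sketch below is a simplified illustration under stated assumptions: the readings, the "true" weight, and the helper name `summarize` are all hypothetical, and treating spread as a stand-in for reliability and bias as a stand-in for validity is a simplification.

```python
from statistics import mean, pstdev


def summarize(readings, true_value):
    """Return (spread, bias) for repeated readings of a known quantity."""
    spread = pstdev(readings)            # small spread -> consistent readings
    bias = mean(readings) - true_value   # near-zero bias -> accurate on average
    return spread, bias


TRUE_WEIGHT = 70.0  # hypothetical true weight in kg

# Hypothetical scale A: consistent but miscalibrated (reads about 5 kg high).
scale_a = [75.1, 74.9, 75.0, 75.2, 74.8]
# Hypothetical scale B: equally consistent and well calibrated.
scale_b = [70.1, 69.9, 70.0, 70.2, 69.8]

for name, readings in [("A", scale_a), ("B", scale_b)]:
    spread, bias = summarize(readings, TRUE_WEIGHT)
    print(f"scale {name}: spread={spread:.2f} kg, bias={bias:+.2f} kg")
```

Both scales are equally repeatable, but only scale B describes the real weight accurately. Scale A is the "reliable but not valid" case.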
Threats to Internal & External Validity
 Is the investigator’s conclusion correct?
 Are the changes in the independent variable
indeed responsible for the observed variation
in the dependent variable?
 Might the variation in the dependent variable
be attributable to other causes?
Why is Internal Validity Important?
 We often conduct research in order to determine
cause-and-effect relationships.
 Can we conclude that changes in the independent
variable caused the observed changes in the
dependent variable?
 Is the evidence for such a conclusion good or
poor?
 If a study shows a high degree of internal validity
then we can conclude we have strong evidence of
causality.
 If a study has low internal validity, then we must
conclude we have little or no evidence of
causality.
 History
 Events outside of the study/experiment or
between repeated measures of the dependent
variable may affect participants' responses to
experimental procedures.
 Often, these are large scale events (natural
disaster, political change, etc.) that affect
participants' attitudes and behaviors such that it
becomes impossible to determine whether any
change on the dependent measures is due to
the independent variable, or the historical
event.
 For example, consider a researcher testing
whether a particular drug increases anxiety.
The researcher measures New York City residents'
anxiety on September 5, 2001, gives them the
drug, and measures them again 20 days later.
Scores are likely to have increased because of the
terrorist attacks on September 11th, an external
event, whether or not the drug had any effect.
 In addition to well-publicized national events,
history effects can include subtle factors such as
changes in the weather (for example, improving
mood because people are outside more) or
changes in public policy (for example, increasing
stress because of changes to bankruptcy laws).
 Maturation
Subjects change as a function of time rather than of the
independent variable (for example, by growing older, becoming
more experienced, or experiencing physiological
changes).
 Examples:
Assessing a new teaching technique in children over a
school year may be very difficult because of the changes
in children that naturally occur in development during a
year.
Participants in a long-running study may become more
conscious of health issues as they age, and therefore
begin to reduce smoking.
The performance of first graders in a learning experiment
begins decreasing after 45 minutes because of fatigue.
 Subjects change during the course of the
experiment or even between measurements.
 For example, young children might mature and
their ability to concentrate may change as they
grow up.
 Both permanent changes, such as physical growth,
and temporary ones, like fatigue, provide "natural"
alternative explanations; thus, they may change
the way a subject would react to the independent
variable.
 So upon completion of the study, the researcher
may not be able to determine if the cause of the
discrepancy is due to time or the independent
variable.
 Instrumentation
The instruments used for measuring the behavior
may change over time (especially if they’re human),
or the way in which the participants were measured
may change (instrument decay occurs when the
standards of measurement change over time).
Did any change occur during the study in the way
the dependent variable was measured?
 Example:
Two examiners for an instructional experiment
administered the post-test with different
instructions and procedures.
 Volunteer problem
Participants who volunteer to be part of the
study (and are not paid in any way) tend to be
“different” from those who don’t volunteer
(Bell, 1962).
Volunteers tend to be
• More unconventional
• More self-confident
• More extraverted
• Higher in need for achievement
Avoid this by “paying” subjects in some
form.
 Confounding
 A major threat to the validity of causal inferences
is confounding: changes in the dependent variable may
instead be attributable to the presence of, or variation in the
degree of, a third variable which is related to the
manipulated variable.
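A small worked example may help show how a third variable can produce an apparent effect. The sketch below compares a crude difference between groups with the difference inside each level of the third variable. Comparing within levels (stratifying) is one common check and is not described in the slides. The records, the age bands, and the helper name `mean_difference` are hypothetical.

```python
from statistics import mean

# Hypothetical records: (group, age_band, outcome). Older participants are
# over-represented in the treated group and also score higher regardless of group.
records = [
    ("control", "young", 10), ("control", "young", 11),
    ("control", "young", 9), ("control", "young", 10),
    ("treated", "young", 10),
    ("control", "old", 20),
    ("treated", "old", 20), ("treated", "old", 21),
    ("treated", "old", 19), ("treated", "old", 20),
]


def mean_difference(rows):
    """Mean outcome of treated rows minus mean outcome of control rows."""
    treated = [y for g, _, y in rows if g == "treated"]
    control = [y for g, _, y in rows if g == "control"]
    return mean(treated) - mean(control)


# Crude comparison that ignores the third variable (age band).
crude = mean_difference(records)

# The same comparison made separately within each age band.
by_band = {
    band: mean_difference([r for r in records if r[1] == band])
    for band in ("young", "old")
}

print("crude difference:", crude)
print("within age bands:", by_band)
```

In this invented data the crude comparison suggests a treatment effect, yet the difference vanishes within each age band: the apparent effect is carried entirely by age.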
 Repeated testing (also referred to as testing effects)
 Repeatedly measuring the participants may lead to bias.
Participants may remember the correct answers or may
be conditioned to know that they are being tested.
Repeatedly taking (the same or similar) intelligence tests
usually leads to score gains. Rather than concluding
that the underlying skills have changed for good, a researcher
should treat this testing effect as a plausible rival hypothesis.
 Selection bias
 Refers to selecting participants for the various groups
in the study. Are the groups equivalent at the
beginning of the study?
 It refers to the problem that, at pre-test, differences
between groups exist that may interact with the
independent variable and thus be 'responsible' for the
observed outcome.
 Researchers and participants bring to the experiment a
myriad of characteristics, some learned and others
inherent.
 Examples include sex, weight, hair, eye, and skin color,
personality, mental capabilities, and physical abilities,
as well as attitudes such as motivation or willingness to
participate.
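A standard safeguard against non-equivalent groups is random assignment, which the slides do not name explicitly. The following is a minimal sketch under that assumption: the function name `randomly_assign`, the group labels, and the participant IDs are all hypothetical.

```python
import random


def randomly_assign(participants, groups=("control", "experimental"), seed=None):
    """Shuffle participants, then deal them into groups in round-robin order."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


participants = [f"P{n:02d}" for n in range(1, 11)]  # hypothetical participant IDs
assignment = randomly_assign(participants, seed=42)
for group, members in assignment.items():
    print(group, sorted(members))
```

Because chance alone decides who lands in which group, pre-existing characteristics such as motivation are not systematically tied to the condition. Random assignment does not guarantee identical groups, especially in small samples.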
 Mortality/differential attrition
 This error occurs if inferences are made on the basis
of only those participants who have participated from
the start to the end.
 However, participants may have dropped out of the
study before completion, possibly because of
the study, programme or experiment itself.
 For example, in a health experiment designed to determine the
effect of various exercises, those subjects who find
the exercise most difficult stop participating.
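A simple first check for differential attrition is to compare dropout rates across groups. The sketch below assumes only enrolment and completion counts are available. The counts, the group names, and the helper `dropout_rate` are hypothetical, and no threshold for a "worrying" gap is implied by the slides.

```python
def dropout_rate(enrolled, completed):
    """Fraction of enrolled participants who did not complete the study."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("need enrolled > 0 and 0 <= completed <= enrolled")
    return (enrolled - completed) / enrolled


# Hypothetical (enrolled, completed) counts per exercise condition.
groups = {
    "light exercise": (50, 45),
    "hard exercise": (50, 30),
}

rates = {name: dropout_rate(e, c) for name, (e, c) in groups.items()}
gap = max(rates.values()) - min(rates.values())

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} dropped out")
print(f"gap between groups: {gap:.0%}")
```

A large gap suggests that the participants who remain in one condition may differ systematically from those who remain in the other, so a comparison of completers alone could be misleading.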
 Experimenter bias
 Experimenter bias occurs when the individuals
who are conducting an experiment
inadvertently affect the outcome by
unconsciously behaving differently toward members of
the control and experimental groups.
 It is possible to guard against
experimenter bias through the use of
double-blind study designs, in which the experimenter
is not aware of the condition to which a
participant belongs.
 Contamination:
 This occurs when there is communication about the
experiment between groups of participants.
 Three possible outcomes of contamination:
• resentment: some participants’ performance may
worsen because they resent being in a less
desirable condition;
• rivalry: participants in a less desirable condition
may boost their performance so they don’t look
bad; and
• diffusion of treatments: control participants learn
about a treatment and apply it to themselves.
 Threats to external validity
 A threat to external validity is an explanation
of how you might be wrong in making a
generalization.
 Generally, generalizability is limited when the
cause (i.e. the independent variable) depends
on other factors; therefore, all threats to
external validity interact with the independent
variable.
 Aptitude–treatment Interaction: The sample
may have certain features that may interact
with the independent variable, limiting
generalizability.
 For example, comparative psychotherapy
studies often employ specific samples
(e.g. volunteers, the highly depressed).
 If psychotherapy is found effective for these
sample patients, will it also be effective for
non-volunteers, the mildly depressed, or
patients with other concurrent disorders?
 Situation:
 All situational specifics (e.g. treatment conditions,
time, location, lighting, noise, treatment
administration, investigator, timing, scope and
extent of measurement, etc.) of a study potentially
limit generalizability.
 Pre-test effects:
 If cause-effect relationships can only be found when
pre-tests are carried out, then this also limits the
generality of the findings.
 Post-test effects:
 If cause-effect relationships can only be found when
post-tests are carried out, then this also limits the
generality of the findings.
