
VALIDITY

Dr. Sajid Masood


Acknowledgment
• Cohen, L., Manion, L., & Morrison, K. (2007). Research methods in education. New York: Routledge.
• Fraenkel, J.R., & Wallen, N.E. (2009). How to design and evaluate research in education. New York: McGraw-Hill.
• DeVellis, R.F. (2003). Scale development: Theory and applications. Thousand Oaks, CA: Sage.
• Gregory, R.J. (1992). Psychological testing: History, principles and applications. Boston: Allyn and Bacon.
Some definitions…
• Validity: “The soundness or appropriateness of a test or instrument in measuring what it is designed to measure” (Vincent, 1999)

Some definitions…
• Validity: “Degree to which a test or instrument measures what it purports to measure” (Thomas & Nelson, 1996)

Validity (Fraenkel & Wallen, 2009)
• Validity has been defined as referring to the appropriateness, correctness, meaningfulness, and usefulness of the specific inferences researchers make based on the data they collect.
• Validation is the process of collecting and analyzing evidence to support such inferences.
Validity
• Logical
– Face
– Content
• Construct
• Statistical
– Concurrent
– Predictive

Content Validity
• It refers to the content and format of the
instrument
– How appropriate is the content?
– How comprehensive?
– Does it logically get at the intended variable?
– How adequately does the sample of items or questions
represent the content to be assessed?
– Is the format appropriate?
• The content and format must be consistent with the
definition of the variable and the sample of subjects
to be measured.
Content Validity
• A key element is the adequacy of the sampling of the
domain it is supposed to represent.
• The other aspect of content validation is the format of the
instrument.
– Clarity of printing, size of type, adequacy of work space (if needed), appropriateness of language, clarity of directions, etc.
• Attempts to obtain evidence that the items measure what they are supposed to measure typify the process of gathering content-related evidence.
• Content validation, therefore, is partly a matter
of determining if the content that the instrument
contains is an adequate sample of the domain of
content it is supposed to represent.
Content-Related Evidence
• Items need to effectively act as a representative sample of all the possible questions that could have been derived from the construct (Crocker & Algina, 1986; DeVellis, 1991; Gregory, 1992).
How to ensure content validity?
• Content validity is usually built in during test development.
– It is not generally evaluated empirically.
– It rests on the judgment of subject matter experts (one way to quantify such judgments is sketched below).
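
Expert judgment can also be summarized numerically. One widely used index is Lawshe's content validity ratio (CVR); the minimal Python sketch below uses a hypothetical panel of ten raters.

```python
# Minimal sketch: Lawshe's content validity ratio (CVR), one common way
# to summarize subject-matter-expert judgments of a single item.
# CVR = (n_e - N/2) / (N/2), where n_e = number of experts rating the
# item "essential" and N = panel size. All ratings below are hypothetical.

def content_validity_ratio(ratings):
    """ratings: list of expert judgments such as 'essential'."""
    n = len(ratings)
    n_essential = sum(1 for r in ratings if r == "essential")
    return (n_essential - n / 2) / (n / 2)

# Ten hypothetical experts rate one item; eight call it essential.
panel = ["essential"] * 8 + ["useful", "not necessary"]
print(f"CVR = {content_validity_ratio(panel):+.2f}")  # prints CVR = +0.60
```

Items with a CVR near +1.00 are judged essential by nearly all experts; items near or below 0 are candidates for revision or removal.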
Criterion-related evidence of validity
• The relationship between scores obtained using the instrument and scores obtained using another instrument or measure (often called a criterion).
– How strong is this relationship?
– How well do such scores estimate present performance or predict future performance of a certain type?
Criterion-related evidence of validity
• For example, if an instrument has been
designed to measure academic ability,
student scores on the instrument might be
compared with their grade-point averages
(the external criterion).
Criterion-Related Evidence
• A criterion is a second test presumed to measure
the same variable.
• There are two forms of criterion-related validity:
1. Predictive validity: a time interval elapses between administering the instrument and obtaining criterion scores
• For example, a researcher might administer a
science aptitude test to a group of high school
students and later compare their scores on the
test with their end-of-semester grades in
science courses.
Criterion-Related Evidence
2. Concurrent validity: instrument data and
criterion data are gathered and compared at the
same time
• An example is when a researcher
administers a self-esteem inventory to a
group of eighth-graders and compares
their scores on it with their teachers’
ratings of student self-esteem obtained at
about the same time.
Criterion-Related Evidence
• A Correlation Coefficient (r) indicates the
degree of relationship that exists between the
scores of individuals obtained by two
instruments.
• When a correlation coefficient is used to
describe the relationship between a set of scores
obtained by the same group of individuals on a
particular instrument and their scores on some
criterion measure, it is called a validity
coefficient.
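
As an illustration, a validity coefficient can be computed directly as the Pearson correlation between the two sets of scores. The minimal Python sketch below uses hypothetical instrument scores and GPAs.

```python
# Minimal sketch: computing a validity coefficient as the Pearson r
# between instrument scores and criterion scores (all data hypothetical).
import numpy as np

# Eight students' scores on a new academic-ability instrument (x)
# and their grade-point averages as the external criterion (y).
instrument_scores = np.array([52, 61, 47, 70, 66, 55, 73, 58])
criterion_gpa = np.array([2.4, 2.9, 2.1, 3.6, 3.3, 2.6, 3.8, 2.7])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the Pearson r between the two score sets -- here, the validity
# coefficient of the instrument against the GPA criterion.
r = np.corrcoef(instrument_scores, criterion_gpa)[0, 1]
print(f"Validity coefficient r = {r:.2f}")
```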
Construct-related evidence of validity
• It refers to the nature of the
psychological construct or
characteristic being measured by the
instrument.
– How well does a measure of the
construct explain differences in the
behavior of individuals or their
performance on certain tasks?
Construct-Related Evidence
• Construct: something constructed by
mental synthesis
– What is Intelligence? Beauty? Quality?
• Construct Validity Evidence
– assembling evidence about what a test
means (and what it doesn’t)
– sequential process; generally takes several
studies
Construct-Related Evidence
• Considered the broadest of the three categories.
• There is no single piece of evidence that
satisfies construct-related validity.
• Researchers attempt to collect a variety of types
of evidence, including both content-related and
criterion-related evidence.
• The more evidence researchers have from
different sources, the more confident they
become about the interpretation of the
instrument.
Ensuring Construct Validity
• Independent judges all indicate that all items on
the test require mathematical reasoning.
• Independent judges all indicate that the features
of the test itself (such as test format, directions,
scoring, and reading level) would not in any way
prevent students from engaging in mathematical
reasoning.
• Independent judges all indicate that the sample of
tasks included in the test is relevant and
representative of mathematical reasoning tasks.
• A high correlation exists between scores on the
test and grades in mathematics.
• High scores have been made on the test by
students who have had specific training in
mathematical reasoning.
• Students actually engage in mathematical
reasoning when they are asked to “think
aloud” as they go about trying to solve the
problems on the test.
• A high correlation exists between scores on the
test and teacher ratings of competence in
mathematical reasoning.
• Higher scores are obtained on the test as
compared to other subjects.
How correlation is calculated
• Validity coefficient: a correlation coefficient that indicates the degree to which a measure predicts or estimates performance on some criterion measure (e.g., the correlation between scholastic aptitude scores and grades in school). It ranges from -1.00 to +1.00.
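
Concretely, the validity coefficient is the Pearson product-moment correlation between instrument scores x and criterion scores y:

$$ r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\;\sqrt{\sum_{i}(y_i - \bar{y})^2}} $$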
Scatter Plots and Types of Correlation
[Scatter plot: Math SAT score (x, 300–800) vs. GPA (y, 1.50–4.00)]
Positive correlation: as x increases, y increases.
Scatter Plots and Types of Correlation
[Scatter plot: hours of training (x, 0–20) vs. number of accidents (y, 0–60)]
Negative correlation: as x increases, y decreases.
Scatter Plots and Types of Correlation
[Scatter plot: height (x, 60–80) vs. IQ (y, 80–160)]
No linear correlation.
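
The three patterns can be reproduced with simulated data. The Python sketch below (all numbers hypothetical) computes r for each case.

```python
# Minimal sketch: simulating the three correlation patterns above and
# computing Pearson r for each (all data hypothetical).
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Positive correlation: GPA tends to rise with SAT score.
sat = rng.uniform(300, 800, n)
gpa = 1.0 + 0.004 * sat + rng.normal(0, 0.3, n)

# Negative correlation: accidents tend to fall as training hours rise.
hours = rng.uniform(0, 20, n)
accidents = 50 - 2.2 * hours + rng.normal(0, 5, n)

# No linear correlation: IQ is unrelated to height.
height = rng.uniform(60, 80, n)
iq = rng.normal(110, 15, n)

for label, x, y in [("SAT vs GPA", sat, gpa),
                    ("Training vs accidents", hours, accidents),
                    ("Height vs IQ", height, iq)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{label}: r = {r:+.2f}")
```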
Factors affecting validity
• Unclear test directions
• Confusing and ambiguous test items
• Vocabulary that is too difficult for test takers
• Overly difficult and complex sentence structure
• Inconsistent and subjective scoring
• Untaught items
• Failure to follow standardized administration
procedures
• Cheating by participants, or someone teaching to the test items
Threats to Internal Validity
Internal Validity
• Defending against sources of bias arising in
research design.
• Indicates whether the independent variable was
the sole cause of the change in the dependent
variable
• When there is a lack of internal validity, variables other than the independent variable(s) being studied may be responsible for part or all of the observed effect on the dependent variable(s).
PhD Dissertation, Kent State University
• Task-based language teaching versus traditional way of language teaching in Saudi Intermediate School: A comparative study
• Independent Variable
– Teaching method
• Task-based
• Traditional
• Dependent Variable
– Students' scores on a reading comprehension achievement test
Subject Characteristics Threat
• Subjects differ in characteristics like
– Age
– Gender
– Grade in school
– Intelligence
– SES
• Control is required
Loss of Subjects (Mortality Threat)
• Mortality (attrition)
– Drop out of school
– Move out of town
– Stop participating
• Affects generalizability
• Is replacement a solution?
• Inform participants in advance
• Try to minimize loss
Location Threat
• Environmental factors
– Factors related to the setting
• Lighting
• Noise
• Changes in setting
• Hold location constant
Instrumentation
• Instrument decay
– The instrument changes over time
• e.g., a teacher tires while grading many papers
• a battery grows weaker
– Schedule data collection and scoring to avoid decay
Instrumentation
• Data collector characteristics
– Gender
– Age
– Language
– Authority figure
Instrumentation
• Data collector bias
– Different administrators of the test
– Inadvertent body language
– Training can overcome this
• Standardize all procedures
• Train data collectors
Testing Threat
• Being exposed to the testing itself has an effect
– A pre-test informs participants about the study
– In test-retest designs, the first exposure provides practice
History
• Events that occur to the participants during the
span of the data collection
• In an experimental study, one group has a different experience than another group
Maturation
• The participants change over time – over the
span of the data collection
Attitude of Participants
• Subjects' opinions and participation can influence the outcome.
• Observing or studying subjects can affect their responses (the Hawthorne effect).
• Subjects receiving the experimental treatment may perform better simply because they are “receiving” treatment.
• Subjects in the control group may perform more poorly than the treatment group.
Implementation
• Usually shows up with experimental groups
• The researcher may favor one method over another and subtly set up one group for better performance
• Group differences may then stem from how the treatment was implemented rather than from the treatment itself
Methods to Minimize Threats
 Standardize the conditions under which the study is carried out; this helps minimize threats from history and instrumentation.
 Obtain as much information as possible about the participants; this minimizes threats from mortality and selection.
 Obtain as much information as possible about the procedural details of the study (for example, where and when it occurs); this minimizes threats from history and instrumentation.
 Choose an appropriate research design, which can help control most other threats.
Face Validity
• Face validity is really not an index of validity at all. Rather, it simply addresses the lay acceptability of a measure (Gregory, 1992).
