Introduction to Validity
What is validity?
When we talk about validity, we are asking:
Does a test measure what it claims to measure?
Does the achievement test measure how well an individual has mastered the content of a course or
training program?
Does the aptitude test measure a person’s ability to perform a task or activity?
Does the test predict what it claims to predict?
Does an employment test predict future performance on the job?
What is the relationship between Validity and Reliability?
If a test is unreliable, it can’t be valid
Reliability is necessary but not sufficient for validity (places a ceiling on validity)
How to Assess Validity?
Validating a test refers to accumulating empirical data and logical arguments to show that the inferences
drawn from test scores are indeed appropriate:
Analyze the content of the test
Relate the scores to specific criteria
Examine the psychological constructs measured by the test
What are the types of validity?
Content Validity
How well do the items on a test represent everything the test attempts to measure (i.e., how well has the
construct been operationalized)?
Requires a well-defined trait/construct
This can be difficult to do with abstract/complex constructs
“Expert” judges can be used to determine content validity.
Type of Content Validity = Face Validity:
Does a test appear, on the surface, to measure the construct?
Tells us nothing about what a test really measures
Non-statistical
Established by non-experts
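The notes mention "expert" judges for content validity; one common way to quantify their agreement (not named in the notes, so treat it as an illustrative add-on) is Lawshe's content validity ratio (CVR). A minimal sketch with hypothetical ratings:

```python
# Lawshe's content validity ratio (CVR): quantifies expert-judge
# agreement that an item is "essential" to the construct.
# CVR = (n_e - N/2) / (N/2), ranging from -1 to +1.

def content_validity_ratio(n_essential, n_judges):
    """CVR for one item, given how many of the judges rated it essential."""
    return (n_essential - n_judges / 2) / (n_judges / 2)

# Hypothetical: 8 of 10 expert judges rate an item as essential.
print(content_validity_ratio(8, 10))  # 0.6
```

Items with a CVR near +1 are retained; items near or below 0 are candidates for removal.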
Criterion Validity
Extent to which the results relate to a certain criterion
Types of Criterion Validity
Concurrent validity: correlation between results and a pre-existing criterion.
Extent to which test scores accurately estimate an individual’s present position on relevant
criteria.
Appropriate for validating clinical tests that diagnose behavioral, emotional, or mental disorders
(e.g., personality inventories).
Often used in time sensitive areas, or with short forms of a measure.
Predictive validity: predictive power of results on a future outcome (regression).
Test scores are used to estimate outcome measures obtained at a later date.
E.g., entrance examinations and employment tests (e.g., GRE)
The good ol’ regression equation:
Y = b0 + b1X
Y = the predicted score on the criterion
b0 = the intercept
b1 = the slope
X = the score the individual made on the predictor test
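The regression equation above can be sketched directly; the intercept, slope, and predictor score below are hypothetical example values, not from the notes:

```python
# Predictive validity as regression: Y = b0 + b1*X, where X is the
# predictor test score and Y the predicted criterion score.

def predict(b0, b1, x):
    """Predicted criterion score for a predictor score x."""
    return b0 + b1 * x

# Hypothetical fitted values: e.g., predicting first-year GPA from an
# entrance exam, intercept 1.2 and slope 0.004 GPA points per test point.
b0, b1 = 1.2, 0.004
print(round(predict(b0, b1, 500), 2))  # 3.2
```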
Criterion Contamination: artificial inflation of the strength of relationship between a test and criterion.
Max validity is set by the reliability of the test and the criterion.
Construct Validity
Extent to which a test measures the construct it claims to measure
Construct: an intangible quality in which individuals differ, e.g., depression, intelligence, etc.
Considerations:
1) Test homogeneity
2) Appropriate developmental changes
3) Theory consistent group differences
4) Theory consistent intervention effects
Types of Construct Validity
Convergent validity – when a test highly correlates with other related tests or variables.
Discriminant validity – when a test does not correlate with unrelated variables.
Ways to establish evidence
Convergent & discriminant (C & D) validation
When do you use which validation strategy?
Content validity: for tests that measure concrete attributes (observable and measurable
behaviors).
Criterion-related validity: for tests that predict outcomes.
Construct validity: for tests that measure abstract constructs.
How do you calculate and evaluate validity coefficients?
Calculating Validity Coefficients
Correlation coefficient – quantitative estimate of the relationship between two variables
Validity coefficient – the correlation coefficient between two sets of test scores
Example: The correlation between SAT scores and college performance is 0.40. How much of the variation in college performance is explained by SAT scores?
r2 = 0.16, so 16% of the variance is explained (and 84% is not).
There are 2 methods for evaluating validity coefficients
Tests of significance
Expectations for strong relationship not as high as with reliability coefficients
Answers question “How likely is it that the correlation between the test and the
criterion resulted from chance or sampling error?”
If p<.05 we have evidence that the test and criterion are related
Coefficient of determination
Answers the question, “What amount of variance do the test and criterion share?”
Equals validity coefficient squared (r2)
E.g., if r = .50, then r2 = .25 (25% shared variance)
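Both evaluation methods above can be sketched with hypothetical score data: Pearson's r as the validity coefficient, a t statistic for the significance test, and r squared for the shared variance. All scores below are made-up example values:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor test scores and criterion scores (e.g., GPA):
test = [50, 55, 60, 65, 70, 75, 80, 85]
crit = [2.1, 2.4, 2.3, 2.9, 3.0, 3.3, 3.1, 3.6]

r = pearson_r(test, crit)          # validity coefficient
r2 = r ** 2                        # coefficient of determination

# Significance test: compare t to the critical value from a t table
# with df = n - 2 (for alpha = .05, two-tailed, df = 6: about 2.447).
n = len(test)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 2), round(r2, 2))
```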
Multitrait-multimethod matrix
Background Information
A way of representing the relationship between multiple traits (constructs) and multiple
methods for measuring those traits.
Visually displays information about both reliability and validity in a succinct way.
Basic principles of the MTMM:
Coefficients in the reliability diagonal should consistently be the highest in the matrix.
A trait should be more highly correlated with itself than with anything else
Coefficients in the validity diagonals should be significantly different from zero and
high enough to warrant further investigation.
This is evidence of convergent validity
A validity coefficient should be higher than values lying in its column and row in the
same heteromethod block.
A validity coefficient should be higher than all coefficients in the heterotrait-monomethod triangles.
Trait factors should be stronger than methods factors
The same pattern of trait interrelationship should be seen in all triangles.
The Matrix
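The MTMM principles above can be illustrated on a toy set of correlations (all values hypothetical): two traits each measured by two methods, with convergent validity appearing as high same-trait/different-method coefficients and discriminant validity as lower different-trait coefficients.

```python
# Toy MTMM check: correlations keyed by pairs of (trait, method) labels.
# All correlation values below are assumed for illustration.
corr = {
    (("T1", "M1"), ("T1", "M2")): 0.65,  # validity diagonal (same trait)
    (("T2", "M1"), ("T2", "M2")): 0.60,  # validity diagonal (same trait)
    (("T1", "M1"), ("T2", "M1")): 0.25,  # heterotrait-monomethod
    (("T1", "M2"), ("T2", "M2")): 0.30,  # heterotrait-monomethod
    (("T1", "M1"), ("T2", "M2")): 0.15,  # heterotrait-heteromethod
    (("T1", "M2"), ("T2", "M1")): 0.20,  # heterotrait-heteromethod
}

validity = [r for (a, b), r in corr.items() if a[0] == b[0]]
heterotrait = [r for (a, b), r in corr.items() if a[0] != b[0]]

# Convergent evidence: validity coefficients are substantial.
# Discriminant evidence: each exceeds every heterotrait correlation.
assert min(validity) > max(heterotrait)
print(min(validity), max(heterotrait))  # 0.6 0.3
```

A full MTMM analysis would also compare each validity coefficient against its own row and column within the heteromethod block, as the principles above require.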
What is Intelligence?
Standardized test performance?
Ability to perform certain mental operations quickly and accurately?
Memory?
Intelligence:
An internal capacity hypothesized to explain people’s ability to solve problems, learn about new
materials, and adapt to new situations.
How accurate are we in judging intelligence?
Anything missing in the definition?
Theories of Intelligence
Sir Francis Galton
Spearman
Thurstone
Vernon
Cattell & Cattell-Horn-Carroll
Guilford
Biological Theory
Information Processing
Gardner
Sternberg
Verbal Comprehension
Proposed abilities measured:
Similarities – abstract verbal reasoning
Vocabulary – the degree to which one has learned, been able to comprehend, and verbally express vocabulary
Information – degree of general information acquired from culture
(Comprehension) – ability to deal with abstract social conventions, rules and expressions
Processing Speed
Proposed abilities measured:
Symbol Search – visual perception/analysis, scanning speed
Coding – visual-motor coordination, motor and mental speed, visual working memory
(Cancellation) – visual-perceptual speed
WAIS-IV Standardization
Used a census-based sample of 2,220 adults, stratified to match the population
13 age bands (e.g., 16-17, 18-19)
Reliability:
Split-half reliability for the full scale is .98
Supported use with specialized populations
SEM of 2.6 points; 95% of the time the obtained score falls within +/- 4 IQ points
Individual subtests are weaker
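The SEM and confidence band above follow from a common classical formula, SEM = SD * sqrt(1 - reliability). The sketch below uses illustrative numbers (the reliability of .97 is assumed and need not match the coefficients quoted in these notes):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def ci95(score, sem_value):
    """Approximate 95% band: score +/- 1.96 * SEM."""
    half = 1.96 * sem_value
    return (score - half, score + half)

# IQ scale SD = 15; an assumed reliability of .97 gives SEM ~ 2.6.
s = sem(15, 0.97)
print(round(s, 1))       # 2.6
print(ci95(100, s))      # band around an obtained IQ of 100
```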
Validity
Good content validity; criterion-related validity of .94 with the WAIS-III; appropriate convergent and
divergent validity; construct validity confirmed
Learning Disabilities
Previous method
Difference of 1+ SDs between individual intelligence test and individual achievement test in one or
more areas
Contemporarily
The National Joint Committee on Learning Disabilities (NJCLD) abandoned the ability-achievement
discrepancy approach
Social and emotional difficulties common
Operationalizing LD
1) Discrepancy between general ability and specific achievements
2) Related considerations
Examine psychosocial skills, physical and sensory abilities
Types: Verbal vs Non-Verbal
3) Alternative Explanations
Rule out non LD explanations for learning difficulties (hearing loss, depression, ADHD, etc.)