
Validity

“Something is valid if it is sound, meaningful, or well-grounded on principles or evidence.”
People make judgments
based on the following:
» Evidence of the meaningfulness or the veracity of
something
» Validity is a term used in conjunction with the
meaningfulness of a test score—what the test score truly
means.
Learning Objectives

» Differentiate validity from reliability
» Discuss the types of validity
» Determine the relationship between establishing test validity and using the scientific method
» Discuss the difference between predictive and concurrent criterion validity evidence
» Identify the characteristics of each type of validity
There are many parallels
between validity and reliability
» Both are characteristics of a good test.
» Both assess the degree to which test scores are accurate measures of knowledge or characteristics.
» Both are terms that actually identify a set of related concepts
and procedures rather than a single type of assessment.
In essence
» A reliable test, like a reliable person, can be depended on to
operate in a consistent manner.
» A valid test, like a valid point in a discussion, is appropriate
to the issue at hand.
What is Validity?
» The extent to which a test/instrument measures what it claims to measure or what it purports to measure.
» Is a judgment or estimate of how well a test measures what it purports
to measure in a particular context
» It is a judgment based on evidence about the appropriateness
of inferences drawn from test scores.
What is Validity?
» Characterizations of the validity of tests and test scores are frequently
phrased in terms such as “acceptable” or “weak.”
» Inherent in a judgment of an instrument’s validity is a judgment of how
useful it is for a particular purpose with a particular population of
people.
Why is it necessary that a test should be valid?
» It is vital for a test to be valid in order for the results to
be accurately applied and interpreted.
Relationship between Reliability and Validity

» Reliability indicates the ability of a test to produce consistent scores (stable characteristics).
» Validity indicates which stable characteristics test scores measure.
Thoughts to ponder
» In general, reliability is viewed as necessary but not sufficient for validity. All valid tests are reliable, but a reliable test may or may not be valid. Because reliability is a prerequisite for validity, reliability studies typically precede validity studies in the process of test analysis.
Validity vs. Validation
» Validation is the process of gathering and evaluating evidence about validity.
» Both the test developer and the test user may play a role in the validation of a
test for a specific purpose.
» It is the test developer’s responsibility to supply validity evidence in the test
manual. It may sometimes be appropriate for test users to conduct their own
validation studies with their own groups of test takers.
» Local validation studies are absolutely necessary when the test user plans
to alter in some way the format, instructions, language, or content of the test.
Types of Validity
» Face validity
» Content validity
–Measures appropriate domain
» Criterion-related validity
–Predicts future performance on appropriate variables
» Construct validity
–Measures appropriate characteristics of test takers
Types of Validity
» The classic conception of validity is referred to as the trinitarian view (Guion, 1980): content validity, criterion-related validity, and construct validity.
» It might be useful to visualize construct validity as being
“umbrella validity” since every other variety of validity falls under
it.
Types of Validity
» Three approaches to assessing validity—associated,
respectively, with content validity, criterion-related
validity, and construct validity—are
1. Scrutinizing the test’s content
2. Relating scores obtained on the test to other test scores or other
measures
3. Executing a comprehensive analysis of:
a. How scores on the test relate to other test scores and measures
b. How scores on the test can be understood within some theoretical
framework for understanding the construct that the test was designed to
measure
Face Validity
» Suggests that an instrument “looks like” it is measuring what it is
supposed to measure
» In face validity, the judgment about item appropriateness is made by
the test taker
» Is determined by a superficial examination of test items and is based on
the presence of obvious relationships between items and the domain.
Content Validity
» Refers to the representativeness of the instrument to the entire domain of
content desired
» Is determined by an in-depth analysis of the exam by someone who is knowledgeable about the content domain.
» It is based on clear relationships among item content, format, and
distribution and the structure of the domain.
» Does the instrument represent all possible questions in the domain?
» A panel of experts rates each item’s relevance; the experts must know the objectives.
–What is a score supposed to represent?
When a test has content validity
» The items on the test represent the entire range of possible items
the test should cover.
» Individual test questions may be drawn from a large pool of items
that cover a broad range of topics.
How to quantify content validity?
» Lawshe (1975) developed a method for gauging agreement
among raters or judges regarding how essential a particular item
is.
» Is the skill or knowledge measured by this item
–Essential
–Useful but not essential
–Not necessary
Formula

$\mathrm{CVR} = \dfrac{n_e - N/2}{N/2}$

» CVR: Content Validity Ratio
» $n_e$: Number of panelists indicating “essential”
» N: Total number of panelists

Note: Refer to Table 6–1, page 179, of Cohen & Swerdlik’s book for the minimum CVR needed in your instrument.
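A minimal sketch of this computation, assuming Python (the function name is illustrative, not from the source):

```python
# A minimal sketch of Lawshe's (1975) content validity ratio for one item:
#   CVR = (n_e - N/2) / (N/2)
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """n_essential: panelists rating the item 'essential';
    n_panelists: total number of panelists rating the item."""
    half = n_panelists / 2
    return (n_essential - half) / half

# All three panelists rate the item essential -> CVR = 1.0
print(content_validity_ratio(3, 3))            # 1.0
# Two of three rate it essential -> (2 - 1.5) / 1.5
print(round(content_validity_ratio(2, 3), 2))  # 0.33
```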
Exercise 1: Compute the CVR of the following questions.

Each of three validators (Validator 1, Validator 2, and Validator 3) rated each question as Essential, Useful but not essential, or Not necessary:

1. How do mothers describe the experiences of raising their children diagnosed with ASD?
2. How do mothers of children diagnosed with ASD describe their daily lives?
3. Under what situations or circumstances do mothers of children diagnosed with ASD feel that the given support is sufficient or lacking for them?
4. What recommendation/s would mothers of children diagnosed with ASD give to other mothers who are experiencing the same situation?
Criterion-Related Validity
» The characteristics measured by the test that do predict criterion scores
are defined as the valid or relevant characteristics.
» Is a judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest—the measure of interest being the criterion.
Criterion-Related Validity
» A test is said to have criterion-related validity when the test has demonstrated its effectiveness in predicting a criterion or indicators of a construct.
» Two types of criterion-related validity:
1. Concurrent
2. Predictive
Concurrent Validity
» If test scores are obtained at about the same time that the criterion measures are obtained, measures of the relationship between the test scores and the criterion provide evidence of concurrent validity.
» Is an index of the degree to which a test score is related to some criterion measure obtained at the same time.
» Occurs when the criterion measures are obtained at the same time as the test scores.
» This indicates the extent to which the test scores accurately estimate an individual’s current state with regard to the criterion.
Predictive Validity
» Test scores may be obtained at one time and the criterion measures obtained at a future time, usually after some intervening event has taken place.
» Measures of the relationship between the test scores and a criterion measure obtained at a future time provide evidence of predictive validity.
» Is an index of the degree to which a test score predicts some criterion measure.
» Occurs when the criterion measures are obtained at a time after the test.
» Examples of tests with predictive validity are career or aptitude tests, which are helpful in determining who is likely to succeed or fail in certain subjects or occupations.
What is a criterion?
» A standard against which a test or test score
is evaluated.
Characteristics of a Criterion
» Criterion is relevant
–It is pertinent or applicable to the matter at hand
» Criterion is valid
–For the purpose for which it is being used
» Criterion is uncontaminated
–Criterion contamination is the term applied to a criterion that has
been based, at least in part, on predictor measures.
2 Types of Statistical Evidence:
Concurrent or Predictive Validity
1. Validity coefficient
» A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.

2. Expectancy data
» Provide information that can be used in evaluating criterion-related validity.
» Show the percentage of people within a specified test-score interval who subsequently were placed in various categories of the criterion.
Sample Expectancy Chart
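As a hypothetical sketch of how expectancy data like the chart above can be tabulated (all scores and outcomes below are invented for illustration, and pandas is assumed):

```python
# Hypothetical sketch: percentage of people in each test-score interval
# who subsequently fell into each criterion category. All values invented.
import pandas as pd

df = pd.DataFrame({
    "test_score": [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.5, 5.5, 6.0],
    "criterion":  ["low", "low", "low", "high", "low",
                   "high", "low", "high", "high", "high"],
})
# Bucket scores into intervals, then row-normalize the crosstab
df["band"] = pd.cut(df["test_score"], bins=[2.0, 3.5, 5.0, 6.5],
                    include_lowest=True)
expectancy = pd.crosstab(df["band"], df["criterion"], normalize="index") * 100
print(expectancy.round(1))  # % of each score band in each criterion category
```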
Exercise 2: Sample data for the validity coefficient

Person:           A    B    C    D    E    F    G    H    I    J
Test Score:       2.0  5.5  4.5  4.0  3.0  6.0  2.5  3.0  3.5  4.0
Job Performance:  1.0  3.0  3.0  2.0  2.5  3.5  1.5  1.5  2.0  1.5
With Reference to Exercise 2
» What specific type of validity must be used?
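A sketch of the computation for Exercise 2, treating the validity coefficient as the Pearson correlation between the two rows (standard-library statistics module, Python 3.10+):

```python
# Sketch: validity coefficient as the Pearson correlation between
# test scores and the criterion (rated job performance).
from statistics import correlation  # Python 3.10+

test_scores = [2.0, 5.5, 4.5, 4.0, 3.0, 6.0, 2.5, 3.0, 3.5, 4.0]
job_perf    = [1.0, 3.0, 3.0, 2.0, 2.5, 3.5, 1.5, 1.5, 2.0, 1.5]

r = correlation(test_scores, job_perf)
print(f"validity coefficient r = {r:.2f}")  # about 0.86 for these data
```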
Construct Validity
» Is a judgment about the appropriateness of inferences drawn from test
scores regarding individual standing on a variable called a construct.
–Construct is an informed, scientific idea developed or hypothesized to describe
or explain behavior.
–Intelligence, anxiety, job satisfaction, personality, clerical aptitude, depression,
motivation
Construct Validity
» Constructs are unobservable, presupposed (underlying) traits that a
test developer may invoke to describe test behavior or criterion
performance.
–The researcher investigating a test’s construct validity must
formulate hypotheses about the expected behavior of high scorers and low
scorers on the test.
» A test has construct validity if it demonstrates an association between
the test scores and the prediction of a theoretical trait.
Evidence of
Construct Validity
Evidence of Homogeneity
» Homogeneity refers to how uniform a test is in measuring a single concept.
Ways to Improve Homogeneity
» Eliminating items that do not show significant correlation
coefficients with total test scores.
–If all test items show significant, positive correlations with total test scores, and high scorers on the test tend to pass each item more often than low scorers do, then each item is probably measuring the same construct as the total test and contributing to test homogeneity (a computational sketch follows below).
» Item-analysis procedures have also been employed in the quest for
test homogeneity.
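A minimal sketch of one such item-analysis step, assuming Python with NumPy (the example responses are invented): corrected item-total correlations flag items that do not cohere with the rest of the test.

```python
# Sketch: corrected item-total correlations. Each item is correlated with
# the total of the *remaining* items; low or negative values mark
# candidates for elimination when pursuing homogeneity.
import numpy as np

def corrected_item_total(scores):
    """scores: (n_respondents, n_items) array of item scores."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])

# Invented 5-person, 4-item data; item 4 runs against the other three
responses = [[3, 4, 3, 1],
             [2, 3, 2, 3],
             [4, 5, 4, 1],
             [1, 2, 1, 4],
             [3, 3, 3, 2]]
print(np.round(corrected_item_total(responses), 2))  # item 4 comes out negative
```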
Evidence of Changes with Age
» Some constructs are expected to change over time.
–If a test score purports to be a measure of a construct that could be expected to change over time, then the test score, too, should show the same progressive changes with age to be considered a valid measure of the construct.
Evidence of Pretest–Posttest Changes
» Evidence that test scores change as a result of some experience between a pretest and a posttest can be evidence of construct validity.
–Formal education, a course of therapy or medication, and on-the-job experience.
Evidence from Distinct Groups (Method of Contrasted Groups)
» One way of providing evidence for the validity of a test is to demonstrate that scores on the test vary in a predictable way as a function of membership in some group.
» The rationale here is that if a test is a valid measure of a particular construct, then groups of people who would be presumed to differ with respect to that construct should have correspondingly different test scores.
Convergent Evidence
» Evidence for the construct validity of a particular test may converge, or unite, from a number of sources, such as other tests or measures designed to assess the same (or a similar) construct.
» Thus, if scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct, this would be an example of convergent evidence.
» Convergent evidence for validity may come not only from correlations with tests purporting to measure an identical construct but also from correlations with measures purporting to measure related constructs.
» Example: correlate scores on a new test of anxiety with physiological and behavioral measures characteristic of anxiety; expect high positive correlations.
Discriminant Evidence
» A validity coefficient showing little (that is, a statistically insignificant) relationship between test scores and/or other variables with which scores on the test being construct-validated should not theoretically be correlated provides discriminant evidence of construct validity (also known as discriminant validity).
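As a simulated sketch of both kinds of evidence (all data generated for illustration): a new anxiety test should correlate highly with an established anxiety measure and near zero with a theoretically unrelated variable.

```python
# Simulated sketch: convergent vs. discriminant correlations.
import numpy as np

rng = np.random.default_rng(1)
n = 100
anxiety = rng.normal(size=n)                        # underlying construct
new_test = anxiety + rng.normal(scale=0.5, size=n)  # test being validated
old_test = anxiety + rng.normal(scale=0.5, size=n)  # established measure
unrelated = rng.normal(size=n)                      # theoretically unrelated

print("convergent r:  ", round(np.corrcoef(new_test, old_test)[0, 1], 2))   # high +
print("discriminant r:", round(np.corrcoef(new_test, unrelated)[0, 1], 2))  # near 0
```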
Factor Analysis
» Is a shorthand term for a class of mathematical procedures designed to
identify factors or specific variables that are typical attributes, characteristics,
or dimensions on which people may differ
» Exploratory factor analysis – typically entails estimating or extracting factors and deciding how many factors to retain
» Confirmatory factor analysis – a factor structure is explicitly hypothesized and is tested for its fit with the observed covariance structure of the measured variables
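A simulated sketch of the exploratory step (data generated purely for illustration): inspect the eigenvalues of the item correlation matrix to decide how many factors to retain, e.g., via the Kaiser criterion (eigenvalue greater than 1).

```python
# Simulated sketch: deciding how many factors to retain in EFA by
# examining eigenvalues of the item correlation matrix.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_items = 200, 6
latent = rng.normal(size=(n_people, 2))           # two underlying constructs
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],   # items 1-3 load on factor 1
                     [0, .8], [0, .7], [0, .6]])  # items 4-6 load on factor 2
items = latent @ loadings.T + rng.normal(scale=0.5, size=(n_people, n_items))

corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]      # largest first
print("eigenvalues:", np.round(eigenvalues, 2))
print("factors to retain (Kaiser):", int((eigenvalues > 1).sum()))  # expect 2
```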
Validity Coefficient
» The relationship between a test and a criterion is usually expressed as
a correlation called a validity coefficient.
» This coefficient tells the extent to which the test is valid for making
statements about the criterion.
» There are no hard-and-fast rules about how large a validity
coefficient must be to be meaningful.
» In practice, one rarely sees a validity coefficient larger than .60.
» Validity coefficients in the range of .30 to .40 are commonly considered
high.
Sources of Invalidity
» Unreliability – the test is not reliable.
» Response sets – a psychological orientation or bias toward answering in a particular way:
–Acquiescence – tendency to agree
–Social desirability – tendency to portray the self in a positive light
–Faking bad – purposely saying ‘no’ or looking bad if there’s a ‘reward’
» Bias – cultural bias, gender bias, and test bias are all possible.
–Bias in measurement occurs when the test makes systematic errors in measuring a particular characteristic or attribute.
–Bias in prediction occurs when the test makes systematic errors in predicting some outcome (or criterion).