
6. Validity

Reliability is a necessary, but not sufficient, condition for validity. The lower the reliability, the
lower the validity → Rxy ≤ √Rxx.
Rxy = validity coefficient (the correlation between scores on procedure X and external criterion Y).
Rxx = reliability coefficient.

Correction for attenuation in the criterion variable:

Rxt = Rxy / √Ryy → Rxt = correlation between scores on some procedure and the criterion's "true score".

Using different reliability estimates is likely to lead to different conclusions regarding validity:
an underestimate of Ryy produces an overestimate of the corrected validity coefficient.
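A minimal Python sketch of this correction (the function name and example values are illustrative, not from the text):

```python
import math

def correct_for_attenuation(r_xy: float, r_yy: float) -> float:
    """Rxt = Rxy / sqrt(Ryy): the observed validity coefficient
    corrected for unreliability in the criterion measure."""
    return r_xy / math.sqrt(r_yy)

# Observed validity .30 with criterion reliability .64:
print(correct_for_attenuation(0.30, 0.64))  # 0.375
# Underestimating Ryy (.49 instead of .64) inflates the estimate:
print(correct_for_attenuation(0.30, 0.49))  # ~0.429
```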

* Traditionally, validity was viewed as the extent to which a measurement procedure actually
measures what it is designed to measure.
* Validation = the investigative process of gathering or evaluating the necessary data regarding:
1). WHAT a test or procedure measures;
2). HOW WELL it measures it.
* Validity is not a dichotomous variable (valid or not valid), but a matter of degree.
* No different “kinds” of validity, but only different kinds of evidence for analyzing validity.
* It is the inferences regarding the specific uses of a test or other measurement procedure that are
validated, not the test itself.
* Validity is an evolving property and validation is a continuing process.
 

1. Content-related evidence
The extent to which items cover the intended domain.

Whether or not a measurement procedure contains a fair sample of the universe of situations it is
supposed to represent (adequacy of the sampling). Three assumptions:
1. the area of concern to the user can be conceived as a meaningful, definable universe of
responses;
2. a sample can be drawn from the universe in some purposeful, meaningful fashion;
3. the sample and sampling process can be defined with sufficient precision to enable the
user to judge how adequately the sample represents performance in the universe.
* If a selection procedure focuses on work products (e.g., typing), then content-related
evidence is appropriate. If the focus is on work processes (e.g., reasoning ability), then
content-related evidence is not appropriate (because such processes are not directly observable).
* The distinction between a content-related strategy and a construct-related strategy is a
matter of degree, because constructs underlie all psychological measurement. Content-related
evidence is a precondition for construct-related validity evidence.
* Steps:
1. Conduct a job analysis;
2. Share the list of KSAOs with subject matter experts (SMEs);
3. Ask them to think of an individual who is a newly appointed job incumbent;
4. Think about alternative items;
5. Keep minimum qualifications straightforward and express them using the same format;
6. Ask SMEs to rate the list of potential items independently;
7. Link each of the potential items back to the KSAOs;
8. Group potential items in a thoughtful manner.
* Content-related evidence may be evaluated in terms of the extent to which members of
a content-evaluation panel perceive overlap between the test and the job-performance
domain.
* Several procedures exist for this purpose:
   - Content-Validity Index (CVI), based on the content-validity ratio (CVR);
   - Substantive-Validity Index;
   - Content-Adequacy procedure;
   - Analysis-of-Variance approach.
These procedures illustrate that content-related evidence is concerned primarily with
inferences about test construction rather than with inferences about the test scores.
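As a concrete illustration of the first of these procedures, here is a minimal sketch of Lawshe's content-validity ratio, with the CVI taken as its average; the panel data are made up:

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1 (no panelist
    rates the item 'essential') to +1 (all panelists do)."""
    half = n_panelists / 2
    return (n_essential - half) / half

# A panel of 10 SMEs rating three potential items as 'essential':
votes = [9, 7, 4]
cvrs = [content_validity_ratio(n, 10) for n in votes]
print(cvrs)                   # [0.8, 0.4, -0.2]
print(sum(cvrs) / len(cvrs))  # CVI as the mean CVR across items: ~0.33
```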

* Although it has its limitations, the content-related strategy has made a positive contribution by
directing attention toward (1) improved domain sampling and job-analysis procedures, (2) better
behavior measurement, and (3) the role of expert judgment in confirming the fairness of
sampling and scoring procedures and in determining the degree of overlap between
separately derived content domains.
 

2. Criterion-related evidence
Empirical relationship between predictor and criterion scores.

Predictive study = oriented toward the future and involves a time interval during which
events take place → "Is it likely that Laura will be able to do the job?"
* Steps:
1. Measure candidates for the job;
2. Select candidates without using the results of the measurement procedure;
3. Obtain measurements of criterion performance at some later date;
4. Assess the strength of the relationship between predictor and criterion.
* Statistical power = the probability of rejecting the null hypothesis when it is false. Its parameters:
1. the power of the test (1 − β);
2. the Type I error rate (α);
3. the sample size N (power increases as N increases);
4. the effect size (power increases as effect size increases).
* A power analysis should be conducted before a study is conducted.
* Conventional benchmarks: "small" (.10), "medium" (.30), or "large" (.50) effects.
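To see what these benchmarks imply for planning, here is a minimal sketch that approximates the sample size needed to detect a given correlation at 80% power, assuming the Fisher z approximation (the text does not prescribe a specific formula):

```python
from math import atanh, ceil
from scipy.stats import norm

def n_required(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect a population correlation r with a
    two-tailed test, via the Fisher z transformation."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

for effect in (0.10, 0.30, 0.50):  # "small", "medium", "large"
    print(effect, n_required(effect))
# approximately 783, 85, and 30: small effects demand far larger samples
```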
* Assuming that multiple predictors are used in a validity study and that each predictor
accounts for some unique criterion variance, the effect size of a linear combination of the
predictors is likely to be higher than the effect size of any single predictor.
* When has an employee been on the job long enough to appraise his or her performance?
When there is evidence that the initial learning period has passed (roughly after six months).

Concurrent study = oriented toward the present and reflects only the status quo at a
particular time → "Can Laura do the job now?"
* Criterion measures usually are substitutes for other more important, costly, or complex
performance measures. They are valuable only if:
1). there is a relationship between the convenient/accessible measure and the
costly/complex measure;
2). the use of the substitute measure is more effective.
* With cognitive ability tests, concurrent studies often are used as substitutes for
predictive studies.
* This design ignores the effects of motivation and job experience on ability.
 

Factors affecting the size of obtained validity coefficients:

• Range enhancement
• Range restriction
Because the size of the validity coefficient is a function of the variability of two variables, restricting the
range (truncating or censoring) of either the predictor or the criterion will lower
the validity coefficient (figure 7-1, p. 150). Selection effects on validity
coefficients result from changes in the variance(s) of the variable(s).
→ Direct range restriction & indirect/incidental range restriction (when an experimental
predictor is administered to applicants but is not used as a basis for selection decisions).
* The range of scores also may be narrowed by preselection: when a predictive validity
study is undertaken after a group of individuals has been hired, but before criterion data
become available for them.
* Selection at the hiring point reduces the range of the predictor variable(s), and selection
on the job during training reduces the range of the criterion variable(s).
 

Three scenarios exist for correcting range restriction (formulas → figures 7-4, 7-5, 7-6, p. 151):
1). direct range restriction takes place on the predictor, no third variable is involved, and the
unrestricted predictor variance is known (see the sketch below);
2). selection takes place on one variable (either the predictor or the criterion), but the
unrestricted variance is not known;
3). incidental restriction takes place on a third variable z, and the unrestricted variance on z is
known.
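For the first scenario, a minimal sketch assuming Thorndike's Case II formula (the classic correction for direct restriction on the predictor; the example numbers are invented):

```python
from math import sqrt

def correct_direct_range_restriction(r: float, sd_unrestricted: float,
                                     sd_restricted: float) -> float:
    """Thorndike's Case II: corrects the restricted validity r using the
    ratio of the predictor's unrestricted to restricted standard deviations."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / sqrt(1 + r * r * (u * u - 1))

# Example: r = .25 among hires, but the applicant pool's SD is twice
# the incumbents' SD (u = 2):
print(round(correct_direct_range_restriction(0.25, 2.0, 1.0), 3))  # 0.459
```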
→ In practice, there may be range-restriction scenarios that are more difficult to address with
corrections. These include:
1. those where the unrestricted variance on the predictor, the criterion, or the third variable is
unknown; and
2. those where there is simultaneous or sequential restriction on multiple variables.
Multivariate correction formula = can be used when direct restriction (on one or two variables)
and incidental restriction take place simultaneously. The equation can also be used repeatedly
when restriction occurs on a sample that is already restricted → the computer program RANGEJ
makes this correction easy to implement.
There are several correction procedures available. Criterion-related validation efforts focusing on
a multiple-hurdle process should consider appropriate corrections that take into account that
range restriction, or missing data, takes place after each test is administered. Corrections are
appropriate only when they are justified based on the target population (the population to which
one wishes to generalize the obtained corrected validity coefficient).

Homoscedasticity = the computation of the Pearson product-moment correlation coefficient
assumes that both variables are normally distributed, that the relationship is linear, and that, when the
bivariate distribution of scores (from low to high) is divided into segments, the column variances
are equal. That is, the data points are evenly distributed around the regression line, and the measure
predicts as well at high score ranges as at low score ranges (= all variables have the same finite
variance; in other words, homoscedasticity indicates that, in a set of results, the variance of the
residuals is independent of the dependent variable).
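A quick way to see the "equal column variances" idea is to split the data into segments along the predictor and compare residual variances; a minimal sketch with simulated (homoscedastic) data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)      # homoscedastic by construction

slope, intercept = np.polyfit(x, y, 1)  # simple linear regression
residuals = y - (slope * x + intercept)

# Divide the distribution into five segments from low to high x and
# compare the "column" variances of the residuals:
segments = np.array_split(np.argsort(x), 5)
print([round(float(residuals[idx].var()), 2) for idx in segments])
# roughly equal variances across segments -> homoscedasticity holds
```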

3. Construct-related evidence
The understanding of a trait or construct that a test measures.

A conceptual framework specifies the meaning of the construct, distinguishes it from
other constructs, and indicates how measures of the construct should relate to other
variables. It provides the evidential basis for the interpretation of scores.
Construct validation is not accomplished in a single study; it requires an
accumulation of evidence derived from many different sources to determine the meaning
of the test scores and an appraisal of their social consequences → both a logical and an
empirical process.

The construct is defined not by an isolated event but, rather, by a nomological
network. Information relevant either to the construct or to the theory surrounding the
construct may be gathered from a wide variety of sources.

Convergent validation = agreement between the results of the original measure and the results of
a similar measure (measures of the same construct should converge).
Discriminant validation = the degree to which the results diverge from those of a measure of a
different construct (measures of different constructs should not correlate highly).

Multitrait-multimethod matrix (MTMM) = an approach to examining construct validity → figure
7-2, p. 155. In this approach, reliability is estimated by the agreement between two measures of the same trait using the
same method, while validity is defined as the extent of agreement between two measures of the
same trait using different methods. This shows that the concepts of reliability and validity are
intrinsically connected, and a good understanding of both is needed to gather construct-related
validity evidence.
Limitations:
1). The lack of quantifiable criteria;
2). The inability to account for differential reliability;
3). The implicit assumptions underlying the procedure.
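A toy illustration of how the matrix is read (the numbers are invented): reliabilities sit on the main diagonal, and convergent validities are the same-trait/different-method entries, which should exceed the heterotrait correlations.

```python
import numpy as np

# Hypothetical MTMM matrix: two traits (T1, T2), each measured by two
# methods (M1, M2); row/column order: T1M1, T2M1, T1M2, T2M2.
mtmm = np.array([
    [0.90, 0.30, 0.65, 0.20],
    [0.30, 0.88, 0.25, 0.60],
    [0.65, 0.25, 0.85, 0.28],
    [0.20, 0.60, 0.28, 0.86],
])

# Reliability diagonal: same trait, same method.
print("reliabilities:", np.diag(mtmm))                    # 0.90 0.88 0.85 0.86
# Validity diagonal: same trait, different methods.
print("convergent validities:", mtmm[0, 2], mtmm[1, 3])   # 0.65 0.60
# Construct-related evidence: the convergent values (.65, .60) exceed the
# heterotrait correlations (.30, .28, .25, .20).
```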

Cross-validity = refers to whether the weights derived from one sample can predict outcomes to the
same degree in the population as a whole or in other samples drawn from the same population.
There are procedures available to compute cross-validity:
- Empirical cross-validity → fitting a regression model in one sample and using the resulting
regression weights with a second, independent cross-validation sample.
- Statistical cross-validity → adjusting the sample-based multiple correlation coefficient (R) by a
function of sample size (N) and the number of predictors (k).
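As an illustration of the statistical route, a minimal sketch assuming Wherry's shrinkage formula (strictly an estimate of the population R²; dedicated cross-validity formulas follow the same N-and-k pattern):

```python
def wherry_adjusted_r2(r2: float, n: int, k: int) -> float:
    """Wherry's shrinkage adjustment: corrects the sample R^2 for
    capitalization on chance as a function of N and k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R^2 = .25 with 10 predictors looks far less impressive
# once sample size is taken into account.
for n in (50, 200, 1000):
    print(n, round(wherry_adjusted_r2(0.25, n, 10), 3))
# 50 -> 0.058, 200 -> 0.21, 1000 -> 0.242
```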
Cross-validation, including rescaling and reweighting of items if necessary, should be continual
(recommended annually), for as values change, jobs change, and people change, so also do the
appropriateness and usefulness of inferences made from test scores.

In many cases, local validation may not be feasible due to logistics or practical constraints.
Several strategies are available to gather validity evidence in such situations:

• Synthetic Validity
The process of inferring validity in a specific situation from a systematic analysis of jobs into
their elements, a determination of test validity for these elements, and a combination or
synthesis of the elemental validities into a whole.

• Test Transportability
To be able to use, locally, a test that has been used elsewhere without the need for a local
validation study, evidence must be provided regarding several points.

• Validity Generalization (VG)
Meta-analyses conducted with the goal of testing the situational-specificity hypothesis
have been labelled psychometric meta-analysis or VG studies. This allows small
organizations to implement tests that have been used elsewhere without the need to
collect data locally. The use of VG evidence alone is not recommended.
 

Empirical Bayes analysis = this approach involves first calculating the average inaccuracy of a
meta-analysis and of a local validity study under a wide variety of conditions and then computing
an empirical Bayesian estimate, which is a weighted average of the meta-analytically derived and
local study estimates.
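The weighting idea can be sketched as a precision-weighted average (a simplification; the procedure described above involves estimating the average inaccuracy of each source first). All values below are invented:

```python
def empirical_bayes_validity(local_r: float, local_var: float,
                             meta_r: float, meta_var: float) -> float:
    """Precision-weighted average of a local validity estimate and a
    meta-analytic one: the less sampling error an estimate carries,
    the more weight it receives."""
    w_local = 1 / local_var
    w_meta = 1 / meta_var
    return (w_local * local_r + w_meta * meta_r) / (w_local + w_meta)

# A small local study (r = .15, high sampling error) is pulled toward
# a more precise meta-analytic estimate (r = .30):
print(round(empirical_bayes_validity(0.15, 0.04, 0.30, 0.01), 3))  # 0.27
```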
