Applications of Assessment

Recalculation of the Critical Values for Lawshe’s Content Validity Ratio

Measurement and Evaluation in Counseling and Development
45(3) 197–210
© The Author(s) 2012
Reprints and permission: http://www.sagepub.com/journalsPermissions.nav
DOI: 10.1177/0748175612440286
http://mecd.sagepub.com

F. Robert Wilson¹, Wei Pan¹, and Donald A. Schumsky¹

Abstract
The content validity ratio (Lawshe, 1975) is one of the earliest and most widely used methods for
quantifying content validity. To correct and expand its published table of critical values, values in unit
steps and at multiple alpha levels were computed. Implications for content validation are discussed.

Keywords
measurement, content validity, content validity ratio, table of critical values

¹University of Cincinnati, Cincinnati, OH, USA

Corresponding Author:
F. Robert Wilson, University of Cincinnati, 445 TC/Dyer, Cincinnati, 45221-0002, OH, USA
Email: f.robert.wilson@uc.edu

Downloaded from mec.sagepub.com at CENTRAL MICHIGAN UNIV on June 16, 2014

Content validation rests on demonstration that the test’s items are a representative sample of all items within the content domain of interest (Anastasi & Urbina, 1997; Kerlinger, 1986). Whether the researcher is evaluating the items on a test, questions in an interview, or elements of a set of accreditation standards, the items, questions, themes, or elements should all reflect the intended content of the evaluation tool (Basham & Sedlacek, 2009). Fitzpatrick (1983) described six distinct views of content validity, including four that focus on the test items: clarity of the content domain, relevance of test content to the content domain, sampling adequacy of the test content, and the technical quality of the test items. Two others focused on the test responder: sampling adequacy of test responses and relevance of test responses to a behavioral universe. Spanning the breadth of views identified by Fitzpatrick, a centrist definition for content validity might be phrased,

  Content validity of a measurement instrument for a theoretical construct reflects the degree to which the measurement instrument spans the domain of the construct’s theoretical definition; it is the extent to which a measurement instrument captures the different facets of a construct. (Rungtusanatham, 1998, p. 11)

However, many assessment tools are developed for more practical reasons. An assessment tool’s content validity is crucial when its scores are used as evidence in making decisions affecting an examinee’s access to an educational or occupational opportunity, retention, or promotion. Lawshe (1975), an industrial–organizational psychologist with expertise in job performance assessment, speaking about the late 1960s and early 1970s, noted that “civil rights legislation, the attendant actions of compliance agencies, and a few landmark court cases have provided the impetus for the extension of the application of


content validity from academic achievement testing to personnel testing in business and industry” (p. 563). Decrying the lack of literature on content validity for employment assessment, he argued, “until professionals reach [consensus] regarding what constitutes acceptable evidence of content validity, there is a serious risk that the courts and enforcement agencies will play the major determining role” (p. 563). In an effort to advance the scholarship of assessment in employment settings he proposed, “Content validity is the extent to which communality or overlap exists between (a) performance on the test under investigation and (b) ability to function in the defined job performance domain” (p. 566).

Content validity is established by design and evaluated by rational analysis of test content by qualified experts in the domain of content to be assessed (Allen & Yen, 2002). To establish content validity, assessment designers follow a multistep process that includes defining the content domain and its facets, defining the level of difficulty or abstraction for the items, developing a pool of prospective items for each defined facet of the content domain, and determining domain relevant sampling ratios (Anastasi & Urbina, 1997). Some test authors might argue that if correct process was strictly followed, a content valid instrument must surely follow. Best practices in test development, however, use postdevelopment assessment of the instrument, based on a rational analysis by experts, of the representativeness (the extent to which each item within each facet of the domain of content reflects the facet’s content definition) and sampling adequacy (the extent to which all aspects of a facet are adequately covered by items; Reynolds, Livingston, & Willson, 2009). This process was aided greatly by the development of methods for quantification of the expert’s judgments, the first of which was the content validity ratio (CVR; Lawshe, 1975).

Lawshe introduced his method for quantifying content validity at the small, invitational Content Validity Conference held at Bowling Green University in October 1974 (Guion, 1974). Subsequently, according to Guion, Lawshe’s colleague, Lowell Schipper, calculated critical values for a selection of subject matter expert (SME) sample sizes to permit significance testing. As will be shown, Lawshe’s statistic has filled a need, becoming an internationally recognized method for establishing the content validity of instrumentation across many disciplines. Developed at a time when statistical analysis in the social sciences relied on submitting data recorded on Hollerith punch cards into mainframe computers, Lawshe’s item-level CVR, and its multi-item summary statistic, the Content Validity Index, when coupled with Schipper’s table of critical values, provided an easy-to-compute method for quantification and significance testing in studies of content validity.

Unfortunately, whether due to a calculation error, a typographical error, or a typesetter’s error, Schipper’s table of critical values appears to contain an anomaly. Although distributions of critical values are typically monotonic, Schipper’s table contains a discontinuity (noted by Stelly, 2006). Moreover, there is apparently no record of how Schipper computed the set of critical values Lawshe published. The purpose of this study, therefore, was to identify how Schipper’s values were computed and then to recompute the table of critical values to correct the discontinuity.

Lawshe’s Content Validity Methodology

Following established methodology, Lawshe’s approach called for the assembly of a set of SMEs who rated each of an instrument’s items on a 3-point scale: (a) “essential,” (b) “useful, but not essential,” and (c) “not necessary.” His statistic, the content validity ratio or CVR, was a linear transformation of the ratio of the number of SMEs judging an item to be “essential” to the total number of SMEs in the panel. Specifically,

\[ \mathrm{CVR} = \frac{n_e - (N/2)}{N/2}, \]

where n_e is the number of SMEs indicating that the item is “essential,” and N is the total number of SMEs in the panel.
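As a minimal sketch of the formula above, the following Python function computes the CVR from the number of "essential" ratings and the panel size. The function name and the 10-member panel are hypothetical illustrations, not part of the original article.

```python
def content_validity_ratio(n_essential: int, n_panel: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2).

    n_essential: number of SMEs rating the item "essential" (n_e)
    n_panel:     total number of SMEs in the panel (N)
    """
    if n_panel <= 0:
        raise ValueError("panel size must be positive")
    if not 0 <= n_essential <= n_panel:
        raise ValueError("n_essential must lie between 0 and n_panel")
    half = n_panel / 2
    return (n_essential - half) / half


if __name__ == "__main__":
    # Hypothetical panel of 10 SMEs with varying numbers of "essential" ratings.
    for n_e in (10, 8, 5, 3):
        print(n_e, content_validity_ratio(n_e, 10))
```

With a panel of 10, unanimous "essential" ratings give a CVR of 1, exactly half gives 0, and fewer than half gives a negative value, which matches the behavior the text describes.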

When all SMEs rate the item as being “essential,” the value of CVR will compute to be 1; when the number rating the item as “essential” is more than half but less than all, the value of CVR will be between 0 and 1; and when less than half of the SMEs rate the item as “essential,” the value of CVR will be negative. Although this statistic is no more than a linear transformation of the proportion of SMEs judging the item as “essential,” Lawshe’s true contribution was in providing a table of critical values, which he attributed to his colleague Lowell Schipper, for determining whether the SMEs’ judgments exceeded chance expectation at a one-tailed alpha level of .05.

Compared with alternative methods for quantifying content validity judgments, the Lawshe method is straightforward and user-friendly, requiring only simple computations and providing a table for determining a critical cutoff value. Alternative methods such as Cohen’s kappa (κ; Cohen, 1960), the Tinsley–Weiss T index (Tinsley & Weiss, 1975), James, Demaree, and Wolf’s (1993) r_WG and r_WG(J) indexes, and Lindell, Brandt, and Whitney’s (1999) r*_WG(J) indexes are more computationally complex than Lawshe’s CVR and focus on interrater agreement in general rather than on the specific issue of agreement that an item is “essential” (Lindell & Brandt, 1999).

Critical Acceptance of Lawshe’s Methods

Since its introduction in 1975, critical acceptance of Lawshe’s CVR methodology has grown. The popularity of the Lawshe approach in scale development for health and education sciences is demonstrated by the number of articles published making reference to the CVR and by the wide ranging studies in which it has been used. An electronic search of the Summon electronic database revealed 94 articles containing the phrase, “content validity ratio” of which 51 were published in the past 5 years.

Prevention and health promotion specialists have used Lawshe’s CVR to develop scales for assessing child-rearing knowledge and practices for women with epilepsy (Saramma & Thomas, 2010), a belief-based physical activity questionnaire for diabetic patients (Ghazanfari, Niknami, Ghofranipour, Hajizadeh, & Montazeri, 2010), a checklist for performing content analysis on patient education course syllabi (Gail-Hinckley Heitzer, McKenzie, Amschler, & Bock, 2009), and for assessing whether generic quality of life scales were free of content related to physical function (Hall, Krahn, Horner-Johnson, & Lamb, 2011). In the field of mental health and rehabilitation, researchers developed scales for assessing feelings of competence among children with attention-deficit/hyperactivity disorder (ADHD; Hanc & Brzezinska, 2009), satisfaction with treatment for sexual dysfunction (Corty, Althof, & Wieder, 2011), and psychotherapist countertransference (Hayes, 2004) using CVR methodology to assess content validity. In a novel study, cross-cultural researchers used the CVR to determine the cultural relevance of items drawn from the Indiana Job Satisfaction Scale (IJSS) thereby producing a Chinese version of the IJSS for use in vocational rehabilitation programs for individuals with mental retardation in China (Tsang & Wong, 2005).

Medical and nursing assessment specialists have relied on the Lawshe approach for developing an adult intubation procedural checklist (Stausmire, 2011), a quality-of-life index for AIDS patients in Uganda (Namisango, Katabira, Karamagi, & Baguma, 2007), a system for auditing nursing care plans (Bjorvell, Thorell-Ekstrand, & Wredling, 2000), a low-literacy assessment of patient knowledge regarding chronic obstructive pulmonary disease (Maples, Franks, Ray, Stevens, & Wallace, 2010), a survey to assess Medicaid recipients’ understanding of the postpartum tubal sterilization process (Zite & Wallace, 2007), and a Swedish language version of the Problem Areas in Diabetes Scale (Amsberg, Wredling, Lins, Adamson, & Johansson, 2008).

In the field of education, the content validity of a scale for evaluating team-designed material development manuals (Erdem, 2009) and an affective response to literature scale (Fischer & Fischer, 2007) was established by

SMEs working according to Lawshe’s methods. Training specialists have used the CVR to assess job relatedness of the content of a job training program (Ford & Wroten, 2006) and the job relatedness of an assessment of posttraining job knowledge (Distefano, Pryer, & Craig, 2006).

Organizational developers and management specialists have used Lawshe’s content validity approach to assess the impact of the Deming model for quality management (Collard, 1992) and to define and measure servant leadership behavior (Sendjaya, Sarros, & Santora, 2008). A series of studies based on applications of the enterprise resource planning model in Asian business markets has used the CVR to develop performance indicators or critical success factors (J. Huang, Zhao, & Li, 2007; S.-M. Huang, Hung, Chen, & Ku, 2004; Wei, 2008; Yu, Ng, Chang, Chang, & Yen, 2011). Drossos and Fouskas (2010) used the CVR to assess the content validity of a tool developed to measure industry perceptions of the competitiveness of market environments and their own competitive responses.

Market research has also embraced the Lawshe method for assessing content validity. Tools for assessing consumer adaption to or adoption of broadband (Choudrie, Dwivedi, & Brinkman, 2006), Internet stock trading (Hung, Huang, & Yen, 2004), and airport self-service check-in kiosks (Chang & Yang, 2008a) were developed using CVR methodology. The CVR was also used in developing criteria to segment a customer base (Tai, 2011; Tai & Ho, 2010), assess brand personality appeal (Henard, Freling, & Crosno, 2011), and assess passenger repurchase motivation (Chang & Yang, 2008b). Concern over issues of internet security prompted the development of tools for assessing perceived functional and relational value of information sharing services (Tai, 2011) and for assessing privacy concerns and levels of information exchange for e-services on the Internet (Dinev & Hart, 2006), both of which were CVR-supported research tools.

In the field of personnel psychology, Lawshe’s methodology has been used in the development of a situational interview to predict service representative applicants’ future job performance (Flint & Haley, 2008), a structured behavioral interview to hire private security personnel (Moscoso & Selgado, 2001), a job performance rating criterion (Distefano, Pryer, & Erffmeyer, 2006), and job termination criteria for assessing mentally ill workers (Mak, Tsang, & Cheung, 2006). Mathews, Smith, Hussey, and Plack (2010) used the CVR to develop an assessment tool to measure participants’ perceptions of the roles, practices, education, and preferred relationship of physical therapists and physical therapist assistants. Finally, Lawshe’s CVR was also used to develop tools for assessing critical factors related to Taiwanese expatriates’ foreign post selection and overseas performance (Cheng & Lin, 2009).

Despite its competitors (e.g., Cohen’s κ, 1960; the Tinsley–Weiss T-Index, Tinsley & Weiss, 1975; James et al.’s r_WG and r_WG(J) indexes, 1993; and Lindell et al.’s r*_WG(J) index, 1999), the Lawshe method has been endorsed in texts on personnel management (Lindell & Brandt, 1999) and for use in nursing research (Polit & Beck, 2006; Polit, Beck, & Owen, 2007). Its tabled critical values have been reproduced in texts such as the Cohen and Swerdlik (2005) text on psychological testing and assessment.

Problems With Schipper’s Table of Critical Values

Though Lawshe’s method has received commendation and has been featured in research studies in multiple disciplines, and is even being used in defense of the content validity of high-stakes tests, it is not without criticism. The main thrust of the criticism has been directed toward three aspects of the table of critical values Lawshe provided for the CVR, a table which Lawshe acknowledges was developed by his colleague, Lowell Schipper.

Schipper’s table of critical values is terse. It only provides critical values for pools of 5, 6, 7, . . ., 14, 15, 20, 25, 30, 35, and 40 SMEs and this only for one alpha level. Although interpolation of missing data points with a

linear function can be accomplished easily, a scatter plot of Schipper’s table reveals that the critical values are curvilinear, making accurate interpolation problematic.

A careful examination of the critical values also reveals an anomaly. The critical value for the CVR increases monotonically from the case of 40 SMEs (CVR_critical = .29) to the case of 9 SMEs (CVR_critical = .78) only to unexpectedly drop at the case of 8 SMEs (CVR_critical = .75) before hitting its ceiling value at the case of 7 SMEs (CVR_critical = .99). When Cohen and Swerdlik (2005) reproduced Schipper’s table in their assessment text, they did not comment on this apparent anomaly. When Wallace, Gregory, Parham, and Baldridge (2003) used the CVR method with nine SMEs to develop and validate family residency recruitment questionnaires, they used a CVR_critical of .75. Whether using a CVR_critical of .75 at N = 9 was an error on their part in reading Schipper’s table or an attempt to adjust for the apparent anomaly at N = 8 is unknown. On reviewing Wallace et al.’s (2003) work, Stelly (2006) observed, “it is possible that the authors reversed the minimum CVRs for 8 and 9 panelists to correct what they perceived to be an error in the original table” (p. 6). The anomaly may also be a function of something as simple as a typographical error which escaped proofreading, or perhaps a typesetter’s error given the fact that in the 1970s, many journals used hand-set type for tables, if not for the whole of the journal.

But the most unsettling problem is that the statistical distribution underlying Lawshe’s table is not specified. In his defining article, Lawshe (1975) stated that the table of critical values for the CVR was calculated by his friend, Lowell Schipper. Unfortunately, he did not describe the basis on which these values were calculated. Lawshe had introduced the CVR at the 1974 Content Validity Conference, a small invitational conference sponsored by the Society for Industrial and Organizational Psychology (SIOP). Another SIOP member, Robert M. Guion (1974) wrote,

  C. H. Lawshe presented a scheme for classifying content validity problems and a “Content Validity Ratio” by which the relevance of a test item or a total test score might be scaled. (Lowell Schipper has subsequently related the CVR to chi square, permitting significance testing). (p. 18)

Apparently not having had access to Guion’s review of this conference, Lindell and Brandt (1999) and Stelly (2006) speculated that the critical values were associated with the binomial distribution.

Purpose of This Investigation

Since the Lawshe method is being used to produce knowledge for diverse disciplines and its possibly flawed tabled values are being disseminated in print and electronic media, correction of the apparent errors in Lawshe’s (1975) presentation of Schipper’s table and extension of the range of tabled values are warranted. The purpose for this study is therefore to explore the CVR’s underlying distribution and to correct and expand the range of its tabled critical values.

Do Schipper’s Critical Values Map to the Binomial Distribution?

Both Lindell and Brandt (1999) and Stelly (2006) speculated that Schipper’s critical values were associated with the binomial distribution, a more precise hypothesis than Guion’s (1974) report of Schipper relating the CVR to chi square. To evaluate the proposition that Schipper’s table of critical values for the CVR was based on the binomial distribution, two approaches were taken: (a) an examination of the cumulative probabilities for sets of independent Bernoulli trials and (b) an examination of the normal approximation for the binomial distribution.

Discrete Binomial Probabilities

To determine whether Schipper based his table of critical values on discrete binomial

probabilities, the cumulative probabilities for sets of independent Bernoulli trials were calculated. Although we expected that this approach would not yield a monotonic progression of values, it seemed important to test this approach given Stelly’s (2006) advocacy for using exact probabilities.

A key parameter in these calculations is the value for p, the probability for any given trial of achieving success. The conventional way of construing the problem would be to view Lawshe’s rating scale as a trichotomy, with the three outcomes being (a) “essential,” (b) “useful, but not essential,” and (c) “not necessary.” From this point of view, the parameter p would be ⅓. However, Lawshe construed the scale as a dichotomy, with the two outcomes being (a) “essential” and (b) “not essential” (with “useful, but not essential” and “not necessary” being combined as the second category) yielding a value for p of ½. For this exploration, both approaches were tried.

For each approach (i.e., dichotomous, trichotomous), a table of critical values based on the discrete binomial was computed using the Microsoft Excel function:

\[ n_{\text{critical}} = \mathrm{CRITBINOM}(N, p, 1 - \alpha), \]

where n_critical is the smallest value for n (the number of SMEs judging the item as “essential”) for which the cumulative binomial distribution is greater than or equal to a criterion value 1 − α, N is the number of Bernoulli trials (the number of SMEs in the pool), and p is the probability of success on each trial. Since CRITBINOM returns the smallest value for n_e (the number of SMEs judging the item as “essential”), CRITBINOM’s output was converted to a value of CVR_critical according to Lawshe’s CVR formula:

\[ \mathrm{CVR}_{\text{critical}} = \frac{n_{\text{critical}} - (N/2)}{N/2}. \]

To obtain a complete table of values, we computed CVR_critical for each N from 5 through 40 in unit steps. We also expanded the table by considering the traditional range of values for alpha. For each alpha level, the significance of the difference between Schipper’s critical values and those computed using CRITBINOM was tested using the nonparametric Wilcoxon signed-rank test for dependent samples to determine at which, if any, of the alpha levels the computed CVR_critical values differed from those attributed to Schipper. Because Schipper’s values achieve a ceiling value of CVR_critical = .99 at a pool size of N = 7, only the calculated values in the range of N = 7, . . ., 40 were tested for departure from Schipper’s values.

Discrete binomial construing SME ratings as a trichotomy. Examination of the proposition that Lawshe’s rating scale should be treated as a trichotomy rather than Lawshe’s favored dichotomy produced a poor fit to Schipper’s critical values for the CVR. With the probability of success set at p = ⅓, for each criterion value for alpha, the distribution of binomial probabilities yielded a pronounced, jagged or “saw-toothed” pattern. The best fit, as evidenced by the mean absolute deviation from Schipper’s values, was found to be at α = .05, two-tailed (or α = .025, one-tailed). The mean absolute departure of the calculated values for CVR_critical from Schipper’s critical values ranged from a minimum difference of .09 at α = .001, one-tailed (or α = .002, two-tailed) to a maximum difference of .56 at α = .10, one-tailed (or α = .20, two-tailed). The Wilcoxon signed-rank test revealed that at all but one of the tested alpha levels, the calculated values for CVR_critical departed significantly from those provided by Schipper (p < .01 for all tests). The only distribution of CVR_critical computed using CRITBINOM with p = ⅓ that was sufficiently close in value to those supplied by Schipper to be considered interchangeable with his table of minimum values was at an extreme alpha level, α = .0005, one-tailed (α = .001, two-tailed). These results are presented in Table 1.

Discrete binomial construing SME ratings as a dichotomy. With the probability of success set at p = ½, for each criterion value for alpha, the distribution of binomial probabilities yielded a less pronounced “saw-toothed” pattern. The mean absolute departure of the calculated values for CVR_critical from Schipper’s critical

values was least at α = .05, two-tailed (or α = .025, one-tailed) and when tested using the Wilcoxon signed-rank test, the calculated values at this alpha level were found to not differ significantly from Schipper’s values. At all other alpha levels, the mean difference was higher (range: .09–.22) and the departure of the calculated values from those proposed by Schipper was significant with the level of significance ranging from p = .05 to p = .01. The results of these tests are presented in Table 1.

Table 1. Comparison of Schipper’s CVR_critical With Three Recalculations Based on the Binomial Distribution at Eight Levels of Significance

                        Discrete binomial,      Discrete binomial,      Normal approximation
Level of significance   trichotomous rating     dichotomous rating      to discrete binomial
One-tailed  Two-tailed  MAD   N    T     p      MAD   N    T     p      MAD   N    T     p
.1          .2          .56   14   0     <.01   .20   13   14    <.05   .20   14   0     <.01
.05         .1          .47   14   0     <.01   .10   14   0     <.01   .10   14   0     <.01
.025        .05         .38   14   0     <.01   .05   13   29.5  ns     .04   14   40    ns
.01         .02         .30   14   0     <.01   .09   13   14    <.05   .09   14   11    <.01
.005        .01         .21   14   0     <.01   .12   13   1     <.01   .15   14   1     <.01
.0025       .005        .14   13   0     <.01   .17   14   1     <.01   .20   14   0     <.01
.001        .002        .09   13   14    <.01   .22   14   0     <.01   .28   14   0     <.01
.0005       .001        .07   13   23    ns     .25   14   0     <.01   .33   14   0     <.01

Note: CVR = content validity ratio; MAD = mean absolute difference from Schipper’s values; N, T, and p are from the Wilcoxon signed-rank test; ns = not significant.

Normal Approximation to the Binomial Distribution

Although the calculation of discrete probabilities yielded values that at α = .05, two-tailed (or α = .025, one-tailed) bracketed in “saw-tooth” fashion those provided by Lawshe, they failed to be monotonic. Calculation of the normal approximation to the discrete binomial calculations would yield a monotonic curve. Assuming that n_e ~ B(N, p) and assuming that p = ½, the normal approximation of the binomial distribution may be expressed as

\[ z = \frac{n_e - Np}{\sqrt{Np(1 - p)}} = \frac{n_e - (N/2)}{\sqrt{N}/2} \sim N(0, 1). \]

According to Box, Hunter, and Hunter (1978, p. 130), for N > 5 the normal approximation is adequate if

\[ \frac{1}{\sqrt{N}} \left| \sqrt{\frac{1 - p}{p}} - \sqrt{\frac{p}{1 - p}} \right| < 0.3. \]

In this case, p = ½, so the assumption above is satisfied.

The task of the CVR is to identify items in an instrument deemed by a critical number of content experts to be “essential.” This task calls for a one-tailed hypothesis test, expressed as

\[ H_0: n_e \le \frac{N}{2} \quad \text{versus} \quad H_1: n_e > \frac{N}{2}, \]

for which the corresponding critical value is

\[ \frac{n_{e,\alpha} - (N/2)}{\sqrt{N}/2} = z_\alpha, \]

where α is a prespecified significance level; or

\[ n_{e,\alpha} = z_\alpha \times \frac{\sqrt{N}}{2} + \frac{N}{2}. \]

Therefore, the critical value for CVR is

\[ \mathrm{CVR}_\alpha = \frac{n_{e,\alpha} - (N/2)}{N/2} = \frac{z_\alpha}{\sqrt{N}}. \]
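The two recalculations described above can be sketched in Python using only the standard library. This is an illustrative sketch, not the authors' Excel workbook: the function names are hypothetical, `crit_binom` mimics CRITBINOM as the text describes it (the smallest count whose cumulative binomial probability reaches the criterion), and the .99 ceiling in the normal-approximation function is an assumption that mirrors the cap visible in the published Table 2.

```python
from math import comb, sqrt
from statistics import NormalDist


def crit_binom(n_trials: int, p: float, criterion: float) -> int:
    """Smallest k with P(X <= k) >= criterion for X ~ Binomial(n_trials, p)."""
    cumulative = 0.0
    for k in range(n_trials + 1):
        cumulative += comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        if cumulative >= criterion:
            return k
    return n_trials  # guard against floating-point shortfall near criterion = 1


def cvr_from_count(n_essential: float, n_panel: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2)."""
    half = n_panel / 2
    return (n_essential - half) / half


def cvr_critical_discrete(n_panel: int, alpha: float, p: float = 0.5) -> float:
    """Discrete binomial route: CRITBINOM(N, p, 1 - alpha) converted to a CVR."""
    return cvr_from_count(crit_binom(n_panel, p, 1 - alpha), n_panel)


def cvr_critical_normal(n_panel: int, alpha: float, ceiling: float = 0.99) -> float:
    """Normal approximation with p = 1/2: CVR_alpha = z_alpha / sqrt(N).

    alpha is the one-tailed level; the ceiling is an assumed cap (see lead-in).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return min(z_alpha / sqrt(n_panel), ceiling)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n,
              round(cvr_critical_discrete(n, 0.025), 3),
              round(cvr_critical_normal(n, 0.025), 3))
```

Because the discrete route can only return CVR values of the form (2k − N)/N, it moves in steps as N changes, whereas z_α/√N declines smoothly with N; this is the contrast between the "saw-toothed" and monotonic curves that the text reports.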

As above, to obtain a complete table of values, we computed CVR_critical for each N from 5 through 40 in unit steps and expanded the table by considering the traditional range of values for alpha. The departure of the computed CVR_critical values from those of Schipper’s was again tested using the nonparametric Wilcoxon signed-rank test for dependent samples. The recalculated table of critical values is presented in Table 1.

For each level of alpha, the distribution of binomial probabilities yielded a smooth, monotonic curve. As was noted earlier, because Schipper’s values achieve a ceiling value of CVR_critical = .99 at a pool size of N = 7, only the calculated values in the range of N = 7, . . ., 40 were analyzed. The mean absolute departure of the calculated values for CVR_critical from Schipper’s critical values ranged from a maximum difference of .28 at α = .001, one-tailed (or α = .002, two-tailed) to a minimum difference of .04 at α = .025, one-tailed (or at α = .05, two-tailed). When tested using the Wilcoxon signed-rank test, the calculated values at α = .025, one-tailed (or at α = .05, two-tailed) were found to not differ significantly from Schipper’s values. At all other alpha levels, the mean difference was higher (range: .09–.28) and the departure of the calculated values from those proposed by Schipper was significant with the level of significance of p < .01. For small pools of judges (N = 5, . . ., 10), values computed for CVR_critical at α = .05, two-tailed (or α = .025, one-tailed) were more liberal while at increasingly larger pools of judges (N = 20, . . ., 40), the values computed for CVR_critical at α = .05, two-tailed (or α = .025, one-tailed) were slightly more conservative. The results of these tests are presented in Table 1 and the complete table of recalculated values based on the normal approximation to the binomial is presented as Table 2. A graph illustrating Schipper’s values for CVR_critical and curves showing the values for CVR_critical calculated by the normal approximation to the binomial with the scaling construed as a dichotomy and as discrete binomial probabilities with the scaling construed both as a trichotomy and as a dichotomy at α = .05, two-tailed (or α = .025, one-tailed) is presented in Figure 1.

Discussion

The questions raised about Schipper’s table of critical values for Lawshe’s CVR focus on four issues: (a) How did Schipper compute the table? (b) Was Lawshe correct in labeling Schipper’s table of critical values as representing a test at α = .05, one-tailed? (c) Why does Schipper’s table contain an anomaly? (d) If errors were made, what are the likely consequences of the errors?

How Did Schipper Compute His Table of Critical Values for CVR?

It appears that Schipper did not compute discrete binomial probabilities. It appears more likely that he used the normal approximation for computing binomial probabilities to create his table. Although the curve produced by calculating the normal approximation to the binomial does not provide an exact fit to Schipper’s values, the curve produced at α = .05, two-tailed (or α = .025, one-tailed) is a very close approximation. Values calculated at all other alpha levels result in larger mean absolute discrepancy. The Wilcoxon test found significant discrepancy between the calculated values and Schipper’s values for all alpha levels tested except at α = .05, two-tailed (or α = .025, one-tailed).

Does Schipper’s Table Provide a Test at α = .05, One-Tailed?

It also appears that Lawshe was in error in labeling Schipper’s table as providing a test for CVR_critical at α = .05, one-tailed. As noted above, although the curve produced by calculating the normal approximation to the binomial does not fit the full range of Schipper’s data exactly, the values produced at α = .05, two-tailed (or α = .025, one-tailed) provide a very close fit. A quantitative methods specialist with 50 or more years in the profession observed that in those early years, many

quantitative analysts ran two-tailed tests even when the hypothesis under test was directional (D. A. Schumsky, personal communication, June 10, 2011). Perhaps Schipper, the statistician, produced a table of values at α = .05, two-tailed out of habit and Lawshe, the theoretician and applied personnel psychologist did not realize that such was the case.

Figure 1. Comparison of Schipper’s values of CVR_critical with critical value results from three recalculations: the discrete binomial (p = ⅓), the discrete binomial (p = ½), and the normal approximation to the binomial

Why Does Schipper’s Table Contain an Anomaly?

Although this was the question which initiated this project, it may go unanswered. We had hoped that we would be able to reproduce Schipper’s values exactly (except, of course, the anomalous value). We would have then

been able to replace the anomalous value with a correct one. Several speculations have
arisen. The anomaly could have arisen from a typographical error, a failure in proofreading.
Another possibility, since tables in older journals were often set by hand, is that the
anomaly was a result of interchanging the two lines of type containing the critical values
for N = 8 and N = 9. Finally, since the values computed using the normal approximation to
the binomial fit well with Schipper’s values until SME pool sizes fall below 10, there may
be more than a single anomaly in Schipper’s table. If one presumes there is a single anomaly
at N = 9, the anomaly could have been the result of a single calculation error. With such a
small number of values to compute, Schipper may have computed the values longhand or with
the aid of a calculator and may have simply made a mistake in calculating the value for
CVRcritical at N = 9. However, Schipper’s value for CVRcritical at N = 7 is also very
different from that which is produced using the normal approximation to the binomial. In
Schipper’s table, the CVRcritical at N = 7, 6, and 5 was set at a ceiling value of .99. One
possibility is that these ceiling values were not calculated but were inserted, a priori, as
a statement that at such small sample sizes, only perfect agreement among the SMEs that the
item under scrutiny was “essential” could be accepted safely. In his 1975 article, Lawshe
provided no discussion about the construction of the table. It is unfortunate that no
question was raised about the anomalous value or values in Schipper’s table before his death
in 1984 and that there was apparently no contact between Lindell and Brandt, who published
their review of methods for quantifying content validity judgments in the same year that
Lawshe died (1999), to enquire about Schipper’s table.

Table 2. Critical Values for Lawshe’s (1975) Content Validity Ratio (CVRcritical)

        Level of Significance for One-Tailed Test
        .1      .05     .025    .01     .005    .001
        Level of Significance for Two-Tailed Test
 N      .2      .1      .05     .02     .01     .002
 5      .573    .736    .877    .99     .99     .99
 6      .523    .672    .800    .950    .99     .99
 7      .485    .622    .741    .879    .974    .99
 8      .453    .582    .693    .822    .911    .99
 9      .427    .548    .653    .775    .859    .99
10      .405    .520    .620    .736    .815    .977
11      .387    .496    .591    .701    .777    .932
12      .370    .475    .566    .671    .744    .892
13      .356    .456    .544    .645    .714    .857
14      .343    .440    .524    .622    .688    .826
15      .331    .425    .506    .601    .665    .798
16      .321    .411    .490    .582    .644    .773
17      .311    .399    .475    .564    .625    .750
18      .302    .388    .462    .548    .607    .729
19      .294    .377    .450    .534    .591    .709
20      .287    .368    .438    .520    .576    .691
21      .280    .359    .428    .508    .562    .675
22      .273    .351    .418    .496    .549    .659
23      .267    .343    .409    .485    .537    .645
24      .262    .336    .400    .475    .526    .631
25      .256    .329    .392    .465    .515    .618
26      .251    .323    .384    .456    .505    .606
27      .247    .317    .377    .448    .496    .595
28      .242    .311    .370    .440    .487    .584
29      .238    .305    .364    .432    .478    .574
30      .234    .300    .358    .425    .470    .564
31      .230    .295    .352    .418    .463    .555
32      .227    .291    .346    .411    .455    .546
33      .223    .286    .341    .405    .448    .538
34      .220    .282    .336    .399    .442    .530
35      .217    .278    .331    .393    .435    .522
36      .214    .274    .327    .388    .429    .515
37      .211    .270    .322    .382    .423    .508
38      .208    .267    .318    .377    .418    .501
39      .205    .263    .314    .372    .412    .495
40      .203    .260    .310    .368    .407    .489

Note: Values for CVRcritical greater than or equal to the limit value of 1.00 were set to .99.

What Are the Consequences of Lawshe’s and Perhaps Schipper’s Apparent Errors?

The apparent mislabeling of the alpha level for Schipper’s table and the presence of one
or more anomalies in the table suggest that the table may lead to erroneous decisions by
the researcher. Given that Lawshe’s method has been used widely, even in high-stakes
testing situations, the consequences could be potentially harmful. Fortunately, both
apparent errors are errors on the side of stringency:

• Consequence of the apparent mislabeling of the table’s alpha level. Lawshe’s labeling
  of Schipper’s table as representing a test at α = .05, one-tailed is an error in the
  conservative direction. Since Schipper’s table of critical values appears to represent
  a test at α = .05, two-tailed (or α = .025, one-tailed), an item’s content validity,
  rated at a level beyond chance expectation at a true α = .05, one-tailed would be
  rejected according to Schipper’s values for CVRcritical. This is an error in the
  conservative direction.
• Consequence of the apparent anomaly or anomalies in Schipper’s table. Compared with
  the values calculated for the normal approximation to the binomial, Schipper’s value
  for the CVRcritical of .78 at N = 9 is a much more stringent criterion than the value
  of .653 computed at that pool size for the normal approximation to the binomial at
  α = .05, two-tailed (or α = .025, one-tailed). While Schipper’s apparently anomalous
  value of .75 is closer to the normal approximation value of .693 at N = 8, his value
  of .99 at N = 7 is also much more stringent than the value of .741 computed for the
  normal approximation to the binomial.

With small pools of SMEs, a test author who used Schipper’s table for setting the
criterion for item inclusion would have little reason to worry about whether an item with
low content validity had been included in the test. Both errors (i.e., the anomalous value
or values and the apparent mislabeling of the table) lead to increasing the stringency of
the criterion for item inclusion. Since Lawshe’s CVR has been used to produce high-stakes
employment tests, erring in the conservative direction offers greater safety from
allegations that the test contained items not judged to be “essential” for job performance.
Given the consequences of using an invalid test in high-stakes testing, if an error was to
be made, an error in the conservative direction is the better of the two possible errors.

Conclusions

Lowell Schipper’s table of critical values for Charles Lawshe’s CVR, which Lawshe
described as representing a test at α = .05, one-tailed, was examined. Evidence showed that
it had one or more anomalous values for CVRcritical. A review of literature failed to shed
light on the method used by Schipper in calculating the table. Trial tables of critical
values were computed using both discrete calculation and normal approximations to the
binomial distribution. Schipper’s values mapped convincingly to the normal approximation of
the binomial at α = .05, two-tailed (or α = .025, one-tailed) suggesting that Lawshe may
have mislabeled the alpha level for Schipper’s table: rather than being a table of values
for α = .05, one-tailed, it is likely that it is a table of values for α = .05, two-tailed.
This finding suggests that, at small SME pool sizes, Schipper’s values for CVRcritical
represent a more conservative criterion for item inclusion than may be warranted.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or
publication of this article.

References

Allen, M. J., & Yen, W. M. (2002). Introduction to measurement theory (2nd ed.). Prospect Heights, IL: Waveland Press.
Amsberg, S., Wredling, R., Lins, P., Adamson, U., & Johansson, U. (2008). The psychometric
properties of the Swedish version of the problem areas in diabetes scale (swe-PAID-20): Scale development. International Journal of Nursing Studies, 45, 1319–1328. doi:10.1016/j.ijnurstu.2007.09.010
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York, NY: Prentice Hall.
Basham, A., & Sedlacek, W. E. (2009). Validity. In American Counseling Association (Ed.), The ACA encyclopedia of counseling (p. 557). Alexandria, VA: American Counseling Association.
Bjorvell, C., Thorell-Ekstrand, I., & Wredling, R. (2000). Development of an audit instrument for nursing care plans in the patient record. Quality Health Care, 9, 6–13. doi:10.1136/qhc.9.1.6
Box, G. E. P., Hunter, W. G., & Hunter, J. S. (1978). Statistics for experimenters: An introduction to design, data analysis, and model building. New York, NY: Wiley.
Chang, H.-L., & Yang, C.-H. (2008a). Do airline self-service check-in kiosks meet the needs of passengers? Tourism Management, 29, 980–993. doi:10.1016/j.tourman.2007.12.002
Chang, H.-L., & Yang, C.-H. (2008b). Explore airlines’ brand niches through measuring passengers’ repurchase motivation: An application of Rasch measurement. Journal of Air Transport Management, 14, 105–112. doi:10.1016/j.jairtraman.2008.02.004
Cheng, H.-L., & Lin, C. Y. Y. (2009). Do as the large enterprises do? Expatriate selection and overseas performance in emerging markets: The case of Taiwan SMEs. International Business Review, 18, 60–75. doi:10.1016/j.ibusrev.2008.12.002
Choudrie, J., Dwivedi, Y. K., & Brinkman, W. (2006). Development of a survey instrument to examine consumer adoption of broadband. Industrial Management & Data Systems, 106, 700–718. doi:10.1108/02635570610666458
Cohen, R. J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment. New York, NY: McGraw-Hill.
Collard, E. F. N. (1992). The impact of deming quality management on interdepartmental cooperation (Unpublished doctoral dissertation). University of Minnesota, Minneapolis.
Corty, E. W., Althof, S. E., & Wieder, M. (2011). Measuring women’s satisfaction with treatment for sexual dysfunction: Development and initial validation of the Women’s Inventory of Treatment Satisfaction (WITS-9). Journal of Sexual Medicine, 8, 148–157. doi:10.1111/j.1743-6109.2010.01977.x
Dinev, T., & Hart, P. (2006). Privacy concerns and levels of information exchange: An empirical investigation of intended e-services use. e-Service Journal, 4(3), 25–60. doi:10.2979/ESJ.2006.4.3.25
Distefano, M. K., Pryer, M. W., & Craig, S. H. (2006). Job-relatedness of a posttraining job knowledge criterion used to assess validity and test fairness. Personnel Psychology, 33, 785–793.
Distefano, M. K., Pryer, M. W., & Erffmeyer, R. C. (2006). Application of content validity methods to the development of a job-related performance rating criterion. Personnel Psychology, 36, 621–631.
Drossos, D. A., & Fouskas, K. G. (2010). The role of industry perceptions in competitive responses. Industrial Management & Data Systems, 110, 477–494. doi:10.1108/02635571011038981
Erdem, M. (2009). Effects of learning style profile of team on quality of materials developed in collaborative learning processes. Active Learning in Higher Education, 10, 154–171. doi:10.1177/1469787409104902
Fischer, R. G., & Fischer, J. M. (2007). The development of an emotional response to literature measure: The affective response to literature survey. Alberta Journal of Educational Research, 52, 265–276.
Fitzpatrick, A. R. (1983). The meaning of content validity. Applied Psychological Measurement, 7, 3–13.
Flint, D., & Haley, L. (2008). Content oriented development of a situational interview. The Business Review, Cambridge, 10(2), 21.
Ford, J. K., & Wroten, S. P. (2006). Introducing new methods for conducting training evaluation and for linking training evaluation to program redesign. Personnel Psychology, 37, 651–665.
Gail-Hinckley Heitzer, J., McKenzie, J. F., Amschler, D. H., & Bock, W. (2009). A descriptive analysis of patient education courses in undergraduate and graduate health education programs. Health Promotion Practice, 10, 244–253. doi:10.1177/1524839907309046
Ghazanfari, Z., Niknami, S., Ghofranipour, F., Hajizadeh, E., & Montazeri, A. (2010). Development and psychometric properties of a belief-based physical activity questionnaire for diabetic patients (PAQ-DP). BMC Medical Research Methodology, 10, 104.
Guion, R. M. (1974). Content validity conference. The Industrial-Organizational Psychologist (Newsletter), 12(1), 18.
Hall, T., Krahn, G. L., Horner-Johnson, W., & Lamb, G. (2011). Examining functional content in widely used health-related quality of life scales. Rehabilitation Psychology, 56, 94–99. doi:10.1037/a0023054
Hanc, T., & Brzezinska, A. I. (2009). Intensity of ADHD symptoms and subjective feelings of competence in school age children. School Psychology International, 30, 491–506. doi:10.1177/0143034309107068
Hayes, J. A. (2004). The inner world of the psychotherapist: A program of research on countertransference. Psychotherapy Research, 14, 21–36. doi:10.1093/ptr/kph002
Henard, D. H., Freling, T. H., & Crosno, J. L. (2011). Brand personality appeal: Conceptualization and empirical validation. Journal of the Academy of Marketing Science, 39, 392–406. doi:10.1007/s11747-010-0208-3
Huang, J., Zhao, C., & Li, J. (2007). An empirical study on critical success factors for electronic commerce in the Chinese publishing industry. Frontiers of Business Research in China, 1, 50–66. doi:10.1007/s11782-007-0004-1
Huang, S.-M., Hung, Y.-C., Chen, H.-G., & Ku, C.-Y. (2004). Transplanting the best practice for implementation of an ERP system: A structured inductive study of an international company. Journal of Computer Information Systems, 44(4), 101–110.
Hung, Y., Huang, S., & Yen, D. C. (2004). A study on decision factors in adopting an online stock trading system by brokers in Taiwan. Decision Support Systems, 40, 315–328. doi:10.1016/j.dss.2004.02.004
James, L. R., Demaree, R. G., & Wolf, G. (1993). Rwg: An assessment of within-group interrater agreement. Journal of Applied Psychology, 78, 306–309.
Kerlinger, F. N. (1986). Foundations of behavioral research (3rd ed.). New York, NY: Holt, Rinehart, & Winston.
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563–575.
Lindell, M. K., & Brandt, C. J. (1999). Assessing interrater agreement on the job relevance of a test: A comparison of the CVI, T, rwg(j), and r*wg(j) indexes. Journal of Applied Psychology, 84, 640–647.
Lindell, M. K., Brandt, C. J., & Whitney, D. J. (1999). A revised index of interrater agreement for multi-item ratings of a single target. Applied Psychological Measurement, 23, 127–135.
Mak, D. C. S., Tsang, H. W. H., & Cheung, L. C. C. (2006). Job termination among individuals with severe mental illness participating in a supported employment program. Psychiatry, 69, 239–248.
Maples, P., Franks, A., Ray, S., Stevens, A. B., & Wallace, L. S. (2010). Development and validation of a low-literacy chronic obstructive pulmonary disease knowledge questionnaire (COPD-Q). Patient Education and Counseling, 81, 19–22. doi:10.1016/j.pec.2010.11.020
Mathews, H., Smith, S., Hussey, J., & Plack, M. M. (2010). Investigation of the preferred PT-PTA relationship in a 2:2 clinical education model. Journal of Physical Therapy Education, 24(3), 50–61.
Moscoso, S., & Selgado, J. F. (2001). Psychometric properties of a structured interview to hire private security personnel. Journal of Business and Psychology, 16, 51–59.
Namisango, E., Katabira, E., Karamagi, C., & Baguma, P. (2007). Validation of the Missoula-Vitas Quality-of-Life Index among patients with advanced AIDS in urban Kampala, Uganda. Journal of Pain and Symptom Management, 33, 189–202. doi:10.1016/j.jpainsymman.2006.11.001
Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what’s being reported? Research in Nursing & Health, 29, 489–497.
Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health, 30, 459–467.
Reynolds, C. R., Livingston, R. B., & Willson, V. (2009). Measurement and assessment in education (2nd ed.). Upper Saddle River, NJ: Pearson.
Rungtusanatham, M. (1998). Let’s not overlook content validity. Decision Line, 29, 10–13.
Saramma, P. P., & Thomas, S. V. (2010). Child rearing knowledge and practice scales for women with epilepsy. Annals of Indian Academy of Neurology, 13, 171–179. doi:10.4103/0972-2327.70877
Sendjaya, S., Sarros, J. C., & Santora, J. C. (2008). Defining and measuring servant leadership behaviour in organizations. Journal of Management Studies, 45, 402–424. doi:10.1111/j.1467-6486.2007.00761.x
Stausmire, J. M. (2011). Interdisciplinary development of an adult intubation procedural checklist. Family Medicine, 43, 272–274.
Stelly, D. J. (2006, May). An explication of statistical significance testing applied to minimum content validity ratio (CVR) values. 2006 Society for Industrial and Organizational Psychology Conference, Dallas, TX.
Tai, Y. (2011). Perceived value for customers in information sharing services. Industrial Management & Data Systems, 111, 551–569. doi:10.1108/02635571111133542
Tai, Y., & Ho, C. (2010). Effects of information sharing on customer relationship intention. Industrial Management & Data Systems, 110, 1385–1401. doi:10.1108/02635571011087446
Tinsley, H. E. A., & Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgments. Journal of Counseling Psychology, 22, 358–376.
Tsang, H. W., & Wong, A. (2005). Development and validation of the Chinese version of Indiana Job Satisfaction Scale (CV-IJSS) for people with mental illness. International Journal of Social Psychiatry, 51, 177–191. doi:10.1177/0020764005056766
Wallace, L. S., Gregory, H. B., Parham, J. S., & Baldridge, R. E. (2003). Development and content validation of family practice residency recruitment questionnaires. Family Medicine, 35, 496–498.
Wei, C.-C. (2008). Evaluating the performance of an ERP system based on the knowledge of ERP implementation objectives. International Journal of Advanced Manufacturing Technology, 39, 168–181. doi:10.1007/s00170-007-1189-3
Yu, S., Ng, C. S., Chang, S., Chang, I., & Yen, D. C. (2011). An ERP system performance assessment model development based on the balanced scorecard approach. Information Systems Frontiers, 13, 429–450. doi:10.1007/s10796-009-9225-5
Zite, N. B., & Wallace, L. S. (2007). Development and validation of a medicaid postpartum tubal sterilization knowledge questionnaire. Contraception, 76, 287–291. doi:10.1016/j.contraception.2007.06.012

Bios

F. Robert Wilson, PhD, is an emeritus professor of counseling of the University of
Cincinnati with 35 years as a counselor educator. He completed doctoral studies at Michigan
State University and post-graduate studies at the Cincinnati Gestalt Institute. His research
interests include quantitative methods in counseling research, counselor education and
supervision, and individual and group treatment of mental illness. He provides mental health
counseling for indigent and homeless individuals with chronic mental illness.

Wei Pan, PhD, is an associate professor of quantitative research methodology at the
University of Cincinnati. He received his doctorate in measurement and quantitative methods
from Michigan State University in 2001 and his master’s degree in mathematical statistics
from Fuzhou University, China, in 1989. His research interests include causal inference,
advanced statistical modeling, meta-analysis, and their applications in the social,
behavioral, and health sciences.

Donald A. Schumsky, PhD, is an emeritus professor of psychology of the University of
Cincinnati following a 45 year (42 at University of Cincinnati) career in teaching and
research. His research interests include quantitative methods in psychological science,
learning, motor skills, and cognition.
