From the 1996 Bar Examiner
THE BAR EXAMINER
Volume 65, Number 3, August 1996
THE COSTS AND BENEFITS OF
PERFORMANCE TESTING
ON THE BAR EXAMINATION
by Stephen P. Klein
The National Conference of Bar Examiners will begin offering
the Multistate Performance Test in February 1997. This
article reviews how performance test (PT) problems differ
from typical bar exam essay or multiple-choice questions.
I also discuss what we know about the technical quality and
other characteristics of PT problems from over fifteen years
of research with this type of measure. Taken together, the
findings from these studies show that inserting one or two
PT problems into a state’s bar exam will most likely improve
the overall quality of that exam. However, jurisdictions
should not rely on PT problems alone to make pass/fail
decisions. A combination of multiple-choice (MBE), essay,
and PT questions is the best way to go. States also should be
careful about how much weight they give the PT in
determining an applicant’s pass/fail status. This article
summarizes the basis for these conclusions and provides
references to the technical studies that support them.
How Are PT Problems Different?
MBE questions generally ask applicants to select the
choice that corresponds to the most appropriate [...]
for resolving a case. Essay questions normally ask applicants
to present a balanced analysis of a case situation,
not unlike a micro version of an appeals court decision.
Both of these question types usually present all of the
relevant facts in one or two paragraphs, unessential or
ambiguous information is avoided, and applicants must
draw heavily on their knowledge of the law to respond
appropriately. While questions posed in a multiple-choice
or essay format assess important legal skills, such
as analytical ability, it is unlikely that a newly licensed
attorney would be presented with a multiple-choice or
essay type task in practice.
In contrast, PT problems ask candidates to carry out the
kinds of activities lawyers actually perform, such as
drafting a brief in support of a motion. PT problems
require applicants to read and analyze a wide array of
documents with which attorneys normally work, such as
statutes, regulations, legal opinions, transcripts of
depositions, briefs, police reports, news articles, interview
notes, investigator reports, and internal memos. Applicants
must sift through several pages of these materials to
identify the information that is salient to their assigned
task and then prioritize and organize that information in
preparing their response. Unlike the typical MBE or essay
question, a PT problem includes documents that contain
the specific laws that are needed to respond. In this sense,
PT problems are more like an “open book” exam. As in
practice, information sources for PT problems (such as
witness accounts of events) may be unreliable or biased;
facts are sometimes ambiguous, incomplete, or even
conflicting; and applicants have to recognize these problems.
In sum, PT problems emphasize the day-to-day
practical skills attorneys need to function effectively, such
as the ability to integrate facts and the law, to present a
coherent and persuasive legal argument, or to plan an
appropriate course of action (MacCrate et al., 1992;
O'Hara & Klein, 1981). In comparison to MBE and essay
questions, PT tasks place far more weight on these kinds
of skills than they do on specific legal knowledge. Legal
analysis and reasoning do, of course, play important roles
in responding to MBE and essay questions. Thus, the
major underlying distinctions among these different
types of test formats are the emphasis they place on legal
knowledge versus other job-related skills and how closely
they simulate the actual practice experience.
Summary of Research on PT Tasks
Alaska and California have included PT problems as a
regular part of their bar exams for over ten years. Colorado,
Georgia, Hawaii, New Mexico, Puerto Rico, and
Virginia also have used this form of testing on an operational
or experimental basis. The experiences in these
jurisdictions provide important insights into the technical
quality and other features of PT tasks. These characteristics
are discussed below in terms of how they relate
to five major indicators of test quality, namely: inter-reader
consistency, score reliability, validity, fairness, and
cost effectiveness.
Inter-reader Consistency
Inter-reader consistency refers to the extent to which
different qualified readers assign the same score to a given
response. Highly consistent readers would rank the quality
of different answers the same way and one reader's
mean score would not be substantially higher than some
other reader's mean on a set of answers they grade in
common. If there is little or no agreement among readers
in their judgments about the relative quality of the
answers, then the scores they assign simply reflect each
reader's idiosyncratic views regarding answer quality and
the test should not be used for making pass/fail decisions
about an applicant. Reasonably high inter-reader agreement
is therefore an essential ingredient of test quality.
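The two aspects of reader consistency described above (ranking answers the same way, and having similar mean scores) can be illustrated with a short sketch. The function names and the scores are hypothetical and are not taken from any of the studies cited; the Pearson correlation is used here only as one common index of agreement on relative quality.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def reader_agreement(reader_a, reader_b):
    """Summarize agreement on a set of answers graded in common.

    correlation     -> do the readers order the answers the same way?
    mean_difference -> is one reader systematically more lenient?
    """
    return {
        "correlation": pearson(reader_a, reader_b),
        "mean_difference": mean(reader_a) - mean(reader_b),
    }


# Hypothetical scores: reader B ranks the six answers exactly as
# reader A does, but awards five more points to every answer.
scores_a = [55, 60, 65, 70, 75, 80]
scores_b = [60, 65, 70, 75, 80, 85]
result = reader_agreement(scores_a, scores_b)
```

In this made-up case the readers agree perfectly on relative quality, yet one is consistently more lenient, which is why both indices are worth examining.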
The major finding regarding this criterion is that
experienced PT readers are just as consistent with each
other as are experienced essay readers (Klein, 1991, 1994).
With both types of measures, there is an adequate degree
of agreement among readers provided they receive appropriate
training and supervision in the use of the scoring
guidelines for their question. For example, inter-reader
agreement on a PT task was much higher in Alaska
(where the graders are accustomed to evaluating PT
answers and there is extensive reader training) than it was
on this same task in jurisdictions which were using PT
problems for the first time. The factors that contribute
to high inter-reader agreement on essay questions (Klein,
1996) also apply to PT readers. However, if PT answers
are longer than the typical essay answer, training also may
take longer and there may be a greater advantage in using
an analytic or semi-analytic grading scale.
Score Reliability and Weighting the PT Questions
A test normally consists of questions that are sampled
from some theoretical universe of questions that could
have been asked. Because of time and costs, we cannot ask
all the questions in the universe. That is why we rely on
a sample. Nevertheless, we want to use the results in the
sample to make an inference about how well an applicant
would have performed if that applicant had answered all
the questions in the universe. Score reliability indicates
the confidence we can place in this inference. The higher
the reliability, the more certain we are that the results with
the sample of questions that were asked are indicative of
how well the applicants would have performed if they
had taken all the questions that could have been asked.
Thus, score reliability is analogous to the margin of error
in an exit poll where we estimate how all voters voted on
election day based on how a small sample of them voted.
The higher the score reliability, the smaller the margin of
error in making an inference from the sample (i.e., the
test results) to the universe.
Another way of thinking about score reliability is
that it indicates the consistency with which different
versions of the bar exam would make the same pass/fail
decision about an applicant. In this context, reliability
refers to the probability that a given applicant's pass/fail
status would remain the same regardless of which set of
MBE and essay questions were asked; e.g., the July 1997
version of the test instead of the February 1997 version.
Several factors influence score reliability and thereby
the consistency of pass/fail decisions (Klein, 1993). One
of the most important of these factors is the number of
questions asked. All other things being equal, the larger
the number of questions (i.e., the larger the sample), the
higher the reliability. However, this is a case of diminishing
returns. Going from three to four questions produces
a greater increase in reliability than does going from four
to five questions, but the five-question test produces a
more reliable score than the four-question test. Consequently,
in most jurisdictions, adding one or two PT
tasks to the state's existing essay test will produce a small
but noticeable improvement in the reliability of the [...]
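The diminishing returns described above can be illustrated with the Spearman-Brown formula, a standard psychometric model that the article itself does not name; it assumes all questions are equally good, interchangeable measures. The single-question reliability of 0.20 is a hypothetical value chosen only for illustration.

```python
def spearman_brown(single_question_reliability, n_questions):
    """Projected reliability of a test built from n comparable questions."""
    r = single_question_reliability
    return n_questions * r / (1 + (n_questions - 1) * r)


# Hypothetical reliability of a one-question test (an assumption,
# not a figure from the article).
R1 = 0.20

reliability = {n: spearman_brown(R1, n) for n in (3, 4, 5)}
gain_3_to_4 = reliability[4] - reliability[3]
gain_4_to_5 = reliability[5] - reliability[4]

for n, value in reliability.items():
    print(f"{n} questions: projected reliability {value:.3f}")
print(f"gain from 3 to 4 questions: {gain_3_to_4:.3f}")
print(f"gain from 4 to 5 questions: {gain_4_to_5:.3f}")
```

Under these assumptions each added question still raises reliability, but by a smaller amount than the one before it.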
Adding PT questions to a state's bar exam will
normally improve score reliability provided each PT task
carries no more than twice as much weight as a standard
essay question in determining an applicant’s total written
(essay + PT) score. This is true even if a PT task takes
three or more times longer to answer than a state's typical
essay question. This is an especially important rule of
thumb if the written section has less than six to eight [...]
weight than the MBE in determining an applicant’s
pass/fail status, and/or if less than 85 percent of the
applicants pass the exam (score reliability usually is not
a major concern when there is a very high passing rate).
Some of the states that are considering including PT
questions on their exam are unable to lengthen the total
amount of testing time. Thus, in these jurisdictions,
adding one or more PT questions will require eliminating
one or more essay questions. My statistical modeling of
these trade-offs suggests that in most jurisdictions,
replacing two or even three essay questions with one PT
problem will have little effect on the overall consistency
with which pass/fail decisions are made. Again, the size
and direction of this effect will depend on several factors,
including the number of essay questions that remain, the
average correlation among these questions, the correlation
between PT and regular essay questions, the correlation
between the MBE and written portion of the exam,
the relative weights of these sections in determining an
applicant's total bar exam score, and the passing rate. All
of these factors make a difference.
Validity
If score reliability was the only criterion of test quality
that had to be satisfied, the bar exam would consist
entirely of multiple-choice questions (as noted below,
they produce more reliable scores per hour of testing
and they cost less to achieve a given level of reliability than
any other type of question, which is why they are so
prominent in large-scale testing programs). However,
there are many critical, job-relevant skills that cannot be
tested with multiple-choice questions, such as the ability
to identify issues in a case and express ideas in writing.
That is why all bar exams include essay questions.
Similarly, there are many skills lawyers use in practice
that cannot be measured or measured well with either
multiple-choice or essay questions, but can be assessed
with PTs. For example, a twelve-member “content validity”
panel of attorneys found that four prototype PTs
developed by NCBE did a good job of assessing certain
skills such as extracting rules of law from
authority, recognizing the precise points of law at issue,
recognizing the specific facts necessary to resolve legal
questions, and applying legal rules to demonstrate how
they determine the result sought; “fact analysis” skills
such as identifying the relevant facts and breaking down
legal rules into components and connecting them to the
facts; and “problem solving” skills such as identifying
factual and legal obstacles and solutions
to a client's objectives and
priorities. In the judgment of this
panel, the PTs also did a better job
in assessing some of these skills than
did the typical essay question
(ACT, 1994).
There have been several separate,
independent surveys of applicant
opinions regarding PTs. All of
these surveys report that applicants judge PTs to be a
significantly better measure of their ability to perform as
an attorney than either multiple-choice or essay testing.
The applicants’ opinions appear to be confirmed by
empirical data. Analyses of California, Georgia, and Virginia
data show that attorneys with four or more years of
practice experience score higher on the PT section than
would be expected on the basis of their scores on the rest
of the exam (Klein, 199[?]). In short, after holding other
factors constant, attorneys with practical experience do
better on the PT.
PTs also appear to be sensitive to the effects of legal
education. For example, one study (Klein, 1988) found
that students who are just entering one of four well-known
California law schools (the “novices”) earned
much lower PT scores than comparable graduates from
these same schools. In fact, none of the novices earned a
passing score on any PT problem, nor did any novice
score higher than any graduate from their school.
Candidates who earn relatively high PT scores also
tend to earn relatively high MBE and essay scores. After
controlling for the differences in reliability between MBE
and essay scores, PT scores are more closely associated
with the essay section than with the MBE. In other words,
PT problems are more aligned with that portion of the
exam which virtually all bar examiners consider to be the
most important indicator of applicant ability. However,
even with the controls for reliability,
the correlation between essay
and PT scores is far from perfect.
Moreover, the correlation between
two PT problems is generally
higher than their correlations with
a typical essay question. In short,
the applicants who have the abilities
needed to do well on the essay section
usually have the skills that are
needed for success on the PT, but it is evident that a PT
task is not just another essay question. This is analogous
to the relationship between MBE and essay scores:
applicants who receive high MBE scores also tend to
receive high essay scores, but the correlation is far from
perfect.
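The article does not say how the differences in reliability were controlled. One classical approach, assumed here purely for illustration, is the correction for attenuation, which divides an observed correlation by the square root of the product of the two reliabilities. All of the correlations and reliabilities below are hypothetical.

```python
from math import sqrt


def disattenuate(observed_r, reliability_x, reliability_y):
    """Classical correction for attenuation.

    Estimates what the correlation between two measures would be
    if both were measured without error.
    """
    return observed_r / sqrt(reliability_x * reliability_y)


# Hypothetical values (assumptions, not results from the article):
# the PT correlates 0.45 with both the essay and the MBE sections,
# but the essay section is less reliable than the MBE.
PT_REL, ESSAY_REL, MBE_REL = 0.50, 0.70, 0.90

pt_essay = disattenuate(0.45, PT_REL, ESSAY_REL)
pt_mbe = disattenuate(0.45, PT_REL, MBE_REL)

print(f"PT vs essay, corrected: {pt_essay:.2f}")
print(f"PT vs MBE, corrected:   {pt_mbe:.2f}")
```

With equal observed correlations, the corrected PT-essay value comes out higher than the corrected PT-MBE value while still falling short of 1.0, which is the kind of pattern the paragraph above describes.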
Fairness
It appears that taking one PT problem enables applicants
to do better on the next one. In the ACT study, for example,
the applicants who took PT A and then B did relatively
better on B whereas those who took B then A did
relatively better on A. These practice effects probably
stemmed from applicants learning how to budget their
time better.
Applicants taking PT type problems for the first
time often say that they cannot finish in the time allotted.
However, applicants frequently say the same thing about
the MBE and essay sections. In all likelihood, these
concerns will dissipate as PT problems become an operational
part of the exam. Applicants will certainly have
ample opportunity to practice on previously used problems
(all PT problems are generally released following
the exam).
PT problems are designed to assess skills rather than
content knowledge. Consequently, the tasks deal with
topics that either all applicants should know about or
topics that few if any applicants know about (such as
maritime shipping rules). However,
with both types of tasks, applicants
are given copies of all the applicable
(albeit fictional) statutes, regulations,
cases, etc., that are relevant to
the case. Thus, specific content
knowledge of the topics covered
should not play a significant role,
although it is conceivable that an applicant
who is familiar with the terminology in a specialized
field could have an advantage if a case situation
involved that field.
Multiple-choice and essay questions are not immune
to this same concern. Indeed, scores on these tests are just
as likely as PT scores to be sensitive to some applicants
having an advantage on a particular question. For example,
an applicant may have assisted on a case that had a
fact pattern that was very similar to the one in an essay
or MBE question. Test developers try to minimize this
problem by asking several questions; i.e., so that no one
question carries an excessive amount of weight in determining
an applicant’s pass/fail status. In short, the more
questions that are asked, the less likely an applicant’s total
score, and thereby pass/fail status, will be influenced by
specialized knowledge on a single question. That is why
score reliability is sensitive to the number of questions
asked and why I recommend not giving a single PT
question more than twice as much weight as a single essay
question.
A related consideration is what applicants should do
to prepare for the PT. Unlike torts or contracts, there is
no law school course called “Performance Testing.”
There are clinical courses that should help, and the presence
of PT problems on the exam may encourage more
professors to include such tasks in their classroom activities
in much the same way Harvard Business School asks
students to resolve case problems.
It also should be noted that
although multiple-choice and essay
questions are currently characterized
by content area, these labels
bear no relationship to the questions’
statistical properties. For example,
two torts multiple-choice
questions generally correlate no
higher with each other than they do
with any other question (Linn, 1992). The same is true
on the essay. These findings mean that specific content
knowledge is not driving the differences in scores among
applicants. While it is certainly true that content knowledge
is required to do well on the MBE and essay sections,
cross-cutting abilities (such as legal reasoning) appear to
play the major role in determining who passes and fails.
These reasoning skills are needed on all three sections of
the exam (MBE, essay, and PT), which may help to
explain the relatively high correlations among these sections.
Some PT problems may be harder than others
and/or the responses to one PT problem may, on the
average, be graded more leniently than the responses to
some other problem.
Thus, if no adjustment is made for this, it
may be easier to pass one exam than another simply
because of differences in the difficulty of the particular
set of PT problems that happen to go into each of them.
The same is true, of course, of the essay section, but it is
not the case with the MBE. Raw MBE scores (i.e., the
number of questions answered correctly) are adjusted
(“equated”) for possible differences in average question
difficulty across different administrations of the exam.
Consequently, most jurisdictions now scale their essay
scores to the distribution of MBE scores in their state
(Klein, 1995). The same thing can be done with the PT
section (the simplest method involves combining the
essay and PT scores into a total written score that is then
scaled to the MBE).
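A minimal sketch of scaling written scores to the MBE follows. It assumes that "scaling" means a linear transformation giving the written scores the same mean and standard deviation as the MBE scores of the same group of applicants; the article does not spell out the procedure, and the function name and all scores are hypothetical.

```python
from statistics import mean, pstdev


def scale_to_mbe(written_raw, mbe_scores):
    """Linearly rescale raw written totals (essay + PT) so that their
    mean and standard deviation match those of the MBE scores earned
    by the same applicants."""
    w_mean, w_sd = mean(written_raw), pstdev(written_raw)
    m_mean, m_sd = mean(mbe_scores), pstdev(mbe_scores)
    return [m_mean + m_sd * (w - w_mean) / w_sd for w in written_raw]


# Hypothetical data for five applicants (illustration only).
written_raw = [400, 450, 500, 550, 600]   # raw essay + PT totals
mbe_scores = [120, 130, 140, 150, 160]    # MBE scores, same group

written_scaled = scale_to_mbe(written_raw, mbe_scores)
print([round(s, 1) for s in written_scaled])
```

Because the transformation is linear, the applicants keep the same rank order on the written section; only the scale on which their scores are expressed changes.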
Under these conditions, the inclusion of PT problems
on the exam probably will have no effect on the
overall passing rate or on differences in passing rates
between gender and racial/ethnic groups. In general, men
tend to score higher than women on the MBE while the
reverse is true on the essay and PT sections. Adding a PT
section would therefore benefit women only if this action
also resulted in reducing the weight given to the
MBE in determining an applicant's total score and
pass/fail status. Including or not including a PT section
(or changing the weight given to the MBE) would have
little if any effect on the differences in passing rates among
racial/ethnic groups because all three sections (MBE,
essay, and PT) result in about the same sized differences
in average scores among these groups (Klein, 1989). These
differences also correspond to the disparities among these
groups in law school grade point averages (Klein, 1995).
Some applicants (regardless of their group affiliation)
tend to relate better to multiple-choice tests while
others prefer essay or PT exams. For example, some
applicants write faster or neater than others and that
may result in their doing better on the essay than on the
MBE even though writing speed or neatness is not
related to success on the job (what you write is far more
important). Thus, the more ways we can test, the fairer
the examination process because there is less likelihood
that extraneous features of the test format will influence
an applicant's pass/fail status. For that reason, there is a
real advantage to using several assessment methods.
Including one or more PT problems is an important
contribution to that end.
Cost Effectiveness and Benefits
To an economist, a “cost effectiveness” analysis involves
determining the costs of different methods to achieve the
same end (or, for a given cost, determining which method
provides the best result, such as the most reliable test per
dollar spent). A “cost benefit” analysis, on the other hand,
recognizes that all methods may not be able to achieve
the same ends no matter how much we spend on them
(e.g., multiple-choice tests cannot measure issue spotting).
Consequently, the determination of which method
is the most “beneficial” involves judgments regarding the
relative value of different outcomes (such as score
reliability, validity, fairness, efficiency, etc.).
In the context of the bar exam, some costs are
out-of-pocket expenses (such as the fee paid for the
MBE) while others are “opportunity costs,” such as the
value of the time board members donate to writing essay
questions and grading the responses; i.e., time they could
have spent doing something else. Cost effectiveness
studies usually assign market values to donated services
(e.g., what other states actually pay readers to grade
answers). Testing time also is a valuable resource. Thus, a
fair comparison among methods has to hold testing time
constant (which also effectively controls the costs of test
administration, proctors, space, etc.).
Table 1 shows the prorated cost of three hours of
testing time for three testing methods and the score
reliability that is obtained with each method for a
three-hour test. Materials costs are the fees NCBE charges for
the MBE, Multistate Essay Examination (MEE), and the
Multistate Performance Test (MPT). MBE scoring is
included in its materials cost. Essay and PT scoring costs
were estimated at $2.00 and $2.50 per answer, respectively,
excluding expenses for data entry and similar services.
The reliability for the MBE is based on 100 questions.
The reliability for the essay and PT sections are based on
empirical data from several jurisdictions with 30-minute
essay questions and 90-minute PT problems, respectively.
Actual costs and reliability may vary across jurisdictions.

TABLE 1
Estimated Costs and Score Reliability for Three Hours of Testing Time
with Three Different Assessment Methods
Columns: Testing Method; Estimated Cost per Applicant (Materials, Scoring, Total); Score Reliability
Rows: Multiple-choice (Materials: 20.00); Essay; Performance Test
Table 1 shows that the MBE is by far the best deal if
the only goal is to obtain a high level of score reliability.
Because of differences in scoring charges, a set of six [...]
PT problems, but the essay produces a higher level of
reliability. We would have to almost double the testing
time for the PT to bring its reliability up to the same level.
However, score reliability is not the only measure of
test quality. Validity and fairness also must be considered.
Few attorneys would rely solely on the MBE to
make pass/fail decisions even though it is the least
expensive way to achieve a given level of score reliability.
Bar examiners recognize there are important skills that
cannot be tested with the MBE just as there are important
abilities that cannot be assessed with the traditional essay
question. One type of testing simply cannot replace
another if they assess somewhat different abilities. Moreover,
although the applicants who do well on one type of
test also tend to do well on another, this relationship is
far from perfect (even after adjusting for the reliability of
the measures). Including the PT on the exam may encourage
more students to develop their practical skills. In this
way, it may improve the applicants’ overall level of
proficiency even if it does not affect their relative standings.
That is why bar examiners should not rely solely on
the method that costs the least to obtain a given level of
score reliability. They also must consider the unique
benefits that are derived from each method.
Conclusions and Implications
There is now ample empirical and judgmental evidence
to support including PT problems on the bar exam. They
are more job relevant than the typical multiple-choice or
essay question. Thus, their use facilitates responding to
challenges to the bar exam process. Nevertheless, boards
of bar examiners must be sensitive to realistic budgetary,
operational, legal, and political constraints. Testing programs
must strike a balance among costs, testing time,
and technical quality (including score reliability, validity,
and fairness).
In this context, a jurisdiction that is limited to a
two-day test with six hours of testing time per day
might well consider using the MBE, six 30-minute essay
questions, and two 90-minute PT problems (with each
PT problem given twice as much weight as a single essay
question and the MBE and Written sections carrying
equal weight in determining an applicant's total score).
If more testing time was available, then the board could
increase the number of essay and PT questions asked
and/or the time allocated to them. However, the particular
blend of measures, and the weights attached to them,
that would work best for a jurisdiction will depend on
several technical factors such as the reliability of its essay
questions (such as the kind of essay questions it has used in [...]
References
acl R searchin the NCBE Performance Test
le R sted tthe NCBE Test
Klan So aust. Testing jon the Calon Ba
xanan State Bar of Calton and the National
k - analast of the ae toes tl
tls anal far canation Re pate
matter of Bar Examiners of the State Bar of
Gi the National Conferene of Bar Esamuners
Klan. ass Rel fr xanatins to pertoemance
Kk . tu8e . An wal the reais
Focal sll ad far examanation reals, Repu prepared tor
Comuntte of Bar Examiners the State Rar of Cabra
and the Natnal Conference of Par Hany
Klean Ssoss MM legal research dally on a bar examina
Paper presented to the American Paholopeal Avs.
two, Anaheim, California,
lem. An analysts of the pettormanse txt on the fl
Caldortna Har Examination, Report prepared
Commuttee of Har Examiners ofthe State Bar of Calfornta
k oY ves take a Icensing test, Paper pre
edt the meets of the American Faucatwnal Reseatch
f Bar xan the State Bar of
Calienia PRASS-3
Klein, S. (1989). Does performance testing on the bar examination
reduce differences in scores among sex and racial groups?
Paper presented at the meetings of the American Educational
Research Association.
Klein, S. (1991). Performance testing on the bar examination. Report
prepared for the National Conference of Bar Examiners.
Klevn. Stone k Malt Barb
Navonal Contre
Klein, S. (199[?]). Relationships among MBE, essay, and
Performance Test scores. Report prepared for the National
Conference of Bar Examiners.
Klein, S. (1995). Options for combining MBE and essay scores.
The Bar Examiner.
Klein, S. (1996). Options for assigning essay scores. The Bar
Examiner.
Linn, R. (1992). An analysis of the subtest structure of the
Multistate Bar Examination. Report prepared for the National
Conference of Bar Examiners.
MacCrate, R., et al. (1992). Legal education and professional
development: An educational continuum. Report of the Task Force
on Law Schools and the Profession: Narrowing the Gap. American
Bar Association, Section of Legal Education and Admissions to
the Bar.
O'Hara & Klein (1981). Is the bar examination an adequate
measure of lawyer competence? The Bar Examiner.
ns Rens. PRD,