
Improving the Reliability of Physician Performance Assessment: Identifying the "Physician Effect" on Quality and Creating Composite Measures


Author(s): Sherrie H. Kaplan, John L. Griffith, Lori L. Price, L. Gregory Pawlson and Sheldon
Greenfield
Source: Medical Care, Vol. 47, No. 4 (Apr., 2009), pp. 378-387
Published by: Lippincott Williams & Wilkins
Stable URL: http://www.jstor.org/stable/40221891
Accessed: 13-12-2015 15:32 UTC


This content downloaded from 137.205.50.42 on Sun, 13 Dec 2015 15:32:39 UTC
All use subject to JSTOR Terms and Conditions
Original Article

Improving the Reliability of Physician Performance Assessment
Identifying the "Physician Effect" on Quality and Creating Composite Measures

Sherrie H. Kaplan, PhD, MPH,* John L. Griffith, PhD,† Lori L. Price, MS,† L. Gregory Pawlson, MD, MPH,‡ and Sheldon Greenfield, MD*

Background: The proliferation of efforts to assess physician performance underscores the need to improve the reliability of physician-level quality measures.

Objective: Using diabetes care as a model, to address 2 key issues in creating reliable physician-level quality performance scores: estimating the physician effect on quality and creating composite measures.

Design: Retrospective longitudinal observational study.

Subjects: A national sample of physicians (n = 210) and their patients with diabetes (n = 7574) participating in the National Committee on Quality Assurance-American Diabetes Association's Diabetes Provider Recognition Program.

Measures: Using 11 diabetes process and intermediate outcome quality measures abstracted from the medical records of participants, we tested each measure for the magnitude of physician-level variation (the physician effect or "thumbprint"). We then combined measures with a substantial physician effect into a composite, physician-level diabetes quality score and tested its reliability.

Results: We identified the lowest target values for each outcome measure for which there was a recognizable "physician thumbprint" (ie, intraclass correlation coefficient >0.30) to create a composite performance score. The internal consistency reliability (Cronbach's α) of the composite score, created by combining the process and outcome measures with an intraclass correlation coefficient ≥0.30, exceeded 0.80. The standard errors of the composite case-mix adjusted score were sufficiently small to discriminate those physicians scoring in the highest from those scoring in the lowest quartiles of the quality of care distribution with no overlap.

Conclusions: We conclude that the aggregation of well-tested quality measures that maximize the "physician effect" into a composite measure yields reliable physician-level quality of care scores for patients with diabetes.

Key Words: physician performance assessment, physician profiling, reliability of composite quality measures

(Med Care 2009;47: 378-387)

From the *Center for Health Policy Research and Department of Medicine, School of Medicine, University of California, Irvine, California; †Institute for Clinical Research and Health Policy Studies, Tufts-New England Medical Center, Boston, Massachusetts; and ‡National Committee on Quality Assurance, Washington, DC.
Supported by grants from the Commonwealth Fund, New York, NY.
Reprints: Sherrie H. Kaplan, PhD, MPH, Department of Medicine, Center for Health Policy Research, University of California, 111 Academy Suite 220, Irvine, CA 92697. E-mail: skaplan@uci.edu.
Copyright © 2009 by Lippincott Williams & Wilkins
ISSN: 0025-7079/09/4704-0378

The proliferation of efforts to evaluate physician performance1-11 has stimulated concerns over the accuracy and reliability of physician-level performance assessment.12-18 The possibility of being misclassified, particularly on quality measures over which they may exert relatively little influence, such as those requiring patients to alter their lifestyles, adhere to medication regimens, or follow through on referrals, is especially onerous for physicians. If quality measures are more likely to reflect patient rather than physician behavior, they are unlikely to yield reliable (consistent) physician-level scores. That is, consistent efforts on the physician's part to reduce patients' hemoglobin A1c levels, for example, may be thwarted by patients' diet, exercise, or lack of medication adherence.

Two key steps are needed to improve the reliability of physician performance assessment and to minimize the likelihood that physicians will be held accountable for aspects of quality of care beyond their control. First, the proportion of variation in quality measures that is attributable to the physician, versus the patients they care for, or the health care systems they practice in, should be determined empirically. Measures over which physicians have little influence could be eliminated from physician-level performance assessment. Second, because single item measures of performance are unreliable,19-21 composites of those measures that reflect a pattern of consistent physician behavior across patients in his/her practice (ie, those with a strong "physician thumbprint" or substantial physician versus patient or health care system variation) should be created. Creating such composite or summary scores is based on the premise that physicians' practice patterns for a specific disease or condition will be relatively consistent (or reliable) across patients under their care for that condition. These 2 steps could improve physician-level reliability and reduce the risk that physicians' performance will be misclassified.

Using the data gathered as part of the National Committee on Quality Assurance (NCQA) and the American Diabetes Association's (ADA) Diabetes Provider Recognition Program (DPRP),22 we tested the assumption that, with this 2-step approach, reliability of physician-level performance scores would be greater than previously reported,23 and risk of misclassification would be reduced. This article will present the physician-level reliability results of a composite score using this 2-step approach.

METHODS

Settings and Sample
Study data were derived from the NCQA and ADA DPRP, a national program to evaluate the quality of care provided for chronic diseases.22,24 Participating physicians were asked to submit quality of diabetes care data for a minimum of 35 eligible patients. Patients were eligible for inclusion if they had been seeing the participating physician as their principal provider for diabetes care for 12 months. All eligible patients presenting to the practice within 1 year of an initial index start date were recruited consecutively until the minimum sample size was achieved. Of the 124 primary care practices, 126 endocrinology practices, and 50 physicians from multiple specialty practices who voluntarily enrolled in the DPRP in the study period, we restricted the sample to those for whom individual physicians could be identified and who had met the minimum patient sample size (n = 35), determined by NCQA to provide sufficient physician-level reliability for program purposes.22

Data Collection
Practices submitted data to the DPRP between October 1999 and July 2001. Data were abstracted from eligible patients' medical records by personnel at each practice site, in compliance with the detailed specifications of the DPRP.22,24 Data integrity was monitored using a random audit of 5% of offices providing data. Patient surveys, administered at the time of study recruitment, were used to collect case-mix variables including age, gender, education, minority status, health status, duration of diabetes, and insulin use.

Measures
The diabetes quality of care measures used for this study were initially developed by the Diabetes Quality Improvement Project and then adopted for use in the NCQA/ADA DPRP.25 The measures used for this study included the annual performance and corresponding levels of HbA1c, lipid profiles [high-density lipoprotein cholesterol (HDL), low-density lipoprotein cholesterol (LDL), and triglycerides], a test for urine microalbuminuria, dilated eye examination, foot examination, and annual check and level of blood pressure.

Patient-reported case-mix measures included patient age, gender, minority status, education, overall health rating (reported on a 5-point Likert scale ranging from "excellent" to "poor"), and duration of diabetes. Each participating practice self-reported specialties of physicians within the practice.

For patient case-mix measures, missing data were imputed using a single imputed Hot-Deck26 procedure. For clinical process measures, missing data were scored as "0" or not performed. If the outcome measure was missing, the outcome measure was scored as "0" or not achieving the target value if the process measure was not performed, and left blank if the process measure was performed. Before imputation, missing data for outcome measures ranged from <1% for blood pressure values to 8% for lipid values.

Design
The study was a retrospective assessment of the quality of care provided by participating physicians for eligible patients. Data for each patient reflected a 1-year retrospective interval from the index date.

Statistical Analysis
All data were analyzed using SAS software.27 We used univariate analyses to describe characteristics of the study samples and to examine the frequency of performance of each diabetes quality measure. We fit binary mixed models using the Newton Raphson algorithm with adaptive Gaussian quadrature (PROC NLMIXED procedure in SAS) with and without adjustment for patient case-mix to estimate the intraclass correlation coefficient (ICC) and assess the effect of physician-level clustering on quality measures, particularly on varying levels of outcome measures. We provide additional detail on the use of intraclass correlations to estimate "physician effects" on quality measures in Appendix A, http://links.lww.com/A740.

We used item-to-total correlation coefficients to assess the contribution of individual diabetes quality measures to a composite score. We used Cronbach's α to assess the internal consistency reliability of the composite physician-level diabetes quality measure. The composite scale was computed as a simple algebraic sum of each patient's score on the dichotomized individual quality items; physician-level composite scores were computed as the mean of patient scores for each physician. These scores were then transformed to range from 0 to 100 by subtracting the theoretical minimum score from the physician's mean score, dividing by the theoretical maximum minus the theoretical minimum, and multiplying by 100. We used generalized estimating equations with covariates to obtain adjusted, physician-level quality scores. We then calculated the mean and standard errors for each physician's performance score on the composite measure. Nonparametric smoothing splines were used to generate plots for differing levels of risk factors.

Role of the Funding Source
This research was supported by The Commonwealth Fund. Data from the NCQA and ADA's DPRP were analyzed by the authors with support from The Commonwealth Fund. The Commonwealth Fund had no other role in the design, conduct, or reporting of the study.

Because all data used in these analyses had been de-identified by NCQA to protect both patient and physician identities, we received exemption status from the Institutional Review Board.

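
The composite scoring described under Statistical Analysis (sum of dichotomized items per patient, mean across a physician's patients, rescaling to 0 to 100) can be sketched as follows. This is a minimal sketch, assuming every item is coded 0 or 1 so that the theoretical minimum is 0 and the theoretical maximum equals the number of items; the function name and data layout are hypothetical.

```python
from statistics import mean


def physician_composite(patients, n_items):
    """Rescaled 0-100 composite score for one physician.

    patients: one list of 0/1 item scores per patient.
    n_items: number of dichotomized quality items in the scale.
    """
    for items in patients:
        if len(items) != n_items or any(v not in (0, 1) for v in items):
            raise ValueError("each patient needs n_items scores coded 0 or 1")
    # Patient score: simple sum of dichotomized items.
    raw_mean = mean(sum(items) for items in patients)
    # Assumption: with 0/1 items the theoretical range is 0..n_items.
    theoretical_min, theoretical_max = 0, n_items
    return 100 * (raw_mean - theoretical_min) / (theoretical_max - theoretical_min)
```

For example, a physician with two patients scoring 2 of 3 and 1 of 3 items has a mean patient score of 1.5, which rescales to 50.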

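
The reliability statistics named under Statistical Analysis, Cronbach's α and item-to-total correlations, can be sketched with the standard textbook formulas. This is an illustrative sketch, not the authors' SAS code; it assumes uncorrected item-to-total correlations (each item is correlated with a total that includes the item), since the text does not say which variant was used, and it assumes every item and the total have nonzero variance.

```python
import math
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of equal-length item scores (one row per unit)."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_total_correlations(rows):
    """Uncorrected correlation of each item with the summed scale."""
    totals = [sum(row) for row in rows]
    k = len(rows[0])
    return [pearson([row[j] for row in rows], totals) for j in range(k)]
```

An item with a low item-to-total correlation contributes little to the scale, so removing it would be expected to leave α roughly unchanged.
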
RESULTS

Patient and Physician Sample Characteristics
A total of 210 physicians and 7574 patients were included in this study (Table 1). Patients ranged in age from 43 to 74 years [mean age (±SD), 60.4 ± 14.5], the majority were female, 20% were minority, approximately 1 in 4 had attended college, the overall health rating (range, 0-100) was relatively low55 [mean rating (±SD), 53.4 ± 25.0], 37.4% of patients had had diabetes for >15 years, and 51.2% were currently being managed on insulin. Approximately the same proportion of specialists (endocrinologists) and generalists (general internists and family medicine physicians) were included in the study.

TABLE 1. Description of the Study Samples*

Characteristics                                         Mean (SD) or %
Patient sample (n = 7574)
  Mean age (SD)                                         60.4 (14.5)
  Male (%)                                              46.6
  Minority (%)                                          20.5
  Some college education (%)                            24.0
  Mean overall health rating (SD)                       53.4 (25.0)
  Duration diabetes >15 yr (%)                          37.4
  On insulin (%)                                        51.2
Physician sample (n = 210)
  Specialty
    Endocrinology (%)                                   41.9
    General internal medicine and family practice (%)   41.4
    Both (%)                                            16.7

*Table entries are means with standard deviations (SD) or percentages as indicated.

Physician-Level Performance of Individual Diabetes Quality Measures
With the exception of annual check of blood pressure, performed for an average of 97.7% of study physicians' patients, the majority of the measures of the process of diabetes care were well distributed, with enough variability to allow for comparison of groups of physicians, such as those scoring in the highest and lowest quartiles of quality (Table 2). The outcomes were achieved significantly less frequently as the designated levels of each measure decreased, with relatively few patients achieving HbA1c levels <7% and LDL <100 mg/dL (35.2% and 30.9%, respectively).

TABLE 2. Physician Performance of Diabetes Quality Measures*

Diabetes Quality Measure        Performance %    95% CI
Process measures (annual)
  HbA1c                         84.5             (81.2, 87.8)
  Lipids                        78.7             (75.4, 82.0)
  Urine microalbumin            45.5             (40.7, 50.3)
  Eye exam                      65.0             (61.0, 69.0)
  Foot exam                     71.9             (67.2, 76.6)
  Blood pressure checked        97.7             (96.6, 98.8)
Outcome measures
  HbA1c <10%                    88.4             (85.0, 91.8)
  HbA1c <9%                     72.9             (69.5, 76.3)
  HbA1c <8%                     58.2             (54.9, 61.5)
  HbA1c <7%                     35.2             (32.2, 38.2)
  LDL <130 mg/dL                56.6             (53.1, 60.1)
  LDL <100 mg/dL                30.9             (27.9, 33.9)
  HDL adequate†                 49.2             (46.0, 52.4)
  Triglycerides <200 mg/dL      53.0             (49.6, 56.4)
  BP <140/90 mm Hg              66.4             (63.5, 69.3)

*Table entries are percent of tests performed or outcomes achieved, averaged within physician for each physician's patients then averaged across physicians for the total physician sample (n = 210).
†HDL >35 mg/dL for men and >45 mg/dL for women.

Physician Versus Patient "Effects" on Individual Diabetes Outcome Measures
To estimate the magnitude of the variability in individual outcome measures produced by physician-level clustering, the physician "effect" (see Statistical Analysis, above), we computed the ICC for each outcome measure, adjusted and unadjusted for patient case-mix. Adjustment for patient characteristics shown in previous studies to affect diabetes outcomes24,31-34 reduced the physician effect at the lowest values of these outcome measures. These relationships are presented for the continuous values of HbA1c and LDL in Figures 1 and 2, respectively.

As evident, for adjusted values of HbA1c <8% and LDL <120 mg/dL, the adjusted ICCs drop below 0.30, the criterion we established for maximizing physician reliability with a feasible number of quality items. We also attempted to identify the lowest value of each outcome measure nearest established national norms and associated with the greatest physician effect. Using these criteria, the greatest physician effect appears to occur at a value of approximately 9% for HbA1c and between 130 mg/dL and 120 mg/dL for LDL. We therefore chose values of 9% for HbA1c and 130 mg/dL for LDL for creation of the composite measure. These values also corresponded to the levels used in the NCQA health plan HEDIS35 diabetes measures set at the time of the study and subsequently endorsed by NQF.

We used a similar approach to identify threshold values for the blood pressure measures. However, we were unable to identify any values for systolic and diastolic blood pressure for which the physician effect, as measured by the ICCs, approximated 0.30. Further, unlike other outcome measures where the ICCs followed a more linear trend with declining values before approximating asymptosis, the ICCs were highly variable and followed no obvious trend for systolic values <150 mm Hg and diastolic values <120 mm Hg.

Reliability of a Composite Physician-Level Diabetes Quality Scale Based on Individual Measures With High Physician Effects
Because single measures are less reliable than multi-item measures of any construct, we examined the associations among diabetes measures to determine whether they could be
aggregated into a composite measure. The internal consistency reliability coefficient for 10 of the measures in this dataset (α = 0.79, Table 3) was adequate for comparisons of groups of physicians, such as those scoring at the highest and lowest quartiles of the quality score. One process measure, annual check of blood pressure, was eliminated from the measures to be aggregated due to lack of variation. Further, the blood pressure outcome contributed relatively little to the variation in the composite physician-level measure (item-to-total correlation = 0.18). Elimination of this measure did not result in a reduction of the internal consistency reliability of the composite measure (column 2, Table 3), and it was therefore excluded.

Because the reliability coefficient could have been inflated due to lack of independence between process and outcome measures (outcomes could not be >0 unless the process measure was performed), we combined the process measures into a single composite process score. As expected, this combined process score was significantly less correlated with each of the outcome measures (P < 0.05 for each measure). The reduction in the number of items included in the composite measure did not alter the observed reliability coefficient (α = 0.82; column 3, Table 3). Using the generalized Spearman Brown prophecy formula,36 we estimated the effect on internal consistency reliability of a reduction in sample size with the number of items in the composite scale fixed at 9. With at least 25 patients per physician, and an intraclass correlation of 0.20, an internal consistency reliability coefficient of 0.76 could be achieved.

FIGURE 1. "Physician effects" on HbA1c. Binary mixed models using the Newton Raphson algorithm with adaptive Gaussian quadrature (PROC NLMIXED procedure in SAS) with and without adjustment for patient case-mix used to estimate intraclass correlation coefficients (ICC); adjustment variables were patient's age, overall health rating, education, minority status, and duration of diabetes.
[Plot not reproduced: unadjusted and adjusted ICC curves against HbA1c values from 7 to 11.]

FIGURE 2. "Physician effect" on LDL levels. Binary mixed models using the Newton Raphson algorithm with adaptive Gaussian quadrature (PROC NLMIXED procedure in SAS) with and without adjustment for patient case-mix used to estimate intraclass correlation coefficients (ICC); adjustment variables were patient's age, overall health rating, education, minority status, and duration of diabetes.
[Plot not reproduced: unadjusted and adjusted ICC curves against LDL values from 100 to 140 mg/dL.]

TABLE 3. Association Between Individual Diabetes Quality Items and Composite Scale*

                                Correlation With Composite Scale
Individual Quality Items        With BP†    Without BP‡    Composite§
Annual HbA1c                    0.43        0.43           -
Annual lipid tests              0.75        0.76           -
Annual urine albumin            0.30        0.28           -
Annual eye exam                 0.41        0.41           -
Annual foot exam                0.30        0.32           -
Aggregate process measure       -           -              0.62
HbA1c <9%                       0.46        0.48           0.40
LDL <130 mg/dL                  0.63        0.65           0.70
HDL adequate¶                   0.63        0.64           0.66
Triglycerides <200 mg/dL        0.63        0.64           0.66
BP <140/90                      0.18        -              -
Cronbach's α‖                   0.79        0.80           0.82

*Table entries are item-to-total correlation coefficients.
†Blood pressure outcome included in scale.
‡Blood pressure outcome excluded from scale.
§Internal consistency coefficient computed with all process measures combined to create a single composite measure.
¶HDL >35 mg/dL for men and >45 mg/dL for women.
‖Internal consistency reliability coefficient.

Effect of Case-Mix Adjustment on Composite Physician Performance Scores
Using ordinary least squares regression, we adjusted the composite physician performance scores for patient's age, overall health rating, education, minority status, and duration of diabetes (Table 4). Age, gender, overall health rating, and minority status were all significantly associated with the composite physician performance score. However, although the overall model was statistically significant (F(4,4572) = 26.69, P < 0.001), the total variance explained by case-mix adjustors was only 4%. We used adjusted physician performance scores to evaluate the distribution of quality of care across individual providers. Although the adjustment shifted the distribution of scores, it did not substantively alter the position of physicians within the distribution (rank) (data not shown).

TABLE 4. Case-Mix Adjustment: Effect on Composite Physician Performance Scores*

Case-Mix Variables              Parameter Estimates    t       P
Intercept                       40.46                  20.0    <0.0001
Age                             0.28                   10.6    <0.0001
Male                            1.80                   2.5     0.01
Overall health rating           0.09                   5.9     <0.0001
Some college education          1.24                   1.5     0.14
Minority status                 6.11                   7.0     <0.0001
Duration diabetes >15 yrs       0.78                   1.0     0.32
On insulin                      1.44                   1.9     0.06

F(4,4572) = 26.69; P < 0.0001.
R² = 0.04.
*Table entries are derived from generalized estimating equation models, with composite diabetes quality scores as dependent variables.

Distribution and Individual Variation of Composite Physician-Level Performance Scores
To examine whether and to what extent physicians could be arrayed on a continuum of diabetes quality, we computed the adjusted mean quality scores, along with standard errors for those means, for the individual physicians. The mean for this composite scale for all physicians in the sample was 72.3 (SD = 11.2); the standard error of measurement for composite scores was 4.74. Figure 3 graphically illustrates that, with relatively small standard errors for individual physician scores, and means that range from 44 to 93 on the 0 to 100 composite diabetes quality score, the composite measure can be used to discriminate between physicians scoring at the highest and lowest quartiles of the quality of care distribution.

DISCUSSION
The increasing interest in providing easily accessible and interpretable information about the quality of physicians'
care9,37-47 underscores the need for accurate and reliable measurement of physician performance. This article focuses on improving the reliability of physician-level performance assessment, using quality of diabetes care as a model.

FIGURE 3. Distribution of individual physician performance scores on composite diabetes quality measure: means and standard errors. Each vertical line represents the mean and standard error of 1 of the 210 study physicians.
[Plot not reproduced.]

There are at least 3 major sources of unreliability in physician performance measurement: variation due to an inadequate number of patients sampled, variation due to an inadequate number of measures used to reflect performance in an area, such as diabetes care, and variation due to inconsistency of a physician's performance across patients in his/her practice. We used the NCQA and ADA DPRP as the basis for our analysis. Methods used to create that database established a minimum patient sample size that was sufficient to minimize variability due to sample size.

To ensure that a sufficient number of measures were used to create a composite physician-level diabetes care measure, we used a modified version of the NCQA/ADA Provider Recognition Program's well-tested measures recommended by the National Diabetes Quality Improvement Alliance, a coalition of national organizations.48 Because these measures were to be used to discriminate physician-level performance, we first examined the measures for variation at the physician level. Among the candidate measures, only the annual check of blood pressure showed such little variation that it would not contribute meaningfully to differentiating physicians' quality of diabetes care and had little relationship to the other measures of diabetes quality. Removal of that measure from the overall score did not reduce the reliability of the composite measure.

We next examined the extent of variation in the individual diabetes quality measures that was attributable to the physician versus the patient (the physician versus patient effect). As we observed in previous research,24 we found that there was a distinctive physician effect or "thumbprint" for all of the process measures. That is, physicians appeared to behave consistently across patients in their practices for the process measures. For the outcome measures, we observed a progressive decrease in the physician effect as values of the outcome measures decreased, suggesting that patient characteristics, as opposed to physician behavior, may be more closely related to achievement of the lowest values. The evidence basis for setting quality standards is frequently based on relatively homogeneous patient samples from randomized controlled trials. Frequently excluded from this evidence base are patients who are older, have more complicated comorbid conditions, are not able to tolerate side effects, or are not willing or able to adhere to treatment. Those organizations involved in the generation of quality standards22,25,35 have responded to this issue by setting less stringent standards, usually by a process of consensus involving clinical experts. Our approach to identifying thresholds that maximize the physician effect on quality measures would provide empirical support for those decisions.

Curiously, for both systolic and diastolic blood pressure values, we were unable to identify any value for which there was a compelling ICC (physician thumbprint); the removal of blood pressure outcomes did not affect the reliability of the composite diabetes quality of care scale. Blood pressure control may be more influenced by patient than physician behaviors, may represent a separate but related quality construct, and may require more measures to create a reliable physician-level composite measure.

To maximize the reliability of physician-level diabetes performance assessment, we created a physician-level composite diabetes scale out of the 9 individual measures. The reliability of this composite diabetes scale met the standard that would be considered sufficient for comparing physician groups, such as those scoring above or below a threshold score, or those scoring in the highest or lowest quartile of scores, α >0.70.19 The standard errors for individual physicians were considerably smaller than those we observed in a previous study.23 Although those scoring at these highest and lowest quartiles have, as a group, very different performance scores, individuals near the thresholds have scores that, within measurement error, could not be distinguished from those near the other side of the threshold value. In practice, it would therefore be reasonable to bound thresholds by some confidence interval that would define a "questionable" or "preliminary" area of high or low performance to mitigate misclassification.
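
The idea of bounding a threshold by a confidence interval can be sketched as a simple classification rule. This is a minimal sketch under stated assumptions: the band is taken as z times the standard error of measurement around the threshold, with z = 1.96 chosen here for illustration; the default of 4.74 is the standard error of measurement reported in the Results; the threshold of 80 in the usage lines is hypothetical and does not come from the study.

```python
def classify(score, threshold, sem=4.74, z=1.96):
    """Classify a composite score relative to a threshold with an uncertainty band.

    sem: standard error of measurement of the composite score.
    z: multiplier defining the band width (assumption: 1.96).
    Scores inside the band cannot be distinguished from the threshold
    within measurement error and are labeled "provisional".
    """
    half_width = z * sem
    if score > threshold + half_width:
        return "above"
    if score < threshold - half_width:
        return "below"
    return "provisional"


# Usage with a hypothetical threshold of 80 on the 0-100 scale.
labels = [classify(s, threshold=80.0) for s in (93.0, 85.0, 44.0)]
```

A physician labeled "provisional" would be re-evaluated later rather than assigned to the high or low group.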

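
The dependence of reliability on the number of patients per physician can be illustrated with the standard Spearman Brown prophecy formula. This sketch uses the textbook form only; the article refers to a generalized version and a derivation whose details are not given in this text, so the function below should not be expected to reproduce the reliability figures reported in the article. The function name is hypothetical.

```python
def spearman_brown(reliability, factor):
    """Standard Spearman Brown prophecy formula.

    reliability: reliability of a single unit (for example, the ICC of
    one patient's score as a measure of the physician).
    factor: number of units averaged (for example, patients per physician).
    Returns the projected reliability of the average of `factor` units.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor * reliability / (1 + (factor - 1) * reliability)
```

In this form, reliability rises with the number of patients averaged but with diminishing returns, which is why a modest reduction in patients per physician can leave the projected reliability largely intact.
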
With a sufficient number of patients per physician (n = 36), and a sufficient sample of items with intraclass correlations >0.30 with which to represent the quality of diabetes care (k = 9), we were able to demonstrate the value of creating composite or summary scales to reflect quality of care reliably. With the intraclass correlations we observed, using a derivation of the Spearman Brown prophecy formula,36 we also estimated that with 25 patients per physician, scores would remain sufficiently reliable for the kinds of comparisons we conducted.

Adjustment in scale scores for the case-mix measures resulted in an overall positive shift in the entire distribution of physicians' scores but did not result in substantial changes in individual physician ranks (eg, from the lowest to the highest quartile). Adjustment for case-mix under such circumstances, although appropriate for credibility and to minimize the possibility of misclassification, would therefore be expected to have little effect, as observed. However, although this overall general health rating measure has been linked with declines in functioning and increased mortality rates,49 it should be noted that better case-mix measures50 may have had a more substantial effect on physician rankings.

As noted by others evaluating provider performance,8,11,51-53 our findings do not support the use of composite quality scores, even at the level of reliability we observed, to compare individual physicians one to another. Rather, our findings indicate that composite measures of quality of care, chosen from an evidence-based set of individual measures, could be used with reasonable confidence to identify those physicians whose scores, for example, positioned them in the highest or lowest quartile of physicians studied or whose scores exceeded or fell below some defined distance from a threshold. In practical terms, confidence intervals for the threshold values could be defined; physicians scoring within those confidence intervals could be given "provisional status" and re-evaluated in subsequent pay-for-performance assessments. Quality control to minimize misclassification could be augmented by a further secondary review of outliers or randomly sampled subjects.54

Based on well-established principles of measurement and recent advances in provider performance assessment,55-60 we have developed a set of criteria that could be used to guide individual physician performance assessment. These criteria (see Appendix Table B for further explanation, http://links.lww.com/A740) include: (1) choosing well-tested patient-level quality of care measures for specific diseases to assess physician performance; (2) finding appropriate thresholds for any outcome measures used to reflect the physician's performance versus patient or other effects; (3) sampling sufficient numbers of patients and physicians to achieve the power needed for conducting meaningful comparisons of profile scores; (4) developing scoring methods for aggregating quality measures into a single profile score; (5) testing the reliability of composite profile scores; (6) adjusting composite profile scores for physician-level differences in patient case-mix; and (7) identifying mutable physician/practice characteristics related to profile scores to facilitate quality improvement. Of the 7 criteria, only the last (mutability or responsiveness to efforts to improve scores, the subject of recent on-going studies61-65) was not addressed in this study.

Limitations
We developed this approach in diabetes, a prevalent disease with multiple well-tested measures. Other diseases may have fewer measures to represent quality or fewer patients with the disease per physician. Further research is needed to determine whether, for other conditions or other data sources, a similar approach to performance assessment can be used to create composite scores aggregated across common diseases to improve reliability.

Our physician sample volunteered to participate in the Provider Recognition Program. Mean scores for quality may not generalize to other samples of physicians. However, our findings regarding the internal consistency of the composite measure have been observed for other multi-item measures of quality66 and should be robust.

Finally, data were abstracted and submitted by the practices themselves, raising concerns over data integrity. Physicians could have selected those patients for whom their performance was optimal, versus consecutive patients as instructed in the sampling specifications. If so, the observed means for the performance measures would have been expected to be far in excess of those actually reported. Further, we eliminated those measures with very little variation.

Implications
With attention to specific methodologic criteria, we have developed an approach to the creation of a composite diabetes quality scale that is reliable at the individual physician level. We suggest that the reliability of physician-level quality measures can be improved in 3 ways: by choosing cutoff values to dichotomize outcome measures closest to the desired targets that maximize the physician effect; by increasing the number of measures used to create a composite score; and by increasing the number of patients sampled per physician. We further suggest that this approach could be applied to other highly prevalent chronic conditions, such as cardiovascular disease, with a sufficient number of reliable and valid measures used to represent quality, and measured in those practices with adequate patient samples per physician. If further tests of our approach replicate the results of this study, we would be able to provide a scientifically defensible basis for pay-for-performance programs while both addressing much of the resistance by the physician community and responding to the public's and payor's call for public access to information about the quality of physicians' medical practice.

REFERENCES
1. Institute of Medicine: Rewarding Provider Performance: Aligning Incentives in Medicare. Washington, DC: National Academies Press; 2006.
2. Pham HH, Schrag D, O'Malley AS, et al. Care patterns in Medicare and their implications for pay for performance. N Engl J Med. 2007;356:1130-1139.
3. Medicare Physician Payment Reform and Quality Improvement Act of 2006: HS 5866.
4. O'Brien SM, Shahian DM, DeLong ER, et al. Quality measurement in


adult cardiac surgery: Part 2-statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg. 2007;83:S13-S26.
5. Burger I, Schill K, Goodman S. Disclosure of individual surgeon's performance rates during informed consent: ethical and epistemological considerations. Ann Surg. 2007;245:507-513.
6. Rosenthal MB, Frank RG, Li Z, et al. Early experience with pay-for-performance: from concept to practice. JAMA. 2005;294:1788-1793.
7. Iglehart JK. Linking compensation to quality - Medicare payments to physicians. N Engl J Med. 2005;353:870-872.
8. Huang IC, Diette GB, Dominici F, et al. Variations of physician group profiling indicators for asthma care. Am J Managed Care. 2004;10:38-44.
9. Epstein AM, Lee TH, Hamel MB. Paying physicians for high-quality care. N Engl J Med. 2004;350:406-410.
10. Parkerton PH, Smith DG, Belin TR, et al. Physician performance assessment: nonequivalence of primary care measures. Med Care. 2003;41:1034-1047.
11. Landon BE, Normand SL, Blumenthal D, et al. Physician clinical performance assessment: prospects and barriers. JAMA. 2003;290:1183-1189.
12. Shahian DM, Normand SL, Torchiana DF, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg. 2001;72:1845-1848.
13. Fischer E. Paying for performance-risks and recommendations. N Engl J Med. 2006;355:1845-1847.
14. Steinbrook R. Public report cards: cardiac surgery and beyond. N Engl J Med. 2006;355:1847-1849.
15. Werner RM, Asch DA. The unintended consequences of publicly reporting quality information. JAMA. 2005;293:1239-1244.
16. Hannan EL, Sarrazin MS, Doran DR, et al. Provider profiling and quality improvement efforts in coronary artery bypass graft surgery: the effect on short-term mortality among Medicare beneficiaries. Med Care. 2003;41:1164-1172.
17. Dranove D, Kessler D, McClellan M, et al. Is more information better? The effects of "report cards" on health care providers. J Polit Econ. 2003;111:555-588.
18. Jha AK, Epstein AM. The predictive accuracy of the New York state coronary artery bypass surgery report-card system. Health Aff (Millwood). 2006;25:844-855.
19. Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York, NY: McGraw-Hill; 1994.
20. Shrout PE. Measurement reliability and agreement in psychiatry. Stat Methods Med Res. 1998;7:301-317.
21. McIver JP, Carmines EG, Zeller RA. "Multiple indicators." In: Carmines EG, Zeller RA, eds. Measurement in the Social Sciences. Cambridge, UK: Cambridge University Press; 1980.
22. National Committee for Quality Assurance, Diabetes Physician Recognition Program. Available at: http://www.ncqa.org/dprp/.
23. Hofer TP, Hayward RA, Greenfield S, et al. The unreliability of individual physician "report cards" for assessing the costs and quality of care of a chronic disease. JAMA. 1999;281:2098-2105.
24. Greenfield S, Kaplan SH, Kahn R, et al. Profiling care provided by different groups of physicians: effects of patient case-mix (bias) and physician-level clustering on quality assessment results. Ann Intern Med. 2002;136:111-121.
25. Fleming BB, Greenfield S, Englegau MM, et al. The diabetes quality improvement project: moving science into health policy to gain an edge on the diabetes epidemic. Diabetes Care. 2001;24:1815-1820.
26. Reilly M, Pepe M. The relationship between hot-deck multiple imputation and weighted likelihood. Stat Med. 1997;16:5-19.
27. SAS Institute. Statistical Analysis System. Version 8. Cary, NC: SAS Institute; 2000.
28. Deleted in Proof.
29. Cronbach LJ. Essentials of Psychological Testing. 4th ed. New York, NY: Harper and Row; 1984: Chapter 6.
30. Grant RW, Buse JB, Meigs JB, et al. Quality of diabetes care in US academic medical centers. Diabetes Care. 2005;28:337-442.
31. Heisler M, Smith DM, Hayward RA, et al. Racial disparities in diabetes care process, outcomes and treatment intensity. Med Care. 2003;41:1221-1232.
32. Karter AJ, Ferrara A, Liu JY, et al. Ethnic disparities in diabetic complications in an insured population. JAMA. 2002;287:2519-2527.
33. Schneider EC, Zaslavsky AM, Epstein AM. Racial disparities in the quality of care for enrollees in Medicare managed care. JAMA. 2002;287:1288-1294.
34. Fiscella K, Franks P. Influence of patient education on profiles of physician practices. Ann Intern Med. 1999;131:745-751.
35. National Committee on Quality Assurance. Available at: http://www.ncqa.org/dprp/dqip2.htm#synopsis.
36. Spearman C. Demonstration of formulae for true measurement of correlation. Am J Psychol. 1907;18:161-169.
37. Ryne SL, Gerhart B, Parks L. Personnel psychology: performance evaluation and pay for performance. Annu Rev Psychol. 2005;56:571-600.
38. Centers for Medicare and Medicaid Services. CMS demonstration projects: Medicare physician group practice demonstration. Available at: http://www.cms.hhs.gov/researchers/demos/PGP.asp.
39. The Leapfrog Group. Incentive and reward compendium guide and glossary. Available at: http://www.ir.leapfroggroup.org/compendiumresult.cfm.
40. O'Kane ME. Performance-based measures: the early results are in. J Manag Care Pharm. 2007;13(Suppl 8):S3-S6.
41. Davies TJ. Pay for performance: a business case for quality for California physician groups. Manag Care. 2004;13(Suppl 10):3-8.
42. Integrated Healthcare Association. History of IHA's pay for performance initiative. Available at: http://www.iha.org/payfprfd.htm.
43. Garber A. Evidence-based guidelines as a foundation for performance incentives. Health Aff. 2005;24:174-180.
44. Terry K. Pay for performance: a double-edged sword. Med Econ. 2005;82:64-66.
45. Kluge EW. Physicians' practice profiles and the patient's right to know. J Eval Clin Practice. 2000;6:235-239.
46. Mannion R, Davies HT. Reporting health care performance: learning from the past, prospects for the future. J Eval Clin Practice. 2002;8:215-228.
47. Galvin R, Milstein A. Large employers' new strategies in health care. N Engl J Med. 2002;347:939-942.
48. National Diabetes Quality Improvement Alliance. Available at: http://www.nationaldiabetesalliance.org.
49. Lee Y. The predictive values of self assessed general, physical and mental health on functional decline and mortality in older adults. J Epidemiol Community Health. 2000;54:123-129.
50. Litwin MS, Greenfield S, Elkin EP, et al. Assessment of prognosis with the total illness burden index for prostate cancer. Cancer. 2007;109:1777-1783.
51. Rosenthal MB, Landon BE, Normand SL, et al. Pay for performance in commercial HMOs. N Engl J Med. 2006;355:1895-1902.
52. Narins CR, Dozier AM, Ling FS, et al. The influence of public reporting of outcome data on medical decision making by physicians. Arch Intern Med. 2005;165:83-87.
53. Glickman SW, Ou FS, DeLong ER, et al. Pay for performance, quality of care and outcomes in acute myocardial infarction. JAMA. 2007;297:2372-2380.
54. Hofer TP, Asch SM, Hayward RA, et al. Profiling quality of care: is there a role for peer review? BMC Health Serv Res. 2004;4:9.
55. Normand SL, Zou KH. Sample size considerations in observational health care quality studies. Stat Med. 2002;21:331-345.
56. Bronskill SE, Normand SL, Landrum MB, et al. Longitudinal profiles of health care providers. Stat Med. 2002;21:1067-1088.
57. Huang IC, Dominici F, Frangakis C, et al. Is risk-adjustor selection more important than statistical approach for provider profiling? Med Decis Making. 2005;25:20-34.
58. Zaslavsky AM, Shaul JA, Zaborski LB, et al. Combining health plan performance indicators into simpler composite measures. Health Care Financ Rev. 2002;23:101-115.
59. Lied T, Malsbary R, Ranck J. Combining HEDIS indicators: a new approach to measuring plan performance. Health Care Financ Rev. 2002;23:117-129.
60. Austin PC, Alter DA, Tu JV. The use of fixed- and random-effects models for classifying hospitals as mortality outliers: a Monte Carlo assessment. Med Decis Making. 2003;23:526-539.


61. Epping-Jordan JE, Pruitt SD, Benoga R, et al. Improving the quality of health care for chronic conditions. Qual Saf Health Care. 2004;13:299-305.
62. Hynes DM, Perrin RA, Rappaport S, et al. Informatics resources to support health care quality improvement in the Veterans Health Administration. J Am Med Inform Assoc. 2004;11:344-350.
63. Kim C, Williamson DF, Safford MM, et al. Managed care organization and the quality of diabetes care: The Translating Research into Action for Diabetes (TRIAD) study. Diabetes Care. 2004;27:1529-1534.
64. Chin MH, Cook S, Drum ML, et al. Improving diabetes care in Midwest community health centers with the Health Disparities Collaborative. Diabetes Care. 2004;27:2-8.
65. Lindenauer PK, Remus D, Roman S, et al. Public reporting and pay for performance in hospital quality improvement. N Engl J Med. 2007;356:486-496.
66. Kaplan SH, Greenfield S, Gandek B, et al. Characteristics of physicians with participatory decision-making styles. Ann Intern Med. 1996;124:497-504.
67. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.
68. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1:30-46.

APPENDIX A. THE USE AND INTERPRETATION OF INTRACLASS CORRELATION COEFFICIENTS TO ASSESS THE RELIABLE "PHYSICIAN EFFECT"

The intraclass correlation is calculated as the ratio of the variation between physicians in the sample (numerator) to the variation between physicians plus the variation within a physician's patient sample (denominator). If the quality of physicians' performance is very consistent across patients in their practices (i.e. the variation within physicians' practices is small) and they practice substantially differently from each other (i.e. the between physician variation is large), then the ICC will be large. Conversely, if the quality of physicians' performance varies substantially across patients in their practices (i.e. the within physician variation is large) and physicians practice very similarly to each other (i.e. the between physician variation is small), then the ICC will be small. The larger the ICC, the greater the physician-level reliability or consistency. Use of ICC coefficients to estimate reliability is well documented (Shrout & Fleiss, 1979; McGraw & Wong, 1996).

If a composite measure of physician-level diabetes quality is created by summing the dichotomized performance indicators for each patient in a practice, and then averaging across all patients sampled from that practice, then the reliability of such a measure can be determined by a modification of the Spearman Brown prophecy formula (36):

$$r_{tt} = \frac{k(\mathrm{ICC})}{1 + (k - 1)(\mathrm{ICC})}$$

where:
$r_{tt}$ = physician-level reliability coefficient
$k$ = number of quality items used to create physician-level score
ICC = physician-level intraclass correlation

This reliability coefficient increases as the number of measures and the ICCs of those measures increase (see Table A, below). For physician profiles the ICC is given by the formula:

$$\mathrm{ICC} = r_1 = \frac{ms_{\text{between}} - ms_{\text{within}}}{ms_{\text{between}} + [k - 1]\, ms_{\text{within}}}$$

where $ms_{\text{between}}$ = mean-square estimate of between physician variation, $ms_{\text{within}}$ = mean-square estimate of variation across patients within a physician's practice, and $k$ = average number of patients per physician. As can be seen from this equation, physician level reliability can also be improved by sampling a larger number of patients per physician.

TABLE A. Reliability Coefficients for Varying Number of Items and Intraclass Correlation Coefficients

                     ICC
Number of items      .10    .20    .30    .40    .50
 1                   .10    .20    .30    .40    .50
 2                   .18    .33    .46    .57    .67
 3                   .25    .43    .56    .67    .75
 4                   .31    .50    .63    .73    .80
 5                   .36    .56    .68    .77    .83
10                   .53    .71    .81    .87    .91
20                   .69    .83    .90    .93    .95

Shrout PE & Fleiss JL (1979). Intraclass Correlations: Uses in Assessing Rater Reliability, Psychological Bulletin, Vol. 86, 2, 420-428. (67)
McGraw KO, Wong SP, Forming inferences about some intraclass correlation coefficients, Psych Methods 1996;1(1):30-46. (68)
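The two formulas in Appendix A can be checked numerically. The Python sketch below is illustrative only and is not the authors' code. `spearman_brown` evaluates the Spearman Brown expression for the physician-level reliability coefficient and regenerates the entries of Table A; `icc_oneway` computes the ICC from the between- and within-physician mean squares of a one-way analysis of variance. The function names and the list-of-lists data layout (one list of patient-level scores per physician) are assumptions made for this example.

```python
from statistics import mean


def spearman_brown(icc: float, k: int) -> float:
    """Reliability of a composite built from k items with intraclass correlation icc."""
    return k * icc / (1 + (k - 1) * icc)


def icc_oneway(groups: list[list[float]]) -> float:
    """ICC = (MSB - MSW) / (MSB + (k - 1) * MSW) from a one-way ANOVA.

    groups: one list of patient-level scores per physician (hypothetical layout).
    k is the average number of patients per physician, as in Appendix A.
    """
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-physician and within-physician sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    k = n_total / n_groups
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


ICCS = [0.10, 0.20, 0.30, 0.40, 0.50]
ITEM_COUNTS = [1, 2, 3, 4, 5, 10, 20]


def table_a() -> dict[int, list[float]]:
    """Rows of Table A: number of items -> reliability at each ICC, rounded to 2 places."""
    return {k: [round(spearman_brown(icc, k), 2) for icc in ICCS] for k in ITEM_COUNTS}


if __name__ == "__main__":
    for k, row in table_a().items():
        print(k, row)
```

Read against Table A, the sketch shows the same pattern the appendix describes: at an ICC of .10, a single item has a reliability of .10, whereas a 10-item composite reaches .53.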


APPENDIX B. Criteria for Developing Composite Physician-Level Performance Scores

Issue: Quality of care measures can represent physician's care and patient and/or system characteristics
Implications:
• Choose measures that make it easier to detect physician "thumbprint," i.e. those with:
  • less "patient influence"/greater physician effect
  • normal distributions
  • adequate variation
  • smaller sample sizes needed to see meaningful differences

Issue: Patients with specific characteristics (e.g. age, gender, disease-severity, co-morbidity) choose and stay with physicians with specific characteristics (e.g. age, gender, specialty) (case-mix bias)
Implications:
• Hierarchical or cluster-adjusted analysis required
• Choose good measures of case-mix to minimize bias; some data sets may not include key case-mix variables (e.g. patient adherence, co-morbidity); scrutinize data sets to identify as many sources of patient case-mix variables as appropriate
• Sample size constraints limit number of case-mix variables for adjustment of individual physician profiles; use as many of these measures (patient sociodemographic characteristics, illness burden, preferences, adherence to treatment) as patient sample size per physician will tolerate, or consider strategies for creating a 'case-mix composite'

Issue: Power
Implications:
• Sufficient numbers of patients must be sampled for measures to be reliable at physician level
• Increasing number of patients to improve reliability multiplies number of physicians needed to compare levels of care (i.e. pass/fail, tertiles, etc.), may not contribute unique information, and adds costs to data collection; therefore avoid unnecessarily large patient samples

Issue: Many chronic diseases are multi-dimensional; individual quality measures aren't reliable and don't reflect physicians' total care for the index disease; creating composites enhances reliability
Implications:
• Create composite measures that are clinically sensible
• Test composite measures for reliability, discriminant validity for comparing individual physicians' overall quality of care

Issue: Scoring methods (such as compensatory or conjunctive scoring) may produce different rankings of physicians
Implications:
• Test scoring methods for reproducibility
• Interpreting aggregate scores, determining meaningful differences, etc., requires validation, norms, broad testing

Issue: Reliability/validity of profile scores
Implications:
• Reliability of individual measures and aggregate physician-level profile scores must meet minimal analytic and feasibility standards
• Although exogenous validity variables for quality difficult to identify, discriminant (between vs. within physician) validity of scores should be determined
• Assess stability of profiles over time if possible, taking into account real change (e.g. due to quality improvement activities)

Issue: Physician/practice characteristics related to quality
Implications:
• Ideally, for optimal interpretation of profile scores, changes in scores should correspond with quality improvement efforts
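
Appendix B notes that compensatory and conjunctive scoring may produce different rankings of physicians. The Python sketch below illustrates how that can happen. It assumes the usual psychometric meanings of the two terms, which the appendix does not define: compensatory scoring credits the proportion of indicators met, so strengths can offset misses, whereas conjunctive scoring gives credit only when every indicator is met. These definitions, the function names, and the two hypothetical physicians are assumptions for illustration and are not taken from the study.

```python
from statistics import mean


def compensatory(indicators: list[int]) -> float:
    """Proportion of dichotomized indicators met for one patient (assumed definition)."""
    return sum(indicators) / len(indicators)


def conjunctive(indicators: list[int]) -> float:
    """1.0 only if every indicator is met for the patient, else 0.0 (assumed definition)."""
    return 1.0 if all(indicators) else 0.0


def profile_score(patients: list[list[int]], scorer) -> float:
    """Physician-level score: the patient-level score averaged across sampled patients."""
    return mean(scorer(p) for p in patients)


# Hypothetical data: each inner list holds four 0/1 indicators for one patient.
physician_a = [[1, 1, 1, 0], [1, 1, 1, 0]]  # consistently misses one indicator
physician_b = [[1, 1, 1, 1], [0, 0, 0, 0]]  # all-or-nothing across patients

if __name__ == "__main__":
    for name, patients in (("A", physician_a), ("B", physician_b)):
        print(
            name,
            profile_score(patients, compensatory),
            profile_score(patients, conjunctive),
        )
```

In this constructed example physician A ranks above physician B under compensatory scoring and below B under conjunctive scoring, which is the kind of reversal that testing scoring methods for reproducibility is meant to detect.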
