J Oral Maxillofac Surg
52:565-571, 1994
Level of Agreement in Clinicians’ Perceptions of Class II Malocclusions
CEIB PHILLIPS, MPH, PHD,* L’TANYA J. BAILEY, DDS, MS,† AND ROBERT P. SIEBER, DMD, MS‡
To evaluate the extent to which surgeons and orthodontists agree on the nature
and severity of dentofacial problems requiring orthognathic surgery, three clinicians
active in a specialized clinic for treatment of dentofacial deformities scored
the pretreatment records of 37 adult class II patients. Each clinician first indicated
whether a skeletal/dental problem existed in the maxilla and mandible and then
rated the severity of the problem on a visual analog scale. The level of agreement
among the three clinicians was highest for dental problems and lowest for skeletal
anteroposterior measures. There was a significant difference among the clinicians
in the percentage of patients identified as having a retrusive midface and excessive
facial thirds. The agreement on the severity of the problem was generally low
even for those patients for whom the clinicians agreed on the type of problem.
The data suggest that personal experience and clinical background play a major
role in diagnosis and treatment planning. Joint treatment planning conferences
between the surgeon and orthodontist offer an opportunity for different plans to
be discussed, with the preferred treatment option selected for an individual patient.
* Research Professor, Department of Orthodontics, School of Dentistry, University of
North Carolina-Chapel Hill, Chapel Hill, NC.
† Assistant Professor, Department of Orthodontics, School of Dentistry, University of
North Carolina-Chapel Hill, Chapel Hill, NC.
‡ In private practice, Basel, Switzerland.
Address correspondence and reprint requests to Dr Phillips: Department of Orthodontics,
UNC School of Dentistry, Chapel Hill, NC 27599-7450.
© 1994 American Association of Oral and Maxillofacial Surgeons
0276-2391/94/5206-0005$3.00/0

When a patient seeks treatment for a dentofacial deformity, the clinician must assess
the patient’s skeletal and dental problems and evaluate the options for correction. A
synthesis of the problems observed subjectively and objectively by the clinician, and
the subjective concerns of the patient, is required to identify the most appropriate
recommendations for treatment.1 Because no “gold standard” for the assessment of
skeletal malocclusion exists, clinical perception of specific areas may differ. However,
perceptual differences that affect treatment planning decisions can have important
consequences.

When the efficacy of different treatments is compared, the expectation is that the
differences in outcomes and their variability are directly related to the effect of the
treatment. Applying similar treatment to patients with different underlying morphologic
problems will likely increase the variability in the outcome and decrease the precision
of the estimate of the treatment efficacy. Because type of treatment, usually the
surgical approach, is the basis for sample selection in most published literature on
orthognathic treatment effect and efficacy, the inclusion of patients with different
“problems” may have an important influence on the reported findings. A review of the
literature indicates that very little information is available describing how well
experienced clinicians agree on the identification of skeletal and dental problems. In a
field study of North Carolina children, examiner reliability was excellent after an
initial calibration for facial and occlusal measures, but then declined for all measures
during the actual data collection.1 In an assessment of orthodontic treatment planning
decisions by five orthodontists, Han et al2 reported a wide range of intraexaminer
reliability estimates (κ from -0.25 to 1.0) reflecting a single clinician’s variation
with repeated selection of treatment for the same children, as well as extremely low
interclinician agreement (κ values ranging from -0.03 to 0.14) suggesting virtually no
agreement in the selection of treatment options among clinicians beyond that expected
by chance.

Since the identification and ranking of skeletal and dental problems are such
fundamental parts of treatment planning for orthognathic surgery, this retrospective
study was conducted to assess how well three team members (an orthodontist and two oral
and maxillofacial surgeons), in a specialized clinic for dentofacial deformity patients,
agreed on the type and severity of pretreatment skeletal and dental problems in patients
who had had mandibular advancement alone or combined with a Le Fort I osteotomy.

FIGURE 1. Examples of the scoring system used for attribute evaluation and the visual
analog scale used for severity. (Example scales shown: Soft Tissue Chin, Retrusive |
Normal | Protrusive; Positive | Normal | Negative.)
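The κ statistics cited in the introduction express agreement beyond what chance alone would produce. As a minimal sketch of that idea, the Python function below computes Cohen's kappa for two raters. It is an illustration only: the cited study may have used a different kappa variant, and the function name and the four example treatment choices are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical treatment choices by two clinicians for four patients.
plans_a = ["surgery", "surgery", "orthodontics", "orthodontics"]
plans_b = ["surgery", "orthodontics", "surgery", "orthodontics"]
print(cohen_kappa(plans_a, plans_b))
```

In this example the raters agree on half of the patients, which is exactly what their marginal frequencies predict by chance, so kappa is zero even though raw agreement is 50%.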
Methods

The records of 37 patients (32 females) who had been treated at the University of North
Carolina (Chapel Hill, NC) with a bilateral sagittal split osteotomy for mandibular
advancement, alone or with a Le Fort I osteotomy for superior repositioning of the
maxilla, before the initiation of the study were evaluated. The mean age at the start of
treatment was 23.2 years (range, 12 to 54 years).

Pretreatment records of each patient were evaluated independently by two oral and
maxillofacial surgeons and an orthodontist. The clinicians, who had all been in clinical
practice for at least 10 years, were full-time faculty members and team members in the
Dentofacial Deformity Clinic in which potential surgical-orthodontic patients are
evaluated. The records examined by the clinicians consisted of a lateral cephalometric
radiograph taken in natural head position with the teeth in centric relation and the
lips in repose and five intraoral (front, right and left lateral, maxillary and
mandibular occlusal) and three facial (frontal rest, frontal smiling, profile with lips
in repose) photographs taken immediately prior to treatment. Dental casts were not
provided. The clinicians were allowed to use any evaluative techniques they wished,
including cephalometric measurements, to assess the physical records of each patient. No
time limit was set and the clinicians were allowed to review the records as often as
needed. The clinicians were informed that the patients had had mandibular advancement
alone or with a Le Fort I osteotomy. One of the oral surgeons had been involved in the
treatment of 27 of the patients; the other oral surgeon had treated four patients. The
orthodontist had not previously seen the patients.

The clinicians were asked to evaluate the skeletal/dental relationships of the maxilla
and the mandible (Table 1). For each measure, the clinicians first circled the attribute
present and then rated the severity of any problem using a visual analog scale (Fig 1).
“Normal” on the left end of the severity scale was circled if the clinician did not
consider the measure to be one of the patient’s problems and it was recorded as zero
severity. Otherwise, the severity of each problem was measured as the distance from the
left end of the scale, “normal,” to the mark made by the clinician. The clinicians were
given three text scenarios to illustrate the use of the scoring system. There were no
visual anchors provided for “normal” or “severe.”

Table 1. Attributes Used for Each Measure to Indicate Type of Problem

Measure                       Attributes (-1/0/+1)
Facial measures
  Midface                     Retrusive/Normal/Protrusive
  Lower face                  Retrusive/Normal/Protrusive
  Soft tissue chin            Retrusive/Normal/Protrusive
  Nasolabial angle            Obtuse/Normal/Acute
  Mentolabial fold angle      Deep/Normal/Shallow
  Anterior lower face height  Deficient/Normal/Excessive
Dental measures
  Overjet                     Positive/Normal/Negative
  Overbite                    Positive/Normal/Negative
  Upper incisor angle         Upright/Normal/Flared
  Lower incisor angle         Upright/Normal/Flared
  Right canine                Class II/Class I/Class III
  Left canine                 Class II/Class I/Class III
  Right molar                 Class II/Class I/Class III
  Left molar                  Class II/Class I/Class III

For the purposes of statistical analysis, the three attributes of each measure were
scored as -1, 0 = normal, or +1 (Table 1). This scoring reflects the underlying
continuum of the measure as well as the equal weight, but opposite directions, of the two
“nonnormal” attributes. Differences among the clinicians in the percentage of patients
assessed as having a given attribute were analyzed for each measure using the row mean
statistic from a generalized Mantel-Haenszel test for matched designs. This approach is
similar to using the Cochran’s Q criterion for measures with more than two responses,
but has the advantage of accounting for the symmetric nature of the responses.3

Consistency (agreement) among the clinicians, as well as between the pair of oral and
maxillofacial surgeons, in the identification of attributes for each measure was
reported as a percent agreement (percent of patients for whom clinicians agreed on the
attribute) and as an intraclass correlation using the symmetric scores (-1, 0 = normal,
+1). Only percent agreement values were given for those measures for which the
clinicians agreed on the attribute evaluation for 75% or more of the patients because
the intraclass correlation is not useful in settings where almost no variation exists in
the responses.
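As a rough illustration of the two agreement summaries described above, the following Python sketch computes percent agreement and an intraclass correlation from the symmetric scores. It assumes the one-way random-effects form of the intraclass correlation for a balanced design (every patient scored by every clinician); the article does not state which estimator was applied to the attribute scores. The function names, the cutoff argument, and the five example patients are hypothetical.

```python
# Illustrative sketch only: function names and example data are hypothetical.

MIDFACE_SCORES = {"Retrusive": -1, "Normal": 0, "Protrusive": 1}  # coding from Table 1


def percent_agreement(scores):
    """Percent of patients for whom every clinician chose the same attribute."""
    agreed = sum(1 for row in scores if len(set(row)) == 1)
    return 100.0 * agreed / len(scores)


def icc_oneway(scores):
    """One-way random-effects intraclass correlation (assumed estimator).

    scores: one tuple per patient holding each clinician's -1/0/+1 score.
    """
    n, k = len(scores), len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        return float("nan")  # no variation at all in the responses
    return (ms_between - ms_within) / denominator


def summarize(scores, cutoff=75.0):
    """Return (percent agreement, rho); rho is None at or above the cutoff."""
    pct = percent_agreement(scores)
    rho = None if pct >= cutoff else icc_oneway(scores)
    return pct, rho


# Five hypothetical patients rated by three clinicians on the midface measure.
ratings = [
    ("Retrusive", "Retrusive", "Retrusive"),
    ("Normal", "Normal", "Normal"),
    ("Protrusive", "Protrusive", "Protrusive"),
    ("Retrusive", "Normal", "Retrusive"),
    ("Normal", "Normal", "Protrusive"),
]
example = [tuple(MIDFACE_SCORES[r] for r in row) for row in ratings]
pct, rho = summarize(example)
print(f"agreement = {pct:.0f}%, rho = {rho:.2f}")
```

The `summarize` helper mirrors the reporting rule in the text: when nearly all patients receive the same score from every clinician, the between-patient variance is close to zero and the correlation is uninformative, so only the percent agreement is returned.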

Because disagreements among clinicians in the identification of attributes, particularly
normal versus nonnormal, would be reflected in the severity ratings for that measure, the
agreement among the clinicians for the severity of each measure was conditioned on the
clinicians’ agreement on the “nonnormal” attribute of the measure. Only the severity
scores of those patients for whom at least two clinicians agreed on the “nonnormal”
attribute were used in the calculation of the intraclass correlation. Because the number
of clinicians who agreed on a patient’s nonnormal attribute varied, subject was
considered a random effect in the one-way analysis of variance used to provide the
variance estimates. The level of significance was set at .05 for all analyses.
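The conditioning step described above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that only the severity marks of the clinicians who share the nonnormal attribute are retained for a patient, and that the intraclass correlation is then estimated from an unbalanced one-way analysis of variance with patient as the random effect. The function names and the example records are hypothetical.

```python
from collections import Counter


def agreed_severities(attributes, severities):
    """Severity marks of the clinicians who share a nonnormal attribute.

    Returns [] when fewer than two clinicians chose the same nonnormal score.
    """
    counts = Counter(a for a in attributes if a != 0)
    if not counts:
        return []
    attribute, n_agree = counts.most_common(1)[0]
    if n_agree < 2:
        return []
    return [s for a, s in zip(attributes, severities) if a == attribute]


def icc_oneway_unbalanced(groups):
    """One-way random-effects ICC when patients have unequal numbers of scores."""
    groups = [g for g in groups if g]
    a = len(groups)
    total = sum(len(g) for g in groups)
    if a < 2 or total <= a:
        raise ValueError("need at least two patients and some replicate scores")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / total
    ms_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    ) / (a - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (total - a)
    # Adjusted average group size used for unbalanced designs.
    k0 = (total - sum(len(g) ** 2 for g in groups) / total) / (a - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Hypothetical records: (attribute scores, severity marks) for three clinicians.
patients = [
    ((-1, -1, -1), (62, 70, 55)),
    ((-1, -1, 0), (40, 48, 0)),
    ((-1, 0, 1), (30, 0, 25)),   # no two clinicians share a nonnormal attribute
    ((0, 0, 0), (0, 0, 0)),      # all clinicians rated the measure normal
    ((1, 1, 0), (20, 35, 0)),
]
groups = [agreed_severities(a, s) for a, s in patients]
rho = icc_oneway_unbalanced(groups)
print(f"severity rho = {rho:.2f}")
```

Dropping the patients without a shared nonnormal attribute keeps attribute disagreement from being counted a second time as severity disagreement, which is the stated reason for the conditioning.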

FIGURE 3. Attribute evaluation of dental measures. Relative frequency of patients noted
by each clinician as having a “nonnormal” attribute. (Horizontal axis: relative
frequency, percent.)

Results

TYPE OF PROBLEM

The percentage of patients assessed as having a given attribute was significantly
different (Figs 2, 3) among the three clinicians for four of the facial measures
(midface, both soft tissue angles, and anterior lower face height; Fig 2), but not for
the lower face or soft tissue chin. The oral and maxillofacial surgeons differed
(P < .02) for the same measures (Fig 2). There was no apparent pattern for the
differences among the clinicians. For the midface (Fig 2), clinician 1 rated 59% of the
patients as “normal” and 38% of the patients as protrusive; clinician 2 rated 84% as
“normal” and an approximately equal number as protrusive and retrusive; while clinician
3 rated 73% as “normal” and 24% as retrusive. For the lower face, all three clinicians
rated over 90% of the patients as retrusive.

FIGURE 2. Attribute evaluation of facial measures. Relative frequency of patients noted
by each clinician as having a “nonnormal” attribute. (Horizontal axis: relative
frequency, percent.)

Table 2. Level of Agreement Among the Clinicians

                              Attribute                                 Severity of Problem
                              All Clinicians       Oral Surgeons        All Clinicians  Oral Surgeons
Measure                       % Agreement  Rho     % Agreement  Rho     Rho             Rho
Facial measures
  Midface                     35           -0.03   57           0.00    *               *
  Lower face                  92           †       95           †       0.33            0.25
  Soft tissue chin            76           †       78           †       0.32            0.35
  Nasolabial angle            16           0.17    43           -0.06   0.14            -0.02
  Mentolabial fold angle      35           0.40    46           0.30    0.14            0.01
  Anterior lower face height  38           0.54    76           0.64    0.27            0.25
Dental measures
  Overjet                     84           †       89           †       0.52            0.53
  Overbite                    51           0.58    84           †       0.36            0.56
  Upper incisor angle         51           0.73    70           0.70    0.66            0.57
  Lower incisor angle         27           0.42    49           0.41    0.22            0.03
  Right canine                81           †       86           †       0.36            0.64
  Left canine                 81           †       86           †       0.17            0.51
  Right molar                 60           0.52    68           0.39    0.26            0.33
  Left molar                  62           0.62    70           0.60    0.17            0.40

* Rho not computed because clinicians only agreed on the “nonnormal” attribute of two patients.
† Rho not computed because % agreement > 75%.

In the vertical dimension, clinician 3 rated only 30% of the patients
as excessive while the other clinicians rated at least 57% of the patients excessive.

The percentage of patients assessed as having a given attribute was similar (P > .05)
among the clinicians for all of the dental measures except for upper and lower incisor
angulation (Fig 3). The oral and maxillofacial surgeons differed (P < .01) only in
evaluation of the upper incisor angulation.

Agreement among the clinicians in the identification of attributes (Table 2) was
moderate for the dental measures. Percent agreement on a patient’s attribute ranged from
27% for the lower incisor angle to 84% for overjet. Except for lower face and soft tissue
chin, agreement was generally low for the facial measures. The percent agreement between
the two oral and maxillofacial surgeons was slightly higher for all measures. Although,
overall, the percentage of patients about whom the clinicians disagreed on the attribute
of a measure seems high, the number of times a major discordance (retrusive vs
protrusive, for example) occurred between clinicians was small (Table 3). All three
clinicians agreed on 86% of the attributes for the patient illustrated in Figure 4, but
on only 21% of the attributes for the patient illustrated in Figure 5. Of the 14 measures
evaluated on each patient, all the clinicians agreed on the attribute of 10 or more
measures for only five patients (14%).

Table 3. Percentage of Patients For Whom Clinicians Disagreed on the “Abnormal” Attribute

                              Clinicians      Clinicians      Clinicians
Measure                       1 and 2 (%)     1 and 3 (%)     2 and 3 (%)
Facial measures
  Midface                     3               8               0
  Lower face                  5               5               0
  Soft tissue chin            8               11              5
  Nasolabial angle            5               8               5
  Mentolabial fold angle      14              11              11
  Anterior lower face height  0               14              14
Dental measures
  Overjet                     0               0               0
  Overbite                    5               19              19
  Upper incisor angle         0               0               0
  Lower incisor angle         8               14              8
  Right canine                0               0               0
  Left canine                 0               0               0
  Right molar                 3               0               0
  Left molar                  3               0               0

FIGURE 4. Physical records of the patient for whom the three clinicians agreed on the
attributes of 12 of the 14 measures evaluated. A, Cephalogram. B, Facial profile
photograph. C, Frontal intraoral photograph.

FIGURE 5. Physical records of the patient for whom the three clinicians agreed on the
attributes of only 3 of the 14 measures evaluated. A, Cephalogram. B, Facial profile
photograph. C, Frontal intraoral photograph.

EVALUATION OF SEVERITY

For those patients for whom a pair of clinicians identified the same “nonnormal”
attribute, the evaluation of severity varied substantially. The clinicians did not, on
average, differ systematically in the evaluation of severity (Table 4) as indicated by
only a few of the mean severity ratings being significantly different (P < .05). However,
the discrepancy in severity ratings was apparent in the large standard deviations
associated with each measure and is reflected in the poor to moderate agreement levels of
the intraclass correlation (Table 2). Again, the agreement was slightly better, overall,
for the dental measures.

Table 4. A Comparison of the Differences Among the Clinicians in the Rating of Severity
for Those Patients For Whom the Clinicians Agreed on the “Nonnormal” Attribute

                              Clinicians 1 and 2     Clinicians 1 and 3     Clinicians 2 and 3
Measure                       %*    Mean     SD      %*    Mean     SD      %*    Mean     SD
Facial measures
  Midface                     5     4.4      4.9     0                      3     2.5
  Lower face                  92    1.6      28.1    92    -2.3     22.8    97    -4.2     28.9
  Soft tissue chin            76    0.3      26.7    76    -4.2     22.9    86    -9.1     28.5
  Nasolabial angle            16    18.9     31.9    46    -1.5     30.3    14    -24.0    46.1
  Mentolabial fold angle      43    26.0†    32.1    57    6.2      22.2    41    -19.6†   35.7
  Anterior lower face height  16    1.5      24.9    49    9.6      22.4    40    2.3      33.3
Dental measures
  Overjet                     89    -0.9     22.1    92    -12.1†   17.7    86    -12.9†   20.3
  Overbite                    81    0.7      22.5    59    -5.1     29.3    62    -8.0     30.6
  Upper incisor angle         40    -3.9     21.4    54    -10.0†   20.0    43    -4.8     20.1
  Lower incisor angle         35    13.1     30.8    49    -2.0     25.7    35    -15.9    33.2
  Right canine                81    -1.9     15.2    76    0.7      19.0    81    1.0      24.2
  Left canine                 86    0.3      19.0    81    -4.2     23.6    86    -5.3     24.1
  Right molar                 57    -0.2     12.5    49    -7.1     16.8    57    -9.2     22.3
  Left molar                  65    -2.1     19.3    59    -10.4    28.5    68    -8.6     22.9

* % of patients for whom pair of clinicians agreed on “nonnormal” attribute.
† P < .05.

Discussion

Assessment of clinical treatment and technologies has increasingly focused on ultimate
outcomes, ie, global measures that reflect the success or failure of the technology or
treatment. However, success or failure must be judged in light of the subjective
decisions that motivate a clinician to recommend a given treatment.4 In orthognathic
surgery, the “problem-oriented” approach,5 which requires the recognition of the
components of the malocclusion and dentofacial deformity and the evaluation of the cause
and severity of each problem, has become the basis for the decision-making process in
treatment planning. If clinicians’ evaluations lead to different problem lists, then
treatment planning decisions and potentially the ultimate outcome of treatment may be
affected.

Previous estimates of the agreement among clinicians (examiners) in the assessment of
dental/skeletal characteristics have come primarily from field studies of the prevalence
of malocclusion1 and the development or use of clinical indices for scoring treatment
need or effect.6 In these contexts, examiners are given specific criteria for the
assessment of the measures and the estimate of interexaminer agreement reflects the
similarity in the interpretation of the criteria. In this study, the estimates of
agreement on the attributes of a measure are not simply “consistency” statistics, but
reflect the subjective impression of the relative importance of that measure to the
clinician.

The levels of agreement observed overall for the measures were much lower than expected.
The clinicians who participated in this project were all experienced clinicians (a
minimum of 10 years of clinical practice) who were involved routinely in the evaluation
and treatment planning of orthognathic surgery patients. Although no training or
calibration sessions were used, the expectation was that their experience working as a
part of the dentofacial deformities team would have served as an internal calibration.
All the clinicians were aware that the patients evaluated as a part of the study had
already had mandibular advancement with or without maxillary surgery. Despite this, the
clinicians often rated the problems of a given patient in completely different ways. For
example, one clinician rated a patient as having a severe asymmetry while the other two
examiners rated the patient as symmetrical. Although this could have been an error in
rating due to misinterpretation of instructions or a transcription error, the frequency
of such discordance (0 to 19% of the patients) suggests that these clinicians have
conceptual and evaluative differences in their interpretation of a patient’s problems as
identified from physical records. Agreement between the two oral and maxillofacial
surgeons was slightly higher than that among all three clinicians and, overall,
agreement was better on severity than attribute identification.

However, it may be that the agreement observed in this study is reasonable given that
there are no clearly defined criteria for recommending treatment. The understanding of
the clinician’s decision-making process is further complicated by the lack of a
diagnostic “gold standard” for developmental deformities. Even consideration of multiple
cephalometric analyses does not
provide well-defined, widely accepted criteria to discriminate “abnormal” from “normal”
along the continuum of dental/skeletal relationships.7,8

In a recent decision analysis study,2 five orthodontists were presented a combination of
diagnostic records for 57 Class II patients. Incremental records were added to assess the
orthodontists’ consistency in treatment decision-making. The proportions of agreement in
the treatment plan chosen did not increase with the addition of diagnostic records as
would have been expected. Both these and our findings suggest that although diagnostic
records provide the major information for the data base, other much more subjective
information gained from personal history and clinical examination provides significant
adjunctive information on which therapeutic decisions are made.

If the level of dissimilarity in clinicians’ perceptions observed in this study is
representative, the question of whether different assessments lead to different
treatment plans becomes an important sampling consideration. Clearly, joint treatment
planning conferences between the surgeon and the orthodontist can provide a forum for
differences in clinicians’ perceptions of attributes and the weight attributed to these
attributes to be resolved. Mutual decision-making among the orthodontist, surgeon, and
the patient may be critical to evolve the most acceptable treatment options for an
individual patient.

References

1. Fields H, Rozier G, Ross D: Examiner reliability for disaggregated facial and
   occlusal variables. J Dent Res 67(Special Edition):359, 1988
2. Han UK, Vig KW, Weintraub JA, et al: Consistency of orthodontic treatment decisions
   relative to diagnostic records. Am J Orthod Dentofacial Orthop 100:212, 1991
3. Kuritz SJ, Landis JR, Koch GG: A general overview of Mantel-Haenszel methods:
   Applications and recent developments. Ann Rev Public Health 9:123, 1988
4. Antczak-Bouckoms AA, Tulloch JF: Measuring outcomes in clinical research, in Vig KD,
   Vig PS (eds): Clinical Research as the Basis of Clinical Practice. University of
   Michigan, Ann Arbor, MI, Center for Human Growth and Development, 1991, pp 141-154
5. Proffit WR, Ackerman JL: Diagnosis and treatment planning in orthodontics, in Graber
   TM, Swain SB (eds): Orthodontics: Current Principles and Techniques. St Louis, MO,
   Mosby, 1985
6. Richmond S, Shaw WC, O’Brien KD, et al: The development of the PAR Index (Peer
   Assessment Rating): Reliability and validity. Eur J Orthod 14:125, 1992
7. Simon LA: A quantitative analysis of the measurements used to define and describe
   Class II malocclusion and the effects of treatment of growth. Master’s thesis,
   University of North Carolina, Chapel Hill, NC, 1993
8. Fields HW, Proffit WR, Nixon WL, et al: Facial pattern differences in long-faced
   children and adults. Am J Orthod 85:217, 1984
