
Osteoarthritis and Cartilage 20 (2012) 476–485

Comparison of cartilage histopathology assessment systems on human knee joints at all stages of osteoarthritis development
C. Pauli †‡, R. Whiteside §, F.L. Heras ‖, D. Nesic ¶, J. Koziol †, S.P. Grogan †‡, J. Matyas #, K.P.H. Pritzker ††, D.D. D’Lima †‡, M.K. Lotz †*
† Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, USA
‡ Shiley Center for Orthopaedic Research and Education at Scripps Clinic, La Jolla, California, USA
§ Arthroteq Preclinical Services, Guelph, Canada
‖ Department of Pathology, University of Chile, Clinical Hospital, Santiago, Chile
¶ Osteoarticular Research Group, Department of Clinical Research, University of Bern, Bern, Switzerland
# McCaig Institute for Bone and Joint Health, Department of Comparative Biology & Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Canada
†† Pathology and Laboratory Medicine, Mount Sinai Hospital, University of Toronto, Canada

Article info

Article history:
Received 9 August 2011
Accepted 6 December 2011

Keywords:
Osteoarthritis
Cartilage
Histology
Grading

Abstract

Objective: To compare the MANKIN and OARSI cartilage histopathology assessment systems using human articular cartilage from a large number of donors across the adult age spectrum representing all levels of cartilage degradation.
Design: Human knees (n = 125 from 65 donors; age range 23–92) were obtained from tissue banks. All cartilage surfaces were macroscopically graded. Osteochondral slabs representing the entire central regions of both femoral condyles, tibial plateaus, and the patella were processed for histology and Safranin O – Fast Green staining. Slides representing normal, aged, and osteoarthritis (OA) tissue were scanned and electronic images were scored online by five observers. Statistical analysis was performed for inter- and intra-observer variability, reproducibility and reliability.
Results: The inter-observer variability among five observers for the MANKIN system showed a similar good Intra-class correlation coefficient (ICC > 0.81) as for the OARSI system (ICC > 0.78). Repeat scoring by three of the five readers showed very good agreement (ICC > 0.94). Both systems showed a high reproducibility among four of the five readers as indicated by the Spearman’s rho value. For the MANKIN system, the surface represented by lesion depth was the parameter where all readers showed an excellent agreement. Other parameters such as cellularity, Safranin O staining intensity and tidemark had greater inter-reader disagreement.
Conclusion: Both scoring systems were reliable but appeared too complex and time consuming for assessment of lesion severity, the major parameter determined in standardized scoring systems. To rapidly and reproducibly assess severity of cartilage degradation, we propose to develop a simplified system for lesion volume.
© 2012 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.

* Address correspondence and reprint requests to: M.K. Lotz, Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA. Tel: 1-858-784-8960; Fax: 1-858-784-2744.
E-mail address: mlotz@scripps.edu (M.K. Lotz).

1063-4584/$ – see front matter © 2012 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.joca.2011.12.018

Introduction

The histologic/histochemical grading system (MANKIN system) proposed by Mankin et al. in 1971 has been widely used for the evaluation of osteoarthritic (OA) cartilage [1,2]. This system was developed originally for the assessment of human hip OA cartilage and subsequently it has also been used to evaluate cartilage degradation, repair and regeneration in various animal models of OA. The MANKIN system assesses four parameters, cartilage structure, cellularity, Safranin O staining, and tidemark integrity. Each parameter has subcategories and the scores are summed to provide a total score ranging from 0 (normal) to 14 (most severe OA).

Over the last four decades, several “modified Mankin scores” have been developed. These systems assess similar parameters as the original MANKIN system, but parameters such as Safranin O staining intensity or cellularity for example are either scored in a different fashion, or an overall score is normally applied instead of separate subscores [3–8]. Since the MANKIN system was based on
specimens with advanced OA, it may have limitations for mild and moderate OA [9]. Additionally, the horizontal extent to which the cartilage surface is affected by the disease process is not assessed with this system. Scoring features such as ‘pannus’ and ‘surface irregularities’ worsen the score, but these features may also be found in certain areas of healthy or regenerative tissue. In the past, different authors validated [10] but also questioned the reproducibility and the validity of the MANKIN system [9,11]. Also, there are conflicting reports with respect to intra- and inter-observer variability [9,10,12].

To address limitations of the MANKIN system and obtain a useful method for applications in clinical as well as experimental OA assessment, the OARSI System Working Group developed the Osteoarthritis Cartilage Histopathology Assessment System (in this manuscript referred to as ‘OARSI system’) [13]. With this system, the “stage” of OA is based on the extent of the joint cartilage surface, area or volume involved in the local OA process and points are assigned ranging from 0 [normal] to 4 [>50%]. The “grade” of OA is based on the extent of pathology into the depth of the cartilage and points are assigned ranging from 0 [surface intact] to 6 [full-thickness loss of cartilage and bone deformation]. Optional “subgrades” were also proposed, ranging from 1.0 (cells intact) to 6.5 (joint margin and central osteophytes). For the “stage” of OA, a score of 0–4 is assigned to indicate the extent to which the surface, area or volume is affected. The values of “stage” and “grade” are multiplied to yield an overall joint “score”. The OARSI system was intended to be more sensitive to different grades of mild OA and to be applied more consistently by less experienced observers than the MANKIN system. The OARSI system was published as a model to be validated in other studies.

Thus far, three comparative studies of the MANKIN and OARSI systems have been performed, using goat [12] and human tissue from patients that underwent unilateral knee arthroplasty [14,15]. The studies on human tissues used OA knees with very advanced disease and did not include knees with early OA changes.

The objective of the present study was to use a large collection of human knee joints from donors across the entire adult age spectrum and covering the complete range of cartilage pathology to compare the two systems. A detailed data analysis reveals potential limitations of each system and suggests that they should be further simplified for use as a standardized, easy to use lesion severity assessment tool, or modified to address specific questions on disease mechanisms.

Materials and methods

Human cartilage procurement

Human knees (n = 125) from 65 donors (29 males, 36 females; age range = 23–92) were obtained from tissue banks (approved by Scripps Institutional Review Board) and processed within 24–72 h post mortem.

Tissue harvesting and processing

Sagittal osteochondral slabs were harvested from both femoral condyles. A coronal osteochondral slab through the central part of the tibial plateau was harvested. The location of the slabs was selected to represent the central region in each compartment that is most exposed to mechanical loading. A transverse osteochondral slab was harvested from the patella (Supplementary Fig. S1). The samples were fixed in Z-Fix (Anatech, Battle Creek, MI) immediately after harvesting and subsequently decalcified with TBD-2 (Thermo Fisher). Decalcified specimens were cut to smaller tissue blocks at defined locations. Each femoral condyle was divided into 5–7 tissue blocks, the patella into three and the entire tibia into four tissue blocks. After dehydration in an alcohol series and clearing in Pro-Par (Anatech), the tissue blocks were infiltrated and embedded in paraffin. From this collection of 125 knees, we prepared approximately 1600 osteochondral tissue blocks. Five-micron-thick sections were cut from each block and stained with Safranin O – Fast Green. These sections were scored by an experienced observer using the MANKIN and OARSI systems.

Histological scoring

From the collection of approximately 1600 scored slides, a set of 300 slides was selected. These slides represented all locations (femoral condyles, tibia, patella) and all grades and subgrades. All sections were scanned with a digital slide scanner (Aperio ScanScope System, Aperio Technologies, Vista, CA) at a magnification of 40× (pixel size = 0.25 µm²) and the scans were evaluated online with WebScope (Spectrum Digital Information Management System, Aperio Technologies). We recruited five observers who were familiar with cartilage histopathology and had different levels of experience with both grading systems. The observers were blinded regarding donor age, gender and disease state for all specimens as well as to the grades of the other observers. Three observers scored the 300 slides twice, at least 3 weeks apart with both grading systems (Supplementary Tables SI–SIII).

The manuscript of Pritzker et al. was used as the OARSI system grading template [13]. The staging parameter of the OARSI system was applied to the entire tissue section on each slide. For the MANKIN system, we prepared a template with representative images (Figs. 1–4).

Statistical analyses

Reliability and reproducibility were assessed by comparing scores from all observers for all histological specimens and for both scoring systems. Two methods were used to quantify and summarize intra- and inter-observer agreement. Intra-class correlation coefficients (ICCs) [16,18] were determined for all pairwise comparisons among and within observers. These were calculated from two-way random effects analyses of variance, with absolute agreement as the objective [17]. In this regard, intra-observer ICCs were calculated with the initial and repeat scores from three observers, and inter-observer ICCs were calculated with the initial scores from five observers. Bootstrap resampling with 1000 samples was used to construct 95% confidence intervals for the ICCs, via the percentile method. We also used the Bland-Altman limits of agreement (LOA) method [18,19] to assess intra- and inter-observer agreement. We report 95% LOA for these pairwise comparisons. Such comparisons provide intervals within which 95% of differences between the two measurements are expected to lie.

Correlations between the two scoring systems

We used Spearman’s nonparametric correlation coefficient rho to compare the scores of the MANKIN and OARSI scoring systems. Spearman’s rho is preferable to Pearson’s (parametric) correlation coefficient in this setting since both scoring systems represent ordinal rather than continuous scales. Bootstrap resampling with 1000 samples was used to construct 95% confidence intervals for rho via the percentile method. Calculations were performed in Stata 9.2 (StataCorp, College Station, TX) and SPSS 16.0 (SPSS Inc., Chicago, IL).

Results

MANKIN system reliability

The inter-observer variability between five observers for the MANKIN system showed a good ICC range of 0.811–0.961. The ICC

Fig. 1. Histological assessment of the surface structure parameter according to MANKIN on sections from femoral condyles: (A) Normal (intact smooth surface), score 0. (B) Surface irregularities (undulations), score 1. (C) Pannus and surface irregularities (fibrillation), score 2. (D) Clefts to transitional zone, score 3. (E) Clefts to radial zone, score 4. (F) Clefts to calcified zone, score 5. (G) Complete disorganization, score 6. Safranin O – Fast Green, pictures taken with 4× and 40× objectives.

Fig. 2. Histological assessment of cellularity according to MANKIN on sections from femoral condyles: (A) Normal (1–2 cells/chondron), score 0. (B) Diffuse hypercellularity, score 1. (C) Chondrocyte cloning (clusters), score 2. (D) Hypocellularity, score 3. Safranin O – Fast Green, pictures taken with 4× and 40× objectives.

Fig. 3. Histological assessment of the Safranin O staining intensity parameter according to MANKIN on sections from femoral condyles: (A) Normal (staining except for surface zone), score 0. (B) Slight reduction (particularly superficial zone), score 1. (C) Moderate reduction (extending down to mid zone), score 2. (D) Severe reduction (entire cartilage thickness), score 3. (E) No dye noted, score 4. Safranin O – Fast Green stain, pictures taken with 4× and 40× objectives.

Fig. 4. Histological tidemark assessment according to MANKIN on sections from femoral condyles: (A) Tidemark intact, score 0. H&E stain, pictures taken with 4× and 40× objectives. (B) Tidemark crossed by blood vessels (tidemark duplication), score 1. Safranin O – Fast Green stain, pictures taken with 4× and 40× objectives.
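
The scoring arithmetic of the two systems can be sketched in a few lines of Python. The MANKIN subscore ranges are those shown in the templates of Figs. 1–4 (structure 0–6, cellularity 0–3, Safranin O staining 0–4, tidemark 0–1, total 0–14), and the OARSI score is the product of grade (0–6) and stage (0–4) as described in the Introduction. All function, constant and key names are hypothetical and chosen only for illustration, optional OARSI subgrades are not handled, and this is not code used in the study.

```python
# Illustrative scoring arithmetic; all names here are hypothetical.

# Maximum subscore per MANKIN parameter, as shown in Figs. 1-4.
MANKIN_MAX = {"structure": 6, "cellularity": 3, "safranin_o": 4, "tidemark": 1}


def mankin_total(subscores):
    """Sum the four MANKIN subscores into a total between 0 and 14."""
    if set(subscores) != set(MANKIN_MAX):
        raise ValueError("expected exactly the four MANKIN parameters")
    for name, value in subscores.items():
        if not 0 <= value <= MANKIN_MAX[name]:
            raise ValueError(f"{name} score {value} is out of range")
    return sum(subscores.values())


def oarsi_score(grade, stage):
    """Multiply OARSI grade (0-6) by stage (0-4); the result lies in 0-24."""
    if not 0 <= grade <= 6:
        raise ValueError("grade must be between 0 and 6")
    if not 0 <= stage <= 4:
        raise ValueError("stage must be between 0 and 4")
    return grade * stage
```

Because the OARSI score is a product, a one-step disagreement in stage is scaled by the grade, which may be one reason the total score is more sensitive to the staging component than to the grade.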

between the readers for the surface parameter ranged from 0.832 to 0.945, while the other parameters such as cellularity, Safranin O staining intensity, and tidemark showed a lower range of the ICC. The ICC for the intra-observer variability between the readers was higher for all parameters and the overall score (Table I).

MANKIN system reproducibility

Average differences and 95% LOA for intra- and inter-observer differences are given in Table II. The 95% LOA for intra-observer [test-retest] differences were typically within two points for total scores, and one point for surface, cellularity, Safranin O staining intensity and tidemark scores. The 95% LOA were somewhat wider for inter-observer differences: with the exception of observer E, inter-observer differences were typically within three points for total scores, within two points for surface and cellularity scores, and one point for Safranin O staining intensity and tidemark scores. The LOAs indicate that scores from observer E were lower, and had much higher variability than scores from the other observers: one cannot rule out a five point discrepancy in MANKIN total scores between grader E and each of the other observers, on a 14 point scale.

OARSI system reliability

For the OARSI system, the ICC for the grades between the five observers ranged from 0.781 to 0.965. For the staging component, the ICC ranged from 0.365 to 0.902 for inter-observer variability. This high variability was mainly due to the divergent scores of one observer. The OARSI system score showed an ICC range of 0.790–0.974. The ICC ranged from 0.698 to 0.895 for the intra-observer variability (Table III).

OARSI system reproducibility

For total scores, 95% LOA for intra-observer [test-retest] differences were typically within three points with observer B, four points with observer A, and five points with observer C (Table IV). Intra-observer 95% LOAs were somewhat tighter for grade and stage: differences were typically within two points for stage, and one point for grade. Inter-observer differences were somewhat greater: with the exception of observer E, 95% LOAs are typically within two (in absolute value) for grade and stage, and within five for total score.

Correlations between MANKIN and OARSI systems

As expected, there was a strong positive correlation between the two scoring systems. Individually, Spearman’s rho values (95% CIs) from the two scoring systems were: observer A, 0.921 (95% CI 0.898–0.937); observer B, 0.945 (95% CI 0.928–0.956); observer C, 0.939 (95% CI 0.917–0.955); observer D, 0.915 (95% CI 0.888–0.935); observer E, 0.886 (95% CI 0.835–0.926). We also averaged the scores from observers A, B, C, and D, and found that rho improved to 0.960 (95% CI 0.949–0.970). Averaging is a proven technique for smoothing minor fluctuations, and can result in more stable scores. We considered observer E’s scores to evince more than minor fluctuations relative to the others, hence excluded this individual’s scores from the summary calculation.

A plot of average MANKIN vs average OARSI system among four graders showed a strong monotonic relationship between the two scoring systems [reflective of the Spearman rho value near one], which is roughly linear (Fig. 5).

Table I
Intra-class correlation coefficients for MANKIN total scores and each parameter: MANKIN total scores and each parameter on 300 specimens were assessed by each of five observers. ICC and associated 95% confidence intervals were calculated from the observers’ scores. The three entries per cell are: lower 95% confidence limit, observed ICC (middle value), and upper 95% confidence limit. The diagonal cell entries, that is, the (A, A), (B, B), and (C, C) cells, compare the replicate scores of observer A, B, and C respectively. Graders D and E did not perform second scoring. The off-diagonal entries correspond to inter-observer comparisons.

Total scores
Grader  A                 B                 C                 D                 E
A       .957, .966, .973  .935, .950, .961  .933, .946, .957  .903, .933, .952  .694, .847, .911
B                         .979, .983, .986  .941, .959, .970  .951, .961, .969  .580, .839, .920
C                                           .954, .963, .971  .895, .932, .954  .714, .844, .905
D                                                                               .484, .811, .909

Surface
Grader  A                 B                 C                 D                 E
A       .935, .948, .958  .900, .920, .936  .891, .912, .929  .884, .907, .926  .571, .840, .921
B                         .969, .975, .980  .931, .945, .956  .889, .914, .933  .675, .879, .940
C                                           .955, .964, .971  .894, .915, .932  .527, .867, .942
D                                                                               .417, .832, .928

Cells
Grader  A                 B                 C                 D                 E
A       .784, .831, .867  .562, .720, .811  .600, .746, .830  .421, .673, .800  .639, .714, .773
B                         .891, .912, .930  .784, .824, .858  .801, .839, .870  .671, .731, .781
C                                           .843, .874, .899  .748, .799, .840  .679, .736, .784
D                                                                               .638, .720, .782

Safranin O
Grader  A                 B                 C                 D                 E
A       .902, .921, .937  .827, .860, .887  .777, .844, .887  .793, .832, .864  .472, .780, .887
B                         .901, .920, .936  .733, .842, .899  .899, .919, .935  .370, .772, .892
C                                           .879, .902, .921  .715, .826, .886  .704, .799, .859
D                                                                               .377, .757, .881

Tidemark
Grader  A                 B                 C                 D                 E
A       .661, .720, .770  .589, .658, .718  .381, .534, .646  .451, .568, .660  .436, .523, .601
B                         .882, .905, .923  .516, .635, .723  .617, .701, .766  .609, .675, .733
C                                           .650, .711, .763  .563, .636, .699  .418, .541, .639
D                                                                               .510, .607, .685
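
The agreement statistics described under Statistical analyses (ICC from a two-way random effects ANOVA with absolute agreement, Bland-Altman 95% LOA, and percentile bootstrap intervals) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the Stata/SPSS procedure used in the study: it assumes the single-measure form of the ICC (the text does not say whether single or average measures were used), it assumes the LOA are the mean difference plus or minus 1.96 sample standard deviations, and all function names are hypothetical.

```python
import numpy as np


def icc_absolute_agreement(scores):
    """Single-measure ICC, two-way random effects, absolute agreement.

    `scores` has one row per specimen and one column per observer
    (or per scoring session for an intra-observer comparison).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for specimens (rows), observers (columns), residual.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def limits_of_agreement(first, second):
    """Return (lower 95% LOA, mean difference, upper 95% LOA)."""
    d = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    mean, sd = d.mean(), d.std(ddof=1)
    return mean - 1.96 * sd, mean, mean + 1.96 * sd


def bootstrap_ci(stat, data, n_boot=1000, seed=0):
    """Percentile 95% interval for `stat`, resampling specimens (rows)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    values = [stat(x[i]) for i in idx]
    return np.percentile(values, [2.5, 97.5])
```

For a pairwise comparison, `scores` would be a 300 × 2 array holding the two observers' (or the two sessions') scores, and `bootstrap_ci(icc_absolute_agreement, scores)` would give an interval of the kind reported alongside each ICC.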

Table II
LOA for MANKIN total scores and each parameter: Total scores and each parameter on 300 specimens were assessed by each of five observers. Differences in the total scores and the parameters were calculated as (row designated observer score) minus (column designated observer score). The three entries per cell are: lower 95% limit of agreement, average difference (middle value), and upper 95% limit of agreement. The diagonal cell entries represent intra-observer differences: (A, A), (B, B), and (C, C) compare the replicate scores, namely first score minus second score, of graders A, B, and C respectively. Graders D and E did not undertake replicate scoring. The off-diagonal entries correspond to inter-observer differences.

Total scores
Grader  A                      B                      C                      D                      E
A       -1.963, -0.144, 1.676  -2.596, -0.288, 2.020  -2.313, 0.060, 2.433   -2.949, -0.441, 2.066  -2.747, 1.140, 5.028
B                              -1.468, -0.054, 1.361  -1.695, 0.348, 2.391   -2.225, -0.154, 1.918  -2.444, 1.428, 5.300
C                                                     -2.012, -0.120, 1.772  -2.986, -0.502, 1.983  -2.911, 1.080, 5.072
D                                                                                                   -2.421, 1.582, 5.584

Surface
Grader  A                      B                      C                      D                      E
A       -0.927, 0.067, 1.061   -1.307, 0.064, 1.434   -1.424, -0.030, 1.364  -1.486, -0.114, 1.259  -1.107, 0.611, 2.238
B                              -0.860, -0.037, 0.787  -1.281, -0.094, 1.094  -1.594, -0.177, 1.239  -0.983, 0.550, 2.084
C                                                     -0.984, -0.064, 0.857  -1.478, -0.084, 1.311  -0.827, 0.641, 2.109
D                                                                                                   -0.865, 0.725, 2.315

Cells
Grader  A                      B                      C                      D                      E
A       -1.253, -0.140, 0.973  -1.681, -0.334, 1.012  -1.496, -0.294, 0.908  -1.714, -0.408, 0.898  -1.693, -0.207, 1.278
B                              -0.787, 0.000, 0.787   -0.972, -0.972, 1.053  -1.026, -0.074, 0.879  -1.260, 0.127, 1.514
C                                                     -0.815, -0.064, 0.688  -1.078, -0.114, 0.851  -1.206, 0.087, 1.380
D                                                                                                   -1.084, 0.201, 1.485

Safranin O
Grader  A                      B                      C                      D                      E
A       -0.850, -0.043, 0.763  -1.166, -0.060, 1.046  -0.851, 0.194, 1.239   -1.309, -0.074, 1.162  -0.769, 0.455, 1.678
B                              -0.881, -0.043, 0.794  -0.778, 0.254, 1.287   -0.892, -0.013, 0.866  -0.701, 0.515, 1.731
C                                                     -0.796, 0.007, 0.809   -1.383, -0.268, 0.848  -0.991, 0.261, 1.513
D                                                                                                   -0.769, 0.528, 1.825

Tidemark
Grader  A                      B                      C                      D                      E
A       -0.761, -0.027, 0.707  -0.763, 0.043, 0.850   -0.675, 0.191, 1.056   -0.703, 0.154, 1.010   -0.926, 0.030, 0.987
B                              -0.395, 0.027, 0.448   -0.619, 0.147, 0.913   -0.602, 0.110, 0.823   -0.801, -0.013, 0.774
C                                                     -0.681, 0.000, 0.681   -0.812, -0.037, 0.738  -1.029, -0.161, 0.707
D                                                                                                   -0.947, -0.124, 0.699

Table III
Intra-class correlation coefficients for OARSI grade, stage and total scores: OARSI grade, stage and total scores on 300 specimens were assessed by each of five observers. Intra-class correlation coefficients and associated 95% confidence intervals were calculated from the observers’ scores. The three entries per cell are: lower 95% confidence limit, observed ICC (middle value), and upper 95% confidence limit. The diagonal cell entries, that is, the (A, A), (B, B), and (C, C) cells, compare the replicate scores of graders A, B, and C respectively. Observers D and E did not perform second scoring. The off-diagonal entries correspond to inter-observer comparisons.

Grade
Grader  A                 B                 C                 D                 E
A       .960, .968, .975  .924, .940, .953  .901, .920, .936  .881, .934, .960  .675, .829, .898
B                         .973, .979, .983  .937, .954, .965  .948, .965, .975  .594, .810, .895
C                                           .928, .942, .953  .861, .926, .955  .715, .827, .888
D                                                                               .416, .781, .894

Stage
Grader  A                 B                 C                 D                 E
A       .634, .698, .752  .856, .889, .913  .838, .870, .895  .777, .840, .882  .161, .365, .521
B                         .870, .895, .916  .876, .900, .920  .878, .902, .922  .273, .464, .602
C                                           .697, .752, .799  .794, .834, .866  .215, .394, .533
D                                                                               .280, .440, .566

Score
Grader  A                 B                 C                 D                 E
A       .955, .964, .972  .943, .956, .966  .918, .934, .947  .933, .951, .964  .644, .815, .891
B                         .975, .980, .984  .936, .951, .962  .968, .974, .980  .545, .798, .892
C                                           .927, .941, .953  .922, .942, .957  .663, .820, .892
D                                                                               .514, .790, .890

Level of experience

We hypothesized that the level of experience could be an important factor in inter- and intra-observer variability. In our study, all graders had been familiar with the analysis of articular cartilage for more than 5 years, yet had different levels of experience with both systems as well as with different species.

From examination of the LOA tables (Tables II, IV), observers A and C, on average, scored slightly less OA severity than observers B and D on both scales, with comparable levels of variability. Observer E scored slides at significantly lower severity than the other raters: on average, discrepancies were 1–1.5 for the MANKIN system total scores, and 2–3 with OARSI system total scores; and LOA were typically twice the widths of all other pairwise comparisons. Observer E’s high level of variability was also reflected in reduced ICC values, and increased widths of LOA intervals, relative to the other observers.

Discussion

Standardized histological scoring systems for cartilage are needed to assess the severity of degradation in human tissues and experimental models. The MANKIN system proposed by Mankin in 1971 is the most widely used system yet has several limitations [9,10,12]. To overcome these limitations, the OARSI system Working Group postulated five principles for an ideal cartilage histopathology system: simplicity, utility, scalability, extensibility

and comparability. In 2006 the Osteoarthritis Cartilage Histopathology Assessment System (OARSI system) was published [13]. Thus far, the MANKIN system continues to be the most widely used system, with modifications across different studies [2–4]. As a consequence, studies with animal models are difficult to compare because of the varied assessment systems employed [20]. On the other hand, the OARSI system has not yet been widely implemented, in part because of the historical bias towards the MANKIN system, and in part because it has not yet been sufficiently validated.

Three studies compared the MANKIN and OARSI systems. The observers were experienced with the MANKIN, but were new to the OARSI system [12,14,15]. In the study using goats, cartilage sections were collected from four animals that developed cartilage damage on the femoral condyle due to articulation with a chromium-cobalt implant on the tibial plateau. While both MANKIN and OARSI systems were equally reproducible, the OARSI system was more reliable. Observer experience appeared less important when using the OARSI system but the value of its staging component was difficult to determine [12]. In the study by Pearson et al., on ten human knee OA specimens, both systems proved reliable, reproducible, and exhibited similar variability [14]. Rout et al. performed a study on sixteen cases undergoing unicompartmental knee arthroplasty and concluded that both the modified MANKIN and OARSI system are useful for histological grading, although the OARSI system was easier and quicker to use [15]. While these studies provide helpful information on the relative utility of the two systems in severe or end-stage cartilage degradation, validation on a broad range of severities remained to be performed. To address this, the present study used an extensive collection of human knee joints across the entire adult age spectrum at all stages of OA severity.

This study is the first comparison of the OARSI and MANKIN systems using a large number of human knee joints including a selection of donors representing all stages of OA severity. For each knee, a standard topographic sampling for each joint compartment was used. Standardized histology preparation and staining of the sections was used to minimize technical variability.

Table IV
LOA for OARSI grade, stage and total scores: OARSI grade, stage and total scores on 300 specimens were assessed by each of five observers. Differences in the grades were calculated as (row designated grader’s grade) minus (column designated grader’s grade). The three entries per cell are: lower 95% limit of agreement, average difference (middle value), and upper 95% limit of agreement. The diagonal cell entries represent intra-observer differences: (A, A), (B, B), and (C, C) compare the replicate grade, namely first grade minus second grade, of observers A, B, and C respectively. Observers D and E did not undertake replicate scoring. The off-diagonal entries correspond to inter-observer differences.

Grade
Grader  A                      B                      C                      D                      E
A       -0.696, 0.072, 0.839   -1.219, -0.109, 1.002  -1.277, 0.040, 1.357   -1.307, -0.259, 0.789  -1.292, 0.513, 2.319
B                              -0.695, -0.012, 0.671  -0.858, 0.149, 1.156   -0.983, -0.151, 0.682  -1.292, 0.622, 2.536
C                                                     -1.085, 0.047, 1.178   -1.465, -0.299, 0.867  -1.478, 0.473, 2.424
D                                                                                                   -1.129, 0.773, 2.674

Stage
Grader  A                      B                      C                      D                      E
A       -1.463, -0.094, 1.275  -0.827, 0.120, 1.068   -0.971, 0.077, 1.125   -0.887, 0.191, 1.269   -2.003, 0.833, 3.668
B                              -1.016, 0.000, 1.016   -1.036, -0.043, 0.949  -0.876, 0.070, 1.017   -2.016, 0.712, 3.440
C                                                     -1.604, -0.127, 1.350  -1.120, 0.114, 1.348   -2.177, 0.756, 3.689
D                                                                                                   -2.136, 0.642, 3.420

Total score
Grader  A                      B                      C                      D                      E
A       -3.340, 0.154, 3.648   -4.431, -0.485, 3.461  -4.877, 0.057, 4.991   -4.770, -0.627, 3.515  -5.450, 2.251, 9.952
B                              -2.688, 0.087, 2.862   -3.692, 0.542, 4.775   -3.340, -0.142, 3.056  -5.205, 2.736, 10.677
C                                                     -4.460, 0.100, 4.660   -5.277, -0.684, 3.909  -5.575, 2.194, 9.963
D                                                                                                   -5.205, 2.878, 10.961

Fig. 5. Summary plot of average MANKIN vs average OARSI system: Summary plot comprising 95% confidence ellipse, and marginal histograms of the frequency distribution. The graph shows a strong monotonic relationship between the two scoring systems, reflective of the Spearman rho value of 0.96. The relationship is roughly, but not perfectly, linear: Given a particular MANKIN [or OARSI] score, there is a fair amount of scatter around the corresponding OARSI [or MANKIN] scores. The marginal MANKIN and OARSI system distributions are rather flat.

Intra- and inter-observer variability, reproducibility and reliability

In our study, the inter-reader variability was good for both systems, with the ICC range for the total score of the MANKIN system slightly higher compared to the OARSI system. Among five readers both scoring systems appeared to be reliable and reproducible, especially among four readers, for all stages of OA and not only for normal and end-stage OA tissue as previously validated.

There is no gold standard for either MANKIN or OARSI scoring. Nevertheless, our study does provide some guidelines for rater performance, relative to individuals undertaking MANKIN or OARSI scoring. We suggest that intra-rater intra-class correlations on MANKIN or OARSI scores should exceed 0.95, and inter-rater intra-class correlations should exceed 0.90, in representative samples. Or, in testing scenarios as undertaken here, about 95% of intra-rater

differences on MANKIN scores should be within two units of each other, and 95% of inter-rater differences should be within three units of each other. With OARSI scores, about 95% of intra-rater differences should be within one unit of each other, and 95% of inter-rater differences should be within two units of each other.

It was reported that the OARSI system is easier and quicker to use, presumably because it requires assessment of fewer parameters as compared to the MANKIN system. However, the OARSI system appears more difficult to apply by inexperienced readers. Even experienced readers in this study did find the OARSI system more complex to use and there was less agreement especially in the staging parameter.

Detecting early histological changes

The OARSI system was designed to detect histological features prior to the recognition of overt clinical OA. None of the published validation approaches included samples with early OA changes [12,14,15]. Our data showed a high agreement level between the two systems in the overall scores and compartment scores. According to the OARSI system, cartilage matrix swelling is the earliest histologically detectable change, which in an extreme form would lead to cartilage hypertrophy. Edema in cartilage may reflect condensation of the collagen fibers in the superficial and/or upper mid zones or variation in matrix cationic staining [13]. We question how accurately this parameter can be recognized and whether it can be differentiated from artifacts due to processing. The approach towards detecting early changes depends on the type of study and question being addressed. Changes observed on Toluidine Blue or Safranin O stained sections are not quantitative indicators of proteoglycan depletion and may be at least to some extent reversible. On the other hand, macroscopic structural defects such as fibrillations and partial erosions are relatively late features that are preceded by molecular changes. In this regard, detection of changes in gene expression by in situ hybridization or in protein expression and degradation of matrix components by immunohistochemistry would be more sensitive and accurate. For example, immunohistochemistry for collagen type II and aggrecan helped to

degeneration across the tissue and is therefore mainly useful for localized lesions. Finally, detection of changes is confined mainly to cartilage while bone changes are not considered.

Potential limitations of the OARSI system

The OARSI system describes the grade as an index of the severity of the OA process and can serve as a good indicator of disease progression (Supplementary Tables SII, SIIIA and SIIIB). Grade 1 in the OARSI system is considered the threshold for OA. The primary criterion for Grade 1 is an intact surface with other features of OA present, such as uneven surface or fibrillation within the superficial zone. The challenge is to reproducibly score slight to mild edema, uneven surface or slight fibrillation and to distinguish this from surface artifacts during tissue processing. Staging is not a representative measurement when only specific regions are analyzed as with a histology section from a larger animal or human joint. In our study, we observed smaller ICCs for the staging component for the inter-reader agreement compared to the grade and the total score. This was caused mainly by a disagreement between grade 0 (normal cartilage, no OA), which requires a stage of 0 (no OA involvement), and grade 1 (threshold for OA), which in most cases was staged with a four (more than 50% involvement). The ICC is substantially lower as we can observe 0 (for a grade of 0) and four (for a grade of 1) in replicates, since 0 and four are not considered "close" in a range of 0–4. The individual components, grade and stage, can be misleading in isolation but may prove more insightful when used to calculate an overall score (with a range from 0 to 24), yet the staging appears definitely more critical for agreement. Bone changes are not examined in the MANKIN system. In the OARSI system, subchondral bone lesions are not included in the detection of early changes as OARSI grades 1–4 address only cartilage changes. OARSI grades 5 and 6 integrate bone changes. This arrangement implies a progression from initial cartilage damage to subsequent bone involvement, which may not apply to all cases of human OA. Bone changes at later OA stages include subchondral bone sclerosis, microfracture with reparative tissue, subchondral cysts, osseous repair and osteophytosis [24,25]. The OARSI system parameter for bone
identify differences within the lowest histological grades of artic- deformation depends on the location within the joint that is repre-
ular cartilage21. Although such markers are useful to detect early sented by the section and is thus not useful as a routine tool.
changes that precede manifestations on standard histology, such While the OARSI system has not yet been validated through
changes may also be reversible and not necessarily reflect OA correlations with macroscopic or biochemical parameters2, the
initiation. However, this approach is more appropriate for a specific MANKIN system was correlated to a macroscopic score9 and to
mechanistic study rather than as a routine assessment tool. biochemical parameters1.

Potential limitations of the MANKIN system Conclusions

The MANKIN system describes features such as surface irregu- The most common and important question being addressed
larities with pannus and complete disorganization (Supplementary with cartilage scoring systems is lesion severity. To obtain this
Table SI). We consider these two parameters as critical for the information both systems appear complex, time-consuming and
assessment of cartilage degradation. Characteristics such as pannus generate variability. In fact, most publications that score the large
and surface irregularities without proper classification may also be number of parameters in the two systems do not discuss them in
found in healthy cartilage and lead to a lower score2. detail but only interpret overall lesion severity.
Safranin O or Toluidine Blue staining intensity as a parameter in Semi-quantitative histological scoring systems such as the
a grading system may lead to false results. In cartilage where Safranin MANKIN and OARSI system are observer-dependent and thus
O staining was not detectable, monoclonal antibodies revealed the subjective. Automated computerized histomorphometry might
presence of both keratan sulphate and chondroitin sulphate22. In enable more objective, accurate and reproducible analysis of
addition, fixation, decalcification and protocol variability can affect cartilage26,27. An automated image analysis program based on the
Safranin O staining intensity23 and therefore it has to be questioned MANKIN system has been developed28. Among the four subcom-
how sensitive it would be for detection of early changes. It has been ponents of the MANKIN scale, the computer program correlations
suggested that assessing certain parameters such as cellularity, cell with observer scores were best for surface defect and proteoglycan
morphology and tidemark need more reader consensus or better depletion, but less favorable for cellularity and tidemark invasion.
illustrations. In our study, we had the highest reader variability for the These results are similar to our present observations, underscoring
assessment of tidemark and the cellularity. Moreover, the MANKIN advantages of a system based on fewer and the most reliable
system does not include a staging component for the extent of parameters.
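The staging disagreement described for the OARSI system can be made concrete with a small worked example. The Python sketch below is illustrative only and is not the statistical analysis used in this study. It assumes that the overall OARSI score is the product of grade (0–6) and stage (0–4), following the definition in Pritzker et al. [13], which gives the 0–24 range quoted in the text. The two readings and the helper names `oarsi_score` and `relative_gap` are hypothetical.

```python
# Illustrative sketch only; not the statistical analysis used in the study.

GRADE_MAX = 6  # OARSI grade scale, 0-6
STAGE_MAX = 4  # OARSI stage scale, 0-4
SCORE_MAX = GRADE_MAX * STAGE_MAX  # overall score scale, 0-24


def oarsi_score(grade: int, stage: int) -> int:
    """Overall score, assumed here to be grade x stage."""
    if not 0 <= grade <= GRADE_MAX:
        raise ValueError("grade must be between 0 and 6")
    if not 0 <= stage <= STAGE_MAX:
        raise ValueError("stage must be between 0 and 4")
    if grade == 0 and stage != 0:
        raise ValueError("grade 0 requires stage 0")
    return grade * stage


def relative_gap(a: int, b: int, scale_max: int) -> float:
    """Absolute difference between two readings as a fraction of the scale."""
    return abs(a - b) / scale_max


# Hypothetical readings of the same section by two readers: (grade, stage).
reader_a = (0, 0)  # normal cartilage, no OA involvement
reader_b = (1, 4)  # threshold for OA, more than 50% involvement

gaps = {
    "grade": relative_gap(reader_a[0], reader_b[0], GRADE_MAX),
    "stage": relative_gap(reader_a[1], reader_b[1], STAGE_MAX),
    "score": relative_gap(oarsi_score(*reader_a), oarsi_score(*reader_b), SCORE_MAX),
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} of scale")
```

Under these assumptions, the two readers differ by about one sixth of the grade scale and of the overall score scale, but by the full width of the stage scale. This is consistent with the lower ICCs observed for the staging component.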

In conclusion, for the purpose of rapidly assessing severity of cartilage degradation, we propose to develop a simplified system for scoring lesion volume as measured by lesion depth and width. A similar system regarding lesion depth was proposed by Glasson for experimental OA in mice [29] and may serve as a model for a generally applicable system. Furthermore, a library of images and illustrations of stained tissue sections, similar to that used for grading radiographs [30], would be a valuable tool for training of observers and facilitate reproducible and consistent scoring within and between studies. This library could also be used to develop online training programs.

Author contributions

Study conception and design: Chantal Pauli, Darryl D’Lima, Martin Lotz.
Acquisition of data: Chantal Pauli, Robert Whiteside, Dobrila Nesic, Facundo Las Heras, John Matyas.
Statistical analysis: Jim Koziol.
Drafting of article or revising it critically for important intellectual content: Chantal Pauli, Jim Koziol, Darryl D’Lima, Shawn Grogan, Ken Pritzker, Robert Whiteside, Dobrila Nesic, Facundo Las Heras, John Matyas and Martin Lotz.

Conflict of interest
No author has any conflict of interest related to this work.

Acknowledgments

We are grateful to Lilo Creighton, Margaret Chadwell and Anita San Soucie for histologic processing of the specimens, and to Thomas Kryton for digitizing slides. This study was supported by the National Institutes of Health (AG007996), by Donald and Darlene Shiley, the Arthritis Foundation and the Sam and Rose Stein Endowment Fund, and the Canadian Arthritis Network, Network Centres of Excellence.

Supplementary material

Supplementary data related to this article can be found online at doi:10.1016/j.joca.2011.12.018.

References

1. Mankin HJ. Biochemical and metabolic aspects of osteoarthritis. Orthop Clin North Am 1971;2:19–31.
2. Rutgers M, van Pelt MJ, Dhert WJ, Creemers LB, Saris DB. Evaluation of histological scoring systems for tissue-engineered, repaired and osteoarthritic cartilage. Osteoarthritis Cartilage 2010;18:12–23.
3. Kuroki H, Nakagawa Y, Mori K, Ohba M, Suzuki T, Mizuno Y, et al. Acoustic stiffness and change in plug cartilage over time after autologous osteochondral grafting: correlation between ultrasound signal intensity and histological score in a rabbit model. Arthritis Res Ther 2004;6:R492–504.
4. Thomas CM, Fuller CJ, Whittles CE, Sharif M. Chondrocyte death by apoptosis is associated with cartilage matrix degradation. Osteoarthritis Cartilage 2007;15:27–34.
5. Piskin A, Gulbahar MY, Tomak Y, Gulman B, Hokelek M, Kerimoglu S, et al. Osteoarthritis models after anterior cruciate ligament resection and medial meniscectomy in rats. A histological and immunohistochemical study. Saudi Med J 2007;28:1796–802.
6. Irlenbusch U, Schaller T. Investigations in generalized osteoarthritis. Part 1: genetic study of Heberden’s nodes. Osteoarthritis Cartilage 2006;14:423–7.
7. Otte P. The nature of coxarthrosis and principles of its management. Dtsch Med J 1969;20:341–6.
8. Saal A, Gaertner J, Kuehling M, Swoboda B, Klug S. Macroscopic and radiological grading of osteoarthritis correlates inadequately with cartilage height and histologically demonstrable damage to cartilage structure. Rheumatol Int 2005;25:161–8.
9. Ostergaard K, Andersen CB, Petersen J, Bendtzen K, Salter DM. Validity of histopathological grading of articular cartilage from osteoarthritic knee joints. Ann Rheum Dis 1999;58:208–13.
10. Van der Sluijs JA, Geesink RG, Van der Linden AJ, Bulstra SK, Kuyer R, Drukker J. The reliability of the Mankin score for osteoarthritis. J Orthop Res 1992;10:58–61.
11. Ostergaard K, Petersen J, Andersen CB, Bendtzen K, Salter DM. Histologic/histochemical grading system for osteoarthritic articular cartilage: reproducibility and validity. Arthritis Rheum 1997;40:1766–71.
12. Custers RJ, Creemers LB, Verbout AJ, van Rijen MH, Dhert WJ, Saris DB. Reliability, reproducibility and variability of the traditional Histologic/Histochemical Grading System vs the new OARSI Osteoarthritis Cartilage Histopathology Assessment System. Osteoarthritis Cartilage 2007;15:1241–8.
13. Pritzker KP, Gay S, Jimenez SA, Ostergaard K, Pelletier JP, Revell PA, et al. Osteoarthritis cartilage histopathology: grading and staging. Osteoarthritis Cartilage 2006;14:13–29.
14. Pearson RG, Kurien T, Shu KS, Scammell BE. Histopathology grading systems for characterisation of human knee osteoarthritis - reproducibility, variability, reliability, correlation, and validity. Osteoarthritis Cartilage 2011;19:324–31.
15. Rout R, McDonnell S, Benson R, Athanasou N, Carr A, Doll H, et al. The histological features of anteromedial gonarthrosis - the comparison of two grading systems in a human phenotype of osteoarthritis. Knee 2011;18:172–6.
16. Schuster C. A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educational and Psychological Measurement 2004;64:243–53.
17. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychological Methods 1996;1:30–46.
18. Bland JM, Altman DG. Agreement between methods of measurement with multiple observations per individual. Journal of Biopharmaceutical Statistics 2007;17:571–82.
19. Bland JM, Altman DG. Measuring agreement in method comparison studies. Statistical Methods in Medical Research 1999;8:135–60.
20. Aigner T, Cook JL, Gerwin N, Glasson SS, Laverty S, Little CB, et al. Histopathology atlas of animal model systems - overview of guiding principles. Osteoarthritis and cartilage/OARS, Osteoarthritis Research Society 2010;18(Suppl 3):S2–6.
21. Barley RD, Bagnall KM, Jomha NM. Histological scoring of articular cartilage alone provides an incomplete picture of osteoarthritic disease progression. Histol Histopathol 2010;25:291–7.
22. Camplejohn KL, Allard SA. Limitations of safranin 'O' staining in proteoglycan-depleted cartilage demonstrated with monoclonal antibodies. Histochemistry 1988;89:185–8.
23. Hyllested JL, Veje K, Ostergaard K. Histochemical studies of the extracellular matrix of human articular cartilage - a review. Osteoarthritis Cartilage 2002;10:333–43.
24. Gelse K, Soder S, Eger W, Diemtar T, Aigner T. Osteophyte development - molecular characterization of differentiation stages. Osteoarthritis Cartilage 2003;11:141–8.
25. Felson DT, Gale DR, Elon Gale M, Niu J, Hunter DJ, Goggins J, et al. Osteophytes and progression of knee osteoarthritis. Rheumatology (Oxford) 2005;44:100–4.
26. O’Driscoll SW, Marx RG, Fitzsimmons JS, Beaton DE. Method for automated cartilage histomorphometry. Tissue Eng 1999;5:13–23.
27. O’Driscoll SW, Marx RG, Beaton DE, Miura Y, Gallay SH, Fitzsimmons JS. Validation of a simple histological-histochemical cartilage scoring system. Tissue Eng 2001;7:313–20.
28. Moussavi-Harami SF, Pedersen DR, Martin JA, Hillis SL, Brown TD. Automated objective scoring of histologically apparent cartilage degeneration using a custom image analysis program. J Orthop Res 2009;27:522–8.
29. Glasson SS, Chambers MG, Van Den Berg WB, Little CB. The OARSI histopathology initiative - recommendations for histological assessments of osteoarthritis in the mouse. Osteoarthritis Cartilage 2010;18(Suppl 3):S17–23.
30. Altman RD, Gold GE. Atlas of individual radiographic features in osteoarthritis, revised. Osteoarthritis Cartilage 2007;15(Suppl A):A1–56.
