
DR. DANA E ORANGE (ORCID iD: 0000-0001-9859-7199)

Article type : Full Length


Accepted Article
Running head: Integrating RA synovial gene expression and histology

Title: Machine learning integration of rheumatoid arthritis synovial histology and RNAseq

data identifies three disease subtypes

Authors: Dana E. Orange, MD, MSc*1,2,3 Phaedra Agius, PhD*3, Edward F. DiCarlo, MD4,

Nicolas Robine, PhD3, Heather Geiger, PhD3, Jackie Szymonifka, PhD1, Michael

McNamara, BS1, Ryan Cummings, AB1, Kathleen M. Andersen, BSc1, Serene Mirza, BS1,

Mark Figgie, MD5, Lionel Ivashkiv, MD, PhD6, Alessandra B. Pernis, PhD6, Caroline Jiang,

PhD8, Mayu Frank, NP2,3, Robert Darnell, MD, PhD2,3, Nithya Lingampali, BS9, William

Robinson, MD, PhD9, Ellen Gravallese, MD10, Vivian P. Bykerk, MD1*, Susan M. Goodman,

MD1* and Laura T. Donlin, PhD*7, The Accelerating Medicines Partnership: RA/SLE Network

1 Rheumatology, Hospital for Special Surgery, New York, NY

2 Laboratory of Molecular Neuro-Oncology & Howard Hughes Medical Institute, The

Rockefeller University, New York, NY

3 New York Genome Center, New York, NY

4 Pathology, Hospital for Special Surgery, New York, NY

5 Orthopedic Surgery, Hospital for Special Surgery, New York, NY

6 David Z. Rosensweig Genomics Research Center, Hospital for Special

Surgery, New York, NY

7 Arthritis and Tissue Degeneration Program and the David Z. Rosensweig Genomics

Research Center, Hospital for Special Surgery, New York, NY

This article has been accepted for publication and undergone full peer review but has not
been through the copyediting, typesetting, pagination and proofreading process, which may
lead to differences between this version and the Version of Record. Please cite this article as
doi: 10.1002/art.40428

This article is protected by copyright. All rights reserved.


8 Department of Biostatistics, The Rockefeller University Hospital, New York, NY

9 Stanford University School of Medicine, Division of Immunology and Rheumatology, Palo


Alto, CA

10 Division of Rheumatology, University of Massachusetts Memorial Medical Center,

Worcester, MA

*Authors contributed equally to the project

Corresponding Authors:

Dana E Orange

The Rockefeller University

1230 York Avenue

New York, NY 10065

Telephone: 917-439-9625

Fax: 212-794-1999

Email: dorange@rockefeller.edu

Laura T Donlin

Hospital for Special Surgery

535 East 70th Street

New York, NY 10021

Telephone: 212-774-2743

Fax: 212-774-2560

Email: DonlinL@HSS.EDU

Funding was provided by the New York Genome Center, Rockefeller University grant # UL1

TR000043 from the National Center for Advancing Translational Sciences (NCATS),

National Institutes of Health (NIH) Clinical and

Translational Science Award (CTSA) program, grant #UL1-TR000457-06, Weill Cornell

Clinical Translational Science Center (CTSC), and The Block Family Foundation. LI was
supported by grant # AR046713.

This work was supported by the Accelerating Medicines Partnership (AMP) in Rheumatoid

Arthritis and Lupus Network. AMP is a public-private partnership (AbbVie Inc., Arthritis

Foundation, Bristol-Myers Squibb Company, Lupus Foundation of America, Lupus

Research Alliance, Merck Sharp & Dohme Corp., National Institutes of Health, Pfizer Inc,

Rheumatology Research Foundation, Sanofi and Takeda Pharmaceuticals International,

Inc.), created to develop new ways of identifying and validating promising biological targets

for diagnostics and drug development. Funding was provided through grants from the

National Institutes of Health (UH2-AR067676, UH2-AR067677, UH2-AR067679, UH2-

AR067681, UH2-AR067685, UH2-AR067688, UH2-AR067689, UH2-AR067690, UH2-

AR067691, UH2-AR067694, and UM2-AR067678).

Abstract

Objective: We sought to refine histologic scoring of rheumatoid arthritis synovial tissue by

training with gene expression data and machine learning.

Methods: Twenty histologic features were assessed on 129 synovial tissue samples.

Consensus clustering was performed on gene expression data from a subset of 45 synovial

samples. Support vector machine learning was used to predict gene expression subtypes

using histology data as input. Corresponding clinical data were compared across subtypes.

Results: Consensus clustering of gene expression data revealed three distinct synovial

subtypes, including a highly inflammatory subtype characterized by extensive infiltration of

leukocytes, a low inflammatory subtype characterized by enrichment in pathways including

TGF-β, glycoproteins and neuronal genes, and a mixed subtype. Machine learning applied

to histology features using gene expression subtypes as labels generated an algorithm for

scoring histology features. Patients with highly inflammatory synovial subtypes exhibited
higher levels of markers of systemic inflammation and autoantibodies. CRP was

significantly correlated with pain in the high inflammatory group but not the others.

Conclusion: Gene expression analysis of synovial tissue revealed three distinct synovial

subtypes. We used these labels to generate a histology scoring algorithm that associates

with levels of ESR, CRP and autoantibodies. Comparison of gene expression patterns to

clinical features revealed a potentially clinically important distinction: mechanisms of pain

may differ in patients with different synovial subtypes.

Introduction

Rheumatoid arthritis (RA) is the most prevalent autoimmune arthritis, in which extensive

inflammation in synovial tissue can lead to joint destruction. Assessment of synovium has

the potential to provide guidance regarding optimal treatment strategies [1-5]; however,

classification of RA synovium is not yet factored into current RA diagnosis or treatment

guidelines [6]. A hematoxylin and eosin (H&E) stain based assessment of RA synovium is

feasible for large numbers of patients undergoing interventional procedures, since it is a

routine offering by clinical pathology laboratories. The Krenn scoring system of H&E stained

synovial tissue involves assessment of three features: synovial lining hyperplasia,

synoviocyte stromal density and leukocytic infiltration[7-10]. Though high-grade synovitis is

62% sensitive and 96% specific for the diagnosis of rheumatic diseases, it does not

discriminate between subtypes of rheumatic diseases, such as RA versus psoriatic arthritis.

We reasoned a more granular histology scoring system could be useful to subtype and

guide treatment of RA. Others have explored the significance of lymphocyte aggregates

and found correlations with systemic markers of inflammation such as erythrocyte

sedimentation rate (ESR) and C-reactive protein (CRP) but not with factors with higher

specificity for RA, such as anti-citrullinated peptide antibodies (ACPA) and rheumatoid factor

(RF) [11], suggesting that these features alone cannot distinguish immunologically distinct

subtypes.
Assessments of H&E stained slides can detect an array of inflammatory features including

multinucleated giant cells[12], neutrophils[13], plasma cells[14], binucleate plasma cells[15]

and Russell bodies (enlarged plasma cells undergoing excessive synthesis of

immunoglobulin)[14]. They can also detect extracellular features such as deposition of fibrin, the final

product of the clotting cascade[13]; mucins, a heterogeneous family of heavily glycosylated

glycoproteins that retain water and therefore form gels in RA and osteoarthritis (OA)

synovium but not in normal synovial tissue[16]; and detritus, small fragments of

cartilage or bone[17]. Here we aimed to evaluate the relative utility of 20 such features in

differentiating synovial subtypes defined by transcriptome-wide gene expression patterns,

with the goal of developing an algorithm to score histology features in a manner that

distinguishes synovial subtypes.

We performed an integrative analysis of clinical, histologic and gene expression data from a

cohort of 123 RA patients and six OA patients to provide insights into tissue inflammation

and sub-classification of RA. Gene expression cluster analysis identified three synovial

subtypes, which were used as labels to train a support vector machine (SVM) learning

algorithm with histology scores as inputs (features). This analysis produced a histology

scoring algorithm that predicts the three gene expression subtypes using only histology

features and corresponds with acute phase reactants and autoantibody levels.

Methods

An overview of the study design is presented in Figure 1.

Patient data

We enrolled 123 consecutive RA patients undergoing arthroplasty at Hospital for Special

Surgery (HSS) who met the American College of Rheumatology/European League

Against Rheumatism 2010[18] and/or the 1987 criteria. Synovial samples from an additional six

patients with osteoarthritis were also included. Cyclic citrullinated peptide (CCP) and RF

serostatus were coded as negative, low or medium positive (1-3 times the upper limit of
normal) and high positive (>3 times the upper limit of normal). Pain was assessed by asking

patients the question “How much pain have you had because of your condition over the

past week? Please indicate how severe your pain has been on a scale of 0-10.” This study

was approved by the HSS institutional review board (#2014-233), the Rockefeller University

IRB (#DOR0822) and Biomedical Research Alliance of New York (15-08-114-385) and

participating patients signed informed consent.

Sample processing

Adjacent areas of synovial tissue were placed into histology (OCT frozen blocks) and

dissociated by dissecting, treating with Liberase TL (100 µg/ml, Roche) and DNase I

(100 µg/ml, Roche) at 37˚C for 15 minutes and stored at -80˚C.

RNA sequencing

RNA was extracted using the Qiagen RNeasy Mini kit (Qiagen #74104), and libraries were

prepared using TruSeq Stranded mRNA Library kits at New York Genome Center. Paired-end,

50 base pair reads were sequenced on a HiSeq 2500. Reads were aligned to hg19

using STAR [19]. Samples with greater than 0.1% globin mRNA were excluded from further

analysis to prevent confusion between infiltrating and contaminating hematopoietic cells. 45

samples of sufficient quality were processed in three separate batches. To account for any

batch effects, we used ComBat in the Bioconductor sva package[20] and DESeq2[21] to

normalize the data.

RNA-seq clustering

RNA-seq data was clustered using consensus clustering, an iterative clustering method

where small data perturbations (using the rnorm function in the base stats package in R)

are introduced at each clustering iteration. The clustering iterations offer the opportunity to

derive a statistically robust data partitioning as opposed to a single one-time clustering

computation. We used the k-means clustering method, which typically does not converge to
the same solution when clustering the same data multiple times (contrary to agglomerative

approaches such as hierarchical clustering). We clustered the 45 RNA-seq samples 1000

times into 2, 3, 4 and 5 groups. The clustering iterations were then used to generate

likelihood scores representing the frequency of a pair of samples clustering together

(number of times the pair of samples cluster together divided by the number of clustering

iterations). This score ranges from 0 to 1, with scores of 1 indicating co-clustering of the two

samples.
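
The consensus procedure described above can be illustrated with a short Python sketch. It is not the code used in this study (the analysis was performed in R): the function names, the toy data, the noise level and the number of iterations are all assumptions chosen for illustration, and a plain Lloyd's k-means stands in for the k-means implementation that was actually used.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means (illustrative stand-in); returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Recompute each center as the mean of its current members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_clustering_scores(data, k, n_runs=200, noise_sd=0.1, seed=0):
    """Fraction of perturbed k-means runs in which each pair of samples co-clusters.

    noise_sd and n_runs are hypothetical settings, not values from the study.
    """
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        # Small Gaussian perturbation of every value (the role played by rnorm in R).
        perturbed = [[x + rng.gauss(0.0, noise_sd) for x in row] for row in data]
        labels = kmeans(perturbed, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    # Score = times the pair clustered together / number of clustering iterations.
    return [[count / n_runs for count in row] for row in together]


# Toy data: two well-separated groups of three samples, two "genes" each.
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
scores = co_clustering_scores(toy, k=2, n_runs=50)
```

Scores near 1 or 0 across the matrix indicate a stable partition for that k, whereas many intermediate scores indicate that samples switch clusters between iterations.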

Differential gene expression

We used the DESeq2 Bioconductor package[21] to identify differentially expressed genes

across synovial subtypes identified via clustering, using an adjusted p-value threshold of 0.01. We

eliminated chrX/Y genes to remove sex biases, as well as IgG variable genes (V, D and J)

since the individual V, D and J gene segments are counted as individual immunoglobulin

genes, thereby over-representing their abundance. Immunoglobulin constant regions were

retained to ensure representation of immunoglobulin gene expression.

Pathway analysis

The Database for Annotation, Visualization and Integrated Discovery (DAVID v6.8) [22] was

used with default parameters to identify functional annotation clusters of genes. Genes

differentially expressed in High vs other groups with padj<0.01 and fold change <-0.5 were

used to characterize pathways relatively enriched in both Low and Mixed subtypes. Genes

differentially expressed in High vs other groups with padj<0.05 and increasing in order from

Low to Mixed to High were used to identify pathways associated with gradual increase in

expression from Low to Mixed to High. Enrichment scores are a modified Fisher Exact P

value for gene-enrichment analysis.
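
To make the idea of a modified Fisher exact p-value concrete, the following Python sketch computes a one-sided Fisher exact p-value for a 2x2 gene-list table and a more conservative variant in which one gene is removed from the list hits before the test. The variant follows the commonly cited description of DAVID's EASE score, but it is an assumption that this matches DAVID's implementation in every detail; the function names and the example counts are hypothetical.

```python
from math import comb


def fisher_upper_tail(a, b, c, d):
    """One-sided (enrichment) Fisher exact p-value for the table [[a, b], [c, d]].

    a: list genes annotated to the term      b: list genes not annotated
    c: background genes annotated (not in list)   d: remaining background genes
    """
    n = a + b + c + d
    row1 = a + b  # size of the gene list
    col1 = a + c  # number of annotated genes overall
    total = comb(n, row1)
    tail = 0
    # Sum hypergeometric probabilities for overlaps at least as large as a.
    for x in range(a, min(row1, col1) + 1):
        tail += comb(col1, x) * comb(n - col1, row1 - x)
    return tail / total


def ease_like_score(a, b, c, d):
    """Conservative variant: penalize the list hits by one gene before testing."""
    return fisher_upper_tail(max(a - 1, 0), b, c, d)


# Hypothetical example: 3 of 300 list genes and 40 of 30,000 genes overall
# are annotated to a term.
p_fisher = fisher_upper_tail(3, 297, 37, 29663)
p_ease = ease_like_score(3, 297, 37, 29663)
```

The penalized score is always at least as large as the unmodified p-value, which makes terms supported by only a few genes harder to call enriched.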

Inference of cell types in gene expression clusters

In order to deconvolute the cellular composition of the 3 subtypes in our data, we used an
algorithm called CIBERSORT: Cell-Type Identification by estimating relative Subsets Of

known RNA Transcripts [23,24], a machine learning system trained on the profiles of 22

distinct leukocyte datasets over 547 genes with defined “barcodes” of gene expression

signatures to distinguish cell types.

Predicting RNA-seq subtypes from histology

We used histology scores as modeling features in a standard SVM (R package

e1071) to predict the three RNA-seq subtypes, using leave-one-out cross-validation to train

and predict transcriptomic subtypes. Weights attributed to each feature were extracted from

the SVM. All histology features were presented as binary vectors – histology features having

more than 2 categories were converted to a binary representation, so that all features were

equally represented in the model.
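
The binary representation and the leave-one-out scheme can be sketched as follows. This Python example is illustrative only: it assumes a one-hot encoding of each ordinal score (the exact binary encoding used in the study is described only in general terms), it uses hypothetical features, score ranges and toy samples, and it deliberately swaps the SVM for a simple nearest-centroid classifier so that the example has no external dependencies.

```python
def one_hot(score, n_levels):
    """Encode an ordinal score 0..n_levels-1 as a binary indicator vector."""
    return [1 if score == level else 0 for level in range(n_levels)]


def encode_sample(scores, levels_per_feature):
    """Concatenate the one-hot encodings of all histology features of one sample."""
    vector = []
    for score, n_levels in zip(scores, levels_per_feature):
        vector.extend(one_hot(score, n_levels))
    return vector


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the label with the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        sorted(centroids),
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def leave_one_out_predictions(xs, ys):
    """Predict every sample from a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(nearest_centroid_predict(train_x, train_y, xs[i]))
    return predictions


# Hypothetical toy data: lymphocytes (0-3), plasma cells (0-3), fibrin (0/1).
levels = [4, 4, 2]
raw_scores = [[3, 3, 1], [3, 2, 1], [2, 3, 1], [0, 0, 0], [0, 1, 0], [1, 0, 0]]
labels = ["High", "High", "High", "Low", "Low", "Low"]

encoded = [encode_sample(s, levels) for s in raw_scores]
predicted = leave_one_out_predictions(encoded, labels)
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
```

Because each sample is predicted by a model that never saw it, the agreement between predicted and true labels gives an estimate of performance on unseen samples rather than a fit to the training data.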

Histology Scoring

129 synovial samples were preferentially taken from grossly inflamed (dull and opaque)

synovium. If no inflammation was apparent, samples were taken from standard locations:

the femoral aspects of the medial and lateral gutters, and the central supratrochlear region

in the suprapatellar pouch. Tissue samples were snap frozen and stained with Harris

modified hematoxylin solution (Sigma-Aldrich) and eosin Y (Sigma-Aldrich). Twenty

features (synovial lining hyperplasia, lymphocytes, plasma cells, Russell bodies, binucleate

plasma cells, fibrin, synovial lining giant cells, sublining giant cells, fibrosis, detritus,

necrosis, granulation tissue, neutrophils, mast cells, eosinophils, synovial

chondrometaplasia, germinal centers, mucin, vascularity) were scored using a systematic

approach that is outlined in detail in Supplemental Methods. 129 samples were scored by

one pathologist (ED) and 40 samples were scored again by a second pathologist (EG) to

determine inter-rater reliability.

ACPA Array

As previously described[23], serum levels of antibodies targeting 38 putative RA-associated


autoantigens were measured using a custom bead-based immunoassay. Serum samples

were diluted at a 1:30 ratio in a proprietary dilution buffer (Bio-Rad Laboratories), mixed with the

antigens conjugated to spectrally-distinct fluorescent microspheres (Bio-Rad Laboratories),

and then incubated with an anti-human IgG antibody conjugated to phycoerythrin (Jackson

ImmunoResearch). The resulting fluorescence intensities were quantitated using a Luminex

200 System (Luminex Corporation).

Statistics

Kappa statistics were used to assess inter-rater reliability of the histology scores sourced

from the two pathologists. For variables with three or more response levels, weighted kappa

statistics were calculated to give credit for differences that are close as opposed to treating

any difference the same. Inter-rater reliability was considered none to slight if 0.01-0.20,

fair if 0.21-0.40, moderate if 0.41-0.60, substantial if 0.61-0.80 and almost perfect if

0.81-1.00[24]. The Jonckheere-Terpstra two-sided test for trend was used to compare

pathway enrichment scores of GSVA data across the three synovial subtypes. Kruskal-

Wallis test with Dunn’s multiple comparison tests was used to compare CIBERSORT RNA

scores and clinical features among the three synovial subtypes. Chi-square test was used

to detect differences above those expected by chance in binary data among the three

synovial subtypes. ANOVA with Tukey’s multiple comparisons test was used to compare

log2 transformed mean fluorescent intensities of ACPA among the three synovial subtypes.

Spearman’s correlation coefficients were calculated between pain and gene expression

across three synovial subtypes.
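
As an illustration of how a weighted kappa gives partial credit for near-miss disagreements, the following Python sketch computes Cohen's kappa with linear disagreement weights and maps it to the reliability bands listed above. The choice of linear weights, the function names and the toy ratings are assumptions; the weighting scheme actually used is not specified in this section.

```python
def weighted_kappa(rater_a, rater_b, n_levels):
    """Cohen's kappa with linear disagreement weights |i - j| / (n_levels - 1).

    Ratings are integers 0..n_levels-1. Undefined (division by zero) if both
    raters use a single identical category for every sample.
    """
    n = len(rater_a)
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = abs(i - j) / (n_levels - 1)
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]  # chance disagreement
    return 1.0 - weighted_observed / weighted_expected


def interpret_kappa(kappa):
    """Reliability bands as listed in the text; values <= 0 are folded into the
    lowest band here, which is an assumption."""
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings of six samples on a 0-2 scale by two pathologists.
first = [0, 0, 2, 2, 1, 1]
near_miss = [0, 1, 2, 2, 1, 1]  # one disagreement, one category apart
far_miss = [0, 2, 2, 2, 1, 1]   # one disagreement, two categories apart

kappa_near = weighted_kappa(first, near_miss, 3)
kappa_far = weighted_kappa(first, far_miss, 3)
```

In this toy example the single one-category disagreement is penalized less than the single two-category disagreement, which is the behavior the weighting is meant to provide.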

Results
Clinical characteristics

Clinical characteristics of 123 patients with RA are presented in Supplemental Table 1. The

majority of patients were female and 47% and 50% were seropositive for RF and CCP,

respectively. Despite average disease duration of 14 years, the average disease activity

score in 28 joints (DAS28) was 3.8 (moderate) and 53% of patients were treated with a

biologic agent just prior to surgery. Joints included hips (40%), knees (57%) and shoulders

(3%).

Histology features extracted from RA synovium

We first sought to determine the prevalence and feasibility of scoring the candidate

histology features. Fourteen of the 20 features were observed in more than 5% of samples

(Figure 2A). We next evaluated the inter-rater reliability of these histology features on a

subset of samples (Figure 2B). Representative images of the features with frequency > 5%

and at least fair inter-rater reliability are presented in Figure 2C and 2D. Features that were

seen in at least 5% of samples, with at least fair inter-pathologist reliability included: plasma

cells, fibrin, binucleate plasma cells, Russell bodies, neutrophils, lymphocytes, synovial

giant cells, lining hyperplasia, mucin and detritus.

RNA-seq data analysis suggests three distinct subtypes of RA synovium

Independent of the histologic analysis, consensus clustering of the top 500 most variable

genes expressed in 45 samples (N=39 RA synovium and 6 OA synovium) identified three

gene expression subtypes. Figure 3A depicts pairwise likelihood scores of samples to

cluster together, with the 45 samples constrained to cluster into k=2, 3, 4 and 5 clusters.

The red boxes represent co-clustering samples (likelihood scores of 1), blue represents

samples that never co-cluster (scores at 0), and white represents inconsistent co-clustering

patterns (scores at 0.5). When the samples were partitioned into 2 or 3 clusters, the

likelihood figure is crisp and well defined, with the sharp red and blue colors suggesting that

the majority of the likelihood scores approached 1 or 0. However, when the samples were
clustered into 4 or more groups, significantly less consistency was observed. This

visualization is a statistically robust confirmation that the 45 RNA-seq samples form at most

three distinct subgroups. We also explored this analysis using differing numbers of variable

genes (Supplemental Figure 1) and identified very minor shifts across the three consensus

clustering subtypes, lending confidence to the k=3 sample partitioning. Principal component

analysis of the samples using the top 500 most variable genes demonstrated agreement

with the three sample subtypes (Figure 3B), validating the clustering.

Differential gene expression patterns between the three synovial subtypes

We identified 6582 transcripts as differentially expressed genes (DEG) (padj<0.01) across

the 3 synovial clusters. Comparisons of each cluster to the others are presented in

Supplemental Tables 2, 3, and 4. More than half of these separated Low from High, while

the majority of the remaining genes separated Mixed from High. The Mixed subtype shares

features with both High and Low subtypes. Since it is a blend of the two subtypes, there are

very few genes that are unique to the Mixed group. Heatmaps of the top 1000 and top 50

most variable genes (as ranked by the p-adjusted value output by DESeq2) depict the main

gene expression patterns segregating the subtypes (Figure 3C). The largest gene block

displays an increasing expression pattern as one progresses from the Low to High

inflammatory subtype (Supplemental Table 5). We performed functional annotation

clustering on genes that increase in this order using DAVID GO analysis and found that the

highest enrichment scores corresponded with immunity, immune cell signaling pathways

such as SH2, SH3 and kinases, immunoglobulins, as well as chemokines and cytokines

(Figure 4A and Supplemental Table 6).

A smaller block of genes showed higher expression in Low and Mixed than in High

inflammatory subtypes (Figure 3C and Supplemental Table 7). We performed functional

annotation clustering on these genes using DAVID GO analysis (Supplemental Table 8)

and found that the highest enrichment scores corresponded with gene sets encoding

glycoproteins (Figure 4B). These genes included markers of lining layer fibroblasts such as
CD55 [25, 26], TGF-β superfamily genes (TGFBR3, TGFB2, BMP4 and BMP6) and genes

involved in regulation of extracellular glycoproteins: HYAL1, which encodes hyaluronidase

1, an enzyme important for hyaluronan degradation, and GCNT3, a glycosyltransferase

that mediates core O-glycan branching, a critical step in mucin synthesis[27]. Another

enriched group of genes in the low inflammatory subtype belonged to the protocadherins

(Figure 4B), a family of cell-adhesion proteins predominantly expressed in neuronal

synapses that are critical for axon targeting and neuron survival[28-31]. Other significantly

enriched pathways included those involved in neuronal pain processing. For example, GLRA3

(glycine receptor alpha 3) is required for central pain sensitization[32], and ADRA1B (alpha-

1 adrenergic receptor) is up-regulated on nociceptive nerve fibers that survive peripheral

injury [33-35]. We also analyzed expression of genes previously identified to discriminate

lymphoid, myeloid, fibroid and low-inflammatory synovial subtypes [5]. Samples from our

High inflammatory subtype demonstrated high expression of both myeloid and

lymphoid signature genes (Supplemental Figure 2). The previously described

“fibroid” synovial subtype was enriched in TGF-β signaling pathway genes with a relative

paucity of inflammatory gene expression [5]. We confirmed that these genes were also

overexpressed in our Low inflammatory subtype (Supplemental Figure 2).

CIBERSORT deconvolution of synovial gene expression subgroups

Identifying the cellular source of gene expression variation can be challenging in samples

containing various cell types, due to differences in both proportions and total cell quantities.

To better characterize the cell types responsible for gene expression differences amongst

the synovial tissues, we applied the machine-learning framework for Cell-Type Identification

by estimating relative Subsets Of known RNA Transcripts (or CIBERSORT) to estimate

cell-type composition. We compared inferred leukocyte frequencies using CIBERSORT

analysis of gene expression profiles across the three synovial subtypes. The clusters were

characterized by a successive increase in the percent hematopoietic cells between Low

(23%), Mixed (32%) and High inflammatory group (68%) (Figure 4C). The High
inflammatory group harbored significantly increased inferred fractions of neutrophils, M2,

M1 and M0 macrophages, monocytes, T follicular helper cells, memory activated and

resting CD4 T cells, CD8 T cells, and plasma cells.

Predicting genomic subtypes from histology features using Machine Learning

One of our main goals was to determine the synchrony between synovial histology features

and their genomic subtypes, the existence of which enables a cheaper histology-based

approach to characterizing synovial tissue. To this end, we implemented a leave-one-out

cross-validation SVM classification system to predict synovial genomic subtypes for our 45

samples, using a binary representation for the histology scores as training features, and

genomic subtypes as training labels for the model. The predictive power of this system was

evaluated by measuring the area under the curve (AUC) of receiver operating characteristic

(ROC) curves (Figure 5A). Models separating High from others and Low from others performed

best (AUC=0.88 and 0.71, respectively), while the Mixed subtype was

harder to predict (AUC=0.59). We compared the predicted synovial subtypes to those

assigned by RNA-seq clustering, and found that 39 out of 45 of the samples agreed.

Feature weights were extracted from the modeling system and are shown in Figure 5B

where we observed that the most discriminatory features, in general, also had

stronger inter-rater reliability (Figure 2B).
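
For readers less familiar with the AUC as a performance measure, the following Python sketch computes it for a one-versus-rest comparison (for example, High versus others) using its rank interpretation: the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative sample. The function name, the labels and the scores are hypothetical and are not data from this study.

```python
def roc_auc(is_positive, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    is_positive: 1 for samples in the subtype of interest, 0 for all others.
    scores: classifier decision values, higher meaning "more likely positive".
    Ties between a positive and a negative sample count as half a win.
    """
    positives = [s for y, s in zip(is_positive, scores) if y == 1]
    negatives = [s for y, s in zip(is_positive, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical one-versus-rest setup: which samples belong to the High subtype.
subtypes = ["High", "High", "Mixed", "Low"]
high_vs_rest = [1 if s == "High" else 0 for s in subtypes]
decision_values = [0.9, 0.4, 0.6, 0.1]  # hypothetical classifier outputs

auc_high = roc_auc(high_vs_rest, decision_values)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a value near 0.59 for the Mixed subtype indicates only weak discrimination.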

After observing satisfactory training and testing results across the 45 RNA-seq samples, we

computed two SVMs using all 45 samples: one to discriminate Low from others, and one to

discriminate High from others. We used these models to predict genomic subtypes on the

remaining 82 samples for which we had complete histology and clinical laboratory data

available, but not gene expression data. We compared the distributions of histology

features across the 39 RA patient RNA-seq samples classified by gene expression

clustering (Figure 5C, left panels) to distributions across the 82 samples classified by the

histology scoring algorithm (Figure 5C, right panels). The distributions of histology features
were similar whether the samples were classified by gene expression clusters (Figure 5C,

left panels) or by the histology feature algorithm (Figure 5C, right panels). For example,

lymphocyte scores are always highest in the high inflammatory subtype, regardless of how

the subtypes are classified. This is an expected result and serves to validate our approach.

We also compared clinical laboratory feature distributions of 39 RA patients classified by

gene expression clusters versus the 82 RA patients classified by the histology scoring

algorithm (Figure 5D), and again found consistent patterns. Given that clinical features were

not part of development of the algorithm, this consistent distribution of clinical features

demonstrates that both classification methods associate in meaningful ways with clinical

laboratory features. For example, low ESR values were most commonly present in Low

inflammatory subtypes and rarely present in High inflammatory subtypes, whether classified

by gene expression or the histology scoring algorithm. This consistent partitioning of an

independent set of clinical features provides additional confirmation for the histology scoring

algorithm.

Synovial subtypes are associated with clinical features

We next compared the clinical features of all 123 RA patients who were classified using the

histology scoring algorithm. Firstly, we compared the histology features across the three

subtypes and found that lymphocytes, plasma cells, lining hyperplasia, binucleate plasma

cells, Russell bodies, fibrin, and neutrophils were significantly increased in the High

inflammatory samples (Figure 6A). This histology data was consistent with the fractions of

cells inferred from the gene expression data (Figure 4C). Similarly, ESR, CRP, RF and

CCP were increased in patients with High inflammatory scores (Figure 6B). We also

compared the fine specificity of ACPA (Supplemental Figure 3), and identified significantly

increased levels of autoantibodies to citrullinated filaggrin, and fibrinogen-alpha peptides in

patients in the High inflammatory group (Figure 6C). Unexpectedly, these subtype

differences were not observed when looking at clinical assessments of pain, tender joint

counts, swollen joint counts, or disease duration (Figure 6B). The Low inflammatory
subtype had high pain scores (median pain score =6/10) but little inflammation in the tissue

(according to gene expression and histology assessments) or blood (ESR and CRP). We

therefore hypothesized that pain might be driven by distinct mechanisms in the various

subtypes. To explore this hypothesis, we compared the Spearman rank correlation

coefficients between the level of acute phase reactants and pain scores across all samples

grouped together and when parsed according to patient synovial histology subtype. We

found a weak, non-significant correlation between pain and acute phase reactants when we

analyzed all patients together (Figure 6D), which became clearer when we divided the

patients according to synovial histology subtype. Patients with High inflammatory

synovium, but not the other synovial subtypes, had a stronger and significant correlation of

pain with CRP. This suggests that pain is associated with inflammation in patients with

High inflammatory subtypes and that pain may be driven by distinct mechanisms in the

other patients.
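
The stratified correlation analysis described above can be sketched in Python as follows. The sketch computes Spearman's rank correlation (Pearson correlation of average ranks, so that tied pain scores are handled) separately within each subtype. All names and values are hypothetical and serve only to show the computation; they are not patient data, and no significance test is included.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either variable is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_by_group(groups, x, y):
    """Compute Spearman's rho separately within each group label."""
    result = {}
    for group in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == group]
        result[group] = spearman([x[i] for i in idx], [y[i] for i in idx])
    return result


# Hypothetical values: subtype label, CRP and 0-10 pain score per patient.
subtype = ["High", "High", "High", "High", "Low", "Low", "Low", "Low"]
crp = [1.0, 2.5, 4.0, 8.0, 0.2, 0.4, 0.3, 0.5]
pain = [3, 5, 6, 9, 7, 5, 6, 6]

rho_all = spearman(crp, pain)
rho_by_subtype = spearman_by_group(subtype, crp, pain)
```

Pooling all patients can mask a relationship that holds in only one stratum, which is why the per-subtype coefficients are examined alongside the pooled one.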

Discussion

After identifying three RA synovial gene expression subtypes, we developed a histology

machine learning model trained on these subtypes, predicting genomic subtypes from

histology data. Through our modeling system, we found that the histologic features that

most strongly defined the High inflammatory subtype included three plasma cell features:

binucleate plasma cells, plasma cell percentage and Russell bodies (Figure 5B).

Deconvolution of gene expression profiles indicated that H&E scores of these plasma cell

features were a harbinger of a leukocyte panoply including T follicular helper cells, memory

resting and activated CD4 T cells, CD8 T cells, monocytes, M0, M1 and M2 macrophages,

and neutrophils. Our gene expression analysis did not identify distinct myeloid and

lymphoid synovial histologic subtypes as previously described[5]. It is possible that

differences in the patient populations or treatments account for the discrepant results.
Analysis of the Low inflammatory subtype identified expression of genes involved in

glycoprotein production and TGF-β pathways (fibroid genes) as well as neuronal genes.

This discovery is consistent with results of at least one other prior study in which

overexpression of neurogenesis pathway genes was identified in a low-inflammatory

subtype of RA [36]. A common theme among the neuronal genes in this cluster is that they

play a role in a maladaptive response of the nervous system to damage. It is interesting that

this subtype is characterized by a paucity of inflammatory infiltrates, yet maintains high pain

scores and multiple tender/swollen joints - this too is consistent with other findings of

patients with established RA [37]. While it is possible that these patients had OA of the

arthroplasty joint and active rheumatoid arthritis in other joints, given minimal systemic and

synovial inflammation, it raises the question of whether the other joints might be perceived

(by the evaluating rheumatologist) as tender and swollen due to mechanisms other than

inflammation. For example, synovial proteoglycans such as mucin, which was common in

synovial samples, have a jelly-like consistency and could be perceived by the evaluating

clinician as synovial swelling. Similarly, tenderness and pain could be due to non-

inflammatory mechanisms and this is consistent with the dissociation of pain scores from

systemic inflammatory markers (Figure 6D) as well as enrichment for neuronal gene

expression (Figure 4B) in the Low and Mixed synovial subtypes. Our work suggests that RA

patients with longstanding disease and poor response to anti-inflammatory treatment may

warrant synovial biopsy to determine their inflammatory subtype. Better understanding of

the cause of pain in patients with little tissue inflammation is critical because non-

inflammatory pain and mucin related synovial thickening could result in high tender and

swollen joint counts in the absence of systemic inflammation (low ESR and CRP). We

would expect such patients to respond poorly to anti-rheumatic drugs targeting

inflammation.
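The dissociation of pain from inflammatory markers discussed above is assessed with Spearman's rank correlation (Figure 6D). A minimal sketch of that statistic follows, assuming toy pain scores and CRP values invented for illustration; the function and variable names are hypothetical, and this is not the authors' analysis code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example values, not patient data.
pain = [2, 4, 5, 7, 9]
crp_tracking_pain = [0.3, 0.8, 1.5, 4.0, 12.0]   # rises with pain
crp_dissociated = [1.5, 0.3, 12.0, 0.8, 4.0]     # weakly related to pain

rho_tracking = spearman_rho(pain, crp_tracking_pain)
rho_dissociated = spearman_rho(pain, crp_dissociated)
```

In this toy example the first CRP series rises monotonically with pain, while the second is only weakly related to it, which is the pattern one would describe as pain being dissociated from the inflammatory marker.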
There are several important limitations of this work. The lack of normal synovium tissues

makes it challenging to draw conclusions about the genes overexpressed in the Low

inflammatory subtype. These genes could represent a pathologic process or simply

represent relative enrichment of normal resident cells. Another challenge in interpreting

these results is that the cell bodies of sensory neurons reside in the dorsal root ganglion

and are not captured when sampling synovial tissue. It is possible that synaptic mRNA [38]

from damaged nerve fibers was captured in our tissue samples, or that resident synovial

cells express a broad array of neural genes in response to inflammation. For example,

synovial fibroblasts have been shown to express the alpha7 nicotinic receptor [39] and

substance P [40]. Another limitation of this work is that our tissue samples were dissociated

into single cell suspensions prior to RNA fixation. Based on the significant numbers of

plasma cells counted by histology assessment, it is likely that the CIBERSORT inferred

plasma cell frequencies were lower than those observed by H&E due to plasma cell fragility

and death during processing. Additional processing artifacts include immune cell activation

due to various factors during dissociation. CIBERSORT was trained on microarray data and

can only detect 22 potential cell types, so it is quite possible that other cell types, such as

plasmablasts and peripheral helper T (TPH) [41] cells, are present but not annotated by

CIBERSORT. A further limitation of this project is that the machine learning model was

trained and tested on just 45 synovial samples – a larger dataset would offer more

statistical power and we plan to run more samples in future work. Finally, the cohort

studied had longstanding disease and exposure to various treatments. Further assessment

of small joint tissue from early RA patients would be useful to characterize features

important for RA pathogenesis and predict treatment responses.

In summary, machine learning integration and analysis of histologic and transcriptional

datasets identified three distinct molecular subtypes of RA that correlate with specific
clinical phenotypes. The High inflammatory subtype is associated with high synovial and

systemic inflammation and autoantibodies. The Low inflammatory subgroup is

characterized by high neuronal and glycoprotein gene expression, and pain scores that are

dissociated from elevated inflammatory markers.

Figure Legends:

Figure 1: Study Overview.

Figure 2: Histologic features of RA synovium. A) Frequency distribution of histology scores

of 20 features assessed in 129 synovial samples. B) Inter-rater reliability (kappa statistic)

of 15 histology features assessed in 40 synovial samples. C and D) Representative images

of H&E stained synovium: 10 synovial histology features retained for modeling. C) Images

obtained at 100x, scale bars indicate 200μm. D) Images obtained at 20x, scale bars

indicate 20μm.

Figure 3: Gene expression analysis of 45 synovial tissues identifies three distinct synovial

subtypes. A) Consensus clustering heatmaps using the top 500 most variable genes,

constrained to k= 2, 3, and 4 clusters. P= probability score for co-clustering of individual

samples. Red denotes samples that consistently cluster together, blue denotes samples that

consistently do not cluster together, and white denotes inconsistent co-clustering. Samples

were labeled green, orange and yellow according to partitioning obtained when constraining

the clustering algorithm to k=3 clusters. B) PCA plot of RNA-seq data using the top 500

most variable genes. Samples are colored according to cluster. C) Heatmap of normalized

gene expression of the top 1000 and 50 DEG across three clusters. Gene names are listed

on the y-axis. Red denotes increased gene expression and green denotes decreased gene

expression.
Figure 4: Gene expression characteristics of three synovial subtypes. A) Enrichment

scores of functional annotation clusters of genes with ordered increases in expression

levels from Low to Mixed to High. B) Enrichment scores of functional annotation clusters of

genes enriched in Low and Mixed relative to High inflammatory subtypes. Functional

groups with enrichment scores >3 are presented. C) CIBERSORT inferred fraction of

leukocyte cell types according to three synovial subtypes. *p<0.05 according to Kruskal-

Wallis test.

Figure 5: Machine learning classification of the histology features using RNA-seq-defined

synovial subgroups. A) ROC curves of histology scoring algorithm. B) Mean absolute

weights for 10 histology features separating High-inflammatory samples from others in the

histology scoring algorithm. C) Frequency distribution of raw histology scores among three

synovial subtypes classified by either RNA-seq clustering (left panel, N=39 RA patients) or

histology scoring algorithm (right panel, N=82 RA patients). D) Frequency distribution of

clinical laboratory results (Rank score 0=minimum, 1=25th percentile, 2=50th percentile,

3=75th percentile, 4=maximum) among three synovial subtypes classified by either gene

expression clustering (left panel, N=39 RA patients) or histology scoring algorithm (right

panel, N=82 RA patients).

Figure 6: Comparison of clinical features of patients with 3 synovial subtypes. A) Upper

panel: Median (+ interquartile range) of various ordinal histology features. *p<0.05,

**p<0.01, ***p<0.001, ****p<0.0001 according to Kruskal-Wallis test with Dunn’s multiple

comparison test comparing the mean rank of each group to the Low inflammatory group.

Lower panel: Average incidence of various binary histology features. *p<0.05, **p<0.01,

***p<0.001, ****p<0.0001 according to Chi-square test. B) Median (+ interquartile range) of

various clinical features. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001 according to Kruskal-

Wallis test with Dunn’s multiple comparison test comparing the mean rank of each group to
the Low inflammatory group. C) Log2 (plasma mean fluorescent intensity (MFI) of

antibodies to peptides divided by the median value per peptide) of the 3 putative RA-

associated autoantigens with significantly different levels among the three synovial

subtypes according to ANOVA with Tukey’s multiple comparison testing. Mean values for

samples from patients assigned to each synovial subtype are presented. Filaggrin=

Filaggrin 48-65 cit2 v1 cyclic, Fibrinogen A= Fibrinogen A 616-635 cit3 sm cyclic. D)

Spearman’s correlation coefficients (ρ) of pain with each of the acute phase reactants across

three synovial subtypes. *p<0.05

Acknowledgements:

We would like to thank Nathalie Blachere for editorial assistance. Funding was provided by

the New York Genome Center, Rockefeller University grant # UL1 TR000043 from the

National Center for Advancing Translational Sciences (NCATS), National Institutes of

Health (NIH) Clinical and

Translational Science Award (CTSA) program, grant #UL1-TR000457-06, Weill Cornell

Clinical Translational Science Center (CTSC), and The Block Family Foundation. LI was

supported by grant # AR046713.

This work was supported by the Accelerating Medicines Partnership (AMP) in Rheumatoid

Arthritis and Lupus Network. AMP is a public-private partnership (AbbVie Inc., Arthritis

Foundation, Bristol-Myers Squibb Company, Lupus Foundation of America, Lupus

Research Alliance, Merck Sharp & Dohme Corp., National Institutes of Health, Pfizer Inc,

Rheumatology Research Foundation, Sanofi and Takeda Pharmaceuticals International,

Inc.), created to develop new ways of identifying and validating promising biological targets

for diagnostics and drug development. Funding was provided through grants from the

National Institutes of Health (UH2-AR067676, UH2-AR067677, UH2-AR067679, UH2-

AR067681, UH2-AR067685, UH2-AR067688, UH2-AR067689, UH2-AR067690, UH2-

AR067691, UH2-AR067694, and UM2-AR067678).


References:

1. Pitzalis, C., S. Kelly, and F. Humby, New learnings on the pathophysiology of RA
from synovial biopsies. Curr Opin Rheumatol, 2013. 25(3): p. 334-44.

2. De Groof, A., et al., Higher expression of TNFalpha-induced genes in the synovium
of patients with early rheumatoid arthritis correlates with disease activity, and
predicts absence of response to first line therapy. Arthritis Res Ther, 2016. 18: p.
19.

3. Hogan, V.E., et al., Pretreatment synovial transcriptional profile is associated with
early and late clinical response in rheumatoid arthritis patients treated with
rituximab. Ann Rheum Dis, 2012. 71(11): p. 1888-94.

4. Klaasen, R., et al., The relationship between synovial lymphocyte aggregates and
the clinical response to infliximab in rheumatoid arthritis: a prospective study.
Arthritis Rheum, 2009. 60(11): p. 3217-24.

5. Dennis, G., Jr., et al., Synovial phenotypes in rheumatoid arthritis correlate with
response to biologic therapeutics. Arthritis Res Ther, 2014. 16(2): p. R90.

6. Smolen, J.S., et al., EULAR recommendations for the management of rheumatoid
arthritis with synthetic and biological disease-modifying antirheumatic drugs: 2016
update. Ann Rheum Dis, 2017.

7. Krenn, V., et al., [Histopathological degeneration score of fibrous cartilage. Low- and
high-grade meniscal degeneration]. Z Rheumatol, 2010. 69(7): p. 644-52.

8. Krenn, V., et al., [Low-grade-/high-grade-synovitis: synovitis-score as a gold
standard?]. Orthopade, 2006. 35(8): p. 853-9.

9. Slansky, E., et al., Quantitative determination of the diagnostic accuracy of the
synovitis score and its components. Histopathology, 2010. 57(3): p. 436-43.

10. Pearle, A.D., et al., Elevated high-sensitivity C-reactive protein levels are associated
with local inflammatory findings in patients with osteoarthritis. Osteoarthritis
Cartilage, 2007. 15(5): p. 516-23.

11. Thurlings, R.M., et al., Synovial lymphoid neogenesis does not define a specific
clinical rheumatoid arthritis phenotype. Arthritis Rheum, 2008. 58(6): p. 1582-9.

12. Prieto-Potin, I., et al., Characterization of multinucleated giant cells in synovium and
subchondral bone in knee osteoarthritis and rheumatoid arthritis. BMC
Musculoskelet Disord, 2015. 16: p. 226.

13. Riddle, J.M., G.B. Bluhm, and M.I. Barnhart, Interrelationships between fibrin,
neutrophils and rheumatoid synovitis. J Reticuloendothel Soc, 1965. 2(5): p. 420-36.

14. Orlovskaya, G.V., P.Y. Muldiyarov, and I.S. Kazakova, Synovial plasma cells in
rheumatoid arthritis. Electron microscope and immunofluorescence studies. Ann
Rheum Dis, 1970. 29(5): p. 524-32.

15. Perry, M.E., et al., Binucleated and multinucleated forms of plasma cells in synovia
from patients with rheumatoid arthritis. Rheumatol Int, 1997. 17(4): p. 169-74.

16. Volin, M.V., et al., Expression of mucin 3 and mucin 5AC in arthritic synovial tissue.
Arthritis Rheum, 2008. 58(1): p. 46-52.

17. Revell, P.A., et al., The synovial membrane in osteoarthritis: a histological study
including the characterisation of the cellular infiltrate present in inflammatory
osteoarthritis using monoclonal antibodies. Ann Rheum Dis, 1988. 47(4): p. 300-7.

18. Villeneuve, E., J. Nam, and P. Emery, 2010 ACR-EULAR classification criteria for
rheumatoid arthritis. Rev Bras Reumatol, 2010. 50(5): p. 481-3.

19. Dobin, A., et al., STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 2013.
29(1): p. 15-21.

20. Leek, J.T. and J.D. Storey, Capturing heterogeneity in gene expression studies by
surrogate variable analysis. PLoS Genet, 2007. 3(9): p. 1724-35.

21. Love, M.I., W. Huber, and S. Anders, Moderated estimation of fold change and
dispersion for RNA-seq data with DESeq2. Genome Biol, 2014. 15(12): p. 550.

22. Dennis, G., Jr., et al., DAVID: Database for Annotation, Visualization, and Integrated
Discovery. Genome Biol, 2003. 4(5): p. P3.

23. Sokolove, J., et al., Autoantibody epitope spreading in the pre-clinical phase
predicts progression to rheumatoid arthritis. PLoS One, 2012. 7(5): p. e35296.

24. McHugh, M.L., Interrater reliability: the kappa statistic. Biochem Med (Zagreb),
2012. 22(3): p. 276-82.

25. Palmer, D.G., et al., Features of synovial membrane identified with monoclonal
antibodies. Clin Exp Immunol, 1985. 59(3): p. 529-38.

26. Smith, M.D., et al., Microarchitecture and protective mechanisms in synovial tissue
from clinically and arthroscopically normal knee joints. Ann Rheum Dis, 2003. 62(4):
p. 303-7.

27. Yeh, J.C., E. Ong, and M. Fukuda, Molecular cloning and expression of a novel
beta-1, 6-N-acetylglucosaminyltransferase that forms core 2, core 4, and I branches.
J Biol Chem, 1999. 274(5): p. 3215-21.

28. Leung, L.C., et al., Coupling of NF-protocadherin signaling to axon guidance by cue-
induced translation. Nat Neurosci, 2013. 16(2): p. 166-73.

29. Hasegawa, S., et al., Distinct and Cooperative Functions for the Protocadherin-
alpha, -beta and -gamma Clusters in Neuronal Survival and Axon Targeting. Front
Mol Neurosci, 2016. 9: p. 155.
30. Junghans, D., et al., Postsynaptic and differential localization to neuronal subtypes
of protocadherin beta16 in the mammalian central nervous system. Eur J Neurosci,
2008. 27(3): p. 559-71.

31. Asahina, H., et al., Distribution of protocadherin 9 protein in the developing mouse
nervous system. Neuroscience, 2012. 225: p. 88-104.

32. Harvey, R.J., et al., GlyR alpha3: an essential target for spinal PGE2-mediated
inflammatory pain sensitization. Science, 2004. 304(5672): p. 884-7.

33. Drummond, E.S., et al., Increased expression of cutaneous alpha1-adrenoceptors
after chronic constriction injury in rats. J Pain, 2014. 15(2): p. 188-96.

34. Drummond, P.D., et al., Upregulation of alpha1-adrenoceptors on cutaneous nerve
fibres after partial sciatic nerve ligation and in complex regional pain syndrome type
II. Pain, 2014. 155(3): p. 606-16.

35. Lee, Y.H., et al., Alpha1-adrenoceptors involvement in painful diabetic neuropathy: a
role in allodynia. Neuroreport, 2000. 11(7): p. 1417-20.

36. van Baarsen, L.G., et al., Synovial tissue heterogeneity in rheumatoid arthritis in
relation to disease activity and biomarkers in peripheral blood. Arthritis Rheum,
2010. 62(6): p. 1602-7.

37. Horton, S.C., C.A. Walsh, and P. Emery, Established rheumatoid arthritis: rationale
for best practice: physicians' perspective of how to realise tight control in clinical
practice. Best Pract Res Clin Rheumatol, 2011. 25(4): p. 509-21.

38. Meer, E.J., et al., Identification of a cis-acting element that localizes mRNA to
synapses. Proc Natl Acad Sci U S A, 2012. 109(12): p. 4639-44.

39. Waldburger, J.M., et al., Acetylcholine regulation of synoviocyte cytokine expression
by the alpha7 nicotinic receptor. Arthritis Rheum, 2008. 58(11): p. 3439-49.

40. Inoue, H., et al., Production of neuropeptide substance P by synovial fibroblasts
from patients with rheumatoid arthritis and osteoarthritis. Neurosci Lett, 2001.
303(3): p. 149-52.

41. Rao, D.A., et al., Pathologically expanded peripheral T helper cell subset drives B
cells in rheumatoid arthritis. Nature, 2017. 542(7639): p. 110-114.
