
A Novel Self-Learning Framework for Bladder Cancer Grading Using Histopathological Images

Gabriel García1, Anna Esteve1,2, Adrián Colomer1, David Ramos2 and Valery Naranjo1
1 Instituto de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022, Valencia, Spain
2 Hospital Universitario y Politécnico La Fe, Avinguda de Fernando Abril Martorell, 106, 46026, Valencia, Spain.
arXiv:2106.13559v1 [eess.IV] 25 Jun 2021

Abstract
Recently, bladder cancer has increased significantly in terms of incidence and mortality. Currently, two subtypes
are known based on tumour growth: non-muscle-invasive (NMIBC) and muscle-invasive bladder cancer (MIBC).
In this work, we focus on the MIBC subtype because it is of the worst prognosis and can spread to adjacent organs. We
present a self-learning framework to grade bladder cancer from histological images stained via immunohistochemical
techniques. Specifically, we propose a novel Deep Convolutional Embedded Attention Clustering (DCEAC) which
allows classifying histological patches into different severity levels of the disease, according to the patterns established
in the literature. The proposed DCEAC model follows a two-step fully unsupervised learning methodology to discern
between non-tumour, mild and infiltrative patterns from high-resolution samples of 512 × 512 pixels. Our system out-
performs previous clustering-based methods by including a convolutional attention module, which allows refining the
features of the latent space before the classification stage. The proposed network exceeds state-of-the-art approaches
by 2−3% across different metrics, achieving a final average accuracy of 0.9034 in a multi-class scenario. Furthermore,
the reported class activation maps show that our model is able to learn by itself the same patterns that clinicians
consider relevant, without requiring prior annotation steps. This represents a breakthrough in muscle-invasive
bladder cancer grading, bridging the gap with respect to models trained on labelled data.
Keywords: Bladder cancer, tumour budding, unsupervised learning, deep clustering, histopathological images,
self-learning, immunohistochemical staining

Email address: jogarpa7@i3b.upv.es
Preprint submitted to Computers in Biology and Medicine June 28, 2021

1. Introduction

Bladder cancer is an uncontrolled proliferation of urothelial bladder tumour cells that entails the development of the tumour. A significant increase in adult incidence and mortality has been observed in recent years. In particular, recent studies claim that bladder cancer is the second most common urinary tract cancer and the fifth most common cancer among men in developed countries [1, 2].

Nowadays, the diagnostic procedure for bladder cancer includes several time-consuming tests. First, urine cytology is performed to determine the presence of carcinogenic cells [3]. Later, a vesico-prostatic and renal ultrasound is employed to locate the tumour and get an idea of its kind of growth, which provides relevant cues to determine the grade and the prognosis of the patient. When the tumour cannot be located in the previous stage, an MRI urogram is carried out to analyze the possible local spread [4]. If there is evidence of bladder cancer, the urologist usually performs a cystoscopy based on the transurethral resection (TUR) technique, which allows extracting a sample of abnormal bladder tissue to determine the kind of tumour growth. After the preparation process, the biopsied tissue is usually stained with hematoxylin and eosin (H & E) to enhance the histological properties of the tissue. Finally, an additional staining process can be adopted to highlight special structures associated with the problem under study. Particularly, the immunohistochemical CK AE1/3 technique was applied on the histological images used in this work to highlight the carcinogenic cells by providing a brown hue when the antigen-antibody binding occurs.

Note that two kinds of bladder cancer, non-muscle-invasive (NMIBC) and muscle-invasive (MIBC), are distinguished depending on their level of invasiveness during
the tumour growth within the bladder wall. Currently, 75% and 25% of the bladder cancer cases correspond to NMIBC and MIBC, respectively [2]. In this study, we focus on the MIBC category since it leads to the worst prognosis and favours the spread of the tumour to adjacent organs. According to [5], muscle-invasive bladder cancer (MIBC) does not usually present low-grade cases of malignancy, but just high-grade urothelial carcinomas, which can be categorized as grade 2 or 3 following the classification criteria proposed by the World Health Organization (WHO) [6]. Jimenez et al. [7] described three different histological patterns which correlate with the patient outcome. Specifically, nodular, trabecular and infiltrative patterns can be found in the histopathological images stained with CK AE1/3, as observed in Figure 1. The nodular pattern (yellow box) is defined by well-delineated tumour cell nests with a circular shape. In contrast, the trabecular pattern is characterised by the presence of tumour cells arranged in interconnected bands. Finally, the infiltrative pattern is composed of tumour cell strands (red box) or a small set of isolated cells called buds (blue box). The infiltrative pattern, a.k.a. tumour budding, represents the most aggressive scenario and the worst prognosis for the patient [8–11]. For that reason, we grouped nodular and trabecular structures into a single class (mild pattern) to grade the MIBC severity according to the prognosis of the disease. Also, we considered a non-tumour pattern (pink box) to cover those cases in which the patient does not present signs of tumour. In this way, a multi-class scenario is considered throughout the paper to grade bladder cancer into non-tumour (NT), mild (M) and infiltrative (I) patterns.

1.1. Related work

An accurate bladder cancer diagnosis is a very time-consuming task for expert pathologists, whose level of reproducibility is low enough to produce significant differences in the histological-based interpretation [12, 13]. For that reason, many state-of-the-art studies have proposed artificial-intelligence algorithms to help pathologists in terms of cost-effectiveness and subjectivity. Most of them focused on machine-learning techniques applied on H & E-stained histological images for segmentation [14–16] and classification [12, 13, 17–22] problems.

Regarding the segmentation-based studies, Lucas et al. [14] used the popular U-net architecture to segment normal and malignant cases of bladder images. Then, they used the common VGG16 network as a backbone to extract histological features from patches of 224 × 224 pixels. The resulting features were fused with other clinical data to address a classification stage via bidirectional GRU networks. The proposed algorithm reported an accuracy of 0.67 for the 5-year survival prediction. In [15], the authors carried out an end-to-end approach to discern between MIBC and NMIBC categories from H & E images. First, they performed a segmentation process to discriminate the tissue from the background of the image. Patches of 700 × 700 pixels were used to perform both manual and automatic feature extraction. The hand-crafted learning was conducted via contextual features such as nuclear size distribution, crack edge, sample ratio, etc., whereas the data-driven learning was addressed via VGG16 and VGG19 architectures. During the classification stage, different machine-learning classifiers such as support vector machine (SVM), logistic regression (LR) or random forest (RF), among others, were used to determine the kind of bladder tissue. The hand-crafted approaches outperformed the deep-learning models, achieving an accuracy of 91–96% depending on the classifier.

Regarding the classification-oriented studies, most of them focused on H & E-stained histological images, as before. In [17], the researchers proposed a multi-class scenario to detect the molecular subtype in muscle-invasive bladder cancer (MIBC) cases. They applied the ResNet architecture on patches of 512 × 512 pixels, achieving results for the area under the ROC curve (AUC) of 0.89 and 0.87 in terms of micro and macro-average. In [18], the authors made use of the Xception network as a feature extractor from H & E-stained patches of 256 × 256 pixels. Then, an SVM classifier was implemented to discern between high and low mutational burden, reaching values of 0.73 and 0.75 for accuracy and AUC metrics, respectively. Harmon et al. [19] proposed a classification scenario to detect lymph node metastases from H & E patches of 100 × 100 pixels. A combination of the ResNet-101 architecture with AdaBoost classifiers reported an AUC of 0.678 at test time. Another interesting study [12] carried out a classification approach to categorize the kind of tissue into six different classes: urothelium, stroma, damaged, muscle, blood and background. To this end, the authors combined supervised and unsupervised deep-learning techniques on patches of 128 × 128 pixels stained with H & E. Specifically, they trained an autoencoder (AE) from the unlabelled images and used the encoder network to address the classification through the features extracted from the labelled samples. They reached multi-class results of 0.936, 0.935 and 0.934% for precision, recall and F1-score metrics, respectively. One of the more important state-of-the-art

Figure 1: Whole Slide Image (WSI) from a patient suffering from muscle-invasive bladder cancer (MIBC) in which different growth patterns are
evidenced. (a) Non-tumour pattern. (b) Nodular arrangement (mild pattern). (c) Trabecular arrangement (mild pattern). (d) Tumour cell strands of
an infiltrative pattern. (e) Isolated tumour cells denoting an infiltrative pattern.

studies focused on H & E histological images of bladder cancer was carried out in [13]. Zhang et al. collected a large database of WSIs with the aim of discerning between low and high grades of the disease. In particular, they used an autoencoder network to identify possible areas with cancer. Then, the extracted ROIs of dimensions 1024 × 1024 pixels were input to a Convolutional Neural Network (CNN) to classify them into low and high classes. Finally, an average accuracy of 94% was reported by the proposed system, in comparison to the 84.3% reached by the pathologists. The findings from this study reveal that there exists a significant subjectivity level between experts in the diagnosis from histological bladder cancer images, as supported in [12].

It should be noted that, besides the histopathological samples, other imaging modalities are also considered in the literature for bladder cancer analysis, e.g. magnetic resonance imaging (MRI) [16], cystoscopy [21, 22] or computerized tomography (CT) [20]. Particularly, Dolz et al. [16] applied deep-learning algorithms to detect bladder walls and tumour regions from MRI samples. In [21, 22], different deep-learning architectures were implemented to distinguish between healthy and bladder cancer patients using cystoscopy samples. Yang et al. [20] outlined a classification between NMIBC and MIBC categories from CT images. Additionally, although immunohistochemical techniques are widely used in the literature for detecting tumour budding, most of the state-of-the-art works applied them on colorectal cancer images [23–26]. However, a large gap in immunohistochemical-based studies is found in the literature for bladder cancer diagnosis. As far as we
know, only the study carried out in [27] proposed the use of immunofluorescence-stained samples to quantify the tumour budding for MIBC prognosis via machine learning algorithms. Specifically, the authors aimed to establish a relationship between tumour budding and survival evaluated in patients with MIBC. To this end, they carried out learning strategies based on nuclei detection and segmentation of the tumour in stroma regions to count the isolated tumour budding cells. The authors proposed a survival decision function based on random forest classifiers, reporting a hazard ratio of 5.44.

1.2. Contribution of this work

To the best of the authors' knowledge, no previous works have been performed to analyse the severity of bladder cancer using histological images stained with cytokeratin AE1-AE3 immunohistochemistry. Moreover, all the state-of-the-art studies focused on supervised learning methods to find dependencies between the inputs and the predicted class [12, 13, 18, 19]. Some of them [12, 13] also considered the use of unsupervised techniques in early methodological steps to find possible ROIs with cancer, but they need labelled data to build the definitive predictive models. Besides, pattern recognition tasks aimed at grading bladder cancer have not been addressed in previous studies.

To fill these gaps in the literature, we present in this paper a self-learning framework for bladder cancer growth pattern, which focuses on fully unsupervised learning strategies applied on CK AE1/3-stained WSIs. In particular, we propose a deep convolutional embedded attention clustering (DCEAC) which allows boosting the performance of the classification model without requiring labelled data. In the literature, deep-clustering algorithms have demonstrated a high rate of performance for image classification [28–30], image segmentation [31], speech separation [32, 33] or data analysis [34], among other tasks. Inspired by [28], we propose a tailored algorithm capable of competing with the state-of-the-art results achieved by supervised algorithms. As a novelty, we include a convolutional attention module to refine the features embedded in the latent space. Additionally, we are the first to focus on the arrangement of the histological structures contained in the high-resolution patches to classify them into non-tumour (NT), mild (M) and infiltrative (I) patterns, according to the criteria proposed in [7]. We also compute a class activation map (CAM) algorithm [35] to show how the proposed network pays attention to those specific structures which match the clinical patterns associated with the aggressiveness of bladder cancer.

With all of the above, the proposed end-to-end framework provides a reliable benchmark for making diagnostic suggestions without involving the pathologist's experience, which adds significant value to the body of knowledge. In summary, the main contributions of this work are listed below:

• For the first time, we make use of CK AE1/3-stained images to address the automatic diagnosis of bladder cancer via machine-learning algorithms.

• We resort to advanced unsupervised deep-learning techniques to address the bladder cancer grading without the need for prior annotation steps.

• We propose a novel deep-clustering architecture able to improve the representation space via convolutional attention modules, which results in a better unsupervised classification.

• We rely on high-resolution histological patches to learn specific bladder cancer patterns and stratify the different severity levels of the disease according to the literature.

• Heat maps highlighting decisive areas are reported to incorporate an explainable component for the network prediction. This provides an interpretability perspective that coincides with the clinicians’ criteria.

2. Material

A private database composed of 136 whole-slide images (WSIs) stained via the immunohistochemical CK AE1/3 technique was used to accomplish this study. The WSIs, coming from the Hospital Universitario y Politécnico La Fe (Valencia, Spain), were digitized at the highest optical magnification (40×) to leverage the inherent structure of the bladder patterns associated with each grade of the disease. It is worth noting that a high image resolution is necessary to achieve an accurate diagnosis of bladder cancer. This is because the class dependencies are evidenced in the high-frequency content of the image, especially the tumour budding details.

In the first step of the database preparation, an expert from the Pathological Anatomy Department carried out a manual segmentation to indicate possible areas of interest. At this point, it is important to highlight that the segmentation was performed in a very rough manner, as observed in Figure 2, in order to reduce the expert’s annotation time as much as possible. From here,
a patching algorithm was applied to extract cropped images with an optimal block size in terms of computational efficiency and structural content. Specifically, we extracted patches of dimensions 512 × 512 pixels, according to some of the most recent state-of-the-art studies focused on histopathological images [17, 36, 37]. Then, a preprocessing step was applied to discard the useless regions, corresponding to the background of the WSI, and select those patches containing more than 75% of annotated tissue. After this, a total of 2995 representative patches composed the unsupervised framework. For validation purposes, an expert manually labelled each patch as non-tumour (NT), mild (M) or infiltrative (I), according to the pattern criteria previously detailed in Section 1. The labelling process resulted in 763 non-tumour, 1470 mild and 762 infiltrative cases, as reported in Figure 2. It is essential to remark that we did not have access to the labelled data during the training phase since we propose a fully unsupervised strategy to achieve a self-learning of the patterns. Labels were only considered at test time to evaluate the models’ performance.

3. Methods

Recently, deep-clustering algorithms have risen to the forefront of unsupervised image-based techniques since they allow enhancing the feature learning while improving the clustering performance in a unified framework [28]. In this paper, we address a fully unsupervised self-learning strategy to cluster a large collection of unlabelled images into K = 3 groups corresponding to different severity levels of MIBC. Specifically, inspired by [28], we propose a novel deep convolutional embedded attention clustering (DCEAC) in which the features are updated in an online manner to learn stable representations for the clustering stage. Unlike conventional approaches [30], the proposed DCEAC algorithm optimizes the latent space by preserving the local structure of data, which helps to stabilize the clustering-learning process without distorting the embedding properties.

Self-learning methods aim to learn useful representations by leveraging the domain-specific knowledge from the unlabelled data to accomplish downstream tasks. This training procedure is usually addressed by solving pretext tasks [38], relational reasoning [39] or contrastive learning [40] approaches. In our bladder cancer scenario, we advocate for a sequential strategy that resorts to image reconstruction as an unsupervised pretext task. Specifically, we conduct a two-step learning methodology in which, first, a convolutional autoencoder (CAE) is trained to incorporate information about the domain properties of the histological patches. In a second phase, a clustering branch is included at the output of the CAE bottleneck to provide the class information from the embedded features, which are updated online by re-training the CAE in a combined network. Below, we detail both learning steps.

3.1. CAE pre-training

Autoencoder (AE) is one of the most common techniques for data representation, whose aim is to minimize the reconstruction error between inputs X and outputs R. AE architectures are composed of two training stages: encoder $f_\phi(\cdot)$ and decoder $g_\theta(\cdot)$, where $\phi$ and $\theta$ are learnable parameters. The encoder network applies a non-linear mapping function to extract a feature space Z from the input samples X, so that $f : X \to Z$. Then, the decoder structure is intended to reconstruct the input data from the embedded representations via $R = g_\theta(Z)$. The learning procedure is carried out by minimizing a reconstruction loss function.

Notice that AE architectures are usually defined by fully connected layers, intended to reduce the dimensionality of the feature space [30, 34], or by convolutional layers acting as a feature extractor from 2D or 3D input data [28]. Similarly to [28], we adopted a Convolutional AutoEncoder (CAE) architecture to address the reconstruction of the histological patches as a pretext task. However, our CAE differs from the current literature in a specific aspect of the network: the bottleneck. Unlike Guo et al. [28], who combined flatten operations with fully-connected layers at the middle of the CAE, we introduced a convolutional attention module through a residual connection to enhance the latent space for the downstream clustering task. As observed in Figure 3, we utilized an encoder composed of three stacked convolutional layers with a 3 × 3 receptive field (blue boxes). At the bottleneck, we defined an attention block composed of a tailored autoencoder which allows refining the embedded features in the spatial dimension. Specifically, the proposed module combines 1 × 1 convolutions (green boxes) with a sigmoid function (purple layer) intended to recalibrate the inputs. The inclusion of an identity shortcut forces the network to stabilize the feature space by propagating larger gradients to previous layers via skip connections. An additional 1 × 1 convolutional layer was included to modify the filter channel without affecting the dimension of the feature maps. In the decoder stage, we applied regularization operations between the transpose convolutional layers (yellow-contour boxes) through Batch Normalization to avoid the internal covariate shift [41]. A remarkable factor is that no pooling or upsampling

[Figure 2 diagram. Stages, left to right: CK AE1/3-stained WSIs; patching algorithm (512 × 512 blocks); useful patches selection; labelling (Healthy, Grade II, Grade III).]

Figure 2: Database preparation process. First, a patching algorithm was applied on 136 CK AE1/3-stained WSIs to extract sub-images of 512 × 512
pixels. For validation purposes, the resulting 2995 patches were labelled by an expert as non-tumour (NT), mild (M) or infiltrative (I) pattern to
give rise to a multi-class scenario for bladder cancer grading.
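
The patch-selection rule of the database preparation (512 × 512 tiles, kept only when more than 75% of their pixels fall inside the expert's rough annotation) can be sketched as follows. This is a minimal illustration and not the authors' code: the mask representation (a nested list of 0/1 values), the non-overlapping tiling and all function names are assumptions made for the example.

```python
def tile_origins(height, width, patch=512):
    """Top-left corners of non-overlapping patch x patch tiles (assumed tiling)."""
    return [(r, c)
            for r in range(0, height - patch + 1, patch)
            for c in range(0, width - patch + 1, patch)]


def annotated_fraction(mask, r, c, patch=512):
    """Fraction of pixels inside the tile that belong to the annotated region."""
    inside = sum(sum(row[c:c + patch]) for row in mask[r:r + patch])
    return inside / float(patch * patch)


def select_patches(mask, patch=512, min_fraction=0.75):
    """Keep tiles whose annotated fraction is strictly greater than min_fraction."""
    height, width = len(mask), len(mask[0])
    return [(r, c) for r, c in tile_origins(height, width, patch)
            if annotated_fraction(mask, r, c, patch) > min_fraction]


# Toy example: a 4 x 8 "slide" tiled with 4 x 4 blocks instead of 512 x 512.
toy_mask = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 0, 0],
]
kept = select_patches(toy_mask, patch=4, min_fraction=0.75)
```

In the toy mask, the left tile is 15/16 annotated and is kept, whereas the right tile is 4/16 annotated and is discarded.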

layers were used to adapt the dimensions of the feature maps after each convolutional step. Instead, we worked with a stride > 1 both in the encoder and decoder structures to provide a network with a higher capability of transformation by learning spatial subsampling.

As observed in Figure 3, given an input set of patches $X = \{x_1, x_2, ..., x_i, ..., x_N\}$, with N the number of samples per batch, the encoder network maps each input $x_i \in \mathbb{R}^{M \times M \times 3}$ into an embedded feature space $z_i = f_\phi(x_i)$ resulting from the attention module. At the end of the autoencoder network, the decoder function was trained to provide a reconstruction map $r_i = g_\theta(z_i)$ trying to minimize the mean squared error (MSE) between the input $x_i$ and the output $r_i$, according to Equation 1. Note that the histological patches were resized from $M_0 = 512$ to $M = 128$ to alleviate the GPU constraints during the training of the model.

$$L_r = \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - g_\theta(f_\phi(x_i)) \rVert^2 \qquad (1)$$

Learning details for the CAE pre-training.
Let $X = \{X_1, ..., X_b, ..., X_B\}$ be the training set composed of 2995 histological patches. The proposed CAE was trained for $\epsilon = 200$ epochs by applying a learning rate of 0.5 on B = 94 batches, with $X_b \subset X$ a single batch composed of N = 32 samples. The Adadelta optimizer [42] was used to update the reconstruction weights, trying to minimize the MSE loss function $L_r$ after each epoch e, as detailed in Algorithm 1.

Algorithm 1: CAE training.
    Data: Unlabelled training data set $X = \{X_1, ..., X_b, ..., X_B\}$
    Results: Trained convolutional autoencoder (CAE) parameters $\phi$ and $\theta$.
    $\phi, \theta \leftarrow$ random;
    for $e \leftarrow 1$ to $\epsilon$ do
        for $b \leftarrow 1$ to $B$ do
            $X \leftarrow X_b \subset X$;
            for $i \leftarrow 1$ to $N$ do
                $r_i \leftarrow g_\theta(f_\phi(x_i))$;
            $L_r \leftarrow \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - r_i \rVert^2$;
            Update $\phi, \theta$ using $\nabla_{\phi,\theta} L_r$

3.2. DCEAC training

In the pioneer deep-clustering work [30], the authors proposed a Deep Embedded Clustering (DEC) algorithm in which the decoder structure was discarded during the second stage corresponding to the clustering training. However, Guo et al. [28] demonstrated that fine-tuning just the encoder network could distort the feature space and hurt the classification performance. Instead, they kept the autoencoder untouched under the statement that AE architectures can avoid embedding distortion by preserving local information of data [43]. For that reason, we also propose a simultaneous learning process for both reconstruction and clustering branches to avoid the corruption of the feature space, similarly to [28].

Once the CAE was pre-trained in a first stage (Algorithm 1), we incorporated a clustering branch at the output of the CAE bottleneck giving rise to a Deep Convolutional Embedded Attention Clustering (DCEAC) able

[Figure 3 diagram. Labels: Encoder → $f_\phi(\cdot)$; Decoder → $g_\theta(\cdot)$; input $x_i$ and output $r_i$ of 128 × 128 × 3; feature maps of 128×128×64, 64×64×128 and 32×32×256 with strides 1 and 2; bottleneck with identity shortcut and maps of 32×32×64, 32×32×32, 32×32×1 and 32×32×256; batch normalization (BN) in the decoder.]

Figure 3: Architecture of the proposed CAE used for image reconstruction as a pretext task during the learning process.
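
The feature-map sizes annotated in Figure 3 follow from the strides of the three encoder convolutions. The short sketch below reproduces that arithmetic under two assumptions that the text does not state explicitly: 'same' padding, so that the output side equals ceil(input / stride), and the stride order 1, 2, 2 as read from the figure labels. The function names are illustrative only.

```python
import math


def same_padding_output(side, stride):
    """Spatial side length after a strided convolution with 'same' padding (assumed)."""
    return math.ceil(side / stride)


def encoder_shapes(input_side=128, filters=(64, 128, 256), strides=(1, 2, 2)):
    """Return (height, width, channels) after each encoder convolution."""
    shapes = []
    side = input_side
    for n_filters, stride in zip(filters, strides):
        side = same_padding_output(side, stride)
        shapes.append((side, side, n_filters))
    return shapes


# Input patches are resized to 128 x 128 x 3 before entering the encoder.
shapes = encoder_shapes()
```

Under these assumptions the bottleneck receives 32 × 32 maps with 256 channels, which agrees with H = W = 32 and C = 256 used for the clustering branch.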

to provide a soft label with class dependency. From the embedded representations $z_i = \{z_{i,1}, ..., z_{i,k}, ..., z_{i,C}\}$, with C = 256 the number of feature maps $z_{i,k} \in \mathbb{R}^{H \times W}$, we performed a spatial squeeze to obtain a feature vector $z'_i \in \mathbb{R}^{C}$ which leads to a better label assignment. As depicted in Figure 4, a Global Average Pooling (GAP) layer (faded green) was used as the projection function to reduce the feature maps $z_{i,k} \in \mathbb{R}^{H \times W}$, with H = W = 32, into the feature vector $z'_{i,k} \in \mathbb{R}^{1 \times 1}$ (see Equation 2).

$$z'_{i,k} = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} z_{i,k}(h, w) \qquad (2)$$

After the GAP operation, a clustering layer (red box in Figure 4) was included to map each embedded representation $z'_i$ into a soft label $q_{i,j}$, which represents the probability of $z'_i$ belonging to the cluster j. According to Equation 3, $q_{i,j}$ was calculated via Student’s t-distribution [44] by keeping the cluster centres $\{\mu_j\}_{j=1}^{K}$ as trainable parameters.

$$q_{i,j} = \frac{(1 + \lVert z'_i - \mu_j \rVert^2)^{-1}}{\sum_{j'} (1 + \lVert z'_i - \mu_{j'} \rVert^2)^{-1}} \qquad (3)$$

Note that the cluster centres were initialized by running the k-means technique on the embedded features $z'_i$, as detailed in Algorithm 2. From here, a normal target distribution $p_{i,j}$ (defined in Equation 4) was used as a ground truth during the training of the models.

$$p_{i,j} = \frac{q_{i,j}^2 / \sum_{i} q_{i,j}}{\sum_{j'} \left( q_{i,j'}^2 / \sum_{i} q_{i,j'} \right)} \qquad (4)$$

The learning framework for the proposed DCEAC (Algorithm 2) was conducted by minimizing a custom loss function (Equation 5), where $L_r$ and $L_c$ are the reconstruction and clustering losses, respectively. $\gamma > 0$ is a temperature parameter used to prevent the distortion of the feature space since $\gamma = 0$ would be equivalent to training just the convolutional autoencoder (CAE) architecture, as detailed in Section 3.1.

$$L = L_r + \gamma L_c \qquad (5)$$

Specifically, the clustering loss was defined as the Kullback-Leibler divergence $KL(P \Vert Q)$ according to Equation 6, whereas the mean squared error (MSE) was used as a reconstruction loss function.

$$L_c = \sum_{i} \sum_{j} p_{i,j} \log \frac{p_{i,j}}{q_{i,j}} \qquad (6)$$

As mentioned above, autoencoders are in charge of preserving the local structure of data, so the clustering term must provide just a slight contribution to updating the weights in order to avoid the corruption of the latent space. For that reason, we empirically set $\gamma = 0.3$ for all the experiments of the training process detailed in Algorithm 2.
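
As a rough illustration of Equations 3, 4 and 6, the sketch below computes the soft assignments, the target distribution and the KL clustering loss for a handful of toy feature vectors. It is written in plain Python for readability and is not the authors' TensorFlow implementation; the function names, the toy data and the two-dimensional features (instead of C = 256) are assumptions made for the example.

```python
import math


def soft_assignments(z, centres):
    """Equation 3: Student's t kernel between each z'_i and each centre mu_j."""
    q = []
    for z_i in z:
        kernel = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z_i, mu)))
                  for mu in centres]
        total = sum(kernel)
        q.append([k / total for k in kernel])
    return q


def target_distribution(q):
    """Equation 4: square q, divide by the soft cluster frequency, renormalise per sample."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_clustering_loss(p, q):
    """Equation 6: sum over samples i and clusters j of p_ij * log(p_ij / q_ij)."""
    return sum(p_ij * math.log(p_ij / q_ij)
               for p_row, q_row in zip(p, q)
               for p_ij, q_ij in zip(p_row, q_row) if p_ij > 0.0)


# Toy squeezed features z'_i and K = 3 hypothetical cluster centres.
z = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [4.0, 0.0]]
centres = [[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]
q = soft_assignments(z, centres)
p = target_distribution(q)
loss_c = kl_clustering_loss(p, q)
```

For this toy data the target distribution is sharper than the soft assignments, which is what pushes each sample towards its most likely cluster when the KL term is minimized.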

[Figure 4 diagram. Labels: input $x_i$ and output $r_i$ of 128 × 128 × 3; encoder and decoder feature maps of 128×128×64, 64×64×128 and 32×32×256 with strides 1 and 2; BN; bottleneck with identity shortcut and maps of 32×32×64, 32×32×32, 32×32×1 and 32×32×256; embedded maps $z_{i,k}$ and $z_i$; squeezed features $z'_{i,k}$ and $z'_i$; clustering layer producing $q_i$; target distribution $p_i$ computed from $q_{i,j}$.]

Figure 4: Architecture of the proposed Deep Convolutional Embedded Attention Clustering (DCEAC). The model is trained in an end-to-end
manner by minimizing both reconstruction and clustering loss functions. The reconstruction pretext task stabilizes the feature space zi avoiding the
embedding distortion, whereas the clustering term predicts the soft-class assignments qi .
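
The two remaining ingredients of the end-to-end objective, the spatial squeeze of Equation 2 and the combined loss of Equation 5, can be sketched in the same spirit. This is an illustrative plain-Python example under stated assumptions: feature maps are nested lists, inputs and reconstructions are flattened vectors, the clustering loss value is a made-up number, and the function names are hypothetical.

```python
def global_average_pooling(feature_map):
    """Equation 2: mean over the H x W positions of one feature map z_{i,k}."""
    height, width = len(feature_map), len(feature_map[0])
    return sum(sum(row) for row in feature_map) / float(height * width)


def squeeze(z_i):
    """Apply GAP to each of the C feature maps to obtain the vector z'_i."""
    return [global_average_pooling(feature_map) for feature_map in z_i]


def reconstruction_loss(x, r):
    """Equation 1: squared L2 distance between inputs and reconstructions,
    averaged over the N samples of the batch (samples given as flat lists)."""
    n_samples = len(x)
    return sum(sum((a - b) ** 2 for a, b in zip(x_i, r_i))
               for x_i, r_i in zip(x, r)) / float(n_samples)


def total_loss(loss_r, loss_c, gamma=0.3):
    """Equation 5, with the gamma = 0.3 reported in the text as default."""
    return loss_r + gamma * loss_c


# Toy embedding with C = 2 feature maps of size H = W = 2.
z_i = [[[1.0, 2.0], [3.0, 4.0]],
       [[0.0, 0.0], [0.0, 8.0]]]
z_prime = squeeze(z_i)

# Toy batch of N = 2 flattened inputs and reconstructions.
x = [[1.0, 0.0], [0.0, 1.0]]
r = [[0.5, 0.0], [0.0, 0.0]]
loss_r = reconstruction_loss(x, r)
loss = total_loss(loss_r, loss_c=0.2)  # 0.2 is an arbitrary clustering-loss value
```

Setting `gamma=0.0` in `total_loss` recovers the pure reconstruction objective of the CAE pre-training, as noted in Section 3.2.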

Learning details for DCEAC training.
As in the previous CAE pre-training, given an input batch $X_b$ of N = 32 samples, we made use of the Adadelta optimizer with a learning rate of 0.5 to minimize the custom loss function L. Concerning the software and hardware aspects, all models were developed using TensorFlow 2.3.1 on Python 3.6. The experiments were performed on a machine with an Intel(R) Core(TM) i7-9700 CPU @3.00GHz processor and 16 GB of RAM. For deep-learning algorithms, a single NVIDIA A100 Tensor Core GPU with cuDNN 7.5 and CUDA Toolkit 10.1 was used.

4. Experimental results

4.1. State of the art

In this section, we show a performance comparison between the proposed DCEAC model and the most relevant deep clustering-based works in the literature. In particular, we adapt the study carried out in [30], where the authors proposed a two-step learning strategy based on a Deep Embedded Clustering (DEC) model composed of fully connected layers. In the first step, they trained the autoencoder network to extract knowledge from the unlabelled image domain. In the second stage, once the image-specific information was encoded, Xie et al. [30] discarded the decoder structure to directly address the clustering phase from the learnt feature space, without considering the reconstruction error. However, later works, such as [28], claimed that convolutional autoencoders (CAEs) are more powerful than fully connected AEs for dealing with images. For that reason, we adapt the previous DEC methodology by including convolution operations instead of fully connected layers. To this end, we follow the methodology presented in [45], where stacked CAEs were originally proposed for hierarchical feature extraction. Therefore, in order to conduct a reliable state-of-the-art comparison, we fused both clustering [30] and CAE architectures [45] to provide a refined DEC model, from now on called rDEC.

We also replicated the experiments carried out by Guo et al. [28], who proposed a hybrid learning for deep clustering with convolutional autoencoders. The main difference with respect to the previous rDEC is that [28] keeps the decoder term untouched during the training of the models, giving rise to a hybrid framework that combines reconstruction $L_r$ and clustering $L_c$ losses. The idea behind this is that the embedded feature space in rDEC could be distorted by only using a clustering-oriented loss. So, they proposed leveraging the decoder structure to avoid the corruption of the latent space by considering the reconstruction error. Note that one of the main contributions of Guo et al. [28] lay in the proposed bottleneck, since they forced the dimension of the embedded features to be equal to the number of clusters through fully connected layers. However, this is not scalable to other classification problems with higher-dimensionality input images or with a small number of clusters. Specifically, they applied the
algorithms on the MNIST data set composed of samples $x_i \in \mathbb{R}^{28 \times 28 \times 1}$ and provide an embedded space $z_i$ with 10 features, according to the $K = 10$ number of clusters. However, in our case, we deal with images of 128 × 128 × 3 pixels, where the high resolution is essential for the classification performance, unlike for the MNIST data set. Additionally, we aim to classify the histological samples into $K = 3$ classes, so replicating the architecture of [28] is unfeasible, since the decoder term would be unable to reconstruct the images from just three feature values. For that reason, to enable a fair comparison with [28], we maintain the same architectures and training details proposed in this work, but remove the convolutional attention module, since it is one of our main contributions. Hereinafter, we refer to this approach as rDCEC.

Algorithm 2: DCEAC training.
Data: Unlabelled training data set $X = \{X_1, \ldots, X_b, \ldots, X_B\}$
Results: Cluster assignment $\hat{y}_i$ for each histological sample $x_i$.
Step 1: Cluster centers initialization
    $\phi, \theta \leftarrow$ pre-trained CAE parameters;
    $Z \leftarrow f_\phi(X)$;
    $\{\mu_j\}_{j=1}^{K} \leftarrow \mathrm{kmeans}(Z)$;
Step 2: DCEAC training
    for $e \leftarrow 1$ to the number of epochs do
        for $b \leftarrow 1$ to $B$ do
            $X \leftarrow X_b \subset X$;
            for $i \leftarrow 1$ to $N$ do
                $z_i \leftarrow f_\phi(x_i)$;
                $r_i \leftarrow g_\theta(z_i)$;
                $z'_i \leftarrow \mathrm{GAP}(z_i)$;
                $q_{i,j} \leftarrow \frac{(1 + \|z'_i - \mu_j\|^2)^{-1}}{\sum_{j'} (1 + \|z'_i - \mu_{j'}\|^2)^{-1}}$;
                $p_{i,j} \leftarrow \frac{q_{i,j}^2 / \sum_{i'} q_{i',j}}{\sum_{j'} \left( q_{i,j'}^2 / \sum_{i'} q_{i',j'} \right)}$;
            $L_r \leftarrow \frac{1}{N} \sum_{i=1}^{N} \|x_i - r_i\|^2$;
            $L_c \leftarrow \sum_i \sum_j p_{i,j} \log \frac{p_{i,j}}{q_{i,j}}$;
            $L \leftarrow L_r + \gamma L_c$;
            Update $\phi, \theta, \mu_j$ using $\nabla_{\phi,\theta,\mu_j} L$;
Step 3: Label prediction
    for $b \leftarrow 1$ to $B$ do
        $X \leftarrow X_b \subset X$;
        for $i \leftarrow 1$ to $N$ do
            $z'_i \leftarrow \mathrm{GAP}(f_\phi(x_i))$;
            $q_{i,j} \leftarrow \frac{(1 + \|z'_i - \mu_j\|^2)^{-1}}{\sum_{j'} (1 + \|z'_i - \mu_{j'}\|^2)^{-1}}$;
            $\hat{y}_i \leftarrow \arg\max_j (q_{i,j})$;

4.2. Quantitative results
In this section, we report the unsupervised classification performance achieved by the aforementioned rDEC [30] and rDCEC [28] algorithms in comparison with our proposed DCEAC model. A conventional method that runs the kmeans algorithm on the feature space was also considered, to quantify the performance gap between the proposed model and traditional techniques. This conventional approach is termed AE+kmeans. As observed in Tables 1 and 2, the comparison is carried out by means of different figures of merit: sensitivity (SN), specificity (SP), F-score (FS), accuracy (ACC) and area under the ROC curve (AUC). In particular, Table 1 shows results per class to evidence how well the four algorithms classify the 2995 histological patches with non-tumour (NT), mild (M) and infiltrative (I) patterns. Table 2 reports the classification results in terms of micro- and macro-average. Both metrics provide information about the overall average performance of the classification models, but the micro-average takes into account the imbalance between classes, which gives a more faithful perspective of the models' behaviour than the macro-average. To enhance the comparison between the learning approaches, we represent in Figure 5 the latent space arranged by each model with its respective confusion matrix. Note that, while the confusion matrix gives information about the classification capability of each model, the representation of the embedded space contributes to a more comprehensive view of the clustering scenario for bladder cancer grading. To this end, the t-distributed Stochastic Neighbor Embedding (t-SNE) tool was used to illustrate, in a 2D map, the correctly classified and misclassified embedded features, denoted by spots and crosses, respectively. Green, blue and red colours refer to the non-tumour, mild and infiltrative patterns, respectively.

4.3. Qualitative results
In an attempt to incorporate an interpretative perspective for the reported quantitative results, we computed the class activation maps (CAMs), which highlight the regions to which the model pays attention when predicting the class of each sample. CAMs can help to find hidden patterns associated with a specific class, or to determine whether the label prediction is based on the same patterns that clinicians use. In this way, the reported heatmaps lead to a better understanding of the embedded feature space by pointing out the areas of the histological patches that are decisive for the cluster assignment.
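As a rough illustration of the quantities that Algorithm 2 computes in Step 2 and Step 3, the following sketch evaluates the soft assignment $q_{i,j}$, the target distribution $p_{i,j}$, the two losses and the label prediction on a handful of toy vectors. It is a minimal, dependency-free sketch and not the authors' implementation: the function names, the toy embeddings, the toy reconstructions and the value of `gamma` are all assumptions made for the example, and no encoder, decoder or gradient update is included.

```python
import math

def soft_assignments(z, centers):
    """q[i][j]: Student's t kernel between embedding z[i] and centre j, normalised over j."""
    q = []
    for zi in z:
        kernel = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(zi, mu))) for mu in centers]
        total = sum(kernel)
        q.append([k / total for k in kernel])
    return q

def target_distribution(q):
    """p[i][j]: q[i][j]^2 divided by the soft cluster frequency, normalised over j."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def clustering_loss(p, q):
    """L_c = sum_i sum_j p_ij * log(p_ij / q_ij), i.e. KL(P || Q)."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row))

def reconstruction_loss(x, r):
    """L_r = (1/N) * sum_i ||x_i - r_i||^2."""
    return sum(sum((a - b) ** 2 for a, b in zip(xi, ri)) for xi, ri in zip(x, r)) / len(x)

# Toy values, invented for illustration only.
z = [[0.0, 0.1], [0.2, 0.0], [2.0, 2.1], [4.0, 0.1]]   # pooled embeddings z'_i
centers = [[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]          # cluster centres mu_j, K = 3
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]    # flattened inputs x_i
r = [[0.9, 0.1], [0.1, 0.8], [1.0, 0.9], [0.5, 0.4]]    # reconstructions r_i
gamma = 0.1                                             # hypothetical loss weight

q = soft_assignments(z, centers)
p = target_distribution(q)
total_loss = reconstruction_loss(x, r) + gamma * clustering_loss(p, q)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]  # argmax_j q_ij
print(labels, round(total_loss, 4))
```

In a real training loop these quantities would be computed on tensors inside an automatic-differentiation framework, so that the gradient of $L$ can update the encoder, the decoder and the cluster centres.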
Table 1: Unsupervised classification results per class.

       NON-TUMOUR                           MILD                                 INFILTRATIVE
       AE+kmeans  rDEC    rDCEC   DCEAC     AE+kmeans  rDEC    rDCEC   DCEAC     AE+kmeans  rDEC    rDCEC   DCEAC
SN     0.9345     1.0000  1.0000  0.9987    0.5020     0.8082  0.8952  0.9041    0.5105     0.4659  0.5105  0.6168
SP     0.9870     0.9319  0.9780  0.9978    0.7659     0.8262  0.7862  0.8118    0.6556     0.8782  0.9319  0.9364
FS     0.9475     0.9094  0.9689  0.9961    0.5754     0.8129  0.8458  0.8613    0.4052     0.5112  0.5971  0.6841
ACC    0.9736     0.9492  0.9836  0.9980    0.6364     0.8174  0.8397  0.8571    0.6187     0.7733  0.8247  0.8551

Table 2: Unsupervised classification results in terms of micro and macro-average.

       MICRO-AVERAGE                        MACRO-AVERAGE
       AE+kmeans  rDEC    rDCEC   DCEAC     AE+kmeans  rDEC    rDCEC   DCEAC
SN     0.6144     0.7699  0.8240  0.8551    0.6490     0.7580  0.8019  0.8399
SP     0.8072     0.8850  0.9120  0.9275    0.8028     0.8788  0.8987  0.9153
FS     0.6144     0.7699  0.8240  0.8551    0.6427     0.7445  0.8039  0.8472
ACC    0.7429     0.8466  0.8827  0.9034    0.7429     0.8466  0.8827  0.9034
AUC    0.7259     0.8184  0.8503  0.8776    0.7259     0.8184  0.8503  0.8776
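The relation between the two tables can be checked with a few lines of code. The sketch below takes the per-class sensitivity and accuracy values reported in Table 1 and computes their unweighted mean over the three classes, which appears to reproduce the macro-average SN and the ACC rows of Table 2. The dictionary layout and the helper name `macro_average` are assumptions made for this example. The micro-average cannot be recomputed this way, since it needs the per-class sample counts, which are not listed in the tables.

```python
# Per-class values copied from Table 1, in the order NT, M, I.
table1 = {
    "SN": {
        "AE+kmeans": [0.9345, 0.5020, 0.5105],
        "rDEC": [1.0, 0.8082, 0.4659],
        "rDCEC": [1.0, 0.8952, 0.5105],
        "DCEAC": [0.9987, 0.9041, 0.6168],
    },
    "ACC": {
        "AE+kmeans": [0.9736, 0.6364, 0.6187],
        "rDEC": [0.9492, 0.8174, 0.7733],
        "rDCEC": [0.9836, 0.8397, 0.8247],
        "DCEAC": [0.9980, 0.8571, 0.8551],
    },
}

def macro_average(values):
    """Unweighted mean over classes: every class counts the same, whatever its size."""
    return sum(values) / len(values)

macro = {
    metric: {method: round(macro_average(v), 4) for method, v in per_method.items()}
    for metric, per_method in table1.items()
}

for metric, per_method in macro.items():
    print(metric, per_method)
```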

[Figure 5 panels, left to right: AE+kmeans, rDEC, rDCEC, DCEAC. Each confusion matrix plots true label against predicted label for the classes NT, M and I.]

Figure 5: Representation of the latent space and confusion matrix derived from the clustering classification reached by each method.
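To make the link between a confusion matrix and the figures of merit explicit, the sketch below derives one-vs-rest sensitivity, specificity, F-score and accuracy for each class, and then contrasts the macro- and micro-averaged sensitivity. The confusion matrix is invented for illustration and does not correspond to any panel of Figure 5; the function name `one_vs_rest_metrics` is likewise an assumption.

```python
def one_vs_rest_metrics(cm, k):
    """Metrics for class k from a square confusion matrix (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # class-k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp     # other samples predicted as class k
    tn = total - tp - fn - fp
    return {
        "SN": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "FS": 2 * tp / (2 * tp + fp + fn),
        "ACC": (tp + tn) / total,
    }

# Hypothetical counts with an over-represented mild class (NOT the paper's data).
classes = ["NT", "M", "I"]
cm = [
    [90, 10, 0],    # true NT
    [20, 160, 20],  # true M
    [0, 10, 10],    # true I
]

per_class = {name: one_vs_rest_metrics(cm, k) for k, name in enumerate(classes)}

# Macro: average the per-class values. Micro: pool the counts before dividing.
macro_sn = sum(m["SN"] for m in per_class.values()) / len(classes)
micro_sn = sum(cm[k][k] for k in range(len(classes))) / sum(sum(row) for row in cm)

print(per_class)
print("macro SN:", macro_sn, "micro SN:", micro_sn)
```

With these toy counts the small infiltrative class has the lowest sensitivity, so it pulls the macro-average down more than the micro-average, which weights each class by its number of samples.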

As deduced from Figure 5, the biggest challenge of muscle-invasive bladder cancer (MIBC) grading lies in the distinction between the mild (M) and infiltrative (I) cancerous patterns, as expected. For that reason, in Figure 6, we report several examples of heatmaps corresponding to misclassified samples to elucidate why the proposed model fails. We also show CAMs of correctly predicted samples to evidence the relevant structures to which the network pays attention when its prediction is right. Specifically, we show five examples per case to make clear the criteria followed by the proposed model to determine the class. In the green frame of Figure 6, we illustrate well-classified mild (a-e) and infiltrative (f-j) histological patterns. Additionally, in the red frame, we show bladder cancer samples with a mild pattern misclassified as tumour budding (k-o), and vice versa (p-t). The findings from the class activation maps will be discussed in Section 5.

[Figure 6 panels: (a-e) well-predicted mild pattern; (f-j) well-predicted infiltrative pattern; (k-o) mild pattern wrongly predicted as infiltrative pattern; (p-t) infiltrative pattern wrongly predicted as mild pattern.]

Figure 6: Class activation maps highlighting the regions that the proposed DCEAC model considers relevant for the class prediction. The green frame refers to well-predicted images with mild (M) and infiltrative (I) patterns, whereas the red frame corresponds to the misclassified samples in which the aggressiveness of the disease has been confused.
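For readers unfamiliar with class activation maps, the following sketch shows the generic computation in the sense of Zhou et al. [35]: a weighted sum of the last convolutional feature maps, rescaled to the range [0, 1] so that it can be overlaid on the patch as a heatmap. How the per-channel weights are obtained in the DCEAC model is not detailed in this section, so the weights, the tiny 2 × 2 feature maps and the function name below are purely hypothetical.

```python
def class_activation_map(feature_maps, weights):
    """Weighted sum of K feature maps (each H x W), min-max rescaled to [0, 1]."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [
        [sum(w * fmap[y][x] for w, fmap in zip(weights, feature_maps)) for x in range(width)]
        for y in range(height)
    ]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant map
    return [[(value - lo) / span for value in row] for row in cam]

# Two hypothetical 2 x 2 feature maps and hypothetical channel weights for one class.
feature_maps = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [2.0, 0.0]],
]
weights = [0.5, 1.0]

cam = class_activation_map(feature_maps, weights)
for row in cam:
    print(row)
```

In practice the low-resolution map would be upsampled to the size of the input patch before being displayed over the histological image.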

5. Discussion

5.1. About quantitative results
From Table 1, we can observe that all the compared models work well for detecting the non-tumour class, including the conventional kmeans. However, the proposed DCEAC model reaches the highest performance for all the metrics except sensitivity, since the model misclassifies one non-tumour sample, as reported in the DCEAC confusion matrix of Figure 5. Regarding the mild class, the proposed model also provides the best behaviour. Only rDEC surpasses it, by about 1% in specificity (0.8262 versus 0.8118), but at the cost of a sensitivity about 10% lower than that of the proposed DCEAC (0.8082 versus 0.9041). As observed in Table 1, the advantage of our model is even more remarkable when discerning the infiltrative pattern. DCEAC shows the best results for all the metrics, especially for sensitivity and F-score, where it exceeds the rest of the approaches by roughly 10%.

Table 2 reports the overall performance of the models, in terms of micro- and macro-average. As mentioned above, the micro-average results take into account the imbalance between classes, which is an important aspect in this study, since samples with a mild pattern are over-represented. Nevertheless, the proposed DCEAC model consistently outperforms the rest of the clustering methods by 2-3% across both micro- and macro-average metrics, as can be seen in Table 2.
As a final remark concerning the quantitative results, it should be highlighted that the expert's decision coincides with the proposed artificial intelligence system in 90.34% of the cases, according to the average accuracy.

A reinforcement of the quantitative results is reported in Figure 5. From the confusion matrices, the colour range clearly shows that all the models tend to confuse mild and infiltrative cancerous patterns. The kmeans algorithm has a very low capability of discerning between cancerous samples, since most of the images are predicted as a mild pattern because that class is over-represented. This changes when deep-clustering algorithms are applied. In particular, the rDEC model improves the classification of cancerous images but markedly degrades the non-tumour class by misclassifying samples with an infiltrative pattern as non-tumour. Along these lines, the rDCEC model improves the results by notably decreasing the number of tumour budding samples wrongly predicted as non-tumour cases. In addition, rDCEC also increases the number of true positives for tumour samples. However, this model shows a significant weakness when predicting the infiltrative patterns, since a considerable number of them are wrongly labelled as mild. The proposed DCEAC model provides the best classification results. The number of tumour budding samples misclassified as non-tumour cases decreases almost to a minimum, unlike in the previous cases. Moreover, the number of true positives for mild and infiltrative patterns reaches the highest values, while the false positives and negatives are reduced.

The representation of the embedded feature space offers a visual perspective on the quantitative results. We can observe that the kmeans algorithm is able to roughly discern between non-tumour and cancerous histological samples. However, the point cloud is too diffuse to separate mild and infiltrative classes. In contrast, the rDEC model shows a better distribution of the embedded data, although the features relative to each class still remain close together in the latent space. This improves in the case of the rDCEC model, where independent clusters begin to be appreciated. The non-tumour features (denoted by the green colour) are set apart in the representation space and the embedded tumour samples begin to scatter towards different cluster classes. The proposed DCEAC model provides the best embedding representation, since the features are distributed along the latent space forming independent clusters according to a specific class. This fact further strengthens our confidence in the ability of the proposed model to discern between non-tumour, mild and infiltrative histological patterns.

From the previous in-depth analysis of the quantitative and interpretative results, we can draw several findings. The first is that the use of deep-learning techniques improves the classification performance with respect to conventional clustering approaches. As expected, all the deep clustering-based methods, i.e. rDEC, rDCEC and DCEAC, outperform the baseline based on the kmeans algorithm. This is because these models enable a deeper learning stage in which the embedded features are adjusted to a target distribution, unlike the kmeans algorithm, which modifies the clusters iteratively without updating the learnt features. Additionally, we can observe that models with both reconstruction and clustering branches integrated into a unified framework provide better results than the rDEC model, which addresses the learning process in two independent stages. The reason why rDCEC and DCEAC outperform rDEC lies in the preservation of the local structure of the embedded data. Since the rDCEC and DCEAC models have a connected output between the clustering and reconstruction stages, the clustering term can transfer class information to the reconstruction term, which is in charge of updating the weights of the encoder network. In this way, the embedded features can be optimized by incorporating the class prediction without distorting the latent space, thanks to the decoder structure. Finally, the proposed DCEAC model shows substantial performance improvements over the rest of the approaches. This is due to the inclusion of the convolutional attention block, which refines the latent space to provide more suitable features for the clustering phase.

5.2. About qualitative results
As can be seen in the class activation maps (CAMs) reported in Figure 6, the proposed DCEAC model focuses on tumour cell nests (Figure 6 (a-c)) and interconnected tumour bands (Figure 6 (d-e)) when predicting samples with a mild pattern. This implies that the proposed network has learnt by itself to associate nodular and trabecular structures with a mild pattern of the disease. Additionally, the DCEAC model recognises small sets of isolated buds (Figure 6 (f-g)) or tumour cell strands (Figure 6 (h-j)) as characteristic structures of the infiltrative pattern, a.k.a. tumour budding. These findings are evidenced in the green frame of the heat maps corresponding to well-predicted samples.

In the case of the wrong predictions (red frame in Figure 6), we can observe that the proposed network maintains consistency when determining the class of each sample. The histological patches of Figure 6 (k-o) show an appearance more similar to the infiltrative patterns,
so the network highlights small strands reminiscent of tumour budding structures. However, the true label assigned by the expert for these samples was a mild pattern. At this point, the qualitative results become interesting because the model's suggestions can lead pathologists to reconsider their diagnosis in some doubtful cases, as a second opinion. Besides, the human eye is susceptible to fatigue, so the proposed system could help in cases where some patterns have gone unnoticed, in order to avoid a biased diagnosis.

On the other hand, in the cases of Figure 6 (p-t), samples with an infiltrative pattern are wrongly predicted by the model as mild cases. In these histological patches, the proposed network focuses on bigger structures related to nodular or trabecular patterns, but it ignores the small isolated tumour cells that lead to a higher severity of bladder cancer. From here, we can deduce that, although the final prediction could be wrong, the pattern recognition accomplished by the model maintains consistency. Notice that the model fails because patterns belonging to different classes coexist in the same histological patch, so in future research lines we will address this long-standing problem.

In summary, the proposed DCEAC model demonstrates, via class activation maps, a high-confidence prediction, since it is able to focus on the same patterns as clinicians without having previous information from them. As mentioned above, the opinion of the expert and the proposed model match in most of the cases (specifically, 90.34% of the time). Thus, the artificial intelligence system could serve as a computer-aided system for reviewing processes, which would improve the diagnosis quality without involving other experts. Also, the proposed system could help inexperienced pathologists by suggesting areas of interest with a convincing label.

6. Conclusion
In this paper, we have proposed a novel self-learning framework based on deep-clustering techniques to grade the severity of bladder cancer through histological samples. Immunohistochemistry staining methods were applied to the images to enhance the non-tumour, mild and infiltrative patterns, according to the literature. We resorted to a hybrid model that combines a clustering branch with the reconstruction term, used as a pretext task to preserve the local structure of the features. To stand out from the state of the art, we introduced a convolutional attention block that refines the feature space to lead to a better unsupervised classification. The proposed Deep Convolutional Embedded Attention Clustering (DCEAC) has been shown to outperform previous clustering-based methods, achieving an average accuracy of 0.9034 when grading the aggressiveness of muscle-invasive bladder cancer (MIBC). Additionally, the reported class activation maps (CAMs) show that the proposed system is able to learn by itself the same structures as clinicians to associate the patterns with the correct severity degree of the disease, without incurring prior annotation steps. Along these lines, our fully unsupervised perspective bridges the gap with respect to other supervised algorithms, since the proposed system does not require expert involvement to be trained.

In future research lines, we will work on improving the accuracy for tumour samples when structures of different growth patterns appear in the same image. In addition, we will use more powerful hardware systems to process entire high-resolution Whole Slide Images to provide a diagnosis per biopsy, instead of per patch.

Funding
This work has been partially funded by SICAP project (DPI2016-77869-C2-1-R) and GVA through project PROMETEO/2019/109. The work of Gabriel García has been supported by the Spanish State Research Agency PTA2017-14610-I. The equipment used for this research has been funded by the European Union within the operating Program ERDF of the Valencian Community 2014-2020 with the grant number IDIFEDER/2020/030.

References
[1] S. Antoni, J. Ferlay, I. Soerjomataram, A. Znaor, A. Jemal, F. Bray, Bladder cancer incidence and mortality: a global overview and recent trends, European urology 71 (1) (2017) 96–108.
[2] L. Lorenzo, Valor pronóstico de la presencia de un componente tumoral indiferenciado ("tumor budding") en pacientes con carcinoma vesical músculo-invasivo, Ph.D. thesis, Universitat de València (2018).
[3] G. Feil, A. Stenzl, Pruebas de marcadores tumorales en el cáncer de vejiga, Actas Urológicas Españolas 30 (1) (2006) 38–45.
[4] S. Sharma, P. Ksheersagar, P. Sharma, Diagnosis and treatment of bladder cancer, American family physician 80 (7) (2009) 717–723.
[5] A. Stenzl, N. Cowan, M. De Santis, M. Kuczyk, A. Merseburger, M. Ribal, A. Sherif, J. Witjes, Guía clínica sobre el cáncer de vejiga con invasión muscular y metastásico, European Association of Urology (2010).
[6] C. Busch, F. Algaba, The WHO/ISUP 1998 and WHO 1999 systems for malignancy grading of bladder cancer. Scientific foundation and translation to one another and previous systems, Virchows Archiv 441 (2) (2002) 105–108.
[7] R. E. Jimenez, E. Gheiler, P. Oskanian, R. Tiguert, W. Sakr, D. P. Wood Jr, J. E. Pontes, D. J. Grignon, Grading the invasive

component of urothelial carcinoma of the bladder and its relationship with progression-free survival, The American journal of surgical pathology 24 (7) (2000) 980–987.
[8] A. Almangush, M. Karhunen, S. Hautaniemi, T. Salo, I. Leivo, Prognostic value of tumour budding in oesophageal cancer: a meta-analysis, Histopathology 68 (2) (2016) 173–182.
[9] E. Karamitopoulou, I. Zlobec, D. Born, A. Kondi-Pafiti, P. Lykoudis, A. Mellou, K. Gennatas, B. Gloor, A. Lugli, Tumour budding is a strong and independent prognostic factor in pancreatic cancer, European journal of cancer 49 (5) (2013) 1032–1039.
[10] R. Masuda, H. Kijima, N. Imamura, N. Aruga, Y. Nakamura, D. Masuda, H. Takeichi, N. Kato, T. Nakagawa, M. Tanaka, et al., Tumor budding is a significant indicator of a poor prognosis in lung squamous cell carcinoma patients, Molecular medicine reports 6 (5) (2012) 937–943.
[11] K. Fukumoto, E. Kikuchi, S. Mikami, K. Ogihara, K. Matsumoto, A. Miyajima, M. Oya, Tumor budding, a novel prognostic indicator for predicting stage progression in T1 bladder cancers, Cancer science 107 (9) (2016) 1338–1344.
[12] R. Wetteland, K. Engan, T. Eftestøl, V. Kvikstad, E. A. Janssen, Multiclass tissue classification of whole-slide histological images using convolutional neural networks, ICPRAM 1 (2019) 320–327.
[13] Z. Zhang, P. Chen, M. McGough, F. Xing, C. Wang, M. Bui, Y. Xie, M. Sapkota, L. Cui, J. Dhillon, et al., Pathologist-level interpretable whole-slide cancer diagnosis with deep learning, Nature Machine Intelligence 1 (5) (2019) 236–245.
[14] M. Lucas, I. Jansen, T. G. van Leeuwen, J. R. Oddens, D. M. de Bruin, H. A. Marquering, Deep learning–based recurrence prediction in patients with non–muscle-invasive bladder cancer, European Urology Focus (2020).
[15] P.-N. Yin, K. Kishan, S. Wei, Q. Yu, R. Li, A. R. Haake, H. Miyamoto, F. Cui, Histopathological distinction of non-invasive and invasive bladder cancers using machine learning approaches, BMC medical informatics and decision making 20 (1) (2020) 1–11.
[16] J. Dolz, X. Xu, J. Rony, J. Yuan, Y. Liu, E. Granger, C. Desrosiers, X. Zhang, I. Ben Ayed, H. Lu, Multiregion segmentation of bladder cancer structures in MRI with progressive dilated convolutional networks, Medical physics 45 (12) (2018) 5482–5493.
[17] A.-C. Woerl, M. Eckstein, J. Geiger, D. C. Wagner, T. Daher, P. Stenzel, A. Fernandez, A. Hartmann, M. Wand, W. Roth, et al., Deep learning predicts molecular subtype of muscle-invasive bladder cancer from conventional histopathological slides, European urology 78 (2) (2020) 256–264.
[18] H. Xu, S. Park, S. H. Lee, T. H. Hwang, Using transfer learning on whole slide images to predict tumor mutational burden in bladder cancer patients, bioRxiv (2019) 554527.
[19] S. A. Harmon, T. H. Sanford, G. T. Brown, C. Yang, S. Mehralivand, J. M. Jacob, V. A. Valera, J. H. Shih, P. K. Agarwal, P. L. Choyke, et al., Multiresolution application of artificial intelligence in digital pathology for prediction of positive lymph nodes from primary tumors in bladder cancer, JCO clinical cancer informatics 4 (2020) 367–382.
[20] Y. Yang, X. Zou, Y. Wang, X. Ma, Application of deep learning as a noninvasive tool to differentiate muscle-invasive bladder cancer and non–muscle-invasive bladder cancer with CT, European Journal of Radiology (2021) 109666.
[21] A. Ikeda, H. Nosato, Y. Kochi, T. Kojima, K. Kawai, H. Sakanashi, M. Murakawa, H. Nishiyama, Support system of cystoscopic diagnosis for bladder cancer based on artificial intelligence, Journal of endourology 34 (3) (2020) 352–358.
[22] R. Yang, Y. Du, X. Weng, Z. Chen, S. Wang, X. Liu, Automatic recognition of bladder tumours using deep learning technology and its clinical application, The International Journal of Medical Robotics and Computer Assisted Surgery (2020) e2194.
[23] F. Prall, H. Nizze, M. Barten, Tumour budding as prognostic factor in stage I/II colorectal carcinoma, Histopathology 47 (1) (2005) 17–24.
[24] A. Lugli, E. Karamitopoulou, I. Panayiotides, P. Karakitsos, G. Rallis, G. Peros, G. Iezzi, G. Spagnoli, M. Bihl, L. Terracciano, et al., CD8+ lymphocytes/tumour-budding index: an independent prognostic factor representing a 'pro-/anti-tumour' approach to tumour host interaction in colorectal cancer, British journal of cancer 101 (8) (2009) 1382–1392.
[25] T. Ogawa, T. Yoshida, T. Tsuruta, W. Tokuyama, S. Adachi, M. Kikuchi, T. Mikami, K. Saigenji, I. Okayasu, Tumor budding is predictive of lymphatic involvement and lymph node metastases in submucosal invasive colorectal adenocarcinomas and in non-polypoid compared with polypoid growths, Scandinavian journal of gastroenterology 44 (5) (2009) 605–614.
[26] I. Zlobec, M. P. Bihl, A. Foerster, A. Rufle, A. Lugli, The impact of CpG island methylator phenotype and microsatellite instability on tumour budding in colorectal cancer, Histopathology 61 (5) (2012) 777–787.
[27] N. Brieu, C. G. Gavriel, I. P. Nearchou, D. J. Harrison, G. Schmidt, P. D. Caie, Automated tumour budding quantification by machine learning augments TNM staging in muscle-invasive bladder cancer prognosis, Scientific reports 9 (1) (2019) 1–11.
[28] X. Guo, X. Liu, E. Zhu, J. Yin, Deep clustering with convolutional autoencoders, in: International conference on neural information processing, Springer, 2017, pp. 373–382.
[29] X. Guo, E. Zhu, X. Liu, J. Yin, Deep embedded clustering with data augmentation, in: Asian conference on machine learning, PMLR, 2018, pp. 550–565.
[30] J. Xie, R. Girshick, A. Farhadi, Unsupervised deep embedding for clustering analysis, in: International conference on machine learning, PMLR, 2016, pp. 478–487.
[31] J. Enguehard, P. O'Halloran, A. Gholipour, Semi-supervised learning with deep embedded clustering for image classification and segmentation, IEEE Access 7 (2019) 11093–11104.
[32] J. R. Hershey, Z. Chen, J. Le Roux, S. Watanabe, Deep clustering: Discriminative embeddings for segmentation and separation, in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016, pp. 31–35.
[33] B. H. Prasetio, H. Tamura, K. Tanno, A deep time-delay embedded algorithm for unsupervised stress speech clustering, in: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), IEEE, 2019, pp. 1193–1198.
[34] R. del Amor, A. Colomer, C. Monteagudo, V. Naranjo, A deep embedded refined clustering approach for breast cancer distinction based on DNA methylation, arXiv preprint arXiv:2102.09563 (2021).
[35] B. Zhou, A. Khosla, A. Lapedriza, et al., Learning deep features for discriminative localization, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921–2929.
[36] R. del Amor, L. Launet, A. Colomer, A. Moscardó, A. Mosquera-Zamudio, C. Monteagudo, V. Naranjo, An attention-based weakly supervised framework for spitzoid melanocytic lesion diagnosis in WSI, arXiv preprint arXiv:2104.09878 (2021).
[37] J. Silva-Rodriguez, A. Colomer, J. Dolz, V. Naranjo, Self-learning for weakly supervised Gleason grading of local patterns, IEEE journal of biomedical and health informatics (2021).
[38] S. Gidaris, P. Singh, N. Komodakis, Unsupervised representation learning by predicting image rotations, arXiv preprint

arXiv:1803.07728 (2018).
[39] M. Patacchiola, A. Storkey, Self-supervised relational reasoning for representation learning, arXiv preprint arXiv:2006.05849 (2020).
[40] T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A simple framework for contrastive learning of visual representations, in: International conference on machine learning, PMLR, 2020, pp. 1597–1607.
[41] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: International conference on machine learning, PMLR, 2015, pp. 448–456.
[42] M. D. Zeiler, Adadelta: an adaptive learning rate method, arXiv preprint arXiv:1212.5701 (2012).
[43] X. Peng, S. Xiao, J. Feng, W.-Y. Yau, Z. Yi, Deep subspace clustering with sparsity prior, in: IJCAI, 2016, pp. 1925–1931.
[44] L. Van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of machine learning research 9 (11) (2008).
[45] J. Masci, U. Meier, D. Cireşan, J. Schmidhuber, Stacked convolutional auto-encoders for hierarchical feature extraction, in: International conference on artificial neural networks, Springer, 2011, pp. 52–59.
