Subject Section

Attentive Variational Information Bottleneck for TCR-peptide Interaction Prediction

Filippo Grazioli 1,∗, Pierre Machart 1, Anja Mösch 1, Kai Li 2, Leonardo Castorina 3, Nico Pfeifer 4 and Martin Renqiang Min 2,∗

1 NEC Laboratories Europe, Heidelberg, Germany, 2 NEC Laboratories America, Princeton, NJ 08540, USA, 3 School of Informatics, University of Edinburgh, Edinburgh, United Kingdom and 4 University of Tübingen, Tübingen, Germany.
∗ To whom correspondence should be addressed.
Associate Editor: XXXXXXX
Received on XXXXX; revised on XXXXX; accepted on XXXXX

Abstract
Motivation: We present a multi-sequence generalization of Variational Information Bottleneck (VIB) (Alemi
et al., 2016) and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB
model leverages multi-head self-attention (Vaswani et al., 2017) to implicitly approximate a posterior
distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to a fundamental
immuno-oncology problem: predicting the interactions between T cell receptors (TCRs) and peptides.
Results: Experimental results on various datasets show that AVIB significantly outperforms state-of-the-art
methods for TCR-peptide interaction prediction. Additionally, we show that the latent posterior distribution
learned by AVIB is particularly effective for the unsupervised detection of out-of-distribution (OOD) amino
acid sequences.
Availability: The code and the data used for this study are publicly available at: https://github.com/nec-research/vibtcr.
Contact: filippo.grazioli@neclab.eu, renqiang@nec-labs.com
Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Predicting whether T cells recognize peptides presented on cells is a fundamental step towards the development of personalized treatments to enhance the immune response, like therapeutic cancer vaccines (Slansky et al., 2000; Meng and Butterfield, 2002; McMahan et al., 2006; Corse et al., 2011; Buhrman and Slansky, 2013; Hundal et al., 2020). In the human immune system, T cells monitor the health status of cells by identifying foreign peptides on their surface (Davis and Bjorkman, 1988; Krogsgaard and Davis, 2005). T cell receptors (TCRs) are able to bind to these peptides, especially if they originate from an infected or cancerous cell. The binding of TCRs - also known as TCR recognition - with peptides, presented by major histocompatibility complex (MHC) molecules in peptide-MHC (pMHC) complexes, constitutes a necessary step for immune response (Rowen et al., 1996; Glanville et al., 2017). Only if TCR recognition takes place can cytokines be released, which leads to the death of a target cell.

TCRs consist of an α- and a β-chain whose structures determine the interaction with the pMHC complex. Each chain consists of three loops, referred to as complementarity determining regions (CDR1-3). It is believed that the CDR3 loops primarily interact with the peptide of a given pMHC complex (Feng et al., 2007; Rossjohn et al., 2015; La Gruta et al., 2018). Supplementary Materials 1 depict the 3D structure of a TCR-pMHC complex.

Recent discoveries (Dash et al., 2017; Lanzarotti et al., 2019) have demonstrated that both the CDR3α- and β-chains carry information on the specificity of the TCR toward its cognate pMHC target. Obtaining information about paired TCR α- and β-chains requires specific and expensive experiments, like single cell (SC) sequencing, which limits its availability. Conversely, the bulk sequencing of a population of cells reactive to a peptide is cheaper, but it only yields information about either the α- or the β-chain.

© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.


In this work, we propose Attentive Variational Information Bottleneck (AVIB) to predict TCR-peptide interactions. AVIB is a multi-sequence generalization of Variational Information Bottleneck (VIB) (Alemi et al., 2016). Notably, we introduce Attention of Experts (AoE), a novel method for combining single-sequence latent distributions into a joint multi-sequence latent encoding distribution using self-attention. Owing to its design, AoE can naturally leverage the abundant available data where either the CDR3α or the CDR3β sequence is missing when estimating the multi-sequence variational posterior. The model learns to predict whether the binding between the peptide and the TCR takes place or not. Extensive experiments demonstrate that AVIB significantly outperforms state-of-the-art methods. In addition, the probabilistic nature of the VIB framework allows providing estimates of the uncertainty of AVIB's predictions. We empirically show that AVIB can be used for out-of-distribution (OOD) detection of amino acid sequences without supervision.

1.1 Background and Related Works

1.1.1 TCR-pMHC and TCR-peptide Interaction Prediction
Several recent works have investigated TCR-pMHC and TCR-peptide interaction prediction. Various proposed approaches operate simple CDR3β sequence alignment (Wong et al., 2019; Chronister et al., 2021). TCRdist computes CDR similarity-weighted distances (Dash et al., 2017). SETE adopts k-mer feature spaces in combination with principal component analysis (PCA) and decision trees (Tong et al., 2020). Various methods adopt Random Forest to operate classification (De Neuter et al., 2018; Gielis et al., 2019; Springer et al., 2020). ImRex tackles the problem with a method based on convolutional neural networks (CNNs) (Moris et al., 2019). TCRGP is a classification method which leverages a Gaussian process (Jokinen et al., 2019). ERGO is a deep learning approach which adopts long short-term memory (LSTM) networks and autoencoders to compute representations of peptide and CDR3β (Springer et al., 2020). ERGO II (Springer et al., 2021) is an updated version of ERGO which considers additional input data, i.e. CDR3α sequence, V and J genes, MHC and T cell type. NetTCR-1.0 (Jurtz et al., 2018) and NetTCR-2.0 (Montemurro et al., 2021) propose a simple 1D CNN-based model, integrating peptide and CDR3 sequence information for the prediction of TCR-peptide specificity. TITAN (Weber et al., 2021) is a bimodal neural network that explicitly encodes β-chain and peptide; it leverages transfer learning and SMILES (Weininger et al., 1989) encoding to achieve good generalization.

1.1.2 Deep Multimodal Variational Inference
The problem investigated in this work consists in predicting whether multiple sequences of amino acids, i.e. a peptide and the CDR3s, bind. A single sequence is not informative of whether binding takes place or not, when observed alone. As a consequence, binding prediction cannot be framed as a classical multimodal learning problem. Nevertheless, this work presents a strong relationship with multimodal variational inference and takes inspiration from it. In this section, related works from both the supervised and self-supervised learning domains are presented.

Self-supervised learning. Deep neural networks proved to be successful at modeling probability distributions in the context of Variational Bayes (VB) methods. The Variational Autoencoder (VAE) (Kingma and Welling, 2013) jointly trains a generative model from latent variables to observations with an inference network from observations to latent variables. Multimodal generalizations of the VAE shall tackle the problem of learning a joint posterior distribution of the latent variable conditioned on multiple input modalities. The Multimodal Variational Autoencoder (MVAE) (Wu and Goodman, 2018) models the joint posterior as a Product of Experts (PoE) over the marginal posteriors, enabling cross-modal generation at test time. The Mixture-of-experts Multimodal Variational Autoencoder (MMVAE) (Shi et al., 2019) factorizes the joint variational posterior as a combination of unimodal posteriors, using a Mixture of Experts (MoE). MoE-based models have been used in the biomedical field to tackle challenges such as protein-protein interactions (Qi et al., 2007), biomolecular sequence annotation (Caragea et al., 2009), and clustering cell phenotypes from single-cell data (Kopf et al., 2021). Their main advantage is that they can infer global patterns in the genetic or peptide sequences in supervised and unsupervised settings (Kopf et al., 2021).

Supervised learning. The Variational Information Bottleneck (VIB) (Alemi et al., 2016) is to supervised learning what the β-VAE (Higgins et al., 2017) is to unsupervised learning. VIB leverages variational inference to construct a lower bound on the Information Bottleneck (IB) objective (Tishby et al., 2000). By applying the reparameterization trick (Kingma and Welling, 2013), Monte Carlo sampling is used to get an unbiased estimate of the gradient of the VIB objective. This allows using stochastic gradient descent to optimize the objective. Various multimodal generalizations of the VIB have been recently proposed: the Multimodal Variational Information Bottleneck (MVIB) (Grazioli et al., 2022a) and DeepIMV (Lee and Schaar, 2021). Both MVIB and DeepIMV adopt the PoE to estimate a joint multimodal latent encoding distribution from the unimodal latent encoding distributions. In contrast, our AVIB model predicts interactions among multiple input sequences. This involves modeling complex relations among different sequences (analogous to but not the same as modalities) with powerful and flexible multi-head self-attention, for which PoE is a sub-optimal choice.

2 Materials and Methods

Let Y be a random variable representing a ground truth label associated with an input random variable X. Let Z be a stochastic encoding of X defined by an encoder pθ(Z|x) parameterized by a neural network¹. Following Tishby et al. (2000), our goal consists in learning an encoding Z which is (a) maximally informative about Y and (b) maximally compressive about X. Using an information theoretic approach, we obtain the IB objective with p(X, Y, Z) = p(Z|X)p(Y|X)p(X):

\max_{\theta} R_{IB}(\theta) = I(Z, Y; \theta) - \beta I(Z, X; \theta), \qquad (1)

where β ≥ 0 is a Lagrange multiplier controlling the trade-off between (a) and (b), and I(Z, Y; θ) is the mutual information between Z and Y parameterized by θ:

I(Z, Y; \theta) = \int p(z, y \mid \theta) \log \frac{p(z, y \mid \theta)}{p(z \mid \theta)\, p(y \mid \theta)} \, dy \, dz. \qquad (2)

As derived in Alemi et al. (2016), assuming qϕ(y|z) and rω(z) are variational approximations of the true p(y|z) and p(z), respectively, Equation 1 can be rewritten as:

J_{VIB} = \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}_{\epsilon \sim p(\epsilon)} \left[ -\log q_{\phi}\big(y^{n} \mid f(x^{n}, \epsilon; \theta)\big) \right] + \beta \, D_{KL}\big(p_{\theta}(Z \mid x^{n}) \,\|\, r_{\omega}(Z)\big), \qquad (3)

where ϵ ∼ N(0, I) is an auxiliary Gaussian noise variable, DKL is the Kullback-Leibler divergence and f(·; θ) is a vector-valued parametric deterministic encoding function (here a neural network).

¹ Notation: X, Y, Z are random variables; x, y, z are their realizations; f(·; θ) and pθ(·) are functions and probability distributions parameterized by a vector θ; S represents a set.


The reparameterization trick (Kingma and Welling, 2013) introduces ϵ and allows writing pθ(z|x)dz = p(ϵ)dϵ, where z = f(x, ϵ; θ) is now treated as a deterministic variable. This formulation makes the noise variable independent of the model parameters, so that we can compute gradients of the objective in Equation 3 and optimize via backpropagation. In this work, we let the latent encoding distribution on Z be a multivariate Gaussian distribution with a diagonal covariance structure z ∼ pθ(Z|x) = N(µ, diag(σ²)); a valid reparameterization is z = µ + σ ⊙ ϵ. With the variational distribution rω(Z) set to a standard multivariate Gaussian distribution N(0, I), as done in practice, we can view VIB as a variational encoder-decoder analogous to the VAE (Kingma and Welling, 2013), in which the latent encoding distribution pθ can be viewed as a latent posterior, and the variational decoding distribution qϕ can be viewed as a decoder.
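To make the reparameterization and the objective of Equation 3 concrete, the following is a minimal PyTorch sketch with a single-sample Monte Carlo estimate. This is our illustration, not the released implementation (available at https://github.com/nec-research/vibtcr); the function names are ours, and cross-entropy plays the role of −log qϕ(y|z).

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, sigma):
    # z = mu + sigma * eps with eps ~ N(0, I): the noise is independent of
    # the encoder parameters, so gradients flow through mu and sigma.
    eps = torch.randn_like(sigma)
    return mu + sigma * eps

def kl_to_standard_normal(mu, sigma):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent
    # dimensions and averaged over the batch.
    kl = 0.5 * (mu.pow(2) + sigma.pow(2) - 2.0 * sigma.log() - 1.0)
    return kl.sum(dim=-1).mean()

def vib_loss(logits, y, mu, sigma, beta):
    # Single-sample Monte Carlo estimate of Equation 3: negative
    # log-likelihood plus the beta-weighted rate term.
    # Typical use: z = reparameterize(mu, sigma); logits = decoder(z),
    # where `decoder` is any classification head (hypothetical here).
    return F.cross_entropy(logits, y) + beta * kl_to_standard_normal(mu, sigma)
```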
In the same spirit of extending the VAE (Kingma and Welling, 2013) to the MVAE (Wu and Goodman, 2018), the VIB objective of Equation 3 can be generalized by representing X as a collection of multiple input sequences X = {Xi | i-th sequence is present}. In light of this, in the language of a variational encoder-decoder, the posterior pθ(Z|x) of Equation 3 actually consists in the joint posterior pΘ(Z|x1, ..., xM) := pΘ(Z|x1:M), conditioned on the M jointly available sequences. However, for predicting the interaction label Y from X, the M different sequences cannot simply be treated as M different modalities.

2.1 Attention of Experts

Similar to previous works (Wu and Goodman, 2018; Grazioli et al., 2022a), the single-sequence posteriors are modeled as Gaussian distributions with diagonal structure: q̃θi(Z|xi) = N(µi, diag(σi²)). By stacking the parameters (represented as column vectors) µ0 and σ0 of the latent prior with the µi and σi for all available i = 1, ..., M sequences, we define the following two matrices M ∈ R^((M+1)×dZ) and Σ ∈ R^((M+1)×dZ), where dZ is the dimensionality of the latent single-sequence posteriors:

\mathbf{M} = \begin{bmatrix} \mu_{0}^{T} \\ \vdots \\ \mu_{M}^{T} \end{bmatrix}, \qquad \mathbf{\Sigma} = \begin{bmatrix} \sigma_{0}^{T} \\ \vdots \\ \sigma_{M}^{T} \end{bmatrix}. \qquad (4)

We propose to implicitly learn the dependencies between the M single-sequence posteriors and the multi-sequence joint posterior by means of multi-head self-attention, leveraging its power of capturing multiple complex interactions in X, and allowing possible missing sequences:

\mu_{AoE} = \mathrm{Pool}\big(\mathrm{MultiHead}_{\mu}(\mathbf{M}, \mathbf{M}, \mathbf{M})\big), \qquad \sigma_{AoE} = \mathrm{Pool}\big(\mathrm{MultiHead}_{\sigma}(\mathbf{\Sigma}, \mathbf{\Sigma}, \mathbf{\Sigma})\big). \qquad (5)

Pool : R^((M+1)×dZ) → R^(dZ) is a 1D max pooling function. MultiHead is the standard multi-head attention block defined in Vaswani et al. (2017), whose equations are provided in Supplementary Materials 2. We refer to Equation 5 as Attention of Experts (AoE). Figure 1 provides a schematic depiction of AoE.

Fig. 1. Attention of experts (AoE) for Gaussian posteriors. Ei is the stochastic Gaussian encoder of the i-th sequence.
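Below is a minimal PyTorch sketch of AoE under the assumptions stated above (diagonal Gaussian experts, a standard-normal prior row, max pooling over the expert dimension). Class and variable names are ours, and the positivity trick for σAoE is one possible choice; see the released code for the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentionOfExperts(nn.Module):
    """AoE sketch (Equations 4-5): stack the prior row and the available
    single-sequence Gaussian parameters, apply multi-head self-attention,
    and max-pool the rows into joint parameters (mu_AoE, sigma_AoE)."""

    def __init__(self, d_z: int, n_heads: int = 4):
        super().__init__()
        # d_z must be divisible by n_heads for nn.MultiheadAttention.
        self.attn_mu = nn.MultiheadAttention(d_z, n_heads, batch_first=True)
        self.attn_sigma = nn.MultiheadAttention(d_z, n_heads, batch_first=True)
        # Fixed standard-normal prior row (mu_0 = 0, sigma_0 = 1), matching
        # the prior r_omega(Z) = N(0, I) used in the text.
        self.register_buffer("mu_0", torch.zeros(d_z))
        self.register_buffer("sigma_0", torch.ones(d_z))

    def forward(self, mus, sigmas):
        # mus, sigmas: (batch, M, d_z) for the M sequences that are present;
        # M may vary between calls, which is how missing inputs are handled.
        b = mus.size(0)
        m_mat = torch.cat([self.mu_0.expand(b, 1, -1), mus], dim=1)
        s_mat = torch.cat([self.sigma_0.expand(b, 1, -1), sigmas], dim=1)
        # Self-attention with queries = keys = values = the stacked rows.
        h_mu, _ = self.attn_mu(m_mat, m_mat, m_mat)
        h_sigma, _ = self.attn_sigma(s_mat, s_mat, s_mat)
        # 1D max pooling over the (M + 1) expert rows.
        mu_aoe = h_mu.max(dim=1).values
        sigma_aoe = h_sigma.max(dim=1).values.abs() + 1e-6  # keep sigma > 0
        return mu_aoe, sigma_aoe
```

The resulting (µAoE, σAoE) can then be plugged into the reparameterized sampling and loss of the VIB sketch above, yielding the AVIB objective of Equation 6 below.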
We refer to a multi-sequence VIB which adopts AoE for modeling the multi-sequence joint posterior as Attentive Variational Information Bottleneck (AVIB). The AVIB objective is:

J_{AVIB} = \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}_{\epsilon \sim p(\epsilon)} \left[ -\log q_{\phi}\big(y^{n} \mid f(x^{n}_{1:M}, \epsilon; \Theta)\big) \right] + \beta \, D_{KL}\big(p_{\Theta}(Z \mid x^{n}_{1:M}) \,\|\, r_{\omega}(Z)\big), \qquad (6)

where the multi-sequence posterior is modeled as pΘ(Z|x1:M) = N(µAoE, diag(σ²AoE)).

Due to space limitation, we provide a detailed description of the implementation, the training setup and the choice of the hyperparameter β in Supplementary Materials 3. Supplementary Materials 4 describes the full AVIB architecture.

2.2 Relation to Multimodal Variational Inference

MVAE (Wu and Goodman, 2018) and MVIB (Grazioli et al., 2022a) approximate the joint posterior pΘ(Z|x1:M) assuming that the M modalities are conditionally independent, given the common latent variable Z. This allows expressing the joint posterior as a product of unimodal approximate posteriors q̃θi(Z|xi) and a prior p(Z), referred to as Product of Experts (PoE): pΘ(Z|x1:M) = p(Z) ∏_{i=1}^{M} q̃θi(Z|xi), where Θ = {θi}_{i=1}^{M}. MMVAE (Shi et al., 2019) factorizes the joint multimodal posterior as a mixture of Gaussian unimodal posteriors, referred to as Mixture of Experts (MoE): pΘ(Z|x1:M) = (1/M) ∑_{i=1}^{M} q̃θi(Z|xi).
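For contrast with AoE, here is a sketch of the closed-form PoE combination of diagonal Gaussian experts used by MVAE and MVIB: a product of Gaussians is again Gaussian, with precisions that add and a precision-weighted mean. The function name and the explicit N(0, I) prior expert are our illustration, not the authors' exact code.

```python
import torch

def product_of_experts(mus, sigmas):
    # mus, sigmas: (batch, M, d_z) parameters of the unimodal Gaussian
    # posteriors. With an N(0, I) prior expert, precisions (1 / sigma^2)
    # add up and the joint mean is the precision-weighted mean.
    precision = 1.0 / sigmas.pow(2)                  # (batch, M, d_z)
    total_precision = 1.0 + precision.sum(dim=1)     # "+ 1" = prior expert
    var_poe = 1.0 / total_precision
    mu_poe = var_poe * (mus * precision).sum(dim=1)  # prior mean is zero
    return mu_poe, var_poe.sqrt()
```

Sampling from a MoE, by contrast, amounts to first picking one expert and then sampling from it, which is the "OR"-like behavior discussed next.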
PoE assumes conditional independence between modalities (Hinton, 2002). Furthermore, conditional dependence is impossible to capture by MoE, due to its additive form. This becomes a major shortcoming when modeling TCR-peptide interaction, in which the single sequences are not predictive of the binding if observed individually. Although AoE does not explicitly parameterize conditional dependence between the sequences, it does not assume that each sequence should be individually predictive of the class label, making it a more suitable candidate to model molecular interactions.

AoE can improve on PoE and MoE on multiple levels. First, employing attention for estimating the joint multi-sequence posterior allows learning relative importance among the various single-sequence posteriors. This allows dynamically enhancing the weight given to certain input sequences, while diminishing the focus on others, without being restrained to "AND" and "OR" relations, like PoE and MoE, respectively (Shi et al., 2019). Second, as AoE is a parametric trainable module, it can learn to accommodate miscalibrated single-sequence posteriors, which are especially difficult to handle by PoE (Kutuzova et al., 2021).

The adoption of PoE and MoE for the approximation of a multimodal posterior using unimodal encoders allows for inference also in case certain modalities are missing (Wu and Goodman, 2018; Shi et al., 2019; Kutuzova et al., 2021; Grazioli et al., 2022a). A single encoder applied on the concatenation of all modalities would not allow that. Just like PoE and MoE, AoE allows inference with missing inputs. There is in fact no restriction on the number of rows of M and Σ (see Equations 4 and 5), which is the equivalent of the number of word tokens in a natural language processing (NLP) setting (see Section 3.5).
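Because the number of expert rows is not fixed, the same trained module can be queried with any subset of sequences. A hypothetical usage of the AoE sketch above:

```python
import torch

aoe = AttentionOfExperts(d_z=128)

# Three sequences present (peptide, CDR3alpha, CDR3beta)...
mus3, sigmas3 = torch.randn(8, 3, 128), torch.rand(8, 3, 128) + 0.1
mu_a, sigma_a = aoe(mus3, sigmas3)   # -> two (8, 128) tensors

# ...or only two (peptide, CDR3beta): no retraining or reshaping needed,
# exactly as with a variable number of word tokens in an NLP model.
mus2, sigmas2 = mus3[:, [0, 2]], sigmas3[:, [0, 2]]
mu_b, sigma_b = aoe(mus2, sigmas2)   # -> two (8, 128) tensors
```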


In this work, we only benchmark AoE against PoE and do not compare against MoE. We believe MoE's "OR"-nature is not suitable for modeling the chemical specificity of multiple molecules. If taken alone, the single-sequence variational posteriors are not informative of the chemical reaction. Analogously, sampling from a MoE - which has similarities to the OR operator - is not suitable for capturing how molecules chemically interact.

2.3 Information Bottleneck Mahalanobis Distance

Although AVIB is not explicitly designed for uncertainty estimation, we propose a simple, yet effective, approach for OOD detection. This approach is strongly inspired by Lee et al. (2018) and leverages the Mahalanobis distance. In the following, we first summarize the method proposed by Lee et al. (2018). Then, we describe how to extend this approach to AVIB.

Mahalanobis distance. The Mahalanobis distance has proved to be an effective metric for OOD detection (Lee et al., 2018). Let fl(x) denote the output of the l-th hidden layer of a neural network, given an input x. Using the training samples, this method fits a class-conditional Gaussian distribution to the embeddings of each class, computing a per-class mean µ_l^c = (1/N_c) ∑_{i: y_i = c} f_l(x_i) and a shared covariance matrix Σ_l = (1/N) ∑_{c=1}^{K} ∑_{i: y_i = c} (f_l(x_i) − µ_l^c)(f_l(x_i) − µ_l^c)^T. Given a test sample x, the Mahalanobis score is computed as score_Maha(x) = ∑_l α_l M_l(x), where M_l(x) = −min_c ½ (f_l(x) − µ_l^c) Σ_l^{-1} (f_l(x) − µ_l^c)^T. Lee et al. (2018) fit the α_l coefficients by training a logistic regression on a set of samples for which knowledge of the OOD/ID label is assumed. Additionally, the authors show that adding a small (ε) controlled noise to the input can improve results, analogously to ODIN (Liang et al., 2017).

We leverage the expectation of the learned latent posterior conditioned on all input sequences and fit K class-conditional Gaussian distributions using the ID training samples, where K is the number of classes. For the TCR-peptide interaction prediction task, we have two classes: binders and non-binders. The K empirical class means are computed as:

\mu^{c}_{AVIB} = \frac{1}{N_c} \sum_{i: y_i = c} \mathbb{E}[Z \mid x^{i}_{1:M}]. \qquad (7)

A shared covariance matrix is computed as:

\Sigma_{AVIB} = \frac{1}{N} \sum_{c=1}^{K} \sum_{i: y_i = c} \big(\mathbb{E}[Z \mid x^{i}_{1:M}] - \mu^{c}_{AVIB}\big)\big(\mathbb{E}[Z \mid x^{i}_{1:M}] - \mu^{c}_{AVIB}\big)^{T}. \qquad (8)

Lee et al. (2018) compute Mahalanobis distances at multiple hidden layers of a neural network. A logistic regression is trained to learn relative weights, assuming knowledge of the ID/OOD label for a set of validation samples. In contrast to that, given a multi-sequence sample x1:M, we propose to leverage the multi-sequence posterior distribution over encodings learned by AVIB, i.e. pΘ(Z|x1:M), and compute one single Mahalanobis distance, which acts as OOD score:

\mathrm{score}_{AVIB\text{-}Maha}(x_{1:M}) = -\min_{c} \Big( \tfrac{1}{2} \big(\mathbb{E}[Z \mid x_{1:M}] - \mu^{c}_{AVIB}\big) \Sigma^{-1}_{AVIB} \big(\mathbb{E}[Z \mid x_{1:M}] - \mu^{c}_{AVIB}\big)^{T} \Big). \qquad (9)

This approach is hyperparameter-free. Hence, it does not require a validation set for tuning. As a consequence, prior knowledge of OOD validation samples is not required.
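The following is a minimal sketch of Equations 7-9, assuming the posterior means E[Z|x1:M] for the ID training set have already been collected into a tensor; the function names are ours.

```python
import torch

def fit_class_gaussians(z, y, n_classes):
    # z: (N, d) posterior means E[Z|x_1:M] of ID training samples;
    # y: (N,) integer class labels (binders / non-binders).
    means = torch.stack([z[y == c].mean(dim=0) for c in range(n_classes)])
    centered = z - means[y]                    # center by each class mean
    cov = centered.T @ centered / z.size(0)    # shared covariance (Eq. 8)
    return means, torch.linalg.inv(cov)

def avib_maha_score(z_test, means, cov_inv):
    # Equation 9: negative Mahalanobis distance to the closest class mean.
    # Lower scores indicate likely out-of-distribution samples.
    diff = z_test.unsqueeze(1) - means.unsqueeze(0)           # (N, K, d)
    d2 = torch.einsum('nkd,de,nke->nk', diff, cov_inv, diff)  # squared dists
    return -0.5 * d2.min(dim=1).values
```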
3 Results and Discussion

First, we provide a description of the datasets used in this work. We then apply AVIB to the TCR-peptide interaction prediction problem. Last, we demonstrate AVIB's effectiveness in the context of OOD detection. All experiments are implemented using PyTorch (Paszke et al., 2019). Code and data are publicly available at: https://github.com/nec-research/vibtcr.

3.1 Datasets

Recent studies (Jurtz et al., 2018; De Neuter et al., 2018; Jokinen et al., 2019; Wong et al., 2019; Moris et al., 2019; Gielis et al., 2019; Tong et al., 2020; Springer et al., 2020; Fischer et al., 2020; Springer et al., 2021; Montemurro et al., 2021; Weber et al., 2021) investigate the prediction of TCR-peptide/-pMHC interactions. Most use data from the Immune Epitope Database (IEDB) (Vita et al., 2019), VDJdb (Bagaev et al., 2020) and McPAS-TCR (Tickotsky et al., 2017), which mainly contain CDR3β data and lack information on CDR3α. We merge human TCR-peptide data extracted from the ERGO II and NetTCR-2.0 repositories². Binding (i.e. positive) samples are derived from the IEDB, VDJdb and McPAS-TCR databases. Positive data points generated by Klinger et al. (2015), referred to as the MIRA set, are also considered³. We employ all non-binding (i.e. negative) samples used by Springer et al. (2021) and Montemurro et al. (2021). Hence, negative samples are derived from random recombination of positive data points, as well as from the 10X Genomics assays described in Montemurro et al. (2021). Overall, 271,366 human TCR-peptide samples are available. We organize the data and create the following datasets.

α+β set. 117,753 samples out of 271,366 present peptide information, along with both CDR3α and CDR3β sequences. In this work, we refer to this subset as the α+β set. The ground truth label is a binary variable which represents whether the peptide and TCR chains interact.

β set. 153,613 samples out of 271,366 present peptide and CDR3β information (the CDR3α sequence is missing). We refer to this subset as the β set. The β set and the α+β set are disjoint.

Human TCR set. We refer to the totality of the human TCR-peptide data (i.e. β set ∪ α+β set) as the Human TCR set.

Non-human TCR set. We extract 5,036 non-human TCR samples from the VDJdb database, which we use as OOD samples. These samples come from mice and macaques and present peptide and CDR3β information. We refer to these samples as the Non-human TCR set.

In addition to the TCR datasets, in order to thoroughly evaluate AVIB on multiple types of molecular data, we perform experiments on the following peptide-MHC datasets.

NetMHCIIpan-4.0 set. This dataset consists of 108,959 peptide-MHC pairs and was proposed in Reynisson et al. (2020) for the training of the NetMHCIIpan-4.0 model. All MHC molecules are class II. A continuous binding affinity (BA) value, ranging in [0, 1], is associated to each (peptide, MHC) pair and used to validate AVIB on a regression task.

Human MHC set. We create a second set of OOD samples composed of 463,684 peptide-MHC pairs. The peptide sequences are taken from the Human TCR set, i.e. the peptide information is shared among ID and OOD sets. The MHC molecules are represented as pseudo-sequences of amino acids⁴. We consider both class I and class II MHC alleles. We refer to these samples as the Human MHC set.

² https://github.com/IdoSpringer/ERGO-II; https://github.com/mnielLab/NetTCR-2.0
³ The MIRA set is publicly available in the NetTCR-2.0 repository: https://github.com/mnielLab/NetTCR-2.0/tree/main/data
⁴ For the MHC pseudo-sequences, we refer to the PUFFIN (Zeng and Gifford, 2019) repository: https://github.com/gifford-lab/PUFFIN/blob/master/data/pseudosequence.2016.all.X.dat


Table 1. TCR-peptide interaction prediction results. The reported confidence intervals are standard errors over 5 repeated experiments with different independent training/test random splits. Reported scores are computed on the test sets. Baselines: NetTCR-2.0 (Montemurro et al., 2021), ERGO II (Springer et al., 2021), LUPI-SVM (Abbasi et al., 2018). Best results are in bold.

Dataset          | Inputs  | Method      | AUROC         | AUPR          | F1
-----------------|---------|-------------|---------------|---------------|---------------
β set            | Pep+β   | NetTCR-2.0  | 0.755 ± 0.001 | 0.395 ± 0.002 | 0.349 ± 0.002
                 |         | ERGO II     | 0.761 ± 0.011 | 0.415 ± 0.020 | 0.412 ± 0.010
                 |         | AVIB (ours) | 0.804 ± 0.001 | 0.494 ± 0.001 | 0.477 ± 0.001
α+β set          | Pep+β   | LUPI-SVM    | 0.770 ± 0.001 | 0.212 ± 0.001 | 0.218 ± 0.001
                 |         | NetTCR-2.0  | 0.846 ± 0.002 | 0.396 ± 0.003 | 0.413 ± 0.001
                 |         | ERGO II     | 0.894 ± 0.001 | 0.538 ± 0.004 | 0.498 ± 0.003
                 |         | AVIB (ours) | 0.895 ± 0.001 | 0.534 ± 0.004 | 0.515 ± 0.002
                 | Pep+α+β | NetTCR-2.0  | 0.862 ± 0.002 | 0.477 ± 0.003 | 0.472 ± 0.002
                 |         | ERGO II     | 0.903 ± 0.002 | 0.578 ± 0.004 | 0.528 ± 0.002
                 |         | AVIB (ours) | 0.913 ± 0.001 | 0.614 ± 0.002 | 0.586 ± 0.001
β set ∪ α+β set  | Pep+β   | NetTCR-2.0  | 0.727 ± 0.001 | 0.342 ± 0.001 | 0.276 ± 0.002
                 |         | ERGO II     | 0.748 ± 0.015 | 0.379 ± 0.022 | 0.381 ± 0.014
                 |         | AVIB (ours) | 0.773 ± 0.001 | 0.414 ± 0.002 | 0.396 ± 0.003
Table 2. Multi-sequence posterior approximation - benchmark and ablation. TCR-peptide binding prediction experiments are performed on the α+β set. Peptide-MHC BA regression experiments are performed on the NetMHCIIpan-4.0 set. Confidence intervals are standard errors over 5 repeated experiments with different training/test random splits. Pep: peptide; α: CDR3α sequence; β: CDR3β sequence; MHC II: MHC class II pseudo-sequence. Baseline: MVIB. Ablation without multi-head self-attention: Average Pooling of Experts (AvgPOOLoE). Best results are in bold. ↑: larger value is better. ↓: lower value is better.

Inputs     | Metric   | ↑/↓ | MVIB (PoE)      | AvgPOOLoE       | AVIB (AoE)
-----------|----------|-----|-----------------|-----------------|----------------
Pep+β      | AUROC    | ↑   | 0.889 ± 0.001   | 0.883 ± 0.002   | 0.895 ± 0.001
           | AUPR     | ↑   | 0.512 ± 0.003   | 0.502 ± 0.002   | 0.535 ± 0.004
           | F1       | ↑   | 0.498 ± 0.002   | 0.484 ± 0.003   | 0.515 ± 0.002
           | Accuracy | ↑   | 0.860 ± 0.003   | 0.852 ± 0.002   | 0.873 ± 0.001
Pep+α+β    | AUROC    | ↑   | 0.910 ± 0.001   | 0.905 ± 0.002   | 0.913 ± 0.001
           | AUPR     | ↑   | 0.595 ± 0.004   | 0.589 ± 0.002   | 0.614 ± 0.002
           | F1       | ↑   | 0.575 ± 0.002   | 0.555 ± 0.007   | 0.587 ± 0.001
           | Accuracy | ↑   | 0.907 ± 0.001   | 0.898 ± 0.002   | 0.916 ± 0.001
Pep+MHC II | MSE      | ↓   | 0.0313 ± 0.0001 | 0.0329 ± 0.0002 | 0.0299 ± 0.0001
           | RMSE     | ↓   | 0.137 ± 0.001   | 0.140 ± 0.001   | 0.133 ± 0.003
           | R²       | ↑   | 0.538 ± 0.001   | 0.514 ± 0.004   | 0.559 ± 0.001

Figure 7 in the Supplementary Materials depicts the distributions of the human TCR data for both the α+β set and the β set. The two datasets have similar peptide distributions, but contain different CDR3β sequences. Supplementary Materials 5.1 provides information regarding the distribution of the length of the amino acid sequences in the various datasets. Supplementary Materials 5.2 provides information on the class distribution of the α+β set and the β set. Supplementary Materials 5.3 describes the binding affinity distributions of the NetMHCIIpan-4.0 set.

3.2 Pre-processing

In this work, peptides, CDR3α and CDR3β are represented as sequences of amino acids. The 20 amino acids translated by the genetic code are in general represented as English alphabet letters. Analogously to Montemurro et al. (2021), we pre-process the amino acid sequences using BLOSUM50 encodings (Henikoff and Henikoff, 1992), i.e. the substitution value of each amino acid as represented by the BLOSUM50 matrix's diagonal. This allows us to represent a sequence of N amino acids as a 20 × N matrix, analogously to the approach proposed by Nielsen et al. (2003). After performing BLOSUM50 encoding, we standardize the features by subtracting the mean and scaling to unit variance. As the length of the amino acid sequences is not constant, we operate 0-padding after the BLOSUM50 encoding (Mösch and Frishman, 2021). This ensures that all matrices have shape 20 × Nmax, where Nmax is the length of the longest sequence. Information on the length distribution of the amino acid sequences can be found in Supplementary Materials 5.1.
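As an illustration of this pre-processing, here is a sketch using Biopython's substitution matrices. Since the 20 × N shape implies a 20-dimensional vector per residue, this sketch encodes each residue by its row of BLOSUM50 scores against the 20 standard amino acids; this reading, like the helper name, is our assumption rather than the authors' exact code, and standardization is omitted for brevity.

```python
import numpy as np
from Bio.Align import substitution_matrices  # Biopython

BLOSUM50 = substitution_matrices.load("BLOSUM50")
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard amino acids

def encode(sequence: str, n_max: int) -> np.ndarray:
    # Encode a peptide/CDR3 sequence as a 20 x n_max matrix of BLOSUM50
    # substitution scores, zero-padded on the right up to n_max columns.
    # Assumes len(sequence) <= n_max.
    mat = np.zeros((20, n_max), dtype=np.float32)
    for j, residue in enumerate(sequence[:n_max]):
        mat[:, j] = [BLOSUM50[residue, aa] for aa in AMINO_ACIDS]
    return mat

# e.g. encode("CASSLGTDTQYF", n_max=30) yields a (20, 30) array
```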
3.3 TCR-peptide Interaction Prediction

In order to evaluate AVIB's performance on the TCR-peptide interaction prediction task, we perform experiments on three datasets: the α+β set, the β set and their union β set ∪ α+β set. For the β set and the union set, input samples are (xPeptide, xCDR3β) pairs. For the α+β set, inputs can be either (xPeptide, xCDR3β) pairs or (xPeptide, xCDR3α, xCDR3β) triples. For all tri-sequence experiments, we adopt a full multi-sequence extension of the JAVIB objective (see Supplementary Materials 6, Equation 12).

Baselines. We benchmark AVIB against two state-of-the-art deep learning methods for TCR-peptide interaction prediction: ERGO II (Springer et al., 2021) and NetTCR-2.0 (Montemurro et al., 2021). Additionally, we benchmark AVIB against the LUPI-SVM (Abbasi et al., 2018), by leveraging the α-chain at training time as privileged information. For all benchmark methods, we adopt the original publicly available implementations⁵.

⁵ https://github.com/IdoSpringer/ERGO-II; https://github.com/mnielLab/NetTCR-2.0; https://github.com/wajidarshad/LUPI-SVM


Evaluation metrics. Table 1 summarizes the experimental results. For evaluation, the area under the receiver operating characteristic (AUROC) curve, the area under the precision-recall (AUPR) curve and the F1 score (F1) are computed on the test sets. 5 repeated experiments with different 80/20 training/test random splits are performed for robust evaluation.

Peptide+CDR3β. On the β set, AVIB obtains ∼4% higher AUROC and ∼8% higher AUPR compared to the best baseline, ERGO II. On the β set ∪ α+β set, AVIB outperforms ERGO II by achieving ∼3% higher AUROC and ∼4% higher AUPR. On the α+β set, in the peptide+CDR3β setting, AVIB is comparable with ERGO II.

Peptide+CDR3α. Peptide+CDR3α results on the α+β set are reported in Supplementary Materials 7.

Peptide+CDR3α+CDR3β. In the tri-sequence setting, when considering the peptide and both CDR3α and CDR3β sequences, AVIB obtains ∼1% higher AUROC, ∼4% higher AUPR and ∼6% higher F1 score compared to the best baseline, ERGO II.

These experimental results demonstrate that AVIB is a competitive method for TCR-peptide interaction prediction. On the α+β set, AVIB's tri-sequence (peptide+CDR3α+CDR3β) results outperform the results obtained in both bi-sequence (peptide+CDR3α and peptide+CDR3β) settings (see Table 1 and Supplementary Materials 7). This shows that AVIB is an effective multi-sequence learning method, which can learn richer representations from the joint analysis of multiple data sequences.

3.3.1 Cross-dataset Experiments
In Supplementary Materials 8, we present cross-dataset experiments, in which we train AVIB and the baseline models on the α+β set and test on the β set. As shown in Figure 7, the α+β set and the β set present similar peptide distributions, but contain different CDR3β sequences. Our cross-dataset results show that all models fail to generalize to unseen CDR3β sequences. These results are in line with Grazioli et al. (2022b), which analogously shows that state-of-the-art models fail to generalize to unseen peptides.

3.3.2 Visualization of the Attention Weights
One of the advantages of using AoE for estimating the multi-sequence posterior is the dynamic weighting of the multiple single-sequence posteriors. This allows capturing relationships between the input sequences. In Supplementary Materials 9, we show how the attention weights derived from the µAoE self-attention block change while gradually mutating the peptide sequence. We notice that, as the peptide sequence disruption increases, the peptide-CDR3β attention weight drops while the CDR3β-peptide weight increases.

3.4 Multi-sequence Posterior Approximation

In this section, we compare various techniques to approximate Gaussian joint posteriors. We perform experiments and benchmark on two datasets: the α+β set and the NetMHCIIpan-4.0 set. Experiments on the α+β set employ either (xPeptide, xCDR3β) pairs or (xPeptide, xCDR3α, xCDR3β) triples as inputs. Experiments on the NetMHCIIpan-4.0 set input (xPeptide, xMHC) pairs.

The ground truth labels of the NetMHCIIpan-4.0 set are continuous BA scores. For BA regression, we train models by substituting the log-likelihood of Equation 6 with a mean squared error (MSE) loss. BA prediction of pMHC complexes is - just like TCR-peptide interaction prediction - a fundamental problem in computational immuno-oncology (O'Donnell et al., 2018; O'Donnell et al., 2020; Reynisson et al., 2020; Cheng et al., 2021) and is a key step in the development of vaccines against cancer (Slansky et al., 2000; Meng and Butterfield, 2002; McMahan et al., 2006; Corse et al., 2011; Buhrman and Slansky, 2013; Hundal et al., 2020) and infectious diseases (Malone et al., 2020). Peptides can only be presented on the surface of cells if they bind to MHC molecules. This mechanism allows the immune system to gain knowledge about in-cell anomalies such as cancerous mutations or viral infections.
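As a sketch of this substitution, reusing the hedged kl_to_standard_normal helper from our earlier VIB example (the function name is ours):

```python
import torch.nn.functional as F

def avib_regression_loss(y_pred, y_true, mu, sigma, beta):
    # For BA regression, the negative log-likelihood term of Equation 6 is
    # replaced by a mean squared error; the beta-weighted KL rate term is
    # unchanged.
    return F.mse_loss(y_pred, y_true) + beta * kl_to_standard_normal(mu, sigma)
```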
Baseline and ablation methods. We benchmark AVIB, which employs AoE, against MVIB (Grazioli et al., 2022a), which employs PoE. Additionally, we perform an ablation study meant to investigate the influence of multi-head self-attention in AoE. For the ablation, we remove the multi-head self-attention module from AoE (see Equation 5) and only operate a simple pooling of the various single-sequence posteriors. We define two ablation methods: Max Pooling of Experts (MaxPOOLoE), which adopts a 1D max pooling function, and Average Pooling of Experts (AvgPOOLoE), which adopts 1D average pooling.

Evaluation metrics. For the evaluation of classification results on the α+β set, we adopt AUROC, AUPR, F1 and accuracy. For evaluating regression on the NetMHCIIpan-4.0 set, we employ MSE, root mean squared error (RMSE) and the R² coefficient (Wright, 1921).

Table 2 presents classification and regression results on the α+β set and the NetMHCIIpan-4.0 set. AoE achieves the best results in all settings and on both datasets. Interestingly, the ablation methods AvgPOOLoE and MaxPOOLoE (Supplementary Materials 10) achieve worse performance compared to PoE.

3.5 Missing Input Sequences

In this section, we study AVIB's performance when certain data sequences are available at training time, but missing at test time. We train AVIB on (xPeptide, xCDR3α, xCDR3β) triples from the α+β set. At test time, we omit one of the two CDR3 sequences. In real-world settings, it is in fact common to have batches of data where only CDR3α or CDR3β information is available. It is therefore efficient to leverage one single model which can operate also if a CDR3 sequence is missing. This prevents the need of training different models on the various sequence subsets.

Figure 2 presents the experimental results. As expected, AVIB performance decreases when a CDR3 sequence is missing at test time. However, the performance achieved by AVIB when trained in the tri-sequence setting and tested on missing sequences is not consistently different from the performance deriving from a bi-sequence training. We only observe a significant difference on the AUPR score when the CDR3α sequence is missing: AVIB trained on peptide+CDR3β achieves ∼3% higher AUPR than AVIB trained on peptide+CDR3α+CDR3β and tested with missing CDR3α.

Fig. 2. AVIB performance with missing input sequences. Confidence intervals are standard deviations over 5 repeated experiments on the α+β set with different independent training/test random splits. Train: training time sequences. Test: test time sequences.

3.6 Out-of-distribution Detection

Alemi et al. (2018) show that VIB has the ability to detect OOD samples. In this section, the OOD detection capabilities of AVIB are investigated.


Table 3. OOD detection results: distinguishing in- and out-of-distribution test samples. D^ID is the in-distribution set; D^OOD is the out-of-distribution set, not available at training time. The reported confidence intervals are standard errors over 5 repeated experiments with different independent D^ID_train / D^ID_test random splits. ↑ indicates larger value is better; ↓ indicates lower value is better. Best results are in bold. Baselines: MSP, ODIN, AVIB-R. For ODIN, the hyperparameters are ε = 0.001 and T = 1000, tuned on a validation set.

D^ID / D^OOD  | Method           | FPR @ 95% TPR ↓ | Detection Error ↓ | AUROC ↑       | AUPR ↑
--------------|------------------|-----------------|-------------------|---------------|---------------
Human TCR /   | MSP              | 0.962 ± 0.001   | 0.505 ± 0.001     | 0.540 ± 0.003 | 0.627 ± 0.003
Non-human TCR | ODIN             | 0.962 ± 0.002   | 0.506 ± 0.001     | 0.425 ± 0.008 | 0.559 ± 0.014
              | AVIB-R           | 0.719 ± 0.018   | 0.384 ± 0.009     | 0.768 ± 0.010 | 0.714 ± 0.011
              | AVIB-Maha (ours) | 0.699 ± 0.011   | 0.374 ± 0.006     | 0.850 ± 0.002 | 0.871 ± 0.001
Human TCR /   | MSP              | 0.955 ± 0.002   | 0.503 ± 0.001     | 0.491 ± 0.007 | 0.550 ± 0.005
Human MHC     | ODIN             | 0.714 ± 0.047   | 0.382 ± 0.024     | 0.701 ± 0.029 | 0.763 ± 0.027
              | AVIB-R           | 0.297 ± 0.027   | 0.174 ± 0.014     | 0.955 ± 0.004 | 0.964 ± 0.003
              | AVIB-Maha (ours) | 0.006 ± 0.002   | 0.028 ± 0.001     | 0.994 ± 0.001 | 0.995 ± 0.001

We assume that we have an in-distribution (ID) dataset D^ID of (x^ID_1, ..., x^ID_M; y^ID) tuples, where x_i denote input data sequences and y the class label. D^OOD denotes an out-of-distribution dataset of (x^OOD_1, ..., x^OOD_M; y^OOD) tuples. We study the scenario in which the model only has access to ID samples D^ID_train at training time. The test set consists of D^ID_test ∪ D^OOD_test. We adopt the Human TCR set as ID dataset, i.e. D^ID. We perform experiments using the Non-human TCR set and the Human MHC set as OOD datasets, i.e. D^OOD.

We leverage the expectation of the learned latent posterior conditioned on all input sequences and fit 2 class-conditional Gaussian distributions using the ID training samples, one for the binding samples and one for the non-binding ones (Equation 7). The class-conditional Gaussian distributions share the same covariance matrix (Equation 8). Analogously to Lee et al. (2018), we discriminate whether test samples are ID or OOD using the Mahalanobis distance score (AVIB-Maha) (Equation 9).

Training and test sets for OOD detection. Given a pair (D^ID, D^OOD), we operate a random 80/20 training/test split of D^ID into D^ID_train and D^ID_test. We train AVIB on D^ID_train for TCR-peptide interaction prediction. No OOD samples are available at training time. We ensure that the number of ID and OOD samples in the test set is balanced by applying the procedure described in Supplementary Materials 11.1. Experiments are repeated 5 times with different random training/test splits.

Baselines. For benchmark, we compare our results with several OOD detection methods: MSP (Hendrycks and Gimpel, 2016), ODIN (Liang et al., 2017) and the AVIB rate (AVIB-R) DKL(pΘ(Z|x^n_1:M) ‖ rω(Z)) (Alemi et al., 2018). See Supplementary Materials 11.2 for further details about the baseline methods.

Evaluation metrics. As evaluation metrics, in addition to AUROC and AUPR, we adopt the false positive rate at 95% true positive rate (FPR @ 95% TPR) and the detection error (see Supplementary Materials 11.3).

Table 3 summarizes the OOD detection results for AVIB trained on the Human TCR set for TCR-peptide interaction prediction, using the Non-human TCR set and the Human MHC set as OOD datasets. Figure 3 shows the ROC and PR curves. AVIB-Maha achieves the best results on all investigated metrics on both OOD datasets. On the Non-human TCR set, AVIB-Maha outperforms AVIB-R by ∼9% AUROC and >15% AUPR. On the Human MHC set, AVIB-Maha outperforms AVIB-R by ∼29% FPR @ 95% TPR and ∼15% detection error.

Fig. 3. OOD detection - ROC and PR curves.

4 Conclusion

In this paper, we propose AVIB, a multi-sequence generalization of the Variational Information Bottleneck (Alemi et al., 2016), which uses AoE to implicitly approximate the posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to the TCR-peptide interaction prediction problem, a fundamental challenge in immuno-oncology. We show that our method significantly improves on the state-of-the-art baselines ERGO II (Springer et al., 2021) and NetTCR-2.0 (Montemurro et al., 2021). We demonstrate the effectiveness of AoE with a benchmark against PoE, as well as with an ablation study. We also show that AoE achieves the best results on peptide-MHC binding affinity regression. Furthermore, we demonstrate that AVIB can handle missing data sequences at test time. We then leverage the bottleneck posterior distribution learned by AVIB and demonstrate that it can be used to effectively detect OOD amino acid sequences. Our method significantly outperforms the baselines MSP (Hendrycks and Gimpel, 2016), ODIN (Liang et al., 2017) and AVIB-R (Alemi et al., 2018). Interestingly, we observe that generalization to unseen sequences remains a challenging problem for all investigated models. These results are analogous to those of Grazioli et al. (2022b). We believe this drop in performance is due to the sparsity of the observed training sequences. Future work should focus on tackling the problem of generalization by, for example, simulating or approximating the chemical interactions of TCRs and peptides (or pMHCs), as well as their 3D structures.

References

Abbasi, W. A., Asif, A., Ben-Hur, A., et al. (2018). Learning protein binding affinity using privileged information. BMC Bioinformatics, 19(1), 1–12.
Alemi, A. A., Fischer, I., Dillon, J. V., and Murphy, K. (2016). Deep variational information bottleneck. arXiv preprint arXiv:1612.00410.
Alemi, A. A., Fischer, I., and Dillon, J. V. (2018). Uncertainty in the variational information bottleneck. arXiv preprint arXiv:1807.00906.
Bagaev, D. V., Vroomans, R. M., Samir, J., Stervbo, U., Rius, C., Dolton, G., Greenshields-Watson, A., Attaf, M., Egorov, E. S., Zvyagin, I. V., et al. (2020). VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium. Nucleic Acids Research, 48(D1), D1057–D1062.
Buhrman, J. D. and Slansky, J. E. (2013). Improving T cell responses to modified peptides in tumor vaccines. Immunologic Research, 55(1), 34–47.
Caragea, C., Sinapov, J., Dobbs, D., and Honavar, V. (2009). Mixture of experts models to exploit global sequence similarity on biomolecular sequence labeling. BMC Bioinformatics, 10, 1–14.
Cheng, J., Bendjama, K., Rittner, K., and Malone, B. (2021). BERTMHC: improved MHC-peptide class II interaction prediction with transformer and multiple instance learning. Bioinformatics, 37(22), 4172–4179.
Chronister, W. D., Crinklaw, A., Mahajan, S., Vita, R., Koşaloğlu-Yalçın, Z., Yan, Z., Greenbaum, J. A., Jessen, L. E., Nielsen, M., Christley, S., et al. (2021). TCRMatch: predicting T-cell receptor specificity based on sequence similarity to previously characterized receptors. Frontiers in Immunology, 12, 673.
Corse, E., Gottschalk, R. A., and Allison, J. P. (2011). Strength of TCR-peptide/MHC interactions and in vivo T cell responses. The Journal of Immunology, 186(9), 5039–5045.
Dash, P., Fiore-Gartland, A. J., Hertz, T., Wang, G. C., Sharma, S., Souquette, A., Crawford, J. C., Clemens, E. B., Nguyen, T. H., Kedzierska, K., et al. (2017). Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature, 547(7661), 89–93.
Davis, M. M. and Bjorkman, P. J. (1988). T-cell antigen receptor genes and T-cell recognition. Nature, 334(6181), 395–402.
De Neuter, N., Bittremieux, W., Beirnaert, C., Cuypers, B., Mrzic, A., Moris, P., Suls, A., Van Tendeloo, V., Ogunjimi, B., Laukens, K., et al. (2018). On the feasibility of mining CD8+ T cell receptor patterns underlying immunogenic peptide recognition. Immunogenetics, 70(3), 159–168.
Feng, D., Bond, C. J., Ely, L. K., Maynard, J., and Garcia, K. C. (2007). Structural evidence for a germline-encoded T cell receptor-major histocompatibility complex interaction 'codon'. Nature Immunology, 8(9), 975–983.
Fischer, D. S., Wu, Y., Schubert, B., and Theis, F. J. (2020). Predicting antigen specificity of single T cells based on TCR CDR3 regions. Molecular Systems Biology, 16(8), e9416.
Gielis, S., Moris, P., Bittremieux, W., De Neuter, N., Ogunjimi, B., Laukens, K., and Meysman, P. (2019). Detection of enriched T cell epitope specificity in full T cell receptor sequence repertoires. Frontiers in Immunology, 10, 2820.
Glanville, J., Huang, H., Nau, A., Hatton, O., Wagar, L. E., Rubelt, F., Ji, X., Han, A., Krams, S. M., Pettus, C., et al. (2017). Identifying specificity groups in the T cell receptor repertoire. Nature, 547(7661), 94–98.
Grazioli, F., Siarheyeu, R., Alqassem, I., Henschel, A., Pileggi, G., and Meiser, A. (2022a). Microbiome-based disease prediction with multimodal variational information bottlenecks. PLOS Computational Biology, 18(4), e1010050.
Grazioli, F., Mösch, A., Machart, P., Li, K., Alqassem, I., O'Donnell, T. J., and Min, M. R. (2022b). On TCR binding predictors failing to generalize to unseen peptides. Frontiers in Immunology, 13.
Hendrycks, D. and Gimpel, K. (2016). A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136.
Henikoff, S. and Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, 89(22), 10915–10919.
Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017). β-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations.
Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771–1800.
Hundal, J., Kiwala, S., McMichael, J., Miller, C. A., Xia, H., Wollam, A. T., Liu, C. J., Zhao, S., Feng, Y.-Y., Graubert, A. P., et al. (2020). pVACtools: a computational toolkit to identify and visualize cancer neoantigens. Cancer Immunology Research, 8(3), 409–420.
Jokinen, E., Huuhtanen, J., Mustjoki, S., Heinonen, M., and Lähdesmäki, H. (2019). Determining epitope specificity of T cell receptors with TCRGP. bioRxiv, page 542332.
Jurtz, V. I., Jessen, L. E., Bentzen, A. K., Jespersen, M. C., Mahajan, S., Vita, R., Jensen, K. K., Marcatili, P., Hadrup, S. R., Peters, B., et al. (2018). NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks. bioRxiv, page 433706.
Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
Klinger, M., Pepin, F., Wilkins, J., Asbury, T., Wittkop, T., Zheng, J., Moorhead, M., and Faham, M. (2015). Multiplex identification of antigen-specific T cell receptors using a combination of immune assays and immune receptor sequencing. PLoS One, 10(10), e0141561.
Kopf, A., Fortuin, V., Somnath, V. R., and Claassen, M. (2021). Mixture-of-experts variational autoencoder for clustering and generating from similarity-based representations on single cell data. PLoS Computational Biology, 17(6), e1009086.
Krogsgaard, M. and Davis, M. M. (2005). How T cells 'see' antigen. Nature Immunology, 6(3), 239–245.
Kutuzova, S., Krause, O., McCloskey, D., Nielsen, M., and Igel, C. (2021). Multimodal variational autoencoders for semi-supervised learning: In defense of product-of-experts. arXiv preprint arXiv:2101.07240.
La Gruta, N. L., Gras, S., Daley, S. R., Thomas, P. G., and Rossjohn, J. (2018). Understanding the drivers of MHC restriction of T cell receptors. Nature Reviews Immunology, 18(7), 467–478.
Lanzarotti, E., Marcatili, P., and Nielsen, M. (2019). T-cell receptor cognate target prediction based on paired α and β chain sequence and structural CDR loop similarities. Frontiers in Immunology, 10, 2080.
Lee, C. and Schaar, M. (2021). A variational information bottleneck approach to multi-omics data integration. In International Conference on Artificial Intelligence and Statistics, pages 1513–1521. PMLR.
Lee, K., Lee, K., Lee, H., and Shin, J. (2018). A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in Neural Information Processing Systems, 31.
Liang, S., Li, Y., and Srikant, R. (2017). Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690.
Malone, B., Simovski, B., Moliné, C., Cheng, J., Gheorghe, M., Fontenelle, H., Vardaxis, I., Tennøe, S., Malmberg, J.-A., Stratford, R., et al. (2020). Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2 leading to universal blueprints for vaccine designs. Scientific Reports, 10(1), 1–14.


McMahan, R. H., McWilliams, J. A., Jordan, K. R., Dow, S. W., Wilson, D. B., Slansky, J. E., et al. (2006). Relating TCR-peptide-MHC affinity to immunogenicity for the design of tumor vaccines. The Journal of Clinical Investigation, 116(9), 2543–2551.
Meng, W. S. and Butterfield, L. H. (2002). Rational design of peptide-based tumor vaccines. Pharmaceutical Research, 19(7), 926–932.
Montemurro, A., Schuster, V., Povlsen, H. R., Bentzen, A. K., Jurtz, V., Chronister, W. D., Crinklaw, A., Hadrup, S. R., Winther, O., Peters, B., et al. (2021). NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data. Communications Biology, 4(1), 1–13.
Moris, P., De Pauw, J., Postovskaya, A., Ogunjimi, B., Laukens, K., and Meysman, P. (2019). Treating biomolecular interaction as an image classification problem - a case study on T-cell receptor-epitope recognition prediction. bioRxiv.
Mösch, A. and Frishman, D. (2021). TCRpair: prediction of functional pairing between HLA-A*02:01-restricted T-cell receptor α and β chains. Bioinformatics, 37(21), 3938–3940.
Nielsen, M., Lundegaard, C., Worning, P., Lauemøller, S. L., Lamberth, K., Buus, S., Brunak, S., and Lund, O. (2003). Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Science, 12(5), 1007–1017.
O'Donnell, T. J., Rubinsteyn, A., Bonsack, M., Riemer, A. B., Laserson, U., and Hammerbacher, J. (2018). MHCflurry: open-source class I MHC binding affinity prediction. Cell Systems, 7(1), 129–132.
O'Donnell, T. J., Rubinsteyn, A., and Laserson, U. (2020). MHCflurry 2.0: Improved pan-allele prediction of MHC class I-presented peptides by incorporating antigen processing. Cell Systems, 11(1), 42–48.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
Qi, Y., Klein-Seetharaman, J., and Bar-Joseph, Z. (2007). A mixture of feature experts approach for protein-protein interaction prediction. BMC Bioinformatics, 8, 1–14.
Reynisson, B., Alvarez, B., Paul, S., Peters, B., and Nielsen, M. (2020). NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Research, 48(W1), W449–W454.
Rossjohn, J., Gras, S., Miles, J. J., Turner, S. J., Godfrey, D. I., and McCluskey, J. (2015). T cell antigen receptor recognition of antigen-presenting molecules. Annual Review of Immunology, 33, 169–200.
Rowen, L., Koop, B. F., and Hood, L. (1996). The complete 685-kilobase DNA sequence of the human β T cell receptor locus. Science, 272(5269), 1755–1762.
Sehnal, D., Bittrich, S., Deshpande, M., Svobodová, R., Berka, K., Bazgier, V., Velankar, S., Burley, S. K., Koča, J., and Rose, A. S. (2021). Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures. Nucleic Acids Research.
Shi, Y., Siddharth, N., Paige, B., and Torr, P. H. (2019). Variational mixture-of-experts autoencoders for multi-modal deep generative models. arXiv preprint arXiv:1911.03393.
Slansky, J. E., Rattis, F. M., Boyd, L. F., Fahmy, T., Jaffee, E. M., Schneck, J. P., Margulies, D. H., and Pardoll, D. M. (2000). Enhanced antigen-specific antitumor immunity with altered peptide ligands that stabilize the MHC-peptide-TCR complex. Immunity, 13(4), 529–538.
Springer, I., Besser, H., Tickotsky-Moskovitz, N., Dvorkin, S., and Louzoun, Y. (2020). Prediction of specific TCR-peptide binding from large dictionaries of TCR-peptide pairs. Frontiers in Immunology, 11, 1803.
Springer, I., Tickotsky, N., and Louzoun, Y. (2021). Contribution of T cell receptor alpha and beta CDR3, MHC typing, V and J genes to peptide binding prediction. Frontiers in Immunology, 12.
Szeto, C., Lobos, C. A., Nguyen, A. T., and Gras, S. (2020). TCR recognition of peptide-MHC-I: Rule makers and breakers. International Journal of Molecular Sciences, 22(1), 68.
Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E., and Friedman, N. (2017). McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics, 33(18), 2924–2929.
Tishby, N., Pereira, F. C., and Bialek, W. (2000). The information bottleneck method. arXiv preprint physics/0004057.
Tong, Y., Wang, J., Zheng, T., Zhang, X., Xiao, X., Zhu, X., Lai, X., and Liu, X. (2020). SETE: Sequence-based ensemble learning approach for TCR epitope binding prediction. Computational Biology and Chemistry, 87, 107281.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
Vita, R., Mahajan, S., Overton, J. A., Dhanda, S. K., Martini, S., Cantrell, J. R., Wheeler, D. K., Sette, A., and Peters, B. (2019). The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Research, 47(D1), D339–D343.
Weber, A., Born, J., and Rodriguez Martínez, M. (2021). TITAN: T-cell receptor specificity prediction with bimodal attention networks. Bioinformatics, 37(Supplement_1), i237–i244.
Weininger, D., Weininger, A., and Weininger, J. L. (1989). SMILES. 2. Algorithm for generation of unique SMILES notation. Journal of Chemical Information and Computer Sciences, 29(2), 97–101.
Wong, E., Gold, M., Meermeier, E., Xulu, B., Khuzwayo, S., Sullivan, Z., Mahyari, E., Rogers, Z., Kloverpris, H., Sharma, P., et al. (2019). TRAV1-2 CD8 T-cells including oligoclonal expansions of MAIT cells are enriched in the airways in human tuberculosis. Communications Biology, 2, 203.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20(7), 557–585.
Wu, M. and Goodman, N. (2018). Multimodal generative models for scalable weakly-supervised learning. arXiv preprint arXiv:1802.05335.
Zeng, H. and Gifford, D. K. (2019). Quantification of uncertainty in peptide-MHC binding prediction improves high-affinity peptide selection for therapeutic design. Cell Systems, 9(2), 159–166.
