
IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 19, NO. 1, JANUARY/FEBRUARY 2022 275

AMP0: Species-Specific Prediction of Anti-microbial Peptides Using Zero and Few Shot Learning
Sadaf Gull and Fayyaz Minhas

Abstract—Evolution of drug-resistant microbial species is one of the major challenges to global health. Development of new
antimicrobial treatments such as antimicrobial peptides needs to be accelerated to combat this threat. However, the discovery of novel
antimicrobial peptides is hampered by low-throughput biochemical assays. Computational techniques can be used for rapid screening of
promising antimicrobial peptide candidates prior to testing in the wet lab. The vast majority of existing antimicrobial peptide predictors are
non-targeted in nature, i.e., they can predict whether a given peptide sequence is antimicrobial, but they are unable to predict whether the
sequence can target a particular microbial species. In this work, we have used zero and few shot machine learning to develop a targeted
antimicrobial peptide activity predictor called AMP0. The proposed predictor takes the sequence of a peptide and any N/C-termini
modifications together with the genomic sequence of a microbial species to generate targeted predictions. Cross-validation results show
that the proposed scheme is particularly effective for targeted antimicrobial prediction in comparison to existing approaches and can be
used for screening potential antimicrobial peptides in a targeted manner with only a small number of training examples for novel species.
The AMP0 webserver is available at http://ampzero.pythonanywhere.com.

Index Terms—Antibiotic resistance, antimicrobial peptides, zero/few shot learning, target microbial species

• Sadaf Gull is with the PIEAS Biomedical Informatics Lab, Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, PO Nilore, Islamabad 45650, Pakistan. E-mail: sadafzakarkhan@gmail.com.
• Fayyaz Minhas is with the Department of Computer Science, University of Warwick, Coventry, U.K. and also with the PIEAS Biomedical Informatics Lab, PIEAS, PO Nilore, Islamabad 45650, Pakistan. E-mail: fayyaz.minhas14@alumni.colostate.edu.

Manuscript received 5 Nov. 2019; revised 25 Apr. 2020; accepted 28 May 2020. Date of publication 2 June 2020; date of current version 3 Feb. 2022.
(Corresponding author: Fayyaz Minhas.)
Digital Object Identifier no. 10.1109/TCBB.2020.2999399

1545-5963 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Rijeka Croatia. Downloaded on October 11,2023 at 12:49:22 UTC from IEEE Xplore. Restrictions apply.

1 INTRODUCTION

Antibiotics play a significant role in protecting humans from microbial infections. The discovery and use of antibiotics since the 1930s has helped in treating serious infections and saved many lives [1]. Resistance against antibiotics in microbes was first detected in the 1960s and it has prompted an evolutionary arms race between microbes and antibiotics [2]. Antimicrobial resistance is currently a major global health crisis. The number of deaths due to infections caused by antibiotic resistance annually is increasing and is estimated to reach up to 10 million by 2050 [3]. The World Health Organization (WHO) has generated a list of antibiotic resistant bacterial species that are a major threat to global health and require urgent development of novel therapeutics against them: Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter [4].

To handle the issue of antibiotic resistance, the development of novel antibiotics is necessary [1], [2], [3], [4], [5]. In comparison to the rate of increase in antimicrobial resistance, the pace of discovery or development of new antibiotics is very slow: only two new classes of antibiotics were introduced for clinical use in the last two decades [4]. Consequently, the use of vaccines, lysins, antibodies, probiotics, bacteriophages and antimicrobial peptides (AMPs) is becoming popular in therapeutics as alternatives to antibiotics [1]. For designing new drugs to counter the threat of antimicrobial resistance, the use of AMPs is rapidly gaining attention [1], [6], [7], [8]. AMPs exhibit different biological activities against microbes, e.g., bacteria, viruses, fungi, etc. [1], have higher inhibition rates than antibiotics, and can potentially slow down the evolution of antibiotic resistance as well [8].

In order to develop potent AMPs, a large number of potential AMP candidates need to be tested and evaluated experimentally before entering clinical trials. The prediction of AMPs using machine learning techniques can reduce the cost of identifying AMPs in the wet lab by pre-screening potential antimicrobial peptides. Machine learning and artificial intelligence tools such as Deep Neural Networks (DNN), Support Vector Machines (SVM) [9], [10], [11], [12], [13], [14], etc., are widely used in computational biology for biological discovery [9], [15], [16], [17], [18]. A number of machine learning based AMP predictors are also available in the literature [19], [20], [21], [22], [23], [24]. The primary issue with these non-targeted predictors is that they are unable to predict whether a given peptide sequence will be effective against a given target microbial species or not (see Fig. 1). Only a small number of targeted predictors exist in the literature but they are not able to generate predictions for novel microbial species [25], [26], [27]. Vishnepolsky et al. developed a predictor for 6 different gram-negative bacterial strains [26]. The AMP predictor by Kleandrova

et al. used 70 different gram-negative strains of bacteria in training to predict antimicrobial and cytotoxic activity of individual amino acids in a peptide sequence for different strains [25]. Although they covered a large set of bacterial species, their method can generate predictions only for specific strains of gram-negative bacteria. Unavailability of their predictor for public use is also a limitation [25]. The major drawback of targeted predictors is their inability to predict peptide antimicrobial activity for novel microbial species. However, the prediction of antimicrobial activity of a peptide without knowing the microbial species against which the peptide is effective is not meaningful. As a consequence, the development of effective machine learning based targeted antimicrobial peptide predictors is an open problem. In this work, we propose a novel solution to targeted antimicrobial peptide prediction.

Fig. 1. A general framework of machine learning predictors for (a) non-targeted and (b) targeted predictions.

From a machine learning perspective, one of the issues associated with targeted prediction of anti-microbial peptides is that the number of training examples of peptides associated with a single target species can be quite small. For a novel microbial species, such as a novel virus, no or very few peptides that target it will typically be known. In this work, we overcome this limitation through the use of zero and few shot machine learning (ZSL and FSL). Conventional machine learning methods typically require a substantial number of training examples for a given predictive task and some training examples of all classes must be included in the training set. However, zero shot learning algorithms are specifically designed to predict the association of a given example to a novel class for which no training examples are available [28]. This is achieved by using a feature vector representation of the example and an attribute descriptor of the novel class. Many variants of ZSL strategies have been proposed and are being used in the field of machine learning [29], [30], [31], [32], [33], [34], [35]. Similarly, few shot learning techniques can generate accurate predictions for a given class by using a minimal number of training examples [36]. Different techniques for FSL have also been proposed and their results are far better than conventional machine learning models for learning novel tasks [37], [38], [39], [40], [41]. Both ZSL and FSL have been used in practice in different domains with significant results, such as kinase prediction [42], classification of histological images [43], object recognition [35], [40], [44], video classification [45], [46] and transfer learning [47], [48].

This work focuses on using ZSL/FSL for prediction of anti-microbial peptides. Specifically, we have developed a targeted anti-microbial peptide predictor called AMP0 that can predict the effectiveness of a given peptide against a specific species by considering the amino acid sequence of the peptide and the genomic sequence of the target species through zero and few shot learning.

2 METHODS

2.1 Data Collection and Preprocessing

We have used the Database of Antimicrobial Activity and Structure of Peptides (DBAASP version 2) for training and evaluation of different machine learning models in this work [49]. DBAASP has been widely used in recent studies in this field [25], [26], [27], [50], [51]. It contains a total of 12,984 peptide sequences and their experimentally verified minimum inhibitory concentrations (MICs) against various target microbial species. We have used peptides with length greater than 5 amino acids whose experimentally validated MICs are available in micromolar (µM) or micrograms per milliliter (µg/mL). We also ensured that the genomes of the target species are available in NCBI [52] and that each peptide in our dataset has at least one target species for which its MIC is ≤ 25 µg/mL [26]. The details of different filtration stages to extract the dataset of our interest are given in Table 1. DBAASP reports the effectiveness of a peptide sequence against multiple strains of a microbial species. We have taken the minimum MIC of a peptide across different strains of a species as its MIC against that species. All MIC values have been converted to µg/mL [25]. Our final dataset comprises 5,710 peptides that are effective against a total of 336 different microbial species. The details of individual peptides and their MICs against their target species are given in supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2020.2999399.

TABLE 1
Filtering Criteria Applied to DBAASP Database to Obtain Required Dataset

Filtering criteria                                        Number of peptides
DBAASP monomer peptides                                   12,984
Sequences with length > 5                                 12,517
Sequences with microbial targets (excluding cancers)      9,890
Sequences with MIC in (µM) or (µg/mL)                     8,045
Sequences with target species genomes in NCBI [52]        8,025
Sequences with at least one target MIC ≤ 25 µg/mL         5,710

As an additional preprocessing step, we have scaled the target scores using a sigmoidal curve such that MIC scores ≤ 25 µg/mL are mapped close to +1 and those ≥ 100 µg/mL are mapped close to −1 (see Fig. 2). For this purpose, we have utilized a sigmoid rescaling function which maps raw MIC scores $y'$ to target values $y$ as follows:

$$y = \sigma\left(-\frac{y' - 55}{10}\right), \quad \text{with} \quad \sigma(z) = \frac{2e^{z}}{1 + e^{z}} - 1.$$

This rescaling ensures that subsequent processing and machine learning models are not affected by large variations

in MICs across different target species and peptides which can vary from a few µg/mL to more than 2000 µg/mL. If the MIC of a peptide is not known for a species, its rescaled score is set at 0.0.

Fig. 2. Rescaling MICs using bipolar sigmoid function.

2.2 Feature Extraction

To predict antimicrobial activity of a peptide against a given species through machine learning, we need features of the peptide and of the genomic sequence of the target microbial species as discussed below (see Fig. 3).

Fig. 3. Proposed model framework using features of peptide and genomic sequences.

2.2.1 Amino Acid Sequence Features

In order to obtain peptide-level features, we have used both 1-mer and 2-mer composition of the peptide sequence. 1-mer composition results in a 40-dimensional feature vector (frequency count of 20 L-amino acids and 20 D-amino acids). The feature representation models the type of amino acid (L and D) in the peptide sequence separately as peptide bioactivity is dependent upon the type of amino acids [53], [54], [55], [56]. The resulting feature vector for a given peptide is normalized to unit norm. We have also analyzed 2-mer composition which results in a $40^2 = 1600$-dimensional feature vector [57]. DBAASP [49] also provides information about N-terminus and C-terminus modifications of peptides which can play a significant role in their antimicrobial activity. Modification at N-terminus and C-terminus of peptides can change their biological activity [58]. We have used one-hot encoding to capture information about C- and N-terminus modifications in our feature representation. The sequence features are concatenated with C and N termini features. Details about the different types of C and N termini modifications are given in supplementary information, available online.

2.2.2 Genomic Features

In order to perform targeted prediction of antimicrobial activity of a peptide sequence against a particular species through machine learning, we need to extract species-level features as well. The literature reports the use of mono, di, tri and tetra-nucleotide composition of genomic sequences for comparison or clustering of genomes [59], [60], [61], [62], [63], [64], [65], [66]. As a consequence, we have extracted features from complete genomes of species downloaded from NCBI [52]. For feature extraction the counts of 1-mer, 2-mer, 3-mer and 4-mer are calculated from a given genome sequence and normalized to unit norm resulting in a 340-dimensional (4 + 16 + 64 + 256) feature representation of a given genome.

2.3 Prediction Models

To predict whether a given peptide sequence will be effective against a target microbial species or not, we have proposed a zero-shot machine learning model. We compare the proposed model to conventional machine learning models as baseline as discussed below. In order to aid the reader in understanding our modeling approach for baseline and proposed predictors, we denote a peptide sequence by its d-dimensional feature vector $\mathbf{x}_i$, $i = 1, \ldots, 5710$ whereas a particular microbial species is represented by an a-dimensional attribute vector $\mathbf{s}_j$, for $j = 1, \ldots, 336$ based on its genomic sequence. We denote the rescaled MIC of a peptide $\mathbf{x}_i$ against species $\mathbf{s}_j$ by the target variable $y_{ij}$. The prediction problem can then be expressed as finding a mathematical function $f(\mathbf{x}_i, \mathbf{s}_j; \Theta)$, parameterized by learnable parameters $\Theta$, that can predict the effectiveness of a peptide sequence $\mathbf{x}_i$ against microbial species $\mathbf{s}_j$. Below we give details of different machine learning models used in this work.

2.3.1 Baseline Models

We have used Radial Basis Function SVM [67], Gradient Boosted Tree classifier (XGBoost) [68], Neural networks [69] and K-nearest neighbor classification [70] as baseline models. In order to predict the effectiveness of a given peptide sequence against a microbial species, we construct a joint feature representation $\boldsymbol{\phi}_{ij} = \begin{bmatrix} \mathbf{x}_i \\ \mathbf{s}_j \end{bmatrix}$ by concatenating peptide and species level features with the associated training label $y_{ij}$ set to +1 (antimicrobial) if the MIC of peptide $\mathbf{x}_i$ for species $\mathbf{s}_j$ is ≤ 25 µg/mL and −1 (non-antimicrobial) if the MIC is ≥ 100 µg/mL. For each of the baseline models, we tuned the hyper-parameters through cross-validation; the selected values are given in the supplementary material, available online.

2.3.2 Zero and Few Shot Learning

In this work, we model the problem of targeted antimicrobial activity prediction through zero shot learning (ZSL) [35]. Widely used in object classification and computer vision, ZSL allows a classification model to generate predictions for novel classes which are not available at training time [31], [32], [33]. This is achieved by learning the definition of a class through an attribute vector representation instead of predicting class labels directly as in conventional classification. Many variants of ZSL have been proposed in the literature [29], [30], [31], [32], [33], [34], [35]. While ZSL assumes that no examples of a novel class presented during testing are available for training, the related case of few-shot learning aims at building a machine learning model when only a few training examples are available for the target class [37], [38], [39], [40], [41]. Few Shot Learning (FSL) techniques perform significantly better than conventional classification methods when the number of training examples is very small [37], [38], [39].
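Before turning to the learning formulation, the preprocessing of Section 2.1 and the composition features of Section 2.2 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the code used in this work: the function names are hypothetical, D-amino acids are assumed to be written as lowercase letters, characters outside the respective alphabets are skipped, and the unit-norm scaling of the genome features is assumed to apply to the concatenated 340-dimensional vector. The rescaling uses the identity 2e^z/(1+e^z) − 1 = tanh(z/2).

```python
import itertools
import numpy as np

# Assumption: uppercase letters denote L-amino acids, lowercase D-amino acids.
AMINO_L = "ACDEFGHIKLMNPQRSTVWY"
AMINO_ALPHABET = AMINO_L + AMINO_L.lower()  # 40 symbols in total


def rescale_mic(mic_ug_ml):
    """Bipolar sigmoid rescaling of a raw MIC (in ug/mL) to the range (-1, 1)."""
    z = -(np.asarray(mic_ug_ml, dtype=float) - 55.0) / 10.0
    return np.tanh(z / 2.0)  # identical to 2*e^z / (1 + e^z) - 1


def unit_norm(v):
    """Scale a vector to unit Euclidean norm (all-zero vectors are returned as-is)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def peptide_1mer(seq):
    """40-dimensional 1-mer composition of a peptide, scaled to unit norm."""
    index = {aa: i for i, aa in enumerate(AMINO_ALPHABET)}
    counts = np.zeros(len(AMINO_ALPHABET))
    for ch in seq:
        if ch in index:  # unknown symbols are skipped (assumption)
            counts[index[ch]] += 1
    return unit_norm(counts)


def genome_kmer_features(genome, ks=(1, 2, 3, 4)):
    """Concatenated 1- to 4-mer counts over A/C/G/T (4+16+64+256 = 340 values)."""
    genome = genome.upper()
    parts = []
    for k in ks:
        kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
        index = {kmer: i for i, kmer in enumerate(kmers)}
        counts = np.zeros(len(kmers))
        for start in range(len(genome) - k + 1):
            j = index.get(genome[start:start + k])
            if j is not None:  # windows containing e.g. 'N' are skipped
                counts[j] += 1
        parts.append(counts)
    # Assumption: the whole concatenated vector is scaled to unit norm.
    return unit_norm(np.concatenate(parts))
```

Under this sketch, `rescale_mic(25)` is about 0.905 and `rescale_mic(100)` is about −0.978, in line with the intended mapping of low MICs towards +1 and high MICs towards −1.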
The problem of targeted antimicrobial activity prediction is ideally suited to zero and few shot learning: in typical machine learning guided design of wet lab experiments for screening potential peptides that are effective against a target microbial species, no or very few peptides with known labels are available for training. Furthermore, in order to predict the effectiveness of a peptide against a novel microbial species for which no or very few training examples are available, we can model the target microbial species as a class represented by an attribute vector based on its genomic sequence. In this work, we have used the ZSL scheme proposed by Romera-Paredes and Torr [35]. For predicting the effectiveness of a peptide sequence for a target species, the discriminant function used by the ZSL model of Romera-Paredes and Torr [35] can be written as $f(\mathbf{x}_i, \mathbf{s}_j; \Theta) = \mathbf{x}_i^T \Theta \mathbf{s}_j$ with the learnable weight matrix $\Theta \in \mathbb{R}^{d \times a}$. If the numbers of peptides and species (classes) available during training are denoted by m and z, respectively, and the rescaled MIC scores for each peptide against each microbe are represented by the m × z matrix $Y \in [-1, 1]^{m \times z}$, the learning problem for ZSL can be formulated as the following optimization problem:

$$\Theta^* = \arg\min_{\Theta \in \mathbb{R}^{d \times a}} \left\| X^T \Theta S - Y \right\|_F^2 + \left( \gamma \left\| \Theta S \right\|_F^2 + \lambda \left\| X^T \Theta \right\|_F^2 + \gamma\lambda \left\| \Theta \right\|_F^2 \right).$$

Here, $X \in \mathbb{R}^{d \times m}$ and $S \in \mathbb{R}^{a \times z}$ represent matrices of all peptide features (m examples each with a d-dimensional feature vector) and attributes of microbial species (z classes each with a attributes), respectively. The first term represents the loss function with the aim of minimizing the error between predicted and target MICs. The second term ($\gamma \|\Theta S\|_F^2 + \lambda \|X^T \Theta\|_F^2 + \gamma\lambda \|\Theta\|_F^2$) is the regularization factor that ensures smoothness of the prediction function $f(\mathbf{x}, \mathbf{s}; \Theta)$ and shrinks the weight matrix $\Theta$ through penalization of the squared Frobenius norm $\|\cdot\|_F^2$ of the respective matrices. $\gamma$ and $\lambda$ are regularization hyper-parameters. In addition to better performance over benchmark datasets, another reason for choosing this ZSL implementation is the existence of a computationally efficient closed-form solution of its underlying optimization problem which can be written as follows:

$$\Theta^* = \left( X X^T + \gamma I \right)^{-1} X Y S^T \left( S S^T + \lambda I \right)^{-1}.$$

Once the optimal weight matrix $\Theta^*$ has been obtained, the predictions for a peptide (represented by the feature vector $\mathbf{x}$) for a species (represented by the attribute vector $\mathbf{s}$) can be generated by the decision function $f(\mathbf{x}, \mathbf{s}; \Theta^*) = \mathbf{x}^T \Theta^* \mathbf{s}$. Note that this decision function can be used for generating predictions both for novel peptides and novel species provided their attribute representation $\mathbf{s}$ is available. The most likely target species for a given peptide can be identified by simply ranking the resulting decision function scores across a given list of potential target species.

This formulation can be kernelized for non-linear kernels by applying the Representer theorem to the underlying optimization problem [35]. For this purpose, an m × m sized kernel matrix $K$ with $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ is computed over the training data using a kernel function such as the radial basis function (RBF) $k(\mathbf{a}, \mathbf{b}) = \exp(-\kappa \|\mathbf{a} - \mathbf{b}\|^2)$ with the hyper-parameter $\kappa > 0$. The closed form solution of the kernelized ZSL optimization problem requires calculation of an (m × a) dimensional instance-attribute association matrix $A$ from training data as follows (see [35] for details):

$$A = \left( K^T K + \gamma I \right)^{-1} K Y S^T \left( S S^T + \lambda I \right)^{-1}.$$

For inference or prediction of effectiveness of a peptide represented by a feature vector $\mathbf{x}$ against a microbial species represented by its attribute vector $\mathbf{s}$, an m-dimensional vector of kernel scores $\mathbf{k}(\mathbf{x}) = [\, k(\mathbf{x}, \mathbf{x}_1) \;\; k(\mathbf{x}, \mathbf{x}_2) \;\; \cdots \;\; k(\mathbf{x}, \mathbf{x}_m) \,]^T$ of the test example with each training example is computed and used in the kernelized prediction function $f(\mathbf{x}, \mathbf{s}; A) = \mathbf{k}(\mathbf{x})^T A \mathbf{s}$.

It is important to note that this framework extends seamlessly to FSL by simply adding further training instances for a target class. The hyperparameters of the model $(\gamma, \lambda, \kappa)$ are tuned through cross-validation. The best performance of the model was found using $\gamma = 2.0$, $\lambda = 0.0001$, and the hyperparameter $\kappa$ of the RBF kernel set to 2.0.

2.4 Performance Evaluation

We consider two practical use-cases of our system: 1) Target Species Ranking (TSR): given a set of microbial species for which labeled peptide sequences are available for training, predict the microbe that is most likely to be targeted by a novel peptide sequence and, 2) Peptide Activity Prediction for Novel Species (PAP): predict whether a peptide is effective against a given species or not such that no or very few peptide examples for that species are available during training (i.e., Zero Shot or Few Shot Learning scenario) (see Fig. 4). It is important to note that both these scenarios reflect practical use cases for biologists who are interested in machine-learning guided discovery of targeted antimicrobial peptides.

Fig. 4. (a) TSR requires a novel peptide sequence and predicts the microbe that is most likely to be targeted by that peptide (out of 336 given species); (b) PAP takes inputs of a peptide sequence and a novel species genome to predict whether a peptide is effective against a given species or not.

2.4.1 Target Species Ranking

In order to evaluate the performance of baseline and proposed machine learning models for TSR, we have used 5-fold and 10-fold cross validation [71]. In 5 (10)-fold cross-validation, the dataset of 5,710 peptides is divided into 5 (10) non-overlapping folds. A given model is trained on labeled examples of all peptides in all but one of the folds and tested on the remaining peptides. This process is then repeated for each fold. For each test peptide in a fold, model scores for all 336 species are sorted in descending order. The rank of the highest scoring microbe that is a known target of the given test peptide (positive example) is used as a peptide-specific performance metric. This simple biologist-centric performance metric, called Rank of First Positive Prediction (RFPP), is based on the premise that an ideal machine learning model should assign a high score to a known target species of a given peptide sequence and, consequently, place target species at numerically lower ranks (closer to the top of the sorted list) than non-target species [72]. As a result, for an ideal machine learning model, the RFPP for all test peptides should be 1.0. As discussed in the results section, we report the percentile-wise RFPP scores for all test peptides for different machine learning models together with a random predictor as experimental control. The RFPP score at a certain percentile p, henceforth denoted by RFPP(p), is defined as follows: RFPP(p) = q, if p% of test peptides have at least one known target microbial species among their top q predictions (out of 336). Thus, for an ideal classifier RFPP(100) = 1, i.e., for every peptide, the top scoring species is a real target species of the given test peptide. RFPP is a biologist-centric metric as it tells us directly how often top-ranking predictions of a peptide can be expected to correspond to true target species and it can be directly used in experiment design.

Fig. 5. Percentile-wise RFPP Scores.

2.4.2 Peptide Activity Prediction

For PAP, i.e., predicting a peptide's effectiveness for a novel species, our proposed modeling approach takes peptide and genomic sequences as input and the score generated by the decision function of a machine learning model is used for classification of peptide sequences for individual species. In order to quantify predictive accuracy, a selected set of 17 test species from DBAASP with a small but sufficient number (75-180) of known positive and negative peptide examples is used (details given in Table 3). For ZSL, the model is trained on all examples from other species and its predictive performance is evaluated for individual species using area under the receiver operating characteristic curve (AUC-ROC) as a performance metric [73]. For few-shot learning (FSL), a few positive and negative examples of a test species (1, 2, 4, 8 and half of all available examples for that species) are randomly sampled for training together with all examples from all other species and the model is evaluated on the remaining examples of the test species. This process is repeated 20 times with different species-level training and test examples to get average AUC-ROC scores and their standard deviation. The performance of the proposed method is compared to classical machine learning models as well as to existing state of the art non-targeted antimicrobial activity predictors (CAMP [22], [74] and AMAP [19]).

2.4.3 Robustness Analysis

In order to analyze the importance of different features and their biological significance, we have analyzed the performance of different machine learning models with both 1-mer and 2-mer peptide features. To gain further insight into the role of various feature components, we have also plotted the corresponding weight values of the learned parameter matrix $\Theta$. The magnitude of a particular weight parameter reflects the relative importance of its corresponding feature.

To study the impact of sequence similarity between training and test peptides on prediction accuracy, we have also performed a non-redundant cross-validation analysis. Specifically, we have used CD-Hit [75] to cluster the 5,710 training peptides into 329 clusters based on a sequence identity threshold of 40 percent and performed cross-validation in a non-redundant manner such that all peptides belonging to a single cluster are always in the same fold. This effectively limits the effect of homology in prediction.

We have also analyzed the impact of genomic differences between training and test species on model performance. For this purpose, we first define and calculate the genomic distance between two species as the Euclidean distance between their respective genomic feature representations. Then, for a given species at test time, we calculate its genomic distance from its closest species which has at least T examples in training (for T = 1 and T = 100). The correlation between prediction accuracy of a given species (in terms of AUC-ROC) and its genomic distance to training species is then determined.

In order to ensure that the proposed TSR model is robust to false positives, we have also evaluated its performance by effectively doubling the number of negative test examples. Specifically, for each test species with a given number of negative examples, an equal number of randomly generated test peptide sequences are added as additional negative examples such that the randomly generated peptide sequences follow the same length distribution as the original dataset.

3 RESULTS

In this section, we discuss the results for the two learning tasks below.
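Since the results that follow are reported as percentile-wise RFPP scores, the metric of Section 2.4.1 can be made concrete with a short sketch. The function names and the toy score matrix are hypothetical, and RFPP(p) is read here as the smallest q such that at least p% of test peptides have a known target among their top q predictions, which is one reasonable interpretation of the definition rather than the authors' exact code.

```python
import numpy as np


def rfpp_per_peptide(scores, is_target):
    """1-based rank of the best-ranked known target species for each peptide.

    scores:    n x z array of model scores (higher means more likely target).
    is_target: n x z boolean array marking known targets; each row is assumed
               to contain at least one True entry.
    """
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    order = np.argsort(-scores, axis=1, kind="stable")     # descending by score
    hits = np.take_along_axis(is_target, order, axis=1)    # targets in ranked order
    return hits.argmax(axis=1) + 1                         # first True, 1-based


def rfpp_at_percentile(ranks, p):
    """Smallest q such that at least p% of peptides have a target in their top q."""
    ranks = np.sort(np.asarray(ranks))
    k = max(int(np.ceil(p / 100.0 * len(ranks))), 1)
    return int(ranks[k - 1])


# Toy example: 4 test peptides scored against 3 species.
scores = [[0.9, 0.1, 0.5],
          [0.2, 0.8, 0.3],
          [0.1, 0.2, 0.7],
          [0.6, 0.5, 0.4]]
targets = [[1, 0, 0],
           [1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]]
ranks = rfpp_per_peptide(scores, targets)
```

For this toy input the per-peptide ranks are 1, 3, 2 and 3, so `rfpp_at_percentile(ranks, 50)` is 2 and `rfpp_at_percentile(ranks, 100)` is 3; an ideal model would give 1 at every percentile.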

3.1 Target Species Ranking (TSR)

Fig. 5 shows the percentile-wise RFPP scores for all classifiers. As discussed in Section 2.4, the ideal RFPP score for all peptides is 1.0. For the random classifier that generates a random score for a given example, the median RFPP is 75, i.e., for 50 percent of test peptides in cross-validation, a true target species is within the top 75 (out of 336) predictions. In contrast, for SVM and XGBoost baseline models, the median RFPPs are 50 and 9, respectively. However, the proposed model performs much better than these baseline models: the RFPP for the proposed model at the 75th percentile is 1.0, i.e., for up to 75 percent of peptides, the top prediction by the model is correct. This clearly shows the effectiveness of the proposed prediction scheme for identifying the correct target species of a peptide.

A numeric comparison of different classification methods with 1-mer and 2-mer peptide features through 5-fold cross-validation is given in Table 2. The 2-mer representation works well for SVM and XGBOOST classifiers whereas, for nearest neighbor and neural network models, 1-mer feature representation gives better results. For the proposed ZSL model, the 2-mer representation gives better predictive accuracy as shown in Table 2. The supplementary material, available online, contains more detailed comparative results over 10-fold cross-validation, non-redundant cross-validation and additional random peptide negative examples. Change in the number of cross-validation folds has a marginal impact on predictive performance of the proposed model. Our non-redundant cross-validation analysis shows that the proposed model is significantly better than other machine learning methods in predicting the target species of peptides when the test peptides share less than 40 percent sequence identity with training peptides. Furthermore, the proposed scheme is also robust to additional random negative test examples. These analyses clearly show the effectiveness of the proposed model in comparison to a variety of classical machine learning methods.

TABLE 2
RFPP Prediction Scores Generated by Various Baseline and Proposed Models

Percentile   SVM     SVM     XGBOOST  XGBOOST  1-nearest neighbor  Neural network  ZSL     ZSL
             1-mer   2-mer   1-mer    2-mer    1-mer               1-mer           1-mer   2-mer
0            1       1       1        1        1                   1               1       1
1            1       1       1        1        2                   3               1       1
5            3       3       1        1        4                   10              1       1
10           6       6       1        1        7                   18              1       1
25           23      15      2        1        12                  34              1       1
50           65      50      9        6        24                  64              2       2
75           129     112     40       30       114                 115             5       3
90           176     165     161      139      162                 184             37      23
95           218     213     248      243      217                 233             134     113
99           277     284     328      324      301                 302             308     298
100          333     336     336      336      336                 336             336     336

3.2 Peptide Activity Prediction for Novel Species (PAP)

Table 3 shows the results of various machine learning models for the Peptide Activity Prediction for Novel Species (PAP) task. In this task the objective is to evaluate whether a given machine learning model can correctly predict peptides that target a novel species for which no or very few training examples are available. For this purpose, we compare the performance of conventional machine learning models (SVM, XGBoost), the proposed Zero Shot Learning (ZSL) and Few Shot Learning (FSL) models in addition to existing state of the art non-targeted antimicrobial activity predictors (CAMP [22], [74] and AMAP [19]). For this use case, XGBoost with amino acid composition features performed significantly better than SVM (results not shown for brevity). However, the prediction performance of XGBoost was typically no better than a random classifier especially when the number of training examples from a given test species was very small (see Supplementary Information, available online, for complete results). Existing state of the art methods such as CAMP [22] and AMAP [19] do not give satisfactory predictive performance for the chosen species. In contrast, the proposed few shot learning model performs significantly better with an expected increase in prediction accuracy when the number of training examples of a species is increased. The supplementary material, available online, contains more detailed results in which we analyze the impact of genomic distance between train and test species on predictive performance of the proposed model. This analysis shows that there is a negative correlation (Pearson correlation score of −0.3) between predictive accuracy and genomic distance, i.e., as expected, if the test species is similar to a training species, the predictions can be expected to be more accurate. However, this analysis also

TABLE 3
Results for Peptide Activity Prediction for Novel Species
For each species the number of positive (P) and negative (N) examples is given together with its average test AUC-ROC (with standard deviation in parentheses).

Fig. 6. Weights associated with different 1-mer features.

cancer cells. Tryptophan (W) can penetrate a microbial cell membrane and is effective against numerous antibiotic resis-
shows that the proposed model does not undergo an abrupt tant bacteria. Phenylalanine-rich (F) AMPs have higher anti-
degradation in predictive performance when generating pre- microbial activity against Gram-positive bacteria, Gram-
dictions for test species that are different from species used in negative bacteria and yeast without hemolytic activity [76].
training. Cysteine (C) is also an important amino acid in natural antimi-
crobial peptides of vertebrates, invertebrates and plants [77],
have excessive ability of pore formation in a membrane which
3.3 Feature Analysis leads to high antimicrobial activity [76]. The supplementary
In order to gain an insight into the importance of various fea- material, available online shows plots of the parameter matrix
tures used by the linear ZSL model, we have analyzed the for genomic feature vector components and their association
weight values of the parameter matrix Q . Fig. 6 shows the with peptide sequence features.
sum of the weight values for each L- (small) and D- type (cap-
tialized) amino acid in the feature representation across all
species. The large magnitudes of weights of amino acids G, g, 3.4 Webserver
F, f, P, p, and w correlates with literature findings about the The webserver developed for proposed model is available
importance of these amino acids in AMPs. Specifically, the at the URL: http://ampzero.pythonanywhere.com. The
Proline-rich peptides (P) have capability of bacterial cell pene- webserver takes a peptide sequence along with any N/C-
tration. Glycine (G) improves antimicrobial activity of pepti- terminus modifications as input together with the genome
des and potentially targets fungi, Gram-negative bacteria, and of a species in order to predict the effectiveness of the
Authorized licensed use limited to: University of Rijeka Croatia. Downloaded on October 11,2023 at 12:49:22 UTC from IEEE Xplore. Restrictions apply.
282 IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 19, NO. 1, JANUARY/FEBRUARY 2022

peptide against the given species. The user can upload a list of known positive and negative example peptide sequences for the given species for FSL predictions.

4 CONCLUSIONS

We have developed a targeted antimicrobial activity predictor called AMP0 which can predict the effectiveness of a given peptide sequence against a target species. The use of zero and few shot learning in the proposed model helps in overcoming the shortcomings of conventional machine learning techniques for this purpose. Our cross-validation analysis shows that the proposed model can perform better than existing approaches and that it can be easily integrated into the experimental discovery of antimicrobial peptide sequences for novel species.

ACKNOWLEDGMENTS

Sadaf Gull is supported by a grant under the indigenous 5000 PhD fellowship scheme by the Higher Education Commission (HEC) of Pakistan.

REFERENCES

[1] B. Aslam et al., “Antibiotic resistance: A rundown of a global crisis,” Infection Drug Resistance, vol. 11, 2018, Art. no. 1645.
[2] C. L. Ventola, “The antibiotic resistance crisis: Part 1: Causes and threats,” Pharmacy Ther., vol. 40, no. 4, 2015, Art. no. 277.
[3] J. M. Blair, “A climate for antibiotic resistance,” Nature Climate Change, vol. 8, no. 6, 2018, Art. no. 460.
[4] M. Lakemeyer, W. Zhao, F. A. Mandl, P. Hammann, and S. A. Sieber, “Thinking outside the box—Novel antibacterials to tackle the resistance crisis,” Angewandte Chemie Int. Edition, vol. 57, no. 44, pp. 14440–14475, 2018.
[5] C. N. Spaulding, R. D. Klein, H. L. Schreiber, J. W. Janetka, and S. J. Hultgren, “Precision antimicrobial therapeutics: The path of least resistance?,” NPJ Biofilms Microbiomes, vol. 4, no. 1, 2018, Art. no. 4.
[6] F. Kampshoff, M. D. Willcox, and D. Dutta, “A pilot study of the synergy between two antimicrobial peptides and two common antibiotics,” Antibiotics, vol. 8, no. 2, 2019, Art. no. 60.
[7] F. Costa, C. Teixeira, P. Gomes, and M. C. L. Martins, “Clinical application of AMPs,” in Antimicrobial Peptides, Springer, 2019, pp. 281–298.
[8] G. Yu, D. Y. Baeder, R. R. Regoes, and J. Rolff, “Predicting drug resistance evolution: Insights from antimicrobial peptides and antibiotics,” Proc. Roy. Soc. B: Biol. Sci., vol. 285, no. 1874, 2018, Art. no. 20172687.
[9] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, “Combining heterogeneous data sources for accurate functional annotation of proteins,” BMC Bioinf., vol. 14, 2013, Art. no. S10.
[10] T. L. Campos, P. K. Korhonen, R. B. Gasser, and N. D. Young, “An evaluation of machine learning approaches for the prediction of essential genes in eukaryotes using protein sequence-derived features,” Comput. Struct. Biotechnol. J., vol. 17, pp. 785–796, 2019.
[11] M. Kulmanov, M. A. Khan, and R. Hoehndorf, “DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier,” Bioinformatics, vol. 34, no. 4, pp. 660–668, 2017.
[12] A. S. Rifaioglu, T. Doğan, M. J. Martin, R. Cetin-Atalay, and V. Atalay, “DEEPred: Automated protein function prediction with multi-task feed-forward deep neural networks,” Sci. Rep., vol. 9, no. 1, Art. no. 7344, 2019.
[13] R. Fa, D. Cozzetto, C. Wan, and D. T. Jones, “Predicting human protein function with multi-task deep neural networks,” PloS One, vol. 13, no. 6, 2018, Art. no. e0198216.
[14] S. Hua and Z. Sun, “Support vector machine approach for protein subcellular localization prediction,” Bioinformatics, vol. 17, no. 8, pp. 721–728, 2001.
[15] Z. Teng, M. Guo, Q. Dai, C. Wang, J. Li, and X. Liu, “Computational prediction of protein function based on weighted mapping of domains and GO terms,” BioMed. Res. Int., vol. 2014, pp. 1–9, 2014.
[16] P. Radivojac et al., “A large-scale evaluation of computational protein function prediction,” Nature Methods, vol. 10, no. 3, 2013, Art. no. 221.
[17] A. Valencia, “Automatic annotation of protein function,” Curr. Opinion Struct. Biol., vol. 15, no. 3, pp. 267–274, 2005.
[18] B. Rost, J. Liu, R. Nair, K. O. Wrzeszczynski, and Y. Ofran, “Automatic prediction of protein function,” Cellular Mol. Life Sci. CMLS, vol. 60, no. 12, pp. 2637–2650, 2003.
[19] S. Gull, N. Shamim, and F. Minhas, “AMAP: Hierarchical multi-label prediction of biologically active and antimicrobial peptides,” Comput. Biol. Med., vol. 107, pp. 172–181, 2019.
[20] P. Bhadra, J. Yan, J. Li, S. Fong, and S. W. Siu, “AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest,” Sci. Rep., vol. 8, no. 1, 2018, Art. no. 1697.
[21] M. Torrent, V. M. Nogues, and E. Boix, “A theoretical approach to spot active regions in antimicrobial proteins,” BMC Bioinf., vol. 10, no. 1, 2009, Art. no. 373.
[22] F. H. Waghu, R. S. Barai, P. Gurung, and S. Idicula-Thomas, “CAMPR3: A database on sequences, structures and signatures of antimicrobial peptides,” Nucleic Acids Res., vol. 44, no. D1, pp. D1094–D1097, 2015.
[23] W. Lin and D. Xu, “Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types,” Bioinformatics, vol. 32, no. 24, pp. 3745–3752, 2016.
[24] P. Agrawal and G. P. Raghava, “Prediction of antimicrobial potential of a chemically modified peptide from its tertiary structure,” Front. Microbiol., vol. 9, 2018, Art. no. 2551.
[25] V. V. Kleandrova, J. M. Ruso, A. Speck-Planche, and M. N. Dias Soeiro Cordeiro, “Enabling the discovery and virtual screening of potent and safe antimicrobial peptides. simultaneous prediction of antibacterial activity and cytotoxicity,” ACS Combinatorial Sci., vol. 18, no. 8, pp. 490–498, 2016.
[26] B. Vishnepolsky et al., “Predictive model of linear antimicrobial peptides active against gram-negative bacteria,” J. Chem. Inform. Model., vol. 58, no. 5, pp. 1141–1151, 2018.
[27] A. Speck-Planche, V. V. Kleandrova, J. M. Ruso, and M. DS Cordeiro, “First multitarget chemo-bioinformatic model to enable the discovery of antibacterial peptides against multiple gram-positive pathogens,” J. Chem. Inform. Model., vol. 56, no. 3, pp. 588–598, 2016.
[28] H. Larochelle, D. Erhan, and Y. Bengio, “Zero-data learning of new tasks,” in Proc. 23rd Nat. Conf. Artif. Intell. - Vol. 2, 2008, pp. 646–651, Accessed: Apr. 25, 2020. [Online].
[29] M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, “Zero-shot learning with semantic output codes,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2009, pp. 1410–1418.
[30] Z. Zhang and V. Saligrama, “Zero-shot learning via semantic similarity embedding,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4166–4174.
[31] R. Socher, M. Ganjoo, C. D. Manning, and A. Ng, “Zero-shot learning through cross-modal transfer,” in Proc. IEEE Int. Conf. Neural Information Processing Systems, 2013, pp. 935–943.
[32] M. Norouzi et al., “Zero-shot learning by convex combination of semantic embeddings,” 2013, arXiv:1312.5650.
[33] Y. Fu, T. M. Hospedales, T. Xiang, and S. Gong, “Transductive multi-view zero-shot learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 11, pp. 2332–2345, Nov. 2015.
[34] E. Kodirov, T. Xiang, and S. Gong, “Semantic autoencoder for zero-shot learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3174–3183.
[35] B. Romera-Paredes and P. Torr, “An embarrassingly simple approach to zero-shot learning,” in Proc. Int. Conf. Mach. Learn., 2015, pp. 2152–2161.
[36] L. Fei-Fei, R. Fergus, and P. Perona, “One-shot learning of object categories,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 594–611, Apr. 2006.
[37] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 4077–4087.
[38] F. Sung et al., “Learning to compare: Relation network for few-shot learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1199–1208.
[39] S. Gidaris and N. Komodakis, “Dynamic few-shot visual learning without forgetting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4367–4375.
[40] V. Garcia and J. Bruna, “Few-shot learning with graph neural networks,” 2017, arXiv:1711.04043.
[41] S. Ravi and H. Larochelle, “Optimization as a model for few-shot learning,” in Proc. Int. Conf. Learn. Representations, vol. 1, p. 6, 2017.
[42] I. Deznabi, B. Arabaci, M. Koyutürk, and O. Tastan, “DeepKinZero: zero-shot learning for predicting kinase-phosphosite associations involving understudied kinases,” BioRxiv, p. 670638, 2019. doi: 10.1101/670638.
[43] M. Mendieta and D. Romero, “A cross-modal transfer approach for histological images: A case study in aquaculture for disease identification using zero-shot learning,” in Proc. IEEE 2nd Ecuador Techn. Chapters Meeting, 2017, pp. 1–6.
[44] X. Sun, H. Xv, J. Dong, H. Zhou, C. Chen, and Q. Li, “Few-shot Learning for Domain-specific Fine-grained Image Classification,” IEEE Trans. Ind. Electronics, to be published, doi: 10.1109/TIE.2020.2977553.
[45] K. Cao, J. Ji, Z. Cao, C.-Y. Chang, and J. C. Niebles, “Few-shot video classification via temporal alignment,” 2019, arXiv:1906.11415.
[46] M. Matsuki and S. Inoue, “Toward projection learning between sensor data and semantic word vector for zero-shot learning,” in Proc. Joint 8th Int. Conf. Inform. Electronics Vis., 3rd Int. Conf. Imag. Vis. Pattern Recognit., 2019, pp. 108–111.
[47] B. Liu, X. Wang, M. Dixit, R. Kwitt, and N. Vasconcelos, “Feature space transfer for data augmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 9090–9098.
[48] Z. Luo, Y. Zou, J. Hoffman, and L. F. Fei-Fei, “Label efficient learning of transferable representations across domains and tasks,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 165–177.
[49] M. Pirtskhalava et al., “DBAASP v. 2: An enhanced database of structure and antimicrobial/cytotoxic activity of natural and synthetic peptides,” Nucleic Acids Res., vol. 44, no. D1, pp. D1104–D1112, 2015.
[50] M. Youmans, C. Spainhour, and P. Qiu, “Long short-term memory recurrent neural networks for antibacterial peptide identification,” in Proc. IEEE Int. Conf. Bioinf. Biomed., 2017, pp. 498–502.
[51] T. S. Win et al., “HemoPred: A web server for predicting the hemolytic activity of peptides,” Future Medicinal Chemistry, vol. 9, no. 3, pp. 275–291, 2017.
[52] N. R. Coordinators, “Database resources of the national center for biotechnology information,” Nucleic Acids Res., vol. 44, no. Database issue, 2016, Art. no. D7.
[53] F. Cava, H. Lam, M. A. De Pedro, and M. K. Waldor, “Emerging knowledge of regulatory roles of D-amino acids in bacteria,” Cellular Mol. Life Sci., vol. 68, no. 5, pp. 817–831, 2011.
[54] M. L. Mangoni et al., “Effect of natural L-to D-amino acid conversion on the organization, membrane binding, and biological function of the antimicrobial peptides bombinins H,” Biochemistry, vol. 45, no. 13, pp. 4266–4276, 2006.
[55] R. H. Baltz, “Daptomycin: Mechanisms of action and resistance, and biosynthetic engineering,” Curr. Opinion Chemical Biol., vol. 13, no. 2, pp. 144–151, 2009.
[56] Y. Kawai et al., “Structural and functional differences in two cyclic bacteriocins with the same sequences produced by lactobacilli,” Appl. Environ. Microbiol., vol. 70, no. 5, pp. 2906–2911, 2004.
[57] C. Leslie, E. Eskin, and W. S. Noble, “The spectrum kernel: A string kernel for SVM protein classification,” in Biocomputing 2002, World Scientific, Singapore, 2001, pp. 564–575.
[58] E. Crusca Jr et al., “Influence of N-terminus modifications on the biological activity, membrane interaction, and secondary structure of the antimicrobial peptide hylin-a1,” Peptide Sci., vol. 96, no. 1, pp. 41–48, 2011.
[59] S. Karlin and I. Ladunga, “Comparisons of eukaryotic genomic sequences,” Proc. Nat. Acad. Sci. USA, vol. 91, no. 26, pp. 12832–12836, 1994.
[60] S. Karlin, A. M. Campbell, and J. Mrazek, “Comparative DNA analysis across diverse genomes,” Annu. Rev. Genetics, vol. 32, no. 1, pp. 185–225, 1998.
[61] S. Karlin and C. Burge, “Dinucleotide relative abundance extremes: A genomic signature,” Trends Genetics, vol. 11, no. 7, pp. 283–290, 1995.
[62] S. Karlin, “Global dinucleotide signatures and analysis of genomic heterogeneity,” Curr. Opinion Microbiol., vol. 1, no. 5, pp. 598–610, 1998.
[63] H. Nakashima, K. Nishikawa, and T. Ooi, “Differences in dinucleotide frequencies of human, yeast, and Escherichia coli genes,” DNA Res., vol. 4, no. 3, pp. 185–192, 1997.
[64] H. Nakashima, M. Ota, K. Nishikawa, and T. Ooi, “Genes from nine genomes are separated into their organisms in the dinucleotide composition space,” DNA Res., vol. 5, no. 5, pp. 251–259, 1998.
[65] D. T. Pride, R. J. Meinersmann, T. M. Wassenaar, and M. J. Blaser, “Evolutionary implications of microbial genome tetranucleotide frequency biases,” Genome Res., vol. 13, no. 2, pp. 145–158, 2003.
[66] M. Takahashi, K. Kryukov, and N. Saitou, “Estimation of bacterial species phylogeny through oligonucleotide frequency distances,” Genomics, vol. 93, no. 6, pp. 525–533, 2009.
[67] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[68] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proc. 22nd ACM Sigkdd Int. Conf. Knowl. Discov. Data Mining, 2016, pp. 785–794.
[69] K. Gurney, An Introduction to Neural Networks. Boca Raton, FL, USA: CRC Press, 1997.
[70] T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967.
[71] Z. John Lu, “The elements of statistical learning: Data mining, inference, and prediction,” J. Roy. Statist. Soc.: Series A (Statist. Soc.), vol. 173, no. 3, pp. 693–694, 2010.
[72] F. ul A. Afsar Minhas, B. J. Geiss, and A. Ben-Hur, “PAIRpred: Partner-specific prediction of interacting residues from sequence and structure,” Proteins: Struct. Function Bioinf., vol. 82, no. 7, pp. 1142–1155, 2014.
[73] J. Davis and M. Goadrich, “The relationship between Precision-Recall and ROC curves,” in Proc. 23rd Int. Conf. Mach. Learn., 2006, pp. 233–240.
[74] M. N. Gabere and W. S. Noble, “Empirical comparison of web-based antimicrobial peptide prediction tools,” Bioinformatics, vol. 33, no. 13, pp. 1921–1929, 2017.
[75] Y. Huang, B. Niu, Y. Gao, L. Fu, and W. Li, “CD-HIT Suite: A web server for clustering and comparing biological sequences,” Bioinformatics, vol. 26, no. 5, pp. 680–682, 2010.
[76] J. Wang et al., “Antimicrobial peptides: Promising alternatives in the post feeding antibiotic era,” Med. Res. Rev., vol. 39, no. 3, pp. 831–859, 2019.
[77] J.-L. Dimarcq, P. Bulet, C. Hetru, and J. Hoffmann, “Cysteine-rich antimicrobial peptides in invertebrates,” Peptide Sci., vol. 47, no. 6, pp. 465–477, 1998.

Sadaf Gull is currently working toward the PhD degree in the Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad, Pakistan. She is funded by the indigenous PhD fellowships scheme by the Higher Education Commission (HEC). Her area of research is machine learning in biomedical informatics.

Fayyaz Minhas received the PhD degree in bioinformatics from Colorado State University, USA, on a Fulbright Scholarship. He is currently with the Department of Computer Science, University of Warwick, Coventry, UK and is partially supported by the PathLAKE digital pathology consortium which is funded from the Data to Early Diagnosis and Precision Medicine strand of the government’s Industrial Strategy Challenge Fund, managed and delivered by UK Research and Innovation (UKRI). For more information, please visit https://warwick.ac.uk/fac/cross_fac/pathlake/. He has also been awarded the National Youth Award by the Government of Pakistan for his contributions to science and technology. His research focuses on applications of machine learning in bioinformatics and histopathology.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/csdl.
