
Perspective

AlphaFold – A Personal Perspective on the Impact of Machine Learning

Alan R. Fersht

MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK

https://doi.org/10.1016/j.jmb.2021.167088
Edited by Dr. Daniel Otzen

Abstract
I outline how over my career as a protein scientist Machine Learning has impacted my area of science and one of my pastimes, chess, where there are some interesting parallels. In 1968, modelling of three-dimensional structures was initiated based on a known structure as a template, the problem of the pathway of protein folding was posed and bets were taken in the emerging field of Machine Learning on whether computers could outplay humans at chess. Half a century later, Machine Learning has progressed from using computational power combined with human knowledge in solving problems to playing chess without human knowledge being used, where it has produced novel strategies. Protein structures are being solved by Machine Learning based on human-derived knowledge but without templates. There is much promise that programs like AlphaFold based on Machine Learning will be powerful tools for designing entirely novel protein folds and new activities. But, will they produce novel ideas on protein folding pathways and provide new insights into the principles that govern folds?
© 2021 Elsevier Ltd. All rights reserved.

Introduction

Machine Learning has had a significant impact on what I do in science and in my personal activities. My career in research as a protein scientist began at a most fortunate time. I was recruited in the last year of my PhD in 1968, a key year that pops up a few times in this article, to work as a chemist on the mechanisms of enzymes whose structures were being determined at the MRC Laboratory of Molecular Biology. The first three-dimensional structures of enzymes were beginning to be solved by X-ray protein crystallography, and the determination of their primary structures by direct protein sequencing was laboriously progressing. The next decade saw the introduction of recombinant DNA technology, DNA sequencing, gene cloning and expression, on the one hand, and, on the other, the application of computational methods to analyse protein dynamics and to aid the visualization and classification of protein structures into families. In a parallel universe, the science of Artificial Intelligence (AI) and Machine Learning was also beginning to emerge as theory and potential applications. To set the scene in this perspective, I will begin with the little that I know about AI and Machine Learning and how they have impacted on two of my pastimes, board games like chess, a passion that I share with Demis Hassabis, a mastermind of DeepMind, the company behind AlphaFold, and my current addiction to bird and nature photography.

Artificial intelligence, machine learning

Artificial intelligence is the science of studying how to build intelligent programs and machines that can solve problems in an imaginative manner.

0022-2836/© 2021 Elsevier Ltd. All rights reserved. Journal of Molecular Biology 433 (2021) 167088

We are concerned here with Machine Learning, which is a subfield of AI that enables machines to learn from past data or experience without being explicitly programmed. It can use methods from statistics and physics to find hidden insights in data. It involves algorithms such as neural networks and deep learning neural networks. (A neural network is a computing system made up of interconnected units, like neurons in the brain, that process information by passing it among them. Deep learning uses huge neural networks with many layers of processing units.) Machine Learning is concerned mainly with accuracy and patterns, and automates analytical model building.

Application to photography and board games

Accurate autofocus is a key feature of modern photography, especially when photographing birds, animals and other subjects in action. When photographing birds, animals and human faces, having the eye in focus is usually the most important feature. Companies like Sony and Canon have developed AI methods for doing this. In the past year, Canon released its first camera, the R5, that does this remarkably well. Its algorithms will recognise the shape of all types of birds and animals, locate them in just a small part of a crowded scene, home in on the head and then the eye. It can locate and track a bird in flight and focus unerringly on the eye at a frame rate of 20/s. Do I know how it does it? I know that it has somehow been trained on many photos of birds and animals, but all that concerns me is that it works accurately and rapidly and assists me. I could get sharp shots in the past without this aid, but the success rate with it is far higher. The eye-autofocus is simply a useful tool.

Chess and Go

Chess has long been a subject of computer science and a challenge to the practitioners of AI but with minimal success until the first glimmers of hope at the end of the 1960s. In 1968, David Levy, Scottish chess champion and computer expert, made a £500 bet with two AI pioneers that no computer program would win a chess match against him within 10 years. He only just won the bet in 1978. The strength of chess playing machines improved after the slow start and in the mid-1990s supercomputers programmed on past games played by humans and human-designed tactics and strategy reached grandmaster strength. In 1997, the IBM Deeper Blue defeated one of the greatest players in the history of chess, Garry Kasparov, 3.5–2.5 in a six-game match.

Chess engines, which are programmed with much human experience and input, are now running on laptops, are stronger than the top grandmasters and are routinely used by them to analyse positions and train. The game of Go is computationally even more difficult than chess and the ability of machines to beat the best human players was more limited until 2015.

It is perhaps no coincidence that Demis Hassabis of DeepMind was a pre-teenage chess prodigy before doing his degree in computer science and a PhD in neuroscience. In 2015, AlphaGo, the DeepMind program, beat the European Go Champion. The program¹ used deep neural networks, was trained by supervised learning from human games and reinforcement learning from games of self-play, and went on to beat the World Champion Ke Jie. Subsequently, there was a significant breakthrough on the introduction of the program AlphaGo Zero² that mastered Go to an even higher level using deep neural networks and self-reinforcement methods without employing any prior human knowledge: the rules of the game were set as the parameters and the machine learned just by playing itself and self-reinforcement. The program was extended, as AlphaZero, to master chess and shogi, again just by knowing the rules and playing itself.³ AlphaZero is easily the strongest chess playing program by far. The program has taken a further step forward in MuZero, which works also with Atari games that are messy and do not rely on a defined set of rules.³

So where does this leave chess and Go players? Lee Se-dol, the only human to beat AlphaGo in a game (due to a machine blunder), retired in deference to the superiority of machines. But, Ke Jie is learning from AlphaGo Zero to improve his game. Similarly, AlphaZero has brought some new strategies into chess and is stimulating human players who are using it to improve their play.

AlphaFold and protein folding

The Protein Folding Problem has, at least, two parts. The first, based on the work of Christian Anfinsen,⁴ is the prediction of the three-dimensional structure of proteins from knowing only their primary structure. The second, asked by Levinthal in 1968: What is the pathway of protein folding?⁵

Structure prediction

The first predictions of protein structure were based on using the known, previously determined structure of a homologous protein as a template. For example, Keil and co-workers in 1968⁶ modelled the structure of trypsin based on the structure of alpha-chymotrypsin determined by Blow and co-workers,⁷ a completely human-based model building.

Fast forward 50 years of intense experimental work and 175,000 three-dimensional structures in the Protein Data Bank and great advances in predicting structures fuelled by the CASP competition, AlphaFold has exhibited the power of

Machine Learning in recognising patterns in primary sequences that determine three-dimensional folds with high precision. The central component of AlphaFold is a neural network that is trained on the very large numbers of structures in the Protein Data Bank to predict distributions of distances between the Cβ atoms of pairs of residues of a protein and construct an artificial force field to direct folding without using an individual template, relying instead on patterns derived from many proteins.⁸ It has also relied heavily on sequence databases and multiple sequence alignments. The situation is like that of the chess programs that have much human experience built into them and can perform better than human beings because they use additional different ways of analysis and powerful computation, or, even more crudely, like the eye-detecting autofocus on my camera which is much faster and more accurate than me. AlphaFold has not yet shown simple rules that govern protein folding – indeed, there may not be – or explained to me how I can look at a protein myself and see how it would fold up. There is even the philosophical question of whether AlphaFold is really solving the folding problem as posed by Anfinsen: there is an alternative view that if the method relies directly or indirectly on the information derived from multiple sequence alignments then it is a method of experimental structure determination like X-ray crystallography, NMR spectroscopy or electron microscopy, using sequence as the experimental data. But, whatever the philosophical arguments, AlphaFold could be very useful technology, as for example for the design of drugs where structural knowledge of the target protein is often the starting point, and its power can be harnessed by experimentalists and theoreticians just as chess players now routinely use the earlier generations of chess programs based on human knowledge and machine number crunching.

Where it goes from here will be exciting. I wrote back in 1984, after the first protein engineering experiments, that one of the goals was to design novel enzymes.⁹ This has proved to be an immense challenge since the interplay of structure, binding and catalysis is so elusively exquisite and there are very stringent and not fully understood constraints on enzyme-substrate and enzyme-transition state interactions and dynamics. The design and selection of simple binding proteins, on the other hand, is far simpler. So, can AlphaFold facilitate enzyme design? Will it automate drug design? Can it design entirely novel protein folds without input from multiple sequence alignments? And regarding protein design, there is an additional complication to predicting the structure of an existing protein that is known to fold: a novel protein may or may not fold as protein stability is often close to marginal and there are kinetic barriers to successful folding. I certainly hope they will solve these problems, and wait in keen anticipation as I am always in awe at the skill and ingenuity of experimentalists and theoreticians.

Folding pathways

Solving the pathway of protein folding, along with the dynamics of protein processes, is a different type of challenge from predicting protein structure. A rough analogy is that structure prediction is like equilibrium thermodynamics, which gives the energy level of a state but no information on the kinetics of how the state reached that energy level. AlphaFold’s task was to predict the most likely structures of proteins that are known to fold, and not the kinetics and stability of folding.

The posing of the pathway problem, famously attributed to Levinthal,⁵,¹⁰ was, in reality, a search for mechanisms that would speed up a process that, if unbiased and purely random, would take eons. Theoreticians have subsequently shown in general terms and simplified simulations how the acquisition of native contacts speeds up folding, and atomistic simulations can mimic folding and unfolding pathways at high resolution. Kinetic experiments show that many small domains fold by a nucleation-condensation mechanism in which there is a final or major transition state that looks like an expanded form of the native structure with many long interactions being partly formed that stabilize an extended folding nucleus with stronger interactions being consolidated.¹¹ Other small domains can form by a more classical framework mechanism whereby preformed elements of regular secondary structures dock.¹² Members of the homeodomain 3-helix bundle family have a common fold, which would easily be detected by AlphaFold. But the mechanism of their folding slides from a framework mechanism to nucleation-condensation with change of local stability of the secondary structure, as shown by experiment and atomistic simulation.¹²

Atomistic simulation of the unfolding pathways of proteins, which are the folding pathways in reverse for simple systems, is very powerful and widely applicable to families of proteins where methods can be applied to detect features that direct folding mechanisms and lead to different sequences forming similar structures.¹³ Will Machine Learning add to solving folding pathways and dynamic processes?

Where does it leave experimentalists? Are we, like Lee Se-dol, to retire in deference to superior machines, or are we going to use Machine Learning as an invaluable tool, as chess players use chess engines? The first elucidations of three-dimensional structures of enzymes in the late 1960s caused some classical enzymologists to flee to work on proteins with unknown structures so as not to compete with the new technology and its exponents. Others saw this as the most exciting opportunity to understand enzyme catalysis, structure and mechanism at the atomic level and

grasped the new opportunities with both hands for experiment and computation. I would hope that the same is true for Machine Learning applied to proteins and wish I could start my career again on protein design with AlphaFold at my disposal.

Acknowledgments

I thank the MRC Laboratory of Molecular Biology for funding.

Conflict of Interest Statement

The authors declare no conflicts of interest.

Received 23 April 2021;
Accepted 28 May 2021;
Available online 2 June 2021

Keywords:
protein folding;
machine learning;
chess;
Go

References

1. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., et al., (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.
2. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., et al., (2017). Mastering the game of Go without human knowledge. Nature, 550, 354–359.
3. Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., et al., (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588, 604–609.
4. Anfinsen, C.B., (1973). Principles that govern the folding of protein chains. Science, 181, 223–230.
5. Levinthal, C., (1968). Are there pathways for protein folding? J. Chim. Phys.-Chim. Biol., 65, 44–45.
6. Keil, B., Dlouha, V., Holeysov, V., Sorm, F., (1968). Hypothesis of 3-Dimensional arrangement of polypeptide chain in trypsin. Czech. Chem. Commun., 13, 2307–2315.
7. Matthews, B.W., Sigler, P.B., Henderson, R., Blow, D.M., (1967). Three-dimensional structure of tosyl-alpha-chymotrypsin. Nature, 214, 652–656.
8. Senior, A.W., Evans, R., Jumper, J., Kirkpatrick, J., Sifre, L., Green, T., et al., (2020). Improved protein structure prediction using potentials from deep learning. Nature, 577, 706–710.
9. Fersht, A.R., Shi, J.P., Wilkinson, A.J., Blow, D.M., Carter, P., Waye, M.M.Y., et al., (1984). Analysis of enzyme structure and activity by protein engineering. Angew. Chem.-Int. Edit. Engl., 23, 467–473.
10. Levinthal, C., (1969). How to Fold Graciously. University of Illinois Press, Urbana.
11. Itzhaki, L.S., Otzen, D.E., Fersht, A.R., (1995). The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: evidence for a nucleation-condensation mechanism for protein folding. J. Mol. Biol., 254, 260–288.
12. Gianni, S., Guydosh, N.R., Khan, F., Caldas, T.D., Mayor, U., White, G.W., et al., (2003). Unifying features in protein-folding mechanisms. Proc. Natl. Acad. Sci. U. S. A., 100, 13286–13291.
13. Demakis, C., Childers, M.C., Daggett, V., (2021). Conserved patterns and interactions in the unfolding transition state across SH3 domain structural homologues. Protein Sci., 30, 391–407.
