
Total chemical synthesis of proteins

Stephen B. H. Kent
Received 7th April 2008
First published as an Advance Article on the web 16th September 2008
DOI: 10.1039/b700141j
This tutorial review outlines the modern ligation methods that enable the efficient total chemical
synthesis of enzymes and other protein molecules. Key to this success is the chemoselective
reaction of unprotected synthetic peptides (chemical ligation). Notably, native chemical ligation
enables the reaction of two unprotected peptides in aqueous solution at neutral pH to form a
single product in near quantitative yield. Full-length synthetic polypeptides are folded to form the
defined tertiary structure of the target protein molecule, which is characterized by mass
spectrometry, NMR, and X-ray crystallography, in addition to biochemical and/or biological
activity.
1. Introduction
The need for chemical protein synthesis
Proteins are biological macromolecules that carry out most of
the biochemical functions of the cell and are also widely
employed in structural roles. The biological function of a
particular protein derives from its unique folded structure,
which in turn is defined by the amino acid sequence of its
polypeptide chain. Genome sequencing has revealed that a
eukaryotic cell may have more than 20 000 open reading
frames, each of which encodes the amino acid sequence of a
ribosomally-translated polypeptide chain that folds to form a
protein molecule. A typical protein molecule found in nature
consists of a polypeptide chain of ~300 amino acid residues.
The diversity of the proteins found in a cell is substantially
increased by splicing at the mRNA level prior to translation,
and by post-translational modifications of specific amino acids
after their incorporation into a protein's polypeptide chain. It
has been estimated that a cell can express more than 100 000
distinct protein molecules.
1
Because of the diverse and important roles that proteins
play in the biological world, scientists have long sought to
understand how the structure of a protein molecule gives rise
to its functional properties. For the past thirty years, the
techniques of recombinant DNA-based molecular biology
have been used to systematically vary the amino acid sequence
of a polypeptide expressed in Escherichia coli; after folding, the
changes in the properties of the mutant protein molecule are
observed and correlated with the change in the amino acid
sequence. This approach is somewhat optimistically named
protein engineering, and has provided much useful insight
into how proteins function.
2
Yet from a chemist's viewpoint
the molecular biology approach to the elucidation of the
molecular basis of protein function is subject to severe limitations:
only the twenty genetically encoded amino acids can be
readily incorporated into a protein molecule; and, site-specific
post-translational modifications of the protein molecule are
both technically difficult and impossible to control precisely.
Heroic efforts have been made to overcome these limitations,
both in vitro and by use of synthetic biology in engineered
microorganisms;
3,4
these sophisticated approaches are great
science, but so far have had only limited impact and are not yet
widely used for investigating the molecular basis of protein
function.
Total chemical synthesis of a protein molecule overcomes
the limitations of molecular biology. Once synthetic access to a
protein has been established, chemical synthesis enables the
researcher to effect, at will, any desired change in the covalent
structure of a protein molecule. Even more importantly, total
chemical synthesis enables the labeling of a protein without
limitation as to the number and kind of labels introduced, yet
with the atom-by-atom precision that is essential for using to
full effect modern biophysical spectroscopic methods, such as
NMR, EPR, and laser Raman. Chemical synthesis is critical
for realizing the full potential of single molecule and other
fluorescence studies of protein function. In this tutorial review,
we describe modern methods for the total chemical synthesis
of proteins and use several case-studies to illustrate the
application of chemical protein synthesis to the elucidation of
the molecular basis of protein function.
Stephen Kent is professor of chemistry at the University of
Chicago, where he uses synthetic chemistry to elucidate the
molecular basis of protein function, particularly enzyme
catalysis. His early work culminated in the total chemical
synthesis of the HIV-1 protease used to determine the original
crystal structures of that molecule. He pioneered modern
total protein synthesis, based on the chemical ligation of
unprotected peptide segments in aqueous solution.
Stephen Kent
Department of Chemistry, Institute for Biophysical Dynamics, Center
for Integrative Science, University of Chicago, 929 East 57th Street,
Chicago, IL 60637, USA. E-mail: skent@uchicago.edu
338 | Chem. Soc. Rev., 2009, 38, 338–351 This journal is © The Royal Society of Chemistry 2009
TUTORIAL REVIEW www.rsc.org/csr | Chemical Society Reviews
Published on 16 September 2008. Downloaded by Lawrence Berkeley National Laboratory on 28/01/2014 22:13:48.
Brief historical overview
The quest to make enzymes and other protein molecules by
total synthesis was one of the grand challenges for organic
chemistry in the twentieth century. In the early years of the
century, the great German chemist Emil Fischer set out to make
the natural products known as enzymes by means of total
chemical synthesis.
5
Together with Franz Hofmeister, Fischer
enunciated the peptide theory of protein structure, which stated
that proteins are made up of linear polymers of α-amino acids,
which he termed polypeptides. Fischer and his scientific
descendants pioneered the field of chemical peptide synthesis
that eventually enabled the total chemical synthesis of the
complex biologically active peptide hormone oxytocin (nine
amino acid residues) by Vincent du Vigneaud and his collea-
gues. The total chemical synthesis of proteins remained a major
objective of the international organic synthesis community in
the decades following the Second World War. The desire to
achieve this goal led to the development of a vast array of
synthetic peptide chemistry methods and culminated in the
unambiguous total synthesis of crystalline, fully active human
insulin protein (51 amino acids) and of a series of unique
chemical insulin analogues;
6
the enzyme ribonuclease A (124
amino acids) was also prepared in crystalline, fully active form.
7
In a body of work that represented the high point of the
classical organic chemistry synthesis approach to the total
synthesis of proteins, Kenner and his colleagues undertook
the total synthesis of a consensus lysozyme enzyme molecule
(129 amino acids) from twelve protected peptide segments by a
fully convergent route.
8
For reasons discussed below, this
sophisticated synthesis was ultimately unsuccessful.
Classical organic synthesis in solution
The syntheses mentioned above made use of the maximal
protection approach, in which all side chain functional groups
were reversibly protected, and all reactions were carried out in
organic solvents (Fig. 1). The classical solution approach to
the total chemical synthesis of the long polypeptide chains
found in proteins suffered from a number of shortcomings.
9
These included:
Lack of chiral integrity in peptide bond formation. Activation
of the α-COOH of a protected peptide chain gave rise to
racemization of the C-terminal amino acid in the basic condi-
tions used for condensation with another peptide segment
nucleophile.
Inability to purify and characterize protected peptides. It
was also difficult to effectively purify and characterize the
fully-protected peptide segments; multiple recrystallizations
did not suffice to guarantee homogeneity of reaction products,
and it was necessary to deprotect the purified peptide segments
in order to analytically characterize them.
Poor solubility. Most seriously, many fully-protected peptide
segments had limited solubility in even powerful organic solvents;
the consequent low concentrations of reacting peptide segments
led to slow and incomplete coupling reactions, while (pseudo)-first-order
side reactions gave rise to high levels of by-products.
So severe were the effects of poor solubility of fully-protected
peptide segments that synthetic chemists came to believe that
there was an inherent barrier to the effective reaction of large,
high molecular weight peptide segments.
8,10
The limited solubility and other problems encountered using
maximally protected peptide segments were never completely
overcome, although a number of proteins were successfully
synthesized by classical methods in solution.
11
Moreover, the
sophisticated chemistries used in these solution methods
required an exceptionally high level of skill and the syntheses
were arduous, requiring teams of expert chemists to carry them
out effectively. The classical synthetic organic chemistry approach
to the study of protein function was overshadowed by the advent
in the mid-1970s of recombinant DNA-based molecular biology
for protein expression and engineering, and was abandoned.
Fig. 1 Classical solution peptide synthesis. Peptide segment building
blocks are reacted in a fully convergent strategy; all non-reacting
functional groups in both peptide segments are masked by reversible
protecting groups (PGₙ). In the final step, all the protecting groups
are removed to give the full-length polypeptide product.
Solid phase peptide synthesis
Fig. 2 Solid phase peptide synthesis. The fully-protected peptide
chain is built up in stepwise fashion from the C-terminal amino acid,
which is covalently attached to an insoluble polymer support. In the
final step, all the protecting groups are removed and the covalent link
to the polymer support is cleaved to give the full-length polypeptide
product in solution.
In 1963, Merrifield introduced a novel technique that greatly
facilitated the chemical synthesis of peptides.
9
The scheme is shown in Fig. 2. In solid phase peptide synthesis, the C-terminal
amino acid residue of a target peptide sequence is covalently
attached to an insoluble polymeric support; subsequent amino
acid residues are introduced by removal of the Nα-protecting
group of this first residue, purification of the resin-bound amino
acid by filtration and washing, and addition of the next amino
acid in Nα-protected, carboxyl-activated form; after formation
of the new peptide bond, excess activated amino acid and
soluble by-products are removed by filtration and washing.
These steps are repeated in essentially standard form until the
target resin-bound protected peptide chain has been assembled
(Fig. 2). In a final step, all protecting groups are removed and
the covalent link to the resin is cleaved to release the crude
peptide product. Facile purification by filtration and washing at
each step of the synthesis enables the use of large excesses of
activated amino acid for each peptide bond forming step;
consequently, reactions are rapid and near-quantitative.
Handling losses are minimized because the product peptide
remains bound to the insoluble resin at all stages of the
synthesis. As a result, with correctly designed and well-executed
chemistry, crude products containing a high proportion of the
desired peptide are obtained in high yield.
The solid phase method revolutionized the chemical syn-
thesis of peptides. It has been estimated that solid phase peptide
synthesis is ~50-fold less arduous than a solution synthesis of the
same target.
9
Furthermore, attachment of the growing protected
peptide chain to the resin support largely overcame solubility
problems, and thus enabled the use of standardized protocols for
the total chemical synthesis of peptides. This has allowed the
routine chemical synthesis of peptide chains of thirty amino acid
residues or more, and made peptide synthesis widely accessible to
laboratories throughout the world. Countless thousands of
peptides have been made by stepwise solid phase synthesis. In
the decades after Merrifield introduced solid phase peptide synthesis,
ever more optimized chemistries were developed and new high
resolution techniques were used for the effective purification and
precise characterization of the peptide products. Nonetheless, solid
phase synthesis had no impact on the maximum size of polypeptide
that could be made by chemical means: in even the most
highly optimized solid phase chemistries, the statistical accumulation
of resin-bound by-products, arising from incomplete reactions
and from low level impurities in the solvents, reagents, and
protected amino acids used, limited to ~50 amino acids the
longest peptide that could be reliably made in good yield as a
homogeneous molecular species of defined covalent structure.
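The statistical argument above can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that every chain-extension cycle proceeds with the same fractional yield and that each failure leaves a resin-bound by-product. The 99% per-cycle figure and the function name are hypothetical choices, not values taken from this review.

```python
def full_length_fraction(n_residues: int, cycle_yield: float) -> float:
    """Fraction of resin-bound chains that are still correct after assembly.

    Assumes a uniform yield per chain-extension cycle (an illustrative
    simplification). The first residue is already attached to the resin,
    so a chain of n residues needs n - 1 cycles.
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    if not 0.0 <= cycle_yield <= 1.0:
        raise ValueError("cycle_yield must lie between 0 and 1")
    return cycle_yield ** (n_residues - 1)


# Hypothetical 99% per-cycle yield, for chains of increasing length.
for n in (10, 30, 50, 100, 300):
    print(f"{n:>3} residues: {full_length_fraction(n, 0.99):.3f}")
```

Under these assumptions the correct product is a shrinking fraction of the crude material as the chain grows, and the closely related by-products become correspondingly harder to remove, which is the qualitative point made in the text.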
Although solid phase peptide synthesis and classical solution
organic synthesis are based on different principles and
suffer from idiosyncratic shortcomings, neither method enabled
the routine preparation of the polypeptide chains found
in all but the very smallest protein molecules. Clearly, a more
powerful approach based on novel principles was needed.
2. Analysis of the synthetic challenge
An effective total chemical synthesis of protein molecules
should have the following characteristics:
• it should be convergent, building up the final polypeptide
from a set of high purity, fully characterized peptide segment
building blocks;
• it should make use of unprotected peptide segments that
are readily prepared by stepwise solid phase peptide synthesis;
• reaction of unprotected peptide segments to form a
product polypeptide chain should proceed rapidly and without
side reactions, even with larger peptide segments;
• intermediate products should be amenable to purification
and characterization by high resolution methods;
• the ultimate polypeptide product should be obtained
directly in final form without further manipulation.
Such an approach to the total chemical synthesis of proteins
is illustrated in Fig. 3. A point-by-point discussion of the
principles underlying modern chemical protein synthesis is as
follows:
• Convergent synthesis: stepwise peptide synthesis generates
a series of intermediates with very similar properties that
are difficult to purify away from the target full-length polypeptide,
and stepwise syntheses from amino acid building
blocks are very inefficient in their use of starting materials;
although sequential peptide segment condensation syntheses
are more efficient than stepwise amino acid-by-amino acid
syntheses, they are still somewhat inefficient in their use of
starting materials.
By contrast, in a fully convergent strategy all starting materials
are the same number of synthetic steps from the final
product; thus, in principle this is the most efficient synthetic
strategy. In practice, we choose to combine the ease and
practicality of stepwise solid phase synthesis of moderate size
unprotected peptides with convergent chemical ligation in order
to make the most efficient use of those peptide segment starting
materials. Because convergent ligation maximizes at every point
the differences between starting materials and products, such
fully convergent synthesis provides for the highest purity
products. A fully convergent, modular synthetic strategy from
readily prepared unprotected peptide segments is also ideally
suited to the generation of a wide range of analogues.
• Unprotected peptide segments: peptides of up to ~50
amino acid residues can be routinely prepared by highly
optimized stepwise solid phase synthesis; such unprotected
peptide segments are readily purified in good yield by high
resolution techniques such as preparative reversed phase high
performance liquid chromatography (HPLC); and, unprotected
peptides can be precisely characterized by modern
electrospray mass spectrometry. Unprotected peptide
segments derived from globular proteins can be readily
handled in the water–acetonitrile–0.1% trifluoroacetic acid
solvent mixtures used for HPLC. Finally, unprotected
peptides are freely soluble at millimolar concentrations in
solvents containing chaotropes (additives which enhance the
solubility of a polypeptide chain) such as aqueous 6 M
guanidine·HCl, giving the concentrations of reacting peptide
segments needed for rapid, high yield reaction.
Fig. 3 Modern chemical protein synthesis. Unprotected peptide
segment building blocks are covalently joined to one another in a
convergent strategy by chemoselective chemical ligation reaction(s).
The full-length polypeptide target is obtained directly in final form,
and is folded to give the functional protein molecule.
• Purification and characterization of intermediates:
intermediate products in the synthesis of a protein's polypeptide
chain from unprotected peptide segments will themselves be
(larger) unprotected peptides; such unprotected peptide products
are readily handled, purified, and characterized in the
same way as the starting peptide segments; and, the unprotected
peptide intermediate products can also be dissolved in aqueous
6 M guanidine·HCl to greater-than-millimolar concentrations
for further synthetic elaboration.
Thus, the use of unprotected peptide segments in a con-
vergent synthetic strategy would in principle overcome the
handling, solubility, purification, and characterization issues
that plagued classical solution synthesis using fully-protected
peptide segments. Such a convergent strategy would also avoid
the excessive accumulation of resin-bound impurities that
prevented the use of the stepwise solid phase method for the
synthesis of longer (greater than ~50 residue) peptide chains.
9
In order to achieve this goal, it was first necessary to invent a
new synthetic chemistry that would enable the unambiguous
covalent condensation of large unprotected peptide segments.
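The efficiency argument for a convergent strategy made in this section can be illustrated by counting condensation steps. The sketch below compares the longest linear sequence of segment condensations for a sequential route and for a pairwise (fully convergent) route. It assumes that segments are joined two at a time and that every step has the same yield; the 70% step yield is a hypothetical figure, and the twelve-segment example simply echoes the number of segments mentioned earlier for the lysozyme synthesis.

```python
def longest_linear_sequence(n_segments: int, strategy: str) -> int:
    """Most condensation steps that any one starting segment passes through."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    if strategy == "sequential":
        # Segments are added one at a time to a growing chain.
        return n_segments - 1
    if strategy == "convergent":
        # Pairwise joining: ceil(log2(n)) rounds, in integer arithmetic.
        # All segments are equidistant only when n is a power of two.
        return (n_segments - 1).bit_length()
    raise ValueError("strategy must be 'sequential' or 'convergent'")


def yield_along_path(n_steps: int, step_yield: float) -> float:
    """Cumulative yield over n_steps, assuming a uniform (hypothetical) step yield."""
    return step_yield ** n_steps


for strategy in ("sequential", "convergent"):
    steps = longest_linear_sequence(12, strategy)
    print(strategy, steps, round(yield_along_path(steps, 0.70), 3))
```

The shorter linear sequence of the convergent route means that material from the earliest-made segments passes through fewer steps before reaching the final product.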
3. The chemical ligation principle
The breakthrough that enabled the use of unprotected peptide
segments for the total chemical synthesis of protein molecules
came in 1992 with the introduction of a novel concept, the
chemoselective condensation of unprotected peptides.
12
In a
chemoselective condensation reaction, two unique, mutually
reactive functional groups are employed, one on each of the
reacting entities; these two reactive functionalities are designed
to react with one another, but to not react with any of the
other functional groups present in either entity, thus giving a
single reaction product. Chemoselective reaction is one of the
original principles of synthetic organic chemistry, predating
and underlying the use of reversible protecting groups. For
example, Emil Fischer used the chemoselective reaction
principle in the synthesis of his peptides, by the reaction of
chloroacetyl chloride with an amino acid ester to give a
chloroacetamido acid ester that was then converted to the
desired dipeptide by reaction with ammonia. And, of course,
use of a reversible protecting group in a synthesis requires
chemoselective reaction for its introduction and removal.
In considering the challenge of chemical protein synthesis by the
chemoselective condensation of unprotected peptide segments, it
occurred to us that the problem would be greatly simplified if we
were to discard the obligatory requirement to form a peptide bond
at the site of covalent linking of the two reacting segments. If
formation of an analogue structure at the ligation site were
acceptable, then a variety of existing chemistries could readily be
adapted to covalently link two unprotected peptide segments in an
unambiguous fashion. The concept is shown in Fig. 4. We use the
term ‘chemical ligation’ for the chemoselective condensation of
two unprotected peptides to give a unique covalent polypeptide
product.
12
The initial ligation chemistry that we employed was the
simple nucleophilic reaction of a peptide(1)–thiocarboxylate
with a bromoacetyl–peptide(2) in aqueous solution at low pH,
to give a thioester-linked peptide(1)–peptide(2) product. The
thioester moiety is a reasonable facsimile of a peptide bond,
and the product thioester-containing peptide was stable and
behaved as a normal polypeptide chain. The utility of the
thioester-forming chemical ligation reaction was illustrated by
the total chemical synthesis of the HIV-1 protease.
12
The
synthetic protein was characterized by analytical HPLC and
by electrospray mass spectrometry and had full enzymatic
activity. X-Ray crystallography was used to determine the
high resolution structure of the chemically synthesized
enzyme. Importantly, the thioester-containing polypeptide
chain was completely stable to normal handling and was stable
in lyophilized form over a period of years.
Another useful ligation chemistry based on the same chemo-
selective reaction principle is oxime-forming ligation.
13
Here an
aminooxyacetyl–peptide(1) is reacted with a glyoxylyl–peptide(2)
in aqueous solution at low pH to give an oxime-linked ligation
product, which is stable at neutral pH. It is readily apparent that
this oxime-forming ligation chemistry is not restricted to making
peptides of normal N-to-C linear topology; an example of the
total chemical synthesis of a topological protein analogue, using
a combination of thioester- and oxime-forming ligation
reactions, will be presented below. Several other ligation
chemistries based on the same principle, using chemoselective
reaction to form a non-native covalent link between two
unprotected peptides, have subsequently been introduced and
used for the total chemical synthesis of proteins.
14
Fig. 4 Principles of chemical ligation. Two unprotected peptide
segments are covalently joined by the chemoselective reaction of
unique, mutually reactive functional groups, one on each reacting
segment. An analogue structure is formed at the ligation site.
12
The chemical ligation of unprotected peptide segments is
technically straightforward; the necessary functionalized
peptides can be readily made by simple variations of standard
solid phase peptide synthesis. As we had anticipated, the use of
unprotected peptide segments overcomes the problem of
limited solubility of protected peptides; the unprotected
synthetic peptides (and unprotected ligation products) are
readily purified using standard reverse phase HPLC solvents
such as aqueous acetonitrile–0.1% trifluoroacetic acid, and are
freely soluble in aqueous solutions containing chaotropes such
as 8 M urea or 6 M guanidine·HCl. Both the starting peptide
segments and the ligation products can be directly characterized
by electrospray mass spectrometry, which is both highly
sensitive and a powerful tool for confirming the covalent
chemical structure of a synthetic peptide. Finally, the formation
of a non-native covalent bond between the reacting
segments obviates racemization in the ligation reaction.
Thus, by the simple conceptual leap of accepting as valid the
formation of a non-peptide bond analogue structure at the site
of the covalent link between two unprotected peptides, we
have surmounted all the problems that for decades had
plagued classical solution organic synthesis of proteins using
fully-protected peptide segments.
4. Native chemical ligation
The chemical ligation of unprotected peptide segments was very
effective as originally introduced, as shown by successful total
syntheses of fully functional proteins and even enzymes.
14
Further-
more, the use of analogue structures, such as a thioester, to replace
the native peptide bond at the ligation site enabled novel
backbone engineering experiments to dissect the contribution
of backbone hydrogen bonds to enzyme catalysis.
15
Nonetheless,
many researchers remained sceptical of the validity of using
synthetic proteins containing non-native structures as tools for
understanding the molecular basis of protein function. Confronted
with this obtuse criticism (after all, the point of using chemical
synthesis is to introduce non-native analogue structures), it
occurred to us that the thioester-forming ligation chemistry could
be simply adapted to give native polypeptide products, while
maintaining the use of unprotected peptide segments.
Origin of a concept
As an extension of our original ligation chemistry, we envisioned
the nucleophilic thioester-forming ligation reaction shown in
Fig. 5, where an additional CH₂ has been introduced in the
bromoacyl moiety of peptide(2); if an amino group were also
present as shown, then, by analogy with the well-known O-to-N
acyl shift, the initial thioester-linked product (1) would be
perfectly set up for intramolecular nucleophilic attack on
the newly formed thioester functionality by the amino group;
the product resulting from this S-to-N acyl shift would be an
amide-linked peptide(1)–Cys–peptide(2), i.e. containing a
native peptide bond at the ligation site. In this way, two
unprotected peptides could be chemoselectively reacted to give
a native peptide ligation product.
Native chemical ligation
In preliminary experiments, the original chemistry concept
shown in Fig. 5 did not work cleanly, because of side reactions
involving aziridine formation from the 1-bromo-2-amino acyl
moiety. Philip Dawson, who at that time was a doctoral
student in my laboratory at The Scripps Research Institute,
developed the optimal way of making the key thioester-linked
intermediate (1); Dawson envisioned a peptide(1)–thioester
undergoing thiol(ate)–thioester exchange with the thiol(ate)
moiety of a Cys–peptide(2), as shown in Fig. 6.
In the presence of added thiol, this thiol–thioester exchange is
reversible. In model reactions, in which the amino group of the
N-terminal Cys was blocked by acetylation, formation of the
thioester-linked ligation product could be observed. With a
free amino group, however, the initial thioester-linked product
immediately rearranged to give a stable amide-linked ligation
product; under the neutral ligation reaction conditions used, this
amide-forming rearrangement was irreversible. Thioester-
mediated native amide (i.e. peptide bond)-forming chemical
ligation of two unprotected peptide segments by means of
thiol–thioester exchange was named ‘native chemical ligation’.
16
Native chemical ligation of unprotected peptide segments is
both simple and practical and consequently is widely used. The
reaction is carried out in aqueous 6 M guanidine·HCl at neutral
pH (pH 6.8–7.0), and gives near-quantitative yields of the desired
ligation product. The necessary peptide–thioester and Cys–peptide
starting materials are readily prepared by optimized Boc chemistry
stepwise solid phase peptide synthesis,
17,18
and can be efficiently
purified by standard reverse phase HPLC methods. Using native
chemical ligation, the product polypeptide chain is directly
obtained in final form and requires no further manipulation. Both
the starting peptide segments and the ligation product can be
characterized by high resolution analytical HPLC-electrospray
mass spectrometry.
Fig. 5 Original idea for an amide-forming chemical ligation reaction: (left) thioester-forming nucleophilic attack; (right) subsequent intra-
molecular nucleophilic attack, via a tetrahedral intermediate, to form an amide (native peptide) bond.
Mechanistic aspects
The key feature of the native chemical ligation approach is the
reversibility of the first step, the thiol(ate)–thioester exchange
reaction. Native chemical ligation is exquisitely regioselective
because that thiol(ate)–thioester exchange step is freely reversible
in the presence of an exogenous thiol added as catalyst.
16
The
high yields of final ligation product obtained, even in the presence
of internal Cys residues in either/both segments, are the result of
the irreversibility, under the reaction conditions used, of the
second (S-to-N acyl shift) amide-forming step.
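The interplay of a reversible first step and an irreversible second step can be sketched numerically. The toy model below treats the reaction as S ⇌ T → P, where S is the pool of unligated segments, T the thioester-linked intermediate, and P the amide-linked product. It is a deliberately simplified, first-order model integrated by the forward Euler method; the rate constants are hypothetical values in arbitrary units, not measured data, and the function name is ours.

```python
def simulate_ligation(k_exchange, k_reverse, k_shift, dt=0.001, n_steps=50_000):
    """Forward-Euler integration of the scheme  S <=> T -> P.

    S: unligated segments, T: thioester-linked intermediate,
    P: amide-linked product. Quantities are mole fractions; all rate
    constants are hypothetical first-order values in arbitrary units.
    """
    s, t, p = 1.0, 0.0, 0.0
    for _ in range(n_steps):
        exchange = (k_exchange * s - k_reverse * t) * dt  # net S -> T this step
        shift = k_shift * t * dt                          # irreversible T -> P
        s -= exchange
        t += exchange - shift
        p += shift
    return s, t, p


# Free amino group: the S-to-N acyl shift drains the reversible equilibrium.
print("free amine:   ", simulate_ligation(1.0, 1.0, 5.0))
# Acetylated amino group (k_shift = 0): only the exchange equilibrium remains.
print("blocked amine:", simulate_ligation(1.0, 1.0, 0.0))
```

With a non-zero shift rate the model funnels essentially all material to P, whereas with the shift switched off the thioester-linked species simply equilibrates with the starting materials, in line with the acetylated model reactions described earlier.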
Native chemical ligation is highly chemoselective. No by-
products are formed from reaction with the other functional
groups present in either peptide segment, e.g. unprotected α- or
ε-amino groups, phenolic hydroxyls, etc. This can be attributed
to the modest enthalpic activation of the peptide–thioester, the
use of pH 6.8–7.0 in the native chemical ligation reaction, and
the stability of the thioester moiety towards hydroxyl/hydroxide
nucleophiles. Reaction of the thioester with side chain
thiol/thiolate functionalities of internal cysteine residues in
either peptide segment is reversed as described above, and is
thus unproductive.
Native chemical ligation owes its efficacy to the unique
properties of thioesters. It is not widely appreciated that
thioesters are more stable to hydroxide-catalyzed hydrolysis
than normal oxoesters; yet at the same time, thioesters are very
much more reactive towards thiolysis and aminolysis than the
corresponding oxoesters. It is this combination of stability and
reactivity of thioesters, ideally tuned to the reaction mechanism
and the functionalities found in peptides, that is responsible for
the practical utility of native chemical ligation; the peptide–thioester
can be handled as a normal unprotected peptide, yet at
neutral pH in the presence of a thiol catalyst it reacts rapidly
and specifically with an unprotected Cys–peptide to give an
amide-linked product. No peptide–thioester is lost to hydrolysis
under the standard reaction conditions.
Recently, more eective thiol catalysts have been deve-
loped for native chemical ligation. Using the aryl thiol
(4-carboxymethyl)thiophenol, which in the sodium salt form
is freely soluble in aqueous 6 M guanidine·HCl, it is now
feasible to tune reaction rates over an order of magnitude
range. In general, it is possible to obtain near quantitative
ligation yields in a matter of a few hours at room temperature.
19
Native chemical ligation makes use of a very moderately
activated peptide–thioester and neutral pH reaction conditions
are used; consistent with this, to date no racemization of the
amino acid residue corresponding to the C-terminal of the
peptide–thioester has been detected in the ligation products.
Requirement for cysteine residues
A criticism of native chemical ligation is that it requires a Cys at
the ligation site. It is often stated that this is a severe limitation of
the native chemical ligation approach, because Cys is the least
common amino acid found in proteins. Like many statements
made on the basis of statistical averages, this criticism is mis-
leading; while Cys may be the least common amino acid found in
proteins, it is equally true that there are many thousands of
cysteine-rich disulfide-containing secretory proteins found in
nature, and that the most common structural motif in the human
genome is the Cys-rich zinc finger domain. All of these Cys-rich
proteins and protein domains are in principle accessible by total
chemical synthesis using native chemical ligation.
For other protein targets, chemistries have been developed
in an attempt to extend native chemical ligation of unprotected
peptides to non-Cys sites. The most useful of these was
introduced by Dawson and Yan, who used catalytic desulfurization
of Cys in the ligation product, enabling ligation at
Xaa–Ala sites (Fig. 7A);[20] recently this desulfurization
approach has been generalized to other β-thiol amino acids,
and desulfurization catalysts that allow the conversion of Cys
to Ala in the presence of Cys(Acm) have been developed.
Another approach to ligation at non-Cys sites is to use thiol-containing
Nα auxiliaries that mimic the presence of a Cys
residue at the N-terminal of peptide(2), and which can be
selectively removed from the newly-formed amide bond after
the ligation reaction (Fig. 7B). In practice, the use of thiol-containing
auxiliaries is highly sensitive to the nature of the
amino acids at the ligation site and is currently much less
effective than native chemical ligation at Cys.[21]
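As a rough illustration of how the Cys requirement shapes a synthetic design, the following Python sketch scans a target sequence for junctions where a segment boundary could be placed: at Cys (direct native chemical ligation) or at Ala (ligation at a temporary Cys, then desulfurization, as in Fig. 7A). The function name, the toy sequence and the simple one-letter-code rule are illustrative assumptions, not part of the published methods; a real design would also weigh segment length, solubility and the residue preceding each junction.

```python
from typing import List, Tuple

# Route labels are descriptive only; they summarize the chemistry in the text.
ROUTES = {
    "C": "native chemical ligation at Cys",
    "A": "ligation at a temporary Cys, then desulfurization to Ala",
}


def candidate_ligation_sites(sequence: str) -> List[Tuple[int, str, str]]:
    """List (position, residue, route) for residues that could begin a C-terminal segment.

    Positions are 1-based. Position 1 is skipped because no residue precedes it,
    so there is no Xaa-Cys or Xaa-Ala junction there.
    """
    sites = []
    for position, residue in enumerate(sequence.upper(), start=1):
        if position > 1 and residue in ROUTES:
            sites.append((position, residue, ROUTES[residue]))
    return sites


if __name__ == "__main__":
    toy_sequence = "GSKLCTEAVRGDCWAK"  # hypothetical sequence, for illustration only
    for position, residue, route in candidate_ligation_sites(toy_sequence):
        print(f"{residue}{position}: {route}")
```

Note that desulfurization converts every unprotected Cys to Ala, so the Ala route applies only when the target has no other Cys residues or when those are protected, e.g. as Cys(Acm).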
Native chemical ligation has also been used for the chemical
synthesis of a wide range of fully functional proteins by the
simple expedient of introducing a Cys residue as needed at a
ligation site. Cys-scanning mutagenesis experiments in many
proteins have shown that Cys residues can be introduced in the
vast majority of positions of the polypeptide of a globular
protein molecule without affecting folding or function. In
some syntheses, it has proved convenient to alkylate the
non-native Cys residues introduced at ligation sites. In practice,
as will be illustrated by the case-studies in the following
section, lack of Cys residues in suitable positions of the
polypeptide chain of a protein target is simply not an
issue: the synthetic design can be readily adjusted to enable the
use of native chemical ligation to make fully functional proteins.

Fig. 6 Native chemical ligation: thioester-mediated amide-forming
chemoselective ligation of two unprotected peptide segments. Under
the reaction conditions used (neutral pH; aryl thiol catalyst), the first
thiol–thioester exchange step is freely reversible, while the subsequent
intramolecular nucleophilic attack is irreversible and gives a native
peptide bond at the ligation site.[16]

This journal is © The Royal Society of Chemistry 2009. Chem. Soc. Rev., 2009, 38, 338–351.
Summary
Chemoselective condensation of unprotected peptide segments
by native chemical ligation enables the routine total chemical
synthesis of proteins. This novel synthetic approach addresses
key aspects of the challenge of chemical protein synthesis, as
eloquently described in Kenner's analysis of the problems he
and his colleagues encountered in the convergent synthesis of
the 129 amino acid residue polypeptide chain of consensus
lysozyme using classical organic chemistry: viz. "Organic
synthesis (needs to) tackle the fundamental problem alluded to
earlier in this lecture, namely the simulation in the laboratory of
Nature's marvellous intramolecular coupling of carboxyl and
amino groups. . . . What is required is a general method for
coupling sections of a polypeptide chain by virtue of an intramolecular
condensation."[8] Native chemical ligation of unprotected
peptides is just such a method as Kenner imagined;
it enables the efficient covalent coupling of even very large
polypeptide chains, thus proving that there is no inherent
barrier to the effective coupling of two large polypeptide
segments, and it surmounts the solubility and racemization
problems that were encountered in classical solution synthesis
using protected peptide segments.
Native chemical ligation comprises the subtle and effective
fusion of several key concepts and chemistries. These include:
• the chemical ligation principle, i.e. the chemoselective
condensation of unprotected peptide segments to form an
(initial) analogue structure at the ligation site;
• unprotected peptide–thioesters as building blocks;
• highly optimized in situ neutralization Boc chemistry
stepwise solid phase peptide synthesis,[17] which is essential for
the efficient chemical synthesis of peptide–thioesters;
• polypeptide ligation products are obtained in final form
without the need for further manipulation;
• the final product, the unprotected starting peptide
segments, and intermediate products in a synthesis, can all
be purified by modern high resolution methods such as
reverse phase HPLC, and can all be precisely characterized by
electrospray mass spectrometry.
Native chemical ligation of unprotected peptide segments has
enabled general synthetic access to the world of proteins. Since its
introduction, native chemical ligation has proved to be the most
robust and practical of the ligation chemistries for covalently
joining two unprotected peptide segments. Native chemical
ligation has been used to make hundreds of proteins ranging in
size up to more than 200 amino acids, and to make designed
protein analogues with masses of up to 50 825 Da.[14,23,24] In the
next section, several examples of total chemical synthesis of
proteins will be given.
5. Case-studies in the total chemical synthesis of proteins
Folding and formation of disulfides
The ability to prepare the full-length polypeptide chain found
in a protein molecule does not in itself constitute the total
chemical synthesis of a protein: the polypeptide must be folded
to form the defined tertiary structure that is characteristic of a
protein. It is this unique folded structure that gives rise to the
functional properties of a protein molecule. The information
necessary for the formation of the folded protein molecule
is present in the amino acid sequence of the protein's polypeptide
chain, and in most cases the correctly folded structure
represents a thermodynamic minimum.
In the cell, there exists elaborate chaperone machinery to
prevent aggregation of misfolded forms of the nascent polypeptide
chain as it is assembled on the ribosome in the densely
populated cytosol. The situation with a chemically synthesized
polypeptide chain is simpler: we are dealing with a single
molecular species, and all the chemical ligation steps and
subsequent purifications are carried out under denaturing
conditions that keep the polypeptide from aggregating. The
full-length polypeptide synthetic product is folded in an
aqueous solution that mimics physiological conditions. If
disulfides are known to be present in the final folded protein
molecule, then a low MW thiol–disulfide redox couple is used
at an elevated pH (~pH 8), usually with an excess of the thiol
component to reduce any misfolded disulfide crosslinked
species that form. Formation of the correct disulfides results
from the thermodynamic driving force for formation of the
stable, folded tertiary structure of the protein molecule. A
modest concentration (e.g. 0.5–1.5 M guanidine·HCl) of
denaturant is used as a chaotrope to keep misfolded forms
of the polypeptide from aggregating and precipitating. Under
these conditions, folding with concomitant formation of
disulfides is rapid and near quantitative, and is readily
followed by analytical HPLC-mass spectrometry (LC–MS).[25]
We have used these folding protocols successfully for the total
synthesis of several hundred proteins over the past decade;
folding of the relatively large amounts of high purity synthetic
polypeptide chains is a robust process and rarely fails.

Fig. 7 Chemistries that can enable native chemical ligation at non-Cys sites. (A) After native chemical ligation, the Cys residue at the ligation site is
desulfurized, forming an Xaa–Ala site.[20] (B) A thiol-containing Nα auxiliary moiety is used at the N-terminal of one peptide segment; after native
chemical ligation, the auxiliary moiety is removed, e.g. by acidolysis (R = (OMe)₂).[22]
Analytical control
In addition to its use in monitoring ligation reactions, LC–MS
is also used for rigorous analytical control of all other steps in
the total chemical synthesis of a protein molecule. Modern
biological mass spectrometry, especially electrospray ionization
MS, is a rapid and precise way of determining the mass of
unprotected peptides and polypeptide ligation products, and
of any peptide by-products that are present at any stage of a
synthesis; the exact mass of a molecule serves as a severe
constraint on structural hypotheses, especially with knowledge
of the synthetic chemistries and reaction sequences used in its
preparation.[26] Folding of the full-length synthetic polypeptide
to form the unique tertiary structure of a protein molecule can
be precisely confirmed by modern multidimensional nuclear
magnetic resonance (NMR) spectroscopy; even in the absence
of complete resonance assignments, an amide proton TOCSY
fingerprint will quickly reveal the presence of more than one
folded form or of residual denatured polypeptide. Finally,
high resolution X-ray crystallography can be used to determine
the precise folded structure of the synthetic protein
molecule.[27] Where difficulty in obtaining suitable crystals is
encountered, total chemical synthesis uniquely offers the
possibility of more facile crystallization through the use of a
racemic solution of the protein enantiomers.[28]

Fig. 8 Total chemical synthesis of the small plant protein crambin [46 amino acids; 6 cysteines (3 disulfides)]. (Left) synthetic scheme;
(right) LC–MS analytical data for each stage of the synthesis; the times shown refer to the overall elapsed time for the synthesis. The synthetic
crambin was characterized by mass spectrometry, multidimensional NMR spectroscopy, and by X-ray structure determination.[25]
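The electrospray mass determination discussed under Analytical control rests on simple arithmetic: a molecule of neutral mass M carrying z protons appears at m/z = (M + z·m_p)/z, and two adjacent charge states suffice to recover both z and M. The sketch below illustrates this with the calculated average mass of crambin (4702.4 Da) quoted in this review; the function names and the choice of charge states are illustrative assumptions, and real spectra are deconvoluted over many peaks by instrument software.

```python
PROTON_MASS = 1.00728  # Da; assumes protonation is the only source of charge


def mz_for_charge(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a molecule of the given neutral mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_adjacent_peaks(mz_lower_charge: float, mz_higher_charge: float):
    """Recover (z, M) from two adjacent peaks of one charge-state series.

    mz_lower_charge is the peak with charge z (larger m/z);
    mz_higher_charge is the neighbouring peak with charge z + 1 (smaller m/z).
    """
    spacing = mz_lower_charge - mz_higher_charge
    charge = round((mz_higher_charge - PROTON_MASS) / spacing)
    neutral_mass = charge * (mz_lower_charge - PROTON_MASS)
    return charge, neutral_mass


if __name__ == "__main__":
    crambin_mass = 4702.4  # Da, calculated average mass quoted in the text
    peak_3 = mz_for_charge(crambin_mass, 3)
    peak_4 = mz_for_charge(crambin_mass, 4)
    charge, mass = mass_from_adjacent_peaks(peak_3, peak_4)
    print(f"3+ peak at m/z {peak_3:.2f}, 4+ peak at m/z {peak_4:.2f}")
    print(f"recovered charge {charge} and mass {mass:.1f} Da")
```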
Crambin
The small plant protein crambin was prepared
by a one pot convergent synthetic strategy.[25] Native chemical
ligation of three synthetic peptide segments was carried out
without intermediate purifications, as shown in Fig. 8. The crude
full-length 46 residue polypeptide chain was folded in the presence
of a redox couple (the observed loss of 6 Da was consistent with
the formation of three disulfides), and purified in a single step by
reverse phase HPLC to give the synthetic protein in forty
milligram amounts (40% overall yield, based on starting peptide
segments). Synthetic crambin was characterized by mass spectrometry
(observed: 4702.2 ± 0.3 Da; calculated: 4702.4 Da (average
isotope composition)), by multidimensional NMR spectroscopy,
and by high resolution X-ray crystallography.
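The 6 Da loss follows from simple bookkeeping: each disulfide bond forms from two Cys thiols with the loss of two hydrogen atoms. A minimal sketch, assuming an average hydrogen mass of 1.008 Da (the function names are illustrative, not from the original work):

```python
HYDROGEN_AVERAGE_MASS = 1.008  # Da, average atomic mass of hydrogen


def disulfide_mass_shift(n_disulfides: int) -> float:
    """Expected mass change (Da) on oxidizing 2n thiols to n disulfide bonds."""
    return -2.0 * HYDROGEN_AVERAGE_MASS * n_disulfides


def disulfides_from_shift(observed_shift: float) -> int:
    """Nearest whole number of disulfides consistent with an observed mass change."""
    return round(-observed_shift / (2.0 * HYDROGEN_AVERAGE_MASS))


if __name__ == "__main__":
    for protein, n in [("crambin", 3), ("human lysozyme", 4)]:
        shift = disulfide_mass_shift(n)
        print(f"{protein}: {n} disulfides, expected shift {shift:.3f} Da")
    print("a loss of 6 Da corresponds to", disulfides_from_shift(-6.0), "disulfides")
```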
Human lysozyme
The enzyme human lysozyme contains a polypeptide chain of 130
amino acid residues, and has eight cysteine residues that form four
disulfide bonds in the folded protein molecule. The scheme for the
fully convergent total chemical synthesis of human lysozyme from
four synthetic peptide segments is shown in Fig. 9. The folded
synthetic protein was obtained in good yield and high purity, had
a measured mass of 14693.4 ± 0.7 Da (calc., 14692.7 Da (average
isotopes)), and contained four disulfide bonds; multidimensional
NMR spectroscopy confirmed the presence of a uniquely folded
protein species, and the X-ray structure of the synthetic enzyme
was determined to 1.04 Å resolution. Synthetic human lysozyme
had full enzymatic activity.[27]
Mirror image enzymes
Thioester-forming ligation chemistry was used to make both
mirror image forms of the enzyme HIV-1 protease, as shown in
Fig. 10. Each enantiomer of the homodimeric protein was
obtained in ~50 mg quantity. The synthetic polypeptides were
characterized by HPLC and electrospray MS; the high resolution
X-ray structure of the thioester-containing D-HIV-1 protease was
determined and shown to be the mirror image of the X-ray
structure of the native backbone L-HIV-1 protease molecule.
The folded synthetic proteins were characterized for proteolytic
activity by a fluorogenic assay. The enantiomeric enzymes
showed reciprocal chiral specificity in the hydrolysis of
peptide substrates; that is, the L-enzyme hydrolyzed an L-peptide
substrate, but did not hydrolyze the corresponding D-peptide,
whereas the D-enzyme hydrolyzed the D-peptide substrate, but
did not hydrolyze the L-peptide.[29]
Topological analogues
Thioester-forming chemical ligation and oxime-forming chemical
ligation were used to prepare novel topological analogues of
the transcription factors cMyc and Max, by a fully convergent
synthetic route as shown in Fig. 11. The target protein molecules
each contained 172 amino acid residues and were made up
of truncated forms of the ~90 residue b/HLH/Z domains of
cMyc and Max covalently joined to form a single polypeptide
with two amino terminals. Gel shift assays showed that the
covalently-linked cMyc–Max had full DNA binding activity
with the expected sequence specificity; circular dichroism
spectroscopy showed that the cMyc–Max protein-like entity
was folded into the characteristic helical conformation even in
the absence of cognate DNA.[30]

Fig. 9 Fully convergent total chemical synthesis of human lysozyme [130 amino acid residues; 8 cysteines (4 disulfides)]. The synthetic protein had
full enzymatic activity and was characterized by mass spectrometry, multidimensional NMR spectroscopy, and by high resolution X-ray structure
determination.[27]
Synthetic erythropoiesis protein, a designed glycoprotein mimetic
Native chemical ligation was used to make, from four synthetic
peptide segments, the 166 amino acid residue polypeptide chain
of a designed chemical analogue of erythropoietin, a potent
glycoprotein that stimulates the production of red blood cells.
A negatively charged, branched glycan-mimetic entity of
defined covalent structure was prepared by a combination of
polymer-supported organic synthesis and organic synthesis in
solution, and was site-specifically attached by oxime-forming
ligation to two of the peptide building blocks. The partially-convergent
synthetic route used is shown in Fig. 12. The folded
synthetic protein contained two disulfides, had a mass of 50 825
± 10 Da, and behaved as a single molecular species with pI 5.0
on isoelectric focusing. Synthetic erythropoiesis protein had full
biological activity and had the prolonged duration of action
in vivo that was the design objective.[23]
Protein diastereomers
In order to elucidate the molecular basis of C′-capping of a
protein α-helix, a set of ubiquitin analogues was prepared by
total chemical synthesis. For each synthetic protein, three unprotected
peptides were assembled by convergent native chemical
ligation; the two Cys residues at the ligation sites in the resulting
full-length 76 amino acid residue polypeptide chain were then
converted to native Ala residues by desulfurization with Raney
nickel. The synthetic scheme used for the preparation of the
diastereomeric protein [D-Ala35]ubiquitin is shown in Fig. 13.
Thermodynamic measurements of the relative stabilities of pairs
of [L-Xaa35]ubiquitin/[D-Xaa35]ubiquitin analogues showed that
the reason for the frequent occurrence of a Gly residue at the C′
position of an α-helix is predominantly conformational; insertion
of an L-amino acid at the C′ position incurs an energetic penalty
upon adopting the backbone conformation necessary to transition
from the helical conformation to another type of secondary
structure.[31]
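The Cys to Ala conversion can be followed by mass spectrometry because desulfurization replaces the side-chain SH with H, a net loss of one sulfur atom per Cys. A small sketch, assuming an average sulfur mass of 32.06 Da (the function name is illustrative):

```python
SULFUR_AVERAGE_MASS = 32.06  # Da, average atomic mass of sulfur


def desulfurization_mass_shift(n_cys: int) -> float:
    """Expected mass change (Da) when n Cys residues (C3H5NOS) become Ala (C3H5NO)."""
    return -SULFUR_AVERAGE_MASS * n_cys


if __name__ == "__main__":
    # Two ligation-site Cys residues are converted in the ubiquitin syntheses described above.
    shift = desulfurization_mass_shift(2)
    print(f"expected shift for two Cys to Ala conversions: {shift:.2f} Da")
```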
Fig. 10 Total chemical synthesis of the mirror image enzyme molecules
L-HIV-1 protease and D-HIV-1 protease.[29] Each enzyme molecule is a
homodimer of 99 amino acid residue polypeptide chains that together
form a single active site. The synthetic protein enantiomers had full
enzymatic activity, but reciprocal chiral specificity, and were characterized
by mass spectrometry and by X-ray structure determination.

Fig. 11 Fully convergent total chemical synthesis of the transcription
factor-related protein cMyc–Max (172 amino acid residues). The
polypeptide chain of this topological protein analogue has two
N-terminals; the covalent structure of the synthetic protein was
characterized by electrospray mass spectrometry. The synthetic
protein was pre-folded even in the absence of cognate DNA, and
had the expected DNA-binding activity, as shown by gel shift assays.[30]
Reproduced by permission.

Site-specific NMR isotope labels
Total chemical synthesis by thioester-forming ligation was used
to prepare the HIV-1 protease, each monomer of which was
site-specifically 96% ¹³C enriched at only the catalytic aspartic
acid side chain carboxyl carbon (i.e. the side chain carboxyl of
Asp25). This single carbon atom could be observed using ¹³C
NMR spectroscopy against the low (1.1%) natural abundance
of ¹³C in the rest of the protein molecule. These NMR
measurements enabled us to determine the ionization state of
the catalytic apparatus in the presence of the inhibitor pepstatin,
a mimic of the tetrahedral intermediate in peptide bond
hydrolysis. We found that in the complex with pepstatin, one
catalytic carboxyl was protonated and had a normal ¹³C
chemical shift, consistent with H-bonding to the hydroxyl
moiety of pepstatin as observed in the known X-ray structure
of the HIV-1 protease–pepstatin complex. By contrast, the
other carboxylate was fully ionized with an abnormal ¹³C
chemical shift characteristic of a low dielectric constant environment;
examination of the previously-reported X-ray structure
of the HIV-1 protease–pepstatin complex showed that the
oxygen atoms of one carboxyl were at least 3.5 Å from any
other neighbouring group, precluding any H-bonding interactions.
An ionized naked carboxylate group in such a
chemical microenvironment would be expected to have substantially
enhanced nucleophilic reactivity; this would serve as a
chemical explanation for the unusual nucleophilicity of just one
of the two catalytic aspartates in the aspartyl proteinase class of
enzymes.[33]
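As a back-of-envelope estimate (ignoring relaxation and linewidth effects, which are not discussed here), the advantage of the labeled site over any single natural-abundance carbon is the ratio of the two ¹³C fractions quoted above:

```latex
\[
\frac{f_{\text{label}}}{f_{\text{natural}}} = \frac{0.96}{0.011} \approx 87
\]
```

On this simple estimate, the enriched Asp25 carboxyl carbon contributes roughly 87 times the ¹³C signal of an individual unlabeled carbon.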
Covalent dimer HIV-1 protease
The HIV-1 protease molecule has two mobile flap structures, one
in each of the identical monomers, that close down over the peptide
substrate in the active site. In order to make distinct, i.e. asymmetric,
chemical analogues of each flap, we synthesized a 203 amino
acid residue covalent dimer form of the HIV-1 protease. The
synthetic route is shown in Fig. 14. Four ~50 residue synthetic
peptide segments were prepared by highly optimized Boc chemistry
stepwise solid phase peptide synthesis. The N-terminal half of the
target polypeptide was prepared by kinetically controlled ligation,
to give a 99 residue peptide–thioester product; the C-terminal half
of the target sequence was prepared by native chemical ligation as a
104 residue Cys–polypeptide. The two halves of the target sequence
were then joined to one another by native chemical ligation,
followed by alkylation of the Cys residues at the ligation sites, to
give the full-length polypeptide. After purification of the synthetic
polypeptide chain, folding was achieved by dialysis into native
buffer conditions. The synthetic protein had observed mass
21869.8 ± 0.4 Da [calc., 21869.76 Da (av. isotopes)], was characterized
by high resolution X-ray crystallography, and showed full enzymatic
activity. This modular synthetic route has been used to make
more than thirty chemical analogues, and corresponding
fluorescently- and spin-labeled versions, of the HIV-1 protease in
order to investigate the role of the flaps in the catalytic activity of
this enzyme molecule.[24]
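The agreement between observed and calculated masses reported in these case-studies can be put on a common scale by expressing each deviation in parts per million. The sketch below uses only the values quoted in the text; the data layout and function name are illustrative assumptions.

```python
# (protein, observed mass / Da, reported uncertainty / Da, calculated average mass / Da)
REPORTED_MASSES = [
    ("crambin", 4702.2, 0.3, 4702.4),
    ("human lysozyme", 14693.4, 0.7, 14692.7),
    ("HIV-1 protease covalent dimer", 21869.8, 0.4, 21869.76),
]


def deviation_ppm(observed: float, calculated: float) -> float:
    """Signed deviation of the observed mass from the calculated mass, in ppm."""
    return (observed - calculated) / calculated * 1e6


if __name__ == "__main__":
    for protein, observed, uncertainty, calculated in REPORTED_MASSES:
        ppm = deviation_ppm(observed, calculated)
        print(f"{protein}: {observed} ± {uncertainty} Da vs {calculated} Da ({ppm:+.1f} ppm)")
```

For these three proteins the absolute deviation stays below 50 ppm.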
Summary
The case-studies described above illustrate the power and
utility of modern ligation methods for the total chemical
synthesis of proteins. With the exception of the synthetic
erythropoiesis protein, each of the syntheses described above
was performed by a single individual, including manual stepwise
solid phase syntheses of all necessary peptide–thioester and
Cys–peptide building blocks. All synthetic protein products
were meticulously characterized and were shown to be of high
purity with the expected covalent and tertiary structures.
Finally, all synthetic proteins had the expected biochemical
and/or biological activities.

Fig. 12 Total chemical synthesis of synthetic erythropoiesis protein (166 amino acid residues; 50 825 Da). The monodisperse, negatively charged
glycan mimetic, of defined covalent structure, is shown in blue; the four unprotected peptide segments used as building blocks are in green.
After folding/disulfide formation, this synthetic glycoprotein mimetic displayed full biological activity, and had ~3-fold longer lifetime in vivo, in
accord with the design objectives.[23]

Total chemical synthesis of proteins enables the straightforward
preparation of analogue and labeled protein
molecules not readily accessible by recombinant DNA-based
methods. For example, chemical synthesis has been used to
prepare, inter alia, the following: proteins containing a wide
variety of non-coded amino acids, including D-amino acids;
backbone engineered proteins, with chemical isosteres
replacing the native peptide bond; proteins containing fixed
geometry non-peptidic β-turn analogues; proteins containing
fluorescent dyes and/or spin labels; proteins containing glycan
mimetics; proteins containing sugars and glycans (i.e. glycoproteins);
and proteins containing NMR probe nuclei at
specific single atom sites.[14,23] Such unprecedented chemical
analogues enable the more precise correlation of the structure
of a protein with its functional properties, and can contribute
in unique ways to our understanding of protein molecules.
6. Current developments
Native chemical ligation has been extended to the use of
polypeptide–thioesters produced by recombinant DNA microbial
expression; this powerful semi-synthetic method is termed
expressed protein ligation, and represents a most auspicious
confluence of chemical peptide synthesis and recombinant
DNA-based molecular biology.[34] Expressed protein ligation
and related intein-based methods enable the broad application
of synthetic chemistry to problems in protein biology.[35] (For
an elegant example, see ref. 36.)
An important recent development in chemical ligation methods
for the total synthesis of proteins is the kinetically controlled
ligation reaction; this chemistry enables the efficient reaction of a
peptide(1)–thioaryl ester with a Cys–peptide(2)–thioalkyl ester, to
give a peptide(1)–Cys–peptide(2)–thioalkyl ester product.[37] Used
in combination with native chemical ligation, kinetically controlled
ligation enables the fully convergent synthesis of larger protein
targets, as exemplified by the human lysozyme and HIV-1 protease
covalent dimer syntheses described in Section 5 (above).[24,27]
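The bookkeeping behind a convergent route can be sketched as a small data model: a segment can act as the N-terminal partner only if it carries a C-terminal thioester, and as the C-terminal partner only if it begins with Cys. This deliberately simplified model ignores the thioaryl/thioalkyl reactivity difference that makes kinetic control possible, and it ignores the temporary protection of N-terminal Cys residues. The class and function names and the four individual segment lengths are hypothetical; only the 99, 104 and 203 residue totals come from the covalent dimer synthesis described in Section 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    length: int
    n_terminal_cys: bool
    c_terminal_thioester: bool


def ligate(left: Segment, right: Segment) -> Segment:
    """Join two segments, checking the functional groups that ligation needs."""
    if not left.c_terminal_thioester:
        raise ValueError(f"{left.name} has no C-terminal thioester")
    if not right.n_terminal_cys:
        raise ValueError(f"{right.name} has no N-terminal Cys")
    return Segment(
        name=f"{left.name}-{right.name}",
        length=left.length + right.length,
        n_terminal_cys=left.n_terminal_cys,
        c_terminal_thioester=right.c_terminal_thioester,
    )


def build_covalent_dimer() -> Segment:
    # Hypothetical split of each half into two segments of about 50 residues.
    seg1 = Segment("seg1", 50, n_terminal_cys=False, c_terminal_thioester=True)
    seg2 = Segment("seg2", 49, n_terminal_cys=True, c_terminal_thioester=True)
    seg3 = Segment("seg3", 52, n_terminal_cys=True, c_terminal_thioester=True)
    seg4 = Segment("seg4", 52, n_terminal_cys=True, c_terminal_thioester=False)
    n_half = ligate(seg1, seg2)  # product keeps its thioester (kinetically controlled step)
    c_half = ligate(seg3, seg4)  # product is a Cys-polypeptide without a thioester
    return ligate(n_half, c_half)


if __name__ == "__main__":
    product = build_covalent_dimer()
    print(product.name, product.length)
```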
What is needed to complete and expand the synthetic protein chemist's tool box?
First, and most importantly, there is a need for a truly practical
synthesis of peptide–thioesters. Although the use of optimized
in situ neutralization Boc chemistry solid phase peptide synthesis
enables the straightforward synthesis of peptide–thioesters in good
yield and purity, the skill sets for Boc chemistry solid phase peptide
synthesis are no longer widely available, and the necessary use of
corrosive strong acid reagents precludes the application of modern
laboratory robotics. Standard Fmoc chemistry solid phase peptide
synthesis cannot be used for the synthesis of peptide–thioesters,
because a pre-existing thioester moiety is labile to the piperidine
used to remove the Fmoc group at each step of the synthesis.
An effective, general Fmoc chemistry solid phase synthesis of
peptide–thioesters would have broad impact on the practical total
chemical synthesis of proteins; recently Dawson and Blanco-Canosa
have introduced an ingenious novel linker that promises
to effectively address this need by enabling the use of standard
Fmoc chemistry solid phase synthesis for the routine synthesis of
peptide–thioesters.[38]

Fig. 13 Total chemical synthesis of the protein diastereomer [D-Ala35]ubiquitin (76 amino acid residues; no cysteines). (Left) The synthetic
strategy, convergent chemical ligation of three unprotected peptide segments, followed by conversion of the ligation site Cys residues to native Ala
residues. (Right) LC–MS data for the steps of the synthesis. The product synthetic protein was characterized by mass spectrometry and by X-ray
structure determination.[32] Reproduced by permission.

Another important unmet need is an orthogonal amide-forming
ligation chemistry,[39] i.e. one that is fully compatible with the use of
native chemical ligation, and that is as practically useful; this would
enable the routine convergent synthesis of larger proteins without
the use of any protecting groups, and in particular would obviate
the need for protection of N-terminal Cys residues.
7. Summary and outlook
The take home message from this tutorial review is simply this:
modern chemical ligation methods are an elegant and practical
solution to the grand challenge of total chemical synthesis of
proteins. Chemical protein synthesis based on ligation methods
has surmounted the issues that defeated the use of classical
synthetic organic chemistry for total protein synthesis, and has
dramatically extended the size of synthetically accessible polypeptide
chains beyond that achievable by stepwise solid phase
peptide synthesis. Modern synthetic protein chemistry is an
enabling technology for the application of advanced biophysical
methods to the study of proteins, for example by single
molecule fluorescence spectroscopy, and will continue to make
important contributions to the elucidation of the molecular
basis of protein function.
What is on the horizon?
It may be that the chemical ligation tool kit will find its killer
application in nanoscience, for the bottom-up fabrication of
molecular arrays and protein-inspired molecular devices.[40,41]
Today we are entering the era of synthetic biology, the
ultimate goal of which is the de novo fabrication of autonomous,
self-replicating molecular systems (i.e. living cells).
There are already indications that the chemical ligation
principle and related methods will contribute to synthetic
biology by, for example, enabling controlled chemical
synthesis within the living cell.[42]

Fig. 14 Total chemical synthesis by a fully convergent route of a covalent dimer form of the HIV-1 protease (203 amino acid residues). The
synthetic protein was characterized by mass spectrometry and by X-ray structure determination, and had full enzyme activity.[24]

References
1 R. D. Unwin, S. J. Gaskell, C. A. Evans and A. D. Whetton, Exp. Hematol. (N. Y.), 2003, 31, 1147.
2 J. A. Wells and D. A. Estell, Trends Biochem. Sci., 1988, 13, 291.
3 D. Mendel, V. W. Cornish and P. G. Schultz, Annu. Rev. Biophys. Biomol. Struct., 1995, 24, 435.
4 J. Xie and P. G. Schultz, Curr. Opin. Chem. Biol., 2005, 9, 548.
5 S. Kent, J. Pept. Sci., 2003, 9, 574.
6 P. Sieber, K. Eisler, B. Kamber, B. Riniker, W. Rittel, F. Marki and M. Degasparo, Hoppe-Seyler's Z. Physiol. Chem., 1978, 359, 113.
7 H. Yajima and N. Fujii, Biopolymers, 1981, 20, 1859.
8 G. W. Kenner, Proc. R. Soc. London, Ser. A, 1977, 353, 441.
9 S. B. H. Kent, Annu. Rev. Biochem., 1988, 57, 957.
10 D. S. Kemp, Biopolymers, 1981, 20, 1793.
11 S. Sakakibara, Biopolymers, 1999, 51, 279.
12 M. Schnolzer and S. B. H. Kent, Science, 1992, 256, 221.
13 K. Rose, J. Am. Chem. Soc., 1994, 116, 30.
14 P. E. Dawson and S. B. H. Kent, Annu. Rev. Biochem., 2000, 69, 923.
15 M. Baca and S. B. Kent, Proc. Natl. Acad. Sci. U. S. A., 1993, 90, 11638.
16 P. E. Dawson, T. W. Muir, I. Clark-Lewis and S. B. Kent, Science, 1994, 266, 776.
17 M. Schnolzer, P. Alewood, A. Jones, D. Alewood and S. B. H. Kent, Int. J. Pept. Res. Ther., 2007, 13, 31.
18 T. M. Hackeng, J. H. Griffin and P. E. Dawson, Proc. Natl. Acad. Sci. U. S. A., 1999, 96, 10068.
19 E. C. B. Johnson and S. B. H. Kent, J. Am. Chem. Soc., 2006, 128, 6640.
20 L. Z. Yan and P. E. Dawson, J. Am. Chem. Soc., 2001, 123, 526.
21 D. Macmillan, Angew. Chem., Int. Ed., 2006, 45, 7668.
22 D. W. Low, M. G. Hill, M. R. Carrasco, S. B. Kent and P. Botti, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 6554.
23 G. G. Kochendoerfer, S. Y. Chen, F. Mao, S. Cressman, S. Traviglia, H. Y. Shao, C. L. Hunter, D. W. Low, E. N. Cagle, M. Carnevali, V. Gueriguian, P. J. Keogh, H. Porter, S. M. Stratton, M. C. Wiedeke, J. Wilken, J. Tang, J. J. Levy, L. P. Miranda, M. M. Crnogorac, S. Kalbag, P. Botti, J. Schindler-Horvat, L. Savatski, J. W. Adamson, A. Kung, S. B. H. Kent and J. A. Bradburne, Science, 2003, 299, 884.
24 V. Y. Torbeev and S. B. H. Kent, Angew. Chem., Int. Ed., 2007, 46, 1667.
25 D. Bang and S. B. H. Kent, Angew. Chem., Int. Ed., 2004, 43, 2534.
26 B. T. Chait and S. B. H. Kent, Science, 1992, 257, 1885.
27 T. Durek, V. Y. Torbeev and S. B. H. Kent, Proc. Natl. Acad. Sci. U. S. A., 2007, 104, 4846.
28 B. L. Pentelute, Z. P. Gates, V. Tereshko, J. L. Dashnau, J. M. Vanderkooi, A. A. Kossiakoff and S. B. H. Kent, J. Am. Chem. Soc., 2008, 130, 9695.
29 R. C. D. Milton, S. C. F. Milton and S. B. H. Kent, Science, 1992, 256, 1445.
30 L. E. Canne, A. R. Ferredamare, S. K. Burley and S. B. H. Kent, J. Am. Chem. Soc., 1995, 117, 2998.
31 D. Bang, A. V. Gribenko, V. Tereshko, A. A. Kossiakoff, S. B. Kent and G. I. Makhatadze, Nat. Chem. Biol., 2006, 2, 139.
32 D. Bang, G. I. Makhatadze, V. Tereshko, A. A. Kossiakoff and S. B. Kent, Angew. Chem., Int. Ed., 2005, 44, 3852.
33 R. Smith, I. M. Brereton, R. Y. Chai and S. B. H. Kent, Nat. Struct. Biol., 1996, 3, 946.
34 T. W. Muir, D. Sondhi and P. A. Cole, Proc. Natl. Acad. Sci. U. S. A., 1998, 95, 6705.
35 E. C. Schwartz, T. W. Muir and A. B. Tyszkiewicz, Chem. Commun., 2003, 2087.
36 J. P. Pellois, M. E. Hahn and T. W. Muir, J. Am. Chem. Soc., 2004, 126, 7170.
37 D. Bang, B. L. Pentelute and S. B. H. Kent, Angew. Chem., Int. Ed., 2006, 45, 3985.
38 J. B. Blanco-Canosa and P. E. Dawson, Angew. Chem., Int. Ed., 2008, 47, 6851.
39 J. W. Bode, R. M. Fox and K. D. Baucom, Angew. Chem., Int. Ed., 2006, 45, 1248.
40 R. V. Ulijn and A. M. Smith, Chem. Soc. Rev., 2008, 37, 664.
41 E. H. Bromley, K. Channon, E. Moutevelis and D. N. Woolfson, ACS Chem. Biol., 2008, 3, 38.
42 M. E. Hahn, J. P. Pellois, M. Vila-Perello and T. W. Muir, ChemBioChem, 2007, 8, 2100.