
Protein structure and function

Marie-Véronique CLEMENT
Associate Professor
Yong Loo Lin School of Medicine
NUS Graduate School for Integrative Science and Engineering
Department of Biochemistry
National University of Singapore
8 Medical Drive, MD 7 #03-15
Singapore 117597
Tel: (65) 68747985
Fax: (65) 67791453
From amino acids to protein:

Amino acids are joined by peptide bonds.
The chain starts with an amino group (N-terminus) and terminates with a carboxyl group (C-terminus).

A peptide: Phe-Ser-Glu-Lys (F-S-E-K)
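The two notations for the peptide above can be converted mechanically. The sketch below assumes the standard three-letter and one-letter codes for the 20 common amino acids; the function name `to_one_letter` is only illustrative.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide: str, sep: str = "-") -> str:
    """Convert e.g. 'Phe-Ser-Glu-Lys' to 'F-S-E-K' (N-terminus written first)."""
    residues = peptide.split(sep)
    # A KeyError here means the input contains a non-standard residue name.
    return sep.join(THREE_TO_ONE[r] for r in residues)


print(to_one_letter("Phe-Ser-Glu-Lys"))  # F-S-E-K
```

By convention the sequence is read from the amino end to the carboxyl end, so the order of the letters matters.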
The shape of proteins:

The native conformation is determined by different levels of structure
Four Levels of Structure Determine the
Shape of Proteins
Primary structure
The linear arrangement (sequence) of amino acids and the location of covalent (mostly
disulfide) bonds within a polypeptide chain. Determined by the genetic code.

Secondary structure
local folding of a polypeptide chain into regular structures including the α helix, β
sheet, and U-shaped turns and loops.

Tertiary structure
overall three-dimensional form of a polypeptide chain, which is stabilized by multiple
non-covalent interactions between side chains.

Quaternary structure
The number and relative positions of the polypeptide chains in multisubunit
proteins. Not all proteins have a quaternary structure.
Primary Structure of a protein: 
determined by the nucleotide sequence of its gene

Bovine Insulin: the first sequenced protein

• In 1953, Frederick Sanger determined the amino acid sequence of insulin, a protein hormone.

• This work is a landmark in biochemistry because it showed for the first time that
a protein has a precisely defined amino acid sequence.

• It also demonstrated that insulin consists only of amino acids linked by peptide bonds
between α-amino and α-carboxyl groups.

• The complete amino acid sequences of more than 100,000 proteins are now known.

• Each protein has a unique, precisely defined amino acid sequence.

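Primary structure: pro-insulin is produced in the pancreatic islet cells

[Figure: pro-insulin protein, with the C-peptide and positions 30/31 and 65/66 marked; processing gives insulin + C-peptide]

Insulin residues compared between species (as labelled in the figure):
Human: Thr-Ser-Ile
Cow: Ala-Ser-Val
Pig: Thr-Ser-Ile
Chicken: His-Asn-Thr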

Amino acid substitution in proteins from different species

Conservative: substitution of an amino acid by another amino acid of similar polarity
(Val for Ile in position 10 of insulin)

Non-conservative: substitution involving replacement of an amino acid by another of different polarity
(sickle cell anemia: at the 6th position of the hemoglobin β chain, replacement of a glutamic acid by a valine induces precipitation of hemoglobin in red blood cells)

Invariant residues: amino acids found at the same position in different species
(critical for the structure or function of the protein)
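The distinction between conservative and non-conservative substitutions can be sketched in code. The grouping of residues into four polarity classes below is a common textbook simplification and an assumption of this sketch, not something taken from the slides; the function names are illustrative.

```python
# Coarse side-chain classes (one-letter codes). This grouping is an assumed
# textbook simplification; other classifications exist.
POLARITY_CLASSES = {
    "nonpolar": set("GAVLIMFWP"),
    "polar": set("STCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def polarity_class(residue: str) -> str:
    """Return the polarity class name for a one-letter residue code."""
    for name, group in POLARITY_CLASSES.items():
        if residue in group:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def classify_substitution(old: str, new: str) -> str:
    """Label a substitution as identical, conservative or non-conservative."""
    if old == new:
        return "identical"
    if polarity_class(old) == polarity_class(new):
        return "conservative"
    return "non-conservative"


# Ile -> Val (position 10 of insulin): both nonpolar.
print(classify_substitution("I", "V"))  # conservative
# Glu -> Val (sickle cell hemoglobin): acidic replaced by nonpolar.
print(classify_substitution("E", "V"))  # non-conservative
```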
Protein conformation: most proteins fold into only one stable conformation, the native conformation

A chain of more than about 50 amino acids is called a protein

• Stabilized by hydrogen bonds

• H- bonds are between –CO and –NH
groups of peptide backbone
• H-bonds are either intra- or inter-
• 3 types: α-helix, β-sheet and triple helix

What forces determine the structure?
• Primary structure - determined by covalent bonds
• Secondary, tertiary and quaternary structures - all determined by weak forces
– Weak forces - H-bonds, ionic interactions, van der Waals interactions, hydrophobic interactions

Non-covalent bonds involved in the shape of proteins

Secondary structures:
α Helix:
The α helix conformation was discovered about 50 years ago in α-keratin, abundant in hair, nails, and horns
β Sheet:
discovered within a year of the discovery of the α helix. Found in the protein fibroin, the major constituent of silk

The α helix: 
results from hydrogen bonding and does not involve the side chains of the amino acids

Two types of β sheet

An antiparallel β sheet

A parallel β sheet

Triple helix of collagen
• Limited to the tropocollagen molecule
• Sequence motif of –(Gly-X-Pro/Hypro)n–
• 3 left-handed helices wound together to give a right-handed superhelix
• Stable superhelix: glycines (small R group) are located on the central axis of the triple helix
• One interchain H-bond for each triplet of amino acids, between NH of Gly and CO of X (or proline) in the adjacent chain
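The repeating –(Gly-X-Pro/Hypro)n– motif lends itself to a simple pattern check. The sketch below is illustrative only: it works on one-letter codes and assumes the letter "O" stands for hydroxyproline, a convention used for collagen-like peptides that does not appear in the slides.

```python
import re

# One triplet: Gly, any residue X, then Pro (P) or Hyp (written "O" here,
# which is an assumed convention for hydroxyproline).
COLLAGEN_MOTIF = re.compile(r"(?:G[A-Z][PO])+")


def matches_collagen_motif(seq: str) -> bool:
    """True if the whole sequence is made of Gly-X-Pro/Hyp triplets."""
    return COLLAGEN_MOTIF.fullmatch(seq) is not None


print(matches_collagen_motif("GPOGAPGEO"))  # True: every triplet is G-X-P/O
print(matches_collagen_motif("GPAGPP"))     # False: first triplet ends in Ala
```

Real collagen chains tolerate other residues in the third position, so a strict match like this would only flag idealized repeats.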


• Helices/β-sheets: ~50% of the regular 2° structures of globular proteins
• Remaining: coil or loop conformation
• Also quite regular, but difficult to describe
• Examples: reverse turns, β-bends (connect successive strands of antiparallel β-sheets)
The Beta Turn
(aka beta bend, tight turn) 
• allows the peptide chain to reverse direction
• carbonyl O of one residue is H-bonded to the amide proton of a residue three residues away
• proline and glycine are prevalent in beta turns (?)

• A strand of polypeptide in a β-sheet may contain an “extra” residue
• This extra residue is not hydrogen bonded to a neighbouring strand
• This is known as a β-bulge.

Tertiary structure: the overall shape of a protein 
or a telephone cord!!!

The secondary structure of a telephone cord

A telephone cord, specifically the coil of a telephone cord, can be used as an analogy to the alpha helix secondary structure of a protein.

The tertiary structure of a telephone cord

The tertiary structure of a protein refers to the way the secondary structure folds back upon itself or twists around to form a three-dimensional structure. The secondary coil structure is still there, but the tertiary tangle has been superimposed on it.
Tertiary structure: the overall shape of a protein
Full three dimensional organization of a protein

The three-dimensional
structure of a protein

• R-group interactions result in the 3D structures of globular proteins
• Types of interactions: H-bonds, ionic bonds (salt linkages), hydrophobic interactions and disulphide bonds
• Hydrophilic R groups are on the surface while hydrophobic R groups are buried inside the protein
• Wide variety of 3° structures, since there is large variation in protein sizes and amino acid sequences

The role of side chains in the shape of proteins
Where is water?



The tertiary structure of myoglobin is fairly well understood. Myoglobin's alpha helices can be viewed as being enclosed in this blue sheath; the sheath doesn't exist, but we can draw it that way. The helical chain folds back upon itself into what's referred to as the tertiary structure of myoglobin. Bonds between the side groups of the amino acid residues are responsible for holding together the tertiary structure of this protein.

A coiled-coil:
This structure occurs when two α helices have most of their nonpolar (hydrophobic) side chains on one side, so that they can twist around each other with these side chains facing inwards

If a protein is formed as a complex of more than one polypeptide chain, the complete structure is designated as quaternary structure:

• Generally formed by non-covalent interactions between subunits
• Either as homo- or hetero-multimers
• Oligomers (multimers) are more stable than dissociated subunits
– They prolong the life of the protein in vivo
• Active sites can be formed by residues from adjacent subunits
– A subunit may not constitute a complete active site
• Error of synthesis is greater for longer polypeptide chains
• Subunit interactions: cooperativity/allosteric effects

Primary structure

Secondary structure

Tertiary structure

Quaternary structure

Structure of hemagglutinin
(a long multimeric molecule whose three identical subunits are each composed of two
chains, HA1 and HA2).


[Figure panels: primary structure (1-letter code used); secondary structure (β strands, α helices); tertiary structure; quaternary structure]
Protein domains:
Any part of a protein that can fold independently into a compact, stable structure. A domain usually contains between 40 and 350 amino acids.

• A domain is the modular unit from which many larger proteins are constructed.

• The different domains of a protein are often associated with different functions.
Protein domains
Cytochrome b562: a single-domain protein involved in electron transport in mitochondria
The NAD-binding domain of the enzyme lactic dehydrogenase
The variable domain of an immunoglobulin

The Src protein

The modular nature of proteins:
[Figure labels: EGF domain, immunoglobulin domain, membrane-spanning domain, chymotryptic domain; TPA (tissue plasminogen activator)]

Epidermal growth factor (EGF) is generated by proteolytic cleavage of a precursor protein containing multiple EGF domains (orange). The EGF domain also occurs in Neu protein and in tissue plasminogen activator (TPA). Other domains, or modules, in these proteins include a chymotryptic domain (purple), an immunoglobulin domain (green), a fibronectin domain (yellow), a membrane-spanning domain (pink), and a kringle domain (blue).
[Adapted from I. D. Campbell and P. Bork, 1993, Curr. Opin. Struc. Biol. 3:385.]
How are protein structures determined?

The majority of protein structures known to date have been solved with the experimental technique of

• X-ray crystallography, which typically provides data of high resolution but provides no time-dependent information on the protein's conformational flexibility.

• NMR (nuclear magnetic resonance spectroscopy), which provides somewhat lower-resolution data in general and is limited to relatively small proteins, but can provide time-dependent information about the motion of a protein in solution.

More is known about the tertiary structural features of soluble globular proteins than about membrane proteins because the latter class is extremely difficult to study using these methods.

An X-ray diffraction image for the protein myoglobin.

The first protein crystal structure was of sperm whale myoglobin, determined by Sir John Cowdery Kendrew and colleagues in 1958. Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry for their studies of the structures of globular proteins.

The X-ray diffraction analysis of myoglobin was originally motivated by the observation of myoglobin crystals in dried pools of blood on the decks of whaling ships.
Protein NMR is a field of structural biology that applies nuclear magnetic resonance spectroscopy to investigating proteins.

The field was pioneered by, among others, Kurt Wüthrich, who won the Nobel Prize in Chemistry in 2002.

[Figure: Pacific Northwest National Laboratory's high magnetic field (800 MHz) NMR spectrometer being loaded with a sample.]
[Figure: The NMR sample is prepared in a thin-walled glass tube.]

Protein NMR is performed on aqueous samples of highly purified protein.

Samples consist of between 300 and 600 microlitres with a protein concentration in the range 0.1 - 3 millimolar.

The source of the protein can be either natural or produced in an expression system using recombinant DNA techniques through genetic engineering.
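To get a feel for how much purified protein such a sample requires, the volume and concentration can be converted to a mass. This is a minimal sketch; the 15 kDa molecular weight is a hypothetical example value, not a figure from the slides, and concentration is taken in millimolar.

```python
def protein_mass_mg(volume_ul: float, conc_mm: float, mw_da: float) -> float:
    """Mass of protein (mg) in a sample.

    volume_ul: sample volume in microlitres
    conc_mm:   protein concentration in millimolar (mmol/L)
    mw_da:     molecular weight in daltons (g/mol)
    """
    moles = (volume_ul * 1e-6) * (conc_mm * 1e-3)  # litres * mol/L
    grams = moles * mw_da
    return grams * 1e3


# Hypothetical 15 kDa protein, 500 microlitres at 1 mM.
print(f"{protein_mass_mg(500, 1.0, 15_000):.1f} mg")  # 7.5 mg
```

For the same hypothetical protein, the quoted ranges span roughly 0.45 mg (300 microlitres at 0.1 mM) to 27 mg (600 microlitres at 3 mM), which is one reason recombinant expression systems are commonly used.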
Function of 
peptides and proteins

Oxytocin and vasopressin are two peptide hormones with very similar structures, but with very different biological activities.

Interestingly, their structures differ by only two amino acid residues: Ile 3 in oxytocin is Phe in vasopressin, and the hydrophobic Leu at position 8 in oxytocin is replaced by a hydrophilic Arg residue in vasopressin.

Oxytocin is a potent stimulator of uterine smooth muscle, and also stimulates lactation.

Vasopressin, also known as antidiuretic hormone (ADH), has no effect on uterine smooth muscle, but increases reabsorption of water by the kidney, thus increasing blood pressure.

Read Table 3.2 in Devlin for other examples of biologically active peptides.
Function of proteins
• Enzymatic catalysis
• Transport and storage (the protein hemoglobin)
• Coordinated motion (actin and myosin)
• Mechanical support (collagen)
• Immune protection (antibodies)
• Generation and transmission of nerve impulses - some amino acids act as neurotransmitters; receptors for neurotransmitters, drugs, etc. are protein in nature (the acetylcholine receptor)
• Control of growth and differentiation - transcription factors, hormones, growth factors (insulin or thyroid stimulating hormone)
• Proteins are the most important buffers in the body. They are
mainly intracellular and include haemoglobin.

• The plasma proteins are buffers but the absolute amount is small compared to intracellular protein.

• Protein molecules possess basic and acidic groups which act as H+ acceptors or donors respectively if H+ is added or removed.

• Many proteins (thousands!) are present in blood plasma
• Proteins contain weakly acidic (glutamate, aspartate) and basic (lysine, arginine, histidine) side chains (or R groups)
• At neutral pH, only histidine residues (containing an imidazole R group with pKa ~ 6.0) in proteins can act as a buffer component
• Haemoglobin, with 38 histidines per tetramer, is a good buffer
• N-terminal groups of proteins (pKa ~ 8.0) can also act as a buffer component
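Why groups with a pKa near the working pH buffer best can be illustrated with the Henderson-Hasselbalch relation, pH = pKa + log10([base]/[acid]). The sketch below uses the pKa values quoted above and pH 7.0 for "neutral"; treating each group as an independent, ideal weak acid is an assumption of the sketch.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a titratable group in its protonated (acid) form.

    Follows from Henderson-Hasselbalch: [base]/[acid] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Histidine imidazole (pKa ~ 6.0) and N-terminal amino group (pKa ~ 8.0) at pH 7.0.
print(f"{fraction_protonated(7.0, 6.0):.3f}")  # 0.091
print(f"{fraction_protonated(7.0, 8.0):.3f}")  # 0.909
```

Both groups are within one pH unit of their pKa, so each still has a meaningful share of both forms and can take up or release H+. A group whose pKa is several units from the pH is almost entirely in one form and contributes little buffering.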
Proteins play crucial roles in almost every biological process.
They are responsible in one form or another for a variety of
physiological functions including

Enzymatic catalysis

Transport and storage

Coordinated motion

Mechanical support

Immune protection

Generation and transmission of nerve impulses

Control of growth and differentiation

Enzymatic catalysis -
almost all biological reactions are enzyme catalyzed.

Enzymes are known to increase the rate of a biological reaction, typically by a factor of 10 to the 6th power or more!

There are several thousand enzymes which have been identified to date.
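A rate enhancement of this size can be translated into an energy. Assuming transition-state theory (rate proportional to exp(-ΔG‡/RT)) and body temperature of 310 K, both assumptions of this sketch rather than statements from the slides, a 10^6-fold speed-up corresponds to lowering the activation free energy by RT·ln(10^6).

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def barrier_reduction_kj(rate_enhancement: float, temp_k: float = 310.0) -> float:
    """Drop in activation free energy (kJ/mol) implied by a rate enhancement.

    Assumes rate ~ exp(-dG/RT), so dG_uncatalyzed - dG_catalyzed = RT * ln(enhancement).
    """
    return R * temp_k * math.log(rate_enhancement) / 1000.0


print(f"{barrier_reduction_kj(1e6):.1f} kJ/mol")  # 35.6 kJ/mol
```

That is roughly 8.5 kcal/mol, comparable to the energy of a few hydrogen bonds, which gives a sense of how modest the stabilization of the transition state needs to be.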

Transport and storage - small molecules are often carried by proteins in the
physiological setting (for example, the protein hemoglobin is responsible for the transport of
oxygen to tissues). Many drug molecules are partially bound to serum albumins in the plasma.

The binding of oxygen is affected by molecules such as carbon monoxide (CO) (for example from tobacco smoking, cars and furnaces).

CO competes with oxygen at the heme binding site.

Hemoglobin's binding affinity for CO is about 200 times greater than its affinity for oxygen, meaning that small amounts of CO dramatically reduce hemoglobin's ability to transport oxygen. When hemoglobin combines with CO, it forms a very bright red compound called carboxyhemoglobin.

When inspired air contains CO levels as low as 0.02%, headache and nausea occur; if the CO concentration is increased to 0.1%, unconsciousness will follow. In heavy smokers, up to 20% of the oxygen-active sites can be blocked by CO.

[Figure: 3-dimensional structure of hemoglobin. The four subunits are shown in red and yellow, and the heme groups in green.]
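The effect of the 200-fold affinity difference can be estimated with the Haldane relation, [HbCO]/[HbO2] = M × pCO/pO2, where M is the relative affinity. The sketch below is an equilibrium estimate only: it assumes 21% oxygen in inspired air, ignores unliganded hemoglobin and the time needed to reach equilibrium, and uses inspired rather than alveolar gas fractions.

```python
def cohb_fraction(co_percent: float, o2_percent: float = 21.0,
                  affinity_ratio: float = 200.0) -> float:
    """Equilibrium fraction of liganded heme sites occupied by CO.

    Haldane relation: [HbCO]/[HbO2] = affinity_ratio * (CO / O2).
    """
    ratio = affinity_ratio * co_percent / o2_percent
    return ratio / (1.0 + ratio)


print(f"{cohb_fraction(0.02):.2f}")  # 0.16
print(f"{cohb_fraction(0.1):.2f}")   # 0.49
```

Under these assumptions, 0.02% CO would eventually occupy about 16% of sites and 0.1% CO nearly half of them, which is consistent with the symptoms described above.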

Coordinated motion - muscle is mostly protein, and muscle contraction is mediated by
the sliding motion of two protein filaments, actin and myosin.

Platelet activation is a controlled sequence of actin filament events, including cross-linking, that creates a dramatic shape change in the platelet.

[Figure panels: platelet before activation; activated platelet; activated platelet at a later stage]
Mechanical support - skin and bone are strengthened by the protein collagen.

Abnormal collagen synthesis or 
structure causes dysfunction of

•  cardiovascular organs, 
• bone, 
• skin,
• joints 
• eyes

Refer to Devlin 
Clinical correlation 3.4 p121

Immune protection - antibodies are protein structures that are responsible for
reacting with specific foreign substances in the body.

Generation and transmission of nerve impulses -

Some amino acids act as neurotransmitters, which transmit signals from one nerve cell to
another. In addition, receptors for neurotransmitters, drugs, etc. are protein in nature.

An example of this is the acetylcholine receptor, which is a protein structure that is embedded in
postsynaptic neurons.

Gamma-aminobutyric acid (GABA)
Synthesised from glutamate

GABA acts at inhibitory synapses in the brain. GABA acts by binding to specific receptors in the plasma membrane of both pre- and postsynaptic neurons.

Control of growth and differentiation -
proteins can be critical to the control of growth, cell differentiation and expression of DNA.

For example, repressor proteins may bind to specific segments of DNA, preventing expression and
thus the formation of the product of that DNA segment.

Also, many hormones and growth factors that regulate cell function, such as insulin or thyroid
stimulating hormone are proteins.


Membrane transport proteins


In general, all globular proteins have 3D structures that are specialized for their particular functions.

Shape and function

The relationship between shape and function of proteins:


Protein degradation:

Disease and protein folding:


HOT Areas of Medical Research
Human Genome sequencing is completed

Application in Biology and Medicine just beginning

e.g., cloning of a disease gene is the first step in understanding the basic defects and rational

Structural and functional characterization of all novel PROTEINS will unravel new disease genes.
Shape and function
In globular proteins, tertiary interactions are frequently stabilized by the sequestration
of hydrophobic amino acid residues in the protein core, from which water is excluded,
and by the consequent enrichment of charged or hydrophilic residues on the protein's
water-exposed surface.

In secreted proteins that do not spend time in the cytoplasm, disulfide bonds between
cysteine residues help to maintain the protein's tertiary structure.

A variety of common and stable tertiary structures appear in a large number of proteins
that are unrelated in both function and evolution - for example, many proteins are
shaped like a TIM barrel, named for the enzyme triose phosphate isomerase.

Another common structure is a highly stable dimeric coiled-coil structure composed of four alpha helices.

Proteins are classified by the folds they represent in databases like SCOP and CATH.