
Fundamental

Molecular Biology
Second Edition

Lisabeth A. Allison

Chapter 4
Protein Structure and Folding

Copyright © 2012 John Wiley & Sons, Inc. All rights reserved.
Cover photo: Julie Newdoll/www.brushwithscience.com “Dawn of the
Double Helix”, oil and mixed media on canvas, © 2003
Outline
4.1 Introduction
4.2 Primary structure: amino acids and the genetic code
4.3 The three-dimensional structure of proteins
4.4 Protein function and regulation of activity
4.5 Protein folding and misfolding

4.1 Introduction

• Proteins are found in all living systems, ranging from bacteria and archaea through the unicellular eukaryotes to plants, fungi, and animals.

• In all life forms, proteins are made up of the same building blocks: amino acids.

• Each cell contains thousands of different genes and makes thousands of different proteins.

What is a gene?

• In the late 1930s…
“A molecule of living stuff made up of many atoms held together.”

• Today: a specific stretch of nucleotides in DNA (or, in some viruses, RNA) that contains the information for making a particular RNA molecule, which in most cases is used to make a particular protein.

4.2 Primary structure: amino acids and the genetic code

The 22 amino acids found in proteins

• Proteins are chain-like polymers of amino acids specified by the genetic code.

• Each amino acid has an amino group (NH3+) and a carboxyl group (COO-) attached to a central carbon called the α-carbon.

• Amino acids differ from one another only in their side chain, or “R group.”

[Figure: amino acid structures; labeled examples include pyrrolysine and selenocysteine]

• At pH 7 the amino and carboxyl groups of amino acids are charged.

• Over a pH range from 1 to 14, these groups bind or release a proton.

• The weak acid-base behavior of amino acids provides the basis for many techniques for amino acid identification and protein separation (Fig. 4.2).
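To make the weak acid-base behavior concrete, here is a minimal Python sketch that applies the Henderson-Hasselbalch relation to a free amino acid with no ionizable side chain. The pKa values (2.34 for the carboxyl group, 9.60 for the amino group, approximate values for glycine) are assumptions, not taken from the slides.

```python
# Assumed approximate pKa values for glycine (not from the slides).
PKA_CARBOXYL = 2.34
PKA_AMINO = 9.60


def net_charge(ph: float, pka_cooh: float = PKA_CARBOXYL, pka_nh3: float = PKA_AMINO) -> float:
    """Average net charge of an amino acid with no ionizable side chain.

    Henderson-Hasselbalch: the fraction of a group that is deprotonated
    at a given pH is 1 / (1 + 10**(pKa - pH)).
    """
    # Carboxyl group contributes -1 when deprotonated (COO-).
    carboxyl = -1.0 / (1.0 + 10 ** (pka_cooh - ph))
    # Amino group contributes +1 when protonated (NH3+).
    amino = 1.0 / (1.0 + 10 ** (ph - pka_nh3))
    return carboxyl + amino


for ph in (1, 7, 13):
    print(ph, round(net_charge(ph), 3))
```

At pH 7 both groups are charged and the charges nearly cancel; at very low pH the molecule carries close to +1, and at very high pH close to -1. The net charge is zero at the isoelectric point, (pKa1 + pKa2)/2 in this simple case.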

Protein primary structure

• Amino acids joined together by peptide bonds form the primary structure of a protein.

• The amino group of one amino acid reacts with the carboxyl group of another in a condensation reaction that releases a molecule of water.
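Because every peptide bond is formed by a condensation reaction, a chain of n amino acids has lost n - 1 water molecules. The Python sketch below applies that bookkeeping; the free amino acid masses are approximate average values assumed for illustration, and only four amino acids are included.

```python
# Approximate average masses (Da) of a few free amino acids; values are
# assumed for illustration and are not taken from the slides.
FREE_AA_MASS = {"G": 75.07, "A": 89.09, "S": 105.09, "C": 121.16}
WATER = 18.02  # mass (Da) of the H2O released by each condensation reaction


def peptide_mass(sequence: str) -> float:
    """Average mass of a linear peptide written in one-letter code.

    A chain of n amino acids contains n - 1 peptide bonds, so n - 1
    molecules of water have been removed.
    """
    if not sequence:
        return 0.0
    total = sum(FREE_AA_MASS[aa] for aa in sequence)
    return total - (len(sequence) - 1) * WATER


print(round(peptide_mass("GA"), 2))  # glycylalanine dipeptide
```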

• When joined in a series of peptide bonds, amino acids are called residues.

• A short sequence of amino acids is called a peptide; the term polypeptide applies to longer chains of amino acids.

• The arrangement of amino acids, with their distinct side chains, gives each protein its characteristic structure and function.

• The peptide bond has partial double-bond character as a result of resonance (Fig. 4.4A).

• Free rotation occurs only about the bonds between the α-carbon and the adjacent peptide units.

• Trans and cis configurations are possible about the rigid peptide bond (Fig. 4.4B).

• The peptide chain is flexible, but it is more rigid than it would be if there were free rotation about all of the bonds.
Translating the genetic code

• The antisense (non-coding, template) strand of DNA directs synthesis of RNA via complementary base pairing; the resulting mRNA is read in triplets, or codons (Fig. 4.5).

• An open reading frame (ORF) in the mRNA consists of a start codon, followed by codons for a series of amino acids, and ends with a termination codon.
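The two steps above can be sketched in Python: build the mRNA from a template strand by complementary base pairing, then scan for the first ORF. This is a simplified illustration, assuming the template is written 5'→3' (so the mRNA is its reverse complement) and that AUG is the only start codon; the example sequence is invented.

```python
from typing import Optional

# DNA template base -> complementary RNA base
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def transcribe(template_5to3: str) -> str:
    """Return the mRNA (5'->3') made from a template strand written 5'->3'.

    RNA polymerase reads the template 3'->5', so the mRNA is the reverse
    complement of the template, with U in place of T.
    """
    return "".join(COMPLEMENT[b] for b in reversed(template_5to3.upper()))


def first_orf(mrna: str) -> Optional[str]:
    """Return the first ORF: AUG, in-frame codons, and the first in-frame stop."""
    start = mrna.find("AUG")
    while start != -1:
        # Step through the reading frame set by this AUG, three bases at a time.
        for i in range(start, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                return mrna[start:i + 3]
        start = mrna.find("AUG", start + 1)
    return None


mrna = transcribe("GCTTACCAAGCCATGC")
print(mrna)             # GCAUGGCUUGGUAAGC
print(first_orf(mrna))  # AUGGCUUGGUAA
```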

The genetic code
• Each “codon box” in the code table groups four three-letter codons; with four possible bases at each of three positions there are 64 codons in all (4 x 4 x 4).

• 61 codons are recognized by tRNAs for the incorporation of the 20 common amino acids.

• The remaining 3 codons signal termination or, in special contexts, code for selenocysteine and pyrrolysine.
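The counts on this slide can be checked by building the standard code table in Python. The sketch below encodes the standard genetic code as one string per first base, in the conventional U, C, A, G order; it deliberately ignores the context-dependent selenocysteine and pyrrolysine readings and treats UAA, UAG, and UGA simply as stops.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one 16-letter row per first base (U, C, A, G);
# within a row the second base varies slowest and the third base fastest.
# '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

# itertools.product yields codons with the third position varying fastest.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

sense = [c for c, aa in CODE.items() if aa != "*"]
stops = sorted(c for c, aa in CODE.items() if aa == "*")
print(len(CODE), len(sense), stops)
```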

The genetic code is degenerate

• Most amino acids are specified by more than one codon; synonymous codons often differ only in the third letter, and tRNAs specific to a particular amino acid can recognize several of them.
e.g. leucine is coded for by 6 different codons, while methionine has only one codon.
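Degeneracy can be read straight off the standard code table. This Python sketch rebuilds the table (same string encoding of the standard code as a conventional U, C, A, G table) and counts synonymous codons per amino acid. It also shows why "often" is the right word: four leucine codons differ only at the third position, but UUA and UUG also differ at the first.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one row per first base (U, C, A, G); '*' = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons assigned to each amino acid ('*' counts the stops).
degeneracy = Counter(CODE.values())
print(degeneracy["L"], degeneracy["M"])

# All six leucine codons, sorted alphabetically.
leu = sorted(c for c, aa in CODE.items() if aa == "L")
print(leu)
```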

The “wobble hypothesis”
• Pairing between codon and anticodon at the first two codon positions always follows the usual rule of complementary base pairing.

• Exceptional “wobbles” (non-Watson-Crick base pairing) can occur at the third position.
The genetic code is not universal

• In certain organisms and organelles, the meaning of select codons has been changed.
e.g. Tetrahymena reads UAA and UAG as glutamine (Gln).

The 21st and 22nd genetically encoded
amino acids
The UGA code for selenocysteine is found in:
• >15 genes in prokaryotes that are involved in redox
reactions.
• >40 genes in eukaryotes that code for various antioxidants
and the type I iodothyronine deiodinase.

The UAG code for pyrrolysine has been found in:
• a few archaebacteria and eubacteria.

Modified nucleotides and codon bias

• “Wobbles” can occur at the third position.

• When bases in the anticodon are modified, further pairing patterns are possible.

• Examples:
Inosine can pair with U, C, and A.
2-thiouracil restricts pairing to A alone.
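The wobble rules can be expressed as a small lookup in Python: Watson-Crick pairing at codon positions 1 and 2, and a more permissive table at position 3. This is a sketch, assuming the classic wobble pairings for unmodified G, U, C, and A plus the two modified bases named on the slide; "S" is an ad hoc symbol chosen here for 2-thiouracil, not a standard notation.

```python
# Watson-Crick partners used at codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third bases that each anticodon wobble base (anticodon position 1)
# can pair with. "I" = inosine; "S" = ad hoc symbol for 2-thiouracil.
WOBBLE = {
    "G": {"C", "U"},
    "U": {"A", "G"},
    "C": {"G"},
    "A": {"U"},
    "I": {"U", "C", "A"},
    "S": {"A"},
}


def codons_read(anticodon: str) -> list[str]:
    """Codons (5'->3') decoded by an anticodon written 5'->3'.

    Codon and anticodon pair antiparallel: anticodon position 3 pairs with
    codon position 1, and anticodon position 1 (the wobble position) pairs
    with codon position 3.
    """
    wobble, middle, last = anticodon
    first_two = WATSON_CRICK[last] + WATSON_CRICK[middle]
    return sorted(first_two + third for third in WOBBLE[wobble])


print(codons_read("IGC"))  # one anticodon reads three alanine codons
print(codons_read("SUU"))  # restricted to the A-ending lysine codon
```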

Implications of codon bias for molecular
biologists
• The frequencies with which different codons are used vary
significantly between different organisms and between
proteins expressed at high or low levels within the same
organism.

• Expression of functional proteins in heterologous hosts is a cornerstone of molecular biology research.

• Codon bias can have a major impact on the efficiency of expression of proteins that contain codons rarely used in the desired host.
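A practical first check before heterologous expression is to count codon usage in the gene and flag codons that are rare in the host. The Python sketch below does this for an in-frame coding sequence. The rare-codon set is an assumption: these six codons are often cited as rare in E. coli, but real work should use a codon-usage table for the actual host. The example gene is invented.

```python
from collections import Counter

# Illustrative host rare-codon set (assumption; replace with real host data).
RARE_IN_HOST = frozenset({"AGA", "AGG", "AUA", "CUA", "CCC", "GGA"})


def codon_usage(orf: str) -> Counter:
    """Count codons in an in-frame coding sequence (mRNA alphabet)."""
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))


def rare_codon_positions(orf: str, rare: frozenset = RARE_IN_HOST) -> list[int]:
    """Return 0-based codon indices whose codon is in the rare set."""
    return [i // 3 for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] in rare]


gene = "AUGAGAAGGCUGAAAUAA"  # hypothetical ORF with two adjacent rare Arg codons
print(codon_usage(gene))
print(rare_codon_positions(gene))
```

Clusters of adjacent rare codons, as in this toy example, are usually considered more problematic than isolated ones, which is why the function returns positions rather than just a count.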
• What might happen if you tried to express a
Tetrahymena gene that encodes a glutamine-rich
protein in E. coli?

• Recall that Tetrahymena reads UAA and UAG as glutamine (Gln).
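One way to reason through this question is to translate the same gene under both codes. The Python sketch below assumes E. coli uses the standard code and builds a Tetrahymena variant in which only UAA and UAG are reassigned to glutamine (Q), as stated on the slide; the ORF is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one row per first base (U, C, A, G); '*' = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Variant from the slide: UAA and UAG read as glutamine; UGA stays a stop.
TETRAHYMENA = {**CODE, "UAA": "Q", "UAG": "Q"}


def translate(orf: str, code: dict[str, str]) -> str:
    """Translate from the first codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = code[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical glutamine-rich ORF: AUG CAA UAA UAG CUG UGA
gene = "AUGCAAUAAUAGCUGUGA"
print(translate(gene, TETRAHYMENA))  # MQQQL (full length)
print(translate(gene, CODE))         # MQ (truncated at UAA)
```

Under the standard code, the first UAA or UAG that Tetrahymena reads as Gln terminates translation, so the host would produce truncated protein.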

D- and L-amino acids in nature

• D- and L-amino acids are enantiomers (stereoisomers that are non-superimposable mirror images of each other).

• Living organisms are composed predominantly of L-amino acids.

• Ribosomes use only L-amino acids to make proteins.

Exceptions:

• D-amino acids are found in some peptides in microorganisms, but are synthesized by pathways that do not involve the ribosome.

• D-amino acids are present in some peptides in other organisms, but are made from the genetically encoded L-amino acids by a post-translational process.

Examples:

• D-amino acids are present in the venom of some bivalves, snails, spiders, amphibians, and the duck-billed platypus.

• The presence of D-amino acids is linked to more potent venom.

[Figure: D-amino acid-containing peptide (from spider venom)]

4.3 The three-dimensional structure of proteins

• Dalton (Da) units are typically used to describe the molecular weight of proteins.

• Typical polypeptide chains have molecular weights of 20 to 70 kDa (20,000 to 70,000 Da).

• The average molecular weight of an amino acid residue is about 110 Da.

• A typical polypeptide chain thus contains about 182 to 636 amino acids (20,000/110 ≈ 182; 70,000/110 ≈ 636).
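The conversion between mass and chain length is a one-line division by the average residue mass. This Python sketch uses the slide's 110 Da figure; it gives only an estimate, since actual residue masses range from glycine (lightest) to tryptophan (heaviest).

```python
AVG_RESIDUE_MASS_DA = 110  # average residue mass quoted on the slide


def estimate_residues(mass_da: float) -> int:
    """Estimate chain length (number of residues) from molecular weight in Da."""
    return round(mass_da / AVG_RESIDUE_MASS_DA)


def estimate_mass_kda(n_residues: int) -> float:
    """Estimate molecular weight in kDa from chain length."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000


print(estimate_residues(20_000), estimate_residues(70_000))  # 182 636
print(estimate_mass_kda(100))                                # 11.0
```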

Secondary structure

• Interactions of amino acids with their near neighbors give a protein its secondary structure.

• Secondary structure is primarily stabilized by hydrogen bonds.

• It also depends on disulfide bridges, van der Waals interactions, hydrophobic contacts, and electrostatic interactions.

The three basic elements of protein
secondary structure

• α-helix

• β-pleated sheet

• Unstructured turns

α-helix

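• Most common structural motif in proteins.

• Tight helical structure stabilized by hydrogen bonding among near-neighbor amino acids: the backbone C=O of each residue bonds to the N-H of the residue four positions further along the chain.

• Proline, the “helix-breaking residue,” cannot participate as a donor in hydrogen bonding because its backbone nitrogen carries no hydrogen.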
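The i → i + 4 hydrogen-bonding pattern, and proline's inability to donate, can be captured in a few lines of Python. This is a sketch of an ideal α-helix only: it assumes every residue in the input is in the helix, uses the standard rise of 1.5 Å per residue, and ignores side-chain effects.

```python
RISE_PER_RESIDUE_A = 1.5  # rise along the helix axis per residue, in angstroms


def helix_hbonds(sequence: str) -> list[tuple[int, int]]:
    """Backbone H-bonds (acceptor i, donor i + 4) in an ideal alpha-helix.

    The C=O of residue i accepts a hydrogen bond from the N-H of residue
    i + 4. Proline has no backbone N-H, so a proline at i + 4 breaks the bond.
    """
    return [
        (i, i + 4)
        for i in range(len(sequence) - 4)
        if sequence[i + 4] != "P"
    ]


def helix_length_angstrom(n_residues: int) -> float:
    """Length of an ideal alpha-helix along its axis."""
    return n_residues * RISE_PER_RESIDUE_A


print(helix_hbonds("AAAAAA"))  # [(0, 4), (1, 5)]
print(helix_hbonds("AAAAPA"))  # [(1, 5)]: proline at index 4 cannot donate
```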

β-pleated sheet
• Extended amino acid chains packed side by side create a pleated, accordion-like appearance.

• Stabilized by hydrogen bonding.

<Parallel β structure>
• Two segments of a polypeptide chain (or two individual polypeptides) are aligned in the same direction: both run N-terminal to C-terminal.

<Antiparallel β structure>
• One segment runs N-terminal to C-terminal and the other runs C-terminal to N-terminal.
Unstructured turns

• “Turns” connect the α-helices and β-pleated sheets in proteins.

• Turns are relatively short loops that do not exhibit a defined secondary structure.

Tertiary structure
• The folded three-dimensional shape of a polypeptide.

• Tertiary structure is stabilized mostly by noncovalent interactions:
- Hydrophobic interactions
- Hydrogen bonds

• The principal covalent bonds within and between polypeptides are disulfide (S-S) bonds, or “bridges,” between cysteines (Fig. 4.9).

Three main categories of tertiary structure

• Globular proteins

• Fibrous proteins

• Membrane proteins

Globular proteins
• The overall shape of most proteins is roughly
spherical.
e.g. the enzyme lysozyme folds up into a globular tertiary
structure forming the active site.

Fibrous proteins
• Long filamentous or “rod-like” structures.

• Structural components of cells and tissues.

• A number of major designs:
- triple helical arrangement (e.g. collagen)
- “coiled coils”
- antiparallel β-pleated sheets

Membrane proteins

• Differ from soluble proteins in the relative distribution of hydrophobic amino acid residues: hydrophobic residues lie on the surfaces that contact the lipid bilayer.

• The seven-transmembrane-helix structure is a common motif in membrane proteins.
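The hydrophobic-surface difference is what hydropathy analysis exploits to spot candidate transmembrane helices in a sequence; a seven-transmembrane protein would be expected to show roughly seven hydrophobic stretches. The Python sketch below uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 threshold, a technique and parameters swapped in here for illustration and not described on the slides. The test sequence is invented.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy per window; entry i covers sequence[i:i + window]."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(sequence: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Start indices of windows hydrophobic enough to suggest a membrane-spanning helix."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, h in enumerate(profile) if h > threshold]


# Hypothetical sequence: 19 leucines flanked by charged lysines.
seq = "K" * 10 + "L" * 19 + "K" * 10
print(candidate_tm_windows(seq))
```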

Quaternary structure

• A functional protein can be composed of one or more polypeptide subunits.

• Subunits can be identical or nonidentical.

• Stabilizing bonds are the same as those for tertiary structure.

• Quaternary structure allows greater versatility of function.

• Catalytic or binding sites are often formed at the interface between subunits.
e.g. hemoglobin is built from two α and two β subunits, each carrying its own heme group; the interface between the two β subunits forms the binding site for the regulator 2,3-bisphosphoglycerate.

4.4 Protein function and regulation of activity

• Proteins larger than about 20 kDa are often formed from two or more domains with specific functions.

• A single domain is usually formed from a continuous stretch of amino acid sequence (e.g. a DNA-binding domain).

• Domains contain common structural-functional motifs.

• Proteins have a diversity of functions in cells.

• One vital role of proteins is to serve as enzymes that catalyze the hundreds of chemical reactions necessary for life.
Enzymes are biological catalysts

• Enzymes lower the activation energy of a chemical reaction and thereby speed up the reaction.

• The substrate forms a tight complex with the enzyme by binding to a region called the active site.

• Most enzymes act through an induced-fit mechanism.
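How much does a lower activation energy help? A minimal Python sketch, assuming simple Arrhenius kinetics, k = A·exp(-Ea/RT), with the same pre-exponential factor A for the catalyzed and uncatalyzed reactions, so the rate ratio depends only on the reduction in Ea. This model is an assumption added for illustration; real enzyme kinetics involve more than this one term.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def rate_enhancement(delta_ea_kj_per_mol: float, temp_k: float = 298.15) -> float:
    """Fold speed-up from lowering the activation energy by delta_ea (kJ/mol).

    With k = A * exp(-Ea / RT) and the same A for both reactions,
    k_catalyzed / k_uncatalyzed = exp(delta_Ea / RT).
    """
    return math.exp(delta_ea_kj_per_mol * 1000 / (R * temp_k))


print(rate_enhancement(5.7))   # about 10-fold at 25 °C
print(rate_enhancement(34.2))  # about a million-fold
```

Because the dependence is exponential, each further reduction of about 5.7 kJ/mol at room temperature multiplies the rate by roughly another factor of ten.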

Example:

• Lysozyme catalyzes the breakdown of polysaccharides from the E. coli peptidoglycan layer.

• The active site is a long, deep cleft that can bind six sugar residues of alternating N-acetylglucosamine (NAG) and N-acetylmuramic acid (NAM).

• Lysozyme brings the reacting species together in a geometry that favors reaction.

• For the fourth sugar residue to fit in the active site, it must be distorted into a less stable conformation.

• Residues Asp52 and Glu35 of lysozyme interact with the fourth and fifth sugar residues, breaking the C-O bond between them by hydrolysis.

Regulation of protein activity by post-translational modifications

The functional activity of proteins can be regulated at several different levels:
– Transcription
– RNA processing
– Translation
– Post-translational mechanisms, such as phosphorylation and the binding of allosteric effectors

• After translation, many proteins are joined covalently or noncovalently to other molecules.
e.g. lipoproteins, glycoproteins, metalloproteins

• The most common regulatory mechanism is the reversible phosphorylation of amino acid side chains.

Protein phosphorylation

• Phosphorylation may cause a protein to change shape and unmask or mask a catalytic or functional domain.

• A phosphorylated side chain may be part of a binding motif that facilitates formation of a multiprotein complex.

• A phosphorylated side chain may promote dissociation of a multiprotein complex.

Kinases

• Catalyze the addition of phosphate groups.

• Tend to be very specific, acting on very few substrates.

• Two protein kinase groups have been widely studied in eukaryotes:
1. Those that phosphorylate serine or threonine side chains.
2. Those that phosphorylate tyrosine side chains.

Phosphatases

• Remove phosphates.

• Tend to be less specific, acting on many substrates.

Allosteric regulation of protein activity

• Ligand-induced conformational change.

• An active site or another binding site is altered in a way that increases or decreases its activity.

[Figure: negative control and positive control of protein activity by allosteric regulation]

Example:

• Cyclin-dependent kinase (CDK) activity is regulated by both allosteric modification and phosphorylation.

Inactive conformation of CDK (Fig. 4.19)

• The T loop is located at the entrance to the active site.

• Polypeptide substrates are blocked from gaining access to the ATP molecule in the active site.

• A critical glutamate residue in the PSTAIRE helix is held at a distance from the active site.

Partial activation of CDK

• Binding of cyclin to CDK induces a conformational change.

• The T loop moves away from the entrance of the active site.

• The critical glutamate in the PSTAIRE helix moves into the active site.

Full activation of CDK

• Thr160 in the T loop is phosphorylated by CDK-activating kinase (CAK).

• This phosphorylation stabilizes the active site “catalytic cleft.”
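The three CDK states described on the last few slides behave like a small state machine. The Python sketch below is a toy model of that logic, not a kinetic model; in particular, it assumes Thr160 phosphorylation without bound cyclin leaves CDK inactive, which the slides do not address.

```python
from dataclasses import dataclass


@dataclass
class CDK:
    """Toy model of CDK activation states (inactive, partial, full)."""

    cyclin_bound: bool = False
    thr160_phosphorylated: bool = False

    def bind_cyclin(self) -> None:
        # Cyclin binding moves the T loop away from the active-site entrance
        # and brings the PSTAIRE glutamate into the active site.
        self.cyclin_bound = True

    def phosphorylate_by_cak(self) -> None:
        # CDK-activating kinase (CAK) phosphorylates Thr160 in the T loop.
        self.thr160_phosphorylated = True

    @property
    def state(self) -> str:
        if self.cyclin_bound and self.thr160_phosphorylated:
            return "fully active"
        if self.cyclin_bound:
            return "partially active"
        # Assumption: without cyclin, CDK is treated as inactive.
        return "inactive"


cdk = CDK()
print(cdk.state)  # inactive
cdk.bind_cyclin()
print(cdk.state)  # partially active
cdk.phosphorylate_by_cak()
print(cdk.state)  # fully active
```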

[Figure: CDK activation by phosphorylation of Thr160]

Macromolecular assemblages
• Expression of the genetic information relies on the
sequential action of large and dynamic macromolecular
assemblages or “molecular machines.”

• In some cases, protein folding is initiated before the completion of protein synthesis.

• Other proteins undergo major folding after release into the cytoplasm or a specific organelle.

• Most proteins require “molecular chaperones” to fold properly in vivo.
4.5 Protein folding and misfolding
Molecular chaperones

• Increase the efficiency of protein folding.

• Reduce the probability of competing reactions such


as aggregation.

• Aid in the destruction of misfolded proteins.

• Typically ATP-dependent.

Regulation of protein folding
• Heat-shock proteins promote protein folding and aid in the destruction of misfolded proteins.
e.g. Hsp40, Hsp70, Hsp90

• Hsp90 mediates protein folding by undergoing major shape changes upon binding and hydrolysis of ATP and interaction with p23 (Fig. 4.21).

Hsp90 chaperone function

Endoplasmic reticulum “quality control”

• Secreted proteins are translocated into the endoplasmic reticulum (ER).
• Folding takes place before secretion through the Golgi apparatus.
• Folding catalysts accelerate potentially slow steps in the folding process.
e.g. peptidylprolyl and protein disulfide isomerases

• Incorrectly folded proteins are detected and targeted for degradation; their accumulation triggers the “unfolded protein response.”
Ubiquitin-mediated protein degradation

• Ubiquitin (a 76-amino-acid polypeptide) is attached to a protein by a series of enzyme-mediated reactions.

• The ubiquitin-conjugated protein is then targeted to the 26S proteasome.

• Ubiquitin is released, and the target protein is degraded by the protease activity of the proteasome.

Ubiquitin Proteasome System programme

https://www.youtube.com/watch?v=hvNJ3yWZQbE

Protein misfolding diseases

• Formation of protein aggregates is linked to at least 20 different human diseases.

• Normally soluble proteins accumulate as insoluble deposits known as amyloid or amyloid-like fibrils.

• Proteins in amyloid-like fibrils fold into a cross-β spine.

Amyloid-like fibrils

Prions

Prions are the primary cause of transmissible spongiform encephalopathies (TSEs).

• Progressive neurodegeneration
• Dementia
• Loss of control of voluntary muscle movements
• Once symptoms appear, death results in 6 months to 1 year
• There is no cure

Human forms of prion disease
• Kuru
• Creutzfeldt-Jakob disease
• Gerstmann-Straussler syndrome
• Fatal familial insomnia

Animal forms
• Scrapie (sheep)
• Bovine spongiform encephalopathy (BSE: “mad cow
disease”)
• Chronic wasting disease (elk and deer)

The “prion only” hypothesis of infection

• Stanley Prusiner: Nobel Prize in Physiology or Medicine, 1997.

• Absence of the immune response characteristic of infectious diseases.

• Long incubation time (up to 40 years for kuru).

• Resistance of the infectious agent to radiation that destroys living microorganisms (e.g. viruses, bacteria).

• The infectious agent is not a living organism but a protein, called scrapie prion protein (PrPSc), with the unusual ability to replicate itself within the body.

• The prion PrPSc has the same amino acid sequence as the normal host protein PrPC.

• However, the prion is misfolded into a different 3-D structure.

After misfolding the prion protein becomes…

• Aggregated (brain plaques)

• Protease resistant

• Infectious

• Able to survive standard sterilization techniques

Normal cell
• The normal cellular protein PrPC is a cell surface protein
expressed in neurons.

Infected cell
• Host protein PrPC is misfolded to form new prions called
PrPSc.
• Formation of fibrils, aggregates, and amyloid plaques.

Human sporadic transmissible
spongiform encephalopathies

• PrPC misfolds spontaneously and generates more prions by “autoinfection.”
Creutzfeldt-Jakob disease (CJD)

• Preventative action?
None – frequency of one in a million.

Human inherited transmissible
spongiform encephalopathies

• A mutated PrPC gene produces protein with a greater tendency to misfold spontaneously into the prion form.

Gerstmann-Straussler syndrome
Fatal familial insomnia

• Preventative action?
None – 100% likelihood of disease progression.

Human infectious transmissible
spongiform encephalopathies

• Eating brains or infected meat products

Kuru: former ritualistic cannibalism in Papua New Guinea
New variant CJD: consumption of tainted beef

• Preventative action?
Don’t eat contaminated meat products.
