
BIO 403 – Genomics and Proteomics

Proteomics

Md. Wahid Murad


Lecturer
Department of Genetic Engineering and Biotechnology
University of Dhaka
wahid.murad@du.ac.bd
Basics of Protein Structure
Levels of Protein Structure

(a) The linear sequence of amino acid residues defines the primary structure.

(b) Secondary structure consists of regions of regularly repeating conformations of the
peptide chain such as α helices and β sheets.

(c) Tertiary structure describes the shape of the fully folded polypeptide chain.

(d) Quaternary structure refers to the arrangement of two or more polypeptide chains into
a multi-subunit molecule.
Amino acids
Peptide bond
Peptides and Proteins

• The peptide formation involves two amino acids covalently joined together between the
carboxyl group of one amino acid and the amino group of another. This reaction is a
condensation reaction involving removal of elements of water from the two molecules.
The resulting product is called a dipeptide. The newly formed covalent bond connecting
the two amino acids is called a peptide bond. Once an amino acid is incorporated into a
peptide, it becomes an amino acid residue. Multiple amino acids can be joined together
to form a longer chain of amino acid polymer.
• A linear polymer of more than fifty amino acid residues is referred to as a polypeptide. A
polypeptide, also called a protein, has a well-defined three-dimensional arrangement. In
contrast, a polymer with fewer than fifty residues is usually called a peptide and
generally lacks a well-defined three-dimensional structure.
Peptides and Proteins

• The residues in a peptide or polypeptide are numbered beginning with the residue
containing the amino group, referred to as the N-terminus, and ending with the residue
containing the carboxyl group, known as the C-terminus.
• The actual sequence of amino acid residues in a polypeptide determines its ultimate
structure and function.
• The atoms involved in forming the peptide bond are referred to as the backbone atoms.
They are the nitrogen of the amino group, the α carbon to which the side chain is
attached, and the carbon of the carbonyl group.
Determination of Protein Structure

• X-ray crystallography

• Nuclear magnetic resonance (NMR) spectroscopy

• Cryo-electron microscopy
Dihedral Angles

• A peptide bond is actually a partial double bond owing to shared electrons between the
O=C–N atoms. The rigid double bond character forces the atoms associated with the peptide
bond to lie in the same plane, called the peptide plane. Because of the planar nature of
the peptide bond and the size of the R groups, rotation of the atoms around the peptide
bond is considerably restricted. The angle of rotation about a bond is referred to as the
dihedral angle (also called the torsional angle). For a peptide unit, the atoms linked to
the peptide bond can be moved to a certain extent by rotation about the two bonds flanking
the peptide bond. This is measured by two dihedral angles. One is the dihedral angle about
the N–Cα bond, which is defined as phi (φ); the other is the angle about the Cα–C bond,
which is called psi (ψ). Various combinations of φ and ψ angles allow proteins to fold in
many different ways.
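
The definition of φ and ψ can be made concrete with a short calculation. The following is a minimal sketch (not part of the original slides) that computes a dihedral angle from four atomic positions using the common atan2 formulation. The function name `dihedral` and the test coordinates are illustrative assumptions. For φ of residue i the four atoms would be C(i−1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle (degrees, -180..180) defined by four points.

    The angle is measured about the central bond p1-p2.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal to the plane p0-p1-p2
    n2 = _cross(b1, b2)  # normal to the plane p1-p2-p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative points (not real atomic coordinates):
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(cis, trans)
```

Applying the same function to every residue of a structure yields the (φ, ψ) pairs that are plotted in a Ramachandran plot.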
Ramachandran Plot

• The rotation of φ and ψ is not completely free because of the planar nature of the
peptide bond and the steric hindrance from the side chain R group. Consequently, there is
only a limited range of peptide conformation.
• When φ and ψ angles of amino acids of a
particular protein are plotted against
each other, the resulting diagram is called
a Ramachandran plot.
• This plot maps the entire conformational
space of a peptide and shows sterically
allowed and disallowed regions. It can be
very useful in evaluating the quality of
protein models.
Stabilizing Forces

• Protein structures from secondary to quaternary are maintained by noncovalent forces.
These include
• electrostatic interactions,
• van der Waals forces, and
• hydrogen bonding.

• Electrostatic interactions are a significant stabilizing force in a protein structure. They
occur when excess negative charges in one region are neutralized by positive charges in
another region. The result is the formation of salt bridges between oppositely charged
residues. The electrostatic interactions can function within a relatively long range (15 Å).
Stabilizing Forces

• Hydrogen bonds are a particular type of electrostatic interaction similar to dipole–dipole
interactions involving hydrogen from one residue and oxygen from another. Hydrogen
bonds can occur between main chain atoms as well as side chain atoms. Hydrogen from
the hydrogen bond donor group such as the N–H group is slightly positively charged,
whereas oxygen from the hydrogen bond acceptor group such as the C=O group is
slightly negatively charged. When they come within a close distance (<3 Å), a partial bond
is formed between them, resulting in a hydrogen bond. Hydrogen bonding patterns are a
dominant factor in determining different types of protein secondary structures.
• Van der Waals forces also contribute to the overall protein stability. These forces are
instantaneous interactions between atoms when they become transient dipoles. A
transient dipole can induce another transient dipole nearby. The dipoles of the two atoms
can be reversed a moment later. The oscillating dipoles result in an attractive force. The
van der Waals interactions are weaker than electrostatic and hydrogen bonds and thus
only have a secondary effect on the protein structure.
Stabilizing Forces

• In addition to these common stabilizing forces, disulfide bridges, which are covalent
bonds between the sulfur atoms of two cysteine residues, are also important in
maintaining some protein structures.

• For certain types of proteins that contain metal ions as prosthetic groups, noncovalent
interactions between amino acid residues and the metal ions may play an important
structural role.
Secondary Structures

• Local structures of a protein with regular conformations are known as secondary
structures. They are stabilized by hydrogen bonds formed between carbonyl oxygen and amino
hydrogen of different amino acids. Chief elements of secondary structures are α-helices
and β-sheets.

• An α-helix has a main chain backbone conformation that resembles a corkscrew. Nearly all
known α-helices are right-handed, exhibiting a rightward spiral form. In such a helix,
there are 3.6 amino acids per helical turn. Hydrophobic residues of the helix tend to face
inside and hydrophilic residues of the helix face outside. Thus, roughly every third or
fourth residue along the helix tends to be a hydrophobic residue. Ala, Gln, Leu, and Met
are commonly found in an α-helix, but not Pro, Gly, and Tyr.
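
The periodicity described above follows from simple arithmetic: 3.6 residues per turn means each residue is rotated 360°/3.6 = 100° from the previous one. The sketch below (an illustration, not from the slides) projects residues onto a helical wheel and lists which neighbours fall on the same face as residue 0. The 60° tolerance used to define "same face" is an arbitrary assumption.

```python
# 360 degrees per turn / 3.6 residues per turn = 100 degrees per residue
DEGREES_PER_RESIDUE = 100


def wheel_angle(index):
    """Angular position (degrees) of a residue on a helical wheel."""
    return (index * DEGREES_PER_RESIDUE) % 360


def same_face(i, j, tolerance=60):
    """True if residues i and j lie within `tolerance` degrees of each other."""
    diff = abs(wheel_angle(i) - wheel_angle(j))
    diff = min(diff, 360 - diff)  # shortest way around the circle
    return diff <= tolerance


# Offsets (relative to residue 0) that land on the same face of the helix
neighbours = [k for k in range(1, 8) if same_face(0, k)]
print(neighbours)
```

Under these assumptions the residues at offsets 3, 4, and 7 share a face with residue 0, which is why hydrophobic positions in an amphipathic helix tend to recur every third or fourth residue.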
Secondary Structures

• A β-sheet is a fully extended configuration built up from several spatially adjacent regions
of a polypeptide chain. Each region involved in forming the β-sheet is a β-strand. The
β-strand conformation is pleated, with the main chain backbone zigzagging and side chains
positioned alternately on opposite sides of the sheet. β-Strands are stabilized by hydrogen
bonds between residues of adjacent strands. The β-strands can all run in the same direction
to form a parallel sheet, alternate in direction to form an antiparallel sheet, or combine
both arrangements in a mixed sheet.
Secondary Structures

• Loops and turns connect α helices and β strands and allow the polypeptide chain to fold
back on itself producing the compact three-dimensional shape seen in the native
structure.
• Loops often contain hydrophilic residues and are usually found on the surfaces of proteins
where they are exposed to solvent and form hydrogen bonds with water.
• Loops containing only a few (up to five) residues are referred to as turns if they cause an
abrupt change in the direction of a polypeptide chain.
• If the connecting regions are completely irregular, they belong to random coils.
• Residues in the loop or coil regions tend to be charged and polar and located on the
surface of the protein structure. They are often the evolutionarily variable regions where
mutations, deletions, and insertions frequently occur. They can be functionally significant
because these locations are often the active sites of proteins.
Tertiary Structure

• Tertiary structure results from the folding of a polypeptide (which may already possess
some regions of α helix and β structure) into a closely packed three-dimensional
structure.
• An important feature of tertiary structure is that amino acid residues that are far apart in
the primary structure are brought together permitting interactions among their side
chains. Whereas secondary structure is stabilized by hydrogen bonding between amide
hydrogens and carbonyl oxygens of the polypeptide backbone, tertiary structure is
stabilized primarily by noncovalent interactions (mostly the hydrophobic effect)
between the side chains of amino acid residues. Disulfide bridges, though covalent, are
also elements of tertiary structure; they are not part of the primary structure since they
form only after the protein folds.
Motifs

• Supersecondary structures, or
motifs, are recognizable
combinations of α helices, β strands,
and loops that appear in a number
of different proteins.
• Sometimes motifs are associated
with a particular function although
structurally similar motifs may have
different functions in different
proteins.
Domains

• Many proteins are composed of several discrete, independently folded, compact units
called domains. Domains may consist of combinations of motifs. The size of a domain varies
from as few as 25 to 30 amino acid residues to more than 300.
• Domains are usually connected by loops but they are
also bound to each other through weak interactions
formed by the amino acid side chains on the surface of
each domain.
• Some domain structures occur in many different
proteins whereas others are unique. In general, proteins
can be grouped into families according to similarities in
domain structures and amino acid sequence. All of the
members of a family have descended from a common
ancestral protein.
Domains

Protein domains can be classified by their structures. One commonly used classification
scheme groups these domains into four categories.
• The “all-α” category contains domains that consist almost entirely of α helices and loops.
• “All-β” domains contain only β sheets and nonrepetitive structures that link β strands.
The other two categories contain domains that have a mixture of α helices and β strands.
• Domains in the “α/β” class have supersecondary structures such as the β-α-β motif and
others in which regions of α helix and β strand alternate in the polypeptide chain.
• In the “α+β” category, the domains consist of local clusters of α helices and β sheet where
each type of secondary structure arises from separate contiguous regions of the
polypeptide chain.
Domains

• Protein domains can be further classified by the presence of characteristic folds within
each of the four main structural categories.
• A fold is a combination of secondary structures that form the core of a domain.
Domain Structure and Function

• The relationship between domain structure and function is complex.


• Often a single domain has a particular function such as binding small molecules or
catalyzing a single reaction.
• In multifunctional enzymes, each catalytic activity can be associated with one of several
domains found in a single polypeptide chain.
• However, in many cases the binding of small molecules and the formation of the active
site of an enzyme take place at the interface between two separate domains. These
interfaces often form crevices, grooves, and pockets that are accessible on the surface of
the protein.
• The extent of contact between domains varies from protein to protein.
Quaternary Structure

• Many proteins exhibit an additional level of organization called quaternary structure.
Quaternary structure refers to the organization and arrangement of subunits in a protein
with multiple subunits. Each subunit is a separate polypeptide chain.
• A multisubunit protein is referred to as an oligomer (proteins with only one polypeptide
chain are monomers). The subunits of a multisubunit protein may be identical or different.
• When the subunits are identical, dimers and tetramers predominate. When the subunits
differ, each type often has a different function.
• A common shorthand method for describing oligomeric proteins uses Greek letters to
identify types of subunits and subscript numerals to indicate numbers of subunits. For
example, an α₂βγ protein contains two subunits designated α and one each of subunits
designated β and γ.
Quaternary Structure

• The subunits within an oligomeric protein always have a defined stoichiometry and the
arrangement of the subunits gives rise to a stable structure where subunits are usually
held together by weak noncovalent interactions.
• Hydrophobic interactions are the principal forces
involved although electrostatic forces may contribute
to the proper alignment of the subunits. Because
inter-subunit forces are usually rather weak, the
subunits of an oligomeric protein can often be
separated in the laboratory.
• In vivo, however, the subunits usually remain tightly
associated.
Protein Folding

• Protein folding is predicated on the theory that a linear polypeptide can, without
assistance from other molecules, fold into its physiological form (native state), first
shown by Christian Anfinsen. The native state is the structure of the macromolecule in
nature in which it is fully functional.
Christian Anfinsen’s Experiment

• In a key experiment Christian Anfinsen added urea to a sample of ribonuclease A until the
protein lost its secondary and tertiary structure – a process known as denaturation. In the
denatured form (or the nonnative state) the protein loses its ability to catalyze its normal
enzymatic activity.
• Anfinsen gradually removed the urea from the sample and observed that ribonuclease A
regained its catalytic activity.
• This simple experiment showed that a protein can self-fold into its native state.

• When unfolded, the protein is very flexible and assumes no one specific conformer for a
significant length of time. After folding, the same protein is folded into a tertiary structure,
which, under physiological conditions, is maintained. The folded protein is the most stable
of all forms of the protein.
Energy Landscape Theory

• The energy landscape theory holds that proteins in the unfolded state can assume many
conformations and may start to fold starting from any one of these conformations.
• Folding decreases the potential energy of the protein by creating
new intramolecular noncovalent bonds and by removing water
from hydrophobic side chains. The removal of water when
hydrophobic groups associate is called the hydrophobic effect.
• The native state of the protein corresponds to the global minimum of potential energy.
• As the protein folds there are local energy minima that the
protein may encounter. At these local minima, the protein is in a
misfolded state, and it is only with the addition of energy that a
protein can get out of a local minimum energy state.
Energy Landscape Theory

• The energy landscape theory suggests that there are many possible protein folding
pathways that lead to the native state.

• Cyrus Levinthal calculated, assuming the folding process is random and given the number
of potential torsion angles possible in a polypeptide, that it would take an incredibly large
number of trials to fold a protein correctly.
• Extending this idea, Robert Zwanzig estimated that by random sampling of all possible
conformers, it would take approximately 10²⁷ years to fold a small protein containing 101
amino acids.

• Because proteins are observed to fold on the order of 10⁻⁶ s, it is likely that the protein
explores only one or a few pathways before properly folding.
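
The order of magnitude of this estimate can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the parameter values (100 peptide bonds for a 101-residue protein, 3 conformational states per bond, and 10¹³ conformations sampled per second) are the commonly quoted assumptions behind the estimate and are not stated in the slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def random_search_years(n_bonds=100, states_per_bond=3, samples_per_second=1e13):
    """Years needed to visit every conformation once by unbiased random search.

    All default values are assumptions chosen to illustrate the argument.
    """
    conformations = states_per_bond ** n_bonds  # about 5 x 10^47 for the defaults
    seconds = conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR


print(f"{random_search_years():.1e} years")
```

With these inputs the result is on the order of 10²⁷ years, which is the contrast with observed folding times that motivates the idea of directed folding pathways.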
Proteomics

• Proteomics is the large-scale study of proteomes.


• Proteomics deals with proteins: their structure, their function, their interactions with
other proteins, and the roles they play in our bodies.

• A proteome is a set of proteins produced in an organism, system, or biological context.
Proteomic Databases

• Comprehensive, high-quality and freely accessible resource of protein sequence and
functional information.
• Patterns and profiles specific for more than a thousand protein families or domains.
• An ontological representation of protein-related entities.
• Database of protein 3D structures.
• Fully automated protein structure homology-modelling server.
• Protein-protein interaction network.
Protein 3D Structure Visualization
PDB Files

• A PDB file is a text file that contains a protein's sequence and atomic coordinates.
• PDB viewer software reads the coordinates and renders them as a 3D visualization.
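
To show what a viewer has to read, here is a minimal sketch of parsing the coordinate records of a PDB file. It relies on the fixed-column layout of ATOM/HETATM records in the PDB format. The three sample lines are made-up illustrative records (not taken from entry 1NIH), and the function name `parse_atoms` is an assumption.

```python
SAMPLE_PDB = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
HETATM    3  O   HOH A 101      10.000  11.000  12.000  1.00 20.00           O
END
"""


def parse_atoms(pdb_text):
    """Extract atom name, residue, chain, and x/y/z from ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append(
                {
                    "record": line[0:6].strip(),
                    "name": line[12:16].strip(),      # columns 13-16
                    "res_name": line[17:20].strip(),  # columns 18-20
                    "chain": line[21],                # column 22
                    "res_seq": int(line[22:26]),      # columns 23-26
                    "x": float(line[30:38]),          # columns 31-38
                    "y": float(line[38:46]),          # columns 39-46
                    "z": float(line[46:54]),          # columns 47-54
                }
            )
    return atoms


for atom in parse_atoms(SAMPLE_PDB):
    print(atom["record"], atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

For real work a dedicated parser is preferable, since actual PDB files contain alternate locations, multiple models, and other records that this sketch ignores.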
3D Structure of Hemoglobin with IHP

• PDB ID: 1NIH


IHP at binding site
Prediction of Protein Structures
Prediction of Secondary Structure

• Protein secondary structure prediction refers to the prediction of the conformational
state of each amino acid residue of a protein sequence as one of the three possible
states, namely, helices, strands, or coils, denoted as H, E, and C, respectively. The
prediction is based on the fact that secondary structures have a regular arrangement of
amino acids, stabilized by hydrogen bonding patterns. The structural regularity serves as
the foundation for prediction algorithms.

• The secondary structure prediction methods can be either ab initio based, which make
use of single sequence information only, or homology based, which make use of multiple
sequence alignment information. The ab initio methods, which belong to early generation
methods, predict secondary structures based on statistical calculations of the residues of a
single query sequence. The homology-based methods do not rely on statistics of residues
of a single sequence, but on common secondary structural patterns conserved among
multiple homologous sequences.
Sequence-structure mapping

The top string illustrates a protein sequence. Each letter of this sequence represents
an amino acid residue.
The aim of sequence-structure mapping is to assign each residue to one of three classes of
protein secondary structure: α-helix (H), β-sheet (E) and coil (C).
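
A predicted secondary structure is therefore just a string of H, E, and C characters aligned with the sequence. The sketch below summarizes such a string; the example sequence and its state string are hypothetical and are not the output of any real predictor.

```python
from collections import Counter
from itertools import groupby


def state_fractions(states):
    """Fraction of residues assigned to each state (H, E, C)."""
    counts = Counter(states)
    return {state: counts.get(state, 0) / len(states) for state in "HEC"}


def segments(states):
    """Collapse a per-residue state string into (state, run length) segments."""
    return [(state, len(list(run))) for state, run in groupby(states)]


# Hypothetical sequence and per-residue prediction, shown only for illustration
sequence = "MKTAYIAKQRQISFVK"
predicted = "CCHHHHHHHCCEEEEC"

assert len(sequence) == len(predicted)
print(state_fractions(predicted))
print(segments(predicted))
```

The same summary can be applied to the H/E/C string returned by a prediction server, for example to compare helix content between homologous proteins.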
Secondary Structure Prediction Using PSIPRED

1. Download the amino acid sequence from NCBI-Protein or UniProt


2. Open PSIPRED at http://bioinf.cs.ucl.ac.uk/psipred/
3. Submit sequence
Exercise

>Q148H4.1|Keratin (Bovine)
MTCGSGFRGRAFSCVSACGPRPGRCCITAAPYRGISCYRGLTGGFGSRSICGGFRAGSFGRSFGYRSGGV
GGLNPPCITTVSVNESLLTPLNLEIDPNAQCVKQEEKEQIKCLNNRFAAFIDKVRFLEQQNKLLETKLQF
YQNRQCCESNLEPLFNGYIETLRREAECVEADSGRLSSELNSLQEVLEGYKKKYEEEVALRATAENEFVA
LKKDVDCAYLRKSDLEANVEALIQEIDFLRRLYEEEIRVLQAHISDTSVIVKMDNSRDLNMDNIVAEIKA
QYDDIASRSRAEAESWYRSKCEEIKATVIRHGETLRRTKEEINELNRVIQRLTAEVENAKCQNSKLEAAV
TQAEQQGEAALNDAKCKLAGLEEALQKAKQDMACLLKEYQEVMNSKLGLDIEIATYRRLLEGEEQRLCEG
VGSVNVCVSSSRGGVVCGDLCVSGSRPVTGSVCSAPCSGNLAVSTGLCAPCGPCNSVTSCGLGGISSCGV
GSCASVCRKC
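
Before submitting a sequence, it can help to check that the FASTA record was downloaded intact. The following is a minimal sketch of a FASTA reader that also reports sequence length and the most frequent residues. The embedded record contains only the first line of the exercise sequence, to keep the example short; the function name `read_fasta` is an assumption.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a dict mapping each header to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Truncated to the first line of the exercise record for brevity
fasta_text = """>Q148H4.1|Keratin (Bovine)
MTCGSGFRGRAFSCVSACGPRPGRCCITAAPYRGISCYRGLTGGFGSRSICGGFRAGSFGRSFGYRSGGV
"""

for name, seq in read_fasta(fasta_text).items():
    print(name, len(seq), Counter(seq).most_common(3))
```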
Prediction of Tertiary Structure

• There are three computational approaches to protein three-dimensional structural
modeling and prediction. They are homology modeling, threading, and ab initio
prediction.
• The first two are knowledge-based methods; they predict protein structures based on
knowledge of existing protein structural information in databases.
• Homology modeling builds an atomic model based on an experimentally determined
structure that is closely related at the sequence level.
• Threading identifies proteins that are structurally similar, with or without detectable
sequence similarities.
• The ab initio approach is simulation based and predicts structures based on
physicochemical principles governing protein folding without the use of structural
templates.
Homology Modeling

• As the name suggests, homology modeling predicts protein structures based on sequence
homology with known structures. It is also known as comparative modeling. The principle
behind it is that if two proteins share a high enough sequence similarity, they are likely
to have very similar three-dimensional structures. If one of the protein sequences has a
known structure, then the structure can be copied to the unknown protein with a high
degree of confidence. Homology modeling produces an all-atom model based on
alignment with template proteins.
Homology Modeling

The overall homology modeling procedure consists of six steps.


1. The first step is template selection, which involves identification of homologous
sequences in the protein structure database to be used as templates for modeling.
2. The second step is alignment of the target and template sequences.
3. The third step is to build a framework structure for the target protein consisting of main
chain atoms.
4. The fourth step of model building includes the addition and optimization of side chain
atoms and loops.
5. The fifth step is to refine and optimize the entire model according to energy criteria.
6. The final step involves evaluating the overall quality of the model obtained.
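
Template selection and alignment (steps 1 and 2) are usually judged by sequence identity between target and template. The sketch below computes percent identity from an existing pairwise alignment. The aligned strings are made up for illustration, and the choice to divide by the number of gap-free columns is an assumption, since several definitions of identity are in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical target/template alignment ("-" marks a gap)
target = "MKT-AYIAKQR"
template = "MKSGAYLAKQR"
print(f"{percent_identity(target, template):.1f}% identity")
```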
Threading

• There are only a small number of protein folds available (<1,000), compared to millions of
protein sequences. This means that protein structures tend to be more conserved than
protein sequences. Consequently, many proteins can share a similar fold even in the
absence of sequence similarities. This allowed the development of computational
methods to predict protein structures beyond sequence similarities.
• By definition, threading or structural fold recognition predicts the structural fold of an
unknown protein sequence by fitting the sequence into a structural database and
selecting the best-fitting fold. The comparison emphasizes matching of secondary
structures, which are most evolutionarily conserved. Therefore, this approach can identify
structurally similar proteins even without detectable sequence similarity.
• The algorithms can be classified into two categories, pairwise energy based and profile
based. The pairwise energy–based method was originally referred to as threading and the
profile-based method was originally defined as fold recognition. However, the two terms
are now often used interchangeably without distinction in the literature.
Threading

• In the pairwise energy–based method, a protein sequence is searched for in a structural
fold database to find the best matching structural fold using energy-based criteria. The
first step is to align the query sequence with each structural fold in a fold library.
• The next step is to build a crude model for the target sequence by replacing aligned
residues in the template structure with the corresponding residues in the query.
• The third step is to calculate the energy terms of the raw model, which include pairwise
residue interaction energy, solvation energy, and hydrophobic energy.
• Finally, the models are ranked based on the energy terms to find the lowest energy fold
that corresponds to the structurally most compatible fold.
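
The final ranking step can be sketched as follows. This is only an illustration of the bookkeeping: the fold names and all energy values are hypothetical, and real threading programs compute these terms from the crude model with their own scoring functions.

```python
def rank_folds(energy_terms):
    """Sum the energy terms for each candidate fold and sort lowest energy first."""
    totals = {fold: sum(terms.values()) for fold, terms in energy_terms.items()}
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical energy terms (arbitrary units) for three candidate folds
candidates = {
    "fold_A": {"pairwise": -12.4, "solvation": -3.1, "hydrophobic": -5.0},
    "fold_B": {"pairwise": -8.9, "solvation": -1.2, "hydrophobic": -2.3},
    "fold_C": {"pairwise": -15.7, "solvation": 0.8, "hydrophobic": -6.4},
}

for fold, total in rank_folds(candidates):
    print(f"{fold}: {total:.1f}")
```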
Ab initio protein structural prediction

• Both homology and fold recognition approaches rely on the availability of template
structures in the database to achieve predictions. If no correct structures exist in the
database, the methods fail. However, proteins in nature fold on their own without
checking what the structures of their homologs are in databases. Obviously, there is some
information in the sequences that provides instruction for the proteins to “find” their
native structures.
• The limited knowledge of protein folding forms the basis of ab initio prediction. As the
name suggests, the ab initio prediction method attempts to produce all-atom protein
models based on sequence information alone without the aid of known protein
structures.
• The perceived advantage of this method is that predictions are not restricted by known
folds and that novel protein folds can be identified. However, because the
physicochemical laws governing protein folding are not yet well understood, the energy
functions used in the ab initio prediction are at present rather inaccurate.
Homology Modeling using SWISS-MODEL

• Download query sequence


• Download template structure
• Pairwise sequence alignment
• Open SWISS-MODEL
• Input sequence and template
• Submit job

Basic Applied Bioinformatics – Chandra Shekhar Mukhopadhyay


Threading using RaptorX

• Download query sequence


• Open RaptorX
• Input sequence
• Submit job
Ab initio modeling using trRosetta

• Download query sequence


• Open trRosetta
• Input sequence
• Submit job
Protein Identification and Quantification by Mass Spectrometry
Mass Spectrometry

• Mass spectrometry is a powerful analytical technique used to quantify known materials, to
identify unknown compounds within a sample, and to elucidate the structure and
chemical properties of different molecules.

• The complete process involves the conversion of the sample into gaseous ions, with or
without fragmentation, which are then characterized by their mass to charge ratios (m/z)
and relative abundances.
Basic Principle of Mass Spectrometry

• There are many different types of MS instruments, but they all have the same three
essential components.
• First, there is an ionization source, where the molecule is given a positive electrical
charge, either by removing an electron or by adding a proton. Depending on the ionization
method used, the ionized molecule may or may not break apart into a population of
smaller fragments.
• Next in line there is a mass analyzer, where the cationic fragments are separated
according to their mass.
• Finally, there is a detector, which detects and quantifies the separated ions.
Basic Principle of Mass Spectrometry

• A common type of MS technique used in the organic laboratory is electron ionization. In
the ionization source, the sample molecule is bombarded by a high-energy electron
beam, which has the effect of knocking a valence electron off of the molecule to form a
radical cation. Because a great deal of energy is transferred by this bombardment process,
the radical cation quickly begins to break up into smaller fragments, some of which are
positively charged and some of which are neutral. The neutral fragments are either
adsorbed onto the walls of the chamber or are removed by a vacuum source. In the mass
analyzer component, the positively charged fragments and any remaining unfragmented
molecular ions are accelerated down a tube by an electric field.
• This tube is curved, and the ions are deflected by a strong magnetic field. Ions of different
mass to charge (m/z) ratios are deflected to a different extent, resulting in a ‘sorting’ of
ions by mass (virtually all ions have charges of z = +1, so sorting by the mass to charge
ratio is the same thing as sorting by mass). A detector at the end of the curved flight tube
records and quantifies the sorted ions.
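
The dependence of deflection on m/z can be illustrated with basic physics: an ion accelerated through a potential V gains kinetic energy zeV, and in a magnetic field B it follows a circular path of radius r = mv/(zeB). The sketch below applies these two relations; the voltage and field strength are arbitrary illustrative values and do not describe any particular instrument.

```python
import math

ATOMIC_MASS_UNIT_KG = 1.66053906660e-27
ELEMENTARY_CHARGE_C = 1.602176634e-19


def deflection_radius_m(mz, accel_voltage_v, field_t, z=1):
    """Radius (m) of the circular path of an ion in a magnetic sector.

    Energy balance: z*e*V = 0.5*m*v**2, then r = m*v / (z*e*B).
    """
    mass = mz * z * ATOMIC_MASS_UNIT_KG
    charge = z * ELEMENTARY_CHARGE_C
    speed = math.sqrt(2.0 * charge * accel_voltage_v / mass)
    return mass * speed / (charge * field_t)


# Illustrative settings: 1000 V accelerating potential, 0.5 T magnetic field
for mz in (43, 58):
    radius = deflection_radius_m(mz, 1000.0, 0.5)
    print(f"m/z {mz}: r = {radius * 100:.2f} cm")
```

Heavier ions follow a wider arc than lighter ions of the same charge, which is the sorting effect described above.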
The output: Mass spectra of acetone

• Below is typical output for an electron-ionization MS experiment



• The sample is acetone. On the horizontal axis is the value for m/z (as we stated
above, the charge z is almost always +1, so in practice this is the same as mass). On the
vertical axis is the relative abundance of each ion detected.
• On this scale, the most abundant ion,
called the base peak, is set to 100%, and all
other peaks are recorded relative to this
value. For acetone, the base peak is at m/z
= 43.
• The molecular weight of acetone is 58, so
we can identify the peak at m/z = 58 as
that corresponding to the molecular ion
peak, or parent peak.
The output: Mass spectra of acetone

• There is a small peak at m/z = 59: this is referred to as the M+1 peak. A small fraction -
about 1.1% - of all carbon atoms in nature are actually the ¹³C rather than the ¹²C isotope.
The ¹³C isotope is, of course, heavier than ¹²C by 1 mass unit. In addition, about 0.015% of
all hydrogen atoms are actually deuterium, the ²H isotope. So the M+1 peak represents
those few acetone molecules in the sample which contained either a ¹³C or ²H.

• Molecules with lots of oxygen atoms sometimes show a small M+2 peak (2 m/z units
greater than the parent peak) in their mass spectra, due to the presence of a small
amount of ¹⁸O (the most abundant isotope of oxygen is ¹⁶O). Because there are two
abundant isotopes of both chlorine (about 75% ³⁵Cl and 25% ³⁷Cl) and bromine (about
50% ⁷⁹Br and 50% ⁸¹Br), chlorinated and brominated compounds have very large and
recognizable M+2 peaks.
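
The numbers quoted for acetone can be checked with a short calculation. The sketch below computes nominal masses from an elemental formula and gives a first-order estimate of the M+1 peak height relative to the molecular ion, using only the ¹³C and ²H abundances stated above. It ignores ¹⁷O and higher-order terms, so it is an approximation. Reading the m/z = 43 base peak as the acetyl cation (loss of a methyl group) is a standard interpretation that the slides do not state.

```python
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16}
# Natural abundance of the heavier isotope, as quoted in the text
HEAVY_ISOTOPE_FRACTION = {"C": 0.011, "H": 0.00015}


def nominal_mass(formula):
    """Nominal mass from a dict of element counts, e.g. {"C": 3, "H": 6, "O": 1}."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def m_plus_1_ratio(formula):
    """First-order estimate of the M+1 peak intensity relative to the M peak."""
    return sum(
        count * HEAVY_ISOTOPE_FRACTION.get(element, 0.0)
        for element, count in formula.items()
    )


acetone = {"C": 3, "H": 6, "O": 1}
acetyl_cation = {"C": 2, "H": 3, "O": 1}  # assumed identity of the m/z 43 fragment

print(nominal_mass(acetone), nominal_mass(acetyl_cation))
print(f"M+1 is about {100 * m_plus_1_ratio(acetone):.1f}% of M")
```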
Tandem Mass Spectrometry (MS/MS)

• Tandem mass spectrometry, also known as MS/MS, is a technique where two or more
mass analyzers are coupled together using an additional reaction step to increase their
abilities to analyze chemical samples. A common use of tandem MS is the analysis of
biomolecules, such as proteins and peptides.

• The molecules of a given sample are ionized and the first spectrometer (designated MS1)
separates these ions by their mass-to-charge ratio. Ions of a particular m/z-ratio coming
from MS1 are selected and then made to split into smaller fragment ions.
• These fragments are then introduced into the second mass spectrometer (MS2), which in
turn separates the fragments by their m/z-ratio and detects them. The fragmentation step
makes it possible to identify and separate ions that have very similar m/z-ratios in regular
mass spectrometers.
Liquid Chromatography – Mass Spectrometry (LC–MS)

• Liquid chromatography involves the separation of proteins by size exclusion, strong
cation/anion exchange or hydrophobic interaction (reverse phase chromatography).

• Protein samples are first denatured using heat, urea, guanidine or surfactants such as
sodium dodecyl sulfate to disrupt the 3D protein structure and enable proteases to access
all of the available cleavage sites.
• Disulfide bonds also affect cleavage site availability and are usually reduced and alkylated
before digestion.
• Proteins are typically enzymatically digested into peptides before MS analysis. Trypsin is
most commonly used because it produces small peptides that are amenable to
electrospray ionization.

• Peptide mixtures are usually separated using liquid chromatography and analyzed using
mass spectrometry (LC–MS), typically with tandem mass spectrometry (LC–MS/MS).
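
The digestion step can be simulated in silico. The sketch below applies the commonly used trypsin rule (cleave on the C-terminal side of lysine or arginine unless the next residue is proline). That rule is general knowledge rather than something stated in the slides, and the sketch ignores missed cleavages. The example sequence is arbitrary.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


print(tryptic_digest("AAKRPGGKCC"))
```

Because lysine and arginine are frequent, the resulting peptides tend to be short, and each carries a basic residue at its C-terminus.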
Proteomic experiments using LC-MS/MS

• A typical proteomics experiment involves measuring intact peptide masses following
chromatographic separation and ionization (LC–MS).

• Individual peptide ions are isolated from the mixture and subjected to dissociation
(MS/MS) followed by mass analysis of the resulting fragment ions.

• Both the intact peptide mass and peptide fragment ion masses are used for peptide
identification through database searching because a peptide mass measurement alone is
not sufficient for identification when dealing with complex mixtures.
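
The intact peptide mass used in such a search can be computed from the sequence. The sketch below sums standard monoisotopic residue masses, adds one water for the free termini, and converts the result to m/z for a given charge state by adding protons. The peptide shown is arbitrary, and unmodified residues are assumed (no alkylation or other modifications).

```python
# Standard monoisotopic residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056
PROTON_MASS = 1.007276


def peptide_mass(sequence):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


def peptide_mz(sequence, charge):
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(sequence) + charge * PROTON_MASS) / charge


peptide = "RPGGK"  # arbitrary example peptide
print(f"M = {peptide_mass(peptide):.4f} Da")
for z in (1, 2):
    print(f"z = {z}: m/z = {peptide_mz(peptide, z):.4f}")
```

Note that leucine and isoleucine have identical masses, which is one reason fragment ion data and database context are needed in addition to the intact mass.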
Mass spectrometry of proteins

• Electron ionization mass spectrometry is generally not very useful for analyzing
biomolecules.
• Mass spectrometry of biomolecules has undergone a revolution over the past few
decades, with many new ionization and separation techniques being developed. Generally,
the strategy for biomolecule analysis involves soft ionization, in which much less energy is
imparted to the molecule being analyzed during the ionization process.
• Usually, soft ionization involves adding protons rather than removing electrons: the
cations formed in this way are significantly less energetic than the radical cations formed
by removal of an electron. The result of soft ionization is that little or no fragmentation
occurs, so the mass being measured is that of an intact molecule. Typically, large
biomolecules are digested into smaller pieces using chemical or enzymatic methods, and
their masses are then determined by 'soft' MS.
Ionization of protein samples for MS: ESI and MALDI

• New developments in soft ionization MS technology have made it easier to detect and
identify proteins that are present in very small quantities in biological samples.
• In electrospray ionization (ESI), the protein sample, in solution, is sprayed into a tube and
the molecules are induced by an electric field to pick up extra protons from the solvent.
• Another common 'soft ionization' method is 'matrix-assisted laser desorption ionization'
(MALDI). Here, the protein sample is adsorbed onto a solid matrix, and protonation is
achieved with a laser.
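Because soft ionization adds protons, the measured mass-to-charge ratio depends on how many protons the molecule has picked up. A minimal sketch, assuming simple protonation only (no other adducts) and using the standard proton mass; the function name `mz_for_charge` and the example mass are hypothetical:

```python
PROTON_MASS = 1.007276  # Da


def mz_for_charge(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # Hypothetical 10 kDa protein observed at several charge states.
    for z in (1, 5, 10):
        print(z, round(mz_for_charge(10_000.0, z), 4))
```

The same neutral molecule appears at a lower m/z for each additional proton, which is why multiply protonated ions of large molecules fall within the range of common mass analyzers.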
TOF

• Typically, both ESI and MALDI are used in conjunction with a time-of-flight (TOF) mass
analyzer.

• In TOF, the ions are accelerated by an electric field through a flight tube, and separation
is achieved because lighter ions travel at greater velocity than heavier ions with the same
overall charge. In this way, the many proteins in a complex biological sample (such as
blood plasma, urine, etc.) can be separated and their individual masses determined very
accurately.
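The mass dependence of the flight time follows from energy conservation. In an idealized linear TOF analyzer, an ion of mass m and charge z accelerated through a voltage V gains kinetic energy zeV = ½mv², so the time to cross a field-free drift length L is t = L·sqrt(m / (2zeV)). The sketch below evaluates this relation; the 20 kV voltage and 1 m drift length are assumed example values, not instrument settings from the lecture.

```python
import math

DALTON_KG = 1.66053907e-27            # kg per Da
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def flight_time(mass_da, charge=1, accel_voltage_v=20_000.0, drift_length_m=1.0):
    """Flight time in seconds for an idealized linear TOF analyzer.

    Assumes all ions start at rest and receive kinetic energy z*e*V.
    """
    mass_kg = mass_da * DALTON_KG
    kinetic_energy_j = charge * ELEMENTARY_CHARGE_C * accel_voltage_v
    velocity = math.sqrt(2.0 * kinetic_energy_j / mass_kg)
    return drift_length_m / velocity


if __name__ == "__main__":
    for mass in (1_000.0, 4_000.0, 16_000.0):
        print(mass, f"{flight_time(mass) * 1e6:.2f} microseconds")
```

Since the flight time scales with the square root of m/z, a fourfold increase in mass at the same charge doubles the flight time.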
MALDI-TOF

• Matrix-assisted laser desorption/ionization (MALDI) coupled
to time-of-flight mass spectrometry (MALDI-TOF MS) is used
to sequence proteins, map biomolecules in tissues, identify
microorganisms, and analyze several thousand biochemical
assays in a day.
• The sample for analysis by MALDI-TOF MS is prepared by
mixing or coating with a solution of an energy-absorbent
matrix which entraps and co-crystallizes the sample when
dried. The matrix is ionized with a laser beam, and transfers
the charge to the analytes, generating singly charged ions
from analytes in the sample that are then accelerated. Ions
are separated from each other on the basis of their mass-to-
charge ratio (m/z) before being detected and measured using
the TOF mass analyzer.
• The high throughput and speed associated with MALDI-TOF
MS have made this technology an important tool for large-scale
proteomics.
Applications in Proteomics

1. Protein identification and characterization

2. Relative quantification of protein

   a) Metabolic isotopic labeling

   b) Chemical labeling

3. Absolute quantification of protein

4. Characterization of post-translational modifications

5. Identification of protein-protein interactions (PPI)

6. Tissue imaging

Guerrera 2005, Kolker 2006, Sokolowska 2013


Protein identification and characterization

• The very first step in protein identification and characterization is the determination of its
molecular weight (MW) and its primary amino acid sequence. There are two major
approaches for protein characterization using MS: the analysis of intact proteins (top-
down approach) and the analysis of a peptide mixture from a digested protein (bottom-
up approach).
• The top-down approach involves the ionization of intact proteins in the gas phase
followed by high-resolution mass measurement, without prior digestion. Separation
and/or fractionation of biological samples into a single protein or less complex mixtures of
proteins is required for accurate measurements. In the classical bottom-up approach, by
contrast, the separated proteins are digested and the resulting peptides are subjected to
peptide mass fingerprinting for protein identification.
• In the bottom-up variation known as shotgun proteomics, the protein mixture is directly
digested into a collection of peptides which are separated using single or multi-
dimensional chromatography and analyzed using tandem MS (MS/MS).
Relative quantification of protein

• Quantitative measurement of changes in protein expression between different
physiological and pathological states is another important area in proteomic research.

• A main goal of protein profiling and quantification is the development of protein
biomarkers, which is based on the determination of differences in expression levels. Such
biomarkers can be used for diagnosis, disease prognosis, and drug response prediction.

• Proteins from two different samples can be distinguished from each other by isotopic
labeling, which can be of two types:
1. Metabolic isotopic labeling
2. Chemical labeling
Metabolic isotopic labeling

• Metabolic labelling of proteins exploits the incorporation of isotopic labels during the
process of cellular metabolism and protein synthesis.

• To differentiate between identical peptides from two samples simultaneously, one sample
is labeled with heavy isotopes (2H, 13C, 15N or 18O) to produce a mass shift.
• A mass shift of at least 3 Da is desired to prevent isotope overlaps between the two
samples, which would affect the quantitation.

• Samples can be grown on isotope-labeled media so that the resulting heavy isotopes are
incorporated into every amino acid in a protein.
• Bacterial cultures can be grown on 14N- or 15N-containing media, whereas mammalian
cells can be labeled by stable isotope labeling by amino acids in cell culture (SILAC). In this
approach, one population of cells is grown in medium containing the normal form of an
essential amino acid; another population of cells is grown in medium supplemented with a
stable isotope-labelled analogue.
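The mass shift produced by a heavy label can be estimated from the mass difference between each heavy isotope and its light counterpart. The sketch below is illustrative only: the isotope mass differences are standard values, while the example labels (lysine carrying six 13C atoms, or six 13C plus two 15N) and the function names are assumptions that are not specified in the lecture.

```python
# Mass difference (Da) between each heavy isotope and its light counterpart.
ISOTOPE_DELTA = {
    "2H": 1.006277,
    "13C": 1.003355,
    "15N": 0.997035,
    "18O": 2.004245,
}


def label_shift(heavy_atoms):
    """Mass shift in Da for a label, e.g. {"13C": 6} for six 13C atoms."""
    return sum(ISOTOPE_DELTA[iso] * count for iso, count in heavy_atoms.items())


def peptide_pair_spacing(heavy_atoms, labeled_residues=1, charge=1):
    """Spacing on the m/z axis between the light and heavy peptide peaks."""
    return label_shift(heavy_atoms) * labeled_residues / charge


def is_resolvable(heavy_atoms, labeled_residues=1, min_shift_da=3.0):
    """Check the rule of thumb that the mass shift should be at least 3 Da."""
    return label_shift(heavy_atoms) * labeled_residues >= min_shift_da


if __name__ == "__main__":
    lys6 = {"13C": 6}  # assumed example label
    print(round(label_shift(lys6), 4))
    print(round(peptide_pair_spacing(lys6, charge=2), 4))
    print(is_resolvable({"15N": 1}))
```

Note that the spacing seen in the spectrum shrinks with charge state, so a doubly charged peptide pair is separated by only half the mass shift on the m/z axis.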
Chemical labeling

• When metabolic labelling of proteins is not possible or not desirable, chemical labelling
techniques can be used as an alternative quantitative tool.
• A widely used chemical labelling technique for quantitative proteomics is the isotope-
coded affinity tag (ICAT) method.

• The ICAT reagents include a cysteine-reactive group, an isotopically light or heavy linker
and a biotin affinity tag.
Relative quantification of cysteine-containing proteins by ICAT

• The light and the heavy reagents are used to label the cysteine residues from proteins of
two different sources. The two protein samples are then combined and enzymatically
cleaved into peptide fragments.
• Cysteine-containing peptides are isolated using avidin affinity chromatography and
subsequently identified and quantitated by LC-MS/MS.

• ICAT-labelled peptides elute as pairs. By comparing the peaks for identical peptides
labelled with the light and the heavy ICAT reagent (differing by 8 Da), the relative
abundance of that peptide in each sample can be determined, which is directly related to
the abundance of the corresponding protein.
• The complexity of the peptide mixture is greatly reduced because only the cysteine-
containing peptides are analyzed in the mass spectrometer.
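The pairing of light and heavy ICAT peaks can be sketched as a search for peaks separated by the expected label spacing. This is a simplified illustration, assuming the nominal 8 Da difference per labelled cysteine stated above, a known charge state, and strictly positive intensities; the function name, tolerance and peak list are hypothetical.

```python
def find_icat_pairs(peaks, charge=1, n_cys=1, delta_da=8.0, tol=0.1):
    """Find light/heavy peak pairs and their heavy:light intensity ratios.

    peaks: iterable of (m/z, intensity) tuples with positive intensities.
    Returns a list of (light m/z, heavy m/z, heavy/light ratio).
    """
    spacing = n_cys * delta_da / charge  # expected distance on the m/z axis
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        for mz_heavy, int_heavy in ordered[i + 1:]:
            if abs((mz_heavy - mz_light) - spacing) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


if __name__ == "__main__":
    # Hypothetical singly charged peaks: (m/z, intensity).
    spectrum = [(500.0, 1000.0), (508.0, 2000.0), (620.3, 500.0)]
    print(find_icat_pairs(spectrum))
```

A ratio of 2.0 for a pair would suggest that the peptide, and by extension its parent protein, is about twice as abundant in the heavy-labelled sample as in the light-labelled one.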
Absolute quantification of protein

• Many approaches for absolute quantitation of proteins and peptides involve the use of a
standard curve, which is developed with a stable isotope-incorporated peptide.
• This peptide is then used as an internal standard by spiking the analytical sample with a
known amount.
• The ratio between the synthetic and endogenous peptide is determined by MS, and the
absolute amount of this peptide can be calculated.
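The calculation behind the internal-standard approach is a simple proportion. The sketch below assumes that the labelled standard and the endogenous peptide give the same MS response per unit amount; the function name, units and numbers are hypothetical.

```python
def absolute_amount(endogenous_intensity, standard_intensity, spiked_amount):
    """Estimate the endogenous peptide amount from a spiked internal standard.

    Assumes equal MS response for the endogenous and isotope-labelled forms.
    The result has the same unit as `spiked_amount` (e.g. fmol).
    """
    if standard_intensity <= 0:
        raise ValueError("standard intensity must be positive")
    ratio = endogenous_intensity / standard_intensity
    return ratio * spiked_amount


if __name__ == "__main__":
    # Hypothetical peak intensities and a 50 fmol spike of the standard.
    print(absolute_amount(3.0e6, 1.5e6, 50.0))
```

In practice the ratio is usually checked against a standard curve to confirm that the response is linear over the relevant concentration range.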
Characterization of post-translational modifications

• Post-translational modifications (PTMs) of proteins are important to virtually all biological
processes.

• Mass spectrometry approaches have the capability to characterize most stable
modifications in proteins, including glycosylation, phosphorylation, disulfide bridges,
acetylation, ubiquitination, and methylation.
Identification of protein-protein interactions (PPI)

• Protein-protein interactions (PPIs) can be analyzed using affinity purification mass
spectrometry (AP-MS) methods.
Tissue Imaging

• Another development in the application of mass spectrometry in proteomics is the use of
MALDI-TOF MS in profiling and imaging proteins directly from thin tissue sections, a
technology known as protein profiling and imaging mass spectrometry (IMS).
• IMS provides specific information on the local molecular composition, relative abundance
and spatial distribution of peptides and proteins in the analyzed section.
Analysis of Protein-Protein Interaction and Protein-DNA Interaction by Yeast Two-Hybrid (Y2H) System
Yeast-Two-Hybrid System

• Two-hybrid screening (originally known as yeast two-hybrid system or Y2H) is a molecular
biology technique used to discover protein–protein interactions (PPIs) and protein–DNA
interactions by testing for physical interactions (such as binding) between two proteins
or a single protein and a DNA molecule, respectively.
Basic Principle

• The premise behind the test is the activation of downstream reporter gene(s) by the
binding of a transcription factor onto an upstream activating sequence (UAS).
• For two-hybrid screening, the transcription factor is split into two separate fragments,
called the DNA-binding domain (DBD or often also abbreviated as BD) and activating
domain (AD). The BD is the domain responsible for binding to the UAS and the AD is the
domain responsible for the activation of transcription.
• The activating and binding domains are modular and can function in proximity to each
other without direct binding. This means that even though the transcription factor is split
into two fragments, it can still activate transcription when the two fragments are indirectly
connected.
Basic Principle

• Plasmids are engineered to produce a protein product in which the DNA-binding domain
(BD) fragment is fused onto a protein while another plasmid is engineered to produce a
protein product in which the activation domain (AD) fragment is fused onto another
protein.

• The protein fused to the BD may be referred to as the bait protein, and is typically a
known protein the investigator is using to identify new binding partners.
• The protein fused to the AD may be referred to as the prey protein and can be either a
single known protein or a library of known or unknown proteins.
Basic Principle

• If the bait and prey proteins interact (i.e., bind), then the AD and BD of the transcription
factor are indirectly connected, bringing the AD in proximity to the transcription start site
and transcription of reporter gene(s) can occur.

• If the two proteins do not interact, there is no transcription of the reporter gene.

• In this way, a successful interaction between the fused proteins is linked to a change in the
cell phenotype.
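The logic of the assay can be summarized as a truth table: the reporter is transcribed only when the BD fusion is bound at the UAS and the bait and prey interact, so that the AD is brought to the promoter. The toy model below encodes that logic and applies it to a prey library; all protein names and the set of interacting pairs are invented for illustration, and no biological data are implied.

```python
def reporter_active(bd_binds_uas, bait_prey_interact):
    """Reporter is on only if the BD is at the UAS and bait binds prey."""
    return bd_binds_uas and bait_prey_interact


def screen_library(bait, prey_library, interacting_pairs, bd_binds_uas=True):
    """Return the prey proteins that would switch on the reporter.

    interacting_pairs: set of frozensets, each holding one interacting pair.
    """
    positives = []
    for prey in prey_library:
        interact = frozenset((bait, prey)) in interacting_pairs
        if reporter_active(bd_binds_uas, interact):
            positives.append(prey)
    return positives


if __name__ == "__main__":
    # Hypothetical bait, prey library and interaction set.
    known = {frozenset(("BaitX", "PreyB"))}
    print(screen_library("BaitX", ["PreyA", "PreyB", "PreyC"], known))
```

The model is deliberately idealized: it has no false positives or false negatives, whereas real screens produce both and positives need independent validation.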
Overview of Yeast-Two-Hybrid System

The yeast-2-hybrid system is a simple scientific technique used to screen a library of proteins
for potential interactions.

• Firstly, a transcription factor is broken into two parts – a DNA-binding domain (BD) and a
catalytic activation domain (AD).
• The DNA-binding domain is fused to a protein of interest called the bait (e.g. an enzyme).
• The activation domain is fused to a number of potential binding partners – called the prey
(e.g. different ligands).
• If the bait and prey interact, the two parts of the transcription factor are reconstituted and
activate transcription of a gene.
• If the bait and prey do not interact, the two parts of the transcription factor remain
separate and transcription doesn’t occur.
Overview of Yeast-Two-Hybrid System

The yeast-2-hybrid system detects protein-protein interactions according to the activation of
a reporter gene.
• The reporter gene may encode a protein that causes a visible colour change
(e.g. β-galactosidase).
• Alternatively, the reporter gene may encode an enzyme needed to synthesize an essential
amino acid, allowing the yeast to grow on a medium lacking that amino acid (hence yeast
growth would indicate successful interaction between bait and prey).

Yeast-2-hybrid screens are a simple technique and hence have a relatively high rate of false
positives (partial interactions). Consequently, the yeast-2-hybrid system is typically only used
as an initial test to identify possible protein interactions.
Yeast-One-Hybrid System

• The Yeast 1-hybrid assay examines protein-DNA interactions.
• A query protein is directly fused with the AD and expressed in yeast strains
harboring various target DNA sequences upstream of the reporter gene.
• Thus, if the query binds to a particular target sequence, the associated AD will
activate reporter gene expression.
Yeast-Three-Hybrid System

• Yeast 3-hybrid assays study protein interactions that are mediated by a third component,
such as an RNA molecule or another protein.
• In this instance, Bait and Prey do not directly interact with each other. Instead, they bind
to, for example, an RNA molecule, albeit with different sequence specificity.
• Therefore, only in the presence of the particular RNA molecule would Bait and Prey be
brought together and drive reporter gene expression.
Advantages

• Yeast 2-hybrid is a powerful technique to identify protein interactions because of its
straightforward methodology and fast turnaround time.
• Therefore, the throughput of the technique can be scaled up significantly to screen the
entire proteome.
• In addition, the technique has been adapted in other model organisms to study
organism-specific interactions.
Limitations

• The assay can produce a high level of false positive and negative interactions. This is an
important reason to validate any interactions using other techniques such as co-
immunoprecipitation.
• Interaction must occur in the nucleus of the cell in order for the reporter gene to be
activated. Proteins that are localized to other cellular compartments may not produce a
positive interaction, even if they interact directly.
• Overexpression of recombinant fusion proteins, which happens in most yeast 2-hybrid
experiments, could produce spurious interaction data. In addition, the fusion of AD/BD
domains to query proteins may affect query protein function in vivo.
• Query proteins may not be correctly expressed, folded, or modified when expressed in
yeast. Therefore, it is important to confirm that query proteins are functional before
deriving interaction data from the assay.
Thank You
