EXPERIMENT – 07

PROTEIN SEQUENCE ANALYSIS

EB3233: BIOINFORMATICS LABORATORY

ABSTRACT

From a chemical viewpoint, proteins are among the most structurally complex and functionally
sophisticated molecules known. Proteins are built from 20 kinds of amino acids, each with
unique chemical properties. A protein molecule is formed from a long chain of these amino
acids, each linked to its neighbor by a covalent peptide bond. Hence, proteins are also known
as polypeptides. Each type of protein has its own amino acid sequence, which is identical
from one molecule to the next. A large number of different proteins, each with its own unique
amino acid sequence, have been identified. Once a protein that folded into a stable
conformation with useful properties had evolved, its structure could be modified during
evolution to allow it to perform new functions. Genetic processes that occasionally create
duplicate copies of genes have greatly accelerated this process, allowing one gene copy to
evolve independently to perform a new function. As a result, certain present-day proteins can
be classified into protein families, in which each member has an amino acid sequence and a
three-dimensional (3D) conformation that resemble those of the other members. This kind of
event has occurred very frequently in the past. In this practical, we used three software
tools, SMART, TMHMM, and ProtParam, to determine the protein family and physical properties
of the provided amino acid sequence.

INTRODUCTION
Three software tools were used in this practical to determine the protein family and physical
properties of the provided amino acid sequence: SMART, TMHMM, and ProtParam.

Simple Modular Architecture Research Tool (SMART) allows genetically mobile domains to be
identified and annotated, and domain architectures to be analyzed. More than 500 domain
families found in signaling, extracellular, and chromatin-associated proteins are detectable
(Bioinformatic Tools for Gene and Protein Sequence Analysis, 2020). These domains are
extensively annotated with respect to phyletic distributions, functional class, tertiary
structures, and functionally important residues. Each domain found in a non-redundant protein
database, together with search parameters and taxonomic information, is stored in a
relational database system (Simple Modular Architecture Research Tool, 2020). The user
interfaces to this database make it possible to search for proteins that contain specific
combinations of domains in defined taxa. SMART is available at:
http://smart.embl-heidelberg.de/smart/set_mode.cgi?NORMAL=1

TMHMM is a method for predicting membrane protein topology. The program correctly predicts
97-98 percent of transmembrane helices. In addition, TMHMM can discriminate between soluble
and membrane proteins with both specificity and sensitivity better than 99 percent, although
the accuracy drops when signal peptides are present (TMHMM Server v. 2.0 -- Prediction of
transmembrane helices in proteins | HSLS, 2020). This high accuracy allows integral membrane
proteins to be predicted reliably. TMHMM version 2.0 was used here to detect transmembrane
helices. It is available at:
http://www.cbs.dtu.dk/services/TMHMM/

ProtParam is a program that computes various physical and chemical parameters for a given
protein stored in Swiss-Prot or TrEMBL, or for a sequence entered by the user. No additional
information about the protein under consideration is needed. The protein can be specified
either by its Swiss-Prot/TrEMBL accession number or ID, or as a raw sequence (Primary
Structure Analysis of a Protein Using ProtParam ..., n.d.). The computed parameters are the
molecular weight, theoretical pI, amino acid composition, atomic composition, extinction
coefficient, estimated half-life, instability index, aliphatic index, and grand average of
hydropathicity (GRAVY) (ProtParam References - ExPASy, n.d.).
ProtParam is available at: https://web.expasy.org/protparam/
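
As a rough illustration of how some of these parameters follow directly from the sequence, the sketch below computes the grand average of hydropathicity, the aliphatic index, and the charged-residue counts in plain Python. It is not ProtParam's own code. It assumes the Kyte-Doolittle hydropathy scale and the usual aliphatic index formula (mole percent Ala + 2.9 × Val + 3.9 × (Ile + Leu)). The ten-residue sequence `toy` and all function names are hypothetical and exist only for this example.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale for the GRAVY score).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    counts = Counter(seq)

    def mole_percent(aa):
        return 100.0 * counts[aa] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residues(seq):
    """Return (negatively charged, positively charged) residue counts."""
    counts = Counter(seq.upper())
    negative = counts["D"] + counts["E"]   # Asp + Glu
    positive = counts["R"] + counts["K"]   # Arg + Lys
    return negative, positive


# Hypothetical toy sequence, NOT the protein analyzed in this report.
toy = "MKVLAAGIDE"

print("GRAVY:", round(gravy(toy), 3))
print("Aliphatic index:", round(aliphatic_index(toy), 2))
print("Charged residues (neg, pos):", charged_residues(toy))
```

The same functions could be applied to the raw sequence pasted into ProtParam, and the values compared with those on the results page.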
OBJECTIVES

• To identify the protein family and physical properties of the provided amino acid sequence.

MATERIALS
• Computer
• Internet connection
• SMART database server
• TMHMM Server v. 2.0
• ProtParam program

METHODS AND RESULTS

A. Domain analysis using SMART database

1. The SMART database homepage was accessed by visiting:
http://smart.embl-heidelberg.de/smart/set_mode.cgi?NORMAL=1

Figure 1 - SMART database homepage

2. The FASTA sequence for Q9ERI2.1 was obtained from NCBI.

Figure 2 - FASTA sequence for Q9ERI2.1

3. The protein sequence was copied and pasted into the sequence box, and the "Sequence
SMART" button was clicked.
4. A typical output is shown in the figure below.

Figure 3 - The result page

5. The RAB domain was clicked to view its description.

Figure 4 - Feature segments

B. Looking for trans-membrane details of the RAB domain

1. The TMHMM database homepage was accessed by visiting:
http://www.cbs.dtu.dk/services/TMHMM/

Figure 5 - The TMHMM database homepage

2. The FASTA sequence for Q9ERI2.1 was obtained from NCBI. The protein sequence
was copied and pasted into the sequence box, and the Submit button was clicked.
3. A typical output is shown in the figure below.

Figure 6 - The results page


C. Predicting the main physico-chemical properties of a protein

1. The ProtParam database homepage was accessed by visiting:
https://web.expasy.org/protparam/

Figure 7 - The ProtParam database homepage

2. The FASTA sequence for Q9ERI2.1 was obtained from NCBI.

3. The protein sequence was copied and pasted into the sequence box, and the
"Compute parameters" button was clicked.

4. A typical output is shown in the figure below.

Figure 8 - The results page
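
Each of the three procedures above starts from the same FASTA record, and the servers need only the residues that follow the header line. The following is a minimal sketch of how the header and raw sequence could be separated before pasting, assuming the usual FASTA layout (one `>` header line followed by wrapped sequence lines). The function name `parse_fasta` and the record in `example` are hypothetical. The example is not the Q9ERI2.1 entry.

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break                     # a second record starts here; stop
            header = line[1:]             # drop the leading '>'
        elif header is not None:
            chunks.append(line)           # sequence lines are simply joined
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks)


# Hypothetical record used only to exercise the function.
example = ">toy_record hypothetical protein\nMKVLAAGIDE\nLLKRAG\n"

header, sequence = parse_fasta(example)
print(header)
print(sequence, len(sequence))
```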

DISCUSSION

In this experiment, we identified the protein family and physical properties of the provided
amino acid sequence. Protein sequences can be analyzed from various perspectives using a wide
variety of bioinformatics methods. The analyses performed here require only a computer with
internet access, because all the required services are available through publicly accessible
web servers and databases. These programs enable biologists to predict a protein's structure,
the presence of functional motifs or domains, cellular localization, and post-translational
modifications. Integrating these data helps the biologist predict the possible molecular
roles of proteins of interest in a more informed way (Bioinformatic Tools for Gene and
Protein Sequence Analysis, 2020). The tools used in this experiment were the SMART database
server, TMHMM Server v. 2.0, and ProtParam.

Simple Modular Architecture Research Tool (SMART) is a biological database used to identify
and analyze protein domains within protein sequences. It is one of the web servers we used to
analyze the protein sequence (Simple Modular Architecture Research Tool, 2020). To detect
protein domains in protein sequences, SMART uses profile hidden Markov models built from
multiple sequence alignments. The latest release of SMART includes 1,009 domain models. The
database is hosted by the European Molecular Biology Laboratory (EMBL) in Heidelberg. In this
experiment, domain analysis was performed first, using SMART. Domain analysis of the Q9ERI2.1
protein sequence showed that one domain could be identified, which is RAB (Figure 3).

TMHMM is a membrane protein topology prediction method based on a hidden Markov model.
TMHMM can discriminate between soluble and membrane proteins with both specificity and
sensitivity better than 99 percent, although the accuracy drops when signal peptides are
present (TMHMM Server v. 2.0 -- Prediction of transmembrane helices in proteins | HSLS,
2020). In this experiment, the results showed that the protein has no transmembrane segment.
When the entire sequence is labeled as inside or outside, the prediction is that it contains
no membrane helices. Therefore, the protein is not embedded in the membrane (Figure 6).
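
The rule used in this interpretation (a sequence labeled entirely inside or outside has no membrane helices) can be sketched in a few lines of Python. The per-residue label string used here (`i` for inside, `o` for outside, `M` for membrane) is a hypothetical encoding chosen for the example, not the output format of the TMHMM server. The function names and both test strings are likewise made up.

```python
from itertools import groupby


def segments(topology):
    """Collapse per-residue labels into (label, start, end) runs, 1-based."""
    runs = []
    position = 1
    for label, group in groupby(topology):
        length = len(list(group))
        runs.append((label, position, position + length - 1))
        position += length
    return runs


def count_tm_helices(topology):
    """Count contiguous runs of 'M' (membrane) labels."""
    return sum(1 for label, _, _ in segments(topology) if label == "M")


# Hypothetical label strings, not real TMHMM predictions.
soluble = "o" * 30                      # whole sequence on one side
membrane = "iiiiMMMMMMooooMMMMMMiiii"   # two membrane-spanning runs

for name, topology in [("soluble", soluble), ("membrane", membrane)]:
    n = count_tm_helices(topology)
    verdict = "no membrane helices" if n == 0 else f"{n} predicted helices"
    print(name, "->", verdict)
```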

The figure below is an example of a TMHMM result for a protein that has transmembrane
regions. The x-axis represents the amino acid number, and the y-axis represents the
probability that a residue lies in the membrane, outside the cell, or in the cytoplasm. Red
in the plot represents transmembrane regions, blue represents the inside (cytoplasm), and
pink represents the outside (extracellular). By examining the probabilities shown on the
graph, we can decide where each segment of the protein is positioned. The TMHMM result shown
in the figure below makes it easy to see that this protein has five transmembrane regions.

The five predicted TMHs

Figure 9 - The TMHMM results for sp_P78588_FREL_CANAL

ProtParam, a program available online on the ExPASy server, is a convenient way to compute
the basic physico-chemical properties that can be deduced from a protein sequence. It makes
no complex or speculative predictions about the protein. No additional information about the
protein under consideration is needed (Primary Structure Analysis of a Protein Using
ProtParam ..., n.d.). The protein can be specified either by its Swiss-Prot/TrEMBL accession
number or ID, or as a raw sequence. The computed parameters are the molecular weight,
theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated
half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY)
(ProtParam - SIB Swiss Institute of Bioinformatics | ExPASy, n.d.). By analyzing the results,
we obtained the following information for this protein (Figure 8).

• Molecular weight – 25017.23 Da
• pI – 5.21
• Instability index – 29.89. A value below 40 predicts that the protein is stable in the
test tube, so this classifies the protein as stable.
• Total number of negatively charged residues – 33
• Total number of positively charged residues – 26
• Atomic composition
- Carbon C 1101
- Hydrogen H 1715
- Nitrogen N 305
- Oxygen O 340
- Sulfur S 11

Molecular weight and isoelectric point (pI) are two values needed for protein
electrophoresis, which is a key step in the isolation and purification of proteins.
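
As a consistency check on the values reported above, the molecular weight can be recomputed from the atomic composition (C 1101, H 1715, N 305, O 340, S 11). The sketch below assumes rounded standard atomic weights, which are not taken from ProtParam. ProtParam works from residue masses, so a small difference from the reported 25017.23 is expected. The names `ATOMIC_WEIGHT` and `molecular_weight` are hypothetical.

```python
# Rounded standard atomic weights in g/mol (assumed constants for this check).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Atomic composition reported by ProtParam for Q9ERI2.1 (Figure 8).
composition = {"C": 1101, "H": 1715, "N": 305, "O": 340, "S": 11}

REPORTED_MW = 25017.23  # molecular weight reported by ProtParam


def molecular_weight(atom_counts):
    """Sum atomic weight times atom count over all elements."""
    return sum(ATOMIC_WEIGHT[element] * count
               for element, count in atom_counts.items())


mw = molecular_weight(composition)
print(f"Recomputed molecular weight: {mw:.2f}")
print(f"Difference from reported value: {mw - REPORTED_MW:+.2f}")
```

With these constants the recomputed value agrees with the reported molecular weight to within a fraction of a dalton, which suggests the listed atomic composition and molecular weight were transcribed consistently.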

REFERENCES

• En.wikipedia.org. 2020. Simple Modular Architecture Research Tool. [online] Available at:
<https://en.wikipedia.org/wiki/Simple_Modular_Architecture_Research_Tool>
[Accessed 3 December 2020].

• Hsls.pitt.edu. 2020. TMHMM Server V. 2.0 -- Prediction Of Transmembrane Helices In
Proteins | HSLS. [online] Available at:
<https://www.hsls.pitt.edu/obrc/index.php?page=URL1164644151>
[Accessed 3 December 2020].

• n.d. Primary Structure Analysis Of A Protein Using Protparam .... [online] Available
at: <https://vlab.amrita.edu/?sub=3&brch=275&sim=1455&cnt=1> [Accessed 2020].

• n.d. Protparam - SIB Swiss Institute Of Bioinformatics | Expasy. [online] Available at:
<https://www.expasy.org/resources/protparam> [Accessed 2020].

• n.d. Protparam References - Expasy. [online] Available at:
<https://web.expasy.org/protparam/protpar-ref.html> [Accessed 2020].

• ResearchGate. 2020. Bioinformatic Tools For Gene And Protein Sequence Analysis. [online]
Available at:
<https://www.researchgate.net/publication/226687822_Bioinformatic_Tools_for_Gene_and_Protein_Sequence_Analysis>
[Accessed 3 December 2020].

• Youtube.com. 2020. [online] Available at: <https://www.youtube.com/watch?v=8BaMB1E53hE>
[Accessed 3 December 2020].

POST-LAB QUESTIONS


1. Get the FASTA sequence for CAH39249. Do domain analysis using the SMART database.
How many domains can be identified and what is/are the domain(s)?
- Only one domain
- Tryp_SPc

2. Using TMHMM, is there any trans-membrane segment in the protein sequence? If
not, where is the location? Inside or outside of the cell?
- No, there is no trans-membrane segment.
- It is inside the cell.

3. Using ProtParam, identify the following:

a. Molecular weight – 43030.50
b. pI – 8.06
c. Instability index. Based on the instability index, determine whether the protein
is stable in the test tube. – 34.29, which classifies the protein as stable.
d. Total number of negatively charged residues – 20
e. Total number of positively charged residues – 22

4. Repeat and answer all the above questions (1-3) using a different sequence
(Q10656.1).

1) Get the FASTA sequence for Q10656.1. Do domain analysis using the SMART
database. How many domains can be identified and what is/are the domain(s)?

Four domains

- IG
- Two IGc2
- TyrKc

2) Using TMHMM, is there any trans-membrane segment in the protein
sequence? If not, where is the location? Inside or outside of the cell?

- Yes, there are transmembrane segments

The two predicted TMHs

3) Using ProtParam, identify the following:

a) Molecular weight – 118956.30
b) pI – 5.75
c) Instability index. Based on the instability index, determine whether the
protein is stable in the test tube. – 44.10, which classifies the protein as unstable.
d) Total number of negatively charged residues – 149
e) Total number of positively charged residues – 123
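
The ProtParam answers for the three sequences can be compared side by side with a short script. The sketch below applies the usual instability-index cutoff of 40 (below 40 is predicted stable). It also takes the difference between positively and negatively charged residue counts as a rough indication of net charge, which ignores histidine and the chain termini. The class and field names are hypothetical, and the numbers are the ones reported in this document.

```python
from dataclasses import dataclass

STABILITY_THRESHOLD = 40.0  # instability index below this is predicted stable


@dataclass
class ProtParamResult:
    accession: str
    pi: float
    instability_index: float
    negative: int   # Asp + Glu
    positive: int   # Arg + Lys

    @property
    def stable(self):
        return self.instability_index < STABILITY_THRESHOLD

    @property
    def net_charge(self):
        # Rough estimate only: ignores His and the N- and C-termini.
        return self.positive - self.negative


results = [
    ProtParamResult("Q9ERI2.1", pi=5.21, instability_index=29.89,
                    negative=33, positive=26),
    ProtParamResult("CAH39249", pi=8.06, instability_index=34.29,
                    negative=20, positive=22),
    ProtParamResult("Q10656.1", pi=5.75, instability_index=44.10,
                    negative=149, positive=123),
]

for r in results:
    label = "stable" if r.stable else "unstable"
    print(f"{r.accession}: pI {r.pi}, {label}, net charge {r.net_charge:+d}")
```

The sign of the rough net charge matches the reported pI values: the two proteins with more acidic than basic residues have a pI below 7, and the one with more basic residues has a pI above 7.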
