Lab Manual: Birla Institute of Technology and Science, Pilani Pilani Campus

LAB MANUAL
INTRODUCTION TO BIOINFORMATICS
Birla Institute of Technology and Science, Pilani

Pilani Campus
• AKRITI SRIVASTAVA 2017B1A10482P

Index
0. Introduction about gene HLA-B
1. EXPERIMENT 1
• To find GC content of the given nucleotide sequence
2. EXPERIMENT 2
• To find the mRNA sequence, the position of the start codon and the reverse complement of the given
nucleotide sequence
3. EXPERIMENT 3
• To find exon, introns and promoter sequence using GENSCAN
4. EXPERIMENT 4
• Number of ORF that are present in given gene sequence
• Find out maximum length of ORF in all possible frame
5. EXPERIMENT 5
• To draw dot plot charts of homologous genes
• To find score of nucleotide sequence using pairwise alignment tool (EMBOSS
NEEDLE)
6. EXPERIMENT 6
• To find score using pairwise alignment tool (EMBOSS WATER)
• To find score of protein using pairwise alignment tool (EMBOSS WATER)
• To find score using pairwise alignment tool (EMBOSS NEEDLE)
• BLASTN
7. EXPERIMENT 7
• To get 5 different homologous protein sequences using BLASTP.
• To perform multiple sequence alignment of homologous protein sequences and
analyse the result by guide tree, phylogenetic analysis and m view.
• To find structure of protein sequence using BLASTP and PDB.
8. EXPERIMENT 8
• NUCLEOTIDE OPERATION
➢ Multiple sequence alignment by MUSCLE ALGORITHM
➢ To find out distance matrix.
➢ To take out phylogeny tree and also different forms of it.
• PROTEIN OPERATION
➢ Multiple sequence alignment by MUSCLE ALGORITHM
➢ To find out distance matrix.
➢ To take out phylogeny tree and also different forms of it.
• Comparison of protein and nucleotide phylogeny tree.
9. EXPERIMENT 9
• To predict protein structure by various tools.
• To visualise protein structure by Icn3D and RasMol.
10. CONCLUSION
INTRODUCTION
HLA-B major histocompatibility complex, class I, B [ Homo sapiens (human) ]
HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer
consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is
anchored in the membrane. Class I molecules play a central role in the immune system by
presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in
nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon
1 encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both
bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane
region and exons 6 and 7 encode the cytoplasmic tail.
There are 8 exons, of which 7 are coding.
EXPERIMENT 1
AIM: TO FIND GC CONTENT OF THE GIVEN NUCLEOTIDE SEQUENCE
SOFTWARE USED: PYTHON
STEP 1:
Download your DNA sequence from NCBI.
STEP 2:
Write the python code and run.
RESULT:
Interpretation
1. The nucleotide sequence has 58.127% GC content. A higher GC content implies higher
thermal stability, so this DNA is moderately stable.
2. A = 634, T = 722, G = 1025, C = 924
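The script used in Step 2 is not reproduced in this manual. The following is a minimal Python sketch of a GC-content calculation; the function name gc_content and the short test sequence are illustrative assumptions, not the original code.

```python
from collections import Counter

def gc_content(seq):
    """Return (base counts, GC percentage) for a DNA sequence string."""
    seq = "".join(seq.split()).upper()      # drop whitespace and newlines
    counts = Counter(seq)
    acgt = sum(counts[b] for b in "ACGT")   # ignore N or other symbols
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = 100.0 * (counts["G"] + counts["C"]) / acgt
    return counts, gc

# Toy example (hypothetical sequence, not the HLA-B gene)
counts, gc = gc_content("ATGCGCGTTA")
print(counts["A"], counts["T"], counts["G"], counts["C"], round(gc, 2))
```

Applied to the base counts listed above, the same formula gives (1025 + 924) / 3305, or about 58.97%, so the reported 58.127% may have been computed on a slightly different sequence or with a different formula.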
EXPERIMENT 2
AIM: To find the mRNA sequence, the position of the start codon and the reverse complement of the given nucleotide
sequence
Tool used: Python
STEP 1:
Download DNA sequence from NCBI and save it as dna.txt.
Step 2:
Write the python code for reverse complement.
RESULT:
1. The reverse complement and the mRNA sequence were obtained.
2. The position of the start codon is 21.

3. The actual transcript can be looked up in databases such as GenBank or UniProt.
The GenBank protein is highly similar to the sequence obtained from UniProt, but the two
are not totally identical.
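The code written in Step 2 is not shown in the manual. A minimal sketch of the three operations could look like the following; the function names and the toy sequence are assumptions, and the start codon is reported as a 1-based position (the manual does not say whether its value of 21 is 0-based or 1-based).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def transcribe(dna):
    """mRNA with the same sequence as the coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")

def start_codon_position(dna):
    """1-based position of the first ATG, or None if there is none."""
    idx = dna.upper().find("ATG")
    return idx + 1 if idx != -1 else None

# Toy example (hypothetical sequence); in the lab the input would be
# read from dna.txt instead.
dna = "CCGATGAAATAG"
print(reverse_complement(dna))    # CTATTTCATCGG
print(transcribe(dna))            # CCGAUGAAAUAG
print(start_codon_position(dna))  # 4
```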
EXPERIMENT 3
To find exon, introns and promoter sequence using GENSCAN
Step 1:
Open GENSCAN→enter the DNA sequence and set the parameter.
RESULT
1. Screenshot of GENSCAN output
2. There are 8 exons, of which 7 are coding.
Gene location: 31357179 to 31353875
INTERPRETATION
• The initial exon spans positions 31-94 with respect to the transcription start site.
• The terminal exon spans positions 2654-2697 with respect to the transcription start
site.
• The poly-A signal spans positions 3283-3288 with respect to the transcription start site.
EXPERIMENT 4
• Number of ORF that are present in given gene sequence

• Find out maximum length of ORF in all possible frame
Step 1:
Open ORF finder and enter the query sequence.
Result
INTERPRETATION
• Number of ORFs found = 30
• The longest ORF is ORF 18, with length 522 nt / 173 aa, found in frame 1
on the reverse strand.
• The longest ORF runs from position 2591 to position 2070.
• Longest ORF in each frame:
FRAME   LABEL    STRAND   LENGTH (nt)
1       ORF 18   -        522
2       ORF 26   +        348
3       ORF 9    -        474
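To illustrate what an ORF search does, here is a minimal Python sketch that scans all six reading frames for stretches running from ATG to the first in-frame stop codon. It is not the ORF finder algorithm itself: the function name, the min_len parameter and the frame numbering are assumptions, and coordinates on the minus strand are given on the reverse-complemented sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMP = str.maketrans("ACGT", "TGCA")

def find_orfs(dna, min_len=75):
    """Return ORFs (ATG ... stop, stop codon included) in all six frames."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", dna.translate(COMP)[::-1])):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a new ORF
                elif start is not None and codon in STOPS:
                    length = i + 3 - start         # length in nucleotides
                    if length >= min_len:
                        orfs.append({"strand": strand, "frame": frame + 1,
                                     "start": start + 1, "end": i + 3,
                                     "length": length})
                    start = None                   # close the ORF
    return orfs

# Toy example (hypothetical sequence, tiny min_len so the ORF is reported)
orfs = find_orfs("CCATGAAATTTGGGTAGCC", min_len=6)
longest = max(orfs, key=lambda o: o["length"])
print(len(orfs), longest)
```

The length counts the stop codon, which matches the 522 nt / 173 aa relationship above (174 codons, one of them the stop).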
EXPERIMENT 5
• To draw dot plot charts of homologous genes

• To find score of nucleotide sequence using pairwise alignment tool (EMBOSS
NEEDLE)
Objective 1
Step 1.
Go to NCBI and find the nucleotide sequence of your gene.
Step 2:
Go to the Orthologs option, download the orthologous sequences of other species and open the
file in WordPad.
Step 3:
Open EMBOSS DOTMATCHER and compare the Homo sapiens gene with the gene of each other
species, one at a time.
Add titles to the X and Y axes:
X = Homo sapiens, Y = other species' gene
Threshold value = 40 (default = 23)
RESULT:
The results are as follows:
1. Mustela putorius furo
2. Galeopterus variegatus patr class I
3. Mandrillus leucophaeus
4. Aotus nancymaae
INTERPRETATION
Aotus nancymaae shows a clear diagonal with little background noise, indicating high similarity to the human sequence.
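The idea behind a thresholded dot plot can be sketched in a few lines of Python. This is a simplification under stated assumptions: DOTMATCHER scores each window with a substitution matrix, whereas this sketch only counts identical bases per window, and the function names, window size and threshold values are illustrative.

```python
def dot_matrix(seq_x, seq_y, window=10, threshold=8):
    """Return (i, j) window starts where at least `threshold` of the
    `window` compared bases are identical."""
    dots = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(a == b for a, b in
                          zip(seq_x[i:i + window], seq_y[j:j + window]))
            if matches >= threshold:
                dots.append((i, j))
    return dots

def render(dots, nx, ny):
    """Draw the dots as a text grid (x across, y down)."""
    grid = [["." for _ in range(nx)] for _ in range(ny)]
    for i, j in dots:
        grid[j][i] = "*"
    return "\n".join("".join(row) for row in grid)

# Toy example: a sequence against itself gives a full diagonal, and the
# repeated ACGT motif adds two off-diagonal dots.
seq = "ACGTACGTTG"
dots = dot_matrix(seq, seq, window=4, threshold=4)
print(render(dots, nx=7, ny=7))
```

Raising the threshold removes weak, scattered matches, which is why a higher value gives a cleaner diagonal with less background noise.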
Objective 2
To find score using pairwise alignment tool
Step 1:
Go to EMBOSS NEEDLE and align the Homo sapiens nucleotide sequence with each ortholog, one at a
time, to get the result.
Results
1. Mustela putorius furo
Score: 5087.0
2. Galeopterus variegatus patr
Score: 4844.5
3. Mandrillus leucophaeus
Score: 5325.5
4. Aotus nancymaae
Score: 6577.5
INTERPRETATION
The highest score, 6577.5, is for Aotus nancymaae.
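EMBOSS NEEDLE performs a global (Needleman-Wunsch) alignment. The following is a minimal sketch of how such a score is built up by dynamic programming. The match, mismatch and gap values are assumptions chosen for illustration, and a single linear gap penalty is used, whereas EMBOSS NEEDLE uses a substitution matrix with separate gap-open and gap-extend penalties, so the numbers will not match the scores above.

```python
def needleman_wunsch_score(a, b, match=5, mismatch=-4, gap=-10):
    """Global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                      # a[:i] against an empty prefix
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag,             # align a[i-1] with b[j-1]
                            prev[j] + gap,    # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]

# Toy examples (hypothetical sequences)
print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 35
print(needleman_wunsch_score("GATTACA", "GATCACA"))  # 26
print(needleman_wunsch_score("ACGT", "ACG"))         # 5
```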
EXPERIMENT 6
Aim:
1. To find score of nucleotide using pairwise alignment tool (EMBOSS WATER)
2. To find score of protein using pairwise alignment tool (EMBOSS WATER)
3. To find score of protein using pairwise alignment tool (EMBOSS NEEDLE)
4. BLASTN
Objective 1
To find score using pairwise alignment tool of NUCLEOTIDE (EMBOSS WATER)
Step 1.
Go to NCBI and find the nucleotide sequence of your gene.
Step 2
Go to the Orthologs option, download the orthologous sequences of other species and open the file in
WordPad.
Step 3:
Go to EMBOSS WATER and align the Homo sapiens nucleotide sequence with each ortholog, one at a time, to get the result.
Results
1. Mustela putorius furo
Score: 2780.0
2. Galeopterus variegatus patr
Score: 3396.0
3. Mandrillus leucophaeus
Score: 4048.0
4. Aotus nancymaae
Score: 4807.0
INTERPRETATION:
1. The highest score is for Aotus nancymaae (4807.0), so it is the most closely related sequence.
2. The lowest score is for Mustela putorius furo (2870), so it is the least closely related sequence.
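EMBOSS WATER performs a local (Smith-Waterman) alignment, which scores only the best-matching region of the two sequences. A minimal sketch follows, again with assumed match, mismatch and linear gap values rather than the substitution matrix and gap-open/gap-extend penalties of the real tool.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4, gap=-10):
    """Best local alignment score with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: the alignment may restart anywhere
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# Toy example: the shared region ACGTAC (6 matches) sets the score,
# while the unrelated flanks are ignored.
print(smith_waterman_score("TTTTACGTACGGGG", "CCCACGTACAAA"))  # 30
```

The only differences from the global version are the floor at 0 and taking the maximum over the whole table instead of the last cell.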
AIM 2
To find score of protein using pairwise alignment tool (EMBOSS WATER)
Step 1
Download the protein sequences from UniProt.
Step 2
Use EMBOSS WATER to find the score.
Step 3
Compare the human sequence with each of the other species, one at a time, to find the score.
Results
1. Human vs Pongo abelii
Score: 48.5
2. Human vs Sus scrofa
Score: 60.0
3. Human vs Rattus norvegicus
Score: 51.5
4. Human vs Bos taurus
Score: 42.5
5. Human vs Mus musculus
Score: 56.5
INTERPRETATION
1. The highest score is with Sus scrofa (60.0), so it is the most related.
2. The lowest score is with Bos taurus (42.5), so it is the least related.

Aim 3
To find score of protein using pairwise alignment tool (EMBOSS NEEDLE)

1. Human vs Pongo abelii
Score: 34.5
2. Human vs Sus scrofa
Score: 44.5
3. Human vs Rattus norvegicus
Score: 36.5
4. Human vs Bos taurus
Score: 31.5
5. Human vs Mus musculus
Score: 40.5
INTERPRETATION
1. The highest score is with Sus scrofa (44.5), so it is the most related.
2. The lowest score is with Bos taurus (31.5), so it is the least related.
Aim 4: BLASTN
Top 5 eukaryotes
Organism           Accession number   Score   Percentage identity
Homo sapiens       NM_005514.8        2837    100.00
Pan troglodytes    NM_001009081.1     2536    98.898
Chimpanzee mRNA    X13115.1           2401    95.491
Cercocebus atys    KP176507.1         2183    92.636
Macaca mulatta     NM_001048245.1     2156    92.212
G. gorilla         X60693.1           1997    95.04
SCREENSHOTS OF RESULTS
Prokaryotic hits (eukaryotes excluded): only one was found
Max score: 784
Organism                     Accession number   Score   % identity
Bacillus megaterium strain   CP031776.1         784     94.16
SCREENSHOTS OF RESULT
EXPERIMENT 7
AIM:
1. To get 5 different homologous protein sequences using BLASTP.

2. To perform multiple sequence alignment of homologous protein sequences and analyze
the result by GUIDE TREE, PHYLOGENETIC ANALYSIS AND MVIEW.
3. To find structure of protein sequence using BLASTP and PDB.
AIM 1
To get 5 different homologous protein sequences using BLASTP.
STEP 1
Go to NCBI and then select BLAST.
On the BLAST page, select Protein BLAST.
STEP 2
Enter your protein sequence and set the parameters.
STEP 3
Go to TAXONOMY and then ORGANISM.
Select 5 organisms other than Homo sapiens (the source of the experiment gene).
RESULT
AIM 2
To perform multiple sequence alignment of homologous protein sequences and analyze the
result by GUIDE TREE, PHYLOGENETIC ANALYSIS AND MVIEW
STEP 1
Open CLUSTAL OMEGA and add the homologous protein sequences found in AIM 1.
RESULT
1. MULTIPLE SEQUENCE ALIGNMENT
2. GUIDE TREE
3. PHYLOGENETIC TREE
4. M VIEW
Identity matrix (Clustal Omega), percent identity
             Pongo    Homo     Pan      Hylobates  Mandrillus  Macaca
Pongo        100.00
Homo          95.35   100.00
Pan           94.96    96.90   100.00
Hylobates     92.64    93.02    92.25   100.00
Mandrillus    93.02    93.41    93.41    94.96     100.00
Macaca        91.86    93.02    92.25    93.02      97.29      100.00
INTERPRETATION
• The least distance (highest identity) is between Mandrillus leucophaeus and Macaca fascicularis (97.29),
followed by Pan troglodytes and Homo sapiens (96.90).
• The maximum distance (lowest identity, 91.86) is between Pongo pygmaeus and Macaca fascicularis.
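The closest and most distant pairs can be read off the identity matrix programmatically. The sketch below hard-codes the values from the Clustal Omega matrix above; the variable names are illustrative.

```python
names = ["Pongo", "Homo", "Pan", "Hylobates", "Mandrillus", "Macaca"]

# Lower triangle of the percent identity matrix, row by row
lower = [
    [100.00],
    [95.35, 100.00],
    [94.96, 96.90, 100.00],
    [92.64, 93.02, 92.25, 100.00],
    [93.02, 93.41, 93.41, 94.96, 100.00],
    [91.86, 93.02, 92.25, 93.02, 97.29, 100.00],
]

# One entry per unordered pair, skipping the diagonal
pairs = {(names[i], names[j]): lower[i][j]
         for i in range(len(names)) for j in range(i)}

closest = max(pairs, key=pairs.get)    # highest identity
farthest = min(pairs, key=pairs.get)   # lowest identity
print(closest, pairs[closest])         # ('Macaca', 'Mandrillus') 97.29
print(farthest, pairs[farthest])       # ('Macaca', 'Pongo') 91.86
```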
AIM 3
To find structure of protein sequence using blastp and PDB.
Step 1
Go to BLASTP, change the database parameter to PDB (Protein Data Bank) and observe the following
results.
Results
PDB SIMILAR STRUCTURES
4P57 (Homo sapiens)
1KTD (Columba livia)
2FSE (Mus musculus)
6KVM (Gallus gallus)
6A2B (Xenopus laevis)
EXPERIMENT 8
Objective
1. NUCLEOTIDE OPERATION
• Multiple sequence alignment by MUSCLE ALGORITHM
• To find out distance matrix.
• To take out phylogeny tree and also different forms of it.
2. PROTEIN OPERATION
• Multiple sequence alignment by MUSCLE ALGORITHM
• To find out distance matrix.
• To take out phylogeny tree and also different forms of it.
3. Comparison of protein and nucleotide phylogeny tree.
NOTE: The gene used is PCNA because HLA-B did not have enough orthologs.
Tools used: MEGA X, Notepad
Objective 1:NUCLEOTIDE OPERATION
• TO FIND MSA
Step 1:
Go to NCBI, download the nucleotide sequences of orthologs of your gene (at least 15) and save the file
in FASTA format.
Also download MEGA X.
Step 2:
Open the downloaded sequences in MEGA X for alignment.
Step 3:
Choose the MUSCLE algorithm and select Align DNA. The MSA is then performed
using MUSCLE.
INTERPRETATION
Number of conserved sites: 828
• DISTANCE MATRIX
Step 1:
Go to DISTANCE and set the parameters
RESULT
INTERPRETATION
1. The minimum distance (and therefore the maximum similarity) is between goat and zebu cattle: the distance is
0.0370.
2. The maximum distance (and therefore the minimum similarity) is between cattle and zebrafish: the distance is
0.7728.
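The manual does not state which distance model was selected in MEGA X. As an illustration only, the simplest choice, the p-distance (proportion of aligned sites that differ), can be sketched as follows; the function names and the toy alignment are assumptions.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping any column where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return diffs / compared

def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {(p, q): p_distance(aligned[p], aligned[q])
            for i, p in enumerate(names) for q in names[:i]}

# Toy alignment (hypothetical sequences)
aligned = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "AC-TTCGA"}
for pair, d in distance_matrix(aligned).items():
    print(pair, round(d, 4))
```

A small distance means the two sequences are similar, so the closest pair in a distance matrix is the one with the minimum value.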
PHYLOGENETIC TREE CONSTRUCTION
Step 1. Go to Phylogeny ----> Construct UPGMA tree and set the following parameters.

RESULT
INTERPRETATION
1. The following organisms are closely related: (human, cattle), (pig, zebu
cattle, goat), (mouse, rat), (chicken, quail).
2. The organisms which are less related are burton and killifish.
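UPGMA builds the tree by repeatedly merging the two closest clusters and averaging their distances to everything else, weighted by cluster size. The sketch below shows only the clustering order and returns a Newick string without branch lengths; it is not MEGA X's implementation, and the taxon labels and distances are hypothetical.

```python
def upgma(names, dist):
    """names: list of taxon labels.
    dist: dict mapping frozenset({a, b}) -> distance.
    Returns the tree topology as a Newick string (no branch lengths)."""
    clusters = {n: 1 for n in names}      # cluster label -> number of taxa
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters
        a, b = min(((x, y) for x in clusters for y in clusters if x < y),
                   key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        na, nb = clusters.pop(a), clusters.pop(b)
        # Size-weighted average distance from the new cluster to the rest
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters)) + ";"

# Toy example with four hypothetical taxa
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0, frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0, frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
print(upgma(["A", "B", "C", "D"], dist))  # (((A,B),C),D);
```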
DIFFERENT FORMS OF PHYLOGENETIC TREE
1. STRAIGHT
2. CURVED
3. CIRCULAR
4. SWAPPING THE BRANCHES AT ANY NODE DOES NOT CHANGE THE RESULT

OBJECTIVE 2.PROTEIN OPERATION
MSA USING MUSCLE

Step 1:
Go to NCBI and download the ortholog protein sequences.
Open the sequences in MEGA X for alignment.
Step 2:
Run the MUSCLE algorithm to perform the MSA.
INTERPRETATION
Number of conserved sites: 247
DISTANCE MATRIX
Step 1:
Go to DISTANCE in MEGA X,set the parameter.
Result
INTERPRETATION
1. The maximum distance is 0.1207, which is between quail and frog.
2. Even at 10 decimal places, quite a few pairs have a distance of 0.0, i.e. their aligned protein sequences are identical:
(chimpanzee, human), (monkey, human), (pig, dog), (goat, dog), (monkey, chimpanzee), (pig, goat).
PHYLOGENETIC TREE
Step 1.
Go to MSA --> Phylogeny ---> Construct UPGMA tree and set the following parameters.
RESULT:
Interpretation
• Chicken and quail are closely related.
• Pig, goat and dog are not related.
Different forms:
1. straight
2. curved
3. circle
3. Comparison of protein and nucleotide phylogeny tree.

The results obtained from the nucleotide and amino acid sequences differ because the
nucleotide alignment also includes non-coding sequences, whereas the amino acid
alignment compares only the translated portion. In addition, synonymous nucleotide
substitutions change the nucleotide distance without changing the protein sequence.
EXPERIMENT 9
Aim:
• To predict protein structure by various tools.

• To visualise protein structure by Icn3D and RasMol.
GOR4 TOOL:
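Like the Chou-Fasman method, GOR4 is a statistical method based on the propensity of residues for each conformation; it additionally uses information theory over a window of neighbouring residues.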
Step 1: Open GOR4 tool and enter the amino acid sequence.
Result:
1. Configuration of residues
c/C: random coil, h/H: alpha helix, e/E: extended strand (beta sheet)

2. Tabular representation
3. Graphical representation
INTERPRETATION: NUMBER OF RESIDUES IN EACH STATE
ALPHA HELIX: 118 (32.60%) (the alpha helix is the most stable type of helix)
BETA SHEET: 74 (20.44%)
BETA TURN: 0
RANDOM COIL: 170 (46.96%), which is the highest; the structure therefore has a lot of coil, which is
common in protein structures.
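The percentages follow directly from the per-residue prediction string. A minimal sketch, assuming the h/e/c letter code described above and rebuilding a string with the GOR4 residue counts (118 helix, 74 strand, 170 coil); the function name is illustrative.

```python
from collections import Counter

LABELS = {"h": "alpha helix", "e": "extended strand", "c": "random coil"}

def composition(pred):
    """Return {state: (count, percent)} for an h/e/c prediction string."""
    counts = Counter(pred.lower())
    total = sum(counts[k] for k in LABELS)
    return {LABELS[k]: (counts[k], round(100.0 * counts[k] / total, 2))
            for k in LABELS}

# String with the same state counts as the GOR4 result (order is arbitrary)
pred = "h" * 118 + "e" * 74 + "c" * 170
for state, (n, pct) in composition(pred).items():
    print(f"{state}: {n} ({pct}%)")
```

With 362 residues in total, the coil fraction works out to 170 / 362, or 46.96%.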
JPRED:
It is basically a confirmatory tool for beta turns
Step 1:Open JPRED tool and enter the amino acid sequence.
Result:
1. Hits found with E value

2. Helices are marked as red tubes and sheets as dark green arrows.
PHD TOOL:
Step 1:Open PHD tool and enter the amino acid sequence.
RESULT:
1. Configuration of the residues

2. TABULAR REPRESENTATION
3. GRAPHICAL REPRESENTATION
INTERPRETATION:
Alpha helix: 99 (27.35%) (the alpha helix is the most stable type of helix)
Beta sheet: 102 (28.18%)
Beta turn: 0
Random coil: 161 (44.48%), which is the highest; the structure therefore has a lot of coil, which is
common in protein structures.
PREDATOR TOOL
It follows DSSP (Dictionary of Secondary Structure of Proteins), so it is expected to be more reliable than the previous
methods.
Step 1: Open the PREDATOR tool and enter the query sequence.
RESULT
1. Configuration of residues
2. Tabular representation
3. Graphical representation
INTERPRETATION
ALPHA HELIX: 92 (25.41%)
BETA SHEET: 94 (25.97%)
BETA TURN: 0
RANDOM COIL: 176 (48.62%), which is the highest; the structure therefore has a lot of coil, which is
common in protein structures.
PREDICT PROTEIN TOOL
Step 1:
Open the tool and enter query sequence
RESULTS
1. POSITION OF SECONDARY STRUCTURE
2. ALIGNED PROTEIN
3. AMINO ACID COMPOSITION
INTERPRETATION
1. There is a high proportion of loops, whereas the proportions of helices and strands are almost the same.
2. There are 2 transmembrane helix regions, one of 15 residues (5-19) and the other of 21 residues (310-330).
3. No disulphide bridge was found.
4. There are 11 binding sites.
5. Of the aligned proteins, 1A03_PANTR has the maximum E value.
PSIPRED TOOL
Step 1:open the tool and enter the query sequence.

RESULT:
INTERPRETATION
The largest portion is occupied by coil, then helices and then strands.
COIL: 151
HELICES: 118
STRAND: 93
Across the tools, the overall consensus is that coil occupies the largest portion, followed by helices and then strands.
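The consensus can be checked by tabulating the residue counts reported above for the four tools that give counts (GOR4, PHD, PREDATOR and PSIPRED) and averaging them. This is an illustrative sketch; averaging is an assumption about how to combine the tools, not a method described in the manual.

```python
# Residue counts reported for each tool: (helix, strand, coil)
counts = {
    "GOR4": (118, 74, 170),
    "PHD": (99, 102, 161),
    "PREDATOR": (92, 94, 176),
    "PSIPRED": (118, 93, 151),
}
states = ("helix", "strand", "coil")

# Mean number of residues per state across the four tools
mean = {s: sum(c[i] for c in counts.values()) / len(counts)
        for i, s in enumerate(states)}
ranking = sorted(mean, key=mean.get, reverse=True)

print(mean)     # {'helix': 106.75, 'strand': 90.75, 'coil': 164.5}
print(ranking)  # ['coil', 'helix', 'strand']
```

Coil is the largest class in every tool. Helix and strand are much closer: PHD and PREDATOR place strand slightly ahead of helix, so that part of the ordering is less certain.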
AIM 2:
To visualise protein structure by Icn3D and RasMol.
Icn3D:
Step 1:
Open PDB and enter your gene name HLA-B, apply the filter Homo sapiens and choose the crystal
structure which has the maximum E value.
Step 2:
Open iCn3D and enter the ID of the chosen structure, that is, 3DX9.
RESULT:
1. Ribbon
2. Strand
3. Cylinder and plate

4. Schematic
5. C alpha trace
6. Backbone
7. B-factor tube
8. Lines
9. Stick
10. Ball and stick
11. Sphere
RasMol
Step 1: Download RasMol and open the PDB format file of the chosen structure.
RESULT:
1. Wireframe
2. Backbone
3. Sticks
4. Space filled
5. Ball and stick

6. Ribbons
7. Strands
8. Cartoon
9. Molecular surface
INTERPRETATION
• Two tools for viewing the tertiary/quaternary structure of the protein were used: iCn3D (online version) and
RasMol (offline version). Different display styles were observed, such as ribbon, cylinder
and plate, strands, ball and stick, lines etc. The molecular surface model is the heaviest of all to render.
CONCLUSION
• We started from raw data, i.e. our nucleotide sequence.

• From the raw data (nucleotide sequence), we carried out various analyses and recorded the
results. The operations included calculating the GC content and the reverse complement using
Python.
• Then we calculated scores using various alignments, which related our sequence to other
sequences.
• Then the sequence was converted to an amino acid sequence, and we performed operations
on amino acid sequences such as alignments and scoring.
• Then we used analyses such as phylogenetic tree analysis to compare our amino
acid sequence with those of other organisms.
• We then used secondary structure prediction tools to predict the conformation of
each amino acid residue.
• Finally, we used tools such as iCn3D (online version) and RasMol (offline version) to visualise the
tertiary and quaternary structure.
