
LAB MANUAL

INTRODUCTION TO BIOINFORMATICS

Birla Institute of Technology and Science, Pilani


Pilani Campus

• AKRITI SRIVASTAVA 2017B1A10482P


Index
0. Introduction to the gene HLA-B
1. EXPERIMENT 1
• To find GC content of the given nucleotide sequence
2. EXPERIMENT 2
• To find the mRNA sequence, the position of the start codon and the reverse complement of the given
nucleotide sequence
3. EXPERIMENT 3
• To find exons, introns and the promoter sequence using GENSCAN
4. EXPERIMENT 4
• To find the number of ORFs present in the given gene sequence
• To find the maximum ORF length in all possible frames
5. EXPERIMENT 5
• To draw dot plot charts of homologous genes
• To find the score of the nucleotide sequence using a pairwise alignment tool (EMBOSS NEEDLE)
6. EXPERIMENT 6
• To find the nucleotide score using a pairwise alignment tool (EMBOSS WATER)
• To find the protein score using a pairwise alignment tool (EMBOSS WATER)
• To find the protein score using a pairwise alignment tool (EMBOSS NEEDLE)
• BLASTN
7. EXPERIMENT 7
• To get 5 different homologous protein sequences using BLASTP.
• To perform multiple sequence alignment of homologous protein sequences and
analyse the result by guide tree, phylogenetic analysis and MView.
• To find the structure of the protein sequence using BLASTP and PDB.
8. EXPERIMENT 8
• NUCLEOTIDE OPERATION
➢ Multiple sequence alignment by MUSCLE ALGORITHM
➢ To find the distance matrix.
➢ To construct the phylogenetic tree and its different forms.
• PROTEIN OPERATION
➢ Multiple sequence alignment by MUSCLE ALGORITHM
➢ To find the distance matrix.
➢ To construct the phylogenetic tree and its different forms.

• Comparison of protein and nucleotide phylogeny tree.

9. EXPERIMENT 9
• To predict protein structure by various tools.
• To visualise the protein structure with iCn3D and RasMol.

10. CONCLUSION
INTRODUCTION
HLA-B major histocompatibility complex, class I, B [ Homo sapiens (human) ]
HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer
consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is
anchored in the membrane. Class I molecules play a central role in the immune system by
presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in
nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon
1 encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both
bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane
region and exons 6 and 7 encode the cytoplasmic tail.
There are 8 exons, out of which 7 are coding.
EXPERIMENT 1
AIM: TO FIND GC CONTENT OF THE GIVEN NUCLEOTIDE SEQUENCE
SOFTWARE USED: PYTHON
STEP 1:
Download your DNA sequence from NCBI.
STEP 2:
Write the Python code and run it.
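The script itself is not reproduced in this manual. A minimal sketch of what such a GC-content script could look like, assuming the sequence was saved as a single-record FASTA file (the file name `hla_b.fasta` and all function names are hypothetical):

```python
from collections import Counter


def read_fasta(path):
    """Return the sequence of a single-record FASTA file without the header or line breaks."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


def base_counts(seq):
    """Count A, T, G and C (case-insensitive); any other symbol, e.g. N, is ignored."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ATGC"}


def gc_content(seq):
    """Percentage of G + C among the A/T/G/C bases (0.0 for an empty sequence)."""
    counts = base_counts(seq)
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total if total else 0.0


# Example (hypothetical file name):
# seq = read_fasta("hla_b.fasta")
# print(base_counts(seq), round(gc_content(seq), 3))
```

Line breaks and the FASTA header are stripped before counting so that they do not inflate the denominator of the percentage.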

RESULT:

Interpretation
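1. From the base counts below, the GC content is (G + C)/(A + T + G + C) = (1025 + 924)/3305 ≈ 58.97%.
Since G-C pairs are held by three hydrogen bonds and A-T pairs by two, a higher GC content implies
higher stability; our DNA is therefore moderately stable.
2. A = 634, T = 722, G = 1025, C = 924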
EXPERIMENT 2
AIM: To find the mRNA sequence, the position of the start codon and the reverse complement of the
given nucleotide sequence
Tool used: Python
STEP 1:
Download DNA sequence from NCBI and save it as dna.txt.
Step 2:
Write the Python code for the reverse complement.
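The original code is not shown. A possible sketch, assuming `dna.txt` holds only the bases of the coding (sense) strand, so that the mRNA is the same sequence with T replaced by U; the function names are illustrative, not taken from the original script:

```python
# Translation table for complementing DNA; other characters (e.g. N) are left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, G<->C), read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def to_mrna(coding_strand):
    """mRNA sequence from the coding (sense) strand: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def start_codon_position(seq):
    """1-based position of the first ATG/AUG in the sequence, or None if there is none."""
    index = seq.upper().replace("U", "T").find("ATG")
    return index + 1 if index >= 0 else None


# Example usage, assuming dna.txt contains only sequence characters:
# with open("dna.txt") as handle:
#     dna = "".join(handle.read().split())
# print(reverse_complement(dna), to_mrna(dna), start_codon_position(dna))
```

The first ATG found this way is not necessarily the true translation start; the annotated coding sequence in a database record gives the actual start.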

RESULT:
1. The reverse complement and the mRNA sequence were obtained.

2. The position of the start codon is 21.


3. The actual transcript can be found in databases such as GenBank or UniProt. We can also
observe that the GenBank protein is highly similar to the sequence obtained from UniProt, but
they are not totally identical.
EXPERIMENT 3
To find exons, introns and the promoter sequence using GENSCAN
Step 1:
Open GENSCAN, enter the DNA sequence and set the parameters.

RESULT
1. Screenshot of GENSCAN output
2. There are 8 exons, out of which 7 are coding.
Gene location: 31357179 to 31353875
INTERPRETATION

• The initial exon spans positions 31-94 (GENSCAN numbers positions from the start of the
submitted sequence).
• The terminal exon spans positions 2654-2697.
• The poly-A signal (PlyA) spans positions 3283-3288.
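GENSCAN reports its predictions as a text table in which each row has a feature ID (Gn.Ex), a type (Prom, Init, Intr, Term, Sngl, PlyA), the strand and the begin/end/length columns. The sketch below assumes that commonly seen column order and pulls out the coordinates so exon counts and positions can be read programmatically; `parse_genscan` and `exon_count` are hypothetical helpers, not part of GENSCAN:

```python
# Feature types that appear in GENSCAN's predicted-gene table.
FEATURE_TYPES = {"Prom", "Init", "Intr", "Term", "Sngl", "PlyA"}
EXON_TYPES = {"Init", "Intr", "Term", "Sngl"}


def parse_genscan(text):
    """Extract (id, type, strand, begin, end, length) tuples from GENSCAN output text.

    Assumes whitespace-separated rows whose first six columns are
    Gn.Ex, Type, S, Begin, End and Len; all other lines are skipped.
    """
    features = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[1] in FEATURE_TYPES:
            features.append((fields[0], fields[1], fields[2],
                             int(fields[3]), int(fields[4]), int(fields[5])))
    return features


def exon_count(features):
    """Number of predicted exons; promoter and poly-A signal rows are excluded."""
    return sum(1 for feature in features if feature[1] in EXON_TYPES)
```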
EXPERIMENT 4

• To find the number of ORFs present in the given gene sequence


• To find the maximum ORF length in all possible frames

Step 1:
Open ORF finder and enter the query sequence.
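The kind of search ORF finder performs can be approximated in a few lines of Python. This is a simplified sketch, not NCBI's implementation: it assumes ATG as the only start codon, the standard stop codons TAA/TAG/TGA, a minimum length of 75 nt, and it skips nested ORFs. Its reverse-strand frames are counted from the 5' end of the reverse complement, which may not match ORF finder's frame labels:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _orfs_in_strand(seq, min_len):
    """Yield (frame, start, end) for ATG...stop ORFs on one strand (0-based, end exclusive)."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    yield frame + 1, start, i + 3
                start = None  # nested ATGs inside an open ORF are not reported


def find_orfs(seq, min_len=75):
    """ORFs (ATG to stop codon, inclusive) on both strands of a DNA sequence.

    Coordinates are 1-based on the forward strand; reverse-strand ORFs have
    start > end. ORFs without a stop codon are ignored.
    """
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    n = len(seq)
    orfs = []
    for frame, s, e in _orfs_in_strand(seq, min_len):
        orfs.append({"strand": "+", "frame": frame, "start": s + 1, "end": e, "length": e - s})
    for frame, s, e in _orfs_in_strand(rc, min_len):
        # Map reverse-complement indices back to forward-strand numbering.
        orfs.append({"strand": "-", "frame": frame, "start": n - s, "end": n - e + 1, "length": e - s})
    return orfs


def longest_per_frame(orfs):
    """Longest ORF for each (strand, frame) pair."""
    best = {}
    for orf in orfs:
        key = (orf["strand"], orf["frame"])
        if key not in best or orf["length"] > best[key]["length"]:
            best[key] = orf
    return best
```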

Result

INTERPRETATION
• Number of ORFs found = 30
• The longest ORF is ORF 18, with a length of 522 nt (173 aa), found in frame 1 on the reverse
strand.
• It spans positions 2591 to 2070.
• Longest ORF in each frame:

FRAME   LABEL    STRAND   LENGTH (nt)
1       ORF 18   -        522
2       ORF 26   +        348
3       ORF 9    -        474
EXPERIMENT 5

• To draw dot plot charts of homologous genes


• To find the score of the nucleotide sequence using a pairwise alignment tool (EMBOSS NEEDLE)
Objective 1

Step 1.
Go to NCBI and find the nucleotide sequence of your gene.

Step 2:
Go to the Orthologs option, download the orthologous sequences of other species and open the
file in WordPad.
Step 3:

Open EMBOSS DOTMATCHER and compare the Homo sapiens gene with each of the other species'
genes in turn.
Add titles to the X and Y axes:
X = Homo sapiens gene, Y = other species' gene
Threshold value = 40 (default = 23)
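The idea behind DOTMATCHER can be sketched as a sliding-window comparison: a dot is recorded wherever a window of the two sequences, compared along a diagonal, scores at or above the threshold. This sketch assumes a window of 10 and simple match/mismatch scores of +5/-4 (similar to the EDNAFULL values for A, C, G and T); it runs in O(n x m x window) time, so it is slow for full-length genes:

```python
def dot_plot(seq_x, seq_y, window=10, threshold=23, match=5, mismatch=-4):
    """Return the set of (x, y) window start positions (0-based) whose diagonal
    window score is at least `threshold`."""
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    points = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            score = sum(match if seq_x[i + k] == seq_y[j + k] else mismatch
                        for k in range(window))
            if score >= threshold:
                points.add((i, j))
    return points


def render(points, len_x, len_y, window=10):
    """Text rendering: one row per window start in seq_y, one column per window start in seq_x."""
    rows = []
    for j in range(len_y - window + 1):
        rows.append("".join("*" if (i, j) in points else "." for i in range(len_x - window + 1)))
    return "\n".join(rows)
```

Raising the threshold (for example from the default 23 to 40, as in the step above) removes short chance matches and leaves mainly the long diagonals of truly similar regions.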

RESULT:
The results are as follows:

1. Mustela putorius furo

2. Galeopterus variegatus patr class I

3. Mandrillus leucophaeus

4. Aotus nancymaae
INTERPRETATION

Aotus nancymaae shows a clear diagonal with less background noise, indicating high similarity.

Objective 2
To find the score using a pairwise alignment tool (EMBOSS NEEDLE)

Step 1:
Open EMBOSS Needle and align the Homo sapiens nucleotide sequence with each ortholog in turn to
get the result.
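EMBOSS Needle computes a global (Needleman-Wunsch) alignment. A minimal score-only sketch with affine gaps, assuming the EMBOSS defaults of gap open 10 and gap extend 0.5 and nucleotide scores of +5/-4; unlike Needle's default, end gaps are penalised here like internal gaps, so the scores will not match the tool exactly:

```python
def needle_score(a, b, match=5.0, mismatch=-4.0, gap_open=10.0, gap_extend=0.5):
    """Global alignment score with affine gaps (Gotoh's three-state recurrence).

    A gap of length k costs gap_open + gap_extend * (k - 1). Only two rows
    of each matrix are kept, so memory grows with len(b) only.
    """
    a, b = a.upper(), b.upper()
    NEG = float("-inf")
    m = len(b)
    # Row 0. M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: gap against b[j-1].
    M_prev = [0.0] + [NEG] * m
    X_prev = [NEG] * (m + 1)
    Y_prev = [NEG] + [-(gap_open + gap_extend * (j - 1)) for j in range(1, m + 1)]
    for i in range(1, len(a) + 1):
        M_cur = [NEG] * (m + 1)
        X_cur = [NEG] * (m + 1)
        Y_cur = [NEG] * (m + 1)
        X_cur[0] = -(gap_open + gap_extend * (i - 1))  # leading gap in b
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M_cur[j] = s + max(M_prev[j - 1], X_prev[j - 1], Y_prev[j - 1])
            X_cur[j] = max(M_prev[j] - gap_open, X_prev[j] - gap_extend, Y_prev[j] - gap_open)
            Y_cur[j] = max(M_cur[j - 1] - gap_open, Y_cur[j - 1] - gap_extend, X_cur[j - 1] - gap_open)
        M_prev, X_prev, Y_prev = M_cur, X_cur, Y_cur
    return max(M_prev[m], X_prev[m], Y_prev[m])
```

For example, `needle_score("ACGT", "AGT")` aligns ACGT with A-GT: three matches (+15) and one gap (-10) give 5.0.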

Results
1. Mustela putorius furo: score 5087.0

2. Galeopterus variegatus patr: score 4844.5

3. Mandrillus leucophaeus: score 5325.5

4. Aotus nancymaae: score 6577.5
INTERPRETATION
Aotus nancymaae has the highest score, 6577.5.
EXPERIMENT 6

Aim:
1. To find the nucleotide score using a pairwise alignment tool (EMBOSS WATER)
2. To find the protein score using a pairwise alignment tool (EMBOSS WATER)
3. To find the protein score using a pairwise alignment tool (EMBOSS NEEDLE)
4. BLASTN

Objective 1

To find the nucleotide score using a pairwise alignment tool (EMBOSS WATER)

Step 1.
Go to NCBI and find the nucleotide sequence of your gene.

Step 2

Go to the Orthologs option, download the orthologous sequences of other species and open the file in
WordPad.
Step 3:
Open EMBOSS Water and align the Homo sapiens nucleotide sequence with each ortholog in turn to get the result.
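EMBOSS Water performs a local (Smith-Waterman) alignment instead: cell scores are floored at zero and the best cell anywhere in the matrix is reported, so only the most similar region contributes. A score-only sketch under assumed parameters (gap open 10, gap extend 0.5, +5/-4 for nucleotides); protein comparisons would need a substitution matrix such as BLOSUM62 in place of the simple match/mismatch scores:

```python
def water_score(a, b, match=5.0, mismatch=-4.0, gap_open=10.0, gap_extend=0.5):
    """Local alignment score with affine gaps (Smith-Waterman with Gotoh's recurrence)."""
    a, b = a.upper(), b.upper()
    NEG = float("-inf")
    m = len(b)
    H_prev = [0.0] * (m + 1)   # best local score ending at (i-1, j), never below 0
    X_prev = [NEG] * (m + 1)   # alignments ending with a[i-1] against a gap
    best = 0.0
    for i in range(1, len(a) + 1):
        H_cur = [0.0] * (m + 1)
        X_cur = [NEG] * (m + 1)
        y = NEG                # alignments ending with a gap against b[j-1] in this row
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            diag = H_prev[j - 1] + s
            X_cur[j] = max(H_prev[j] - gap_open, X_prev[j] - gap_extend)
            y = max(H_cur[j - 1] - gap_open, y - gap_extend)
            H_cur[j] = max(0.0, diag, X_cur[j], y)
            best = max(best, H_cur[j])
        H_prev, X_prev = H_cur, X_cur
    return best
```

Because a local alignment may ignore poorly matching ends, sequences that share only one conserved region can still score well here.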

Results
1. Mustela putorius furo: score 2780.0

2. Galeopterus variegatus patr: score 3396.0

3. Mandrillus leucophaeus: score 4048.0

4. Aotus nancymaae: score 4807.0
INTERPRETATION:
1. The highest score is for Aotus nancymaae (4807.0), so it is the most closely related sequence.
2. The lowest score is for Mustela putorius furo (2780.0), so it is the least related sequence.

AIM 2
To find the protein score using a pairwise alignment tool (EMBOSS WATER)

Step 1
Download protein sequence from UNIPROT.

STEP 2
Use emboss water to find score.

Step 3
Compare the human sequence with each of the other species in turn to find the score.
Results

1. Human vs Pongo abelii: score 48.5

2. Human vs Sus scrofa: score 60.0

3. Human vs Rattus norvegicus: score 51.5

4. Human vs Bos taurus: score 42.5

5. Human vs Mus musculus: score 56.5

INTERPRETATION

1. The highest score is with Sus scrofa (60.0), so it is the most closely related.

2. The lowest score is with Bos taurus (42.5), so it is the least related.


Aim 3

To find the protein score using a pairwise alignment tool (EMBOSS NEEDLE)


1. Human vs Pongo abelii: score 34.5

2. Human vs Sus scrofa: score 44.5

3. Human vs Rattus norvegicus: score 36.5

4. Human vs Bos taurus: score 31.5

5. Human vs Mus musculus: score 40.5

INTERPRETATION

1. The highest score is with Sus scrofa (44.5), so it is the most closely related.


2. The lowest score is with Bos taurus (31.5), so it is the least related.
Aim 4: BLASTN

Top eukaryotic hits

Organism            Accession number   Score   % identity
Homo sapiens        NM_005514.8        2837    100.00
Pan troglodytes     NM_001009081.1     2536    98.898
Chimpanzee mRNA     X13115.1           2401    95.491
Cercocebus atys     KP176507.1         2183    92.636
Macaca mulatta      NM_001048245.1     2156    92.212
Gorilla gorilla     X60693.1           1997    95.04
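BLAST reports percentage identity as the number of identical positions divided by the length of the aligned region. A small hypothetical helper applying that definition to a pair of aligned strings, where gap columns count toward the length but never as identities:

```python
def percent_identity(aligned_query, aligned_subject):
    """Identical positions / alignment length * 100 for two equal-length aligned strings.

    Gap characters ('-') count toward the alignment length but never as identities.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        return 0.0
    identities = sum(1 for q, s in zip(aligned_query.upper(), aligned_subject.upper())
                     if q == s and q != "-")
    return 100.0 * identities / len(aligned_query)
```

This is why a hit can have a high percentage identity but a lower score than another hit: the score also depends on how long the aligned region is.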

SCREENSHOTS OF RESULTS

Prokaryotic hits: only one was found.

Max score: 784

Organism                     Accession number   Score   % identity
Bacillus megaterium strain   CP031776.1         784     94.16

SCREENSHOTS OF RESULT
EXPERIMENT 7

AIM:

1. To get 5 different homologous protein sequences using BLASTP.


2. To perform multiple sequence alignment of homologous protein sequences and analyze
the result by GUIDE TREE, PHYLOGENETIC ANALYSIS AND MVIEW.
3. To find the structure of the protein sequence using BLASTP and PDB.

AIM 1
To get 5 different homologous protein sequences using BLASTP.

STEP 1
Go to NCBI and then select BLAST.
On the BLAST page, select Protein BLAST.

STEP 2

Enter your protein sequence and set the parameters.

STEP 3
Go to TAXONOMY and then ORGANISM.
Select 5 organisms other than Homo sapiens (the source of the experimental gene).

RESULT
AIM 2
To perform multiple sequence alignment of homologous protein sequences and analyze the
result by GUIDE TREE, PHYLOGENETIC ANALYSIS AND MVIEW

STEP 1
Open CLUSTAL OMEGA and add the homologous protein sequences found in the previous aim.

RESULT
1. MULTIPLE SEQUENCE ALIGNMENT
2. GUIDE TREE

3. PHYLOGENETIC TREE
4. M VIEW

Identity matrix (Clustal Omega, % identity)

              Pongo    Homo     Pan      Hylobates  Mandrillus  Macaca
Pongo         100.00
Homo          95.35    100.00
Pan           94.96    96.90    100.00
Hylobates     92.64    93.02    92.25    100.00
Mandrillus    93.02    93.41    93.41    94.96      100.00
Macaca        91.86    93.02    92.25    93.02      97.29       100.00

INTERPRETATION
• The least distance (highest identity) is between Mandrillus leucophaeus and Macaca fascicularis
(97.29%), followed by Pan troglodytes and Homo sapiens (96.90%).
• The maximum distance (lowest identity) is between Pongo pygmaeus and Macaca fascicularis (91.86%).
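The most and least similar pairs can be read off the identity matrix mechanically. This sketch simply encodes the lower triangle of the Clustal Omega matrix shown above (diagonal omitted) and looks up the extremes; higher identity corresponds to smaller distance:

```python
NAMES = ["Pongo", "Homo", "Pan", "Hylobates", "Mandrillus", "Macaca"]
# Lower triangle of the Clustal Omega percent identity matrix (row i, columns 0..i-1).
LOWER = [
    [],
    [95.35],
    [94.96, 96.90],
    [92.64, 93.02, 92.25],
    [93.02, 93.41, 93.41, 94.96],
    [91.86, 93.02, 92.25, 93.02, 97.29],
]


def pair_identities(names, lower):
    """Map each pair (row name, column name) to its percent identity."""
    return {(names[i], names[j]): lower[i][j] for i in range(len(names)) for j in range(i)}


def extremes(pairs):
    """Most similar (highest identity, least distance) and least similar pairs."""
    most = max(pairs, key=pairs.get)
    least = min(pairs, key=pairs.get)
    return (most, pairs[most]), (least, pairs[least])


# print(extremes(pair_identities(NAMES, LOWER)))
```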

AIM 3
To find the structure of the protein sequence using BLASTP and PDB.

Step 1
Run BLASTP with the database parameter changed to PDB (Protein Data Bank); the following results
were observed.
Results

PDB SIMILAR STRUCTURES
4P57 (Homo sapiens)

1KTD (Columba livia)

2FSE (Mus musculus)
6KVM (Gallus gallus)

6A2B (Xenopus laevis)
EXPERIMENT 8

Objective
1. NUCLEOTIDE OPERATION
• Multiple sequence alignment by MUSCLE ALGORITHM
• To find the distance matrix.
• To construct the phylogenetic tree and its different forms.
2. PROTEIN OPERATION
• Multiple sequence alignment by MUSCLE ALGORITHM
• To find the distance matrix.
• To construct the phylogenetic tree and its different forms.

3. Comparison of protein and nucleotide phylogeny tree.

NOTE: The gene used here is PCNA, because HLA-B did not have enough orthologs.

Tools used: MEGA X, Notepad

Objective 1:NUCLEOTIDE OPERATION

• TO FIND MSA
Step 1:
Go to NCBI, download the nucleotide sequences of orthologs of your gene (at least 15) and save the
file in FASTA format.
Also download MEGA X.

Step 2:
Open the downloaded sequences in MEGA X for alignment.
Step 3:
Choose the MUSCLE algorithm and align as DNA; the MSA is then performed using the MUSCLE
algorithm.

INTERPRETATION
NUMBER OF CONSERVED SITES: 828
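A rough sketch of how conserved sites can be counted in a set of aligned sequences, assuming a site is conserved when every sequence has the same non-gap character in that column; MEGA X's own count may treat gaps and missing data differently:

```python
def conserved_sites(alignment, gap="-"):
    """Count alignment columns where every sequence has the same, non-gap character.

    `alignment` is a list of equal-length aligned sequences (DNA or protein).
    """
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    count = 0
    for column in zip(*(seq.upper() for seq in alignment)):
        if column[0] != gap and all(c == column[0] for c in column):
            count += 1
    return count
```

The same function works for the protein alignment, since it only compares characters column by column.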

• DISTANCE MATRIX

Step 1:
Go to DISTANCE and set the parameters
RESULT

INTERPRETATION
1. The minimum distance is between goat and zebu cattle (0.0370), so this pair is the most closely
related.
2. The maximum distance is between cattle and zebrafish (0.7728), so this pair is the most distantly
related.
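The distance model selected in MEGA X is not recorded here. As an illustration only, the simplest measure, the p-distance (proportion of differing sites, skipping any site with a gap in either sequence of the pair), can be computed like this; other models such as Kimura 2-parameter correct this value for multiple substitutions:

```python
def p_distance(x, y, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap in either sequence are skipped (pairwise deletion).
    """
    compared = differences = 0
    for a, b in zip(x.upper(), y.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    return differences / compared if compared else 0.0


def distance_matrix(names, seqs):
    """All pairwise p-distances, keyed by (name_i, name_j) with i < j."""
    return {(names[i], names[j]): p_distance(seqs[i], seqs[j])
            for i in range(len(seqs)) for j in range(i + 1, len(seqs))}
```

A smaller value means fewer differences, so the smallest entry identifies the most closely related pair.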

PHYLOGENETIC TREE CONSTRUCTION

Step 1. Go to Phylogeny > Construct UPGMA tree and set the following parameters.
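UPGMA repeatedly merges the two closest clusters and replaces their distances to every other cluster with a size-weighted average, which yields a rooted, ultrametric tree. A compact sketch of the algorithm (not MEGA X's code), taking a distance dictionary keyed by taxon pairs and returning a Newick string with branch lengths:

```python
def upgma(dist):
    """UPGMA tree from pairwise distances.

    `dist` maps frozenset({x, y}) -> distance for every pair of at least two taxa.
    Returns a Newick string; each merge is placed at height distance / 2.
    """
    taxa = sorted({name for pair in dist for name in pair})
    clusters = {name: (name, 1, 0.0) for name in taxa}  # key -> (newick, size, height)
    d = dict(dist)
    while len(clusters) > 1:
        keys = sorted(clusters)
        # Closest pair of current clusters (ties broken by sorted order).
        a, b = min(((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
                   key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        newick_a, size_a, h_a = clusters.pop(a)
        newick_b, size_b, h_b = clusters.pop(b)
        merged = f"({newick_a}:{height - h_a:.4f},{newick_b}:{height - h_b:.4f})"
        # Size-weighted average distance from the new cluster to each remaining cluster.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = (merged, size_a + size_b, height)
    return next(iter(clusters.values()))[0] + ";"
```

Because UPGMA assumes a constant rate of evolution, taxa that evolve at different rates can be placed inaccurately; this is one reason results may differ from other tree-building methods.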


RESULT

INTERPRETATION
1. The following organisms are closely related: (human, cattle), (pig, zebu cattle, goat),
(mouse, rat), (chicken, quail).
2. The least related organisms are burton and killifish.

DIFFERENT FORMS OF PHYLOGENETIC TREE

1. STRAIGHT

2. CURVED
3. CIRCULAR

4. SWAPPING ANY BRANCH (ROTATING IT ABOUT ITS NODE) DOES NOT CHANGE THE TREE, SO THE RESULT STAYS THE SAME


OBJECTIVE 2.PROTEIN OPERATION

MSA USING MUSCLE


Step 1:
Go to NCBI and download the ortholog sequences.
Align the sequences using MEGA X.

Step 2:
Perform MUSCLE ALGORITHM for MSA.

INTERPRETATION
The number of conserved sites is 247.

DISTANCE MATRIX
Step 1:
Go to DISTANCE in MEGA X and set the parameters.
Result

INTERPRETATION
1. The maximum distance is 0.1207, between quail and frog.
2. Even with distances shown to 10 decimal places, several pairs have a distance of 0.0:
(chimpanzee, human), (monkey, human), (pig, dog), (goat, dog), (monkey, chimpanzee), (pig, goat).

PHYLOGENETIC TREE

Step 1.
Go to MSA > Phylogeny > Construct UPGMA tree and set the following parameters.
RESULT:

Interpretation
• Chicken and quail are closely related.
• Pig, goat and dog cannot be distinguished from one another, since their pairwise protein distances are 0.0.

Different forms:
1. straight

2. curved
3. circle

3. Comparison of protein and nucleotide phylogeny tree.


The results obtained from nucleotide and amino acid sequences differ: the nucleotide alignment
also includes non-coding sequences, whereas the amino acid alignment compares only the
translated portion.
EXPERIMENT 9

Aim:

• To predict protein structure by various tools.


• To visualise the protein structure with iCn3D and RasMol.

GOR4 TOOL:

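Like the Chou-Fasman method, GOR IV is a statistical method based on residue propensities, but it uses information theory and takes into account a window of 17 residues around each position.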

Step 1: Open GOR4 tool and enter the amino acid sequence.

Result:

1. Configuration of residues

c/C: random coil, h/H: alpha helix, e/E: extended strand (beta sheet)


2.Tabular representation

3.Graphical representation

INTERPRETATION: NUMBER OF RESIDUES IN EACH STATE

ALPHA HELIX: 118 (32.60%) (the alpha helix is the most stable helical conformation)

BETA SHEET: 74 (20.44%)

BETA TURN: 0

RANDOM COIL: 170 (46.96%), which is the highest; the structure therefore has a lot of coil, which is
a general feature of protein structure.
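The counts and percentages reported by GOR IV and the other predictors can be reproduced from the one-letter prediction string. A sketch, assuming the prediction is available as a string of state letters such as h, e and c (upper- and lower-case letters are pooled, and whitespace is ignored):

```python
from collections import Counter


def ss_composition(prediction):
    """Count and percentage (2 decimals) of each state in a secondary-structure string.

    Returns {state: (count, percent)} with states upper-cased and sorted.
    """
    states = Counter("".join(prediction.split()).upper())
    total = sum(states.values())
    return {s: (n, round(100.0 * n / total, 2)) for s, n in sorted(states.items())}
```

With the GOR IV counts above (118 H, 74 E, 170 C; 362 residues in total), the coil fraction works out to 170/362, about 46.96%.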
JPRED:

It is basically a confirmatory tool for beta turns

Step 1:Open JPRED tool and enter the amino acid sequence.

Result:

1. Hits found, with E-values


2. Helices are marked as red tubes and sheets as dark green arrows.

PHD TOOL:

Step 1:Open PHD tool and enter the amino acid sequence.

RESULT:

1. Configuration of the residues.


2.TABULAR REPRESENTATION

3.GRAPHICAL REPRESENTATION

INTERPRETATION:

Alpha helix: 99 (27.35%) (the alpha helix is the most stable helical conformation)

Beta sheet: 102 (28.18%)

Beta turn: 0

Random coil: 161 (44.48%), which is the highest; the structure therefore has a lot of coil, which is a
general feature of protein structure.
PREDATOR TOOL

It follows DSSP (Dictionary of Protein Secondary Structure) definitions, so it is considered better
than the previous methods.

Step 1:open the PREDATOR tool and enter the query sequence.

RESULT

1. Configuration of residues
2.Tabular Representation

3.Graphical representation

INTERPRETATION

ALPHA HELIX: 92 (25.41%)

BETA SHEET: 94 (25.97%)

BETA TURN: 0

RANDOM COIL: 176 (48.62%), which is the highest; the structure therefore has a lot of coil, which is
a general feature of protein structure.
PREDICT PROTEIN TOOL

Step 1:

Open the tool and enter query sequence

RESULTS

1.POSITION OF SECONDARY STRUCTURE

2.ALIGNED PROTEIN
3.AMINO ACID COMPOSITION

INTERPRETATION

1. There is a high proportion of loops, whereas the proportions of helices and strands are almost the same.

2. There are 2 transmembrane helix regions, one 15 residues long (5-19) and the other 21 residues long (310-330).

3. No disulphide bridge was found.

4. There are 11 binding sites.

5. Among the aligned proteins, 1A03_PANTR has the maximum E-value.

PSIPRED TOOL

Step 1:open the tool and enter the query sequence.


RESULT:

INTERPRETATION

A large portion is occupied by coil, then helices and then strands.

COIL: 151

HELICES: 118

STRAND: 93

The consensus is that coil occupies the largest portion, followed by helices and then strands.

AIM 2:

To visualise the protein structure with iCn3D and RasMol.

iCn3D:

Step 1:

Open PDB, enter the gene name HLA-B, apply the Homo sapiens filter and choose the crystal
structure with the maximum E-value.
Step 2:

Open iCn3D and enter the ID of the chosen structure, that is, 3DX9.

RESULT:

1. RIBBON

2. STRAND

3. Cylinders and plates


4. Schematic

5. C-alpha trace

6. Backbone
7. B-factor tube

8. Lines

9. Stick
10. Ball and stick

11. Sphere
RasMol

Step 1: Download RasMol and open the PDB-format file of the chosen structure.
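Both iCn3D and RasMol read the same PDB-format coordinate file. To illustrate what that file contains, this sketch reads ATOM records using the fixed PDB column positions, counts residues per chain and extracts the C-alpha coordinates that a C-alpha trace view draws. HETATM records (ligands, water) are skipped, and the file name in the usage comment is hypothetical:

```python
from collections import defaultdict


def parse_atoms(lines):
    """Parse ATOM records of a PDB-format file using the fixed column layout.

    Returns dicts with atom name, residue name, chain, residue number,
    insertion code and x, y, z coordinates (Angstrom).
    """
    atoms = []
    for line in lines:
        if line.startswith("ATOM  "):
            atoms.append({
                "name": line[12:16].strip(),
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "icode": line[26],
                "x": float(line[30:38]),
                "y": float(line[38:46]),
                "z": float(line[46:54]),
            })
    return atoms


def residues_per_chain(atoms):
    """Number of distinct residues (number + insertion code) in each chain."""
    seen = defaultdict(set)
    for atom in atoms:
        seen[atom["chain"]].add((atom["res_seq"], atom["icode"]))
    return {chain: len(res) for chain, res in sorted(seen.items())}


def ca_trace(atoms, chain):
    """C-alpha coordinates of one chain, in file order."""
    return [(a["x"], a["y"], a["z"]) for a in atoms if a["chain"] == chain and a["name"] == "CA"]


# Example (hypothetical file name):
# with open("3dx9.pdb") as handle:
#     atoms = parse_atoms(handle)
# print(residues_per_chain(atoms))
```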

RESULT:

1.Wireframe

2.Backbone
3.STICKS

4.SPACE FILLED

5.BALL AND STICK


6.RIBBONS

7.STRANDS

8.CARTOON
9.MOLECULAR SURFACE

INTERPRETATION

• Two tools were used to visualise the tertiary/quaternary structure of the protein: iCn3D (online
version) and RasMol (offline version). Different display styles were observed, such as ribbon,
cylinders and plates, strands, ball and stick, and lines. The molecular surface model is the heaviest of all.

CONCLUSION

• We started from raw DATA, i.e., our nucleotide sequence.


• From this raw data (the nucleotide sequence), we derived INFORMATION using Python: the GC
content and the reverse complement.
• Then we calculated scores using various alignments, which related our sequence to other
sequences.
• Then the sequence was converted to an amino acid sequence, and we performed operations
such as alignments and scoring on it.
• Then we used analyses such as phylogenetic tree construction to compare our amino acid
sequence with those of other organisms.
• We then used SECONDARY STRUCTURE prediction tools to determine the configuration of the
amino acid residues.
• Finally, we used iCn3D (online version) and RasMol (offline version) to visualise the
tertiary and quaternary structure.
