INTRODUCTION TO BIOINFORMATICS
9. EXPERIMENT 9
• To predict protein structure by various tools.
• To visualise protein structure by iCn3D and RasMol.
10. CONCLUSION
INTRODUCTION
HLA-B major histocompatibility complex, class I, B [ Homo sapiens (human) ]
HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer
consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is
anchored in the membrane. Class I molecules play a central role in the immune system by
presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in
nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon
1 encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both
bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane
region and exons 6 and 7 encode the cytoplasmic tail.
There are 8 exons, of which 7 are coding.
EXPERIMENT 1
AIM: TO FIND GC CONTENT OF THE GIVEN NUCLEOTIDE SEQUENCE
SOFTWARE USED: PYTHON
STEP 1:
Download your DNA sequence from NCBI.
STEP 2:
Write the python code and run.
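The report does not reproduce the code used in this step. A minimal sketch of how GC content can be computed is given here; the function names, the FASTA helper and the toy sequence are assumptions made for illustration, not the original script.

```python
from collections import Counter


def gc_content(sequence: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    counts = Counter(sequence.upper())
    # Only A, C, G and T are counted; N and other symbols are ignored.
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["G"] + counts["C"]) / total


def read_fasta_sequence(path: str) -> str:
    """Join the sequence lines of a single-record FASTA file (header lines skipped)."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


# Toy sequence for demonstration only; it is not the HLA-B gene.
demo = "ATGCGCGTTA"
print(f"GC content: {gc_content(demo):.3f}%")  # GC content: 50.000%
```

For the downloaded sequence, the string returned by `read_fasta_sequence` can be passed to `gc_content` in the same way.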
RESULT:
Interpretation
1. The nucleotide sequence has 58.127% GC content. A higher GC content implies higher
stability; hence our DNA is moderately stable.
2. A = 634, T = 722, G = 1025, C = 924
EXPERIMENT 2
AIM: To find the mRNA sequence, the position of the start codon and the reverse complement of the
given nucleotide sequence
Tool used: Python
STEP 1:
Download DNA sequence from NCBI and save it as dna.txt.
Step 2:
Write the python code for reverse complement.
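The code itself is not shown in the report. The following is a minimal sketch of the three operations named in the aim. It assumes the downloaded sequence is the coding (sense) strand, so the mRNA is obtained by replacing T with U; the function names and the toy sequence are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """mRNA of a coding-strand DNA sequence: T is replaced by U."""
    return dna.upper().replace("T", "U")


def start_codon_positions(mrna: str) -> list[int]:
    """1-based positions of every AUG in the mRNA, in any reading frame."""
    return [i + 1 for i in range(len(mrna) - 2) if mrna[i:i + 3] == "AUG"]


# Toy sequence for demonstration only; it is not the contents of dna.txt.
dna = "CCATGGTAATGC"
mrna = transcribe(dna)
print(reverse_complement(dna))      # GCATTACCATGG
print(mrna)                         # CCAUGGUAAUGC
print(start_codon_positions(mrna))  # [3, 9]
```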
RESULT:
1. The reverse complement and mRNA sequence were obtained.
RESULT
1. Screenshot of GENSCAN OUTPUT
2. There are 8 exons, out of which 7 are coding.
Gene location 31357179: 31353875
INTERPRETATION
• The initial exon spans positions 31-94 with respect to the transcription start site.
• The terminal exon spans positions 2654-2697 with respect to the transcription start
site.
• The poly-A signal spans positions 3283-3288 with respect to the transcription start site.
EXPERIMENT 4
Step 1:
Open ORF finder and enter the query sequence.
Result
INTERPRETATION
• Number of ORFs found = 30
• The longest ORF is ORF 18, with a length of 522 nt (173 aa), found in frame 1
on the reverse strand.
• The longest ORF runs from position 2591 to 2070.
• Longest ORF in each frame:
FRAME   LABEL    STRAND   LENGTH (nt)
1       ORF 18   -        522
2       ORF 26   +        348
3       ORF 9    -        474
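To illustrate what an ORF search does, here is a minimal sketch that scans all six reading frames for stretches running from ATG to the first in-frame stop codon. It is a simplified illustration, not ORF finder's implementation: the frame numbering, the `min_len` parameter and the toy sequence are assumptions. Lengths include the stop codon, and reverse-strand ORFs are reported with start greater than end, as in the result above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_len: int = 30) -> list[tuple]:
    """Return (strand, frame, start, end, length) for each ATG...stop ORF.

    Coordinates are 1-based on the forward strand.
    """
    seq = dna.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length >= min_len:
                        if strand == "+":
                            coords = (start + 1, i + 3)
                        else:
                            # Map reverse-strand indices back to the forward strand.
                            coords = (n - start, n - i - 2)
                        orfs.append((strand, frame + 1, coords[0], coords[1], length))
                    start = None
    return orfs


# Toy sequence for demonstration only.
print(find_orfs("CCATGAAATTTTAGGG", min_len=6))  # [('+', 3, 3, 14, 12)]
```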
EXPERIMENT 5
Step 1.
Go to NCBI and find the nucleotide sequence of your gene.
Step 2:
Go to the Orthologs option, download the orthologous sequences of other species and open the
file in WordPad.
Step 3:
Open EMBOSS dotmatcher and compare the Homo sapiens gene with the gene of each other
species, one at a time.
Add titles to the X and Y axes:
X = Homo sapiens gene, Y = other species gene
Threshold value = 40 (default = 23)
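The effect of the threshold can be illustrated with a minimal dot-plot sketch. It is a simplification and not dotmatcher's actual scoring: here a dot is drawn when two windows share at least `threshold` identical positions, whereas dotmatcher sums substitution-matrix scores over the window. The `window` parameter, the function names and the toy sequence are assumptions.

```python
def dot_matrix(seq_x: str, seq_y: str, window: int = 10, threshold: int = 8) -> list[tuple[int, int]]:
    """Return (i, j) window start pairs whose windows have >= threshold identical positions."""
    x, y = seq_x.upper(), seq_y.upper()
    dots = []
    for i in range(len(x) - window + 1):
        for j in range(len(y) - window + 1):
            matches = sum(a == b for a, b in zip(x[i:i + window], y[j:j + window]))
            if matches >= threshold:
                dots.append((i, j))
    return dots


def render(dots: list[tuple[int, int]], n_x: int, n_y: int) -> str:
    """Draw the plot as text: X runs left to right, Y runs top to bottom."""
    marked = set(dots)
    return "\n".join(
        "".join("*" if (i, j) in marked else "." for i in range(n_x))
        for j in range(n_y)
    )


# Toy sequence compared with itself: identical sequences give a full main diagonal.
seq = "ACGTACGTTGCA"
dots = dot_matrix(seq, seq, window=4, threshold=4)
print(render(dots, len(seq) - 3, len(seq) - 3))
```

Raising the threshold removes weak, scattered matches, which is why a stricter value gives a cleaner diagonal with less background noise.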
RESULT:
The results are as follows:
3. Mandrillus leucophaeus
4. Aotus nancymaae
INTERPRETATION
Aotus nancymaae shows a clear diagonal with less background noise and therefore high similarity.
Objective 2
To find score using pairwise alignment tool
Step 1:
Go to EMBOSS Needle and add the Homo sapiens nucleotide sequence and each ortholog, one by one,
to get the result.
Results
1.Mustela putorius furo
Score: 5087.0
3.Mandrillus leucophaeus
Score: 5325.5
4.Aotus nancymaae
Score: 6577.5
INTERPRETATION
Aotus nancymaae has the highest score, which is 6577.5.
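EMBOSS Needle performs a global (Needleman-Wunsch) alignment. A minimal sketch of how a global alignment score arises is given here. It assumes a simplified scheme (match +5, mismatch -4, linear gap penalty -10) rather than Needle's substitution matrix and affine gap penalties, so its scores are not comparable with the values reported above.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 5, mismatch: int = -4, gap: int = -10) -> int:
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Toy sequences for demonstration only.
print(needleman_wunsch_score("ACGT", "ACGT"))  # 20
print(needleman_wunsch_score("ACGT", "ACT"))   # 5
```

In the second call the best alignment has three matches and one gap (15 - 10 = 5).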
EXPERIMENT 6
Aim:
1. To find the score of nucleotide sequences using a pairwise alignment tool (EMBOSS WATER)
2. To find the score of protein sequences using a pairwise alignment tool (EMBOSS WATER)
3. To find the score of protein sequences using a pairwise alignment tool (EMBOSS NEEDLE)
4. BLASTN
Objective 1
Step 1.
Go to NCBI and find the nucleotide sequence of your gene.
Step 2
Go to the Orthologs option, download the orthologous sequences of other species and open the file in
WordPad.
Step 3:
Go to EMBOSS Water and add the Homo sapiens nucleotide sequence and each ortholog, one by one, to get the result.
Results
1.Mustela putorius furo
Score: 2780.0
2.Galeopterus variegatus patr
Score: 3396.0
3.Mandrillus leucophaeus
Score: 4048.0
4.Aotus nancymaae
Score: 4807.0
INTERPRETATION:
1. The highest score is that of Aotus nancymaae (4807.0); it is therefore the most closely related sequence.
2. The lowest score is that of Mustela putorius furo (2780.0); it is therefore the least related sequence.
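EMBOSS Water performs a local (Smith-Waterman) alignment, which scores only the best-matching region of the two sequences. A minimal sketch follows. It assumes a simplified scheme (match +5, mismatch -4, linear gap penalty -10) instead of Water's substitution matrix and affine gap penalties, so its scores are not comparable with the values reported above.

```python
def smith_waterman_score(a: str, b: str, match: int = 5, mismatch: int = -4, gap: int = -10) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can restart anywhere.
            cell = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Toy sequences sharing only the region "ACG".
print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))  # 15
```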
AIM 2
To find the score of protein sequences using a pairwise alignment tool (EMBOSS WATER)
Step 1
Download the protein sequence from UniProt.
Step 2
Use EMBOSS Water to find the score.
Step 3
Compare the human sequence with each other species, one by one, to find the score.
Results
Score: 48.5
Score: 60.0
Score: 51.5
Human vs Bos taurus
Score: 42.5
Score: 56.5
INTERPRETATION
Score: 34.5
Score: 44.5
Score: 36.5
4. Human vs Bos taurus
Score: 31.5
Score: 40.5
INTERPRETATION
Top 5 eukaryote
SCREENSHOTS OF RESULTS
Max score:784
SCREENSHOTS OF RESULT
EXPERIMENT 7
AIM:
AIM 1
To get 5 different homologous protein sequences using blastp.
STEP 1
Go to NCBI and then select BLAST.
In BLAST page select Protein Blast.
STEP 2
STEP 3
Go to TAXONOMY and then ORGANISM.
Select 5 organisms other than Homo sapiens (the source of the experiment gene).
RESULT
AIM 2
To perform multiple sequence alignment of the homologous protein sequences and analyze the
result by guide tree, phylogenetic analysis and MView
STEP 1
Open CLUSTAL OMEGA and add the homologous protein sequence found in previous experiment.
RESULT
1. MULTIPLE SEQUENCE ALIGNMENT
2. GUIDE TREE
3. PHYLOGENETIC TREE
4. M VIEW
             Pongo    Homo     Pan      Hylobates  Mandrillus  Macaca
Pongo        100.00
Homo          95.35   100.00
Pan           94.96    96.90   100.00
Hylobates     92.64    93.02    92.25   100.00
Mandrillus    93.02    93.41    93.41    94.96     100.00
Macaca        91.86    93.02    92.25    93.02      97.29      100.00
INTERPRETATION
• The smallest distances (highest scores) are between Mandrillus leucophaeus and Macaca
fascicularis (97.29) and between Pan troglodytes and Homo sapiens (96.90).
• The maximum distance (lowest score, 91.86) is between Pongo pygmaeus and Macaca fascicularis.
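The closest and most distant pairs can be read off the matrix programmatically. The sketch below uses the values listed above and assumes the columns follow the same order as the rows, as is usual for a lower-triangular matrix; the variable names are illustrative.

```python
names = ["Pongo", "Homo", "Pan", "Hylobates", "Mandrillus", "Macaca"]

# Lower-triangular matrix of scores (%), one row per genus, diagonal included.
lower = [
    [100.00],
    [95.35, 100.00],
    [94.96, 96.90, 100.00],
    [92.64, 93.02, 92.25, 100.00],
    [93.02, 93.41, 93.41, 94.96, 100.00],
    [91.86, 93.02, 92.25, 93.02, 97.29, 100.00],
]

# Collect every off-diagonal pair as (row name, column name) -> score.
pairs = {
    (names[i], names[j]): lower[i][j]
    for i in range(len(names))
    for j in range(i)
}

closest = max(pairs, key=pairs.get)
farthest = min(pairs, key=pairs.get)
print(closest, pairs[closest])    # ('Macaca', 'Mandrillus') 97.29
print(farthest, pairs[farthest])  # ('Macaca', 'Pongo') 91.86
```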
AIM 3
To find the structure of the protein sequence using blastp and PDB.
Step 1
Go to blastp and change the database parameter to PDB (Protein Data Bank); the following results are
observed.
Results
4. PDB SIMILAR STRUCTURES
4P57 (Homo sapiens)
1KTD (Columba livia)
2FSE (Mus musculus)
6KVM (Gallus gallus)
6A2B (Xenopus laevis)
EXPERIMENT 8
Objective
1. NUCLEOTIDE OPERATION
• Multiple sequence alignment by MUSCLE ALGORITHM
• To find out distance matrix.
• To take out phylogeny tree and also different forms of it.
2. PROTEIN OPERATION
• Multiple sequence alignment by MUSCLE ALGORITHM
• To find out distance matrix.
• To take out phylogeny tree and also different forms of it.
• TO FIND MSA
Step 1:
Go to NCBI and download the nucleotide sequences of orthologs of your gene (at least 15) and save the file
in FASTA format.
Also download MEGA X.
Step 2:
Go to MEGA X and align the downloaded sequences.
Step 3:
Select the MUSCLE algorithm and choose Align DNA. The MSA is then performed using the MUSCLE
algorithm.
INTERPRETATION
Number of conserved sites: 828
• DISTANCE MATRIX
Step 1:
Go to DISTANCE and set the parameters
RESULT
INTERPRETATION
1. The minimum distance, and therefore the maximum similarity, is between goat and zebu cattle; the
distance is 0.0370.
2. The maximum distance, and therefore the minimum similarity, is between cattle and zebrafish; the
distance is 0.7728.
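The report does not state which distance model was selected in MEGA X. As an illustration of what a pairwise distance measures, here is a minimal sketch of the simplest model, the p-distance (the proportion of compared sites that differ). The choice of model, the gap handling, the function names and the toy alignment are all assumptions.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue  # ignore sites where either sequence has a gap
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every pair of sequences in an alignment (name -> aligned sequence)."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


# Toy alignment for demonstration only: 5 ungapped sites, 1 difference.
print(p_distance("ACGT-A", "ACCTTA"))  # 0.2
```

A smaller distance means more similar sequences, so the pair with the minimum distance is the most closely related.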
INTERPRETATION
1. The following organisms are closely related: (human, cattle), (pig, zebu cattle, goat),
(mouse, rat), (chicken, quail).
2. The less related organisms are burton and killifish.
1. STRAIGHT
2. CURVED
3. CIRCULAR
Step 2:
Perform MUSCLE ALGORITHM for MSA.
INTERPRETATION
The number of conserved sites is 247.
DISTANCE MATRIX
Step 1:
Go to DISTANCE in MEGA X and set the parameters.
Result
INTERPRETATION
1. The maximum distance is 0.1207, which is between quail and frog.
2. With the results shown to 10 decimal places, several pairs have a distance of 0.0:
(chimpanzee, human), (monkey, human), (pig, dog), (goat, dog), (monkey, chimpanzee), (pig, goat).
PHYLOGENETIC TREE
Step 1.
Go to MSA --> Phylogeny --> Construct UPGMA tree and set the following parameters.
RESULT:
Interpretation
• Chicken and quail are closely related.
• Pig, goat and dog are not related.
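The UPGMA method used for this tree repeatedly joins the two closest clusters and averages their distances to all other clusters. A minimal sketch of that procedure follows; it returns the topology only (no branch lengths). The function name and the toy distances are hypothetical and are not the MEGA X output.

```python
from itertools import combinations


def upgma(labels: list[str], distances: dict[frozenset, float]) -> str:
    """Cluster labels by UPGMA and return the tree topology as a Newick string.

    distances maps frozenset({a, b}) to the distance between labels a and b.
    """
    sizes = {label: 1 for label in labels}  # cluster name -> number of leaves
    dist = dict(distances)
    while len(sizes) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(sorted(sizes), 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        # Distance from the new cluster to each other cluster: size-weighted average.
        for other in sizes:
            if other in (a, b):
                continue
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes)) + ";"


# Hypothetical distances chosen only to illustrate the clustering.
toy = {
    frozenset(("chicken", "quail")): 0.02,
    frozenset(("mouse", "rat")): 0.04,
    frozenset(("chicken", "mouse")): 0.30,
    frozenset(("chicken", "rat")): 0.30,
    frozenset(("quail", "mouse")): 0.30,
    frozenset(("quail", "rat")): 0.30,
}
print(upgma(["chicken", "quail", "mouse", "rat"], toy))
# ((chicken,quail),(mouse,rat));
```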
Different forms:
1. straight
2. curved
3. circle
Aim:
GOR4 TOOL:
Like the propensity-based Chou-Fasman method, this is a statistical method, but it uses information theory and takes a window of neighbouring residues into account.
Step 1: Open GOR4 tool and enter the amino acid sequence.
Result:
1. Configuration of residues
3. Graphical representation
INTERPRETATION: NUMBER OF
BETA SHEET: 74 (20.44%)
BETA TURN: 0
RANDOM COIL: 170 (48.96%), which is the highest; the structure therefore has many coils, which is
typical of protein structures.
JPRED:
Step 1: Open the JPRED tool and enter the amino acid sequence.
Result:
Helices are marked as red tubes and sheets as dark green arrows.
PHD TOOL:
Step 1: Open the PHD tool and enter the amino acid sequence.
RESULT:
3.GRAPHICAL REPRESENTATION
INTERPRETATION:
Beta sheet: 102 (28.12%)
Beta turn: 0
Random coil: 161 (44.48%), which is the highest; the structure therefore has many coils, which is
typical of protein structures.
PREDATOR TOOL
It follows DSSP (Define Secondary Structure of Proteins) secondary-structure assignments, so it is
considered better than the previous methods.
Step 1: Open the PREDATOR tool and enter the query sequence.
RESULT
1.Configuration of residue
2.Tabular Representation
3.Graphical representation
INTERPRETATION
BETA SHEET: 94 (25.97%)
BETA TURN: 0
RANDOM COIL: 176 (48.62%), which is the highest; the structure therefore has many coils, which is
typical of protein structures.
PREDICT PROTEIN TOOL
Step 1:
RESULTS
2.ALIGNED PROTEIN
3.AMINO ACID COMPOSITION
INTERPRETATION
1. There is a high proportion of loops, whereas the proportions of helices and strands are almost the same.
2. There are 2 transmembrane helix regions: one of 15 residues (5-19) and the other of 21 residues (310-330).
PSIPRED TOOL
INTERPRETATION
COIL: 151
HELICES: 118
STRAND: 93
The consensus is that coil occupies the largest portion, followed by helices and then strands.
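The PSIPRED counts can be turned into percentages for easier comparison with the other tools. This is a minimal sketch using the three counts reported above; the function name is illustrative.

```python
def composition(counts: dict[str, int]) -> dict[str, float]:
    """Convert residue counts per secondary-structure state into percentages."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


psipred = {"coil": 151, "helix": 118, "strand": 93}
print(sum(psipred.values()))  # 362
print(composition(psipred))   # {'coil': 41.71, 'helix': 32.6, 'strand': 25.69}
```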
AIM 2:
iCn3D:
Step 1:
Open PDB, enter your gene name (HLA-B), apply the Homo sapiens filter and choose the crystal
structure which has the maximum e value.
Step 2:
RESULT:
1.RIBBON
2.STRAND
6.Backbone
7.B-factor tube
8.Lines
9.Stick
10.Ball and stick
11.Sphere
RasMol
Step 1: Download RasMol and open the PDB format file of the chosen organism.
RESULT:
1.Wireframe
2.Backbone
3.STICKS
4.SPACE FILLED
7.STRANDS
8.CARTOON
9.MOLECULAR SURFACE
INTERPRETATION
• Two tools were used to view the tertiary/quaternary structure of the protein: iCn3D (online) and
RasMol (offline). Different display styles were observed, such as ribbon, cylinder and plate,
strand, ball and stick, and lines. The molecular surface model is the heaviest of all.
CONCLUSION