Raj Bharti
on
Computational Analysis of Genomes and Proteomes
Submitted to
Submitted By
Raj Bharti
B.Tech (Biotechnology); 2nd Year
EXPERIMENT NO. 01
Aim: To retrieve five nucleotide and five protein sequences in FASTA format from NCBI
and EMBL.
Date: 13/09/2017
Theory:
NCBI:
The National Center for Biotechnology Information (NCBI) is part of the United States
National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). The
NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation
sponsored by Senator Claude Pepper.
The NCBI houses a series of databases relevant to biotechnology and biomedicine and is
an important resource for bioinformatics tools and services. Major databases
include GenBank for DNA sequences and PubMed, a bibliographic database for the
biomedical literature. Other databases include the NCBI Epigenomics database. All these
databases are available online through the Entrez search engine.
NCBI was directed by David Lipman, one of the original authors of the BLAST sequence
alignment program and a widely respected figure in bioinformatics. He also led an
intramural research program, including groups led by Stephen Altschul (another BLAST
co-author), David Landsman, Eugene Koonin (a prolific author on comparative genomics), John
Wilbur, Teresa Przytycka, and Zhiyong Lu. David Lipman stood down from his post in May
2017.
EMBL:
The ENA (European Nucleotide Archive) is produced and maintained by the European Bioinformatics Institute.
FASTA format:
A sequence in FASTA format begins with one line starting with a ">" sign, followed by a sequence identification code.
The code may optionally be followed by a textual description of the sequence. Since this description is not part of the
official description of the format, software can choose to ignore it when it is present.
The lines after this header line hold the sequence itself.
A file in FASTA format may comprise more than one sequence. The FASTA format is
sometimes also referred to as the "Pearson" format (after William R. Pearson, an
author of the FASTA program and of the format).
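The description above can be sketched in a few lines of Python. This is only a minimal illustration of how the header line and the sequence lines fit together, not the parser used by NCBI or EMBL; the function names parse_fasta and _make_record are hypothetical, and the example sequence is shortened.

```python
from typing import Iterator, List, Tuple


def _make_record(header: str, chunks: List[str]) -> Tuple[str, str, str]:
    """Split a header into identifier and optional description, join the sequence lines."""
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description, "".join(chunks)


def parse_fasta(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, description, sequence) for every record in FASTA text."""
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith(">"):
            # A new ">" line closes the previous record, if there was one.
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


# Shortened example: two records in one file.
example = """>JN804612.1 Anonychonitis freyi isolate ANY3 COI gene, partial cds
GGATAATTTCTCATATTATT
AGACAAGAAACTAG
>seq2
ACGT
"""
for identifier, description, sequence in parse_fasta(example):
    print(identifier, len(sequence), description)
```

Each record is returned with its identifier, its optional description and the sequence joined into a single string, so a file with several sequences gives several records.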
Links used:
www.ncbi.nlm.nih.gov
www.ebi.ac.uk
PROCEDURE:
NUCLEOTIDE SEQUENCE(NCBI)
Switched on the system, checked the internet connection and opened MS Office Word.
Opened the internet browser and went to NCBI (www.ncbi.nlm.nih.gov).
Typed the desired gene name (ANY3), clicked on the search button and selected the appropriate item
from the resulting gene database.
When the page of the particular gene was displayed, clicked on the FASTA option to get the gene
sequence in FASTA format.
Copied the entire sequence and pasted it into WordPad / Notepad. Similarly, the process was
repeated for the other four nucleotide sequences (AADC, AHC, AR, CCA).
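The same retrieval can also be done without the browser. The sketch below is a hedged illustration, assuming the NCBI E-utilities efetch service with the parameters db, id, rettype and retmode; the helper names efetch_fasta_url and fetch_fasta are hypothetical and were not part of the practical, which used the website by hand.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build the efetch URL that returns one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession: str, db: str = "nuccore", timeout: int = 30) -> str:
    """Download the FASTA text for an accession (needs an internet connection)."""
    with urlopen(efetch_fasta_url(accession, db), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Accession taken from the AHC result in this report; only the URL is built here.
print(efetch_fasta_url("NM_000475.4"))
```

Calling fetch_fasta("NM_000475.4") would download the record itself; for protein records the db argument would be "protein" instead of "nuccore".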
NUCLEOTIDE SEQUENCE (EMBL)
Switched on the system, checked the internet connection and opened MS Office Word.
Opened the internet browser and went to EMBL
(www.ebi.ac.uk).
Typed the desired gene name (ANY3), clicked on the search button and selected the appropriate
item from the resulting gene database.
When the page of the particular gene was displayed, clicked on the FASTA option to get the gene
sequence in FASTA format.
Copied the entire sequence and pasted it into WordPad / Notepad. Similarly, the process was
repeated for the other four nucleotide sequences (AADC, AHC, AR, CCA).
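EMBL (ENA) records come back with a header of the form ">ENA|accession|accession.version description", as seen in the results of this experiment. The sketch below splits such a header into its parts and builds a download address. The URL pattern is an assumption based on the ENA Browser API and should be checked against the current ENA documentation; the function names are hypothetical.

```python
from urllib.parse import quote

# Assumed base address of the ENA Browser API FASTA endpoint.
ENA_FASTA = "https://www.ebi.ac.uk/ena/browser/api/fasta/"


def ena_fasta_url(accession: str) -> str:
    """Build the (assumed) ENA Browser API address for one accession."""
    return ENA_FASTA + quote(accession)


def parse_ena_header(header: str) -> dict:
    """Split an ENA FASTA header into accession, versioned accession and description."""
    text = header.lstrip(">").strip()
    ident, _, description = text.partition(" ")
    fields = ident.split("|")
    if len(fields) != 3 or fields[0] != "ENA":
        raise ValueError("not an ENA-style FASTA header: " + header)
    return {"accession": fields[1], "version": fields[2], "description": description}


# Header copied from the AADC result obtained from EMBL.
header = ">ENA|AB037498|AB037498.1 Pan troglodytes AADC gene for aromatic L-amino acid decarboxylase, exon 1."
info = parse_ena_header(header)
print(info["accession"], info["version"])
print(ena_fasta_url(info["version"]))
```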
Switched on the system, checked the internet connection and opened MS Office Word.
Opened the internet browser and went to NCBI (www.ncbi.nlm.nih.gov).
Typed the desired gene name (ANY3), clicked on the search button and selected the appropriate
item from the resulting gene database.
When the page of the particular gene was displayed, clicked on the FASTA option to get the gene
sequence in FASTA format.
Copied the entire sequence and pasted it into WordPad / Notepad. Similarly, the process was
repeated for the other four sequences (AADC, AHC, AR, CCA).
NUCLEOTIDES RESULT ( NCBI)
AADC GENE
AHC GENE
>NM_000475.4 Homo sapiens nuclear receptor subfamily 0 group B member 1 (NR0B1), mRNA
CGGGCGCCGCGGGCCATGGCGGGCGAGAACCACCAGTGGCAGGGCAGCATCCTCTACAACATGCTTA
TGACGCGAAGCAAACGCGCGCGGCTCCTGAGGCTCCAGAGACGCGGCTGGTGGATCAGTGCTGGGGC
TGTTCGTGCGGCGATGAGCCCGGGGTGGGCAGAGAGGGGCTGCTGGGCGGGCGGAACGTGGCGCTCC
TGTACCGCTGCTGCTTTTGCGGTAAAGACCACCCACGGCAGGGCAGCATCCTCTACAGCATGCTGACG
AGCGCAAAGCAAACGTACGCGGCACCGAAGGCGCCCGAGGCGACGCTGGGTCCGTGCTGGGGCTGTT
CGTGCGGCTCTGATCCCGGGGTGGGCAGAGCGGGGCTTCCGGGTGGGCGGCCCGTGGCACTCCTGTAC
CGCTGCTGCTTTTGTGGTGAAGACCACCCGCGGCAGGGCAGCATCCTCTACAGCTTGCTCACTAGCTCA
AAGCAAACGCACGTGGCTCCGGCAGCGCCCGAGGCACGGCCAGGGGGCGCGTGGTGGGACCGCTCCT
ACTTCGCGCAGAGGCCAGGGGGTAAAGAGGCGCTACCAGGCGGGCGGGCCACGGCGCTTCTGTACCG
CTGCTGCTTTTGCGGTGAAGACCACCCGCAGCAGGGCAGCACCCTCTACTGCGTGCCCACGAGCACAA
ATCAAGCGCAGGCGGCTCCGGAGG
AGCGGCCGAGGGCCCCCTGGTGGGACACCTCCTCTGGTGCGCTGCGGCCGGTGGCGCTCAAGAGTCCA
CA
GGTGGTCTGCGAGGCAGCCTCAGCGGGCCTGTTGAAGACGCTGCGCTTCGTCAAGTACTTGCCCTGCT
TCCAGGTGCTGCCCCTGGACCAGCAGCTGGTGCTGGTGCGCAACTGCTGGGCGTCCCTGCTCATGCTTG
AGCTGGCCCAGGACCGCTTGCAGTTCGAGACTGTGGAAGTCTCGGAGCCCAGCATGCTGCAGAAGATC
CTCACCACCAGGCGGCGGGAGACCGGGGGCAACGAGCCACTGCCCGTGCCCACGCTGCAGCACCATT
TGGCACCGCCGGCGGAGGCCAGGAAGGTGCCCTCCGCCTCCCAGGTCCAAGCCATCAAGTGCTTTCTT
TCCAAATGCTGGAGTCTGAACATCAGTACCAAGGAGTACGCCTACCTCAAGGGGACCGTGCTCTTTAA
CCCGGACGTGCCGGGCCTGCAGTGCGTGAAGTACATTCAGGGACTCCAGTGGGGAACTCAGCAAATA
CTCAGTGAACACACCAGGATGACGCACCAAGGGCCCCATGACAGATTCATCGAACTTAATAGTACCCT
TTTCCTGCTGAGATTCATCAATGCCAATGTCATTGCTGAACTGTTCTTCAGGCCCATCATCGGCACAGT
CAGCATGGATGATATGATGCTGGAAATGCTCTGTACAAAGATATAAAGTCATGTGGGCCACACAAGTG
CAGTAGTGCAGTTCACCATGAGGGAAGAATAAAGAGCTGTGGGCAAAAGAGTGTAAAATATTTTAAA
ATAA
AR GENE
GCTGGGCACCTGGAAGTCACCACCGGGCCAGGTGACCGAGGCCGTGAAGACAGCCATCGACCTCGGG
TACCGCCACATCACTGCGCCCACGTGTACCAGAACGAGAACGAGGTCGGGGTGGCCCTGCAGGAGAA
GCTCAAGGAGCAGGTGGTGAAACGTGAGGAGCTCTTCATCGTCAGCAAGCTGTGGTGCACGTCCCACG
ACAAGAGCCTGGTAAAAGGTGCCTGCCAGAAGACACTGAACGACCTGAAGTTGGACTACCTGGACCT
CTACCTTATCCACTGGCCGACGGGCTTTAAGCACGGCAGTGAGTATTTCCCCCTGGATGCGGCGGGCA
ACGTGATTCCCAGCGACACTGACTTTCTGGACACGTGGGAGGCCATGGAGGGGCTGGTGGACGAAGG
AATTTCAACCATCTGCAGATCGAGAGGATCCTAAACAAGCCGGGCTTAAAATACAAGCCGGCAGTTAA
CAGATCGAGTGCCACCCGTACCTAACTCAGGAGAAATTAATCCAGTACTGCCACTCCAAAGGCATCGT
GGTCACTGCCTACAGTCCCCTCGGCTCTCCCGACAGGCCCTGGGCAAAGCCCGAGGACCCTTCCCTCCT
GGAGACCCCAGGATCAAGGCGATTGCAGACAAGCACAAAAAAACCACCGCCCAGGTTCTGATCCGGT
TCCCCATGCAGAGGAACCTGGTGGTGATTCCCAAGTCCGTGACGCCGGCACGCATTGCTGAGAACTTC
CAGGTCTTTGACTTTGAACTGAGCAGCGAGGACATGACTACCTTACTGAGCTACAACAGGAACTGGAG
GGTCTGCGCCCTGGTGAGCTGTGCCTCTCACAAGGATTACCCCTTCCACGCCGAGTTCTGAAGCTGCGG
ATGCCGGCTCTTCCCCACGTCACGTGTGCCTGCTTTCCCTGCCTGACAAATCCTCGGAGCAGCCCAGCC
AGCCAGGGCCTGCTCGCAGGGATCTGGGAGTGAGCAGCACCATCAGTAGGTTAGAAGTCGCCGCCAG
TGTTTTCTTTGCCTTTCTTCTCGCCCAGCTGGGAAAAGTACAATTCTTCCGACCCAGGAGAAGCAAAAC
CTACGAAGTCAGAGTAGTGCCACTAACAGCTGAGTTTTGACTGCTTAGAACTATAATCCTTTCAGCCA
GACTTACTTTGCCTCCAATAAAAAGTGCTTTTGTGAGCCTGAACTTTCTTAATATTTTTACATGCAGAG
TATTTTTGTATTCAATTAAAGAAATAATTTTATTCCAAAAAAA
CCA GENE
AAATAGATACGTTTGTGCATACTATGCTGGTCCTACAACAAGCAGTCTTGTTGACAGAAAACACGGAC
AGTGATAAAAGTGCGGTACGTTTTGCTGCAATTTGTCATGATTTAGGCAAAGCCTTAACACCAAAAGA
AATATTGCCACATCATTATGGACATGAAAAAGCTGGTGTCATGCCGACAAGACGCTTATGTCAGCGCT
TTAAATTACCTCACCAATTCAAGATTTTGCAGAACTTTGTTGTGAATATCATTCGCACATACACAAAGC
CTTTGAATTACGTGCGGAGACAATATTGAAATTATTCAATCGTCTAGATGTCTGGCGTAAGTCGGAGC
GTTTTAAAGCACTTTTGTTAGTCTGTATTGCAGACACGCGTGGTAGGACCGGATTTGAACAAGTTGACT
ATCCACAACGTGAATTTCTCTGGCAACTTTATCAAAGTACTCTGCAGGTTAACGTGCAAGACATCATCC
AACAAGGTTTCCAGCAGCAAGCCAT
TCGTGATGAACTCAATCGCCGTCGTATAATCGCGATTAAACAGACACGCGCGGAAATCCTCCCGCGCT
TTACTAATCCGTGTTAA
ANY3 GENE
>JN804612.1 Anonychonitis freyi isolate ANY3 cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial
GGATAATTTCTCATATTATTAGACAAGAAACTAGAAAAAAGGAAACTTTTGGTACTTTAGGAATAATT
TATGCTATAATAACAATTGGACTTCTAGGTTTTATTGTTTGAGCTCACCATATATTTACAGTAGGTATG
GATGTAGATACACGAGCTTATTTTACATCAGCTACTATAATTATTGCCGTTCCAACAGGAATCAAAATT
TTTAGATGATTAGCTACTCTTCATGGATCTCAATTAAATTACTCACCCTCTCTACTATGAGCATTAGGA
TTTGTATTTCTATTCACCGTAGGGGGATTAACAGGAGTAATTCTTGCCAATTCTTCAATTGATATTATTC
TTCATGACACTTATTATGTAGTAGCACACTTCCATTATGTTTTATCAATAGGGGCTGTATTTGCTATTAT
AGCTGGGCTTGTACATTGATTTCCATTATTTACTGGTTTAACTATAAATCAAAAACTTTTAAAAATCCA
ATTTATAATTATATTTATCGGAGTTAATATAACCTTTTTTCCCCAACATTTTCTTGGATTAAGAGGAATA
CCCCGTCGTTATTCTGACTACCCTGACGCTTACACCACTTGAAATATTATTTCTTCAATTGGATCAATA
ATTTCTTTAATTAGAATTTTTATATTTTTATTTATTATTTGAGATAGATTTACCTCAATTCGTAAATCTAT
TATACCCTTAAATATACCTTCATCTATTGAATGATTACAAAAATTACCACCAG
RESULT OF EMBL
AADC
>ENA|AB037498|AB037498.1 Pan troglodytes AADC gene for aromatic L-amino acid decarboxylase, exon 1.
GGAGTCCTGCTCCTTCTATTGCACCCATCAACCAGGAGTGGGGGGAGGGGGTGGAGGTGGGGAAGAT
GATCCTCCCTGTTGCTGCCCCATGGTGGCAGGAGAGACTGAGCCCAAACCATGTTTTAGATGCTGATA
GGCTTAAGGGTAACAGCACAGGAGTTTGAGATGCATGCGGCTCAACACCTAATCTACATCTCACTTCA
CTTTCTCATCTGGGGAAGTGGGCTTGGGACCCTGAGCCTCCCGGGTATCACAGGGTCCTAATAGTCCCT
CACAGAAGGAGCAGACCCAGAGTGAGCACTCCCCAAATGCCACGCCGTCCCTTCCTCACTCTTGGAGT
GGAGCCTGGGGGTTCTCAGA
GTTGCTGGGAGAGTCCCAGGAGCCCTGGCCCCAAATCTGCATCCTACACAGTGCCTGGGA
ACACAGGGCCCATTTTTTCCTTGGCCTCTCCCCAGTCCCAGCAGGCCCTGATGCTCCTCT
CCATCCTGCTAGGATGGCTGTCTCCCCCTGGGGGCAGAGTGGGGGCAGGAGGTGGTGGGA
GTGGAGAGGAGAGAGAGAGGACAGAGAGCAAGTCACTCCCGGCTGCCTGTGAGTACTG
GGGTGGAGGGATGCTGCTCAGTAAATAATGCAGAGCCGGCAGCTCTGATTGGCTTCGGGG
AGGCAGACACTCTGTCTACATAAATGGCAATCACATCTTCTGTGCCTCTTAACTGTCACT
AHC
>ENA|AJ853475|AJ853475.1 Nicotiana glauca partial mRNA for putative adenosylhomocysteinase (ahc gene)
CTTTGCTTTTCCCCGCTATTAACGTTAACGACTCTGTTACCAAGAGCAAGTTCGACAACT
TGTACGGATGCCGCCATTCACTGCCCGATGGTCTCATGAGGGCTACTGATGTTATGATTG
CTGGAAAGGTTGCCCTTGTTGCTGGTTATGGAGATGTCGGAAAGGGTTGTGCTGCTGCCT
TGAAACAAGCTGGTGCCCGTGTGATTGTGACCGAGATTGACCCGATCTGTGCTCTCCAAG
CTACCATGGAAGGTCTCCAAGTTCTTACTCTTGAGGACGTTGTTTCTGATGTCGATATCT
TCGTCACCACAACCGGTAACAAGGACATCATCATGGTTGACCACATGAGGAAGATGAAGA
ACAATGCCATTGTTTGCAACATTGGTCACTTTGACAATGAAATCGACATGCTCGGTCTCG
AGACCTACCCTGGTGTCAAGAGGATCACAATTAAGCCTCAAACCGACAGATGGGTTTTCC
CCGACACCAACAGTGGCATCATTGTCTTGGCCGAGGGTCGTCTCATGAACTTGGGATGTG
CAACTGGACACCCTAGTTTTGTGATGTCTTGCTCATTCACTAACCAAGTCATTGCCCAAC
TCGAGTTGTGGAATGAGAAGAGCAGTGGCAAGTATGAGAAGAAGGTGTACGTCTTGCCAA
AACACCTCGACGAGAAGGTTGCTGCACTTCATCTTGGAAAGCTCGGAGCCAAGCTTACCA
AACTTTCCAAGGATCAAGCTGACTACATTAGCGTACCAGTTGAAGGTCCTTACAAGCCTG
CTCACTACAGGTACTGAGTGAAGACAAATCGACAGAGAAGAACAACGTTGTTGCAGCATG
ATTGTTTTGCATTTAATACTTTGATTTTTGTTTAGGATACTAGTATTTTGAATATTGTTG
GTGATATATTTGGGAGGAAGTAGCATGTTTTGCTGGAAAAGATATGGTCTTATATGAAAG
TAAGACCAAAATGTGTTGAATAAGATTATGGTTGGTGTGAAAAAAAAAA
AR
>ENA|KT990095|KT990095.1 UNVERIFIED: Eugenia sp. AR-2016 NADH dehydrogenase subunit F-like (ndhF) gene, partial sequence; plastid.
AATCGGCACAATTCCCCCTCCATGTATGGTTACCTGATGCCATGGAAGGTCCTACTCCTATTTCGGCTA
TACATGCCGCTACTATGGTAGCAGCGGGCATTTTTCTTGTAGCTCGACTTCTTCCTCTTTTTATAATCAT
ACCTTACATAATGAATTTCATATCCTTAATAGGTATAATAACAGTATTATTAGGGGCTACTTTAGCTCT
TGCTCAAAAAGATATTAAAAGAGGTTTAGCTTATTCTACAATGTCTCAATTGGGTTATATGATGTTAGC
TCTAGGTATGGGGTCTTATCGAGTCGCTTTATTTCATTTGATTACTCATGCTTATTCAAAAGCATTGTTG
TTTTTAGGATCCGGATCAATTATTCATTCAATGGAAGCTATTGTTGGATATTCTCCAGATAAAAGTCAG
ATATGGTTCTTATGGGAGGTTTAAAAAAGCATGTACCAATTACAAAAACTGCTTTTTTAGTAGGTACAC
TTTCTCTTTGTGGTATTCCTCCACTTGCTTGTTTTTGGTCCAAAGATGAAATTCTTAATGATAGTTGGTT
GTATTCACCTATTTTCGCAATAATAGCTTGTTCTACAGCAGGATTAACCGCATTTTATATGTTTCGNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNTATGATAGTTGGTTGTATTCACCTATTTTCGCAATAATAGCTTGTTCTACAGCAGGATTA
ACCGCATTTTATATGTTTCGGATCTATTTACTTACTTTTGAGGGACATTTCAATGTTCATTTTCAAAATT
ACAGTGGTCAAAAAAGTAGTTTCTACTATTCAATATCTCTATGGGGAAAAGAAGTACCAAAAACGATT
AAAAACAATTTTCATTTATTAAGTTTATTAACAATGAATAATAATGAAAGGGCTTCTTTTTTTTGGAAA
AGACATATCAAATTGGTGGTAATGGAAAAAACAGGATGCACCCCTTTATTACTATTACTCATTTTGGTA
CTAAAAATACTTTCTCTTATCCTCATGAATCGGACAATACTATGCTATTTTCCATGGTTATATTAGTGCT
ATTTACTTTGTTTGTTGGAGTCGTAGGAATTCCCTTTTCTTTTAATCAAGAAGGAATTCATTTGGATATA
TTATCCAAATTGTTAAATCCGTCTATAAACCTTTTACATCAGAATTCAAATAATTCTGTGGATTGGTAT
GAATTTGTGACAAATGCAAGTTTTTCTGTCAGTATAGCTTTTTTCGGAATATTTATATCGTCTTTTTTAT
ATAAGCCTATTTATTCATCTTTACAAAATTTGAACTTACTAAATTCGTTTTCTAAAAGAGGTCCTAATA
GAATTTTAGGGGACAAAATAATAAATGTGATATATGATTGGTCATATAATCGTGGTTACATAGATGCT
TTTTATACAATATCCTTAACTCAGGGTATAAGGGGACTAGCTGAACTAATTCATTTTTTTT
CCA
ANY3
>ENA|JN804612|JN804612.1 Anonychonitis freyi isolate ANY3 cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial.
GGATAATTTCTCATATTATTAGACAAGAAACTAGAAAAAAGGAAACTTTTGGTACTTTAGGAATAATT
TATGCTATAATAACAATTGGACTTCTAGGTTTTATTGTTTGAGCTCACCATATATTTACAGTAGGTATG
GATGTAGATACACGAGCTTATTTTACATCAGCTACTATAATTATTGCCGTTCCAACAGGAATCAAAATT
TTTAGATGATTAGCTACTCTTCATGGATCTCAATTAAATTACTCACCCTCTCTACTATGAGCATTAGGA
TTTGTATTTCTATTCACCGTAGGGGGATTAACAGGAGTAATTCTTGCCAATTCTTCAATTGATATTATTC
TTCATGACACTTATTATGTAGTAGCACACTTCCATTATGTTTTATCAATAGGGGCTGTATTTGCTATTAT
AGCTGGGCTTGTACATTGATTTCCATTATTTACTGGTTTAACTATAAATCAAAAACTTTTAAAAATCCA
ATTTATAATTATATTTATCGGAGTTAATATAACCTTTTTTCCCCAACATTTTCTTGGATTAAGAGGAATA
CCCCGTCGTTATTCTGACTACCCTGACGCTTACACCACTTGAAATATTATTTCTTCAATTGGATCAATA
ATTTCTTTAATTAGAATTTTTATATTTTTATTTATTATTTGAGATAGATTTACCTCAATTCGTAAATCTAT
TATACCCTTAAATATACCTTCATCTATTGAATGATTACAAAAATTACCACCAG
AADC
EHLPSVAEVQAIKGFLAKCWSLDISTKEYAYLKGTVLFNPDLPGLQCVKYIQGLQWGTQQILSEHIRMTH
RGYQARFAELNSALFLLRFINANVLAELFFRPIIGTVSMDDMMLEMLCAKL
AR
>ARR95948.1 androgen receptor [Equus caballus]
MEVQLGLGRVYPRPPSKTYRGAFQNLFQSVREVIQNPGPRHPEAASAAPPGAHLQQQQETSPRQQQQQGE
DGSPQTQSRGPTGYLALEEEQQPSQQPSAPEGHPESGCVPEARAALAAGKGLQQQPPAPPDEDDSAAPST
LSLLGPTFPGLSSCSADLKDILSEAGTMQLLQQQQQEVVSEGSSSGRAREAAGAPTCSKDSYLGCSSTIS
DSAKELCKAVSVSMGLGVEALEHLSPGEQLRGDCMYAPLLGGPPAVRPTSCAPRAECKGSLLDNGPGKGT
EETAEYSPFKAGYAKGLDGESLGCSGSGEAGGSGTLELPSTLSLYKPGAVDEAAVYQSRDYYNFPLALPG
PPPPAPPPHPHARIKLENPLDYGSAWAAAAQCRYGDLAGLHGGGAAGPGSGSPSAAASSSWHTLFTAEEG
QLYGPCSGGGGGSAGEAGTVAPYGYTRPPQGLAGQEGDFPPPDVWYPGGMGSRVPYPSPSCVKSEMGPW
MESYSGPYGDMRLETARDHVLPIDYYFPPQKTCLICGDEASGCHYGALTCGSCKVFFKRAAEGKQKYLCA
SRNDCTIDKFRRKNCPSCRLRKCYEAGMTLGARKLKKLGNLKLQEEGEASSATSPTEEPTQKLTVSHIEGY
ECQPIFLNVLEAIEPGVVCAGHDNNQPDSFAALLSSLNELGERQLVHVVKWAKALPGFRNLHVDDQMAVI
QYSWMGLMVFAMGWRSFTNVNSRMLYFAPDLVFNEYRMHKSRMYSQCVRMRHLSQEFGWLQITPQEFL
CMKALLLFSIIPVDGLKNQKFFDELRMNYIKELDRIIACKRKNPTSCSRRFYQLTKLLDSVQPIARELHQFTF
DLLIKSHMVSVDFPEMMAEIISVQVPKILSGKVKPIYFHTQ
CCA
ANY3
EXPERIMENT NO. 02
DATE: 15/09/2017
Theory:
EMBOSS:
EMBOSS is a free open source software analysis package specially developed for the needs of
the molecular biology and bioinformatics user community. The software automatically copes
with data in a variety of formats and even allows transparent retrieval of sequence data from the
web.
CLUSTAL OMEGA
Pairwise Sequence Alignment is used to identify regions of similarity that may indicate
functional, structural and/or evolutionary relationships between two
biological sequences (protein or nucleic acid).
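A pairwise global alignment can be illustrated with the Needleman-Wunsch dynamic programming method. The sketch below is a simplified teaching version under stated assumptions: a flat score of +1 for a match, -1 for a mismatch and -2 for each gap position. Tools such as EMBOSS Needle use substitution matrices and separate gap-open and gap-extend penalties, so their scores will differ; the function name is hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

Regions where the two aligned strings agree column by column are the regions of similarity that the alignment is meant to reveal.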
Multiple sequence alignment
Procedures:
Then clicked on the Submit option and observed the result.
FOR PAIRWISE ALIGNMENT:
Switched on the system, checked the internet connection and opened MS Office Word.
Opened the internet browser and went to NCBI.
At the NCBI homepage, selected Nucleotide and Protein from the dropdown menu.
Typed the desired gene name (AADC), clicked on the search button and selected the appropriate item
from the resulting gene database.
When the page of the particular gene was displayed, clicked on the FASTA option to get the gene
sequence in FASTA format.
Copied the entire sequence and pasted it into WordPad / Notepad. Similarly, the process was
repeated for four nucleotide and protein sequences (AADC).
Clicked on Run BLAST and viewed the result.
Downloaded two sequences from the BLAST hits, then opened the EMBOSS homepage.
Then browsed to the alignment file of the BLAST result from its location on the computer.
Then clicked on the Submit option and observed the result.
NUCLEOTIDE SEQUENCE (CLUSTAL OMEGA)
PROTEIN SEQUENCE (CLUSTAL OMEGA)
Protein sequence (EMBOSS)
Nucleotide sequence (EMBOSS)
EXPERIMENT NO. 03
DATE :15/09/2017
THEORY:
RCSB PDB:
Cn3D:
Cn3D ("see in 3D") is a helper application for your web browser that allows you to view 3-
dimensional structures from NCBI's Entrez Structure database. Cn3D is provided for Windows
and Macintosh, and can be compiled on Unix.
3D structure of Protein:
crystallized provides strong evidence that this is the case. The
ordered arrays of molecules in a crystal can generally form only
if the molecular units making up the crystal are identical. The
enzyme urease (Mr 483,000) was among the first proteins
crystallized, by James Sumner in 1926. This accomplishment
demonstrated dramatically that even very large proteins are
discrete chemical entities with unique structures, and it
revolutionized thinking about proteins.
PROCEDURE:
Switched on the system, checked the internet connection and opened MS Office Word.
Opened the internet browser, went to NCBI, retrieved the protein 3D structure and obtained
the sequence in FASTA format.
Then downloaded the Cn3D tool and obtained its result.
After this, we opened the RCSB PDB website, entered the FASTA-format sequence and obtained the
result.
Pasted the result into our document.
Then we opened Cn3D, pasted the FASTA-format sequence obtained from NCBI,
found the result and pasted it into the document.
RESULT:
PROTEIN SEQUENCE (Cn3D, NCBI)
HAEMOGLOBIN (3D):
LINE
SPHERE
BALL AND STICK
B FACTOR TUBE
RCSB PDB
NGL (WebGL)
SPACEFILL
SURFACE
BACKBONE
SURFACE
JSMOL
CARTOON
BALL AND STICK
TRACE
EXPERIMENT NO. 04
DATE:
THEORY:
Chou–Fasman:
The Chou–Fasman method is an empirical technique for the
prediction of secondary structures in proteins, originally developed in the 1970s by Peter
Y. Chou and Gerald D. Fasman.
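The idea behind the method can be shown with a very small sketch. It is not the full Chou–Fasman procedure (there is no nucleation, extension or turn prediction) and not the code behind the web server used in this practical. It only averages helix and sheet propensities over a sliding window, and the window size of 6 and the decision rule are assumptions. The propensity values are the commonly quoted Chou–Fasman parameters and should be checked against a reference table before any real use.

```python
# (helix propensity, sheet propensity) per residue, as commonly quoted
# for the Chou-Fasman method; verify against a reference before real use.
PROPENSITY = {
    "A": (1.42, 0.83), "R": (0.98, 0.93), "N": (0.67, 0.89), "D": (1.01, 0.54),
    "C": (0.70, 1.19), "Q": (1.11, 1.10), "E": (1.51, 0.37), "G": (0.57, 0.75),
    "H": (1.00, 0.87), "I": (1.08, 1.60), "L": (1.21, 1.30), "K": (1.16, 0.74),
    "M": (1.45, 1.05), "F": (1.13, 1.38), "P": (0.57, 0.55), "S": (0.77, 0.75),
    "T": (0.83, 1.19), "W": (1.08, 1.37), "Y": (0.69, 1.47), "V": (1.06, 1.70),
}


def mean_propensities(window: str):
    """Average helix and sheet propensity of the residues in a window."""
    helix = sum(PROPENSITY[r][0] for r in window) / len(window)
    sheet = sum(PROPENSITY[r][1] for r in window) / len(window)
    return helix, sheet


def predict(sequence: str, window: int = 6) -> str:
    """Label each residue H (helix), E (sheet) or C (coil) with a crude window rule."""
    seq = sequence.upper()
    calls = ["C"] * len(seq)  # coil unless a window says otherwise
    for start in range(len(seq) - window + 1):
        helix, sheet = mean_propensities(seq[start:start + window])
        if helix > 1.0 and helix >= sheet:
            label = "H"
        elif sheet > 1.0 and sheet > helix:
            label = "E"
        else:
            continue
        # Simplification: later windows overwrite earlier calls.
        for k in range(start, start + window):
            calls[k] = label
    return "".join(calls)


print(predict("EEEEEE"))
print(predict("VVVVVV"))
print(predict("GGGGGG"))
```

Residues with a high helix propensity, such as glutamate, are called helix, residues with a high sheet propensity, such as valine, are called sheet, and a letter outside the 20 standard amino acids raises a KeyError.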
GOR TOOL:
The GOR (Garnier-Osguthorpe-Robson) method is an information-theory-based technique for
protein secondary structure prediction. It has also been reported as a promising and efficient
alternative to other tools for predicting protein aggregation.
JPred:
JPred is a web server that takes a protein sequence or multiple alignment of protein sequences,
and from these predicts the location of secondary structures using a neural network called Jnet.
The most common types of secondary structures are the α helix and the β pleated sheet. Both
structures are held in shape by hydrogen bonds, which form between the carbonyl O of
one amino acid and the amino H of another.
LINK USE:
PROCEDURES:
Switched on the system, checked the internet connection and opened MS Office Word.
Opened the internet browser, went to NCBI, retrieved the protein sequence and obtained
it in FASTA format.
Then we opened the Chou–Fasman server, entered the sequence, ran the prediction and
obtained the result.
The same process was applied with the GOR tool and JPred. All results were obtained and pasted
into our document.
GOR finder
JPred
JNetPRED
The consensus prediction - helices are marked as red tubes, and sheets as dark green arrows.
JNetCONF
The confidence estimate for the prediction. High values mean high confidence.
JNetALIGN
Alignment-based prediction - helices are marked as red tubes, and sheets as dark green arrows.
JNetHMM
HMM profile-based prediction - helices are marked as red tubes, and sheets as dark green arrows.
JNETPSSM
PSSM-based prediction - helices are marked as red tubes, and sheets as dark green arrows.
JNETJURY
A '*' in this annotation indicates that the JNETJURY was invoked to rationalise significantly
different primary predictions.
EXPERIMENT NO. 05
Aim: To predict open reading frames (ORFs) and exons of a given DNA sequence
using ORF Finder, GENSCAN and GeneMark.
Date :
Theory:
ORF Finder:
The ORF Finder is a program available on the NCBI website. It identifies all open reading
frames, that is, the possible protein-coding regions, in a sequence. It shows six horizontal bars,
each corresponding to one of the six possible reading frames.
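The six reading frames are the three codon offsets on the given strand plus the three offsets on its reverse complement. The sketch below scans all six for stretches that start with ATG and end at the first in-frame stop codon. It is a simplified illustration, not the NCBI ORF Finder algorithm: it assumes the standard stop codons, ATG as the only start codon, and reports 0-based positions on the strand being scanned. The function names and the min_length parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string of A, C, G and T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_length: int = 30):
    """Return (strand, frame, start, end, orf) tuples for all six reading frames.

    frame is 1, 2 or 3; start and end are 0-based positions on the scanned strand,
    and end includes the stop codon.
    """
    seq = dna.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Walk the strand codon by codon in this frame.
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOP_CODONS:
                    end = pos + 3
                    if end - start >= min_length:
                        results.append((strand, frame + 1, start, end, s[start:end]))
                    start = None
    return results


# Tiny made-up sequence with one ORF in frame 3 of the forward strand.
for orf in find_orfs("CCATGAAATAGCC", min_length=9):
    print(orf)
```

An ORF that begins with ATG but runs off the end of the sequence without a stop codon is not reported by this sketch.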
GENSCAN:
GENEMARK:
GeneMark is a generic name for a family of ab initio gene prediction programs developed at the
Georgia Institute of Technology in Atlanta.
Link used:
Procedures:
Switched on the system, checked the internet connection and opened MS Office Word.
Opened the internet browser, went to NCBI, retrieved the nucleotide (DNA) sequence and
obtained it in FASTA format.
Then opened the ORF Finder web page, entered the sequence, ran it and noted
the result.
The same procedure was applied with GENSCAN and GeneMark.
Retrieved the results and pasted them into the document.
Result:
ORF Finder
GENSCAN
GeneMark
EXPERIMENT NO. 06
Aim: To perform homology modeling of a given target sequence using SWISS-MODEL.
Date : 06/10/2017
Theory:
HOMOLOGY:
In biology, homology is the existence of shared ancestry between a pair of structures, or genes,
in different taxa. A common example of homologous structures is the forelimbs of vertebrates,
where the wings of bats, the arms of primates, the front flippers of whales and the forelegs of
dogs and horses are all derived from the same ancestral tetrapod structure.
UNIPROT:
The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of
Bioinformatics (SIB), and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust
Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB,
located in Geneva, Switzerland, maintains the ExPASy (Expert Protein Analysis System) servers that are a
central resource for proteomics tools and databases. PIR, hosted by the National Biomedical Research
Foundation (NBRF) at the Georgetown University Medical Center in Washington, DC, USA, is heir to the
oldest protein sequence database, Margaret Dayhoff's Atlas of Protein Sequence and Structure, first
published in 1965.[2] In 2002, EBI, SIB, and PIR joined forces as the UniProt consortium.[3]
SWISS MODEL:
Today, SWISS-MODEL consists of three tightly integrated components: (1) The SWISS-
MODEL pipeline – a suite of software tools and databases for automated protein structure
modelling,[1] (2) The SWISS-MODEL Workspace – a web-based graphical user workbench
Links used:
Procedure
Switched on the system, checked the internet connection and opened MS Office Word.
Opened the internet browser, went to NCBI, retrieved the protein and obtained its sequence in
FASTA format.
Opened SWISS-MODEL, entered the protein sequence and retrieved the protein
structure.
The same process was applied with UniProt and the result was obtained.
Pasted the results into our document.
Result:
Protein insulin (NCBI):
INSULIN (UNIPROT):
STRUCTURE: