
Colorful Organic Project Proposal

The document is an assignment on bioinformatics that involves the analysis and interpretation of sequence data using various bioinformatics tools. It includes downloading a query sequence in FASTA format, identifying databases used for searching, describing the gene/protein, and determining restriction sites for specific enzymes. The analysis reveals details about the Escherichia coli genome, including its length, protein-coding genes, and specific ORF characteristics.

Uploaded by

shaikhalima4922

halima shaikh 22bsc210

SMIC601: CA-2
Assignment on Practical Bioinformatics
Analysis and Interpretation of Sequence Data Using Bioinformatics Tools
QUERY SEQUENCE ACCESSION NUMBER: NC_000913.3
Q1. Download the sequence of the query in FASTA format
ANS:-
QUERY SEQUENCE ACCESSION NUMBER: NC_000913.3
>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTT
GCCCAAATAAAACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGC
TGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAACGT
TACTGTTATCGATCCGGTCGAAAAACTGCTGGCAGTGGGGCATTACCTCGAATCTACCGTCGATATTGCT
GAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTGATCACATGGTGCTGATGGCAGGTTTCACCG
CCGGTAATGAAAAAGGCGAACTGGTGGTGCTTGGACGCAACGGTTCCGACTACTCTGCTGCGGTGCTGGC
TGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGTTGACGGGGTCTATACCTGCGACCCGCGT
CAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTACCAGGAAGCGATGGAGCTTTCCTACTTCGGCG
CTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGTTCCAGATCCCTTGCCTGATTAAAAATAC
CGGAAATCCTCAAGCACCAGGTACGCTCATTGGTGCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGC
ATTTCCAATCTGAATAACATGGCAATGTTCAGCGTTTCTGGTCCGGGGATGAAAGGGATGGTCGGCATGG
CGGCGCGCGTCTTTGCAGCGATGTCACGCGCCCGTATTTCCGTGGTGCTGATTACGCAATCATCTTCCGA
ATACAGCATCAGTTTCTGCGTTCCACAAAGCGACTGTGTGCGAGCTGAACGGGCAATGCAGGAAGAGTTC
TACCTGGAACTGAAAGAAGGCTTACTGGAGCCGCTGGCAGTGACGGAACGGCTGGCCATTATCTCGGTGG
TAGGTGATGGTATGCGCACCTTGCGTGGGATCTCGGCGAAATTCTTTGCCGCACTGGCCCGCGCCAATAT
CAACATTGTCGCCATTGCTCAGGGATCTTCTGAACGCTCAATCTCTGTCGTGGTAAATAACGATGATGCG
ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTTTGTGATTG
GCGTCGGTGGCGTTGGCGGTGCGCTGCTGGAGCAACTGAAGCGTCAGCAAAGCTGGCTGAAGAATAAACA
TATCGACTTACGTGTCTGCGGTGTTGCCAACTCGAAGGCTCTGCTCACCAATGTACATGGCCTTAATCTG
GAAAACTGGCAGGAAGAACTGGCGCAAGCCAAAGAGCCGTTTAATCTCGGGCGCTTAATTCGCCTCGTGA
AAGAATATCATCTGCTGAACCCGGTCATTGTTGACTGCACTTCCAGCCAGGCAGTGGCGGATCAATATGC
CGACTTCCTGCGCGAAGGTTTCCACGTTGTCACGCCGAACAAAAAGGCCAACACCTCGTCGATGGATTAC
TACCATCAGTTGCGTTATGCGGCGGAAAAATCGCGGCGTAAATTCCTCTATGACACCAACGTTGGGGCTG
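
A FASTA record like the one above is a header line starting with ">" followed by one or more sequence lines. The following is a minimal Python sketch of reading such a record and measuring its length. The function name `parse_fasta` and the two-line example record are illustrative assumptions, not part of the assignment, and the example holds only the first 70 bases shown above, not the whole genome.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated regardless of how they wrap.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical input: header plus the first 70 bases of the record above.
example = """>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTG
GATTAAAAAAAGAGTGTCTGATAGCAGC
"""

header, seq = parse_fasta(example)[0]
accession = header.split()[0]  # first whitespace-delimited token of the header
print(accession, len(seq))
```

Because sequence lines are simply joined, the result does not depend on how the sequence was wrapped in the file.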
Q2. What databases are used to search the sequence? What is the length of the sequence used to search the database?
Ans: The Nucleotide database, GenBank, and RefSeq.
The length of the sequence used to search the database is 4,641,652 bp.

Q3. Name the gene/protein. What is the source of the same?
Ans: The record is named "Escherichia coli str. K-12 substr. MG1655, complete genome"; it is a whole genome rather than a single gene.
The source organism is Escherichia coli str. K-12 substr. MG1655.
Q4. Describe the gene/protein under study.
Ans: The sequence under study is the complete genome of Escherichia coli str. K-12 substr. MG1655. It is a circular DNA molecule 4,641,652 bp in length. Of the 4288 protein-coding genes annotated, 38 percent have no attributed function. The genome is strikingly organized with respect to the local direction of replication: most genes, guanines, and oligonucleotides that may be involved in replication and recombination are oriented in that direction. In addition, the genome contains phage remnants, insertion sequence (IS) elements, and several other patches of unusual composition that indicate genome plasticity through horizontal transfer.
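
One common way to look at the strand bias of guanines relative to the direction of replication is GC skew, (G - C) / (G + C), computed in windows along the sequence. The sketch below is only an illustration of that calculation in Python; the function name `gc_skew`, the window size, and the toy sequence are assumptions, and this is not presented as the analysis behind the statement above.

```python
def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        # A window without G or C has no defined skew; report 0.0 for it.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


# Toy sequence (hypothetical): G-rich first half, C-rich second half.
toy = "GGGGCC" * 5 + "CCCCGG" * 5
skews = gc_skew(toy, window=30)
print(skews)
```

On a real bacterial chromosome the sign of the skew is typically plotted against position; a sign change marks a candidate switch in the local direction of replication.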
Q5. On the basis of the alignment and graphic summary, what are the E-value and Maximum Identity of the best hit?
Ans: On the basis of the alignment and graphic summary, the best hit has an E-value of 0.0 and a maximum identity of 100.00%.
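
Percent identity is the share of alignment columns in which query and subject carry the same residue. A minimal Python sketch of that calculation is given below, assuming two already-aligned strings of equal length with "-" marking gaps; the function name `percent_identity` and the short test strings are hypothetical and are not BLAST output.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns with identical, non-gap residues."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    # The denominator is the full alignment length, gap columns included.
    return 100.0 * matches / len(aligned_query)


print(percent_identity("ACGT", "ACGT"))  # identical strings
print(percent_identity("ACGT", "ACGA"))  # one mismatch in four columns
```

A query aligned against its own database record matches in every column, which is consistent with the 100.00% identity reported above.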
Q6. Using ORF finder, find the number of ORFs for the given sequence. For each ORF, mention the number of nucleotides and the amino acids incorporated in the peptide chain.
Ans: ORF finder reports 159 ORFs for the given sequence.
The number of nucleotides and the number of amino acids incorporated in the peptide chain:

ORF7: 366 nucleotides, 121 amino acids
ORF16: 288 nucleotides, 95 amino acids
ORF20: 2817 nucleotides, 938 amino acids
ORF1: 2463 nucleotides, 820 amino acids
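
The nucleotide and amino acid counts above are linked by simple arithmetic: each amino acid is encoded by three nucleotides, and the counts fit the assumption that the reported ORF length also includes one stop codon, which adds no residue. The short Python check below illustrates this; the helper name `expected_residues` is hypothetical, and the stop-codon assumption is inferred from the numbers rather than stated in the assignment.

```python
# (nucleotides, amino acids) as recorded in the answer above.
orfs = {
    "ORF7": (366, 121),
    "ORF16": (288, 95),
    "ORF20": (2817, 938),
    "ORF1": (2463, 820),
}


def expected_residues(nucleotides):
    """Residues encoded by an ORF whose length includes one stop codon."""
    if nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return nucleotides // 3 - 1  # the stop codon encodes no amino acid


for name, (nt, aa) in orfs.items():
    print(name, nt, expected_residues(nt), expected_residues(nt) == aa)
```

For example, ORF7 has 366 / 3 = 122 codons, of which 121 encode amino acids.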
Q7. Note down the amino acid sequence for
each ORF.
Ans: The amino acid sequence of each ORF is as follows:
ORF7 - >lcl|ORF7
MSLLTIWQPAVCSLTHHKSSRHKKTRLRGLFHKASANWRLSQFVDLCSQV
SLMTCSFVFVDQTFSSLTVHDRLHFVKCFLCSSFVASFDSCVYFLDESTH
HRATACVVLTSLFRLNGALLS

ORF16 - >lcl|ORF16
MQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRD
HGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

ORF20 - >lcl|ORF20
MSDYKSTLNLPETGFPMRGDLAKREPGMLARWTDDDLYGIIRAAKKGKKT
FILHDGPPYANGSIHIGHSVNKILKDIIVKSKGLSGYDSPYVPGWDCHGL
PIELKVEQEYGKPGEKFTAAEFRAKCREYAATQVDGQRKDFIRLGVLGDW
SHPYLTMDFKTEANIIRALGKIIGNGHLHKGAKPVHWCVDCRSALAEAEV
EYYDKTSPSIDVAFQAVDQDALKAKFAVSNVNGPISLVIWTTTPWTLPAN
RAISIAPDFDYALVQIDGQAVILAKDLVESVMQRIGVTDYTILGTVKGAE
LELLRFTHPFMGFDVPAILGDHVTLDAGTGAVHTAPGHGPDDYVIGQKYG
LETANPVGPDGTYLPGTYPTLDGVNVFKANDIVVALLQEKGALLHVEKMQ
HSYPCCWRHKTPIIFRATPQWFVSMDQKGLRAQSLKEIKGVQWIPDWGQA
RIESMVANRPDWCISRQRTWGVPMSLFVHKDTEELHPRTLELMEEVAKRV
EVDGIQAWWDLDAKEILGDEADQYVKVPDTLDVWFDSGSTHSSVVDVRPE
FAGHAADMYLEGSDQHRGWFMSSLMISTAMKGKAPYRQVLTHGFTVDGQG
RKMSKSIGNTVSPQDVMNKLGADILRLWVASTDYTGEMAVSDEILKRAAD
SYRRIRNTARFLLANLNGFDPAKDMVKPEEMVVLDRWAVGCAKAAQEDIL
KAYEAYDFHEVVQRLMRFCSVEMGSFYLDIIKDRQYTAKADSVARRSCQT
ALYHIAEALVRWMAPILSFTADEVWGYLPGEREKYVFTGEWYEGLFGLAD
SEAMNDAFWDELLKVRGEVNKVIEQARADKKVGGSLEAAVTLYAEPELSA
KLTALGDELRFVLLTSGATVADYNDAPADAQQSEVLKGLKVALSKAEGEK
CPRCWHYTQDVGKVAEHAEICGRCVSNVAGDGEKRKFA
ORF1 - >lcl|ORF1
MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAM
IEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQ
IKHVLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDP
VEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELV
VLGRNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMS
YQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRD
EDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLIT
QSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIIS
VVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATT
GVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRV
CGVANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPV
IVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSR
RKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKLDEG
MSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIE
IEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNID
EDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGND
VTAAGVFADLLRTLSWKLGV
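
The peptide sequences above come from reading an ORF three nucleotides at a time and looking each codon up in the genetic code. The following Python sketch shows that translation step with the standard genetic code; the function name `translate` is an assumption, and the DNA fragment is the stretch of the query sequence starting at ATGCGAGTG, used here only as a small test input.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; "*" is a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate DNA from its first base, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks an unknown codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


fragment = "ATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCA"
peptide = translate(fragment)
print(peptide)
```

The result, MRVLKFGGTSVANA, agrees with the first 14 residues listed for ORF1.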
Q8. Determine how many restriction sites are present on the gene for the enzymes HindIII and SmaI. What is the cut position and what kind of overhangs are generated?
Ans:-
For HindIII
Overhang: five_prime (5' overhang)
Frequency: 297
Cut positions:
0, 8912, 14322, 15205, 19406, 20795, 83462, 87744, 89558, 101721, 104933, 108475, 108781,
109468, 111416, 122331, 124433, 126428, 152489, 157044, 165598, 165719, 174629, 190578,
198726, 198855, 199665, 206910, 211938, 212091, 216626, 221255, 223850, 224418, 231843,
234882, 237347, 266869, 270999, 272871, 283175, 290177, 295177, 297513, 299357, 300323, 315363,
316279, 347209, 373567, 381715, 382741, 384205, 388835, 391908, 392824, 393046, 401416,
401465, 423088, 438942, 439566, 448753, 450205, 459057, 491527, 503071, 516851, 526446,
554859, 557885, 562243, 566914, 567830, 571087, 572235, 582752, 587715, 601324, 607698,
618515, 625192, 636794, 638016, 647471, 651713, 655407, 656016, 660082, 675269, 711327,
715117, 718164, 755355, 776457, 791637, 853721, 878554, 924792, 926881, 933349, 937407, 940941,
941971, 949510, 960709, 962293, 963640, 971889, 976930, 982644, 1004410, 1049730, 1066374,
1067989, 1068348, 1070416, 1078984, 1094002, 1094444, 1095360, 1098413, 1102792, 1102996,
1136615, 1143811, 1145296, 1147559, 1149831, 1183043, 1185004, 1199348, 1210932, 1220774,
1231624, 1234771, 1243430, 1261858, 1267933, 1274031, 1278994, 1280304, 1280508, 1281365,
1286810, 1286988, 1287166, 1296138, 1298222, 1298889, 1308328, 1315366, 1318322, 1321105,
1326664, 1350649, 1388460, 1400039, 1410218, 1417873, 1420393, 1424265, 1448409, 1449362,
1450920, 1451808, 1462140, 1467829, 1468780, 1469754, 1476439, 1479965, 1490759, 1513578,
1520107, 1528900, 1531361, 1544675, 1556142, 1563888, 1565985, 1574304, 1574456, 1579130,
1580057, 1585008, 1586096, 1588907, 1591817, 1597051, 1599275, 1602745, 1609105, 1642077,
1642166, 1646275, 1646516, 1656592, 1673587, 1678430, 1679886, 1683648, 1689783, 1702635,
1754907, 1764028, 1771435, 1795505, 1806075, 1812489, 1814292, 1818297, 1821051, 1821899,
1840001, 1853894, 1854206, 1870552, 1873476, 1887494, 1890760, 1893710, 1902374, 1905236,
1912810, 1941049, 1962180, 1967884, 1979710, 1979950, 1980194, 1983056, 1995745, 2019354,
2028244, 2035518, 2043318, 2049203, 2049761, 2051973, 2053271, 2060572, 2068012, 2069811,
2086089, 2087916, 2093038, 2097499, 2103552, 2105899, 2106145, 2107739, 2112441, 2115634,
2132988, 2138471, 2142303, 2153694, 2168150, 2170308, 2171224, 2171897, 2171906, 2190850,
2195127, 2210096, 2212265, 2220806, 2227938, 2242440, 2246160, 2252870, 2256605, 2261944,
2268789, 2270130, 2271588, 2271817, 2283788, 2307302, 2310501, 2311521, 2314078, 2319357,
2333248, 2335914, 2338724, 2357015, 2357358, 2359996, 2366046, 2367133, 2387934, 2400099,
2401180, 2403890, 2413109
For SmaI
Overhang: blunt
Frequency: 226
Cut positions:
15600, 15810, 33518, 53982, 90788, 99705, 100703, 113232, 122180, 123453, 134268,
139474, 142601, 169843, 189574, 190475, 190663, 192733, 197348, 202285, 206138,
208354, 209111, 224384, 225154, 242677, 264467, 266013, 267781, 273318, 282539,
286892, 287873, 297864, 301485, 303988, 305002, 308371, 320414, 320845, 320944,
321041, 336500, 371375, 378407, 381701, 385675, 391083, 398687, 417364, 426330,
440311, 450972, 454916, 455719, 464117, 491953, 494549, 498854, 524300, 527504,
558649, 561491, 565928, 566369, 584640, 584929, 595463, 608220, 608430,
639583, 641744, 643192, 650934, 653453, 663847, 669248, 675004, 764730, 788421,
819923, 831269, 839294, 854489, 855529, 872332, 891988, 892252, 904229, 915190,
918709, 921294, 923384, 945303, 949267, 962053, 1021483, 1025077, 1026424,
1028732, 1040155, 1045075, 1065973, 1077368, 1086078, 1102458, 1112271, 1125336,
1150557, 1172958, 1181419, 1196719, 1202499, 1219621, 1232002, 1232289, 1235018,
1242215, 1243821, 1253242, 1259175, 1268598, 1303526, 1305610, 1314284, 1315694,
1333947, 1336888, 1349531, 1349768, 1361389, 1362677, 1367292, 1388990, 1405270,
1410700, 1414308, 1421650, 1429757, 1433151, 1440180, 1468798, 1478830, 1479227,
1501903, 1514546, 1533621, 1541531, 1552656, 1554266, 1560211, 1561088,
1568203, 1575139, 1594896, 1634144, 1642319, 1658907, 1679061, 1696917,
1700634, 1702759, 1704224, 1707838, 1748271, 1795303, 1795370, 1799029, 1801566,
1806480, 1828288, 1850013, 1864580, 1872351, 1874193, 1878861, 1879838, 1881943,
1927797, 1932475, 1934128, 1937300, 1958836, 1959543, 1965196, 1977351, 1985520,
1997192, 2013167, 2048527, 2061409, 2063117, 2064095, 2069829, 2071714,
2074125, 2074446, 2075502, 2076542, 2077129, 2077592, 2084002, 2123180,
2135065, 2135135, 2139565, 2143090, 2163162, 2176435, 2205901, 2224278, 2229212,
2234936, 2241610, 2261205, 2265964, 2272590, 2309185, 2311382, 2332235, 2338782,
2356473, 2368987, 2389171, 2400377, 2404146
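
Restriction sites such as these can be located by scanning the sequence for each enzyme's recognition site: HindIII recognizes AAGCTT and cuts after the first A (A^AGCTT), while SmaI recognizes CCCGGG and cuts in the middle (CCC^GGG). The Python sketch below illustrates the idea. The function names, the toy sequences, and the position convention (number of bases before the cut) are assumptions; the tool used for the answer above may number positions differently, so the output is not expected to reproduce its list.

```python
# Recognition site and top-strand cut offset within the site.
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "SmaI": ("CCCGGG", 3),     # CCC^GGG
}


def overhang_type(site, cut_offset):
    """Classify the ends left by a palindromic site cut at cut_offset."""
    mirrored = len(site) - cut_offset  # bottom-strand cut, in top-strand coordinates
    if cut_offset == mirrored:
        return "blunt"
    return "five_prime" if cut_offset < mirrored else "three_prime"


def find_cut_positions(seq, site, cut_offset, circular=False):
    """Return sorted cut positions (bases before the cut) for one enzyme."""
    seq = seq.upper()
    # On a circular molecule a site may span the junction between the last
    # and first bases, so the start of the sequence is appended to the end.
    search = seq + seq[:len(site) - 1] if circular else seq
    positions = []
    start = search.find(site)
    while start != -1:
        cut = start + cut_offset
        positions.append(cut % len(seq) if circular else cut)
        start = search.find(site, start + 1)
    return sorted(positions)


toy = "GGAAGCTTCCCCGGGTT"  # hypothetical test sequence
for name, (site, offset) in ENZYMES.items():
    cuts = find_cut_positions(toy, site, offset)
    print(name, overhang_type(site, offset), len(cuts), cuts)
```

Because the genome under study is circular, the `circular=True` option also reports a site that straddles the point where the linear record begins and ends.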
