Halima Shaikh 22bsc210
SMIC601: CA-2
Assignment on Practical Bioinformatics
Analysis and Interpretation of Sequence Data Using Bioinformatics Tools
QUERY SEQUENCE ACCESSION
NUMBER: NC_000913.3
Q1. Sequence of the query in FASTA format
ANS:-
QUERY SEQUENCE ACCESSION NUMBER: NC_000913.3
>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTG
GATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTG
ACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACA
CAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGG
CTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTA
ACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCG
TGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAA
AATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGC
GATGCCGAACGTATTTTTGCCGAACTTTT
GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCG
CAATTGAAAACTTTCGTCGATCAGGAATTT
GCCCAAATAAAACATGTCCTGCATGGCATTAGTTTGTTGGGG
CAGTGCCCGGATAGCATCAACGCTGCGC
TGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCG
GCGTATTAGAAGCGCGCGGTCACAACGT
TACTGTTATCGATCCGGTCGAAAAACTGCTGGCAGTGGGGC
ATTACCTCGAATCTACCGTCGATATTGCT
GAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTGA
TCACATGGTGCTGATGGCAGGTTTCACCG
CCGGTAATGAAAAAGGCGAACTGGTGGTGCTTGGACGCAAC
GGTTCCGACTACTCTGCTGCGGTGCTGGC
TGCCTGTTTACGCGCCGATTGTTGCGAGATTTGGACGGACGT
TGACGGGGTCTATACCTGCGACCCGCGT
CAGGTGCCCGATGCGAGGTTGTTGAAGTCGATGTCCTACCA
GGAAGCGATGGAGCTTTCCTACTTCGGCG
CTAAAGTTCTTCACCCCCGCACCATTACCCCCATCGCCCAGT
TCCAGATCCCTTGCCTGATTAAAAATAC
CGGAAATCCTCAAGCACCAGGTACGCTCATTGGTGCCAGCC
GTGATGAAGACGAATTACCGGTCAAGGGC
ATTTCCAATCTGAATAACATGGCAATGTTCAGCGTTTCTGGT
CCGGGGATGAAAGGGATGGTCGGCATGG
CGGCGCGCGTCTTTGCAGCGATGTCACGCGCCCGTATTTCC
GTGGTGCTGATTACGCAATCATCTTCCGA
ATACAGCATCAGTTTCTGCGTTCCACAAAGCGACTGTGTGCG
AGCTGAACGGGCAATGCAGGAAGAGTTC
TACCTGGAACTGAAAGAAGGCTTACTGGAGCCGCTGGCAGT
GACGGAACGGCTGGCCATTATCTCGGTGG
TAGGTGATGGTATGCGCACCTTGCGTGGGATCTCGGCGAAA
TTCTTTGCCGCACTGGCCCGCGCCAATAT
CAACATTGTCGCCATTGCTCAGGGATCTTCTGAACGCTCAAT
CTCTGTCGTGGTAAATAACGATGATGCG
ACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACC
GATCAGGTTATCGAAGTGTTTGTGATTG
GCGTCGGTGGCGTTGGCGGTGCGCTGCTGGAGCAACTGAAG
CGTCAGCAAAGCTGGCTGAAGAATAAACA
TATCGACTTACGTGTCTGCGGTGTTGCCAACTCGAAGGCTCT
GCTCACCAATGTACATGGCCTTAATCTG
GAAAACTGGCAGGAAGAACTGGCGCAAGCCAAAGAGCCG
TTTAATCTCGGGCGCTTAATTCGCCTCGTGA
AAGAATATCATCTGCTGAACCCGGTCATTGTTGACTGCACT
TCCAGCCAGGCAGTGGCGGATCAATATGC
CGACTTCCTGCGCGAAGGTTTCCACGTTGTCACGCCGAAC
AAAAAGGCCAACACCTCGTCGATGGATTAC
TACCATCAGTTGCGTTATGCGGCGGAAAAATCGCGGCGTA
AATTCCTCTATGACACCAACGTTGGGGCTG
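As a quick check on a downloaded record, the FASTA text can be parsed and its bases counted. The following is a minimal sketch in plain Python, not the method used for this assignment. The names `read_fasta` and `example` are illustrative assumptions, and the example holds only the first 70 bases of the sequence shown above, not the whole genome.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no FASTA header line (starting with '>') found")
    return header, "".join(chunks)


# Illustrative input: the header plus the first 70 bases of the query.
example = """>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTG
GATTAAAAAAAGAGTGTCTGATAGCAGC
"""

header, sequence = read_fasta(example)
accession = header.split()[0]
print(accession, len(sequence), "bp")
```

Line breaks inside the sequence do not matter to the parser, so the irregular wrapping of the sequence does not change the reported length.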
Q2. What databases are used to search the
sequence? What is the length of the
sequence used to search the database?
Ans: The Nucleotide database, GenBank and RefSeq.
The length of the sequence used to search the
database is 4641652 bp.
Q3. Name the gene/ protein. What is the
source of the same?
Ans: The name of the sequence is Escherichia coli
str. K-12 substr. MG1655, complete genome.
The source of the sequence is Escherichia coli
str. K-12 substr. MG1655.
Q4. Describe the gene/protein under study
Ans: The sequence under study is the complete genome of
Escherichia coli str. K-12 substr. MG1655. It is a
circular DNA molecule, 4641652 bp in length. Of the
4288 annotated protein-coding genes, 38 percent have
no attributed function. The genome is strikingly
organized with respect to the local direction of
replication: most genes, guanines, and oligonucleotides
that may be involved in replication and recombination
are oriented in that direction. In addition, the genome
contains phage remnants, insertion sequence (IS)
elements, and several other patches of unusual
composition that indicate genome plasticity through
horizontal transfer.
Q5. On the basis of alignment and graphic
summary what is the E-value and
Maximum Identity of the best hit?
Ans: On the basis of the alignment and graphic
summary, the best hit has an E-value of 0.0 and a
maximum identity of 100.00%.
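The two figures can be illustrated with a short Python sketch. Percent identity is the share of alignment columns in which both sequences carry the same residue, and the expected number of chance hits follows the Karlin-Altschul form E = K * m * n * exp(-lambda * S). This is only a sketch under stated assumptions: the function names are hypothetical, and the values of `K`, `LAMBDA` and the raw scores are placeholders chosen for illustration, not the parameters BLAST used for this search.

```python
import math


def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns where both sequences have the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        q == s and q != "-" for q, s in zip(aligned_query, aligned_subject)
    )
    return 100.0 * matches / len(aligned_query)


def expect_value(raw_score, query_len, db_len, k, lam):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Placeholder statistical parameters (assumptions, for illustration only).
K = 0.46
LAMBDA = 1.28

GENOME_LEN = 4641652  # length of the query reported in this assignment

# A self-hit over the whole genome: the score grows with the alignment
# length, so exp(-lambda * S) underflows and E is reported as 0.0.
e_self_hit = expect_value(GENOME_LEN, GENOME_LEN, GENOME_LEN, K, LAMBDA)

# A short, low-scoring match keeps a non-zero expectation.
e_short_hit = expect_value(20, 100, 1000, K, LAMBDA)

identity = percent_identity("AGCTTTTCATTC", "AGCTTTTCATTC")
print(e_self_hit, e_short_hit, identity)
```

An E-value of 0.0 therefore means the expectation is smaller than the program can represent, which is what a full-length, 100% identical hit to the query's own record would produce.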
Q6. Using ORF finder find the number of ORF
for the given sequence. For each ORF,
mention the number of nucleotides and the
amino acids incorporated in the peptide
chain.
Ans: The number of ORFs for the given
sequence is 159.
The number of nucleotides and amino
acids incorporated in the peptide chain:
ORF7 -
Nucleotides - 366
Amino acids - 121
ORF16 -
Nucleotides - 288
Amino acids - 95
ORF20 -
Nucleotides - 2817
Amino acids - 938
ORF1 -
Nucleotides - 2463
Amino acids - 820
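The nucleotide and amino acid counts are linked by simple arithmetic: each codon is three nucleotides, and the stop codon is counted in the nucleotide length but adds no amino acid. The sketch below checks the four reported ORFs against that rule. It assumes the reported nucleotide lengths include the stop codon; the names `reported` and `peptide_length` are illustrative.

```python
# Reported values from the answer: ORF name -> (nucleotides, amino acids).
reported = {
    "ORF7": (366, 121),
    "ORF16": (288, 95),
    "ORF20": (2817, 938),
    "ORF1": (2463, 820),
}


def peptide_length(nucleotides, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given nucleotide length."""
    if nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = nucleotides // 3
    # The stop codon terminates translation and encodes no amino acid.
    return codons - 1 if includes_stop else codons


for name, (nucleotides, amino_acids) in reported.items():
    expected = peptide_length(nucleotides)
    status = "consistent" if expected == amino_acids else "mismatch"
    print(f"{name}: {nucleotides} nt -> {expected} aa ({status})")
```

For example, ORF7 has 366 / 3 = 122 codons, of which one is the stop codon, leaving 121 amino acids.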
Q7. Note down the amino acid sequence for
each ORF.
Ans: The amino acid sequence of each ORF is as follows:
ORF7 - >lcl|ORF7
MSLLTIWQPAVCSLTHHKSSRHKKTRLRGLFHKASANWRLSQFVDLCSQV
SLMTCSFVFVDQTFSSLTVHDRLHFVKCFLCSSFVASFDSCVYFLDESTH
HRATACVVLTSLFRLNGALLS
ORF16 - >lcl|ORF16
MQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRD
HGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR
ORF20 - >lcl|ORF20
MSDYKSTLNLPETGFPMRGDLAKREPGMLARWTDDDLYGIIRAAKKGKKT
FILHDGPPYANGSIHIGHSVNKILKDIIVKSKGLSGYDSPYVPGWDCHGL
PIELKVEQEYGKPGEKFTAAEFRAKCREYAATQVDGQRKDFIRLGVLGDW
SHPYLTMDFKTEANIIRALGKIIGNGHLHKGAKPVHWCVDCRSALAEAEV
EYYDKTSPSIDVAFQAVDQDALKAKFAVSNVNGPISLVIWTTTPWTLPAN
RAISIAPDFDYALVQIDGQAVILAKDLVESVMQRIGVTDYTILGTVKGAE
LELLRFTHPFMGFDVPAILGDHVTLDAGTGAVHTAPGHGPDDYVIGQKYG
LETANPVGPDGTYLPGTYPTLDGVNVFKANDIVVALLQEKGALLHVEKMQ
HSYPCCWRHKTPIIFRATPQWFVSMDQKGLRAQSLKEIKGVQWIPDWGQA
RIESMVANRPDWCISRQRTWGVPMSLFVHKDTEELHPRTLELMEEVAKRV
EVDGIQAWWDLDAKEILGDEADQYVKVPDTLDVWFDSGSTHSSVVDVRPE
FAGHAADMYLEGSDQHRGWFMSSLMISTAMKGKAPYRQVLTHGFTVDGQG
RKMSKSIGNTVSPQDVMNKLGADILRLWVASTDYTGEMAVSDEILKRAAD
SYRRIRNTARFLLANLNGFDPAKDMVKPEEMVVLDRWAVGCAKAAQEDIL
KAYEAYDFHEVVQRLMRFCSVEMGSFYLDIIKDRQYTAKADSVARRSCQT
ALYHIAEALVRWMAPILSFTADEVWGYLPGEREKYVFTGEWYEGLFGLAD
SEAMNDAFWDELLKVRGEVNKVIEQARADKKVGGSLEAAVTLYAEPELSA
KLTALGDELRFVLLTSGATVADYNDAPADAQQSEVLKGLKVALSKAEGEK
CPRCWHYTQDVGKVAEHAEICGRCVSNVAGDGEKRKFA
ORF1 - >lcl|ORF1
MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAM
IEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQ
IKHVLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDP
VEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELV
VLGRNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMS
YQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRD
EDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLIT
QSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIIS
VVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATT
GVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRV
CGVANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPV
IVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSR
RKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKLDEG
MSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIE
IEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNID
EDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGND
VTAAGVFADLLRTLSWKLGV
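The link between the nucleotide sequence and these peptide sequences can be sketched with a small translation routine. The example below is an illustration, not the ORF finder's own implementation. It assumes the standard genetic code, and the names `CODON_TABLE`, `translate` and `orf1_start` are hypothetical. The 48 bases used are copied from the query sequence in Q1, starting at the ATG of ...ACCATGCGAGTGTTGAA...

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varying fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame from its first base until a stop codon or the end."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for unknown codons
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# First 16 codons of the region that begins at the ATG noted above.
orf1_start = "ATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGT"
print(translate(orf1_start))  # MRVLKFGGTSVANAER
```

The result agrees with the first 16 residues of the ORF1 peptide listed above.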
Q8. Determine how many restriction sites
are present on the gene for the following
enzymes: HindIII and SmaI. What is the cut
position and what kind of overhangs are
generated?
Ans:
For HindIII
Overhang: five_prime
Frequency: 297
Cut positions:
0, 8912, 14322, 15205, 19406, 20795, 83462, 87744, 89558, 101721, 104933, 108475, 108781,
109468, 111416, 122331, 124433, 126428, 152489, 157044, 165598, 165719, 174629, 190578,
198726, 198855, 199665, 206910, 211938, 212091, 216626, 221255, 223850, 224418, 231843,
234882, 237347, 266869, 270999, 272871, 283175, 290177, 295177, 297513, 299357, 300323, 315363,
316279, 347209, 373567, 381715, 382741, 384205, 388835, 391908, 392824, 393046, 401416,
401465, 423088, 438942, 439566, 448753, 450205, 459057, 491527, 503071, 516851, 526446,
554859, 557885, 562243, 566914, 567830, 571087, 572235, 582752, 587715, 601324, 607698,
618515, 625192, 636794, 638016, 647471, 651713, 655407, 656016, 660082, 675269, 711327,
715117, 718164, 755355, 776457, 791637, 853721, 878554, 924792, 926881, 933349, 937407, 940941,
941971, 949510, 960709, 962293, 963640, 971889, 976930, 982644, 1004410, 1049730, 1066374,
1067989, 1068348, 1070416, 1078984, 1094002, 1094444, 1095360, 1098413, 1102792, 1102996,
1136615, 1143811, 1145296, 1147559, 1149831, 1183043, 1185004, 1199348, 1210932, 1220774,
1231624, 1234771, 1243430, 1261858, 1267933, 1274031, 1278994, 1280304, 1280508, 1281365,
1286810, 1286988, 1287166, 1296138, 1298222, 1298889, 1308328, 1315366, 1318322, 1321105,
1326664, 1350649, 1388460, 1400039, 1410218, 1417873, 1420393, 1424265, 1448409, 1449362,
1450920, 1451808, 1462140, 1467829, 1468780, 1469754, 1476439, 1479965, 1490759, 1513578,
1520107, 1528900, 1531361, 1544675, 1556142, 1563888, 1565985, 1574304, 1574456, 1579130,
1580057, 1585008, 1586096, 1588907, 1591817, 1597051, 1599275, 1602745, 1609105, 1642077,
1642166, 1646275, 1646516, 1656592, 1673587, 1678430, 1679886, 1683648, 1689783, 1702635,
1754907, 1764028, 1771435, 1795505, 1806075, 1812489, 1814292, 1818297, 1821051, 1821899,
1840001, 1853894, 1854206, 1870552, 1873476, 1887494, 1890760, 1893710, 1902374, 1905236,
1912810, 1941049, 1962180, 1967884, 1979710, 1979950, 1980194, 1983056, 1995745, 2019354,
2028244, 2035518, 2043318, 2049203, 2049761, 2051973, 2053271, 2060572, 2068012, 2069811,
2086089, 2087916, 2093038, 2097499, 2103552, 2105899, 2106145, 2107739, 2112441, 2115634,
2132988, 2138471, 2142303, 2153694, 2168150, 2170308, 2171224, 2171897, 2171906, 2190850,
2195127, 2210096, 2212265, 2220806, 2227938, 2242440, 2246160, 2252870, 2256605, 2261944,
2268789, 2270130, 2271588, 2271817, 2283788, 2307302, 2310501, 2311521, 2314078, 2319357,
2333248, 2335914, 2338724, 2357015, 2357358, 2359996, 2366046, 2367133, 2387934, 2400099,
2401180, 2403890, 2413109
For SmaI
Overhang: blunt
Frequency: 226
Cut positions:
15600, 15810, 33518, 53982, 90788, 99705, 100703, 113232, 122180, 123453, 134268,
139474, 142601, 169843, 189574, 190475, 190663, 192733, 197348, 202285, 206138,
208354, 209111, 224384, 225154, 242677, 264467, 266013, 267781, 273318, 282539,
286892, 287873, 297864, 301485, 303988, 305002, 308371, 320414, 320845, 320944,
321041, 336500, 371375, 378407, 381701, 385675, 391083, 398687, 417364, 426330,
440311, 450972, 454916, 455719, 464117, 491953, 494549, 498854, 524300, 527504,
558649, 561491, 565928, 566369, 584640, 584929, 595463, 608220, 608430,
639583, 641744, 643192, 650934, 653453, 663847, 669248, 675004, 764730, 788421,
819923, 831269, 839294, 854489, 855529, 872332, 891988, 892252, 904229, 915190,
918709, 921294, 923384, 945303, 949267, 962053, 1021483, 1025077, 1026424,
1028732, 1040155, 1045075, 1065973, 1077368, 1086078, 1102458, 1112271, 1125336,
1150557, 1172958, 1181419, 1196719, 1202499, 1219621, 1232002, 1232289, 1235018,
1242215, 1243821, 1253242, 1259175, 1268598, 1303526, 1305610, 1314284, 1315694,
1333947, 1336888, 1349531, 1349768, 1361389, 1362677, 1367292, 1388990, 1405270,
1410700, 1414308, 1421650, 1429757, 1433151, 1440180, 1468798, 1478830, 1479227,
1501903, 1514546, 1533621, 1541531, 1552656, 1554266, 1560211, 1561088,
1568203, 1575139, 1594896, 1634144, 1642319, 1658907, 1679061, 1696917,
1700634, 1702759, 1704224, 1707838, 1748271, 1795303, 1795370, 1799029, 1801566,
1806480, 1828288, 1850013, 1864580, 1872351, 1874193, 1878861, 1879838, 1881943,
1927797, 1932475, 1934128, 1937300, 1958836, 1959543, 1965196, 1977351, 1985520,
1997192, 2013167, 2048527, 2061409, 2063117, 2064095, 2069829, 2071714,
2074125, 2074446, 2075502, 2076542, 2077129, 2077592, 2084002, 2123180,
2135065, 2135135, 2139565, 2143090, 2163162, 2176435, 2205901, 2224278, 2229212,
2234936, 2241610, 2261205, 2265964, 2272590, 2309185, 2311382, 2332235, 2338782,
2356473, 2368987, 2389171, 2400377, 2404146
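A restriction-site search of this kind can be sketched in a few lines of Python. HindIII recognizes AAGCTT and cuts after the first A (A^AGCTT), which leaves a 5' overhang; SmaI recognizes CCCGGG and cuts in the middle (CCC^GGG), which leaves blunt ends. The sketch below is not the tool used for the lists above. It treats the sequence as linear, its position convention (1-based position of the last base before the cut) may differ from the tool's, and the names `ENZYMES`, `overhang_type`, `find_cut_positions` and the test sequence `demo` are illustrative assumptions.

```python
# Enzyme name -> (recognition site, cut offset on the top strand).
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "SmaI": ("CCCGGG", 3),     # CCC^GGG
}


def overhang_type(site, cut_offset):
    """Classify the ends left by a palindromic site that is cut symmetrically."""
    bottom_cut = len(site) - cut_offset  # same cut, seen from the other strand
    if cut_offset < bottom_cut:
        return "five_prime"
    if cut_offset > bottom_cut:
        return "three_prime"
    return "blunt"


def find_cut_positions(seq, site, cut_offset):
    """1-based position of the last base before each top-strand cut (linear DNA)."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)  # step by one to allow overlapping sites
    return positions


# Made-up demonstration sequence with two HindIII sites and one SmaI site.
demo = "GGAAGCTTCCCCGGGAAGCTT"
for name, (site, offset) in ENZYMES.items():
    cuts = find_cut_positions(demo, site, offset)
    print(name, overhang_type(site, offset), len(cuts), cuts)
```

Both recognition sites are palindromic, so searching one strand is enough to find every site. For a circular genome, a site spanning the junction between the last and first base would need an additional check that this sketch does not perform.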