
Practical 2.1: Performing Basic Local Alignment Search Tool (BLAST) for extracting
biological sequence information
Tasks
1. Perform BLASTp (with default settings) to identify the species of each unknown sequence below.
Go to the GenPept page (take a snapshot) and report the accession number, locus, definition,
sequence length, name of the organism, taxonomy, title, name of the journal, BioProject,
BioSample, and PMID. Provide a short summary of the protein and details about the transcript
variant (if available).
>Sample-1
MKHEIMTSLEVATCEDDIVNIDLNTIPDEALFQGGIHDQVDIGVLDDIPYIISDVSNNSNNSFPLGPRGP
PPRKDDPGQYNFSVEIHSKDTHKKKFLFSHKLNRIYVNMETDFAVQFNWELVDLAVTQMYVRATVVFEDE
SQAEKRVERCIQHKLCSSDKGQDRVVSENVLRSSRPLGTNDVQYCGHPDDPDYWYSVLVQLPKPGREPCT
HAFKFVCKNSCSTGINRRSIAVIFTLESASGSVLGRQTVGARVCSCTARDMCKDEEAEGARAARKRPRPA
QSRLLKKIKLETVGDLPVDAETLTLPPLEIIGAKTVKTGLEVMLRMMEQAAHFHKHDQLAADGYQRRVAS

>Sample-2
MASDASHALEAALEQMDGIIAGTKTGADLSDGTCEPGLASPASYMNPFPVLHLIEDLRLALEMLELPQER
AALLSQIPGPTAAYIKEWFEESLSQVNHHSAASNETYQERLARLEGDKESLILQVSVLTDQVEAQGEKIR
DLEVCLEGHQVKLNAAEEMLQQELLSRTSLETQKLDLMTEVSELKLKLVGMEKEQREQEEKQRKAEELLQ
ELRHLKIKVEELENERNQYEWKLKATKAEVAQLQEQVALKDAEIERLHSQLSRTAALHSESHTERDQEIQ
RLKMGMETLLLANEDKDRRIEELTGLLNQYRKVKEIVMVTQGPSERTLSINEEEPEGGFSKWNATNKDPE
ELFKQEMPPRCSSPTVGPPPLPQKSLETRAQKKLSCSLEDLRSESVDKCMDGNQPFPVLEPKDSPFLAEH
KYPTLPGKLSGATPNGEAAKSPPTICQPDATGSSLLRLRDTESGWDDTAVVNDLSSTSSGTESGPQSPLT
PDGKRNPKGIKKFWGKIRRTQSGNFYTDTLGMAEFRRGGLRATAGPRLSRTRDSKGQKSDANAPFAQWST
ERVCAWLEDFGLAQYVIFARQWVSSGHTLLTATPQDMEKELGIKHPLHRKKLVLAVKAINTKQEEKSALL
DHIWVTRWLDDIGLPQYKDQFHESRVDRRMLQYLTVNDLLFLKVTSQLHHLSIKCAIHVLHVNKFNPHCL
HRRPADESNLSPSEVVQWSNHRVMEWLRSVDLAEYAPNLRGSGVHGGLIILEPRFTGDTLAMLLNIPPQK
TLLRRHLTTKFNALIGPEAEQEKREKMASPAYTPLTTTAKVRPRKLGFSHFGNIRKKKFDESTDYICPME
PSDGVSDSHRVYSGYRGLSPLDAPELDGLDQVGQIS

>Sample-3
MGLTSTWRYGRGPGIGTVTMVSWGRFICLVVVTMATLSLARPSFSLVEDTTLEPEDAISSGDDEDDTDGA
EDFVSENSNNKRAPYWTNTEKMEKRLHAVPAANTVKFRCPAGGNPTPTMRWLKNGKEFKQEHRIGGYKVR
NQHWSLIMESVVPSDKGNYTCVVENEYGSINHTYHLDVVERSPHRPILQAGLPANASTVVGGDVEFVCKV
YSDAQPHIQWIKHVEKNGSKYGPDGLPYLKVLKAAGVNTTDKEIEVLYIRNVTFEDAGEYTCLAGNSIGI
SFHSAWLTVLPAPGREKEITASPDYLEIAIYCIGVFLIACMVVTVILCRMKNTTKKPDFSSQPAVHKLTK
RIPLRRQVTVSAESSSSMNSNTPLVRITTRLSSTADTPMLAGVSEYELPEDPKWEFPRDKLTLGKPLGEG
CFGQVVMAEAVGIDKDKPKEAVTVAVKMLKDDATEKDLSDLVSEMEMMKMIGKHKNIINLLGACTQDGPL
YVIVEYASKGNLREYLRARRPPGMEYSYDINRVPEEQMTFKDLVSCTYQLARGMEYLASQKCIHRDLAAR
NVLVTENNVMKIADFGLARDINNIDYYKKTTNGRLPVKWMAPEALFDRVYTHQSDVWSFGVLMWEIFTLG
GSPYPGIPVEELFKLLKEGHRMDKPANCTNELYMMMRDCWHAVPSQRPTFKQLVEDLDRILTLTTNEEYL
DLSQPLEPYSPCYPDPR

>Sample-4
MEPSSETGMDPPLSQETFEDLWSLLPDPLQTVTCRLDNLSEFPDYPLAADMSVLQEGLMGNAVPTVTSCA
PSTDDYAGKYGLQLDFQQNGTAKSVTCTYSPELNKLFCQLAKTCPLLVRVESPPPRGSILRATAVYKKSE
HVAEVVKRCPHHERSVEPGEDAAPPSHLMRVEGNLQAYYMEDVNSGRHSVCVPYEGPQVGTECTTVLYNY
MCNSSCMGGMNRRPILTIITLETPQGLLLGRRCFEVRVCACPGRDRRTEEDNYTKKRGLKPSGKRELAHP
PSSEPPLPKKRLVVDDDEEIFTLRIKGRSRYEMIKKLNDALELQESLDQQKVTIKCRKCRDEIKPKKGKK
LLVKDEQPDSE

>Sample-5
MAVVIRLQGLPIVAGTMDIRHFFSGLTIPDGGVHIVGGELGEAFIVFATDEDARLGMMRTGGTIKGSKVT
LLLSSKTEMQNMIELSRRRFETANLDIPPANASRSGPPPSSGMSSRVNLPTTVSNFNNPSPSVVTATTSV
HESNKNIQTFSTASVGTAPPNMGASFGSPTFSSTVPSTASPMNTVPPPPIPPIPAMPSLPPMPSIPPIPV
PPPVPTLPPVPPVPPIPPVPSVPPMTPLPPMSGMPPLNPPPVAPLPAGMNGSGAPMNLNNNLNPMFLGPL
NPVNPIQMNSQSSVKPLPINPDDLYVSVHGMPFSAMENDVRDFFHGLRVDAVHLLKDHVGRNNGNGLVKF
LSPQDTFEALKRNRMLMIQRYVEVSPATERQWVAAGGHITFKQNMGPSGQTHPPPQTLPRSKSPSGQKRS
RSRSPHEAGFCVYLKGLPFEAENKHVIDFFKKLDIVEDSIYIAYGPNGKATGEGFVEFRNEADYKAALCR
HKQYMGNRFIQVHPITKKGMLEKIDMIRKRLQNFSYDQREMILNPEGDVNSAKVCAHITNIPFSITKMDV
LQFLEGIPVDENAVHVLVDNNGQGLGQALVQFKNEDDARKSERLHRKKLNGREAFVHVVTLEDMREIEKN
PPAQGKKGLKMPVPGNPAVPGMPNAGLPGVGLPSAGLPGAGLPSTGLPGSAITSAGLPGAGMPSAGIPSA
GGEEHAFLTVGSKEANNGPPFNFPGNFGGSNAFGPPIPPPGLGGGAFGDARPGMPSVGNSGLPGLGLDVP
GFGGGPNNLSGPSGFGGGPQNFGNGPGSLGGPPGFGSGPPGLGSAPGHLGGPPAFGPGPGPGPGPGPIHI
GGPPGFASSSGKPGPTVIKVQNMPFTVSIDEILDFFYGYQVIPGSVCLKYNEKGMPTGEAMVAFESRDEA
TAAVIDLNDRPIGSRKVKLVLG

>Sample-6
MDNKNIDPNFNPERFLETQKYKVIVTALVFLLLFIVFLMVAFKKAFFAQANMPTLVMSKQDTATRGTIYS
QDNYSLATSQTLFKLGFDTRFLNPDKEDFFIDFLSIYSNIPKKSLKDAINTKGYTILAYDLTPNTAANLR
DLNKKFLTFGVFQNFKDARDKVWQKQGLNIEVSGVSRHYPYQNSLEPIIGYVQKQEENKLTLTTGKKGVE
KSQDHLLKAQQNGIRTGKRDVSFNFIQNHSYTEVERLDGYEVYLSIPLKLQREIETLLDKAKDKLKAEEI
LVGIINPKSGEILSLASSKRFNPNAIKTSDYESLNLSVAEKVFEPGSTIKPIVYSLLLDKNLINPKERID
LNHGYYQLGKYTIKDDFVPSKKAVVEDILIQSSNVGMIKISKNLNPEDFYNGLLGYGFSQKTGIDLSLEA
TGKIPPLSAFKREVLKGSVSYGYGLNATFLQLLRAYAVFSNEGKLTTPYLVQRETAPNGDIYIPSPKPTF
QVINPKSARKMKETLIKVVRYGTGKNAQFEGLYIGGKTGTARVAKNGSYSAQSYNSSFFGFAEDERQVFT
IGVVILGSHGKEEYYASKIAAPIFKEITEILVRYNYLSPSIAIQNALEKNRFKIK

>Sample-7
MSRRKPASGGLAASSSAPARQAVLSRFFQSTGSLKSTSSSTGAADQVDPGAAAAAAAAAAAAPPAPPAPA
FPPQLPPHIATEIDRRKKRPLENDGPVKKKVKKVQQKEGGSDLGMSGNSEPKKCLRTRNVSKSLEKLKEF
CCDSALPQSRVQTESLQERFAVLPKCTDFDDISLLHAKNAVSSEDSKRQINQKDTTLFDLSQFGSSNTSH
ENLQKTASKSANKRSKSIYTPLELQYIEMKQQHKDAVLCVECGYKYRFFGEDAEIAARELNIYCHLDHNF
MTASIPTHRLFVHVRRLVAKGYKVGVVKQTETAALKAIGDNRSSLFSRKLTALYTKSTLIGEDVNPLIKL
DDAVNVDEIMTDTSTSYLLCISENKENVRDKKKGNIFIGIVGVQPATGEVVFDSFQDSASRSELETRMSS
LQPVELLLPSALSEQTEALIHRATSVSVQDDRIRVERMDNIYFEYSHAFQAVTEFYAKDTVDIKGSQIIS
GIVNLEKPVICSLAAIIKYLKEFNLEKMLSKPENFKQLSSKMEFMTINGTTLRNLEILQNQTDMKTKGSL
LWVLDHTKTSFGRRKLKKWVTQPLLKLREINARLDAVSEVLHSESSVFGQIENHLRKLPDIERGLCSIYH
KKCSTQEFFLIVKTLYHLKSEFQAIIPAVNSHIQSDLLRTVILEIPELLSPVEHYLKILNEQAAKVGDKT
ELFKDLSDFPLIKKRKDEIQGVIDEIRMHLQEIRKILKNPSAQYVTVSGQEFMIEIKNSAVSCIPTDWVK
VGSTKAVSRFHSPFIVENYRHLNQLREQLVLDCSAEWLDFLEKFSEHYHSLCKAVHHLATVDCIFSLAKV
AKQGDYCRPTVQEERKIVIKNGRHPVIDVLLGEQDQYVPNNTDLSEDSERVMIITGPNMGGKSSYIKQVA
LITIMAQIGSYVPAEEATIGIVDGIFTRMGAADNIYKGQSTFMEELTDTAEIIRKATSQSLVILDELGRG
TSTHDGIAIAYATLEYFIRDVKSLTLFVTHYPPVCELEKNYSHQVGNYHMGFLVSEDESKLDPGAAEQVP
DFVTFLYQITRGIAARSYGLNVAKLADVPGEILKKAAHKSKELEGLINTKRKRLKYFAKLWTMHNAQDLQ

>Sample-8
MEYTYQYSWIIPFIPLPVPMLIGVGLLLFPTATKNLRRMWAFPSIFLLSIVMILSVYLSIQQINRSFIYQ
YVWSWTINNDFSLEFGHLIDPLTSIMLILITTVGILVLFYSDNYMSHDQGYLRFFAYMSFFNTSMLGLVT
SSNLIQIYIFWELVGMCSYLLIGFWFTRPSAATACQKAFVTNRVGDFGLLLGILGLYWITGSFEFRDLFQ
ILNNLIYNNEVPFLFLTLCAFLLFAGAVAKSAQFPLHVWLPDAMEGPTPISALIHAATMVAAGIFLVARL
LPLFIIIPYIMNLISLIGIITVLLGATLALAQKDIKRGLAYSTMSQLGYMMLALGMGSYRAALFHLITHA
YSKALLFLGSGSIIHSMEAIVGYSPDKSQNMVLMGGLKKHVPITKTAFLVGTLSLCGIPPLACFWSKDEI
LNDSWLYSPIFAIIACSTAGLTAFYMFRIYLLTFEGHFNVHFQNYNGQKSSSCYSISLWGKEVPKTIKNH
FCLLSLLTMNNNERASFFSNKTYQIDGNGKNRIHPFITITNFVTKNTFSYPHESDNTMLFSIVILVIFTL
FVGVVGIPFAFNQEEIHLDILSKLLNPSINLLHPNSNNSVDWYEFVTNASFSVSIAFFGIFIASFLYKPI
YSSLQNLNLLNSFSKRGPNRILGDRIRNGIYDWSYNRGYIDAFYTISLTQGIRGLAELIHFLDRRVIDGI
TNGFGLTSFFFGEGIKYVGGGRISSYLLLYLLFVLIFLLIYSFLFFF
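
Before submitting the sequences above to BLASTp, it can help to confirm that each FASTA record was copied completely. The following is a minimal sketch using only the Python standard library; the function name `parse_fasta` and the two demo records are hypothetical and are not taken from the practical. It splits FASTA text into records and reports each sequence length, which can later be compared with the length of the best hit.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # Use the header text up to the first whitespace as the record name.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {n: "".join(parts) for n, parts in records.items()}


# Two short made-up records for illustration only (NOT the practical's samples).
example = """>demo-1
MKTAYIAK
QRQISF

>demo-2
MSTNPKPQRK
"""

sequences = parse_fasta(example)
for record_name, seq in sequences.items():
    print(record_name, len(seq))
```

Pasting the real samples into `example` gives the query lengths. A query shorter than the reported hit length would suggest that the sample is a fragment of the full protein.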
2. For the unknown sequence below:

>Sample
MYGGENREKRTKASRPTKDSIITRREAQSTSPHVTFVCDSEGAETSVRHSKSSDVHCGGVRLFSDETVNA
VVPNSTPVESFNGAGANYWRNMDNMVVDRLSLSMDDISVMRLRGCRATGLSQGCCGASVSTSYVLPPSLY
ASPFEQLLDIGALRGCYSYHDTGNTILNGGEDSVDNLAAAAAAVDATVTMHDVGVEMANDNDKNNNIHDD
GDTPCGVRGDRGVQTPGLKLGCAPRIFSEALSSLHLENHDNLDAMISQRPGKNAVTPPASSRPSTTSSKN
HTPAFQPFSSWKFPVLGKVDSAPAVSLQRADLIGEGEKGAWHNGFQKEVNAAAAAGGGGGGGIPGARCGA
VNCSDNGDRCGYGAGGDDDDGDNDKSVSLLEGQEYQGYKKRLRFMYAIYERHALQEGRINNNINISQRDT
NRNGSNALALHTSLQCPSPTFTTTWVPSGYYSLGTRCSIHHPNKQVVPVVSPLLNSLMSRQRDECPRSCT
VVMDPSIVALIERRPVLQTTIFASHTYRQLRRQIKQQKLQSSGERGYGPDATPFLPHVEDSTRQQDMCSG
GVAGGISNVAAREKSPLKKLWATERARRLNSKVATGTTPVAAATVAAGETSSAEPAAVPLMSREEPPNLV
HHRVLTQVNSWNSKVHTIDGINRQVDNEADDLVVYVGMTLMGWLEVVDLLGAGTFGQVFLCKDLRIANGC
FMHPMEIEGEDFQYWQCSHEYIPFSDPSIMPTHPSLVAVKVVKSRALFEQQSVLEAEMLVCIGAQTPSQK
DHGPLQNEGFGAAVHTTEPPQVDPRCNYVAKVYAHGICYGHHCIVMERYGANLFEYVQSRGFKGLPMYYI
QTIGKKILLALTLLHDECRVVHCDIKPENVLLTLDSCISTVTIHGSGGPVGSNGSGAKATLEASGVLLAS
SVRKPCLSTRLEASMSNTIDVPLPAPLPLRLHRAVPLEKVHDKTRRGETNTDGEGIGDGGPREVPSGSVI
PPLHIKLIDFSSSAYVGGCVYTYVQSRYYRAPEVIIGAGYGPPIDVWSTGCFLAELLLGLPLLPGSCDYH
QLYLMEEMLGPLPTSLLAQGRLTHDYYDAEDAEPERESTASGSSTLLKTKTKGQSSFRLLREEEYRARHG
QKQPVEWRCYFQYHTLAELVRRCMLTAEEKRMAIGCSPIASVGDISEEEVEQQKPIKTILDEMMQQRLWL
YDLLKKMLHGDPSKRPTAREALAHSFFTHTPEYAKPYLPLPE

i. Perform BLASTp to identify the organism. Go to the GenPept page (take a snapshot).
ii. Mention the accession number, locus, definition, sequence length, name of the
organism, taxonomy, title, name of the journal, BioProject, and BioSample (if
available).
iii. For the same sequence, repeat BLASTp with the PAM30 and PAM250 scoring matrices and,
for each matrix, give the scientific names of 3 closely related and 3 distant species
for this organism.
Answer 1:
Sequence 1:
• Accession number: XP_028026898
• Locus: XP_028026898
• Definition: cellular tumor antigen p53 [Bombyx mandarina]
• Sequence length: 368 aa
• Name of the organism: Bombyx mandarina
• Taxonomy: Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta;
  Pterygota; Neoptera; Endopterygota; Lepidoptera; Glossata;
  Ditrysia; Bombycoidea; Bombycidae; Bombycinae; Bombyx.
• Title: A draft genome for the African crocodilian trypanosome Trypanosoma grayi
Sequence 2:
• Accession number: NP_001338791 XP_016873932
• Locus: NP_001338791
• Definition: liprin-beta-2 isoform 1 [Homo sapiens]
• Sequence length: 876 aa
• Name of the organism: Homo sapiens
• Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
  Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
  Catarrhini; Hominidae; Homo.
• Name of the journal: Mol Syst Biol 18 (1), e10584 (2022)

Sequence 3:
• Accession number: XP_030662077
• Locus: XP_030662077
• Definition: fibroblast growth factor receptor 2 isoform X9 [Nomascus leucogenys]
• Sequence length: 717 aa
• Name of the organism: Nomascus leucogenys
• Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
  Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
  Catarrhini; Hylobatidae; Nomascus.
Sequence 4:
• Accession number: MBN3310676
• Locus: MBN3310676
• Definition: P53 protein, partial [Amia calva]
• Sequence length: 321 aa
• Name of the organism: Amia calva
• Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
  Actinopterygii; Neopterygii; Holostei; Amiiformes; Amiidae; Amia.
• Title: P53 protein, partial [Amia calva]
• Name of the journal: Cell (2021) In press

Sequence 5:
• Accession number: XP_016802591
• Locus: XP_016802591
• Definition: RNA-binding protein 12 isoform X4 [Pan troglodytes]
• Sequence length: 1466 aa
• Name of the organism: Pan troglodytes
• Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
  Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
  Pan.
• Title: RNA-binding protein 12 isoform X4 [Pan troglodytes]
Sequence 6:
• Accession number: WP_202170786
• Locus: WP_202170786
• Definition: penicillin-binding protein 2 [Helicobacter pylori]
• Sequence length: 615 aa
• Name of the organism: Helicobacter pylori
• Taxonomy: Bacteria; Campylobacterota; Epsilonproteobacteria;
  Campylobacterales; Helicobacteraceae; Helicobacter.
Sequence 7:
• Accession number: NP_002430
• Locus: NP_002430
• Definition: DNA mismatch repair protein Msh3 [Homo sapiens]
• Sequence length: 1137 aa
• Name of the organism: Homo sapiens
• Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
  Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
  Catarrhini; Hominidae; Homo.
• Name of the journal: Fam Cancer 22 (1), 49-54 (2023)
Sequence 8:
• Accession number: YP_010226126
• Locus: YP_010226126
• Definition: NADH dehydrogenase subunit 5 (chloroplast) [Eucalyptus albopurpurea]
• Sequence length: 753 aa
• Name of the organism: Eucalyptus albopurpurea
• Taxonomy: Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
  Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae; Pentapetalae; rosids;
  malvids; Myrtales; Myrtaceae; Myrtoideae; Eucalypteae; Eucalyptus.
• Name of the journal: Tree Genet Genomes 17 (6) (2021) In press
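
The GenPept pages for the accession numbers reported in Answer 1 can also be retrieved as plain text through the NCBI E-utilities `efetch` endpoint. The sketch below is illustrative only: the helper names `genpept_url` and `fetch_genpept` are hypothetical, and the network call is defined but deliberately not executed here. It assumes the standard `db=protein`, `rettype=gp`, `retmode=text` query parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genpept_url(accession):
    """Build the efetch URL that returns one protein record in GenPept format."""
    params = {"db": "protein", "id": accession, "rettype": "gp", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_genpept(accession, timeout=30):
    """Download the GenPept flat file as text (requires network access)."""
    with urlopen(genpept_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Three of the accession numbers reported in Answer 1.
accessions = ["XP_028026898", "NP_001338791", "NP_002430"]
urls = [genpept_url(acc) for acc in accessions]
for url in urls:
    print(url)
```

Calling `fetch_genpept("NP_002430")` on a machine with internet access should return the same record that the GenPept page shows, which makes it easier to copy fields without transcription errors.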
Answer 2:
i.

ii.
• Accession number: XP_009309175.1
• Locus: XP_009309175
• Definition: protein kinase [Trypanosoma grayi]
• Sequence length: 1232 aa
• Name of the organism: Trypanosoma grayi
• Taxonomy: Eukaryota; Discoba; Euglenozoa; Kinetoplastea; Metakinetoplastina;
  Trypanosomatida; Trypanosomatidae; Trypanosoma.
• Title: protein kinase [Trypanosoma grayi]
• Name of the journal: National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA
• BioProject: PRJNA258390
• BioSample: SAMN02726834
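
The fields listed above come from the header of the GenPept flat file, so they can be extracted with a small parser instead of being copied by hand. The following is a rough sketch, not a full GenPept parser. The record text is an abridged header assembled from the values reported in this answer (a real record also contains dates, REFERENCE, FEATURES, and ORIGIN sections), and the function names are hypothetical.

```python
# Abridged GenPept header built from the values reported above (illustrative).
RECORD = """\
LOCUS       XP_009309175            1232 aa
DEFINITION  protein kinase [Trypanosoma grayi].
ACCESSION   XP_009309175
VERSION     XP_009309175.1
DBLINK      BioProject: PRJNA258390
            BioSample: SAMN02726834
SOURCE      Trypanosoma grayi
  ORGANISM  Trypanosoma grayi
            Eukaryota; Discoba; Euglenozoa; Kinetoplastea; Metakinetoplastina;
            Trypanosomatida; Trypanosomatidae; Trypanosoma.
"""


def parse_genpept_header(text):
    """Group header lines by keyword; returns keyword -> list of value lines."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[:10].strip():
            # Keyword lines start within the first 10 columns.
            key, _, value = line.strip().partition(" ")
            fields[key] = [value.strip()]
        elif key is not None:
            # Deeper-indented lines continue the previous keyword.
            fields[key].append(line.strip())
    return fields


def summarize(fields):
    """Pull out the items the practical asks for."""
    locus_parts = fields["LOCUS"][0].split()
    organism_lines = fields["ORGANISM"]
    dblink = dict(item.split(": ", 1) for item in fields.get("DBLINK", []))
    lineage = " ".join(organism_lines[1:]).rstrip(".")
    return {
        "accession": fields["VERSION"][0],
        "locus": locus_parts[0],
        "length_aa": int(locus_parts[1]),
        "definition": " ".join(fields["DEFINITION"]).rstrip("."),
        "organism": organism_lines[0],
        "taxonomy": [taxon.strip() for taxon in lineage.split(";")],
        "bioproject": dblink.get("BioProject"),
        "biosample": dblink.get("BioSample"),
    }


info = summarize(parse_genpept_header(RECORD))
for field_name, value in info.items():
    print(f"{field_name}: {value}")
```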

iii.
• PAM30
  Closely related:
  1. Trypanosoma theileri
  2. Trypanosoma cruzi
  3. Trypanosoma conorhini
  Distant:
  1. Mortierella sp. AM989
  2. Mortierella sp. AD032
  3. Bodo saltans
• PAM250
  Closely related:
  1. Trypanosoma theileri
  2. Trypanosoma cruzi
  3. Trypanosoma melophagium
  Distant:
  1. Micromonas pusilla CCMP1545
  2. Amborella trichopoda
  3. Mortierella antarctica
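
One way to choose the closely related and distant species from a BLASTp hit table is to rank the hit organisms by percent identity to the query. The sketch below assumes that percent identity is an acceptable proxy for relatedness, which the practical does not state. The function name, the placeholder species names, and the identity values are all hypothetical and are not results from this exercise.

```python
def pick_close_and_distant(hits, query_organism, n=3):
    """Return (closest, most_distant) organism names from (organism, identity) hits."""
    best = {}
    for organism, identity in hits:
        if organism == query_organism:
            continue  # the query's own species is not a "related" species
        # Keep only the highest-identity hit for each organism.
        if organism not in best or identity > best[organism]:
            best[organism] = identity
    ranked = sorted(best, key=best.get, reverse=True)
    closest = ranked[:n]
    most_distant = ranked[-n:][::-1]  # lowest identity first
    return closest, most_distant


# Placeholder hit table: names and identities are invented for illustration.
hits = [
    ("Query species", 100.0),
    ("Species A", 82.5),
    ("Species B", 79.0),
    ("Species A", 60.0),
    ("Species C", 75.1),
    ("Species D", 41.2),
    ("Species E", 33.0),
    ("Species F", 28.4),
]

closest, most_distant = pick_close_and_distant(hits, "Query species")
print("Closely related:", closest)
print("Distant:", most_distant)
```

With fewer than six distinct organisms in the table the two lists overlap, so in practice the hit list should be long enough to reach clearly divergent taxa. Running the selection once on the PAM30 hit table and once on the PAM250 hit table gives the two answers the task asks for.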
