... becomes much more ...
- If the E-value is between 1 x 10^-2 and 1, the hit has a slight possibility of being related to the query. This may indicate a distant evolutionary relationship. In order to conclude that the query sequence was either terrestrial or extra-terrestrial, more analysis than BLAST would be required.
- If the E-value is above 1, the query is not very closely related to any sequence in the database. If E-values in this range are obtained in the game, it can be concluded that the sample sequence is very possibly of extra-terrestrial origin. This conclusion can also be made when no matches are found at all.
7. For the search performed in this tutorial, the result gives a top hit with an E-value of 1 x 10^-132, which is an exact match to the query sequence. Sometimes exact and closely matched hits will have E-values of 0.
8. The last large section of the results gives a pairwise alignment of the similar regions of the query sequence and each hit. After the database accession number and name of the hit sequence, this section gives the score, the E-value, the percentage of amino acids or nucleotides that are identical and, for proteins, the percentage that are chemically similar (positives). The section is useful in determining if the query has an exact match in the database. If the percent identity of the first hit and the query sequence is 100%, then the sample sequence is in the database and the sample is definitely terrestrial contamination.
9. The very end of the result report gives information on the scoring matrix and gap penalties used, the number of sequences queried in the database and so on.
10. The query run for this tutorial results in a large number of full length or near full length red and pink hits shown in the graphical overview. The pairwise alignment with the top scoring hit shows a 100% identity with the query sequence. This leads to the conclusion that the query sequence is a Bacillus cereus beta-lactamase present in the nr database. If these results were obtained in the game, terrestrial contamination would be evident.

More Advanced Information
The information in this section is separated so as not to confuse the beginner, but will be useful and even necessary to the players in level 4.

Performing a Search
1. Go to the BLAST page again. This time choose the link to the standard nucleotide-nucleotide BLAST and run a blastn search against the nr database with the unknown DNA sequence given in this section. This time, however, leave the 'Choose filter' box checked so that the sequence will be filtered for low complexity regions. Leave all of the other parameters at default as above. The filter option helps ensure that no false positive results are obtained due to short sequences that are very common across the spectrum of biological sequences. Be aware, however, that if the sequence is filtered, sometimes regions are X'ed out and an exact match sequence in the database may show only around 95% identity even though it is the same sequence.

Interpreting Results
1. For the more complicated hits, the length of the query is important. If the query sequence is short (less than 100 nucleotides or amino acids long), the top E-values may be larger than 1 x 10^-50 even if there is an exact match. Be sure to check the identity of the top hits, not just the E-values. Hits with low E-values that only have similarity to short regions of the query sequence are more likely to indicate that the sequences have motif or functional domain similarities rather than that they represent related genes or proteins. This is very likely the case when all of the matches are from sequences whose names and descriptions do not seem to indicate that the hits are in any way related to each other. Hits with higher E-values, in the range of 1 x 10^-50 to 1 x 10^-5, may still indicate that the query and hits are related if the hit has at least a 35% identity with the query over at least 80% of its length. Another indication of this is if several of the hits have names and descriptions that indicate that they are related to each other. These should definitely be studied further.
2. Always look at the names/descriptions of the hits returned by BLAST. The higher the percentage of hits that seem to be related to each other, the more likely it is that the query is also related to them. But remember, if the similarity is only in small regions, this is more likely to indicate related functional domains rather than related proteins/gene products. If only a small region of a protein has some similarity to a few sequences in the database, it is a possibility that this resulted from convergent evolution (unrelated sequences evolving similar structures for similar functions) and the query might still be extra-terrestrial. This type of BLAST result would require much more study before any conclusion could be reached.
3. The blastn search performed on the DNA sequence in this section results in some hits with mid-range E-values and the rest with higher ones. The top hits have over 90% identity to more than 80% of the length of the query sequence. And, all of the top matches are cat cytochrome b sequences, with many of the remaining matches coming from cytochrome b sequences of other mammalian species. Even though there is no exact match in the database, these results indicate that the sample sequence is almost surely terrestrial. Other available tools should be used to analyse the sequence to build a body of supporting evidence for this conclusion.
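
The rules of thumb in this tutorial (the E-value ranges and the "at least 35% identity over at least 80% of the length" guideline) can be sketched as a small Python helper. This is only an illustration, not part of BLAST: the function names are hypothetical, the handling of values that fall exactly on a range boundary is an assumption, and "its length" is assumed here to mean the length of the hit sequence.

```python
def classify_e_value(e_value):
    """Map an E-value to the tutorial's rule-of-thumb category.

    Boundary handling (which range owns 1e-50, 1e-2 and 1) is an
    assumption; the tutorial only names the ranges.
    """
    if e_value < 0:
        raise ValueError("E-values cannot be negative")
    if e_value < 1e-50:
        return "very similar; very likely related"
    if e_value <= 1e-2:
        return "some similarity; may be related"
    if e_value <= 1:
        return "slight possibility of being related"
    return "not closely related"


def may_still_be_related(e_value, percent_identity, aligned_length, hit_length):
    """Apply the guideline for mid-range hits.

    A hit with an E-value between 1e-50 and 1e-5 is flagged when it shows
    at least 35% identity over at least 80% of the hit's length
    (assumed interpretation of "its length").
    """
    coverage = 100.0 * aligned_length / hit_length
    in_mid_range = 1e-50 <= e_value <= 1e-5
    return in_mid_range and percent_identity >= 35 and coverage >= 80


if __name__ == "__main__":
    # The tutorial's top protein hit had an E-value of 1 x 10^-132.
    print(classify_e_value(1e-132))
    print(may_still_be_related(1e-10, percent_identity=40,
                               aligned_length=90, hit_length=100))
```

A category from this helper is only a starting point; as the tutorial stresses, the names and descriptions of the hits and the pairwise alignments still need to be checked by eye.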
>unknown DNA
CAGTCTAGTTCAAACTTACAAATCCTCAGAGTCCTCTTTTCGGCCATACACTT
.... CACATCGGAAACATTAA

The five flavors of BLAST perform the following tasks:
blastp compares an amino acid query sequence against a protein sequence database;
blastn compares a nucleotide query sequence against a nucleotide sequence database;
blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database;
tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands);
tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
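
To make the "six-frame translation" used by blastx, tblastn and tblastx more concrete, here is a minimal Python sketch. It is not how BLAST itself is implemented; it assumes the standard genetic code, the function names are hypothetical, and any codon containing a character other than A, C, G or T is written as X.

```python
# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translate three reading frames on each of the two strands."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCC").items():
        print(name, protein)
```

Each of the six protein strings is what a translated search would compare against the protein (or translated) database, which is why these programs can find protein-level similarity that a plain blastn search misses.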
HOW TO BLAST

Performing a Search
1. Go to the NCBI BLAST page. This page lists all of the BLAST related tools/programs available from NCBI. For this exercise click on the link to the standard protein-protein BLAST [blastp]. Please look over the page to become familiar with the information available and the data required for a search.
2. There are quite a few links on this page to information about BLAST and to explanations of the data required or the program settings. These links are a good resource and should be reviewed.
3. Paste the following FASTA format of the sequence into the large data entry field.

>unknown protein
MKNTLLKLGVCVSLLGITPFVSTISSVQAERTVEHKVIKNETGTISISQLNKNV
WVHTELGYFSGEAVPSNGLVLNTSKGLVLVDSSWDDKLTKELIEMVEKKFKKR
WDVIITHAHADRIGGMKTLKERGIKAHSTALTAELAKKNGYEEPLGDLQSWN
LKFGNMKVETFYPGKGHTEDNIVVWLPQYQILAGGCLVKSASSKDLGNVADA
YVNEWSTSIENVLKRYGNINLVVPGHGEVGDRGLLLHTLDLLK

4. The boxes immediately below the sequence entry box allow the selection of only a part of the entered sequence as the query for the search. For the purposes of this game, nothing should be entered in these fields.
5. The next field is a drop-down menu that allows the selection of the database to be searched. The non-redundant database (nr) is the default setting, but the new entries for the last month, ESTs, a specific organism, or many others may also be selected. Leave the choice as nr for the game because it is the largest and most comprehensive database for BLAST to search.
6. The next check box allows the search of a conserved domains database. You may leave this checked, but the game will not use this function. Domain searches will be performed using BLOCKS and ProfileScan.
7. For basic searches, the only other option that should be changed is the filtering. Uncheck the box under the next section (Options) next to the words 'Choose filter.' For game purposes, the sequences should not normally be filtered. For information on situations when this box should be checked (only sometimes in levels 3 and 4), see the "More Advanced Information" section.
8. Click on the "BLAST!" button to run the blastp search.
9. A new page will appear with the ID number of the search and the approximate wait time. Click on the "Format!" button and wait. The results will be returned when the search is complete.

Interpreting Results
1. The BLAST results page begins with the program version used, the reference for BLAST, the name and length of the query sequence, the database searched, and contact information. It is best to pay attention to this information so you will know that the proper query, database and program were run.
2. Next can be found the number of hits on the query sequence (i.e. the number of sequences in the database that have some similarity to the query sequence) and a graphical overview of the alignment of the hits to the query.
3. The long red line near the top of the graphical overview represents the length of the query sequence. Each hit from the database is represented by a line below the query that is colored to represent its score. The multicolored bar at the top of the graphical overview is a legend with the different colors representing different similarity score ranges. Red is used for hits with a score over 200 (good similarity) and so on.
4. The graphical overview shows the relative position of similar regions in each hit and the query and how big these regions of similarity are. Moving the mouse cursor over each hit line will bring up that hit sequence's name in the text box above the graphic.
5. The graphical overview can be a very useful preliminary tool in the game to help determine if the sample (query) sequence is of terrestrial origin or not. However, do not make your determination only with this section of the results. If the query has hits that show up as long similar regions and are red and/or pink, it is probable that the query is terrestrial. If the query has hits that are short and black or blue, it is possible that the query is extra-terrestrial. If the query has short red or pink hits or hits that are in the middle of the range, more analysis than BLAST will need to be done to make an origin determination.
6. The next two sections of the results are more important; they show scoring and pairwise alignments. The section below the graphical overview ranks the hits by score. For each hit it gives a link by the accession number to the hit sequence in the database, the name of the hit sequence, the BLAST score, and the Expect value or E-value. The E-value is a statistical calculation based on the score that gives the number of hits with this score that this search would return by chance using a database of this size. I.e., if the E-value is 1, it is likely that you would get one chance hit with this score to the query using the particular database that was searched. The following general conclusions can be drawn from the E-values:
- If the E-value is less than 1 x 10^-50, the hit is very similar to the query sequence and is very likely to be evolutionarily related. If E-values in this range are obtained, the sequence can be assumed to be terrestrial contamination, especially if the names/descriptions of many of the top hits indicate that the hits are related to each other. However, at the higher game levels when more tools are available for use, the sample sequence should be analyzed with all tools to show corroborating evidence for the conclusion.
- If the E-value is between 1 x 10^-50 and 1 x 10^-2, the hit has some similarity to the query sequence and may be related. When E-values in this range are obtained, the game sequence may be found to be terrestrial contamination, but further analysis will be needed. These values can indicate that the sample sequence is in the same family as the hit or it may have closely related functional domains. If the top hits all