Chapter 8 BLASTX
Although the DNA sequences in Fig 8-1 show only 20% identity, the amino acid sequences they encode are 100% identical. A DNA match this poor is usually not considered significant and would not be picked up by a BLASTN search. The same match would, however, be significant in an alignment of the protein sequences. We therefore need to search for protein sequences that are similar to the protein encoded by your DNA.
iii) Which protein sequence should you use?
To perform the search, we need to determine the protein sequence encoded by your DNA. Remember, however, that there are six possible reading frames for your DNA: three on the top strand, read 5′ to 3′ as your DNA is written (the +1, +2, and +3 reading frames, shown in green), and three on the bottom strand, read in the opposite direction (the -1, -2, and -3 reading frames) (Fig 8-2). It may be possible to determine which direction (+ or -) is correct. For example, if there is a poly-A run at the end of the SP sequence, then the correct open reading frame must be +1, +2, or +3, and the sequence as written is the non-template strand of the DNA (Fig 8-3). In contrast, if the sequence begins with a poly-T run (corresponding to a poly-A sequence on the blue DNA strand), then the correct open reading frame must be -1, -2, or -3 (Fig 8-4), and the red sequence is the template strand of the gene.
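The point of Fig 8-1 can be reproduced with a small Python sketch. The two 18-base sequences below are hypothetical, made up for illustration (they are not the Fig 8-1 sequences). Each uses different synonymous codons for leucine, serine, and arginine, whose codons can differ at the first and second positions as well as the third. The codon table is the standard genetic code; `translate` and `percent_identity` are illustrative helper names, not part of any library.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with bases
# in the order T, C, A, G, first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical coding sequences: same peptide, different synonymous codons.
dna1 = "TTAAGTAGATCGCGTCTA"
dna2 = "CTGTCCCGCAGCAGGTTG"

prot1, prot2 = translate(dna1), translate(dna2)
print(f"DNA identity:     {percent_identity(dna1, dna2):.0f}%")
print(f"Protein identity: {percent_identity(prot1, prot2):.0f}%  ({prot1} vs {prot2})")
```

Both sequences translate to LSRSRL even though only 4 of their 18 bases match, the kind of case where a BLASTN search finds nothing but a protein-level search does.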
Fig 8-3. Possible direction of the ORF if the DNA sequence ends with a run of poly-As.
Fig 8-4. Possible direction of the ORF if the DNA sequence begins with a run of poly-Ts.
If there is no poly-A run, the open reading frame could be in either direction, and you would have to look for a long open reading frame among all six possible reading frames. However, if your clone does not contain the complete coding region of the gene and consists mainly of the 3′ UTR, it may contain only a short stretch of the real open reading frame, too short to be identified by ORF analysis. Because of these potential problems, we suggest that you search with the protein sequences derived from all six reading frames of your cDNA. Although this sounds like a lot of work, it is actually simple with the blastx program, which translates a DNA sequence in all six reading frames and then uses each translation to search the protein databases.
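As a rough illustration of the translation step that blastx performs before searching, here is a minimal Python sketch that translates a DNA sequence in all six frames and applies the poly-A/poly-T rule described earlier. Several choices are assumptions, not BLAST behavior: the minus frames are numbered by their offset (0, 1, 2) from the start of the reverse complement, one common convention that other tools may not share; the 10-base cutoff for calling a poly-A or poly-T run is arbitrary; and the "long open reading frame" is simplified to the longest stretch without a stop codon, whereas a stricter ORF finder would also require a start codon. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """The bottom strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; unknown codons become X, stops become *."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translations of frames +1..+3 (top strand) and -1..-3 (reverse complement)."""
    top, bottom = dna.upper(), reverse_complement(dna)
    frames = {}
    for strand, seq in (("+", top), ("-", bottom)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def likely_strand(dna: str, min_run: int = 10) -> str:
    """'+' if the sequence ends in poly-A, '-' if it starts with poly-T, else 'either'."""
    dna = dna.upper()
    if dna.endswith("A" * min_run):
        return "+"
    if dna.startswith("T" * min_run):
        return "-"
    return "either"


def longest_orf_stretch(protein: str) -> str:
    """Longest run of residues not interrupted by a stop (*)."""
    return max(protein.split("*"), key=len)


cdna = "ATGGCCTAA"  # hypothetical toy sequence
print("strand:", likely_strand(cdna))
for frame, protein in six_frames(cdna).items():
    print(frame, protein, longest_orf_stretch(protein))
```

blastx itself goes one step further and uses each of these six translations as a query against the protein databases, so a short but genuine stretch of coding sequence can still produce a hit.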
Fig 8-7. Graphic view of BLASTX with EX2.
Why are there so many good matches with the EX2 blastx search when there were only a few with the blastn search? The answer lies in the degeneracy of the genetic code. Although the base sequence of a gene may change as two species diverge, if there are strong requirements for a particular amino acid at each position in the protein, then the protein sequence will stay the same. The large number of strong matches indicates that the protein sequence is conserved between different species even though the DNA sequence is not.
Scrolling further down the EX1 BLASTX report shows the list of matches (Fig 8-8). The first three matches are to protein sequences from Sphaerius sp., Dascillus cervinus, and Carabus granulatus, which are all beetles. These sequences come from the same study that produced the Sphaerius sp. DNA sequence found in the BLASTN search. The fourth sequence is from a different study but is also a beetle gene, from Tribolium castaneum, commonly known as the red flour beetle. This indicates that the Artemia gene is most closely related to genes from beetles. Scanning further down the list shows matches to other insects such as Papilio dardanus (African swallowtail butterfly), Drosophila melanogaster (fruit fly), and Bombyx mori (domestic silkworm). Further down the list are matches to Xenopus laevis (African clawed frog), Mus musculus (house mouse), Pongo pygmaeus (orangutan), and Equus caballus (horse). All of these matches have low E-values (less than 3e-13), suggesting that they are significant. This result indicates that the protein encoded by the gene we isolated is strongly conserved in a wide range of organisms.
Fig 8-8. List of BLASTX EX1 matches.
The alignment of the predicted Artemia sequence with the Sphaerius protein sequence shows that the two are 57% identical and 77% similar, where similar positions include both identical and conserved amino acids (Fig 8-9). Conserved positions are those where the two amino acids have similar chemical properties, such as glutamate (E) vs aspartate (D), lysine (K) vs arginine (R), or isoleucine (I) vs leucine (L). BLAST counts such a pair as similar (a "positive") when it has a positive score in the substitution matrix.
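The identity and similarity percentages in the Fig 8-9 header can be recomputed from the alignment rows themselves. The minimal Python sketch below copies the Query, midline, and Sbjct rows of that alignment (its two blocks joined end to end) and counts them: a letter in the midline is an identity, a + is a similar (positive) substitution, and a - in either sequence is a gap. The helper name `alignment_stats` is hypothetical.

```python
# Rows of the Fig 8-9 alignment, with its two blocks joined end to end.
# Query = translated Artemia EX1 (frame +3); Sbjct = Sphaerius CAJ01895.1.
QUERY = ("IMQIHLRGSDSSTQVINCDEGDCVIALKEQVAALEGVKVSEVRLFANGTPLTEDIPLNGI"
         "QDT-IDFSVPLLGGKVHGSLARAGKVKGQTPkvdkqekkkkktgrckRRIQYNRRFVNV")
MIDLINE = ("++Q+H+RG   S  V++C+  + +  +K+++AALE VK  ++ L+A GTP+ +D  ++".ljust(60)
           + "    +D ++PLLGGKVHGSLARAGKVK QTPKV+KQEKKKKKTGR KRRIQYNRRFVNV")
SBJCT = ("MIQLHIRGQ--SQHVLDCNGDEKIGQIKDRIAALENVKAKDICLYAEGTPVEDDSVVSAF"
         "ASVDLDLNIPLLGGKVHGSLARAGKVKQQTPKVEKQEKKKKKTGRAKRRIQYNRRFVNV")


def alignment_stats(query: str, midline: str, sbjct: str) -> dict[str, tuple[int, int]]:
    """Return (count, percent) for identities, positives and gaps.

    Midline convention: a letter marks an identical residue, '+' a positive-scoring
    (similar) substitution, and a space anything else. Percentages are truncated to
    whole numbers, which matches the values printed in the Fig 8-9 header.
    """
    if not len(query) == len(midline) == len(sbjct):
        raise ValueError("alignment rows must have equal length")
    length = len(query)
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")  # positives include identities
    gaps = query.count("-") + sbjct.count("-")
    counts = {"identities": identities, "positives": positives, "gaps": gaps}
    return {name: (n, 100 * n // length) for name, n in counts.items()}


for name, (n, pct) in alignment_stats(QUERY, MIDLINE, SBJCT).items():
    print(f"{name}: {n}/{len(QUERY)} ({pct}%)")
```

Running it gives 68/119 (57%) identities, 92/119 (77%) positives, and 3/119 (2%) gaps, matching the BLAST report. Because the positives count includes the identities, similarity is always at least as high as identity.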
>gi|69608657|emb|CAJ01895.1| ubiquitin/ribosomal protein S30e fusion protein [Sphaerius sp. APV-2005]
Length=131

Score = 103 bits (258), Expect = 2e-21
Identities = 68/119 (57%), Positives = 92/119 (77%), Gaps = 3/119 (2%)
Frame = +3

Query  63   IMQIHLRGSDSSTQVINCDEGDCVIALKEQVAALEGVKVSEVRLFANGTPLTEDIPLNGI  242
            ++Q+H+RG   S  V++C+  + +  +K+++AALE VK  ++ L+A GTP+ +D  ++
Sbjct  1    MIQLHIRGQ--SQHVLDCNGDEKIGQIKDRIAALENVKAKDICLYAEGTPVEDDSVVSAF  58

Query  243  QDT-IDFSVPLLGGKVHGSLARAGKVKGQTPkvdkqekkkkktgrckRRIQYNRRFVNV  416
                +D ++PLLGGKVHGSLARAGKVK QTPKV+KQEKKKKKTGR KRRIQYNRRFVNV
Sbjct  59   ASVDLDLNIPLLGGKVHGSLARAGKVKQQTPKVEKQEKKKKKTGRAKRRIQYNRRFVNV  117
Fig 8-9. Alignment of the best BLASTX match to EX1. A + indicates similar amino acids at that position in the two sequences. The lowercase residues kvdkqekkkkktgrck in the Query (shown in gray in the BLAST report) were removed from the analysis by the low-complexity filter.
Repeating the search without the low-complexity filter gives similar matches; however, the E-value for this match is then 3e-31 instead of 2e-21. Notice the one-residue gap in the predicted Artemia sequence. Unlike in DNA alignments, such gaps do not strongly decrease the significance of protein alignments. These gaps often fall in flexible loops on the surface of the protein, and the addition or loss of such loops may not significantly alter the overall fold or function of the protein. To find out more about the protein, click the accession link, which opens the report page for the sequence record. This page lists a number of papers that have investigated the function of this protein. We will discuss this and other information on the protein in Chapter 10.
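Why does switching off the filter lower the E-value so much? BLAST converts a bit score S′ into an E-value with E = m·n·2^(−S′), where m·n is the effective search space (roughly, query length times database length). Every extra bit halves E, so the residues that the filter had removed raise the score when they are included and drive E down exponentially. The Python sketch below back-calculates the search space implied by the reported 103 bits and E = 2e-21, and estimates how many extra bits would correspond to E = 3e-31. It assumes the search space is the same in both searches, and because reported bit scores are rounded, the results are approximations. The function names are hypothetical.

```python
import math


def evalue_from_bits(bit_score: float, search_space: float) -> float:
    """Expected number of chance hits scoring at least this well: E = m*n * 2**(-S')."""
    return search_space * 2.0 ** (-bit_score)


def implied_search_space(bit_score: float, evalue: float) -> float:
    """Invert the relation above to recover m*n from a reported score and E-value."""
    return evalue * 2.0 ** bit_score


# Values reported for the Sphaerius match with the filter on (Fig 8-9).
space = implied_search_space(103, 2e-21)
print(f"implied effective search space m*n: {space:.2e}")

# One more bit halves E.
print(f"E at 104 bits: {evalue_from_bits(104, space):.1e}")

# Extra bits that would take E from 2e-21 to the unfiltered 3e-31,
# assuming an unchanged search space (an approximation).
extra_bits = math.log2(2e-21 / 3e-31)
print(f"extra bits needed for E = 3e-31: {extra_bits:.1f}")
```

The estimate of roughly 33 extra bits is consistent with the 16 filtered residues, most of which are identical between the two sequences, contributing substantially to the unfiltered score.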