
FACULTY OF BIOTECHNOLOGY AND BIOMOLECULAR SCIENCE

BACHELOR OF SCIENCE IN BIOCHEMISTRY WITH HONOURS


BSM4301 - BIOINFORMATICS

SEMESTER 2 2022/2023

LAB REPORT 2

NAME RANI ANAK MAT

MATRIC NUMBER 212111

PRACTICAL 4

TITLE ADVANCED BLAST

LECTURER’S NAME PROF. MADYA DR. MAS JAFFRI BIN MASARUDIN

DATE 18.05.2023
1.0 OBJECTIVES

2. To determine the distantly related sequences.


3. To utilize a combination of pattern matching and local alignment to reduce the
probability of false positives.

2.0 RESULTS

2.1 POSITION-SPECIFIC ITERATED BLAST (PSI-BLAST)

Figure 1 shows the five iterations taken to remove all false positives

2.2 PATTERN-HIT INITIATED BLAST (PHI-BLAST)

IIAHSQG [IL]I[AG]HS[QMH]G

Figure 2 shows the segment of alignment containing PHI pattern

3.0 DISCUSSION

A position-specific scoring matrix (PSSM), or profile, is generated by PSI-BLAST
(Position-Specific Iterated Basic Local Alignment Search Tool) from the multiple sequence
alignment of sequences that score above a specified threshold. This PSSM is updated
for subsequent rounds with the newly discovered sequences and is used to explore the database
further for additional matches (Bhagwat & Aravind, 2007). Consequently, PSI-BLAST offers
a method for detecting distant relationships between proteins. In Practical 4.1, PSI-BLAST
was used to find proteins distantly related to the query protein, retinol-binding protein
(NP_000509). To build the profile, or position-specific scoring matrix (PSSM), PSI-BLAST
first constructs a multiple alignment from the highest-scoring pairs of the initial BLASTp
run that pass a predetermined score or E-value threshold. The PSSM encodes the conservation
pattern as a matrix of scores for each position in the alignment, giving highly conserved
positions high scores and weakly conserved positions scores near zero.
Redundant sequences, those with 94% identity in pairwise alignment, are removed after each
iteration, and a new scoring matrix is built for each iteration. Hits with E-values better
(lower) than the inclusion threshold are treated as significant matches to the
retinol-binding protein and are used to build the PSSM, whereas hits with E-values worse
(higher) than the threshold are more likely to be false positives. The E-values and the
likelihood of false positives decrease as the iteration number increases. The main cause of
false positives is the erroneous inclusion of sequences unrelated to the query, whose
contribution to the PSSM is then amplified in later iterations. The retinol-binding protein
query was run for five iterations, after which no additional sequences passed the 0.005
threshold, indicating that the search had converged without corruption. The profile is
considered corrupted if, after five iterations, at least one false positive has an E-value
less than 10^-4. To prevent corruption, the filter for biased-composition regions can be
applied, the E-value threshold can be lowered from 0.005, and suspicious hits can be excluded
by unchecking the box next to them.
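
The way a PSSM rewards conserved positions can be illustrated with a minimal Python sketch.
This is not PSI-BLAST's actual procedure: it assumes a uniform background frequency for the
20 amino acids, a simple add-one pseudocount, no sequence weighting, and a small hypothetical
gap-free alignment. The function names build_pssm and score_sequence are illustrative only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a toy log-odds PSSM from equal-length, gap-free aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            # Log-odds score: positive if the residue is enriched in this column.
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm


def score_sequence(pssm, sequence):
    """Sum the position-specific scores of a sequence with the PSSM's length."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


# Hypothetical toy alignment; column 2 (index 1) is fully conserved as I.
toy_alignment = ["IIAHSQG", "LIGHSMG", "IIAHSHG", "LIAHSQG"]
toy_pssm = build_pssm(toy_alignment)

print(round(toy_pssm[1]["I"], 2))  # fully conserved column: highest score for I
print(round(toy_pssm[0]["I"], 2))  # column shared by I and L: lower score for I
print(round(toy_pssm[1]["W"], 2))  # residue never observed: negative score
```

In each PSI-BLAST iteration, a matrix of this kind is rebuilt from the hits that pass the
inclusion threshold, which is why a single unrelated sequence admitted early can distort the
scores used in later rounds.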

In Practical 4.2, PHI-BLAST was used to search protein sequences using a combination of
pattern matching and local alignment to reduce the probability of false positives. PHI-BLAST
searches a protein database for sequences that contain a given regular expression pattern and
are also homologous to the query in the region around the pattern. It therefore combines a
regular expression match with a local alignment anchored at that match. As a result,
PHI-BLAST distinguishes genuine homologous sequences from random hits more successfully than
a standard BLAST search. The pattern syntax follows the PROSITE rules. To locate protein
sequences homologous to a query containing this pattern, the PHI pattern
"[IL]I[AG]HS[QMH]G" was employed. Square brackets ([]) enclose the set of amino acids allowed
at a single position: [IL] denotes a single residue that is either I or L, and [AG] and [QMH]
are read in the same way. Residues written outside brackets (I, H, S and G) must match
exactly.
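
The pattern-matching half of PHI-BLAST can be sketched in Python, because this pattern is
already a valid regular expression. The helper below is a simplified illustration, not
PHI-BLAST's implementation: it handles only a small subset of PROSITE syntax (dashes as
separators, x as a wildcard, and (n) or (n,m) repeat counts), and the test sequences are
hypothetical. The local alignment step that PHI-BLAST performs around each match is not shown.

```python
import re


def phi_pattern_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    regex = pattern.replace("-", "")   # dashes only separate positions
    regex = regex.replace("x", ".")    # x stands for any residue
    # Repeat counts: (n) -> {n}, (n,m) -> {n,m}
    regex = re.sub(r"\((\d+(?:,\d+)?)\)", r"{\1}", regex)
    return regex


PHI_PATTERN = "[IL]I[AG]HS[QMH]G"
PHI_REGEX = re.compile(phi_pattern_to_regex(PHI_PATTERN))


def find_pattern_hits(sequence):
    """Return (start, matched_segment) for each non-overlapping pattern match."""
    return [(m.start(), m.group()) for m in PHI_REGEX.finditer(sequence)]


# Hypothetical sequence containing the segment reported in Section 2.2.
print(find_pattern_hits("MKTIIAHSQGLV"))  # [(3, 'IIAHSQG')]
# H at the fifth pattern position is not allowed (S is required), so no hit.
print(find_pattern_hits("IIAHHQG"))       # []
```

Only sequences that pass this pattern filter are candidates for alignment, which is how the
pattern requirement removes many chance hits before any scoring takes place.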

4.0 CONCLUSION

In conclusion, PSI-BLAST builds a multiple sequence alignment from the significant hits and
derives a position-specific scoring matrix (PSSM) from it. Because a new scoring matrix is
created at each iteration, every search round is based on the updated PSSM. PHI-BLAST was
used to find homologous sequences that contain a specified amino acid pattern.

5.0 REFERENCES

Bhagwat, M., & Aravind, L. (2007). PSI-BLAST Tutorial. In N. H. Bergman (Ed.), Comparative
Genomics: Volumes 1 and 2 (Chapter 10). Totowa, NJ: Humana Press.
https://www.ncbi.nlm.nih.gov/books/NBK2590/

PHI-BLAST. (n.d.). http://www.vardb.org/vardb/blast/phiblast.html
