
DOT PLOT

What is a Dot Plot?


 In bioinformatics, a dot plot is a graphical method for comparing two biological sequences
and identifying regions of close similarity.

 A dot plot is a graphical display of data using dots.

 One way to visualize the similarity between two protein or nucleic acid sequences is to use a
similarity matrix, known as a dot plot.

 Dot plots were introduced by Gibbs and McIntyre in 1970. They are two-dimensional matrices
with the two sequences being compared along the vertical and horizontal axes.

 For a simple visual representation of the similarity between two sequences, individual cells in
the matrix can be shaded black if residues are identical, so that matching sequence segments
appear as runs of diagonal lines across the matrix.
 In statistics, dot plots are a type of graphical display that can be used to show a data distribution.

 They are used for univariate data when the variable is categorical or quantitative.

 Because individual dots must be drawn for each value in the data, dot plots are ideally
used for small sets of data.

 In sequence comparison, dot plots are two-dimensional graphs showing a comparison of two
sequences. The principle used to generate a dot plot is that the top (x) and left (y) axes of a
rectangular array represent the two sequences to be compared.

 Calculation: Matrix
• Columns = residues of sequence 1
• Rows = residues of sequence 2
 A dot is plotted at every coordinate where the residue in that column and the residue in that
row are similar.
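The matrix calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes that "similarity" means exact identity of residues (the simplest criterion), and the function names dot_plot and render are hypothetical names chosen for this example.

```python
def dot_plot(seq1, seq2):
    """Build the dot matrix: columns follow seq1, rows follow seq2.

    A cell is True where the two residues are identical (assumed
    similarity criterion for this sketch).
    """
    return [[a == b for a in seq1] for b in seq2]


def render(seq1, seq2):
    """Return a text picture of the dot plot, '*' for a dot, '.' for none."""
    matrix = dot_plot(seq1, seq2)
    lines = ["  " + " ".join(seq1)]  # sequence 1 along the top axis
    for residue, row in zip(seq2, matrix):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(residue + " " + cells)  # sequence 2 down the left axis
    return "\n".join(lines)


# Two similar peptides: matching segments show up as runs along a diagonal.
print(render("AKVAIL", "AKIAIL"))
```

Practical tools usually filter this raw matrix (for example with a sliding window) to suppress isolated chance matches; that step is left out here to keep the sketch short.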
[Figure: dot plot with the principal diagonal, the forward sub diagonal, and the backward sub diagonal labeled]
Software for creating dot plots
• D-Genies – Specializes in interactive whole-genome dot plots of large genomes.
• Dotlet – Provides a program allowing you to construct a dot plot with your own sequences.
• dotmatcher – Web tool to generate dot plots (and part of the EMBOSS suite).
• Dotplot – Easy (educational) HTML5 tool to generate dot plots from RNA sequences.
• dotplot – R package to rapidly generate dot plots as either traditional or ggplot graphics.
• Dotter – Stand-alone program to generate dot plots.
• JDotter – Java version of Dotter.
• Flexidot – Customizable and ambiguity-aware dot plot suite for aesthetics, batch analyses and
printing (implemented in Python).
• Gepard – Dot plot tool suitable even for genome scale.
• Genomdiff – An open-source Java dot plot program for viruses.
• LAST – For whole-genome “split-alignment”.
Sequence Alignment
• In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or
protein to identify regions of similarity that may be a consequence of functional, structural, or
evolutionary relationships between the sequences.

• Which “similarities” are detected depends on the goals of the particular alignment
process. Sequence alignment is useful in a number of bioinformatics applications.

• Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a
matrix. Gaps are inserted between the residues so that residues with identical or similar characters
are aligned in successive columns.

• A sequence alignment is made between a known sequence and an unknown sequence, or between
two unknown sequences.

• The known sequence is called the reference (target) sequence and the unknown sequence is called
the query sequence.
Biological interpretation of an alignment

A trace can represent a substitution:

AKVAIL
AKIAIL

A trace can represent a deletion:

VCGMD
VCG-D

A trace can represent an insertion:

GS-K
GSGK
Gaps (-) represent evolutionary deletions or insertions in a DNA sequence alignment.
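The three cases above can be read off an alignment column by column. The sketch below is only an illustration under stated assumptions: the function name classify_columns is hypothetical, the first row is taken as the reference, and a gap in the second row is called a deletion while a gap in the first row is called an insertion, following the examples above.

```python
def classify_columns(row1, row2, gap="-"):
    """Label each column of a pairwise alignment.

    Assumption for this sketch: row1 is the reference, so a gap in row2
    is a deletion and a gap in row1 is an insertion.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    labels = []
    for a, b in zip(row1, row2):
        if a == gap and b == gap:
            labels.append("gap")  # both rows gapped: carries no information
        elif b == gap:
            labels.append("deletion")
        elif a == gap:
            labels.append("insertion")
        elif a == b:
            labels.append("match")
        else:
            labels.append("substitution")
    return labels


# The three example alignments from the text.
for top, bottom in [("AKVAIL", "AKIAIL"), ("VCGMD", "VCG-D"), ("GS-K", "GSGK")]:
    print(top, bottom, classify_columns(top, bottom))
```

Which row counts as the reference is a convention: swapping the two rows turns every insertion into a deletion and vice versa.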
Types of Alignment

• Local alignment
• Global alignment
• Semi-global alignment

 Global alignments, which attempt to align every residue in every sequence, are most useful
when the sequences in the query set are similar and of roughly equal size. (This does not mean
global alignments cannot end in gaps.)

 A general global alignment technique is called the Needleman-Wunsch algorithm and is based
on dynamic programming.
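As a rough illustration of the dynamic programming idea behind Needleman-Wunsch, the sketch below fills a score table and traces back one optimal global alignment. The scoring values (match = 1, mismatch = -1, gap = -1) and the function name needleman_wunsch are assumptions made for this example; real tools typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a simple linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not recommended defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align the two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("VCGMD", "VCGD"))
```

Because the traceback always starts in the final cell and ends in the first, every residue of both sequences appears in the result, which is what makes the alignment global.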

 Local alignments are more useful for dissimilar sequences that are suspected to contain regions
of similarity or similar sequence motifs within their larger sequence context.

 The Smith-Waterman algorithm is a general local alignment method also based on dynamic
programming. With sufficiently similar sequences, there is no difference between local and
global alignments.
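For comparison, a local alignment in the style of Smith-Waterman can be sketched with the same table, with two changes: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The scoring values (match = 2, mismatch = -1, gap = -1) and the function name smith_waterman are assumptions for this example only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of a and b with a simple linear gap penalty.

    Returns (score, aligned_segment_of_a, aligned_segment_of_b).
    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # start a fresh local alignment
                              score[i - 1][j - 1] + sub,  # align the two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# A shared motif inside otherwise unrelated flanking residues.
print(smith_waterman("XXXAKVAILYYY", "QQAKVAILQQQ"))
# Two very similar sequences: the local alignment spans their full length.
print(smith_waterman("AKVAIL", "AKIAIL"))
```

The second call illustrates the last point above: when the sequences are similar along their whole length, the best local alignment covers both sequences end to end, just as a global alignment would.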
