AIMS OF BIOINFORMATICS
● Development of a database containing all biological information.
● Development of better tools for data designing, annotation and
mining.
● Design and development of drugs by using simulation software.
● Design and development of software tools for protein structure
prediction, function annotation and docking analysis.
● Creation and development of software to improve tools for
analysing sequences for their function and similarity with other
sequences.
WHAT IS A SEQUENCE?
A biological sequence is a single, continuous molecule of nucleic acid or
protein. It can be thought of as a multiple inheritance class hierarchy.
One hierarchy is that of the underlying molecule type: DNA, RNA, or
protein.
SEQUENCING A GENOME
Most genomes are enormous (e.g., roughly 3 × 10^9 base pairs in humans).
Current sequencing technology, on the other hand, only allows
biologists to determine ~10^3 base pairs at a time.
This leads to some very interesting problems in bioinformatics, such as
assembling a complete genome from many short, overlapping reads.
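The scale of the problem can be made concrete with a little arithmetic. The sketch below is only illustrative: it assumes a genome of about 3 × 10^9 base pairs (the approximate size of the human genome) and reads of 10^3 base pairs, and the function name `reads_needed` and the `coverage` parameter (how many times, on average, each base is read) are assumptions introduced here, not part of the original notes.

```python
import math

def reads_needed(genome_length: int, read_length: int, coverage: float = 1.0) -> int:
    """Smallest number of reads whose combined length is coverage x genome_length."""
    return math.ceil(genome_length * coverage / read_length)

genome_length = 3 * 10**9   # assumed genome size in base pairs
read_length = 10**3         # assumed read length in base pairs

# Lower bound: reads laid end to end with no overlap at all.
print(reads_needed(genome_length, read_length))
# With redundancy (hypothetical 8x coverage), so that reads overlap and can be assembled.
print(reads_needed(genome_length, read_length, coverage=8.0))
```

Even the no-overlap lower bound is millions of reads, which is why ordering and merging the fragments is a computational problem rather than a manual one.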
What is a database?
• A collection of related data elements
• tables
• columns (fields)
• rows (records)
• Records retrieved using a query language
• Database technology is well established
For example, a protein database would have protein entries as records
and protein properties as fields (e.g., name of protein, length, amino-acid
sequence).
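The protein example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a description of how any real protein database is implemented: the table name `protein`, the column names, and the two toy peptide entries are all hypothetical.

```python
import sqlite3

# In-memory database: one table, three columns (fields), one row (record) per protein.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein (name TEXT PRIMARY KEY, length INTEGER, sequence TEXT)"
)

# Hypothetical toy entries; the sequences are made up for illustration.
entries = [
    ("toy_peptide_a", "MKTAYIAKQR"),
    ("toy_peptide_b", "MSTNPKPQRKTKRNT"),
]
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [(name, len(seq), seq) for name, seq in entries],
)

# Records are retrieved with a query language (SQL here).
rows = conn.execute(
    "SELECT name, length FROM protein WHERE length > ? ORDER BY name", (12,)
).fetchall()
print(rows)
```

The query returns only the records whose length field exceeds 12, which shows how fields act as search criteria over records.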
Derivative database
• Expressed Sequences
• dbSNP
• Structure
• Gene
Format
• ASN.1
• Flat files (DNA, protein)
• FASTA (DNA, protein)
FASTA format
In bioinformatics and biochemistry, the FASTA format is a text-based
format for representing either nucleotide sequences or amino acid
(protein) sequences, in which nucleotides or amino acids are
represented using single-letter codes. The format also allows for
sequence names and comments to precede the sequences. The format
originates from the FASTA software package, but has now become a
near universal standard in the field of bioinformatics.
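As an illustration of the format, here is a minimal parser sketch. It assumes the common convention that a record starts with a header line beginning with ">" followed by one or more sequence lines, and that lines beginning with ";" are comments; the function name `parse_fasta` and the example records are hypothetical.

```python
from typing import Dict, Iterable, List

def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (text after '>') to its sequence joined into one string."""
    parts: Dict[str, List[str]] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ';' comment lines
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            parts[header].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in parts.items()}

example = """>seq1 hypothetical DNA fragment
ACGTACGT
ACGT
>seq2 hypothetical peptide
MKTAYIAKQR
"""
records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq), seq)
```

For real work an established library parser is usually preferable; the sketch only shows how little structure the format imposes.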
Sequence alignment
Alignment: The process of lining up two or more sequences to achieve
maximum levels of identity (and conservation, in the case of amino acid
sequences) for the purpose of assessing the degree of similarity and the
possibility of homology.
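One classical way to line up two sequences is global alignment by dynamic programming (the Needleman-Wunsch method), which is not named in these notes and is used here only as an example. The sketch below assumes a very simple scoring scheme (match +1, mismatch -1, gap -1) rather than a real scoring matrix, and the function name `global_align` is hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

total, top, bottom = global_align("ACGT", "ACT")
print(total)
print(top)
print(bottom)
```

The gap character "-" marks positions where one sequence has no counterpart in the other; counting identical columns in the two output strings gives the level of identity.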
BLAST ALGORITHM
• Scoring of matches done using scoring matrices
• Sequences are split into words (default word length n=3 for protein sequences)
• Speed, computational efficiency
• The BLAST algorithm extends the initial “seed” hit into an HSP: word hits are
extended in both directions in an attempt to generate an alignment with
a score exceeding the threshold "S"
• HSP = high-scoring segment pair = locally optimal alignment
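The seed-and-extend idea can be sketched in a few lines. This is a heavily simplified illustration and not the actual BLAST implementation: it uses exact word matches only, a flat match/mismatch score in place of a scoring matrix, and ungapped extension that stops once the running score falls too far below the best seen. The function names and the `x_drop` and `threshold` parameters are assumptions made for this example.

```python
from collections import defaultdict

def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]

def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps; return (score, query_start, subject_start, length)."""
    def pair(qi, sj):
        return match if query[qi] == subject[sj] else mismatch

    best = run = sum(pair(i + k, j + k) for k in range(w))
    right, k = 0, w
    while i + k < len(query) and j + k < len(subject):   # extend to the right
        run += pair(i + k, j + k)
        if run > best:
            best, right = run, k - w + 1
        elif best - run > x_drop:
            break
        k += 1
    run, left, k = best, 0, 1
    while i - k >= 0 and j - k >= 0:                      # extend to the left
        run += pair(i - k, j - k)
        if run > best:
            best, left = run, k
        elif best - run > x_drop:
            break
        k += 1
    return best, i - left, j - left, w + left + right

def find_hsps(query, subject, w=3, threshold=5):
    """Keep the distinct extended segments whose score reaches the threshold S."""
    hits = {extend_seed(query, subject, i, j, w) for i, j in find_seeds(query, subject, w)}
    return sorted(h for h in hits if h[0] >= threshold)

query, subject = "TTGACCAGT", "CCGACCAGG"
print(find_seeds(query, subject))
print(find_hsps(query, subject))
```

In this toy example four overlapping seeds all extend to the same segment, which is why the results are collected in a set before filtering by score.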
Why BLAST is great
• Very fast and can be used to search extremely large databases
• Sufficiently sensitive and selective for most purposes
• Robust - the default parameters can usually be used
MULTIPLE SEQUENCE ALIGNMENT (MSA)
Applications of MSA
Some of the applications of MSA are:
1. A preliminary step in molecular evolution analysis, using phylogenetic
methods to construct phylogenetic trees.
2. Helping to predict the secondary and tertiary structures of new
sequences.
3. Identifying modules, domains or motifs in multimodular
proteins.
4. Characterising protein families by identifying shared regions of
homology in a multiple sequence alignment (this generally happens
when a sequence search reveals homologies to several sequences).
5. Determining the consensus sequence of several aligned
sequences.
6. Detecting weak similarities in databases using profiles.
7. Designing PCR primers for related genes.
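As a small illustration of one item in the list above, determining a consensus sequence, the sketch below takes the most frequent residue in each column of an existing alignment. It assumes the sequences are already aligned to equal length with "-" as the gap character; the function name `consensus` and the toy alignment are hypothetical, and real tools typically apply more careful rules for ties and gaps.

```python
from collections import Counter

def consensus(alignment, gap="-"):
    """Majority residue per column; gaps are ignored unless a column is all gaps."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != gap)
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)

# Toy alignment, made up for illustration.
alignment = [
    "ACGT-A",
    "ACGTTA",
    "AC-TTA",
    "TCGTTA",
]
print(consensus(alignment))
```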
APPLICATIONS OF BIOINFORMATICS
Bioinformatics is widely applied in genomics, proteomics, 3D structure
modelling of proteins, image analysis, drug design and more. A
significant application of bioinformatics can be found in the fields of
precision and preventive medicine, which are mainly focused on
developing measures to prevent, control and cure serious infectious
diseases.
The main aim of Bioinformatics is to increase the understanding of
biological processes.
● In Gene therapy.
● In Evolutionary studies.
● In Microbial applications.
● In Prediction of Protein Structure.
● For the Storage and Retrieval of Data.
● In the field of medicine, used in the discovery of new drugs.
● In Biometrical Analysis for identification and access control.
● For improving crop management, crop production and pest control.