Bioinformatics: Farhan Haq, PhD, Department of Biosciences, CUI

Bioinformatics
Farhan Haq, PhD

Department of Biosciences
CUI
Bioinformatics
• The term was introduced in the 1970s by Paulien Hogeweg
– The study of information processes in biotic systems
• Information processing in various forms

Brief History
• 1953 - Watson & Crick proposed the double helix
model for DNA based on X-ray data.
• 1955 - The sequence of the first protein to be
analyzed, bovine insulin, is announced by F. Sanger.
• 1969 - The ARPANET is created by linking computers
at the Stanford Research Institute (SRI) and UCLA.
• 1970 - The details of the Needleman-Wunsch
algorithm for sequence comparison are published.
• 1972 - The first recombinant DNA molecule is
created by Paul Berg and his group.
• 1973 - The Brookhaven Protein Data Bank is
announced (http://www.rcsb.org/)
• 1988 - The National Center for Biotechnology
Information (NCBI) is established at the National
Library of Medicine (NLM).
– The Human Genome initiative is started
• 1990 - The BLAST program (Altschul et al.) is
implemented.
• 1994 - The PRINTS database of protein motifs is
published by Attwood and Beck.
• 1996 - The genome for Saccharomyces cerevisiae
(baker's yeast, 12.1 Mb) is sequenced.
• 2001 - The human genome (3 Gb) is published.
General definition
Bioinformatics is about searching, managing, and
analyzing large amounts of biological data using
different computational approaches.
Methods for biological analyses
1. In vitro – within a controlled environment outside of a living organism
2. In vivo – within the living organism
3. In silico biology – performed on a computer. Experiments?
In silico/Bioinformatics
Experiments in Bioinformatics
• Bioinformatics isn't just about storing
biological data in databases; it also concerns
conducting experiments on that data:
1) Searching
2) Comparing
3) Modeling
4) Integrating
Searching – Simplest of all
Comparing - Sequence comparison
Evolutionary analysis
Disease analysis
Modeling - Structure Analysis
Integration
Bioinformatics and Scientific method
Let's revise the basics first
What does bioinformatics offer to a biologist?
• In today’s world, computers are as likely to be used by
biologists as by any other highly trained professionals.
• More specifically, we can describe bioinformatics as a
computational branch of molecular biology.
Proteins
• The twenty amino acids found in proteins have different
bodies (side chains), but all have the same pair of
hooks: NH2 and COOH.
• These groups of atoms are used to form the so-called
peptide bonds between successive residues in the sequence.
• For instance, MAVLD (Met-Ala-Val-Leu-Asp)
The biology in bioinformatics
Central dogma
RNA and DNA are made up of nucleotides, while proteins
are made up of amino acids.
DNA and RNA
• DNA and RNA are made up of nucleotide sequences
• Each nucleotide consists of a sugar (carbohydrate), a
phosphate group, and one of five nitrogenous bases
• Adenine, Guanine, Cytosine, Thymine, and Uracil, or
simply A, G, C, T, and U (T occurs in DNA; U replaces it in RNA)
Reading DNA sequences the right way
• The four nucleotides making up DNA have
different bodies but all have the same pair of
hooks: 5' phosphoryl and 3' hydroxyl groups
• For instance, TGACT is read from the 5' end to the 3' end (5'-TGACT-3')
Chained DNA sequence

Complementarity of DNA
• Thymine (T) on one strand is always facing an adenine
(A) (and vice versa)
• Guanine (G) is always facing a cytosine (C) (and vice
versa).
• So when we know the sequence of nucleotides along
one strand, we can automatically deduce the sequence
on the other one.
• If, in a DNA sequence, T is facing G or C and G is facing T
or A, what would you call this phenomenon?
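The pairing rules above (A with T, G with C) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `complement` and `reverse_complement` are hypothetical, and the input is assumed to be an uppercase DNA string containing only A, C, G, and T.

```python
# Pairing rules from the slide: A <-> T and G <-> C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(dna):
    """Return the base facing each position on the opposite strand."""
    return dna.translate(COMPLEMENT)

def reverse_complement(dna):
    """Return the opposite strand written in its own 5' to 3' direction."""
    return complement(dna)[::-1]

print(complement("TGACT"))          # bases facing TGACT, position by position
print(reverse_complement("TGACT"))  # the same strand read 5' to 3'
```

The reversal is needed because the two strands run in opposite directions, so the partner strand is conventionally written starting from its own 5' end.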
Palindromic DNA
• A fascinating property of DNA complementarity is that
sometimes the two strands are identical when each is
read in its own 5' to 3' direction
• Such sequences are known as palindromes and are very important
– Recognized by restriction enzymes
– Important binding sites
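A palindrome check follows directly from this definition: a sequence is palindromic if it equals its own reverse complement. The sketch below is a minimal illustration; the function names are hypothetical, and GAATTC (the EcoRI recognition site) is used as a sample input that is not part of the slides.

```python
def reverse_complement(dna):
    """Opposite strand of an uppercase A/C/G/T string, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def is_palindromic(dna):
    """True when both strands read the same in the 5' to 3' direction."""
    return dna == reverse_complement(dna)

print(is_palindromic("GAATTC"))  # EcoRI recognition site
print(is_palindromic("TGACT"))   # the example sequence from the earlier slide
```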
Turning DNA/RNA into Proteins: The Genetic Code
• When we know a DNA sequence, we can
translate it into the corresponding protein
sequence by using the genetic code
How to read DNA and predict the protein sequence?
Read the DNA sequence:
ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG
Decompose it into successive triplets:
ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G ..
Translate each triplet into the corresponding amino acid:
M E V F K A P P I G I STOP
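The three steps above (read, split into triplets, translate) can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: it assumes the standard genetic code, an uppercase A/C/G/T input, and reading from the first base; the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate successive triplets until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):  # leftover bases at the end are ignored
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))  # MEVFKAPPIGI
```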
Open Reading Frames (ORFs)
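• An interval of DNA that remains free of stop codons,
when read in a given reading frame, is called an
open reading frame (ORF)
• However, not all DNA is translated into proteins.
Many regions are never translated, for example the introns
that are spliced out of a gene's RNA before translation.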
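A simple ORF scan can be sketched as below. Note the assumptions, which go beyond the definition on the slide: the sketch only reports stretches that begin with an ATG start codon and end at a stop codon, it looks at the three forward reading frames only, and the names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=1):
    """Return (frame, start, end) for each ATG...stop stretch in the forward frames.

    start and end are 0-based positions; end is exclusive and includes the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # assumption: an ORF begins at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

print(find_orfs("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))  # [(0, 0, 36)]
```

A fuller scan would also examine the three frames of the reverse complement strand, which gives the six reading frames of a double-stranded sequence.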
Lab Exercises
-> Retrieving DNA sequences from a database
-> Retrieving RNA sequences from a database
-> Retrieving protein sequences from a database
