
Bioinformatics

Farhan Haq, PhD


Department of Biosciences
CUI
Bioinformatics
• The term was introduced in the 1970s by Paulien
Hogeweg
– The study of informatic processes in biotic systems

• Information processing in various forms


Brief History
• 1953 - Watson & Crick proposed the double helix
model for DNA based on X-ray data.
• 1955 - The sequence of the first protein to be
analyzed, bovine insulin, is announced by F. Sanger.
• 1969 - The ARPANET is created by linking computers
at Stanford and UCLA.
• 1970 - The details of the Needleman-Wunsch
algorithm for sequence comparison are published.
• 1972 - The first recombinant DNA molecule is
created by Paul Berg and his group.
• 1973 - The Brookhaven Protein Data Bank is
announced (http://www.rcsb.org/)
• 1988 - The National Center for Biotechnology
Information (NCBI) is established at the National
Library of Medicine (NIH).
– The Human Genome initiative is started
• 1990 - The BLAST program (Altschul et al.) is
implemented.
• 1994 - The PRINTS database of protein motifs is
published by Attwood and Beck.
• 1996 - The genome for Saccharomyces cerevisiae
(baker's yeast, 12.1 Mb) is sequenced.
• 2001 - The human genome (3 Gb) is published.
General definition
Bioinformatics is about searching, managing and
analyzing large amounts of biological data using
different computational approaches
Methods for biological analyses

1. In vitro – within a controlled environment outside of a living organism

2. In vivo – within the living organism

3. In silico – performed on a computer (computer experiments?)

In silico/Bioinformatics
Experiments in Bioinformatics
• Bioinformatics isn't just about storing
biological data in databases; it also involves
conducting experiments on that data.

1) Searching
2) Comparing
3) Modeling
4) Integrating
Searching – Simplest of all
Comparing - Sequence comparison

Evolutionary analysis

Disease analysis
Modeling - Structure Analysis
Integration
Bioinformatics and Scientific method
Let's revise the basics first
What does bioinformatics offer a
biologist?

• In today’s world, computers are as likely to be used by
biologists as by any other highly trained professionals.

• More specifically, we can describe bioinformatics as a
computational branch of molecular biology.
Proteins
• The twenty amino acids found in proteins have different
bodies, but all have the same pair of hooks: NH2 and COOH.

• These groups of atoms are used to form the so-called
peptide bonds between the successive residues in the
sequence.
• For instance, MAVLD
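
To make the one-letter notation concrete, here is a minimal Python sketch that spells out a peptide such as MAVLD residue by residue. The table uses the standard one-letter and three-letter amino-acid codes; the names THREE_LETTER and spell_out are illustrative and do not come from the slides.

```python
# Standard one-letter -> three-letter amino-acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def spell_out(peptide):
    """Expand a one-letter peptide string, N-terminus (NH2) first."""
    return "-".join(THREE_LETTER[residue] for residue in peptide.upper())


print(spell_out("MAVLD"))  # Met-Ala-Val-Leu-Asp
```

Each hyphen in the output stands for one peptide bond between successive residues.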
The biology in bioinformatics

Central dogma

RNA and DNA are made up of nucleotides, while proteins
are made up of amino acids.
DNA and RNA
• DNA and RNA are made up
of nucleotide sequences

• Each nucleotide consists of a sugar (carbohydrate),
a phosphate group, and one of five nitrogenous bases

• Adenine, Guanine, Cytosine, Thymine, and Uracil,
or simply A, G, C, T, and U (T occurs in DNA,
U replaces it in RNA)
Reading DNA sequences the right way
• The four nucleotides making up DNA have
different bodies but all have the same pair of
hooks: 5' phosphoryl and 3' hydroxyl groups
• For instance, TGACT (written from the 5' end to the 3' end)

Chained DNA sequence


Complementarity of DNA
• Thymine (T) on one strand is always facing an adenine
(A) (and vice versa)
• Guanine (G) is always facing a cytosine (C) (and vice
versa).

• So when we know the sequence of nucleotides along
one strand, we can automatically deduce the sequence
on the other one.

• If, in a DNA sequence, T is facing G or C and G is facing T
or A, what would you call this phenomenon?
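
The pairing rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming the strand is given as an uppercase string over A, C, G and T; the function names are hypothetical.

```python
# Pairing rules from the slide: A <-> T and G <-> C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Return the bases facing each position of the given strand."""
    return strand.translate(COMPLEMENT)


def reverse_complement(strand):
    """Return the opposite strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]


print(complement("TGACT"))          # ACTGA
print(reverse_complement("TGACT"))  # AGTCA
```

The reversal is needed because the two strands run in opposite directions, so the facing strand is read from the other end.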
Palindromic DNA
• A fascinating property of DNA complementarity
is that sometimes the two strands are identical
when each is read in its 5' to 3' direction

• Such sequences are known as palindromes
and are very important
– Recognized by restriction enzymes
– Important binding sites
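
A palindromic site can be checked programmatically by comparing a sequence with its reverse complement. The sketch below is illustrative only; the test sequence GAATTC (the recognition site of the restriction enzyme EcoRI) is an example chosen here and does not appear in the slides.

```python
# A <-> T and G <-> C, as in the complementarity rules.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic(dna):
    """True if both strands read the same in the 5' to 3' direction."""
    dna = dna.upper()
    return dna == dna.translate(COMPLEMENT)[::-1]


print(is_palindromic("GAATTC"))  # True
print(is_palindromic("TGACT"))   # False
```

Note that a DNA palindrome is not a word palindrome: GAATTC does not read the same backwards, but its opposite strand does read GAATTC.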
Turning DNA/RNA into Proteins: The
Genetic Code
• When we know a DNA sequence, we can
translate it into the corresponding protein
sequence by using the genetic code
How to read DNA and predict protein
sequence?
Read the DNA sequence:

ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG

Decompose it into successive triplets:

ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G ..

Translate each triplet into the corresponding amino acid:

M E V F K A P P I G I STOP
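
The three steps above (read, split into triplets, translate) can be sketched in Python as follows. This is a minimal illustration that assumes the standard genetic code, a DNA string read from its first base, and translation that ends at the first stop codon; the names CODON_TABLE and translate are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA triplet by triplet until a stop codon or the end."""
    protein = []
    # Leftover bases that do not fill a whole triplet are ignored.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))  # MEVFKAPPIGI
```

The trailing G in the example sequence does not form a complete triplet, so it is never looked up.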
Open Reading Frames (ORFs)

• A stretch of DNA that remains free of stop
codons is called an open reading frame (ORF)

• However, not all DNA is translated into
proteins. Many regions are not translated; within
genes, such non-coding regions are known as introns.
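
As a rough illustration of ORF scanning, the sketch below walks through the three forward reading frames and reports every stretch that starts at ATG and runs to the next in-frame stop codon. Requiring an ATG start is an assumption made here, narrower than the definition above; the reverse strand is not scanned, and the function name find_orfs is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna):
    """Return ATG-to-stop stretches found in the three forward frames."""
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append(dna[start:i + 3])  # include the stop codon
                start = None
    return orfs


print(find_orfs("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))
# ['ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAA']
```

An ORF that is still open when the sequence ends is not reported, since no stop codon closes it.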
Lab Exercises
-> Retrieving DNA sequences from database
-> Retrieving RNA sequences from database
-> Retrieving protein sequences from database
