Professional Documents
Culture Documents
ATGCCGAA1010111TGAACCA
Nandini Kotharkar
Physics
Computer
Science Biological
Science
Bioinformatics
Mathematics Chemistry
Statistics
Why is Bioinformatics Important?
First
Data organization
• researchers access to existing information
• submit new entries
Second
develop tools and resources that aid in the analysis of data
Third
interpret the results in a biologically meaningful manner.
Challenges in bioinformatics
• Explosion of information
– Need for faster, automated analysis to process large amounts of data
– Need for integration between different types of information
(sequences, literature, annotations, protein levels, RNA levels etc…)
– Need for “smarter” software to identify interesting relationships in very
large data sets
• Lack of “bioinformaticians”
– Software needs to be easier to access, use and understand
– Biologists need to learn about the software, its limitations, and how to
interpret its results
A Few Current Challenges of Bioinformatics
2.To improve ability to predict exactly where and when transcription will
occur in a genome.
• Bioinformatics Scientist-a laboratory scientist who has skills with Unix, Perl,
SQL, etc., knows the limitations of available bioinformatics tools, and uses
these tools and his/her scientific knowledge to help the validation process in
the lab.
1970's
•Needleman-Wunschalgorithm for sequence comparison appears.
•The Brookhaven Protein Data Bank is announced.
•Connecting networks of computers into an "internet“is born.
•Maxam, Gilbert, & Sanger report methods for sequencing DNA
1990's
•The BLAST program is implemented.
•The advent of Linux.
•The use of Bacterial Artificial Chromosomes for cloning.
•National Center for Genome Resources (NCGR) was founded.
•Sun releases version 1.0 of Java.
•Affymetrixproduces the first commercial DNA chip.
Early 2000's
•IBM Blue Gene project begins
•Human genome sequence is published
•RefSeqand LocusLinkare introduced
•Ensemblannotates the human genome
•EMBOSS software package is made available
•Human genome on a chip becomes available
Research Scientists Would Like...
•An effective way to integrate stand-alone applications that easily interact with
one another behind a single interface and integration framework.
Gene Analysis
Research Activities [%] Prot. Struc. Pred.
Macr. Modelling
Evol. Biol.
Struc-Func Studies
Mech. of Dis.
Data Mining Stud
Knowledge
Discoveries
Products
Strategic alliance & Partnerships
IT Industries
Software Expertise Hospitals for
Clinical records
Biotech Companies
General Types of Informatics techniques…..
• Databases • Geometry
– Building, Querying – Robotics
– Object DB – Graphics (Surfaces,
• Text String Comparison Volumes)
– Text Search – Comparison and 3D
– 1D Alignment Matching
(Vision, recognition)
– Significance Statistics
• Physical Simulation
• Finding Patterns – Newtonian Mechanics
– AI / Machine Learning – Electrostatics
– Clustering – Numerical Algorithms
– Datamining – Simulation
Bioinformatics as Information
Technology
Database
Information
Hardware
Retrieval
Bioinformatics
Algorithm Agent
Machine
Learning
Bioinformatics on the Web
The experimental process
sample
hybridization
array
scanner
Data management
relational
database
web
interface
Sequence analysis
Sequence alignment
Structure and function prediction
Gene finding
Structure analysis
Protein structure comparison
Protein structure prediction
RNA structure modeling
Expression analysis
Gene expression analysis
Gene clustering
Pathway analysis
Metabolic pathway
Regulatory networks
Bioinformatcs Tools and Services
5’ 3’
DNA
Transcription
mRNA Splicing
Translation
Poly-peptide
Folding
Protein
• Transport / Localization
• Oligomerization
• Post-Translational Modification
Function Function
At Genome Level Genome Projects
need to store and
organize DNA
sequences
5’ 3’
DNA
Transcription
mRNA Splicing
Translation
Poly-peptide
Folding
Protein
• Transport / Localization
• Oligomerization
• Post-Translational Modification
Function Function
At Transcription Level
Transcription
mRNA Splicing
Translation
Poly-peptide
Folding
Protein
• Transport / Localization
• Oligomerization
• Post-Translational Modification
Function Function
At Translation Level
5’ 3’
DNA
Transcription
mRNA Splicing
Translation
Poly-peptide
Folding
What do we know
about a specific
Protein protein?
• Transport / Localization
• Oligomerization
• Post-Translational Modification
Function Function
At Translation Level
5’ 3’
DNA
Transcription
mRNA Splicing
Translation
Poly-peptide
Folding
How can we compare
protein sequences?
Protein
• Transport / Localization
• Oligomerization
• Post-Translational Modification
Function Function
At Structure Level
5’ 3’
DNA
Transcription
mRNA Splicing
Translation
Poly-peptide
Folding
Can we predict
protein structures?
Protein
• Transport / Localization
• Oligomerization
• Post-Translational Modification
Function Function
Web Access:
http://www.ncbi.nlm.nih.gov
Text Searches
Entrez System
NCBI BLAST
http://www.ncbi.nlm.nih.gov/blast/
Sequence alignments and
comparison
1: MY-TAIL--ORIS-RICH-
¦x ¦¦¦¦ x¦x¦ ¦¦¦¦ Two Local Alignments
2: MONTAILLEURESTRICHE
• Ab initio modeling
• Threading & Fold Recognition
• Homology Modeling
MNIFEMLRID
HLLTKSPSLN
EGLRLKIYKD
AAKSELDKAI
TEGYYTIGIG
GRNCNGVITK
?
DEAEKLFNQD VDAAVRGILR NAKLKPVYDS
LDAVRRCALI NMVFQMGETG VAGFTNSLRM
LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI
TTFRTGTWDA YKNL
Rational Drug Design
Complementarity
- Shape
- Chemical
- Electrostatic
Drug Development Life Cycle
Discovery
(2 to 10 Years)
Preclinical Testing
(Lab and Animal Testing)
Phase I
(20-30 Healthy Volunteers used to
With the aid of Bioinformatics check for safety and dosage)
Phase II
(100-300 Patient Volunteers used to
check for efficacy and side effects)
Phase III
$600-700 Million!
(1000-5000 Patient Volunteers
used to monitor reactions to
long-term drug use)
FDA Review
& Approval
Post-Marketing
Years Testing
0 2 4 6 8 10 12 14 16
7 – 15 Years!
Drug lead screening
DNA chips NH
NH2
Cl
N
N N
CH2
NH2 N
NH2
OCH3
Cl
NH Cl NH
H NH NH
C CH
NH N
3 Cl
NH2 OH
CH3
COO- H
Genetic Variations
C CH
OH NH N
3
N N N CH3
NH2 NH
NH2 CH3
N CH3 N
NH2 OCH3 N N
Biochemical Networks
Optimizing therapies
Genomes
caaaaatagggttaatatgaatctcgatctccattttgttcatcgtattcaacaacaagcc
aaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtggcgagatatct
cttggaaaaactttcaagagcaactcaatcaactttctcgagcattgcttgctcacaatat
tgacgtacaagataaaatcgccatttttgcccataatatggaacgttgggttgttcatgaa
actttcggtatcaaagatggtttaatgaccactgttcacgcaacgactacaatcgttgaca
ttgcgaccttacaaattcgagcaatcacagtgcctatttacgcaaccaatacagcccagca
agcagaatttatcctaaatcacgccgatgtaaaaattctcttcgtcggcgatcaagagcaa
tacgatcaaacattggaaattgctcatcattgtccaaaattacaaaaaattgtagcaatga
aatccaccattcaattacaacaagatcctctttcttgcacttgg
d4dfra_ ISLIAALAVDRVIGMENAMPWN
- LPADLAWFKRNTL
--------
- NLVIMGKKTWFSI
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ
- NAVIMGKKTWFSI
NKPVIMGRHTWESI
Structure Prediction
d3dfr__ TAFLWAQDRNGLIGKDGHLPW- HLPDDLHYFRAQTVG
-------- KIMVVGRRTYESF
Sequence Analysis
Gene Expression Analysis
Pathway Analysis
Database Search