
ITE312 - Basic Bioinformatics

Introduction to Bioinformatics
Syllabus
Five modules
– Introduction to Bioinformatics
– Sequence Alignment and Dynamic
Programming
– Sequence Databases and Uses
– Evolutionary Trees and Phylogeny
– Special Topics in Bioinformatics
Data & Information
• Data is a representation of a fact, figure,
or idea.
• In computer science – data are numbers,
words, images, etc.
• Information is an ordered sequence of
symbols.
Information Technology
• Information technology (IT) is "the
acquisition, processing, storage and
dissemination of vocal, pictorial, textual
and numerical information by a
microelectronics-based combination of
computing and telecommunications"
– Dennis Longley, Michael Shain (1985)
Dictionary of Information Technology
Informatics
• Informatics is the study of the application of
computer and statistical techniques to the
management of information.
• Biology in the 21st century is being
transformed from a purely lab-based
science to an information science as well.
Bioinformatics
• Bioinformatics is the field of science in
which biology, computer science, and
information technology merge to form a
single discipline.
– National Center for Biotechnology Information (NCBI)
Bioinformatics
• Bioinformatics is the marriage
of biology and information
technology.
• Bioinformatics is the application of
statistics and computer science to the field
of molecular biology.
Bioinformatics
• The term bioinformatics
was coined by Paulien
Hogeweg and Ben Hesper
in 1978 for the study of
informatic processes in
biotic systems at Utrecht
University, Netherlands.
Bioinformatics
• Bioinformatics encompasses any computational
tools and methods used to manage, analyze and
manipulate large sets of biological data. Three
components:
– Creation of databases allowing the storage and
management of large biological data sets.
– Development of algorithms and statistics to
determine relationships among members of large
data sets.
– Use of tools for the analysis and interpretation of
various types of biological data, including DNA, RNA
and protein sequences, protein structures, gene
expression profiles and biochemical pathways.
– The actual process of analyzing and interpreting data
is referred to as computational biology.
Computers in Bioinformatics
• Repeat same task millions of times
• Problem solving

Bioinformatics is more of a tool than a discipline.
Scope of Bioinformatics
• Bioinformatics derives knowledge from
computer analysis of biological data.
• It is the technology that uses computers
for the storage, retrieval, manipulation and
distribution of information obtained by
analyzing sequence data of biological
macromolecules such as DNA, RNA and
proteins.
Related Fields
• Computational Biology
• Genomics
• Proteomics
• Pharmacogenomics
• Pharmacogenetics
• Cheminformatics
• Structural genomics or structural bioinformatics
• Comparative genomics
Applications
• Molecular medicine
• Personalized medicine
• Preventive medicine
• Gene therapy
• Drug development
• Microbial genome applications
• Waste cleanup
• Climate change studies
• Alternative energy sources
• Biotechnology
• Antibiotic resistance
• Forensic analysis of microbes
• Bio-weapon creation
• Evolutionary studies
• Crop improvement
• Insect resistance
• Improve nutritional quality
• Development of drought-resistant varieties
• Veterinary Science
Skills for Bioinformatics
• Molecular Biology – Central Dogma
• Operating System – Unix or Linux
• Sequence Analysis and Molecular modeling
software packages
– EMBOSS, RasMol, Swiss-PdbViewer, GCG
Wisconsin Package
• Programming Language – C, C++, Perl, Python,
Java
• Markup Languages – HTML, XML
• Scripting Languages – JavaScript
• DBMS – MySQL, Oracle
Introductory Genetics

Central Dogma of
Molecular Biology
Why genetics is important
[Diagram: Genetics, Environment, G×E interaction, Health]
[Figure: ISI Web of Science topic search for "genetic AND disease". Y-axis: number of journal records (0 to 8000). X-axis: year (1991 to 2005).]
How genes work
What is a gene?
• A gene is a stretch of DNA whose
sequence determines the structure and
function of a specific functional molecule
(usually a protein)
[Figure: gene expression compared with running a computer program]
DNA – Computer program
…GAATTCTAATCTCCCTCTC
AACCCTACAGTCACCCATTT
GGTATATTAAAGATGTGTTG
TCTACTGTCTAGTATCC…
…function sf(){document.f.q.focus()}…
mRNA – Working copy
Protein – Specific function


Genes are located in the cell
nucleus on chromosomes
Karyotype
Down syndrome karyotype
(trisomy 21)
DNA (deoxyribonucleic acid) → mRNA → Protein
Transcription movie
Translation
Translation movie
Gene expression movie
Summary
• A gene is a length of DNA that contains
instructions for making a specific protein
• In humans, genes are arranged along 23 pairs of
chromosomes in the cell nucleus
• Genes work by specifying the amino acid
sequence of a protein
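The flow from DNA to mRNA to protein summarized above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the input sequence is hypothetical, and the codon table lists just a handful of entries from the standard genetic code rather than all 64.

```python
# Minimal sketch of the central dogma: DNA -> mRNA -> protein.
# Only a few codons of the standard genetic code are included here;
# a real tool would use the full 64-codon table.
CODON_TABLE = {
    "AUG": "M",                                      # methionine (start codon)
    "UGU": "C", "UGC": "C",                          # cysteine
    "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",  # valine
    "UCU": "S",                                      # serine
    "ACU": "T",                                      # threonine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # "X" marks a codon that is missing from this partial table.
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 18-base gene fragment, not taken from the slides.
dna = "ATGTGTGTTTCTACTTAA"
mrna = transcribe(dna)
print(mrna)             # AUGUGUGUUUCUACUUAA
print(translate(mrna))  # MCVST
```

Each group of three bases (a codon) selects one amino acid, which is how a gene specifies the amino acid sequence of a protein.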
Summary
• Post-genomic genetics has enormous
promise for tracking down the genes
involved in common complex diseases
• Currently our ability to exploit this potential
is limited by
– study size
– difficulty of correcting for confounding factors
Components of a Digital Computer System
Bioinformatics and Internet
• Biological information is stored on many different
computers around the world.
• The easiest way to access this information is for
the computers to be joined together in a
network.
• A computer network is a collection of computers
and devices interconnected by communications
channels that facilitate communication among
users and allow users to share resources.
WORLD INTERNET USAGE AND POPULATION STATISTICS

World Regions            Population      Internet Users   Internet Users    Penetration      Growth
                         (2010 Est.)     Dec. 31, 2000    Latest Data       (% Population)   2000-2010
                                                          (June 30, 2010)
Africa                   1,013,779,050       4,514,400      110,931,700     10.9 %           2,357.3 %
Asia                     3,834,792,852     114,304,000      825,094,396     21.5 %             621.8 %
Europe                     813,319,511     105,096,093      475,069,448     58.4 %             352.0 %
Middle East                212,336,924       3,284,800       63,240,946     29.8 %           1,825.3 %
North America              344,124,450     108,096,800      266,224,500     77.4 %             146.3 %
Latin America/Caribbean    592,556,972      18,068,919      204,689,836     34.5 %           1,032.8 %
Oceania / Australia         34,700,201       7,620,480       21,263,990     61.3 %             179.0 %
WORLD TOTAL              6,845,609,960     360,985,492    1,966,514,816     28.7 %             444.8 %


http://www.internetworldstats.com/stats.htm
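The two percentage columns in the table can be recomputed from the raw counts. The sketch below assumes that penetration is users divided by population and that growth is the relative increase in users between 2000 and 2010; the source does not state its formulas, but these reproduce the published Africa row. The function names are illustrative.

```python
def penetration_percent(users: int, population: int) -> float:
    """Share of the population that uses the Internet, in percent."""
    return 100.0 * users / population


def growth_percent(users_start: int, users_end: int) -> float:
    """Relative increase in users between two dates, in percent."""
    return 100.0 * (users_end - users_start) / users_start


# Figures for Africa, copied from the table above.
africa_population = 1_013_779_050
africa_users_2000 = 4_514_400
africa_users_2010 = 110_931_700

print(round(penetration_percent(africa_users_2010, africa_population), 1))  # 10.9
print(round(growth_percent(africa_users_2000, africa_users_2010), 1))       # 2357.3
```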
Internet
• The Internet is an international network of computers
derived from an earlier system, ARPANET, developed by
the US military.
• The foundations of the Internet were formed when packet-
switching networks came into operation in the 1960s.
• Transmitted data is broken up into small packets, which are
sent to their destination and reassembled at the other side.
• This means that a single signal can be routed to multiple
users, and an interrupted packet may be re-sent without
loss of transmission.
• Packets can be compressed for speed and encrypted for
security.
• Internet Access – Hardware (network card and/or
modem), Software, Permission for network access
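The packet idea described above can be illustrated with a toy example. This is a simplified sketch, not a real network protocol: the function names and the packet size are arbitrary choices for illustration, and each packet is just a (sequence number, payload) pair.

```python
import random


def split_into_packets(data: bytes, size: int) -> list:
    """Cut the data into numbered chunks of at most `size` bytes."""
    return [
        (seq, data[offset:offset + size])
        for seq, offset in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list) -> bytes:
    """Put packets back in order by sequence number and join the payloads."""
    return b"".join(payload for _, payload in sorted(packets))


message = b"ATGGCCATTGTAATGGGCCGCTGA"
packets = split_into_packets(message, size=5)

# Packets may take different routes and arrive out of order.
random.shuffle(packets)

print(reassemble(packets) == message)  # True
```

Because every packet carries its own sequence number, the receiver can restore the original order, and a packet that is lost can be requested again on its own.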
TCP/IP
• Information transfer over the Internet is
governed by a set of protocols (procedures for
handling data packets) called TCP/IP.
• TCP is the Transmission Control Protocol, which
determines how data is broken into packets
and reassembled.
• IP is the Internet Protocol, which determines
how the packets of information are addressed
and routed over the network.
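From a program's point of view, TCP/IP appears as a socket: the IP address and port say where the data goes, and TCP delivers the bytes reliably and in order. The sketch below uses Python's standard socket module to send a short message to a small server running on the same machine (the loopback address 127.0.0.1). The function names and the upper-casing reply are illustrative choices, not part of any protocol.

```python
import socket
import threading


def echo_once(server_socket: socket.socket) -> None:
    """Accept one connection and send back the received bytes in upper case."""
    conn, _ = server_socket.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def tcp_round_trip(message: bytes) -> bytes:
    """Send a message over a TCP connection on this machine and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server.listen(1)
        host, port = server.getsockname()  # the IP address and port to connect to
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


print(tcp_round_trip(b"atgc"))  # b'ATGC'
```

The program never deals with individual packets; splitting, addressing, routing and reassembly are handled by the TCP/IP implementation of the operating system.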
FTP
• File Transfer Protocol (FTP) is a standard
network protocol used to copy a file from one host
to another over a TCP-based network, such as
the Internet. FTP is built on a client-server
architecture and utilizes separate control and data
connections between the client and server. FTP
users may authenticate themselves using a clear-
text sign-in protocol but can connect anonymously
if the server is configured to allow it.
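An anonymous FTP session as described above might look like the following sketch, which uses Python's standard ftplib module. The function name and its parameters are chosen for illustration; the control connection carries the login and commands, while the directory listing travels over a separate data connection that ftplib opens behind the scenes.

```python
from ftplib import FTP


def list_ftp_directory(host: str, path: str = "/", timeout: float = 30.0) -> list:
    """Log in anonymously to an FTP server and return the names in a directory."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()        # no arguments: anonymous sign-in, if the server allows it
        ftp.cwd(path)      # change to the requested directory
        return ftp.nlst()  # list of file and directory names
```

A call such as `list_ftp_directory("ftp.ncbi.nlm.nih.gov", "/genbank")` would be one way to try it; that host and path are assumptions for illustration and are not taken from the slides.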
Telnet
• Telnet is a network protocol used on the Internet or local
area networks to provide a bidirectional interactive text-
oriented communications facility using a
virtual terminal connection.
• Telnet is a user command and an underlying TCP/IP
protocol for accessing remote computers. Through
Telnet, an administrator or another user can
access someone else's computer remotely.
• On the Web, HTTP and FTP protocols allow one to
request specific files from remote computers, but not to
actually be logged on as a user of that computer.
• With Telnet, one can log on as a regular user with
whatever privileges may have been granted to the
specific application and data on that computer.
Telnet
• A Telnet command request looks like this:
telnet the.libraryat.whatis.edu
The result of this request would be an invitation
to log on with a userid and a prompt for a
password. If accepted, one would be logged on
like any user who used this computer every day.
• Telnet is most likely to be used by program
developers and anyone who has a need to use
specific applications or data located at a
particular host computer.
WWW
• The World Wide Web is a way of
exchanging information over the Internet
using a program called a browser.
• The WWW was developed at CERN between 1989
and 1991 and allows the display of information pages
containing multimedia objects in a special
format called hypertext.
– URL or Hyperlink
Gateway sites for Bioinformatics on WWW

• http://www.ncbi.nlm.nih.gov/
• http://www.ebi.ac.uk/
• http://www.expasy.ch/
• http://www.genome.jp/kegg/
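A URL such as the ones listed above has a fixed structure: a scheme (the protocol), a host name, and a path on that host. As a small illustration, Python's standard urllib.parse module can split a URL into these parts; the helper name split_url is made up for this example.

```python
from urllib.parse import urlparse


def split_url(url: str) -> tuple:
    """Return the (scheme, host, path) parts of a URL."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path


gateway_sites = [
    "http://www.ncbi.nlm.nih.gov/",
    "http://www.ebi.ac.uk/",
    "http://www.expasy.ch/",
    "http://www.genome.jp/kegg/",
]

for url in gateway_sites:
    print(split_url(url))
# The last line printed is ('http', 'www.genome.jp', '/kegg/')
```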