
BIO-INFORMATICS

“Bioinformatics promises a new era in drug discovery”

ABSTRACT:

Bioinformatics is the field of integrating biology and computers to improve the discovery
of new medical breakthroughs. A very wide-ranging area, bioinformatics includes
developing systems, databases, tools and algorithms to collect, model, and analyze the
enormous amount of data available about biology today.

Bioinformatics is the symbiotic relationship between computational and biological
sciences. The ability to sort and extricate genetic codes from a human genomic database
of 3 billion base pairs of DNA in a meaningful way is perhaps the simplest form of
bioinformatics. In the field of bioinformatics, the current research drive is to be able to
understand evolutionary relationships in terms of the expression of protein function. Two
computational approaches have been brought to bear on the problem, tackling the
identification of protein function from the perspectives of sequence analysis and of
structure analysis respectively. Sequence analysis is concerned with the detection of
relationships between newly determined sequences and those of known function, usually
within a database.

Bioinformatics as a field of study is becoming increasingly important due to the interest
of the pharmaceutical industry in genome sequencing projects, which comprise a key
segment of the discipline. There is a vital need to harness this information for medical
diagnostic and therapeutic uses, and there are opportunities for other industrial
applications. This field is evolving rapidly, which makes it challenging for biotechnology
professionals to keep up with recent advancements. The National Bioinformatics Institute
(NBI) is dedicated to providing education in state-of-the-art technologies in Bioinformatics.

Bioinformatics has emerged as a multidisciplinary subject that encompasses
developments in information and computer technology as applied to biotechnology and
biological sciences. Bioinformatics uses computer software tools for database creation,
data management, data warehousing, data mining and global communication networks.
Functional genomics, biomolecular structure, proteome analysis, cell metabolism,
biodiversity, downstream processing in chemical engineering, drug design and vaccine
design are some of the areas in which bioinformatics is an integral component.

DETAILED DESCRIPTION

Background

During the last decade, molecular biology has witnessed an information revolution as a
result both of the development of rapid DNA sequencing techniques and of the
corresponding progress in computer-based technologies, which are allowing us to cope
with this information deluge in increasingly efficient ways. The broad term that was
coined in the mid-1980s to encompass computer applications in biological sciences is
bioinformatics.
The term bioinformatics has been commandeered by several different disciplines to mean
rather different things. In its broadest sense, the term can be considered to mean
information technology applied to the management and analysis of biological data; this
has implications in diverse areas, ranging from artificial intelligence and robotics to
genome analysis. In the context of genome initiatives, the term was originally applied to
the computational manipulation and analysis of biological sequence data (DNA and/or
protein). However, in view of the recent rapid accumulation of available protein
structures, the term now tends also to be used to embrace the manipulation and analysis
of three-dimensional (3D) structural data.
At the beginning of 1998, more than 300,000 protein sequences had been deposited in
publicly available, non-redundant databases, and the number of partial sequences in
public (Boguski et al., 1994) and proprietary Expressed Sequence Tag (EST) databases
was estimated to run into millions. By contrast, the number of unique 3D structures in the
Protein Data Bank (PDB) was still less than 1,500. Although structural information is far
more complex to derive, store and manipulate than sequence data, these figures
nevertheless highlight an enormous information deficit; this situation is likely to get
worse as the various genome projects around the world begin to bear fruit.
[Figure: growth of non-redundant sequence data during the last decade and the
corresponding growth in the number of unique structures.]
In the mid-1980s, the United States Department of Energy (DoE) initiated a number of
projects to construct detailed genetic and physical maps of the human genome, to
determine its complete nucleotide sequence, and to localize its estimated 100,000 genes.
Work on this scale required the development of new techniques and instrumentation for
detecting and analyzing DNA. By April 1998, the sequencing projects of only a
small number of relatively small genomes had been completed.

What is bioinformatics?

Bioinformatics is about:

• Understanding the functioning of living things, to "improve the quality of life".
• Rapid and better drug development.
• Identification of genetic risk factors.
• Gene therapy.
• Genetic modification of food crops and animals, etc.

In pursuing these goals, bioinformatics draws on information technology for the storage,
retrieval and analysis of data and for the simulation of biological processes.

Understanding bioinformatics requires a little basic knowledge of biology,
especially genetics. So what is a genome? The complete set of instructions for making an
organism is called its genome. It contains the master blueprint for all cellular structures
and activities for the lifetime of the cell or organism. Found in every nucleus of a person's
many trillions of cells, the human genome consists of tightly coiled threads of
deoxyribonucleic acid (DNA) and associated protein molecules. If unwound and tied
together, the strands of DNA would stretch more than 5 feet but would be only 50
trillionths of an inch wide. For each organism, the components of these slender threads
encode all the information necessary for building and maintaining life, from simple
bacteria to remarkably complex human beings. Four different bases are present in DNA:
adenine (A), thymine (T), cytosine (C), and guanine (G). The particular sequence of
these bases specifies the exact genetic instructions required to create a particular
organism with its own unique traits.

DNA STRUCTURE

On June 26th, 2000, President Clinton, leaders of the Human Genome Project (HGP) and
representatives of the biotechnology company Celera announced the completion of a
"working draft" reference DNA sequence of the human genome. Corporate and
government-led scientists have already compiled the three gigabytes of paired A's, C's,
T's and G's that spell out the human genetic code. But that is just the initial trickle of the
flood of information to be tapped from the human genome. Researchers are generating
gigantic databases containing the details of when and in which tissues of the body various
genes are turned on, the shapes of the proteins the genes encode, how the proteins interact
with one another and the role those interactions play in disease. Gene Myers, Jr., vice
president of informatics research at Celera Genomics, calls the data generated "a tsunami
of information."
This new discipline of bioinformatics seeks to make sense of this tsunami of information.
In so doing, it is destined to change the face of biomedicine. "For the next two to three
years, the amount of information will be phenomenal, and everyone will be overwhelmed
by it," Myers predicts. "The race and competition will be who can mine it best. There will
be such a wealth of riches."

The amount of information may be enormous, but divining the importance of the data
is the job of bioinformatics. The field got its start in the early 1980s with a database
called GenBank, which was originated by the U.S. Department of Energy to hold the
short stretches of DNA sequence that scientists were just beginning to obtain from a
range of organisms. In the early days of GenBank a roomful of technicians sat at
keyboards consisting of only the four letters A, C, T and G, tediously entering the DNA-
sequence information published in academic journals. As the years went on, newer
communication technologies enabled researchers to dial up GenBank and dump in their
sequence data directly, and the administration of GenBank was transferred to the U.S.
National Institutes of Health's National Center for Biotechnology Information (NCBI).
After the advent of the World Wide Web, researchers could access the data in GenBank
for free from around the globe. Once the Human Genome Project (HGP) officially got off
the ground in 1990, the volume of DNA-sequence data in GenBank began to grow
exponentially. With the introduction in the 1990s of high-throughput sequencing (an
approach using robotics, automated DNA-sequencing machines and computers) additions
to GenBank skyrocketed.

One of the most basic operations in bioinformatics involves searching for similarities, or
homologies, between a newly sequenced piece of DNA and previously sequenced DNA
segments from various organisms. Finding near-matches allows researchers to predict the
type of protein the new sequence encodes. This not only yields leads for drug targets
early in drug development but also weeds out many targets that would have turned out to
be dead ends.
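The similarity search described above can be sketched with a small local-alignment scorer. This is a minimal illustration using the Smith-Waterman recurrence with a linear gap penalty; it is not the heuristic used by any particular database search tool. The scoring values, the function names and the toy sequences are assumptions chosen for the example.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)          # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query: str, database: dict) -> tuple:
    """Return (name, score) of the database entry most similar to the query."""
    scored = ((name, local_alignment_score(query, seq))
              for name, seq in database.items())
    return max(scored, key=lambda hit: hit[1])


# Hypothetical toy database; the sequences are made up for illustration.
toy_database = {"seq_a": "TTTTTTTT", "seq_b": "CCACGTACGTCC"}
print(best_hit("ACGTACGT", toy_database))
```

Scoring every database entry in full, as this sketch does, becomes slow as databases grow, which is one reason faster heuristic search tools are used in practice.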

A popular set of software programs for comparing DNA sequences is BLAST (for Basic
Local Alignment Search Tool), which first emerged in 1990. BLAST is part of a suite of
DNA- and protein-sequence search tools accessible in various customized versions from
many database providers or directly through NCBI. NCBI also offers Entrez, a so-called
meta-search tool that covers most of NCBI's databases, including those housing three-
dimensional protein structures, the complete genomes of organisms such as yeast, and
references to scientific journals that back up the database entries.

Michael N. Liebman, head of computational biology at Roche Bioscience in Palo Alto,
says, "Genomics is not the paradigm shift; it's understanding how to use it that is the
paradigm shift. In bioinformatics, we're at the beginning of the revolution." The
revolution involves many different players, each with a different strategy. Some
bioinformatics companies cater to large users, aiming their products and services at
genomics, biotechnology and pharmaceutical companies by creating custom software and
offering consulting services. Lion Bioscience, based in Heidelberg, Germany, has been
particularly successful at selling "enterprise-wide" bioinformatics tools and services. It
has a $100-million agreement with Bayer to build and manage a bioinformatics capability
across all of Bayer's divisions. Web-based bioinformatics companies such as DoubleTwist
and eBioinformatics offer one-stop Internet shopping. These on-line portals allow users
to access various types of databases and use software to manipulate the data. In May
2000, DoubleTwist scientists announced they had used their technology to determine
that the number of genes in the human genome is roughly 105,000, although they said the
final count would probably come in at 100,000.

Large pharmaceutical companies have also sought to leverage their genomics efforts with
in-house bioinformatics investments. Many have established entire departments to
integrate and service computer software and facilitate database access across multiple
departments, including new product development, formulation, toxicology and clinical
testing. The old model of drug development often compartmentalized these functions,
isolating data that might have been useful to other researchers. Bioinformatics allows
researchers across a company to see the same thing while still manipulating the data
individually.

In addition to making drug discovery more efficient, in-house bioinformatics can also
save drug companies money in software support. GlaxoWellcome is replacing individual
packages used by various investigators and departments to access and manipulate
databases with a single software platform. The company estimates that this will save
approximately $800,000 in staffing support over a three- to five-year period.

To integrate bioinformatics throughout their companies, pharmaceutical giants also forge
strategic alliances, enter into licensing agreements and acquire smaller biotechnology
companies. Using partners and vendors not only allows drug companies to fill in the gaps
in their bioinformatics capabilities but also gives them the mobility to adopt new
technologies as they come onto the market rather than constantly overhauling their own
systems.
Occupying some of that overlap in resources, products and market capitalization are
companies such as Human Genome Sciences, Celera and Incyte. They straddle the terrain
between large companies and the data integration and mining offered by specialist
companies. They have also quickly seized on the degree of automation that
bioinformatics has brought to biology.

But with all this variety comes the potential for miscommunication. Getting various
databases to interoperate is becoming more and more important. An obvious solution
would be annotation, which is tagging data with names that are cross-referenced across
databases and naming systems. This has worked to a degree in companies like Roche
Bioscience. But this problem becomes more acute as the understanding of the biology
and the ability to conduct computational analysis become more sophisticated.
Systematic improvements will help, but progress and ultimately profit still rely on the
ingenuity of the end user, according to David J. Lipman, director of NCBI. "It's about
brainware, not hardware or software."
Linus Pauling, the chemist, vitamin C-ist and anti-atom-bombist, determined the structure
of the other type of molecule, the protein molecule - that is, chains made up of units
called amino acids.

The 3-dimensional structure of a protein, Beta-amylase. The main structural units
of the protein, which are made up of just a few amino acids each, are differently
coloured.

This work inspired James Watson and Francis Crick in 1953 to elucidate the structure of
DNA - the ABC of all known living matter. To cut a long story short, over the following
years many people pieced the puzzle together: the building blocks of life are the 20 amino
acids that make up proteins; DNA contains the blueprints for these structures in its own
structure. It is a long strand made of 4 nucleotides - this is the code of life. It goes
ACGTTCCTCCCGGGCTCC, and so on, and so on, and so on. If you know the code you
know the structure of all living things, at least in theory.
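How a nucleotide string acts as a blueprint for a chain of amino acids can be sketched in a few lines. The example below builds the standard genetic code table and translates the sample string from the text, three bases at a time. Starting at the first base is an assumption, since the text does not give a reading frame, and the function name `translate` is chosen for illustration only.

```python
from itertools import product

# Standard genetic code, listed in the conventional order with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA into one-letter amino acid codes, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):      # step through complete codons only
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":                # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Sample string from the text, read from its first base (assumed frame).
print(translate("ACGTTCCTCCCGGGCTCC"))
```

Shifting the starting position by one or two bases gives entirely different amino acids, so in practice the reading frame has to be established before translation.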
[Figure: guanine]

Here is a summary of the relationship between DNA and protein:

An Enormous Flood of Data

Restless technology has produced means of reading genes (DNA) almost like a bar code.
The problem is that life is a complicated business, and therefore the code to describe even
the smallest of God's creatures would fill many books. But scientists are very ambitious
people and do lots of overtime. They have started to decode "themselves" in the Human
Genome Project - HGP for short. In fact, a sort of "average" human is decoded
by sampling DNA from unknown donors. But the difference in DNA between any human
and another one (or a scientist...) is almost nil. Nevertheless, an average human scientist
is made up of about 2.9 billion (2.9 × 10^9) nucleotides!
This orgy of reductionism presents problems which only big brother can solve: how do I
store all this information in a form which is universally accessible and retrievable? What
started as a Cartesian dream is turning out to Bill Gates' satisfaction: computers are
needed!
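The storage question can be made concrete with some back-of-the-envelope arithmetic. Because there are only four bases, each one can in principle be encoded in two bits. The sketch below works out the size of 2.9 billion nucleotides under that assumption and shows one hypothetical way to pack a sequence. It ignores ambiguity codes, annotation and file-format overhead, and all names are illustrative.

```python
GENOME_LENGTH = 2_900_000_000            # nucleotides, the figure quoted in the text

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding of the four bases
LETTER = "ACGT"


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed for n_bases at the given number of bits per base (rounded up)."""
    return (n_bases * bits_per_base + 7) // 8


def pack(dna: str) -> bytes:
    """Pack a DNA string into bytes using two bits per base."""
    value = 0
    for base in dna:
        value = (value << 2) | CODE[base]
    return value.to_bytes(storage_bytes(len(dna), 2), "big")


def unpack(data: bytes, n_bases: int) -> str:
    """Recover the DNA string from packed bytes; the length must be known."""
    value = int.from_bytes(data, "big")
    return "".join(
        LETTER[(value >> (2 * (n_bases - 1 - i))) & 3] for i in range(n_bases)
    )


print(storage_bytes(GENOME_LENGTH, 8))   # one text character per base
print(storage_bytes(GENOME_LENGTH, 2))   # two bits per base
```

At two bits per base the sequence needs 725,000,000 bytes, roughly the capacity of a single CD-ROM, compared with 2.9 billion bytes when every base is stored as a text character.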
Vast computer data banks accessible to you and me store this vast quantity of
information. There are a lot of different data banks where DNA and protein sequence
information is stored. Three examples are listed in the table below.

Name of data bank    Type of sequences stored    Number of sequences (1996)
EMBL / GENBANK       Nucleotide sequences        827174
SWISSPROT            Protein sequences           52205
PDB                  Protein structures          4525

How Can We Analyze the Flood of Data?

An advantage of these data banks is their flexibility. All this information can be ordered
and combined according to different patterns and tell us an awful lot.
The motto goes: don't just store it, analyze it! By comparing sequences, one can find out
about things like

• ancestors of organisms
• phylogenetic trees
• protein structures
• protein function

Phylogenetic trees are genealogical trees which are built up with information gained
from the comparison of the amino acid sequences of a protein like cytochrome C,
sampled from different species. Proteins like Beta-amylase or hemoglobin cannot be
chosen to get the "full picture", that is the full tree, because they do not occur throughout
living matter. As a result of Darwinian evolution, the protein has a slightly different
amino acid sequence in each species. One phylogenetic tree was created, for instance,
with the sequences of cytochrome C from several plants, animals and fungi. Below, part
of this phylogenetic tree is shown.
Drawing of a phylogenetic tree based on the amino acid sequence data of cytochrome C
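One simple way to turn sequence differences into a tree is average-linkage clustering (UPGMA): the two most similar sequences are joined first, then the next closest group, and so on. The sketch below is an illustration under stated assumptions, not the method used to draw the tree above. The four sequences are made up, already aligned to equal length, and do not represent real cytochrome C data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(sequences: dict):
    """Cluster sequences by average linkage and return the tree as nested tuples."""
    sizes = {name: 1 for name in sequences}          # cluster -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(sequences[a], sequences[b])
        for a, b in combinations(sequences, 2)
    }
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)            # pair with the smallest distance
        left, right = sorted(closest, key=str)       # fixed order for reproducibility
        merged = (left, right)
        n_left, n_right = sizes.pop(left), sizes.pop(right)
        for other in sizes:
            d_left = dist.pop(frozenset((left, other)))
            d_right = dist.pop(frozenset((right, other)))
            # Size-weighted average keeps every leaf equally weighted.
            dist[frozenset((merged, other))] = (
                n_left * d_left + n_right * d_right
            ) / (n_left + n_right)
        del dist[closest]
        sizes[merged] = n_left + n_right
    return next(iter(sizes))


# Hypothetical aligned toy sequences, invented for this example.
toy_sequences = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTACGTAA",
    "species_c": "ACGTTCGAAA",
    "species_d": "TCGATCCAGA",
}
print(upgma(toy_sequences))
```

The nested tuples mirror the branching order: the innermost pair are the most similar sequences, and each outer level adds the next most similar group.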

Sequence comparison is a very powerful tool in molecular biology, genetics and protein
chemistry. Frequently it is unknown which protein a new DNA sequence codes for, or
whether it codes for any protein at all. If you compare a new coding sequence with all
known sequences, there is a high probability of finding a similar sequence. Often it is
already known which role the protein in the data bank plays in the cell. If you assume that
a similar sequence implies a similar function, you now have much more knowledge about
your new sequence than before. Proteins of one class often show a few amino acids that
always occur at the same positions in the amino acid sequence. By looking for such
"patterns" you will be able to gain information about the activity of a protein of which
only the gene (DNA) is known. Evaluation of such patterns yields information about the
architecture of proteins. Often these patterns are involved in active sites, which are the
workbenches of proteins.
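Searching for such a pattern can be sketched with a regular expression. The example uses the widely cited P-loop signature, written in PROSITE style as [AG]-x(4)-G-K-[ST], where x stands for any amino acid. The protein fragment is made up for illustration and the function name is an assumption.

```python
import re

# PROSITE-style pattern [AG]-x(4)-G-K-[ST] rewritten as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(protein: str, pattern: re.Pattern = P_LOOP) -> list:
    """Return (start position, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Hypothetical protein fragment in one-letter amino acid codes.
toy_protein = "MKLVVVGAGGVGKSALT"
print(find_motif(toy_protein))
```

A match only suggests that the protein may share the function linked to the pattern; short patterns also turn up by chance, so a hit is a lead to check rather than a proof.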

A lot of complicated algorithms have been created. There are tools to scan data banks for
sequences, such as FASTA and BLAST. There are programs like Clustal and MSA for
comparing sequences. There are hundreds more. Although the development of new tools
is more transparent because of the possibilities of the Internet, it is not easy to keep up
with everything. Exploitation of these possibilities requires a new breed of scientist: those
versed in information technology AND biology, and they may enable us to go where no
man has gone before. Through a new surge of interdisciplinarity it may be possible to
transcend the limits of reductionism; from the vast quantities of bytes and pieces, the
contours of complex structures and relationships might emerge from the genetic alphabet
soup as life itself once emerged from the primordial soup.

Why is bioinformatics important?

In the field that has been dominated by structural biology for the last 20-30 years, we are
now witnessing a dramatic change of focus towards sequence analysis, spurred on by the
advent of the genome projects and the resultant sequence/structure deficit. The central
challenge of bioinformatics is the rationalization of the mass of sequence information,
with a view not only to deriving more efficient means of data storage, but also to
designing more incisive tools. The imperative that drives this analytical process is the
need to convert sequence information into biochemical and biophysical knowledge; to
decipher the structural, functional and evolutionary clues encoded in the language of
biological sequences.
It is clear that mere acquisition of sequences conveys little more about the intricate
biology of the system from which they are derived than a company phone directory can
reveal about the complexities of the company's business. To extract biological meaning
from sequence information is an exacting science. In essence, we are faced with the task
of decoding an unknown language. This language may be decomposed into
sentences (proteins), words (motifs), and letters, its alphabet (amino acids), and the
code may be tackled at a variety of these levels. By themselves, the letters have no higher
meaning, but their particular combination into words is important. Sometimes, the most
subtle of changes, a single letter within a word perhaps, can change its meaning (e.g.
hog-hag), and hence the meaning of the entire sentence; so it is vital to decipher the code
correctly. Consider, for example, the single base change in the human hemoglobin β
chain codon for glutamic acid (GAG) to valine (GUG); in homozygous individuals, this
minute difference results in a change from a normal healthy state to sickle cell anemia.
The challenge is to understand the words in a sequence sentence that form a particular
protein structure, and perhaps one day to be able to write sentences (design proteins) of
our own. Today, the application of computational methods allows us to recognize words
that form characteristic patterns or signatures, but we do not yet understand the intricate
syntax required to piece the patterns together and build complete protein structures.
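The effect of a single changed letter can be checked against the standard genetic code. The sketch below builds the RNA codon table and reports the amino acid before and after one base is substituted. Both glutamic acid codons (GAA and GAG) become valine codons when the middle base changes from A to U. The function name and its arguments are illustrative assumptions.

```python
from itertools import product

# Standard genetic code in RNA letters, conventional order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def point_mutation_effect(codon: str, position: int, new_base: str) -> tuple:
    """Return (amino acid before, amino acid after) for a one-base substitution."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[mutant]


# Middle base A -> U turns glutamic acid (E) into valine (V).
print(point_mutation_effect("GAG", 1, "U"))
# Third base A -> G leaves glutamic acid unchanged (a silent change).
print(point_mutation_effect("GAA", 2, "G"))
```

Not every single-base change alters the protein: many third-position changes are silent, which is why the position of a substitution matters as much as the substitution itself.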
In investigating the meaning of sequences, two distinct analytical themes have emerged:
in the first approach, pattern recognition techniques are used to detect similarity between
sequences and hence to infer related structures and functions; in the second, ab initio
prediction methods are used to deduce 3D structure, and ultimately to infer function,
directly from the linear sequence. The development of more powerful pattern recognition
and structure prediction techniques will continue to be a dominant theme in bioinformatics
research while the number of experimentally determined protein structures remains
small.
Scope in Bioinformatics

Bioinformatics uses advances in the area of computer science, information science,
computer and information technology, and communication technology to solve complex
problems in life sciences and particularly in biotechnology. Data capture, data
warehousing and data mining have become major issues for biotechnologists and
biological scientists due to sudden growth in quantitative data in biology such as
complete genomes of biological species including human genome, protein sequences,
protein 3-D structures, metabolic pathways databases, cell line & hybridoma information,
biodiversity related information. Advancements in information technology, particularly
the Internet, are being used to gather and access ever-increasing information in biology
and biotechnology. Functional genomics, proteomics, discovery of new drugs and
vaccines, molecular diagnostic kits and pharmacogenomics are some of the areas in
which bioinformatics has become an integral part of Research & Development. The
knowledge of multimedia databases, tools to carry out data analysis and modeling of
molecules and biological systems on computer workstations as well as in a network
environment has become essential for any student of Bioinformatics. Bioinformatics, the
multidisciplinary area, has grown so much that one divides it into molecular
bioinformatics, organal bioinformatics and species bioinformatics. Issues related to
biodiversity and environment, cloning of higher animals such as Dolly and Polly, tissue
culture and cloning of plants have brought out that Bioinformatics is not only a support
branch of science but is also a subject that directs future course of research in
biotechnology and life sciences. The importance and usefulness of Bioinformatics has
been realized in the last few years by many industries. Therefore, large Bioinformatics
R & D divisions are being established in many pharmaceutical companies, biotechnology
companies and even in other conventional industries dealing with biologicals.
Bioinformatics is thus rated as the number one career in the field of biosciences.

The need for trained manpower in this area is sharply on the rise, but there are very few
training institutions in the world where such training is provided. National Bioinformatics
Institute is one of the few such institutions in the world.

In short, Bioinformatics deals with database creation, data analysis and modeling. Data
capturing is done not only from printed material but also from network resources.
Databases in biology are generally in the multimedia form organized in relational
database model. Modeling is done not only on single biological molecule but also on
multiple systems thus requiring a use of high performance computing systems.

Skills that have great value on the current bioinformatics-related job market are:
sequence analysis, molecular modeling, Perl programming, Web interface design, data
mining, communication skills, Internet literacy, integration of heterogeneous and
distributed resources and tools, user support, virtual reality systems (esp. for real-time
communication), visualization, UNIX, database retrieval,...
SOME FACTS:

• In India, the bioinformatics industry is estimated at between $500 and $700 million.
• Global pharmaceutical industry spending on R&D is between $30 and $40 billion
per year.
• The Indian bioinformatics business is expected to grow at 50%, to reach $2 billion by
2005.
• Motorola is working on a processor designed to speed up experiments with
different combinations of protein-DNA reactions.
• IBM is investing $100 million in building a supercomputer, Blue Gene, for the
simulation of protein processes.
• IBM is also evolving techniques to represent and work with 3D structural data of
proteins.
• TCS, in collaboration with the Centre for Biochemical Technology, has set up a centre
for Bioinformatics Research in Hyderabad.
