Bioinformatics: A Starter

Short essay on Bioinformatics
Published by: K Mani on Mar 18, 2009
K. Mani, Reader in Botany, PSG College of Arts and Science, Coimbatore, Tamilnadu, India. kmani52@gmail.com
What is Bioinformatics?
How it started?
Who are the Bioinformaticians? The tool makers and the tool users
The information: Nucleotides, proteins, and structures.
The Database: Primary, Secondary and special.
The elements:
Sequence analysis: Pairwise and Multiple sequence analysis
Phylogenetic analysis:
Genomics: Structural, functional and comparative
Proteomics: Structural, functional, comparative and interactive
Metabolomics: Reconstruction of metabolic pathways
Systems Biology: Cell function Simulation
Applications of Bioinformatics:
What is Bioinformatics?
Bioinformatics is the collection, storage, maintenance and distribution of biological data, and the use of computers to extract and identify meaningful information in that data and convert it into knowledge.
The first phase in bioinformatics is the conversion of biological data into digital format. The second phase is cleaning and arranging the data into an easily retrievable form; this is called the database. The third phase is the extraction of information hidden in the data, by comparison and analysis using computer programs, so that the data become knowledge.
How it started?
Two major events in biology gave bioinformatics its kick-start. The attempts of Margaret Dayhoff in the 1960s to analyze protein sequences with the help of computer programs laid the initial groundwork for bioinformatics. However, the real jump-start occurred only after the release of the human genome draft in 2001.
All the information necessary for the growth and development of an organism is already present in the nucleotide sequence of its genome. Similarly, the structure and function of a protein are already inscribed in its primary sequence. Therefore, all one needs to know is the sequence; the rest can be discerned from it by computational analysis.
Bioinformatics is the attempt to cull every bit of information from a given sequence or structure.
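As a small illustration of how information can be read off a sequence, the sketch below translates a DNA coding sequence into its protein using the standard genetic code. It is a minimal sketch under stated assumptions: the input is assumed to be an intron-free coding strand already in the correct reading frame, the example sequence is made up, and the names CODON_TABLE and translate are hypothetical helpers, not part of any particular bioinformatics package.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon.

    Assumes the sequence starts in frame and contains no introns.
    Codons with unknown letters are reported as "X".
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up example sequence: ATG GCC ATT GTA ATG GGC CGC TGA
print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # prints MAIVMGR
```

Real genes are harder than this: the reading frame, the strand, and the intron and exon boundaries first have to be found, which is exactly the kind of task the analytical tools are built for.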
The two pillars of bioinformatics
Since bioinformatics is a happy marriage of computing and biology, these remain its two pillars. Computer-savvy biologists and biology-loving computational scientists enjoy bioinformatics equally, and each has a share in its development and growth.
The computational domain involves creating and maintaining biological databases, developing powerful programs that can crunch huge volumes of data, and building smart analytical tools that extract the secrets buried in the genetic code.
The second domain belongs to biologists, who use these intelligent tools to unearth the secrets hidden in nucleotide and amino acid sequences.
For instance, the following information can be dug out of a given nucleic acid sequence:
Locating the positions of genes on the genome sequence
Locating the intron and exon positions
Identifying deletion, insertion and substitution mutations
Locating paralogous genes, orthologous genes and pseudogenes
Identifying non-coding gene regulatory elements
The protein for which the gene codes
The gene-flanking sequences that would serve as primers for PCR
Similarly, from protein sequences the following and much more information can be traced:
Protein function
Protein secondary structure
Tertiary structure
Super secondary structures
Antigenic regions
Post-translational modifications
Grand average of hydropathy (GRAVY)
Molecular weight
Electrical properties
Half-life
Optical extinction coefficient
Interactions with other molecules, including other proteins
Cellular location
Signal peptides
Active site
Probable location in 2D gel electrophoresis
Transmembrane properties
Isoelectric point
The nature of Biological data
There are four kinds of biological data:
Gene related
Protein related
Gene and protein function related
Structure related
The gene-related data consist of gene sequences, the locations of genes and their regulatory elements, as well as genome sequences, chromosomal architecture, non-coding regions, gene markers, chromosome map elements and single nucleotide polymorphism (SNP) markers. The sequence data are commonly stored as plain text files in a format known as FASTA.
The protein-related data, including sequences, domains, patterns and motifs, are also stored in FASTA format in the databanks.
The structure-related data include three-dimensional atomic coordinate files of proteins, small molecular compounds, ligands, tRNA and the like.
The Data Banks
Biological databanks collect, maintain and distribute data. Some databanks are large storehouses known as data warehouses. Others are well classified, like supermarkets.
Sequence, structure and gene expression data are maintained digitally on robust computer servers and linked to the Internet through the World Wide Web. With a few exceptions, almost all biological data are free and open access.
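To make the FASTA format mentioned above concrete: a FASTA file is plain text in which each record starts with a header line beginning with ">", followed by one or more lines of sequence. The sketch below reads such text into a dictionary. It is only an illustration: the two records are invented, and parse_fasta is a hypothetical helper written for this example, not the reader supplied by any databank.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted lines.

    A header line starts with ">"; the sequence lines that follow it
    are joined together until the next header.
    """
    records: Dict[str, str] = {}
    header = None
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


# Invented example records, one nucleotide and one protein sequence.
example = """>seq1 hypothetical example gene
ATGGCCATTG
TAATGGGCC
>seq2 hypothetical example protein
MKTAYIAKQR
"""

records = parse_fasta(example.splitlines())
for name, sequence in records.items():
    print(name, len(sequence))
```

Because the format is this simple, the same text file can be read by almost any analysis program, which is one reason it is so widely used for exchanging sequences.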
The Big three
Across the world there are three major biological databases. They can be accessed through Internet browsers, and the site map of each database takes the visitor on a tour that introduces all its components.
National Center for Biotechnology Information (NCBI)
European Molecular Biology Laboratory (EMBL)
DNA Data Bank of Japan (DDBJ)
Nucleic acid sequences, either as genes or as genomes, are available with complete annotations in these databases. Entrez (NCBI) and SRS (EMBL) are the user-friendly data retrieval systems offered by the databases themselves.
Classification of Databases
The first issue of Nucleic Acids Research each year comes out with a list of newly added databases. The total number of databases in the world is now nearing 1000. Databases can be classified by the nature of their data source. Primary databases consist of data derived from the source laboratories. Secondary databases hold information-enriched data derived from the primary databases. Specialized databases contain data of special interest.
Primary databases hold either nucleic acid sequence data or protein sequence data. Structure files from crystallography research centers are another kind of primary data. Genes,

