BIOINFORMATICS TRAINING PROGRAM

May 16-18, 2007

Reference Material

Indian Institute of Advanced Research

www.helpBIOTECH.blogspot.com | Your Gate Way to Life Science Career

BIOINFORMATICS
1.0 Introduction
2.0 Protein Structure
3.0 Genome Analysis
4.0 Phylogeny
5.0 Modeling
6.0 Tools for Structure based drug design and docking
7.0 Computational Resources

1.0 Introduction
Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. At the beginning of the "genomic revolution", a bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences. Development of this type of database involved not only design issues but also the development of complex interfaces whereby researchers could both access existing data and submit new or revised data. Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. It is hoped that the process of analyzing and interpreting these data will lead to elucidation of the underlying principles of biological phenomena.
Some Important Landmarks in the development of Bioinformatics:
1962  The first theory of molecular evolution; the Molecular Clock concept (Linus Pauling and Emile Zuckerkandl)
1965  Atlas of Protein Sequences, the first protein database (Margaret Dayhoff and coworkers)
1970  Needleman-Wunsch algorithm for global protein sequence alignment
1977  New DNA sequencing methods (Fred Sanger, Walter Gilbert and coworkers); bacteriophage φX174 sequence
1977  First software for sequence analysis (Roger Staden)
1977  Phylogenetic taxonomy; archaea discovered; the notion of the three primary kingdoms of life introduced (Carl Woese and coworkers)
1981  Smith-Waterman algorithm for local protein sequence alignment
1981  Human mitochondrial genome sequenced
1981  The concept of a sequence motif (Russell Doolittle)
1982  GenBank Release 3 made public
1982  Phage λ genome sequenced (Fred Sanger and coworkers)
1983  The first practical sequence database searching algorithm (John Wilbur and David Lipman)
1985  FASTP/FASTN: fast sequence similarity searching (William Pearson and David Lipman)
1986  Introduction of Markov models for DNA analysis (Mark Borodovsky and coworkers)
1987  First profile search algorithm (Michael Gribskov, Andrew McLachlan, David Eisenberg)
1988  National Center for Biotechnology Information (NCBI) created at NIH/NLM
1988  EMBnet network for database distribution created
1990  BLAST: fast sequence similarity searching with rigorous statistics (Stephen Altschul, David Lipman and coworkers)
1991  EST: expressed sequence tag sequencing (Craig Venter and coworkers)
1994  Hidden Markov Models of multiple alignments (David Haussler and coworkers; Pierre Baldi and coworkers)
1994  SCOP classification of protein structures (Alexei Murzin, Cyrus Chothia and coworkers)
1995  First bacterial genomes completely sequenced
1996  First archaeal genome completely sequenced
1996  First eukaryotic genome (yeast) completely sequenced
1997  Introduction of gapped BLAST and PSI-BLAST
1997  COGs: Evolutionary classification of proteins from complete genomes
1998  Worm genome, the first multicellular genome, (nearly) completely sequenced
1999  Fly genome (nearly) completely sequenced
2001  Human genome (nearly) completely sequenced


2.0 Protein Structure
A set of 20 different subunits, called amino acids, can be arranged in any order to form a polypeptide that can be thousands of amino acids long. These chains can then loop about each other or fold, in a variety of ways, but only one of these ways allows a protein to function properly. The critical feature of a protein is its ability to fold into a conformation that creates structural features, such as surface grooves, ridges, and pockets, which allow it to fulfill its role in a cell. A protein's conformation is usually described in terms of levels of structure. Traditionally, proteins are looked upon as having four distinct levels of structure, with each level of structure dependent on the one below it. In some proteins, functional diversity may be further amplified by the addition of new chemical groups after synthesis is complete. The stringing together of the amino acid chain to form a polypeptide is referred to as the primary structure. The secondary structure is generated by the folding of the primary sequence and refers to the path that the polypeptide backbone of the protein follows in space. Certain types of secondary structures are relatively common. Two well-described secondary structures are the alpha helix and the beta sheet. In the first case, certain types of bonding between groups located on the same polypeptide chain cause the backbone to twist into a helix, most often in a form known as the alpha helix. Beta sheets are formed when a polypeptide chain bonds with another chain that is running in the opposite direction. Beta sheets may also be formed between two sections of a single polypeptide chain that is arranged such that adjacent regions are in reverse orientation. The tertiary structure describes the organization in three dimensions of all of the atoms in the polypeptide. If a protein consists of only one polypeptide chain, this level then describes the complete structure. 
Multimeric proteins, or proteins that consist of more than one polypeptide chain, require a higher level of organization. The quaternary structure defines the conformation assumed by a multimeric protein. In this case, the individual polypeptide chains that make up a multimeric protein are often referred to as the protein subunits. The four levels of protein structure are hierarchical; that is, each level of the build process is dependent upon the one below it. A protein's primary amino acid sequence is crucial in determining its final structure. In some cases, amino acid sequence is the sole determinant, whereas in other cases, additional interactions may be required before a protein can attain its final conformation. For example, some proteins require the presence of a cofactor, a second molecule that is part of the active protein, before they can attain their final conformation. Multimeric proteins often require one or more subunits to be present for another subunit to adopt the proper higher order structure. The entire process is cooperative; that is, the formation of one region of secondary structure determines the formation of the next region.
Allosteric Proteins: These are proteins which under certain conditions have a stable alternate conformation, or shape, that enables them to carry out a different biological function. The interaction of an allosteric protein with a specific cofactor, or with another protein, may influence the transition of the protein between shapes. In addition, any change in conformation brought about by an interaction at one site may lead to an alteration in the structure, and thus function, at another site. One should bear in mind, though, that this type of transition affects only the protein's shape, not the primary amino acid sequence. Allosteric proteins play an important role in both metabolic and genetic regulation.
Protein structure determination: Traditionally, a protein's structure was determined using one of two techniques: X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy.
X-ray Crystallography: Crystals are a solid form of a substance in which the component molecules are present in an ordered array called a lattice. The basic building block of a crystal is called a unit cell. Each unit cell contains exactly one unique set of the crystal's components, the smallest possible set that is fully representative of the crystal. When the crystal is placed in an X-ray beam, all of the unit cells present the same face to the beam; therefore, many molecules are in the same orientation with respect to the incoming X-rays. The X-ray beam enters the crystal and a number of smaller beams emerge: each one in a different direction, each one with a different intensity. If an X-ray detector, such as a piece of film, is placed on the opposite side of the crystal from the X-ray source, each diffracted ray, called a reflection, will produce a spot on the film. However, because only a few reflections can be detected with any one orientation of the crystal, an important component of any X-ray diffraction instrument is a device for accurately setting and changing the orientation of the crystal. The set of diffracted, emerging beams contains information about the underlying crystal structure. The major drawback associated with this technique is that crystallization of the proteins is a difficult task. Crystals are formed by slowly precipitating proteins under conditions that maintain their native conformation or structure. These exact conditions can only be discovered by repeated trials that entail varying certain experimental conditions, one at a time. This is a very time consuming and tedious process.
Nuclear Magnetic Resonance (NMR) Spectroscopy: The basic phenomenon of NMR spectroscopy was discovered in 1945. In this technique, a sample is immersed in a magnetic field. As the positively charged nucleus spins, the moving charge creates what is called a magnetic moment. When radio waves hit the spinning nuclei, they tilt even more, sometimes flipping over. These resonating nuclei emit a unique signal that is then picked up by a detector and processed by the Fourier Transform algorithm, a complex equation that translates the language of the nuclei into something a scientist can understand. By measuring the frequencies at which different nuclei flip, scientists can determine molecular structure, as well as many other interesting properties of the molecule. In the past 10 years, NMR has proven to be a powerful alternative to X-ray crystallography for the determination of molecular structure. NMR has the advantage over crystallographic techniques in that experiments are performed in solution as opposed to a crystal lattice. However, the principles that make NMR possible tend to make this technique very time consuming and limit the application to small- and medium-sized molecules.
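The role of the Fourier transform in NMR, turning a detected signal into the frequencies it contains, can be illustrated with a toy example. This is only a sketch and not how spectrometer software works: the signal is a made-up sum of two cosine waves at 5 Hz and 12 Hz (hypothetical values standing in for two kinds of resonating nuclei), and the function names are invented for this illustration. It uses a direct discrete Fourier transform from the Python standard library.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Return |X[k]| for k = 0 .. n/2 - 1 using a direct O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2)
    ]

def dominant_frequencies(signal, sample_rate, count):
    """Return the `count` strongest frequencies in Hz, in ascending order."""
    n = len(signal)
    mags = dft_magnitudes(signal)
    strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * sample_rate / n for k in strongest)

# Toy "detector signal": one second sampled at 64 Hz (assumed values).
SAMPLE_RATE = 64
toy_signal = [
    math.cos(2 * math.pi * 5 * t / SAMPLE_RATE)
    + 0.5 * math.cos(2 * math.pi * 12 * t / SAMPLE_RATE)
    for t in range(SAMPLE_RATE)
]

print(dominant_frequencies(toy_signal, SAMPLE_RATE, 2))  # prints [5.0, 12.0]
```

Real NMR signals decay over time and contain many overlapping resonances, so practical processing involves considerably more than this peak-picking step.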

3.0 Genome Analysis
Homology refers to two genes sharing a common evolutionary history. Scientists also use the term homology, or homologous, to simply mean similar, regardless of the evolutionary relationship. In comparative genomics one of the major functions is the identification of homologous genes in different organisms. An important tool which is utilized for this function is BLAST (Basic Local Alignment Search Tool).
The BLAST algorithm is a heuristic program, which means that it relies on some smart shortcuts to perform the search faster. BLAST performs "local" alignments. Most proteins are modular in nature, with functional domains often being repeated within the same protein as well as across different proteins from different species. The BLAST algorithm is tuned to find these domains or shorter stretches of sequence similarity. The local alignment approach also means that an mRNA can be aligned with a piece of genomic DNA, as is frequently required in genome assembly and analysis. If instead BLAST started out by attempting to align two sequences over their entire lengths (known as a global alignment), fewer similarities would be detected, especially with respect to domains and motifs.
When a query is submitted via one of the BLAST Web pages, the sequence, plus any other input information such as the database to be searched, word size, expect value, and so on, are fed to the algorithm on the BLAST server. BLAST works by first making a look-up table of all the "words" (short subsequences; for proteins the default is three letters) and "neighboring words", i.e., similar words in the query sequence. The sequence database is then scanned for these "hot spots". When a match is identified, it is used to initiate gap-free and gapped extensions of the "word".
BLAST Scores and Statistics: Once BLAST has found a similar sequence to the query in the database, it is helpful to have some idea of whether the alignment is "good" and whether it portrays a possible biological relationship, or whether the similarity observed is attributable to chance alone. BLAST uses statistical theory to produce a bit score and expect value (E-value) for each alignment pair (query to hit).
The bit score gives an indication of how good the alignment is; the higher the score, the better the alignment. In general terms, this score is calculated from a formula that takes into account the alignment of similar or identical residues, as well as any gaps introduced to align the sequences. A key element in this calculation is the "substitution matrix", which assigns a score for aligning any possible pair of residues. The BLOSUM62 matrix is the default for most BLAST programs, the exceptions being blastn and MegaBLAST (programs that perform nucleotide-nucleotide comparisons and hence do not use protein-specific matrices). Bit scores are normalized, which means that the bit scores from different alignments can be compared, even if different scoring matrices have been used.
The E-value gives an indication of the statistical significance of a given pairwise alignment and reflects the size of the database and the scoring system used. The lower the E-value, the more significant the hit. A sequence alignment that has an E-value of 0.05 means that this similarity has a 5 in 100 (1 in 20) chance of occurring by chance alone.
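Two of the ideas in this section, the word look-up table and the score statistics, can be sketched in a few lines of Python. This is a simplified illustration, not BLAST's actual implementation: it indexes exact words only (no "neighboring words" and no extension step), the function names are invented for this example, and the values of `lam` and `k` are placeholders chosen for illustration. The statistics follow the usual normalized form S' = (λS − ln K) / ln 2 and E = m·n·2^(−S').

```python
import math
from collections import defaultdict

def build_word_table(query, word_size=3):
    """Map every word of length `word_size` in the query to its start positions."""
    table = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        table[query[i:i + word_size]].append(i)
    return table

def find_hot_spots(table, subject, word_size=3):
    """Scan a database sequence and return (query_pos, subject_pos) for each word match."""
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in table.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits

def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(bits, m, n):
    """E = m * n * 2^(-S'), with m and n the query and hit (or database) lengths."""
    return m * n * 2 ** (-bits)

# Hypothetical protein fragments, used only to exercise the functions.
query = "MKVLAAGIV"
subject = "TTMKVLQQAGIVP"
table = build_word_table(query)
print(find_hot_spots(table, subject))

# Placeholder raw score and constants (assumptions, not values from this text).
bits = bit_score(raw_score=100, lam=0.267, k=0.041)
print(round(bits, 1), expect_value(bits, m=len(query), n=1_000_000))
```

Each reported hit would serve as a seed for the gap-free and gapped extensions described above. Because E scales with m·n, the same bit score becomes less significant as the database grows.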

Although a statistician might consider an E-value of 0.05 to be significant, it still may not represent a biologically meaningful result, and analysis of the alignments is required to determine "biological" significance. The score and E-value are calculated using the equations given below:
\[ S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m n \, 2^{-S'} \]
where S' is the normalized score, S is the raw score, λ and K are constants, and m and n are the lengths of the query and hit sequences.
Tools for Comparative Genomics:
1. CisMols
CisMols (Cis-regulatory Modules) is a tool that identifies compositionally predicted cis-clusters that occur in groups of co-regulated genes within each of their ortholog-pair evolutionarily conserved cis-regulatory regions.
2. CMR
The Comprehensive Microbial Resource (CMR) gives access to a central repository of the sequence and annotation of all complete public prokaryotic genomes as well as comparative genomics tools across all of the genomes in the database.
3. DAVID Bioinformatic Resources
The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.
4. DCODE.ORG
The dcode.org website provides access to tools for comparative genomic analyses developed by the Comparative Genomics Center at the Lawrence Livermore National Laboratory. Tools include: zPicture, rVista, Mulan, eShadow, CREME, and the ECR Browser.
5. EnteriX
EnteriX is a collection of tools for viewing pairwise and multiple alignments for bacterial genome sequences.
6. FootPrinter
FootPrinter is a program for phylogenetic footprinting that identifies regions of DNA that are well conserved across a set of orthologous sequences in order to infer phylogenetic relationships.
7. FootPrinter3
FootPrinter3 is a web server for predicting transcription factor binding sites (TFBS) by using phylogenetic footprinting. FootPrinter3 extends the motif discovery algorithms of FootPrinter by making use of local multiple sequence alignment blocks when those are available and reliable, but also allowing finding motifs in unalignable regions.
8. GenomeTraFaC
GenomeTraFaC is a comparative genomics based resource for initial characterization of gene models and the identification of putative cis-regulatory regions of RefSeq gene orthologs.
9. GENSTYLE
GENSTYLE is based on the genomic signature paradigm and allows the user to classify and characterize nucleotide sequences using oligonucleotide frequencies.
10. IBM Genome Annotation Page
IBM's Bio-Dictionary-based Annotations Of Completed Genomes page lists annotations for over 75 complete genomes (archaea, bacteria, eukaryotes, and viruses). You can query these annotations at the sequence level as well as search/compare across genomes.
11. Integrated Microbial Genomes (IMG)
The Integrated Microbial Genomes (IMG) system facilitates the comparison of genomes sequenced by the Joint Genome Institute (JGI). It can be searched using keywords or BLASTp, and the gene records displayed include biochemical properties, protein domains, chromosomal location and neighbourhood and lists of paralogues and orthologues. One can easily build a list of genomes to be considered or excluded from the search and the Phylogenetic Profiler tool allows one to refine the selection by building a list of homologues either common to or excluded from specific organisms.
12. ISC Large-scale Sequencing Project Database
The International Sequencing Consortium (ISC) Large-scale Sequencing Project Database contains information on current and completed sequencing projects, including project timelines, funding agencies, sequencing strategy and links out to project web pages.
13. Mauve
Mauve is a stand-alone software tool for constructing multiple genome alignments.
14. MicroFootPrinter
MicroFootPrinter identifies the conserved motifs in regulatory regions of prokaryotic genomes using the phylogenetic footprinting program FootPrinter.
15. MIPS
Munich Information Centre for Protein Sequences projects include: fungal genome analysis, plant genome bioinformatics, structural genomics, proteomics and genome annotation. Projects and databases include: CYGD, MATDB, MNCDB, MOsDB, MPPI, NGFN, QUIPOS, SIMAP, SPUTNIK, and PEDANT.
16. MLST
MLST (Multi Locus Sequence Typing) is a nucleotide sequence based approach for the unambiguous characterisation of isolates of bacteria and other organisms using the sequences of internal fragments of seven house-keeping genes.
17. NEMBASE2
NEMBASE2 is a database resource for EST datasets for 37 species of nematode. Sequences are clustered to reduce redundancy. Coding region predictions for each cluster and further annotations such as GO terms and physical properties are also included. Comparisons can be by library and at a sequence level, and a visualisation tool is included.
18. PartiGeneDB
PartiGeneDB is a database of about 300 partial genomes from eukaryotic organisms that have been assembled from EST data.
19. PhenomicDB
PhenomicDB integrates the genotype and phenotype information of several organisms from public data sources. The mapping of phenotypic data fields allows cross-species phenotype comparison.
20. Phydbac2
Phydbac2 (Phylogenomic display of bacterial genes) is a tool to visualize and explore the phylogenomic profiles of bacterial protein sequences. It also allows the user to view sequence similarity across different organisms, access other genes with similar conservation profiles, and view genes that are found nearby a selected gene in multiple genomes.
21. Projector 2
Projector 2 allows users to map completed portions of the genome sequence of an organism onto the finished (or unfinished) genome of a closely-related species or strain. Using the related genome sequence as a template can facilitate sequence assembly and the sequencing of the remaining gaps.
22. Sockeye
Sockeye is a visualization tool allowing one to assemble and analyze genomic information in a three dimensional workspace. Sockeye displays genomic features along tracks. It can be used to view features at various levels, ranging from SNPs to karyotypes, and links to the Ensembl database.
23. SPRING
Sorting Permutation by Reversals and Block Interchanges (SPRING) is a tool for the analysis of genome rearrangements. SPRING takes two or more chromosomes as its input and then computes a minimum series of reversals and/or block-interchanges for transforming one chromosome into another. Phylogenetic trees based on the rearrangement analysis are also shown as part of the results.
24. SVC
SVC (Structured Visualization of Evolutionary Conserved Sequences) is a tool that can search for pairs of orthologous genes, align the protein coding sequences, and visualize the evolutionary sequence conservation mapped back onto the gene structure scaffold.
25. TIGR Software Tools
A list of open-source software packages available for free from The Institute for Genomic Research (TIGR).
26. TraFaC
TraFaC (Transcription Factor Binding Site Comparison) is a tool that identifies regulatory regions using a comparative sequence analysis approach.
27. T-STAG
Tissue-Specific Transcripts And Genes (T-STAG) is a system integrating EST, gene expression, alternative splicing and human-mouse orthology information for the analysis of tissue-specific gene and transcript expression patterns.
28. Viral Bioinformatics
Viral Bioinformatics provides access to viral genomes and a variety of tools for comparative genomic analyses.
29. YOGY
Eukaryotic Orthology (YOGY) is a resource for retrieving orthologous proteins from nine eukaryotic organisms. Using a gene or protein identifier as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, Homologene, OrthoMCL, and a table of curated orthologs between budding yeast and fission yeast. Associated Gene Ontology (GO) terms of orthologs can also be retrieved.
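Several of the tools listed above characterize sequences by simple composition statistics; GENSTYLE, for example, is described as using oligonucleotide frequencies. The sketch below illustrates that general idea only and is not GENSTYLE's method: it assumes dinucleotide (k = 2) frequencies as the "signature" and a plain Euclidean distance for comparison, and both function names and the test sequences are made up for this example.

```python
from collections import Counter
from itertools import product
from math import sqrt

def signature(seq, k=2):
    """Return the frequency of every possible DNA word of length k, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)  # ignores words with ambiguous bases
    if total == 0:
        raise ValueError("sequence is too short or contains no A/C/G/T words")
    return [counts[w] / total for w in words]

def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures; smaller means more similar composition."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))

# Hypothetical sequences: two AT-rich fragments and one GC-rich fragment.
at_rich_1 = "ATTATAATTA" * 4
at_rich_2 = "TATAATATTA" * 4
gc_rich = "GCCGGCGCCG" * 4

print(signature_distance(signature(at_rich_1), signature(at_rich_2)))
print(signature_distance(signature(at_rich_1), signature(gc_rich)))
```

With these inputs the two AT-rich fragments lie closer to each other than either does to the GC-rich one, which is the kind of grouping a composition-based classifier relies on.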

4.0 Phylogeny
New insight into the molecular basis of a disease may come from investigating the function of homologs of a disease gene in model organisms. In this case, homology refers to two genes sharing a common evolutionary history. Scientists also use the term homology, or homologous, to simply mean similar, regardless of the evolutionary relationship. Equally exciting is the potential for uncovering evolutionary relationships and patterns between different forms of life. With the aid of nucleotide and protein sequences, it should be possible to find the ancestral ties between different organisms. Thus far, experience has taught us that closely related organisms have similar sequences and that more distantly related organisms have more dissimilar sequences. Proteins that show significant sequence conservation, indicating a clear evolutionary relationship, are said to be from the same protein family. By studying protein folds (distinct protein building blocks) and families, scientists are able to reconstruct the evolutionary relationship between two species and to estimate the time of divergence between two organisms since they last shared a common ancestor.
The Three Domains of Life: In the mid-1970s, while studying some unusual groups of bacteria, thermophilic methanogens and halophiles, Carl Woese and colleagues concluded that these organisms were not really bacteria but should be assigned to a separate domain of life with the same status as bacteria and eukaryotes. This group was originally referred to as archaebacteria and later renamed archaea. The uniqueness of the archaea was apparent, even from some of their biochemical features, such as the unusual structure of lipids and the topology of phylogenetic trees of 16S rRNA. These trees clearly indicated that archaea comprised a unique branch of life, distinct from both bacteria and eukaryotes. Furthermore, although, phenotypically, archaea are obviously prokaryotes, i.e., like bacteria, have small cells without nuclei or organelles, they are, in some important respects, closer to eukaryotes than to bacteria. These eukaryote-like features of archaea include the structure of the ribosomes, which have a number of proteins shared with eukaryotes but not with bacteria; the presence of histones (in one of the two major branches of archaea); the organization of the basal transcriptional apparatus, with several transcription factors of the eukaryotic variety; and the organization of the DNA replication apparatus, which is also conserved in archaea and eukaryotes but not in bacteria.
The evolutionary process: Genetic Variation: Evolution is not always discrete with clearly defined boundaries that pinpoint the origin of a new species, nor is it a steady continuum. Evolution requires genetic variation which results from changes within a gene pool, the genetic make-up of a specific population. A gene pool is the combination of all the alleles (alternative forms of a genetic locus) for all traits that population may exhibit. Changes in a gene pool can result from mutation (variation within a particular gene) or from changes in gene frequency (the proportion of an allele in a given population).

Every organism possesses a genome that contains all of the biological information needed to construct and maintain a living example of that organism. The biological information contained in a genome is encoded in the nucleotide sequence of its DNA or RNA molecules and is divided into discrete units called genes. The information stored in a gene is read by proteins, which attach to the genome and initiate a series of reactions called gene expression. Every time a cell divides, it must make a complete copy of its genome, a process called DNA replication. DNA replication must be extremely accurate to avoid introducing mutations, or changes in the nucleotide sequence of a short region of the genome. Inevitably, some mutations do occur, usually in one of two ways, either from errors in DNA replication or from damage to the DNA. Mutations in the coding regions of genes are much more important. Those mutations that do have an evolutionary effect can be divided into two categories, loss-of-function mutations and gain-of-function mutations. A loss-of-function mutation results in reduced or abolished protein function. Gain-of-function mutations, which are much less common, confer an abnormal activity on a protein.
Phylogenetic Trees: Systematics describes the pattern of relationships among taxa and is intended to help us understand the history of all life. But history is not something we can see; it has happened once and leaves only clues as to the actual events. Scientists use these clues to build hypotheses, or models, of life's history. In phylogenetic studies, the most convenient way of visually presenting evolutionary relationships among a group of organisms is through illustrations called phylogenetic trees.
Tools for phylogeny reconstruction
1. Bioinformatics Toolkit
This Toolkit is a collection of a wide range of tools and links for sequence analysis, function, and structure prediction. This resource offers convenient web interfaces for many freely available tools.
2. CIPRes
The Cyberinfrastructure for Phylogenetic Research (CIPRes) project aims to develop a computational infrastructure for systematics. Other goals of the project include providing a central resource enabling computational systematics and education and training initiatives. The website also contains a substantial list of links to related software.
3. Codon Usage Database
Find GC content and frequency of codon usage for any organism that has a sequence in GenBank.
4. ConSeq
ConSeq is a tool for predicting functionally and structurally important amino acid residues in protein sequences. A multiple sequence alignment is used to predict the relative solvent accessibility state and the evolutionary rate at each residue. The predictions are based on the assumptions that residues of functional importance are often conserved and solvent-accessible, and those of structural importance are often conserved and located in the protein core.
5. cpnDB
cpnDB is a curated collection of chaperonin sequence data collected from public databases or generated by a network of collaborators exploiting the cpn60 target in clinical, phylogenetic and microbial ecology studies. The database contains all available sequences for both group I and group II chaperonins. cpnDB is built and maintained with open source tools.
6. IBM Genome Annotation Page
IBM's Bio-Dictionary-based Annotations Of Completed Genomes page lists annotations for over 75 complete genomes (archaea, bacteria, eukaryotes, and viruses). You can query these annotations at the sequence level as well as search/compare across genomes.
7. JEvTrace
JEvTrace is a tool that combines multiple sequence alignments, phylogenetic, and structural data for identification of functional sites in proteins.
8. Joe's Site - Phylogeny Programs
Comprehensive list of phylogeny packages, compiled by Joe Felsenstein, creator of PHYLIP.
9. MEGA
MEGA (Molecular Evolutionary Genetics Analysis) is a software package for phylogenetic analysis with a graphical user interface. It allows viewing and editing of the aligned input sequence data and provides many tools for phylogenetic and statistical analysis of the alignments.
10. Mesquite
Mesquite is an open source software project designed to deal with comparative data about organisms and evolutionary analyses. Mesquite contains modules for phylogenetic analysis, population genetics, and non-phylogenetic multivariate analysis.
11. MIGenAS Toolkit
Max-Planck Integrated Gene Analysis System (MIGenAS) provides access to many different bioinformatics software tools and databases for sequence similarity searching, multiple sequence alignments, phylogenetic analysis, and protein structure prediction. Users can also configure "meta"-tools as a pipeline of individual tools and intermediate filters.
12. MINER
MINER is a tool for the identification and visualization of phylogenetic motifs (regions within a multiple sequence alignment (MSA) that conserve the overall phylogeny of the complete family).
13. MPI Toolkit
Max-Planck Institute Bioinformatics Toolkit provides access to many different bioinformatics software tools and databases for sequence similarity searching, multiple sequence alignments, phylogenetic analysis, and protein structure prediction.
14. NCBI Taxonomy Database
Taxonomic classification of all organisms with sequences in GenBank.
15. NEWT
NEWT is the taxonomy database maintained by the UniProt group.
16. NJplot
NJplot is a tool for visualizing binary trees such as the phylogenetic trees output from the PHYLIP programs. Available for several platforms including Windows, MacOS, Linux and Solaris.
17. Orthologue Search Service
BLAST a protein sequence then perform automated phylogenetic analysis to detect orthologous sequences.
18. PAL2NAL
PAL2NAL converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. Synonymous (Ks) and non-synonymous (Ka) substitution rates can be calculated.
19. Phydbac2
Phydbac2 (Phylogenomic display of bacterial genes) is a tool to visualize and explore the phylogenomic profiles of bacterial protein sequences. It also allows the user to view sequence similarity across different organisms, access other genes with similar conservation profiles, and view genes that are found nearby a selected gene in multiple genomes.
20. PHYLIP
Comprehensive set of programs for phylogenetic analyses, available for PC and Mac, source code available for easy compiling in UNIX.
21. PhyloBLAST
BLAST a protein sequence, then perform automated phylogenetic analysis on hits or on uploaded sequences. PHYLIP-based analyses.

22. PhyloDome: PhyloDome is a tool with which you can visualize and analyze the phylogenetic distribution of one or more eukaryotic domains.
23. PHYML: Phyml is a program that constructs phylogenetic trees from sequence alignments using the maximum likelihood method.
24. POWER: The Phylogenetic Web Repeater (POWER) allows users to perform phylogenetic analysis using the PHYLIP package. The POWER pipeline can either start by processing multiple sequence alignments (MSA) or proceed directly with aligned sequences.
25. ProtTest: ProtTest is a program that determines the best-fit model of evolution, among a set of candidate models, for a given protein sequence alignment.
26. Puzzleboot: Puzzleboot is a UNIX shell script facilitating bootstrap analysis using TREE-PUZZLE and PHYLIP. It enhances TREE-PUZZLE by allowing one to analyse multiple datasets, and can be used for both protein and DNA distance bootstrap analysis.
27. Ribosomal Database Project: Highly curated database of aligned and annotated rRNA sequences with accompanying phylogenies; data available for download.
28. STING Millennium: STING is a suite of tools for the analysis of protein sequence, structure, stability and function, and the relationships between them.
29. SWAKK: Sliding Window Analysis of Ka and Ks (SWAKK) is a tool for detecting positive selection in proteins using a sliding window substitution rate analysis. The program can display the results on a 3D protein structure.
30. T-COFFEE: The T-COFFEE site includes links to a collection of tools for computing, evaluating, and manipulating multiple alignments of protein sequences and structures. TCOFFEE is a protein multiple sequence alignment tool that is more accurate than ClustalW for sequences with less than 30% identity. Expresso (or 3DCoffee) aligns sequences using structural information. PROTOGENE turns amino acid alignments into CDS nucleotide alignments.
31. Tree Editors: Tree Editors is an annotated listing of software for the visualization and manipulation of phylogenetic trees.

32. Tree of Life: Multi-authored project attempting to represent online the entire phylogeny of life on earth.
33. TreeDomViewer: TreeDomViewer is a tool for the visualization of phylogeny and protein domain structure. TreeDomViewer constructs phylogenetic trees and projects the corresponding protein domain information onto the multiple sequence alignment.
34. TreeJuxtaposer: TreeJuxtaposer is a free software tool that allows a visual comparison of two trees in Newick format (phylogenies, gene trees, taxonomies, etc.), and automatically calculates and marks the differences. It can work with trees having up to 500,000 nodes.
35. TREE-PUZZLE: Tree-puzzle is a program that constructs phylogenetic trees from sequence alignments using the maximum likelihood method.
36. TreeView: Generates nice graphics of trees, reads multiple tree file formats, available for download to Mac or PC.
37. TSEMA: The Server for Efficient Mapping Assessment (TSEMA) predicts possible protein-protein interactions based on the comparison of phylogenetic trees derived from sequences of associated protein families.
38. UCMP Phylogeny Wing: "Phylogeny-Diversity of Life Through Time" is an on-line exhibit at the University of California Museum of Paleontology website. There is an introduction to phylogenetics and cladistics, and you can navigate through a very informative phylogenetic tree rooted at the three main domains of life (Archaea, Bacteria and Eukaryota). At each level of the tree there is a brief summary, and links to more information about the Fossil Record, Life History and Ecology, Systematics and Morphology.
39. Understanding Evolution: A fantastic site for teaching/understanding evolution.
40. Weighbor: Weighbor is a tool for building phylogenetic trees from distance matrices. It employs a weighted version of the neighbour-joining method in which longer distances in the matrix are given less weight.
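To make the idea of building a tree from a distance matrix more concrete, the sketch below performs one step of the standard (unweighted) neighbour-joining method: it computes the usual criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) and returns the pair of taxa with the smallest value, which would be joined first. This is not Weighbor's weighted criterion, which is not reproduced here. The function name nj_pick_pair and the five-taxon matrix are assumptions chosen for illustration.

```python
def nj_pick_pair(dist):
    """Return (q, i, j) for the pair minimising the neighbour-joining criterion.

    dist is a symmetric matrix (list of lists) with zeros on the diagonal.
    Plain neighbour joining only; no distance weighting is applied.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("neighbour joining needs at least three taxa")

    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best


if __name__ == "__main__":
    # Illustrative distances between five taxa a, b, c, d, e.
    example = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_pick_pair(example))  # prints (-50, 0, 1): join a and b first
```

Note that the pair chosen (a, b) is not the pair with the smallest raw distance (d, e); the row-sum correction is what distinguishes neighbour joining from simple clustering on distances.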

5.0 Protein Modeling
The process of evolution has resulted in the production of DNA sequences that encode proteins with specific functions. In the absence of a protein structure that has been determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, researchers can try to predict the three-dimensional structure using protein or molecular modeling. This method uses experimentally determined protein structures (templates) to predict the structure of another protein that has a similar amino acid sequence (target). Although molecular modeling may not be as accurate at determining a protein's structure as experimental methods, it is still extremely helpful in proposing and testing various biological hypotheses. Molecular modeling also provides a starting point for researchers wishing to confirm a structure through X-ray crystallography and NMR spectroscopy. Because the different genome projects are producing more sequences and because novel protein folds and families are being determined, protein modeling will become an increasingly important tool for scientists working to understand normal and disease-related processes in living organisms.

Identifying a protein's shape, or structure, is key to understanding its biological function and its role in health and disease. Illuminating a protein's structure also paves the way for the development of new agents and devices to treat a disease. Yet solving the structure of a protein is no easy feat. It often takes scientists working in the laboratory months, sometimes years, to experimentally determine a single structure. Therefore, scientists have begun to turn toward computers to help predict the structure of a protein based on its sequence. The challenge lies in developing methods for accurately and reliably understanding this intricate relationship.

Protein modeling involves identification of the proteins with known three-dimensional structures that are related to the target sequence, constructing a model for the target sequence based on its alignment with the template structure(s) and evaluating the model against a variety of criteria to determine if it is satisfactory.

Evaluating the Alignment: The best way to assess the accuracy is to compare alignments from sequence comparisons with alignments from protein three-dimensional structures.

Identification of Structurally Conserved and Structurally Variable Regions: After the known structures are aligned, they are examined to identify the structurally conserved regions (SCRs) from which an average structure, or framework, can be constructed for these regions of the proteins. Variable regions (VRs), in which each of the known structures may differ in conformation, also must be identified because special techniques must be applied to model these regions of the unknown protein.

Generating Coordinates for the Unknown Structure: When generating coordinates for the unknown structure, one needs to model main chain atoms and side chain atoms, both in SCRs and VRs. For the SCRs, it is straightforward to generate the coordinates of the main chain atoms of the unknown structure from those of the known structure(s). It may be desirable to weight the contribution of each homologue in each SCR based on the extent of similarity with the unknown. Side chain coordinates are copied if the residue type in the unknown is identical or very similar to that in the known homologues. For other side chain coordinates one can apply a side chain rotamer library in a systematic approach to explore possible side chain conformations. In the event that some coordinates in the unknown are undefined in the SCRs, regularization can be used to build and relax both main chain and side chain atoms in those regions. A residue range is chosen to include the undefined loop as well as a few residues (e.g., three) on either side of the loop for which coordinates have been defined. Note that this procedure should be used only if the region of undefined atoms is one or two residues in length.

For the VRs, a variety of approaches may be applied in assigning coordinates to the unknown. Recall that these regions will correspond most often to the loops on the surface of the protein. If a loop in one of the known structures is a good model for that of the unknown, then the main chain coordinates of that known structure can be copied. Side chain coordinates of residues that are similar in length and character also may be copied. Rotamer libraries can be used to define other side chain coordinates. When a good model for a loop cannot be found among the known structures, one can search fragment databases for loops in other proteins that may provide a suitable model for the unknown. Fragments are examined for their ability to fit in the undefined region without making bad contacts with other atoms and to overlap well with the residues on either side of the loop. Coordinates for side chain atoms in these loop regions may be copied if residues are similar, though it is likely that considerable application of side chain rotamer libraries will be required to define coordinates in these regions. The loop may then be subjected to conformational searching to identify low energy conformers if desired.

Databases of Structures from Homology Modeling: Databases are now available that contain large numbers of protein structures that have been obtained by comparative (homology) modeling. Two of these databases are ModBase and 3DCrunch. ModBase was created by Sali and co-workers, using their program Modeller, which creates models based on the satisfaction of spatial restraints. That is, restraints are identified from the alignments of homologues of known structure, and these restraints are then applied to the unknown sequence. Restraints can include distances between alpha carbons, other distances within the main-chain, and main-chain and side-chain dihedral angles. Routines to satisfy the restraints optimally include conjugate gradient minimization and molecular dynamics with simulated annealing. 3DCrunch is a large scale modeling project that aims to submit all entries from protein sequence databases to SWISS-MODEL. Currently the database contains 64,000 entries.

Automated Web-Based Homology Modeling: Web-based tools are now available to generate models of protein 3-dimensional structures using comparative modeling techniques. SWISS-MODEL is available through Glaxo Wellcome Experimental Research in Geneva, Switzerland. WHAT IF, available on EMBL servers, includes three components, one to generate the homology models, one to evaluate the quality of the homology models, and one to evaluate models of proteins for which the structure is already known, thereby providing for evaluation of the quality of the modeling program.

Evaluation and Refinement of the Structure: For a homology model from any source, it is important to demonstrate that the structural features of the model are reasonable in terms of what is known about protein structures in general. That is, researchers have analyzed three-dimensional structures of proteins from which basic principles of protein structure and folding have been developed. Thus, the expected values of these parameters are known and can be compared to a modeled structure based on the atomic resolution of the structures from which the model was developed. Several programs are available to assist in this analysis of correctness of a homology model. Programs that provide structure analysis along with output that is useful for publication include PROCHECK and 3D-Profiler.

PROCHECK is based on an analysis of (phi, psi) angles, peptide bond planarity, bond lengths, bond angles, hydrogen-bond geometry, and side-chain conformations of known protein structures as a function of atomic resolution.

3D-Profiler compares a homology model to its sequence using a 3D profile. Each residue position in a 3D model can be characterized by its environment. Preferred environments for amino acids are derived from known three-dimensional structures and are defined by three parameters: (1) the area of each residue that is buried, (2) the fraction of side-chain area that is covered by polar atoms (i.e., O and N), and (3) the local secondary structure. Based on these environment variables, a 3D structure is converted into a 1D profile that describes each residue in the folded protein structure. The profile is based on the statistical preferences of each of the 20 amino acids for particular environments within the protein. Examination of these profiles reveals which regions of a sequence appear to be folded correctly and which do not.

Once any irregularities have been resolved, the entire structure may then be subjected to further refinement. This process may consist of energy minimization with restraints, especially for the SCRs. The restraints then may be gradually removed for subsequent minimizations. It also may be advantageous to apply molecular dynamics in conjunction with energy minimization. For any of these refinement procedures, the structure should be solvated, using for example crystallographic waters from the known homologues, a solvent shell, or a periodic box of pre-equilibrated water molecules.
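The (phi, psi) angles mentioned for PROCHECK are backbone dihedral (torsion) angles, each defined by four consecutive atoms. As a minimal sketch of how such an angle can be computed from coordinates, the function below returns the signed dihedral in degrees for four points. The function name dihedral and the toy coordinates are assumptions for illustration; this is not PROCHECK code, and extracting the right backbone atoms (C, N, CA, C for phi; N, CA, C, N for psi) from a structure file is left out.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    0 means p0 and p3 are eclipsed (cis); 180 means they are trans.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central atoms p1 and p2 coincide")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Project the outer bonds onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the last atom is rotated a quarter turn about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing phi and psi for every residue of a model and comparing their distribution with the values observed in high-resolution structures is the basis of a Ramachandran-style check.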

6.0 Tools for structure based drug design and docking
Docking Software

Protein-Ligand Docking
1. Affinity (Accelrys Inc.): automated, flexible docking; uses the energy of the ligand/receptor complex to automatically find the best binding modes of the ligand to the receptor (energy-driven method).
2. AutoDock (The Scripps Research Institute): automated docking of flexible ligands to macromolecules; designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
3. CombiBUILD (Sandia National Labs): structure-based drug design program created to aid the design of combinatorial libraries; screens a library of possible reactants on the computer, and predicts which ones will be the most potent; successfully applied to find nanomolar inhibitors of Cathepsin D roughly an order of magnitude superior to standard diversity approaches.
4. DockVision (University of Alberta): docking package created by scientists for scientists by including Monte Carlo, Genetic Algorithm, and database screening docking algorithms.
5. FRED (OpenEye): accurate and extremely fast, multiconformer docking program; examines all possible poses within a protein active site, filtering for shape complementarity and optional pharmacophoric features before scoring with more traditional functions.
6. FlexiDock (Tripos): simple, flexible docking of ligands into binding sites on proteins; fast genetic algorithm for generation of configurations; rigid, partially flexible, or fully flexible receptor side chains provide optimal control of ligand binding characteristics; conformationally flexible ligands; tunable energy evaluation function with special H-bond treatment; very fast run times.
7. FlexX (BioSolveIT GmbH): fast computer program for predicting protein-ligand interactions; two main applications: complex prediction (create and rank a series of possible protein-ligand complexes) and virtual screening (selecting a set of compounds for experimental testing); conformational flexibility of the ligand, rigid protein; placement algorithm based on the interactions occurring between the molecules (limited to low-energy structures); MIMUMBA torsion angle database used for the creation of conformers; interaction geometry database used to exactly describe intermolecular interaction patterns; Boehm function (with minor adaptations necessary for docking) applied for scoring.
9. GLIDE (Schrödinger GmbH): high-throughput ligand-receptor docking for fast library screening; fast and accurate docking program; identifies the best binding mode through Monte Carlo sampling; provides an accurate scoring function for ranking of binding affinities; enriches the fraction of suitable lead candidates in a chemical database by predicting binding affinity rapidly and with a reasonable level of accuracy, which will greatly enhance the probability of success in a drug discovery program.
10. GOLD (CCDC): calculating docking modes of small molecules into protein binding sites; genetic algorithm for protein-ligand docking; full ligand and partial protein flexibility; energy functions partly based on conformational and non-bonded contact information from the CSD; choice of scoring functions: GoldScore, ChemScore and User defined score; virtual library screening.
11. HINT! (Virginia Commonwealth University): Hydropathic Interactions; empirical molecular modeling system with new methods for de novo drug design and protein or nucleic acid structural analysis; translates the well-developed Medicinal Chemistry and QSAR formalism of LogP and hydrophobicity into a free energy interaction model for all biomolecular systems based on the experimental data from solvent partitioning; calculates 3D hydropathy fields and 3D hydropathic interaction maps; estimates LogP for modeled molecules or data files; numerically and graphically evaluates binding of drugs or inhibitors into protein structures and scores DOCK orientations; constructs hydropathic (LOCK and KEY) complementarity maps (can be used to predict an ideal substrate from a known receptor or protein structure or to propose the hydropathic structure from known agonists or antagonists); evaluates/predicts effects of site-directed mutagenesis on protein structure and stability.
12. LIGPLOT (University College of London): program for automatically plotting protein-ligand interactions; generates schematic diagrams of protein-ligand interactions for a given PDB file; interactions shown are those mediated by hydrogen bonds (dashed lines between the atoms involved) and by hydrophobic contacts (represented by an arc with spokes radiating towards the ligand atoms they contact).
13. SITUS (Scripps Research Institute): program package for modeling of atomic resolution structures into low-resolution density maps; software supports both rigid-body and flexible docking using a variety of fitting strategies.
14. SenSitus: interactive docking and visualization program for low-resolution density maps and atomic structures; GUI-based alternative to certain Situs docking programs that can benefit from an interactive user interface and 3D visualization methods.
15. VEGA (Milan University): calculation of ligand-receptor interaction energy.

Protein-Ligand & Protein-Protein Docking
16. DOCK (UCSF Molecular Design Institute): generates many possible orientations (and more recently, conformations) of a putative ligand within a user-selected region of a receptor structure; orientations may be scored using several schemes designed to measure steric and/or chemical complementarity of the receptor-ligand complex; can be used to evaluate likely orientations of a single ligand, or to rank molecules from a database; search databases for DNA-binding compounds; examine possible binding orientations of protein-protein and protein-DNA complexes; design combinatorial libraries.
17. GRAMM (SUNY): Global Range Molecular Matching; empirical approach to smoothing the intermolecular energy function by changing the range of the atom-atom potentials; requires only the atomic coordinates of the two molecules to predict the complex structure (no binding site information needed); performs an exhaustive 6-dimensional search through the relative translations and rotations of the molecules; see also the database of Protein-Protein Decoys for the validation of energy functions and refinement procedures.
18. ICM-Dock (MolSoft LLC): fast and accurate docking simulations; unique set of tools for accurate individual ligand-protein docking, peptide-protein docking, and protein-protein docking, including interactive graphics tools.

Protein-Protein (Peptide) Docking
19. 3D-Dock Suite (BioMolecular Modeling, Cancer Research UK): incorporating FTDock, RPScore and MultiDock. FTDock (Fourier Transform Dock) performs rigid-body docking on two biomolecules in order to predict their correct binding geometry and outputs multiple predictions that can be screened using biochemical information. RPScore (Residue level Pair potential Score) uses a single distance constraint empirically derived pair potential to screen the output from FTDock and can reduce dramatically the list of possible complexes within which can be found a correct solution. MultiDock (Multiple copy side-chain refinement Dock).
20. Bielefeld Protein Docking (Bielefeld University): detects geometrical and chemical complementarities between surfaces of proteins and estimates docking positions.
21. BiGGER (BioTecnol, S.A.): Biomolecular complex Generation with Global Evaluation and Ranking; efficient protein-docking algorithm; predicts the structure of binary protein complexes from the unbound structures; searches the complete binding space and selects a set of candidate complexes; evaluates and ranks each candidate according to the estimated probability of being an accurate model of the native complex; integrated in Chemera, a molecular graphics and modeling program for studying protein structures and interactions.
22. ClusPro (Boston University): integrated approach to protein-protein docking; the docking algorithm includes the following steps: rigid body docking based on the Fourier correlation approach (using the DOT and ZDOCK docking programs); selection of structures with favorable desolvation and electrostatic properties; clustering the retained complexes using a pairwise RMSD criterion; refinement of the 25 largest clusters by the flexible docking algorithm SmoothDock.
23. DOT (San Diego Supercomputer Center): Daughter Of TURNIP; TURNIP is a program developed by V. A. Roberts at The Scripps Research Institute for use in the study of macromolecular docking; computation of the electrostatic potential energy between two proteins or other charged molecules.
24. ESCHER NG (Milan University): enhanced version of the original ESCHER protein-protein automatic docking system developed in 1997 by G. Ausiello, G. Cesareni and M. Helmer Citterich; the new release, with a reengineered code, includes some new features: protein-protein and DNA-protein docking capability; fast surface calculation based on the NSC algorithm.
25. HADDOCK (Utrecht University Netherlands): High Ambiguity Driven protein-protein Docking; biochemical and/or biophysical interaction data, such as chemical shift perturbation data resulting from NMR titration experiments or mutagenesis data, are introduced as ambiguous interaction restraints (AIRs) to drive the docking process; an AIR is defined as an ambiguous distance between all residues shown to be involved in the interaction.
26. HEX (University of Aberdeen): protein docking and molecular superposition program; uses spherical polar Fourier correlations to accelerate docking calculations.
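Several of the docking tools listed above group candidate complexes by a pairwise RMSD criterion (ClusPro names this step explicitly). The sketch below shows one simple way such a step can work: compute the RMSD between every pair of poses and repeatedly take the pose with the most neighbours within a cutoff as a cluster centre. It is a generic greedy scheme under stated assumptions, not the actual ClusPro procedure; the names rmsd and cluster_poses, the cutoff and the one-atom toy poses are invented for the example, and no structural superposition is performed.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(a) != len(b) or not a:
        raise ValueError("poses must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(a, b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(a))


def cluster_poses(poses, cutoff):
    """Greedy clustering: largest neighbourhood first.

    Returns a list of (centre_index, member_indices) tuples.
    """
    remaining = set(range(len(poses)))
    clusters = []
    while remaining:
        best_centre, best_members = None, []
        for i in sorted(remaining):
            members = [j for j in sorted(remaining)
                       if rmsd(poses[i], poses[j]) <= cutoff]
            if len(members) > len(best_members):
                best_centre, best_members = i, members
        clusters.append((best_centre, best_members))
        remaining -= set(best_members)
    return clusters


if __name__ == "__main__":
    # Toy poses, each a single atom placed along the x axis.
    poses = [[(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)], [(1.0, 0.0, 0.0)],
             [(10.0, 0.0, 0.0)], [(10.4, 0.0, 0.0)]]
    for centre, members in cluster_poses(poses, cutoff=0.6):
        print(centre, members)
```

In a real docking run the largest clusters are usually the ones passed on to refinement, on the reasoning that a broad energy well attracts many nearby poses.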

7.0 Computational Resources
Web services: The Web describes information using HyperText Markup Language (HTML) and transmits it using HyperText Transfer Protocol (HTTP). The current common name, The Web, also abbreviated as WWW or W3, is a contraction of its original name, the World Wide Web.

Many new computer users assume that the Web and the Internet are synonymous. However, many protocols other than HTTP flow over the Internet: ftp, newsgroups, and gopher, for example. In part, the new user is confused by the fact that, in addition to supporting extensions to HTML, many popular web browsers have support for other protocols such as email (SMTP, POP, IMAP). What this really means is that the particular piece of software (e.g. Netscape Communicator) is more than just a Web client; it is also an email client, an FTP client and a Gopher client. Finally, HTTP does not have to be transmitted over the Internet, and HTML doesn't have to be transmitted via HTTP. Web technology has become a common interface tool for communication between computers on a local network (sometimes called an Intranet), and every Web client I have worked with has the ability to read and display local HTML files.

A Web browser performs multiple tasks. First, any Web browser is an HTTP client; it knows how to transfer data using the HTTP protocol. Second, any Web browser also knows how to interpret and display HTML, the content markup language used on the Web. Different browsers have different display capabilities and display the same HTML code in different ways (which is why HTML is referred to as a content markup language instead of a page description language) but all of them can understand (parse) HTML and do something reasonable with it. Some of the differences in the way different Web browsers display the same Web page come from different design decisions ("what font should be used for <H1> text?") and some come from the fact that different Web clients have different capabilities. Some of these differences, such as the ability to display various kinds of still or moving images as part of the Web page or to run programs written in Java, Active X, or Javascript, represent extensions to HTML. These extra capabilities may be built into the browser or may be added by "plugins", software extensions which give the browser new functionality. Finally, the behavior of a Web browser can frequently be changed by configuring its preferences; if you find the default font too small, that can often be increased.

Because virtually every Web client is also a limited FTP client, many people choose to so use them. In the case where a Web page contains a link to an FTP server, simply selecting the link downloads the file. If, however, you are given the following instructions to retrieve a file:

Networking
Telnet: Telnet is one of the oldest of the network services and perhaps the easiest to understand. Telnet allows one computer to "log on" to another computer as if it were a terminal. This is probably the most common way that users with accounts will use a computer. Once logged on, you frequently will have all the privileges of a local user; you can run programs, create and delete files. From a practical point of view, every telnet host will be different, and thus you will need to learn about each one as you have occasion to use it.

Although "full service logins" as described above are perhaps the most common use of the telnet protocol, in fact as much control as the host's system administrator desires may be imposed on a telnet connection. Thus, a telnet service may be advertised with a public login name and password. Login with this name, however, is likely to be restricted to a limited number of commands. The National Institutes of Health in the United States used, at one point, such a telnet login to disseminate information as to the membership of study sections. Such specialized telnet services have become much less common since the rise in popularity of the Web.

A telnet session can negotiate a range of different protocols, but this almost always includes ASCII text. Because many protocols for other services (e.g. SMTP, HTTP) are encoded as ASCII text, a telnet client can sometimes be used to connect to a server for these other protocols. Most people will use a telnet client the first time connecting to a MOO, and some people will continue to use telnet as their client, although most of us find dedicated clients to be significantly more convenient. Similarly, it is possible to connect to a Web server with a telnet client if you understand the syntax of HTTP. This is almost never done to use a Web server, but is occasionally done when debugging.

ftp: Telnet is useful for interactive computer access, but is much less useful for transferring files. Ftp is an older service designed specifically for file transfer. Originally it, like telnet, was intended for account owners. However, as it became apparent that it was useful to make files available to the world at large without giving all those wanting the files an account, the variant of "anonymous ftp" developed. In this variant, logging in with a "magic" user name (most commonly "anonymous" or "ftp") eliminates the requirement for a password. Once logged on via ftp, access to the host filesystem is accomplished by a series of commands. On a unix ftp client, the commands are unix-like, for example, cd to Change Directory and ls to LiSt the files in that directory. These commands do not depend on the host computer running UNIX! These are ftp commands, some of which happen to be similar to unix commands. To transfer files, you execute either get, to fetch a file from the host computer, or put, to place a file onto it (where allowed). A client may choose to hide these commands; a client with a graphical user interface (GUI), for example, might not have typed commands at all, but buttons.
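The remark in the Telnet section that a Web server can be driven by hand rests on HTTP being plain ASCII text. As a small illustration, the sketch below builds the text one would type into a telnet session connected to port 80, and splits a server's status line into its parts. The helper names build_get_request and parse_status_line and the host example.org are assumptions for this example; nothing is sent over the network.

```python
def build_get_request(host, path="/"):
    """Return the ASCII text of a minimal HTTP GET request.

    Lines end with carriage return + linefeed, and an empty line
    marks the end of the request headers.
    """
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines)


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.0 200 OK' into its three parts."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    # repr() makes the otherwise invisible line endings visible.
    print(repr(build_get_request("example.org", "/index.html")))
    print(parse_status_line("HTTP/1.0 200 OK\r\n"))
```

Typing the same two request lines followed by an empty line into a telnet client produces the server's raw reply, status line first, which is why the technique is useful when debugging.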

One pair of ftp commands that is especially important to understand is binary and ascii. Ftp transfers occur in ascii (text) mode by default. In ascii mode, the file received may not be identical to the one on the host, as ftp may make changes in the file during transfer to allow for differences in how different operating systems handle text. For example, UNIX terminates lines with the linefeed character (ASCII 10 decimal), the classic Macintosh operating system with a carriage return (ASCII 13 decimal), and MS-DOS uses one of each. These differences are corrected for during an ascii transfer. This is highly desirable for text files, but catastrophic for binary files like program object code and pictures. Thus, before getting such a file, it is important to issue the binary command. This instructs ftp to transfer files unmodified.

Email: Both ftp and telnet are interactive, more or less real time programs. Sometimes it is useful, however, to communicate with another computer, or more commonly, a user on another computer, by leaving them a message which they can read and respond to at their convenience. This is done over the Internet by using email. Email is a generic term for a variety of processes which can use different protocols and network technology, and which, as is discussed below, in many cases uses a more complex client/server model.

At present, most email is transmitted by SMTP (Simple Mail Transfer Protocol) via TCP/IP over the Internet. SMTP transmits email on port 25 between two dedicated, full time servers. Although the assumption is that both SMTP servers will be generally available, should the receiving server not be reachable when the transmitting server needs to send email, the email message will be held and the transmission will be retried several times over a period of days until a successful transmission occurs or until the maximum retry time has been exceeded, at which point an error message will be returned to the sender.

The SMTP programs discussed above are typically symmetrical (e.g. the program can alternatively serve as client or server), and are complex. Typically, you will not interact with these programs directly. Rather, dedicated client software is used to compose, send, receive, and read email, and it is that software which communicates with the SMTP server. Examples of client software running on Unix workstations are mail, mailx, mush, elm, mutt, and pine. Also, web browsers sometimes can be used as email clients.

If you send and receive email via a computer that is always on and always connected to a network reachable by your mail server (e.g. a Unix workstation), then incoming mail is saved to a mail spool file on your computer from whence your client software retrieves it, and outgoing email is passed to the SMTP server. If you send and receive email via a computer that is not always on and/or not always connected to the network, sending email proceeds as above, but receiving email is different in that the SMTP server cannot necessarily get incoming email onto your computer's file system. In that case, a different protocol is used, most commonly POP3. (IMAP is a newer protocol for accomplishing the same task about which you may hear more in the future.) The SMTP server stores your email on a remote host and your local client retrieves it from a POP3 server when you check for mail. Typically, a POP3 account will be provided by whoever provides your Internet access. Thus, to install an email client on a Mac or Windows computer, you typically have to provide the domain name and/or IP address of the SMTP and POP3 servers (frequently the same) and the user name and password for the POP3 account.

Important commands in LINUX/UNIX operating systems

1. cat - display or concatenate files
cat takes a copy of a file and sends it to the standard output (i.e. to be displayed on your terminal, unless redirected elsewhere), so it is generally used either to read files, or to string together copies of several files, writing the output to a new file.
    cat ex
displays the contents of the file ex.
    cat ex1 ex2 > newex
creates a new file newex containing copies of ex1 and ex2, with the contents of ex2 following the contents of ex1.

2. cd - change directory
cd is used to change from one directory to another.
    cd dir1
changes directory so that dir1 is your new current directory. dir1 may be either the full pathname of the directory, or its pathname relative to the current directory.
    cd
changes directory to your home directory.
    cd ..
moves to the parent directory of your current directory.

3. chmod - change the permissions on a file or directory
chmod alters the permissions on files and directories using either symbolic or octal numeric codes. The symbolic codes are given here:
    u  user     +  to add a permission                 r  read
    g  group    -  to remove a permission              w  write
    o  other    =  to assign a permission explicitly   x  execute (for files), access (for directories)
The following examples illustrate how these codes are used.
    chmod u=rw file1
sets the permissions on the file file1 to give the user read and write permission on file1. No other permissions are altered.
    chmod u+x,g+w,o-r file1

alters the permissions on the file file1 to give the user execute permission on file1, to give members of the user's group write permission on the file, and prevent any users not in this group from reading it.
    chmod u+w,go-x dir1
gives the user write permission in the directory dir1, and prevents all other users from entering that directory with cd. (They can still list its contents using ls.)

4. cp - copy a file
The command cp is used to make copies of files and directories.
    cp file1 file2
copies the contents of the file file1 into a new file called file2. cp cannot copy a file onto itself.
    cp file3 file4 dir1
creates copies of file3 and file4 (with the same names), within the directory dir1. dir1 must already exist for the copying to succeed.
    cp -r dir2 dir3
recursively copies the directory dir2, together with its contents and subdirectories, to the directory dir3. If dir3 does not already exist, it is created by cp, and the contents and subdirectories of dir2 are recreated within it. If dir3 does exist, a subdirectory called dir2 is created within it, containing a copy of all the contents of the original dir2.

5. date - display the current date and time
date returns information on the current date and time in the format shown below:
    Tue Mar 25 15:21:16 GMT 1997
It is possible to alter the format of the output from date. For example, using the command line
    date '+The date is %d/%m/%y, and the time is %H:%M:%S.'
at exactly 3.10pm on 14th December 1997, would produce the output
    The date is 14/12/97, and the time is 15:10:00.

6. diff - display differences between text files
    diff file1 file2
reports line-by-line differences between the text files file1 and file2. The default output will contain lines such as n1 a n2,n3 and n4,n5 c n6,n7 (where n1 a n2,n3 means that file2 has the extra lines n2 to n3 following the line that has the number n1 in file1, and n4,n5 c n6,n7 means that lines n4 to n5 in file1 differ from lines n6 to n7 in file2). After each such line, diff prints the relevant lines from the text files, with < in front of each line from file1 and > in front of each line from file2.
There are several options to diff, including diff -i, which ignores the case of letters when comparing lines, and diff -b, which ignores all trailing blanks.
    diff -cn
produces a listing of differences within n lines of context, where the default is three lines. The form of the output is different from that given by diff, with + indicating lines which have been added, - indicating lines which have been removed, and ! indicating lines which have been changed.
    diff dir1 dir2
will sort the contents of directories dir1 and dir2 by name, and then run diff on the text files which differ.

7. file - determine the type of a file
file tests named files to determine the categories their contents belong to.
    file file1
can tell if file1 is, for example, a source program, an executable program or shell script, an empty file, a directory, or a library, but (a warning!) it does sometimes make mistakes.

8. find - find files of a specified name or type
find searches for files in a named directory and all its subdirectories.
    find . -name '*.f' -print
searches the current directory and all its subdirectories for files ending in .f, and writes their names to the standard output. In some versions of Unix the names of the files will only be written out if the -print option is used.
    find /local -name core -user user1 -print
searches the directory /local and its subdirectories for files called core belonging to the user user1 and writes their full file names to the standard output.

9. grep - search files for a specified string or expression
grep searches for lines containing a specified pattern and, by default, writes them to the standard output.
    grep motif1 file1
searches the file file1 for lines containing the pattern motif1. If no file name is given, grep acts on the standard input. grep can also be used to search a string of files, so
    grep motif1 file1 file2 ... filen
will search the files file1, file2, ..., filen, for the pattern motif1.
    grep -c motif1 file1
will give the number of lines containing motif1 instead of the lines themselves.
    grep -v motif1 file1
will write out the lines of file1 that do NOT contain motif1.
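The file-handling commands described above (cat, chmod, cp, date, diff, grep and find) can be tried out together in a short session. The following is a minimal sketch, not part of the original course material: it reuses the example names ex1, ex2, newex and motif1 from the text, while the scratch directory created with mktemp and the file contents are assumptions made purely for illustration.

```shell
# Work in a scratch directory (an assumption of this sketch) so that
# none of your real files are touched.
work=$(mktemp -d)
cd "$work" || exit 1

# cat: create two small files, then concatenate them into a third.
printf 'alpha\nmotif1 here\n' > ex1
printf 'beta\nMOTIF1 there\n' > ex2
cat ex1 ex2 > newex

# chmod: the user gets read and write only; group and others lose write.
chmod u=rw,go-w newex

# cp: copy one file, copy two files into a directory, copy a tree.
cp ex1 ex1.copy
mkdir dir2
cp ex1 ex2 dir2
cp -r dir2 dir3              # dir3 does not exist yet, so cp creates it

# date: print only the day/month/year part of the current date.
today=$(date '+%d/%m/%y')

# diff: exit status is 0 when files match and 1 when they differ.
diff ex1 ex1.copy > /dev/null; same=$?
diff ex1 ex2 > /dev/null;      differ=$?

# grep: count matching lines, then list the lines that do NOT match.
# The match is case-sensitive, so "MOTIF1 there" is not counted.
count=$(grep -c motif1 newex)
grep -v motif1 newex

# find: list every file below the current directory named ex*.
find . -name 'ex*' -print

echo "today=$today same=$same differ=$differ count=$count"
```

Checking the exit status of diff, as done here, is often more convenient in scripts than reading its output.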

10. gzip - compress a file
gzip reduces the size of named files, replacing them with files of the same name extended by .gz. The amount of space saved by compression varies.
    gzip file1
results in a compressed file called file1.gz, and deletes file1.
    gzip -v file2
compresses file2 and gives information, in the format shown below, on the percentage of the file's size that has been saved by compression:
    file2 : Compression 50.26 -- replaced with file2.gz
To restore files to their original state use the command gunzip. If you have a compressed file file2.gz, then
    gunzip file2
will replace file2.gz with the uncompressed file file2.

11. help - display information about bash builtin commands
help gives access to information about builtin commands in the bash shell. Using help on its own will give a list of the commands it has information about. help followed by the name of one of these commands will give information about that command. help history, for example, will give details about the bash shell history listings.

12. info - read online documentation
info is a hypertext information system. Using the command info on its own will enter the info system, and give a list of the major subjects it has information about. Use the command q to exit info. For example, info bash will give details about the bash shell.

13. kill - kill a process
To kill a process using kill requires the process id (PID).

14. lpr - print out a file
lpr is used to send the contents of a file to a printer. If the printer is a laserwriter, and the file contains PostScript, then the PostScript will be interpreted and the results of that printed out.
    lpr -Pprinter1 file1
will send the file file1 to be printed out on the printer printer1. To see the status of the job on the printer queue use
    lpq -Pprinter1
for a list of the jobs queued for printing on printer1. (This may not work for remote printers.)
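The gzip/gunzip round trip and the use of a process id with kill can be sketched as follows. This is an illustrative example only: the scratch directory, the repetitive test data and the background sleep job are assumptions introduced here, and the shell variable $! (the PID of the most recent background job) is used as one common way of obtaining a PID.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Build a repetitive 1000-line file; such data compresses well.
i=0
while [ "$i" -lt 1000 ]; do
    echo "ACGTACGTACGTACGT"
    i=$((i + 1))
done > file1

before=$(wc -c < file1)
cp file1 file1.orig

gzip file1                   # replaces file1 with file1.gz
after=$(wc -c < file1.gz)

gunzip file1.gz              # restores file1 and removes file1.gz
if cmp -s file1 file1.orig; then
    roundtrip=ok
else
    roundtrip=changed
fi
echo "size before: $before bytes, compressed: $after bytes, round trip: $roundtrip"

# kill: start a long-running background job, note its PID, then end it.
sleep 300 &
pid=$!
kill "$pid"
wait "$pid" 2>/dev/null
status=$?                    # greater than 128 when a signal ended the job
echo "job $pid finished with status $status"
```

On a real system the PID would more often be looked up with ps before being passed to kill.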

15. ls - list names of files in a directory
ls lists the contents of a directory, and can be used to obtain information on the files and directories within it.
    ls dir1
lists the names of the files and directories in the directory dir1 (excluding files whose names begin with .). If no directory is named, ls lists the contents of the current directory.
    ls -a dir1
will list the contents of dir1 (including files whose names begin with .).
    ls -l file1
gives details of the access permissions for the file file1, its size in bytes, and the time it was last altered.
    ls -l dir1
gives such information on the contents of the directory dir1. To obtain the information on dir1 itself, rather than its contents, use
    ls -ld dir1

16. man - display an on-line manual page
man displays on-line reference manual pages.
    man command1
will display the manual page for command1, e.g. man cp, man man.
    man -k keyword
lists the manual page subjects that have keyword in their headings. This is useful if you do not yet know the name of a command you are seeking information about.
    man -Mpath command1
is used to change the set of directories that man searches for manual pages on command1.

17. mkdir - make a directory
mkdir is used to create new directories. In order to do this you must have write permission in the parent directory of the new directory.
    mkdir newdir
will make a new directory called newdir. mkdir -p can be used to create a new directory, together with any parent directories required.
    mkdir -p dir1/dir2/newdir
will create newdir and its parent directories dir1 and dir2, if these do not already exist.
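The difference between the ls options, and between mkdir with and without -p, is easiest to see by trying them. The sketch below assumes a scratch directory and the hypothetical file names visible and .hidden; only dir1, dir2 and newdir come from the text above.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# mkdir -p creates dir1 and dir1/dir2 on the way to newdir.
mkdir -p dir1/dir2/newdir
touch dir1/visible dir1/.hidden

plain=$(ls dir1)             # names only; .hidden is left out
all=$(ls -a dir1)            # also lists ., .. and .hidden
echo "$plain"
echo "$all"

ls -l dir1                   # long listing of the contents of dir1
ls -ld dir1                  # long listing of dir1 itself

# Without -p, mkdir fails when the parent directory is missing.
if mkdir missing/child 2>/dev/null; then
    result=created
else
    result=failed
fi
echo "mkdir without -p: $result"
```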

18. more - scan through a text file page by page
more displays the contents of a file on a terminal one screenful at a time.
    more file1
starts by displaying the beginning of file1. It will scroll up one line every time the return key is pressed, and one screenful every time the space bar is pressed. Type ? for details of the commands available within more. Type q if you wish to quit more before the end of file1 is reached.
    more -n file1
will cause n lines of file1 to be displayed in each screenful instead of the default (which is two lines less than the number of lines that will fit into the terminal's screen).

19. mv - move or rename files or directories
mv is used to change the name of files or directories, or to move them into other directories. mv cannot move directories from one file-system to another, so, if it is necessary to do that, use cp instead.
    mv file1 file2
changes the name of a file from file1 to file2.
    mv dir1 dir2
changes the name of a directory from dir1 to dir2, unless dir2 already exists, in which case dir1 will be moved into dir2.
    mv file1 file2 dir3
moves the files file1 and file2 into the directory dir3.

20. nice - change the priority at which a job is being run
nice causes a command to be run at a lower than usual priority. nice can be particularly useful when running a long program that could cause annoyance if it slowed down the execution of other users' commands. An example of the use of nice is
    nice compress file1
which will execute the compression of file1 at a lower priority.

21. passwd - change your password
Use passwd when you wish to change your password. You will be prompted once for your current password, and twice for your new password. Neither password will be displayed on the screen.
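The two behaviours of mv on directories (rename when the target is absent, move inside when it exists) and the use of nice can be sketched as below. The scratch directory and file contents are assumptions of this example, and gzip is substituted for the compress command used in the text, since compress is not installed on every system.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

printf 'line one\nline two\n' > file1
mkdir dir1 dir3

mv file1 file2               # rename: file1 no longer exists
mv file2 dir3                # move file2 into the existing directory dir3
mv dir1 dir2                 # dir2 does not exist, so dir1 is renamed

mkdir dir1                   # recreate dir1 ...
mv dir1 dir2                 # ... dir2 now exists, so dir1 moves inside it

# nice: run the compression at a lower than usual priority.
nice gzip dir3/file2

ls -R .
```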

22. ps - list processes
ps displays information on processes currently running on your machine. This information includes the process id, the controlling terminal (if there is one), the cpu time used so far, and the name of the command being run.
    ps
gives brief details of your own processes in your current session. To obtain full details of all your processes, including those from previous sessions use:
    ps -fu user1
using your own user name in place of user1. ps is a command whose options vary considerably in different versions of Unix (such as BSD and SystemV). Use man ps for details of all the options available on the machine you are using.

23. pwd - display the name of your current directory
The command pwd gives the full pathname of your current directory.

24. quota - disk quota and usage
quota gives information on a user's disk space quota and usage.
    quota
will only give details of where you have exceeded your disc quota on local disks, whereas
    quota -v
will display your quota and usage, whether the quota has been exceeded or not, and includes information on disks mounted from other machines, as well as the local disks.

25. rm - remove files or directories
rm is used to remove files. In order to remove a file you must have write permission in its directory, but it is not necessary to have read or write permission on the file itself.
    rm file1
will delete the file file1. If you use
    rm -i file1
instead, you will be asked if you wish to delete file1, and the file will not be deleted unless you answer y. This is a useful safety check when deleting lots of files.
    rm -r dir1
recursively deletes the contents of dir1, its subdirectories, and dir1 itself, and should be used with suitable caution.
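A short session illustrating pwd, ps and the three forms of rm described above might look like the following. It is a sketch under stated assumptions: everything happens in a scratch directory, the file name keepme is hypothetical, and the answer to the rm -i prompt is supplied through a pipe only so that the example can run unattended.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# pwd prints the full pathname of the current directory.
here=$(pwd)
echo "working in $here"

# ps -p limits the listing to one process id; $$ is this shell's own PID.
ps -p $$

# rm deletes files; rm -r deletes a directory and everything below it.
mkdir -p dir1/sub
touch file1 dir1/a dir1/sub/b
rm file1
rm -r dir1

# rm -i asks before deleting; answering n keeps the file.
touch keepme
echo n | rm -i keepme

ls
```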

26. rmdir - remove a directory
rmdir removes named empty directories. If you need to delete a non-empty directory rm -r can be used instead.
    rmdir exdir
will remove the empty directory exdir.

27. slogin - secure remote login program
slogin is used for logging onto a remote machine and for executing commands on a remote machine, and provides secure encrypted communications between the local and remote machines using an SSH protocol. The remote machine must be running an SSH server for such connections to be possible.

28. sort - sort and collate lines
The command sort sorts and collates lines in files, sending the results to the standard output. If no file names are given, sort acts on the standard input. By default, sort sorts lines using a character by character comparison, working from left to right, and using the order of the ASCII character set.
    sort -d
uses "dictionary order", in which only letters, digits, and white-space characters are considered in the comparisons.
    sort -r
reverses the order of the collating sequence.
    sort -n
sorts lines according to the arithmetic value of leading numeric strings. Leading blanks are ignored when this option is used (except in some System V versions of sort, which treat leading blanks as significant). To be certain of ignoring leading blanks use sort -bn instead.

29. telnet - remote login program
telnet communicates with another computer using the TELNET protocol.
    telnet host1
will connect to the remote machine host1 (if it allows telnet connections). For example, when using telnet to connect to the Central Unix Service (cus.cam.ac.uk), you can then login using your user name on cus. If you use the escape character instead, you will enter telnet's command mode (you'll get the prompt telnet> ), and the command quit will get you back to the command line of your local machine.
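The effect of the sort options and the difference between rmdir and rm -r, both described above, can be checked with a few lines of shell. This is a minimal sketch: the file nums, its contents and the directory name full are assumptions, and LC_ALL=C is set on the first sort because many current systems otherwise collate by locale rather than by plain ASCII order.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

printf '10 ten\n9 nine\n100 hundred\n' > nums

# Character-by-character comparison: "1" sorts before "9", so the
# line starting with 10 comes first even though 9 is smaller.
by_char=$(LC_ALL=C sort nums | head -n 1)

# sort -n compares the leading numbers arithmetically.
by_num=$(sort -n nums | head -n 1)

# sort -r reverses the order; with -n the largest number comes first.
largest=$(sort -rn nums | head -n 1)

echo "default: $by_char | numeric: $by_num | reversed numeric: $largest"

# rmdir only removes empty directories; rm -r handles non-empty ones.
mkdir exdir full
touch full/file1
rmdir exdir
rmdir full 2>/dev/null || rm -r full
```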

As communications between the two machines are not encrypted when using telnet, it is preferable to use ssh, if that is available.

Some Bioprogramming tools:

1. Bioconductor
Bioconductor is an open source and open development software project that aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data.

2. BioDAS
This site is the center of development of an Open Source system for exchanging annotations on genomic sequence data.

3. BioJava
The BioJava Project is an open-source project dedicated to providing Java tools for processing biological data.

4. BioMoby
BioMOBY is an international research project involving biological data hosts, biological data service providers, and coders whose aim is to explore various methodologies for biological data representation, distribution, and discovery.

5. BioPax
The BioPAX web site provides information about a collaborative effort to create a data exchange format for biological pathways.

6. BioPerl
The BioPerl Project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research.

7. BioPerl course
Great tutorial for those interested in the bioperl group of modules.

8. BioPHP
Open Source PHP code for bioinformatics. Includes functions and minitools (copy and paste one page scripts for basic tasks in bioinformatics). A wiki-like service allows modification and improvement of code.

9. BioPipe
The biopipe is a workflow framework that seeks to address some of the complexity involved in carrying out large scale bioinformatics analysis. It has been designed to work intimately with the bioperl package.

10. BioPython
The Biopython Project is an international association of developers of freely available Python tools for computational molecular biology.

11. BioRuby
The BioRuby project aims to implement an integrated environment for bioinformatics with Ruby.

12. CCT
CCT (Current Comparative Table) is a software package that you can install and set up on your own system to help you to maintain and search databases.

13. Ensembl API
Ensembl is a freely available software system for genomic analysis. The documentation page at Ensembl is the best place to get information on the Ensembl application programming interface (API). In particular, the tutorial document includes lots of examples of scripts and exercises for you to try.

14. Human Ageing Genomic Resources
The Human Ageing Genomic Resources (HAGR) website provides tools and curated databases relevant to the genetics of human ageing. GenAge is a database of genes related to human ageing, and AnAge is a multi-species database facilitating the comparative biology of ageing. The Ageing Research Computational Tools (ARCT) is a collection of Perl modules to assist comparative genomics research.

15. NCBI C++ toolkit
The NCBI C++ Toolkit is a collection of C++ modules developed by the NCBI for writing bioinformatics software and applications.

16. Open Bioinformatics Foundation
The Open Bioinformatics Foundation is a non-profit, volunteer-run organization focused on supporting open source programming in bioinformatics.

17. PyMOL
PyMOL is a molecular graphics system with an embedded Python interpreter designed for real-time visualization and rapid generation of high-quality molecular graphics images and animations.

18. R
System for statistical computation and graphics, an interpreted computer language which allows branching and looping as well as modular programming using functions.

19. Seqhound API

SeqHound is a bioinformatics application programming platform that provides access to biological sequence, structure and functional annotation data. An application programming interface (API) is available to programmers using C, C++, Java and Perl.

20. Systems Biology Markup Language (SBML)
The Systems Biology Markup Language (SBML) is a computer-readable format for representing models of biochemical reaction networks. SBML is applicable to metabolic networks, cell-signaling pathways, genomic regulatory networks, and many other areas in systems biology.

