BIOINFORMATICS TRAINING PROGRAM
May 16-18, 2007
Indian Institute of Advanced Research
www.helpBIOTECH.blogspot.com | Your Gate Way to Life Science Career
1.0 Introduction
2.0 Protein Structure
3.0 Genome Analysis
4.0 Phylogeny
5.0 Modeling
6.0 Tools for Structure based drug design and docking
7.0 Computational Resources
1.0 Introduction
Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. At the beginning of the "genomic revolution", a bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences. Development of this type of database involved not only design issues but also the development of complex interfaces whereby researchers could both access existing data and submit new or revised data. Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. It is hoped that the process of analyzing and interpreting these data will lead to elucidation of the principles underlying biological phenomena.
Some Important Landmarks in the development of Bioinformatics:
1962  The first theory of molecular evolution; the Molecular Clock concept (Linus Pauling and Emile Zuckerkandl)
1965  Atlas of Protein Sequences, the first protein database (Margaret Dayhoff and coworkers)
1970  Needleman-Wunsch algorithm for global protein sequence alignment
1977  New DNA sequencing methods (Fred Sanger, Walter Gilbert and coworkers); bacteriophage φX174 sequence
1977  First software for sequence analysis (Roger Staden)
1977  Phylogenetic taxonomy; archaea discovered; the notion of the three primary kingdoms of life introduced (Carl Woese and coworkers)
1981  Smith-Waterman algorithm for local protein sequence alignment
1981  Human mitochondrial genome sequenced
1981  The concept of a sequence motif (Russell Doolittle)
1982  GenBank Release 3 made public
1982  Phage λ genome sequenced (Fred Sanger and coworkers)
1983  The first practical sequence database searching algorithm (John Wilbur and David Lipman)
1985  FASTP/FASTN: fast sequence similarity searching (William Pearson and David Lipman)
1986  Introduction of Markov models for DNA analysis (Mark Borodovsky and coworkers)
1987  First profile search algorithm (Michael Gribskov, Andrew McLachlan, David Eisenberg)
1988  National Center for Biotechnology Information (NCBI) created at NIH/NLM
1988  EMBnet network for database distribution created
1990  BLAST: fast sequence similarity searching with rigorous statistics (Stephen Altschul, David Lipman and coworkers)
1991  EST: expressed sequence tag sequencing (Craig Venter and coworkers)
1994  Hidden Markov Models of multiple alignments (David Haussler and coworkers; Pierre Baldi and coworkers)
1994  SCOP classification of protein structures (Alexei Murzin, Cyrus Chothia and coworkers)
1995  First bacterial genomes completely sequenced
1996  First archaeal genome completely sequenced
1996  First eukaryotic genome (yeast) completely sequenced
1997  Introduction of gapped BLAST and PSI-BLAST
1997  COGs: Evolutionary classification of proteins from complete genomes
1998  Worm genome, the first multicellular genome, (nearly) completely sequenced
1999  Fly genome (nearly) completely sequenced
2001  Human genome (nearly) completely sequenced
2.0 Protein Structure
A set of 20 different subunits, called amino acids, can be arranged in any order to form a polypeptide that can be thousands of amino acids long. These chains can then loop about each other or fold, in a variety of ways, but only one of these ways allows a protein to function properly. The critical feature of a protein is its ability to fold into a conformation that creates structural features, such as surface grooves, ridges, and pockets, which allow it to fulfill its role in a cell. A protein's conformation is usually described in terms of levels of structure. Traditionally, proteins are looked upon as having four distinct levels of structure, with each level of structure dependent on the one below it. In some proteins, functional diversity may be further amplified by the addition of new chemical groups after synthesis is complete. The stringing together of the amino acid chain to form a polypeptide is referred to as the primary structure. The secondary structure is generated by the folding of the primary sequence and refers to the path that the polypeptide backbone of the protein follows in space. Certain types of secondary structures are relatively common. Two well-described secondary structures are the alpha helix and the beta sheet. In the first case, certain types of bonding between groups located on the same polypeptide chain cause the backbone to twist into a helix, most often in a form known as the alpha helix. Beta sheets are formed when a polypeptide chain bonds with another chain that is running in the opposite direction. Beta sheets may also be formed between two sections of a single polypeptide chain that is arranged such that adjacent regions are in reverse orientation. The tertiary structure describes the organization in three dimensions of all of the atoms in the polypeptide. If a protein consists of only one polypeptide chain, this level then describes the complete structure. 
Multimeric proteins, or proteins that consist of more than one polypeptide chain, require a higher level of organization. The quaternary structure defines the conformation assumed by a multimeric protein. In this case, the individual polypeptide chains that make up a multimeric protein are often referred to as the protein subunits. The four levels of protein structure are hierarchical, that is, each level of the build process is dependent upon the one below it. A protein's primary amino acid sequence is crucial in determining its final structure. In some cases, amino acid sequence is the sole determinant, whereas in other cases, additional interactions may be required before a protein can attain its final conformation. For example, some proteins require the presence of a cofactor, a second molecule that is part of the active protein, before they can attain their final conformation. Multimeric proteins often require one or more subunits to be present for another subunit to adopt the proper higher order structure. The entire process is cooperative, that is, the formation of one region of secondary structure determines the formation of the next region.
Allosteric Proteins: These are proteins which under certain conditions have a stable alternate conformation, or shape, that enables them to carry out a different biological function. The interaction of an allosteric protein with a specific cofactor, or with another protein, may influence the transition of the protein between shapes. In addition, any change in conformation brought about by an interaction at one site may lead to an alteration in the structure, and thus function, at another site. One should bear in mind, though, that this type of transition affects only the protein's shape, not the primary amino acid sequence. Allosteric proteins play an important role in both metabolic and genetic regulation.
Protein structure determination: Traditionally, a protein's structure was determined using one of two techniques: X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy.
X-ray Crystallography: Crystals are a solid form of a substance in which the component molecules are present in an ordered array called a lattice. The basic building block of a crystal is called a unit cell. Each unit cell contains exactly one unique set of the crystal's components, the smallest possible set that is fully representative of the crystal. Crystals are formed by slowly precipitating proteins under conditions that maintain their native conformation or structure. These exact conditions can only be discovered by repeated trials that entail varying certain experimental conditions, one at a time. This is a very time consuming and tedious process. When the crystal is placed in an X-ray beam, all of the unit cells present the same face to the beam; therefore, many molecules are in the same orientation with respect to the incoming X-rays. The X-ray beam enters the crystal and a number of smaller beams emerge, each one in a different direction and each one with a different intensity. If an X-ray detector, such as a piece of film, is placed on the opposite side of the crystal from the X-ray source, each diffracted ray, called a reflection, will produce a spot on the film. However, because only a few reflections can be detected with any one orientation of the crystal, an important component of any X-ray diffraction instrument is a device for accurately setting and changing the orientation of the crystal. The set of diffracted, emerging beams contains information about the underlying crystal structure. The major drawback associated with this technique is that crystallization of the proteins is a difficult task.
Nuclear Magnetic Resonance (NMR) Spectroscopy: The basic phenomenon of NMR spectroscopy was discovered in 1945. In this technique, a sample is immersed in a magnetic field and exposed to radio waves. As the positively charged nucleus spins, the moving charge creates what is called a magnetic moment. When the radio waves hit the spinning nuclei, they tilt even more, sometimes flipping over. These resonating nuclei emit a unique signal that is then picked up by a detector and processed by the Fourier Transform algorithm, a complex equation that translates the language of the nuclei into something a scientist can understand. By measuring the frequencies at which different nuclei flip, scientists can determine molecular structure, as well as many other interesting properties of the molecule. In the past 10 years, NMR has proven to be a powerful alternative to X-ray crystallography for the determination of molecular structure. NMR has the advantage over crystallographic techniques in that experiments are performed in solution as opposed to a crystal lattice. However, the principles that make NMR possible tend to make this technique very time consuming and limit the application to small- and medium-sized molecules.
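The role of the Fourier Transform in NMR, turning a detected time signal into the frequencies at which nuclei resonate, can be illustrated with a toy calculation. The following is only a minimal sketch: the signal is synthetic, the two "resonance" frequencies (50 Hz and 120 Hz), the sampling rate and the function names are assumptions made for illustration, and a direct discrete Fourier transform is used rather than the optimized transforms found in real spectrometer software.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes |X[k]| for k = 0 .. N//2, from the direct O(N^2) DFT definition."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequencies(signal, sample_rate, count):
    """Return the `count` strongest non-zero frequencies (in Hz), sorted ascending."""
    n = len(signal)
    magnitudes = dft_magnitudes(signal)
    strongest = sorted(
        range(1, len(magnitudes)), key=lambda k: magnitudes[k], reverse=True
    )[:count]
    return sorted(k * sample_rate / n for k in strongest)


# Hypothetical detector signal: two components at 50 Hz and 120 Hz,
# sampled at 1000 Hz for 0.2 s (assumed values, not real NMR data).
SAMPLE_RATE = 1000
N_SAMPLES = 200
toy_signal = [
    math.sin(2 * math.pi * 50 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
    for t in range(N_SAMPLES)
]

peaks = dominant_frequencies(toy_signal, SAMPLE_RATE, 2)
print(peaks)  # [50.0, 120.0]
```

The two frequencies mixed into the time signal reappear as the two strongest peaks of the spectrum, which is the sense in which the transform "translates" the detected signal into something interpretable.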
3.0 Genome Analysis
In comparative genomics one of the major functions is the identification of homologous genes in different organisms. Homology refers to two genes sharing a common evolutionary history. Scientists also use the term homology, or homologous, to simply mean similar, regardless of the evolutionary relationship. An important tool which is utilized for this function is BLAST (Basic Local Alignment Search Tool).
The BLAST algorithm is a heuristic program, which means that it relies on some smart shortcuts to perform the search faster. BLAST performs "local" alignments. Most proteins are modular in nature, with functional domains often being repeated within the same protein as well as across different proteins from different species. The BLAST algorithm is tuned to find these domains or shorter stretches of sequence similarity. The local alignment approach also means that an mRNA can be aligned with a piece of genomic DNA, as is frequently required in genome assembly and analysis. If instead BLAST started out by attempting to align two sequences over their entire lengths (known as a global alignment), fewer similarities would be detected, especially with respect to domains and motifs.
When a query is submitted via one of the BLAST Web pages, the sequence, plus any other input information such as the database to be searched, word size, expect value, and so on, are fed to the algorithm on the BLAST server. BLAST works by first making a look-up table of all the "words" (short subsequences; for proteins the default word length is three letters) and "neighboring words", i.e., similar words, in the query sequence. The sequence database is then scanned for these "hot spots". When a match is identified, it is used to initiate gap-free and gapped extensions of the "word".
BLAST Scores and Statistics: Once BLAST has found a similar sequence to the query in the database, it is helpful to have some idea of whether the alignment is "good" and whether it portrays a possible biological relationship, or whether the similarity observed is attributable to chance alone. BLAST uses statistical theory to produce a bit score and expect value (E-value) for each alignment pair (query to hit).
The bit score gives an indication of how good the alignment is; the higher the score, the better the alignment. In general terms, this score is calculated from a formula that takes into account the alignment of similar or identical residues, as well as any gaps introduced to align the sequences. A key element in this calculation is the "substitution matrix", which assigns a score for aligning any possible pair of residues. The BLOSUM62 matrix is the default for most BLAST programs, the exceptions being blastn and MegaBLAST (programs that perform nucleotide-nucleotide comparisons and hence do not use protein-specific matrices). Bit scores are normalized, which means that the bit scores from different alignments can be compared, even if different scoring matrices have been used.
The E-value gives an indication of the statistical significance of a given pairwise alignment and reflects the size of the database and the scoring system used. The lower the E-value, the more significant the hit. An alignment with an E-value of 0.05 is expected to occur by chance about 0.05 times in a search of this size, which corresponds to roughly a 5 in 100 (1 in 20) chance of the similarity arising by chance alone. Although a statistician might consider this to be significant, it still may not represent a biologically meaningful result, and analysis of the alignments is required to determine "biological" significance. The score and E-value are calculated using the equations given below:

$$S' = \frac{\lambda S - \ln K}{\ln 2}$$

$$E = m\,n\,2^{-S'}$$

where S' is the normalized score, S is the raw score, λ and K are constants, and m and n are the lengths of the query and hit sequences.
Tools for Comparative Genomics:
1. CisMols: CisMols (Cis-regulatory Modules) is a tool that identifies compositionally predicted cis-clusters that occur in groups of co-regulated genes within each of their ortholog-pair evolutionarily conserved cis-regulatory regions.
2. CMR: The Comprehensive Microbial Resource (CMR) gives access to a central repository of the sequence and annotation of all complete public prokaryotic genomes as well as comparative genomics tools across all of the genomes in the database.
3. DAVID Bioinformatic Resources: The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.
4. DCODE.ORG: The dcode.org website provides access to tools for comparative genomic analyses developed by the Comparative Genomics Center at the Lawrence Livermore National Laboratory. Tools include: zPicture, Mulan, eShadow, rVista, CREME, and the ECR Browser.
5. EnteriX: EnteriX is a collection of tools for viewing pairwise and multiple alignments for bacterial genome sequences.
6. FootPrinter: FootPrinter is a program for phylogenetic footprinting that identifies regions of DNA that are well conserved across a set of orthologous sequences in order to infer phylogenetic relationships.
7. FootPrinter3: FootPrinter3 is a web server for predicting transcription factor binding sites (TFBS) by using phylogenetic footprinting. FootPrinter3 extends the motif discovery algorithms of FootPrinter by making use of local multiple sequence alignment blocks when those are available and reliable, while also allowing motifs to be found in unalignable regions.
8. GenomeTraFaC: GenomeTraFaC is a comparative genomics based resource for initial characterization of gene models and the identification of putative cis-regulatory regions of RefSeq gene orthologs.
9. GENSTYLE: GENSTYLE is based on the genomic signature paradigm and allows the user to classify and characterize nucleotide sequences using oligonucleotide frequencies.
10. IBM Genome Annotation Page: IBM's Bio-Dictionary-based Annotations Of Completed Genomes page lists annotations for over 75 complete genomes (archaea, bacteria, eukaryotes, and viruses). You can query these annotations at the sequence level as well as search/compare across genomes.
11. Integrated Microbial Genomes (IMG): The Integrated Microbial Genomes (IMG) system facilitates the comparison of genomes sequenced by the Joint Genome Institute (JGI). One can easily build a list of genomes to be considered or excluded from the search, and the Phylogenetic Profiler tool allows one to refine the selection by building a list of homologues either common to or excluded from specific organisms. It can be searched using keywords or BLASTp, and the gene records displayed include biochemical properties, protein domains, chromosomal location and neighbourhood, and lists of paralogues and orthologues.
12. ISC Large-scale Sequencing Project Database: The International Sequencing Consortium (ISC) Large-scale Sequencing Project Database contains information on current and completed sequencing projects, including project timelines, funding agencies, sequencing strategy and links out to project web pages.
13. Mauve: Mauve is a stand-alone software tool for constructing multiple genome alignments.
14. MicroFootPrinter: MicroFootPrinter identifies the conserved motifs in regulatory regions of prokaryotic genomes using the phylogenetic footprinting program FootPrinter.
15. MIPS: Munich Information Centre for Protein Sequences projects include: fungal genome analysis, plant genome bioinformatics, structural genomics, proteomics and genome annotation. Projects and databases include: CYGD, MATDB, MNCDB, MOsDB, MPPI, NGFN, QUIPOS, SIMAP, SPUTNIK, and PEDANT.
16. MLST: MLST (Multi Locus Sequence Typing) is a nucleotide sequence based approach for the unambiguous characterisation of isolates of bacteria and other organisms using the sequences of internal fragments of seven house-keeping genes.
17. NEMBASE2: NEMBASE2 is a database resource for EST datasets for 37 species of nematode. Sequences are clustered to reduce redundancy. Coding region predictions for each cluster and further annotations such as GO terms and physical properties are also included. Comparisons can be by library and at a sequence level; a visualisation tool is included.
18. PartiGeneDB: PartiGeneDB is a database of about 300 partial genomes from eukaryotic organisms that have been assembled from EST data.
19. PhenomicDB: PhenomicDB integrates the genotype and phenotype information of several organisms from public data sources. The mapping of phenotypic data fields allows cross-species phenotype comparison.
20. Phydbac2: Phydbac2 (Phylogenomic display of bacterial genes) is a tool to visualize and explore the phylogenomic profiles of bacterial protein sequences. It also allows the user to view sequence similarity across different organisms, access other genes with similar conservation profiles, and view genes that are found nearby a selected gene in multiple genomes.
21. Projector 2: Projector 2 allows users to map completed portions of the genome sequence of an organism onto the finished (or unfinished) genome of a closely-related species or strain. Using the related genome sequence as a template can facilitate sequence assembly and the sequencing of the remaining gaps.
22. Sockeye: Sockeye is a visualization tool allowing one to assemble and analyze genomic information in a three dimensional workspace. Sockeye displays genomic features along tracks, and links to the Ensembl database. It can be used to view features at various levels, ranging from SNPs to karyotypes.
23. SPRING: Sorting Permutation by Reversals and Block Interchanges (SPRING) is a tool for the analysis of genome rearrangements. SPRING takes two or more chromosomes as its input and then computes a minimum series of reversals and/or block-interchanges for transforming one chromosome into another. Phylogenetic trees based on the rearrangement analysis are also shown as part of the results.
24. SVC: SVC (Structured Visualization of Evolutionary Conserved Sequences) is a tool that can search for pairs of orthologous genes, align the protein coding sequences, and visualize the evolutionary sequence conservation mapped back onto the gene structure scaffold.
25. TIGR Software Tools: A list of open-source software packages available for free from The Institute for Genomic Research (TIGR).
26. T-STAG: Tissue-Specific Transcripts And Genes (T-STAG) is a system integrating EST, gene expression, alternative splicing and human-mouse orthology information for the analysis of tissue-specific gene and transcript expression patterns.
27. TraFaC: TraFaC (Transcription Factor Binding Site Comparison) is a tool that identifies regulatory regions using a comparative sequence analysis approach.
28. Viral Bioinformatics: Viral Bioinformatics provides access to viral genomes and a variety of tools for comparative genomic analyses.
29. YOGY: Eukaryotic Orthology (YOGY) is a resource for retrieving orthologous proteins from nine eukaryotic organisms. Using a gene or protein identifier as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, Homologene, OrthoMCL, and a table of curated orthologs between budding yeast and fission yeast. Associated Gene Ontology (GO) terms of orthologs can also be retrieved.
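The bit-score and E-value relations from the BLAST Scores and Statistics discussion in this section, S' = (λS - ln K) / ln 2 and E = mn 2^(-S'), can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the constants λ = 0.267 and K = 0.041, the raw score and the sequence lengths are assumed example inputs rather than values taken from a real search.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, m, n):
    """E = m * n * 2 ** (-S'), with m and n the query and hit lengths."""
    return m * n * 2.0 ** (-bits)


# Assumed, illustrative inputs (not from an actual BLAST run).
LAMBDA = 0.267   # scoring-system constant lambda
K = 0.041        # scoring-system constant K
raw = 100        # raw alignment score S

bits = bit_score(raw, LAMBDA, K)
e_val = expect_value(bits, m=250, n=1_000_000)
print(f"bit score = {bits:.1f}, E-value = {e_val:.2e}")
```

Because the bit score already absorbs λ and K, the E-value step needs only the two lengths, which is why bit scores from searches with different scoring matrices can be compared directly. Each additional bit halves the E-value for the same m and n.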
Furthermore. it should be possible to find the ancestral ties between different organisms.
www. the presence of histones (in one of the two major branches of archaea).helpBIOTECH. indicating a clear evolutionary relationship.e. while studying some unusual groups of bacteria. in some important respects. such as the unusual structure of lipids and the topology of phylogenetic trees of 16S rRNA. By studying protein folds (distinct protein building blocks) and families. which have a number of proteins shared with eukaryotes but not with bacteria. i. archaea are obviously prokaryotes. In this case. phenotypically. thermophilic methanogens and halophiles. Scientists also use the term homology. have small cells without nuclei or organelles.These eukaryote-like features of archaea include the structure of the ribosomes. These trees clearly indicated that archaea comprised a unique branch of life.com | Your Gate Way to Life Science Career
. are said to be from the same protein family. distinct from both bacteria and eukaryotes. The uniqueness of the archaea was apparent. nor is it a steady continuum. A gene pool is the combination of all the alleles —alternative forms of a genetic locus—for all traits that population may exhibit.0 Phylogeny
New insight into the molecular basis of a disease may come from investigating the function of homologs of a disease gene in model organisms.blogspot. although. With the aid of nucleotide and protein sequences. with several transcription factors of the eukaryotic variety. or homologous. Carl Woese and colleagues cocluded that these organisms were not really bacteria but should be assigned to a separate domain of life with the same status as bacteria and eukaryotes. the organization of the basal transcriptional apparatus. Thus far. Equally exciting is the potential for uncovering evolutionary relationships and patterns between different forms of life. Evolution requires genetic variation which results from changes within a gene pool. to simply mean similar. experience has taught us that closely related organisms have similar sequences and that more distantly related organisms have more dissimilar sequences. like bacteria. The evolutionary process: Genetic Variation: Evolution is not always discrete with clearly defined boundaries that pinpoint the origin of a new species. scientists are able to reconstruct the evolutionary relationship between two species and to estimate the time of divergence between two organisms since they last shared a common ancestor. regardless of the evolutionary relationship. and the organization of the DNA replication apparatus. even from some of their biochemical features. closer to eukaryotes than to bacteria. which is also conserved in archaea and eukaryotes but not in bacteria. The Three Domains of Life: In the mid-1970s. homology refers to two genes sharing a common evolutionary history. the genetic make-up of a specific population. Proteins that show significant sequence conservation. This group was originally referred to as archaebacteria and later renamed archaea. they are. 
Changes in a gene pool can result from mutation—variation within a particular gene—or from changes in gene frequency—the proportion of an allele in a given population.4.
2.helpBIOTECH. DNA replication must be extremely accurate to avoid introducing mutations or changes in the nucleotide sequence of a short region of the genome. In phylogenetic studies. Those mutations that do have an evolutionary effect can be divided into two categories. some mutations do occur. a process called DNA replication. Codon Usage Database Find GC content and frequency of codon usage for any organism that has a sequence in GenBank. usually in one of two ways. This resource offers convienent web interfaces for many freely available tools. Other goals of the project include providing a central resource enabling computational systematics and education and training initiatives. Every time a cell divides. Gain-of-function mutations.Every organism possesses a genome that contains all of the biological information needed to construct and maintain a living example of that organism. the most convenient way of visually presenting evolutionary relationships among a group of organisms is through illustrations called phylogenetic trees. CIPRes The Cyberinfrastructure for Phylogenetic Research (CIPRes) project aims to develop a computational infrastructure for systematics. and structure prediction. function. either from errors in DNA replication or from damaging Mutations in the coding regions of genes are much more important. confer an abnormal activity on a protein. loss-of-function mutations and gain-of-function mutations. Bioinformatics Toolkit
This Toolkit is a collection of a wide range of tools and links for sequence analysis. 3. 4. Inevitably. The biological information contained in a genome is encoded in the nucleotide sequence of its DNA or RNA molecules and is divided into discrete units called genes. of life's history. it must make a complete copy of its genome. or models. The predictions are based on the assumptions that residues
Tools for phylogeny reconstruction
1.blogspot. A loss-of-function mutation results in reduced or abolished protein function. But history is not something we can see—it has happened once and leaves only clues as to the actual events.com | Your Gate Way to Life Science Career
. which attach to the genome and initiate a series of reactions called gene expression. ConSeq ConSeq is a tool for predicting functionally and structurally important amino acid residues in protein sequences. which are much less common. The information stored in a gene is read by proteins. The website also contains a substantial list of links to related software. Scientists use these clues to build hypotheses. Phylogenetic Trees: Systematics describes the pattern of relationships among taxa and is intended to help us understand the history of all life.
and structural data for identification of functional sites in proteins. compiled by Joe Felsenstein. 6.of functional importance are often conserved and solvent-accessible. You can query these annotations at the sequence level as well as search/compare across genomes. 12. Joes Site . cpnDB is built and maintained with open source tools.helpBIOTECH. 8. eurkaryotes. It allows viewing and editing of the aligned input sequence data and provides many tools for phylogenetic and statistical analysis of the alignments. MEGA MEGA (Molecular Evolutionary Genetics Analysis) is a software package for phylogenetic analysis with a graphical user interface. IBM Genome Annotation Page IBM's Bio-Dictionary-based Annotations Of Completed Genomes page lists annotations for over 75 complete genomes (archae. 7. Mesquite Mesquite is an open source software project designed to deal with comparative data about organisms and evolutionary analyses. population genetics. phylogenetic analysis.Phylogeny Programs Comprehensive list of phylogeny packages. 11. 10. MIGenAS Toolkit Max-Planck Integrated Gene Analysis System (MIGenAS) provides access to many different bioinformatics software tools and databases for sequence similarity searching. multiple sequence alignments. and those of structural importance are often conserved and located in the protein core. Mesquite contains modules for phylogenetic analysis. 5. creator of Phylip. and non-phylogenetic multivariate analysis. The database contains all available sequences for both group I and group II chaperonins. MINER
www. A multiple sequence alignment is used to predict the relative solvent accessibility state and the evolutionary rate at each residue. phylogenetic and microbial ecology studies. JEvTrace Jevtrace is a tool that combines multiple sequence alignments. 9. Users can also configure "meta"-tools as a pipeline of individual tools and intermediate filters.com | Your Gate Way to Life Science Career
. and protein structure prediction.blogspot. phylogenetic. and viruses). cpnDB cpnDB is a curated collection of chaperonin sequence data collected from public databases or generated by a network of collaborators exploiting the cpn60 target in clinical. bacteria.
com | Your Gate Way to Life Science Career
.MINER is a tool for the identification and visualization of phylogenetic motifs (regions within a multiple sequence alignment (MSA) that conserve the overall phylogeny of the complete family).
13. MPI Toolkit: Max-Planck Institute Bioinformatics Toolkit provides access to many different bioinformatics software tools and databases for sequence similarity searching, multiple sequence alignments, phylogenetic analysis, and protein structure prediction.
14. NCBI Taxonomy Database: Taxonomic classification of all organisms with sequences in GenBank.
15. NEWT: NEWT is the taxonomy database maintained by the UniProt group.
16. NJplot: NJplot is a tool for visualizing binary trees such as the phylogenetic trees output from the PHYLIP programs. Available for several platforms including Windows, MacOS, Linux and Solaris.
17. Orthologue Search Service: BLAST a protein sequence then perform automated phylogenetic analysis to detect orthologous sequences.
18. PAL2NAL: PAL2NAL converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. Synonymous (Ks) and non-synonymous (Ka) substitution rates can be calculated.
19. Phydbac2: Phydbac2 (Phylogenomic display of bacterial genes) is a tool to visualize and explore the phylogenomic profiles of bacterial protein sequences. It also allows the user to view sequence similarity across different organisms, access other genes with similar conservation profiles, and view genes that are found nearby a selected gene in multiple genomes.
20. PHYLIP: Comprehensive set of programs for phylogenetic analyses, available for PC and Mac, source code available for easy compiling in UNIX.
21. PhyloBLAST: BLAST a protein sequence, then perform automated phylogenetic analysis on hits or on uploaded sequences. PHYLIP-based analyses.
22. PhyloDome: PhyloDome is a tool with which you can visualize and analyze the phylogenetic distribution of one or more eukaryotic domains.
23. PHYML: Phyml is a program that constructs phylogenetic trees from sequence alignments using the maximum likelihood method.
24. POWER: The Phylogenetic Web Repeater (POWER) allows users to perform phylogenetic analysis using the PHYLIP package. The POWER pipeline can start by processing multiple sequence alignments (MSA) or can proceed directly with aligned sequences.
25. ProtTest: ProtTest is a program that determines the best-fit model of evolution, among a set of candidate models, for a given protein sequence alignment.
26. Puzzleboot: Puzzleboot is a UNIX shell script facilitating bootstrap analysis using TREE-PUZZLE and PHYLIP. It enhances TREE-PUZZLE by allowing one to analyse multiple datasets, and can be used for both protein and DNA distance bootstrap analysis.
27. Ribosomal Database Project: Highly curated database of aligned and annotated rRNA sequences with accompanying phylogenies, data available for download.
28. STING Millenium: STING is a suite of tools for the analysis of protein sequence, structure, stability and function, and the relationships between them.
29. SWAKK: Sliding Window Analysis of Ka and Ks (SWAKK) is a tool for detecting positive selection in proteins using a sliding window substitution rate analysis. The program can display the results on a 3D protein structure.
30. T-COFFEE: The T-COFFEE site includes links to a collection of tools for computing, evaluating, and manipulating multiple alignments of protein sequences and structures. TCOFFEE is a protein multiple sequence alignment tool that is more accurate than ClustalW for sequences with less than 30% identity. Expresso (or 3DCoffee) aligns sequences using structural information. PROTOGENE turns amino acid alignments into CDS nucleotide alignments.
31. Tree Editors: Tree Editors is an annotated listing of software for the visualization and manipulation of phylogenetic trees.
32. Tree of Life: Multi-authored project attempting to represent online the entire phylogeny of life on earth.
33. TreeDomViewer: TreeDomViewer is a tool for the visualization of phylogeny and protein domain structure. TreeDomViewer constructs phylogenetic trees and projects the corresponding protein domain information onto the multiple sequence alignment.
34. TreeJuxtaposer: TreeJuxtaposer is a free software tool that allows a visual comparison of two trees in Newick format (phylogenies, taxonomies, gene trees, etc.), and automatically calculates and marks the differences. It can work with trees having up to 500,000 nodes.
35. TREE-PUZZLE: Tree-puzzle is a program that constructs phylogenetic trees from sequence alignments using the maximum likelihood method.
36. TreeView: Generates nice graphics of trees, reads multiple tree file formats, available for download to Mac or PC.
37. TSEMA: The Server for Efficient Mapping Assessment (TSEMA) predicts possible protein-protein interactions based on the comparison of phylogenetic trees derived from sequences of associated protein families.
38. UCMP Phylogeny Wing: "Phylogeny-Diversity of Life Through Time" is an on-line exhibit at the University of California Museum of Paleontology website. There is an introduction to phylogenetics and cladistics, and you can navigate through a very informative phylogenetic tree rooted at the three main domains of life (Archaea, Bacteria and Eukaryota). At each level of the tree there is a brief summary, and links to more information about the Fossil Record, Life History and Ecology, Systematics and Morphology.
39. Understanding Evolution: A fantastic site for teaching/understanding evolution.
40. Weighbor: Weighbor is a tool for building phylogenetic trees from distance matrices. It employs a weighted version of the neighbour-joining method in which longer distances in the matrix are given less weight.
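The codon-alignment step described for PAL2NAL in the list above can be illustrated with a minimal sketch. This is not PAL2NAL's own code; the function name `protein_to_codon_alignment` is hypothetical, and the sketch assumes the coding sequence supplies exactly three nucleotides per aligned residue, with no stop codon, frameshift, or mismatch handling.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Thread an unaligned coding sequence onto a gapped protein sequence.

    aligned_protein: one row of a protein alignment, gaps written as '-'.
    cds: the matching nucleotide sequence, 3 nucleotides per residue (assumed).
    """
    residues = sum(1 for symbol in aligned_protein if symbol != "-")
    if len(cds) != 3 * residues:
        raise ValueError("coding sequence must supply 3 nucleotides per residue")

    codons = []
    position = 0
    for symbol in aligned_protein:
        if symbol == "-":
            # A gap in the protein alignment becomes a three-column gap.
            codons.append("---")
        else:
            # Each residue consumes the next codon from the coding sequence.
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)
```

Applying the function to every row of a protein alignment gives a codon alignment in which each protein column corresponds to three nucleotide columns, the kind of input from which Ka and Ks estimates are usually computed.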
5.0 Protein Modeling

The process of evolution has resulted in the production of DNA sequences that encode proteins with specific functions. In the absence of a protein structure that has been determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, researchers can try to predict the three-dimensional structure using protein or molecular modeling. This method uses experimentally determined protein structures (templates) to predict the structure of another protein that has a similar amino acid sequence (target). Although molecular modeling may not be as accurate at determining a protein's structure as experimental methods, it is still extremely helpful in proposing and testing various biological hypotheses. Molecular modeling also provides a starting point for researchers wishing to confirm a structure through X-ray crystallography and NMR spectroscopy. Because the different genome projects are producing more sequences and because novel protein folds and families are being determined, protein modeling will become an increasingly important tool for scientists working to understand normal and disease-related processes in living organisms.

Identifying a protein's shape, or structure, is key to understanding its biological function and its role in health and disease. Illuminating a protein's structure also paves the way for the development of new agents and devices to treat a disease. Yet solving the structure of a protein is no easy feat. It often takes scientists working in the laboratory months, sometimes years, to experimentally determine a single structure. Therefore, scientists have begun to turn toward computers to help predict the structure of a protein based on its sequence. The challenge lies in developing methods for accurately and reliably understanding this intricate relationship.

Protein modeling involves identification of the proteins with known three-dimensional structures that are related to the target sequence, constructing a model for the target sequence based on its alignment with the template structure(s) and evaluating the model against a variety of criteria to determine if it is satisfactory.

Evaluating the Alignment: The best way to assess the accuracy is to compare alignments from sequence comparisons with alignments from protein three-dimensional structures.

Identification of Structurally Conserved and Structurally Variable Regions: After the known structures are aligned, they are examined to identify the structurally conserved regions (SCRs) from which an average structure, or framework, can be constructed for these regions of the proteins. Variable regions (VRs), in which each of the known structures may differ in conformation, also must be identified because special techniques must be applied to model these regions of the unknown protein.

Generating Coordinates for the Unknown Structure: When generating coordinates for the unknown structure, one needs to model main chain atoms and side chain atoms, both in SCRs and VRs. For the SCRs, it is straightforward to generate the coordinates of the main chain atoms of the unknown structure from those of the known structure(s). It may be desirable to weight the contribution of each homologue in each SCR based on the extent of similarity with the unknown. Side chain coordinates are copied if the residue type in the unknown is identical or very similar to that in the known homologues. Side chain coordinates of residues that are similar in length and character also may be copied. For other side chain coordinates one can apply a side chain rotamer library in a systematic approach to explore possible side chain conformations. In the event that some coordinates in the unknown are undefined in the SCRs, regularization can be used to build and relax both main chain and side chain atoms in those regions. Note that this procedure should be used only if the region of undefined atoms is one or two residues in length.

For the VRs, a variety of approaches may be applied in assigning coordinates to the unknown. Recall that these regions will correspond most often to the loops on the surface of the protein. If a loop in one of the known structures is a good model for that of the unknown, then the main chain coordinates of that known structure can be copied. Coordinates for side chain atoms in these loop regions may be copied if residues are similar, though it is likely that considerable application of side chain rotamer libraries will be required to define coordinates in these regions. When a good model for a loop cannot be found among the known structures, one can search fragment databases for loops in other proteins that may provide a suitable model for the unknown. A residue range is chosen to include the undefined loop as well as a few residues (e.g., three) on either side of the loop for which coordinates have been defined. Fragments are examined for their ability to fit in the undefined region without making bad contacts with other atoms and to overlap well with the residues on either side of the loop. Rotamer libraries can be used to define other side chain coordinates. The loop may then be subjected to conformational searching to identify low energy conformers if desired.

Automated Web-Based Homology Modeling: Web-based tools are now available to generate models of protein 3-dimensional structures using comparative modeling techniques. SWISS-MODEL is available through Glaxo Wellcome Experimental Research in Geneva, Switzerland. WHAT IF, available on EMBL servers, includes three components, one to generate the homology models, one to evaluate the quality of the homology models, and one to evaluate models of proteins for which the structure is already known, thereby providing for evaluation of the quality of the modeling program.

Databases of Structures from Homology Modeling: Databases are now available that contain large numbers of protein structures that have been obtained by comparative (homology) modeling. Two of these databases are ModBase and 3DCrunch. Modbase was created by Sali and co-workers, using their program Modeller, which creates models based on the satisfaction of spatial restraints. That is, restraints are identified from the alignments of homologues of known structure, and these restraints are then applied to the unknown sequence. Restraints can include distances between alpha carbons, other distances within the main-chain, and main-chain and side-chain dihedral angles. Routines to satisfy the restraints optimally include conjugate gradient minimization and molecular dynamics with simulated annealing. 3DCrunch is a large scale modeling project that aims to submit all entries from protein sequence databases to SWISS-MODEL. Currently the database contains 64,000 entries.

Evaluation and Refinement of the Structure: For a homology model from any source, it is important to demonstrate that the structural features of the model are reasonable in terms of what is known about protein structures in general. That is, researchers have analyzed three-dimensional structures of proteins from which basic principles of protein structure and folding have been developed. Several programs are available to assist in this analysis of correctness of a homology model. Programs that provide structure analysis along with output that is useful for publication include PROCHECK and 3D-Profiler.

PROCHECK is based on an analysis of (phi, psi) angles, peptide bond planarity, bond lengths, bond angles, hydrogen-bond geometry, and side-chain conformations of known protein structures as a function of atomic resolution. Thus, the expected values of these parameters are known and can be compared to a modeled structure based on the atomic resolution of the structures from which the model was developed.

3D-profiler compares a homology model to its sequence using a 3D profile. Each residue position in a 3D model can be characterized by its environment. Preferred environments for amino acids are derived from known three-dimensional structures and are defined by three parameters: (1) the area of each residue that is buried, (2) the fraction of side-chain area that is covered by polar atoms (i.e., O and N), and (3) the local secondary structure. Based on these environment variables, a 3D structure is converted into a 1D profile that describes each residue in the folded protein structure. The profile is based on the statistical preferences of each of the 20 amino acids for particular environments within the protein. Examination of these profiles reveals which regions of a sequence appear to be folded correctly and which do not.

Once any irregularities have been resolved, the entire structure may then be subjected to further refinement. This process may consist of energy minimization with restraints, especially for the SCRs. The restraints then may be gradually removed for subsequent minimizations. It also may be advantageous to apply molecular dynamics in conjunction with energy minimization. For any of these refinement procedures, the structure should be solvated, using for example crystallographic waters from the known homologues, a solvent shell, or a periodic box of pre-equilibrated water molecules.
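The (phi, psi) angles that PROCHECK analyses are backbone dihedral angles, each defined by four consecutive atoms. The following is a minimal sketch of that geometric calculation only, not PROCHECK's code; the function name `dihedral_degrees` and the tuple-based coordinate format are assumptions made for illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (degrees) about the p1-p2 bond for four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Normalise the central bond vector.
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)

    # Project the outer bonds onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])

    # The signed angle between the two projections is the dihedral angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For a residue i, phi would be computed from the atoms C(i-1), N(i), CA(i), C(i) and psi from N(i), CA(i), C(i), N(i+1); the resulting pairs can then be compared with the distributions observed in known structures.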
6.0 Tools for structure based drug design and docking

Docking Software

Protein-Ligand Docking
1. Affinity (Accelrys Inc.): automated, flexible docking; uses the energy of the ligand/receptor complex to automatically find the best binding modes of the ligand to the receptor (energy-driven method).
2. AutoDock (The Scripps Research Institute): automated docking of flexible ligands to macromolecules; designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
3. CombiBUILD (Sandia National Labs): structure-based drug design program created to aid the design of combinatorial libraries; screens a library of possible reactants on the computer, and predicts which ones will be the most potent; successfully applied to find nanomolar inhibitors of Cathepsin D; roughly an order of magnitude superior to standard diversity approaches.
4. DockVision (University of Alberta): docking package created by scientists for scientists by including Monte Carlo, Genetic Algorithm, and database screening docking algorithms.
5. FRED (OpenEye): accurate and extremely fast, multiconformer docking program; examines all possible poses within a protein active site, filtering for shape complementarity and optional pharmacophoric features before scoring with more traditional functions.
6. FlexiDock (Tripos): simple, flexible docking of ligands into binding sites on proteins; fast genetic algorithm for generation of configurations; rigid, partially flexible, or fully flexible receptor side chains provide optimal control of ligand binding characteristics; conformationally flexible ligands; tunable energy evaluation function with special H-bond treatment; very fast run times.
7. FlexX (BioSolveIT GmbH): fast computer program for predicting protein-ligand interactions; two main applications: complex prediction (create and rank a series of possible protein-ligand complexes), virtual screening (selecting a set of compounds for experimental testing); conformational flexibility of the ligand, rigid protein; placement algorithm based on the interactions occurring between the molecules (limited to low-energy structures).
8. (FlexX, continued): MIMUMBA torsion angle database used for the creation of conformers; interaction geometry database used to exactly describe intermolecular interaction patterns; Boehm function (with minor adaptions necessary for docking) applied for scoring.
9. GLIDE (Schrödinger GmbH): high-throughput ligand-receptor docking for fast library screening; fast and accurate docking program; identifies the best binding mode through Monte Carlo sampling; provides an accurate scoring function for ranking of binding affinities; can enrich the fraction of suitable lead candidates in a chemical database; predicting binding affinity rapidly and with a reasonable level of accuracy will greatly enhance the probability of success in a drug discovery program.
10. GOLD (CCDC): calculating docking modes of small molecules into protein binding sites; genetic algorithm for protein-ligand docking; full ligand and partial protein flexibility; energy functions partly based on conformational and non-bonded contact information from the CSD; choice of scoring functions: GoldScore, ChemScore and User defined score; virtual library screening.
11. HINT! (Virginia Commonwealth University): Hydropathic Interactions; empirical molecular modeling system with new methods for de novo drug design and protein or nucleic acid structural analysis; translates the well-developed Medicinal Chemistry and QSAR formalism of LogP and hydrophobicity into a free energy interaction model for all biomolecular systems based on the experimental data from solvent partitioning; calculates 3D hydropathy fields and 3D hydropathic interaction maps; estimates LogP for modeled molecules or data files; numerically and graphically evaluates binding of drugs or inhibitors into protein structures and scores DOCK orientations; constructs hydropathic (LOCK and KEY) complementarity maps (can be used to predict an ideal substrate from a known receptor or protein structure or to propose the hydropathic structure from known agonists or antagonists); evaluates/predicts effects of site-directed mutagenesis on protein structure and stability.
12. LIGPLOT (University College of London): program for automatically plotting protein-ligand interactions; generates schematic diagrams of protein-ligand interactions for a given PDB file; interactions shown are those mediated by hydrogen bonds (dashed lines between the atoms involved) and by hydrophobic contacts (represented by an arc with spokes radiating towards the ligand atoms they contact).
13. SITUS (Scripps Research Institute): program package for modeling of atomic resolution structures into low-resolution density maps; software supports both rigid-body and flexible docking using a variety of fitting strategies.
14. SenSitus: interactive docking and visualization program for low-resolution density maps and atomic structures; GUI-based alternative to certain Situs docking programs that can benefit from an interactive user interface and 3D visualization methods.
15. VEGA (Milan University): calculation of ligand-receptor interaction energy.

Protein-Ligand & Protein-Protein Docking
16. DOCK (UCSF Molecular Design Institute): generates many possible orientations (and more recently, conformations) of a putative ligand within a user-selected region of a receptor structure; orientations may be scored using several schemes designed to measure steric and/or chemical complementarity of the receptor-ligand complex; evaluate likely orientations of a single ligand, or rank molecules from a database; search databases for DNA-binding compounds; examine possible binding orientations of protein-protein and protein-DNA complexes; design combinatorial libraries.
17. GRAMM (SUNY): Global Range Molecular Matching; empirical approach to smoothing the intermolecular energy function by changing the range of the atom-atom potentials; requires only the atomic coordinates of the two molecules to predict the complex structure (no binding site information needed); performs an exhaustive 6-dimensional search through the relative translations and rotations of the molecules; see also the database of Protein-Protein Decoys for the validation of energy functions and refinement procedures.
18. ICM-Dock (MolSoft LLC): fast and accurate docking simulations; unique set of tools for accurate individual ligand-protein docking, peptide-protein docking, and protein-protein docking, including interactive graphics tools.

Protein-Protein (Peptide) Docking
19. 3D-Dock Suite (BioMolecular Modeling, Cancer Research UK): incorporating FTDock, RPScore and MultiDock; FTDock (Fourier Transform Dock) performs rigid-body docking on two biomolecules in order to predict their correct binding geometry and outputs multiple predictions that can be screened using biochemical information; RPScore (Residue level Pair potential Score) uses a single distance constraint empirically derived pair potential to screen the output from FTDock and can reduce dramatically the list of possible complexes within which can be found a correct solution; MultiDock (Multiple copy side-chain refinement Dock).
20. Bielefeld Protein Docking (Bielefeld University): detects geometrical and chemical complementarities between surfaces of proteins and estimates docking positions.
21. BiGGER (BioTecnol, S.A.): Biomolecular complex Generation with Global Evaluation and Ranking; efficient protein-docking algorithm; predicts the structure of binary protein complexes from the unbound structures; search the complete binding space and select a set of candidate complexes; evaluate and rank each candidate according to the estimated probability of being an accurate model of the native complex; integrated in chemera, a molecular graphics and modeling program for studying protein structures and interactions.
22. ClusPro (Boston University): integrated approach to protein-protein docking; docking algorithm includes the following steps: rigid body docking based on the Fourier correlation approach (used DOT and ZDOCK docking programs); selection of structures with favorable desolvation and electrostatic properties; clustering the retained complexes using a pairwise RMSD criterion; refinement of the 25 largest clusters by the flexible docking algorithm SmoothDock.
23. DOT (San Diego Supercomputer Center): Daughter Of TURNIP; TURNIP - program, developed by V. Roberts at The Scripps Research Institute for use in the study of macromolecular docking; computation of the electrostatic potential energy between two proteins or other charged molecules.
24. ESCHER NG (Milan University): enhanced version of the original ESCHER protein-protein automatic docking system developed in 1997 by G. Ausiello, G. Cesareni and M. Helmer Citterich; new release, with a reengineered code, includes some new features: protein-protein and DNA-protein docking capability; fast surface calculation based on the NSC algorithm.
25. HADDOCK (Utrecht University Netherlands): High Ambiguity Driven protein-protein Docking; biochemical and/or biophysical interaction data such as chemical shift perturbation data resulting from NMR titration experiments or mutagenesis data introduced as ambiguous interaction restraints (AIRs) to drive the docking process; AIR - defined as an ambiguous distance between all residues shown to be involved in the interaction.
26. HEX (University of Aberdeen): protein docking and molecular superposition program; uses spherical polar Fourier correlations to accelerate docking calculations.
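Several of the tools listed above group candidate complexes with a pairwise RMSD criterion (the ClusPro entry names this step explicitly). The sketch below shows one simple way such clustering can work; it is an illustrative greedy scheme, not the algorithm of any listed program. The function names, the pose format (lists of (x, y, z) tuples already in a common frame) and the cutoff value are assumptions.

```python
import math

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(total / len(pose_a))

def greedy_cluster(poses, cutoff):
    """Cluster pose indices: repeatedly pick the pose with the most neighbours.

    Returns a list of clusters, each a list of indices into `poses`,
    ordered from the first cluster found to the last.
    """
    remaining = list(range(len(poses)))
    clusters = []
    while remaining:
        # Neighbours of each remaining pose (including itself) within the cutoff.
        neighbours = {
            i: [j for j in remaining if rmsd(poses[i], poses[j]) <= cutoff]
            for i in remaining
        }
        # Ties are broken in favour of the lowest index.
        centre = max(remaining, key=lambda i: len(neighbours[i]))
        clusters.append(neighbours[centre])
        remaining = [i for i in remaining if i not in neighbours[centre]]
    return clusters
```

Large clusters found this way indicate regions of the search space that many docking solutions agree on, which is why cluster size is often used to choose candidates for further refinement.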
7.0 Computational Resources

The current common name, the Web, is a contraction of its original name, the World Wide Web, also abbreviated as WWW or W3.

A Web browser performs multiple tasks. First, any Web browser is an HTTP client; it knows how to transfer data using the HTTP protocol. Any Web browser also knows how to interpret and display HTML, the content markup language used on the Web. Different browsers have different display capabilities and display the same HTML code in different ways (which is why HTML is referred to as a content markup language instead of a page description language) but all of them can understand (parse) HTML and do something reasonable with it. Some of the differences in the way different Web browsers display the same Web page come from different design decisions ("what font should be used for <H1> text?") and some of it comes from the fact that different Web clients have different capabilities. Some of these differences, however, represent extensions to HTML, such as the ability to display various kinds of still or moving images as part of the Web page or to run programs written in Java or Active X. These extra capabilities may be built into the browser or may be added by "plugins", software extensions which give the browser new functionality. Finally, the behavior of a Web browser can frequently be changed by configuring its preferences; if you find the default font too small, that can often be increased.

Many new computer users assume that the Web and the Internet are synonymous. However, many protocols other than HTTP flow over the Internet, ftp, newsgroups, and gopher for example. Finally, HTTP does not have to be transmitted over the Internet, and HTML doesn't have to be transmitted via HTTP. Web technology has become a common interface tool for communication between computers on a local network (sometimes called an Intranet), and every Web client I have worked with has the ability to read and display local HTML files.

In part, the new user is confused by the fact that, in addition to supporting extensions to HTML, many popular web browsers have support for other protocols such as email (SMTP, POP, IMAP). What this really means is that the particular piece of software (e.g. Netscape Communicator) is more than just a Web client, it is also an email client, an FTP client and a Gopher client. Because virtually every Web client is also a limited FTP client, many people choose to so use them. In the case where a Web page contains a link to an FTP server, simply selecting the link downloads the file.
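To make the idea of "parsing" HTML concrete, here is a minimal sketch using the HTML parser from Python's standard library. It only collects the text of <H1> elements and leaves any display decision (such as which font to use) to the caller, which mirrors the point that HTML marks up content rather than describing a page. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect the text found inside <h1> elements."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case, so <H1> and <h1> match.
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headings[-1] += data

def extract_h1(html_text):
    """Return the stripped text of every <h1> element in html_text."""
    parser = HeadingCollector()
    parser.feed(html_text)
    parser.close()
    return [heading.strip() for heading in parser.headings]
```

A browser goes much further than this, but the division of labour is the same: the markup says "this is a top-level heading", and the client decides how to render it.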
Networking

Telnet: Telnet is one of the oldest of the network services and perhaps the easiest to understand. Telnet allows one computer to "log on" to another computer as if it were a terminal. A telnet session can negotiate a range of different protocols, but this almost always includes ASCII text. Once logged on, you frequently will have all the privileges of a local user; you can run programs, create and delete files. This is probably the most common way that users with accounts will use a computer.

Although "full service logins" as is described above are perhaps the most common use of the telnet protocol, in fact as much control as the host's system administrator desires may be imposed on a telnet connection. Thus, a telnet service may be advertised with a public login name and password. Login with this name, however, is likely to be restricted to a limited number of commands. The National Institutes of Health in the United States used, at one point, such a telnet login to disseminate information as to the membership of study sections. Such specialized telnet services have become much less common since the rise in popularity of the Web. From a practical point of view, every telnet host will be different, and thus you will need to learn about each one as you have occasion to use it.

Because many protocols for other services (e.g. SMTP, HTTP) are encoded as ASCII text, it is possible to connect to a Web server with a telnet client if you understand the syntax of HTTP. This is almost never done to use a Web server, but is occasionally done when debugging. Similarly, a telnet client can sometimes be used to connect to a server for these other protocols, although most of us find dedicated clients to be significantly more convenient. Most people will use a telnet client the first time connecting to a MOO, and some people will continue to use telnet as their client.

ftp: Telnet is useful for interactive computer access, but is much less useful for transferring files. Ftp is an older service designed specifically for file transfer. Originally it, like telnet, was intended for account owners. However, as it became apparent that it was useful to make files available to the world at large without giving all those wanting the files an account, the variant of "anonymous ftp" developed. In this variant, logging in with a "magic" user name (most commonly "anonymous" or "ftp") eliminates the requirement for a password.

Once logged on via ftp, access to the host filesystem is accomplished by a series of commands, some of which happen to be similar to unix commands. On a unix ftp client, the commands are unix-like: cd to Change Directory and ls to LiSt the files in that directory. These commands do not depend on the host computer running UNIX! These are ftp commands. A client may choose to hide these commands, however; a client with a graphical user interface (GUI), for example, might not have typed commands at all, but buttons. To transfer files, you execute either get, to retrieve a file from the host computer, or put, to place a file onto it (where allowed).
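Returning to the earlier remark that a Web server can be addressed from a telnet client "if you understand the syntax of HTTP": the sketch below builds the few lines of ASCII text one would type after connecting to port 80. It is a minimal illustration of an HTTP/1.0 GET request, with the Host header included because many servers expect it; the function name and the host www.example.org are placeholders, not anything from the course material.

```python
def build_get_request(host, path="/"):
    """Return the ASCII text of a minimal HTTP/1.0 GET request.

    Lines are separated by carriage return + linefeed, and an empty
    line marks the end of the request headers.
    """
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),
        "",  # blank line that terminates the header block
        "",  # ensures the text ends with the final CR LF
    ]
    return "\r\n".join(lines)
```

Typing the equivalent of `build_get_request("www.example.org")` into a telnet session connected to a Web server would normally cause the server to reply with a status line, headers and the HTML of the requested page, which is what makes this technique useful for debugging.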
One pair of ftp commands which is especially important to understand is binary and ascii. Ftp transfers occur in ascii (text) mode by default. In ascii mode, the file received may not be identical to the one on the host, as ftp may make changes in the file during transfer to allow for differences in how different operating systems handle text. For example, UNIX terminates lines with the linefeed character (ASCII 10 decimal), the Macintosh operating system (before Mac OS X) with a carriage return (ASCII 13 decimal), and MSDOS uses one of each. These differences are corrected for during an ascii transfer. This is highly desirable for text files, but catastrophic for binary files like program object code and pictures. Thus, before getting such a file, it is important to issue the binary command. This instructs ftp to transfer files unmodified.

Email: Both ftp and telnet are interactive, more or less real time programs. Sometimes it is useful, however, to communicate with another computer, or more commonly, a user on another computer, by leaving them a message which they can read and respond to at their convenience. This is done over the Internet by using email. Email is a generic term for a variety of processes which can use different protocols and network technology, and which, in many cases, use a more complex client/server model, as is discussed below.

At present, most email is transmitted by SMTP (Simple Mail Transfer Protocol) via TCP/IP over the Internet. SMTP transmits email on port 25 between two dedicated, full time servers. Although the assumption is that both SMTP servers will be generally available, should the receiving server not be reachable when the transmitting server needs to send email, the email message will be held and the transmission will be retried several times over a period of days until a successful transmission occurs or until the maximum retry time has been exceeded, at which point an error message will be returned to the sender.

The SMTP programs discussed above are typically symmetrical (e.g. the program can alternatively serve as client or server), and are complex. Typically, you will not interact with these programs directly. Rather, dedicated client software is used to compose, send, receive, and read email, and it is that software which communicates with the SMTP server. Examples of client software running on Unix workstations are mail, mailx, elm, mush, mutt, and pine. Also, web browsers sometimes can be used as email clients.

If you send and receive email via a computer that is always on and always connected to a network reachable by your mail server (e.g. a Unix workstation), then incoming mail is saved to a mail spool file on your computer from whence your client software retrieves it, and outgoing email is passed to the SMTP server.

If you send and receive email via a computer that is not always on and/or not always connected to the network, sending email proceeds as above, but receiving email is different in that the SMTP server cannot necessarily get incoming email onto your computer's file system. In that case, a different protocol is used, most commonly POP3. (IMAP is a newer protocol for accomplishing the same task about which you may hear more in the future.) The SMTP server stores your email on a remote host and your local client retrieves it from a POP3 server when you check for mail. Typically, a POP3 account will be provided by whoever provides your Internet access. Thus, to install an email client on a Mac or Windows computer, you typically have to provide the domain name and/or IP address of the SMTP and POP3 servers (frequently the same) and the user name and password for the POP3 account.
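Returning briefly to the ascii and binary transfer modes of ftp, the line-ending difference can be seen without any network connection. The sketch below is only an approximation: the file names are invented for illustration, and tr is used as a rough stand-in for the conversion an ascii transfer to a Unix host performs, not as a description of how ftp itself is implemented.

```shell
# Illustrative file names only; nothing here comes from the original text.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The same two lines of text, once with Unix (LF) line endings
# and once with MS-DOS (CR+LF) line endings.
printf 'line one\nline two\n' > unix.txt
printf 'line one\r\nline two\r\n' > dos.txt

# The MS-DOS copy is one byte longer per line.
unix_bytes=$(wc -c < unix.txt | tr -d ' ')
dos_bytes=$(wc -c < dos.txt | tr -d ' ')
echo "unix.txt: $unix_bytes bytes, dos.txt: $dos_bytes bytes"

# Rough stand-in for an ascii-mode transfer to a Unix host:
# drop every CR so that each line ends in LF alone.
tr -d '\r' < dos.txt > converted.txt
if cmp -s unix.txt converted.txt; then text_result=identical; else text_result=different; fi

# The same conversion applied to binary data silently removes bytes.
printf '\001\r\n\002\r\003' > data.bin
tr -d '\r' < data.bin > damaged.bin
bin_before=$(wc -c < data.bin | tr -d ' ')
bin_after=$(wc -c < damaged.bin | tr -d ' ')
echo "binary file: $bin_before bytes before, $bin_after bytes after"
```

The text file survives the conversion intact, while the binary file loses two bytes, which is why the binary command matters before fetching programs or pictures.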
Important commands in LINUX/UNIX operating systems

1. cat - display or concatenate files
cat takes a copy of a file and sends it to the standard output (i.e. to be displayed on your terminal, unless redirected elsewhere), so it is generally used either to read files, or to string together copies of several files, writing the output to a new file.

cat ex
displays the contents of the file ex.

cat ex1 ex2 > newex
creates a new file newex containing copies of ex1 and ex2, with the contents of ex2 following the contents of ex1.

2. cd - change directory
cd is used to change from one directory to another.

cd dir1
changes directory so that dir1 is your new current directory. dir1 may be either the full pathname of the directory, or its pathname relative to the current directory.

cd
changes directory to your home directory.

cd ..
moves to the parent directory of your current directory.

3. chmod - change the permissions on a file or directory
chmod alters the permissions on files and directories using either symbolic or octal numeric codes. The symbolic codes are given here:

u  user     +  to add a permission                 r  read
g  group    -  to remove a permission              w  write
o  other    =  to assign a permission explicitly   x  execute (for files), access (for directories)

The following examples illustrate how these codes are used.

chmod u=rw file1
sets the permissions on the file file1 to give the user read and write permission on file1. No other permissions are altered.

chmod u+x,g+w,o-r file1
alters the permissions on the file file1 to give the user execute permission on file1, to give members of the user's group write permission on the file, and prevent any users not in this group from reading it.

chmod u+w,go-x dir1
gives the user write permission in the directory dir1, and prevents all other users having access to that directory (by using cd. They can still list its contents using ls.)
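The cat, cd and chmod examples above can be tried safely in a scratch directory. The following is a minimal sketch rather than part of the original handout: the scratch directory and the starting octal modes (644 and 755) are assumptions chosen so that the resulting permissions are predictable.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# cat: join two files into a third, ex1 first and ex2 second.
printf 'first\n' > ex1
printf 'second\n' > ex2
cat ex1 ex2 > newex
newex_contents=$(cat newex)

# cd: enter a subdirectory, then return to its parent with "cd ..".
mkdir dir1
cd dir1 || exit 1
inside=$(basename "$PWD")
cd ..
outside=$(basename "$PWD")

# chmod: start from known octal modes so the result is predictable.
touch file1
chmod 644 file1              # rw-r--r--
chmod u+x,g+w,o-r file1      # user gains x, group gains w, others lose r
file_mode=$(ls -l file1 | cut -c1-10)

chmod 755 dir1               # rwxr-xr-x
chmod u+w,go-x dir1          # group and others can no longer cd into dir1
dir_mode=$(ls -ld dir1 | cut -c1-10)

echo "$file_mode file1"
echo "$dir_mode dir1"
```

The first ten characters of an ls -l line show the file type followed by the user, group and other permissions, so the effect of each symbolic code can be read off directly.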
4. cp - copy a file
The command cp is used to make copies of files and directories.

cp file1 file2
copies the contents of the file file1 into a new file called file2. cp cannot copy a file onto itself.

cp file3 file4 dir1
creates copies of file3 and file4 (with the same names), within the directory dir1. dir1 must already exist for the copying to succeed.

cp -r dir2 dir3
recursively copies the directory dir2, together with its contents and subdirectories, to the directory dir3. If dir3 does not already exist, it is created by cp, and the contents and subdirectories of dir2 are recreated within it. If dir3 does exist, a subdirectory called dir2 is created within it, containing a copy of all the contents of the original dir2.

5. date - display the current date and time
date returns information on the current date and time in the format shown below:

Tue Mar 25 15:21:16 GMT 1997

It is possible to alter the format of the output from date. For example, using the command line

date '+The date is %d/%m/%y, and the time is %H:%M:%S.'

at 3.10pm on 14 December 1997 would produce the output

The date is 14/12/97, and the time is 15:10:00.

6. diff - display differences between text files
diff file1 file2 reports line-by-line differences between the text files file1 and file2. The default output will contain lines such as n1 a n2,n3 and n4,n5 c n6,n7 (where n1 a n2,n3 means that file2 has the extra lines n2 to n3 following the line that has the number n1 in file1, and n4,n5 c n6,n7 means that lines n4 to n5 in file1 differ from lines n6 to n7 in file2). After each such line, diff prints the relevant lines from the text files, with < in front of each line from file1 and > in front of each line from file2.

There are several options to diff, including diff -i, which ignores the case of letters when comparing lines, and diff -b, which ignores all trailing blanks.

diff -cn
produces a listing of differences within n lines of context, where the default is three lines. (Current versions of diff write this as diff -C n, with plain diff -c giving the default three lines.) The form of the output is different from that given by diff, with + indicating lines which have been added, - indicating lines which have been removed, and ! indicating lines which have been changed.

diff dir1 dir2
will sort the contents of directories dir1 and dir2 by name, and then run diff on the text files which differ.
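To make the cp, date and diff descriptions above concrete, here is a small sketch that can be run in a scratch directory. The file contents are invented for illustration, and "|| true" is added only because diff signals "files differ" through a non-zero exit status.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\nbeta\ngamma\n' > file1

# cp: copy a file, then add a fourth line to the copy.
cp file1 file2
printf 'delta\n' >> file2

# diff exits with status 1 when the files differ, hence "|| true".
added=$(diff file1 file2 || true)

# A copy in which line 2 differs only in the case of its letters.
sed 's/beta/BETA/' file1 > file3
changed=$(diff file1 file3 || true)
ignoring_case=$(diff -i file1 file3 || true)

# cp -r: dir3 does not exist the first time, and does exist the second time.
mkdir dir2
cp file1 dir2
cp -r dir2 dir3     # creates dir3 holding the contents of dir2
cp -r dir2 dir3     # dir3 now exists, so this creates dir3/dir2

# date with a custom output format.
stamp=$(date '+The date is %d/%m/%y, and the time is %H:%M:%S.')

printf '%s\n' "$added" "$changed" "$stamp"
```

The "added" report reads 3a4 (line 4 of file2 was added after line 3 of file1) and the "changed" report reads 2c2, matching the n1 a n2 and n4 c n6 forms described in the text; with -i the case-only change produces no output at all.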
7. file - determine the type of a file
file tests named files to determine the categories their contents belong to.

file file1
can tell if file1 is, for example, a source program, an executable program or shell script, an empty file, a directory, or a library, but (a warning!) it does sometimes make mistakes.

8. find - find files of a specified name or type
find searches for files in a named directory and all its subdirectories.

find . -name '*.f' -print
searches the current directory and all its subdirectories for files ending in .f, and writes their names to the standard output. In some versions of Unix the names of the files will only be written out if the -print option is used.

find /local -name core -user user1 -print
searches the directory /local and its subdirectories for files called core belonging to the user user1 and writes their full file names to the standard output.

9. grep - searches files for a specified string or expression
grep searches for lines containing a specified pattern and, by default, writes them to the standard output.

grep motif1 file1
searches the file file1 for lines containing the pattern motif1. If no file name is given, grep acts on the standard input. grep can also be used to search a string of files, so

grep motif1 file1 file2 ... filen
will search the files file1, file2, ..., filen, for the pattern motif1.

grep -c motif1 file1
will give the number of lines containing motif1 instead of the lines themselves.

grep -v motif1 file1
will write out the lines of file1 that do NOT contain motif1.
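The find and grep commands described above can be exercised on a small invented directory tree. This is a sketch under assumptions: the directory names, file names and the four-line file1 are made up for the demonstration, with motif1 standing in for any search pattern.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small directory tree containing two .f files and one other file.
mkdir -p src/lib
printf 'program one\n' > src/main.f
printf 'subroutine two\n' > src/lib/util.f
printf 'notes\n' > src/readme.txt

# find: the pattern is quoted so that the shell passes it to find unexpanded.
find . -name '*.f' -print
f_count=$(find . -name '*.f' -print | wc -l | tr -d ' ')

# grep: a made-up file in which two of the four lines contain the pattern.
printf 'seqA motif1 ATGCGT\nseqB TTGACA\nseqC GGC motif1\nseqD CCCGGG\n' > file1
with_motif=$(grep motif1 file1)
line_count=$(grep -c motif1 file1)
without_motif=$(grep -v motif1 file1)

# With no file name, grep reads its standard input.
piped_count=$(cat file1 | grep -c motif1)

printf '%s\n' "$with_motif"
echo "$line_count matching lines"
```

Quoting '*.f' matters: without the quotes the shell would try to expand the pattern itself before find ever sees it.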
10. gzip - compress a file
gzip reduces the size of named files, replacing them with files of the same name extended by .gz. The amount of space saved by compression varies.

gzip file1
results in a compressed file called file1.gz, and deletes file1.

gzip -v file2
compresses file2 and gives information, in the format shown below, on the percentage of the file's size that has been saved by compression:

file2 : Compression 50.26 -- replaced with file2.gz

To restore files to their original state use the command gunzip. If you have a compressed file file2.gz, then

gunzip file2.gz
will replace file2.gz with the uncompressed file file2.

11. help - display information about bash builtin commands
help gives access to information about builtin commands in the bash shell. Using help on its own will give a list of the commands it has information about. help followed by the name of one of these commands will give information about that command. help history, for example, will give details about the bash shell history listings.

12. info - read online documentation
info is a hypertext information system. Using the command info on its own will enter the info system, and give a list of the major subjects it has information about. Use the command q to exit info. For example, info bash will give details about the bash shell.

13. kill - kill a process
To kill a process using kill requires the process id (PID), which can be found using ps.

14. lpr - print out a file
lpr is used to send the contents of a file to a printer. If the printer is a laserwriter, and the file contains PostScript, then the PostScript will be interpreted and the results of that printed out.

lpr -Pprinter1 file1
will send the file file1 to be printed out on the printer printer1. To see the status of the job on the printer queue use

lpq -Pprinter1
for a list of the jobs queued for printing on printer1. (This may not work for remote printers.)
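As an illustration of the gzip, gunzip and kill entries above, the sketch below compresses and restores a made-up, highly repetitive file and then terminates a background process by its PID. The file contents and the use of sleep as a harmless long-running process are assumptions made for the example; the shell variable $! is used to obtain the PID instead of reading it from ps.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A highly repetitive (and so very compressible) made-up sequence file.
i=0
while [ "$i" -lt 200 ]; do
    echo 'ATGCATGCATGCATGC'
    i=$((i + 1))
done > file2
cp file2 original
size_before=$(wc -c < file2 | tr -d ' ')

# gzip replaces file2 with file2.gz.
gzip file2
size_after=$(wc -c < file2.gz | tr -d ' ')
if [ -e file2 ]; then original_kept=yes; else original_kept=no; fi

# gunzip restores file2 and removes file2.gz.
gunzip file2.gz
if cmp -s file2 original; then roundtrip=ok; else roundtrip=failed; fi

# kill: start a background process, note its PID, and terminate it.
sleep 300 &
pid=$!
kill "$pid"
wait "$pid" 2>/dev/null || true
if kill -0 "$pid" 2>/dev/null; then still_running=yes; else still_running=no; fi

echo "$size_before bytes before, $size_after bytes after compression"
```

How much space is saved depends heavily on the data; repetitive text such as this compresses far better than typical files.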
15. ls - list names of files in a directory
ls lists the contents of a directory, and can be used to obtain information on the files and directories within it.

ls dir1
lists the names of the files and directories in the directory dir1 (excluding files whose names begin with . ). If no directory is named, ls lists the contents of the current directory.

ls -a dir1
will list the contents of dir1 (including files whose names begin with . ).

ls -l file1
gives details of the access permissions for the file file1, its size in bytes, and the time it was last altered.

ls -l dir1
gives such information on the contents of the directory dir1. To obtain the information on dir1 itself, rather than its contents, use

ls -ld dir1

16. man - display an on-line manual page
man displays on-line reference manual pages.

man command1
will display the manual page for command1, e.g. man cp, man man.

man -k keyword
lists the manual page subjects that have keyword in their headings. This is useful if you do not yet know the name of a command you are seeking information about.

man -Mpath command1
is used to change the set of directories that man searches for manual pages on command1.

17. mkdir - make a directory
mkdir is used to create new directories. In order to do this you must have write permission in the parent directory of the new directory.

mkdir newdir
will make a new directory called newdir.

mkdir -p can be used to create a new directory, together with any parent directories required.

mkdir -p dir1/dir2/newdir
will create newdir and its parent directories dir1 and dir2, if these do not already exist.
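The ls and mkdir options described above can be compared side by side in a scratch directory. In this sketch the file names visible and .hidden are invented purely to show the effect of ls -a.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# mkdir -p creates dir1 and dir1/dir2 on the way to newdir.
mkdir -p dir1/dir2/newdir
touch dir1/visible dir1/.hidden

# Plain ls omits names that begin with a dot; ls -a includes them.
plain=$(ls dir1)
all=$(ls -a dir1)

# ls -l on a directory describes its contents;
# ls -ld describes the directory itself.
contents_listing=$(ls -l dir1)
self_listing=$(ls -ld dir1)
self_type=$(printf '%s' "$self_listing" | cut -c1)

printf '%s\n' "$plain"
printf '%s\n' "$self_listing"
```

The first character of an ls -l line gives the file type, so the ls -ld line for dir1 begins with d.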
18. more - scan through a text file page by page
more displays the contents of a file on a terminal one screenful at a time.

more file1
starts by displaying the beginning of file1. It will scroll up one line every time the return key is pressed, and one screenful every time the space bar is pressed. Type ? for details of the commands available within more. Type q if you wish to quit more before the end of file1 is reached.

more -n file1
will cause n lines of file1 to be displayed in each screenful instead of the default (which is two lines less than the number of lines that will fit into the terminal's screen).

19. mv - move or rename files or directories
mv is used to change the name of files or directories, or to move them into other directories. On some systems mv cannot move directories from one file-system to another, so, if it is necessary to do that, use cp instead.

mv file1 file2
changes the name of a file from file1 to file2.

mv dir1 dir2
changes the name of a directory from dir1 to dir2, unless dir2 already exists, in which case dir1 will be moved into dir2.

mv file1 file2 dir3
moves the files file1 and file2 into the directory dir3.

20. nice - change the priority at which a job is being run
nice causes a command to be run at a lower than usual priority. nice can be particularly useful when running a long program that could cause annoyance if it slowed down the execution of other users' commands. An example of the use of nice is

nice compress file1
which will execute the compression of file1 at a lower priority.

21. passwd - change your password
Use passwd when you wish to change your password. You will be prompted once for your current password, and twice for your new password. Neither password will be displayed on the screen.
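The different behaviours of mv described above, together with nice, can be checked with the following sketch. It is an illustration under assumptions: all names are invented, and gzip is substituted for the compress command used in the text because compress is not installed on many current systems.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Rename a file.
echo data > file1
mv file1 file2

# Rename a directory: dir2 does not exist yet, so dir1 becomes dir2.
mkdir dir1
mv dir1 dir2

# Same command again: dir2 now exists, so the new dir1 is moved inside it.
mkdir dir1
mv dir1 dir2

# Move several files into a directory.
echo more > file3
mkdir dir3
mv file2 file3 dir3

# Run a compression at lowered priority (gzip stands in for compress).
nice gzip dir3/file3

ls -R
```

The two identical mv dir1 dir2 commands give different results because the outcome depends on whether the destination directory already exists.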
22. ps - list processes
ps displays information on processes currently running on your machine. This information includes the process id, the controlling terminal (if there is one), the cpu time used so far, and the name of the command being run.

ps
gives brief details of your own processes in your current session. To obtain full details of all your processes, including those from previous sessions use:

ps -fu user1
using your own user name in place of user1. ps is a command whose options vary considerably in different versions of Unix (such as BSD and SystemV). Use man ps for details of all the options available on the machine you are using.

23. pwd - display the name of your current directory
The command pwd gives the full pathname of your current directory.

24. quota - disk quota and usage
quota gives information on a user's disk space quota and usage.

quota
will only give details of where you have exceeded your disc quota on local disks, whereas

quota -v
will display your quota and usage, whether the quota has been exceeded or not, and includes information on disks mounted from other machines, as well as the local disks.

25. rm - remove files or directories
rm is used to remove files. In order to remove a file you must have write permission in its directory, but it is not necessary to have read or write permission on the file itself.

rm file1
will delete the file file1. If you use

rm -i file1
instead, you will be asked if you wish to delete file1, and the file will not be deleted unless you answer y. This is a useful safety check when deleting lots of files.

rm -r dir1
recursively deletes the contents of dir1, its subdirectories, and dir1 itself, and should be used with suitable caution.
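Because rm is irreversible, it is worth trying the pwd and rm examples above in a scratch directory first. In this sketch the answers to the rm -i prompt are supplied through a pipe so that it runs unattended; at a terminal you would type y or n yourself. All file names are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# pwd prints the full pathname of the current directory.
here=$(pwd)

mkdir -p dir1/sub
touch file1 file2 dir1/a dir1/sub/b

# Plain rm deletes without asking.
rm file1

# rm -i asks first; here the answer is supplied on standard input.
echo n | rm -i file2 2>/dev/null
if [ -e file2 ]; then after_no=kept; else after_no=deleted; fi
echo y | rm -i file2 2>/dev/null
if [ -e file2 ]; then after_yes=kept; else after_yes=deleted; fi

# rm -r removes dir1, its files, and its subdirectories.
rm -r dir1

echo "working in $here"
echo "after answering n: $after_no; after answering y: $after_yes"
```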
26. rmdir - remove a directory
rmdir removes named empty directories. If you need to delete a non-empty directory rm -r can be used instead.

rmdir exdir
will remove the empty directory exdir.

27. slogin - secure remote login program
slogin is used for logging onto a remote machine and for executing commands on a remote machine, and provides secure encrypted communications between the local and remote machines using an SSH protocol. The remote machine must be running an SSH server for such connections to be possible.

28. sort - sort and collate lines
The command sort sorts and collates lines in files, sending the results to the standard output. If no file names are given, sort acts on the standard input. By default, sort sorts lines using a character by character comparison, working from left to right, and using the order of the ASCII character set.

sort -d
uses "dictionary order", in which only letters, digits, and white-space characters are considered in the comparisons.

sort -r
reverses the order of the collating sequence.

sort -n
sorts lines according to the arithmetic value of leading numeric strings. Leading blanks are ignored when this option is used (except in some System V versions of sort, which treat leading blanks as significant. To be certain of ignoring leading blanks use sort -bn instead.).

29. telnet - remote login program
telnet communicates with another computer using the TELNET protocol.

telnet host1
will connect to the remote machine host1 (if it allows telnet connections). For example, using telnet to connect to the Central Unix Service,

telnet cus.cam.ac.uk

you can then log in using your user name on cus. If you use the escape character instead of logging out, you will enter telnet's command mode (you'll get the prompt telnet> ), and the command quit will get you back to the command line of your local machine.

As communications between the two machines are not encrypted when using telnet, it is preferable to use ssh, if that is available.
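The difference between the default character-by-character ordering of sort and the -n and -r options described above is easiest to see on a few numbers. The sketch below uses an invented four-line file; xargs is used only to print each result on a single line, and LC_ALL=C is set in the last example as an assumption, to force plain ASCII ordering regardless of the system's language settings.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '10\n9\n100\n1\n' > numbers

# Default: character by character, so "10" and "100" come before "9".
by_character=$(sort numbers | xargs)

# -n compares the lines as numbers; -r reverses the order.
by_value=$(sort -n numbers | xargs)
descending=$(sort -rn numbers | xargs)

# In the C locale, comparison follows ASCII order: capitals before lower case.
ascii_order=$(printf 'b\nB\na\nA\n' | LC_ALL=C sort | xargs)

echo "$by_character"
echo "$by_value"
echo "$descending"
echo "$ascii_order"
```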
Some Bioprogramming tools:

1. Bioconductor
Bioconductor is an open source and open development software project that aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data.

2. BioDAS
This site is the center of development of an Open Source system for exchanging annotations on genomic sequence data.

3. BioJava
The BioJava Project is an open-source project dedicated to providing Java tools for processing biological data.

4. BioMoby
BioMOBY is an international research project involving biological data hosts, biological data service providers, and coders whose aim is to explore various methodologies for biological data representation, distribution, and discovery.

5. BioPax
The BioPAX web site provides information about a collaborative effort to create a data exchange format for biological pathways.

6. BioPerl
The BioPerl Project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research.

7. BioPerl course
Great tutorial for those interested in the bioperl group of modules.

8. BioPHP
Open Source PHP code for bioinformatics. Includes functions and minitools (copy and paste one page scripts for basic tasks in bioinformatics). A wiki-like service allows modification and improvement of code.

9. BioPipe
The biopipe is a workflow framework that seeks to address some of the complexity involved in carrying out large scale bioinformatics analysis. It has been designed to work intimately with the bioperl package.

10. BioPython
The Biopython Project is an international association of developers of freely available Python tools for computational molecular biology.

11. BioRuby
The BioRuby project aims to implement an integrated environment for bioinformatics with Ruby.

12. CCT
CCT (Current Comparative Table) is a software package that you can install and set up on your own system to help you to maintain and search databases.

13. Ensembl API
Ensembl is a freely available software system for genomic analysis. The documentation page at Ensembl is the best place to get information on the Ensembl application programming interface (API). In particular, the tutorial document includes lots of examples of scripts and exercises for you to try.

14. Human Ageing Genomic Resources
The Human Ageing Genomic Resources (HAGR) website provides tools and curated databases relevant to the genetics of human ageing. GenAge is a database of genes related to human ageing, and AnAge is a multi-species database facilitating the comparative biology of ageing. The Ageing Research Computational Tools (ARCT) is a collection of Perl modules to assist comparative genomics research.

15. NCBI C++ toolkit
The NCBI C++ Toolkit is a collection of C++ modules developed by the NCBI for writing bioinformatics software and applications.

16. Open Bioinformatics Foundation
The Open Bioinformatics Foundation is a non profit, volunteer run organization focused on supporting open source programming in bioinformatics.

17. PyMOL
PyMOL is a molecular graphics system with an embedded Python interpreter designed for real-time visualization and rapid generation of high-quality molecular graphics images and animations.

18. R
System for statistical computation and graphics, an interpreted computer language which allows branching and looping as well as modular programming using functions.

19. Seqhound API
SeqHound is a bioinformatics application programming platform that provides access to biological sequence, structure and functional annotation data. An application programming interface (API) is available to programmers using C, C++, Java and Perl.

20. Systems Biology Markup Language (SBML)
The Systems Biology Markup Language (SBML) is a computer-readable format for representing models of biochemical reaction networks. SBML is applicable to metabolic networks, cell-signaling pathways, genomic regulatory networks, and many other areas in systems biology.