Bioinfo Lab Manual 2015 PDF

LABORATORY MANUAL
BT403 - BIOINFORMATICS LABORATORY
IV year / VII Semester B.TECH - BIOTECHNOLOGY

DEPARTMENT OF BIOTECHNOLOGY
NATIONAL INSTITUTE OF TECHNOLOGY, WARANGAL
TELANGANA-506 004

Course Instructors
Mr. Sri. P. Onkara Perumal
Dr. K. Divakar

Department of Biotechnology, NIT, Warangal
Bioinformatics Lab Manual

CONTENTS

Ex. No.  Name of the Experiment

BIOLOGICAL DATABASES
1. NCBI - Genbank
2. Pubchem database
3. UniprotKB
4. Protein databank
5. CATH & SCOP
6. KEGG

SEQUENCE ALIGNMENT TOOLS
7. Pair wise Alignment - BLAST
8. Multiple Sequence Alignment: CLUSTAL-OMEGA

PHYLOGENETIC ANALYSIS
9. Phylogenetic Tree Construction and Analysis

VISUALIZATION OF BIOMOLECULES
10. Sketching and Visualization of Small molecules - Avogadro
11. Protein Visualization using RASMOL

PROTEIN MODELLING
12. Protein Modelling using Swiss-Model
13. Protein Modelling using VLife MDS

PROTEIN DOCKING
14. Protein Docking using VLife MDS

PERL PROGRAMMING
15. Basic mathematical operations
16. Concatenating DNA
17. Transcribing DNA into RNA
18. Calculating the reverse complement of a strand of DNA
19. Reading protein sequence data from a file
20. Array operations

BIOLOGICAL DATABASE
Experiment 1
NCBI - Genbank

Aim: To retrieve the nucleotide sequence of lipase from Candida rugosa strain ATCC 14830 using the NCBI database and save the nucleotide sequence in FASTA format.

Description: NCBI (National Center for Biotechnology Information) is a key repository for biological sequences and related data. It is essential for all of us to be familiar with the contents and tools available at this repository. For example, for most of the whole-genome projects, NCBI hosts a mirror page and integrates the data into its database.
It is recommended that you visit the NCBI web site (http://www.ncbi.nlm.nih.gov) and explore the various pages on genome biology, the cancer anatomy project, clusters of orthologous groups, the human genome and pages for other genomes (including microbial genomes). Complete genome sequences of some of the deadly pathogens are available (e.g., M. leprae, M. genitalium, M. pneumoniae, H. influenzae, E. coli O157, etc.). In fact, for any newly discovered pathogen, the complete genome sequence is likely to be the first information available about the pathogen. The following assignment questions will help you become familiar with some of the databases.

Procedure:
1. Go to the NCBI website http://www.ncbi.nlm.nih.gov.
2. Type the search key word “Candida rugosa strain ATCC 14830 lipase”.
3. Select the nucleotide option from the pull-down menu near the search box.
4. Click on search to retrieve the sequences related to the search key term.
5. Report the first 10 hits.
6. Look for the entry with Genbank accession number “F1743697.1” and click on the entry.
7. Note down the information contained in the sequence page.
8. Click the FASTA option in the page to retrieve the nucleotide sequence in FASTA format.

Interpretation:

Results:

BIOLOGICAL DATABASE
Experiment 2
Pubchem database

Aim: To retrieve the 2D structure and the physical and chemical information of the chemical compound 4-Nitrophenyl palmitate from the NCBI PubChem database and save the structure of the compound in SDF format.

Description: PubChem, released in 2004, provides information on the biological activities of small molecules. PubChem is organized as three linked databases within NCBI's Entrez information retrieval system. These are PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool.
More information about using each component database may be found using the links on the homepage. Links from PubChem's chemical structure records to other Entrez databases provide information on biological properties. These include links to PubMed scientific literature and NCBI's protein 3D structure resource. Links to PubChem's bioassay database present the results of biological screening. Links to depositor web sites provide further information.

Procedure:
1. Go to the NCBI website http://www.ncbi.nlm.nih.gov.
2. Type the search key word “4-Nitrophenyl Palmitate”.
3. Select the PubChem Compound option from the pull-down menu near the search box.
4. Click on search to retrieve the compounds related to the search key term.
5. Report the first 10 hits.
6. Look for the entry with the name “4-Nitrophenyl Palmitate” and click on the entry.
7. Note down the information (2D structure, PubChem CID, other chemical names for this compound, molecular weight and other physical and chemical properties) contained in the page.
8. Click the download option in the page to retrieve the structure of the compound in SDF format.

Interpretation:

Results:

BIOLOGICAL DATABASE
Experiment 3
UniprotKB

Aim: To retrieve the protein sequence of lipase from Candida rugosa from the Swiss-Prot database and save the protein sequence in FASTA format.

Description: The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. UniProtKB consists of two sections:
1. Swiss-Prot: manually annotated records with information extracted from literature and curator-evaluated computational analysis.
2. TrEMBL: computationally analyzed records that await full manual annotation.

Procedure:
1. Go to the UniProtKB website http://www.uniprot.org.
2. Click on the Swiss-Prot (manually annotated and reviewed) link in the UniProtKB home page.
3. Type the search key word “Candida rugosa lipase”.
4. Click on search to retrieve the sequences related to the search key term.
5. Report the first 10 hits.
6. Look for the entry with UniProt accession number “P20261” and click on this accession number.
7. Note down the information (molecular functions, biological processes, amino acid sequence) contained in the sequence page.
8. Click the FASTA option in the page to retrieve the protein sequence in FASTA format.
9. If the UniProt accession number is already known, it can be typed directly in the search box to retrieve the sequence.

Interpretation:

Results:

BIOLOGICAL DATABASE
Experiment 4
Protein Databank

Aim: To retrieve the protein structure information of lipase from Candida rugosa from the RCSB Protein Data Bank and save the protein structure coordinate file in PDB format.

Description: The RCSB PDB builds upon the data by creating tools and resources for research and education in molecular biology, structural biology, computational biology, and beyond. This resource is powered by the Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. As a member of the wwPDB, the RCSB PDB curates and annotates PDB data.

Procedure:
1. Go to the RCSB Protein Data Bank website http://www.rcsb.org/pdb/home/home.do.
2. Type the search key word “Candida rugosa lipase”.
3. Click on search to retrieve the protein structure entries related to the search key term.
4. Report the first 10 hits.
5. From the list of hits, look for the entry with PDB ID “1CRL” and click on this PDB ID.
6. Note down the information (molecular functions, biological processes, amino acid sequence) contained in the structure summary page.
7. Explore and review the information contained in the annotation, sequence, experiment and literature pages for the above PDB entry.
8. Retrieve the experimental method information and the literature published based on this structure.
9. Download the PDB file by clicking on the Download Files option, then choosing the PDB file in text format.
10. If the PDB ID is already known, it can be typed directly in the search box to retrieve the information about that protein.

Interpretation:

Results:

BIOLOGICAL DATABASE
Experiment 5
CATH and SCOP

Aim: To identify a protein that shares structural homology with Candida rugosa lipase (PDB ID 1CRL) but catalyzes a different reaction, using the CATH and SCOP databases.

Description: CATH defines four classes: mostly-alpha, mostly-beta, alpha and beta, and few secondary structures. In order to better understand the CATH classification system it is useful to know how it is constructed: much of the work is done by automatic methods, however there are important manual elements to the classification. The very first step is to separate the proteins into domains. It is difficult to produce an unequivocal definition of a domain and this is one area in which CATH and SCOP differ. The domains are automatically sorted into classes and clustered on the basis of sequence similarities. These groups form the H levels of the classification.
The topology level is formed by structural comparisons of the homologous groups. Finally, the Architecture level is assigned manually.

The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.

Procedure:
1. Go to the CATH database website http://www.cathdb.info/.
2. To browse all the entries, click on the browse option and view all the entries in the CATH database.
3. To search for an entry based on its PDB ID, use the browse option and then type the PDB ID “1CRL”.
4. Click on search to retrieve the protein structure/superfamily entries related to the given protein 1CRL.
5. Click on the superfamily “3.40.50.1820”.
6. Look at the sequence/structure diversity map and find the entry which has sequence/structure similarity but a different function.
7. Report the PDB ID and function of the protein which you retrieved.

Interpretation:

Results:

BIOLOGICAL DATABASE
Experiment 6
KEGG

Aim: To find the pathways in which the enzyme phosphoglucomutase (E.C. 5.4.2.2) is involved in E. coli K12, and to map the possible reaction paths between sucrose and ethyl alcohol, using the KEGG database.

Description: KEGG (Kyoto Encyclopedia of Genes and Genomes) has been developed under the Japanese Human Genome Program. The primary objective of KEGG is to computerize the current knowledge of molecular interactions; namely, metabolic pathways, regulatory pathways, and molecular assemblies.
KEGG also maintains gene catalogs for all the organisms that have been sequenced and links each gene product to a component on the pathway. In addition, KEGG organizes a database of all chemical compounds in living cells and links each compound to a pathway component. Finally, KEGG aims at developing new bioinformatics technologies toward functional reconstruction.

Procedure:
1. Visit the KEGG website (http://www.genome.ad.jp/kegg).
2. Find out in how many pathways the enzyme phosphoglucomutase (E.C. 5.4.2.2) is involved in E. coli K12 (use the Pathway search tool in KEGG). Choose one of the pathways, and from the ortholog tables find out how many orthologs of this protein are listed. From the Pasteur: pgm link (http://bioweb.pasteur.fr/GenoList/Colibri/genome.cgi?gene_detail_name=pgm, which will be displayed after clicking the enzyme box in the pathway) find out the location of the gene, and the gene which is located immediately after this gene in the genome.
3. Using the tool provided at KEGG, generate a map of all possible reaction paths between sucrose and ethyl alcohol and identify the enzymes and intermediates involved.

Interpretation:

Results:

SEQUENCE ALIGNMENT TOOLS
Experiment 7
Pair wise Alignment - BLAST

Aim: To retrieve related sequences (protein/nucleotide) for the given nucleotide/protein sequence based on sequence similarity.

Description: Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.
The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm that seeks local as opposed to global alignments and is therefore able to detect relationships among sequences that share only isolated regions of similarity.

There are many different types of BLAST available from the main BLAST web page. Choosing the right one depends on the type of sequence you are searching with (long, short; nucleotide, protein) and the desired database. They are:
• blastn: Nucleotide-nucleotide BLAST
• blastp: Protein-protein BLAST
• PSI-BLAST: Position-Specific Iterative BLAST
• blastx: Nucleotide 6-frame translation-protein
• tblastx: Nucleotide 6-frame translation-nucleotide 6-frame translation
• tblastn: Protein-nucleotide 6-frame translation
• megablast: Large numbers of query sequences

Procedure:
1. Go to the NCBI BLAST tool webpage http://blast.ncbi.nlm.nih.gov/Blast.cgi.
2. Choose the BLAST program (blastn, blastp, blastx, tblastx, tblastn) based on the query sequence and the target you want to get.
3. Paste the given sequence in the Query sequence box.
4. Choose the search set/database from which you want to retrieve the sequences.
5. Optionally, select the name of the organism from which you want to retrieve similar sequences.
6. Click on BLAST to retrieve the similar/related sequence entries for the query sequence.
7. Report the first 10 hits.
8. From the list of hits, report the e-value, coverage, gaps and identity percentage for the top hit.
9. Note down the information (molecular functions, biological processes, amino acid sequence) contained in the structure summary page.
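Note: The identity, gap and coverage percentages reported in step 8 can be checked by hand from the aligned sequences. The following is a minimal sketch in Python, assuming the alignment is available as two gapped strings of equal length; the function name alignment_stats and the example sequences are hypothetical and are not part of BLAST. It covers a single aligned segment only, whereas the BLAST report may combine several segments when computing query coverage, and it does not compute the e-value.

```python
def alignment_stats(query_aln, subject_aln, query_length):
    """Summarise one gapped pairwise alignment given as two equal-length strings.

    query_length is the length of the full, ungapped query sequence.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")
    columns = len(query_aln)
    pairs = list(zip(query_aln, subject_aln))
    # A column counts as an identity when both residues match and are not gaps.
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    # A column counts as a gap when either sequence has a gap character.
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    # Query residues that take part in this aligned segment.
    query_residues = sum(1 for q in query_aln if q != "-")
    return {
        "identity_pct": 100.0 * identities / columns,
        "gap_pct": 100.0 * gaps / columns,
        "query_coverage_pct": 100.0 * query_residues / query_length,
    }


# Hypothetical example: a 12-residue query of which 9 residues are aligned.
stats = alignment_stats("ACGT-ACGTA", "ACGTTACCTA", query_length=12)
print(stats)
```

For this example there are 8 identical columns and 1 gap column out of 10, so the identity is 80%, the gaps are 10%, and the query coverage is 75%.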
Interpretation:

Results:

SEQUENCE ALIGNMENT TOOLS
Experiment 8
Multiple Sequence Alignment: CLUSTAL-OMEGA

Aim: To perform a multiple sequence alignment of the given set of sequences using Clustal Omega.

Description: Multiple alignments of protein sequences are important tools in studying sequences. The basic information they provide is the identification of conserved sequence regions. This is very useful in designing experiments to test and modify the function of specific proteins, in predicting the function and structure of proteins, and in identifying new members of protein families.

Sequences can be aligned across their entire length (global alignment) or only in certain regions (local alignment). This is true for pairwise and multiple alignments. Global alignments need to use gaps (representing insertions/deletions) while local alignments can avoid them, aligning regions between gaps. Clustal-O is a fully automatic program for global multiple alignment of DNA and protein sequences. The alignment is progressive and considers the sequence redundancy. Trees can also be calculated from multiple alignments. The program has some adjustable parameters with reasonable defaults.

Procedure:
1. Go to the CLUSTAL-O website http://www.ebi.ac.uk/clustalw/
2. Copy and paste the concatenated FASTA sequence file into the CLUSTALW data window:
> Sequence #1
> Sequence #2
3. Or upload a file that includes all your sequences in an acceptable format (.fasta, .txt).
4. Use the default alignment conditions provided by the program.
5. Note the different kinds of MSA output formats. One is the align format with numbers, and the second is the FASTA format with the aligned sequences joined end to end, with gaps in each sequence corresponding to the alignment.
6. When viewing your results, these are the consensus symbols used by ClustalW:
a. "*" means that the residues or nucleotides in that column are identical in all sequences in the alignment.
b. ":" means that conserved substitutions have been observed.
c. "." means that semi-conserved substitutions are observed.
7. If you would like to see your results in color, push the button labelled Show Colors. Click Hide Colors to remove the color.
8. Save this file for later reference.

Interpretation:

Results:

PHYLOGENETIC ANALYSIS
Experiment 9
Phylogenetic Tree Construction and Analysis

Aim: To construct a phylogenetic tree and analyze the evolutionary relationship between the given set of sequences.

Description: A phylogenetic tree, also known as a phylogeny, is a diagram that depicts the lines of evolutionary descent of different species, organisms, or genes from a common ancestor. Phylogenies are useful for organizing knowledge of biological diversity, for structuring classifications, and for providing insight into events that occurred during evolution. Furthermore, because these trees show descent from a common ancestor, and because much of the strongest evidence for evolution comes in the form of common ancestry, one must understand phylogenies in order to fully appreciate the overwhelming evidence supporting the theory of evolution.

Tree diagrams have been used in evolutionary biology since the time of Charles Darwin. Therefore, one might assume that, by now, most scientists would be exceedingly comfortable with "tree thinking", that is, reading and interpreting phylogenies. However, it turns out that the tree model of evolution is somewhat counterintuitive and easily misunderstood. This may be the reason why biologists have only in the last few decades come to develop a rigorous understanding of phylogenetic trees.
This understanding allows present-day researchers to use phylogenies to visualize evolution, organize their knowledge of biodiversity, and structure and guide ongoing evolutionary research.

Phylogeny.fr has been designed to provide a high performance platform that transparently chains programs relevant to phylogenetic analysis in a comprehensive and flexible pipeline. Although phylogenetic aficionados will be able to find most of their favorite tools and run sophisticated analyses, the primary philosophy of Phylogeny.fr is to assist biologists with no experience in phylogeny in analyzing their data in a robust way.

Procedure:
1. Go to the http://www.phylogeny.fr/ phylogenetic tree construction server webpage.
2. The Phylogeny.fr platform offers a phylogeny pipeline which can be executed through three main modes:
3. The "One Click mode" targets users that do not wish to deal with program and parameter selection. By default, the pipeline is already set up to run and connect programs recognized for their accuracy and speed (MUSCLE for multiple alignment and PhyML for phylogeny) to reconstruct a robust phylogenetic tree from a set of sequences.
4. In the "Advanced mode", the Phylogeny.fr server proposes the succession of the same programs but you can choose the steps to perform (multiple sequence alignment, phylogenetic reconstruction, tree drawing) and the options of each program.
5. The "A la carte mode" offers the possibility of running and testing more alignment and phylogeny programs, such as MUSCLE, ClustalW, T-Coffee, PhyML, BioNJ, TNT.
6. Alternatively, you have the possibility to run the different programs separately.
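Note: Distance-based tree programs such as BioNJ start from a matrix of pairwise distances computed from the multiple alignment. The following is a minimal sketch in Python of the simplest such measure, the p-distance (the fraction of compared columns that differ). The function names and the example sequences are hypothetical; this is not the code used by Phylogeny.fr, and PhyML, the default program in the One Click mode, uses maximum likelihood rather than a distance matrix.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing positions between two aligned sequences.

    Columns in which either sequence has a gap ("-") are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping sequence name to aligned sequence."""
    names = list(aligned)
    return {(x, y): p_distance(aligned[x], aligned[y]) for x in names for y in names}


# Hypothetical aligned sequences, for illustration only.
example = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "AC-TTCGA",
}
matrix = distance_matrix(example)
for (x, y), d in sorted(matrix.items()):
    print(x, y, round(d, 3))
```

Sequences separated by a smaller distance are expected to be joined closer together in the resulting tree.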
Interpretation:

Results:

VISUALIZATION OF BIOMOLECULES
Experiment 10
Sketching and Visualization of Small molecules - Avogadro

Aim: To sketch the structure of the given molecule (Aspirin) and visualize the structure of proteins and small molecules.

Description: A molecule editor is a computer program for creating and modifying representations of chemical structures. Molecule editors can manipulate chemical structure representations in either two or three dimensions. Two-dimensional editors generate output used as illustrations or for querying chemical databases. Three-dimensional molecule editors are used to build molecular models, usually as part of molecular modelling software packages. Avogadro is an advanced molecule editor and visualizer designed for cross-platform use in computational chemistry, molecular modeling, bioinformatics, materials science, and related areas. It offers flexible high quality rendering and a powerful plugin architecture.

Procedure:
1. When you open Avogadro without loading a file, it defaults to using the Draw Tool. This tool is the main one that will be used when drawing a new molecule.
2. Select the element you would like to use by clicking on the drop-down menu. If the element you would like to use is not in that list, then selecting 'Other...' will bring up a full periodic table.
3. Once you have selected an element you can left click on the black (currently empty) view. This will place one atom of the element you selected under the mouse cursor.
4. If the 'Adjust Hydrogens' option is checked then the appropriate number of hydrogens will be added to the molecule for normal valence. Removing the check from the box will stop this from being done.
5. If you find you have made a mistake or wish to delete a particular atom, then right clicking on the atom (or bond) will delete it.
6. Three force fields are available which can be used to optimize the structure of the molecules drawn. The default is MMFF94, which can handle most common organic structures.
7. Clicking on 'Extensions->Optimize Geometry' is usually enough to give you a reasonably well optimized structure.
8. Once you are satisfied with the molecule you have drawn you can save it using the 'File->Save As...' menu item.
9. Save the molecule in .mol2 format.

Interpretation:

Results:

VISUALIZATION OF BIOMOLECULES
Experiment 11
Protein Visualization using RASMOL

Aim: To retrieve the protein structure from the RCSB Protein Data Bank and visualize the structure in RasMol.

Description: Protein structure visualization using RasMol (software-based): RasMol is a program for visualization of molecular structure. RasMol is a powerful educational tool for showing the structure of DNA, proteins and smaller molecules. It is also a powerful research tool. It is easy to use and produces beautiful, space-filling, colored, 3-dimensional images.

Procedure:
1. Download the PDB (Protein Data Bank, http://www.rcsb.org/pdb/) file of Trypsin (PDB code: 1TRY), which consists of the co-ordinates of atoms (in 3-D). Save the file with the pdb extension. [Note: We can obtain the PDB code of a protein using the Text search option in PDB (SearchLite).]
2. Open this file in RasMol (Raswin) and view the three-dimensional structure. Download RasMol from: http://www.bernstein-plus-sons.com/software/rasmol
3. Highlight the Ser-His-Asp catalytic triad. The positions of these three residues in the sequence are given as: Asp (102), His (57) and Ser (195).
4. Type select followed by the residue location (example: select 102) in the RasMol command line, and then type colour followed by the colour name (example: colour Red). This will highlight the selected residue in the specified colour.
[Note: To see available colours, type help colours in the RasMol command line.]

Interpretation:

Results:

PROTEIN MODELLING
Experiment 12
Protein Modelling using Swiss-Model

Aim: To model the structure of the given protein sequence using the open source tool SWISS-MODEL.

Description: SWISS-MODEL (http://swissmodel.expasy.org) is a server for automated comparative modeling of three-dimensional (3D) protein structures. It pioneered the field of automated modeling starting in 1993 and is the most widely-used free web-based automated modeling facility today. In 2002 the server computed 120000 user requests for 3D protein models. SWISS-MODEL provides several levels of user interaction through its World Wide Web interface: in the 'first approach mode' only an amino acid sequence of a protein is submitted to build a 3D model. Template selection, alignment and model building are done completely automatically by the server. In the 'alignment mode', the modeling process is based on a user-defined target-template alignment. Complex modeling tasks can be handled with the 'project mode' using DeepView (Swiss-PdbViewer), an integrated sequence-to-structure workbench. All models are sent back via email with a detailed modeling report. WhatCheck analyses and ANOLEA evaluations are provided optionally. The reliability of SWISS-MODEL is continuously evaluated in the EVA-CM project. The SWISS-MODEL server is under constant development to improve the successful implementation of expert knowledge into an easy-to-use server.

Procedure:
1. Go to the NCBI home page again and follow the “BLAST” link. Submit the sequence to BLAST, searching against the PDB database alone.
2. The resulting sequences are from sequences deposited with a known 3D structure, and the four-digit PDB code is next to the word “pdb”.
3. Record the PDB codes for the different known 3D structures which align with the sequence.
4. Go to the PDB (http://www.rcsb.org/pdb/) and type in the four-digit code for one (or more) of the structures.
5. Check the different PDB codes to find the structure that was solved at the highest resolution.
6. Download one or more of these structures from the PDB. The files may have an “ent” file designation. These are equivalent to “pdb” files.
7. Run RASMOL and view your protein.
8. Now that you know that a reasonable structure exists, submit the sequence to the Swiss-Model web site: http://www.expasy.ch/swissmod/SWISS-MODEL.html
9. The SWISS-MODEL server may take ~0.5-3 hours to return the results of the modeling exercise. You don't have to submit a PDB file for SWISS-MODEL to use; it will use a mixture of all the top hits.
10. Save the models to a file and view them with RASMOL.

Interpretation:

Results:

PROTEIN MODELLING
Experiment 13
Protein Modelling using VLife MDS

Aim: To model the structure of the given protein sequence using VLife MDS tools.

Description: Protein structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence, that is, the prediction of its folding and its secondary, tertiary, and quaternary structure from its primary structure. Structure prediction is fundamentally different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes). The term "homology modeling", also called comparative modeling or template-based modeling (TBM), refers to modeling a protein 3D structure using a known experimentally determined structure of a homologous protein as a template.
A protein structure is always of great assistance in the study of protein function, dynamics, and interactions with ligands and other proteins, and in the pharmaceutical industry in structure-based drug discovery and drug design. Homology modeling can provide molecular biologists and biochemists with "low-resolution" structures, which will contain sufficient information about the spatial arrangement of important residues in the protein and which may guide the design of new experiments. For example, the design of site-directed mutagenesis experiments could be considerably improved if such "low-resolution" model structures could be used.

Procedure:
1. Open VLife MDS, then load the module for homology modelling under Module > Biopredicta.
2. The sequence of the target, human estrogen receptor P03372.seq, is provided in the Homology_Modelling folder (\Tutorial\BioPredicta\Homology_Modelling) in the FASTA format. This sequence can be downloaded in the FASTA format from the SWISSPROT website (http://www.expasy.ch/sprot/). Click on Biopredicta Tools >> BLAST >> Run.
3. Click on the Read FASTA File button. This will open a file browser window. Browse to the Homology_Modelling folder and choose the P03372.seq file.
4. Select the comparison matrix of choice from the Matrix list box. For our example we will use the default BLOSUM62 matrix. Choose No. Of Targets as 15. This number can be changed by the user to obtain more or fewer results from the BLAST run. Click on the Browse button to save the output file. In the dialog box, at the same location as that of the .seq file, type P03372. The output will be saved as P03372.out.
5. Click on the Run button. The output window shows the BLAST command that has been executed locally.
6. Click on Biopredicta Tools >> BLAST >> Edit to open the Sequence Alignment Editor box.
7. Click on Blast >> Open Blast Output to open a file browser window.
Browse to the Homology_Modelling folder, choose the file P03372.out and click Open.
8. Choose 1NDE as the template: click on Blast >> Open Blast Output, choose 1NDE and click on OK.
9. Click on BioPredicta Tools >> Homology Modeling to open the Homology Modeling dialog box.
10. Click on Browse to select the template, 1NDE_template.mol2, from the Homology_Modelling folder.
11. Click on Read FASTA File to read P03372.seq from the Homology_Modelling folder.
12. Click on Browse to read the Blast output file.
13. Click on Browse to save the homology model, browse to the Homology_Modelling folder, name the file Human_estrogen_receptorA and choose the mol2 format.
14. Details of the automated homology model building process can be checked in the Task Manager.
15. Once the job is complete, click on Open Result. This will display the homology model in the main MDS window.
16. To superimpose and find the RMSD between two proteins, load the homology model human_estrogen_receptora_1nde_model_xxx.mol2 and the structure 1L2I_A.pdb (the A chain of the PDB entry 1L2I) from the Homology_Modelling folder.
17. Then click on Biopredicta Tools >> Superimpose. Select 1L2I_A.pdb in the Molecule1 window and human_estrogen_receptora_1nde_model_xxx.mol2 in the Molecule2 window. Select the following residues in Molecule1 and Molecule2: 310-416, 418-459, 469-525. Choose the Backbone option and click on OK.
18. Click View >> Output window. In the output window or the main window, the RMSD is displayed as 3.5712 Å. The quality of the model depends on how similar the template is to the actual target. In our case the model has an RMSD of 3.5712 Å with the crystal structure, although the identity (59%) and similarity (83%) are moderately high.
19. The superimposed original estrogen receptor 1L2I_A.pdb (yellow) and human_estrogen_receptora_1nde_model (purple) are displayed as tube-shaped models.
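Note: The RMSD reported in step 18 is the root-mean-square deviation between paired atoms of the two structures, i.e. the square root of the mean squared distance between corresponding atoms. The following is a minimal sketch in Python, assuming the two structures have already been superimposed and that the atoms are listed in the same order. The function name rmsd and the coordinates are hypothetical; this is not the VLife MDS implementation, which also performs the superposition itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) tuples.

    Assumes both structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical backbone coordinates in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
value = rmsd(model, reference)
print(value)  # every atom is displaced by 1 angstrom, so the RMSD is 1.0
```

A lower RMSD indicates that the model backbone lies closer to the reference structure over the selected residues.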
Interpretation:

Results:

PROTEIN DOCKING

Experiment 14

Protein Docking using VLife MDS

Aim: To dock the given protein molecule with the ligand molecule by grid docking using VLife MDS.

Description: The application of computational methods to study the formation of intermolecular complexes is a subject of intensive research. A drug exerts its biological activity by binding to a pocket of its receptor molecule (usually a protein). In their binding conformations, the molecules exhibit geometric and chemical complementarity, both of which are essential for successful drug activity. The computational process of searching for a ligand that is able to fit both geometrically and energetically into the binding site of a protein is called molecular docking. Molecular docking helps in studying drug/ligand or receptor/protein interactions by identifying suitable active sites in the protein, obtaining the best geometry of the ligand-receptor complex, and calculating the energy of interaction for different ligands in order to design more effective ligands.

Procedure:

1. Open VLife MDS, then load the module under Module > BioPredicta.
2. Open the 1ROB.mol2 receptor file from the Tutorial\BioPredicta\1ROB\Targets\1ROB.mol2 path and one conformer of the ligand C2P.mol2 from the Tutorial\BioPredicta\1ROB\APIS\C2P path using the File >> Open function of the VLifeMDS main window.
3. Invoke the Grid Docking dialog box by clicking BioPredicta Tools >> Docking >> Grid Docking. Click Next to proceed to the GA Based Docking Wizard, Ligand Molecule Selection dialog box, which displays the ligand molecule with all its rotatable bonds, whose torsional angles can be flexed. If the flexible ligand option is chosen, different conformers of the chosen ligand are generated automatically by varying its rotatable bonds and are then docked with the given receptor. Choose C2P from Select Ligand and C2-N1-C9-C10 from Rotatable Bonds. Click > to move the selected rotatable bond to the Selected Rotatable Bonds panel.
4. Click Next to proceed to the GA Based Docking Wizard - Step 3 of 3, GA Parameter dialog box. Key in 400 as the number of Generations and retain the other GA default parameters.
5. Click Finish to start the GA-based docking process.
6. To compare the best ligand pose docked inside the receptor with the original co-crystallized ligand (1ROB_ligand_Original.mol2), open Tutorial\1ROB\ligands\1ROB_ligand_Original.mol2 and Tutorial\1ROB\Targets\1ROB_C2P_DockedComplex.mol2 using the File >> Open option of the VLifeMDS main window.
7. Access BioPredicta Tools >> Edit to open the Edit Biopolymer dialog box. Choose ligand C2P from the Select Molecule list and C2P126 from the Select Residue option, and check the Hilight option. The docked ligand C2P will be highlighted in yellow.
Interpretation:

Results:

PERL PROGRAMMING

Experiment 15

Basic mathematical operations

Aim: To perform basic mathematical operations (addition and subtraction) on two variables using PERL.

Program:

    #!/usr/bin/perl -w
    # Store the first number and print it
    $x = 10;
    print "\tThe value of first variable, x is : $x\n";
    # Store the second number and print it
    $y = 5;
    print "\tThe value of second variable, y is : $y\n";
    # Add the two numbers
    $sum = $x + $y;
    print "The sum of Two variables is : $sum\n";
    # Subtract the second number from the first
    $diff = $x - $y;
    print "The difference of Two variables is : $diff\n";
    exit;

Output:

            The value of first variable, x is : 10
            The value of second variable, y is : 5
    The sum of Two variables is : 15
    The difference of Two variables is : 5

Experiment 16

Concatenating DNA

Aim: To concatenate DNA sequences using PERL.

Program:

    #!/usr/bin/perl -w
    # Concatenating DNA

    # Store two DNA fragments into two variables called $DNA1 and $DNA2
    $DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
    $DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';

    # Print the DNA onto the screen
    print "Here are the original two DNA fragments:\n\n";
    print $DNA1, "\n";
    print $DNA2, "\n\n";

    # Concatenate the DNA fragments into a third variable and print them
    # Using "string interpolation"
    $DNA3 = "$DNA1$DNA2";
    print "Here is the concatenation of the first two fragments (version 1):\n\n";
    print "$DNA3\n\n";

    # An alternative way using the "dot operator":
    # Concatenate the DNA fragments into a third variable and print them
    $DNA3 = $DNA1 . $DNA2;
    print "Here is the concatenation of the first two fragments (version 2):\n\n";
    print "$DNA3\n\n";

    # Print the same thing without using the variable $DNA3
    print "Here is the concatenation of the first two fragments (version 3):\n\n";
    print $DNA1, $DNA2, "\n";

    exit;

Output:

    Here are the original two DNA fragments:

    ACGGGAGGACGGGAAAATTACTACGGCATTAGC
    ATAGTGCCGTGAGAGTGATGTAGTA

    Here is the concatenation of the first two fragments (version 1):

    ACGGGAGGACGGGAAAATTACTACGGCATTAGCATAGTGCCGTGAGAGTGATGTAGTA

    Here is the concatenation of the first two fragments (version 2):

    ACGGGAGGACGGGAAAATTACTACGGCATTAGCATAGTGCCGTGAGAGTGATGTAGTA

    Here is the concatenation of the first two fragments (version 3):

    ACGGGAGGACGGGAAAATTACTACGGCATTAGCATAGTGCCGTGAGAGTGATGTAGTA

Experiment 17

Transcribing DNA into RNA

Aim: To transcribe a DNA sequence into an RNA sequence using PERL.

Program:

    #!/usr/bin/perl -w
    # Transcribing DNA into RNA

    # The DNA
    $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

    # Print the DNA onto the screen
    print "Here is the starting DNA:\n\n";
    print "$DNA\n\n";

    # Transcribe the DNA to RNA by substituting all T's with U's.
    $RNA = $DNA;
    $RNA =~ s/T/U/g;

    # Print the RNA onto the screen
    print "Here is the result of transcribing the DNA to RNA:\n\n";
    print "$RNA\n";

    # Exit the program.
    exit;

Output:

    Here is the starting DNA:

    ACGGGAGGACGGGAAAATTACTACGGCATTAGC

    Here is the result of transcribing the DNA to RNA:

    ACGGGAGGACGGGAAAAUUACUACGGCAUUAGC
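The substitution used for transcription replaces only uppercase T. The following is a minimal sketch of a case-preserving variant, written with use strict and use warnings and using the tr operator instead of s///g. The subroutine name transcribe and the mixed-case test string are illustrative assumptions and are not part of the manual's program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the RNA version of a DNA string, preserving letter case.
# (Hypothetical helper name; the manual's program does this inline.)
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;   # T -> U and t -> u in one pass
    return $rna;
}

my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print "DNA: $DNA\n";
print "RNA: ", transcribe($DNA), "\n";

# Mixed-case input (made-up example) is handled as well.
print "RNA: ", transcribe('acgtACGT'), "\n";
```

Copying the string before applying tr leaves the original DNA untouched, which mirrors the manual's habit of keeping $DNA and working on a copy.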
Experiment 18

Calculating the reverse complement of a strand of DNA

Aim: To calculate the reverse complement of a strand of DNA using PERL.

Program:

    #!/usr/bin/perl -w
    # Calculating the reverse complement of a strand of DNA

    # The DNA
    $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

    # Print the DNA onto the screen
    print "Here is the starting DNA:\n\n";
    print "$DNA\n\n";

    # Calculate the reverse complement
    # Warning: this attempt will fail!
    #
    # First, copy the DNA into new variable $revcom
    # (short for REVerse COMplement)
    # Notice that variable names can use lowercase letters like
    # "revcom" as well as uppercase like "DNA". In fact,
    # lowercase is more common.
    #
    # It doesn't matter if we first reverse the string and then
    # do the complementation; or if we first do the complementation
    # and then reverse the string. Same result each time.
    # So when we make the copy we'll do the reverse in the same statement.
    $revcom = reverse $DNA;

    # Next substitute all bases by their complements,
    # A->T, T->A, G->C, C->G
    $revcom =~ s/A/T/g;
    $revcom =~ s/T/A/g;
    $revcom =~ s/G/C/g;
    $revcom =~ s/C/G/g;

    # Print the reverse complement DNA onto the screen
    print "Here is the reverse complement DNA:\n\n";
    print "$revcom\n";

    # Oh-oh, that didn't work right!
    # Our reverse complement should have all the bases in it, since the
    # original DNA had all the bases--but ours only has A and G!
    #
    # Do you see why?
    #
    # The problem is that the first two substitute commands above change
    # all the A's to T's (so there are no A's) and then all the
    # T's to A's (so all the original A's and T's are all now A's).
    # Same thing happens to the G's and C's all turning into G's.
    print "\nThat was a bad algorithm, and the reverse complement was wrong!\n";
    print "Try again ... \n\n";

    # Make a new copy of the DNA (see why we saved the original?)
    $revcom = reverse $DNA;

    # See the text for a discussion of tr///
    $revcom =~ tr/ACGTacgt/TGCAtgca/;

    # Print the reverse complement DNA onto the screen
    print "Here is the reverse complement DNA:\n\n";
    print "$revcom\n";
    print "\nThis time it worked!\n\n";

    exit;

Output:

    Here is the starting DNA:

    ACGGGAGGACGGGAAAATTACTACGGCATTAGC

    Here is the reverse complement DNA:

    GGAAAAGGGGAAGAAAAAAAGGGGAGGAGGGGA

    That was a bad algorithm, and the reverse complement was wrong!
    Try again ...

    Here is the reverse complement DNA:

    GCTAATGCCGTAGTAATTTTCCCGTCCTCCCGT

    This time it worked!

Experiment 19

Reading protein sequence data from a file

Aim: To read protein sequence data from a file using PERL.

Program:

    #!/usr/bin/perl -w
    # Reading protein sequence data from a file

    # The filename of the file containing the protein sequence data
    $proteinfilename = 'NM_021964fragment.pep';

    # First we have to "open" the file, and associate a "filehandle" with it.
    # We choose the filehandle PROTEINFILE for readability.
    # Stop with an error message if the file cannot be opened.
    open(PROTEINFILE, $proteinfilename)
        or die "Cannot open $proteinfilename: $!\n";

    # Now we do the actual reading of the protein sequence data from the file,
    # by using the angle brackets < and > to get the input from the filehandle.
    # We store the data into our variable $protein.
    #
    # Since the file has three lines, and since the read returns only
    # one line, we'll read a line and print it, three times.

    # First line
    $protein = <PROTEINFILE>;

    # Print the protein onto the screen
    print "\nHere is the first line of the protein file:\n\n";
    print $protein;

    # Second line
    $protein = <PROTEINFILE>;

    # Print the protein onto the screen
    print "\nHere is the second line of the protein file:\n\n";
    print $protein;

    # Third line
    $protein = <PROTEINFILE>;

    # Print the protein onto the screen
    print "\nHere is the third line of the protein file:\n\n";
    print $protein;

    # Now that we've got our data, we can close the file.
    close PROTEINFILE;

    exit;

Output:

    Here is the first line of the protein file:

    MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD

    Here is the second line of the protein file:

    SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ

    Here is the third line of the protein file:

    GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR

Experiment 20

Array operations

Aim: To perform various array operations using PERL.

Program 1: Pop operation on arrays

    #!/usr/bin/perl -w
    @bases = ('A', 'C', 'G', 'T');
    # pop removes and returns the last element
    $base1 = pop @bases;
    print "Here's the element removed from the end: ";
    print $base1, "\n\n";
    print "Here's the remaining array of bases: ";
    print "@bases";

Output:

    Here's the element removed from the end: T

    Here's the remaining array of bases: A C G

Program 2: Shift operation on arrays

    #!/usr/bin/perl -w
    @bases = ('A', 'C', 'G', 'T');
    # shift removes and returns the first element
    $base2 = shift @bases;
    print "Here's an element removed from the beginning: ";
    print $base2, "\n\n";
    print "Here's the remaining array of bases: ";
    print "@bases";

Output:

    Here's an element removed from the beginning: A

    Here's the remaining array of bases: C G T

Program 3: Unshift operation on arrays

    #!/usr/bin/perl -w
    @bases = ('A', 'C', 'G', 'T');
    $base1 = pop @bases;
    # unshift puts an element at the beginning of the array
    unshift (@bases, $base1);
    print "Here's the element from the end put on the beginning: ";
    print "@bases\n\n";

Output:

    Here's the element from the end put on the beginning: T A C G

Program 4: Push operation on arrays

    #!/usr/bin/perl -w
    @bases = ('A', 'C', 'G', 'T');
    $base2 = shift @bases;
    # push puts an element at the end of the array
    push (@bases, $base2);
    print "Here's the element from the beginning put on the end: ";
    print "@bases\n\n";

Output:

    Here's the element from the beginning put on the end: C G T A

Program 5: Reverse of an array

    #!/usr/bin/perl -w
    @bases = ('A', 'C', 'G', 'T');
    # reverse returns the elements in the opposite order
    @reverse = reverse @bases;
    print "Here's the array in reverse: ";
    print "@reverse\n\n";

Output:

    Here's the array in reverse: T G C A
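The four operations pop, shift, unshift and push can be combined to rotate an array, as Programs 3 and 4 do inline. The sketch below wraps those two combinations in subroutines under use strict and use warnings. The names rotate_left and rotate_right are hypothetical helpers chosen for illustration and do not appear in the manual.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rotate a list one position to the left:
# remove the first element with shift, append it with push.
sub rotate_left {
    my @items = @_;
    return @items unless @items;   # nothing to rotate
    push @items, shift @items;
    return @items;
}

# Rotate a list one position to the right:
# remove the last element with pop, prepend it with unshift.
sub rotate_right {
    my @items = @_;
    return @items unless @items;   # nothing to rotate
    unshift @items, pop @items;
    return @items;
}

my @bases = ('A', 'C', 'G', 'T');

my @left  = rotate_left(@bases);
my @right = rotate_right(@bases);

print "Original:      @bases\n";
print "Rotated left:  @left\n";
print "Rotated right: @right\n";
```

Each subroutine works on a copy of its arguments, so @bases itself is left unchanged; the manual's programs, by contrast, modify @bases in place.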
