following list.
Chapter 1: Historical Introduction and Overview
Chapter 2: Collecting and Storing Sequences in the Laboratory
Chapter 3: Alignment of Pairs of Sequences
Chapter 4: Introduction to Probability and Statistical Analysis of Sequence Alignments
Chapter 5: Multiple Sequence Alignment
Chapter 6: Sequence Database Searching for Similar Sequences
Chapter 7: Phylogenetic Prediction
Chapter 8: Prediction of RNA Secondary Structure
Chapter 9: Gene Prediction and Regulation
Chapter 10: Protein Classification and Structure Prediction
Chapter 11: Genome Analysis
Chapter 12: Bioinformatics Programming Using Perl and Perl Modules
Chapter 13: Analysis of Microarrays

Chapter 1: Historical Introduction and Overview
This chapter describes how bioinformatics has evolved into a new field of scientific investigation, describes the roles of biological and computational research in this field, and provides a brief historical account. Also provided is an overview of the chapters in this second edition. References to earlier and current reference books, articles, reviews, and journals provide a broader view of the field.
 
Chapter 2: Collecting and Storing Sequences in the Laboratory
This chapter summarizes methods used to collect sequences of DNA molecules and store them in computer files. It describes procedures ranging from the actual sequencing, through determination of accuracy, choice of sequence format, and conversions from one format to another, to storage in databases and accessing sequences in databases.
Table 2.5. Major sequence databases accessible through the Internet

1. GenBank at the National Center for Biotechnology Information, National Library of Medicine, Washington, D.C., accessible from:
   http://www.ncbi.nih.gov/Entrez/
2. European Molecular Biology Laboratory (EMBL) Outstation at Hinxton, England
   http://www.ebi.ac.uk/embl/index.html
3. DNA DataBank of Japan (DDBJ) at Mishima, Japan
   http://www.ddbj.nig.ac.jp/
4. Protein Information Resource (PIR) database at the National Biomedical Research Foundation in Washington, D.C. (see Barker et al. 1998), an annotated protein database
   http://www-nbrf.georgetown.edu/pirwww/
5. The SwissProt protein sequence database at ISREC, Swiss Institute for Experimental Cancer Research in Epalinges/Lausanne, an annotated protein database
   http://www.expasy.org/cgi-bin/sprot-search-de
6. The Sequence Retrieval System (SRS) at the European Bioinformatics Institute allows both simple and complex concurrent searches of one or more sequence databases. The SRS system may also be used on a local machine to assist in the preparation of local sequence databases.
   http://srs6.ebi.ac.uk

The databases are available at the indicated addresses and return sequence files through an Internet browser. Many of the sites shown provide access to multiple databases. The first three database centers are updated daily and exchange new sequences daily, so it is only necessary to access one of them. Additional Web addresses of databases of protein families and structure, and of genomic databases, are given in Chapters 10 and 11. These databases can also provide access to sequences of a protein family or organism.

The annotated protein data banks traditionally examine the scientific literature for physical evidence that the protein is actually produced in cells. The presence of mRNA sequences reveals that the gene is expressed but does not reveal whether or not the mRNA is translated into a protein. However, some proteins may be difficult to detect because they are made in small quantities, in specific cells or tissues, or at a particular time in development. Codon use by the mRNA of suspect genes can be examined for consistency with codon use by other genes that are known to be translated, as discussed in Chapter 9.
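
The codon-use comparison mentioned above can be illustrated with a small sketch. This is not the method of Chapter 9; it is a minimal example, assuming that each gene is available as an in-frame coding sequence and that a simple sum of absolute frequency differences is an acceptable measure of inconsistency. The function names `codon_frequencies` and `usage_distance` are hypothetical and were chosen for this illustration.

```python
from collections import Counter


def codon_frequencies(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")      # accept mRNA or DNA letters
    usable = len(cds) - len(cds) % 3         # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return {}
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def usage_distance(freq_a, freq_b):
    """Sum of absolute frequency differences (illustrative measure only).

    0.0 means identical codon use; 2.0 means no codons in common.
    """
    codons = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in codons)


# Toy sequences, made up for the example; real use would pool many known genes.
known_translated = codon_frequencies("ATGGCTGCTAAAGCTTAA")
suspect_gene = codon_frequencies("AUGGCUGCUAAAGCUUAA")
print(usage_distance(known_translated, suspect_gene))
```

A small distance suggests that the suspect gene uses codons much as the known translated genes do. In practice the reference profile would be built from many genes, and a statistical test would replace this raw distance.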
Problems > Chapter 2
 
THE WWW SITES TO USE FOR THESE PROBLEMS ARE:

Entrez: http://www.ncbi.nlm.nih.gov/entrez/
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/
SRS: http://srs.ebi.ac.uk/
SGD: http://www.yeastgenome.org/
PIR: http://pir.georgetown.edu/
SwissProt: http://www.expasy.ch/sprot/
READSEQ: http://searchlauncher.bcm.tmc.edu/seq-util/readseq.html (or do a Web search for Readseq to locate another site)
The Institute for Genomic Research (TIGR): http://www.tigr.org

1. This problem practices using the Entrez search program at the National Center for Biotechnology Information (NCBI) to perform a search for the amino acid sequence of the human heat shock factor HSF1. Normally a large number of matches are found in such searches. We will use the Entrez Boolean search features, which restrict the reported matches to those meeting a series of required conditions. This feature allows us to narrow the search to the sequence that we want.

The SRS Web site given above also provides powerful database search routines especially designed for the retrieval of large data sets. The student is encouraged to repeat some of the following exercises on this site.

a. Go to the Entrez Web site and choose Protein from the drop-down window in the upper left.

b. Enter the terms <heat shock factor> (without the angled brackets) in the search window and click the mouse on GO. This search finds any sequence entry in the available protein sequence databases that has these three words anywhere in the text. Show how many matches (hits) are found by clicking History.

c. Now reduce the search by entering the same terms but surrounding them with quotes: "heat shock factor". The matches must now include this phrase. This time click Preview to go directly to the number of hits in the protein database. What is the number now?

d. Now limit the search by clicking the mouse on Preview/Index, go to Add Terms, choose Organism in the first box, type human in the second, then click AND to limit the search to just human proteins, and then click Preview. The history will now show the results of a search for database entries with the term "heat shock factor" AND originating from humans as the organism. How many hits are there now?

e. We can limit the hits to matches in RefSeq, which is GenBank's annotated sequence database, to give a best representative sequence entry for each protein. Click the mouse on Limits, and in the Limited To section of the page, ignore the boxes on the left, and choose RefSeq in the right box. Then click GO and History. Now we have all human heat shock factors in RefSeq.

f. The gene of interest is HSF1. Click Clear in the text entry box at the top of the page, type HSF1, and click Preview. There should now be one entry left in History. Clicking
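
The Boolean narrowing in steps b through f can also be written out as query strings. The sketch below only builds the strings and the corresponding request URLs; it does not contact NCBI. It rests on several assumptions that are not part of the problem text: that the NCBI E-utilities `esearch.fcgi` endpoint is used instead of the Web forms, that `human[Organism]` reproduces the organism restriction, and that `srcdb_refseq[PROP]` reproduces the RefSeq limit. The helper names `and_terms` and `esearch_url` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not named in the problem text).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def and_terms(*clauses):
    """Join non-empty search clauses with the Boolean operator AND."""
    return " AND ".join(c for c in clauses if c)


def esearch_url(term, db="protein"):
    """Build (but do not send) an ESearch request URL for the given term."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


step_b = "heat shock factor"                         # three separate words
step_c = '"heat shock factor"'                       # exact phrase
step_d = and_terms(step_c, "human[Organism]")        # restrict to human
step_e = and_terms(step_d, "srcdb_refseq[PROP]")     # assumed RefSeq filter
# Step f, assuming the organism and RefSeq restrictions still apply:
step_f = and_terms("HSF1", "human[Organism]", "srcdb_refseq[PROP]")

for label, term in [("b", step_b), ("c", step_c), ("d", step_d),
                    ("e", step_e), ("f", step_f)]:
    print(label, term)
    print("  ", esearch_url(term))
```

Each step adds one required condition, so the number of hits should stay the same or fall from b through e, which is the behavior the problem asks the student to observe in History.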