THE WWW SITES TO USE FOR THESE PROBLEMS ARE:

Entrez  http://www.ncbi.nlm.nih.gov/entrez/
LocusLink  http://www.ncbi.nlm.nih.gov/LocusLink/
SRS  http://srs.ebi.ac.uk/
SGD  http://www.yeastgenome.org/
PIR  http://pir.georgetown.edu/
SwissProt  http://www.expasy.ch/sprot/
READSEQ  http://searchlauncher.bcm.tmc.edu/seq-util/readseq.html (or do a Web search for Readseq to locate another site)
The Institute for Genomic Research (TIGR)  http://www.tigr.org

1. This problem practices using the Entrez search program at the National Center for Biotechnology Information (NCBI) to search for the amino acid sequence of the human heat shock factor HSF1. Such searches normally return a large number of matches. We will use the Entrez Boolean search features, which restrict the reported matches to those meeting a series of required conditions. This lets us narrow the search to the sequence that we want.

The SRS Web site given above also provides powerful database search routines especially designed for the retrieval of large data sets. The student is encouraged to repeat some of the following exercises on that site.

a. Go to the Entrez Web site and choose Protein from the drop-down window in the upper left.

b. Enter the terms <heat shock factor> (without the angled brackets) in the search window and click the mouse on GO. This search finds any sequence entry in the available protein sequence databases that has these three words anywhere in the text. Click History to show how many matches (hits) are found.

c. Now reduce the search by entering the same terms but surrounding them with quotes: "heat shock factor". The matches must now include this exact phrase. This time click Preview to go directly to the number of hits in the protein database. What is the number now?

d. Now limit the search by clicking the mouse on Preview/Index. Go to Add Terms, choose Organism in the first box, type human in the second, click AND to limit the search to just human proteins, and then click Preview. The history will now show the results of a search for database entries containing the term "heat shock factor" AND originating from humans as the organism. How many hits are there now?

e. We can limit the hits to matches in RefSeq, NCBI's annotated reference sequence database, to give a best representative sequence entry for each protein. Click the mouse on Limits, and in the Limited To section of the page, ignore the boxes on the left and choose RefSeq in the right box. Then click GO and History. Now we have all human heat shock factors in RefSeq.

f. The gene of interest is HSF1. Click Clear in the text entry box at the top of the page, type HSF1, and click Preview. There should now be one entry left in History. Clicking
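
The narrowing in steps c through f can also be written as a single Boolean query string. The following is a minimal sketch, not part of the original exercise. It assumes the present-day NCBI E-utilities ESearch endpoint, the [Organism] field tag, and the srcdb_refseq[PROP] property for RefSeq records. The helper names build_term and build_count_url are hypothetical, and adding HSF1 as one more AND term is an assumption about how step f combines with the earlier limits. The code only builds the query strings and URLs and makes no network request.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not named in the exercise).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(phrase, organism=None, refseq_only=False, extra=None):
    """Join the search conditions with AND, mirroring steps c through f.

    Hypothetical helper: the phrase is quoted so it must match exactly (step c).
    """
    parts = [f'"{phrase}"']
    if organism:
        # Step d: restrict by organism (assumed field tag).
        parts.append(f"{organism}[Organism]")
    if refseq_only:
        # Step e: restrict to RefSeq records (assumed property name).
        parts.append("srcdb_refseq[PROP]")
    if extra:
        # Step f: an additional free-text term such as the gene name.
        parts.append(extra)
    return " AND ".join(parts)


def build_count_url(term, db="protein"):
    """Return an ESearch URL that asks only for the number of hits."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "rettype": "count"})


if __name__ == "__main__":
    steps = [
        build_term("heat shock factor"),
        build_term("heat shock factor", organism="human"),
        build_term("heat shock factor", organism="human", refseq_only=True),
        build_term("heat shock factor", organism="human", refseq_only=True, extra="HSF1"),
    ]
    for term in steps:
        print(term)
        print("  " + build_count_url(term))
```

Each successive term adds one AND condition, so the hit count can only stay the same or shrink from one step to the next, which is the behavior the History page displays.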