Accessing Molecular Data & Web Tools.

Somchai Saengamnatdej
April 25, 2010

1. Retrieving a genome/chromosome

EBI genomes (http://www.ebi.ac.uk/genomes/)


● For downloading a chromosome.
● Click on eukaryota to see a list.
● Scroll down to find the chromosome you want.
● Click on the accession number, then save file.

Genomes online database, GOLD (http://genomesonline.org)


● To retrieve a genome.
● Click on 'Enter GOLD'
● Click on 'Search GOLD'
● Type the name of your organism in the 'Organism Name' box.
● When a list appears, click on the link in the 'Data-Search' column. You will be led to the entry in
NCBI/GenBank.

Artemis (If you know the accession number)


● Click 'File', then 'Open from EBI-Dbfetch'.
● Type in the accession number.
● Click OK.
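
The same retrieval by accession number can also be scripted. The following is a minimal sketch, assuming the EBI Dbfetch REST endpoint and the database name 'ena_sequence' shown in the code; both are assumptions that should be checked against the current EBI documentation, and 'XX000000' is a placeholder rather than a real accession number.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the EBI Dbfetch REST service; verify before use.
DBFETCH_URL = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(accession, db="ena_sequence", fmt="embl"):
    """Build a Dbfetch query URL for one accession number."""
    query = urlencode({"db": db, "id": accession, "format": fmt, "style": "raw"})
    return f"{DBFETCH_URL}?{query}"


def fetch_entry(accession, db="ena_sequence", fmt="embl"):
    """Download the entry as text (requires network access)."""
    with urlopen(dbfetch_url(accession, db, fmt)) as response:
        return response.read().decode("utf-8")


# "XX000000" is a placeholder, not a real accession number.
print(dbfetch_url("XX000000"))
```

Calling fetch_entry() with a real accession number should return the flat-file text that Artemis would otherwise load through 'Open from EBI-Dbfetch'.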

2. Retrieving a protein sequence.

SRS (http://srs.ebi.ac.uk)
● In the Quick Text Search window, select 'Protein' in the find box, and type the name of your protein in the
matching box. Then, click 'Search'.
● When the list shows up, go to the entry with the 'accession number' you want.
● Tick in the box at the start of the entry.
● In the 'Display Options' window, select 'UniprotView' in the 'view results using:' box.
● Then, click on the 'Apply Display Options' button.
● When the list window appears, double-click on the UniProtKB entry to open it.
● When the full entry shows up, scroll through the entry. (General information, description &
origin of the protein, published/unpublished references, comments on the function of the gene,
database cross references, keyword, sequence features, & sequence.)
● Click on the hyper-linked text to go to the database entries.
● Go back to the query list page.
● Now, tick the box at the start of the entry again.
● In the 'Result Options' window, select 'FastA' in the 'Launch analysis tool' box.
● Click 'Save'.
● When the new window shows up, select 'FastaSeqs' in the 'save with' box.
● In the 'Output To' window, select 'Browser Window (HTML)'.
● Click 'Save'.
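
Once the sequence has been saved in FASTA format, it can be read back into a script before being pasted into the tools in the next section. This is a minimal sketch of a FASTA reader; the record shown is a toy example made up for illustration, not a real database entry.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record for illustration only; not a real database entry.
example = """>toy_protein hypothetical example
MKTAYIAKQR
QISFVKSHFS
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```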

3. Annotation of a gene.

Protein domain prediction

PROSITE (http://www.expasy.ch/prosite/)
● Paste the protein sequence retrieved from a database in the box provided.
● Click on 'Scan'.
● The results viewer shows a list of PROSITE hits. Click on the individual hits to go to the
specific entries and read their descriptions.
● There is a high level of false positives because PROSITE motif patterns are generally small and
rarely cover complete domains.
● The more reliable methods (Pfam, SMART) use hidden Markov models (HMMs): they search the sequence
against a library of HMMs describing hundreds of conserved domains.

Pfam (http://pfam.sanger.ac.uk/)
● Select 'SEQUENCE SEARCH'
● Paste your protein sequence in the box.
● Click 'Go'
● A 'progress' window appears.
● Then, the search results window shows up.
● There is a list of 'significant' & 'insignificant' matches and an interactive graphical output.
● Click on the link in the 'Family' column to go to the entry.
● In the Pfam entry page, click on the tabs at the top (Domain organization & Species
distribution)
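
The earlier PROSITE remark, that short motif patterns give many false positives, can be seen by scanning a sequence with a pattern directly. This is a simplified sketch, not the ScanProsite implementation: it translates PROSITE pattern syntax into a regular expression and reports every hit, using the N-glycosylation pattern N-{P}-[ST]-{P}. The function names and the test sequence are made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles x, [..], {..}, (n), (n,m) and the < and > anchors. This is a
    simplified translation, not the official ScanProsite implementation.
    """
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")   # {P}  -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")    # x(2) -> x{2}
    regex = regex.replace("x", ".")                      # x    -> any residue
    regex = regex.replace("<", "^").replace(">", "$")    # sequence ends
    return regex


def scan(sequence, pattern):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    lookahead = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# Toy sequence invented for illustration.
toy = "MKNGSAANPTQNVTW"
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(scan(toy, "N-{P}-[ST]-{P}"))
```

A four-residue pattern like this one matches by chance in many unrelated proteins, which is why profile-based searches such as Pfam are usually more reliable.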

SMART (http://www.embl-heidelberg.de/)
● Paste the protein sequence into the box.
● Select all the search options available.
● Click on 'Sequence SMART' to run.
● Output
● Schematic output.
● Description of the programs that are used to produce the schematic output.
● Interaction network.
● Other output including BLAST results.

InterPro (http://www.ebi.ac.uk/interpro/)
● A database of protein families, domains and functional sites.
● Identifiable features found in known proteins can be applied to unknown protein sequences.
● The icons at the bottom of the page are about the databases involved.
● Enter the InterProScan Sequence Search page by clicking on 'InterProScan' (in the left
column).
● The submission form appears.
● Paste the sequence of your protein in the box.
● In 'Results', select 'interactive'.
● Check all in 'APPLICATIONS TO RUN'
● Then, click the 'Submit Job' button.
● The temporary window appears.
● The results page shows up with a list of hits. Click on the 'InterPro' link to go to the entry.

BLOCKS (http://www.blocks.fhcrc.org/)
TIGRfam (http://www.tigr.org/TIGRFAMs/)
PRINTS (http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/)
ProDom (http://prodom.prabi.fr/prodom/current/html/home.php)

Transmembrane predictions

TMHMM (http://www.cbs.dtu.dk/services/TMHMM/)
● Open the TMHMM v2.0 server page from the URL above.
● Paste the protein sequence in the box.
● Select output format as 'Extensive, with graphics'
● Click on 'Submit'
● The results are in tabular output and graphics.
● Count how many transmembrane domains are predicted in the protein. Try 'TMPRED' at the URL below to
compare the result.
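
For a quick sanity check on the number of predicted helices, a classical hydropathy scan can be run locally. This is NOT the method used by TMHMM (which is HMM-based) or TMPRED. It is a minimal sketch of a Kyte-Doolittle sliding-window scan, assuming the commonly quoted window of 19 residues and threshold of 1.6; the function names and the toy sequence are made up for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]  # raises KeyError on non-standard letters
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def count_hydrophobic_segments(seq, window=19, threshold=1.6):
    """Count runs of consecutive windows whose mean exceeds the threshold."""
    count, inside = 0, False
    for value in hydropathy_profile(seq, window):
        if value > threshold and not inside:
            count += 1
        inside = value > threshold
    return count


# Artificial sequence: two hydrophobic stretches separated by charged residues.
toy = "DEKR" * 5 + "L" * 22 + "DEKR" * 5 + "I" * 22 + "DEKR" * 5
print(count_hydrophobic_segments(toy))
```

If this crude count disagrees strongly with TMHMM and TMPRED, the server predictions are the ones to trust. The scan only flags hydrophobic stretches and knows nothing about topology or signal peptides.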

TMPRED (http://www.ch.embnet.org/software/TMPRED_form.html)

PHOBIUS (http://phobius.cgb.ki.se/)

Signal peptide prediction

SignalP (http://www.cbs.dtu.dk/services/SignalP/)
● Go to the SignalP 3.0 Server page.
● Paste the protein sequence into the box.
● Select your search options and output format.
● Click on 'Submit' button.
● The prediction results are graphical, tabular, and SignalP-HMM outputs.
● Try 'PSORT' at the following URL to compare the results.

PSORT (http://psort.nibb.ac.jp/)

RNA annotation

tRNA Scan (http://selab.janelia.org/tRNAscan-SE/)


● Go to the tRNAscan-SE server.
● Select your sequence format, source organism, analysis type, & output format.
● Paste the genome DNA sequence in the box.
● Click on 'Run tRNAscan-SE'.
Rfam (http://www.sanger.ac.uk/Software/Rfam/ or http://Rfam.sanger.ac.uk/)

4. Access the sequence read archive.

Sequence Read Archive (SRA) (http://www.ebi.ac.uk/ena)


● Type 'RNA-seq Plasmodium falciparum' into the box, in All Databases.
● When the new page appears, click on 'Nucleotide Sequences'.
● In a list on a new page, click on the link 'Experiments'
● A list of all the RNA experiments will show up.
● Click on the red arrow at the end of the line of the entry to expand the window.
● Then, click on 'Runs' at the end of the window.
● The SRA Run Record shows up and allows you to download the RNA-seq data.
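
The download links for a run can also be looked up from a script. This is a minimal sketch, assuming the ENA Portal API 'filereport' endpoint and the field names 'run_accession' and 'fastq_ftp' shown in the code; these are assumptions to verify against the current ENA documentation. The accession and file paths in the sample table are invented placeholders.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint; verify before use.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Build a file-report query URL for a study, experiment or run accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Map each run accession to its list of FASTQ file locations."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["run_accession"]: [p for p in row["fastq_ftp"].split(";") if p]
            for row in reader}


# Invented sample response; the accession and paths are placeholders.
sample = ("run_accession\tfastq_ftp\n"
          "RUN0001\thost.example/a_1.fastq.gz;host.example/a_2.fastq.gz\n")
print(filereport_url("RUN0001"))
print(parse_filereport(sample))
```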

Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra)

5. Other web-based resources.

Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)
Blast searches (http://www.ncbi.nlm.nih.gov/BLAST)
Fasta searches (http://www.ebi.ac.uk/fasta33/)

Expasy Molecular Biology Server: (http://ca.expasy.org/)
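
Entrez records can be retrieved from a script as well as through the web pages. This is a minimal sketch using the NCBI E-utilities 'efetch' service to fetch a protein record in FASTA format. 'XX000000' is a placeholder rather than a real accession number, and NCBI's usage guidelines (request rate, contact e-mail) should be checked before running many queries.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="protein", rettype="fasta"):
    """Build an efetch URL that returns one record as plain text."""
    query = urlencode({"db": db, "id": accession,
                       "rettype": rettype, "retmode": "text"})
    return f"{EUTILS_EFETCH}?{query}"


def fetch_record(accession, db="protein", rettype="fasta"):
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")


# "XX000000" is a placeholder, not a real accession number.
print(efetch_url("XX000000"))
```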

References
See my previous documents.