
Biological Databases
Exercise 1: Text searches

Medline - PubMed http://www.ncbi.nlm.nih.gov/pubmed


Database of the National Library of Medicine. It contains scientific publications from 1966 onward. Medline is a subset of PubMed.
For example, when a scientific journal is added to Medline, the database contains its issues from the date of inclusion onward. PubMed also retrieves the earlier issues.

OMIM, Online Mendelian Inheritance in Man

Contains information on disorders of genetic origin and on the genes involved. It began as a printed publication in the early 1960s and was computerized in 1985 (Johns Hopkins University). It contains information on more than 12,000 genes.

NCBI Taxonomy
Taxonomic classification

Query systems (Retrieval)


Entrez http://www.ncbi.nlm.nih.gov/sites/gquery
SRS, Sequence Retrieval System http://srs.ebi.ac.uk/
Entrez, developed by NCBI, is a closed system. SRS, by contrast, can be installed on any computer, and its searches can be extended to any database.

Searching in Entrez:
search term[search field] <logical operator>
The logical operators are AND, OR, NOT (BUTNOT).

homo sapiens[ORGN]

Kv1.2[GENE]

Kv1.2[GENE] OR homo sapiens[ORGN]

Kv1.2[GENE] AND homo sapiens[ORGN]
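
The query pattern above can be sketched as a pair of small string helpers. This is only an illustration: the function names `term` and `combine` are hypothetical, not part of Entrez, and the parentheses added around each combination are a choice made here to keep grouping explicit.

```python
def term(value, field=None):
    """Return an Entrez term, optionally restricted to a search field."""
    return f"{value}[{field}]" if field else value


def combine(operator, *terms):
    """Join terms with AND, OR, or NOT and wrap the result in parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {operator} ".join(terms) + ")"


gene = term("Kv1.2", "GENE")
organism = term("homo sapiens", "ORGN")

# OR widens the result set, AND narrows it.
either = combine("OR", gene, organism)
both = combine("AND", gene, organism)

print(either)
print(both)
```

The OR query returns every record that matches the gene name or the organism, so it is far larger than the AND query, which keeps only records matching both.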

Entrez Search Field Descriptions and Qualifiers


| Search Field | Qualifier | Definition |
|---|---|---|
| Accession | [ACCN] | Contains the unique accession number of the sequence or record, assigned to the nucleotide, protein, structure, genome record, or PopSet by a sequence database builder. The Structure database accession index contains the PDB IDs but not the MMDB IDs. |
| All Fields | [ALL] | Contains all terms from all searchable database fields in the database. |
| Author Name | [AUTH] | Contains all authors from all references in the database records. The format is last name space first initial(s), without punctuation (e.g., marley jf). |
| EC/RN Number | [ECNO] | Number assigned by the Enzyme Commission or Chemical Abstract Service (CAS) to designate a particular enzyme or chemical, respectively. |
| Feature Key | [FKEY] | Contains the biological features assigned or annotated to the nucleotide sequences and defined in the DDBJ/EMBL/GenBank Feature Table (http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html). Not available for the Protein or Structure databases. |
| Filter | [FILT] | Contains predetermined or filtered subsets of the various databases. These subsets or filters are created by grouping records that are commonly linked to other Entrez databases or within the same database. For example, the PopSet database Filter index includes PopSet all, PopSet medline, PopSet nucleotide, and PopSet protein. The PopSet medline filter includes all PopSet records with links to PubMed; the PopSet nucleotide filter includes all PopSet records with links to the nucleotide database; and the PopSet protein filter includes all PopSet records with links to the protein database. The PopSet all filter includes all PopSet records. The Nucleotide database Filter index contains a great deal more filters because the database records are linked to numerous external links. For more information see Link Out. |
| Gene Name | [GENE] | Contains the standard and common names of genes found in the database records. This field is not available in the Structure database. |
| Issue | [ISS] | Contains the issue number of the journal in which the data were published. |
| Journal Name | [JOUR] | Contains the name of the journal in which the data were published. Journal names are indexed in the database in abbreviated form (e.g., J Biol Chem). Journals are also indexed by their ISSNs. Browse the index if you do not know the ISSN or are not sure how a particular journal name is abbreviated. |
| Keyword | [KYWD] | Contains special index terms from the controlled vocabularies associated with the GenBank, EMBL, DDBJ, SWISS-Prot, PIR, PRF, or PDB databases. Browse the Keyword indexes of the individual databases to become familiar with these vocabularies. A Keyword index is not available in the Structure database. |
| Modification Date | [MDAT] | Contains the date that the most recent modification to that record is indexed in Entrez, in the format YYYY/MM/DD (e.g., 1999/08/05). A year alone (e.g., 1999) will retrieve all records modified for that year; a year and month (e.g., 1999/03) retrieves all records modified for that month that are indexed in Entrez. |
| Molecular Weight | [MOLWT] | Molecular weight of a protein, in Daltons (Da), calculated by the method described in the Searching by Molecular Weight section of the Entrez help document. Note that molecular weight must be entered as a fixed 6 digit field, filled with leading zeros (not letter O), e.g., 002002 [MOLWT]. |
| Organism | [ORGN] | Contains the scientific and common names for the organisms associated with protein and nucleotide sequences. |
| Page Number | [PAGE] | Contains the number of the first journal page of the article in which the data were published. |
| Primary Accession | [PACC] | Contains the primary accession number of the sequence or record, assigned to the nucleotide, protein, structure, genome record, or PopSet by a sequence database builder. A Primary Accession index is not available in the Structure database. |
| Properties | [PROP] | Contains properties of the nucleotide or protein sequence. For example, the Nucleotide database's Properties index includes molecule types, publication status, molecule locations, and GenBank divisions. A Properties index is not available in the Structure database. |
| Protein Name | [PROT] | Contains the standard names of proteins found in database records. Common names may not be indexed in this field so it is best to also consider All Fields or Text Words. A Protein Name index is not available in the Structure database. |
| Publication Date | [PDAT] | Contains the date that records are released into Entrez, in the format YYYY/MM/DD (e.g., 1999/08/05). It is the date the entry first appeared in GenBank explicitly indexed in Entrez. A year alone (e.g., 1999) will retrieve all records for that year; a year and month (e.g., 1999/03) will retrieve all records released into GenBank for that month. |
| SeqID String | [SQID] | Contains the special string identifier, similar to a FASTA identifier, for a given sequence. A SeqID String index is not available in the Structure database. |
| Sequence Length | [SLEN] | Contains the total length of the sequence. Sequence Length indexes are not available in the Structure or PopSet databases. |
| Substance Name | [SUBS] | Contains the names of any chemicals associated with this record from the CAS registry and the MEDLINE Name of Substance field. Substance Name indexes are not available in the Genome or PopSet databases. |
| Text Word | [WORD] | Contains all of the "free text" associated with a record. |
| Title Word | [TITL] | Includes only those words found in the definition line of a record. The definition line summarizes the biology of the sequence and is carefully constructed by database staff. A standard definition line will include the organism, product name, gene symbol, molecule type and whether it is a partial or complete cds. Title Word indexes are not available in the Structure or PopSet databases. |
| Uid | [UID] | Contains the Medline unique identifier for records that contain published references that are linked to PubMed. The Uid index is not browsable. |
| Volume | [VOL] | Contains the volume number of the journal in which the data were published. |
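
Several of the fields above expect values in a fixed format: six zero-padded digits for [MOLWT], YYYY/MM/DD for [PDAT] and [MDAT], and "last name space initials" without punctuation for [AUTH]. A minimal sketch of how such terms could be formatted follows; the helper names are hypothetical and serve only to illustrate the formats described in the definitions.

```python
from datetime import date


def molwt_term(daltons):
    """Zero-pad a molecular weight to the fixed 6-digit form used by [MOLWT]."""
    if not 0 <= daltons <= 999999:
        raise ValueError("molecular weight must fit in six digits")
    return f"{daltons:06d}[MOLWT]"


def date_term(day, field="PDAT"):
    """Format a date as YYYY/MM/DD for the [PDAT] or [MDAT] field."""
    return f"{day.strftime('%Y/%m/%d')}[{field}]"


def author_term(last_name, initials):
    """Format an author as 'last name space initials', without punctuation."""
    return f"{last_name.lower()} {initials.lower()}[AUTH]"


print(molwt_term(2002))
print(date_term(date(1999, 8, 5)))
print(date_term(date(1999, 8, 5), field="MDAT"))
print(author_term("Marley", "JF"))
```

The printed terms reproduce the examples given in the definitions (002002, 1999/08/05, marley jf), each followed by its qualifier.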

Text searches in databases


Find the gene encoding the human protein hERG
1. Search for herg in Entrez. How many entries are found among the nucleotides? Do they all relate to the hERG gene?
2. Limit the search to the gene field. Check one entry. Are other names used for the hERG gene?
3. Repeat the search, extending it to the synonyms of hERG
4. Restrict the search to Homo sapiens sequences
5. Look up the encoded protein in the RefSeq and Swiss-Prot databases
6. Are there diseases linked to mutations of the hERG gene? Look up information on LQT2 in the OMIM database
7. Search PubMed for articles on the role of the hERG gene in LQT2
8. Repeat the searches in SRS
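
Steps 1 to 4 above amount to refining one query string in stages. The sketch below writes those stages out and shows how each could be turned into a request URL for NCBI's E-utilities `esearch` endpoint instead of the web form. Several things here are assumptions not found in the exercise: the use of E-utilities at all, the database name `nucleotide`, and the synonym list, where KCNH2 is given as an example synonym that should be checked against a gene entry as step 2 asks. No network request is made; the code only builds strings.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service (assumed interface,
# the exercise itself uses the Entrez web page).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, query):
    """Build an esearch URL for a database and an Entrez query string."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


# Step 1: free-text search, matches any field.
step1 = "herg"
# Step 2: restrict the term to the gene name field.
step2 = "herg[GENE]"
# Step 3: extend to synonyms (example list, to be verified in a gene entry).
synonyms = ["herg", "KCNH2"]
step3 = "(" + " OR ".join(f"{name}[GENE]" for name in synonyms) + ")"
# Step 4: keep only human sequences.
step4 = f"{step3} AND homo sapiens[ORGN]"

for query in (step1, step2, step3, step4):
    print(query)
    print(esearch_url("nucleotide", query))
```

Comparing the number of hits at each stage shows how the field restriction removes unrelated entries and how the synonyms recover entries that do not use the name hERG.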

Find the amino acid sequence of the Lac repressor (LacI) of the bacterium Escherichia coli
1. Search for LacI, with any synonyms, in Entrez
2. Restrict the search to Escherichia coli
3. Limit the search to the Swiss-Prot database
4. What is the function of the protein?
5. Which residues form the DNA-binding site? And the inducer-binding site?

Retrieve bibliographic and sequence information on the CFTR gene

1. Search for the CFTR mRNA sequence in mouse (Mus musculus)
2. Search for the sequence of the CFTR gene in human
3. Find information in OMIM on any diseases in which the CFTR gene is involved
4. Find PubMed articles dealing with clinical analyses in newborns
5. Limit the search to articles dealing with the role of macrolide antibiotics (macrolides)
6. Limit the search to the macrolide azithromycin
7. Limit the search to the macrolides azithromycin or roxithromycin
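
Steps 4 to 7 narrow one PubMed query step by step, and the last step needs an OR nested inside an AND. A minimal sketch of the resulting query strings is given below. The search words `CFTR` and `newborn` are assumptions about how the topic might be phrased, and the helper `any_of` is hypothetical. The parentheses matter because PubMed is understood to evaluate Boolean operators from left to right unless terms are grouped.

```python
def any_of(*terms):
    """Group alternative terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Step 4: articles on CFTR and newborns (assumed wording of the topic).
base = "CFTR AND newborn"
# Step 5: add the drug class.
with_macrolides = f"{base} AND macrolides"
# Step 6: replace the class with a single macrolide.
with_azithromycin = f"{base} AND azithromycin"
# Step 7: accept either of two macrolides; the group keeps OR from
# being applied to everything to its left.
either = f"{base} AND {any_of('azithromycin', 'roxithromycin')}"

for query in (base, with_macrolides, with_azithromycin, either):
    print(query)
```

Without the grouping, the last query would read `CFTR AND newborn AND azithromycin OR roxithromycin`, which under left-to-right evaluation also returns every article mentioning roxithromycin, whether or not it concerns CFTR.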

Exercises
Identify the sequence of the human gene KCNE2
Which chromosome is it on? What type of protein does it encode? Are any mutations of the protein known?

Determine the sequence of the alpha haemolysin protein in Escherichia coli
What is the function of the protein?