
Course: Biological Sequence Analysis

Chapter 1: Databases
What is a Database?
A database is a structured collection of data. It consists of basic units called records or
entries.

What is a Biological Database?


A biological database is an organized collection of data about biological molecules such as nucleic acids,
proteins, and other polymers.
Each record consists of fields, which hold pre-defined data related to the record.
For example, a protein database would have protein entries as records and protein properties as fields
(e.g., protein name, length, amino acid sequence, function, ...).
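The record-and-field idea above can be sketched in a few lines of Python. This is only an illustration: the class name `ProteinRecord`, its fields, and the example entry are hypothetical and do not correspond to the schema of any real database.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    """One database entry; each attribute is a field (hypothetical schema)."""
    name: str
    sequence: str   # amino acid sequence in one-letter codes
    function: str

    @property
    def length(self) -> int:
        # The length field is derived from the sequence instead of stored twice.
        return len(self.sequence)


# A toy "database": a list of records. The entry below is made-up example data.
database = [
    ProteinRecord(name="example protein", sequence="MKTAYIAKQR", function="unknown"),
]

for record in database:
    print(record.name, record.length, record.function)
```

Every record exposes the same pre-defined fields, which is what makes it possible to search or sort the collection by any one of them.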

Biological Databases are:

– Public or private: access and submission
– Protein, nucleotide, structure, literature, annotation...
– Generalized or specialized
– Curated or non-curated
– Sequence or genome-centered

Generalized Databases

• Nucleotide sequences
• Protein sequences
• Protein sequence motifs and patterns
• Macromolecular 3D structure
• Gene expression data

Specialized Databases

• Collections of focused information on one or more specific fields of study.

Popular Biological Databases

Primary Databases

• Primary databases are also called archival databases.


• They are populated with experimentally derived data such as nucleotide sequences, protein
sequences, or macromolecular structures.
• Experimental results are submitted directly into the database by researchers, and the data are
essentially archival in nature.

Secondary Databases

• Secondary databases often draw upon information from numerous sources, including other
databases (primary and secondary), controlled vocabularies and the scientific literature.
• They are highly curated, often using a complex combination of computational algorithms and
manual analysis and interpretation to derive new knowledge from the public record of science.

Accession Numbers

• Each GenBank record, consisting of both a sequence and its annotations, is assigned a unique identifier
called an accession number.
• Alternative splicing is a process that enables a messenger RNA (mRNA) to direct
synthesis of different protein variants (isoforms) that may have different cellular
functions or properties. It occurs by rearranging the pattern of exon elements that are
joined by splicing to alter the mRNA coding sequence.
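Returning to accession numbers: because each record carries a unique identifier, a collection of records can be indexed by accession. The sketch below is a minimal illustration under stated assumptions. The regular expression covers only the classic nucleotide accession shapes (1 letter + 5 digits, 2 letters + 6 digits, 2 letters + 8 digits, with an optional `.version` suffix) and is not a complete validator. The names `parse_accession` and `RecordIndex` are hypothetical, and the sequences are placeholder strings.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed accession shapes (not exhaustive): 1 letter + 5 digits,
# 2 letters + 6 digits, or 2 letters + 8 digits, optionally ".<version>".
ACCESSION_RE = re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.(\d+))?$")


def parse_accession(identifier: str) -> Tuple[str, Optional[int]]:
    """Split 'ACCESSION.VERSION' into (accession, version or None)."""
    match = ACCESSION_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a recognized accession: {identifier!r}")
    version = match.group(2)
    return match.group(1), (int(version) if version is not None else None)


class RecordIndex:
    """Toy index that enforces one record per accession number."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def add(self, identifier: str, sequence: str) -> None:
        accession, _version = parse_accession(identifier)
        if accession in self._records:
            raise KeyError(f"duplicate accession: {accession}")
        self._records[accession] = sequence

    def get(self, accession: str) -> str:
        return self._records[accession]


index = RecordIndex()
index.add("U49845.1", "ACGT")    # placeholder sequence, not real data
index.add("AB123456", "TTGACA")  # hypothetical accession and sequence
print(index.get("U49845"))
```

Keying the index on the accession without its version suffix is a design choice made for this sketch; a real system would also need to decide how to store successive versions of the same record.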
