
GENOMICS AND BIOINFORMATICS

Monday, 7 June 2010


The genome is often likened to “the book of life”. The analogy holds because both can be read sequentially, from beginning to end, one letter after another, and because the genome contains the information necessary for


Genomics
The process that leads to the complete characterization of an organism's genetic material.


Genomics
-Genomic data
-Sequencing
-Transcriptome
-Proteome
-Metabolome

Bioinformatics
-Databases
-Sequence analysis
-Analysis of "-omics" data
-Process automation


GENOMICS
-Comparative genomics
-Structural genomics
-Functional genomics


STRUCTURAL GENOMICS

Studies the three-dimensional structure of macromolecules, especially proteins and nucleic acids, and the functions associated with that structure.


STRUCTURAL GENOMICS

These studies also include:

-Organization of sequences within genomes.
-Assignment of loci.
-Resolution of chromosome maps.
-Physical mapping of genes.
-Sequencing.
-Use of genomic maps for gene analysis.


COMPARATIVE GENOMICS
Uses the information known about one organism to obtain information about another.

Examples:
-Alignments
-Motif searching
-Phylogenetic analysis
-Protein structure prediction
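
To illustrate the first item in the list, a pairwise global alignment can be scored with dynamic programming (the Needleman-Wunsch approach). This is only a minimal sketch: the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example and do not come from the slides.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch score for aligning sequences a and b end to end."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Toy sequences, purely illustrative.
print(global_alignment_score("ACGT", "ACT"))
```

A higher score means the two sequences can be lined up with fewer mismatches and gaps, which is the basic signal that comparative methods build on.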



FUNCTIONAL GENOMICS
A field of molecular biology that aims to use the vast body of data produced by genomics projects (such as the "genome projects" of various organisms) to describe the functions and interactions of genes (and proteins).


Unlike genomics and proteomics, functional genomics focuses on the dynamic aspects of genes, such as transcription, translation, and protein-protein interactions, as opposed to the static aspects of genomic information, such as DNA sequence or structure.


Bioinformatics
-Tools that make it possible to manipulate the information contained in these databases become indispensable.

-Characterization and classification of new genes and proteins.

-Methods that facilitate identifying the elements that regulate the different genomic elements.

-Methods that determine evolutionary history, function, and structural determinants.


Examples
-Comparative genomics: minimal gene set and metagenomic communities

Homologous proteins share a common ancestor


Mycoplasma genitalium: “the smallest among known cellular life forms”; 468 coding genes; Gram positive
Haemophilus influenzae: 1703 coding genes; Gram negative

Minimal set of genes required for life

“Extrapolation between genomes will then most likely accelerate the definition of what amounts to a “parts catalog” of cellular components in a large number of organisms”. Bernhard Palsson, NATURE BIOTECHNOLOGY VOL 18 NOVEMBER 2000


22 genes with non-orthologous displacement
262 genes in common
6 parasite-specific genes
256 genes considered the minimal set
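
The counts above fit together with simple arithmetic, sketched here. The variable names are illustrative, and the reading that the 262 shared genes already include the 22 non-orthologous displacement cases is an assumption based on how the slide orders the numbers.

```python
# Counts taken from the slide.
non_orthologous_displacement = 22
genes_in_common = 262
parasite_specific = 6

# Assumption: the 262 shared genes include the 22 displacement cases,
# so the remainder are shared through direct orthology.
shared_by_orthology = genes_in_common - non_orthologous_displacement

# Dropping the parasite-specific genes leaves the proposed minimal set.
minimal_gene_set = genes_in_common - parasite_specific

print(shared_by_orthology)  # 240
print(minimal_gene_set)     # 256
```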

Workflow and dependencies in metagenome analysis

It's really complex and full of pitfalls!


Data analysis: the signs before the flood

Completely sequenced and published microbial genomes: early linear growth followed by exponential increase


Data analysis: the signs before the flood

[Figure: microbial genomes published per year, 2003 to 2006]

No. of ORFs in all genomes (incl. ours):
2003: 350k
2004: 500k
2005: 750k
2006: 1.5 million


Data analysis: the signs before the flood

[Figure: microbial genomes published per year, with ORFs from complete genomes vs. metagenomics ORFs, 2003 to 2006. Bar labels: 350k, 500k, 750k, 1.5 million; 1.1 million, 1.5 million; 14 million]


Why 16S?
-Its presence in almost all bacteria, often existing as a multigene family or in operons.

-The function of the 16S rRNA gene has not changed over time, suggesting that random sequence changes are a more accurate measure of time (evolution).

-The 16S rRNA gene (1,500 bp) is large enough for informatics purposes.
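
The idea that accumulated random changes track evolutionary time can be made concrete with a distance between two aligned 16S sequences. The sketch below computes the proportion of differing sites p and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The choice of this model, the function names, and the toy sequences are assumptions for illustration and are not taken from the slides.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq1.upper(), seq2.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy aligned fragments (hypothetical, not real 16S sequences).
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as p, because some sites will have changed more than once.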

Comparative metagenomics
Increase of functional assignments (via orthologous groups, COGs + NOGs) with coverage

Reasons for differences

Biological issues
-GC content
-Genome sizes
-Phylogeny
-Evolutionary speed
-Evenness/Richness
-Functionality

Technical issues
-Sampling + preparation
-Sequencing method
-Assembly + annotation
-Coverage
-…

Tringe*, von Mering* … Bork, Hugenholtz, Rubin. Science 308, 554 (2005)
Soil DNA
*At least 847 distinct ribotypes from more than a dozen phyla
*More than 3000 predicted bacterial ribotypes
*Less than 1% of the nearly 150,000 reads exhibited overlap
*Between two and five billion bp would be necessary to obtain eightfold coverage

Whale fall samples
• Between 25 and 150 distinct ribotypes
• The most abundant accounts for 15 to 25%
• Between 100 and 700 Mbp would be needed to generate a draft assembly for the most prevalent genome

Acid mine drainage biofilm community
A wide diversity of bacteria, few archaeal species, and some fungi and unicellular eukaryotes were found. 90% of the orthologous groups were detected with just 25 Mbp of raw sequence.
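
Sequencing-effort estimates like these follow from the usual relation: average coverage = (bases sequenced × fraction of the DNA that comes from the genome) / genome size. A rough sketch is given below; the function name is an assumption, and the genome size and abundance in the example are hypothetical values, not figures from the cited study.

```python
def bases_needed(genome_size_bp: float, target_coverage: float,
                 abundance: float = 1.0) -> float:
    """Total bases to sequence so one genome reaches the target average coverage.

    abundance is the fraction of the sequenced DNA that comes from that genome.
    """
    if not 0.0 < abundance <= 1.0:
        raise ValueError("abundance must be in (0, 1]")
    return target_coverage * genome_size_bp / abundance


# Hypothetical case: a 5 Mbp genome making up 1% of the community DNA,
# sequenced to eightfold coverage.
print(f"{bases_needed(5e6, 8, 0.01):.2e} bp")
```

The rarer the genome is in the community, the more total sequence is required, which is why complex samples such as soil need orders of magnitude more data than a low-diversity biofilm.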



Functional profiling of microbial communities.

-These profiles clearly suggest that the predicted protein complement of a community is similar to that of other communities whose environments of origin pose similar metabolic demands.

-Our results further support the hypothesis that the “functional” profile of a community is influenced by its environment and that EGT (environmental gene tag) data can be used to develop fingerprints for particular environments.
Tringe*, von Mering* et al. Science 308, 554 (2005)


Specific enrichments.


Protein function prediction in metagenomics samples

BLAST result categories: Known, Unknown, No hit


Protein function prediction in metagenomics samples

Neighborhood categories: Next to known, Next to unknown, No prediction

Neighbor-based predictions were possible for almost 30% of the 1.25 million ORFs studied.
More than 75,000 would not have been possible by homology.

Overall function predictions for >75% of environmental data!
Our functional knowledge: glass half full or half empty?
Function prediction in gene families of 1.5 million proteins from 4 environments

Our knowledge is concentrated in large, well-established families, which contribute 65% of the ORFs; however, many specialized functions in small gene families remain to be discovered.

All-against-all comparison, MCL clustering (60 bits, inflation factor 1.1)

Harrington et al., PNAS, Aug. 2007
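
For readers unfamiliar with MCL (Markov clustering), the sketch below shows its two alternating steps, expansion and inflation, on a small similarity graph. It is a toy pure-Python version written for illustration, not the software or settings used in the cited study: the function name, the default inflation of 2.0, the convergence tolerance, and the example graph are all assumptions. In a real analysis the edges would presumably be all-against-all hits above the stated 60-bit cutoff.

```python
def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Toy Markov clustering on a square, non-negative adjacency matrix."""
    n = len(adjacency)
    # Copy the input and add self-loops so every column has some mass.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]

    def normalize(mat):
        # Make every column sum to 1 (column-stochastic matrix).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two steps of a random walk).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        inflated = normalize([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Rows that still carry mass act as attractors; the columns they
    # reach form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles of mutually similar proteins (hypothetical).
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(graph))  # [[0, 1, 2], [3, 4, 5]]
```

Lower inflation values, such as the 1.1 quoted on the slide, give coarser clusters, which suits the goal of grouping proteins into broad families.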

RNA interference:
listening to the sound of silence



776 genes are enriched in the ovary (98%)

A large proportion was required for either egg production or embryogenesis.
Over half the genes tested showed at least one detectable phenotype, most commonly embryonic lethality.

429 non-lethal
322 lethal
------------------
751 total

362 no phenotype
389 with phenotype
------------------
751 total

209 post-embryonic phenotype
180 pre-embryonic phenotype
------------------
389 with phenotype

67 post-embryonic, non-lethal
322 post-embryonic, sterile and lethal
------------------
389 with phenotype


Finding relationships between embryonic lethality, degree
of conservation, and extent of enriched expression in the
ovary

Are the genes involved in embryogenesis highly conserved?


Correlation between RNAi Phenotypic
Classes and Expression Levels in the
Ovary for the Set of 751 Genes Tested

Phenotypic classes: No-phenotype, Post-embryonic, Partial, Strong


Genes giving rise to an embryonic lethal phenotype were under-represented on the X chromosome

[Figure legend: expected based on # of genes; expected based on distribution; observed]
1, unknown;
2, RNA transcription/modification;
3, chromatin/chromosome structure;
4, cell cycle control;
5, protein synthesis/folding/translocation/degradation;
6, energy metabolism;
7, signal transduction/differentiation;
8, cytoskeleton associated or component;
9, nuclear-cytoplasmic transport; and
10, chitin biosynthesis.


Phenoclusters

[Figure: phenocluster map. Phenotype labels: few eggs; exaggerated asynchrony; defect in the first 50 min; granular cytoplasm; chromosome biology or DNA replication; no visible]

Genes predicted to function in chromosome biology or DNA replication
Genes involved in mRNA processing
The resulting ‘phenoclusters’ can provide information about both the
involvement of genes in particular modules and the functional relationships
that might exist between them

Clustering algorithms were used to group genes with similar expression
profiles, and these groups were visualized as ‘mountains’ in a ‘topomap’.
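
One simple way to group genes with similar profiles is to link any two genes whose profiles have a high Pearson correlation and then take the connected groups. The sketch below does exactly that. It is not the clustering method used to build the topomap, and the threshold, gene names, and profile values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)


def group_by_correlation(profiles, threshold=0.9):
    """Single-linkage groups: genes are joined when correlation >= threshold."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path compression.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                parent[find(g1)] = find(g2)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4, 2],
}
print(group_by_correlation(profiles))
```

Here geneA and geneB rise together while geneC and geneD fall together, so two groups come out.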



Widely distributed proteins with varied functions related to the stress response
DNA damage response (DDR) protein interaction map for C. elegans. Arrows represent yeast two-hybrid
(Y2H) interactions and nodes (circles) represent proteins. Blue and orange nodes indicate products of genes from the DNA
repair and checkpoint phenoclusters, respectively. mrt-2 and C04F12.3 belong to a common phenocluster, and their
products physically interact in the Y2H system
Damage recovery module.
The nodes represent proteins and the lines represent yeast two-hybrid (Y2H) interactions.
All of the six proteins were identified as required for methyl methanesulfonate (MMS)
resistance in phenotypic analyses



Core germline interactome map for C. elegans.

The nodes represent proteins and the lines represent yeast two-hybrid (Y2H) interactions. The interactome map was integrated with transcriptome and phenome data. Red lines indicate interactions between proteins whose corresponding genes have both similar expression profiles and overlapping RNA interference (RNAi) phenotypes.
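
The integration described in this caption amounts to annotating each Y2H interaction with two extra checks. The sketch below is one way to express that. The rule that only edges passing both checks are "red" follows the caption, but the function name, the data structures, the use of a shared cluster label as a stand-in for "similar expression profiles", and all protein names are assumptions.

```python
def classify_edges(interactions, expression_cluster, rnai_phenotypes):
    """Label each interaction by which other data sets support it."""
    labels = {}
    for a, b in interactions:
        # Same expression cluster stands in for "similar expression profiles".
        same_expression = (
            a in expression_cluster
            and b in expression_cluster
            and expression_cluster[a] == expression_cluster[b]
        )
        # At least one RNAi phenotype in common.
        shared_phenotype = bool(
            rnai_phenotypes.get(a, set()) & rnai_phenotypes.get(b, set())
        )
        if same_expression and shared_phenotype:
            labels[(a, b)] = "both"  # the red lines in the map
        elif same_expression:
            labels[(a, b)] = "expression only"
        elif shared_phenotype:
            labels[(a, b)] = "phenotype only"
        else:
            labels[(a, b)] = "interaction only"
    return labels


# Hypothetical data for four proteins.
interactions = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p1", "p4")]
expression_cluster = {"p1": 1, "p2": 1, "p3": 2, "p4": 2}
rnai_phenotypes = {"p1": {"Emb"}, "p2": {"Emb", "Ste"}, "p3": {"Ste"}}

for edge, label in classify_edges(interactions, expression_cluster,
                                  rnai_phenotypes).items():
    print(edge, label)
```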



A systems biology strategy

Starting from one known component, more components involved in a module of interest can be
identified, for example, by interactome mapping. A network can be constructed to describe
these interactions. Perturbation experiments are then systematically performed and responses
from the rest of the network are recorded, for example, by transcriptome profiling.
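
The first step of this strategy, growing a module outward from one known component through the interaction map, can be sketched as a breadth-first search. The function name, the protein names, and the max_steps cutoff are hypothetical and only serve the example.

```python
from collections import deque


def expand_module(interactions, start, max_steps=2):
    """Return proteins reachable from start within max_steps interactions.

    The result maps each protein to its distance (in interactions) from start.
    """
    # Build an undirected neighbor lookup from the interaction pairs.
    neighbors = {}
    for a, b in interactions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_steps:
            continue  # do not grow the module beyond the cutoff
        for nxt in sorted(neighbors.get(current, ())):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance


# Hypothetical interaction map around one known component.
y2h = [("known", "a"), ("a", "b"), ("b", "c"), ("known", "d")]
print(expand_module(y2h, "known", max_steps=2))
```

The proteins returned would then be the candidates for the perturbation experiments described above.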
