
Features Dark genome

In 1972, geneticist Susumu Ohno coined the term “junk DNA” to describe every component of the human genome that was not a gene. Suspicious of the assumption that all three billion base pairs of human DNA were functionally important, Ohno wrote, “Triumphs as well as failures of nature’s past experiments appear to be contained in our genome.” Nearly a decade later, Francis Crick and Leslie Orgel published a review in Nature entitled “Selfish DNA: the ultimate parasite,” arguing that most DNA in higher organisms was, similarly, “little better than junk.”

For many years, the idea that the genome was divided cleanly into two categories (short stretches of genes interspersed among long spans of junk) was a widely accepted view. But by the early 1990s, the concept had begun to grow stale. Geneticists were gradually uncovering more and more functionally significant roles within the “junk” regions, and the very definition of a gene itself was beginning to change. Nevertheless, when the full sequence of the human genome was finally published in 2004, many people were shocked to discover just how few genes our DNA actually contains. Representing only two percent of the entire genome, genes were vastly outnumbered by mysterious non-coding regions. But if this “dark genome” really wasn’t junk, what could it all be doing?

“When you first think about genetics 15-20 years ago, the goal was simply to understand the code: the code as it related to genes, gene expression, and the production of proteins,” says Gary Karpen, a senior staff scientist in the Life Sciences Division of Lawrence Berkeley National Laboratory (LBL). “But then it became clear that the code was simply not enough.” Karpen and a team of over 150 other scientists have just completed an ambitious project whose aims were, according to Karpen, “the next level up” from straight code, at the level of mapping function in the dark genome. What is emerging is a far better idea of the importance of this largely unexplored genetic landscape, a picture of DNA as a dynamic template for life.

The birth of modENCODE

The project, called the model organism Encyclopedia of DNA Elements (modENCODE), was born out of a sister initiative launched in 2003 called ENCODE, which aimed to catalog the complete “parts list” of the entire human genome. The pilot phase of ENCODE centered on annotating only one percent of human DNA, but the complexity of the human genome and the limits of technology at the time necessitated a slight shift in focus.

Thus, in 2007 the National Human Genome Research Institute (NHGRI) launched modENCODE as a parallel effort involving two simpler subjects: the roundworm Caenorhabditis elegans and the fruit fly Drosophila melanogaster. The four-year, $57 million project hoped to identify, if possible, the functional role of every base in the worm and fruit fly genomes. These two model organisms represent far better understood genetic systems than the human genome and, at 100 million and 180 million base pairs, respectively, far more feasible subjects for the genome-wide analysis NHGRI aimed to achieve. The hope was that ultimately modENCODE could serve as an extended pilot for the entire human ENCODE project, helping us better understand how it is that complex, three-dimensional organisms arise out of linear strands of DNA.

A manifold blueprint

DNA is made up of four different molecules called nucleotides, paired and bound together to form the two anti-parallel twisting threads of the double helix. Some segments of DNA are known as genes, meaning that their nucleotides will be transcribed into a slightly different chemical form called RNA. A specific type of this RNA, called messenger RNA or mRNA, will then leave the nucleus to serve as a template for synthesis of the protein building blocks that carry out our cellular processes. Proteins not only make up the structural framework of our cells, they also catalyze most of the chemical reactions that make cells work.

Yet all cells, from kidney cells to neurons to muscle cells, possess exactly the same copy of DNA. In its entirety, DNA exists only as a template from which an immense number of readouts can occur; not all genes are expressed at all times in all cells, and it is precisely this capacity for different combinations of expression that allows for the astonishing diversity of our cellular processes. Geneticists are still unclear exactly how these highly ordered patterns of gene expression are achieved. The answer may lie in the dark genome.

From base to function

The architects of the modENCODE project sought to chip away at this question by first assembling a map. By annotating the function of every base of DNA in the two model organisms, they hoped to gain some insight into how transcription is regulated across cell types and throughout development.

They analyzed function along two broad sets of factors. The first set, referred to as “functional elements,” includes small proteins that regulate transcription, as well as non-coding RNAs (ncRNAs) that help to regulate gene expression after transcription but before protein synthesis. The second set, known as epigenetic elements, is not contained in the sequence of DNA itself, but includes chemical marks on the surface of DNA that physically influence what regions of the genome are silent or active. Over 50 participating labs around the world analyzed specific types of functional or epigenetic elements in one of the two model organisms to assemble a topographical map of function along the linear DNA sequence.

In alternative splicing, a single gene can be read in multiple ways to produce different proteins. After transcription occurs (step 1),
distinct segments of the RNA called introns (gray) are removed by cuts made on both sides at locations called splice junctions. The
remaining RNA (colored) can then be reconnected to form different strands of mRNA (step 2). The different mRNAs will then serve
as templates for the synthesis of different proteins (step 3).
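
The cut-and-rejoin logic in this caption can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the method used by any modENCODE lab: the function name `splice_isoforms` and the exon sequences are hypothetical, introns are assumed to be already removed, the first and last exons are always kept, and each internal exon is independently kept or skipped without reordering.

```python
from itertools import product


def splice_isoforms(exons):
    """Return every mRNA obtainable by skipping internal exons.

    Toy assumption: the first and last exons are always kept, each internal
    exon is independently kept or skipped, and exon order never changes.
    """
    if len(exons) < 2:
        return ["".join(exons)]
    first, *internal, last = exons
    isoforms = []
    # Each tuple of booleans says which internal exons survive splicing.
    for keep in product([True, False], repeat=len(internal)):
        kept = [exon for exon, k in zip(internal, keep) if k]
        isoforms.append("".join([first, *kept, last]))
    return isoforms


# Hypothetical four-exon gene (sequences are made up for illustration).
exons = ["AUGGC", "CUUAG", "GGAUC", "UGA"]
isoforms = splice_isoforms(exons)
print(len(isoforms))  # 2 internal exons -> 2**2 = 4 possible mRNAs
for mrna in isoforms:
    print(mrna)
```

Under these simplifying assumptions, a gene with n internal exons yields 2^n distinct transcripts, which gives a rough sense of how one gene can serve as the template for several different proteins.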

The transcriptome

“We wanted to crack the code to discover the rules required to read a genome, any genome,” says Susan Celniker, head of the Department of Genome Dynamics at LBL, who, along with Karpen, was one of the senior principal investigators for modENCODE. Her lab was on the Drosophila team and was responsible for mapping out the entire transcriptome: all of the sequences of DNA that are transcribed into RNA.

Counting both coding and non-coding RNAs, the transcriptome comprises about 60 percent of the fly genome. In order to screen such vast amounts of RNA with single-base resolution, Celniker’s group used a high-throughput technique known as RNA-seq. Investigators isolate the more than 25 million scattered fragments of RNA that have been transcribed from DNA. After making some chemical modifications that allow sequencing to occur, they convert the RNA back to DNA through a process called reverse transcription, giving them the complementary DNA, or cDNA, for the original set of RNA fragments. They then sequence the cDNA and align it with the original genome sequence to map the transcriptome.

Celniker’s group generated almost 6,000-fold coverage of the previously annotated fly transcriptome. Combing through their RNA-seq data, they identified nearly two thousand new transcribed regions that had been missed in previous annotations. These new regions include sequences that encode small proteins, as well as small non-coding RNAs that participate in the regulatory machinery that helps control gene expression and protein production. In perhaps their highest-impact finding, Celniker’s group identified over 22,000 new splice junctions, areas where, after transcription, distinct chunks of transcripts can be cut out, allowing for different combinations of mRNA. Alternative splicing thus allows a single gene to code for several different proteins, based on the different possible patterns of cutting and pasting.

The discovery of the vast number of previously unidentified splice junctions and new transcripts gives us a far better idea of the sheer quantity of potential protein products in each cell. Insight into an additional layer of function, however, is provided by the identification of the new non-coding RNAs, many of which are involved in splicing events, promoting or repressing transcription, or silencing mRNAs to finely control levels of protein synthesis. The overlapping output of these two mechanisms (variety of combinations within transcripts and an intricate regulatory machinery) is crucial to understanding our genome’s differential workings from cell to cell.

Illustrating this, Celniker’s group then carried out comparisons across 27 distinct developmental stages as well as between the sexes. Interestingly, they found that the

Spring 2011 Berkeley Science Review 17


To fit inside each individual cell, DNA must be condensed and packaged into fibers called
chromatin. The double-stranded helical DNA first wraps around clusters of proteins called
histones. The histones are arranged along the DNA like beads on a string, allowing the
histone-DNA spools to coil, fold, and loop around themselves. The final product is the
tightly packed fiber of chromatin, organized into distinct sets of chromosomes.




number of expressed genes increases from around 7,000 in embryonic flies to around 12,000 in adults. They also analyzed changes in expression patterns of specific genes across development, finding genes that are highly upregulated in the larval developmental stages and then essentially shut off as the fly matures. Between the sexes, they noted that adult males express around 3,000 more genes than their female counterparts. The functions of all of these genes are not yet known, but they are all clearly implicated in development, both across time and between sexes. Celniker hopes that her group’s identification of the genes will spark more targeted research in the Drosophila community.

“For me,” says Celniker, “the project will not be over until I know exactly how a single cell with its single copy of DNA turns into a complex organism like the fly. We’re not there yet, but we’re certainly assembling the building blocks.”

The chromatin landscape

With 100 million and 180 million base pairs even in organisms as “simple” as the worm and the fruit fly, each copy of DNA is simply too long to exist as a linear molecule in a tiny cell. Instead, it is condensed and packaged into chromosome pairs: the worm has six and the fruit fly has four, while humans have 23. Chromosomes are made up of chromatin, which consists of DNA wrapped around clusters of tiny proteins called histones, arranged along the DNA like beads on a string. These histone-DNA spools then supercoil around themselves in meandering loops and folds, finally forming the tightly packed structure of chromatin.

Karpen and his lab at LBL study what is called the “chromatin landscape” of the fruit fly: the hundreds of chemical tags that can be added to histones to ultimately affect levels of transcription. The modifications are then recognized by the cellular machinery that responds to these chemical signals, resulting in silencing or activation of the DNA in the tagged region of chromatin.

Histone modifications are one of several types of epigenetic mechanisms that influence gene expression. They are not encoded within the genome; rather, they impact the readout of DNA through changes to the protein components of chromatin. These epigenetic changes are also heritable, meaning the modifications are passed along through cell divisions and can lead to unique signatures amongst different cell types.

“Most of the time, people have studied these histone modifications in isolation,” says Karpen. “But what we were interested in is how they work in combination.” Using a method called chromatin immunoprecipitation (ChIP) and high-throughput sequencing, Karpen and his group were able to identify chromatin marks associated with various regions of the fly genome. By looking at different combinations of 18 specific chromatin marks, they delineated about 30 distinct chromatin states correlated with the position of genes and their levels of expression. These states included highly predictable associations with transcription start sites, gene length, silent or active regions, and even gene function. “There’s an issue here with cause and effect,” Karpen says. “It’s not just the type of modification that’s important, but where the modification is, which histone, which amino acid in that histone, what recognizes that modification, what other proteins are brought in. There’s a lot of complexity.”

Karpen stresses that this is just the beginning of this type of broader analysis of chromatin marks; although they thoroughly characterized 18 histone modifications, hundreds remain. Regardless, Karpen’s work adds another topographical layer to the genomic landscape. While functional elements control gene expression at the level of DNA and RNA, transcription and protein synthesis, epigenetic elements allow for yet another route of cell divergence, one that occurs above the level of DNA sequence. “This is really the level of dynamic genomics,” Karpen says. “I have to say, I just find the fact that we know so little incredibly exciting.”

From map to model

Once the individual research groups had all assembled their final data, Drosophila modENCODE had over 700 datasets profiling transcripts, histone modifications, and replication programs. Karpen, Celniker, and the rest of the Drosophila team then submitted their finished datasets to Manolis Kellis, head of the Computational Biology Group at the Massachusetts Institute of Technology. Kellis headed the modENCODE Data Analysis Center, which took all of the finished data and integrated it into a coherent story, creating the predictive and comparative genomics models that the consortium hopes will eventually help shed light on parallels in the human genome.

“The biggest question we asked ourselves was, how do we go beyond simple annotation? How do we compare all these datasets together to reveal new insights?” says Kellis. To do so, Kellis and his group at MIT attempted to reconstruct the full regulatory network of the fly from the pooled datasets.

To assess the completeness of their reconstructed model, Kellis’s Data Analysis Center attempted to predict gene expression




Histone modifications are one of many cellular mechanisms that work to control gene
expression. Possessing long amino acid tails (yellow), histones can be “tagged” with chemical
modifications (red). These tags are then recognized by other cellular machinery that can
work to silence or activate the DNA in that region. Histone modifications are a type of
epigenetic mechanism, meaning they are heritable but not encoded directly in the genome.

levels based solely on the expression levels of their regulators. Looking across numerous developmental stages and cell lines, Kellis’s group was able to successfully predict over 60 percent of gene expression patterns in about a quarter of the cell lines studied.

These are only very preliminary models, Kellis says, and predicting the expression patterns of an entire genome remains an enormously complex problem. For modENCODE’s first round of predictive modeling, for example, the group was only able to incorporate a certain subset of pre-transcriptional functional elements whose targets are already well established. As more and more of the targets of the newly mapped regions are characterized, Kellis and others in the computational field will be able to cast a wider net to tease out the underlying logic of genomics. “We can only assume that the rules are there and keep looking,” says Kellis. “But the reproducibility of biology tells us that these rules must exist.”

The future of ENCODE

The original draft of the human ENCODE project stated that the project would proceed in three stages: a pilot phase, a technology development phase, and a production phase. Now that modENCODE is complete and the methodologies are finally tested and refined, all that remains for ENCODE is the massive production phase. “There’s been a lot of thinking about how to go about systematically understanding the human genome, and it’s out of those conversations that modENCODE emerged,” says Kellis. The task is no less gargantuan, but with the technology and framework finally in place, a completed human ENCODE may only be a few years away.

With the modENCODE papers now published, more than 80 percent of the fruit fly genome is annotated and fully available to the public, up from about 25 percent before the project began. Yet although the consortium has assembled an impressively huge dataset, we are still unable to trace exactly how a single cell with a single copy of DNA becomes a complex living and breathing organism. The Drosophila and C. elegans genomes have been “mapped,” but it’s really only the faint outlines of function that have emerged; we do not yet know the intricate mechanisms by which each of the elements works, let alone their very specific targets. “The modENCODE project was really just interested in providing a starting map, the equivalent of the first explorers coming to the New World,” says Karpen. “We need large-scale projects like this to provide the kind of foundational knowledge that allows the more intricate mechanisms to be worked out from there.” A complete understanding of life’s genetic computations may be far off, but we now have the first maps to guide us. The dark genome is getting lighter and lighter.

Azeen Ghorayshi is a research technician in molecular and cellular biology.

Opposite: Marek Jakubowski

