BIOCHEMISTRY
Journey to the Genetic Interior
What was once known as junk DNA turns out to hold hidden treasures, says computational biologist Ewan Birney
Interview by Stephen S. Hall
In the 1970s, when biologists first glimpsed the landscape of human genes, they saw that the small pieces of DNA that coded for proteins (known as exons) seemed to float like bits of wood in a sea of genetic gibberish. What on earth were those billions of other letters of DNA there for? No less a molecular luminary than Francis Crick, co-discoverer of DNA’s double-helical structure, suspected it was “little better than junk.”
The phrase “junk DNA” has haunted human genetics ever since. In 2000, when scientists of the Human Genome Project presented the first rough draft of the sequence of bases, or code letters, in human DNA, the initial results appeared to confirm that the vast majority of the sequence—perhaps 97 percent of its 3.2 billion bases—had no apparent function. The “Book of Life,” in other words, looked like a heavily padded text.

But beginning roughly at that same time, a consortium of dozens of international laboratories embarked on a massive, unglamorous and largely unnoticed project to annotate what one biologist has called the “humble, unpretentious non-gene” parts of the human genome. Known as the Encyclopedia of DNA Elements (ENCODE for short), the project required scientists, in essence, to crawl along the length of the double helix as they attempted to identify anything with a biological purpose. In 2007 the group published a preliminary report hinting that, like the stuff all of us park in the attic, there were indeed treasures aplenty amid the so-called junk.

Now, in a series of papers published in September in Nature (Scientific American is part of Nature Publishing Group) and elsewhere, the ENCODE group has produced a stunning inventory of previously hidden switches, signals and signposts embedded like runes throughout the entire length of human DNA. In the process, the ENCODE project is reinventing the vocabulary with which biologists study, discuss and understand human inheritance and disease.

Ewan Birney, 39, of the European Bioinformatics Institute in Cambridge, England, led the analysis by the more than 400 ENCODE scientists who annotated the genome. He recently spoke with Scientific American about the major findings. Excerpts follow.

IN BRIEF
who
EWAN BIRNEY
vocation/avocation
“Cat herder in chief” of the ENCODE consortium of 400 geneticists from around the world
where
European Bioinformatics Institute, Cambridge, England
research focus
Creating an encyclopedia detailing what the most mysterious parts of the human genome do
big picture
“I get this strong feeling that previously I was ignorant of my own ignorance, and now I understand my ignorance.”

Scientific American: The ENCODE project has revealed a landscape that is absolutely teeming with important genetic elements—a landscape that used to be dismissed as “junk DNA.” Were our old views of how the genome is organized too simplistic?

Birney: People always knew there was more there than protein-coding genes. It was always clear that there was regulation. What we didn’t know was just quite how extensive this was.

Just to give you a sense here, about 1.2 percent of the bases are in protein-coding exons. And people speculated that “maybe there’s the same amount again involved in regulation or maybe a little bit more.” But even if we take quite a conservative view from our ENCODE data, we end up with something like 8 to 9 percent of the bases of the genome involved in doing something like regulation.

Scientific American: Thus, much more of the genome is devoted to regulating genes than to the protein-coding genes themselves?

Birney: And that 9 percent can’t be the whole story. The most aggressive view of the amount we’ve sampled is 50 percent. So certainly it’s going to go above 9 percent, and one could easily argue for something like 20 percent. That’s not an unfeasible number.

Scientific American: Should we be retiring the phrase “junk DNA” now?

Birney: Yes, I really think this phrase does need to be totally expunged from the lexicon. It was a slightly throwaway phrase to describe very interesting phenomena that were discovered in the 1970s. I am now convinced that it’s just not a very useful way of describing what’s going on.

Scientific American: What is one surprise you have had from the “junk”?

Birney: There has been a lot of debate, inside of ENCODE and outside of the project, about whether or not the results from our experiments describe something that is really going on in nature. And then there was a rather more philosophical question, which is whether it matters. In other words, these things may biochemically occur, but evolution, as it were, or our body doesn’t actually care.

That debate has been running since 2003. And then work by ourselves, but also work outside of the consortium, has made it much clearer that the evolutionary rules for regulatory elements are different from those for protein-coding elements. Basically the regulatory elements turn over a lot faster. So whereas if you find a particular protein-coding gene in a human, you’re going to find nearly the same gene in a mouse most of the time, and that rule just doesn’t work for regulatory elements.

Scientific American: In other words, there is more complex regulation of genes, and more rapid evolution of these regulatory elements, in humans?

Birney: Absolutely.

Scientific American: That’s a rather different way of thinking about genes—and evolution.

Birney: I get this strong feeling that previously I was ignorant of my own ignorance, and now I understand my ignorance. It’s slightly depressing as you realize how ignorant you are. But this is progress. The first step in understanding these things is having a list of things that one has to understand, and that’s what we’ve got here.

Scientific American: Earlier studies suggested that only, say, 3 to 15 percent of the genome had functional significance—that is, actually did something, whether coding for proteins, regulating how the genes worked or doing something else. Am I right that the ENCODE data imply, instead, that as much as 80 percent of the genome may be functional?

Birney: One can use the ENCODE data and come up with a number between 9 and 80 percent, which is obviously a very big range. What’s going on there? Just to step back, the DNA inside of our cells is wrapped around various proteins, most of them histones, which generally work to keep everything kind of safe and happy. But there are other types of proteins called transcription factors, and they have specific interactions with DNA. A transcription factor will bind only at 1,000 places, or maybe the biggest bind is at 50,000 specific places across the genome. And so, when we talk about this 9 percent, we’re really talking about these very specific transcription-factor-to-DNA contacts.

On the other hand, the copying of DNA into RNA seems to happen all the time—about 80 percent of the genome is actually transcribed. And there is still a raging debate about whether this large amount of transcription is a background process that’s not terribly important or whether the RNA that is being made actually does something that we don’t yet know about.

Personally, I think everything that is being transcribed is worth further exploration, and that’s one of the tasks that we will have to tackle in the future.

Scientific American: There is a widespread perception that the attempts to identify common genetic variants related to human disease through so-called genome-wide association studies, or GWAS, have not revealed that much. Indeed, the ENCODE results now show that about 75 percent of the DNA regions that the GWAS have previously linked to disease lie nowhere near protein-coding genes. In terms of disease, have we been wrong to focus on mutations in protein-coding DNA?

Birney: Genome-wide association studies are very interesting, but they are not some magic bullet for medicine. The GWAS situation had everyone sort of scratching their heads. But when we put these genetic associations alongside the ENCODE data, we saw that although the loci are
I think that each time we always said, “These are foundations. You build on them.” Nobody said, “Look, the human genome bases, that’s it. It’s all done and dusted—we’ve just got a bit of code breaking to do here.” Everybody said,

ing science. I am only one of 400 investigators, and I am the person who is charged to make sure that the analysis was delivered and that it all worked out. But I had to draw on the talents of many, many people.

MORE TO EXPLORE
The ENCODE project: Encyclopedia of DNA Elements: www.genome.gov/10005107

SCIENTIFIC AMERICAN ONLINE
Discover more about DNA at ScientificAmerican.com/oct2012/genes