Preface

RNA likely underpinned the emergence of life, yet it is arguably the least appreciated of
all biological macromolecules. For most of the past century, RNA has been regarded principally as an
intermediary between gene and protein. However, most of the human genome expresses RNAs that do not
encode proteins, which raises the question: why? The understanding of the functions of RNA and the answer to
this question are bound up with the history of molecular biology. The term ‘molecular biology’ was coined by
the mathematician Warren Weaver in 1938¹ and has become synonymous with the nature, transmission and
manifestation of genetic information, and the structure of the molecules involved. The field had its roots in the
discovery of DNA in 1869 and the identification of proteins and their enzymatic activity in the late 19th
century, key events that gave birth to “the science of the chemistry of life”. Since then, proteins and DNA
have been the primary focus of studies of cellular and developmental processes and the conceptions of
‘genes’, ‘gene expression’ and ‘gene regulation’. While it was clear early on that chromosomes are the
vehicles of inheritance, and contain DNA, RNA and proteins, for a long time it was thought that genetic
information is held in the proteins; nucleic acids seemed too simple. In the 1940s, however, it was shown that
DNA is the reservoir of genetic information, although it took some time for this finding to be accepted. The
connection between DNA and protein production was solved by the convergence of genetics and
biochemistry, mainly in experimentally amenable bacteria and fungi, which led to the breathtaking advances that
elucidated the role of ‘messenger’ RNA (mRNA) and the ‘genetic code’ in the 1950s and 1960s. The
assumption that genetic information is mostly transacted by proteins (‘one gene – one enzyme’), with RNA a
transient intermediate, became entrenched, reflecting the mechanical zeitgeist of the age. This assumption led
to many subsidiary assumptions about the nature of genetic information, and the conclusion that most of the
genomes of plants and animals are junk, based on theoretical considerations of mutational load and the
finding that protein-coding sequences occupy only a small fraction of animal and plant genomes. The naivety
of this conclusion and its superficial support by intrinsically circular assessments of the ‘neutral evolution’ of
‘non-coding’ sequences in genomes were rarely challenged. There were other assumptions as well, including
that heritable information is not transmitted from somatic cells to reproductive cells. This assertion, supported
by a peculiar 1888 experiment involving amputation of mouse tails, accompanied the so-called Modern
Synthesis in the 1930s, which reconciled Mendelian genetics with Darwinian evolution and ruled out
Lamarckian evolution, to buttress the belief that ‘mutations’ are random. Undoubtedly the biggest surprises
in the history of molecular biology were the discoveries in the 1960s and 1970s that plant and animal
genomes are replete with ‘repetitive elements’ and that their genes are mosaics of fragmented protein-coding
and flanking regulatory mRNA sequences (‘exons’) separated by extensive tracts of intervening sequences
(‘introns’), which are subsequently removed from the primary transcripts by splicing. It was immediately and
almost universally assumed that introns are evolutionary relics colonized by ‘selfish’ genetic hobos and that
the excised intronic RNA is simply degraded. Also unexpected were the discoveries in the 1980s that RNA
has catalytic capacity and, at the turn of the century, that the number of protein-coding genes in humans is
similar to that in nematode worms that only have ~1,000 cells. By contrast, the extent of intronic and
‘intergenic’ non-protein-coding DNA sequences was found to increase with developmental complexity, rising
to ~98% in humans and other mammals. High-throughput expression studies revealed that these ‘non-coding’
sequences are transcribed in spatially restricted patterns to produce hundreds of thousands of RNAs that do
not encode proteins. Many of these RNAs were subsequently shown to have regulatory and organizational
functions during differentiation and development. Here we provide an account of the development of
molecular biology from the 19th century to the present. We pay particular attention to the history of the
understanding of RNA, which has been neglected. We also discuss the founder fallacies – where initial
interpretations of limited data were generalized, became orthodox explanations and then articles of faith. Our
central theme is that the extrapolation of bacterial genetics to complex organisms, compounded by
expectational, ascertainment and interpretative biases, has led to a linked series of false
dichotomies and the misunderstanding of the roles of RNA in the transmission of genetic information. The
subsidiary theme is the clumsy progress of science. This book focuses on RNA as the main player in cell and
developmental biology, but also on chromatin composition and regulatory logic. While most educated in the
pre-genomic era were taught that gene regulation is primarily carried out by proteins, this became hard to
reconcile with the finding that genes encoding regulatory RNAs vastly outnumber protein-coding genes in
humans, and the demonstrations of widespread sequence-specific guidance of effector proteins by RNAs. The
simplicity and logic of base-pairing for sense-antisense target recognition and the ability of RNA to form
complex three-dimensional structures are almost as old as the double helix itself. The existence of regulatory
RNAs was hinted at during the early period of molecular biology by genetic observations in fruit flies and
maize, and by the appearance of unexplained bands in biochemical fractionations, but these were treated as
oddities or interpreted through the lens of transcription factors, until the genome projects revealed the full
extent of RNA expression in plants and animals. We highlight the pioneers and controversies that
accompanied the many unexpected observations, with particular attention to those that challenged the
prevailing consensus and were often ignored, at least at first. The book spans the early confusion about the functions of
proteins and nucleic acids, the elucidation of the double helix and the ‘genetic code’, the premature relegation
of RNA to intermediary between gene and protein, the strange genomes and genetics of plants and animals,
and the misguided musings that underpinned the idea of junk DNA. We chronicle the spectacular advances
brought by gene cloning and genome sequencing, the small and large regulatory RNA revolutions, and the
slowly dawning realization of the central role of transposon-derived sequences, intrinsically disordered
proteins, ‘enhancers’ and RNA-directed epigenetic processes in multicellular development, which we have
tried to integrate into a new framework for understanding genetic programming. We have cited original
references where possible, to give credit to the work of others and to provide the evidence for our assertions
and conclusions, especially in relation to the findings of the last two decades. We have also included
extensive footnotes that add detail and can be skipped, as well as suggestions for further reading. While the
story is still unfolding, we conclude that the genomes of humans and other complex organisms are not full of
junk but rather are highly compact information suites that are largely devoted to the specification of
regulatory RNAs. These RNAs drive the trajectories of differentiation and development, underpin brain
function and convey transgenerational memory of experience, much of it contrary to long-held conceptions of
genetic programming and the dogmas of evolutionary theory.
