Genome Organization:
Human
David H Kass, Eastern Michigan University, Ypsilanti, Michigan, USA
Mark A Batzer, Louisiana State University Health Sciences Center, New Orleans, Louisiana, USA
The human nuclear genome is a highly complex arrangement of two sets of 23
chromosomes, or DNA molecules. There are various types of DNA sequences and
chromosomal arrangements, including single-copy protein-encoding genes, repetitive
sequences and spacer DNA.
Advanced article
doi: 10.1038/npg.els.0003885
NATURE ENCYCLOPEDIA OF LIFE SCIENCES / © 2004 Nature Publishing Group / www.els.net
Introduction
The human genome contains about 3000 million base pairs of DNA, of which only an
estimated 1.5% [...]. As shown in Figure 1, the DNA of the eukaryotic genome can be
[...] which are functional sequences [...] such as the ribosomal RNA genes. Repetitive
sequences with no known function include the various highly repeated satellite
families, and the dispersed, moderately repeated transposable element families. The
remainder of the genome consists of spacer DNA, which is simply a broad category of
undefined DNA sequences.
The human nuclear genome consists of 23 pairs of chromosomes, or 46 DNA molecules,
of differing sizes (Table 1).
Figure 1 Broad classification of DNA sequences in the human genome. MER, medium
reiteration frequency repetitive sequence.
Table 1 DNA content of chromosomes, extrapolated from Stephens et al. (1990), based on
the length of chromosome as a percentage of the total length of the genome
Chromosome   Amount of DNA (Mb)   Chromosome   Amount of DNA (Mb)
1            249                  13           108
2            237                  14           105
3            192                  15           [illegible]
4            183                  16           [illegible]
5            174                  17           [illegible]
6            165                  18           75
7            153                  19           [illegible]
8            135                  20           [illegible]
9            132                  21           54
10           132                  22           57
11           132                  X            141
12           123                  Y            [illegible]
Sequence Complexity
The human genome contains various levels of complexity, as demonstrated by
reassociation kinetics. This involves the random shearing of DNA into small fragments
averaging about 500 bp, heat denaturation to separate the strands of the double helix,
and slow cooling. During cooling, complementary sequences anneal; the more copies
there are of a particular sequence, the greater the chance of finding a complement to
anneal to. Therefore, reassociation is dependent on time (t), as well as the initial
concentration of that sequence (C0), yielding what is referred to as a C0t value.
Analysis of the human genome estimates that 60% of the DNA is either single copy or in
very low copies; 30% of the DNA is moderately repetitive; and 10% is considered highly
repetitive.
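The dependence of reassociation on C0 and t can be made concrete with a small sketch. It assumes the ideal second-order model commonly used for C0t curves, C/C0 = 1 / (1 + k * C0t), which is a textbook simplification and is not stated in this article. The rate constants in `COMPONENTS` are hypothetical values chosen only to separate the three classes; the 60/30/10 fractions are the ones quoted above.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """Fraction of a DNA component still single-stranded at a given C0t.

    Assumes ideal second-order kinetics: C/C0 = 1 / (1 + k * C0t).
    """
    if not (c0t >= 0 and k > 0):
        raise ValueError("c0t must be >= 0 and k must be > 0")
    return 1.0 / (1.0 + k * c0t)


def c0t_half(k: float) -> float:
    """C0t value at which half of a component has reassociated (1 / k)."""
    if not k > 0:
        raise ValueError("k must be > 0")
    return 1.0 / k


# (label, fraction of genome, rate constant k); the k values are hypothetical.
COMPONENTS = [
    ("highly repetitive", 0.10, 1e3),
    ("moderately repetitive", 0.30, 1.0),
    ("single or low copy", 0.60, 1e-3),
]


def genome_fraction_single_stranded(c0t: float, components=COMPONENTS) -> float:
    """Weighted sum over components: the shape of a multi-component C0t curve."""
    return sum(frac * fraction_single_stranded(c0t, k) for _, frac, k in components)


if __name__ == "__main__":
    for exponent in range(-4, 5):
        c0t = 10.0 ** exponent
        print(f"C0t = 1e{exponent:+d}  single-stranded = "
              f"{genome_fraction_single_stranded(c0t):.3f}")
```

Highly repeated components have a large k and therefore a small C0t½, which is why they reassociate first as the sample cools.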
Various staining techniques demonstrate alternative banding patterns of mitotic
chromosomes referred to as karyograms. Although the three broad classes of DNA are
scattered throughout the chromosome, chromosomal banding patterns reflect levels of
compartmentalization of the DNA. The C-banding technique yields dark-staining regions
of the chromosome (or C bands), referred to as heterochromatin. These regions are
highly coiled, contain highly repetitive DNA and are typically found at the
centromeres, telomeres and on the Y chromosome. They are composed of long arrays of
tandem repeats and therefore some may contain a nucleotide composition that differs
significantly from the remainder of the genome (approximately 40-42% GC). This means
that they can be separated from the bulk of the genome by buoyant density (caesium
chloride) gradient centrifugation. Gradient centrifugation results in a major band and
three minor bands referred to as satellite bands, hence the term satellite DNA.
The G-banding technique yields a pattern of alternating light and dark bands
reflecting variations in base composition, time of replication, chromatin conformation
and the density of genes and repetitive sequences. Therefore, the karyograms define
chromosomal organization and allow for identification of the different chromosomes.
The darker bands, or G bands, are comparatively more condensed, more AT-rich, less
gene-rich and replicate later than the DNA within the pale bands, which correspond to
the R bands by an alternative staining technique. More recently these alternative
banding patterns have been correlated to the level of compaction of scaffold-attachment
regions (SARs).
The human genome may also be compartmentalized into large segments of DNA with
distinctive GC richness referred to as 'GC content domains' (Lander et al., 2001).
There is a distinct association between GC-richness and gene density. This is
consistent with the association of most genes with CpG islands, the 500-1000-bp GC-rich
segments flanking (usually at the 5' end) most housekeeping and many tissue-specific
genes. The clustering of CpG islands, as demonstrated by fluorescence in situ
hybridization (Craig and Bickmore, 1994), further depicts gene-poor and gene-rich
chromosomal segments.
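GC richness and CpG enrichment are simple to compute from sequence. The sketch below is illustrative only: the function names are hypothetical, and the observed/expected CpG ratio (CpG count x length / (C count x G count)) is a commonly used measure that this article does not itself define.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def gc_windows(seq: str, window: int, step: int):
    """GC content in successive windows, as (start, gc_fraction) pairs."""
    if not (window > 0 and step > 0):
        raise ValueError("window and step must be positive")
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Scanning a chromosome with `gc_windows` at a large window size gives a rough picture of GC content domains, while a small window combined with `cpg_observed_expected` is the usual starting point for flagging candidate CpG islands. Any thresholds would have to be chosen by the reader.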
Additionally, there are five human chromosomes (13, 14, 15, 21, 22) distinguished at
their terminus by a thin bridge with rounded ends referred to as chromosomal
satellites. These contain repeats of genes coding for rRNA and ribosomal proteins that
coalesce to form the nucleolus and are known as the nucleolar organizing regions.
Single-copy Sequences
Although originally defined as a functional unit of heredity, a gene may be defined as
an expressed segment of DNA, containing transcriptional regulatory sequences. Venter et
al. (2001) estimated that there are 26 588 protein-encoding transcripts with an
additional 12 000 computationally derived genes. This is consistent with the
30 000-40 000 estimate of the international human genome sequencing consortium (Lander
et al., 2001). In addition, over 200 noncoding RNA genes have been identified with over
5000 related genes, of which most are pseudogenes (see below).
The proportion of the genome consisting of genes would be estimated at 15-20% assuming
an average gene size of 15 kb. However, approximately 90% of the DNA in protein-coding
genes is noncoding, including upstream and downstream regulatory sequences and introns.
Therefore, only 1.5-2% (45-60 Mb) of DNA has coding function. Regulatory sequences
include common promoters recognized by transcription factors located at specific
upstream distances from the transcription start site. These sequences include the
TATA, CCAAT and GC boxes. There are also tissue-specific promoter sequences. Enhancers
and silencers are cis-acting elements that upregulate and downregulate gene expression,
respectively. Many coding sequences may be included as members of gene families as
described below. In addition, there may be single-copy sequences in the spacer DNA with
no known (determinable) function.
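The percentages in this section follow from a short calculation. The sketch below uses only figures given in the text (a 3000 Mb genome, 30 000-40 000 genes, an average gene size of 15 kb, and about 90% of genic DNA being noncoding); the function and constant names are hypothetical.

```python
GENOME_SIZE_MB = 3000.0        # about 3000 million base pairs
AVERAGE_GENE_SIZE_KB = 15.0    # assumed average gene size
CODING_SHARE_OF_GENE = 0.10    # about 90% of genic DNA is noncoding


def gene_space(gene_count: int) -> dict:
    """Genic and coding DNA, in Mb and as a percentage of the genome."""
    genic_mb = gene_count * AVERAGE_GENE_SIZE_KB / 1000.0
    coding_mb = genic_mb * CODING_SHARE_OF_GENE
    return {
        "genic_mb": genic_mb,
        "coding_mb": coding_mb,
        "genic_percent": 100.0 * genic_mb / GENOME_SIZE_MB,
        "coding_percent": 100.0 * coding_mb / GENOME_SIZE_MB,
    }


if __name__ == "__main__":
    for count in (30_000, 40_000):
        result = gene_space(count)
        print(count, {name: round(value, 2) for name, value in result.items()})
```

With 30 000 and 40 000 genes this reproduces the 15-20% genic and 1.5-2% (45-60 Mb) coding estimates.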
Repetitive Sequences
In the human genome various sized stretches of DNA sequences exist in variable copy
numbers. These repetitive sequences may be in a tandem orientation and/or dispersed
throughout the genome. Repetitive sequences may be classified by function, dispersal
patterns and sequence relatedness. Satellite DNA typically refers to highly repetitive
sequences with no known function; gene families are DNA sequences, with at least one
functional gene, related by sequence homology and/or function; and interspersed repeat
sequences are typically the products of transposable element integrations, but may
include retropseudogenes of a functional gene.
Macrosatellites, Minisatellites and
Microsatellites
Macrosatellites are very long arrays, up to hundreds of kilobases, of tandemly
repeated DNA. The three satellite bands observed by buoyant density centrifugation
represent sections of the human genome containing highly repeated DNA that in effect
alter the proportion of nucleotides from the rest of the genome. However, not all
satellite sequences are resolved by density gradient centrifugation. Alpha satellite
DNA or alphoid DNA constitutes the bulk of centromeric heterochromatin on all
chromosomes. The interchromosomal divergence of the alpha satellite families allows the
different chromosomes to be distinguished by fluorescence in situ hybridization
(FISH).
Minisatellites are tandemly repeated sequences of DNA, yielding a total length from
less than 1 kbp to 15 kbp. One subset of minisatellites comprises the highly
polymorphic arrays of short tandem repeats with no known function that serve as useful
DNA markers referred to as variable number tandem repeats (VNTRs). These sequences
generally contain 1-5 kbp of DNA in repeating units of 15-100 nucleotides. Several
minisatellites share enough sequence homology to be analysed by a single probe,
yielding DNA fingerprints. An example is the 10-15-bp core sequence of myoglobin
minisatellites, which includes an almost invariant sequence (GGGCAGGANG) shared among
several polymorphic VNTR loci.
Telomeric DNA sequences contain another subset of minisatellites. The telomeric
sequences contain 10-15 kb of hexanucleotide repeats, most commonly TTAGGG in the
human genome, at the termini of the chromosomes. These sequences are added by
telomerase to ensure complete replication of the chromosome. Telomeres of somatic cells
are generally shorter than in germ cells, illustrated by their decreasing size within
human B cells and skin cells with increasing age. In humans, it has been postulated
that telomeric loss is associated with ageing and tumorigenesis.
Microsatellites are small arrays of short simple tandem repeats, primarily 4 bp or
less. Different arrays are found dispersed throughout the genome, although dinucleotide
CA/TG repeats are most common, yielding 0.5% of the genome. Runs of As and Ts are
common as well. Microsatellites have no known functions. However, CA/TG dinucleotide
pairs can form the Z-DNA conformation in vitro, which is possibly indicative of
function. Repeat unit copy number variation of microsatellites apparently occurs by
replication slippage, yielding highly polymorphic DNA markers referred to as short
tandem repeat polymorphisms (STRPs). STRPs are commonly used in commercial DNA
fingerprinting kits. The expansion of trinucleotide repeats within genes has been
associated with genetic disorders such as Huntington disease, myotonic muscular
dystrophy, Friedreich ataxia and fragile-X syndrome.
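Perfect tandem arrays such as CA repeats or the telomeric TTAGGG repeat can be located with a regular expression. The following is a minimal sketch, not a production repeat finder: the function name, the default unit size of 4 bp and the minimum of five copies are assumptions for illustration, and interrupted or degenerate repeats are not detected.

```python
import re


def find_tandem_repeats(seq: str, max_unit: int = 4, min_repeats: int = 5):
    """Locate perfect tandem repeats of short units.

    Returns (start, unit, copies) tuples. The unit group is matched lazily,
    so the shortest repeating unit is reported (e.g. "A", not "AA").
    """
    if not (max_unit >= 1 and min_repeats >= 2):
        raise ValueError("max_unit must be >= 1 and min_repeats >= 2")
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("GGCACACACACATT"))
    print(find_tandem_repeats("TTAGGG" * 4, max_unit=6, min_repeats=3))
```

Comparing the `copies` value for the same locus across individuals is, in outline, how repeat number polymorphisms such as STRPs are scored.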
Gene Families
Gene families generally consist of a set of genes with high sequence homology over
their entire length, primarily in the exons for protein-encoding gene families. Members
of gene families, or possibly separate clusters of the same gene family, are considered
paralogous, and are derived from an ancestral gene or locus by duplication, and are
therefore evolutionarily and functionally related. The duplication of a gene, however,
may yield a nonfunctional pseudogene (see below). Additionally, there are genes
yielding products with weak overall sequence homology, but that are homologous at
functionally conserved domains or short amino acid motifs, collectively forming an
additional type of gene family. A group of genes with functionally and structurally
related products with weak sequence homologies and lacking conserved amino acid motifs
may be referred to as a gene superfamily. Only a limited number of examples will be
discussed.
Gene families with essentially identical
products
If the cell warrants numerous proteins or RNA molecules, one solution might be the
production of multiple functional copies of a gene. The human genome, and eukaryotic
genomes in general, have amplified a number of genes whose products are responsible for
general purpose functions such as DNA replication and protein synthesis.
Histone genes are highly conserved among eukaryotes and have a fundamental role in
chromatin structure. The histone family consists of five genes that tend to be linked,
although in differing arrays of variable copy numbers dispersed in the human genome.
The individual genes of a particular histone family encode essentially identical
products (i.e. H4 genes yield the same H4 protein). Analysis of individual human
genomic clones has identified isolated histone genes (e.g. H1), clusters of two or more
histone genes, or clusters of all histone genes (e.g. H3-H4-H1-H3-H2A-H2B) (Hentschel
and Birnstiel, 1981). A majority of histone genes form a large cluster on human
chromosome 6 (6p21.3) and a small cluster at 1q21. Additionally, histone genes lack
introns, a rare feature for eukaryotic genes.
Genes that encode ribosomal RNA (rRNA), inclusive of the spacer units, total about
0.4% of the DNA in the human genome. The individual genes of a particular rRNA family
are essentially identical. The 28S, 5.8S and 18S rRNA genes are clustered with spacer
units (ETS (external transcribed spacer) and ITS (internal transcribed spacer)), in
tandem arrays of approximately 60 copies each, yielding about 2 Mbp of DNA. These
clusters are present on the short arms of five acrocentric chromosomes and form the
nucleolar organizing regions, hence approximately 300 copies. These three rRNA genes
are transcribed as a single unit (yielding 45S rRNA) and then cleaved. 5S rRNA genes
are clustered on chromosome 1q.
There are an estimated 30 human transfer RNA (tRNA) genes. tRNA genes and their
pseudogenes are dispersed on at least seven chromosomes (McBride et al., 1989). In
addition, tRNA genes have been found in various clusters, i.e. cloned genomic fragments
have been isolated containing several tRNA genes. Dispersal of tRNA pseudogenes may
have occurred by RNA-mediated retroposition (McBride et al., 1989). This is consistent
with the postulation that various SINE families (see below) have been derived from tRNA
genes (Deininger and Batzer, 1993).
Small nuclear RNA (snRNA) molecules function in RNA processing. There are six
families of related snRNA genes, termed U1 to U6, that are dispersed among the
chromosomes. However, differing cluster patterns have been observed for these genes on
different chromosomes. For example, 35-100 functional U1 genes, all sharing 20 kb of
nearly identical 5' and 3' flanking sequences, are loosely clustered in chromosome
1p36, and contain over 44 kb of intergenic sequences, whereas 10-20 U2 genes are
clustered in a tight, virtually perfect 6-kb repeat unit on 17q21-q22 (Lindgren et al.,
1985). In addition, more than one subfamily of a U snRNA has been identified; U3
comprises at least two subfamilies, which differ in the flanking sequences. Also,
pseudogenes of snRNA have been identified and are thought to be dispersed in the genome
by retroposition. tRNA genes are also found clustered with U RNA genes; for example,
chromosome band 1p36 contains 15-30 copies each of U1, Glu tRNA and Asn tRNA genes (van
der Drift et al., 1994).
Gene families with high sequence homologies
There are numerous families of genes in the human genome sharing extensive intrafamily
homology. These are generally dispersed, but may contain linked members. One of the
most comprehensively studied gene families is the haemoglobin family. Human haemoglobin
is a tetrameric protein consisting of two α-globin and two β-globin subunits. There are
several possible polypeptides constituting the haemoglobin molecule, with differing
physiological properties and ontological regulation. This probably occurred as a result
of gene duplication allowing for divergence of sequences for procuring new function.
The two globin families exist as clusters of genes and pseudogenes on separate
chromosomes. The α [...] chromosome 16 and the β [...] related in sequence [...]
homology. Therefore, intracluster duplications postdate the duplication of the
ancestral gene yielding α- and β-globin. The ontological regulation is apparently
coordinated on each cluster by upstream sequences, providing the expression of the gene
best suited for the oxygen environment (a fetus, for example, exists in a relatively
hypoxic environment). Predating haemoglobin divergence is the divergence of haemoglobin
and myoglobin from an ancestral gene; [...] whereas haemoglobin is the oxygen carrier
in blood.
Proto-oncogenes are also gene family members. These genes contribute to neoplasia when
their expression (i.e. sequence or level) is altered. The gene products have functions
such as secreted growth factors (e.g. Wnt gene family), cell surface receptors (e.g.
erbB gene family), intracellular signal transducers (e.g. ras gene family) and
DNA-binding proteins (e.g. myc gene family). The Wnt gene family consists of at least
18 structurally related genes functioning in various aspects of growth and
differentiation. They contain an N-terminal secretory signal peptide, a short domain of
low sequence conservation, and a highly conserved block (ranging from 40 to 95%) of
about 300 amino acids, [...] highly conserved motifs, and conservation of spacing of 22
cysteine residues. The Wnt genes map to different chromosomes, some demonstrating
conservation of synteny (Bergstein et al., 1997).
There are four erbB genes. These are epidermal growth factor receptors (EGFR) grouped,
as are other receptor tyrosine kinases, into a family based on the sequence homology of
their kinase domains, their structure and the structural similarity of their ligands.
The myc genes are members of the basic helix-loop-helix family of transcription
factors. Functional members, including c-myc, L-myc and N-myc, are not linked
genetically, with the latter two demonstrating more restricted patterns of expression,
but they share a three-exon, two-intron structure. A detailed sequence analysis of myc
genes suggests that the progenitor of the N-myc and L-myc genes was a duplicated c-myc
gene (Atchley and Fitch, 1995). The ras genes represent a subfamily of guanosine
triphosphate (GTP) binding proteins, found dispersed in the genome. N-ras, H-ras and
K-ras are closely related genes encoding a p21ras product. Additional members of this
family include TC21 and R-ras.
There are many other examples of multigene families in the human genome and some of
these are listed in Table 2.
Gene families with low sequence homology
but functionally conserved domains
Some sequences in the human genome share highly conserved amino acid domains with weak
overall homologies. These often have developmental function. There are nine dispersed
paired box (Pax) genes that contain highly conserved DNA-binding domains with six α
helices. The homeobox or Hox genes share a common 60 amino acid encoding sequence. In
humans there are four Hox gene clusters, on different chromosomes. However, the
individual genes in the cluster demonstrate greater homology to a counterpart gene on
another cluster than to the other genes on the same cluster.
Gene families with different products but
conserved short amino acid motifs
Some genes are considered families based not on entire-
length sequence homology, but on conserved short amino
acid motifs. DEAD box genes encode products with RNA
helicase activity, and share eight short amino acid motifs,
including the DEAD box (Asp-Glu-Ala-Asp). However,
there are other gene families with conserved amino acid
motifs, such as the WD box, that provide different functions. The WD box genes are
characterized by between four and eight tandem repeats of a core sequence of fixed
length terminating in a WD dipeptide.
Gene Superfamilies
DNA sequences that yield functionally and structurally related products with weak
sequence homology and lacking significantly conserved amino acid motifs may be grouped
as a gene superfamily. However, different families of genes may comprise a superfamily.
Genes of the immunoglobulin superfamily encode proteins that form dimers consisting of
extracellular variable domains at the N-terminus and constant domains at the
C-terminus. Members of the immunoglobulin superfamily include immunoglobulin, human
leucocyte antigen (HLA), T-cell receptor (TCR), T4 and T8 genes. Another example
includes three superfamilies of growth factor receptors: (1) proteins with a core
structure of seven transmembrane α-helical sequences; (2) large glycoproteins generally
possessing a single transmembrane sequence and tyrosine kinase activity (includes the
EGFR erbB family described above); and (3) single transmembrane proteins lacking kinase
activity.
Table 2 Examples of interspersed multigene families in the human genome
Gene                                        Number of functional genes   Estimated number of pseudogenes
Actin                                       4                            >16
Aldolase                                    3                            2
Argininosuccinate synthetase                1                            4
β-Tubulin                                   2                            15-20
Cytochrome c                                2                            20-30
Ferritin heavy chain                        1                            [illegible]
Glyceraldehyde-3-phosphate dehydrogenase    1                            25
Ribosomal protein L32                       1                            20
Triose-phosphate isomerase                  1                            5-6
Transposable Elements
The human genome contains interspersed repeat sequences that have largely amplified in
copy number by movement throughout the genome. These sequences are referred to as
transposable elements. Almost all transposition has occurred via an RNA intermediate,
yielding classes of sequences referred to as retrotransposons or retroposons (Figure
2). However, there is also evidence of a DNA-mediated transposon family (pogo) in the
human genome (Robertson, 1996).
Short and long interspersed DNA elements (SINEs and LINEs, respectively) are the
primary families of transposable elements in the human genome. These are referred to as
retroposons, since they lack the long terminal repeats (LTRs) of retroviruses, and are
amplified via an RNA intermediate. LINEs are sometimes referred to as non-LTR
retrotransposons because sequences in the elements code for enzymes utilized in the
retroposition process.
The Alu element is estimated at over one million copies in the human genome,
representing the primary SINE family (Lander et al., 2001). Sequence comparisons
suggest that Alu repeats were derived from the 7SL RNA gene. Each Alu element is about
280 bp with a dimeric structure, contains RNA polymerase III promoter sequences, and
typically has an A-rich tail and flanking direct repeats (generated during
integration). Approximately 5000 Alu elements have integrated in the [...] DNA markers
for the study of human population genetics (Deininger and Batzer, 1993). [...] Alu
elements predominate in [...] R bands [...]. Alu elements and LINEs appear to amplify
by the activity of [...].
Figure 2 RNA-mediated transposable elements in the human genome. Each contains flanking
direct repeats [...]. The human endogenous retrovirus contains long terminal repeats
(LTRs) [...], gag (group-specific antigen gene), pol (polymerase gene) and env
(envelope gene). The THE-1 retrotransposon consists of an open reading frame (ORF) and
LTRs. The non-LTR retrotransposon (L1) contains internal RNA polymerase II promoter
sequences, two open reading frames and a poly(A) tail. The Alu element has a dimeric
structure of homologous halves separated by a middle A-rich region [...]. The left half
contains [...] RNA polymerase III promoter sequences; the 3' half contains an
additional internal 31 bp. For L1 and Alu elements, [...] regions are sequences unique
to these elements.
LINEs (or L1 elements) are estimated at over 500 000 copies and are predominantly
found in chromosomal G bands. A full-length LINE is approximately 6 kbp, though most
are truncated pseudogenes (see below) with [...]. About 2% [...] full-length L1s have
functional RNA polymerase II promoter sequences along [...] individual LINE [...] are
flanked by direct repeats. LINE mobilization activity has been verified in both
germinal and somatic [...].
SINEs and LINEs additionally contribute to the evolution of the genome by yielding
sites for unequal homologous recombination. Within the low-density lipoprotein (LDL)
receptor gene alone there have been several alterations attributed to recombination at
various sites, [...] familial hypercholesterolaemia.
The human genome also contains families of retroviral-related sequences. These are
characterized by sequences encoding enzymes for retroposition and contain LTRs.
However, most of these sequences are defunct, truncated and mutated retroviral
elements. The endogenous retroviruses may have originally been incorporated into the
genome following retroviral infection of the germ cell. In addition, solitary LTRs of
these elements may be located throughout the genome. There are several low abundant
(10-1000 copies) human endogenous retrovirus (HERV) families, with individual elements
ranging from 6 to 10 kb. Overall, LTR elements encompass approximately 8% of the
genome.
The transposon-like human element (THE-1) contains the long terminal repeats of
integrated retroviral genomes, but lacks sequences for enzymes involved in
retroposition. Therefore, this interspersed DNA family is tentatively characterized as
a retrotransposon. The [...] THE-1 sequence is estimated at 10 000 copies with an
additional 10 000 solitary LTRs.
There are pseudogenes (see below) that are the result of retroposition
(retropseudogenes). These pseudogenes lack introns and the flanking DNA sequences of
the functional locus and therefore are not products of gene duplication. The generation
of these types of elements is dependent on the reverse transcriptase of other elements
such as LINEs.
Medium reiteration frequency repetitive sequences (MERs) represent a broad group of
families of uncharacterized interspersed sequences. The mechanisms for amplification of
these sequences are unclear and therefore may not warrant inclusion as a transposable
element. However, it has been suggested that some of these sequences are replicated and
disseminated by DNA viruses, indicated by the presence of MER sequences in SV40
recombinants. Copy numbers of MER families range from 200 to 10 000, collectively
yielding 100 000-200 000 copies.
Pseudogenes
Pseudogenes are nonfunctional copies of a gene containing part or all of the original
sequence. Pseudogenes may arise by tandem duplications, accumulating mutations as a
result of the lack of selection pressure, and are usually recognizable by lack of an
open reading frame. An example is the globin pseudogenes. A processed pseudogene or
retropseudogene is derived by an RNA intermediate. The characteristic features of these
sequences are that they lack regulatory sequences, and therefore are normally incapable
of expression, and they lack introns that have been spliced during RNA processing.
There may be as many as 20-30 retropseudogenes that have arisen from one parental
functional gene, e.g. ribosomal protein L32 and glyceraldehyde-3-phosphate
dehydrogenase. However, SINEs and LINEs represent the most abundant families of
retropseudogenes. Processed pseudogenes may be derived from RNA genes as well. Evidence
for tRNA 'retro' pseudogenes are the CCA sequences at the 3' end, which are added
post-transcriptionally to the functional tRNA.
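Loss of an open reading frame is straightforward to test computationally. The sketch below reports the longest ORF, in codons, on either strand of a DNA sequence. It is a simplified illustration with hypothetical function names: it assumes the standard ATG start and TAA/TAG/TGA stop codons, requires a stop codon to close each ORF, and ignores introns, so it suits processed (intronless) sequences only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _longest_orf_one_strand(seq: str) -> int:
    """Longest ATG...stop ORF in the three forward frames, in codons
    (start codon included, stop codon excluded)."""
    best = 0
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def longest_orf(seq: str) -> int:
    """Longest ORF, in codons, over all six reading frames."""
    seq = seq.upper()
    return max(
        _longest_orf_one_strand(seq),
        _longest_orf_one_strand(reverse_complement(seq)),
    )
```

A pseudogene copy that has acquired a premature stop codon or a frameshift will typically show a much shorter `longest_orf` than its functional parent; what counts as "much shorter" is left to the reader.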
Mitochondrial Genome
The mitochondrion contains an autonomously replicating genome. The human
mitochondrial genome contains 16 569 bp encoding 37 genes, including tRNA and rRNA
genes used for mitochondrial protein synthesis. Mitochondrial DNA (mtDNA) is maternally
inherited and generally there are thousands of copies of mtDNA in a cell.
Genome Evolution
The fact that genomes vary considerably among organisms is indicative of the highly
dynamic nature of the genome. However, comparisons of human and chimpanzee DNA
demonstrate 98-99% sequence identity. This poses an interesting question as to what
makes us human. By FISH and karyotype [...] has occurred, although [...] a shuffling of
different [...] between humans and chimpanzees [...] there is conservation of synteny
of [...], i.e. genes that are linked in humans are also linked [...].
Genomic rearrangements may alter temporal [...] expression of genes, as a result of the
chromosomal location of the gene(s), or possibly as a function of a shift in gene
imprinting, consequently yielding phenotypic variation. Small [...] nucleotide changes
may alter the biochemical nature of the protein product, also contributing to
phenotypic differences. Alterations in regulatory sequences, possibly by the
incorporation of retroposed sequences, may also contribute to phenotypic variation.
Although chimpanzees and humans share nearly all integrated Alu elements, there has
been dispersal of human-specific Alu sequences. It is possible that exon shuffling has
played a major role. Exon shuffling occurs by unequal homologous recombination in
introns, and may be a reason for the existence and maintenance of introns (Gilbert,
1978), providing a source for the generation of new proteins/gene families that are
composites of differing functional domains as outlined in this review.
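A figure such as 98-99% sequence identity is obtained by comparing aligned sequences position by position. The following minimal sketch assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it skips gapped columns. That gap convention is an assumption made here for illustration; other conventions count gaps as mismatches and give lower values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # assumption: gapped columns are not scored
        compared += 1
        matches += base_a == base_b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Because insertions and deletions are excluded under this convention, a value computed this way reflects nucleotide substitutions only.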
Acknowledgements
DHK was supported by an Eastern Michigan University Spring/Summer Research Award and
a University Research in Excellence Fund. MAB was supported by award 1999-IJ-CX-K009
from the Office of Justice Programs, National Institute of Justice, Department of
Justice. Points of view in this document are those of the authors and do not
necessarily represent the official position of the US Department of Justice.
References
Atchley WR and Fitch WM (1995) Myc and Max: molecular evolution of a family of
proto-oncogene products and their dimerization partner. Proceedings of the National
Academy of Sciences of the USA 92: 10217-10221.
Bergstein I, Eisenberg LM, Bhalerao J et al. (1997) Isolation of two novel WNT genes,
WNT14 and WNT15, one of which (WNT15) is closely linked to WNT3 on human chromosome
17q21. Genomics 46: 450-458.
Craig JM and Bickmore WA (1994) The distribution of CpG islands in mammalian
chromosomes. Nature Genetics 7: 376-382.
Deininger PL and Batzer MA (1993) Evolution of retroposons. Evolutionary Biology 27:
157-196.
van der Drift P, Chan A, van Roy N et al. (1994) A multimegabase cluster of snRNA and
tRNA genes on chromosome 1p36 harbours an adenovirus/SV40 hybrid virus integration
site. Human Molecular Genetics 3: 2131-2136.
Gilbert W (1978) Why genes in pieces? Nature 271: 501.
Hentschel CC and Birnstiel ML (1981) The organization and expression of histone gene
families. Cell 25: 301-313.
Lander ES, Linton LM, Birren B et al. (2001) Initial sequencing and analysis of the
human genome. Nature 409: 860-921.