Genome Organization: Human

Advanced article

David H Kass, Eastern Michigan University, Ypsilanti, Michigan, USA
Mark A Batzer, Louisiana State University Health Sciences Center, New Orleans, Louisiana, USA

NATURE ENCYCLOPEDIA OF LIFE SCIENCES © 2004 Nature Publishing Group / www.els.net

The human genome is a highly complex arrangement of two sets of 23 chromosomes, or 46 DNA molecules. There are various types of DNA sequences and chromosomal elements, including single-copy protein-encoding genes, repetitive sequences and spacer DNA.

Introduction

The human genome contains about 3000 million base pairs (bp) of DNA, of which only an estimated 1.5% codes for protein. As shown in Figure 1, the DNA of the eukaryotic genome can be divided into single-copy and repetitive sequences. Some repetitive sequences are functional, such as the ribosomal RNA genes. Repetitive sequences with no known function include the various highly repeated satellite families and the dispersed, moderately repeated transposable element families. The remainder of the genome consists of spacer DNA, which is simply a broad category of undefined DNA sequence.

Figure 1 Broad classification of DNA sequences in the human genome. MER, medium reiteration frequency repetitive sequence. [Figure not reproduced.]

The human nuclear genome consists of 23 pairs of chromosomes, or 46 DNA molecules, of differing sizes (Table 1).

Table 1 DNA content of chromosomes, extrapolated from Stephens et al. (1990), based on the length of each chromosome as a percentage of the total length of the genome

Chromosome   Amount of DNA (Mb)   Chromosome   Amount of DNA (Mb)
1            249                  13           108
2            237                  14           105
3            192                  15           [illegible]
4            183                  16           [illegible]
5            [illegible]          17           [illegible]
6            165                  18           [illegible]
7            153                  19           [illegible]
8            [illegible]          20           [illegible]
9            132                  21           [illegible]
10           132                  22           [illegible]
11           132                  X            [illegible]
12           [illegible]          Y            [illegible]

Sequence Complexity

The human genome contains various levels of complexity, as demonstrated by reassociation kinetics. This involves the random shearing of DNA into small fragments averaging about 500 bp, heat denaturation to separate the strands of the double helix, and slow cooling.
During cooling, complementary sequences anneal; the more copies there are of a particular sequence, the greater the chance of finding a complement to anneal to. Therefore, reassociation depends on time (t) as well as on the initial concentration of that sequence (C0), and is expressed as a C0t value. Analysis of the human genome estimates that 60% of the DNA is either single copy or present in very few copies, 30% is moderately repetitive and 10% is highly repetitive.

Various staining techniques reveal alternative banding patterns of mitotic chromosomes, referred to as karyograms. Although the three broad classes of DNA are scattered throughout the chromosomes, chromosomal banding patterns reflect levels of compartmentalization of the DNA. The C-banding technique yields dark-staining regions of the chromosome (C bands), referred to as heterochromatin. These regions are highly coiled, contain highly repetitive DNA and are typically found at the centromeres, at the telomeres and on the Y chromosome. They are composed of long arrays of tandem repeats, and some may therefore have a nucleotide composition that differs significantly from that of the remainder of the genome (approximately 40–42% GC). This means that they can be separated from the bulk of the genome by buoyant density (caesium chloride) gradient centrifugation, which yields a major band and three minor bands referred to as satellite bands; hence the term satellite DNA.

The G-banding technique yields a pattern of alternating light and dark bands reflecting variations in base composition, time of replication, chromatin conformation and the density of genes and repetitive sequences. Karyograms therefore define chromosomal organization and allow the different chromosomes to be identified.
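The reassociation kinetics and density separation described above can be made concrete with a small calculation. The sketch below is a minimal illustration under stated assumptions, not the analysis behind the figures in this article: it assumes ideal second-order reassociation, C/C0 = 1/(1 + kC0t), so that half of a sequence has reannealed at C0t½ = 1/k, and it uses a widely used empirical relation for caesium chloride gradients, ρ = 1.660 + 0.098 × (GC fraction) g/cm³. Neither formula is given in the text, and the rate constant `K_SINGLE_COPY` is a hypothetical value chosen only to show the trend.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """Fraction of a sequence still single-stranded at a given C0t.

    Assumes ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t),
    where k is the rate constant (units: L per mol nucleotide per s).
    """
    return 1.0 / (1.0 + k * c0t)


def c0t_half(k: float) -> float:
    """C0t at which half of the sequence has reassociated (C/C0 = 0.5)."""
    return 1.0 / k


def buoyant_density(gc_fraction: float) -> float:
    """Approximate CsCl buoyant density (g/cm^3) of DNA from its GC fraction.

    Uses the empirical relation rho = 1.660 + 0.098 * GC, with GC in 0..1.
    """
    return 1.660 + 0.098 * gc_fraction


if __name__ == "__main__":
    K_SINGLE_COPY = 1e-3  # hypothetical rate constant for single-copy DNA
    for copies in (1, 1_000, 100_000):
        # At a fixed total DNA concentration, a sequence present in `copies`
        # copies is `copies` times more concentrated, so it behaves as if its
        # rate constant were `copies` times larger.
        k = K_SINGLE_COPY * copies
        print(f"{copies:>7} copies per genome: C0t1/2 = {c0t_half(k):g}")
    for gc in (0.40, 0.42):
        print(f"{gc:.0%} GC: about {buoyant_density(gc):.3f} g/cm^3")
```

Because a sequence present in N copies per genome is N times more concentrated at the same total DNA concentration, its C0t½ falls N-fold, which is why highly repetitive DNA reassociates first. Bulk DNA at 40–42% GC comes out near 1.699–1.701 g/cm³ in this relation, so a tandem array with a markedly different GC content forms its own band.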
The darker bands, or G bands, are comparatively more condensed, more AT-rich and less gene-rich, and they replicate later than the DNA within the pale bands, which correspond to the R bands obtained by an alternative staining technique. More recently, these alternative banding patterns have been correlated with the level of compaction of scaffold-attachment regions (SARs).

The human genome may also be compartmentalized into large segments of DNA with distinctive GC richness, referred to as 'GC content domains' (Lander et al., 2001). There is a distinct association between GC richness and gene density. This is consistent with the association of most genes with CpG islands, the 500–1000-bp GC-rich segments flanking (notably at the 5′ end) most housekeeping genes and many tissue-specific genes. The clustering of CpG islands, as demonstrated by fluorescence in situ hybridization (Craig and Bickmore, 1994), further delineates gene-poor and gene-rich chromosomal segments.

Additionally, five human chromosomes (13, 14, 15, 21 and 22) are distinguished at their terminus by a thin bridge with rounded ends, referred to as chromosomal satellites. These regions contain tandem repeats of the rRNA genes, which, together with ribosomal proteins, coalesce to form the nucleolus; they are known as the nucleolar organizing regions.

Single-copy Sequences

Although originally defined as a functional unit of heredity, a gene may be defined as an expressed segment of DNA containing transcriptional regulatory sequences. Venter et al. (2001) estimated that there are 26 588 protein-encoding transcripts, with an additional 12 000 computationally derived genes. This is consistent with the 30 000–40 000 estimate of the International Human Genome Sequencing Consortium (Lander et al., 2001). In addition, over 200 noncoding RNA genes have been identified, with over 5000 related genes, most of which are pseudogenes. The proportion of the genome consisting of genes would be estimated at 15–20%, assuming an average gene size of 15 kb.
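The 15–20% figure is simple arithmetic: 30 000–40 000 genes × 15 kb gives 450–600 Mb out of about 3000 Mb. The sketch below reproduces it and, taking the roughly 90% noncoding share of gene DNA stated in the following paragraph as an input, also yields the 1.5–2% (45–60 Mb) coding estimate. The constant names are illustrative choices, not terms from the text.

```python
GENOME_MB = 3000          # ~3000 million bp, from the text
AVG_GENE_KB = 15          # average gene size assumed in the text
NONCODING_SHARE = 0.90    # share of gene DNA that is introns/regulatory


def gene_space(n_genes: int):
    """Return (gene Mb, gene % of genome, coding Mb, coding % of genome)."""
    gene_mb = n_genes * AVG_GENE_KB / 1000          # kb -> Mb
    coding_mb = gene_mb * (1 - NONCODING_SHARE)     # protein-coding part only
    return (gene_mb, 100 * gene_mb / GENOME_MB,
            coding_mb, 100 * coding_mb / GENOME_MB)


if __name__ == "__main__":
    for n in (30_000, 40_000):
        g, g_pct, c, c_pct = gene_space(n)
        print(f"{n} genes: {g:.0f} Mb ({g_pct:.1f}%), "
              f"coding {c:.0f} Mb ({c_pct:.2f}%)")
```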
However, approximately 90% of the DNA in protein-coding genes is noncoding, including upstream and downstream regulatory sequences and introns. Therefore, only 1.5–2% (45–60 Mb) of the DNA has coding function. Regulatory sequences include common promoter elements, recognized by transcription factors, located at specific distances upstream of the transcription start site. These include the TATA, CCAAT and GC boxes. There are also tissue-specific promoter sequences. Enhancers and silencers are cis-acting elements that upregulate and downregulate gene expression, respectively. Many coding sequences belong to gene families, as described below. In addition, there may be single-copy sequences in the spacer DNA with no known (determinable) function.

Repetitive Sequences

In the human genome, stretches of DNA of various sizes exist in variable copy numbers. These repetitive sequences may be arranged in tandem and/or dispersed throughout the genome. Repetitive sequences may be classified by function, dispersal pattern and sequence relatedness. Satellite DNA typically refers to highly repetitive sequences with no known function; gene families are DNA sequences, with at least one functional gene, related by sequence homology and/or function; and interspersed repeat sequences are typically the products of transposable element integrations, but may include retropseudogenes of functional genes.

Macrosatellites, Minisatellites and Microsatellites

Macrosatellites are very long arrays, up to hundreds of kilobases, of tandemly repeated DNA. The three satellite bands observed by buoyant density centrifugation represent sections of the human genome containing highly repeated DNA whose nucleotide proportions differ from those of the rest of the genome. However, not all satellite sequences are resolved by density gradient centrifugation. Alpha satellite DNA, or alphoid DNA, constitutes the bulk of centromeric heterochromatin on all chromosomes.
The interchromosomal divergence of the alpha satellite families allows the different chromosomes to be distinguished by fluorescence in situ hybridization (FISH).

Minisatellites are tandemly repeated DNA sequences with a total length from less than 1 kbp to 15 kbp. One subset of minisatellites comprises the highly polymorphic arrays of short tandem repeats with no known function that serve as useful DNA markers, referred to as variable number tandem repeats (VNTRs). These sequences generally contain 1–5 kbp of DNA made up of repeating units of 15–100 nucleotides. Several minisatellites share enough sequence homology to be analysed with a single probe, yielding DNA fingerprints. An example is the 10–15-bp core sequence of myoglobin minisatellites, which includes an almost invariant core sequence (GGGCAGGANG) shared among several polymorphic VNTR loci.

Telomeric DNA sequences constitute another subset of minisatellites. They contain 10–15 kb of hexanucleotide repeats, most commonly TTAGGG in the human genome, at the termini of the chromosomes. These sequences are added by telomerase to ensure complete replication of the chromosome. Telomeres of somatic cells are generally shorter than those of germ cells, as illustrated by their decreasing length in human B cells and skin cells with increasing age. In humans, it has been postulated that telomeric loss is associated with ageing and tumorigenesis.

Microsatellites are small arrays of short simple tandem repeats, with repeat units primarily 4 bp or less. Different arrays are found dispersed throughout the genome, although dinucleotide CA/TG repeats are the most common, making up 0.5% of the genome. Runs of As and Ts are common as well. Microsatellites have no known function. However, CA/TG dinucleotide repeats can form the Z-DNA conformation in vitro, which is possibly indicative of function.
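Simple tandem repeats of this kind are straightforward to locate computationally. The function below is a minimal sketch, not a production repeat finder of the kind used in genome annotation: it uses a regular expression with a backreference to report runs of a 1–4-bp unit repeated at least `min_copies` times, where the default of five copies is an arbitrary threshold chosen for illustration. The example fragment is hypothetical.

```python
import re


def find_microsatellites(seq: str, min_copies: int = 5, max_unit: int = 4):
    """Return (start, unit, copies) for each tandem run of a short unit.

    A run is reported when a unit of 1..max_unit bases occurs at least
    `min_copies` times in a row. The lazy quantifier makes the regex try the
    shortest unit first, so a (CA)n run is reported as 'CA', not 'CACA'.
    """
    if min_copies < 2:
        raise ValueError("a tandem repeat needs at least two copies")
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        # The whole match is an exact multiple of the unit length.
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    # Hypothetical fragment: a (CA)5 run between unrelated flanks.
    print(find_microsatellites("GGCACACACACATT"))
```

Raising `max_unit` to 6 also picks up telomeric TTAGGG arrays: `find_microsatellites("TTAGGG" * 5, max_unit=6)` returns `[(0, 'TTAGGG', 5)]`.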
Repeat-unit copy number variation of microsatellites apparently arises by replication slippage, yielding highly polymorphic DNA markers referred to as short tandem repeat polymorphisms (STRPs). STRPs are commonly used in commercial DNA fingerprinting kits. The expansion of trinucleotide repeats within genes has been associated with genetic disorders such as Huntington disease, myotonic muscular dystrophy, Friedreich ataxia and fragile-X syndrome.

Gene Families

Gene families generally consist of a set of genes with high sequence homology over their entire length, primarily in the exons for protein-encoding gene families. Members of a gene family, or possibly separate clusters of the same gene family, are considered paralogous: they are derived from an ancestral gene or locus by duplication and are therefore evolutionarily and functionally related. The duplication of a gene, however, may yield a nonfunctional pseudogene (see below). Additionally, there are genes whose products have weak overall sequence homology but are homologous at functionally conserved domains or short amino acid motifs, collectively forming an additional type of gene family. A group of genes with functionally and structurally related products, weak sequence homology and no conserved amino acid motifs may be referred to as a gene superfamily. Only a limited number of examples are discussed here.

Gene families with essentially identical products

If the cell requires large amounts of particular proteins or RNA molecules, one solution is to produce them from multiple functional copies of a gene. The human genome, like eukaryotic genomes in general, has amplified a number of genes whose products are responsible for general-purpose functions such as DNA replication and protein synthesis.

Histone genes are highly conserved among eukaryotes and have a fundamental role in chromatin structure.
The histone family consists of five genes that tend to be linked, although they occur in differing arrays of variable copy number dispersed in the human genome. The individual genes of a particular histone family encode essentially identical products (i.e. H4 genes yield the same H4 protein). Analysis of individual human genomic clones has identified isolated histone genes (e.g. H1), clusters of two or more histone genes, or clusters of all histone genes (e.g. H3–H4–H1–H3–H2A–H2B) (Hentschel and Birnstiel, 1981). A majority of histone genes form a large cluster on human chromosome 6 (6p21.3) and a small cluster at 1q21. Additionally, histone genes lack introns, a rare feature for eukaryotic genes.

Genes that encode ribosomal RNA (rRNA), inclusive of the spacer units, total about 0.4% of the DNA in the human genome. The individual genes of a particular rRNA family are essentially identical. The 28S, 5.8S and 18S rRNA genes are clustered with spacer units (ETS, external transcribed spacer; ITS, internal transcribed spacer) in tandem arrays of approximately 60 copies each, yielding about 2 Mbp of DNA per array. These clusters are present on the short arms of five acrocentric chromosomes and form the nucleolar organizing regions, giving approximately 300 copies in total. These three rRNA genes are transcribed as a single unit (yielding a 45S precursor rRNA), which is then cleaved. The 5S rRNA genes are clustered on chromosome 1q.

There are an estimated [number illegible in source] human transfer RNA (tRNA) genes. tRNA genes and their pseudogenes are dispersed on at least seven chromosomes (McBride et al., 1989). In addition, tRNA genes have been found in various clusters; i.e. cloned genomic fragments have been isolated containing several tRNA genes.
Dispersal of tRNA pseudogenes may have occurred by RNA-mediated retroposition (McBride et al., 1989). This is consistent with the postulation that various SINE families (see below) have been derived from tRNA genes (Deininger and Batzer, 1993).

Small nuclear RNA (snRNA) molecules function in RNA processing. There are six families of related snRNA genes, termed U1 to U6, that are dispersed among the chromosomes. However, differing cluster patterns have been observed for these genes on different chromosomes. For example, 35–100 functional U1 genes, all sharing 20 kb of nearly identical 5′ and 3′ flanking sequences, are loosely clustered in chromosome 1p36 with over 44 kb of intergenic sequence between them, whereas 10–20 U2 genes are clustered in a tight, virtually perfect 6-kb repeat unit on 17q21–q22 (Lindgren et al., 1985). In addition, more than one subfamily of a U snRNA has been identified; U3 comprises at least two subfamilies, which differ in their flanking sequences. Pseudogenes of snRNA have also been identified and are thought to have been dispersed in the genome by retroposition. tRNA genes are also found clustered with U RNA genes; for example, chromosome band 1p36 contains 15–30 copies each of the U1, Glu-tRNA and Asn-tRNA genes (van der Drift et al., 1994).

Gene families with high sequence homologies

There are numerous families of genes in the human genome sharing extensive intrafamily homology. These are generally dispersed, but may contain linked members. One of the most comprehensively studied gene families is the haemoglobin family. Human haemoglobin is a tetrameric protein consisting of two α-globin and two β-globin subunits. Several possible polypeptides can make up the haemoglobin molecule, with differing physiological properties and ontogenetic regulation. This diversity probably arose through gene duplication, which allowed sequences to diverge and acquire new functions.
The two globin families exist as clusters of genes and pseudogenes on separate chromosomes. The genes within each cluster are more closely related in sequence to one another than to the genes of the other cluster. Therefore, intracluster duplications postdate the duplication of the ancestral gene that yielded α- and β-globin. Ontogenetic regulation is apparently coordinated within each cluster by upstream sequences, allowing expression of the gene suited to the prevailing oxygen environment (a fetus, for example, exists in a relatively hypoxic environment). Predating the divergence of the haemoglobins is the divergence of haemoglobin and myoglobin from an ancestral globin gene; myoglobin stores oxygen in muscle, whereas haemoglobin is the oxygen carrier in blood.

Proto-oncogenes are also gene family members. These genes contribute to neoplasia when their expression (presence or level) is altered. Their products have diverse functions, such as secreted growth factors (e.g. Wnt gene family), cell-surface receptors (e.g. erbB gene family), intracellular signal transducers (e.g. ras gene family) and DNA-binding proteins (e.g. myc gene family).

The Wnt gene family consists of at least 18 structurally related genes functioning in various aspects of growth and differentiation. They contain an N-terminal secretory signal peptide, a short domain of low sequence conservation, and a highly conserved block (conservation ranging from 40 to 95%) of about 300 amino acids, with highly conserved motifs and conserved spacing of 22 cysteine residues. The Wnt genes map to different chromosomes, some demonstrating conservation of synteny (Bergstein et al., 1997).

There are four erbB genes. These encode epidermal growth factor receptors (EGFRs), grouped, as are other receptor tyrosine kinases, into a family based on the sequence homology of their kinase domains, their structure and the structural similarity of their ligands.
The myc genes are members of the basic helix-loop-helix family of transcription factors. The functional members, c-myc, L-myc and N-myc, are not linked genetically, and the latter two show more restricted patterns of expression, but all share a three-exon, two-intron structure. A detailed sequence analysis of myc genes suggests that the progenitor of the N-myc and L-myc genes was a duplicated c-myc gene (Atchley and Fitch, 1995). The ras genes represent a subfamily of guanosine triphosphate (GTP)-binding proteins, found dispersed in the genome. N-ras, H-ras and K-ras are closely related genes encoding a p21ras product. Additional members of this family include TC21 and R-ras.

There are many other examples of multigene families in the human genome, and some of these are listed in Table 2.

Gene families with low sequence homology but functionally conserved domains

Some sequences in the human genome share highly conserved amino acid domains despite weak overall homology. These often have developmental functions. There are nine dispersed paired box (Pax) genes that contain highly conserved DNA-binding domains with six α helices. The homeobox, or Hox, genes share a common sequence encoding 60 amino acids. In humans there are four Hox gene clusters, on different chromosomes. However, each gene in a cluster shows greater homology to its counterpart gene in another cluster than to the other genes in the same cluster.

Gene families with different products but conserved short amino acid motifs

Some genes are considered families based not on full-length sequence homology, but on conserved short amino acid motifs. DEAD box genes encode products with RNA helicase activity and share eight short amino acid motifs, including the DEAD box (Asp-Glu-Ala-Asp).
However, there are other gene families with conserved amino acid motifs, such as the WD box, that provide different functions. The WD box genes are characterized by between four and eight tandem repeats of a core sequence of fixed length terminating in a WD dipeptide.

Gene Superfamilies

DNA sequences that yield functionally and structurally related products with weak sequence homology and lacking significantly conserved amino acid motifs may be grouped as a gene superfamily. A superfamily may comprise several different gene families. Genes of the immunoglobulin superfamily encode proteins that form dimers consisting of extracellular variable domains at the N-terminus and constant domains at the C-terminus. Members of the immunoglobulin superfamily include the immunoglobulin, human leucocyte antigen (HLA), T-cell receptor (TCR), T4 and T8 genes. Another example is provided by three superfamilies of growth factor receptors: (1) proteins with a core structure of seven transmembrane α-helical sequences; (2) large glycoproteins generally possessing a single transmembrane sequence and tyrosine kinase activity (including the EGFR erbB family described above); and (3) single-transmembrane proteins lacking kinase activity.
Table 2 Examples of interspersed multigene families in the human genome

Gene                                        Number of functional genes   Estimated number of pseudogenes
Actin                                       4                            >16
Aldolase                                    3                            2
Argininosuccinate synthetase                1                            4
β-Tubulin                                   2                            15–20
Cytochrome c                                2                            20–30
Ferritin heavy chain                        1                            [illegible]
Glyceraldehyde-3-phosphate dehydrogenase    1                            25
Ribosomal protein L32                       1                            20
Triose-phosphate isomerase                  1                            5–6

Transposable Elements

The human genome contains interspersed repeat sequences that have been amplified in copy number by movement throughout the genome. These sequences are referred to as transposable elements. Almost all transposition in the human genome is carried out via an RNA intermediate, yielding elements referred to as retrotransposons or retroposons (Figure 2). However, there is also evidence of a DNA-mediated transposon family (pogo) in the human genome (Robertson, 1996).

Short and long interspersed elements (SINEs and LINEs, respectively) are the primary transposable elements in the human genome. These are referred to as retroposons, since they lack the long terminal repeats (LTRs) of retroviruses, and they are amplified via an RNA intermediate. LINEs are sometimes referred to as non-LTR retrotransposons because sequences in the elements code for enzymes used in the retroposition process.

The Alu element, the primary SINE family, is estimated at over one million copies in the human genome (Lander et al., 2001). Sequence comparisons suggest that Alu repeats were derived from the 7SL RNA gene. Each Alu element is about 280 bp long with a dimeric structure, contains RNA polymerase III promoter sequences, and typically has an A-rich tail and flanking direct repeats (generated during integration).
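The flanking direct repeats mentioned above arise because the integration site is duplicated, so an inserted element is bracketed by two identical copies of the target sequence. Assuming the element's boundaries are already known, a candidate target-site duplication can be found by comparing the bases immediately 5′ and 3′ of the insert. The function name, coordinates and sequences in this sketch are hypothetical, and it is an exact-match illustration only: real annotation must tolerate mismatches and uncertain element boundaries.

```python
def flanking_direct_repeat(seq: str, start: int, end: int,
                           max_len: int = 30) -> str:
    """Return the longest exact direct repeat flanking seq[start:end].

    Compares the k bases immediately 5' of the element (seq[start-k:start])
    with the k bases immediately 3' of it (seq[end:end+k]) for k = 1..max_len
    and keeps the longest k at which they are identical. Returns "" if none.
    """
    best = ""
    for k in range(1, max_len + 1):
        if start - k < 0 or end + k > len(seq):
            break  # ran off either end of the sequence
        left = seq[start - k:start]
        right = seq[end:end + k]
        if left == right:
            best = left
    return best


if __name__ == "__main__":
    # Hypothetical locus: a 9-bp target site duplicated on either side of a
    # short stand-in for an inserted element ending in an A-rich tail.
    target = "GAAGTCAAT"
    element = "GGCCGGGCGCGGTGGCTCA" + "A" * 12
    locus = "TTTT" + target + element + target + "CCCC"
    start = 4 + len(target)
    end = start + len(element)
    print(flanking_direct_repeat(locus, start, end))
```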
Figure 2 RNA-mediated transposable elements in the human genome. Each element is bordered by flanking direct repeats. The human endogenous retrovirus contains long terminal repeats (LTRs), gag (group-specific antigen gene), pol (polymerase gene) and env (envelope gene). The THE-1 retrotransposon consists of an open reading frame (ORF) and LTRs. The non-LTR retrotransposon (L1) contains internal RNA polymerase II promoter sequences, two open reading frames and a poly(A) tail. The Alu element has a dimeric structure of two homologous halves separated by a middle A-rich region. The left half contains the A and B boxes of the RNA polymerase III promoter, and the right half contains an additional internal 31 bp. [Figure not reproduced.]

Approximately 5000 Alu elements [text illegible in source] useful DNA markers for the study of human population genetics (Deininger and Batzer, 1993). [Text illegible in source.] Alu elements are predominantly found in chromosomal R bands. Other SINE sequences include the MIR elements. [Text illegible in source.]

LINE-1 (L1) elements are estimated at over 500 000 copies and are predominantly found in chromosomal G bands. A full-length LINE is approximately 6 kbp, though most copies are truncated retropseudogenes (see below) [text illegible in source]. Full-length L1s have functional RNA polymerase II promoter sequences [text illegible in source]. LINEs are flanked by direct repeats. LINE mobilization activity has been observed in both germinal and somatic cells. SINEs and LINEs additionally contribute to the evolution of the genome by providing sites for unequal homologous recombination.
Within the low-density lipoprotein (LDL) receptor gene alone, several alterations have been attributed to recombination at various sites, yielding familial hypercholesterolaemia.

The human genome also contains families of retroviral-related sequences. These are characterized by sequences encoding enzymes for retroposition and contain LTRs. However, most of these sequences are defunct, truncated and mutated retroviral elements. The endogenous retroviruses may originally have been incorporated into the genome following retroviral infection of the germ line. In addition, solitary LTRs of these elements may be located throughout the genome. There are several low-abundance (10–1000 copies) human endogenous retrovirus (HERV) families, with individual elements ranging from 6 to 10 kb. Overall, LTR elements make up approximately 8% of the genome.

The transposon-like human element (THE-1) contains the long terminal repeats of integrated retroviral genomes but lacks sequences for enzymes involved in retroposition. Therefore, this interspersed DNA family is only tentatively characterized as a retrotransposon. THE-1 is estimated at 10 000 copies, with an additional 10 000 solitary LTRs.

There are pseudogenes (see below) that are the result of retroposition (retropseudogenes). These pseudogenes lack introns and the flanking DNA sequences of the functional locus, and therefore are not products of gene duplication. The generation of these types of elements depends on the reverse transcriptase of other elements, such as LINEs, which warrants their inclusion as transposable elements.

Medium reiteration frequency repetitive sequences (MERs) represent a broad group of families of uncharacterized interspersed sequences. The mechanisms for amplification of these sequences are unclear. However, it has been suggested that some of these sequences are replicated and disseminated by DNA viruses, as indicated by the presence of MER sequences in SV40 recombinants.
Copy numbers of MER families range from 200 to 10 000, collectively amounting to 100 000–200 000 copies.

Pseudogenes

Pseudogenes are nonfunctional copies of a gene containing part or all of the original sequence. Pseudogenes may arise by tandem duplication and then accumulate mutations because of the lack of selection pressure; they are usually recognizable by the lack of an open reading frame. Examples are the globin pseudogenes. A processed pseudogene, or retropseudogene, is derived from an RNA intermediate. These sequences are characterized by the lack of regulatory sequences, which normally renders them incapable of expression, and by the lack of introns, which were spliced out during RNA processing. As many as 20–30 retropseudogenes may arise from one parental functional gene, e.g. those of ribosomal protein L32 and glyceraldehyde-3-phosphate dehydrogenase. However, SINEs and LINEs represent the most abundant families of retropseudogenes. Processed pseudogenes may be derived from RNA genes as well: evidence for tRNA retropseudogenes is the CCA sequence at their 3′ end, which is added posttranscriptionally to the functional tRNA.

Mitochondrial Genome

The mitochondrion contains an autonomously replicating genome. The human mitochondrial genome contains 16 569 bp encoding 37 genes, including tRNA and rRNA genes used for mitochondrial protein synthesis. Mitochondrial DNA (mtDNA) is maternally inherited, and there are generally thousands of copies of mtDNA in a cell.

Genome Evolution

The fact that genomes vary considerably among organisms is indicative of the highly dynamic nature of the genome. However, comparisons of human and chimpanzee DNA demonstrate 98–99% sequence identity. This poses an interesting question as to what makes us human. By FISH [text illegible in source] has occurred; although there has been shuttling of different [text illegible in source] between humans and chimpanzees, there is conservation of synteny [text illegible in source].

Genomic rearrangements may alter gene expression as a result of the chromosomal relocation of the gene(s), or possibly through a shift in gene imprinting, consequently yielding phenotypic variation. Small deletions or single-nucleotide changes may alter the biochemical nature of the protein product, also contributing to phenotypic differences. Alterations in regulatory sequences, possibly through the incorporation of retroposed sequences, may also contribute to phenotypic variation. Although chimpanzees and humans share nearly all integrated Alu elements, there has been dispersal of human-specific Alu sequences. It is also possible that exon shuffling has played a major role. Exon shuffling occurs by unequal homologous recombination in introns and may be a reason for the existence and maintenance of introns (Gilbert, 1978), providing a source for the generation of new proteins and gene families that are composites of differing functional domains, as outlined in this review.

Acknowledgements

DHK was supported by an Eastern Michigan University Spring/Summer Research Award and a University Research in Excellence Fund. MAB was supported by award 1999-IJ-CX-K009 from the Office of Justice Programs, National Institute of Justice, Department of Justice. Points of view in this document are those of the authors and do not necessarily represent the official position of the US Department of Justice.

References

Atchley WR and Fitch WM (1995) Myc and Max: molecular evolution of a family of proto-oncogene products and their dimerization partner. Proceedings of the National Academy of Sciences of the USA 92: 10217–10221.

Bergstein I, Eisenberg LM, Bhalerao J et al. (1997) Isolation of two novel WNT genes, WNT14 and WNT15, one of which (WNT15) is closely linked to WNT3 on human chromosome 17q21. Genomics 46: 450–458.

Craig JM and Bickmore WA (1994) The distribution of CpG islands in mammalian chromosomes. Nature Genetics 7: 376–382.

Deininger PL and Batzer MA (1993) Evolution of retroposons. Evolutionary Biology 27: 157–196.

van der Drift P, Chan A, van Roy N et al. (1994) A multimegabase cluster of snRNA and tRNA genes on chromosome 1p36 harbours an adenovirus/SV40 hybrid virus integration site. Human Molecular Genetics 3: 2131–2136.

Gilbert W (1978) Why genes in pieces? Nature 271: 501.

Hentschel CC and Birnstiel ML (1981) The organization and expression of histone gene families. Cell 25: 301–313.

Lander ES, Linton LM, Birren B et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.
