
J Mol Evol (1998) 46:64–73

Springer-Verlag New York Inc. 1998

Organization and Nucleotide Sequence of the Cluster of Five Histone Genes in the Polichaete Worm Chaetopterus variopedatus: First Record of a H1 Histone Gene in the Phylum Annelida
Rosanna del Gaudio, Nicoletta Potenza, Patrizia Stefanoni, Maria L. Chiusano,* Giuseppe Geraci
Department of Genetics, General and Molecular Biology, University of Naples Federico II, Via Mezzocannone, 8, 80134 Naples, Italy

Received: 23 April 1997 / Accepted: 21 July 1997

Abstract. Histone genes were identified and their nucleotide sequences determined in the polychaete marine worm Chaetopterus variopedatus. The genes are organized in about 390 clusters of 7.3 kbp, each containing one copy of each of the five histone genes. The H1 histone gene present in the clusters is the first ever isolated in the phylum Annelida. The cluster has the unique feature that every gene carries both the replication-dependent and the replication-independent 3′ mRNA termination signals. C. variopedatus and Platynereis dumerilii, the only other annelid in which histone genes have been studied, differ in cluster organization and in the transcription polarity of the individual histone genes. Nevertheless, phylogenetic analysis of the encoded amino acid sequences clearly groups the two organisms together, in a tree in which the other worms studied occupy closely related positions on the same evolutionary branch.

Key words: C. variopedatus, Nucleotide sequence, Annelida, Polychaete, H1 gene, Histone gene cluster, mRNA double termination signal, Phylogenesis

Introduction

Histones are basic proteins that associate with each other and with nuclear DNA to form eukaryotic chromatin. There are five histone types. Four of them, H3, H4, H2A, and H2B, form an octameric structure, the nucleosome core, which is the fundamental unit of chromatin. The fifth histone, H1, participates in the formation of higher-order chromatin structures. Histones are among the most conserved proteins, and their genes and the corresponding patterns of arrangement have been conserved over vast expanses of evolutionary time (Miller et al. 1993). Because of their wide distribution among eukaryotes and their properties, histones are considered a model system for studying the structure, organization, and expression of multigene families, and they provide potentially useful markers for evolutionary studies and for resolving phylogenetic relationships (Hentschel and Birnstiel 1981; Maxson et al. 1983a). A number of variant histone proteins are known to be expressed in specific tissues, at specific stages of embryonic development (Brandt et al. 1979), and at particular periods of the cell cycle (Shumperli 1986; Osley 1991).

Studies of histone genes and their organization in a variety of organisms have revealed two fundamental arrangements. In one, the five histone genes are clustered, the clusters are repeated in tandem, and the sequences of the repeated clusters are generally strongly conserved. In the other, histone genes are dispersed in the genome as single genes or in incomplete clusters. The first type of organization is present in the insect Drosophila melanogaster (Lifton et al. 1977), the trout Salmo gairdnerii (Connor et al. 1984), the newt Notophthalmus viridescens (Stephenson et al. 1981), and various sea stars (Howell et al. 1987; Cool et al. 1988). The second type is present in the nematode C. elegans (Roberts et al. 1987) and in some vertebrates such as chicken (Engel and Dodgson 1981), mouse (Sittman et al. 1981), and human (Albig et al. 1991). In sea urchins (Maxson et al. 1983b; Angerer et al. 1985) and in Xenopus laevis (Zernik 1980; Ruberti et al. 1982) both arrangements are observed.

In the phylum Annelida, histone genes have so far been characterized only in the polychaete worm Platynereis dumerilii (Sellos et al. 1990). Histone gene clusters have also been characterized in the worm Urechis caupo (Ingham and Davis 1988; Davis et al. 1992), which belongs to the closely related minor phylum of the echiuroids. In both organisms the core histone genes are organized in repeated clusters that do not contain the H1 histone gene, and H1 has not been found elsewhere in their genomes. We undertook the characterization of the histone genes of the annelid polychaete marine worm Chaetopterus variopedatus as an extension of our work on histone H1 and protamine, the two molecules that organize the sperm chromatin of this organism (De Petrocellis et al. 1983). We show here that in C. variopedatus, unlike in the other polychaete annelid P. dumerilii, histone genes are organized in clusters that also contain the H1 gene, and we report their complete nucleotide sequences. The histone gene clusters of C. variopedatus and P. dumerilii also differ in gene order and transcription polarity. Interestingly, all genes in the C. variopedatus clusters carry two mRNA termination signals in their 3′ untranslated regions (3′-UTRs): the stem-loop-forming palindrome with the associated purine-rich motif, typical of the mRNAs of replication-dependent genes, which are expressed in strict correlation with the S phase of the cell cycle; and the AATAAA polyadenylation signal, typical of the basal, replication-independent replacement histone variants (some of which contain introns), which produce stable polyadenylated mRNAs not linked to the cell cycle (Hentschel and Birnstiel 1981; Birchmeier et al. 1982).

* Present address: CRISCEB, Center of Research for Computational and Biotechnological Sciences, II University of Naples, Via Costantinopoli 16, 80138, Napoli, Italy
Correspondence to: G. Geraci; e-mail geraci@dgbm.unina.it

Materials and Methods
General Methods. Specimens of the annelid worm C. variopedatus were kindly provided by the Zoological Station of Naples (Italy). They were collected in the Bay of Naples in June, when the animals are sexually active. Sperm cells were obtained from the parapodia of a single male as described (De Petrocellis et al. 1983) and purified by several centrifugations in Millipore-filtered seawater. Sperm cells were lysed in 0.5 M EDTA, 2% SDS, and DNA was extracted and purified as described (Blin and Stafford 1976).

Construction of Genomic Library. Genomic DNA (10 μg) was digested to completion with different restriction enzymes to search for fragments possibly containing the complete cluster of histone genes. Digestion with PstI generated a DNA band, corresponding to fragments averaging 7.5 kbp in length, that hybridized with probes for the different early histone genes of the sea urchin P. lividus, obtained from plasmid pPH70 (a gift of G. Spinelli, University of Palermo). PstI-restricted genomic DNA (10 μg) was fractionated by electrophoresis on a 0.8% (w/v) agarose slab gel in TBE buffer (0.089 M Tris, 0.089 M sodium borate, 0.009 M EDTA, pH 8.3); the gel slice containing the PstI fragments of interest was excised, and the DNA fragments were electroeluted. A genomic library of these fragments was constructed in the pBR322 vector (Promega) using standard procedures (Maniatis et al. 1982) with the following modifications: the molar ratio in the ligation reaction was 2:1 (vector:DNA insert), and the final reaction volume was 100 μl. The products of the T4 DNA ligase reaction (Boehringer, Mannheim) were used to transform CaCl2-competent E. coli HB101 cells. Clones positive in colony hybridization were detected using as a probe the sea urchin early H1 histone gene labeled with [32P]-dCTP (3,000 Ci/mmol, Amersham) using the Multiprime DNA labeling system (Amersham). Clones showing the strongest hybridization signal with the H1 histone gene after washing at moderate stringency were used for further studies. The clones were amplified, and their DNA inserts were excised and analyzed by hybridization with each of the early heterologous sea urchin histone genes.

The hybridization procedure was as follows. Filters were prehybridized for 6 h at 65°C in 5× SSPE (150 mM NaCl, 10 mM Na phosphate, 1 mM EDTA, pH 7.4), 0.1% SDS, 5× Denhardt's solution (0.02% bovine serum albumin, 0.02% Ficoll, 0.02% polyvinylpyrrolidone), and 100 μg/ml denatured herring sperm DNA. The labeled probe was then added, and hybridization was carried out for 16–18 h at 60°C in the same solution. Filters were then washed sequentially in 2× SSPE, 0.1% SDS for 5 min at room temperature and for 20 min at 60°C, and finally in 1× SSPE, 0.1% SDS for 30 min at 60°C. Hybridization was revealed by exposing Fuji RX film, with intensifying screens, to the filters at −70°C.

Restriction and Hybridization Analysis of Positive Clones. The gene maps of the inserts of the selected clones were initially derived from a set of single and double digestions with PstI, SacI, BamHI, and EcoRI performed under the conditions specified by the manufacturer (Boehringer, Mannheim). Digestion products (125 ng of plasmid DNA) were separated by electrophoresis on 1% (w/v) agarose gel in TBE buffer. The presence of the histone genes in the various insert fragments was established by Southern blot hybridization of the digests with each of the heterologous sea urchin probes, labeled with [32P]-dCTP as described above using the Multiprime DNA labeling kit. Sequence studies were performed on three clones, 3D, 6B, and 8D, and the complete sequences of all five histone genes were determined on clone 8D.

Plasmid Subcloning and DNA Sequencing. The restriction fragments identified with the heterologous sea urchin probes (see previous section) were separated on low-melting agarose gel. Each band of interest was excised from the gel, and the DNA was electroeluted and cloned into appropriately cleaved Bluescript KS and SK plus and minus vectors (Stratagene). Ligation products were used to transform competent E. coli TG1 cells, which were selected on the basis of their AmpR and lac phenotypes. DNA sequences were determined on both strands of the inserts by the Sanger method using a modified T7 DNA polymerase kit (Sequenase version 2.0, Amersham), either on double-stranded plasmid DNA prepared using Qiagen columns or on single-stranded plasmid DNA produced with M13 K07 helper phage. The sequencing reactions were analyzed electrophoretically with two successive loadings on 6% acrylamide/bisacrylamide (38:2) gels using the Sequi-Gen II apparatus (Bio-Rad). Histone coding sequences in the analyzed DNA fragments were identified with the BLAST and PC-gene programs by comparing the amino acid sequences encoded by all open reading frames with the protein sequences in the databases.
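The identification step above, translating every open reading frame and comparing the products with database proteins, begins with an ORF scan. The Python sketch below illustrates only that first step under stated assumptions: it finds ATG-initiated ORFs on both strands with the standard genetic code. It is not the BLAST or PC-gene procedure, and the function names and the 50-codon minimum are illustrative choices. Protein lengths exclude the stop codon, the convention under which a 606-nt ORF gives the 202-residue H1 reported in the Results.

```python
"""Six-frame ORF scan: a minimal sketch of the step that precedes a database search."""

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons not in the table (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq: str, min_codons: int = 50):
    """Return (strand, start, end, protein) for ATG...stop ORFs in all six frames.

    start/end are 0-based and end-exclusive on the strand being read (the reverse
    complement for '-'); end includes the stop codon, the protein excludes it.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] == "ATG":
                    for j in range(i, len(s) - 2, 3):
                        if CODON_TABLE.get(s[j:j + 3]) == "*":
                            if (j - i) // 3 >= min_codons:
                                orfs.append((strand, i, j + 3, translate(s[i:j])))
                            i = j  # resume scanning after this stop codon
                            break
                    else:
                        break  # no in-frame stop downstream: nothing more in this frame
                i += 3
    return orfs
```

For example, `find_orfs("CCATG" + "A" * 15 + "TAAGG", min_codons=3)` returns `[("+", 2, 23, "MKKKKK")]`. Each protein found this way would then be submitted to a similarity search, which this sketch leaves out.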

Copy Number of Histone Genes. The copy number was determined by comparing the radioactive intensity of the band containing the PstI fragments of genomic DNA (averaging 7.5 kbp) with that of the plasmid 8D insert released by digestion with the same restriction nuclease. Increasing amounts of the two digested DNA samples were separated electrophoretically in adjacent lanes on 0.8% (w/v) agarose slab gels in TBE buffer. Southern blots of the gels were hybridized sequentially with three different homologous histone gene probes derived from clone 8D and labeled with [32P]-dCTP using the Multiprime labeling system. The amount of DNA loaded in each lane is specified in the legend of Fig. 3. The results show that the genomic DNA of C. variopedatus contains a single band, of about 7.5 kbp, that hybridizes with each of the individual gene probes. The amount of probe bound in the different lanes was determined both with a PhosphorImager scanner (Molecular Dynamics) and by comparing the relative intensities of the radioactive bands on Fuji RX films exposed to the filters for appropriate times at −70°C.

Phylogenetic Analysis of the Encoded Histone Molecules. The computational analyses were performed separately on the C. variopedatus H2A and H2B histone genes to determine their relative positions in phylogenetic trees. The amino acid sequences encoded by the two genes were compared with those of the corresponding histone types obtained from the histone-dedicated database created by Baxevanis and Landsman at the National Center for Biotechnology Information (see Fig. 4). Multiple alignments were produced and phylogenetic trees were generated by the neighbor-joining method (Saitou and Nei 1987) implemented in Clustal W, version 1.6 (Thompson et al. 1994). Trees were plotted with Phylip, version 3.5c (Felsenstein 1982, 1988).
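Clustal W builds its tree from pairwise distances between aligned sequences. The Python sketch below shows only the neighbor-joining step (Saitou and Nei 1987) applied to a precomputed distance matrix; the alignment and distance correction performed by Clustal W are omitted, the function name is hypothetical, and the five-taxon matrix in the example is a standard textbook illustration, not histone data.

```python
from typing import Dict, List, Sequence


def neighbor_joining(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Return an unrooted neighbor-joining tree (Newick) for a symmetric distance matrix."""
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes: List[str] = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r_i: sum of distances from node i to all other active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the two joined nodes to their new parent u.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.3f},{b}:{lb:.3f})"
        # Distances from the new node u to every remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[a][b]:.3f});"


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))
```

For this matrix the sketch prints `((c:4.000,(a:2.000,b:3.000):3.000),(d:2.000,e:1.000):2.000);`. Ties in Q are broken by taking the first pair encountered, so different implementations can return different but equally valid topologies on tied data.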

Fig. 1. Histone gene cluster in clone 8D of C. variopedatus. A Restriction map with enzymes used for sequence determination and gene mapping. B Organization of histone genes. The arrows indicate the direction of gene transcription. P, PstI; D, DraII; V, EcoRV; E, EcoRI; S, SacI; X, XhoI; A, AccI.

Results

Construction of the Genomic Library and Isolation of Clones

The possible occurrence of gene clusters containing all five histone genes was investigated by Southern blot analysis of genomic DNA digested with various restriction enzymes. Hybridization experiments with the five different sea urchin histone gene probes obtained from plasmid pPH70 showed that PstI digestion produced a single band of about 7.5 kbp that hybridized with all the heterologous probes (see Fig. 3, which shows the determination of the histone gene copy number). The gel band containing the fragments hybridizing with the sea urchin histone gene probes was excised from the slab, and the fragments were electroeluted and cloned in the pBR322 vector. Clones containing the histone gene cluster were identified by colony hybridization and amplified to analyze the insert DNA. Agarose gel electrophoresis of 25 positive clones showed that the inserts averaged about 7.5 kbp, with some length heterogeneity ranging from about 7 to 8.5 kbp. Five of these clones were analyzed by hybridizing their excised inserts with each of the five sea urchin histone genes used as probes. The results of these hybridization experiments (data not shown) confirmed the data obtained on genomic DNA, demonstrating that the cloned clusters of C. variopedatus histone genes also contain all histone types, including H1.

Restriction and Hybridization Analysis of Clone 8D

The restriction map (Fig. 1A) shows the enzymes used to map genes and to produce clones for sequence determination. The map refers to clone 8D as representative of the five clones initially chosen for sequence analysis. EcoRI and SacI were used because data on genomic DNA showed that they cut within the histone gene cluster. BamHI has a site only in the vector, which served as an asymmetric landmark for orienting the fragments obtained by SacI digestion. The DNA inserts of the selected clones were digested, and Southern blots of the fragments were probed with each of the five sea urchin histone genes to localize the relative positions of the genes in the fragments and in the clusters. As shown for clone 8D (Fig. 1A), the H3 gene appeared to be adjacent to H4 and internal to the cluster, because both genes hybridized with a fragment of about 2.2 kbp produced by SacI digestion. Both were adjacent to H2A, because the probes for the three genes hybridized with the same 4.2-kbp fragment produced by EcoRI digestion. H1 appeared to be adjacent to H2B, because both gave a positive hybridization signal on the same 5.8-kbp fragment produced by EcoRI digestion. Overall, the results showed that clone 8D contains one copy of each histone gene in a distinctive organization (Fig. 1B).

Nucleotide Sequences of the Histone Genes

The DNA sequence of the histone gene cluster was initially determined on the three genomic pBR322 clones 3D, 6B, and 8D. The inserts of these clones were divided into three parts, corresponding to the 2.2-kbp fragment produced by SacI digestion and to the two additional 3.1-kbp and 2.0-kbp fragments produced by further digestion of the DNA with PstI (Fig. 1A). These three fragments, prepared from clone 8D, were cloned into the pBluescript vector and used for the complete sequence analysis of the histone gene cluster. The sequence was determined both on subclones prepared on the basis of restriction sites found in the initially determined sequence and by extending the analysis directly on the studied clones with synthetic oligonucleotide primers. The sequence data confirmed the gene arrangement inferred from the hybridization experiments and established the transcription polarity of each gene (Fig. 1B).

The nucleotide sequence of clone 8D shows a start codon 58 nucleotides downstream of the 5′ PstI site, followed by an open reading frame of 606 nucleotides terminating with TAA. This sequence encodes an H1 histone of 202 amino acids, including the initial methionine (Fig. 2A). On the basis of its deduced amino acid composition and K/R ratio (61 K and 3 R), the H1 histone encoded in clone 8D is a somatic type. After a spacer of 587 bp there is an open reading frame of 369 bp, terminating with TAA, that codes for an H2B histone of 123 amino acids, including the initial methionine (Fig. 2B). A spacer of 1,337 bp follows, then an open reading frame of 408 nucleotides coding for an H3 histone of 136 amino acids, including the initial methionine (Fig. 2C). Next come a spacer of about 800 bp and an open reading frame of 309 bp, terminating with TAG, that encodes an H4 histone of 103 amino acids, including the initial methionine (Fig. 2D). After a spacer of 659 bp there is another open reading frame of 375 nucleotides, terminating with TAA, that encodes an H2A histone of 125 amino acids (Fig. 2E). The spacer downstream of this gene, about 1,800 bp, is the longest of the cluster and lies at the 3′ end of the clone.
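The lengths quoted above can be cross-checked. The Python sketch below uses only figures from this section: it converts each ORF length into a residue count (the quoted ORF lengths appear to exclude the stop codon, since 606/3 = 202) and sums the segments along clone 8D. Two spacers are given only approximately, so the total is an estimate, and the labels and function names are illustrative.

```python
# Segments along clone 8D, from the 5' PstI site to the 3' end, as (label, bp, is_orf).
# Lengths are the values quoted in the text; the two "approx." spacers are "about" values.
CLUSTER_8D = [
    ("5' flank before H1 ATG", 58, False),
    ("H1", 606, True),
    ("spacer H1-H2B", 587, False),
    ("H2B", 369, True),
    ("spacer H2B-H3", 1337, False),
    ("H3", 408, True),
    ("spacer H3-H4 (approx.)", 800, False),
    ("H4", 309, True),
    ("spacer H4-H2A", 659, False),
    ("H2A", 375, True),
    ("3' spacer after H2A (approx.)", 1800, False),
]


def residues(orf_nt: int) -> int:
    """Residues encoded by an ORF length given without its stop codon (Met included)."""
    if orf_nt % 3:
        raise ValueError(f"{orf_nt} nt is not a whole number of codons")
    return orf_nt // 3


def kr_ratio(lys: int, arg: int) -> float:
    """Lysine/arginine ratio, used here to tell somatic from sperm-specific H1."""
    return lys / arg


protein_lengths = {label: residues(nt) for label, nt, is_orf in CLUSTER_8D if is_orf}
cluster_bp = sum(nt for _, nt, _ in CLUSTER_8D)

print(protein_lengths)            # {'H1': 202, 'H2B': 123, 'H3': 136, 'H4': 103, 'H2A': 125}
print(cluster_bp)                 # 7308, i.e. about 7.3 kbp
print(round(kr_ratio(61, 3), 1))  # 20.3
```

The total of about 7.3 kbp agrees with the cluster length given in the Abstract and with the three sequenced fragments (2.2 + 3.1 + 2.0 kbp). Whether the stop codons are counted within the spacers is not stated, which would shift the total by at most 15 bp. The K/R value of about 20 is the one used in the Discussion to classify the clustered H1 as somatic.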

Analysis of the Genomic Sequences Flanking the Histone Genes

No peculiarity is evident in the sequences 5′ to the start codons of the five histone genes in the cluster, except that no canonical TATA box is apparent. There are, however, elements common to other histone genes of various organisms and to genes transcribed by RNA polymerase II. The promoters of the H2B and H2A genes contain the GATCC element (underlined in Fig. 2B,E), which in sea urchin histone genes is usually located 11 bp upstream of the TATA box and has been found to be essential for starting transcription at the correct site (Maxson et al. 1983a; Busslinger et al. 1980). A similar hexamer, 5′-GACTTC-3′, is also present 5′ to other histone genes. Downstream of this element, the H2B promoter contains another sequence typical of this gene, 5′-CCTAATTTGCATATG-3′, similar to the consensus 5′-CCTTATTTGCATAAG-3′ of the H2B-specific promoter element (Harvey et al. 1982). This sequence appears essential for activating transcription. It contains an octameric subsequence, 5′-ATTTGCAT-3′, typical of other promoters of genes transcribed by RNA polymerases II and III (Sive et al. 1986), which interacts with the Oct1 protein (Fletcher et al. 1987). A similar octamer is also present in the promoter/enhancer of immunoglobulin genes, where it interacts with the Oct2 protein (Scheidereit et al. 1987).

The core histone genes have putative CAP sites in their 5′-UTRs (Fig. 2). These putative signals were identified by their homology to the corresponding sequences (Sures et al. 1980) reported for the sea urchin (5′-PyPyATTCPu-3′) and D. melanogaster (5′-NCPuTTPyPu-3′). The derived C. variopedatus consensus is 5′-(Py/A)PyATTC(Pu/C)-3′.

In the region 3′ to the stop codons, each gene of clone 8D has the control elements common to all replication-dependent histone genes: the palindrome forming a stem-loop and, 11–12 bp downstream, a purine-rich region (Hentschel and Birnstiel 1981; Birchmeier et al. 1982) (Table 1). These two elements are involved in binding of the primary transcript to the U7 snRNP before the final definition of the 3′ terminus of the message (Krieg and Melton 1984; Birnstiel et al. 1985). As shown in Table 1, the stem-loop structures conform to the consensus sequences derived for other organisms. The H2B gene, however, has a 5-bp stem, with the unique peculiarity that only one GC pair is present at the base of the stem. Surprisingly, downstream of the purine-rich element, all histone genes in the cluster also have at least one AATAAA polyadenylation signal, typical of the replication-independent histone genes.

The spacer regions between genes contain several simple reiterated units, as first found in the sea urchin histone gene clusters (Hentschel and Birnstiel 1981). For example, there are 17 TA repeats downstream of H3, and the spacer between H4 and H2A contains three repeats of the ATGT motif about 40 bp from the CAGAGAA sequence of H2A and 12 further repeats 28 bp downstream of the polyadenylation signal. This spacer sequence was also determined in clones 6B and 3D and was identical except for small differences in the number of repeats of the conserved motifs. A series of four repeat blocks forming the sequence 5′-(TACA)11(TA)14CA(TA)14AC(TGAT)70 is present in the long spacer downstream of H2A. These stretches of short repeated sequences might be involved in recombination events. If so, the small differences in the number of repeats of the same motif among clusters from different clones (8D, 6B, 3D), all isolated from the DNA of a single male, might be the consequence of recombination events between clusters.

Fig. 2. Nucleotide and putative encoded amino acid sequences of the C. variopedatus histone genes in the cluster of clone 8D, with adjacent 5′ and 3′ UTRs. No canonical TATA box is present. The conserved promoter motifs are underlined. The putative CAP sites are underlined and in bold type. Start and stop codons of each encoded histone protein are in bold type. The conserved 3′ UTR palindrome with the adjacent polypurine tract and the polyadenylation signal are underlined. Note that the amino acid sequences of histones H1, H2B, and H4 appear in the direct orientation, whereas those of histones H3 and H2A appear inverted because their genes are encoded on the complementary DNA strand. The cluster sequence is available in GenBank under accession numbers U96764 and AF007904.

Table 1. Transcription termination signals in C. variopedatus histone genes

Genes | Palindrome(a) | Spacer (bp) | Purine-rich motif | Polyadenylation signal
H1 | CCAAAGGTCCTTATCAGGACCATCCA | 12 | CCAAGAA | +
H2B | CCAAC-GCCCTTATCAGGGCCATCT | 12 | CCAAGAA | +
H3 | CAAACGGTCCTTTTTAGGACCAAACA | 11 | CAAGGAA | +
H4 | AAAACGGTCCTTATCAGGACCAAACA | 11 | CAAAGAA | +
H2A | CCAACGGTCCTTCTCAGGACCACTAT | 11 | CAGAGAA | +
C. variopedatus consensus | GGCT CCTTTACTTC AGGG A CC | 11–12 | | +
P. dumerilii consensus | TGGCCT A TT TTAAT A GGCCACCA | 7–9 | |
Sea urchin consensus | AACGGCC T CT TTTCAGG A GCCACCA | 13–15 | |
Trout consensus | A GGCTCTTTTAAGAGCCACCA | 13–17 | |
Further consensus purine-rich motifs: CCAGAGAGAA, CAAAAGA, CAAGAAAGA

(a) Bases forming the stems of the stem-loop structures are underlined.
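The sequence signals discussed in this section (CAP-site consensus, octamer, AATAAA polyadenylation signal, and the 3′ stem-loop) can be located with simple pattern matching. The Python sketch below is a minimal illustration, not the authors' procedure: it converts IUPAC consensus codes into regular expressions (Py/A becomes H and Pu/C becomes V) and searches for perfect inverted repeats around a fixed-length loop. The function names and the default stem and loop lengths are assumptions. The test inputs are the H2B promoter element quoted above, the H1 and H2B palindromes of Table 1, and one made-up CAP-like string.

```python
import re

# IUPAC nucleotide codes -> regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

CAP_SEA_URCHIN = "YYATTCR"      # 5'-PyPyATTCPu-3'
CAP_DROSOPHILA = "NCRTTYR"      # 5'-NCPuTTPyPu-3'
CAP_C_VARIOPEDATUS = "HYATTCV"  # 5'-(Py/A)PyATTC(Pu/C)-3'
OCTAMER = "ATTTGCAT"
POLY_A_SIGNAL = "AATAAA"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_motif(seq: str, pattern: str) -> list:
    """0-based start positions of (possibly overlapping) matches of an IUPAC pattern."""
    regex = "".join(IUPAC[c] for c in pattern.upper())
    # A zero-width lookahead lets finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={regex})", seq.upper())]


def find_stem_loops(seq: str, stem: int = 6, loop: int = 4) -> list:
    """(start, stem, loop) for perfect inverted repeats of length `stem` around a `loop`.

    Alignment gaps ('-') are removed first; mismatched or G-U stems are not detected.
    """
    seq = seq.upper().replace("-", "")
    hits = []
    for i in range(len(seq) - 2 * stem - loop + 1):
        left = seq[i:i + stem]
        right = seq[i + stem + loop:i + 2 * stem + loop]
        if left.translate(COMPLEMENT)[::-1] == right:
            hits.append((i, left, seq[i + stem:i + stem + loop]))
    return hits


print(find_motif("CCTAATTTGCATATG", OCTAMER))        # octamer in the H2B promoter element
print(find_stem_loops("CCAAAGGTCCTTATCAGGACCATCCA"))  # H1 palindrome from Table 1
```

On the H1 palindrome the sketch finds the 6-bp stem GGTCCT/AGGACC around the TATC loop; with `stem=5` it finds the shorter GCCCT/AGGGC stem of H2B. A real search would also score imperfect stems, which this exact-match version deliberately ignores.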

Copy Number Determination

DNA sequence analysis confirmed the results of the hybridization experiments indicating that each cluster contains only one copy of each histone gene. Both genomic DNA and clone 8D DNA were therefore digested with PstI to obtain the electrophoretic band containing the clusters. The number of copies of the cluster in the haploid genome of C. variopedatus was then determined by comparing the hybridization signal of the genomic DNA band with that of clone 8D, used as a homologous reference (Fig. 3). Three gene probes, corresponding to genes with different degrees of conservation, were used. The first probe was a PstI-DraII fragment (0.7 kbp) containing the H1 gene. The second was a clone corresponding to the EcoRI-RsaI fragment (350 bp) containing the H4 gene, prepared by PCR. The third was part of the coding region of the H2B gene, obtained by PCR amplification from position 1260 to 1910 (Fig. 2B) followed by digestion with HpaII, which generates two fragments: one of 234 bp, corresponding to most of the coding region (positions 1283 to 1517), which was used as the probe, and another of 358 bp, corresponding to the remaining part of the coding region and the adjacent downstream spacer. Using these three probes was intended to indicate how homogeneous the clusters are with respect to the least conserved gene (H1), a more conserved core histone gene (H2B), and the highly conserved H4 gene. The quantitative PhosphorImager measurements of the radioactivity bound to the genomic DNA bands after hybridization with the different probes were consistent with each other, giving the same average value of 390 histone gene clusters per sperm cell. This value was calculated assuming that the haploid genome of C. variopedatus contains 1.45 pg of DNA, the haploid DNA content reported for the polychaete annelid P. dumerilii (Sellos et al. 1990).

Fig. 3. Copy number of the histone genes in the genome of C. variopedatus. Plasmid 8D DNA containing the histone gene cluster and genomic DNA were digested with PstI and fractionated on an agarose slab gel. Lanes 1–6, decreasing amounts of genomic DNA (10 μg, 5 μg, 2.5 μg, 1.25 μg, 0.6 μg, and 0.3 μg, respectively). Lanes 8–13, decreasing amounts of the insert DNA of plasmid 8D (5 ng, 2.5 ng, 1.25 ng, 500 pg, 250 pg, and 125 pg, respectively). Lane 7, undigested genomic DNA (4 μg).
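The copy-number estimate rests on a simple proportion: when a genomic lane and a lane of cloned insert give the same hybridization signal, both contain the same mass of cluster DNA. The Python sketch below shows that arithmetic. The conversion of 1 pg of DNA to about 978 Mbp and the use of the 7.5-kbp PstI fragment as the cluster length are assumptions of the sketch, the function names are hypothetical, and no signal measurements from Fig. 3 are reproduced.

```python
BP_PER_PG = 0.978e9  # assumed conversion: 1 pg of double-stranded DNA is about 978 Mbp


def copies_per_haploid_genome(ref_insert_ng: float, genomic_ng: float,
                              cluster_bp: float, genome_pg: float,
                              bp_per_pg: float = BP_PER_PG) -> float:
    """Cluster copies per haploid genome from two lanes with equal hybridization signal.

    ref_insert_ng: mass of cloned cluster insert in the reference lane
    genomic_ng:    mass of genomic DNA that gives the same signal
    """
    mass_fraction = ref_insert_ng / genomic_ng  # share of genomic DNA that is cluster DNA
    genome_bp = genome_pg * bp_per_pg           # haploid genome size in bp
    return mass_fraction * genome_bp / cluster_bp


def expected_cluster_ng(genomic_ng: float, copies: float, cluster_bp: float,
                        genome_pg: float, bp_per_pg: float = BP_PER_PG) -> float:
    """Inverse calculation: cluster DNA expected in a genomic lane for a given copy number."""
    return genomic_ng * copies * cluster_bp / (genome_pg * bp_per_pg)


# Figures from the text: 390 copies, 7.5-kbp PstI fragment, 1.45-pg haploid genome.
ng_in_lane3 = expected_cluster_ng(2500, 390, 7500, 1.45)  # 2.5-ug genomic lane
print(round(ng_in_lane3, 2))
print(round(copies_per_haploid_genome(ng_in_lane3, 2500, 7500, 1.45)))
```

Under these assumptions, 390 copies imply roughly 5 ng of cluster DNA in the 2.5-μg genomic lane, the same order as the largest insert amount loaded in Fig. 3; the round trip returns 390. Because the result scales linearly with the assumed genome size, the estimate inherits any error in the 1.45-pg value borrowed from P. dumerilii.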

Discussion

The results reported here on the histone genes of C. variopedatus and their organization in the genome show repeated clusters, each containing one copy of each of the five histone genes. This differs from the other known cases of histone genes in annelids. In P. dumerilii (Sellos et al. 1990), a worm belonging to the same polychaete class as C. variopedatus, histone H1 is not present in the repeated clusters of core histone genes and has not yet been found elsewhere. Histone H1 is also missing from the histone gene cluster of the worm U. caupo (Ingham and Davis 1988), which belongs to the minor phylum of echiuroids, closely related to Annelida. Other H1 histone genes, in addition to those present in the repeated clusters, are very likely to be present in the genome of C. variopedatus. Indeed, a different H1 histone gene must be encoded at another, not yet identified, position, because an H1 protein isolated from the sperm cells of this organism shows a sperm-specific amino acid composition with a K/R ratio of about 2 (De Petrocellis et al. 1983), whereas the H1 encoded in the clusters identified here has an amino acid composition typical of somatic molecules, with a K/R ratio of about 20.

Determinations of the histone gene copy number with three probes of different sequence conservation showed about 390 copies of the cluster per haploid genome in C. variopedatus (Fig. 3). This value is similar to that found in P. dumerilii (Sellos et al. 1990), whereas in U. caupo the cluster copy number is about 100 (Ingham and Davis 1988). A high number of histone gene clusters is a characteristic that annelids share with echinoderms and some amphibians (Table 2). The present results, obtained in an organism with a high rate of cell division in the early embryonic stages, support the possibility that high reiteration of the histone genes activated during the initial cleavage stages provides the large amounts of histone needed to assemble the rapidly increasing amount of chromatin.

The histone gene clusters of C. variopedatus and of the other annelid differ not only in the presence of the H1 gene but also in the relative positions of the individual genes and their transcription polarity. In C. variopedatus the H3 and H4 genes are contiguous and lie between H2B and H2A, whereas in P. dumerilii the H3 and H4 genes are at the two ends of the cluster, separated by H2B and H2A (Table 2). In the echiuroid U. caupo, the gene arrangement is identical to that of P. dumerilii if the cluster is read in the opposite 5′-3′ direction.
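Comparisons of gene order and polarity between clusters depend on which end a cluster is read from. The Python sketch below encodes each gene as (name, strand) and reproduces two checks made in this Discussion: the U. caupo order read in reverse equals the P. dumerilii order, and the divergent (H3/H4) and convergent (H2A/H2B) relations in C. variopedatus do not change when the cluster is read from the other end. The C. variopedatus order follows Fig. 1B and its strands the Fig. 2 legend (H1, H2B, and H4 on one strand; H3 and H2A on the complementary strand). The P. dumerilii and U. caupo orders are assumed from Table 2 and compared by order only, since their strands are not given here. All names are illustrative.

```python
from typing import List, Tuple

Cluster = List[Tuple[str, int]]  # (gene, strand): +1 sequenced strand, -1 complementary strand

# C. variopedatus clone 8D: order from Fig. 1B, strands from the Fig. 2 legend.
C_VARIOPEDATUS: Cluster = [("H1", +1), ("H2B", +1), ("H3", -1), ("H4", +1), ("H2A", -1)]

# Gene orders only (strands not used); assumed from Table 2.
P_DUMERILII_ORDER = ["H4", "H2B", "H2A", "H3"]
U_CAUPO_ORDER = ["H3", "H2A", "H2B", "H4"]


def read_from_other_end(cluster: Cluster) -> Cluster:
    """Same cluster read from its other end: order reversed and strands flipped."""
    return [(gene, -strand) for gene, strand in reversed(cluster)]


def relation(cluster: Cluster, g1: str, g2: str) -> str:
    """'tandem', 'convergent' (genes point at each other) or 'divergent' (point apart)."""
    where = {gene: (pos, strand) for pos, (gene, strand) in enumerate(cluster)}
    (_, s1), (_, s2) = sorted([where[g1], where[g2]])  # left-hand gene first
    if s1 == s2:
        return "tandem"
    # A left-hand gene on +1 is transcribed rightward, towards the other gene.
    return "convergent" if s1 == +1 else "divergent"


print(relation(C_VARIOPEDATUS, "H3", "H4"))                        # divergent
print(relation(C_VARIOPEDATUS, "H2A", "H2B"))                      # convergent
print(relation(read_from_other_end(C_VARIOPEDATUS), "H3", "H4"))   # divergent
print(list(reversed(U_CAUPO_ORDER)) == P_DUMERILII_ORDER)          # True
```

Describing polarity as a pairwise relation, rather than as raw strand labels, is what makes the comparison independent of the arbitrary direction in which each cluster was sequenced.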
In the clusters of the annelids and in U. caupo, H3 and H4 genes, like H2A and H2B genes, are transcribed from different DNA strands but their relative directions of transcription are different. In C. variopedatus transcription polarities are divergent for H3 and H4 genes and convergent for H2A and H2B, just the opposite of what occurs in P. dumerilii and in U. caupo (Table 2). The differences in the structure and organization of the clusters of histone genes in other phyla are not so large even when comparison is made between different classes. As an example, in Echinodermata, the gene cluster is complete in the class Echinoidea (sea urchins), where the H1 histone gene is associated with the core histone genes, and is incomplete in the class Asteroidea (sea stars), where the H1 histone gene is not present in the cluster; but, in both classes, the genes are transcribed from only one DNA strand showing the same transcription polarity (Table 2). The encoded amino acid sequences of the core histone genes, when analyzed for phylogenetic relations, indicate

71
Table 2. Organization and transcription polarity of C. variopedatus histone gene clusters Insert length (kbp) 3.8 4 5.6 6.5 6 7.3 5 5 10.2 9.0 8.5 16 10.5 14 8.5

Group Coral Nematodes Sea urchins Sea stars Annelida

Species A. formosa C. elegans S. purpuratusa P. ochraceusb P. dumerilii C. variopedatus U. caupo D. melanogaster S. gardnerii N. viridiscens X. borealis X. tropicalis X. laevis

Reiteration 150 300 High copy No. 660 800 100 100 145 600800 60 30

Order of genes 3H2BH2 AH 4 H H3H4H2BH2 A H4H3H2AH2B H 1H 4 H2 B H3 H2 A H 2A H4 H3 H2 B H4H2 BH2 AH3 H1H2 BH3 H4 H2 A H3H2 AH2 BH4 H3H4H2 AH 2BH1 H4H2 BH1H2AH3 H1H3H2BH2AH4 H4H2AH2BH1H3 H1H2BH2AH1H4H3 H1H3H4H2AH2B H4H3H2AH2B H4H2AH2 BH1H3 (+ 7 other arrangements)

Fruit fly Fish Amphibia

a b

Other sea urchins show the same gene organization. Other sea stars show the same gene organization.

that the proteins of the two annelids are distinctive and close to each other. It has not been possible to extend this comparative analysis to the H1 histone gene because that sequenced here in C. variopedatus is the first somatic in the phylum Annelida. The comparison of deduced amino acid sequences of C. variopedatus histones with those of the other organisms reported in the dedicated histone gene data bank of Baxevanis and Landsman has been made using H2A and H2B histone genes because they are distinctive in the different organisms. The results common to the analyses performed individually on H2A and H2B genes can be presented in a simplified unrooted phylogenetic tree that clearly shows that the sequences have the expected phyletic relations (Fig. 4). In this tree, worms are grouped together in close relation with insects. In detail, C. variopedatus and P. dumerilii are closest to each other (polycheates), forming a group with U. caupo (echiuroid) and Sipunculus nudus (sipunculoid) (Kmiecik et al. 1983, 1991), both belonging to minor phyla closely related to annelids. C. elegans (nematode) is positioned before the bifurcation of echiuroids, sipunculoids, and polychaetes and also before the bifurcation of Insects while echinoderms are on another branch. This structure is in line with the generally accepted phylogenetic relationships. It is apparent from the results reported here on the histone gene clusters of C. variopedatus that the relative positions of the individual genes and their transcription polarity are not very useful or indicative of phylogenetic correlations. The apparent mobility of the histone genes might depend, among other causes, on the short repeated sequences present in the spacer regions observed between genes that might facilitate, as outlined in the preceding section, DNA mobility in the genomes, possibly

Fig. 4. Simplified unrooted phylogenetic tree based on H2A and H2B histones. Branch lengths are not proportional to phylogenetic distances. Data from the histone gene bank of Baxevanis and Landsman, access: http://www.ncbi.nln.nih.gov/Baxevani/HISTONES/.

as a result of functional needs specific to the particular organisms. An additional point of interest is the presence, in the 3′ UTR of every gene in the C. variopedatus cluster, of two different 3′ termination signals for the mRNA (Table 1). So far this feature is unique among histone gene clusters: a double 3′ termination signal has recently been reported for the H2B, H3, and H4 genes of D. melanogaster (Akhmanova et al. 1997) and, otherwise, only for two individual genes: a murine H1 histone gene (Cheng et al. 1989) and the H2A.X gene of human (Mannironi et al. 1989) and mouse (Nagata et al. 1991). The significance of the double 3′ termination signals in all the histone genes of C. variopedatus is not clear. One possible speculation is that both signals were originally present in the 3′ region of all histone genes and that one of them was later eliminated, allowing more efficient and specific transcriptional control. Testing this hypothesis, however, requires knowledge of histone gene structure in other lower eukaryotes. In any case, the peculiar characteristics of the C. variopedatus histone gene clusters contribute to defining phylogenetic relationships and show that gene composition and gene organization have quite different relevance for phylogenetic analysis.
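The idea of a "double" 3′ termination signal can be made concrete with a toy scanner that flags a 3′ UTR carrying both a hairpin of the kind used for replication-dependent histone mRNA processing and a poly(A) signal of the kind used by replication-independent mRNAs. This is a minimal sketch under stated assumptions, not the procedure used in this work: the "perfect 6-bp stem with a 4-nt loop" rule, the single canonical AATAAA hexamer, and the function names are hypothetical simplifications chosen for illustration; the actual C. variopedatus signals are those listed in Table 1.

```python
"""Toy scan of a 3' UTR for two kinds of histone mRNA 3'-end signals.

Assumptions (illustrative only, not taken from the paper):
  * replication-dependent signal  = any hairpin with a perfect 6-bp stem
    and a 4-nt loop (a simplified stand-in for the histone stem-loop);
  * replication-independent signal = the canonical poly(A) signal AATAAA.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STEM, LOOP = 6, 4           # assumed hairpin geometry
POLYA_SIGNAL = "AATAAA"     # canonical polyadenylation hexamer


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(utr: str) -> list[int]:
    """0-based start positions of perfect STEM-bp / LOOP-nt hairpins."""
    span = 2 * STEM + LOOP
    hits = []
    for i in range(len(utr) - span + 1):
        left = utr[i:i + STEM]                    # 5' arm of the stem
        right = utr[i + STEM + LOOP:i + span]     # 3' arm of the stem
        if left == reverse_complement(right):     # arms must base-pair fully
            hits.append(i)
    return hits


def find_polya_signals(utr: str) -> list[int]:
    """0-based start positions of every AATAAA occurrence."""
    return [i for i in range(len(utr) - len(POLYA_SIGNAL) + 1)
            if utr.startswith(POLYA_SIGNAL, i)]


def classify_3utr(utr: str) -> dict:
    """Report both signal types and whether the UTR carries both."""
    utr = utr.upper()
    loops = find_stem_loops(utr)
    polya = find_polya_signals(utr)
    return {
        "stem_loop": loops,
        "polyA_signal": polya,
        "double_signal": bool(loops) and bool(polya),
    }


if __name__ == "__main__":
    # Made-up toy UTR: a hairpin at position 4, AATAAA at position 28.
    toy = "TTAC" + "GGCCCTTTTCAGGGCC" + "TTTAGTCG" + "AATAAA" + "TTGC"
    print(classify_3utr(toy))
```

A real analysis would have to tolerate stem mismatches, accept variant hexamers (for example ATTAAA), and take into account the distance of each element from the stop codon; the strict rules above only keep the sketch short and easy to check.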
Acknowledgments. This work was supported in part by CNR contract no. 95.02892.CT14 and by Italian M.U.R.S.T. 40% funds, project Liveproteins.

References
Akhmanova A, Miedema K, Kremer H, Hennig W (1997) Two types of polyadenylated mRNAs are synthesized from Drosophila replication-dependent histone genes. Eur J Biochem 244:294–300
Albig W, Kardalinou E, Drabent B, Zimmer A, Doenecke D (1991) Isolation and characterization of two human H1 histone genes within clusters of core histone genes. Genomics 10:940–948
Angerer L, DeLeon D, Cox K, Maxson R, Kedes L, Kaumeyer J, Weinberg E, Angerer R (1985) Simultaneous expression of early and late histone messenger RNAs in individual cells during development of the sea urchin embryo. Dev Biol 112:157–166
Birchmeier C, Grosschedl R, Birnstiel L (1982) Generation of authentic 3′ termini of a H2A mRNA in vivo is dependent on a short inverted DNA repeat and on spacer sequences. Cell 28:739–745
Birnstiel ML, Busslinger M, Strub K (1985) Transcription termination and 3′ processing: the end is in site! Cell 41:349–359
Blin N, Stafford DW (1976) Isolation of high-molecular-weight DNA. Nucleic Acids Res 3:2303
Brandt WF, Strickland WN, Strickland M, Carlisle L, Woods D, Von Holt C (1979) A histone programme during the life cycle of the sea urchin. Eur J Biochem 94:1–10
Busslinger M, Portmann R, Irminger JC, Birnstiel ML (1980) Ubiquitous and gene-specific regulatory 5′ sequences in a sea urchin histone DNA clone coding for histone protein variants. Nucleic Acids Res 8:957–977
Cheng G, Nandi A, Clerk S, Skoultchi AI (1989) Different 3′-end processing produces two independently regulated mRNAs from a single H1 histone gene. Proc Natl Acad Sci USA 86:7002–7006
Connor W, Mezquita J, Winkfein RJ, States JC, Dixon GH (1984) Organization of the histone genes in the rainbow trout (Salmo gairdnerii). J Mol Evol 20:227–235
Cool D, Banfield D, Honda BM, Smith MJ (1988) Histone genes in three sea star species: cluster arrangement, transcriptional polarity and analyses of the flanking regions of H3 and H4 genes. J Mol Evol 27:36–44
Davis FC, Shelton JC, Ingham LD (1992) Nucleotide sequence of the Urechis caupo core histone gene tandem repeat. DNA Seq 2:247–256
De Petrocellis B, Parente A, Tomei L, Geraci G (1983) A H1 histone and a protamine molecule organize the sperm chromatin of the marine worm C. variopedatus. Cell Differ 12:129–135
Engel JD, Dodgson JB (1981) Histone genes are clustered but not tandemly repeated in the chicken genome. Proc Natl Acad Sci USA 78:2856–2860
Felsenstein J (1982) Numerical methods for inferring evolutionary trees. Q Rev Biol 57:379–404
Felsenstein J (1988) Phylogenies from molecular sequences: inference and reliability. Annu Rev Genet 22:521–565
Fletcher C, Heintz N, Roeder RG (1987) Purification and characterization of OTF-1, a transcription factor regulating cell cycle expression of a human histone H2B gene. Cell 51:773–781
Harvey RP, Robins AJ, Wells JR (1982) Independently evolving chicken histone H2B genes: identification of a ubiquitous H2B-specific 5′ element. Nucleic Acids Res 10:7851–7863
Hentschel CC, Birnstiel ML (1981) The organization and expression of histone gene families. Cell 25:301–313
Howell AM, Cool D, Hewitt J, Ydenberg B, Smith MJ, Honda BM (1987) Organization and unusual expression of histone genes in the sea star Pisaster ochraceus. J Mol Evol 25:29–36
Ingham LD, Davis FC (1988) Cloning and characterization of a core histone gene tandem repeat in Urechis caupo. Mol Cell Biol 8:4425–4432
Kmiecik D, Couppez M, Belaiche D, Sautiere P (1983) Primary structure of histone H2A from nucleated erythrocyte of the marine worm Sipunculus nudus. Presence of two forms of H2A in the sipunculid chromatin. Eur J Biochem 135:113–121
Kmiecik D, Belaiche D, Sautiere P, Loucheux-Lefebvre MH, Kerckaert JP (1991) Complete sequence of Sipunculus nudus erythrocyte histone H2B and its gene. Identification of an N,N-dimethylproline residue at the amino-terminus. Eur J Biochem 198:275–283
Krieg PA, Melton DA (1984) Formation of the 3′ end of histone mRNA by post-transcriptional processing. Nature 308:203–206
Lifton RP, Goldberg ML, Karp RW, Hogness DS (1977) The organization of the histone genes in Drosophila melanogaster: functions and evolutionary implication. Cold Spring Harb Symp Quant Biol 42:1047–1051
Maniatis T, Fritsch EF, Sambrook J (1982) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, New York
Mannironi C, Bonner WM, Hatch CL (1989) H2A.X, a histone isoprotein with a conserved C-terminal sequence, is encoded by a novel mRNA with both DNA replication type and polyA 3′ processing signals. Nucleic Acids Res 17:9113–9126
Maxson R, Cohn R, Kedes L, Mohun T (1983a) Expression and organization of histone genes. Annu Rev Genet 17:239–277
Maxson R, Mohun T, Gormezano G, Childs G, Kedes L (1983b) Distinct organizations and patterns of expression of early and late histone gene sets in the sea urchin. Nature 301:120–125
Miller DJ, Harrison PL, Mahony TJ, McMillan JP, Miles A, Odorico DM, ten Lohuis MR (1993) Nucleotide sequence of the histone gene cluster in the coral Acropora formosa (Cnidaria; Scleractinia): features of histone gene structure and organization are common to diploblastic and triploblastic metazoans. J Mol Evol 37:245–253
Nagata T, Kato T, Morita T, Nozaki M, Kubota H, Yagi H, Matsushiro A (1991) Polyadenylated and 3′ processed mRNAs are transcribed from the mouse histone H2A.X gene. Nucleic Acids Res 19:2441–2447
Osley MA (1991) The regulation of histone synthesis in the cell cycle. Annu Rev Biochem 60:827–861
Roberts SB, Sanicola M, Emmons SW, Childs G (1987) Molecular characterization of the histone gene family of Caenorhabditis elegans. J Mol Biol 196:27–38
Ruberti I, Fragapane P, Pierandrei-Amaldi P, Beccari E, Amaldi F, Bozzoni I (1982) Characterization of histone genes isolated from Xenopus laevis and Xenopus tropicalis genomic libraries. Nucleic Acids Res 10:1544–1550
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Scheidereit C, Heguy A, Roeder RG (1987) Identification and purification of a human lymphoid-specific octamer-binding protein (OTF-2) that activates transcription of an immunoglobulin promoter in vitro. Cell 51:783–793

Schumperli D (1986) Cell-cycle regulation of histone gene expression. Cell 45:471–472
Sellos D, Krawetz SA, Dixon GH (1990) Organization and complete nucleotide sequence of the core-histone-gene cluster of the annelid Platynereis dumerilii. Eur J Biochem 190:21–29
Sittman DB, Chiu IM, Pan CJ, Cohn RH, Kedes LH, Marzluff WF (1981) Isolation of two clusters of mouse histone genes. Proc Natl Acad Sci USA 78:4078–4082
Sive HL, Heintz N, Roeder RG (1986) Multiple sequence elements are required for maximal in vitro transcription of a human histone H2B gene. Mol Cell Biol 6:3329–3340
Stephenson EC, Erba HP, Gall JG (1981) Characterization of a cloned histone gene cluster of the newt Notophthalmus viridescens. Nucleic Acids Res 9:2282–2295
Sures I, Levy S, Kedes LH (1980) Leader sequences of Strongylocentrotus purpuratus histone mRNAs start at a unique heptanucleotide common to all five histone genes. Proc Natl Acad Sci USA 77:1265–1269
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Zernik J (1980) Xenopus laevis histone gene: variant H1 genes are present in different clusters. Cell 22:807–815