“Sequence of a DNA molecule can contribute very specifically to its 3D structure, both in vitro and in vivo. Particularly striking in their nonlinear structure, certain sequences can introduce a stereo-specific bend in DNA.”

•The most prominent sequences contributing to these structural anomalies are strings of three, four or more consecutive A or T residues (A-tracts), although other nucleotide segments can also induce structural anomalies.
•A-tract bending has been investigated using X-ray crystallography, gel electrophoresis, electric birefringence, circular dichroism and NMR. Curvature of non-genomic DNA has also been investigated with electron microscopy and atomic force microscopy (AFM).

Moreno-Herrero, F. et al. Nucl. Acids Res. 2006 34:3057-3066; doi:10.1093/nar/gkl397

A higher overall "bend" density for C. elegans DNA than for other non-nematode genomes with a comparable base composition

When a set of predicted "bent" sequences from the C. elegans genome was inspected in detail, the most prominent common feature was a tendency for AA or TT dinucleotides to occur on one face of the helix over tens of base pairs.

Evaluating periodicity of individual tetranucleotides in the C. elegans genome

“A histogram is set up with zero in each bin from 1 to 256. A four-base-wide window is then "slid" along the genome. At each point, the program examines the sequence for the previous 256 base pairs looking for occurrences of the same tetranucleotide sequence. If an identical sequence is found n base pairs upstream, we add one to bin n in the histogram. For this figure, the "relative coincidence frequency" is a value derived from dividing the raw number of coincidences by the number that would be expected for a randomized genome with identical base composition to that of C. elegans.”
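The counting procedure quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function names `separation_histogram` and `relative_coincidence` are hypothetical, the histogram is built for one tetranucleotide at a time, and the "expected" counts are approximated by shuffling the sequence (which preserves base composition) rather than by whatever randomization the original analysis used.

```python
import random


def separation_histogram(seq, word, max_sep=256):
    """Count pairs of `word` occurrences that start n bases apart, for n = 1..max_sep.

    hist[n] is the number of coincidences at separation n; hist[0] is unused.
    """
    k = len(word)
    hist = [0] * (max_sep + 1)
    seen = []  # start positions of earlier occurrences, in ascending order
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] != word:
            continue
        # Look upstream, nearest occurrence first, and stop beyond max_sep.
        for j in reversed(seen):
            n = i - j
            if n > max_sep:
                break
            hist[n] += 1
        seen.append(i)
    return hist


def relative_coincidence(seq, word, max_sep=256, n_shuffles=10, seed=0):
    """Observed counts divided by the mean counts over shuffled sequences.

    Assumption: shuffling stands in for "a randomized genome with identical
    base composition". Bins with zero expected count are reported as NaN.
    """
    rng = random.Random(seed)
    observed = separation_histogram(seq, word, max_sep)
    expected = [0.0] * (max_sep + 1)
    bases = list(seq)
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        shuffled = separation_histogram("".join(bases), word, max_sep)
        for n, count in enumerate(shuffled):
            expected[n] += count / n_shuffles
    return [o / e if e > 0 else float("nan") for o, e in zip(observed, expected)]
```

Summing the histograms for a word and its reverse complement (for example AAAA and TTTT) would give a combined spectrum like the AAAA/TTTT plots, under the same assumptions.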

Separation spectrum for AAAA/TTTT tetranucleotides using a dataset that is the complete C. elegans genome

Separation spectrum for AAAA/TTTT tetranucleotides using a dataset that is a version of the C. elegans genome from which annotated coding regions and sequences of ≥25 nt that appear more than once have been removed

Periodicity analysis of C. elegans DNA
•For some tetranucleotides (e.g., TGTG), a 2-base periodicity is present, apparently representing segments of the genome with extended runs of alternating purine-pyrimidine.
•A 3-base periodicity is evident for a number of the tetranucleotides (e.g., GTAG). This type of periodicity is likely to represent the triplet nature of the genetic code combined with non-random codon choices and distributions of amino acids.
•Additional 3n periodicities of 6, 9, etc. (e.g., GATT, TCAG) presumably reflect (at least in part) coding for protein motifs, such as beta sheet, that have periodicity.
•A number of the distributions show a strong and very discrete signal at a unique periodicity (e.g., GCCG). These longer periodicities represent highly repeated tandem sequences in the genome.
•A ~10-base periodicity is present in "words" containing multiple AA/TT dinucleotides (e.g., AAAA).
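One simple way to test a sequence for a periodicity of a given length, such as the ~10-base signal in the last point, is to measure how well the positions of a word line up in phase at that period. The sketch below is an assumption-laden illustration rather than the method used in the paper: `word_positions` and `periodicity_power` are hypothetical helper names, and the score is a plain Fourier power of the position list at a single trial period.

```python
import cmath
import math


def word_positions(seq, words):
    """Return every start position at which any of `words` occurs in `seq`."""
    return [i for i in range(len(seq))
            if any(seq.startswith(w, i) for w in words)]


def periodicity_power(positions, period):
    """Fourier power of the positions at one trial period, normalized by count.

    If all N positions share the same phase at `period`, the result is N;
    for positions with no relationship to that period it stays near 1.
    """
    if not positions:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * p / period) for p in positions)
    return abs(total) ** 2 / len(positions)


# Toy example: one AA dinucleotide every 10 bases.
toy = "AACGCGCGCG" * 20
pos = word_positions(toy, ("AA", "TT"))
for trial in (2.0, 3.0, 7.0, 10.0):
    print(trial, round(periodicity_power(pos, trial), 3))
```

Scanning a range of trial periods and plotting the power gives a crude spectrum. In a real genome the peak near 10 bases would be broader than in this toy case, because the helical repeat is not an exact integer.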

Identification of individual periodic islands in the C. elegans genome

Islands of periodicity in the C. elegans genome.

Rather than underlying the entire genome sequence, the periodic regions appear enriched in a number of "islands" throughout the genome. These islands are for the most part unique in sequence; strikingly, they appear to delineate transcribed regions for a large group of genes that are expressed in the germ cell lineage of C. elegans.

Periodic An/Tn clusters (PATCs) in the C. elegans genome

PATC motifs show a striking global pattern within the genome, with enrichment on the terminal ~1/3 of each autosome and on the extreme left tip of the X chromosome. C. elegans autosomes are known to have distinct central and peripheral characteristics, with genes more densely packed in the center and with more frequent recombination and occurrence of certain transposons in peripheral regions.

Periodicity of unique chromosomal regions is most evident in intron sequences, showing a relatively constant profile over the length of individual genes.

Several observations are consistent with the suggestion that PATC clusters reflect nucleosomal positioning in C. elegans. First, nucleosome association seems to be the default state for the bulk of eukaryotic nuclear DNA (including that of C. elegans) [Kornberg and Lorch, 1999; Dixon et al., 1990]. Second, nucleosome modification complexes are critical in setting up appropriate patterns of gene expression in the germline [Dixon et al., 1990; Shin and Mello, 2003; Ahringer, 2000]. Third, the minor peaks in the Fourier plot in Figure S4E and the observation in Figure S3 that AAAA->TTTT separations have a distinct profile from TTTT->AAAA separations are consistent with previous reports of non-symmetric dinucleotide preferences and nonuniform helical repeats at different positions within a nucleosome.

Functional correlates of strong periodic character in the C. elegans genome: The strong bias for abundant periodicity in a discrete subset of genes suggests a functional commonality amongst these genes. Manual annotation of gene lists and objective comparisons of genomic data both indicate a statistically significant association between gene activity in the hermaphrodite germline and high sequence periodicity. Nonetheless, it should be stressed that this correlation could be secondary and that the functional link could be a complex one, for example involving another cell type with similar expression profiles to the hermaphrodite germline or some metabolic process that is simply more effective in these cells.