
56 Research Update TRENDS in Genetics Vol.17 No.2 February 2001

Note added in proof
The gene encoding ADAR1 has recently been shown to be an essential gene in mice (Ref. 27). Surprisingly, these studies also revealed that ADAR1 was haploinsufficient. Mice with one dose of ADAR1 died as embryos with no obvious abnormalities. However, further analysis suggests that ADAR1 is involved in embryonic erythropoiesis. Although the targets of ADAR1 in embryos are unknown, ADAR1 haploinsufficient teratomas composed mostly of nervous tissue appear to edit the known adult targets of ADAR1 at a reduced level. These data stand in stark contrast to the roles of ADAR2 in mammals and dADAR in Drosophila.

References
1 Basilio, C. et al. (1962) Synthetic polynucleotides and the amino acid code, V. Proc. Natl. Acad. Sci. U. S. A. 48, 613–616
2 Rueter, S.M. and Emeson, R.B. (1998) Adenosine→inosine conversion in mRNA. In Modification and Editing of RNA (Grosjean, H. and Benne, R., eds), pp. 343–361, ASM Press
3 Simpson, L. (1999) RNA editing – an evolutionary perspective. In The RNA World (2nd edn) (Gesteland, R.F. et al., eds), pp. 585–608, CSHL Press
4 Paul, M.S. and Bass, B.L. (1998) Inosine exists in mRNA at tissue-specific levels and is most abundant in brain mRNA. EMBO J. 17, 1120–1127
5 Sommer, B. et al. (1991) RNA editing in brain controls a determinant of ion flow in glutamate-gated channels. Cell 67, 11–19
6 Lomeli, H. et al. (1994) Control of kinetic properties of AMPA receptor channels by nuclear RNA editing. Science 266, 1709–1713
7 Burns, C.M. et al. (1997) Regulation of serotonin-2C receptor G-protein coupling by RNA editing. Nature 387, 303–308
8 Niswender, C.M. et al. (1999) RNA editing of the human serotonin 5-hydroxytryptamine 2C receptor silences constitutive activity. J. Biol. Chem. 274, 9472–9478
9 Rueter, S.M. et al. (1999) Regulation of alternative splicing by RNA editing. Nature 399, 75–80
10 Palladino, M.J. et al. (2000) dADAR, a Drosophila double-stranded RNA-specific adenosine deaminase is highly developmentally regulated and is itself a target for RNA editing. RNA 6, 1004–1018
11 Nishikura, K. et al. (1991) Substrate specificity of the dsRNA unwinding/modifying activity. EMBO J. 10, 3523–3532
12 Polson, A.G. and Bass, B.L. (1994) Preferential selection of adenosines for modification by double-stranded RNA adenosine deaminase. EMBO J. 13, 5701–5711
13 Lehmann, K.A. and Bass, B.L. (1999) The importance of internal loops within RNA substrates of ADAR1. J. Mol. Biol. 291, 1–13
14 Higuchi, M. et al. (1993) RNA editing of AMPA receptor subunit GluR-B: a base-paired intron–exon structure determines position and efficiency. Cell 75, 1361–1370
15 Herb, A. et al. (1996) Q/R site editing in kainate receptor GluR5 and GluR6 pre-mRNAs requires distant intronic sequences. Proc. Natl. Acad. Sci. U. S. A. 93, 1875–1880
16 Reenan, R.A. et al. (2000) The mle(napts) RNA helicase mutation in Drosophila results in a splicing catastrophe of the para Na+ channel transcript in a region of RNA editing. Neuron 25, 139–149
17 Bernard, A. and Khrestchatisky, M. (1994) Assessing the extent of RNA editing in the TMII regions of GluR5 and GluR6 kainate receptors during rat brain development. J. Neurochem. 62, 2057–2060
18 Hanrahan, C.J. et al. (2000) RNA editing of the Drosophila para Na+ channel transcript: evolutionary conservation and developmental regulation. Genetics 155, 1149–1160
19 Melcher, T. et al. (1996) RED2, a brain-specific member of the RNA-specific adenosine deaminase family. J. Biol. Chem. 271, 31795–31798
20 Bernard, A. et al. (1999) Q/R editing of the rat GluR5 and GluR6 kainate receptors in vivo and in vitro: evidence for independent developmental, pathological and cellular regulation. Eur. J. Neurosci. 11, 604–616
21 Paupard, M.C. et al. (2000) Patterns of developmental expression of the RNA editing enzyme rADAR2. Neuroscience 95, 869–879
22 Brusa, R. et al. (1995) Early-onset epilepsy and postnatal lethality associated with an editing-deficient GluR-B allele in mice. Science 270, 1677–1680
23 Higuchi, M. et al. (2000) Point mutation in an AMPA receptor gene rescues lethality in mice deficient in the RNA-editing enzyme ADAR2. Nature 406, 78–81
24 Palladino, M.J. et al. (2000) A→I pre-mRNA editing in Drosophila is primarily involved in adult nervous system function and integrity. Cell 102, 437–439
25 Aruscavage, P.J. and Bass, B.L. (2000) A phylogenetic analysis reveals an unusual sequence conservation within introns involved in RNA editing. RNA 6, 257–269
26 Kung, S. et al. (1996) Characterization of two fish glutamate receptor cDNA molecules: absence of RNA editing at the Q/R site. Mol. Brain Res. 35, 119–130
27 Wang, Q. et al. (2000) Requirement of the RNA editing deaminase ADAR1 gene for embryonic erythropoiesis. Science 290, 1765–1768

R.A. Reenan
Dept of Genetics and Developmental Biology, University of Connecticut Health Center, 263 Farmington Avenue, Farmington, CT 06030, USA.
e-mail: rreenan@neuron.uchc.edu

Identification and analysis of eukaryotic promoters: recent computational approaches
Uwe Ohler and Heinrich Niemann
http://tig.trends.com 0168-9525/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(00)02174-0

The DNA sequence of several higher eukaryotes is now complete, and we know the expression patterns of thousands of genes under a variety of conditions. This gives us the opportunity to identify and analyze the parts of a genome believed to be responsible for most transcription control: the promoters. This article gives a short overview of the state-of-the-art techniques for computational promoter localization and analysis, and comments on the most recent advances in the field.

Understanding gene regulation is one of the most exciting topics in molecular genetics. To learn how the interplay among thousands of genes leads to the existence of a complex eukaryotic organism is one of the great challenges. The quantity of information gained in the sequencing and gene expression projects both requires and enables us to use computers to solve this problem.
Promoter sequences are crucial in gene regulation. For the purposes of this paper, we define a promoter as the region proximal to the transcription-start site (TSS) of genes transcribed by RNA polymerase II; we exclude distal regions such as enhancers. Here we outline the recent developments in two areas of bioinformatics that deal with promoters: the general recognition of eukaryotic promoters, and the analysis of these regions to identify the regulatory elements in them. These analyses are the first step towards complex models of regulatory networks. We focus on the computational point of view and leave a more elaborate description, especially of the underlying biology, to the cited papers and reviews.

Analyzing promoters to find unknown regulatory elements
The interest in promoter analysis received a great boost with the arrival of microarray gene-expression data. Once you have a group of genes with a similar expression profile (e.g. those that are activated at the same time in the cell cycle; Ref. 1), a natural assumption is that this profile is, at least partly, caused by and
reflected in a similar structure of the regions involved in transcription regulation. The ultimate goal is the automated construction of specific promoter models containing a combination of several regulatory elements.
Research has so far focused on the detection of single motifs (representing transcription-factor binding sites) common to the promoter sequences of putatively co-regulated genes. Although this problem might seem simple at first, it is very complex:
• The motif is of unknown size.
• The motif might not be well conserved between promoters.
• The sequences used to search for the motif do not necessarily represent the complete promoter.
• The genes with promoters to be analyzed are in many cases grouped together by a clustering algorithm. As this algorithm might be error-prone, the genes are not necessarily all co-regulated in vivo.
Therefore, studies have mainly concentrated on the rather ‘simple’ genome of the budding yeast Saccharomyces cerevisiae. This was the first fully sequenced eukaryotic organism and the first one for which a comprehensive amount of expression data became publicly available. Statistics on the mapped TSSs (Ref. 2) show that its 5′ untranslated region (UTR) sequences are rather short (a mean of 89 bp), and most of the known regulatory elements are close to the translated part of the genes, the majority being found 10–700 bp upstream from the translation start codon. This means that, for yeast, the region upstream of the start codon can be used as a good approximation of a promoter region. Most algorithms searching for conserved patterns in yeast promoters thus take 500–1000 bp upstream of the start codons of supposedly co-regulated genes as the dataset. There are two fundamentally different approaches to tackle the problem: ‘alignment’ methods and ‘enumerative’ or ‘exhaustive’ methods.
Alignment methods aim to identify unknown signals by a significant local multiple alignment of all sequences. Direct multiple alignment is computationally very demanding, so the methods use various other strategies. For example, the CONSENSUS algorithm approximates a multiple alignment by aligning sequences one by one (Ref. 3) and optimizing the information content of the weight matrix constructed from the alignment. Other algorithms use a statistical approach; they consider the start positions of the motifs in the sequences to be unknown and perform a local optimization to determine which positions deliver the most conserved motif. Two important methods of this type are Gibbs sampling (Ref. 4) and expectation maximization in the MEME system (Ref. 5).
Enumerative or exhaustive methods examine all oligomers of a certain length and report those that occur far more often than expected from the overall promoter sequence composition (Refs 6,7). This approach has gained in popularity since the arrival of complete genomes and is trickier than one might believe; for example, how can patterns that overlap be counted?
From a practical point of view, the most obvious difference between those methods is the presentation of the results: the alignment approaches deliver a model of the motifs (usually a weight matrix) built from the alignment, whereas the enumerative methods give a list of over-represented oligomers, possibly already grouped to form consensus sequences (Fig. 1).

[Figure 1 flowchart: group of genes with similar expression profiles → extraction of regulatory regions → alignment method (weight matrix) or enumerative method (clusters of over-represented oligomers)]

Extracted regulatory regions:
...CACTCACACGTGGGACTAGCAC...
...CGTCGGGCCACGTGCTCACTTG...
...TTCACACGTGGGTTTAAAAAGGCA...
...TGGCACGTGCAATGAAC...
...TTTCCAGCACGTGGGGCGGAAATT...

Weight matrix (log-odds), alignment method:
Pos      A        C        G        T
 1    -2.571    1.967   -2.584   -2.523
 2     1.643   -2.585   -2.577   -2.583
 3    -2.580    1.970   -2.582   -2.583
 4    -2.581   -2.583    1.927   -2.546
 5    -2.583   -2.583   -2.584    1.715
 6    -2.583   -2.526    1.926   -2.584
 7    -2.578    0.735    1.235   -2.583
 8    -0.390   -2.582    1.620   -2.584
 9     0.438   -2.576    0.696   -0.348
10    -0.410    1.276   -2.571   -0.335
11    -2.583   -2.574    0.702    1.023
12     1.340   -2.582   -0.158   -2.583

Clusters of over-represented oligomers, enumerative method:
Cluster 1: cgcacg, gcacgt, cacgtg, acgtgc, cgtgcg; consensus cgcacgtgcg
Cluster 2: aaacgt, aacgtg, acgtgc, cgtgcg; consensus aaacgtgcg
Cluster 3: cccacg, ccacgt, cacgtg, acgtgc, cgtgcg; consensus cccacgtgcg

Fig. 1. A flowchart to illustrate the two different approaches for motif identification. We analyzed 800 bp upstream from the translation start sites of the five genes from the yeast gene family PHO by the publicly available systems MEME (alignment) and RSA (exhaustive search, see Table 1). MEME was run on both strands, one occurrence per sequence mode, and found the known motif ranked as second best. RSA Tools was run with oligo size 6 and noncoding regions as background, as set by the demo mode of the system. The well-conserved heptamer of the motifs used by MEME to build the weight matrix is printed in bold.

Differences in motif identification approaches
One important difference between the approaches concerns the background model. For example, a simple background model accounts for a different overall GC content. Without such a model, you would probably find the obvious, e.g. mainly GC-rich motifs in organisms whose promoters have a high GC content. A more sophisticated model is constructed from the set of all promoters and takes their
specific sequence composition into account. Such a model avoids finding motifs that are common to all promoters, such as TATA boxes. But this also means that a specific model has to be constructed for each organism, at least, and this information is not always available. Enumerative methods have to use such an elaborate background model to judge the importance of frequent patterns. By contrast, alignment methods usually incorporate only the GC content, which makes them more prone to failure if the motif is not very well conserved among the sequences or the sequences to be examined become too large (Ref. 8).
Most of the enumerative algorithms need to have the size of the motif specified in advance. Because of the fixed size, they often deliver a number of similar motifs simply shifted by one base or including mismatches. Some methods provide automatic grouping of the resultant motifs into consensus strings and thus present the results as a small number of putative regulatory elements that can be examined more easily by experts. A potential problem here is that parts of the consensus might come from different sequences.
The alignment approach requires different statistics depending on how often a pattern should be present in the sequences. For instance, MEME can be run in three modes assuming that a motif occurs exactly once, at most once, or an arbitrary number of times per promoter sequence. The Gibbs sampler implementation (Table 1) also allows for zero or multiple occurrences. In principle, alignment methods yield one pattern per run, but they can be run several times to detect more than one motif, masking out previously found sites. Gibbs sampling is a nondeterministic approach, meaning that, even without masking out sites, it might deliver different motifs.

Table 1. A selection of recently published promoter finding and analysis tools accessible on the World-Wide Web

Program             Description                                               URL
General promoter finding
Promoter2.0         Search-by-signal, artificial neural network               www.cbs.dtu.dk/services/Promoter
NNPP                Search-by-signal, time delay neural network               www.fruitfly.org/seq_tools/promoter.html
PromoterInspector   Search-by-content, class-specific oligomers               www.gsf.de/biodv
McPromoter V3       Signal/content, stochastic segment model/neural network   www.mustererkennung.de/HTML/English/Research/Promoter
CorePromoter        Signal/content, discriminant analysis                     argon.cshl.org
Promoter analysis tools
RSA Tools           Yeast and microbial exhaustive search                     www.ucmb.ulb.ac.be/bioinformatics/rsa-tools
Gibbs sampler       Alignment method                                          bayesweb.wadsworth.org/gibbs/gibbs.html
MEME                Alignment via Expectation Maximization                    meme.sdsc.edu
BBA                 Phylogenetic footprinting by Bayes alignment              bayesweb.wadsworth.org/cgi-bin/bayes_align12.pl
PipMaker            Phylogenetic footprinting by identity plots               bio.cse.psu.edu

New directions
Some limitations of the enumerative methods have been eliminated by a number of recent publications. The motif-identification problem was one of the most prominent topics at the recent conference on intelligent systems for molecular biology ISMB 2000 (http://ismb2000.sdsc.edu). It is now possible to detect homo- or heterodimer motifs separated by a fixed (Ref. 9) or variable spacer length (Ref. 10), or motifs with a variable length (Ref. 11). To allow for mismatches, ambiguous nucleotide letters (such as R for the purines) are added to the nucleotide alphabet.
Thus it seems as if the enumerative approach is the method of choice: it exhaustively searches all possible oligomers and provides more significant results because of the background modeling. In practice, though, alignment methods are more flexible. Because of the simple background, they are not restricted to one specific organism. They can also find long motifs, the detection of which is simply not feasible by an exhaustive approach. Furthermore, they deliver a weight matrix as a comprehensive model for a motif that can be more flexible than a consensus sequence for searching purposes. We therefore propose a two-step approach: first apply an enumerative approach, and then use the results to initialize a weight matrix for an alignment method. Unfortunately, no such combined approach has been published yet, but the Gibbs sampler (Table 1), for example, lets you specify a weight matrix to start with. Of course, this works only if both methods are available, which so far is the case only for yeast and some microbial organisms.
The described methods are often applied on a set of promoters that were first grouped together using gene expression measurements. Bussemaker et al. recently analyzed the whole set of yeast regulatory sequences without using any information on gene-expression levels (Ref. 11), constructing a dictionary of oligomers of increasing length and using the previous dictionary of shorter oligomers as background. A new way to look at the data is to cluster genes on the basis of both expression levels and common motifs at the same time (Ref. 12). This can help to separate gene groups that are active under the same conditions but belong to separate regulatory pathways.
An alternative approach is to identify elements by analyzing promoters of the same gene from approximately ten different related species, rather than different promoters from the same species (Ref. 13). For this, an optimal alignment of a small region of specified size is constructed that takes the phylogenetic distance into account.
The question remains as to how we can use all these methods when we move on to the analysis of higher eukaryotes with their complex genomes. The euchromatin of Drosophila melanogaster has a gene density of roughly one gene every 9 kb and an average predicted transcript size of 3058 bp (Ref. 14), leaving a huge portion of the genome that might contain regulatory elements. In this case, the alignment of noncoding sequences from two related species, also known as phylogenetic footprinting, can help to narrow the search region and reveal conserved, and potentially regulatory, regions (Refs 15,16). A recent publication closes the gap between this approach and motif identification: 28 orthologous co-regulated gene pairs from human and rat were automatically aligned to identify conserved, ungapped sequence blocks, and the subsequent analysis of the conserved parts with a Gibbs sampling approach revealed motifs that were missed otherwise (Ref. 17). The main
assumption for phylogenetic approaches is that the regulatory pathway itself has not diverged, as this would result in different motifs with the same function.
If we do not have information from related species, we can concentrate on the analysis of proximal promoter regions close to TSSs, but the length of UTRs of higher eukaryotes prevents us from assuming that the TSSs can be found immediately upstream of the coding part of a gene. In our recent genome annotation assessment (Ref. 18), we found that the 92 Drosophila genes from the set for which full-length cDNA information was available had an average UTR length of about 1900 bp (17 transcripts had UTRs longer than 1000 bp). This means we have to find the start sites first.

Finding the promoters in genomic DNA
For a long time, bioinformaticians have tried to come up with algorithms able to identify the promoters in eukaryotes. This is not easy because promoters are very diverse, and even well-known signals such as the TATA box can be weakly conserved or missing altogether. Algorithms for general promoter recognition so far can be classified into two groups:
• Search-by-signal algorithms make predictions on the basis of the detection of core promoter elements such as the TATA box or the initiator and/or transcription factor binding sites outside the core (Ref. 19).
• Search-by-content algorithms identify regulatory regions on the basis of the sequence composition of promoter and nonpromoter (typically coding and intron sequences) examples (Ref. 20).
There are also methods that combine both ideas, looking for signals and for regions of specific composition (Refs 21,22).
For an exact localization, promoter prediction should also mean identification of TSSs. But search-by-content methods do not provide good TSS predictions because they do not look for positionally conserved signals. To enable the comparison of different algorithms, predictions are thus counted as correct if they are made within a window around an experimentally verified start site. Using this scoring, an evaluation in 1997 found that many algorithms identified ~30–50% of the start sites within genomic DNA sequences (Ref. 23). The programs were run ab initio; that is, without any information other than the sequence itself. The problem, though, was the vast number of false positives: even the best algorithms had one false TSS prediction in every 500–1000 bp.

New features, new algorithms, new hope
As a response to these rather discouraging results, different approaches for finding promoters have been pursued. One idea is to provide an accurate prediction of the TSS, but only for small regions known to contain a promoter (Ref. 24). An ‘opposite’ algorithm provides specific predictions of regulatory regions (of a size of roughly 1000 bp) using a search-by-content approach, but gives no information regarding whether the affected gene is on the forward or the reverse strand, or where within the region the TSS itself is located (Ref. 25). A fundamentally different approach is to construct specific, rather than general, promoter models for groups of genes such as muscle-active genes known by experiment to contain specific combinations of regulatory elements (Ref. 26; reviewed in Ref. 27); this is where promoter finding and analysis meet.
With the genomes of many organisms now completely sequenced, the interest in general promoter prediction has increased. Although the ab initio performance of the algorithms is not as good as desired, this is not the way the annotation of genomes is done. Many algorithms are used together, and limiting the analyzed sequence to the region upstream from the start of a predicted gene or a cDNA alignment reduces the number of false predictions immensely. In a recent assessment of both ab initio and gene-finder-coupled promoter predictions for Drosophila, the ab initio methods had fewer false positives than before (Ref. 18), and coupling them with a gene finder proved to be quite successful, although it will be hard to achieve a sensitivity of more than 50%.
Promoter features (Ref. 28) that have not previously been included in the algorithms can help us here. For instance, many vertebrate promoter regions coincide with CpG islands. These are regions where the GC content is high and the CG dinucleotide occurs more frequently than expected, a consequence of the fact that the DNA of many promoters is unmethylated so that it is accessible to regulatory proteins. A method to discriminate between CpG islands in promoters and in other parts of the genome has just been published (Ref. 29). This method can be seen as a search-by-content approach and does not deliver a TSS prediction. CpG island information is also used in the latest version of our McPromoter predictor (see Table 1), where it has reduced the number of false positives by roughly one third. Unfortunately, CpG islands only exist in vertebrate organisms.
Features common to promoters of all organisms are structural properties of DNA, such as bendability or conformation (a compilation was carried out by Liao et al.; Ref. 30). For these properties, scoring tables based on di- or trinucleotides were determined experimentally and can be used to calculate profiles over the DNA sequence. Studies have shown that, in general, eukaryotic promoters do indeed have a distinct profile when compared with coding or non-regulatory sequences (Ref. 28). Whether using these features will improve recognition remains to be seen. The profiles of individual sequences can be very noisy and thus not easy to use, and it is not clear whether they provide new information not accurately reflected in the sequence itself.
The most recently published promoter-finding and analysis tools are listed in Table 1. More links can be found in comprehensive reviews (Refs 23,27).
Will we be able to find the regulatory regions of eukaryotes with high accuracy, and if so, will we be able to derive complex models for transcription regulation from their sequence? The question is open, but we certainly are on the way to answering it.

Acknowledgements
We thank the anonymous referees for many helpful suggestions. U.O. is a fellow of the Boehringer Ingelheim Fonds.

References
1 Spellman, P. et al. (1998) Comprehensive identification of cell-cycle regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297
2 Zhu, J. and Zhang, M.Q. (1999) SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 15, 607–611
3 Hertz, G.Z. and Stormo, G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577
4 Lawrence, C.E. et al. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208–214
5 Bailey, T.L. and Elkan, C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–83
6 van Helden, J. et al. (1998) Extracting regulatory sites from the upstream region of yeast by
computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842
7 Brazma, A. et al. (1998) Predicting gene regulatory elements in silico on a genomic scale. Genome Res. 8, 1202–1215
8 Pevzner, P. and Sze, S-H. (2000) Combinatorial approaches to finding subtle signals in DNA sequences. Proc. ISMB 8, 269–278
9 van Helden, J. et al. (2000) Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 28, 1808–1818
10 Sinha, S. and Tompa, M. (2000) A statistical method for finding transcription factor binding sites. Proc. ISMB 8, 344–354
11 Bussemaker, H. et al. (2000) Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis. Proc. Natl. Acad. Sci. U. S. A. 97, 10096–10100
12 Holmes, I. and Bruno, W.J. (2000) Finding regulatory elements using joint likelihoods for sequence and expression profile data. Proc. ISMB 8, 202–210
13 Blanchette, M. et al. (2000) An exact algorithm to identify motifs in orthologous sequences from multiple species. Proc. ISMB 8, 37–45
14 Adams, M.D. et al. (2000) The genome sequence of Drosophila melanogaster. Science 287, 2185–2195
15 Duret, L. and Bucher, P. (1997) Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7, 399–406
16 Hardison, R.C. (2000) Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16, 369–372
17 Wasserman, W.W. et al. (2000) Human–mouse genome comparisons to locate regulatory sites. Nat. Genet. 26, 225–228
18 Reese, M.G. et al. (2000) Genome annotation assessment in Drosophila melanogaster. Genome Res. 10, 483–501
19 Prestridge, D.S. (1995) Predicting Pol II promoter sequences using transcription factor binding sites. J. Mol. Biol. 249, 923–932
20 Hutchinson, G.B. (1996) The prediction of vertebrate promoter regions using differential hexamer frequency analysis. Comp. Appl. Biosci. 12, 391–398
21 Solovyev, V. and Salamov, A. (1997) The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Proc. ISMB 5, 294–302
22 Ohler, U. (2000) Promoter prediction on a genomic scale – the Adh experience. Genome Res. 10, 539–542
23 Fickett, J.W. and Hatzigeorgiou, A.G. (1997) Eukaryotic promoter recognition. Genome Res. 7, 861–878
24 Zhang, M.Q. (1998) Identification of human gene core promoters in silico. Genome Res. 8, 319–326
25 Scherf, M. et al. (2000) Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach. J. Mol. Biol. 297, 599–606
26 Wasserman, W.W. and Fickett, J.W. (1998) Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol. 278, 167–181
27 Werner, T. (1999) Models for prediction and recognition of eukaryotic promoters. Mamm. Genome 10, 168–175
28 Pedersen, A.G. et al. (1999) The biology of eukaryotic promoter prediction – a review. Comput. Chem. 23, 191–207
29 Ioshikhes, I.P. and Zhang, M.Q. (2000) Large-scale human promoter mapping using CpG islands. Nat. Genet. 26, 61–63
30 Liao, G-C. et al. (2000) Insertion site preferences of the P transposable element in Drosophila melanogaster. Proc. Natl. Acad. Sci. U. S. A. 97, 3347–3351

U. Ohler*
H. Niemann
Lehrstuhl für Mustererkennung (Informatik V), Universität Erlangen-Nürnberg, Martensstr. 3, D-91058 Erlangen, Germany.
*e-mail: Uwe.Ohler@informatik.uni-erlangen.de

Plant steroids recognized at the cell surface
Philip W. Becraft

Plants might use a markedly different mechanism for steroid signaling than animals. In animals, steroid hormone signals are generally mediated by receptors inside the cell. However, a recent report by He et al. indicates that, in plants, steroids appear to be perceived at the plasma membrane rather than by intracellular receptors.

Steroid hormones are well-known regulators of developmental and physiological processes in animal systems. The most familiar receptors for animal steroid hormones are nuclear receptors of the steroid/thyroid receptor superfamily. These receptors are transcription factors that are present in the cytoplasm or the nucleus. Binding of their ligand induces nuclear translocation and/or activation of the receptor, leading to the transcriptional regulation of specific target genes. In addition, most steroids elicit responses that do not involve transcriptional regulation. This type of steroid signaling was first suggested in a 1941 report showing that progesterone induced anesthesia within minutes of administration (Ref. 1), too rapidly to involve a transcriptional mechanism. Studies of labeled-steroid binding, elicitation of responses by membrane-impermeant conjugated steroids and modulation of responses by antibodies indicate that the receptors involved in these ‘nontranscriptional’ responses are located at the plasma membrane (Refs 2–6).
Steroids are now widely accepted as bona fide plant hormones. They regulate several processes in plants including cell elongation and photomorphogenesis. Of the steroid compounds known in plants, brassinolide is the most active growth regulator. The chemical structures of brassinolide, cholesterol and several animal steroid hormones are shown in Fig. 1. General acceptance of steroids as plant hormones came from genetic studies where mutants deficient in brassinosteroid biosynthesis or insensitive to exogenous brassinolide application showed marked developmental defects (Refs 7–9). The det2 and cpd mutant seedlings both show a light-grown habit (including a shortened hypocotyl and expanded leaves) when grown in the dark and a dwarf phenotype when grown in the light, and both genes were found to encode enzymes for brassinolide biosynthesis (Refs 7,8).
Mutation of the BRI1 locus of Arabidopsis causes a similar phenotype, although bri1 mutants are brassinolide insensitive, unlike det2 and cpd, which can be rescued with exogenous brassinolide application. This indicates that BRI1 functions in brassinolide perception (Refs 9,10), and isolation of the gene revealed it to encode a receptor kinase (Ref. 10). Expression of a BRI1–GFP fusion protein regulated by the BRI1 gene promoter indicated that BRI1 is ubiquitously expressed and localized to the plasma membrane (Ref. 11). The predicted extracellular domain contains leucine-rich repeats (LRRs) with a novel 70-amino-acid island between LRRs 21 and 22, and the cytoplasmic domain consists of a serine/threonine kinase. Both the 70-residue island and the kinase domain are crucial because mutations in these regions disrupt BRI1 function (Refs 10–12).
But is BRI1 directly involved in brassinolide perception, or does it function downstream in the signal transduction system? A recent report by He et al. (Ref. 13) strongly supports a direct role for BRI1. As summarized in Fig. 2, a chimeric signal

http://tig.trends.com 0168-9525/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(00)02165-X
