Jérôme Reboul1,11*, Philippe Vaglio1*, Jean-François Rual1,2*, Philippe Lamesch1,2*, Monica Martinez1,
Christopher M. Armstrong1, Siming Li1, Laurent Jacotot1, Nicolas Bertin1, Rekins Janky1, Troy Moore3,11, James
R. Hudson Jr.3,11, James L. Hartley4,11, Michael A. Brasch4,11, Jean Vandenhaute2, Simon Boulton1,11, Gregory A.
Endress5, Sarah Jenna6, Eric Chevet6, Vasilis Papasotiropoulos7, Peter P. Tolias7, Jason Ptacek8, Mike Snyder8,
Raymond Huang9, Mark R. Chance9, Hongmei Lee10, Lynn Doucette-Stamm10,11, David E. Hill1 & Marc Vidal1
*These authors contributed equally to this work
To verify the genome annotation and to create a resource to functionally characterize the proteome, we attempted
to Gateway-clone all predicted protein-encoding open reading frames (ORFs), or the ORFeome, of Caenorhabditis
elegans. We successfully cloned approximately 12,000 ORFs (ORFeome 1.1), of which roughly 4,000 correspond to
genes that are untouched by any cDNA or expressed-sequence tag (EST). More than 50% of predicted genes needed
corrections in their intron-exon structures. Notably, approximately 11,000 C. elegans proteins can now be expressed
under many conditions and characterized using various high-throughput strategies, including large-scale interactome
mapping. We suggest that similar ORFeome projects will be valuable for other organisms, including humans.
1Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 2Unité de Recherche en Biologie
Moléculaire, Facultés Universitaires Notre-Dame de la Paix, Namur, 5000, Belgium. 3Research Genetics /Invitrogen, Huntsville, Alabama, USA. 4Life
Technologies /Invitrogen, Rockville, Maryland, USA. 5Protedyne Corporation, Windsor, Connecticut 06095, USA. 6Department of Surgery, McGill
University, Montreal, Canada. 7Center for Applied Genomics, Public Health Research Institute, Newark, New Jersey 07103, USA. 8Yale University, New
Haven, Connecticut 06520, USA. 9Center for Synchrotron Biosciences and Department of Physiology & Biophysics, Albert Einstein College of Medicine,
Bronx, New York 10461, USA. 10Genome Therapeutics, Waltham, Massachusetts 02453, USA. 11Present addresses: INSERM Unité 119, Institut Paoli
Calmette, 13009 Marseille, France (J.R.); Open Biosystems, Huntsville, Alabama 35806, USA (T.M.); Cityscapes, Huntsville, Alabama 35801, USA (J.R.H.);
SAIC/National Cancer Institute, Frederick, Maryland 21702, USA (J.L.H.); Atto Bioscience, Rockville, Maryland 20850, USA (M.A.B.); Cancer Research
UK, Clare Hall, Herts EN6 3LD, UK (S.B.); Agencourt Biosciences Corporation, Beverly, Massachusetts 01915, USA (L.D.). Correspondence should be
addressed to M.V. (e-mail: marc_vidal@dfci.harvard.edu).
transcriptome projects are not available in vectors that allow the protein-encoding sequences to be
transferred to a variety of expression vectors by automated, high-throughput methods. Whereas
bioinformatic analysis of genome annotations is affected by only the first limitation, the development
of proteome-wide strategies has been hampered by all three.

Results
Genome-wide ORFeome cloning
To simultaneously address these three problems, we designed an alternative strategy referred to as
Gateway-based ORFeome cloning6,17-19 (Fig. 1a). Briefly, predicted protein-encoding ORFs are amplified
by PCR precisely between the initiation and termination codons, using a cDNA library as template and
specific

Version 1.1 of the C. elegans ORFeome
We first implemented this strategy at a genome scale to experimentally verify the genome annotation of
C. elegans because of the relative simplicity of its genome (small introns and short intergenic
sequences) and the high quality of its genome sequence20. The C. elegans genome sequence was originally
published with a relatively low error rate of 1 nucleotide per 30 kb20 and is now the only complete
sequence available for a multicellular organism16. At the start of the C. elegans ORFeome project
(August 1999), the following ORF sequences were available: 839 ORFs previously submitted to GenBank by
the worm scientific community (community ORFs), 1,340 ORFs defined experimentally by the transcriptome
project through overlapping ESTs6 (transcriptome ORFs) and 17,298 ORFs predicted by the
[Fig. 1a schematic: ORF predictions in WorfDB; PCR on cDNA library (5′ UTR, ORF, 3′ UTR); recombinational cloning into the ccdB Donor vector; E. coli transformation; plasmid DNA miniprep of Entry clones; ORF identification; ORFeome 1.1 (~12,000 Entry clones).]
Fig. 1 Gateway cloning of the C. elegans ORFeome 1.1. a, Overall scheme for Gateway cloning. ccdB, toxic marker19; yellow, attB1 and attB2; green, attP1 and
attP2; blue, attR1 and attR2; red, attL1 and attL2. b, Electrophoretic analysis of PCR products. The sizes of products from 19,477 PCR reactions that were
attempted were analyzed on 218 ethidium bromide-stained gels6. Each gel picture is available at WorfDB23. c, Experimental verification of predicted ORFs.
WormBase release WS84 contains 20,800 predicted ORFs (of which 1,324 correspond to alternative splice forms). 12,376 have been verified by ESTs (yellow and
green); 11,984 have been verified by OSTs (red and green); 7,619 have been verified by both ESTs and OSTs (green) and 4,365 and 4,757 ORFs have been verified
exclusively by OSTs (red) or ESTs (yellow), respectively.
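
The counts quoted for Fig. 1c can be cross-checked with simple set arithmetic. The sketch below is illustrative only: the variable names are ours, and the only inputs are the three totals given in the caption (ORFs verified by ESTs, by OSTs, and by both).

```python
# Counts quoted in the Fig. 1c caption (WormBase release WS84).
est_verified = 12_376   # ORFs verified by ESTs
ost_verified = 11_984   # ORFs verified by OSTs
both = 7_619            # ORFs verified by both ESTs and OSTs

# ORFs supported by only one kind of evidence.
ost_only = ost_verified - both
est_only = est_verified - both

# Inclusion-exclusion: ORFs with any experimental support.
any_evidence = est_verified + ost_verified - both

print(f"OST only:   {ost_only:,}")
print(f"EST only:   {est_only:,}")
print(f"EST or OST: {any_evidence:,}")
print(f"Gain over ESTs alone: {ost_only / est_verified:.1%}")
```

The OST-only and EST-only values reproduce the 4,365 and 4,757 ORFs stated in the caption, which is a quick way to confirm that the three totals are mutually consistent.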
(that is, between the Start and Stop primers). In addition, it facilitated the throughput needed to
complete the project. We then re-arrayed all 11,984 ORFs that were successfully cloned and sequenced to
generate version 1.1 of the C. elegans ORFeome. This version is available to the research community
through MRC geneservice and Open Biosystems (see URLs).

Genome-wide verification of gene existence
The first outcome of the C. elegans ORFeome 1.1 is to provide experimental evidence for 4,365 predicted
ORFs that had not yet been identified by any EST in the transcriptome project (untouched ORFs). This
increases the number of identified C. elegans transcripts by 35%, raising the total number of C. elegans
genes experimentally confirmed to 16,741 (4,365 untouched ORFs cloned here plus 12,376 ORFs verified by
ESTs (touched ORFs), as indicated in the August 2002 release of WormBase (WS84); Fig. 1c). This finding
might relate to the genome annotation of other species and the issue of number of genes in general.
Indeed, unlike for Drosophila melanogaster and humans, C. elegans gene predictions did not rely
primarily on the existence of ESTs or orthologies. Thus, we propose that a substantial proportion of as
yet unpredicted and untouched genes could also be present in other organisms. Consistent with this idea,
a recently described protein trapping method in D. melanogaster supported the notion that 44% of its
genes have not yet been predicted21. In addition, although the total number of human ESTs (roughly
3 × 10^6) is higher than that of C. elegans ESTs (roughly 3 × 10^5) and

that is, the community and transcriptome ORFs, could be amplified and cloned using our high-throughput
assay methods. The second possible explanation for cloning failures is intron-exon structure
mispredictions (for example, wrongly predicted exons can lead to the design of a primer pair that does
not amplify a product even though the gene is expressed). This possibility is supported by two
observations: (i) 29% of ORFs already touched by ESTs, and thus previously identified experimentally,
did not clone in our assay and (ii) for a large proportion of unsuccessfully cloned ORFs, pairs of
internal primers could amplify a product whereas the Start and Stop primer pair could not6.
Though our assay should not be used to rule out the existence of any particular predicted gene, trends
of overall ORF cloning efficiency (OCE) identified interesting features of C. elegans global chromosomal
organization. First, the OCE of chromosomes V and X was lower than that of the other four chromosomes
(Fig. 2a). Additionally, the OCE, though homogeneous along chromosomes I, III, IV and X, was slightly
biased against one extremity of chromosome II and both extremities of chromosome V (Fig. 2b). Because
these three chromosomal regions are heavily populated with ORFs predicted to encode G protein-coupled
receptors (GPCRs), we compared the OCE of various functional classes (Fig. 2c). Compared to ORFs that
encode potential transcription factors, phosphatases, kinases and others, ORFs predicted to encode GPCRs
were indeed cloned at a substantially lower rate. Together with other observations22, this suggests that
a large proportion of ORFs predicted to encode
Fig. 2 Genome-wide verification of gene existence. a, Global representation of the OCE per chromosome. Roman numerals refer to chromosome numbers and n represents
the number of predicted ORFs for each chromosome. The color code for each ORF class is as indicated for b. b, OCE distribution along the chromosomes for
each of the six chromosomes. The blue bars on the right of each chromosome represent the density of GPCRs. c, OCE across seven predicted functional categories. The
number of predicted ORFs for each Gene Ontology category is as follows: transcription factors (tf), GO:0004930, n = 457; phosphatases (pho), GO:0016302, n = 211;
kinases (kin), GO:0016301, n = 542; G-protein coupled receptors (GPCR), GO:0004930, n = 986; hydrolases (hyd), GO:0016787, n = 1,220; oxidoreductases (oxy),
GO:0016491, n = 295; extracellular ligand-gated ion channels (lgic), GO:0005230, n = 87. The color code for each ORF class is as indicated for b.
the structure of cloned ORFs with that of predicted ORFs. Overall, 3,439 (29%) cloned ORFs had a
structure that differed from that of the GeneFinder predictions (Fig. 3a,b). Notably, the predicted
intron-exon structure of more than 1,500 untouched ORFs could be corrected. Exons were removed or added
for 608 or 479 ORFs, respectively, or were extended or shortened for 1,008 or 1,046 ORFs, respectively.
Introns were removed or added for 684 or 505 ORFs, respectively. Although such modifications did not
change the global level of orthologies between the C. elegans predicted proteome and that of other
organisms, we expect that they will be useful for numerous proteomic approaches, such as protein
identification using peptide-fingerprinting mass-spectrometry techniques.

been made available on WormBase16 and WorfDB23 (Fig. 3b).
The global OCE described above, together with the ORF structure corrections, allowed us to estimate the
overall quality of gene predictions in C. elegans. Of 19,477 predicted genes, we successfully cloned and
sequenced 11,984 (61.5%), among which 8,545 (43%) had an intron-exon structure matching the predictions.
If we assume, as stated above, that most cloning failures occurred because of intron-exon structure
mispredictions, more than 50% of predicted C. elegans genes would need corrections (Fig. 3c). The
completeness and quality of the C. elegans genome sequence are considered relatively high compared with
that of other organisms1. Hence, our work strongly suggests that similar genome-wide verification
projects are urgently needed for these organisms.
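
The estimate of overall gene-prediction quality follows from three counts given in the text. The following is a minimal sketch, not the authors' analysis: the helper `share` and the variable names are ours, and it deliberately treats every cloning failure as a misprediction, which is an upper bound on the "most cloning failures" assumption stated above.

```python
# Counts quoted in the text.
predicted = 19_477       # predicted ORFs for which cloning was attempted
cloned = 11_984          # ORFs successfully cloned and sequenced
perfect_match = 8_545    # cloned ORFs whose structure matched the prediction

corrected = cloned - perfect_match   # cloned ORFs whose structure differed
not_cloned = predicted - cloned      # ORFs for which no OST was obtained


def share(count: int, total: int = predicted) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Upper bound (our simplification): every cloning failure is a misprediction.
needing_correction = corrected + not_cloned

print(f"cloned:             {cloned:>6,} ({share(cloned):.1f}%)")
print(f"perfect match:      {perfect_match:>6,} ({share(perfect_match):.1f}%)")
print(f"corrected:          {corrected:>6,} ({share(corrected):.1f}%)")
print(f"not cloned:         {not_cloned:>6,} ({share(not_cloned):.1f}%)")
print(f"needing correction: {needing_correction:>6,} ({share(needing_correction):.1f}%)")
```

Under this simplification the corrected and uncloned ORFs together exceed half of all predictions, which is the basis for the "more than 50%" statement.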
[Fig. 3a, data recovered from the figure panel]

Event type           Number of events   % of events   Number of ORFs   % of ORFs
exon unaltered                 29,049         83.9%            8,545       71.3%
exon extended                   1,138          3.3%            1,008        8.4%
exon shortened                  1,223          3.5%            1,046        8.7%
additional intron                 608          1.8%              505        4.2%
intron not found                  884          2.5%              684        5.7%
additional exon                   568          1.6%              479          4%
exon not found                  1,167          3.4%              608        5.1%

[Fig. 3c, data recovered from the figure panel] OSTs, perfect match: 44% (8,545); OSTs, corrected: 18% (3,439); no OSTs: 38% (7,493).
Fig. 3 Genome-wide verification of intron-exon structure. a, Differences in the structure of cloned ORFs versus that predicted in WormBase release WS9. The
exons observed by OST can be identical to the predicted exons (exon unaltered) or of different length (exon extended or exon shortened). There can also be
additional introns inserted into predicted exons (additional intron) or missing introns merging two predicted exons into one (intron not found). Finally, OSTs can
identify exons that were not predicted (additional exon) or suggest that predicted exons do not exist (exon not found). The number and percentage of events as
well as the number and percentage of ORFs affected by each event are shown. b, Example of the graphical display available in WorfDB showing the structure of an
ORF (C10E2.5) derived from OSTs compared with the current prediction (WS84). c, Summary of gene prediction quality in C. elegans. d, OST analysis of isolated
Entry clones showing splicing variants. The panel displays 36 sequencing reads corresponding to 11 singly isolated Entry clones (black arrows on the right)
aligned against the genome sequence using Acembly and compared with three predicted splice variants of a single gene (W07B3.2). The blue boxes correspond
to GeneFinder exon predictions, numbered 1-9, whereas the connecting blue lines indicate predicted introns (confirmed by ESTs when highlighted in green). The
yellow bar and the vertical scale (left) represent the C. elegans genome with nucleotide positions starting at 0 on the putative ATG codon. The reads are combined
into observed exons (pink boxes) and introns (connecting pink lines), and the deduced structures of each alternative splice form are represented in green.
The blue lines connecting the green and pink boxes represent introns that do not satisfy the gt-ag or gc-ag rule.
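
The event counts reported for Fig. 3a can be turned back into percentages as a consistency check. This is an illustrative sketch under one assumption: the assignment of each count to an event type is reconstructed from the figure panel and should be confirmed against the published figure. Recomputed values may differ from the printed ones in the last digit because of rounding.

```python
# Event counts as listed for Fig. 3a (event type -> number of events).
# The mapping of counts to event types is reconstructed, not authoritative.
events = {
    "exon unaltered": 29_049,
    "exon extended": 1_138,
    "exon shortened": 1_223,
    "additional intron": 608,
    "intron not found": 884,
    "additional exon": 568,
    "exon not found": 1_167,
}

total_events = sum(events.values())

# Share of each event type among all exon/intron events observed by OST.
percentages = {name: 100.0 * n / total_events for name, n in events.items()}

for name, pct in percentages.items():
    print(f"{name:<18} {events[name]:>7,} {pct:5.1f}%")

# Events in which the observed structure departs from the prediction.
altered = total_events - events["exon unaltered"]
print(f"altered events: {altered:,} of {total_events:,}")
```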
[Fig. 4. Panel a: two-hybrid interaction network of C. elegans proteins; legend: untouched ORF, touched ORF, cDNA library interaction, AD-ORFeome library interaction, cDNA and AD-ORFeome library interaction. Panel b: percentage of total interactions found for touched and untouched ORFs, cDNA library versus AD-ORFeome library. Panel c: protein expression examples (lanes 1-4 with molecular-weight markers), labelled 50% success, 100% success and 41% success.]

mapping. The panel shows two superimposed two-hybrid maps obtained from screening 116 baits against our
worm AD-cDNA library18,25,26 (two-hybrid connections are represented by black lines) and the AD-ORFeome
library (two-hybrid connections are represented by red lines). The complete list of interactions is
available in Supplementary Table 1 online. Overlapping connections are shown in green. Proteins encoded
by touched and untouched ORFs are represented by black and yellow closed circles, respectively. Many of
the novel interactions detected in the AD-ORFeome screens identified potential links that were not
identified by the AD-cDNA library screens. For example, three interactors (C53A5.6, Y71F9AL.10 and
T24D1.3) found with UBC-13 (Y54G2A.31) are predicted to contain a RING finger domain, a structure
frequently found in proteins involved in ubiquitination. These interactors may encode proteins that
function as degradation-target specificity factors (similar to F-box
resulting clones. We then used this AD-ORFeome library to carry out two-hybrid screens against 116 bait
proteins previously used with an AD-cDNA library18,25,26 (Fig. 4a). When compared with AD-cDNA screens,
AD-ORFeome screens reached saturation, as defined by the percentage of interactors identified more than
once in a given screen, with considerably fewer yeast transformants (2 × 10^5 transformants for
AD-ORFeome versus 2 × 10^6 for AD-cDNA). Hence, the throughput of the worm interactome map can now be
increased at least ten-fold. In addition, the AD-ORFeome screens detected relatively more untouched ORFs
than the AD-cDNA screens (26% versus 7%; Fig. 4b). Finally, although the average number of potential
interactors per bait is lower for AD-ORFeome screens (0.9) than for AD-cDNA screens (3.5; this can be
explained by the fact that many two-hybrid interactors can only be detected in the context of partial
domains), it is substantially higher than for previously described yeast AD-ORFeome screens (0.13-0.26;
ref. 24). Thus, in addition to allowing a higher throughput, worm AD-ORFeome screens should lead to a
reasonably comprehensive interactome map for C. elegans. Many of the novel interactions detected in the
AD-ORFeome screens identified potential links that were not identified by the AD-cDNA library screens
(Fig. 4a). Similar ORFeome libraries can also now be generated using additional Destination vectors to
do other proteome-wide genetic or biochemical assays27.

ORFeome 1.1 and proteomics
ORFeome projects are also valuable for large-scale proteomic approaches. For example, comprehensive use
of the protein chip technology has been limited so far to the yeast proteome9, mostly because of a
limited availability of similar ORFeome resources for metazoan organisms. Hence, a crucial aspect of the
C. elegans ORFeome 1.1 is the versatility with which ORFs can now be transferred to many different
Destination vectors to be expressed for various functional, biochemical and structural genomic
analyses18,19. To investigate to what extent the C. elegans ORFeome 1.1 can be used for high-throughput
protein expression in different formats, we used subsets of ORFs found as potential interactors of baits
involved in the DNA-damage response26. Using an automated 96-well plate setting, we transferred these
ORFs from the Entry vector to various Destination vectors and expressed them in Escherichia coli and in
S. cerevisiae as fusion proteins (Fig. 4c). We first expressed 68 proteins in E. coli as N-terminal
fusions to the maltose binding protein (MBP) and examined protein expression by SDS-PAGE. In 41% of the
cases, we observed a band of the appropriate size. We next expressed the same proteins in E. coli as
N-terminal fusions to a hexa-histidine tag (His6) and tested them by mass spectrometry. In 50% of the
cases, we detected proteins of the appropriate size. Finally, we expressed 79 proteins in yeast cells as
N-terminal fusions to tandem glutathione S-transferase (GST) protein and His6 and tested them by
western-blot analysis using an antibody against GST. In 61% of the cases, we observed a band of the
appropriate size. Overall, 83% of 58 ORFs that had been transferred in these three Destination vectors
were expressed in at least one setting.

use in smaller scale one protein at a time approaches.

Discussion
In summary, the C. elegans ORFeome 1.1 suggests that genome annotation tools, such as GeneFinder, can be
accurate in identifying potential genes. Indeed, most C. elegans genes originally predicted without
evidence from cDNA or orthology are expressed and spliced. Nevertheless, a third of the predicted ORFs
could not be cloned in our assay owing to GeneFinder mispredictions at the boundaries of the ORFs.
Future versions of the C. elegans ORFeome could be generated using improved gene models by comparative
analysis with other genome sequences (for example, between the C. elegans (WormBase) and Caenorhabditis
briggsae genome sequences) or by experimental corrections of ORF extremities (for example, by using
splice-leader sequences as anchors for 5′ end PCR reactions). Altogether, this information should allow
the design of new Start and Stop primer pairs for future versions of the C. elegans ORFeome.
In addition, approximately one third of GeneFinder-predicted ORFs that were successfully cloned here
needed correction of intron-exon structure. We propose that many proteomic applications will benefit
from these corrections, particularly in organisms for which the current genome sequence is of poorer
quality. Similar ORFeome projects in those organisms should be complementary to current transcriptome
projects to verify the existence of all genes, identify their intron-exon structure and splicing
variants and point to features of chromosomal organization, assuming that improved genome annotations
become available. Finally, in contrast to current transcriptome projects, the resulting Entry clone
resources should be immediately useful for high-throughput expression and functional characterization of
the proteome of many organisms in many different settings.

Methods
Gateway cloning of the C. elegans ORFeome 1.1. We retrieved known or predicted ORF sequences from
GenBank, the Transcriptome project or Wormpep (WormBase version WS9), identified overlaps and introduced
the resulting 19,477 distinct ORFs into WorfDB23. We designed 19,477 primer pairs using OSP30 and used
them to amplify by PCR each ORF individually from a cDNA library17 as described6,18. We designed Start
primers without the ATG codon (that is, the first 5′ specific nucleotide used in the primers is the T of
ATG) to preclude internal translation initiation events in the production of N-terminal fusion proteins.
We designed Stop primers without the last two bases of the termination codon so that C-terminal fusion
proteins could be produced from the ORFeome resource. To facilitate the PCR product size analysis and to
more conveniently adjust the PCR reaction elongation times, we organized samples in order of increasing
size of the predicted ORFs6. We cloned the resulting PCR products into the Entry vector pDONR201 by
Gateway recombinational cloning technology18,19 and archived them as both bacterial glycerol stocks and
plasmid DNA mini-preps. We used 96-well plates and liquid handling systems (robotics methods and
protocols will be described elsewhere).
During the first phase of the project (including approximately 7,000 ORFs), we carried out the
antibiotic selection step by spotting transformants on solid media 96 ORFs at a time. This facilitated
visual estimation of cloning efficiency. In general, cloning efficiency was inversely proportional to ORF