A cDNA Cassette System for the Synthesis of Recombinant Procollagens. Variants of Procollagen II Lacking a D-Period Are Secreted as Triple-Helical Monomers
WILLIAM V. ARNOLD*,1, ALEKSANDER L. SIERON*,1, ANDRZEJ FERTALA*,1, HANS PETER BÄCHINGER†, DIANE MECHLING† and DARWIN O. PROCKOP*,1
* Department of Biochemistry and Molecular Biology, Jefferson Institute of Molecular Medicine, Jefferson Medical College of Thomas Jefferson University, Philadelphia, Pennsylvania and
† Shriners Hospital for Children and Department of Biochemistry and Molecular Biology, Oregon Health Sciences University, Portland, Oregon, USA.

Abstract
Currently there is a lack of experimental systems for defining the functional domains of the
fibrillar collagens. Here we describe an experimental strategy that employs the polymerase
chain reaction (PCR) to create a series of cDNA cassettes coding for seven separate domains
of procollagen II. The system was used to prepare novel recombinant procollagens II from
which one of the four repetitive D-periods of the triple helix was deleted. Four constructs,
each lacking a different D-period, were expressed in stably transfected mammalian cells
(HT-1080). Truncated procollagens of the predicted size were recovered from the medium.
All were triple-helical as assayed by circular dichroism. Therefore, deletion of a complete
D-period containing 234 amino acids does not destabilize the triple helix of homotrimeric
collagen II as much as some naturally occurring mutations in the heterotrimeric monomer of
collagen I that delete shorter sequences or that convert obligate glycine residues to residues
with bulkier side chains. Moreover, the results suggest that the strategy developed here can
be used to map in detail the binding sites on fibrillar collagens for other components of the
extracellular matrix and for the binding, spreading and signaling of cells.
Key words: cDNA cassettes, recombinant type II procollagen

1 Current address: Center for Gene Therapy, Allegheny University of the Health Sciences, 245 North 15 Street, M.S. 421, Philadelphia, PA 19102
Matrix Biology Vol. 16/1997, pp. 105-116
© 1997 by Gustav Fischer Verlag

Introduction
Collagens are a major family of extracellular proteins that have a variety of functions and that are characterized by having at least one region in which three polypeptide chains are folded into a characteristic triple-helical conformation (Piez, 1984; van der Rest and Garrone, 1991; Prockop and Hulmes, 1994; Prockop and Kivirikko, 1995). Folding of the protein into the triple-helical conformation is driven by repetitive sequences of -Gly-Xxx-Yyy- tripeptide units in which -Xxx- is frequently proline and -Yyy- is frequently hydroxyproline. The most abundant collagens are the fibril-forming collagens known as types I, II and III.
Each of these has a major triple-helical domain of about
1,000 residues, and each is first synthesized as a soluble
procollagen containing additional N-propeptides and
C-propeptides (Fig. 1). After the propeptides are cleaved
by specific N- and C-proteinases, the collagen molecules
spontaneously self-assemble into characteristic fibrils.
Analysis of the distribution of hydrophobic and charged
amino acids in the monomers of collagen I and II indicated that the repeating tripeptide units of 1,014 amino
acids are divided into 4.4 repeats or 4.4 D-periods of
about 234 amino acids each (Hulmes et al., 1973; Chapman, 1984). In fibrils, the monomers are arranged in
a head-to-tail orientation with a gap of about 0.6 D-periods and, therefore, a repeat of 5 D-periods.
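
The D-period arithmetic above can be checked with a short Python sketch. The variable names are illustrative only; the numbers are the ones quoted in the text (1,014 residues, 234 residues per D-period, a gap of about 0.6 D).

```python
# Figures quoted in the text (amino acids); names are illustrative.
D_PERIOD = 234        # residues in one full D-period
TRIPLE_HELIX = 1014   # residues in the repeating -Gly-Xxx-Yyy- region
GAP_D = 0.6           # approximate gap between monomers, in D-periods

# Whole D-periods and the residues left over for the partial period.
full_periods, remainder = divmod(TRIPLE_HELIX, D_PERIOD)
length_in_d = TRIPLE_HELIX / D_PERIOD

print(full_periods, remainder)         # 4 full D-periods plus 78 residues
print(round(length_in_d, 2))           # 4.33, quoted as about 4.4 in the text
print(round(length_in_d + GAP_D, 1))   # 4.9, close to the 5 D-period repeat
```

The 78-residue remainder corresponds to the partial period labelled D0.4 in Figure 1.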
The monomers of the fibrillar collagens were initially
regarded as rigid rods. Subsequent observations, however, proved that short segments or "cooperative blocks" (Privalov, 1982) of the triple helix begin to undergo micro-unfolding at about 30 to 37 °C and well before the molecule fully unfolds at about 41 °C. Evidence for the micro-unfolding model includes the effects of partially denaturing and then renaturing the protein (Kühn et al., 1966; Ryhänen et al., 1983), comparisons of the helix-forming properties of synthetic peptides with repetitive -Gly-Xxx-Yyy- sequences (Sakakibara et al., 1973; Prockop et al., 1976; Engel et al., 1977; Inouye et al., 1982; Dölz and Heidemann, 1986; Roth and Heidemann, 1986; Germann and Heidemann, 1988), measurements of enthalpy changes of thermal denaturation by microcalorimetry (Privalov, 1982) and the effects of temperature on the kinetics of fibril formation (Kadler et al., 1988). The results demonstrated that sequences in which the Xxx positions are occupied by proline, and especially sequences in which the Yyy positions are occupied by hydroxyproline, form the most stable triple-helical regions. Regions lacking the two imino acids are less stable and are therefore likely to unfold at much lower temperatures. However, it is unlikely that the cooperative blocks are completely independent, since some mutations that replace a single obligate glycine in the triple helix with a bulkier amino acid can lower the melting temperature by as much as 20 °C (Westerhausen et al., 1990; Kuivaniemi et al., 1996). Also, the locations and sizes of the cooperative blocks have not been defined.

Figure 1. Schematic drawing of procollagen II. The subdivisions of the protein indicated below the molecule were used in the design of the procollagen II DNA cassette system. Labeled features: signal peptide, N-propeptide, N-telopeptide, triple helix, C-telopeptide, C-propeptide, and the N- and C-proteinase cleavage sites. Subdivisions from the N- to the C-terminus: Nt, 137 aa; D1, 234 aa; D2, 234 aa; D3, 234 aa; D4, 234 aa; D0.4, 78 aa; Ct, 273 aa. Symbols: aa, amino acids.

Here we report an experimental strategy that greatly simplifies mapping the functional domains of fibrillar collagens. We have created a series of cDNA cassettes that code for the individual D-periods and the N- and C-propeptides of procollagen II. The cassettes are designed so that they can be assembled in any desired order into DNA constructs that code for novel variants of the protein. Also, the cassettes can be systematically mutated before assembly. We have assembled four constructs that each lack a specific D-period and have shown that we can express them as truncated variants of procollagen II that are secreted as correctly folded recombinant proteins in a mammalian cell system. The results indicate that deletion of a complete D-period of 234 amino acids has surprisingly little effect on the thermal stability of collagen II.

Materials and Methods


Design of the procollagen II DNA cassette system
cDNA cassettes were synthesized as indicated in Figures 2 and 3, using PCR with a template of a full-length cDNA for procollagen II (Baldwin et al., 1989; courtesy of S.-W. Li). The PCR primers were engineered to introduce blunt-cutting restriction sites B and C (Fig. 2) at the ends of the amplified region. These restriction sites (Figs. 2 and 3) were designed to meet the following criteria: (a) cleavage at B or C maintained the wild-type amino acids encoded by the cDNA; (b) restriction site B did not occur in the wild-type cDNA template used to create the cassettes or in the plasmid vectors; (c) restriction site C in a given DNA cassette was unique for the cassette in which it was used (site A to site C in Plasmid I of Fig. 3) but could occur anywhere in the DNA sequence of other DNA cassettes; and (d) restriction site A was a sticky-cutting site from the multiple cloning region of the plasmid containing the DNA cassette and was not found in any of the sequences used in the cassettes or constructs.
Different DNA cassettes were assembled to create larger DNA constructs as indicated in Figure 3. Plasmid I was defined as a p-cassette because it provided the DNA cassette that was inserted at the 5' end of the cassette in Plasmid II. Plasmid II was defined as an r-cassette because it received a DNA insert from a p-cassette.

Figure 2. Creation of a DNA cassette. PCR primers containing engineered blunt-cutting restriction sites B and C were used to PCR-subclone a region of a cDNA template. The restriction site A was a part of the multiple cloning region of the plasmid vector. The A nucleotides shown at the 3'-ends of the PCR product were added by Taq polymerase and provided the nucleotide overhangs used in the TA cloning system (Invitrogen).

Figure 3. DNA construct assembly using the DNA cassette system. The assembly of a construct containing two cassette inserts (Region 1 and Region 2) is demonstrated. Plasmid I is cleaved at sites A and C, and Plasmid II is cleaved at sites A and B. The A, B and C restriction sites are the same in the three plasmids shown. The nucleotide sequences of some sites are capitalized so that they may be differentiated from the same sites in different plasmids.
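
The p-cassette/r-cassette step of Figure 3 can be sketched in Python as a string operation. This is a minimal illustration under stated assumptions, not the authors' procedure: only the sense strand is modelled, the flanking vector bases and the two toy inserts are invented, and the helper names (`cut_position`, `make_plasmid`, `assemble`) are hypothetical. The enzymes assigned to sites A, B and C (SpeI, SrfI, BsrBI) follow the choices reported in the Results.

```python
from typing import Tuple

# Recognition sequence and cut offset on the sense strand.
SITE_A = ("ACTAGT", 1)    # SpeI, A/CTAGT (sticky; only the top strand is modelled here)
SITE_B = ("GCCCGGGC", 4)  # SrfI, GCCC/GGGC (blunt)
SITE_C = ("CCGCTC", 3)    # BsrBI read on the sense strand, CCG/CTC (blunt)


def cut_position(seq: str, site: Tuple[str, int]) -> int:
    """Return the cut index of a site that must occur exactly once in seq."""
    pattern, offset = site
    if seq.count(pattern) != 1:
        raise ValueError(f"{pattern} must occur exactly once")
    return seq.index(pattern) + offset


def make_plasmid(insert: str) -> str:
    """Hypothetical linearised plasmid: flank, site A, site B, insert, site C, flank."""
    return "TTTT" + "ACTAGT" + "GCCC" + insert + "CTC" + "AAAA"


def assemble(p_plasmid: str, r_plasmid: str) -> str:
    """Move the A-to-C fragment of the p-cassette into the r-cassette opened at A and B."""
    fragment = p_plasmid[cut_position(p_plasmid, SITE_A):cut_position(p_plasmid, SITE_C)]
    left_arm = r_plasmid[:cut_position(r_plasmid, SITE_A)]
    right_arm = r_plasmid[cut_position(r_plasmid, SITE_B):]
    return left_arm + fragment + right_arm


# Toy in-frame inserts (not COL2A1 sequence): each starts with GGGC and ends with CCG.
region_1 = "GGGCCAATGGGACCG"   # Gly-Pro-Met-Gly-Pro
region_2 = "GGGCCTGAAGGTCCG"   # Gly-Pro-Glu-Gly-Pro

product = assemble(make_plasmid(region_1), make_plasmid(region_2))
print(product == make_plasmid(region_1 + region_2))  # True
```

In this toy model the product carries single A, B and C sites again, because the blunt junction (CCG joined to GGGC) recreates neither site, so the product can itself serve as a p- or r-cassette in a later step.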

Synthesis of DNA cassettes
Seven DNA cassettes were created to code for seven separate regions of the proα1(II) chain of type II procollagen (Fig. 1). Because the 5' end of the COL2A1 cDNA was difficult to isolate (Baldwin et al., 1989), nucleotides 23 to 126 (nucleotide 23 is the first of the start codon) in the construct were from exon 1 of human COL1A1 cDNA (Tromp et al., 1988). The remainder of the cDNA, nucleotides 127 to 4737, was derived from exons 3 to 52 of the human COL2A1 gene (Baldwin et al., 1989; Ala-Kokko and Prockop, 1990; Fertala et al., 1994). Therefore, the construct included the signal peptide and signal peptide cleavage site from the COL1A1 gene. However, it did not include either exon 2 from the COL1A1 or exon 2A or 2B from the COL2A1 gene, the exons that are alternatively spliced in the COL2A1 gene (Prockop and Kivirikko, 1995). The PCRs were carried out with a commercial kit (GeneAmp PCR Reagent Kit; Perkin Elmer) with 0.2 ng/µl cDNA for proα1(II) chains as template, primers (Express Genetics, Princeton, NJ) in a concentration of 0.4 pmol/µl (Table I), and one-ninth volume of 5 mM MgCl2. The conditions were: 94.5 °C for 1 min followed by cycles of 94 °C for 30 sec, 55 °C for 30 sec and 72 °C for 60 sec. After 30 cycles, the samples were annealed at 72 °C for 420 sec and then stored at 4 °C. The PCR products were ligated into a TA cloning vector (pCR II; Invitrogen) according to the manufacturer's instructions (Fig. 4a). The structure of each of the cassettes in the clones was verified by nucleotide sequencing (Nucleic Acid Facility, Kimmel Cancer Institute, Thomas Jefferson University, Philadelphia, PA). The cassettes were cleaved from the cloned plasmids and then transferred to the vector pcDNA II (Invitrogen) by digestion with HindIII and SphI (Fig. 4b), followed by ligation with T4 DNA ligase (Life Technologies, Inc.). To ensure expression, a 2-kb PpuMI/PvuII fragment of the COL2A1 gene encoding the 3' untranslated region (UTR) (Baldwin et al., 1989; courtesy of L. Ala-Kokko) was inserted into the compatible PpuMI/EcoRV site at the 3' end of the DNA cassette encoding the proα1(II) C-propeptide. This 3' UTR region was subsequently cloned at the 3' end of the Ct cassette.

Table I. PCR primers used for subcloning the proα1(II) cDNA in the creation of the proα1(II) DNA cassette system.

Primer name (a)   Primer sequence (b)                         Engineered restriction site (c)
Nt+               GTCTACATGTCTAGGGTCTAGACATGTTCAG             NONE
Nt-               CAG/CTGCATTACTCCCAACTGGGCGCCACCA            PvuII (C)
D1+               GCCC/GGGCCAATGGGCCCCATGGGACCTCG             SrfI (B)
D1-               GAG/CGGGAAGCCAGGAGCACCAGCAATGCC             BsrBI (C)
D2+               GCCC/GGGCCACGGGGTCCTCCTGGCCCTCA             SrfI (B)
D2-               CAG/CGGAAGTCCCTGGAACCCAGATGGCCC             BsrBI (C)
D3+               GCCC/GGGGCTCCTGGTCCCCCAGGTGAAGGT            SrfI (B)
D3-               CG/CGCAGCTCCAGGGAATCCAGTGGCTCCCG            BstUI (C)
D4+               GCCC/GGGCGTGTTGGACCCCCAGGCTCCAATG (d)       SrfI (B)
D4-               CG/CGTGAAGCCACGGTGTCCCTTCAGGCCTCT           BstUI (C)
D0.4+             GCCC/GGGCTGCAGGGTCTGCCCGGCCCTCCTGGTC        SrfI (B)
D0.4-             GAG/CGGGGGACCTGGAGGACCAGGGGGCCCAGGAT        BsrBI (C)
Ct+               GCCC/GGGCCTGGCATCGACATGTCCGCCT              SrfI (B)
Ct-               TCTGGCCTGGGCTGGGGGCAGTCACTCAG               NONE

a The primer name denotes the region of the proα1(II) cDNA that this primer was used to subclone by PCR. A "+" or "-" in the primer name denotes whether the primer primes the polymerization of the sense or the antisense strand, respectively.
b All primers are written from the 5' to 3' end. Boldface nucleotides indicate where changes were made from the original proα1(II) cDNA. These changes were engineered to introduce the indicated restriction sites, which are underlined. The slash in each primer sequence indicates where the corresponding restriction enzyme cleaves the engineered restriction site.
c The (B) or (C) following the name of the engineered restriction site indicates whether that site is used as a B or a C restriction site in the corresponding DNA cassette.
d The additional change (C to T) introduced here was done to destroy a native BstUI site which occurs in this position in the original proα1(II) cDNA. Without this change the BstUI site engineered at the 3' end of the D4 cassette would not be unique for this coding region. Note that this C to T change does not alter the amino acid encoded at this position.

Figure 4. Plasmids used in the creation of the procollagen II DNA cassette system. The plasmid vectors were used in (a) PCR-subcloning, (b) assembly of DNA cassettes into constructs, and (c) construct for mammalian cell transfection. Panels: (a) pCR II (3.9 kb); (b) pcDNA II (3.0 kb); (c) pcDNA 3 (5.4 kb).
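
As a small worked example, the total hold time of the thermal-cycling program given in this section can be computed as follows. This Python sketch uses only the times stated in the text; the function name is illustrative, and ramping time between temperatures is ignored.

```python
# Thermal-cycling program as described in the text (times in seconds).
INITIAL_DENATURATION = 60      # 94.5 °C for 1 min
CYCLE_STEPS = (30, 30, 60)     # 94 °C, 55 °C and 72 °C within each cycle
N_CYCLES = 30
FINAL_INCUBATION = 420         # 72 °C after the last cycle


def program_seconds(initial, steps, n_cycles, final):
    """Total hold time of the program, ignoring ramping between temperatures."""
    return initial + n_cycles * sum(steps) + final


total = program_seconds(INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_INCUBATION)
print(total, total / 60)   # 4080 s, i.e. 68 min of hold time
```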

cDNA constructs
Five DNA constructs (Table II) were assembled from the cassettes in the order indicated in Table III, using the bacterial strain DH5α (Life Technologies, Inc.) as host. In each assembly step, the fidelity of the DNA sequence at the junctions was verified by DNA sequencing. Functional constructs were removed intact by digestion with HindIII and BsrBI, and cloned into the corresponding HindIII and EcoRV sites of the mammalian expression vector pcDNA3 (Invitrogen) containing the CMV promoter and a neomycin resistance gene (Fig. 4c).

Cell transfections
HT-1080 cells (American Type Culture Collection
CCL 121) were cultured in DMEM supplemented with
10% (v/v) fetal calf serum. The cells were transfected
(Fertala et al., 1994) with one of the DNA constructs by
calcium phosphate precipitation using a commercial kit
(Profection Mammalian Transfection System Kit;
Promega), according to the manufacturer's instructions.
Briefly, cells were split 18 h prior to transfection, grown to a density of approximately 10^6 cells in a 10 mm cell culture dish, and provided with fresh culture medium 3 h prior to transfection. Approximately 12 µg of DNA linearized with PvuI was precipitated with calcium phosphate and incubated with the cells. After about 17 h, fresh culture medium was applied. The cells were split 1 to 10, and 24 h later cultured in the culture medium containing G418 (Life Technologies, Inc.) at a concentration of 0.4 mg/L. The medium was exchanged with fresh selection medium every 48 h over a 12-day period.

Screening of transfected clones
Isolated G418-resistant cell colonies were expanded in 12-well plates. Upon reaching confluence, cells were cultured for 24 h with 1 ml of serum-free DMEM supplemented with 41 µg/ml L-ascorbic acid phosphate magnesium salt n-hydrate (WAKO Pure Chemical Industries, Ltd) and 0.5 µCi/ml of uniformly 14C-labeled amino acid mixture (NEN DuPont). Media were collected and proteins precipitated with 8000 MW polyethylene glycol (Sigma) at a final concentration of 5%. Precipitated proteins were dissolved in storage buffer (0.4 M NaCl, 25 mM EDTA, 0.04% NaN3 in 0.1 M Tris-HCl, pH 7.4) and separated by SDS-PAGE under reducing conditions, followed by electroblotting and Western analysis using guinea pig anti-human antibodies specific to the C-telopeptide region of procollagen II (Ala-Kokko et al., 1991; kindly provided by C. Merryman, Dept. Biochem. and Molec. Biol., Thomas Jefferson University, Philadelphia, PA) and secondary anti-guinea pig IgG antibodies conjugated with alkaline phosphatase (Sigma).

Production of recombinant procollagens
Selected clones were grown to confluence in DMEM supplemented with 10% (v/v) fetal calf serum in ten 175-cm2 culture flasks. The clones were expanded in four interconnecting tissue-culture flasks (Cell Factories, Nunc) that were equivalent to approximately one hundred thirty-seven 175-cm2 flasks of cultured cells. When the culture reached approximately 80% confluence, the medium was removed and the cell layer was washed briefly with PBS. The cells were then incubated with serum-free labeling medium containing 0.17 µCi/ml of a uniformly 14C-labeled amino acid mixture and 41 µg/ml of L-ascorbic acid phosphate magnesium salt n-hydrate. After 24 h, the medium was collected and replaced with fresh labeling medium. After another 24 h, this medium was replaced with medium that did not contain radioactivity. Following collection of the third 24-h medium, the cell layer was washed twice briefly with PBS containing 1 mM EDTA. The cells were then treated with the same cycle of three consecutive 24-h incubations of medium.

Purification of recombinant procollagens
The method of Fertala et al. (1994) was used with minor modifications. For each cell line, approximately 4 L of media harvested from each 24-h period was filtered through a 1.6 µm glass-fiber filter (Millipore) and supplemented with the following reagents to the indicated concentrations: 0.1 M Tris-HCl buffer, 0.4 M NaCl, 25 mM EDTA, 10 mM NEM, 1 mM PAB, and 0.04% NaN3 adjusted to pH 7.4. High molecular mass proteins in the medium were concentrated approximately 10-fold at 4 °C by the use of cartridges with a 100-kDa molecular mass cut-off (Prep/Scale-TFF filter; Millipore). Proteins in the concentrated medium were precipitated overnight at 4 °C with 200 mg/ml ammonium sulfate and collected by centrifugation at 15,000 × g for 1 h at 4 °C. Pellets from each 24-h collection were pooled, solubilized in storage buffer overnight at 4 °C, and dialyzed twice against 200 volumes of DEAE-cellulose column I buffer (2 M urea, 0.2 M NaCl, 5 mM EDTA, and 0.04% NaN3 in 0.1 M Tris-HCl, pH 7.4). The sample was clarified by centrifugation and chromatographed at 4 °C on a DEAE-cellulose column (2.6 cm × 15 cm) equilibrated and eluted with the DEAE-cellulose column I buffer. The flow-through fraction was collected and dialyzed at 4 °C against 200 volumes of DEAE-cellulose column II buffer (2 M urea, 2 mM EDTA, and 0.04% NaN3 in Tris-HCl buffer, pH 7.8). The sample was chromatographed at 4 °C on a second DEAE-cellulose column (2.6 cm × 15 cm) equilibrated and eluted with the DEAE-cellulose column II buffer. The flow-through fraction was again collected and directly applied in the same buffer to a column (1.6 cm × 5 cm) of Q-Sepharose (Pharmacia) at 4 °C. The column was washed with DEAE-cellulose column II buffer, and recombinant procollagen II was eluted with 0.4 M NaCl in the same buffer. The eluted protein was dialyzed at 4 °C against 200 volumes of storage buffer containing 5 mM EDTA and stored at -80 °C. For further analysis by circular dichroism, the proteins were concentrated on a membrane filter (YM-100; Amicon), and the buffer was exchanged to EDTA-free storage buffer. The amino acid composition and protein concentrations of the purified procollagens were assayed for us by the Wistar Protein Microchemistry Core Facility, Philadelphia, PA.


CD spectroscopy of recombinant procollagens
Circular dichroism (CD) spectroscopy was carried out
with a JASCO J-500A spectropolarimeter using water bath-thermostated quartz cells with a path length of
0.05 cm (Hellma). The temperature of the sample was
monitored by a thermistor and a digital thermometer
(Omega Engineering, Inc.). The temperature of the circulating water bath (Lauda RCS20D) was controlled by
a temperature programmer (Lauda PM350). The CD
spectrum of the sample was scanned from 180 nm to
260 nm.

Results
General features of the cassette system
In designing cassettes for the proα1(II) chain, the SrfI (5'-GCCC/GGGC-3') site was used as the B site (Fig. 2) for all the cassettes because SrfI sites are rare in plasmid vectors, do not occur in proα1(II) cDNA, and cleavage of the site leaves a complete codon for glycine (GGN) at the 5' end of the cassette. Accordingly, a SrfI site can be used to define the 5' end of a D-period or build any cassette beginning with a complete glycine codon found anywhere within a D-period of the proα1(II) chain. To accommodate the variability in the -Yyy- positions that end each D-period, the restriction sequences introduced into the C site were BsrBI (GAG/CGG) for proline (CCN in the 3' to 5' orientation), BstUI (CG/CG) for alanine (GCN), and Eco47III (AGC/GCT) for serine (AGN).
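
To illustrate why these blunt half-sites keep the reading frame, the following Python sketch translates the two codons that flank a C-to-B ligation. It is an illustration under assumptions: the helper names are hypothetical, only the four codons needed here are included, and for BstUI the base preceding the retained CG is assumed to be G (giving the alanine codon GCG).

```python
# Subset of the standard genetic code needed for this example.
CODONS = {"GGG": "Gly", "CCG": "Pro", "GCG": "Ala", "AGC": "Ser"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sense_halves(site: str, written_on_antisense: bool = False):
    """Split 'XXX/YYY' at the cut; return (upstream, downstream) on the sense strand."""
    left, right = site.split("/")
    if written_on_antisense:
        left, right = revcomp(right), revcomp(left)
    return left, right


def junction(upstream_codon: str, b_site: str = "GCCC/GGGC"):
    """Translate the two codons that flank a blunt C-to-B ligation."""
    retained = sense_halves(b_site)[1]   # GGGC stays on the downstream cassette
    joined = upstream_codon + retained
    return [CODONS[joined[i:i + 3]] for i in (0, 3)]


bsrbi_up = sense_halves("GAG/CGG", written_on_antisense=True)[0]  # CCG
eco47iii_up = sense_halves("AGC/GCT")[0]                           # AGC
bstui_up = "G" + sense_halves("CG/CG")[0]                          # assumed G completes GCG

print(junction(bsrbi_up))      # ['Pro', 'Gly']
print(junction(eco47iii_up))   # ['Ser', 'Gly']
print(junction(bstui_up))      # ['Ala', 'Gly']
```

In each case the upstream cassette ends on a complete -Yyy- codon and the downstream cassette begins on a complete glycine codon, so the -Gly-Xxx-Yyy- frame continues across the junction.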
The Nt and Ct regions containing the propeptides and
the telopeptides cap the ends of the DNA constructs,
and therefore restriction sites were engineered into one
end only. The SpeI site was used as the A site (Fig. 2).
As designed, the proα1(II) DNA cassette system can be used to assemble DNA constructs encoding novel procollagens II in which individual D-periods of the triple helix can be deleted, rearranged or duplicated, since any D-period cassette from Plasmid I can be added in any desired order to the 5' end of a Ct cassette in Plasmid II (Fig. 3). To reduce the number of cloning steps, combined cassettes of more than one D-period were used as inserts (Table III). The final DNA construct was capped by the addition of the Nt cassette to the 5' end of an r-cassette.

Table II. Constructs assembled using the DNA cassette system.

Name    Description    Composition
FL      full-length    Nt-D1-D2-D3-D4-D0.4-Ct
-D1     missing D1     Nt-D2-D3-D4-D0.4-Ct
-D2     missing D2     Nt-D1-D3-D4-D0.4-Ct
-D3     missing D3     Nt-D1-D2-D4-D0.4-Ct
-D4     missing D4     Nt-D1-D2-D3-D0.4-Ct

Table III. Assembly of functional proα1(II) DNA constructs using the DNA cassette system.

Functional construct: Nt-D1-D2-D3-D4-D0.4-Ct (full length)
Ligation reactions of cloning steps (a):
(1) D1(p) + D2(r) = D1-D2
(1) D3(p) + D4(r) = D3-D4
(1) D0.4(p) + Ct(r) = D0.4-Ct
(2) Nt(p) + D1-D2(r) = Nt-D1-D2
(2) D3-D4(p) + D0.4-Ct(r) = D3-D4-D0.4-Ct
(3) Nt-D1-D2(p) + D3-D4-D0.4-Ct(r) = Nt-D1-D2-D3-D4-D0.4-Ct

Functional construct: Nt-D2-D3-D4-D0.4-Ct (missing D1)
Ligation reactions of cloning steps (a):
(1) Nt(p) + D2(r) = Nt-D2
(1) D3(p) + D4(r) = D3-D4
(1) D0.4(p) + Ct(r) = D0.4-Ct
(2) D3-D4(p) + D0.4-Ct(r) = D3-D4-D0.4-Ct
(3) Nt-D2(p) + D3-D4-D0.4-Ct(r) = Nt-D2-D3-D4-D0.4-Ct
(4) Nt-D2-D3-D4-D0.4-Ct(p) + Ct(r) = *Nt-D2-D3-D4-D0.4-Ct

Functional construct: Nt-D1-D3-D4-D0.4-Ct (missing D2)
Ligation reactions of cloning steps (a):
(1) Nt(p) + D1(r) = Nt-D1
(1) D3(p) + D4(r) = D3-D4
(1) D0.4(p) + Ct(r) = D0.4-Ct
(2) D3-D4(p) + D0.4-Ct(r) = D3-D4-D0.4-Ct
(3) Nt-D1(p) + D3-D4-D0.4-Ct(r) = Nt-D1-D3-D4-D0.4-Ct

Functional construct: Nt-D1-D2-D4-D0.4-Ct (missing D3)
Ligation reactions of cloning steps (a):
(1) D1(p) + D2(r) = D1-D2
(1) D4(p) + D0.4(r) = D4-D0.4
(2) Nt(p) + D1-D2(r) = Nt-D1-D2
(2) D4-D0.4(p) + Ct(r) = D4-D0.4-Ct
(3) Nt-D1-D2(p) + D4-D0.4-Ct(r) = Nt-D1-D2-D4-D0.4-Ct

Functional construct: Nt-D1-D2-D3-D0.4-Ct (missing D4)
Ligation reactions of cloning steps (a):
(1) D1(p) + D2(r) = D1-D2
(1) D3(p) + D0.4(r) = D3-D0.4
(2) Nt(p) + D1-D2(r) = Nt-D1-D2
(2) D3-D0.4(p) + Ct(r) = D3-D0.4-Ct
(3) Nt-D1-D2(p) + D3-D0.4-Ct(r) = Nt-D1-D2-D3-D0.4-Ct

a The (p) or (r) following each cassette in the ligation reactions indicates whether that cassette was used as a p- or r-cassette, respectively, in the given cloning procedure. The proα1(II) insert in each p-cassette, with one exception noted in b, was released intact by digestion with SpeI and BsrBI (in the case of cassette inserts ending with D1, D2 or D0.4) or SpeI and BstUI (in the case of cassette inserts ending with D3 or D4). The r-cassette was opened, with the exception described in b, by digestion with SpeI and SrfI. The number in parentheses preceding each ligation reaction describes the temporal order in which these reactions were performed. Note that many reactions were performed concomitantly. The constructs appearing as a result of final ligation reactions were the functional constructs which were transferred to a mammalian expression vector and used in subsequent transfection procedures.
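
The bookkeeping of the cloning steps in Table III can be sketched in Python by treating each cassette as a tuple of domain names. This is an illustration only: the `ligate` helper and its two checks are hypothetical, the checks reflect the statement that Nt and Ct carry an engineered site at one end only, and the exceptional step (4) listed for the construct missing D1 is not modelled.

```python
from typing import Tuple

Cassette = Tuple[str, ...]


def ligate(p: Cassette, r: Cassette) -> Cassette:
    """Insert the p-cassette at the 5' end of the r-cassette."""
    if p[-1] == "Ct":
        raise ValueError("Ct caps the 3' end and cannot serve as a p-cassette")
    if r[0] == "Nt":
        raise ValueError("Nt caps the 5' end and cannot serve as an r-cassette")
    return p + r


Nt, D1, D2, D3, D4, D04, Ct = (
    ("Nt",), ("D1",), ("D2",), ("D3",), ("D4",), ("D0.4",), ("Ct",)
)

# Construct missing D2, following the order of steps in Table III.
nt_d1 = ligate(Nt, D1)              # step 1
d3_d4 = ligate(D3, D4)              # step 1
d04_ct = ligate(D04, Ct)            # step 1
three_prime = ligate(d3_d4, d04_ct)  # step 2
minus_d2 = ligate(nt_d1, three_prime)  # step 3

print("-".join(minus_d2))   # Nt-D1-D3-D4-D0.4-Ct
```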

Stable transfection of HT-1080 cells with functional procollagen II constructs
Five constructs were assembled from the cassettes (Tables II and III), and the sequences spanning the junctions were analyzed by dideoxynucleotide sequencing. No mutations were found in the junctions (not shown). The five DNA constructs were each placed under the CMV promoter in a mammalian expression vector containing a gene encoding neomycin resistance (Fig. 4c) and used for the transfection of HT-1080 cells. For each construct, approximately one hundred G418-resistant clones were screened by Western blotting for the secretion of recombinant procollagen (Fig. 5). The percentage of positive clones obtained per transfection with each of the five DNA constructs ranged from 10 to 25%. The best producing clones were selected from each set based on the amount of recombinant protein secretion relative to endogenous type IV collagen secretion as determined by SDS-PAGE followed by visualization of the radiolabeled proteins with a phosphor storage imager (Fertala et al., 1994). The ratio of secreted recombinant protein to type IV was essentially the same in high producing clones transfected with constructs lacking a D-period and the full-length construct. Therefore, there was no indication of intracellular retention of the proteins.

Figure 5. Screening of G418-resistant clones by Western blotting. Panel A: Western blot demonstrating a positive signal in one of four clones transfected with a recombinant procollagen II missing the D1-period. The band below the recombinant protein is a partially processed proα1(II) chain lacking a D1-period. Panel B: A corresponding phosphor storage image of the Western blot demonstrating 14C-radiolabeled proteins. Symbols: FN, fibronectin; Type IV, collagen IV; "+", lane containing a positive clone.

Purification and analysis of recombinant procollagens II
Clones expressing the recombinant proteins were chosen for large scale procollagen production, and the procollagens were purified chromatographically. Each recombinant procollagen was concentrated; SDS-PAGE of the five recombinant procollagens II demonstrated they were homogeneous (Fig. 6). As expected, the procollagens with a deleted D-period (234 amino acids) migrated more rapidly than the wild-type protein. The initial yields of the recombinant procollagens from 4 l of culture medium were about 0.5 to 1.0 mg. However, large and variable losses were encountered in the final concentration step required for CD analysis, and the final yields were 39 to 384 µg.

Figure 6. Purified recombinant procollagens II. SDS-PAGE of recombinant procollagens II missing D1, D2, D3 and D4 periods. Electrophoresis was performed under reducing conditions using 7.5% polyacrylamide gel. Symbols: -D1, -D2, etc., proα1(II) chains in which specific D-periods were deleted; I pro, standard of wild-type procollagen I; FL, recombinant full-length procollagen II (Fertala et al., 1994).

Each procollagen was analyzed for its amino acid
composition (not shown). The observed values agreed
well with the calculated values for variants of procollagen lacking specific D-periods, with three exceptions:
the values for glutamate plus glutamine (Glx) were consistently lower than calculated (70 to 76 observed vs 96
to 104 calculated), the values for histidine were consistently higher (11 to 31 vs 7 and 8 calculated), and the
values for tyrosine were also consistently higher than
calculated (19 to 40 vs 7 and 8 calculated). The reasons
for the three exceptions were not apparent. Essentially
the same values were obtained with duplicate samples
and with all four different recombinant procollagens.
Therefore, the exceptional values were unlikely to be explained by contaminants. Higher values for histidine (8
vs 2 calculated) and tyrosine (9 vs 2 calculated) were
previously observed for full-length type II procollagen
from the same recombinant system (Fertala et al., 1994).
Conformation of the recombinant procollagens
To assess the conformation of the recombinant proteins, the recombinant procollagens were examined by
CD. The spectra between 205 and 260 nm were similar
to the characteristic spectrum of triple-helical type II

114

W.V. Arnold et al.

procollagen (not shown). The magnitude of CD signal at


the maximum around 221 nm of the recombinant proteins was slightly lower than the maximum of the
wild-type procollagen, an observation consistent with
the lower amount of triple helical residues present in the
recombinant proteins. For reasons that were not apparent, the shape of the spectrum was slightly different with
the - D 3 procollagen, and the magnitude of the maximum
was that of the wild- type procollagen. In further experiments, all the recombinant proteins were shown to undergo sharp thermal transitions similar to those of native type II collagen (Arnold et al., in preparation).

Discussion
The DNA cassette system described here together with the previously described expression system for recombinant procollagens (Fertala et al., 1994) can be used to prepare novel fibrillar procollagens. The fidelity of the overall system in preparing these novel proteins was verified by partial DNA sequencing, by amino acid analysis of the proteins, and by conformational analysis by CD. The D-period deleted procollagens II were folded into the triple-helical structure as shown by the CD spectrum. The recombinant molecules are therefore useful to determine the contribution of different regions to thermal stability (Engel, 1987; Dölz and Heidemann, 1988; Kadler et al., 1988; Morris et al., 1990; Westerhausen et al., 1990; Bächinger and Davis, 1991; Bächinger et al., 1993; Fertala et al., 1993), folding, fibril formation (Prockop and Hulmes, 1994) and specific interactions with other molecules.
The flexibility of the DNA cassette system allows for the creation of a number of informative procollagen II DNA constructs. It should be noted that the D-period division of the triple-helical region of proα1(II) was a somewhat arbitrary division of the functional domains of the protein (Hulmes et al., 1973; Chapman, 1984). Smaller or larger cassettes can readily be synthesized. The nucleotide sequence coding for the human proα1(II) triple-helical region contains 151 separate locations where a SrfI site may be engineered as the 5' end of a cassette. The 3' end of a cassette can be terminated in any codon for a proline, alanine or serine. Similar cassettes can also be made for the propeptides. Furthermore, complex constructs can now be assembled from the intermediate cassette constructs already available (Table III). Also, the DNA cassette system can easily be adapted to other procollagen cDNAs and even to cDNAs for other large proteins. The crux of the strategy is to introduce appropriate blunt-cutting restriction sites which are compatible with the native nucleotide sequence encoding the protein of interest. This is generally not difficult, given the availability of a large number of blunt-cutting restriction enzymes and the need to match only one-half of the restriction site to the native protein nucleotide sequence.
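
The idea of matching only one half of a restriction site to the native sequence can be sketched as a scan for candidate 5' cassette boundaries. The Python example below rests on an assumption that is not spelled out in the text: a position is counted as a candidate when an in-frame glycine codon (GGN) is followed by a codon beginning with C, so that the retained SrfI half-site GGGC could be introduced without changing the encoded amino acids. The function name and the toy sequence are hypothetical, and the sketch does not attempt to reproduce the count of 151 reported for the proα1(II) sequence.

```python
def srfi_candidates(cds: str):
    """Return 0-based codon indices where GGGC could be introduced silently."""
    usable = len(cds) - len(cds) % 3          # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [
        i for i in range(len(codons) - 1)
        if codons[i].startswith("GG") and codons[i + 1].startswith("C")
    ]


# Toy coding sequence (not COL2A1): Gly-Pro-Met-Gly-Pro-Gly-Glu-Ala.
toy = "GGTCCTATGGGACCAGGCGAAGCT"
print(srfi_candidates(toy))   # [0, 3]
```

The glycine at codon index 5 is skipped because the following codon (GAA) does not begin with C.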
One of the unexpected findings here was that four
modified procollagen II monomers lacking specific
D-periods were each secreted by the mammalian expression system as triple-helical monomers at 37 °C. Previous reports demonstrated that some natural mutations
replacing a codon for an obligate glycine in procollagen
I lower the melting temperature below 37 °C and cause
intracellular retention and degradation of the protein
(Westerhausen et al., 1990; Prockop and Kivirikko,
1995; Kuivaniemi et al., 1996). Also, an in-frame deletion of three exons and 84 amino acids in the proα1(I)
chain of type I procollagen had a similar effect (Kuivaniemi et al., 1996). The observation here that deletion
of a complete D-period of 234 amino acids did not prevent monomers of procollagen II from folding into a native conformation supports a large body of information
suggesting that each D-period contains some amino acid
sequences that stabilize and some that do not stabilize
the triple helix (Engel, 1987; Dölz and Heidemann,
1988; Kadler et al., 1988; Morris et al., 1990; Westerhausen et al., 1990; Bächinger and Davis, 1991;
Bächinger et al., 1993; Fertala et al., 1993).
Since the modified procollagen II monomers were secreted in a triple-helical conformation, the results suggest that the strategy of D-period cassettes can be used
to map specific binding sites on fibrillar collagens. Observations on the patterns of monomer assembly in vitro
and on the patterns of covalent cross-links found within
mature fibrils suggested (Chapman, 1984; Piez, 1984;
Silver et al., 1992) but did not prove (Parkinson et al.,
1994) that the monomers assemble in fibrils because of
specific binding interactions that occur between sites in
the triple-helical domains and sites in the short telopeptides with non-helical sequences found at each end of the
monomers. Also, a series of observations suggested that
as the monomers assemble into fibrils and after the fibrils are formed, the collagens undergo a series of specific
binding interactions with other macromolecules of the
extracellular matrix (Hedbom and Heinegård, 1993;
San Antonio et al., 1994). In addition, it has been
demonstrated that collagen fibrils serve as substrates for
cell binding, cell spreading and cell signaling through integrins and other receptors (Staatz et al., 1991; Gullberg
et al., 1992; Weston et al., 1994). However, definitive
data on specific binding sites are not available, primarily
because site-directed mutagenesis of the whole
monomers is technically difficult, and most short peptides do not fold into the triple-helical conformation
that is probably essential for specific binding interactions. The strategy of D-period cassettes should make it
possible to overcome these problems.

Acknowledgements
This work was supported in part by National Institutes of
Health Grant AR-39740, a grant from the Lucille P. Markey
Charitable Trust, and a grant from the Shriners Hospital.

References
Ala-Kokko, L. and Prockop, D.J.: Completion of the intron-exon structure of the gene for human type II procollagen
(COL2A1). Variations in the nucleotide sequences of the alleles from three chromosomes. Genomics 8: 454-460, 1990.
Ala-Kokko, L., Hyland, J., Smith, C., Kivirikko, K.I., Jimenez,
S.A. and Prockop, D.J.: Expression of a human cartilage procollagen gene (COL2A1) in mouse 3T3 cells. J. Biol. Chem.
266: 14175-14178, 1991.
Bächinger, H.P. and Davis, J.M.: Sequence specific thermal stability of the collagen triple helix. Int. J. Biol. Macromol. 13:
152-162, 1991.
Bächinger, H.P., Morris, N.P. and Davis, J.M.: Thermal stability
and folding of the collagen triple helix and the effects of mutations in osteogenesis imperfecta on the triple helix of type I
collagen. Am. J. Med. Genet. 45: 152-162, 1993.
Baldwin, C.T., Reginato, A.M., Smith, C., Jimenez, S.A. and
Prockop, D.J.: Structure of cDNA clones coding for human
type II procollagen. The α1(II) chain is more similar to the
α1(I) chain than two other α chains of fibrillar collagens.
Biochem. J. 262: 521-528, 1989.
Chapman, J.A.: Molecular organization in the collagen fibril.
In: Connective Tissue Matrix ed. by Hukins, D.W.L., Verlag
Chemie, Deerfield Beach, FL, 1984, pp. 89-132.
Dölz, R. and Heidemann, E.: Influence of different tripeptides
on the stability of the collagen triple helix. I. Analysis of the
collagen sequence and identification of typical tripeptides.
Biopolymers 25: 1069-1080, 1986.
Engel, J., Chen, H.-T., Prockop, D.J. and Klump, H.: The triple
helix to coil conversion of collagen-like polytripeptides in
aqueous and nonaqueous solvents. Comparison of the thermodynamic parameters and the binding of water to
(L-Pro-L-Pro-Gly)n and (L-Pro-L-Hyp-Gly)n. Biopolymers 16:
601-622, 1977.
Engel, J.: Folding and unfolding of collagen triple helices. In:
Advances in Meat Research, ed. by Pearson, A.M., Dutson,
T.R. and Bailey, A.J., Van Nostrand Reinhold, 1987, pp.
145-161.
Fertala, A., Sieron, A.L., Ganguly, A., Li, S.-W., Ala-Kokko, L.,
Anumula, K.R. and Prockop, D.J.: Synthesis of recombinant
human procollagen II in a stably transfected tumor cell line
(HT-1080). Biochem. J. 298: 31-37, 1994.
Germann, H.-P. and Heidemann, E.: A synthetic model of collagen: an experimental investigation of the triple-helix stability.
Biopolymers 27: 157-163, 1988.
Gullberg, D., Gehlsen, K.R., Turner, D.C., Ahlen, K., Zijenah,
L.S., Barnes, M.J. and Rubin, K.: Analysis of α1β1, α2β1
and α3β1 integrins in cell-collagen interactions: identification
of conformation dependent α1β1 binding sites in collagen I.
EMBO J. 11: 3865-3873, 1992.
Hedbom, E. and Heinegård, D.: Binding of fibromodulin and
decorin to separate sites on fibrillar collagens. J. Biol. Chem.
268: 27307-27312, 1993.
Heidemann, E. and Roth, W.: Synthesis and investigation of
collagen model peptides. Adv. Polym. Sci. 43: 143-153,
1983.
Hulmes, D.J.S., Miller, A., Parry, D.A.D., Piez, K.A. and Woodhead-Galloway, J.: Analysis of the primary structure of collagen for the origins of molecular packing. J. Mol. Biol. 79:
137-148, 1973.
Inouye, K., Kobayashi, Y., Kyogoku, Y., Kishida, Y., Sakakibara, S. and Prockop, D.J.: Synthesis and physical properties
of (hydroxyproline-proline-glycine)10: Hydroxyproline in
the X-position decreases the melting temperature of the collagen triple helix. Arch. Biochem. Biophys. 219: 198-203,
1982.
Kadler, K.E., Hojima, Y. and Prockop, D.J.: Assembly of type I
collagen de novo. Between 37 and 41 °C the process is limited by micro-unfolding of monomers. J. Biol. Chem. 263:
10517-10523, 1988.
Kühn, K., Fietzek, P. and Kühn, J.: The action of proteolytic
enzymes on collagen. Biochem. Z. 344: 418-434, 1966.
Kuivaniemi, H., Tromp, G. and Prockop, D.J.: Mutations in
fibrillar collagen (types I, II, III and XI), fibril-associated collagen (type IX), and network-forming collagen (type X) cause
a spectrum of diseases of bone, cartilage, and blood vessels.
Human Mutation 9: 300-315, 1997.
Morris, N.P., Watt, S.L., Davis, J.M. and Bächinger, H.P.: Unfolding intermediates in the triple helix to coil transition of
bovine type XI collagen and human type V collagens (α1)2α2
and α1α2α3. J. Biol. Chem. 265: 10081-10087, 1990.
Parkinson, J., Kadler, K.E. and Bass, A.: Self-assembly of
rod-like particles in two dimensions: a simple model of collagen fibrillogenesis. Physical Rev. E 50: 2963-2966, 1994.
Piez, K.A.: Molecular and aggregate structures of the collagens.
In: Extracellular Matrix Biochemistry, ed. by Piez, K.A. and
Reddi, A.H., Elsevier, New York, 1984, pp. 1-40.
Privalov, P.L.: Stability of proteins. Proteins which do not present a single cooperative system. Adv. Protein Chem. 35:
1-104, 1982.
Prockop, D.J. and Hulmes, D.J.S.: Assembly of collagen fibrils
de novo from soluble precursors: Polymerization and copolymerization of procollagen, pN-collagen, and mutated collagens. In: Extracellular Matrix Assembly and Structure, ed. by
Yurchenco, P.D., Birk, D.E. and Mecham, R.P., Academic
Press, New York, 1994, pp. 47-90.
Prockop, D.J. and Kivirikko, K.I.: Collagens: molecular biology, diseases, and potentials for therapy. Annu. Rev.
Biochem. 64: 403-434, 1995.
Prockop, D.J., Berg, R.A., Kivirikko, K.I. and Uitto, J.: Intracellular steps in the biosynthesis of collagen. In: Biochemistry of
Collagen, ed. by Ramachandran, G.N. and Reddi, A.H.,
Plenum, New York, 1976, pp. 163-273.
Roth, W. and Heidemann, E.: Triple helix-coil transition of covalently bridged collagen-like peptides. Biopolymers 19:
1909-1917, 1980.
Ryhänen, L., Zaragoza, E.J. and Uitto, J.: Conformational stability of type I collagen triple helix: evidence for temporary
and local relaxation of the protein conformation using a proteolytic probe. Arch. Biochem. Biophys. 223: 562-571,
1983.
Sakakibara, S., Inouye, K., Shudo, K., Kishida, Y., Kobayashi,
Y. and Prockop, D.J.: Synthesis of (Pro-Hyp-Gly)n of defined
molecular weights. Evidence for the stabilization of collagen
triple helix by hydroxyproline. Biochim. Biophys. Acta 303:
198-202, 1973.
San Antonio, J.D., Lander, A.D., Karnovsky, M.J. and Slayter,
H.S.: Mapping the heparin-binding sites on type I collagen
monomers and fibrils. J. Cell Biol. 125: 1179-1188, 1994.
Sieron, A.L., Fertala, A., Ala-Kokko, L. and Prockop, D.J.:
Deletion of a large domain in recombinant human procollagen II does not alter the thermal stability of the triple helix. J.
Biol. Chem. 268: 21232-21237, 1993.
Silver, D., Miller, J., Harrison, R. and Prockop, D.J.: Helical
model of nucleation and propagation to account for the
growth of type I collagen fibrils from symmetrical pointed
tips: A special example of self-assembly of rod-like monomers. Proc. Natl. Acad. Sci. USA 89: 9860-9864, 1992.
Staatz, W.D., Fok, K.F., Zutter, M.M., Adams, S.P., Rodriguez,
B.A. and Santoro, S.A.: Identification of a tetrapeptide recognition sequence for the α2β1 integrin in collagen. J. Biol.
Chem. 266: 7363-7367, 1991.
Tromp, G., Kuivaniemi, H., Stacey, A., Shikata, H., Baldwin,
C.T., Jaenisch, R. and Prockop, D.J.: Structure of a full-length cDNA clone for the preproα1(I) chain of human type I
procollagen. Biochem. J. 253: 919-922, 1988.
van der Rest, M. and Garrone, R.: Collagen family of proteins.
FASEB J. 5: 2814-2823, 1991.
Westerhausen, A., Kishi, J. and Prockop, D.J.: Mutations that
substitute serine for glycine α1-598 and glycine α1-631 in
type I procollagen. Effects on thermal unfolding of the triple
helix are position-specific and demonstrate that the protein
unfolds through a series of cooperative blocks. J. Biol. Chem.
265: 13995-14000, 1990.
Weston, S.A., Hulmes, D.J.S., Mould, A.P., Watson, R.B. and
Humphries, M.J.: Identification of integrin α2β1 as cell surface receptor for the carboxyl-terminal propeptide of type I
procollagen. J. Biol. Chem. 269: 20982-20986, 1994.
Dr. Darwin J. Prockop, Center for Gene Therapy, Allegheny
University of the Health Sciences, 245 North 15 Street, M.S.
421, Philadelphia, PA 19102.
Received April 4, 1997; accepted May 21, 1997
