Annu. Rev. Biochem. 1990. 59:837-72
Copyright © 1990 by Annual Reviews Inc. All rights reserved

THE FAMILY OF COLLAGEN GENES

Eero Vuorio¹ and Benoit de Crombrugghe


The University of Texas M. D. Anderson Cancer Center, Department of Molecular
Genetics, Houston, Texas 77030
Annu. Rev. Biochem. 1990.59:837-872. Downloaded from www.annualreviews.org

KEY WORDS: molecular structure, supramolecular organization, mutations, gene control, evolutionary conservation.
by Boston University on 05/09/13. For personal use only.

CONTENTS

INTRODUCTION
FIBRIL-FORMING COLLAGENS AND THEIR GENES
  Molecular and Supramolecular Organization
  The Triple-Helical Domain
  C-Propeptide Domain
  N-Propeptide Domain
NON-FIBRIL-FORMING COLLAGENS AND THEIR GENES
  Fibril-Associated Collagens
  Types VIII and X Collagens
  Type IV Collagen
  Type VI Collagen
  Type XIII Collagen
COLLAGEN GENES IN INVERTEBRATES
REGULATORY SEGMENTS IN COLLAGEN GENES
  α1(I) and α2(I) Collagen Genes
  α1(III) Collagen Gene
  α1(II) Collagen Gene
  Type IV Collagen Genes
CONCLUSIONS

INTRODUCTION

The collagens constitute a family of related proteins that are assembled in a
variety of supramolecular structures in extracellular matrices. These
structures illustrate how the basic motif of collagens was utilized to generate
a diversity of supramolecular matrix networks to accomplish an equally diverse
number of functions in the tissues of multicellular organisms. A vast literature
on the different collagens has been reviewed over the last few years (1, 2).
The present article focuses mainly on the relationships between the molecular
and supramolecular organization of specific collagens and the structure of
their genes. In the second part, we also briefly review some of the regulatory
elements of these genes.

¹Second affiliation: The University of Turku, Department of Medical Biochemistry, 20520 Turku, Finland

Our discussion includes only molecules that have been assigned to the
collagen family based on their structural and functional features. The different
collagens are referred to as collagen types and are designated by Roman
numerals I-XIII. In all these molecules a major component of the protein is a
triple-helical structure of three polypeptide chains (α-chains) with a
characteristic Gly-X-Y repeat sequence. Other specific features of collagens
include a high content of proline, alanine, and lysine residues and extensive
posttranslational modifications that encompass hydroxylation of proline and
lysine residues, various glycosylations, and formation of intermolecular
cross-links through lysine and hydroxylysine residues (3, 4). In the
extracellular matrix these triple-helical molecules are the constituents of
fibrillar and other supramolecular networks. This discussion does not include
proteins such as C1q, acetylcholinesterase, conglutinin, surfactant proteins,
and others that contain collagenous domains but are not thought to participate
in supramolecular structures of the extracellular matrix.

The Roman numerals denoting collagen types follow the order in which
they were discovered. For each collagen type the α-chains are identified with
Arabic numerals followed by the Roman numeral for the type in parentheses,
e.g. α1(I), α2(I), α1(II), etc. The chain composition of the corresponding
heterotrimeric type I collagen molecule is written [α1(I)]₂α2(I), and that of
the homotrimeric type II collagen molecule [α1(II)]₃. The gene loci for the
members of the collagen family have been given names beginning with COL,
followed by an Arabic number denoting the collagen type, the letter A (for
α-chain), and another Arabic number for the α-chain in question. Thus COL1A2
is the gene locus for the α2(I) chain, and COL2A1 for the α1(II) chain. A
list of the collagen types, their constituent α-chains, and the corresponding
gene loci is presented in Table 1.

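The locus naming rule described above can be sketched in a few lines of Python. This is only an illustration of the convention as stated in the text; the function names `roman_to_int` and `locus_name` are hypothetical, and the rule has at least one exception in Table 1, where the α3(XI) chain is listed under COL2A1.

```python
# Roman numeral symbols needed for collagen types I-XIII.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'IV' or 'XIII' to an integer."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Subtractive notation: a smaller symbol before a larger one (IV, IX).
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def locus_name(collagen_type: str, chain_number: int) -> str:
    """Build a locus name: COL + type (Arabic) + A + chain number."""
    return f"COL{roman_to_int(collagen_type)}A{chain_number}"


if __name__ == "__main__":
    print(locus_name("I", 2))   # α2(I)  -> COL1A2
    print(locus_name("II", 1))  # α1(II) -> COL2A1
```

The two printed names are the examples given in the text; other inputs follow the same pattern but should be checked against Table 1.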
To date 13 collagen types have been identified and characterized in
sufficient detail to warrant their classification in the collagen family. A
minimum of 25 genes are needed to code for the constituent chains of these 13
collagen types. The genes are dispersed in the human genome (5-12), although in
three cases two genes have been mapped to the same region: α1(III) and α2(V) to
2q24.3-q31 (5, 6), α1(IV) and α2(IV) to 13q34 (5), and α1(VI) and α2(VI)
to 21q22.3 (8). Genomic and/or cDNA sequences corresponding to 19 different
collagen α-chains have been reported, mostly for human and chick, but
also for bovine, rat, and mouse. No cDNA or gene sequences are yet available
for type VII collagen, whereas type XIII collagen has been characterized at
the gene and mRNA levels only. Genomic and cDNA sequences of collagens
have also been reported for several invertebrates including sea urchins, fruit
flies, and slime molds.

Table 1 Genetically distinct collagen types

Type  Constituent  Gene      Chromosomal   Chain                  Supramolecular
      chains       locus     localization  composition            structure

I     α1(I)        COL1A1    17q21.3-q22   [α1(I)]₂α2(I),         67 nm banded fibril
      α2(I)        COL1A2    7q21.3-q22    [α1(I)]₃
II    α1(II)       COL2A1    12q13-q14     [α1(II)]₃              67 nm banded fibril
III   α1(III)      COL3A1    2q24.3-q31    [α1(III)]₃             67 nm banded fibril
IV    α1(IV)       COL4A1    13q34         [α1(IV)]₂α2(IV),       3-dimensional network
      α2(IV)       COL4A2    13q34         and minor forms        in basement membranes
      α3(IV)       COL4A3
      α4(IV)       COL4A4
      α5(IV)       COL4A5    Xq22
V     α1(V)        COL5A1                  [α1(V)]₂α2(V),         67 nm banded fibril
      α2(V)        COL5A2    2q24.3-q31    α1(V)α2(V)α3(V),
      α3(V)        COL5A3                  and other forms
VI    α1(VI)       COL6A1    21q22.3       α1(VI)α2(VI)α3(VI)     Microfibrillar network
      α2(VI)       COL6A2    21q22.3                              (100 nm banding)
      α3(VI)       COL6A3    2q37
VII   α1(VII)      COL7A1                  [α1(VII)]₃             Anchoring fibrils
VIII  α1(VIII)     COL8A1                  unknown                Filamentous lattice
IX    α1(IX)       COL9A1    6q12-q14      α1(IX)α2(IX)α3(IX)     Lateral association
      α2(IX)       COL9A2                                         to banded fibrils
      α3(IX)       COL9A3
X     α1(X)        COL10A1                 [α1(X)]₃               unknown
XI    α1(XI)       COL11A1   1p21          α1(XI)α2(XI)α3(XI),    67 nm banded fibril
      α2(XI)       COL11A2   6p21.2        and other forms
      α3(XI)       COL2A1    12q13-q14
XII   α1(XII)      COL12A1                 [α1(XII)]₃             Association with
                                                                  banded fibrils?
XIII  α1(XIII)     COL13A1   10q11-qter    unknown                unknown

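The gene count quoted for Table 1 can be checked with a short tally. The sketch below simply transcribes the type-to-locus assignments of Table 1 into a dictionary (the variable name `TABLE_1_LOCI` is an illustrative choice) and counts chains and distinct loci. It assumes the table has been read correctly from the printed page.

```python
# Gene loci per collagen type, transcribed from Table 1.
TABLE_1_LOCI = {
    "I": ["COL1A1", "COL1A2"],
    "II": ["COL2A1"],
    "III": ["COL3A1"],
    "IV": ["COL4A1", "COL4A2", "COL4A3", "COL4A4", "COL4A5"],
    "V": ["COL5A1", "COL5A2", "COL5A3"],
    "VI": ["COL6A1", "COL6A2", "COL6A3"],
    "VII": ["COL7A1"],
    "VIII": ["COL8A1"],
    "IX": ["COL9A1", "COL9A2", "COL9A3"],
    "X": ["COL10A1"],
    # The α3(XI) chain is listed under COL2A1, the locus of α1(II).
    "XI": ["COL11A1", "COL11A2", "COL2A1"],
    "XII": ["COL12A1"],
    "XIII": ["COL13A1"],
}

# Every chain entry in the table, counted once per type it appears in.
chain_entries = sum(len(loci) for loci in TABLE_1_LOCI.values())

# Distinct loci: the shared COL2A1 entry is counted only once.
distinct_loci = {locus for loci in TABLE_1_LOCI.values() for locus in loci}

if __name__ == "__main__":
    print("collagen types:", len(TABLE_1_LOCI))
    print("chain entries: ", chain_entries)
    print("distinct loci: ", len(distinct_loci))
```

With the table as transcribed, the 13 types carry 26 chain entries but only 25 distinct loci, because α3(XI) and α1(II) share COL2A1. This agrees with the minimum of 25 genes stated in the text.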
Based on their supramolecular structures, the collagens are divided into two
main classes: fibril-forming (or fibrillar) collagens and non-fibril-forming
collagens. The former group contains molecules with long continuous triple
helices, which are the constituents of banded collagen fibrils. The
non-fibril-forming collagens are more heterogeneous and have been further
classified according to their molecular characteristics, supramolecular
structures, and the types of extracellular networks that they form.

FIBRIL-FORMING COLLAGENS AND THEIR GENES

Based on their protein and gene structures, types I, II, III, V, and XI
collagens have been assigned to the fibril-forming group. By forming highly
organized fibers and fibrils, these collagens provide the structural support for
the body in skeleton, skin, blood vessels, nerves, intestines, and in the fibrous
capsules of organs.

Molecular and Supramolecular Organization


Type I collagen, the prototype of the family and its most abundant member, is
a heterotrimer of two identical α1(I) chains and one α2(I) chain. Small
amounts of homotrimeric [α1(I)]₃ molecules have also been observed. Types II
and III collagens are homotrimers of α1(II) and α1(III) chains. The chain
composition of type V collagen is variable: the most common structure is
[α1(V)]₂α2(V), but homotrimers of α1(V) have also been detected as well as
heterotrimers α1(V)α2(V)α3(V) (13). The predominant form of type XI
collagen is α1(XI)α2(XI)α3(XI) (14). The α3(XI) and α1(II) chains are
considered to be products of the same gene with differences in
posttranslational processing (14, 15). The α1(XI) chain has also been
identified as a constituent of bone type V collagen, exhibiting a chain
composition α1(V)α2(V)α1(XI) (16). In addition to the exchange of α-chains
between collagen types, there is evidence of heterotypic fibrils composed of
different fibrillar collagen molecules. Such supramolecular interactions
between fibrillar collagen types have been demonstrated for types I and III
(17, 18), types I and V (19, 20), and types II and XI (21).

Electron microscopic studies of collagen fibrils indicate that the individual
collagen molecules are arranged in a quarter-stagger array (for reviews see 2,
22, 23). According to this model (Figure 1), the rodlike collagen molecules
overlap each other by integral multiples of 67 nm (distance D). Since each
molecule is 4.4 D long, a gap of 40 nm is left between the ends of
non-overlapping molecules. This quarter-staggered assembly of collagen
molecules in microfibrillar units is a direct consequence of the unique
properties of the protein. To allow for correct association of α-chains into a
tight triple-helix, no interruptions exist in the Gly-X-Y repeating units. In
addition, owing to the heterotypic nature of the fibrils, the length of the
triple-helical domain of the α-chain must remain very similar (1014-1029 amino
acids) for each fibrillar collagen. The fibril is further stabilized by
formation of covalent intra- and intermolecular cross-links through specific
lysine and hydroxylysine residues in strictly conserved positions (Figure 1;
for review see 24). When cross-linking is inhibited, e.g. by lathyrogens, the
tensile strength of collagen fibrils is drastically reduced.

[Figure 1: schematic of staggered collagen molecules A-D. Labels: gap = 40 nm; D = 67 nm; collagen molecule, 4.4 D = 300 nm.]

Figure 1 Schematic presentation of the quarter-staggered assembly of collagen
fibrils. Individual 300-nm collagen molecules overlap each other by a distance
D of 67 nm or multiples of 67 nm. Each molecule is 4.4 D long, leaving a gap of
0.6 D (40 nm) between the ends of non-overlapping molecules. The four arrows at
the bottom of the figure mark the locations of the cross-linking residues, one
in the N-telopeptide, one in the C-telopeptide, and two others 0.4 D from each
end of the triple-helix. Two different intermolecular cross-links are shown
between collagen molecules A and B, and C and D.

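The arithmetic behind the quarter-stagger model can be made explicit. The sketch below uses only the two numbers given in the text (D = 67 nm and a molecule length of 4.4 D). It assumes that the next molecule in the same row begins at the next whole multiple of D, which is how the 0.6 D gap arises. The function names are illustrative.

```python
import math

D_NM = 67.0         # axial stagger D between neighbouring molecules, in nm
LENGTH_IN_D = 4.4   # molecule length expressed in units of D


def molecule_length_nm(length_in_d: float = LENGTH_IN_D, d_nm: float = D_NM) -> float:
    """Length of one collagen molecule in nm."""
    return length_in_d * d_nm


def gap_nm(length_in_d: float = LENGTH_IN_D, d_nm: float = D_NM) -> float:
    """Gap between the ends of consecutive molecules in the same row.

    Assumption: the following molecule starts at the next whole multiple
    of D, so the gap is (ceil(L/D) - L/D) * D.
    """
    return (math.ceil(length_in_d) - length_in_d) * d_nm


if __name__ == "__main__":
    print(f"molecule length: {molecule_length_nm():.0f} nm")
    print(f"gap:             {gap_nm():.0f} nm")
```

With these inputs the molecule length comes out near 295 nm, which the text rounds to 300 nm, and the gap is 0.6 D, or about 40 nm.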
by Boston University on 05/09/13. For personal use only.

The α-chains of fibrillar collagens are synthesized as precursor proα-chains
with amino- and carboxy-terminal extensions, referred to as N- and
C-propeptides (Figure 2). After a number of posttranslational modifications,
three proα-chains associate through their C-propeptides and fold into a
triple-helical procollagen molecule. Extracellularly, specific N- and
C-proteases remove the propeptides. The remaining collagen molecule consists of
the triple-helical domain and short N- and C-telopeptides. The genes for the
major fibril-forming collagens, types I, II, and III, have been characterized in
detail (63-83). These genes contain 51-54 exons and have sizes between 16
and 44 kilobases.

The Triple-Helical Domain


[Figure 2: domain diagrams, grouped in panels: Fibril-forming collagens; Basement membrane (Type IV) collagen; Fibril-associated collagens; Microfibrillar (Type VI) collagen; Type X and Type VIII collagens; Type XIII collagen.]

Figure 2 Domain structures of the different collagen types. Open rectangles
represent triple-helical domains with bars representing interruptions in the
Gly-X-Y repeat structure. Dark rectangles denote globular domains, and dotted
areas signal peptides. The arrows and gaps mark the sites of posttranslational
cleavage. The sizes (in amino acid residues) are shown for each domain: SP,
signal peptide; N-P, N-propeptide; N-TP, N-telopeptide; TH, triple-helix; C-TP,
C-telopeptide; CP, C-propeptide. For fibril-forming collagens the data are
largely derived from cDNA sequences for human type I (24-29), II (30, 31), III
(32-36), α2(V) (37-40), and α1(XI) collagen (41) chains. The data for the other
collagens are based on cDNA clones for chick α1(IX) and α2(IX) chains (42-46),
for chick α1(XII) collagen (47, 48), for chick α1(X) (49, 50) and rabbit
α1(VIII) collagen (45, 51), for human α1(IV) and α2(IV) collagens (52-56), for
human type VI collagen chains (57-60), and for human α1(XIII) collagen (61, 62).

GENE ORGANIZATION The domain consists of 44 exons: 23 of these have a
size of 54 bp, eight are 108 bp, one is 162 bp, five are 45 bp, and
another five are 99 bp long (65, 66, 73). The beginning and the end of the
triple-helix are coded for by the two remaining, so-called joining exons, which
also code for telopeptide and propeptide sequences. Differences in the number
of the Gly-X-Y repeats in the N- and C-terminal joining exons account for the
minor variations in the length of the triple-helical domain (1014-1029 amino
acids). All triple-helical exon sizes are multiples of 9 bp (45, 54, 99, 108,
and 162 bp coding for 5, 6, 11, 12, and 18 Gly-X-Y triplets, respectively). In
all fibrillar collagen genes these exons start exactly with a complete codon
for Gly, and end precisely with a complete codon for the amino acid in the Y
position. Since the majority of exons have a length of 54 bp, it was proposed
that the ancestral gene for these collagen genes arose by amplification of a
DNA unit containing a 54-bp exon embedded in intron sequences (63). Exons of
108 and 162 bp could have resulted from the loss of intervening introns,
whereas exons of 45 and 99 bp could have arisen by recombination between two
54-bp exons. It is not clear why such recombinational events between two 54-bp
exons would have only generated exons of two sizes, i.e. 45 and 99 bp. In
nonfibrillar collagen genes, examples of exon sizes abound that are multiples
of 9 bp but different from 54, 45, and 99 bp.
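The relationship between the observed exon sizes and the proposed 54-bp unit can be laid out numerically. The sketch below converts each size to Gly-X-Y triplets and notes whether it is a whole multiple of 54 bp or falls exactly 9 bp (one triplet) short of one. The second category is only an arithmetic observation and is not meant to describe the recombination mechanism, which the text leaves open. The function name `describe_exon` is illustrative.

```python
UNIT_BP = 54     # proposed ancestral exon size
TRIPLET_BP = 9   # one Gly-X-Y triplet = three codons = 9 bp


def describe_exon(size_bp: int) -> tuple[int, str]:
    """Return (number of Gly-X-Y triplets, relation to the 54-bp unit)."""
    if size_bp % TRIPLET_BP != 0:
        raise ValueError(f"{size_bp} bp is not a whole number of triplets")
    triplets = size_bp // TRIPLET_BP
    if size_bp % UNIT_BP == 0:
        relation = f"{size_bp // UNIT_BP} x 54"
    elif (size_bp + TRIPLET_BP) % UNIT_BP == 0:
        relation = f"{(size_bp + TRIPLET_BP) // UNIT_BP} x 54 - 9"
    else:
        relation = "other multiple of 9"
    return triplets, relation


if __name__ == "__main__":
    # The five exon sizes found in the triple-helical domain of fibrillar genes.
    for size in (45, 54, 99, 108, 162):
        triplets, relation = describe_exon(size)
        print(f"{size:>3} bp  {triplets:>2} triplets  {relation}")
```

The triplet counts reproduce the 5, 6, 11, 12, and 18 given in the text. A size such as 36 bp, which is also a multiple of 9, falls into the "other" category.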
The overall organization and succession of exons coding for the
triple-helical domain of fibril-forming collagen genes is shown in Table 2. The
available sequence data (63-83) indicate that all fibrillar collagen genes
display the exact same pattern of exon sizes shown in Table 2 with three
minor exceptions: (a) in the α1(I) collagen gene a single 108-bp exon (exon
33/34) replaces two 54-bp exons (exons 33 and 34) (68, 84), (b) in the α2(XI)
collagen gene the last 108-bp exon (exon 48) has been replaced by exons of 54
bp, 36 bp, and 54 bp, with only 18 bp of triple-helical domain in the
C-terminal joining exon (11), and (c) in the cDNA sequence of human and
mouse α1(III) collagen an additional Gly-X-Y triplet appears near the
N-terminal end of the triple-helix (34, 36, 83). Since the complete exon
structure of the type III collagen gene is not known, the exact location of
this additional triplet remains unknown. Although the amino acid residues in
the X and Y positions show considerable divergence within a collagen chain and
between different chains, the extreme degree of conservation for both the exon
structures and the pattern of exon sizes implies a common origin. The pattern
appears to have been established before the invertebrate-vertebrate radiation,
since a sea urchin collagen gene has a very similar structure (85). This high
degree of conservation also suggests that once the triple-helical molecular
structure and the highly organized supramolecular assembly of fibrils were
established, no changes in these structures were tolerated. The dramatic
consequences of both Gly substitutions and exon deletions that are found in
mutant collagens illustrate this point.

Table 2 Exon structure of the triple-helical domain of fibrillar collagen genesᵃ

Exon no.  Size (bp)    Exon no.  Size (bp)    Exon no.  Size (bp)

 7         45*         21        108          35         54
 8         54          22         54          36         54
 9         54          23         99          37        108
10         54          24         54          38         54
11         54          25         99          39         54
12         54          26         54          40        162
13         45          27         54          41        108
14         54          28         54          42        108
15         45          29         54          43         54
16         54          30         45          44        108
17         99          31         99          45         54
18         45          32        108          46        108
19         99          33         54*         47         54
20         54          34         54*         48        108*

ᵃ In addition to the 42 exons coding for 332 Gly-X-Y triplets, the N- and
C-terminal joining exons code for 1-3 and 5-7 triplets, respectively. The three
minor deviations from the conserved pattern are marked with asterisks as
discussed in the text.

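The figures quoted for Table 2 can be cross-checked against one another. The sketch below transcribes the exon sizes of Table 2 for exons 7 to 48, tallies them by size, and sums the Gly-X-Y triplets. The asterisked gene-specific deviations are not modelled, and the variable names are illustrative.

```python
from collections import Counter

TRIPLET_BP = 9  # one Gly-X-Y triplet

# Exon sizes in bp for exons 7..48, transcribed from Table 2 column by column.
SIZES = [
    45, 54, 54, 54, 54, 54, 45, 54, 45, 54, 99, 45, 99, 54,          # exons 7-20
    108, 54, 99, 54, 99, 54, 54, 54, 54, 45, 99, 108, 54, 54,        # exons 21-34
    54, 54, 108, 54, 54, 162, 108, 108, 54, 108, 54, 108, 54, 108,   # exons 35-48
]
EXON_SIZES = dict(zip(range(7, 49), SIZES))

# Every exon must encode a whole number of triplets.
assert all(size % TRIPLET_BP == 0 for size in EXON_SIZES.values())

size_tally = Counter(EXON_SIZES.values())
total_triplets = sum(size // TRIPLET_BP for size in EXON_SIZES.values())

# Joining exons add 1-3 (N-terminal) and 5-7 (C-terminal) triplets (Table 2 footnote).
min_residues = (total_triplets + 1 + 5) * 3
max_residues = (total_triplets + 3 + 7) * 3

if __name__ == "__main__":
    print("exons:", len(EXON_SIZES))
    print("tally:", dict(sorted(size_tally.items())))
    print("triplets:", total_triplets)
    print("residues incl. joining exons:", min_residues, "to", max_residues)
```

The tally gives 23, 8, 1, 5, and 5 exons of 54, 108, 162, 45, and 99 bp, and a total of 332 triplets, matching the text and the table footnote. Adding the joining exons gives 1014 to 1026 residues. The lower bound matches the range quoted in the text. The quoted upper bound of 1029 is one triplet longer than this simple sum reaches.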

MUTATIONS IN TRIPLE-HELICAL DOMAIN Mutations have been identified
in four different genes, the two genes specifying the human α1(I) and α2(I)
collagen chains (86-90) and the genes for the human α1(II) (91, 92) and
α1(III) collagen (93, 94). The different mutations have been found associated
with the diseases known as Osteogenesis Imperfecta (OI), Ehlers-Danlos
Syndromes (EDS), and various types of Chondrodysplasias, each covering a
spectrum of clinical entities. The two principal characteristics of these
mutations are (a) that the mutations are dominant and (b) that mutations
throughout the triple-helix cause abnormal phenotypes.

Dominant nature of the mutations

The conclusion that mutations in a single
allele are sufficient to cause an abnormal phenotype is mainly based on the
biochemical identification of both normal and abnormal chains of the same
collagen polypeptide in fibroblasts from affected individuals, and on the
identification of both the normal and mutant DNA sequence in the alleles of
these individuals (86-94). The dominant character of these mutations is
further illustrated by the existence of transgenic mice expressing a mutant
mouse α1(I) collagen transgene in which a single Cys-for-Gly substitution
causes a phenotype typical of perinatal lethal OI in humans (95). This OI
phenotype of transgenic mice occurs even if the expression of the mutated
transgene is only 10% that of the corresponding normal endogenous alleles,
illustrating the highly dominant character of the mutation.

Several types of mutations have been identified and include deletions,
insertions, point mutations, and splice site mutations (86-94, 96). The
deletions that have been reported are intron-intron deletions and do not alter
the translational reading frame of the polypeptides, since all triple-helical
exons start with a complete Gly codon and end with a complete Y codon. Splice
site mutations are functionally equivalent to deletion mutations, since they
cause the deletion of one or more exon sequences in the mature, processed RNA
(97-100). Point mutations that have been identified so far are substitutions of
Gly residues.

Several mechanisms can account for the dominant nature of these mutations.
One important characteristic, initially observed in mutant type I collagen
chains with mutations that map in the triple-helical domain, is a
posttranslational overmodification, which consists of an overglycosylation of
hydroxylysine residues (101). It is likely that the mutations cause a delayed
intracellular assembly of the triple-helical molecules leading in turn to the
overmodification and to a slower secretion of the overmodified mutant
molecules. The disruption of the triple-helix caused by the mutation, the
slower process of triple-helix formation, or both would render the molecule
more susceptible to degradation. What is important is that the existence within
a procollagen molecule of one mutant chain is sufficient to cause degradation
of the entire molecule. Hence, a much smaller proportion of collagen molecules
accumulates in the matrix, leading to a severe deficiency of molecules
available for fibrillogenesis. This phenomenon has been called "protein
suicide" (102, 103).

A second possible mechanism to explain the dominant character of these
mutations is that the presence of mutant molecules in collagen fibrils would
cause defective phenotypes (87, 95). Because fibrils are made of many
collagen molecules, it is possible that the presence of even a few mutant
collagen molecules is sufficient to generate defective fibrils. One molecular
mechanism that might generate abnormal fibrils is the persistence of the
N-propeptide in the mutant collagen molecule. Indeed, mutations in the
triple-helical domain several hundred amino acids away from the N-proteinase
cleavage site have been shown to render all three chains in the
procollagen molecule more resistant to cleavage by the enzyme N-proteinase
(104, 105). The persistence of the N-propeptide in mutant collagen molecules
could disrupt the process of fibril assembly and cause the formation of
abnormal fibrils. Regardless of the exact mechanisms that cause the fibrils to
be defective, it is possible that these fibrils also become more susceptible to
degradation by extracellular proteolytic enzymes. The existence of abnormal
collagen fibrils must account at least in part for the abnormal phenotype in
transgenic mice with perinatal lethal OI (95). Indeed, since in some transgenic
mice the mutant collagen transgenes are expressed at only 10% the level of the
normal endogenous genes, only a small proportion of all the type I
procollagen molecules made by the cells of the transgenic animals should be
defective. Hence, both interruptions in the Gly-X-Y repeat and reductions in
the length of the triple-helical domain occurring in a single allele are able to
disrupt the normal assembly of procollagen molecules within the cells as well
as that of the collagen fibrils outside the cells. In a sense the Gly substitutions
in the mutant chain of type I collagen correspond to interruptions in the
triple-helix found in other collagens. It is obvious that such substitutions,
which were functionally tolerated in these other collagens, were not tolerated
in the fibril-forming collagens.

Mutations throughout the triple-helical domain cause abnormal phenotypes

Since point mutations throughout the entire triple-helical domain of
proα1(I) and proα2(I) chains are associated with abnormal phenotypes, it is
obvious that the integrity of the entire triple-helical domain(s) of type I
procollagen and probably that of all fibrillar collagens is required for proper
function (86-89).

However, not all mutations are equivalent in terms of the severity of their
phenotype (87, 88, 106, 107). Generally mutations in the α1(I) chain have a
more severe phenotype than mutations in the α2(I) chain. This could be the
consequence of the 2:1 ratio of these polypeptides in type I collagen. A
mutation in the α1 chain will affect more molecules (75%) than a mutation in
the α2 chain (50%) (87, 89). In addition, mutations located closer to the
carboxy-terminal end of the triple-helical domain have generally more severe
effects than mutations that are further removed from the carboxy-terminal
end. The best indication of this "gradient" phenomenon came from a comparison
of Gly → Cys substitutions occurring at three different locations along the
triple-helical domain of the proα1(I) chain. Indeed the abnormal phenotype
becomes less severe when the Cys substitution moves toward the N-terminus
of the triple-helix of the proα1(I) chain (101). It is clear that the nature of the
substitution and the local environment where the substitution occurs also play
a role.
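The 75% and 50% figures follow from simple counting. The sketch below assumes that one of the two alleles carries the mutation, that both alleles are expressed equally so that half of the chains of that kind are mutant, and that chains are drawn at random into trimers. These assumptions are introduced here for illustration and are not spelled out in the text. The function name `fraction_affected` is hypothetical.

```python
def fraction_affected(copies_in_trimer: int, mutant_fraction: float = 0.5) -> float:
    """Fraction of trimers containing at least one mutant chain.

    copies_in_trimer: number of copies of the affected chain per molecule
                      (2 for α1(I), 1 for α2(I) in [α1(I)]₂α2(I)).
    mutant_fraction:  share of that chain's pool carrying the mutation
                      (0.5 assumes one mutant allele with equal expression).
    """
    # Probability that every copy drawn is normal, subtracted from 1.
    return 1.0 - (1.0 - mutant_fraction) ** copies_in_trimer


if __name__ == "__main__":
    print("α1(I) mutation:", fraction_affected(2))
    print("α2(I) mutation:", fraction_affected(1))
```

Under these assumptions a mutation in α1(I) reaches three quarters of the molecules and a mutation in α2(I) reaches half of them, in line with the percentages quoted above.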
In summary, the two major properties of mutations in the triple-helical
domain of fibril-forming collagens are entirely consistent with the concept
that the rigidly conserved features of these molecules, which are determined
by the structural properties of their genes, are essential for the correct function
of these collagen molecules in their supramolecular complexes.

C-Propeptide Domain
The C-propeptides of fibrillar collagens share the highest degree of sequence
similarity both between different types and between species. The
243-247-amino-acid C-propeptide is removed extracellularly by a specific
C-protease, leaving a short telopeptide of 11-27 amino acids attached to the
triple-helix (Figure 2). The C-propeptide has a globular structure that is
stabilized by two intra-chain disulfide bonds formed by the four
carboxy-terminal Cys residues. Another three or four Cys residues that are
located closer to the telopeptide participate in inter-chain disulfide bonding.
The formation of these intra- and inter-chain disulfide bridges precedes the
assembly of the triple-helix and allows the correct alignment of three α-chains
that associate inside the cells into a triple-helical molecule starting from
the C-terminal end of the molecule. Mutations altering the structure of the
C-propeptide have confirmed that this domain plays an important role in chain
association (108-110). The mechanism selecting the correct α-chains for each
molecule within cells synthesizing more than one collagen type at the same time
is not understood. It has been suggested that the number of Cys residues
available for inter-chain disulfide bonding determines whether the α-chains
form homo- or heterotrimers (39). The procollagen chains that form
heterotrimers have only three Cys residues for inter-chain disulfide bonding
[α2(I), α2(V), and α1(XI)], whereas those that are capable of forming
homotrimers [α1(I), α1(II), and α1(III)] have four.

The locations of the Cys residues in the C-propeptides are strictly
conserved. Their neighboring sequences are also conserved as well as the
sequence around the carbohydrate attachment site. The C-propeptide domain is
specified by four exons (exons 49 to 52). Since the last two exons are identical
in size, the small variations in the length of the propeptide result from small
changes in size for exons 49 and 50. The highest degree of sequence identity
is seen in exon 51 around the carbohydrate attachment domain; interestingly,
this homology is also evident for the nucleotide sequence (111). The joining
exon (exon 49) codes for the end of the triple-helix, the telopeptide, and the
beginning of the C-propeptide. The length of this triple-helical sequence
varies between 45 and 63 bp and contributes to the differences in the size of
this exon. Partial sequence for the C-propeptide domain of the human α2(XI)
collagen gene adds more divergence to the exon structure: in this gene the
joining exon contains only 18 bp of Gly-X-Y sequence, and the sizes of the
other exons for the C-propeptide show more variation (11).


Another typical feature for fibrillar collagens is the presence of several
polyadenylation sites in the gene transcripts. This gives rise typically to two
mRNA size classes of approximately 5 kb and 6 kb (112, 113). Variations in
the ratio of the two transcripts have been detected, but the biological
significance of these variations is not well understood (114).

N-Propeptide Domain
The N-propeptides of fibrillar collagens exhibit a much higher degree of
divergence both in length and in domain structure than the rest of the
polypeptides. The exon organizations of the gene segments coding for this
domain also show much more divergence (64-67, 82, 115-119; Figure 3).
Size variation is also seen in the same chain from different species. The
amino-terminal propeptide consists of the following elements: a signal
peptide, a Cys-rich globular domain, a short triple-helical region, and a short
globular domain ending in the N-telopeptide.

A 65-71-amino-acid Cys-rich domain is coded for by exon 2 in the α1(I)
and α1(III) genes but is not present in the α2(I) chain. In the α2(I) gene two
vestigial exons of 11 and 15 bp code for a short globular domain. The function
of the Cys-rich domain remains unknown. A homologous domain has also
been detected in thrombospondin (120) and von Willebrand factor (121). A
sequence for a similar Cys-rich domain is found in the α1(II) collagen gene
but is only present in a fraction of the mRNAs (31, 117, 118). It is, therefore,
likely that this sequence undergoes alternative splicing in the α1(II) collagen
pre-mRNA.

The triple-helical subdomain of the N-propeptide varies in length from 39
amino acids in the α1(III) chain to 79 amino acids in the α1(II) and α2(V)
chains. This triple-helical sequence contains one interruption in the α1(II) and
two interruptions in the α2(V) chains (31, 40, 82, 116). The exons coding
for this domain show considerable divergence in size (see Figure 3). In the
α2(I) and α1(III) genes this triple-helix is encoded by two exons, whereas the
α1(I) gene contains an additional triple-helical 36-bp exon, and the α1(II)
gene contains two 33-bp exons and one 54-bp exon.

[Figure 3: exon diagrams of the N-propeptide region. Subdomain labels: signal peptide; Cys-rich domain; globular domain; triple-helical domain; telopeptide; main triple helix. Exon numbers shown: 1-6 for α1(I); 1-6 for α2(I); 1, 2, 3, 4A, 4B, 5A, 5B, 6 for α1(II); 1, 2, 3, 4/5, 6 for α1(III).]

Figure 3 Exon structure for the N-propeptide domain of type I, II, and III
collagens. Exons are drawn approximately to scale and illustrate variations in
the sizes of exons coding for the different subdomains. The data represent
human gene structures for α1(I) (67), α2(I) (66, 115), α1(II) (116, 117), and
α1(III) collagen (82). In order to maintain the same exon numbering system for
the exons in the triple-helical domain of the different α-chains, the
N-terminal joining exon is numbered as exon 6 regardless of the number of exons
preceding this exon.


The function of the N-propeptide remains poorly understood. After the
three a-chains have assembled to form a procollagen molecule, the propeptide
is enzymatically removed outside the cells (4). The enzymatic removal of the
N-propeptide by specific proteases requires that all three a-chains have the
correct alignment and configuration. Misalignment of the N-propeptides due
to deletion mutations within the helical domain prevents removal of the
N-propeptide and causes EDS type VII (103-105). Although it has been
reported that in some cases propeptides are retained by collagen molecules in
fibrils, their removal is generally considered a prerequisite for proper fibril­
logenesis (for a review, see Ref. 4). One hypothesis is that the N-propeptides
would have a role in regulating the diameter of collagen fibrils (122). Several
experiments in cell-free systems and in cell culture have also suggested that
the N-propeptides of type I collagen, or shorter N-propeptide-derived pep­
tides, regulate production of type I collagen at the translational level (123).
The biological significance of these experiments is poorly understood.
In summary, the strict conservation of both the repeating structure of the
triple-helical domain of these collagen molecules and of the highly organized
quarter-stagger supramolecular organization of the collagen fibrils has its
correspondence in the strictly conserved properties of the exons coding for the
triple-helical domain of these proteins. The very high degree of conservation
of the pattern of triple-helical exon sizes is in contrast to the diversity of exon
sizes that code for the N-propeptide domain. Different parts of the gene
underwent, therefore, different types of selective pressure. Changes were
tolerated in the N-propeptide domain, but not in the triple-helical domain.

NON-FIBRIL-FORMING COLLAGENS AND THEIR GENES

Grouped under the non-fibril-forming heading are all collagens that fall
outside the fibril-forming collagens. This group is very heterogeneous both
structurally and functionally. Several of these collagens constitute the
components of different extracellular matrix networks (types IV, VI, VII, VIII,
and possibly type X) or interact directly with the fibril-forming collagens
(type IX and possibly type XII).
The exon structures and organizations of the genes for the non-fibril­
forming collagens diverge from those of the fibril-forming collagen genes,
although the basic 9-bp unit coding for Gly-X-Y is clearly maintained.
Complete exon structures are known for two nonfibrillar collagen genes: those
for α2(IX) and α1(IV) collagens. The structures of these genes show different
degrees of divergence from the fibrillar gene model, less for the gene for type
IX, considerably more for the gene for type IV collagen. The divergence in
exon sizes and organization in these genes is very likely related to the
difference in structure and function between nonfibrillar and fibrillar
collagens.

Fibril-Associated Collagens
This subgroup, which has been named FACIT for Fibril-Associated Collagens
with Interrupted Triple-helices, contains the collagens IX and XII (45). Type
IX has been shown to be associated with type II collagen, whereas type XII,
which shares many structural features with type IX, is thought to be associated
with type I collagen.

TYPE IX COLLAGEN Type IX collagen is expressed in cartilages, where it
constitutes about 1-10% of total collagens, and in the primary corneal stroma
of chick embryos (124). While the exact function of this heterotrimeric,
disulfide-bonded collagen remains unknown, a considerable amount of
structural information has been obtained from cDNA clones and by rotary
shadowing electron microscopy of purified type IX collagen molecules. Type
IX collagen contains three short triple-helical domains, COL3, COL2, and
COL1, of 137, 339, and 115 amino acids, respectively (42, 45), interspersed
with non-triple-helical domains designated NC3 and NC2 (Figure 2). The
amino- and carboxy-terminal noncollagenous domains (NC4 and NC1) share
no homology with the propeptides of the fibrillar collagens and do not appear
to be proteolytically processed. The NC3 domain of the α2(IX) chain is five
amino acids longer than in the α1(IX) chain and contains an attachment site
for a glycosaminoglycan side chain (44). This may account for the sharp kink
observed by electron microscopy in the NC3 domain (125).
Both COL1 and COL3 domains of α1(IX) and α2(IX) chains contain one
discontinuity in the Gly-X-Y sequence, which can be accounted for by a
deletion of the X or Y amino acid. The other discontinuity, seen in the COL1
domain of both chains, could similarly be explained by deletion of one Gly,
since the sequence can be written as Gly-X-Y-X-Y. No interruptions of the
Gly-X-Y repeat sequence are seen in the COL2 domain (45).
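
The two kinds of single-residue discontinuity described above can be located mechanically in a one-letter amino acid sequence. The following is a minimal sketch, assuming a domain that begins on an in-phase Gly; the function name and the toy peptides in the usage note are illustrative and are not taken from the type IX sequence data. Because Gly can also occupy an X or Y position, the classification is only a heuristic.

```python
def find_gly_xy_interruptions(seq):
    """Scan a one-letter peptide sequence for breaks in the Gly-X-Y repeat.

    Hypothetical helper: assumes seq[0] is an in-phase Gly. Returns a list of
    (index_of_last_in_phase_gly, kind) tuples, one per interruption found.
    """
    if not seq or seq[0] != "G":
        raise ValueError("sequence must start with an in-phase Gly (G)")
    interruptions = []
    i = 0  # index of the most recent in-phase Gly
    while i + 3 < len(seq):
        if seq[i + 3] == "G":
            # Regular triplet: Gly-X-Y followed by the next Gly.
            i += 3
        elif seq[i + 2] == "G":
            # Gly-X-Gly: one residue (X or Y) is missing from the triplet.
            interruptions.append((i, "X or Y deleted"))
            i += 2
        elif i + 5 < len(seq) and seq[i + 5] == "G":
            # Gly-X-Y-X-Y-Gly: the Gly between two triplets is missing.
            interruptions.append((i, "Gly deleted"))
            i += 5
        else:
            # Anything else is treated as a longer interruption; resume
            # scanning at the next Gly, which is assumed to be in phase.
            j = seq.find("G", i + 3)
            if j == -1:
                break
            interruptions.append((i, "longer interruption"))
            i = j
    return interruptions
```

For the toy peptide `GPPGAGEKGPP` the scan reports one missing X or Y residue after the Gly at index 3, and for `GPPGAPEKGPP` it reports a missing Gly, matching the Gly-X-Y-X-Y pattern mentioned in the text.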

Type IX molecules have been localized by immunoelectron microscopy on
the surface of the type II collagen fibrils of cartilage where they are found in a
periodic manner along the fibril (21). The interactions between type IX and II
collagens are stabilized by covalent intermolecular cross-links (126, 127).
These cross-links join the N-telopeptides of type II collagen chains to the
COL2 domain of the α2(IX) chain near the NC3 domain. The cross-links and
the glycosaminoglycan side chain suggest a role for type IX collagen in
mediating the interaction between the two major components of cartilage
extracellular matrices: type II collagen and proteoglycans. Whereas the two
carboxy-terminal collagenous domains COL2 and COL1 interact laterally
with type II collagen, the COL3 and NC4 domains and the glycosaminoglycan
side chain project out from the fibril. This projection could create a
favorable three-dimensional structure for the interactions with proteoglycans
(127). In addition, the amino acid sequence of the NC4 domain is basic
(estimated pI 9.7) and could interact with the acidic glycosaminoglycans. The
localization of type IX collagen on fibril surfaces has also prompted suggestions
that it plays a role in regulating the fibril diameter (45, 128).
The genes coding for the chick α1(IX) and α2(IX) collagen have been
isolated and partially characterized (45, 129, 129a). Although the sizes of the
α1(IX) and the α2(IX) genes are very different (approx. 100 kb and 10 kb,
respectively), the intron-exon organization of the two genes is very similar,
suggesting that they originated by duplication from a common precursor. The
gene for α2(IX) collagen contains 32 exons. With a few exceptions, the exons
coding for the three triple-helical domains adhere to the 9-bp rule, i.e. exons
are multiples of 9 bp and start with a complete Gly codon and end precisely
with a complete codon for the Y amino acid. Approximately one-half of the
exons coding for triple-helical domains contain 54 bp as in the genes for
fibrillar collagens. Some exons, however, show a slight variation in size from
either 54 bp or other multiples of 9 (Table 3). Two factors contribute to this
variability in the exon sizes: (a) short interruptions of the triple-helix that
correspond to single amino acid deletions, and (b) split codons at some exon
junctions. If allowance is made for deletions of 3 bp (one amino acid), the
sizes of the three exons with interruptions become multiples of 9. In one
single case the splice occurs between the X and Y codons, resulting in
triple-helical exons of 33 and 147 bp (45).

Table 3 Size distribution of exons coding for Gly-X-Y repeats

Exon size (bp): 27 36 45 54 63 72 81 90 99 108 117 126 135 144 153 162 171 189

Gene
α2(I)a      0 0 5 23 0 0 0 0 5 8 0 0 0 0 0 0 0
α2(IX)-1b   0 3 1 10 0 0 0 0 0 0 0 0 0 0 0 0 0
      -2    1 0 1 4 0 1 0 0 0 0 0 0 0 0 0 0
α1(IV)-1    2 2 3 2 2 1 1 3 3 0 0 0 0 0 0 0 0 0
      -2    0 0 2 0 2 0 0 0 2
α1(VI)c     1 5 1
α1(XIII)c   2 2

a Representing fibril-forming collagen genes
b Line 1, exons that are exact multiples of 9; line 2, exons for which the size was adjusted owing to short interruptions and/or split codons
c Represents partial sequence data
The other deviation from the fibrillar collagen gene exon model is the
occurrence of split codons at the 5'- and 3'-ends of several exons. The split
codons do not, however, occur randomly within the 9 bp which encode
Gly-X-Y, but, remarkably, always involve the first nucleotide of the Gly
codon (G). As in the fibrillar collagen genes, the structure of exons that
contain exact multiples of 9 bp is:

...agGGN-NNN-NNN-(GGN-NNN-NNN)n GGN-NNN-NNNgt...

(the small letters represent conserved intron sequences). Some exons contain
one additional G residue at their 3'-end, corresponding to the first base of the
following Gly codon:

...cagGGN-NNN-NNN-(GGN-NNN-NNN)n GGN-NNN-NNNGgt...

In order to maintain the Gly-X-Y pattern they are followed by exons lacking
one G residue at their 5' end:

...cagGN-NNN-NNN-(GGN-NNN-NNN)n GGN-NNN-NNNgt...

Some exons contain split codons at each end. Thus, exons that contain split
codons deviate from a multiple of 9 bp by only 1 bp and this 1 bp always
involves either the deletion of a single G residue at the 5' end of the exon or
the addition of a single G residue at the 3' end of the exon or both. In the latter
case, the exon size is also a multiple of 9 bp. In order to maintain the Gly-X-Y
pattern, the exons with split codons must occur in pairs or clusters. In the
COL2 domain of the α2(IX) gene, exons with split codons occur in pairs and
in the COL1 domain in clusters. The remarkable nonrandomness in the
location of the split codons, together with the existence of many exons with
lengths that are exact multiples of 9 bp and that start with a complete Gly
codon and end with a complete Y codon, suggests that in this gene the split
codons arose as secondary events in exons that initially contained a complete
Gly codon at their 5' end and a complete Y codon at their 3' end and had
lengths that were exact multiples of 9 bp.
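
The length rules above lend themselves to a simple bookkeeping check. The sketch below is illustrative only: the function names are assumptions made here, and the exon lengths in the usage note are hypothetical, not values from the α2(IX) gene. It classifies an exon by its length modulo 9 and tracks, across consecutive exons, how many nucleotides of a Gly-X-Y unit have been used at each exon junction (0 for a junction between complete codons, 1 for a junction after the first G of a Gly codon). Single-residue deletions, which shift a length by 3 bp, are not modeled.

```python
from itertools import accumulate


def classify_exon_length(length_bp):
    """Classify a triple-helical exon by its length modulo 9 (hypothetical helper)."""
    remainder = length_bp % 9
    if remainder == 0:
        # Either complete codons at both ends, or split codons at both ends.
        return "multiple of 9"
    if remainder == 1:
        # One additional G at the 3' end (first base of the next Gly codon).
        return "extra G at 3' end"
    if remainder == 8:
        # One G missing at the 5' end (it sits in the preceding exon).
        return "lacks G at 5' end"
    return "other"


def junction_phases(exon_lengths):
    """Return the running total length modulo 9 after each exon.

    Assumes the first exon starts with a complete Gly codon. Under the
    split-codon pattern described in the text, every value is 0 or 1.
    """
    return [total % 9 for total in accumulate(exon_lengths)]
```

For hypothetical consecutive exons of 55, 54, and 53 bp, `junction_phases` returns `[1, 1, 0]`: the middle exon is split at both ends yet remains a multiple of 9 bp, and the cluster as a whole restores the Gly-X-Y frame. The 33-bp and 147-bp exons mentioned earlier fall into the "other" class individually, but together they sum to 180 bp, a multiple of 9.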
The considerable number of 54-bp exons in the type IX collagen suggests
that like the ancestral gene for the fibrillar collagens, the type IX collagen
gene or its ancestor arose by amplification of a DNA segment containing a
54-bp exon. The function of type IX collagen must have allowed the occurrence
of both the small number of interruptions in the triple-helix and the split
codons in some exons, events that were not tolerated in fibrillar collagen
genes. Interestingly, the central COL2 domain contains no interruptions and
is coded for by several 54-bp exons and exons with split codons most likely
derived from 54-bp and 45-bp exons (45). Since the amino-terminal end of
this domain is covalently cross-linked to the amino-telopeptide region of type
II collagen, it probably laterally associates with type II collagen molecules.
These interactions may require an uninterrupted triple-helix.

The electron microscopic appearance of the type IX collagen molecules
from embryonic chick cornea differs from those isolated from cartilage by
lacking nearly all of the globular structure (NC4 domain) at the amino-terminal
end. By cDNA and genomic cloning, two different 5'-ends have
been discovered in the mRNA (cDNA) for the α1(IX) chain; the mRNA in
cornea is approximately 700 nucleotides shorter, lacking most of the sequences
coding for the NC4 domain (124). The two transcripts arise from the
same gene by using alternate promoters and transcription start sites; the
synthesis of the shorter mRNA starts from an alternate exon 1 in the sixth
intron of the gene (45, 129a). From the seventh exon onward, the mRNAs are
identical. This represents the first case within the collagen gene family where
alternate transcription start sites are used to create two forms of the protein. A
somewhat similar situation is found in the α2(I) collagen gene, which uses a
different start site for transcription in cartilage compared to other tissues in
which the gene is expressed. However, the α2(I) polypeptide chain is not
made in this tissue, probably as a result of a translational block (129b).

TYPE XII COLLAGEN This collagen type was initially discovered as a cDNA
clone homologous to type IX collagen cDNAs (47). The corresponding
protein has been purified and shown to be a homotrimer with a molecular
weight of approximately 220,000 (130). The structural similarities between
type IX and type XII collagens suggest an analogous function for type XII
collagen: lateral association with type I collagen on the surfaces of fibrils (45,
48). Although immunolocalization and RNAse protection studies have shown
coexpression of type I and type XII collagen in several tissues, no direct
evidence is available for the presence of type XII collagen on fibril surfaces
(131).
The homologies between α1(XII), α1(IX), and α2(IX) collagen sequences
are seen in the carboxy-terminal NC1, COL1, and NC2 domains, and between
the large amino-terminal NC3 domain of α1(XII) and the NC4 domain
of α1(IX) (48; Figure 2). While the conservation of exon sizes, location of
Cys residues, and the sequence homologies suggest a common ancestor for
both, considerable differences also exist. The amino-terminal NC3 domain of
the α1(XII) chain contains about 1600 amino acids, thus accounting for most
of the protein. Type XII collagen has only two triple-helical domains of 152
and 103 amino acids interrupted by the short NC2 domain (48). Partial gene
structure for the chick α1(XII) collagen shows considerable homology in the
intron-exon organization with the type IX collagen genes in the NC1, COL1,
and NC2 domains (48).

Types VIII and X Collagens


These two short-chain collagens show considerable structural homologies.
The expression of type X collagen, a homotrimeric disulfide-bonded collagen,
is limited to hypertrophic chondrocytes. This suggests a specific
function related to the mineralization of cartilage (132). The triple-helical
domain of type X contains 460 amino acids with eight imperfections in the
Gly-X-Y repeat structure (49, 50). The nature of these imperfections (deletions
of single residues, either Gly or X or Y) is similar to those in type IX
and XII collagens. The structure of the rabbit α1(VIII) collagen chain is very
similar to that of α1(X), with similar imperfections in the same locations
within the triple-helix (51; Figure 2). Despite complete divergence of the NC3
domains, the overall sequence similarity between the α1(X) and α1(VIII)
chains is about 60%. Type VIII collagen is expressed in the Descemet's
membrane of the eye, by a number of vascular endothelial cells in tissue
culture, and by some tumor-derived cells (133).
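
The domain lengths quoted in this section can be checked against the single-residue-deletion interpretation: if every imperfection removed exactly one residue, adding the deleted residues back should give a whole number of Gly-X-Y triplets. A minimal sketch follows; the function name is an assumption made here for illustration, and the inputs are only the lengths and imperfection counts given in the text.

```python
def restored_triplet_count(domain_length_aa, single_residue_deletions):
    """Return the number of Gly-X-Y triplets after restoring deleted residues.

    Hypothetical helper: returns None when the restored length is not a
    multiple of 3, i.e. when single-residue deletions alone cannot explain
    the observed domain length.
    """
    restored = domain_length_aa + single_residue_deletions
    if restored % 3 != 0:
        return None
    return restored // 3


# Type X triple-helical domain: 460 residues with eight single-residue deletions.
print(restored_triplet_count(460, 8))  # prints 156
```

With the same helper, the type IX domains COL3, COL2, and COL1 (137, 339, and 115 residues with one, zero, and two single-residue discontinuities, as quoted earlier) restore to 46, 113, and 39 triplets.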
The gene structure of type X collagen (and probably type VIII collagen) is
completely different from the multiexon structure of the other collagen genes.
The entire triple-helical domain is encoded by one single exon. In chick, the
entire gene contains only three exons and spans about 5 kb (45, 50, 134). One
possible hypothesis to account for this exon structure, which differs from the
organization of all other collagen genes, would be that the single exon that
encodes the triple-helix would have arisen by a homologous recombinational
event between a double-stranded cDNA and a previously existing gene that
contained introns.

Type IV Collagen
Expression of type IV collagen is restricted to basement membranes, where it
is the major component. The most common form of type IV collagen consists
of two α1 chains and one α2 chain, but other forms also exist (135). There is
chemical evidence for at least two other polypeptide chains, which have been
named α3(IV) and α4(IV) (136, 137), and recently a related α5(IV) gene was
identified that maps to the human X chromosome (7).
Type IV collagen molecules consist of three distinct domains: a central
triple-helix, a large C-terminal globular domain (NC1), and an N-terminal
globular domain (NC2) (Figure 2). The α-chains of type IV collagen are not
proteolytically processed as are those of the fibrillar collagens.
Type IV collagen molecules are assembled into a flexible three-dimensional
network. Based on electron microscopic and biochemical studies,
a model has been proposed whereby the 400-nm-long type IV collagen
molecules interact at both ends with other type IV collagen molecules (135).
At their amino-terminus, four molecules interact, two in parallel and two in
antiparallel orientation, through their 7S-domains. These domains consist of a
Cys-rich amino-terminal noncollagenous segment (NC2), a 30-nm uninterrupted
triple-helix, and a short non-triple-helical sequence. Both disulfide
bonds and covalent cross-links stabilize this structure. At the carboxy-terminal
end, two molecules join end-to-end through a Cys-containing
noncollagenous domain.

A striking feature of the amino acid sequence of the central triple-helical
domain of the polypeptide is the presence of a considerable number of
interruptions of the Gly-X-Y repeat: 21 in the α1(IV) chain and 23 in the
α2(IV) chain (52-56, 138-143). Only 12 imperfections can be explained by
deletion of one amino acid from the Gly-X-Y triplet; the other interruptions
are longer (up to 24 amino acids). Eighteen of these imperfections occur in the
same location in both chains; an additional eight short interruptions occur in
one, but not in the other chain. These discontinuities probably account for the
flexibility of type IV collagen molecules (144).
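
The interruption counts quoted above are mutually consistent if the 18 shared positions are counted once for each chain. The following worked check is only a reading of the numbers given in the text, not an additional result:

```latex
\[
\underbrace{(21 - 18)}_{\alpha 1(\mathrm{IV})\ \text{only}}
+ \underbrace{(23 - 18)}_{\alpha 2(\mathrm{IV})\ \text{only}}
= 3 + 5 = 8
\]
```

That is, three interruptions would be unique to the α1(IV) chain and five to the α2(IV) chain, giving the eight interruptions that occur in one chain but not in the other.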
The distribution of exon sizes in the type IV collagen genes varies considerably
from the pattern seen in the genes for fibril-forming collagens (145-151;
Table 3). This size variation is related in part to the large number of interruptions
in the Gly-X-Y repeat pattern, and to the presence of split codons
in most of the triple-helical exons. Similarly to what is seen in the type IX
collagen gene, the split codons show the same, remarkable, nonrandom
location within the 9-bp sequence coding for Gly-X-Y and always involve the
first G of the Gly codon at the 5' end of the exon. The exons with split codons
contain either one additional G residue at their 3' end or one G residue less at
their 5' end. In some exons both occurrences are found. Here also, this
nonrandomness suggests that the split codons arose during evolution as
secondary events in exons that initially contained a complete Gly codon at
their 5' end and a complete Y codon at their 3' end. When adjusted for
discontinuities and split codons, the exon sizes adhere to the 9-bp rule (Table
3) with only one exception (where the splice occurs between the X and Y
codons) (147).
We view the exon structure of the type IV collagen genes as a further
divergence from the model typified by the genes for fibril-forming collagens.
The genes for type IX collagen display a structure that also diverges from that
of the fibril-forming collagen genes, but the degree of divergence is much
less. The genes for type IV collagen show the same type of deviations from
the fibrillar pattern as the type IX genes, i.e. interruptions of the Gly-X-Y
repeat, exons with split codons, and some exons that, although their lengths
are multiples of 9 bp, represent sizes not found in fibrillar collagen genes
(Table 3). However, in the type IV collagen genes, these differences occur
much more frequently throughout the triple-helical domain. It is interesting to
note that in the gene segment specifying the triple-helical part of the N-terminal
7S domain of α1(IV) collagen, which is the part of the molecule
undergoing lateral associations with other type IV collagen chains, the exons
contain no interruptions or split codons and adhere strictly to the 9-bp rule
(147, 149). The sizes of these exons (90 bp, 45 bp, 45 bp, and 63 bp) are also
somewhat reminiscent of the structure of fibrillar collagen exons. In contrast,
nearly all the exons in the C-terminal two thirds of the α-chain contain split
codons (147).
Characterization of the gene structure of the α2(IV) chain, which shares a
considerable degree of amino acid sequence homology with the α1(IV) chain,
has revealed a striking difference in the exon organization of the C-terminal
portion of these genes (150, 151). Currently there is no good explanation for
this conservation of amino acid sequence but divergence of the exon structure.
The gene for Drosophila α1(IV) collagen, which shows similarities in amino
acid sequence, has an exon structure that is quite different from the vertebrate
type IV collagen genes, as is described later (152).

Type VI Collagen
Protein chemical and electron microscopic characterization of this short-chain
collagen has been reviewed recently (153). Type VI collagen is a heterotrimer
of α1, α2, and α3 chains with a short (approximately 100 nm) triple-helix.
Dimerization occurs by antiparallel association of two monomers. Such dimers
associate to form tetramers by lateral aggregation. These tetramers, which
are stabilized by disulfide bonds into a structure with large globular ends and a
short helical segment, associate, in turn, end-to-end and laterally to form a
class of microfibrils with a 100-nm periodicity. These microfibrils have a
ubiquitous distribution in connective tissues, but are not directly associated
with banded collagen fibrils (1, 153).
The complete primary structures for the α1 and α2 chains, and for the
central part of the α3 chain, have been derived from amino acid and cDNA
sequencing (57-60, 154; Figure 2). The triple-helical domains of these
chains, which constitute less than one-third of their total mass (Figure 2), are
335-336 amino acids long (58). They contain two short interruptions with a
spacing which, according to one model, would allow coiling of two antiparallel
triple helices in dimeric molecules as indicated by electron microscopy
(155). Each helical domain contains one Cys residue, which probably
participates in intramolecular disulfide bonding. The genes for human α1(VI)
and α2(VI) chains are located in the same segment of chromosome 21 (8).
The C-terminal NC1 domains of all type VI collagen α-chains contain two
repeats of a 200-residue segment, which is also found once in the N-terminal
NC2 domain (59). The NC2 domain of the α3(VI) chain is large and is
believed to contain several copies of this repeat structure (59). At the 3'-end
of the α2(VI) RNA, alternate splicing occurs. One of the alternatively spliced
RNAs only codes for one complete repeat (59).
The gene structure for the type VI collagen chains is largely unknown.
Preliminary sequence data for the chicken α2(VI) collagen gene show strict
adherence to the 9-bp rule in the triple-helical domain, with exon sizes of
27-63 bp without split codons (B. Trüeb, personal communication; Table 3).

Type XIII Collagen



The structure of a novel short-chain collagen, which has been entirely determined
from cDNA clones (61, 62), shows three triple-helical and four
noncollagenous domains (Figure 2). The corresponding protein has not yet
been isolated, and both the chain composition and the supramolecular structure
of type XIII collagen remain unknown. Polyclonal antibodies to a
synthetic peptide for the carboxy-terminal (NC4) domain recognize
polypeptides of 67 kd and 62 kd in Western blots (61). The tissue distribution
for the mRNA determined by Northern and in situ hybridizations shows the
highest levels in skin (epidermis and hair follicles), intestine (mucosal layer),
bone (intertrabecular mesenchyme), striated muscle (endomysium), and cartilage
(156).
The most striking feature of type XIII collagen is the existence of at least
five alternatively spliced RNAs, which affect both collagenous (COL1 and
COL3) and noncollagenous (NC2 and NC4) domains (62). Each collagenous
domain contains one or two short interruptions of the Gly-X-Y repeat.
The gene for the α1(XIII) collagen is large since genomic clones spanning
25 kb cover only eleven 3'-terminal exons and one quarter of the coding
sequences (157). These exons contain no split codons and follow the 9-bp
rule. Exact adherence to these rules must be essential for maintaining the
correct reading frame and the Gly-X-Y sequence during the alternative splicing
events. Two of the exons are 54 bp, one 45 bp long (Table 3). In two cases
the differences in cDNA sequence result from alternative splicing of one
36-bp exon, and of the carboxy-terminal joining exon, respectively. The
central triple-helical domain (COL2) of the α1(XIII) collagen chains is not
affected by alternative splicing, but variation in the lengths of the other two
collagenous domains presents a challenging puzzle: how are such trimeric
molecules assembled, or is there a mechanism selecting α-chains of the same
length for each trimer?

COLLAGEN GENES IN INVERTEBRATES

Characterization of collagen genes in sea urchin, fruit flies, and nematodes
attests to the diversity of the collagen gene family in invertebrates. The
products of the first genes isolated from Drosophila melanogaster and
Caenorhabditis elegans clearly represent nonfibrillar collagens (158, 159).
However, a recently described sea urchin collagen gene shows strong homology
to fibrillar collagen genes of vertebrates (85).
Nine developmentally regulated genes coding for cuticle collagen of C.
elegans have been characterized (158, 160-162). Altogether the collagen
gene family in the nematode includes between 50 and 150 genes dispersed
throughout the genome (158, 163). The genes characterized so far share
several common features; each is 1-1.2 kb long, each contains one or two
short introns, and each codes for a protein with two triple-helical domains,
one of 27-33 amino acids, the other of 127-132 amino acids. The latter
triple-helical domain contains one to three short interruptions of the Gly-X-Y
pattern (162). The three noncollagenous domains contain several Cys residues.
Based on the amino acid sequence similarities, the genes have been
further divided into three groups. In only two genes, designated COL8 and
dpy13, the intron is located within the triple-helical domain. In COL8 the
intron is found between the X and Y codons of a Gly-X-Y triplet (161),
whereas in dpy13 the intron splits a Gly codon (160). Mutations in two
different collagen genes (dpy13 and sqt1) have been shown to affect the body
shape of C. elegans (160, 161).
The protein coded by the Drosophila α1(IV) collagen gene shares considerable
homology with the corresponding vertebrate α-chain in both domain
structure and sequence (152). For instance, the locations of 11 out of 21
discontinuities of the Gly-X-Y repeat are conserved. In contrast, the gene
structures are much more divergent. The gene for Drosophila α1(IV) collagen
contains eight exons and seven relatively short introns (67-484 bp), whereas
the human gene contains 52 exons and 51 introns (147, 152, 164). Only three
of the intron locations coincide. Four of the seven introns in the Drosophila
gene are within the triple-helical domain, and their location does not appear to
follow the 9-bp exon pattern observed in vertebrate genes. The
small number and size of introns in C. elegans and Drosophila genes may
simply reflect the smaller sizes of their genomes (approximately 3% of the
human genome) and the much lower number of introns throughout their
genomes.
Characterization of cDNA and genomic clones for a procollagen of the sea
urchin Paracentrotus lividus revealed features that are typical of fibrillar
collagen genes of vertebrates (85 ) . A partial cDNA sequence codes for 478
amino acids of uninterrupted Gly-X-Y repeats followed by a globular domain
COLLAGEN GENES 859

of 252 amino acids. The globular domain contains seven Cys residues, a
carbohydrate attachment site, and a putative C-proteinase cleavage site. In
addition the cDNA shows conserved sequences in the putative cross-linking
sites in the telopeptide and the triple-helical domain . The gene structure of
this collagen chain shows a remarkable similarity with that of the vertebrate
fibril-forming collagen genes. The 14 exons coding for the triple-helical
domain that were characterized have sizes of 54 bp, 1 08 bp, and 1 62 bp and
always follow the Gly-X-Y pattern . Even the pattern of exon sizes follows
that of the fibril-forming collagen genes. It is clear that the vertebrate fibril­
forming collagen genes and this echinoid gene are closely related and derive
Annu. Rev. Biochem. 1990.59:837-872. Downloaded from www.annualreviews.org

from a common ancestor. The mRNA transcript of this developmentally
regulated gene is 6 kb in size, further confirming its close relationship with
the fibrillar collagen genes (85).
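
The arithmetic behind the exon sizes quoted above can be made explicit with a short sketch. It assumes only what the text states (sizes of 54, 108, and 162 bp, a 9-bp Gly-X-Y coding unit, and a 54-bp unit exon); the function name describe_exon is hypothetical and is introduced here for illustration.

```python
def describe_exon(size_bp, unit_bp=54):
    """Summarize a triple-helical exon size in residues and Gly-X-Y triplets."""
    if size_bp % 9 != 0:
        # 9 bp encode one Gly-X-Y triplet; other sizes would break the repeat
        raise ValueError(f"{size_bp} bp is not a whole number of Gly-X-Y triplets")
    return {
        "size_bp": size_bp,
        "residues": size_bp // 3,            # 3 bp per amino acid
        "gly_x_y_triplets": size_bp // 9,    # 9 bp per Gly-X-Y triplet
        "multiple_of_unit": size_bp % unit_bp == 0,
        "units": size_bp / unit_bp,          # size expressed in 54-bp units
    }


# Exon sizes reported for the sea urchin gene in the text
for size in (54, 108, 162):
    print(describe_exon(size))
```

Under these assumptions a 54-bp exon encodes 18 residues, that is, six Gly-X-Y triplets, and the 108-bp and 162-bp exons correspond to two and three such units.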
Another multiexon sea urchin (Strongylocentrotus purpuratus) collagen
gene has also been identified recently using a mouse α1(IV) collagen cDNA
probe (165). A segment of this gene hybridizes to a 9-kb mRNA. The
sequence of a partial cDNA of 2.7 kb codes for a helical domain with 10
interruptions in the Gly-X-Y repeat. The structure of this collagen resembles
that of the human type IV collagen, although most Gly-X-Y interruptions do
not appear to fall at the same places (G. Wessel, personal communication).
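
One simple way to locate interruptions of the Gly-X-Y repeat in a translated helical domain is sketched below. This is a simplification and not the method used in the cited work: it assumes a one-letter amino acid string, expects Gly at every third position, and resynchronizes on the next Gly after a miss, so a Gly in an X or Y position could mislead it. The function name and the toy peptides are hypothetical.

```python
def count_gly_xy_interruptions(helix):
    """Return 0-based indices where Gly is missing from an expected position."""
    helix = helix.upper()
    i = helix.find("G")                  # start at the first Gly
    interruptions = []
    while 0 <= i < len(helix):
        if helix[i] == "G":
            i += 3                       # Gly where expected; step one triplet
        else:
            interruptions.append(i)      # Gly was expected here but is absent
            i = helix.find("G", i)       # resynchronize on the next Gly (-1 ends the loop)
    return interruptions


# Toy peptides (hypothetical): a perfect repeat, and one with an extra residue
print(count_gly_xy_interruptions("GPPGPPGPPGPPGPP"))
print(count_gly_xy_interruptions("GPPGPPAGPPGPP"))
```

The length of the returned list gives the number of interruptions, and the indices can be compared between two sequences to ask whether interruptions fall at the same places.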

REGULATORY SEGMENTS IN COLLAGEN GENES

The activation of different collagen genes in specific cells or tissues during
embryonic development, their restricted expression in specific tissues of adult
organisms, and the changes in the expression of certain of these genes in
several disease states have prompted studies aimed at identifying the regulatory
segments of these genes. Moreover, in tissue culture cells, a number of
studies have indicated that several cytokines and products of oncogenes
influence the expression of certain collagen genes (123). Although such
influence has only been formally demonstrated in a limited number of cases, it
is likely that in many physiological and pathological situations, changes in
collagen gene transcription constitute the principal component responsible for
the changes in gene expression.
One crucial experimental system to determine which specific DNA sequences
of a gene have the ability to confer tissue-specific expression in intact
animals is provided by transgenic mice. Typically, a chimeric gene, in which
potential regulatory sequences are fused to a marker gene, is introduced in the
germline of mice and the expression of the marker gene monitored in different
tissues. Such an experiment was performed with a chimeric gene in which a
2000-bp fragment upstream of the start of transcription of the mouse α2(I)
gene was fused to the bacterial gene for chloramphenicol acetyl transferase
(CAT). In practically all the ensuing transgenic mouse strains, the expression of
the chimeric gene paralleled the expression of the endogenous
α2(I) collagen gene, i.e. high expression in tail, a tissue that is very rich in
type I collagen, somewhat less expression in bone and skin, and very little or
no expression in many other tissues (166; B. de Crombrugghe, G. Karsenty,
L. A. Garrett, unpublished results). Hence, the 2000-bp segment 5' to the
transcription start site was sufficient to confer tissue-specificity. A similar
experiment was performed with an α1(II) collagen promoter-CAT chimeric
gene, which showed selective expression in tissues in which the endogenous
type II collagen gene is expressed (167). In related experiments, a cosmid
clone containing the entire human α1(II) gene was microinjected into blastocysts
to generate chimeric mice. In these mice the transgene was specifically
expressed in cartilage tissues (168). The implications of these results are that
the collagen DNA sequences present in the transgenes contain the necessary
cis-acting elements, which by interacting with defined cellular factors determine
the expression of these transgenes in specific cells and tissues of
intact animals. Some of the trans-acting factors that are responsible for the
activation or repression of these genes have begun to be identified. We briefly
review here some of the regulatory elements that have been identified in a
limited number of collagen genes.

α1(I) and α2(I) Collagen Genes


SEQUENCES IN THE FIRST INTRON The regulatory elements of these genes
are found on either side of the respective transcription start sites. In the mouse
α2(I) gene, an enhancer element was located in the first intron of the gene
between +418 and +1524 (169). This enhancer was shown to activate either
the homologous α2(I) collagen promoter or a heterologous SV40 promoter in
transient expression experiments after DNA transfection of NIH-3T3 fibroblasts.
With the α2(I) collagen promoter as test promoter, the intron sequences
were placed 3' to the promoter, whereas with the SV40 promoter the
intron sequences were placed 5' to this promoter. In each instance enhancing
activity was found regardless of the orientation of the intron sequence.
The insertion of mouse leukemia retroviral sequences in the first intron of
the mouse α1(I) collagen gene resulted in a block in transcription of this gene
(69, 170). Heterozygote mice that harbored the inserted sequences in one
allele showed a phenotype corresponding to a mild form of Osteogenesis
Imperfecta (type I), because only 50% of the normal amount of type I
collagen was synthesized (J. Bonadio, personal communication). When both
α1(I) genes were inactivated by the inserted DNA, the homozygote mouse
embryos died around the 12th day of gestation (171). The block in transcription
of the α1(I) collagen gene was associated with the abolition of a
characteristic DNAse I hypersensitive site in the promoter and an abnormal
methylation pattern of the promoter (172). One possible explanation for the
absence of transcription of the α1(I) collagen gene is that the inserted viral
sequence would disrupt an enhancer located in the first intron. Interestingly,
recent organ culture experiments have indicated that this block in transcription
was overcome in odontoblasts (173).
A DNA segment with some of the properties of enhancing elements was
identified in a 782-bp fragment of the first intron of the human α1(I) gene
between +820 and +1602 using a heterologous frog oocyte microinjection
assay system (174). For these experiments the α1(I) collagen intron DNA
segment was inserted in an α1(I) collagen promoter-α-globin chimeric gene,
either in the first intron of the α-globin gene or 5' to the α1(I) collagen
promoter sequence. After microinjection of the DNA in frog oocytes, the
levels of α-globin RNA were measured and found to be increased over those
seen with control chimeric genes lacking the intron sequences. Activation was
observed when intron sequences were placed in both orientations within the
α-globin intron but only in the inverted orientation when the intron segment
was placed 5' to the promoter. Similar results were obtained, again by
measuring the levels of globin RNA, after DNA transfection of NIH-3T3
fibroblasts using the construction in which the α1(I) collagen gene intron
segment was inserted in the first intron of the α-globin gene.
Upon further dissection of the first intron of the human α1(I) collagen gene,
both positive and negative cis-acting elements were identified in this gene
segment. Indeed when α1(I) collagen chimeric genes were transfected into
chick embryo tendon fibroblasts, a negative element was located between 820
and 1093 (175). This element inhibited either the SV40 promoter or the α1(I)
collagen promoter when placed 5' of these promoters in either orientation.
When placed 3' of the α1(I) collagen promoter, the element was neutral in the
normal orientation but had a strong negative effect in the opposite orientation
(176). This strong negative effect when the element was located in the
opposite orientation 3' to the promoter appeared to require a sequence between
-625 and -161 in the α1(I) collagen promoter (177). Deletion of this
promoter sequence abolished the strong negative effect, but was apparently
without effect when the intronic element was placed in the correct orientation.
It is unclear why the negative effect of the 820-1093 intron segment was only
observed in the opposite orientation when placed 3' to the promoter.
Additional evidence has also been presented for the existence of positive
elements in the first intron of the α1(I) gene on each side of the "negative"
element. One element (+292 to +671) stimulated the α1(I) promoter when
placed 3' of this promoter in the correct orientation but was inactive in the
opposite orientation (177). However, another segment, which partially overlaps
with the previous segment, stimulated the promoter in both orientations
(177). It should be noted that the latter activating elements in the α1(I) intron
produced only a modest increase in the activity of the promoter, much less
than what is normally observed with viral enhancers such as the SV40
enhancer.
At this stage, our understanding of the regulatory elements in the first
introns of the α1(I) and α2(I) collagen genes remains very incomplete. The
choice of appropriate tissue culture cells used in DNA transfection is probably
of considerable importance and the overall interpretation of some experiments
using heterologous expression systems may be difficult. Furthermore, studies
with deletions of large DNA segments, although useful in an initial dissection
of potential regulatory elements, are often insufficient and need to be complemented
with much more precise site-specific mutations to better understand
the function of individual cis-acting elements in the first intron and
other regulatory sites of these genes.
A CONSERVED INVERTED REPEAT SEQUENCE IN THE FIRST EXON In the
first exon of the α1(I), α2(I), and α1(III) genes, a sequence of approximately
50 bp located around the respective collagen translation initiation codon
shows a remarkable degree of conservation from chickens to humans (178,
179). This conserved sequence is an inverted repeat sequence, and the
predicted secondary structure in the RNA has been demonstrated by RNAse
digestion experiments in vitro (180). This sequence is, however, not found in
all fibrillar collagen genes since it is missing in the α1(II) collagen gene
(119). The high degree of conservation of this inverted repeat in several
fibrillar collagen genes in different species and its location around the start of
translation prompted the hypothesis that this sequence may have a role in
translational control (178). However, evidence has been presented, based on
one partial deletion of this segment, suggesting that the sequence has no
influence on translation (181). On the other hand, unpublished observations in
the laboratory of one of the authors using additional deletions do suggest a
role for this sequence in influencing efficiency of translation (A. Schmidt, P.
Rossi, B. de Crombrugghe, unpublished results). Here also, additional experiments
are needed to understand the role of this conserved inverted repeat
sequence.
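
For readers unfamiliar with the term, an inverted repeat is a segment followed at a short distance by its own reverse complement, so that the two arms can pair as a stem in the transcript. The sketch below shows one minimal way to search for such arms. The toy sequence is hypothetical and is not the conserved collagen sequence; the arm length and loop size are arbitrary assumptions, and the function names are introduced here for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=6, max_loop=10):
    """Return (left_start, right_start, left_arm) for arms able to pair as a stem."""
    seq = seq.upper()
    hits = []
    for left in range(len(seq) - 2 * arm + 1):
        left_arm = seq[left:left + arm]
        target = reverse_complement(left_arm)
        first_right = left + arm                         # arms may not overlap
        last_right = min(first_right + max_loop, len(seq) - arm)
        for right in range(first_right, last_right + 1):
            if seq[right:right + arm] == target:
                hits.append((left, right, left_arm))
    return hits


# Toy sequence (hypothetical): GCCATG ... CATGGC separated by a 4-nt loop
print(find_inverted_repeats("TTGCCATGAAAACATGGCTT"))
```

A search of this kind only flags candidate stems; whether the structure forms in the RNA has to be tested experimentally, as in the RNAse digestion experiments cited above.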

PROMOTER ELEMENTS Upstream of the start of transcription of the mouse
α2(I) and α1(I) genes, several specific binding sites for DNA-binding factors
present in nuclear extracts of NIH-3T3 fibroblasts and other cells have been
identified. To obtain evidence that the factors identified by DNA-binding
assays also play a role in the control of these genes, site-specific mutations
that abolish the binding of these factors were generated and the effects of
these mutations on promoter activity assayed in DNA transfection experiments
(182, 183; G. Karsenty, B. de Crombrugghe, unpublished results).
One factor that was shown to bind to a CCAAT motif in the mouse α2(I) gene
(-84 to -80) and in the mouse α1(I) gene (-96 to -100) is an interesting
heterodimer, which belongs to a group of CCAAT-binding proteins (184,
185). Both subunits of this protein have been purified to homogeneity; one
has a Mr of 40,000-42,000, the other of 34,000 (S. Maity, T. Vuorio, and B.
de Crombrugghe, unpublished results). Both components are needed for DNA
binding.
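
Positions such as -84 to -80 are counted relative to the transcription start site, with -1 as the base immediately upstream of +1. A minimal sketch of how a motif can be located on either strand of an upstream segment and reported in these coordinates follows. The 120-nt sequence is a toy construct built so that CCAAT falls at -84 to -80; it is not the real promoter sequence, and the function name find_motif is hypothetical.

```python
def find_motif(upstream, motif="CCAAT"):
    """Locate a motif on either strand of an upstream region ending at position -1."""
    upstream = upstream.upper()
    n = len(upstream)
    comp = str.maketrans("ACGT", "TGCA")
    # "+" is the motif as written; "-" is its reverse complement on the same string
    targets = {"+": motif, "-": motif.translate(comp)[::-1]}
    hits = []
    for strand, pattern in targets.items():
        start = upstream.find(pattern)
        while start != -1:
            # index 0 is position -n, so index i maps to position i - n
            hits.append((start - n, start - n + len(pattern) - 1, strand))
            start = upstream.find(pattern, start + 1)
    return sorted(hits)


# Toy upstream region (hypothetical filler, not the collagen promoter)
toy_promoter = "G" * 36 + "CCAAT" + "G" * 79
print(find_motif(toy_promoter))
```

Reporting the strand alongside the coordinates is a design choice of this sketch; it keeps a motif read on the opposite strand distinguishable from one read on the coding strand.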
Furthermore, a highly purified preparation of this factor stimulated accurate
initiation of transcription of both the α2(I) and α1(I) collagen genes in a
reconstituted in vitro transcription system (185). The specificity of this
transcriptional stimulation was shown by the fact that an α2(I) collagen DNA
template containing a mutation in the binding site at -84, which essentially
abolished the binding of the factor, failed to be transcriptionally stimulated. In
DNA transfection experiments, the same mutation also strongly inhibited the
activity of both the α2(I) and α1(I) promoters (182; G. Karsenty, B. de
Crombrugghe, unpublished results). The similarity in the effects of mutations
in three different assays, i.e. DNA binding, in vitro transcription, and expression
in fibroblasts after DNA transfection, argues for a probable physiological
role of this factor in the control of the α2(I) and α1(I) collagen genes. In
chromatin, the segment around the binding site for the CCAAT-binding
protein in the α2(I) gene was also highly sensitive to DNAse I and to
restriction enzymes in cells expressing the α2(I) gene (186, 187). A similar
DNAse I hypersensitive site was found in the α1(I) promoter at approximately
the same location (188, 189).
Recent experiments have indicated that two additional factors bind to
sequences immediately 5' to the binding site for the CCAAT-binding protein
in both promoters (G. Karsenty, B. de Crombrugghe, unpublished results).
Based on the effects that mutations in the binding sites for these factors
displayed in DNA transfection experiments, the factors appeared to be negative
regulatory factors. These two negative factors and the positive CCAAT-binding
factor presumably participate in the coordinate control of the two type
I collagen genes. Hence, at least three different factors appear to interact at
approximately the same location of the promoters in these two coordinately
controlled genes. A number of additional binding sites for nuclear proteins
were shown to be present further upstream in these two genes (182, 183; R.
Ravazzolo, G. Karsenty, B. de Crombrugghe, unpublished results). A comprehensive
understanding of the control of these genes will require a systematic
study of these various factors and their binding sites. More work also needs
to be performed to understand the mechanisms that determine the tissue-specific
expression of these genes.
One plausible hypothesis to account for the multiplicity of transcription
factors that bind to the promoter and enhancer segments of these genes is that
they could correspond to the diversity of cytokines, such as TGF-β, IL-1,
TNF-α, and others, which by binding to specific cell-surface receptors influence
the expression of these genes (190-194). One example of such
relationships between the cytokine TGF-β and a factor that binds to the -300
sequence in the mouse α2(I) collagen gene is discussed next. Indeed, a factor
identified as CTF/NF-1 was shown to bind around -300 in the mouse α2(I)
collagen gene (195). Although the binding site contained a CCAA motif, the
factor that was binding to the -300 site in the mouse α2(I) promoter did not
bind to the -80 CCAAT motif. The -300 site mediates the activation of an α2(I)
collagen promoter-CAT chimeric gene, which is produced by TGF-β treatment
of NIH-3T3 fibroblasts and rat osteosarcoma cells transfected with this
chimeric gene (196). A 3-bp substitution mutation in the binding site, which
abolished the binding of NF-1, also prevented the induction of the promoter
by TGF-β. Insertion of an oligonucleotide, corresponding to the binding site
for NF-1 in the mouse α2(I) collagen gene, upstream of the early promoter of
SV40, conferred TGF-β inducibility to this promoter, whereas the SV40
promoter itself was not responsive to TGF-β. Insertion of the same oligonucleotide
with a 3-bp substitution mutation, which abolished the binding of
NF-1, did not confer TGF-β inducibility. These experiments suggested that
NF-1 could mediate the transcriptional activation of the α2(I) collagen promoter
by TGF-β. However, no differences in the binding of NF-1 to its binding
site in the α2(I) collagen promoter were observed when extracts of cells that
were either treated or not treated with TGF-β were compared (C. Ruteshouser,
B. de Crombrugghe, unpublished results). One possible model to account
for the effect of TGF-β would be that the hormone does not affect the
DNA-binding properties of NF-1 but could change its transcriptional activation
function.
It should be noted that TGF-β may also increase the expression of the type I
collagen genes by other mechanisms. Evidence has been presented suggesting
that under certain conditions of tissue culture, the stability of α1(I) mRNA
was increased by TGF-β (114, 197). TGF-β also increased expression of the
Jun/AP1 transcription factor, suggesting that TGF-β could activate transcription
by other mechanisms as well (198).

α1(III) Collagen Gene


This gene is coexpressed with the type I collagen genes in a number of tissues
such as skin and smooth muscle cells. In bone, however, a major site for type
I collagen synthesis, no type III collagen is found.
DNA transfection studies with increasing 5' deletions in the promoter
suggested the existence of negative regulatory elements, but the precise
location of these elements has not yet been identified (199). Two factors
present in nuclear extracts of NIH-3T3 fibroblasts were shown to bind specifically
to defined sequences within a 150-bp segment immediately upstream of
the start of transcription (200). One of these factors is probably AP1 or a
related protein. The other factor is an unknown protein. Mutations in the
binding sites for each of these factors inhibited the promoter in DNA transfection
experiments. The heterodimeric CCAAT factor, which was shown to
activate the α1(I) and α2(I) genes, did not bind to this segment of the α1(III)
collagen promoter. Several other factors, however, were shown to bind to a
segment more upstream than -200. From the limited information that is
available, it is clear that the factors that control the α1(III) gene are at least in
part different from those that control the type I collagen genes.
α1(II) Collagen Gene


A cell-specific enhancer element has been identified in the first intron of the
rat gene, which increases activity of the promoter in cultured chondrocytes
about 20-fold (201). The segment, which includes approximately 550 bp,
activates the α1(II) promoter in chondrocytes but not in fibroblasts and
myoblasts.

Type IV Collagen Genes


The human α1(IV) and α2(IV) collagen genes are arranged in a head-to-head
configuration on chromosome 13q34 and are transcribed from opposite
strands with only 41-130 bp between the transcription start site of the α1(IV)
gene and the multiple start sites of the α2(IV) gene (202-205). This intragenic
segment contains no recognizable TATA box, an element that directs the
RNA Polymerase II enzyme to transcribe from a specific start site. The first
intron of the α1(IV) gene was shown to contain an enhancer that could
activate both promoters as well as the heterologous thymidine kinase promoter
in differentiated F9 embryonal carcinoma cells (206). This enhancer
presented characteristics of cell specificity, since it lacked activity in NIH-3T3
fibroblasts and in dermal fibroblasts. When chimeric genes containing
the common promoter and α1(IV) intron enhancer element were stably transfected
in undifferentiated F9 cells, no promoter activity was detected, but the
activity could be induced by the addition of retinoic acid and cyclic AMP,
compounds that were known to stimulate the expression of type IV collagen
genes and other genes for basement membrane components in these cells.
Interestingly, the stably integrated collagen IV enhancer-promoter-CAT
chimeric gene could also be induced by 5-azacytidine, suggesting that the
methylation status of the gene might also be important in controlling the
activity of these genes (206).

CONCLUSIONS

Information about the molecular structures of the different collagens, their
supramolecular assemblies, and the structures of their genes, is much more
complete than information about the regulatory DNA elements that control the
expression of these genes.
Some general conclusions emerge from a comparison of the structural
properties of different collagen genes.

1. Collagens with rigid rodlike supramolecular structures in which individual
collagen molecules interact with each other in a highly organized
quarter-stagger array show no interruptions in the Gly-X-Y repeat. The sizes
of their exons are all strictly multiples of 9 bp. All exons are either 54 bp long
or are easily recognized as having derived from 54-bp exons. All triple-helical
exons contain a complete codon for Gly at their 5' end and a complete codon
for Y at their 3' end. No exons with split codons are found. The pattern of
succession of triple-helical exon sizes shows practically no variations between
the different genes for fibril-forming collagens.


2. The importance of the rigid, redundant characteristics of these molecules,
which are determined by the strict conservation of the exon structures
of their genes, is illustrated by the fact that point mutations in one allele that
interrupt the triple-helix by substituting a single Gly codon severely disrupt
the triple-helical assembly of the collagen molecules and probably also the
supramolecular structure of the fibrils.
3. The establishment of the exon-pattern of the genes for fibril-forming
collagens preceded the vertebrate-invertebrate radiation, since a sea urchin
collagen gene shows a structure that is very similar in its exon organization to
the exon pattern of the vertebrate fibril-forming genes.
4. Collagens that have different supramolecular structures show different
degrees in the number of interruptions of the Gly-X-Y repeats. Type IX
collagen, which interacts laterally over a significant portion of its triple-helical
segment with type II collagen fibrils, shows few interruptions, whereas
type IV collagen, which forms a loose flexible network in which only a
small segment of the triple-helix participates in lateral interactions with other
type IV collagen molecules, shows many more interruptions. However, in the
portion of the type IV triple-helix where lateral interactions take place, no
interruptions occur. We suspect that the number of interruptions of the
Gly-X-Y repeat depends on the type of supramolecular structure of the
individual collagens.
5. Genes for non-fibril-forming collagens show varying numbers of triple-helical
exons with split codons. A small number of these occur in type IX
collagen exons, many more in type IV collagen exons. The split codons occur
nonrandomly within the 9-bp sequence coding for Gly-X-Y and always
involve the first G residue of a Gly codon. Exons with split codons show
either an additional G residue at their 3' end, or lack the first G residue of the
Gly codon at their 5' end, or both. This remarkable nonrandomness suggests
that these split codons arose as secondary events occurring in exons that
initially contained a complete Gly codon at their 5' end and a complete Y
codon at their 3' end and had lengths that were exact multiples of 9 bp.
6. The considerable number of 54-bp exons in the α2 type IX collagen
gene suggests that the ancestor of this gene arose in the same manner as the
ancestral gene for the fibril-forming collagens, by amplification of a DNA
segment containing a 54-bp exon. The specific functions and supramolecular
interactions of type IX collagen must have tolerated the small number of
interruptions in the triple-helix and the small number of exons with split
codons that were obviously not tolerated in the fibril-forming collagens.


7. The genes that show the largest number of triple-helical exons with split
codons such as the type IV collagen genes also show the largest variations in
exon sizes. We postulate that the gene products that easily tolerated the
changes in structure that must have accompanied during evolution the occurrence
of split codons, also tolerated much more frequent variations in exon
sizes than other collagens.
8. The structures of many collagen genes still display the marks of the
exon assembly patterns that led to the successful acquisition of unique
molecular and supramolecular structures for the products of their genes. For
the collagens, in which tight triple-helices assemble in highly organized
supramolecular fibrils, the corresponding genes display the clearest marks of
these assembly pathways as they show little variations in exon structure. In
contrast, for the collagens that form different supramolecular structures and
use less lateral aggregation between collagen molecules, the corresponding
genes presumably underwent many more changes in exon structure, erasing at
the same time to various degrees the traces of these evolutionary assembly
pathways.
9. Our understanding of the regulatory elements of the various collagen
genes is still at an early stage. Although only a small number of cis-acting
elements and their cognate trans-acting factors have been identified and
characterized, it is clear that a multiplicity of control elements exist in the
genes that have been examined. It is likely that this multiplicity corresponds
to a diversity of regulatory mechanisms which can influence the expression of
these genes. The important tasks in this area now are to identify the cellular
mechanisms that determine the tissue-specific expression of the different
collagen genes, the mechanisms whereby these genes respond to various
cytokines and hormones that influence their expression, and the mechanisms
that ensure the coordinate expression of two or more genes that specify the
different polypeptide chains of a single collagen molecule. In addition a
systematic characterization of the different trans-acting factors that participate
in the control of individual genes will be needed to achieve a comprehensive
understanding of the regulation of these genes.
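
The exon rules stated in conclusions 1 and 5 above can be expressed as a small consistency check. The sketch assumes, as the text proposes, that a split-codon exon derives from an exon whose length was an exact multiple of 9 bp by gaining one G at its 3' end, losing the first G of the Gly codon at its 5' end, or both. The function name, its arguments, and the example lengths are hypothetical.

```python
def classify_exon(length_bp, lacks_5prime_g=False, extra_3prime_g=False):
    """Label a triple-helical exon according to the split-codon rules."""
    # Restore the length the exon would have with complete Gly and Y codons
    restored = length_bp + int(lacks_5prime_g) - int(extra_3prime_g)
    if restored % 9 != 0:
        return "inconsistent with the Gly-X-Y rules"
    if lacks_5prime_g and extra_3prime_g:
        return "split codons at both ends"
    if lacks_5prime_g:
        return "split codon at 5' end"
    if extra_3prime_g:
        return "split codon at 3' end"
    if length_bp % 54 == 0:
        return "complete codons, 54-bp multiple"
    return "complete codons"


# Example lengths are illustrative only
print(classify_exon(54))
print(classify_exon(55, extra_3prime_g=True))
print(classify_exon(53, lacks_5prime_g=True))
print(classify_exon(54, lacks_5prime_g=True, extra_3prime_g=True))
```

Note that an exon with split codons at both ends keeps a length that is a multiple of 9 bp, so length alone cannot distinguish it from an exon with complete codons; the phase at the exon boundaries has to be supplied separately, which is why the sketch takes it as explicit arguments.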
ACKNOWLEDGMENTS
We thank Martha Trinkle for editorial assistance. We are grateful to the many
colleagues who generously provided manuscripts prior to publication and
apologize that only a fraction of the valuable information could be cited.
Work in the authors' laboratory was supported by National Institutes of
Health grants R01-HL41264-02 and R01-CA49515-01 (BdC) and by the
Finnish Academy.

Literature Cited
1. Burgeson, R. E. 1988. Annu. Rev. Cell 20. Birk, D. E . , Fitch, J. M . , Babiarz, J. P. ,


B ioI. 4:551-77 Linsenmayer, T. F. 1988. 1. Cell B ioI.
2. Mayne, R . , Burgeson, R. E . , eds. 1987. 106:999- 1 008
Structure and Function of Collagen 21. Vaughan, L . , Mendler, M . , Huber, S . ,
Ty pes. New York: Academic. 3 1 7 pp. Bruckner, P . , Winterhalter, K . H . , e t aL
3. Kivirikko. K . , Myllylii, R. 1985. Ann. 1988. J. Cell BioI. 106: 991-97


NY Acad. Sci. 460:187-20 1 22. Miller, E. J . 1 985. Ann. NY Acad. Sci.
4. Kuhn, K . 1 987. See Ref. 2, pp. 1-42 460:1-13
5. Myers, 1. C . , Emanuel, B. S . 1987. 23. Brodsky, B . , Eikenberry, E. 1 9 85 . Ann.
Collagen Relat. Res. 7: 149-59 Sci. 460:73-84
NY Acad.
6 . Tsipouras, P., Schwartz, R . C . , Liddell, 24. Eyre, D.
R . , Paz, M . A. , Gallop, P. M.
A. c . , Salke ld , C . S . , Weil, D . , et al. 1984. Annu. Rev. Biochem. 53:717-48
19 88 . Genomics 3:275-77 25 . Bernard, M. P . , Chu, M . -L. , Myers, J .
7. Hostikka, S. L. , Eddy, R. L . , Byers, M. c . , Ramirez, F . , Eikenberry, E . F . , et
G . , Hoyhtya, M . , Shuws, T. B . , et al. al. 1 983. B iochemistry 22:5213-23
1990. Proc. Natl. Acad. Sci. USA . 87: 26. Tromp, G . , Kuivaniemi , H . , Stacey,
1606-10 A . , Shikata, H., B aldwin , C. T . , et al.
8. Weil, D . , Mattei, M . - G . , Passage, E . , 1988. Biochem. J. 253:919-22
Van Cong, N'G. , Pribula-Conway, D . , 27. Bernard, M. P . , Myers, J. C . , Chu,
et al. 1988. A m . J . Hum . Genet. M . -L. , Ramirez, F . , Eikenberry , E. F. ,
42:435-45 et al. 1983. Biochemistry 22:1 1 39-45
9. Kimura , T . , Mattei, M . - G . , Stevens, J. 28. Lee, S . -T . , Smith, B. D . , Greenspan ,
W . , Goldring, M . B . , Ninomiya, Y., et D. S. 1988. J. BioI. Chem. 263:134 1 4-
al. 1 989. Eur. 1. Biochem. 1 79:71-78 18
10. Henry, I . , Bernheim, A . . Bernard, M . , 29. Kuivaniemi, H . , Tromp, G . , Chu,
van der Rest, M . , Kimura, T . , et al. M.-L. , Prockop, D. J. 1988. B iochem.
1988. Genomics 3:87-90 J. 252:633-40
I I . Kimura, T . , Cheah, K. S . E . , Chan, S. 30. Elima, K . , Vuorio, T . , Vuorio, E. 1987.
D. H . , Lui, V . C . H . , Mattei, M . -G . , et Nucleic Acids Res. 15:9499-504
al. 1989. J. B ioi. Chern. 264:13910--16 31. Baldwin, C. T . , Reginato, A. M . ,
12. Shows, T. B . , Tikka, L . , Byers, M. G . , Smith, c . , Jimenez, S. A. , Prockop, D .
Eddy, R . L. , Haley, L. L . , et al. 1989. 1 . 1989. Biochem. J . 262:521-28
Genomics 5:128-33 3 1 a. Su, M . -W . , Lee, B . , Ramirez, F . ,
13. Fessler, J. H . , Fessler, L. I. 1 987. See Machadu, M . , Horton, W . 1989. Nucle­
Ref. 2, pp. 81-103 ic Acids Res. 17:9473
14. Eyre, D . , Wu, J. J . 1987. See Ref. 2, 32. Loid1, H . R . , Brinker, J. M . , May, M . ,
pp. 26 1 -81 Pihlajanicmi, T . , Morrow, S . , e t al.
15. Burgeson, R . E . , Hollister, D. W. 1979. 1984. Nucleic Acids Res . 12: 9383-94
Biochem. Biophys. Res. Commun. 33. Toman , P. D . , Ricca, G . A . , de Crom­
87: 1124--3 1 brugghe, B . 1988. Nucleic Acids Res.
16. Niyibizi, C . , Eyre, D. 1989. FEBS Le tt. 16:7201
242:314--18 34. Ala-Kokko, L . , Kontusaari, S . , Bald­
17. Henkel, W. , Glanville, R. W. 1982. win, C. T . , Kuivaniemi, H . , Prockop,
Eur. 1. Biochem. 122:205-13 D. 1. 1989. Biochem. J. 260:509-16
1 8 . Keene, D. R . , Sakai, L. Y. , Biichinger, 35. Mankoo, B . S . , Dalgleish, R. 1988.
H. P . , Burgeson, R. E. 1987. J. Cell Nucle ic Acids Res. 16:2337
B ioI. 105:2393-402 36. Janeczko, R. A., Ramirez, F. 1989.
19. Adachi, E . , Hayashi, T. 1986. Connect. Nucleic Acids Res. 17:6742
Tissue Res. 1 4 :257-66 37. Myers, J. C., Loidl, H. R . , Stolle, C.
A., Seyer, J. M. 1985. J. Biol. Chem. 260:5533-41
38. Myers, J. C., Loidl, H. R., Seyer, J. M., Dion, A. S. 1985. J. Biol. Chem. 260:11216-22
39. Weil, D., Bernard, M., Gargano, S., Ramirez, F. 1987. Nucleic Acids Res. 15:181-98
40. Woodbury, D., Benson-Chanda, V., Ramirez, F. 1989. J. Biol. Chem. 264:2735-38
41. Bernard, M., Yoshioka, H., Rodriguez, E., van der Rest, M., Kimura, T., et al. 1988. J. Biol. Chem. 263:17159-66
42. Ninomiya, Y., van der Rest, M., Mayne, R., Lozano, G., Olsen, B. R. 1985. Biochemistry 24:4223-29
43. van der Rest, M., Mayne, R., Ninomiya, Y., Seidah, N. G., Chretien, M., et al. 1985. J. Biol. Chem. 260:220-25
44. McCormick, D., van der Rest, M., Goodship, J., Lozano, G., Ninomiya, Y., et al. 1987. Proc. Natl. Acad. Sci. USA 84:4044-48
45. Ninomiya, Y., Castagnola, P., Gerecke, D., Gordon, M., Jacenko, D., et al. 1989. Collagen Genes: Structure, Regulation and Abnormalities, ed. C. Boyd, L. Sandell, P. Byers. New York: Academic. In press
46. Vasios, G., Nishimura, I., Konomi, H., van der Rest, M., Ninomiya, Y., et al. 1988. J. Biol. Chem. 263:2324-29
47. Gordon, M. K., Gerecke, D. R., Olsen, B. R. 1987. Proc. Natl. Acad. Sci. USA 84:6040-44
48. Gordon, M. K., Gerecke, D. R., Dublet, B., van der Rest, M., Olsen, B. R. 1989. J. Biol. Chem. 264:19772-78
49. Ninomiya, Y., Olsen, B. R. 1984. Proc. Natl. Acad. Sci. USA 81:3014-18
50. Ninomiya, Y., Gordon, M., van der Rest, M., Schmid, T., Linsenmayer, T., et al. 1986. J. Biol. Chem. 261:5041-50
51. Yamaguchi, N., Benya, P. D., van der Rest, M., Ninomiya, Y. 1989. J. Biol. Chem. 264:16022-29
52. Pihlajaniemi, T., Tryggvason, K., Myers, J. C., Kurkinen, M., Lebo, R., et al. 1985. J. Biol. Chem. 260:7681-87
53. Soininen, R., Haka-Risku, T., Prockop, D. J., Tryggvason, K. 1987. FEBS Lett. 225:188-94
54. Brazel, D., Oberbäumer, I., Dieringer, H., Babel, W., Glanville, R. W., et al. 1987. Eur. J. Biochem. 168:529-36
55. Hostikka, S. L., Tryggvason, K. 1988. J. Biol. Chem. 263:19488-93
56. Brazel, D., Pollner, R., Oberbäumer, I., Kühn, K. 1988. Eur. J. Biochem. 172:35-[illegible]
57. Chu, M.-L., Mann, K., Deutzmann, R., Pribula-Conway, D., Hsu-Chen, C.-C., et al. 1987. Eur. J. Biochem. 168:309-17
58. Chu, M.-L., Conway, D., Pan, T., Baldwin, C., Mann, K., et al. 1988. J. Biol. Chem. 263:18601-6
59. Chu, M.-L., Pan, T., Conway, D., Kuo, H.-J., Glanville, R. W., et al. 1989. EMBO J. 8:1939-46
60. Chu, M.-L., Pan, T., Conway, D., Saitta, B., Stokes, D., et al. 1990. Ann. NY Acad. Sci. 580:55-63
61. Pihlajaniemi, T., Myllylä, R., Seyer, J., Kurkinen, M., Prockop, D. J. 1987. Proc. Natl. Acad. Sci. USA 84:940-44
62. Pihlajaniemi, T., Tamminen, M., Sandberg, M., Hirvonen, H., Vuorio, E. 1990. Ann. NY Acad. Sci. 580:440-43
63. Yamada, Y., Avvedimento, V. E., Mudryj, M., Ohkubo, H., Vogeli, G., et al. 1980. Cell 22:887-92
64. Wozney, J., Hanahan, D., Tate, V., Boedtker, H., Doty, P. 1981. Nature 294:129-35
65. Boedtker, H., Finer, M., Aho, S. 1985. Ann. NY Acad. Sci. 460:85-116
66. de Wet, W., Bernard, M., Benson-Chanda, V., Chu, M.-L., Dickson, L., et al. 1987. J. Biol. Chem. 262:16032-36
67. Chu, M.-L., de Wet, W., Bernard, M., Ding, J.-F., Morabito, M., et al. 1984. Nature 310:337-40
68. Monson, J. M., Friedman, J., McCarthy, B. J. 1982. Mol. Cell. Biol. 2:1362-71
69. Harbers, K., Kuehn, M., Delius, H., Jaenisch, R. 1984. Proc. Natl. Acad. Sci. USA 81:1504-8
70. Schnieke, A., Dziadek, M., Bateman, J., Mascara, T., Harbers, K., et al. 1987. Proc. Natl. Acad. Sci. USA 84:764-68
71. Upholt, W. B., Sandell, L. J. 1986. Proc. Natl. Acad. Sci. USA 83:2325-29
72. Upholt, W. B., Strom, C. M., Sandell, L. J. 1985. Ann. NY Acad. Sci. 460:130-40
73. Upholt, W. B. 1989. Collagen, Vol. IV: Molecular Biology, ed. B. R. Olsen, M. E. Nimni, pp. 31-49. Boca Raton, Fla: CRC
74. Cheah, K. S. E., Stoker, N. G., Griffin, J. R., Grosveld, F. G., Solomon, E. 1985. Proc. Natl. Acad. Sci. USA 82:2555-59
75. Stoker, N. G., Cheah, K. S. E., Griffin, J. R., Pope, F. M., Solomon, E. 1985. Nucleic Acids Res. 13:4613-22
76. Sangiorgi, F. O., Benson-Chanda, V., de Wet, W. J., Sobel, M. E., Tsipouras,
P., et al. 1985. Nucleic Acids Res. 13:2207-25
77. Sangiorgi, F. O., Benson-Chanda, V., de Wet, W. J., Sobel, M. E., Ramirez, F. 1985. Nucleic Acids Res. 13:2815-26
78. Nunez, A. M., Kohno, K., Martin, G. R., Yamada, Y. 1986. Gene 44:11-16
79. Yamada, Y., Mudryj, M., Sullivan, M., de Crombrugghe, B. 1983. J. Biol. Chem. 258:2758-61
80. Yamada, Y., Liau, G., Mudryj, M., Obici, S., de Crombrugghe, B. 1984. Nature 310:333-37
81. Chu, M.-L., Weil, D., de Wet, W., Bernard, M., Sippola, M., et al. 1985. J. Biol. Chem. 260:4357-63
82. Benson-Chanda, V., Su, M.-W., Weil, D., Chu, M.-L., Ramirez, F. 1989. Gene 78:255-65
83. Wood, L., Theriault, N., Vogeli, G. 1987. Gene 61:225-30
84. Ramirez, F., Bernard, M., Chu, M.-L., Dickson, L., Sangiorgi, F., et al. 1985. Ann. NY Acad. Sci. 460:117-29
85. D'Alessio, M., Ramirez, F., Suzuki, H., Solursh, M., Gambino, R. 1989. Proc. Natl. Acad. Sci. USA 86:9303-7
86. Prockop, D. J., Constantinou, C. D., Dombrowski, K. E., Hojima, Y., Kadler, K. E., et al. 1989. Am. J. Med. Genet. 34:60-67
87. Byers, P. H., Bonadio, J. F., Cohn, D. H., Starman, B. J., Wenstrup, R. J., et al. 1988. Ann. NY Acad. Sci. 543:117-28
88. Byers, P. H. 1989. Am. J. Med. Genet. 34:72-80
89. Tsipouras, P., Ramirez, F. 1987. J. Med. Genet. 24:2-8
90. Lamande, S. R., Dahl, H.-H. M., Cole, W. G., Bateman, J. F. 1989. J. Biol. Chem. 264:15809-12
91. Lee, B., Vissing, H., Ramirez, F., Rogers, D., Rimoin, D. 1989. Science 244:978-80
92. Vissing, H., D'Alessio, M., Lee, B., Ramirez, F., Godfrey, M., et al. 1989. J. Biol. Chem. 264:18265-67
93. Tromp, G., Kuivaniemi, H., Shikata, H., Prockop, D. J. 1989. J. Biol. Chem. 264:1349-52
94. Superti-Furga, A., Steinmann, B., Ramirez, F., Byers, P. H. 1989. Hum. Genet. 82:104-8
95. Stacey, A., Bateman, J., Choi, T., Mascara, T., Cole, W., et al. 1988. Nature 332:131-36
96. Byers, P. H., Starman, B. J., Cohn, D. H., Horwitz, A. L. 1988. J. Biol. Chem. 263:7855-61
97. Weil, D., Bernard, M., Combates, N., Wirtz, M. K., Hollister, D. W., et al. 1988. J. Biol. Chem. 263:8561-64
98. Tromp, G., Prockop, D. J. 1988. Proc. Natl. Acad. Sci. USA 85:5254-58
99. Weil, D., D'Alessio, M., Ramirez, F., de Wet, W., Cole, W. G., et al. 1989. EMBO J. 8:1705-10
100. Weil, D., D'Alessio, M., Ramirez, F., Steinmann, B., Wirtz, M. K., et al. 1989. J. Biol. Chem. 264:16804-9
101. Bonadio, J., Byers, P. H. 1985. Nature 316:363-66
102. Williams, C. J., Prockop, D. J. 1983. J. Biol. Chem. 258:5915-21
103. Prockop, D. J., Kivirikko, K. I. 1984. New Engl. J. Med. 311:376-86
104. Vogel, B. E., Doelz, R., Kadler, K. E., Hojima, Y., Engel, J., et al. 1988. J. Biol. Chem. 263:19249-55
105. Kuivaniemi, H., Sabol, C., Tromp, G., Sippola-Thiele, M., Prockop, D. J. 1988. J. Biol. Chem. 263:11407-13
106. Prockop, D. J., Olsen, A., Kontusaari, S., Hyland, J., Ala-Kokko, L., et al. 1990. Ann. NY Acad. Sci. 580:330-39
107. Starman, B. J., Eyre, D., Charbonneau, H., Harrylock, M., Weis, M. A., et al. 1989. J. Clin. Invest. 84:1206-14
108. Pihlajaniemi, T., Dickson, L. A., Pope, F. M., Korhonen, V. R., Nicholls, A., et al. 1984. J. Biol. Chem. 259:12941-44
109. Bateman, J. F., Lamande, S. R., Dahl, H.-H. M., Chan, D., Mascara, T., et al. 1989. J. Biol. Chem. 264:10960-64
110. Willing, M. C., Cohn, D. H., Byers, P. H. 1990. J. Clin. Invest. 85:282-90
111. Yamada, Y., Kühn, K., de Crombrugghe, B. 1983. Nucleic Acids Res. 11:2733-44
112. Myers, J. C., Dickson, L. A., de Wet, W. J., Bernard, M. P., Chu, M.-L., et al. 1983. J. Biol. Chem. 258:10128-35
113. Aho, S., Tate, V., Boedtker, H. 1983. Nucleic Acids Res. 11:5443-50
114. Penttinen, R. P., Kobayashi, S., Bornstein, P. 1988. Proc. Natl. Acad. Sci. USA 85:1105-8
115. D'Alessio, M., Bernard, M., Pretorious, P. J., de Wet, W., Ramirez, F. 1988. Gene 67:105-15
116. Su, M.-W., Benson-Chanda, V., Vissing, H., Ramirez, F. 1989. Genomics 4:438-41
117. Ryan, M. C., Goldring, M. B., Sandell, L. J. 1989. J. Cell Biol. 109:43a (Abstr.)
118. Kohno, K., Martin, G. R., Yamada, Y. 1984. J. Biol. Chem. 259:13668-73
119. Kohno, K., Sullivan, M., Yamada, Y. 1985. J. Biol. Chem. 260:4441-47
120. Lawler, J., Hynes, R. O. 1986. J. Cell Biol. 103:1635-48
121. Titani, K., Kumar, S., Takio, K., Ericsson, L. H., Wade, R. D., et al. 1986. Biochemistry 25:3171-84
122. Fleischmajer, R., Perlish, J. S., Timpl, R. 1985. Ann. NY Acad. Sci. 460:246-57
123. Bornstein, P., Sage, H. 1989. Prog. Nucleic Acid Res. Mol. Biol. 37:67-106
124. Svoboda, K. K., Nishimura, I., Sugrue, S. P., Ninomiya, Y., Olsen, B. R. 1988. Proc. Natl. Acad. Sci. USA 85:7496-500
125. Irwin, M. H., Mayne, R. 1986. J. Biol. Chem. 261:16281-83
126. Eyre, D. R., Apon, S., Wu, J.-J., Ericsson, L. H., Walsh, K. A. 1987. FEBS Lett. 220:337-41
127. van der Rest, M., Mayne, R. 1988. J. Biol. Chem. 263:1615-18
128. van der Rest, M., Mayne, R. 1987. See Ref. 2, pp. 195-219
129. Lozano, G., Ninomiya, Y., Thompson, H., Olsen, B. R. 1985. Proc. Natl. Acad. Sci. USA 82:4050-54
129a. Nishimura, I., Muragaki, Y., Olsen, B. R. 1989. J. Biol. Chem. 264:20033-41
129b. Bennett, V. D., Weiss, I. M., Adams, S. L. 1989. J. Biol. Chem. 264:8402-9
130. Dublet, B., Oh, S., Sugrue, S. P., Gordon, M. K., Gerecke, D. R., et al. 1989. J. Biol. Chem. 264:13150-56
131. Sugrue, S. P., Gordon, M. K., Seyer, J., Dublet, B., van der Rest, M., et al. 1989. J. Cell Biol. 109:939-45
132. Schmid, T. M., Linsenmayer, T. F. 1987. See Ref. 2, pp. 223-59
133. Sage, H., Bornstein, P. 1987. See Ref. 2, pp. 173-94
134. LuValle, P., Ninomiya, Y., Rosenblum, N. D., Olsen, B. R. 1988. J. Biol. Chem. 263:18378-85
135. Glanville, R. W. 1987. See Ref. 2, pp. 43-79
136. Butkowski, R. J., Langeveld, J. P. M., Wieslander, J., Hamilton, J., Hudson, B. G. 1987. J. Biol. Chem. 262:7874-77
137. Saus, J., Wieslander, J., Langeveld, J. P. M., Quinones, S., Hudson, B. G. 1988. J. Biol. Chem. 263:13374-80
138. Oberbäumer, I., Laurent, M., Schwarz, U., Sakurai, Y., Yamada, Y., et al. 1985. Eur. J. Biochem. 147:217-24
139. Nath, P., Laurent, M., Horn, E., Sobel, M. E., Zon, G., et al. 1986. Gene 43:301-4
140. Wood, L., Theriault, N., Vogeli, G. 1988. FEBS Lett. 227:5-8
141. Schwarz, U., Schuppan, D., Oberbäumer, I., Glanville, R. W., Deutzmann, R., et al. 1986. Eur. J. Biochem. 157:49-56
142. Vogeli, G., Horn, E., Carter, J., Kaytes, P. S. 1986. FEBS Lett. 206:29-32
143. Schwarz-Magdolen, U., Oberbäumer, I., Kühn, K. 1986. FEBS Lett. 208:203-7
144. Hofmann, H., Voss, T., Kühn, K. 1984. J. Mol. Biol. 172:325-43
145. Soininen, R., Tikka, L., Chow, L., Pihlajaniemi, T., Kurkinen, M., et al. 1986. Proc. Natl. Acad. Sci. USA 83:1568-72
146. Soininen, R., Chow, L., Kurkinen, M., Tryggvason, K., Prockop, D. J. 1986. EMBO J. 5:2821-23
147. Soininen, R., Huotari, M., Ganguly, A., Prockop, D. J., Tryggvason, K. 1989. J. Biol. Chem. 264:13565-71
148. Kurkinen, M., Bernard, M. P., Barlow, D. P., Chow, L. T. 1985. Nature 317:177-79
149. Killen, P. D., Burbelo, P., Sakurai, Y., Yamada, Y. 1988. J. Biol. Chem. 263:8706-9
150. Kurkinen, M., Condon, M. R., Blumberg, B., Barlow, D. P., Quinones, S., et al. 1987. J. Biol. Chem. 262:8496-99
151. Hostikka, S. L., Tryggvason, K. 1987. FEBS Lett. 224:297-305
152. Blumberg, B., MacKrell, A. J., Fessler, J. H. 1988. J. Biol. Chem. 263:18328-37
153. Timpl, R., Engel, J. 1987. See Ref. 2, pp. 105-43
154. Trüeb, B., Schaeren-Wiemers, N., Schreier, T., Winterhalter, K. H. 1989. J. Biol. Chem. 264:136-40
155. Furthmayr, H., Wiedemann, H., Timpl, R., Odermatt, E., Engel, J. 1983. Biochem. J. 211:303-11
156. Sandberg, M., Tamminen, M., Hirvonen, H., Vuorio, E., Pihlajaniemi, T. 1989. J. Cell Biol. 109:1371-79
157. Tikka, L., Pihlajaniemi, T., Henttu, P., Prockop, D. J., Tryggvason, K. 1988. Proc. Natl. Acad. Sci. USA 85:7491-95
158. Kramer, J. M., Cox, G. N., Hirsh, D. 1982. Cell 30:599-606
159. Monson, J. M., Natzle, J., Friedman, J., McCarthy, B. J. 1982. Proc. Natl. Acad. Sci. USA 79:1761-65
160. Kramer, J. M., Johnson, J. J., Edgar, R. S., Basch, C., Roberts, S. 1988. Cell 55:555-65
161. von Mende, N., Bird, D. M., Albert, P. S., Riddle, D. L. 1988. Cell 55:567-76
162. Cox, G. N., Fields, C., Kramer, J. M., Rosenzweig, B., Hirsh, D. 1989. Gene 76:331-44
163. Cox, G. N., Kramer, J. M., Hirsh, D. 1984. Mol. Cell. Biol. 4:2389-95
164. Tryggvason, K., Soininen, R., Hostikka, S. L., Ganguly, A., Huotari, M.,
Prockop, D. J. 1990. Ann. NY Acad. Sci. 580:97-111
165. Venkatesan, M., de Pablo, F., Vogeli, G., Simpson, R. T. 1986. Proc. Natl. Acad. Sci. USA 83:3351-55
166. Khillan, J. S., Schmidt, A., Overbeek, P. A., de Crombrugghe, B., Westphal, H. 1986. Proc. Natl. Acad. Sci. USA 83:725-29
167. Yamada, Y., Miyashita, T., Savagner, P., Horton, W., Brown, K. S., et al. 1990. Ann. NY Acad. Sci. 580:81-87
168. Lovell-Badge, R. H., Bygrave, A., Bradley, A., Robertson, E., Tilly, R., et al. 1987. Proc. Natl. Acad. Sci. USA 84:2803-7
169. Rossi, P., de Crombrugghe, B. 1987. Proc. Natl. Acad. Sci. USA 84:5590-94
170. Hartung, S., Jaenisch, R., Breindl, M. 1986. Nature 320:365-67
171. Jaenisch, R., Harbers, K., Schnieke, A., Lohler, J., Chumakov, I., et al. 1983. Cell 32:209-16
172. Jahner, D., Jaenisch, R. 1985. Nature 315:594-97
173. Kratochwil, K., von der Mark, K., Kollar, E. J., Jaenisch, R., Mooslehner, K., et al. 1989. Cell 57:807-16
174. Rossouw, C. M. S., Vergeer, W. P., duPlooy, S. J., Bernard, M. P., Ramirez, F., et al. 1987. J. Biol. Chem. 262:15151-57
175. Bornstein, P., McKay, J., Morishima, J. K., Devarayalu, S., Gelinas, R. E. 1987. Proc. Natl. Acad. Sci. USA 84:8869-73
176. Bornstein, P., McKay, J. 1988. J. Biol. Chem. 263:1603-6
177. Bornstein, P., McKay, J., Liska, D. S., Apone, S., Devarayalu, S. 1988. Mol. Cell. Biol. 3:4851-59
178. Yamada, Y., Mudryj, M., de Crombrugghe, B. 1983. J. Biol. Chem. 258:14914-19
179. de Crombrugghe, B., Schmidt, A., Liau, G., Setoyama, C., Mudryj, M., et al. 1985. Ann. NY Acad. Sci. 460:154-62
180. Rossi, P., de Crombrugghe, B. 1987. Nucleic Acids Res. 15:8935-56
181. Bornstein, P., McKay, J., Devarayalu, S., Cook, S. C. 1988. Nucleic Acids Res. 16:9721-36
182. Karsenty, G., Golumbeck, P. T., de Crombrugghe, B. 1988. J. Biol. Chem. 263:13909-15
183. Brenner, D. A., Rippe, R. A., Veloz, L. 1989. Nucleic Acids Res. 17:6055-63
184. Hatamochi, A., Golumbeck, P. T., Van Schaftingen, E., de Crombrugghe, B. 1988. J. Biol. Chem. 263:5940-47
185. Maity, S. N., Golumbeck, P. T., Karsenty, G., de Crombrugghe, B. 1988. Science 241:582-85
186. Liau, G., Szapaty, D., Setoyama, C., de Crombrugghe, B. 1986. J. Biol. Chem. 261:11362-68
187. McKeon, C., Pastan, I., de Crombrugghe, B. 1984. Nucleic Acids Res. 12:3491-502
188. Barsh, G. S., Roush, C. L., Gelinas, R. E. 1984. J. Biol. Chem. 259:14906-13
189. Breindl, M., Harbers, K., Jaenisch, R. 1984. Cell 38:9-16
190. Ignotz, R. A., Massague, J. 1986. J. Biol. Chem. 261:4337-45
191. Roberts, A. B., Sporn, M. B., Assoian, R. K., Smith, J. M., Roche, L. M., et al. 1986. Proc. Natl. Acad. Sci. USA 83:4167-71
192. Varga, J., Jimenez, S. A. 1986. Biochem. Biophys. Res. Commun. 138:974-80
193. Goldring, M. B., Krane, S. M. 1987. J. Biol. Chem. 262:16724-29
194. Soliz-Herruzo, J. A., Brenner, D. A., Chojkier, M. 1988. J. Biol. Chem. 263:5841-45
195. Oikarinen, J., Hatamochi, A., de Crombrugghe, B. 1987. J. Biol. Chem. 262:11064-70
196. Rossi, P., Karsenty, G., Roberts, A. B., Roche, N. S., Sporn, M. B., et al. 1988. Cell 52:405-14
197. Raghow, R., Postlethwaite, A. E., Keski-Oja, J., Moses, H. L., Kang, A. H. 1987. J. Clin. Invest. 79:1285-88
198. Pertovaara, L., Sistonen, L., Bos, T. J., Vogt, P. K., Keski-Oja, J., et al. 1989. Mol. Cell. Biol. 9:1255-62
199. Mudryj, M., de Crombrugghe, B. 1988. Nucleic Acids Res. 16:7513-26
200. Ruteshouser, E. C., de Crombrugghe, B. 1989. J. Biol. Chem. 264:13740-44
201. Horton, W., Miyashita, T., Kohno, K., Hassell, J. R., Yamada, Y. 1987. Proc. Natl. Acad. Sci. USA 84:8864-68
202. Pöschl, E., Pollner, R., Kühn, K. 1988. EMBO J. 9:2687-95
203. Soininen, R., Huotari, M., Hostikka, S. L., Prockop, D. J., Tryggvason, K. 1988. J. Biol. Chem. 263:17217-20
204. Burbelo, P. D., Martin, G. R., Yamada, Y. 1988. Proc. Natl. Acad. Sci. USA 85:9679-82
205. Kaytes, P., Wood, L., Theriault, N., Kurkinen, M., Vogeli, G. 1988. J. Biol. Chem. 263:19274-77
206. Burbelo, P. D., Klotman, P., Bruggeman, L., Clement, B., Yamada, Y. 1990. Critical Reviews in Eucaryotic Gene Expression. Boca Raton, Fla: CRC. In press
