Comparative Hybridization
Key Words array CGH, clinical genetics, cancer genetics, genomic instability,
genome profiling
■ Abstract Altering DNA copy number is one of the many ways that gene expres-
sion and function may be modified. Some variations are found among normal individ-
uals (14, 35, 103), others occur in the course of normal processes in some species (33),
and still others participate in causing various disease states. For example, many defects
in human development are due to gains and losses of chromosomes and chromoso-
mal segments that occur prior to or shortly after fertilization, whereas DNA dosage
alterations that occur in somatic cells are frequent contributors to cancer. Detecting
these aberrations, and interpreting them within the context of broader knowledge, fa-
cilitates identification of critical genes and pathways involved in biological processes
and diseases, and provides clinically relevant information. Over the past several years
array comparative genomic hybridization (array CGH) has demonstrated its value for
analyzing DNA copy number variations. In this review we discuss the state of the art
of array CGH and its applications in medical genetics and cancer, emphasizing general
concepts rather than specific results.
The relative hybridization intensity of the test and reference signals at a given lo-
cation is then (ideally) proportional to the relative copy number of those sequences
in the test and reference genomes. If the reference genome is normal then increases
and decreases in signal intensity ratios directly indicate DNA copy number varia-
tion within the genome of the test cells. Data are typically normalized so that the
modal ratio for the genome is set to some standard value, usually 1.0 on a linear
scale or 0.0 on a logarithmic scale. Additional measurements such as fluorescent
in situ hybridization (FISH) or flow cytometry (62) can be used to determine the
actual copy number associated with a ratio level.
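The normalization step described above can be sketched in a few lines of Python. This is only an illustration: it uses the median as a stand-in for the modal ratio, and the function name and the example intensities are hypothetical, not taken from any array CGH package.

```python
import math
from statistics import median

def normalize_log2_ratios(test, reference):
    """Return log2(test/reference) ratios shifted so that their median is 0.0."""
    raw = [math.log2(t / r) for t, r in zip(test, reference)]
    center = median(raw)  # assumption: the median approximates the modal ratio
    return [x - center for x in raw]

# Hypothetical background-corrected intensities for five array elements.
test = [1200.0, 1150.0, 600.0, 1250.0, 1800.0]
reference = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
log2_ratios = normalize_log2_ratios(test, reference)
```

After this shift, the element at the median sits at 0.0 on the log2 scale (1.0 on the linear scale), and the remaining elements are read as gains or losses relative to it.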
by Stanford University - Main Campus - Lane Medical Library on 06/19/13. For personal use only.
Annu. Rev. Genom. Human Genet. 2005.6:331-354. Downloaded from www.annualreviews.org
Array CGH has been implemented using a wide variety of techniques. The
initial approaches used arrays produced from large-insert genomic clones such as
bacterial artificial chromosomes (BACs) (70, 85). Producing sufficient BAC DNA
of adequate purity to make arrays is arduous, so several techniques to amplify
small amounts of starting material have been employed. These techniques include
ligation-mediated polymerase chain reaction (PCR) (84), degenerate primer PCR
using one (34) or several (24) sets of primers, and rolling circle amplification (80).
BAC arrays that provide complete genome tiling paths are now available (43, 47a,
50). Arrays made from less complex nucleic acids such as cDNAs (71), selected
PCR products (17, 59), and oligonucleotides (7, 10) are also used. Although most
CGH procedures employ hybridization with total genomic DNA, some use re-
duced complexity representations of the genome produced by PCR techniques.
Computational analysis of the genome sequence is used to design array elements
complementary to the sequences contained in the representation (55). Currently,
various single nucleotide polymorphism (SNP) genotyping platforms, some of
which use reduced complexity genomic representations, are being evaluated for
their ability to determine both DNA copy number and allelic content across the
genome (106, 107).
The different basic approaches to array CGH provide different levels of per-
formance, so some are more suitable for particular applications than others. The
factors that determine the performance requirements include the magnitudes of
the copy number changes, their genomic extents, the state and composition of the
specimen, how much material is available for analysis, and how the results of the
analysis will be used (Figure 1b). Many applications require reliable detection
of copy number changes of much less than 50%, a more stringent requirement
than for other microarray technologies. Note that technical details are extremely
important and different implementations of the “same” array CGH approach may
yield different levels of performance.
TECHNICAL CONSIDERATIONS
Hybridization Signals
The major technical challenge of array CGH is generating hybridization signals
that are sufficiently intense and specific so that copy number changes can be de-
tected. The signal intensity on an array element is affected by a number of factors
20 Aug 2005 11:26 AR AR252-GG06-15.tex XMLPublishSM (2004/02/24) P1: KUV
nonlinearly related to genomic abundance due to processes that affect the test
and reference genomes equally, such as saturation of array elements or reassoci-
ation of double-stranded nucleic acids during hybridization (69). The alternative
strategy—hybridization of a single genome to an array and comparison of the
result to a set of historical controls—places more stringent requirements on repro-
ducibility of array manufacture and hybridization conditions to avoid reduced data
quality.
The complexities of the genomic DNA and of the DNA in the array elements
significantly affect signal intensities and thus play a dominant role in determining
the genomic resolution of different array CGH technologies. For example, copy
number information from genomes such as those of bacteria and yeast (28, 100) is easier
to obtain than from mammalian genomes, which are 100 to 1000 times larger,
because the concentration of each portion of the genome in the hybridization is
correspondingly higher. Similarly, due to a number of complex kinetic factors, ar-
ray elements made from genomic BAC clones (complexity ∼100–200 kb) typically
provide more intense signals than elements employing shorter sequences such as
cDNAs, PCR products, and oligonucleotides. The higher signals from the more
complex array elements result in better measurement precision, allowing detection
of single-copy transition boundaries—even in specimens with a high proportion
of normal cells—and localization of copy number transitions to a fraction of the
length of the array element in some circumstances (3).
Arrays with low complexity elements can potentially provide better genomic
resolution than BAC arrays if their measurement precision is adequate for the ap-
plication. The advantages of using shorter sequences, including the opportunity
to design arrays directly from the genome sequence, the ability to use the same
arrays for expression and genomic analysis, and the possibility of higher genomic
resolution, drive efforts to improve array performance with low complexity ele-
ments. Single-copy changes on individual array elements have been detected on
sequences as short as several kilobases (59), and even several hundred bases
(17), but that is not currently possible with oligo arrays (4, 10, 77). Figure 2 il-
lustrates the relationship of measurement precision and genomic resolution for
analyzing a single-copy deletion boundary using arrays of elements made from
BACs, fosmids, and PCR products of several kilobases in length. As indicated
above, some measurement approaches reduce the complexity of the genomic DNA
in order to increase signal intensities and allow the use of low complexity array
elements (77, 106, 107). Published data from these procedures indicate the noise
levels are too high to allow detection of single-copy changes affecting single array
elements.
from genomic and cDNA clones, overwhelming the signal due to the unique se-
constitution and has true copy number levels ranging from 0 to 3 as determined by
FISH. The measured ratios for this sample, and published data (98), demonstrate
that this simple model provides an accurate description of the behavior of some
Figure 3 Relationship of measured ratios to copy number change. (a) Calculated ra-
tios (linear representation on the left, logarithmic on the right) are shown as a function
of copy number using a simple model (68) that includes the signal from unsuppressed
repetitive sequences and nonspecific hybridization. The ratios are plotted relative to
the normalized copy number, which is set to 1.0 for the median copy number in the
genome. The heavy line shows the dependence when the signal is entirely due to se-
quences uniquely associated with the locus corresponding to the array element. The
five lighter lines show the dependence when test and reference signals on the array
element include a bias equal to 10%, 20%, 30%, 40%, or 50% of the signal that would
be present when the normalized copy number of the locus is 1. The circles
indicate the ratios corresponding to true copy number of 0, 1, 2, and 3 found in the
profile in part b of the figure. [The model assumes that the bias, β, is proportional to
the total amount of genomic DNA used in the hybridization, but independent of the
copy number of a particular locus because it is generated by sequences distributed
throughout the genome. Because the unique sequence signal on an array element is
also proportional to the amount of genomic DNA, after normalization one can write
that the test signal is C + β, where C is the copy number of the locus normalized
to the median, or any other similar value, for the genome, and the reference signal
is 1 + β. Thus, ratio = (C + β) / (1 + β). Lines show behavior for β = 0, 0.11,
0.25, 0.43, 0.67, and 1.0]. (b) Ratio profile of a variant of cell line HCT-116 while
undergoing selection for resistance to methotrexate (83). Array comparative genomic
hybridization (CGH) was performed using the bacterial artificial chromosome (BAC)
arrays, and actual copy number levels for the parental HCT-116 cells were previously
determined using fluorescent in situ hybridization (FISH) (84). The ratios were directly
calculated from the background-corrected test and reference signal intensities for each
element. An overall normalization factor was applied to set the median Log2 ratio = 0.
No other computational adjustments were employed. The cell line contains a well es-
tablished homozygous deletion on chromosome 16p, Log2 ratio ∼−3.2 in this analysis,
as well as single-copy deletions, Log2 ratio ∼−0.8, and single-copy gains, Log2 ratio
∼0.5. Plotting these points on part a of the figure demonstrates that in this data set
the typical bias on the array elements was equal to approximately 10% of the diploid
signal level, and the response slopes for all array elements were very similar. Individual
clones with ratios much different from 0 indicate copy number polymorphisms, focal
aberrations, or noise. Close examination of the ratios indicates that some genomic
regions are heterogeneous in copy number in this population, presumably due to the
ongoing selection. In particular, the ratio on chromosome 5q, the site of the DHFR
gene, the target of methotrexate, is slightly higher than other regions of the HCT-116
genome that are characteristically present at 3 copies. (Unpublished data courtesy of
A. Snijders.)
array CGH systems. If the magnitude of the biases differs significantly among
array elements, for example due to different repetitive sequence content, then the
elements will reproducibly follow different curves in Figure 3a. Such behavior may
lead to false indications of recurrent copy number structure within a region where
the aberrant copy number is actually constant, thus producing false indications of
the potential locations of critical genes.
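The simple bias model given in the Figure 3 caption, ratio = (C + β) / (1 + β), can be evaluated directly. The sketch below is a minimal illustration in Python; the function names are hypothetical, and it assumes a genome whose median copy number is two, so that 0, 1, and 3 copies correspond to normalized copy numbers of 0, 0.5, and 1.5.

```python
import math

def bias_from_fraction(f):
    """Convert a bias stated as a fraction f of the total signal at C = 1 into beta.

    At C = 1 the total signal is 1 + beta, so f = beta / (1 + beta).
    """
    return f / (1.0 - f)

def expected_ratio(copy_number, beta):
    """ratio = (C + beta) / (1 + beta), with C normalized to the genome median."""
    return (copy_number + beta) / (1.0 + beta)

# A 10% bias corresponds to beta of about 0.11, as listed in the caption.
beta = bias_from_fraction(0.10)

# Assumed median of two copies: 0, 1, and 3 copies give C = 0, 0.5, and 1.5.
log2_by_copies = {
    copies: math.log2(expected_ratio(copies / 2.0, beta))
    for copies in (0, 1, 3)
}
```

Under these assumptions the model gives log2 ratios of about -3.3, -0.86, and 0.54 for 0, 1, and 3 copies, close to the values of about -3.2, -0.8, and 0.5 quoted in the caption for a bias near 10%. Repeating the calculation with a larger β compresses all three values toward 0, which is how elements with different biases come to follow different curves.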
The performance of an array system for measuring heterogeneous specimens,
for example tumor specimens containing normal cells, can be estimated by first
establishing its behavior with a well characterized homogeneous specimen to de-
termine the effective bias level (see Figure 3). The expected ratio changes in the
heterogeneous specimen can then be obtained using the measured response curve
in conjunction with values of the normalized copy number appropriate for the
aberrations in the specimens to be measured. For example, a single-copy loss in a
homogeneous (near) diploid cell population results in a normalized copy number
of 0.5, whereas it is 0.75 if that cell population is mixed with an equal number of
normal cells. Comparing the expected ratio changes with the noise level expected
for the specimen then allows determination of the ranges of copy number change
and specimen heterogeneity for which acceptable performance might be expected.
Finally, this simple model does not describe the behavior of the measurements if
the effective biases on array elements have contributions from autofluorescence,
differential nonspecific behavior due to the labels in the genomic DNA, high levels
of nonspecific binding to the array substrate, or if the measurement process has ar-
tifacts introduced by nonlinearities in the imaging systems or characteristics of the
image analysis software. These effects may lead to very complex and idiosyncratic
behavior among the array elements.
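The estimation procedure for heterogeneous specimens can be illustrated with a short Python sketch. It assumes the simple bias model holds, that normal cells carry two copies of the locus, and that the median copy number of the mixture is two; the bias, the noise level, and the detection criterion (a ratio change larger than k standard deviations of the noise) are hypothetical values chosen only for illustration.

```python
import math

def normalized_copy_number(aberrant_copies, aberrant_fraction,
                           normal_copies=2.0, median_copies=2.0):
    """Average copy number of the cell mixture divided by the genome median."""
    mixed = (aberrant_fraction * aberrant_copies
             + (1.0 - aberrant_fraction) * normal_copies)
    return mixed / median_copies

def expected_log2_ratio(c, beta):
    """Log2 of the simple-model ratio (C + beta) / (1 + beta)."""
    return math.log2((c + beta) / (1.0 + beta))

def is_detectable(log2_ratio, noise_sd, k=3.0):
    """Hypothetical criterion: the change must exceed k noise standard deviations."""
    return abs(log2_ratio) > k * noise_sd

beta = 0.11      # effective bias measured on a homogeneous control (hypothetical)
noise_sd = 0.08  # log2 ratio noise expected for the specimen (hypothetical)

# Single-copy loss in cells making up 100%, 50%, and 25% of the specimen.
results = {}
for fraction in (1.0, 0.5, 0.25):
    c = normalized_copy_number(1, fraction)
    r = expected_log2_ratio(c, beta)
    results[fraction] = (c, r, is_detectable(r, noise_sd))
```

The first two cases reproduce the normalized copy numbers of 0.5 and 0.75 given in the text. With these particular assumed values the loss is still detectable at a 50% admixture of normal cells but not at 75%.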
CGH measurements are also affected by low-copy reiterated sequences that
are common to all individuals. Low-copy reiterated sequences include members
of gene families and blocks of duplicated sequences (20, 21, 56). If a locus that
contains such a sequence is changed in copy number, the corresponding ratio
change may underestimate the magnitude of the aberration because the other loci
with copies of that sequence remain at normal copy number (105). Conversely, all
loci that contain a copy of the sequence may show a ratio change when one locus
is altered (54, 98).
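A worked example makes the underestimate concrete. The Python sketch below assumes that every copy of a reiterated sequence contributes equally to the signal on an array element and ignores bias and noise; the function name and the default of two copies per locus are illustrative assumptions.

```python
def ratio_with_reiterated_copies(n_loci, delta_copies, copies_per_locus=2):
    """Expected test/reference ratio when a sequence occurs at n_loci loci
    and one of those loci gains or loses delta_copies in the test genome."""
    reference_total = n_loci * copies_per_locus
    test_total = reference_total + delta_copies
    return test_total / reference_total

# Single-copy loss at one locus, with the sequence present at 1, 2, or 3 loci.
ratios = [ratio_with_reiterated_copies(n, -1) for n in (1, 2, 3)]
```

Under these assumptions a single-copy loss gives a ratio of 0.5 for a unique sequence but only 0.75 when a second locus carries the same sequence, and about 0.83 with three loci. An element at any of those loci would report the same reduced ratio, which is the converse effect noted above.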
Specimen Preparation
The quality of genomic DNA preparations has a substantial effect on the resulting
data. Although isolating genomic DNA from fresh and frozen specimens is routine
through use of numerous published protocols and commercial kits, there appears to
be an unknown class of contaminants that occasionally copurify with the DNA and
produce abnormally high “noise” in the ratios. This noise is typically not random
because relabeling a different aliquot of the same DNA reproduces exactly the same
noisy pattern. Repurifying or, better, reisolating the DNA may help significantly.
DNA quality issues are especially acute when analyzing formalin-fixed archival
tissue. Data obtained from such specimens can range from excellent, basically
ments. Typical array CGH procedures use 300 ng to 3 µg of specimen DNA in the
labeling reaction, equivalent to ∼50,000 to 500,000 mammalian cells. Most proto-
cols employ random primer labeling, which also amplifies the DNA, so that several
micrograms are used in the hybridization. The need to obtain analyses from small
specimens, or small regions of heterogeneous specimens, has motivated significant
efforts to develop whole-genome amplification procedures. The strand displacing
polymerase φ29 has been used when the genomic DNA is present in long frag-
ments, permitting analysis of nanogram quantities (36, 48). Several companies
offer kits for such amplifications. DNA from formalin-fixed specimens is typically
too short for this approach. A number of other procedures including degenerate
primer PCR (12, 16), two-stage random primer labeling reactions (16), balanced
PCR (96), ligation-mediated PCR (31, 52, 87), and ligation circularization of de-
graded DNA (97) have also been used for DNA from both fresh/frozen and fixed
specimens. The PCR generation of genomic representations used in some tech-
niques also amplifies the DNA, allowing analysis of tens to hundreds of nanograms
of input DNA (77, 106, 107). The judgment on how well any of these techniques
work depends on the requirements of the desired application (Figure 1b).
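The correspondence between DNA mass and cell number quoted above can be checked with simple arithmetic. The sketch below assumes roughly 6 pg of DNA per diploid mammalian cell, a commonly used approximation that is not stated in the text; the constant and function names are hypothetical.

```python
PG_PER_DIPLOID_CELL = 6.0  # assumed approximate DNA content of one diploid cell

def cells_equivalent(dna_ng, pg_per_cell=PG_PER_DIPLOID_CELL):
    """Number of cells whose combined genomic DNA weighs dna_ng nanograms."""
    return dna_ng * 1000.0 / pg_per_cell  # 1 ng = 1000 pg

low = cells_equivalent(300)    # 300 ng of specimen DNA
high = cells_equivalent(3000)  # 3 micrograms of specimen DNA
```

With this assumption, 300 ng and 3 µg correspond to 50,000 and 500,000 cells, matching the range given in the text. The same function shows why amplification is needed for small specimens: 1 ng corresponds to fewer than 200 cells.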
Data Analysis
A number of primary processing approaches have been applied to obtain ratio
profiles. Normalization in some cases involves only a simple overall factor to set
the median ratio to some standard value, whereas in others additional procedures
based on spatial and intensity dependence and historical data specific to each array
element may also be applied. Occasionally, genomes have so much copy number
variation (Figure 4, upper panel) that the biological significance of the normal-
ization is uncertain because only a very small proportion of the genome is at the
normal ratio. Some platforms use data from a single hybridization, whereas others
combine data from two measurements with dye reversal. Using data adjustment
procedures without understanding the underlying processes responsible for the dis-
tortions, or a robust phenomenological validation that the procedures are stable and
give reasonable results in control specimens, runs the risk of introducing systematic
errors.
Although the major aberrations in a genome are frequently evident by inspec-
tion, many approaches to improve interpretation in the face of measurement noise
have been developed. The simplest is to apply thresholds. If the ratio profile has
only a few well spaced ratio levels, thresholds can be chosen by examining the
distribution of all measured ratios (34). However, many tumors, owing to their
nondiploid genomes and/or heterogeneity, have closely spaced ratio levels that
partially overlap due to measurement noise, so this approach is not capable of dis-
tinguishing them. Smoothing by averaging the ratios on neighboring array elements
improves the behavior of thresholding but blurs the locations of boundaries and
reduces the amplitude of aberrations involving fewer elements than the smoothing
window.
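The trade-off between thresholding and smoothing can be seen in a small Python sketch. The profile, the window size, and the thresholds of ±0.3 on the log2 scale are hypothetical values chosen for illustration, and the profile is noise free so that only the effect of the smoothing itself is visible.

```python
def moving_average(values, window):
    """Centered moving average (odd window); edges use a truncated window."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def call_by_threshold(log2_ratios, loss=-0.3, gain=0.3):
    """Label each element as 'loss', 'gain', or 'normal' using fixed thresholds."""
    calls = []
    for x in log2_ratios:
        if x < loss:
            calls.append("loss")
        elif x > gain:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls

# Hypothetical profile: a single-element loss and a five-element gain.
profile = [0.0] * 5 + [-0.8] + [0.0] * 5 + [0.5] * 5 + [0.0] * 5
smoothed = moving_average(profile, window=5)
raw_calls = call_by_threshold(profile)
smoothed_calls = call_by_threshold(smoothed)
```

In this example the single-element loss is called from the raw ratios but falls to about -0.16 after smoothing and is missed, because it involves fewer elements than the window. The five-element gain keeps its full amplitude only at its center, and elements just outside it acquire nonzero values, which blurs the boundary.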
More sophisticated analytical approaches employ the fact that the copy number
changes involve chromosome segments, so ratios at contiguous sets of loci should
be identical, except for occasional abrupt steps to a new level. These methods
statistically assess the status of each array element in the context of its neighbors.
Among the approaches that have been used are hidden Markov models (27),
change point analysis (66), adaptive weights smoothing (38), Bayesian maximum a
posteriori probabilities (13), and ratio clustering (99), and many more are being
developed. Several software packages are available at http://www.bioconductor.org/.
Statistical approaches limited to examination of ratio profiles cannot evaluate the
reliability of an aberrant ratio that affects only a single array element. The underlying
image data need to be examined to determine their quality, and the interpretation
needs to be made in light of experience. Single-copy aberrations that affect
only one array element can be detected with high sensitivity and specificity with
some BAC technologies and may be highly informative (82).
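The idea that ratios are constant within a segment and step abruptly at a boundary can be illustrated with a deliberately simplified Python sketch. It finds a single change point by least squares; it is not an implementation of any of the cited methods, which handle multiple segments and assess statistical significance. The function name and the example profile are hypothetical.

```python
def best_single_change_point(values):
    """Return (index, cost) for the split that minimizes the summed squared
    deviation from the two segment means. The index is the position of the
    first element of the right-hand segment."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_index, best_cost = None, float("inf")
    for k in range(1, len(values)):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index, best_cost

# Hypothetical log2 ratios along a chromosome: normal, then a single-copy loss.
profile = [0.05, -0.02, 0.01, -0.04, 0.03, -0.82, -0.78, -0.85, -0.80, -0.79]
index, cost = best_single_change_point(profile)
```

For this profile the split falls at index 5, the first element of the deleted segment. Because every candidate split needs at least one element on each side and draws its evidence from neighbors, a procedure of this kind says little about an aberration confined to one element, consistent with the limitation noted above.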
APPLICATIONS
Variation in Normal Genomes
Differences in gene structure and variability in gene families produce DNA dosage
polymorphisms in the human genome (14, 35). A comprehensive understanding
of these normal variations is of intrinsic biological interest and is essential for
proper interpretation of array CGH data and its relation to phenotype. Array CGH
measurements using BAC arrays immediately revealed copy number polymor-
phisms (2). Figure 5 illustrates ratio variation in a publicly available data set (84)
for several clones that are contained in well-known polymorphisms. Each of these
clones shows a different magnitude and frequency of variation across the cell lines.
An analysis by Iafrate et al. (41) of 39 normal individuals with a different set of
∼2500 BACs found that about 10% of clones showed ratios consistent with a copy
number change in at least one individual, and on average each person had 12.4
variant ratios. About 100 clones showed variant ratios in more than one person,
with the most polymorphic loci showing variability in about half of the people.
Polymorphisms have also been detected using the representational oligonucleotide
microarray analysis (ROMA) approach to CGH (77). ROMA samples the genome
differently than BAC arrays because it employs reduced complexity genomic
in the human genome, the picture of this normal variation is incomplete. In results
reported to date, measurement noise has restricted detection to polymorphisms that
involve genomic segments of many kilobases or larger, genome coverage has been
far from comprehensive, and the population has not been adequately sampled. The
published studies also raise questions concerning their mutual consistency. For
example, Iafrate et al. (41) report three times as many distinct polymorphic loci on
their BAC arrays as do Sebat et al. (77), even though Sebat et al. use arrays with
30 times as many elements. Part of this difference may be due to the analytical
procedures and decision criteria employed in the two studies, and to the differences
in their technical limitations. The further elucidation of dosage polymorphisms
will remain an experimental rather than computational endeavor until high-quality
sequence is available on a large number of individuals. Understanding the copy
number polymorphisms that are detectable by a particular array CGH technique
is important so that normal variations are not falsely associated with disease,
and, conversely, to determine if some so-called normal variation may underlie
phenotypic characteristics such as disease susceptibility (89).
Mouse genomes also have a high frequency of copy number polymorphisms.
Comparative hybridization of 14 M. musculus strains using arrays of ∼19,000
mouse BACs selected to produce a tiling path of the sequenced portion of the
mouse genome found ∼350 clones that detected polymorphisms (50). Of these,
216 consistently showed loss relative to C57BL6, and 130 consistently showed
gains, with very few showing both gains and losses among the strains. Thus, about
2% of the clones were classified as polymorphic among the strains. Snijders et al.
(81) found a much higher frequency of apparent polymorphisms in the analysis of
multiple individual mice from seven M. musculus and two M. spretus strains using
arrays of ∼2000 clones, ∼1300 of which were nonredundant and nonoverlapping.
About 6% of these clones showed |log2 ratio| > 0.4 in one of six M. musculus strains
when compared to FVB/N. Clustering of the mouse strains based on either set of
polymorphisms was consistent with the known evolutionary relationships among
the strains. Although these two papers clearly demonstrate the presence of a large
number of copy number polymorphisms among the inbred mouse strains, there is
no obvious concordance among the sets of clones. Among the factors that could
contribute to the differences in the two studies is the method of clone selection.
The tiling arrays did not include portions of the genome that were not in the
physical map, and the smaller arrays were assembled prior to the existence of the
physical map and comprised BACs with mapped STS markers. Thus, the arrays of
Snijders et al. (81) might include regions of complex, and perhaps polymorphic,
genome sequences that are difficult to assemble. There may also be significant
differences in the criteria used to classify clones as polymorphic, but this remains
to be determined.
Figure 5 Examples of human DNA copy number polymorphisms. Each panel shows
the ratio values for array elements [bacterial artificial chromosome (BAC) clones] from
polymorphic loci in 15 different cell strains obtained from the Coriell Cell Repository.
The gray bands indicate the range of ratio variability for nonpolymorphic loci. Data
are from Snijders et al. (84) and are publicly available. The names of the BAC clones
used for the array elements are indicated in each panel. (a) The highly polymorphic
Lipoprotein(a) locus (14) on chromosome 6q shows a different ratio in nearly every
individual. This polymorphism is due to variation in the numbers of a 5-kb repeat
sequence in the gene. (b) Two partially overlapping BAC clones in the Defensin gene
cluster (35) on 8p show similar copy number variation in most individuals. However,
in several individuals there are clear differences in ratios of the clones, indicating
that a polymorphic structure occasionally affects the nonoverlapping portions of the
clones differently. The preponderance of negative ratios in these individuals presumably
reflects status of this polymorphism in the DNA used for the reference. For example,
if a person similar to GM04435 had been the reference, all others would have shown
positive ratios. (c) A polymorphism on 8q behaves as if there are two alleles in the
population (103). Most individuals have the genomic region present on both of their
chromosomes. Two individuals have a deletion that includes most or all of the BAC
on one of their copies of chromosome 8, based on the observed log2 ratio of ∼−1. One
individual is homozygous for the deletion, based on the observed very low ratio. (See
Figure 3 for dependence of ratio on copy number.)
number changes in 10% to 25% of cases (79, 93), leading to the identification of the
critical gene involved in coloboma, heart defects, atresia choanae, retarded growth
and development, genital hypoplasia, ear anomalies and deafness (CHARGE) syn-
drome (94). Even when aberrations are cytogenetically detectable, array CGH pro-
vides a significantly enhanced ability to map the events in multiple cases, thereby
facilitating identification of the critical gene (86). Numerous studies are underway
to use array CGH for analysis of groups of phenotypically abnormal patients with
normal karyotypes. Many of these studies remain unpublished, pending confirma-
tion from family analyses that the aberrations are likely causal or due to the lack of
of discovery, one would like to perform the highest resolution studies that are pos-
sible in order to gather the information required to assess phenotypic effects of
particular aberrations. However, one might find apparent abnormalities whose con-
sequences are unknown, an especially acute problem for prenatal analysis. One
approach is to use arrays that only contain elements designed to detect aberrations
whose interpretation is well established (5, 75). However, even when the arrays
are targeted, the CGH analysis may not provide sufficient information to inter-
pret the potential phenotypic consequences of an aberration. Although FISH is
frequently discussed as a method to simply validate array results, its more funda-
mental role will likely be in performing additional studies suggested by the array
results in order to obtain additional essential diagnostic and counseling informa-
tion (102). Establishing the proper implementation of array CGH for different
clinical applications will require considerable thought because excessive caution
will inappropriately restrict the benefits that can be obtained, whereas proceeding
without sufficient consideration will lead to significant errors.
characteristics were due to the action of multiple genes. Zhang et al. (105) used a
high-resolution array of clones on chromosome 5p to analyze deletions in Cri du
Chat syndrome. They refined the localization of the regions of the chromosome,
which, when deleted, contributed to various aspects of the phenotype, including
the typical cry, facial features, and speech delay. In addition, they demonstrated
the interaction of various regions of 5p in producing the mental retardation phe-
notype, and, using whole-genome scanning, found that the most severely retarded
individuals frequently had dosage abnormalities in addition to 5p deletions. Array
CGH has also been used to investigate genomic alterations associated with neurofi-
COPY NUMBER PHENOTYPES Tumor genomes reveal a wide variety of copy num-
ber “phenotypes” indicating different types of genetic instability. For example,
colon tumors have long been known to have different levels and types of genomic
aberrations (63), which were eventually attributed to differences in mismatch re-
pair competence (6, 22). Analysis of mismatch repair-proficient and -deficient cell
lines found that the exact nature of the repair deficiency also had significant effects
on the characteristics of the copy number changes (83). Figure 4 shows the DNA
copy number profiles of one breast cancer cell line and two breast tumors (1).
Clearly, different defects in the mechanisms that maintain genomic stability oc-
curred. Tumors in mouse model systems frequently do not contain large numbers
of informative genome copy number variations unless they were engineered to
have specific genetic defects such as impaired telomeres (65). The wide range of
genomic phenotypes in cancer means that for some sets of specimens array CGH
will provide significant information on the locations of important cancer genes,
whereas in others it will be uninformative. Note that copy number profiles of cell
populations reveal past genomic instability that led to the clonal expansion of a
cell population with a relatively stable genome, at least stable within its selective
environment. Some tumors appear very stable in vivo, with the primary tumors
and recurrences having nearly identical copy number profiles even though there
are many years between them (1, 95). Ongoing genomic instability results in het-
erogeneity that is not detectable by CGH and is best assessed by techniques that
examine individual cells (11).
expression changes at the RNA and protein levels essentially perfectly coupled to
dosage (73). Thus, finding that a gene is always overexpressed when at increased
copy number, and is sometimes overexpressed when not present at increased copy
number, supports its functional role in cancer. Genes that drive copy number gains
may also be altered by mutation (57), so sequencing candidates in tumors with and
without increases may provide critical information. Similarly, particular alleles of
a gene may contribute to tumorigenesis, so finding preferential gain of one variant
may indicate its functional involvement (23).
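
The expression pattern described in this paragraph can be written down as a simple tabulation. The following is only a minimal sketch, assuming each specimen has already been reduced to two yes/no calls for a single gene (gained or not, overexpressed or not); the function names and the strict "always/sometimes" thresholds are hypothetical, and real data would call for a statistical treatment rather than exact fractions.

```python
def overexpression_by_gain(samples):
    """Return the fraction of overexpressing specimens with and without gain.

    samples: list of (has_gain, overexpressed) boolean pairs for one gene.
    A fraction is None when that group contains no specimens.
    """
    gained = [over for gain, over in samples if gain]
    not_gained = [over for gain, over in samples if not gain]

    def fraction(calls):
        return sum(calls) / len(calls) if calls else None

    return fraction(gained), fraction(not_gained)


def fits_described_pattern(samples):
    """True if the gene is always overexpressed when gained and
    at least sometimes overexpressed when not gained."""
    with_gain, without_gain = overexpression_by_gain(samples)
    if with_gain is None or without_gain is None:
        return False  # cannot judge without specimens in both groups
    return with_gain == 1.0 and without_gain > 0.0


# Illustrative calls for one hypothetical gene across four specimens.
calls = [(True, True), (True, True), (False, True), (False, False)]
print(overexpression_by_gain(calls), fits_described_pattern(calls))
```

A gene that is overexpressed in only some of the specimens carrying the gain would fail this check, which mirrors the reasoning in the text.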
Evaluating genes in regions of copy number loss is also complex. In some cases
the decrease in expression due to the decrease in copy number is sufficient for the
gene to be significant to the tumor. However, in the classic case of tumor suppressor
genes, function is totally abrogated by deletion of all copies of a gene, by deletion of one
copy and mutation or epigenetic alteration of the other (104), or by alteration of one
copy and replacement of the other by a duplicate of the altered copy, etc. The last
of these aberrations results in loss of heterozygosity but no copy number change,
and is not detectable by array CGH. The developing SNP profiling technologies
may be able to provide additional information concerning these events, perhaps
eventually providing information on heterozygosity and dosage for some types
of specimens (106, 107). Candidate genes within recurrent regions of loss can be
assessed for expression changes and examined to determine if the remaining copies
are mutated or methylated (104), etc. One general approach that has proven useful
to broadly screen for mutated genes in cells in culture employs nonsense-mediated
decay (NMD). If a mutation produces a premature stop codon the transcripts are
rapidly degraded (64). Global comparison of expression levels before and after
inactivating NMD identifies genes whose transcript levels have increased. Those
that are contained within deletions are attractive candidate tumor suppressors (39).
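
The allelic scenarios listed earlier in this paragraph can be made concrete with a small worked example showing why dosage alone misses one of them. This is a sketch under idealized assumptions: the locus is heterozygous in normal cells, the normal total is two copies, the measurement is noise-free, and the function and scenario names are illustrative only.

```python
import math


def describe(a_copies, b_copies, normal_total=2):
    """Return (log2 copy number ratio, still heterozygous?) for one locus.

    a_copies and b_copies are the copy counts of the two parental alleles.
    """
    total = a_copies + b_copies
    if total > 0:
        log2_ratio = math.log2(total / normal_total)
    else:
        log2_ratio = float("-inf")  # no DNA from this locus remains
    heterozygous = a_copies > 0 and b_copies > 0
    return log2_ratio, heterozygous


scenarios = {
    "normal": (1, 1),
    "one copy deleted": (1, 0),
    "all copies deleted": (0, 0),
    "altered copy duplicated, other copy lost": (2, 0),
}

for name, (a, b) in scenarios.items():
    ratio, het = describe(a, b)
    print(f"{name}: log2 ratio = {ratio}, heterozygous = {het}")
```

In the last scenario the ratio is the same as for the normal locus while heterozygosity is lost, so only a measurement that distinguishes alleles, such as SNP profiling, can reveal the event.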
CONCLUSION
Array CGH is one of a growing number of top-down approaches that can provide
comprehensive information about aspects of biological status or function. In the
near term these techniques can provide correlational information that is useful
for clinical applications in oncology and medical genetics, and basic information
on fundamental characteristics of genome structure. The clinical applications in
medical genetics are particularly compelling and will be extensive in the immediate
future. Considerable care must be exercised to be sure that false positive indica-
tions of abnormality are kept to acceptable levels and that the aberrations that are
detected are interpreted with appropriate rigor. For the longer term, the combina-
tion of measurements of DNA copy number, RNA and protein expression levels,
mass RNAi screens in model systems, etc. will lead to substantial advances in fun-
damental understanding of biological processes. However, traditional bottom-up
studies of any one small component of a functional pathway always reveal details
that are not addressed by the global approaches. Conversely, focused studies may
AUTHORS’ NOTE
Portions of this manuscript are substantially similar to portions of a Perspective
that appeared in Nature Genetics volume 37, May 2005.
LITERATURE CITED
1. Albertson DG. 2003. Profiling breast cancer by array CGH. Breast Cancer Res. Treat. 78:289–98
2. Albertson DG, Pinkel D. 2003. Genomic microarrays in human genetic disease and cancer. Hum. Mol. Genet. 12(2):R145–52
3. Albertson DG, Ylstra B, Segraves R, Collins C, Dairkee SH, et al. 2000. Quantitative mapping of amplicon structure by array CGH identifies CYP24 as a candidate oncogene. Nat. Genet. 25:144–46
4. Barrett MT, Scheffer A, Ben-Dor A, Sampas N, Lipson D, et al. 2004. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proc. Natl. Acad. Sci. USA 101:17765–70
5. Bejjani BA, Saleki R, Ballif BC, Rorem EA, Sundin K, et al. 2005. Use of targeted array-based CGH for the clinical diagnosis of chromosomal imbalance: Is less more? Am. J. Med. Genet. A 134:259–67
6. Bocker T, Ruschoff J, Fishel R. 1999. Molecular diagnostics of cancer predisposition: hereditary non-polyposis colorectal carcinoma and mismatch repair defects. Biochim. Biophys. Acta 1423:1–10
7. Brennan C, Zhang Y, Leo C, Feng B, Cauwels C, et al. 2004. High-resolution global profiling of genomic alterations with long oligonucleotide microarray. Cancer Res. 64:4744–48
26. Freedman ML, Reich D, Penney KL, McDonald GJ, Mignault AA, et al. 2004. Assessing the impact of population stratification on genetic association studies. Nat. Genet. 36:388–93
27. Fridlyand J, Snijders AM, Pinkel D, Albertson DG, Jain AN. 2004. Hidden Markov models approach to the analysis of array CGH data. J. Multivar. Anal. 90:132–53
28. Fukiya S, Mizoguchi H, Tobe T, Mori H. 2004. Extensive genomic diversity in pathogenic Escherichia coli and Shigella strains revealed by comparative genomic hybridization microarray. J. Bacteriol. 186:3911–21
29. Gribble SM, Fiegler H, Burford DC, Prigmore E, Yang F, et al. 2004. Applications of combined DNA microarray and chromosome sorting technologies. Chromosome Res. 12:35–43
30. Gribble SM, Prigmore E, Burford DC, Porter KM, Ng BL, et al. 2005. The complex nature of constitutional de novo apparently balanced translocations in patients presenting with abnormal phenotypes. J. Med. Genet. 42:8–16
31. Guillaud-Bataille M, Valent A, Soularue P, Perot C, Inda MM, et al. 2004. Detecting single DNA copy number variations in complex genomes using one nanogram of starting DNA and BAC-array CGH. Nucleic Acids Res. 32:e112
32. Heidenblad M, Lindgren D, Veltman JA, Jonson T, Mahlamaki EH, et al. 2005. Microarray analyses reveal strong influence of DNA copy number alterations on the transcriptional patterns in pancreatic cancer: implications for the interpretation of genomic amplifications. Oncogene 24:1794–801
33. Hersh MN, Ponder RG, Hastings PJ, Rosenberg SM. 2004. Adaptive mutation and amplification in Escherichia coli: two pathways of genome adaptation under stress. Res. Microbiol. 155:352–59
34. Hodgson G, Hager JH, Volik S, Hariono S, Wernick M, et al. 2001. Genome scanning with array CGH delineates regional alterations in mouse islet carcinomas. Nat. Genet. 29:459–64
35. Hollox EJ, Armour JA, Barber JC. 2003. Extensive normal copy number variation of a beta-defensin antimicrobial-gene cluster. Am. J. Hum. Genet. 73:591–600
36. Hosono S, Faruqi AF, Dean FB, Du Y, Sun Z, et al. 2003. Unbiased whole-genome amplification directly from clinical samples. Genome Res. 13:954–64
37. Hu DG, Webb G, Hussey N. 2004. Aneuploidy detection in single cells using DNA array-based comparative genomic hybridization. Mol. Hum. Reprod. 10:283–89
38. Hupe P, Stransky N, Thiery JP, Radvanyi F, Barillot E. 2004. Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 20:3413–22
39. Huusko P, Ponciano-Jackson D, Wolf M, Kiefer JA, Azorsa DO, et al. 2004. Nonsense-mediated decay microarray analysis identifies mutations of EPHB2 in human prostate cancer. Nat. Genet. 36:979–83
40. Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, et al. 2002. Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res. 62:6240–45
41. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, et al. 2004. Detection of large-scale variation in the human genome. Nat. Genet. 36:949–51
42. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. 2001. Replication validity of genetic association studies. Nat. Genet. 29:306–9
43. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, et al. 2004. A tiling resolution DNA microarray with complete coverage of the human genome. Nat. Genet. 36:299–303
44. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, et al. 1992. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258:818–21
45. Ki A, Rauen KA, Black LD, Kostiner DR, Sandberg PL, et al. 2003. Ring 21 chromosome and a satellited 1p in the same patient: novel origin for an ectopic NOR. Am. J. Med. Genet. 120A:365–69
46. Klein OD, Cotter PD, Albertson DG, Pinkel D, Tidyman WE, et al. 2004. Prader-Willi syndrome resulting from an
… 2003. Large-scale variation among human and great ape genomes determined by array comparative genomic hybridization. Genome Res. 13:347–57
54. Locke DP, Segraves R, Nicholls RD, Schwartz S, Pinkel D, et al. 2004. BAC microarray analysis of 15q11–q13 rearrangements and the impact of segmental duplications. J. Med. Genet. 41:175–82
55. Lucito R, West J, Reiner A, Alexander J,
62. Mohapatra G, Moore DH, Kim DH, Grewal L, Hyun WC, et al. 1997. Analyses of brain tumor cell lines confirm a simple model of relationships among fluorescence in situ hybridization, DNA index, and comparative genomic hybridization. Genes Chromosomes Cancer 20:311–19
63. Muleris M, Salmon RJ, Dutrillaux B. 1990. Cytogenetics of colorectal adenocarcinomas. Cancer Genet. Cytogenet.
… comparative genomic hybridization to microarrays. Nat. Genet. 20:207–11
71. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, et al. 1999. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 23:41–46
72. Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, et al. 2002. Microarray analysis reveals a major direct role of DNA
… Leo C, et al. 2004. Balanced-PCR amplification allows unbiased identification of genomic copy changes in minute cell and tissue samples. Nucleic Acids Res. 32:e76
97. Wang G, Maher E, Brennan C, Chin L, Leo C, et al. 2004. DNA amplification method tolerant to sample degradation. Genome Res. 14:2357–66
98. Wang NJ, Liu D, Parokonny AS, Schanen NC. 2004. High-resolution molecular characterization of 15q11–q13 rearrangements by array comparative genomic hybridization (array CGH) with detection of gene dosage. Am. J. Hum. Genet. 75:267–81
99. Wang P, Kim Y, Pollack J, Narasimhan B, Tibshirani R. 2005. A method for calling gains and losses in array CGH data. Biostatistics 6:45–58
100. Watanabe T, Murata Y, Oka S, Iwahashi H. 2004. A new approach to species determination for yeast strains: DNA microarray-based comparative genomic hybridization using a yeast DNA microarray with 6000 genes. Yeast 21:351–65
101. Weiss MM, Kuipers EJ, Postma C, Snijders AM, Pinkel D, et al. 2004. Genomic alterations in primary gastric adenocarci-
… Osterling J, et al. 2002. Presence of large deletions in kindreds with autism. Am. J. Hum. Genet. 71:100–15
104. Zardo G, Tiirikainen MI, Hong C, Misra A, Feuerstein BG, et al. 2002. Integrated genomic and epigenomic analyses pinpoint biallelic gene inactivation in tumors. Nat. Genet. 32:453–58
105. Zhang X, Snijders A, Segraves R, Niebuhr A, Albertson D, et al. 2005. High-resolution mapping of genotype-phenotype relationships in cri du chat syndrome using array comparative genomic hybridization. Am. J. Hum. Genet. 76:312–26
106. Zhao X, Li C, Paez JG, Chin K, Janne PA, et al. 2004. An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 64:3060–71
107. Zhou X, Mok SC, Chen Z, Li Y, Wong DT. 2004. Concurrent analysis of loss of heterozygosity (LOH) and copy number abnormality (CNA) for oral premalignancy progression using the Affymetrix 10K SNP mapping array. Hum. Genet. 115:327–30
Figure 1 (a) Array comparative genomic hybridization (CGH). Genomic DNA from two
cell populations is differentially labeled and hybridized to a microarray. The fluorescent
signal intensity ratios measured at each array spot are normalized so that the median
log2 ratio is 0. Plotting of the data for chromosome 9 from pter to qter shows that most
elements have a ratio near 0. The two elements nearest pter have a ratio near –1, indicating
a reduction of a factor of two in copy number. Fluorescent in situ hybridization (FISH) with
a red-labeled probe for the deleted region and a green-labeled control probe (genome loca-
tions indicated by the red and green arrows on the ratio profile) shows that the cells con-
tain two copies of the green probe and only one for the red, consistent with the array CGH
analysis (84). (b) The difficulty of array CGH analysis varies significantly among different
applications. It is much easier to detect the large increase in copy number associated with
amplification of a genomic region than it is to detect single-copy gains or losses.
Aberrations affecting an extended genomic region including multiple array elements are
easier to detect than focal events. Measurements on cell lines are the least difficult because
isolation of high-quality DNA is straightforward and the genomes are relatively homoge-
neous. Fresh or frozen tumor tissue presents additional challenges due to possible tissue-
specific factors, and the potential for genomic heterogeneity in a tumor and/or inclusion of
normal cells. Measurements on formalin-fixed paraffin-embedded tissue present the great-
est challenges. Research studies aimed at profiling a group of tumor specimens that have a
large number of highly recurrent aberrations can be informative even if a considerable
number of errors are made in each tumor. In contrast, detection of aberrations that are both
rare and small, and of course clinical applications, present challenging specificity and sen-
sitivity requirements.
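
The normalization described in panel (a) can be sketched in a few lines. This is a minimal illustration, assuming paired background-corrected test and reference intensities for each array element; the function name and the intensity values are hypothetical, and real pipelines typically add steps such as filtering of poor spots.

```python
import math
from statistics import median


def normalized_log2_ratios(test, reference):
    """Return log2(test/reference) per element, shifted so the median is 0."""
    raw = [math.log2(t / r) for t, r in zip(test, reference)]
    center = median(raw)
    return [value - center for value in raw]


# Hypothetical intensities for seven elements ordered from pter to qter.
# The test channel is uniformly brighter, except for the first two elements.
test_signal = [100, 100, 200, 200, 200, 200, 200]
reference_signal = [100, 100, 100, 100, 100, 100, 100]

for ratio in normalized_log2_ratios(test_signal, reference_signal):
    print(ratio, 2.0 ** ratio)  # log2 ratio and implied relative copy number
```

After centering, a value near -1 corresponds to a relative copy number of one half, which is the factor-of-two reduction noted for the two elements nearest pter.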
array elements decreases in importance as the magnitude of the copy number change
increases, so if boundaries of amplified regions are abrupt they can be determined
even if the measurements are very noisy. Note that the data indicate that one of the
BAC clones is partially contained in the deletion, and that this might cause the
slightly reduced ratio on this clone. Thus, a tiling path of BAC clones could potentially
map the position of the copy number transition to a fraction of the length of a clone
(3). (Unpublished data courtesy of R. Redon.)
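
The idea that an intermediate ratio locates a transition within a clone can be illustrated with a short calculation. This sketch rests on assumptions not stated in the caption: signal is taken to be linear in the amount of clone sequence present, noise is ignored, and the flanking clones are taken to give the ratios for fully retained and fully deleted sequence. The function name and default values are hypothetical.

```python
def deleted_fraction(log2_observed, log2_retained=0.0, log2_deleted=-1.0):
    """Estimate the fraction of a clone's length lying inside a deletion.

    log2_observed: log2 ratio measured on the partially deleted clone.
    log2_retained: log2 ratio of neighboring clones outside the deletion.
    log2_deleted:  log2 ratio of neighboring clones fully inside it.
    """
    observed = 2.0 ** log2_observed
    retained = 2.0 ** log2_retained
    deleted = 2.0 ** log2_deleted
    # On a linear scale the ratio falls in proportion to the deleted length.
    fraction = (retained - observed) / (retained - deleted)
    return min(1.0, max(0.0, fraction))  # clamp against noise-driven overshoot


# A log2 ratio of about -0.415 (linear ratio 0.75) sits halfway between
# the retained (1.0) and single-copy-deleted (0.5) levels.
print(deleted_fraction(-0.415))
```

In practice measurement noise limits how finely the transition can be placed, so the estimate is better treated as a rough bound than as a precise coordinate.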
Figure 4 Copy number profiles of breast cancer cell lines and primary tumors show
characteristic differences. The upper panel shows the BRCA1 mutant cell line HCC
1937 (1). Portions of almost every chromosome are present at different copy
numbers. The genome is so variable that the biological significance of the ratio nor-
malization is not clear. The middle and lower panels show profiles from two ductal
invasive breast tumors. These have large regions of the genome at the same copy
number, which provides a reference level from which to measure gains and losses.
The tumor in the middle panel contains only a few single-copy changes, whereas the
one in the lower panel has one region of high-level amplification in addition to lower
ERRATA
An online log of corrections to Annual Review of Genomics
and Human Genetics chapters may be found
at http://genom.annualreviews.org/