Clsi MM12 P

MM12-P
Vol. 25 No. 18
Diagnostic Nucleic Acid Microarrays; Proposed Guideline
This proposed document is published for wide and thorough review in the new, accelerated
Clinical and Laboratory Standards Institute (CLSI) consensus-review process. The document will
undergo concurrent consensus review, Board review, and delegate voting (i.e., candidate for
advancement) for 90 days.
Please send your comments on scope, approach, and technical and editorial content to CLSI.
Comment period ends
11 October 2005
The subcommittee responsible for this document will assess all comments received by the end of
the comment period. Based on this assessment, a new version of the document will be issued.
Readers are encouraged to send their comments to Clinical and Laboratory Standards Institute, 940
West Valley Road, Suite 1400, Wayne, PA 19087-1898 USA; Fax: +610.688.0700, or to the
following e-mail address: customerservice@clsi.org
This guideline provides recommendations for many aspects of the array process
including: a method overview; nucleic acid extraction; the preparation, handling, and
assessment of genetic material; quality control; analytic validation; and interpretation and
reporting of results.
A guideline for global application developed through the Clinical and Laboratory
Standards Institute consensus process.
Clinical and Laboratory Standards Institute
Providing NCCLS standards and guidelines, ISO/TC 212 standards, and ISO/TC 76 standards
The Clinical and Laboratory Standards Institute (CLSI, formerly NCCLS) is an international,
interdisciplinary, nonprofit, standards-developing, and educational organization that promotes the
development and use of voluntary consensus standards and guidelines within the healthcare community.
It is recognized worldwide for the application of its unique consensus process in the development of
standards and guidelines for patient testing and related healthcare issues. Our process is based on the
principle that consensus is an effective and cost-effective way to improve patient testing and
healthcare services.
In addition to developing and promoting the use of voluntary consensus standards and guidelines, we
provide an open and unbiased forum to address critical issues affecting the quality of patient testing
and health care.
PUBLICATIONS
A document is published as a standard, guideline, or committee report.
Standard – A document developed through the consensus process that clearly identifies specific,
essential requirements for materials, methods, or practices for use in an unmodified form. A standard
may, in addition, contain discretionary elements, which are clearly identified.
Guideline – A document developed through the consensus process describing criteria for a general
operating practice, procedure, or material for voluntary use. A guideline may be used as written or
modified by the user to fit specific needs.
Report – A document that has not been subjected to consensus review and is released by the Board of
Directors.
CONSENSUS PROCESS
The CLSI voluntary consensus process is a protocol establishing formal criteria for:
• the authorization of a project
• the development and open review of documents
• the revision of documents in response to comments by users
• the acceptance of a document as a consensus standard or guideline.
Most documents are subject to two levels of consensus: “proposed” and “approved.” Depending on the
need for field evaluation or data collection, documents may also be made available for review at an
intermediate consensus level.
Proposed – A consensus document undergoes the first stage of review by the healthcare community as a
proposed standard or guideline. The document should receive a wide and thorough technical review,
including an overall review of its scope, approach, and utility, and a line-by-line review of its
technical and editorial content.
Approved – An approved standard or guideline has achieved consensus within the healthcare community.
It should be reviewed to assess the utility of the final document, to ensure attainment of consensus
(i.e., that comments on earlier versions have been satisfactorily addressed), and to identify the need
for additional consensus documents.
Our standards and guidelines represent a consensus opinion on good practices and reflect the
substantial agreement by materially affected, competent, and interested parties obtained by following
CLSI’s established consensus procedures. Provisions in CLSI standards and guidelines may be more or
less stringent than applicable regulations. Consequently, conformance to this voluntary consensus
document does not relieve the user of responsibility for compliance with applicable regulations.
COMMENTS
The comments of users are essential to the consensus process. Anyone may submit a comment, and all
comments are addressed, according to the consensus process, by the committee that wrote the document.
All comments, including those that result in a change to the document when published at the next
consensus level and those that do not result in a change, are responded to by the committee in an
appendix to the document. Readers are strongly encouraged to comment in any form and at any time on
any document. Address comments to the Clinical and Laboratory Standards Institute, 940 West Valley
Road, Suite 1400, Wayne, PA 19087, USA.
VOLUNTEER PARTICIPATION
Healthcare professionals in all specialties are urged to volunteer for participation in CLSI projects.
Please contact us at customerservice@clsi.org or +610.688.0100 for additional information on committee
participation.
MM12-P
ISBN 1-56238-578-X
Volume 25 Number 18 ISSN 0273-3099
Diagnostic Nucleic Acid Microarrays; Proposed Guideline
Joseph L. Hackett, PhD
Kellie J. Archer, PhD
Adolfas K. Gaigalas, PhD
Carleton T. Garrett, MD, PhD
Loren J. Joseph, MD
Walter H. Koch, PhD
L. J. Kricka, D Phil
Ronald C. McGlennen, MD
Vivianna Van Deerlin, MD, PhD
Gregory B. Vasquez, PhD
Abstract
Clinical and Laboratory Standards Institute document MM12-P, Diagnostic Nucleic Acid Microarrays; Proposed Guideline
provides general recommendations for the operation of diagnostic nucleic acid microarrays. The recommendations cover nucleic
acid extraction; preparation, handling, and assessment of genetic material; and interpretation and reporting of results. The
guideline addresses array-based detection of variations in DNA sequence and gene expression analysis as it relates to: heritable
variations, somatic changes, methylation profiling, pathogen profiling including antibiotic resistance analysis, expression
profiling, and gene dosage/comparative genomic hybridization.
Clinical and Laboratory Standards Institute (CLSI). Diagnostic Nucleic Acid Microarrays; Proposed Guideline. CLSI document
MM12-P (ISBN 1-56238-578-X). Clinical and Laboratory Standards Institute, 940 West Valley Road, Suite 1400, Wayne,
Pennsylvania 19087-1898 USA, 2005.
The Clinical and Laboratory Standards Institute consensus process, which is the mechanism for moving a document through
two or more levels of review by the healthcare community, is an ongoing process. Users should expect revised editions of any
given document. Because rapid changes in technology may affect the procedures, methods, and protocols in a standard or
guideline, users should replace outdated editions with the current editions of CLSI/NCCLS documents. Current editions are
listed in the CLSI catalog, which is distributed to member organizations, and to nonmembers on request. If your organization is
not a member and would like to become one, and to request a copy of the catalog, contact us at: Telephone: 610.688.0100; Fax:
610.688.0700; E-Mail: customerservice@clsi.org; Website: www.clsi.org
This publication is protected by copyright. No part of it may be reproduced, stored in a retrieval system,
transmitted, or made available in any form or by any means (electronic, mechanical, photocopying,
recording, or otherwise) without prior written permission from Clinical and Laboratory Standards
Institute, except as stated below.
Clinical and Laboratory Standards Institute hereby grants permission to reproduce limited portions of this
publication for use in laboratory procedure manuals at a single site, for interlibrary loan, or for use in
educational programs provided that multiple copies of such reproduction shall include the following
notice, be distributed without charge, and, in no event, contain more than 20% of the document’s text.
Reproduced with permission, from CLSI publication MM12-P—Diagnostic Nucleic Acid

Microarrays; Proposed Guideline (ISBN 1-56238-578-X). Copies of the current edition
may be obtained from Clinical and Laboratory Standards Institute, 940 West Valley
Road, Suite 1400, Wayne, Pennsylvania 19087-1898, USA.
Permission to reproduce or otherwise use the text of this document to an extent that exceeds the
exemptions granted here or under the Copyright Law must be obtained from Clinical and Laboratory
Standards Institute by written request. To request such permission, address inquiries to the Executive Vice
President, Clinical and Laboratory Standards Institute, 940 West Valley Road, Suite 1400, Wayne,
Pennsylvania 19087-1898, USA.
Copyright ©2005. Clinical and Laboratory Standards Institute.
Suggested Citation
(Clinical and Laboratory Standards Institute. Diagnostic Nucleic Acid Microarrays; Proposed Guideline.
CLSI document MM12-P [ISBN 1-56238-578-X]. Clinical and Laboratory Standards Institute, 940 West
Valley Road, Suite 1400, Wayne, Pennsylvania 19087-1898 USA, 2005.)
Proposed Guideline
July 2005
ISBN 1-56238-578-X
ISSN 0273-3099
Committee Membership
Area Committee on Molecular Methods

Roberta M. Madej, MS, MT
Chairholder
Roche Molecular Systems, Inc.
Pleasanton, California

Zhimin Cao, MD, PhD
New York State Dept. of Health
Albany, New York

Maurizio Ferrari, MD
International Federation of Clinical Chemistry
Milan, Italy

Frederick S. Nolte, PhD
Emory University Hospital
Atlanta, Georgia

Timothy J. O’Leary, MD, PhD
Biomedical Laboratory Research and Development Service
Department of Veterans Affairs
Washington, D.C.

Carolyn Sue Richards, PhD, FACMG
Oregon Health Sciences University
Portland, Oregon

Uwe Scherf, PhD
FDA Center for Devices and Radiological Health
Rockville, Maryland

Michael A. Zoccoli, PhD
Celera Diagnostics
Alameda, California

Advisors

Dale H. Altmiller, PhD
O.U. Medical Center
Edmond, Oklahoma

Lee Ann Baxter-Lowe, PhD
University of California, San Francisco
San Francisco, California

Mark Evans, PhD
American Medical Association
Chicago, Illinois

Cristina Gianchetti, PhD
Gen-Probe
San Diego, California

Leslie Hall, MMSc
Mayo Clinic
Rochester, Minnesota

Robert B. Jenkins, MD, PhD
Mayo Clinic
Rochester, Minnesota

Alan L. Landay, PhD
Rush-Presby.-St. Lukes Medical Center
Chicago, Illinois

Mario Pazzagli, PhD
University of Florence
Florence, Italy

Richard S. Schifreen, PhD, DABCC
Promega Corporation
Madison, Wisconsin

Laurina O. Williams, PhD, MPH
Centers for Disease Control and Prevention
Atlanta, Georgia

Janet L. Wood, MT(ASCP)
BD Diagnostic Systems
Sparks, Maryland
Subcommittee on Molecular Methods for Microarrays

Joseph L. Hackett, PhD
Chairholder
FDA Center for Devices/Radiological Health
Rockville, Maryland

Kellie J. Archer, PhD
Medical College of Virginia
Virginia Commonwealth University
Richmond, Virginia

Walter H. Koch, PhD
Roche Molecular Systems
Alameda, California

Larry J. Kricka, D Phil
Hospital of the University of Pennsylvania
Philadelphia, Pennsylvania

Ronald C. McGlennen, MD
University of Minnesota
Minneapolis, Minnesota

Vivianna Van Deerlin, MD, PhD
University of Pennsylvania Health System
Philadelphia, Pennsylvania

Gregory B. Vasquez, PhD
National Institute of Standards and Technology
Gaithersburg, Maryland

Advisors

Victoria Derbyshire, PhD
Wadsworth Center
Albany, New York

Adolfas K. Gaigalas, PhD
National Institute of Standards and Technology
Gaithersburg, Maryland

Carleton T. Garrett, MD, PhD
Virginia Commonwealth University Health Systems
Richmond, Virginia

Federico A. Monzon, MD
University of Pittsburgh School of Medicine
Pittsburgh, Pennsylvania

Sangeeta M. Rataul, PhD
U.S. Food and Drug Administration
Bothell, Washington

Robert F. Vogt, Jr., PhD
Centers for Disease Control and Prevention
Atlanta, Georgia

Staff

Clinical and Laboratory Standards Institute
Wayne, Pennsylvania

Lois M. Schmidt, DA
Staff Liaison

Donna M. Wilhelm
Editor

Melissa A. Lewis
Assistant Editor

John J. Zlockie, MBA
Vice President, Standards
Acknowledgement
This guideline was prepared by CLSI, as part of a cooperative effort with IFCC to work toward the
advancement and dissemination of laboratory standards on a worldwide basis. CLSI gratefully
acknowledges the participation of IFCC in this project. The IFCC expert for this project is Dr. Maurizio
Ferrari.
The Subcommittee on Molecular Methods for Microarrays would like to recognize Daniel Pinkel, PhD,
Randy Davis, MBA, and Donna Albertson, PhD, University of California San Francisco Comprehensive
Cancer Center, for their contribution of Section 7.5, Detection of Gene Dosage Abnormalities Using
Comparative Genomic Hybridization.
Contents
Abstract
Committee Membership
Foreword
1 Scope
2 Introduction
2.1 Diagnostic Microarrays
2.2 Diagnostic Utility
2.3 Advantages and Disadvantages
2.4 Ethical, Legal, and Social Considerations
2.5 Special Issues for Application of Microarray Technologies to Diagnosis
3 Standard Precautions
4 Terminology
4.1 Definitions
4.2 Acronyms/Abbreviations
5 Method Overview
5.1 Solid Supports
5.2 Probe Synthesis and Attachment to Support
5.3 Signal Generation and Detection
6 Analytical
6.1 Nucleic Acid Extraction
6.2 Gene Chemistry
6.3 Hybridization
6.4 Posthybridization
6.5 Signal Generation and Detection
6.6 Laboratory-developed (“Home-Brew”) Microarray Assay
7 Genetic Data Analysis
7.1 Data Elements
7.2 Heritable Changes
7.3 Methylation Analysis
7.4 Pathogen Profiling
7.5 Detection of Gene Dosage Abnormalities Using Comparative Genomic Hybridization
8 Gene Expression Data Analysis
8.1 Overview
8.2 Data Elements
8.3 Low-Level Analysis
8.4 Gene Filtering and Identification of Differentially Expressed Genes
8.5 High-Level Data Analysis – Unsupervised Learning Algorithms
8.6 High-Level Data Analysis – Supervised Learning and Classification Procedures
9 Validation
9.1 Analytical Validation
9.2 Diagnostic Utility
10 Quality Control/Quality Assurance
10.1 Preanalytical Considerations
10.2 Analytical Phases
10.3 Global QC Issues
10.4 Reporting
10.5 Quality Assurance (QA)
References
The Quality System Approach
Related CLSI/NCCLS Publications
Foreword
Molecular genetics has now become firmly entrenched as the third major subdiscipline of clinical
laboratory medical genetics, emerging more recently than the other subspecialities, biochemical genetics
and cytogenetics. As with any diagnostic method or test, molecular genetic testing must be developed
and practiced under appropriate conditions in order to fully benefit the patient. The purpose of this
guideline is to define conditions and principles that will optimize the provision of accurate molecular information.
In producing MM12-P, the intention of the CLSI Subcommittee on Molecular Methods for Microarrays
was to reach a consensus so an approved guideline can be distributed to laboratories that use molecular
diagnostic tests. The subcommittee also intends the document to be broad in perspective and an
educational resource for molecular genetics.
Key Words
Amplification, gene, genetic disease, molecular diagnostic test, mutation detection, nucleic acid, Southern
blot
Invitation for Participation in the Consensus Process
An important aspect of the development of this and all Clinical and Laboratory Standards Institute
documents should be emphasized, and that is the consensus process. Within the context and operation of
Clinical and Laboratory Standards Institute, the term “consensus” means more than agreement. In the
context of document development, “consensus” is a process by which Clinical and Laboratory Standards
Institute, its members, and interested parties (1) have the opportunity to review and to comment on any
Clinical and Laboratory Standards Institute publication; and (2) are assured that their comments will be
given serious, competent consideration. Any Clinical and Laboratory Standards Institute document will
evolve as will technology affecting laboratory or healthcare procedures, methods, and protocols; and
therefore, is expected to undergo cycles of evaluation and modification.
The Area Committee on Molecular Methods has attempted to engage the broadest possible worldwide
representation in committee deliberations. Consequently, it is reasonable to expect that issues remain
unresolved at the time of publication at the proposed level. The review and comment process is the
mechanism for resolving such issues.
The Clinical and Laboratory Standards Institute voluntary consensus process is dependent upon the
expertise of worldwide reviewers whose comments add value to the effort. At the end of a 90-day
comment period, each subcommittee is obligated to review all comments and to respond in writing to all
that are substantive. Where appropriate, modifications will be made to the document, and all comments
along with the subcommittee's responses will be included as an appendix to the document when it is
published at the next consensus level.
Diagnostic Nucleic Acid Microarrays; Proposed Guideline
1 Scope
MM12—Diagnostic Nucleic Acid Microarrays addresses array-based detection of variations in DNA
sequence and gene expression analysis as it relates to:
• heritable variations;
• somatic changes;
• methylation profiling;
• pathogen profiling including antibiotic resistance analysis;
• expression profiling; and
• gene dosage/comparative genomic hybridization (CGH).
This guideline provides recommendations for many aspects of the microarray process, including: a
method overview; nucleic acid extraction; the preparation, handling, and assessment of genetic material;
and interpretation and reporting of results. Quality control and analytic validation are also addressed.
This guideline is limited to clinically relevant targets and does not address tissue and protein microarrays,
non-nucleic acid microarrays, or research applications of microarrays.
2 Introduction
2.1 Diagnostic Microarrays
Diagnostic nucleic acid microarrays are a relatively recent outgrowth of more traditional molecular
diagnostic methods, and have the potential to allow rapid, simultaneous genetic testing of individuals for
multiple traits (e.g., polymorphisms, haplotypes) or multiple different mutations in a single disease gene.
Nucleic acid-based microarrays also have diagnostic potential for identification of infectious disease
organisms from a variety of sample matrices. Other configurations of microarrays enable comparative
surveys of gene expression in selected tissues and samples, resulting in diagnostic and response
predictions. These devices represent multiplexed analysis of clinical specimens and have unique
manufacturing and quality control concerns, as well as analytical and clinical validation differences and
novel interpretation algorithms when compared to simple unitary tests.
2.2 Diagnostic Utility
The usefulness of microarrays for diagnostic applications can be traced to advances in the identification
of disease genes for a number of genetic diseases and susceptibilities, and to the increased knowledge of
transcriptional loci provided by the sequencing and analysis of the human genome, as well as
development of molecular signatures for identification of disease-causing organisms. It is now possible to
rapidly test for specific mutations, polymorphisms, and gene expression patterns that may direct medical
management, prophylaxis, and treatment for any number of conditions. These tests may complement or
supplant more traditional diagnostic methods, and in some cases, may be the only available approach for
diagnosis. The utility of diagnostic microarrays encompasses predicting disease susceptibility, identifying
pathological organisms, screening for carriers of recessive traits, performing prenatal diagnosis, making
earlier or more reliable diagnosis of cancer, classifying tissues and tumors by molecular signature, and
making treatment decisions based on polymorphic markers of response and toxicity.
2.3 Advantages and Disadvantages
Nucleic acid-based tests may represent the most fundamental and definitive approach to the diagnosis of
both hereditary and somatic genetic alterations. These diseases are by definition a result of lesions in an
individual’s DNA, whether or not the disease is inherited. Tests for heritable diseases may usually be
performed on any tissue, including peripheral blood, while those tests that detect somatic lesions
generally require the sampling of the involved tissue (e.g., a tumor). With evolving technology, some tests
for somatic changes can now even be performed on peripheral blood as well, eliminating the need for
invasive procedures such as biopsy. The ability to amplify target genes, loci, or even whole genomes or
transcriptomes allows for the use of minimal sample at sufficient sensitivity for diagnosis. Advances in
bioinformatics and algorithm design can enable division of a single diagnostic entity into two or more
subsets with different outcome and therapy implications. Some configurations of microarrays allow
multiple independent tests to be run simultaneously, resulting in conservation of sample and uniformity of
test conditions. Identification, subtyping, and fingerprinting of pathogenic organisms may be
accomplished more rapidly using microarray platforms than when more traditional methods are
employed.
Microarray technologies for diagnosis, however, may yield results of increased complexity over
traditional tests, which argues for increased caution in interpretation and application to medical decision-
making. The potential, in the case of inherited disease, to reveal fundamental (negative) information about
a patient’s own genetic makeup, and that of blood relatives as well, many of whom may not recognize
that they are at risk, is significant and raises special concern over who should and should not have access
to the genetic result. Inherent difficulties are encountered in attempting to quality control certain
microarray configurations, which could lead to unexpected and undesirable results. In the case of
pathogen identification, the need for a priori knowledge of strain DNA changes, and appropriate selection
of organisms represented on an array, can limit usefulness if sufficient care is not taken in designing and
using microarrays for microbiological applications.
2.4 Ethical, Legal, and Social Considerations¹
Detection of molecular lesions in inherited diseases has certain unique ethical, legal, and social
implications about which much discussion has already occurred. Ethical and social problems of note are
the potential for discrimination, stigmatization, and anxiety, and the impact on reproductive decisions and family
dynamics. Also problematic are diseases that can be diagnosed with molecular methods, but for which no
treatment can be offered. Careful consideration of whom to test, and how to transmit the results, is called
for at the earliest possible stages of the development process.
Diagnosis or classification of diseases resulting from somatic mutations is less susceptible to ethical,
legal, and social problems, although the medical decisions made regarding whom to treat, and what
treatment options are available, may have ethical overtones. Already, treatment with some anticancer
therapies is restricted to those who express certain markers at particular levels. “Tailored” therapies are
likely to become more and more common, and decisions to treat or not to treat will be necessarily more
ethically demanding.
2.5 Special Issues for Application of Microarray Technologies to Diagnosis
An essential feature of microarrays is that multiple analytes are queried in a single assay. This feature raises
several verification issues. First, the manufacturer must assure the identity of each microarray
element (or feature) and assure its reproducible placement on the microarray substrate. The analytical
verification will demonstrate that the test adequately detects what it is supposed to detect. This can be
relatively straightforward in the case of microarrays designed to detect known mutations or
polymorphisms, or complex, in the case of microarrays designed to make comparative assessments of
gene expression levels. Verification of diagnostic utility may likewise be straightforward
(mutation/polymorphism/signature is present or absent when disease or organism is present, or vice versa)
or complex (multiple, possibly nonoverlapping patterns of expression define diagnosis or therapy
choices).
Reproducibility of the result from microarray to microarray, laboratory to laboratory, etc. must be
established, and is itself dependent not only on microarray quality, but on sample preparation parameters
that should be defined carefully. Complex algorithms used to create clusters or hierarchies must be
carefully evaluated to ensure that samples are routinely assigned to the correct bin, and to ensure that
there are no statistical flaws that would give false results. The microarrays must be “read” using a
detector configuration that does not introduce error or bias depending on the position or intensity of the
detected feature. All of these verification parameters must be satisfied for the development of a useful
nucleic acid microarray diagnostic.
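The reproducibility requirement described above can be illustrated with a minimal sketch. It assumes that replicate microarrays report signal intensities for the same features in the same order, and it uses the per-feature coefficient of variation (CV) as one possible reproducibility measure. The function names and the 0.15 CV limit are hypothetical choices made for illustration only; they are not values or methods specified by this guideline.

```python
from statistics import mean, stdev


def feature_cv(replicates):
    """Return the coefficient of variation (CV) of each feature across replicate arrays.

    replicates: list of arrays; each array is a list of signal intensities,
    with features in the same order on every array (an assumption of this sketch).
    """
    if len(replicates) < 2:
        raise ValueError("at least two replicate arrays are required")
    n_features = len(replicates[0])
    if any(len(array) != n_features for array in replicates):
        raise ValueError("all replicate arrays must contain the same features")

    cvs = []
    for values in zip(*replicates):  # one tuple of intensities per feature
        m = mean(values)
        # A zero mean gives an undefined CV; report it as infinite so it is flagged.
        cvs.append(stdev(values) / m if m else float("inf"))
    return cvs


def flag_irreproducible(replicates, max_cv=0.15):
    """Return indices of features whose CV exceeds max_cv (hypothetical limit)."""
    return [i for i, cv in enumerate(feature_cv(replicates)) if cv > max_cv]


# Three hypothetical replicate arrays with three features each.
replicates = [
    [100, 200, 50],
    [110, 190, 90],
    [90, 210, 10],
]
print(flag_irreproducible(replicates))
```

In this hypothetical data set, only the third feature varies widely between replicates, so it is the only one flagged. A laboratory would set its own acceptance limits during validation and would apply the same kind of comparison across laboratories, operators, and microarray lots.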
3 Standard Precautions
Because it is often impossible to know what might be infectious, all patient and laboratory specimens are
treated as infectious and handled according to “standard precautions.” Standard precautions are guidelines
that combine the major features of “universal precautions and body substance isolation” practices.
Standard precautions cover the transmission of all infectious agents and therefore are more
comprehensive than universal precautions, which are intended to apply only to transmission of
blood-borne pathogens. Standard and universal precaution guidelines are available from the U.S. Centers for
Disease Control and Prevention (Guideline for Isolation Precautions in Hospitals. Infection Control and
Hospital Epidemiology. CDC. 1996;17(1):53-80 and MMWR 1988;37:377-388). For specific precautions
for preventing the laboratory transmission of all infectious agents from laboratory instruments and
materials and for recommendations for the management of exposure to all infectious disease, refer to the
most current version of CLSI document M29—Protection of Laboratory Workers from Occupationally
Acquired Infections.
4 Terminology
4.1 Definitions
In this document, the following definitions of terms are used:
accuracy (of measurement) – closeness of the agreement between the result of a measurement and a true
value of the measurand (VIM93)2; NOTE: See the definition of measurand, below.
allele – 1) one of the alternative forms of a gene that may occupy a given locus; 2) one of the alternate
forms of a polymorphic DNA sequence that is not necessarily contained within a gene.
allele-specific oligonucleotide (ASO) – a nucleic acid probe of short length, exactly complementary to
either the normal or one of the mutant sequences of a target gene region, most often used for the detection
of point mutations.
amplification – the enzymatic replication in vitro of a target nucleic acid; NOTE: The polymerase chain
reaction (PCR) is a commonly used method of amplification.
analyte – component represented in the name of a measurable quantity; NOTE 1: In the type of quantity
“mass of protein in 24-hour urine,” “protein” is the analyte. In “amount of substance of glucose in
plasma,” “glucose” is the analyte. In both cases, the long phrase represents the measurand (ISO 17511)3;
NOTE 2: In the type of quantity “catalytic concentration of lactate dehydrogenase isoenzyme 1 in
plasma,” “lactate dehydrogenase isoenzyme 1” is the analyte (ISO 18153).4
analytical specificity – the ability of a measurement procedure to measure solely the measurand; NOTE:
The ability of a measurement procedure to distinguish the target sequence(s) or allele or mutation(s) from
other sequences/alleles in the sample or genome.
annealing – the hybridization of two complementary strands of nucleic acid, as in the hybridization of a
probe with the target DNA.
bias – the difference between the expectation of the test results and an accepted reference value (ISO
3534-1)5; NOTE: See the definition of trueness, below.
calibration – set of operations that establish, under specified conditions, the relationship between values
of quantities indicated by a measuring instrument or measuring system, or values represented by a
material measure or a reference material, and the corresponding values realized by standards (VIM93)2;
NOTE: According to the U.S. Code of Federal Regulations, calibration is the process of testing and
adjustment of an instrument, kit, or test system, to provide a known relationship between the
measurement response and the value of the substance being measured by the test procedure (42 CFR
493.1217).6
complementary – describing the property of two strands of nucleic acid that can hybridize by specific
base pairing between the nucleotides.
control//control material – a device, solution, or lyophilized preparation intended for use in the quality
control process; NOTE 1: The expected reaction or concentration of analytes of interest are known within
limits ascertained during preparation and confirmed in use; NOTE 2: Control materials are generally not
used for calibration in the same process in which they are used as controls.
denaturation – the conversion of double-stranded DNA or RNA to a single-stranded state with minimal
secondary structure; NOTE: This is done by heating, increasing the pH, or adding agents such as
formamide or urea; once denatured, nucleic acid molecules are available for hybridization with a primer
or probe.
detection limit – the lowest concentration of analyte that can be reported to be present at a specified
level of confidence, often taken to be the analyte concentration which reports a signal three standard
deviations above the background.
diagnostic sensitivity – the proportion of patients with a well-defined clinical disorder whose test values
are positive or exceed a defined decision limit (i.e., a positive result and identification of the patients who
have a disease); NOTE 1: The clinical disorder must be defined by criteria independent of the test under
consideration; See also true-positive ratio; NOTE 2: The European term diagnostic sensitivity is
equivalent to the U.S. term clinical sensitivity; NOTE 3: In Europe, the term “clinical” applies mostly to
clinical studies of drugs, under much more stringent conditions.
diagnostic specificity – the proportion of subjects who do not have a specified clinical disorder whose
test results are negative or within the defined decision limit; NOTE 1: It is the fraction of clinically true
negative classifications divided by the sum of clinically true-negative plus clinically false-positive
classifications; NOTE 2: This term is equivalent to the U.S. term “clinical specificity”; NOTE 3: In
Europe, the term “clinical” applies mostly to clinical studies of drugs, under much more stringent
conditions.
diagnostic test – a measurement or examination of a diagnostic specimen for the purpose of diagnosis,
prevention, or treatment of any disease, or the assessment of health or impairment of health of an
individual patient; NOTE: Laboratory tests are often called “in vitro diagnostic tests.”
diagnostic/confirmatory testing – testing generally performed to evaluate the genetic status of
individuals at increased risk for a particular disorder due to a positive family history or symptoms.
equivocal result – a test result within a specified range of the cut-off value, which cannot be interpreted
as either positive or negative; NOTE: In molecular genetics, equivocal test results may complicate risk
interpretations.
error (of measurement)//measurement error – the result of a measurement less a true value (or
accepted reference value) of the measurand (VIM93).2
false-negative rate – the rate needed for calculating negative predictive value incorporating prevalence.
false-negative ratio (FNR) – the ratio of subjects who have the disease, but who have a negative test
result (FN), to all subjects who have the disease (FN + True positives); FNR = FN/(FN + TP).
false-negative result (FN) – a negative test result for a patient or specimen who is positive for the
condition or constituent in question.
false-positive rate – the rate needed for calculating positive predictive value incorporating prevalence.
false-positive ratio (FPR) – the ratio of subjects who do not have the disease, but who have a positive
test result (FP), to all subjects who do not have the disease (FP + True negatives); FPR = FP/(FP + TN).
false-positive result//false positive (FP) – a positive test result for a patient or specimen that is negative
for the condition or constituent in question.
feature – a defined segment of single-stranded nucleic acid immobilized on a solid substrate (i.e.,
microarray) that is used to identify specific DNA or RNA molecules having a complementary sequence.
fiducial – refers to the use of geographic markings on a microarray that permit the orientation of that
device as to left, right, up, and down; NOTE: Fiducial markings are typically fluorochrome labeled
oligonucleotides comprised of irrelevant sequence; such markers do not hybridize to the target sequence,
and are therefore detected as fluorescent spots on the microarray independent of the controls or of the test
samples.
gel electrophoresis – separation of molecules in an electric field within a matrix of agarose or
polyacrylamide according to size and charge.
genotype – 1) the genetic makeup of an organism, or group of organisms, with reference to a single trait,
set of traits, or an entire complex of traits; 2) the specific allelic composition of a gene, or set of genes,
established at the DNA level; NOTE: See also phenotype.
hybridization – base pairing of complementary strands of nucleic acid by hydrogen bond formation, the
binding of probe to specific nucleic acid sequences, or amplification products; NOTE: Hybridization can
be performed with both nucleic acid target and probe in solution, or with either one bound to a solid
support such as a microtiter plate or glass.
imprecision – dispersion of independent results of measurements obtained under specified conditions;
NOTE: It is expressed numerically as standard deviation or coefficient of variation.
inaccuracy – the numerical difference between a value and the true value; NOTE: See accuracy.
linkage analysis – the assessment of genetic disease or carrier risk by observation of the cosegregation of
the disease phenotype with one or more polymorphic DNA markers; NOTE: The polymorphic markers
most often used are RFLPs (restriction fragment length polymorphisms), VNTRs (variable number of
tandem repeats), and/or STRs (short tandem repeats).
locus – 1) the position of a gene on a chromosome; NOTE: Different forms (alleles) of the gene may
occupy the locus; 2) the position on a chromosome of a DNA sequence that is not necessarily contained
within a gene.
measurand – particular quantity subject to measurement (VIM93)2; NOTE: This term and definition
encompass all quantities, while the commonly used term “analyte” refers to a tangible entity subject to
measurement; for example, “substance” concentration is a quantity that may be related to a particular
analyte.
measurement procedure – set of operations, described specifically, used in the performance of particular
measurements according to a given method (VIM93).2
mismatch – hybridization of two DNA or RNA strands that are less than 100% complementary.
negative predictive value – the likelihood that an individual with a negative test does not have the
disease, or other characteristic that the test is designed to detect.
northern blot – a solid phase membrane to which RNA, transferred from a gel after electrophoresis, is
bound so that it can be hybridized with a labeled nucleic acid probe.
phenotype – the observed biochemical, physiological, and/or morphological characteristics of an
individual, as determined by the genotype and the environment in which it is expressed; also, in a more
limited sense, the expression of some particular gene or genes.
polymerase chain reaction – a common method of DNA amplification, utilizing pairs of oligonucleotide
primers as start sites for repetitive rounds of DNA polymerase-catalyzed replication alternating with
denaturation in successive heating-cooling cycles.
polymorphism – the occurrence together in a population of two or more alternative genotypes, each at a
frequency greater than that which could be maintained by recurrent mutation alone; NOTE: A locus is
arbitrarily considered to be polymorphic if the rarer allele has a frequency of at least 0.01, so that the
heterozygote frequency is at least approximately 0.02.
positive predictive value – the likelihood that an individual with a positive test result has a particular
disease, or characteristic, that the test is designed to detect; NOTE: This varies with prevalence of the
disease unless the test is 100% specific.
precision – the closeness of agreement between independent test results obtained under stipulated
conditions (ISO 3534-1)5; NOTE: Precision is not typically represented as a numerical value but is
expressed quantitatively in terms of imprecision—the standard deviation (SD) or the coefficient of
variation (CV) of the results in a set of replicate measurements (ISO 3534-1).5
primary standard – standard that is designated or widely acknowledged as having the highest
metrological qualities and whose value is accepted without reference to other standards of the same
quantity (VIM93).2
probe – defined piece of single-stranded nucleic acid used to identify specific DNA or RNA molecules
bearing the complementary sequence; NOTE: In one usage, the probe is the molecule (e.g., cDNA,
oligonucleotide) immobilized on the solid support to form the array. Alternatively, probe defines the
labeled mixture in solution that hybridizes or reacts with target molecules attached to the support. The
latter definition is the original definition of probe, whereas the former has come into usage more recently.
reference material/reference preparation (RM) – a material or substance, one or more of whose
property values are sufficiently homogeneous and well established to be used for the calibration of an
apparatus, the assessment of a measurement method, or for assigning values to materials (VIM93).2
reference method – a thoroughly investigated method, in which exact and clear descriptions of the
necessary conditions and procedures are given for the accurate determination of one or more property
values; the documented accuracy and precision of the method are commensurate with the method’s use
for assessing the accuracy of other methods for measuring the same property values or for assigning
reference method values to reference materials.
repeatability (of results of measurements) – closeness of the agreement between results of successive
measurements of the same measurand carried out under the same conditions of measurement (VIM93).2
repeatability {/repeatability of a measuring system/instrument} – the ability of a measuring {system/}
instrument to provide closely similar indications for repeated applications of the same measurand under
the same conditions of measurement; NOTE 1: These conditions include reduction to a minimum of the
variations due to the observer; same measurement procedure; same measuring equipment used under the
same conditions, same location; and repetition over a short period of time; NOTE 2: Repeatability may
be expressed quantitatively in terms of the dispersion characteristics of the indications.
reproducibility (of results of measurements) – closeness of the agreement between the results of
measurements of the same measurand carried out under changed conditions of measurement (VIM93).2
restriction enzyme – restriction endonuclease; including any of a large family of bacterial enzymes that
cleave double-stranded DNA at specific nucleotide sequences.
restriction fragment length polymorphism (RFLP) – a type of DNA polymorphism; NOTE: The
classic type of RFLP is usually a dimorphism determined by the presence or absence of a specific
restriction endonuclease cutting site; a specific RFLP is defined by a combination of a specific probe and
a specific restriction endonuclease; upon Southern blot analysis, the allele containing the cutting site will
be seen as one or two shorter fragments (depending on whether the probe hybridizes with one or both
fragments); the allele without the cutting site will be seen as a longer fragment.
sample – one or more parts taken from a system, and intended to provide information on the system,
often to serve as a basis for decision on the system or its production; EXAMPLE: A volume of serum
taken from a larger volume of serum.
screening test – generally used to evaluate the genetic status of an asymptomatic individual who is not at
increased risk due to a positive family history.
sense strand – the strand of a duplex DNA that has the same nucleotide sequence as mRNA except that T
substitutes in DNA for U in RNA; NOTE: The sense strand is also called the “coding” strand; the other
strand, which is the actual template for mRNA synthesis, is the “anticoding” or “antisense” strand.
sensitivity – the change in response of a measuring system or instrument divided by the corresponding
change in the stimulus (modified from VIM93)2; NOTE 1: The sensitivity may depend on the value of
the stimulus (VIM93)2; NOTE 2: The sensitivity depends on the imprecision of the measurements of the
sample.
short tandem repeat (STR) – STRs, also known as microsatellite repeats, consist of di-, tri-, tetra-, or
pentanucleotide tandem repeats; NOTE: A specific locus can be highly polymorphic, having numerous
alleles which differ in the number of repeat units.
silanization – the preparation of glass or similar surfaces with silane; NOTE: Silane decreases the
amount of surface oxides and facilitates the linkages of carbon-based molecules to those surfaces.
signal//measurement signal – a quantity that represents the measurand and which is functionally related
to it; NOTE: Generally, a signal is the chemical, radioactive, luminescent, or colorimetric output of an
assay detection system.
solid phase – one of several support media to which either target or probe nucleic acids are immobilized;
NOTE: Examples include nylon or nitrocellulose membrane, beads, magnetic particles, microtiter plate
wells, glass, and silicon chips.
Southern blot – a solid-phase membrane to which DNA, transferred from a gel after electrophoresis, is
bound so that it can be hybridized with a labeled nucleic acid probe.
specificity – the ability of a measurement procedure to measure solely the measurand; NOTE:
Specificity has no numerical value in this context.
specimen (patient) – the discrete portion of a body fluid or tissue taken for examination, study, or
analysis of one or more quantities or characteristics to determine the character of the whole.
stringency – the degree of specificity in a DNA hybridization or annealing reaction; NOTE 1:
Stringency is adjusted by altering the salt concentrations in the buffer and/or adjusting the temperature of
the reaction; for example, increasing the salts decreases the stringency (more DNA will bind less
specifically, allowing for more base-pair mismatches to occur); higher temperature increases the
stringency (DNA will bind less tightly but more specifically); NOTE 2: Stringency considerations apply
to all annealing steps in amplification, as well as hybridization and wash conditions in detection; NOTE
3: Conditions may be designed to allow only perfect hybridization, or to tolerate a certain degree of
mismatch; NOTE 4: Reduced stringency can be employed for specific applications such as those in
which microarrays from one organism (e.g., human) are used to study a related organism (e.g., mouse).
target//template – the area of the nucleic acid to be detected or amplified for detection; NOTE: As used
in microarrays, the target is the molecule in solution (new usage) or attached to the glass support (original
usage).
template – See target.
thermal cycler – an automated, programmable instrument used to repetitively heat and cool a set of
reaction tubes to promote DNA replication, denaturation, and hybridization in the polymerase chain
reaction.
trueness – closeness of agreement between the average value obtained from a large series of test results
and an accepted reference value (ISO 3534-1)5; NOTE: The measure of trueness is usually expressed in
terms of bias.
transcription – the cellular process in which RNA is synthesized from a DNA template.
validation – confirmation through the provision of objective evidence, that requirements for a specific
intended use or application have been fulfilled (ISO 9000)7; NOTE 1: WHO defines validation as the
action {or process} of proving that a procedure, process, system, equipment, or method used works as
expected and achieves the intended result (WHO-BS/95.1793)8; NOTE 2: The components of validation
are quality control, proficiency testing, validation of employee competency, instrument calibration, and
correlation with clinical findings.
variable number of tandem repeats (VNTR) – any polymorphic locus consisting of repeat units of any
size (usually microsatellites or minisatellites); NOTE 1: Originally, this term was synonymous with
minisatellite repeats, which consist of tandemly arranged repeating units of 30 to 35 base pairs; NOTE 2:
The repeat unit of a VNTR (minisatellite) has a variable sequence, but contains a core sequence of 10 to
15 base pairs; NOTE 3: A specific locus can be highly polymorphic, having numerous alleles which
differ in the number of repeat units.
verification – confirmation through the provision of objective evidence that specified requirements have
been fulfilled (ISO 9000)7; NOTE: A one-time process completed to determine or confirm test
performance characteristics before the test system is used for patient testing.
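The ratio definitions in this section (diagnostic sensitivity, diagnostic specificity, false-negative ratio, false-positive ratio, and the predictive values) can be illustrated with a short calculation. The sketch below is not part of the guideline; the function names and the example counts are hypothetical, and the predictive values are derived from sensitivity, specificity, and prevalence using the standard relationships.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the ratios defined above from counts of TP, FP, TN, and FN."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive ratio
        "specificity": tn / (tn + fp),  # true-negative ratio
        "fnr": fn / (fn + tp),          # FNR = FN/(FN + TP)
        "fpr": fp / (fp + tn),          # FPR = FP/(FP + TN)
    }


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a population with the given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical study: 100 affected and 1000 unaffected subjects.
metrics = diagnostic_metrics(tp=90, fp=50, tn=950, fn=10)
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(
        metrics["sensitivity"], metrics["specificity"], prevalence
    )
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Running the loop shows the positive predictive value rising with prevalence while sensitivity and specificity stay fixed, which is the dependence described in the NOTE under positive predictive value. With a specificity of exactly 1.0 there are no false positives, so the positive predictive value is 1.0 at any nonzero prevalence.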
4.2 Acronyms/Abbreviations
ARMS amplification refractory mutation system
ASO allele-specific oligonucleotide
BACs bacterial artificial chromosomes
bDNA branched DNA
CARD catalyzed reporter deposition reaction
CCD charge coupled device
CGH comparative genomic hybridization
cDNA complementary DNA
CFTR cystic fibrosis transmembrane conductance regulator gene
CL chemiluminescence
DGGE denaturing gradient gel electrophoresis
DNA deoxyribonucleic acid
dATP deoxyadenosine 5' triphosphate
dCTP deoxycytidine 5' triphosphate
dGTP deoxyguanosine 5' triphosphate
DMT dimethoxytrityl
dNTPs deoxyribonucleoside triphosphates
dTTP deoxythymidine 5' triphosphate
dUTP deoxyuridine 5' triphosphate
ddATP dideoxyadenosine 5' triphosphate
ddCTP dideoxycytidine 5' triphosphate
ddGTP dideoxyguanosine 5' triphosphate
ddTTP dideoxythymidine 5' triphosphate
dsDNA double-stranded DNA
EDTA ethylenediaminetetraacetic acid
EST expressed sequence tag
LCR ligase chain reaction
LDR ligase detection reaction
MALDI-TOF matrix-assisted laser desorption/ionization time-of-flight
MBEI model-based expression indexes
MMLV Moloney murine leukemia virus
NASBA nucleic acid sequence-based amplification
NIR near-infrared
NVOC nitroveratryloxycarbonyl
PCR polymerase chain reaction
PET positron emission tomography
PMMA polymethyl methacrylate
PMT photomultiplier tube
RCA rolling circle amplification
RFLP restriction fragment length polymorphism
RMA robust multichip average
RNA ribonucleic acid
RT-PCR reverse transcriptase-polymerase chain reaction
SBE single base extension reactions
SDA strand displacement amplification
SDS sodium dodecyl sulfate
SNE single nucleotide extension
SNPs single nucleotide polymorphisms
SSCP single-stranded conformational polymorphism
STR short tandem repeat
TAE TRIS-acetate-EDTA
TAS transcription-based amplification system
TBE TRIS-borate-EDTA
TE TRIS-EDTA
TMR tetramethylrhodamine
UNG uracil-N-glycosylase
5 Method Overview
A diverse range of methods has been developed for producing DNA and oligonucleotide microarrays,9-22
and these differ in the type of solid support, mode of probe attachment, probe density, and the labeling and
detection scheme for target bound to the arrayed probes.
5.1 Solid Supports
A solid support is the insoluble material onto which a microarray is formed. It is usually a flat sheet of
material, and the microarray is formed on the top surface. Size of the solid support can vary considerably
for microarray formats, ranging from large 8 cm x 12 cm or 22 cm x 22 cm sheets of nylon for traditional
array formats,23,24 to small 1 x 3 inch glass microscope slides, and miniature 1.28 cm x 1.28 cm glass
chips.
Materials for solid supports include glass,11,25,26 and plastics such as aminated polypropylene.27 In some
cases the solid support simply provides rigidity, and it is coated with a layer of substance specifically
designed for the immobilization of probes [e.g., polyacrylamide gel,28-30 streptavidin,31 polylysine,32
nitrocellulose-based polymer].33
There are some special considerations for the selection of solid supports. Solid supports for the early
macroarrays were porous (e.g., nitrocellulose, nylon),23,34 but as array density increased, the spreading of
solutions of applied probes caused overlap of adjacent probes and a loss of definition in the array.
Subsequently, nonporous supports have been favored on which spotting produces well-defined and non-
overlapping spots. Glass has a number of advantages as a solid support for microarrays. It is durable and
has a low intrinsic fluorescence, thus minimizing background signal in an assay using fluorophore labels.
Also, it is easily activated for covalent attachment chemistry, and nonporous, thus minimizing
hybridization volume and enhancing the kinetics of hybridization.
The solid support can have an active role in forming the array. Microelectrode microarrays are used to
form arrays by directing probes to specific microelectrode locations31,35 or initiating probe attachment via
an electrochemically initiated polymerization reaction.36-39
5.2 Probe Synthesis and Attachment to Support
Two different approaches are employed to form a microarray. A presynthesized probe [e.g.,
oligonucleotide or cDNA (~500 to 5000 bases long)] can be attached, or an oligonucleotide probe can be
synthesized in situ on the surface of the solid support.
5.2.1 Attachment of Presynthesized Probes
A presynthesized probe can be attached to the surface of the substrate by physical and covalent methods.
5.2.1.1 Physical Attachment Methods
Attachment of presynthesized probes by “spotting” is the most commonly employed method of making
microarrays, and there are numerous spotting devices available for this purpose. The process simply
involves depositing, pipetting, printing, or spotting nanoliter or picoliter volumes of probe solutions at the
intended location on the surface of the solid support (e.g., using an ink jet or pins), and allowing the probe
to attach to the support surface via physical adsorption.32,40-42 Sometimes the surface is pretreated to
increase the attachment forces (for example, a positively charged polylysine layer is effective for the
attachment of negatively charged probes [e.g., cDNA] via electrostatic interactions).32
A planar microarray of microelectrodes (e.g., 80-µm² platinum electrodes at a 200 µm center-to-center
spacing on a 2-mm² array) provides a versatile alternative to spotting as a method for producing a
physically immobilized microarray. Probes are immobilized at unique electrodes in the microarray by
switching on the electrode. As a result, charged molecules (e.g., biotinylated oligonucleotides) are
transported by electrophoresis to the electrode where they bind with streptavidin in a permeation layer (a
porous hydrogel) covering the electrode. The highly stable streptavidin: biotinylated oligonucleotide
complex is retained within the pores of the permeation layer and is accessible for reaction with target
molecules.31,35
5.2.1.2 Covalent Attachment Methods
Presynthesized 5'-amino modified probes can also be covalently attached to the surface of a solid support.
The surface is activated so that it will react directly with the spotted probe—e.g., epoxide groups on a
glass surface43,44 or N-oxysuccinimide groups on a polystyrene surface.45 Indirect coupling utilizes a
bifunctional coupling reagent, such as phenylene di-isocyanate or glutaraldehyde, to covalently link the
probe to amino groups on the support surface.46,47 Alternatively, a 3'-amino-modified oligonucleotide
probe is linked to surface thiol groups using N-(gamma-maleimidobutyryloxy) succinimide ester.48
Another process involves bromoacetylation of aminosilyl-modified glass or aminated polylysine surfaces
and reaction with thiophosphorylated oligonucleotides.49 Photochemical methods have been developed,
and these attachment methods exploit the photochemical reactivity of thymine bases. Exposure to
ultraviolet radiation (254 nm) causes the thymine residues of DNA molecules to covalently link to amine
groups on an amine-derivatized glass surface.11
Covalent attachment chemistry is also used to immobilize oligonucleotide probes in arrays of gel pads. A
20-µm thin gel layer is formed on a glass slide by a gel photopolymerization technique, and an array of
gel pads (e.g., 40 µm x 40 µm or 100 µm x 100 µm) is produced by scribing the layer. Glass areas
between the pads are rendered hydrophobic by silanization. The gel is then activated for coupling by
treatment with hydrazine hydrate that converts some amide groups into hydrazide groups. The probe is
functionalized at the 3'-end with a 3-methyluridine group, and this is then oxidized with periodate to
produce a reactive dialdehyde. A pin device is then used for manual application of a solution of the
periodate-treated probe onto the activated polyacrylamide gel pad. The reactive aldehyde groups react
with the hydrazide groups in the gel pads, resulting in covalent immobilization of the probes.28-30,50
Electrochemically directed copolymerization has also been employed to immobilize probes on a
microarray. An electrode microarray (48 or 128 addressable 50 µm x 50 µm gold microelectrodes on a
silicon chip) provides the layout for the array. A series of 5'-pyrrole modified oligonucleotides are
electrochemically copolymerized with pyrrole on the surface of the electrodes.36-39 Successive
copolymerizations at individual electrodes produce an array of covalently attached oligonucleotides.
5.2.2 In situ Synthesis of Probes
In situ step-wise conventional oligonucleotide synthesis methods based on phosphoramidite chemistry
have been used to build oligonucleotide probes nucleotide by nucleotide at defined locations on the
surface of a solid support to form a microarray.
5.2.2.1 Physical Barrier
The location where synthesis will occur can be defined using physical barriers pressed in contact with the
surface of the substrate, such as a glass plate.44,51-54 Reactants are flowed along the microchannel formed
by two parallel barriers, and nucleotides are attached to the surface of the plate in millimeter-wide stripes.
Repositioning of the barriers orthogonally to the initial position leads to covalent attachment of the next
nucleotide in the sequence of the intended oligonucleotide.
5.2.2.2 Surface Tension
Features of an oligonucleotide microarray can also be defined and separated by differential surface
tension. A photolithographic method is used to create spatially addressable, circular features containing
an amino-terminated organosilane coupled to the glass through a siloxane linkage. Each circular feature is
bounded by a hydrophobic perfluorosilanated surface that localizes chemical reactions within the defined
circular site. The efficiencies of phosphoramidite-based oligonucleotide synthesis on the microarrays for
the individual amidite bases were 98.8% (dT), 98.0% (dA), 97.0% (dC), and 97.6% (dG).55
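To show why per-step coupling efficiency matters for in situ synthesis, the following sketch estimates the fraction of full-length product for a probe sequence. It rests on a simplifying assumption not stated in the text: every coupling step is independent and every base in the sequence is coupled at the efficiency reported above for its amidite. The function name and example sequences are hypothetical.

```python
from math import prod

# Step efficiencies reported above for the individual amidite bases.
STEP_EFFICIENCY = {"T": 0.988, "A": 0.980, "C": 0.970, "G": 0.976}


def full_length_fraction(sequence):
    """Estimate the fraction of strands reaching full length.

    Assumes independent coupling steps, each succeeding with the
    efficiency listed for its base (a simplification for illustration).
    """
    return prod(STEP_EFFICIENCY[base] for base in sequence.upper())


for probe in ("T" * 20, "C" * 20, "ACGT" * 5):
    print(f"{probe}: {full_length_fraction(probe):.3f}")
```

Under these assumptions a 20-mer built only from dT would reach full length in roughly 79% of strands, whereas a 20-mer built only from dC would do so in roughly 54%, so small differences in step efficiency compound noticeably over the length of a probe.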
5.2.2.3 Microdispensing
A microdispenser or microspotter will accurately deposit reagents and the correct nucleotides at specific
locations on the surface of a solid support and so define the array features. A repetitive cycle of
deposition, reaction, and washing creates the array of oligonucleotides. This approach has been used to
synthesize a 64 x 64 array (4096 oligonucleotide elements) on clear, aminated polypropylene film using a
semiautomated 64-channel fluidic chemical delivery system and conventional phosphoramidite-based
synthesis chemistries.27
5.2.2.4 Photolithographic Masks
Another method for defining locations on a solid support where oligonucleotide synthesis will occur
exploits the combination of photosensitive protection groups and photolithographic masks.25,26,56,57
This light-directed synthesis method was devised by Fodor and colleagues.25 The in situ synthesis of the
oligonucleotide array involves the following steps:
(1) A silicon or glass chip or slide (the substrate) is cleaned, and then surface hydroxyl groups are reacted
with aminopropyl triethoxysilane to produce surface amino groups.
(2) A hexaethyleneglycol linker functionalized at one end with a light-sensitive protecting group [e.g.,
nitroveratryloxycarbonyl (NVOC)] is reacted with the surface amino groups to produce a layer of
hexaethylene-NVOC molecules. The purpose of the linker molecule is to space apart the synthesized
oligonucleotide from the surface of the substrate. This makes the arrayed molecule more accessible to
reactants in samples applied to the chip surface by minimizing steric hindrance effects due to the chip
surface.
(3) A photomask is prepared that has a pattern of opaque and clear areas corresponding to locations on
the array. Exposure of the surface of the substrate to light through the photomask initiates
photochemical deprotection reactions at sites corresponding to the clear areas on the photomask.
Light impinging on the substrate causes the NVOC group to decompose, and this generates reactive
hydroxyl groups on the surface of the substrate. No reaction occurs in areas of the substrate directly
beneath the opaque parts of the photomask. The hydroxyl groups are then reacted with a 3'-O-
phosphoramidite-activated, 5'-O-NVOC-photoprotected deoxynucleoside to form the first base in the
oligonucleotide.
(4) Successive rounds of light-directed deprotection and subsequent reaction with 3'-O-activated, 5'-O-
photoprotected deoxynucleosides, using a series of masks, facilitate base-by-base synthesis of
oligodeoxynucleotides at precisely defined micrometer-sized locations determined by each mask.
The complete set of oligodeoxynucleotides of length n comprises 4^n
oligonucleotides and requires 4 x n cycles of reaction.
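The relationship between the number of sequences (4^n) and the number of reaction cycles (4 x n) can be illustrated with a small simulation. This is a sketch only: it models each mask as a list of exposed sites and ignores all chemistry, synthesis direction, and yield; the function and variable names are hypothetical.

```python
from itertools import product

BASES = "ACGT"

def synthesize_complete_set(n: int):
    """Simulate mask-directed synthesis of every sequence of length n.

    Each site is assigned one target sequence. In each cycle, the mask exposes
    only the sites whose target has `base` at `position`, and that base is
    coupled at the exposed sites.
    """
    targets = ["".join(p) for p in product(BASES, repeat=n)]
    grown = [""] * len(targets)  # oligonucleotide built so far at each site
    cycles = 0
    for position in range(n):
        for base in BASES:
            mask = [t[position] == base for t in targets]  # True = clear area
            grown = [g + base if lit else g for g, lit in zip(grown, mask)]
            cycles += 1
    return targets, grown, cycles

targets, grown, cycles = synthesize_complete_set(3)
print(len(targets), cycles, grown == targets)
```

For n = 3 the simulation builds all 64 trimers in 12 cycles, one mask per base per position.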
The light-directed synthesis method is usually employed to synthesize oligonucleotides up to 25 bases in
length (25-mers) at sites measuring less than 50 µm x 50 µm for microarray applications. Site densities
of greater than 10^6 sites per square centimeter are possible, and step yields of >95% were demonstrated in
early array experiments.26
5.2.2.5 Digital Light Processing
An alternative to a photomask for achieving spatially directed synthesis is to use a digital mirror to direct
light to specific locations on a microarray (digital light processing). A micromirror microarray consists of
over 750 000 16 µm x 16 µm microfabricated mirrors on a 4 cm x 4 cm chip. Each mirror can be tilted
individually by ten degrees. These chips are used in consumer products for image projection, but have
been adapted to make microarray chips. An ultraviolet image (a virtual mask) is projected onto the
surface of a glass substrate in order to initiate a photochemical reaction at the surface
locations defined by the virtual mask.58 The glass substrate is mounted in a flow cell reaction chamber
connected to a DNA synthesizer, where programmed chemical coupling cycles (phosphoramidite
chemistry) follow the photodeprotection reaction (1-(3,4-(methylenedioxy)-6-nitrophenyl)ethyl
chloroformate; MeNPOC) initiated by light exposure. A repetitive cycle of photodeprotection and
chemical coupling, with different virtual masks, produces the preselected oligonucleotides in a specified
microarray. The "maskless array synthesizer" has been used to synthesize oligonucleotide microarrays
containing more than 76 000 features (16 µm x 16 µm). A stepwise yield of 95% was calculated for the
synthesis of a series of 82 unique 18-mers. A modification to the process uses photogenerated acid in
solution to initiate deprotection of the 5'-OH group of the nucleotide in conjunction with conventional
phosphoramidite chemistry.59
An alternative, higher contrast, photoresist-based masking method has been devised in conjunction with a
standard oligonucleotide synthesis procedure. Glass slides derivatized with a layer of dimethoxytrityl
(DMT)-hexaethylene glycol molecules are coated with a semiconductor photoresist (SU-8 epoxy resin
and triphenylsulfonium hexafluoroantimonate). The photoresist is exposed with a mask, the exposed
photoresist stripped away, and the underlying areas detritylated to reveal reactive hydroxyl groups where
reaction with 5'-DMT-2'-deoxynucleoside phosphoramidites will occur. This process was effective at a
resolution of ~ 4 to 8 µm. A modification to this method introduced an underlayer to act as a barrier to
the chemical deprotection step and to protect the substrate surface during processing.60
5.3 Signal Generation and Detection
Generally, labels, detection technologies, and assay formats have been adapted from existing
immunoassay and DNA probe assays.
For microarray-based expression assays, the polyA+ mRNA target in the sample to be tested is reverse-
transcribed to either first-strand cDNA or to cRNA and labeled by incorporation of labeled
nucleotides.61 The label can be directly detectable, as in the case of 33P, Cy3, or Cy5. Alternatively, a
secondary label, such as biotin or digoxigenin, can be incorporated, and the labeled cDNA or cRNA
targets bound to the array are then detected in a secondary reaction with labeled streptavidin or labeled anti-
digoxigenin, respectively.62
5.3.1 Fluorescence
Fluorophores are the most commonly used labels for microarray applications, because the signal can be
generated directly without the need for ancillary reagents, and the signal can be detected with good spatial
resolution and sensitivity with a scanning fluorescence microscope. Examples of fluorophore labels used
with the microarray format include FAM,28 HEX (hexachlorinated FAM),28 R-phycoerythrin,36,63 Cy3,64
Cy5,64 fluorescein,63 TMR,28,62 and the phycoerythrin energy transfer dye system phycoerythrin*Cy5.63
Time-resolved fluorimetry provides a means of reducing background, because the long-lived fluorescence
signal is measured after the short-lived background fluorescence has decayed. The near-infrared (NIR),
time-resolved fluorescence labels aluminum tetrasulfonated naphthalocyanine and the tricarbocyanine dye
IRD800 were evaluated using oligonucleotide probes covalently bound via a glutaraldehyde linkage to an
aminated-PMMA substrate surface. The limit of detection for oligonucleotides containing an NIR
fluorescent reporter was determined to be 0.038 molecules/square µm (time-gated measurement).65
Up-converting phosphor particles (0.4 µm diameter) emit visible light when illuminated with infrared
light (980 nm) as a result of a phenomenon called “up-conversion.”66 This type of nonfading phosphor
particle label provides enhanced image contrast, and a comparison with Cy5 label showed improved
sensitivity and a linear relationship, spanning two orders of magnitude, between the phosphor luminescence and
target concentration in model cDNA microarray hybridizations.
5.3.2 Chemiluminescence
Highly sensitive chemiluminescence (CL) detection of enzyme labels is widely used in immunoassay.
This technique has been adapted to microarray detection of alkaline phosphatase labeled antidigoxigenin
antibody and antifluorescein antibody bound to biotinylated target hybridized to probes on nylon
membranes and glass slides.67 The enzyme label is detected with a mixture of the proprietary substrate
APS-5 and lucigenin, and light emission is detected with a cooled charge-coupled device (CCD) camera
(0.5 picogram DNA/spot detected).
5.3.3 Radioactivity
Early arrays (dot blot format) used radioactive labels to reveal and quantitate hybridized target or probe.
For example, cDNA transcribed from poly(A)+ RNA target is radiolabeled with 33P, and
hybridization signals from the array are collected on phosphor screens.24,68,69
5.3.4 Colorimetry
Colorimetry has been a less commonly used detection option but is gaining in popularity by virtue of its affordability. Its
main application is for detection of secondary labels, such as biotin using horseradish peroxidase or beta-
galactosidase labeled streptavidin, or digoxigenin using alkaline phosphatase labeled antidigoxigenin
antibody, DNP using horseradish peroxidase labeled anti-DNP antibody, or tetramethylrhodamine (TMR)
using goat anti-TMR antibody: anti-goat IgG - horseradish peroxidase. Colorimetric assays have a limited
dynamic range compared to fluorescence detection methods (<10^2 vs. 10^5), but they have the advantage
of requiring only a low-cost flatbed scanner for signal quantitation.62,70 Flatbed scanner technology has
recently been employed in more sophisticated readers.71
5.3.5 Mass Spectrometry
Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) is effective in desorbing and
determining the molecular weights of species bound to a probe on a microarray and hence assessing the
extent of hybridization. This can be extended to the measurement of different-sized bound targets and
facilitate multiplex analysis at a single hybridization spot. Mass-based discrimination can be augmented
further using different mass tags, such as biotin, Cy5, fluorescein, and rhodamine.45
5.3.6 Multiplexed Detection
The use of microarrays for gene expression monitoring has produced a need for multiplexed assays and
labels that can be detected on the same microarray without cross-talk. Two-color fluorescence detection
schemes are popular, and various pairs of distinguishable labels have been developed for this purpose.72
These include Cy3 with Cy5,64 Cy5 with fluorescein,73 fluorescein with phycoerythrin, and a
phycoerythrin: Cy5 energy transfer dye system with phycoerythrin.63
Three labels can be combined for a multiplex assay. For example, the following combination has been
verified: 1) biotin detected using beta-galactosidase labeled streptavidin and a 5-bromo-4-chloro-3-
indolyl-beta-galactopyranoside substrate (X-Gal) (light-blue-colored product); 2) digoxigenin detected
using alkaline phosphatase labeled antidigoxigenin antibody and a fast red TR/AS-MX substrate (red
product); and 3) TMR detected using goat anti-TMR antibody: anti-goat IgG - horseradish peroxidase and
a DAB substrate (yellow-brown color).62 Each label was used to label an mRNA population from HeLa
cells at different stages in the cell cycle (G1, S, and G2/M). The level of expression of a gene is revealed
by the observed color at the spot representing the gene on the microarray. Genes highly expressed in one
phase of the cycle will appear principally with the color developed by the substrate for the label (e.g., red
for digoxigenin-labeled mRNA). However, if there is expression in more than one phase, then the colors
of the products combine to give a characteristic color, e.g., biotin (blue) and TMR (yellow-brown)
produce a green color; biotin (blue) and digoxigenin (red) produce a purple color; and digoxigenin (red)
and TMR (yellow-brown) produce an orange color at the spot on the microarray.
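The color logic described for the three-label assay can be summarized as a lookup. The sketch below encodes only the single-label and two-label colors stated in the text; the function name is hypothetical, and spots with no label or with all three labels return None because the text does not describe them.

```python
# Product colors reported in the text for each label's substrate.
LABEL_COLOR = {"biotin": "blue", "digoxigenin": "red", "TMR": "yellow-brown"}

# Combined colors reported in the text for two labels at the same spot.
PAIR_COLOR = {
    frozenset({"biotin", "TMR"}): "green",
    frozenset({"biotin", "digoxigenin"}): "purple",
    frozenset({"digoxigenin", "TMR"}): "orange",
}

def expected_spot_color(labels_present):
    """Return the expected spot color for the labels bound at one spot."""
    labels = frozenset(labels_present)
    if len(labels) == 1:
        return LABEL_COLOR[next(iter(labels))]
    if len(labels) == 2:
        return PAIR_COLOR[labels]
    return None  # not described in the text (no label, or all three labels)

print(expected_spot_color(["biotin", "digoxigenin"]))
```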
Multiplexed assays are also used in the single nucleotide extension (SNE) assay format for detection of
mutations64,74,75 and genotyping of single nucleotide polymorphisms (SNPs).76 For mutation detection, a
microarray of primers (one per mutation) is immobilized on a glass surface. Mutations are detected by
extending immobilized primer:target hybrids immediately adjacent to the mutant nucleotide positions
with single labeled dideoxynucleoside triphosphates (ddNTPs) (e.g., Cy5 and Cy3 labels) using a DNA
polymerase. Fluorescence scanning reveals the pattern of bound label from which the mutation can be
deduced.
5.3.7 Signal Amplification Reactions
Catalyzed reporter deposition reaction (CARD).62 The first step in this assay involves binding a
streptavidin-peroxidase conjugate to a biotinylated target captured by a probe on the array. Next, the
bound peroxidase reacts with a biotin-tyramide substrate in the presence of hydrogen peroxide to produce
radical products. These react indiscriminately with protein in the immediate vicinity of the peroxidase
label. The resulting decoration of the streptavidin-peroxidase: biotinylated target is revealed by incubation
with a streptavidin-beta-galactosidase conjugate and an X-Gal substrate. This procedure provides a
sixtyfold improvement in sensitivity compared to a direct colorimetric detection of the bound target.
Signal amplification can also be achieved using a combination of the rolling circle amplification reaction
(RCA) and PCR ligation detection.77 This assay can be performed in a multiplex mode (e.g., 17 pairs of
probes), and the rolling circle amplification permits quantification of previously undetectable
hybridization events.
Websites
The reader is directed to the following websites for general information on microarrays:
http://www.clinchem.org/cgi/content/full/47/8/1479/DC1 (accessed on date)
http://genome-www5.stanford.edu/MicroArray/SMD/
http://cmgm.stanford.edu/pbrown/mguide/index.html
6 Analytical
6.1 Nucleic Acid Extraction
A number of methods exist for the preparation of nucleic acid samples for microarray analysis. Since this
step is crucial to a successful result, care should be exercised to ensure that an accepted protocol yielding
highly pure and intact nucleic acid is followed. Extraction of both RNA and DNA is potentially relevant
depending on the type of microarray analysis being performed. For expression analysis, the target sample
is usually derived from RNA. For DNA sequencing, the target is DNA. For cDNA microarrays, PCR
products must be purified for spotting on a medium such as a glass slide. Although in-house-developed
protocols may be used if properly verified, for both target and probe preparation for microarray analysis,
commercial kits may provide the greatest success and reproducibility.
6.1.1 DNA Isolation
6.1.1.1 Genomic DNA
Traditional methods of DNA isolation generally require the lysis of cells and addition of a proteolytic
enzyme such as proteinase K to digest proteins that may bind the DNA and interfere with the procedure.
Following this, the samples are usually treated with an enzyme to remove residual RNA and extracted
with organic solvents to remove remaining proteins and cell components that can interfere with the
quality of the result. Finally, the sample is treated with ethanol to precipitate the purified DNA from the
remaining material. Purified DNA is usually recovered at this point by centrifugation. Following this, the
purified DNA is resuspended in a buffer, such as Tris buffer. Excessive drying should be avoided, as it
can affect the structural integrity of the purified DNA and may decrease its solubilization. A common
modification of this procedure involves the use of a high-salt solution and isopropanol to promote
precipitation of the DNA from solution.
Several commercial nucleic acid isolation kits are available that may simplify and reduce the time of this
procedure. A variety of methods using different components and procedures to perform the above steps
are used. Many of the kits use alternatives for the extraction and precipitation of sample by replacing the
organic solvents and ethanol precipitations with other suitable components that are less toxic. For
example, some kits use a silica-based column to bind nucleic acids under high salt conditions. Then the
purified DNA is eluted under low salt conditions. Magnetic bead-based methods are also routinely used.
Automated DNA extraction instruments that make high-throughput extraction possible are widely
available. Regardless of the extraction method used (manual or automated) it is important to validate that
the DNA behaves as expected without inhibition in downstream applications. Refer to the most recent
edition of CLSI document MM14—Proficiency Testing (External Quality Assessment) for Molecular
Methods for more extensive discussion of sample preparation techniques.
Genomic DNA isolated for resequencing is usually broken down into smaller pieces for labeling and
hybridization. Therefore, it is usually not necessary to determine the intactness of the isolated DNA. For
DNA that will be PCR amplified, it is sufficient if 1) the method has been shown to yield amplifiable
DNA, and 2) it is confirmed that a product of the expected size is produced by amplification.
6.1.1.2 PCR Product
PCR amplification of cDNA clones is used to generate probes that are then purified and spotted to make
microarrays. The purification of PCR products can be accomplished with one of numerous commercial
kits designed for this purpose. Usually hundreds if not thousands of PCR products are being purified
simultaneously, so it is necessary to use a high-throughput method for clean-up of the PCR products,
often in 96-well or 384-well microplate format. Confirmation of successful PCR amplification and
purification should be performed by, for example, agarose gel analysis or using more advanced
miniaturized electrophoresis approaches.
6.1.2 RNA Isolation
Preparation of intact high-quality RNA for expression analysis is of critical importance for obtaining
reproducible results. Depending on the protocol used, either total RNA or polyA+ mRNA can be isolated.
The sources of RNA for clinical expression analysis will most likely be tissue or blood. The challenge is
to isolate a sufficient quantity of intact RNA free of significant DNA contamination. The number of
micrograms of RNA that will be needed is dependent on the array method and downstream sample
preparation and labeling protocol. Many different RNA isolation protocols are compatible with
expression microarray sample preparation protocols. The key to successful RNA extraction, particularly
from fresh or frozen tissue, is speed. RNases need to be inactivated quickly by reagents present in the
homogenization solution, for example denaturants such as guanidinium isothiocyanate. Alternatively,
there are now a variety of commercial products for RNA stabilization from both tissue and blood. The use
of these products is particularly important for stabilizing RNA from clinical samples, because there is
often a delay from the time a sample is obtained to the time the sample is received in the laboratory for
processing.
Several protocols and solvent systems are available for the isolation and purification of RNA, depending
on the sample source (tissue or blood). As with the isolation of DNA, numerous methods have been
reported in the literature, and commercial products are available from a number of different companies.
RNA is very sensitive to degradation by RNases, which may contaminate glassware and plasticware and
enzyme and protein preparations. Extra care is necessary to ensure that all reagents and supplies used in
the extraction and storage of RNA are properly treated to destroy or inhibit RNases. Most liquid reagents
prepared in the laboratory may be treated with diethylpyrocarbonate (DEPC) followed by autoclaving
(exceptions include reagents containing Tris buffer and reagents with heat-sensitive ingredients such as
protein and volatile compounds). A variety of commercial preparations are available that either inhibit or
destroy this enzyme activity. As with all procedures, the laboratory should carefully follow the
established protocol and incorporate established controls to validate proper performance for each
extraction.
The use of polyA+-enriched RNA ensures a starting material highly enriched in mRNA and possibly less
contaminated by genomic DNA. In most cases polyA+ mRNA is purified from total RNA taking
advantage of the presence of the polyA tail. Again, there are several commercial kits for mRNA
purification. In some cases the mRNA can be captured by binding to a solid support, such as a
chromatographic column or a collection of magnetic beads containing bound complementary
oligodeoxythymidine [oligo(dT)] molecules.
6.1.3 Quality and Quantity of Nucleic Acid
6.1.3.1 Spectrophotometry
Quantitation of DNA and RNA is important in order to determine concentration and yield. Quantitation
can be performed by several methods, including, most commonly, spectrophotometry. The ability of
nucleic acids to absorb UV light maximally at 260 nm is the basis for spectrophotometric analysis of
DNA and RNA concentration. The RNA concentration can be determined based on the fact that each OD
unit at 260 nm is equivalent to 40 µg/mL of single-stranded RNA. DNA concentration is calculated using
50 µg/mL per OD260 unit. Samples should be diluted in water rather than buffer, because the molar
extinction coefficient of DNA or RNA used to calculate the amount is based on its absorbance, which was
determined in water. Several newer models of spectrophotometer allow measurement of very small
volumes of nucleic acids, some down to 1 µL of sample, obviating the need for dilution. This is
particularly important when dealing with low concentration or limited samples which can be recovered
after measurement if necessary. Regardless of the instrument used, it is important that the
spectrophotometer be properly calibrated and that all measurements are made within the linear range of
the instrument, usually 0.1 to 1.0 absorbance units, and if applicable, in a clean quartz cuvette. If the
initial absorbance measurement is above this linear range, then a dilution should be made. If the
measurement is below the linear range, then either a less dilute sample should be measured or a more
sensitive spectrophotometer, such as the microvolume models described above, should be used. Human
leukocytes generally yield 10 to 40 µg of total RNA per 10^6 cells.
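The concentration calculation described above can be written out as follows. This is a minimal sketch using the conversion factors and linear range given in the text; it assumes a standard 1 cm path length, and the function names, the example reading, the dilution factor, and the elution volume are hypothetical.

```python
# Conversion factors stated in the text: µg/mL per absorbance unit at 260 nm.
UG_PER_ML_PER_A260 = {"RNA": 40.0, "DNA": 50.0}

# Usual linear range of the instrument cited in the text (absorbance units).
LINEAR_RANGE = (0.1, 1.0)

def concentration_ug_per_ml(a260, nucleic_acid, dilution_factor=1.0):
    """Concentration of the undiluted sample from one A260 reading."""
    low, high = LINEAR_RANGE
    if not low <= a260 <= high:
        raise ValueError(
            f"A260 of {a260} is outside the linear range {low} to {high}; "
            "dilute or concentrate the sample and measure again"
        )
    return a260 * UG_PER_ML_PER_A260[nucleic_acid] * dilution_factor

def total_yield_ug(concentration, volume_ml):
    """Total mass of nucleic acid in the preparation."""
    return concentration * volume_ml

# Hypothetical reading: RNA diluted 1:10, A260 = 0.5, 50 µL total volume.
stock = concentration_ug_per_ml(0.5, "RNA", dilution_factor=10)
print(stock, "µg/mL;", total_yield_ug(stock, 0.05), "µg total")
```

Rejecting readings outside the linear range, rather than silently converting them, mirrors the recommendation above to dilute or remeasure such samples.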
Nucleic acids absorb maximally at 260 nm and proteins maximally at 280 nm. The ratio of the
absorbance at 260 nm to that at 280 nm (A260/280) has typically been used to provide an estimate of the purity of
the nucleic acid preparation with respect to the amount of contaminating protein in the sample.
Unlike the determination of concentration, which should be done in water, the A260/280 ratio is pH- and
ionic-strength-dependent and should therefore be measured in a buffered solution near pH 7. A ratio of
~2.0 indicates relatively pure RNA, while a ratio of ~1.8 is indicative of relatively pure DNA.
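A purity check based on the A260/280 ratio might be sketched as follows. The expected ratios come from the text; the tolerance of 0.1 is an assumption chosen for illustration and is not a value given in this guideline, so each laboratory would need to set and verify its own acceptance limits.

```python
# Approximate ratios for relatively pure preparations, as given in the text.
EXPECTED_RATIO = {"RNA": 2.0, "DNA": 1.8}

def purity_ratio(a260, a280):
    """A260/280 ratio from readings taken in a buffered solution near pH 7."""
    return a260 / a280

def looks_pure(a260, a280, nucleic_acid, tolerance=0.1):
    """True if the ratio is within `tolerance` of the expected value.

    `tolerance` is a hypothetical acceptance window, not a guideline value.
    """
    ratio = purity_ratio(a260, a280)
    return abs(ratio - EXPECTED_RATIO[nucleic_acid]) <= tolerance

print(looks_pure(1.0, 0.5, "RNA"), looks_pure(0.75, 0.5, "RNA"))
```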
6.1.3.2 Electrophoresis
Although absorbance is a suitable method for determining the approximate amount of nucleic acid in a
sample, it gives no indication of the intactness of the sample. The intactness of DNA is less important if it
will be fragmented for analysis or if a small target is amplified. However, for expression microarray
analysis the quality, repeatability, and reproducibility of the results obtained are directly linked to the
quality or intactness of the RNA. Intactness of RNA can be assessed by electrophoresis, either standard
denaturing agarose gel electrophoresis or on a chip-based capillary electrophoresis instrument. In either
case the ratio of 28S to 18S RNA, normally 2:1, is a good indicator of the intactness of the RNA. The
latter method can also be used to determine the concentration of RNA. However, this method cannot be
used to evaluate the intactness of messenger RNA, since no distinct bands would be visible on a gel.
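The 28S to 18S comparison can be expressed as a simple ratio of band or peak intensities. The sketch below assumes that integrated intensities for the two bands are already available from gel densitometry or instrument software; the minimum acceptable ratio of 1.8 is a hypothetical threshold for illustration, not a value given in this guideline.

```python
def ribosomal_ratio(intensity_28s, intensity_18s):
    """Ratio of 28S to 18S band (or peak) intensity; normally about 2:1."""
    return intensity_28s / intensity_18s

def rna_appears_intact(intensity_28s, intensity_18s, minimum_ratio=1.8):
    """True if the ratio meets a laboratory-defined minimum.

    `minimum_ratio` is a hypothetical acceptance limit.
    """
    return ribosomal_ratio(intensity_28s, intensity_18s) >= minimum_ratio

print(rna_appears_intact(2000.0, 1000.0), rna_appears_intact(900.0, 1000.0))
```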
6.1.4 Storage of Nucleic Acids
Isolated DNA and RNA should be stored in tightly capped containers in either nuclease-free water or
Tris-EDTA (TE) buffer. Isolated RNA should be stored at –70 °C and used as soon as possible after
extraction. Frequent freeze/thaw cycles should be avoided, so it is advantageous to aliquot the original
sample prior to freezing. Commercially available RNase inhibitors can be added to the RNA solution for
additional protection. For long-term storage of RNA, it should be precipitated in ethanol. DNA samples
can be maintained for several months by storing at 4 °C in tightly capped containers; however, storage at
-20 °C is recommended for long-term storage. The integrity of samples should be reevaluated before use if
stored for extended periods of time at any temperature.
6.2 Gene Chemistry
Prior to analysis of DNA or RNA using microarrays, one or more biochemical modifications are
universally required. These include PCR amplification of specific DNA targets for genotyping or
resequencing and in vitro transcription linear amplification methods for minute quantities of mRNA, as
well as various methods for labeling the targets of interest. Because the hybridization result obtained is
only as good as the target that is generated, labeled, and applied to the array, special attention should be
given to the proper performance of these gene chemistry procedures.
6.2.1 Amplification
With the exception of transcription analyses using large (microgram) quantities of total RNA, virtually all
applications for nucleic acid microarrays require some form of amplification of the analyte(s) to be
measured prior to microarray analysis. Amplification reactions can serve two purposes. First, they
increase the analyte concentration to a level that can be measured accurately and reproducibly by
hybridization. Second, amplification also serves to reduce the complexity of complex heterogeneous
samples (e.g., total genomic DNA or total RNA). The method employed differs substantially based on
whether the analytes are a restricted subset of DNA sequences or all expressed RNA sequences.
6.2.1.1 Amplification Reactions for DNA Variation Analysis
For analysis of DNA, any of several amplification procedures could in theory be used. (See the most
current edition of CLSI/NCCLS documents MM1—Molecular Diagnostic Methods for Genetic Diseases
and MM3— Molecular Diagnostic Methods for Infectious Diseases.) These methods include polymerase
chain reaction (PCR), transcription-mediated amplification (TMA), self-sustained sequence replication
(3SR), nucleic acid sequence based amplification (NASBA), and other variations. Of these, PCR has been
the preferred means to isolate the genomic sequences of interest for microarray-based analysis. Several
different strategies have been used depending on whether sequence variation is being assessed within a
single gene or gene region or at multiple sites across the entire human genome. When a single gene is to
be genotyped or sequenced by microarray analysis, generally exon coding regions and their flanking
splice acceptor regions and promoter regions are amplified. Since these gene regions can cover several
hundred to thousands of nucleotides in length, amplification is usually achieved by the use of multiplex
reactions. When multiple polymorphic sites scattered throughout the genome are to be analyzed, the
challenge becomes the ability to amplify a large number of targets with equal efficiency. For locus-
specific multiplex amplifications, amplicons should be designed to be uniform in size, or separated into
different reactions based on relative length, to avoid preferential amplification of shorter products. As
with any molecular diagnostic procedure that employs nucleic acid amplification, specific laboratory
practices and procedures should be implemented to minimize the occurrence of incorrect results arising
from contamination, especially the potential for carryover from previous amplification reactions. (See the
most current edition of CLSI/NCCLS documents MM1—Molecular Diagnostic Methods for Genetic
Diseases and MM3—Molecular Diagnostic Methods for Infectious Diseases.) Minimally, template-free
areas should be established for extraction of nucleic acids, as well as for set-up of the amplification
reactions.
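The recommendation to separate amplicons into different multiplex reactions based on relative length can be sketched as a simple grouping step. This is one possible approach, not a method prescribed by this guideline: amplicons are sorted by length and a new reaction is started whenever an amplicon is more than `max_ratio` times the shortest amplicon already in the current reaction. The amplicon names, lengths, and the ratio of 1.5 are hypothetical.

```python
def group_amplicons_by_length(lengths_bp, max_ratio=1.5):
    """Assign amplicons to multiplex reactions of similar product length.

    `lengths_bp` maps amplicon name to length in base pairs. `max_ratio` is a
    hypothetical limit on longest/shortest length within one reaction.
    """
    groups = []  # each group is a list of (name, length), shortest first
    for name, length in sorted(lengths_bp.items(), key=lambda item: item[1]):
        if groups and length <= groups[-1][0][1] * max_ratio:
            groups[-1].append((name, length))
        else:
            groups.append([(name, length)])
    return [[name for name, _ in group] for group in groups]

# Hypothetical amplicon panel, for illustration only.
panel = {"amp_a": 150, "amp_b": 200, "amp_c": 400, "amp_d": 1125, "amp_e": 500}
print(group_amplicons_by_length(panel))
```

In practice, primer compatibility and amplification efficiency would also have to be verified empirically for each reaction; length is only one of the design constraints.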
6.2.1.2 Sequence Variant Analyses Within a Single Gene
For analysis of sequence variation within a single gene, amplification reactions are usually designed to
generate targets of a few hundred to a few thousand base pairs, typically covering exon coding regions,
flanking intron splice junction sites, and in some cases, 5'-flanking promoter regions. Assays of clinical
relevance that have used this PCR amplification approach for microarray-based genotype analysis at
polymorphic sites include the cystic fibrosis transmembrane conductance regulator (CFTR) gene,78 as well as
detection of common polymorphisms and rare mutations within the cytochrome P450 CYP2D6 and
CYP2C19 genes.79-81 For CFTR mutation analysis, all 27 exons and intron splice site junctions have been
amplified and labeled incorporating dUTP and fluorescein-12dGTP. Incorporation of dUTP allows
fragmentation of amplicons by treatment with uracil-N-glycosylase followed by 95 °C heating. Mutation
detection of 37 CFTR mutations is achieved using oligonucleotide microarrays carrying probe sets
produced using light-directed synthesis with each polymorphism queried by a minimum of 40 probes.78
Analysis of perfect match versus mismatch signals from hybridized DNA probe arrays has demonstrated
the feasibility of genotyping hundreds of polymorphisms and mutations in a gene spanning more than 250
kb.
Similar PCR amplification strategies have been employed in oligonucleotide microarray hybridization-
based pharmacogenetic assays for cytochrome P450 drug metabolism enzymes. Nine exons and splice
site junctions spanning 5 kb in the CYP2D6 gene, and two exons of CYP2C19, are amplified in a seven-
plex PCR reaction generating amplicons ranging from 150 to 1125 bp in length. These targets are
fragmented by limited treatment with DNase I and subsequently fluoresceinated or biotinylated using
terminal transferase 3'-end labeling.80-82 Biotinylated hybridized targets are subsequently visualized using
a streptavidin-phycoerythrin staining procedure. This has established the ease and trueness of genotyping
polymorphisms within relatively contiguous gene regions of clinically relevant genes using
oligonucleotide microarray-based hybridization analyses.
In contrast to analysis of nearby polymorphic sites, analysis of sequence variation across an entire gene
can be achieved by long PCR amplification if the gene spans a relatively small region (i.e., <20 kb), or by
multiplex amplification of coding regions if large introns separate exons. For example, sequence analysis
of the entire 16.6 kb mitochondrial DNA (mtDNA) target by high density oligonucleotide microarray
hybridization has been achieved using a single, long, 16-kb PCR amplification of this gene region.83 A
resequencing assay of this type could find use in the diagnosis of several degenerative diseases for which
mtDNA mutations have been implicated as causative. Resequencing of several genes implicated in cancer
has also been demonstrated. For example, the 9.17 kb coding region within 62 coding exons of the ATM
gene has been queried for all possible nucleotide substitutions and other variations by pooling amplicons
from 13 multiplex reactions, ranging in complexity from two to nine amplicons each, prior to microarray
hybridization analysis.84 Additional examples using this PCR amplification approach include screening
for sequence variation within the 3.4 kb Exon 11 of the human hereditary breast and ovarian cancer gene
BRCA1,85 and sequence analysis of Exons 2 to 11 of the p53 tumor suppressor gene coding region.86-88
For the p53 tumor suppressor gene, a multiplex amplification reaction generating ten amplicons ranging
from ~90 to 400 bp has been employed with fresh tumor tissue, as well as with paraffin-embedded
tissue (PET). Recent improvements in sample preparation have allowed these amplifications to
proceed routinely using ten-micron sections from archived tumor PET blocks.89 When sample input is
small, carryover prevention methods become increasingly important. Moreover for tumor tissue analyses,
sample preparation becomes a critical factor in minimizing heterogeneity of cell types included in the
amplification reactions, and subsequent microarray analyses.
6.2.1.3 Genome-Wide SNP Analysis
Though not yet clinically useful, it is widely expected that complex genetic disease diagnostics will
require the ability to genotype across the entire genome. PCR has been used to amplify hundreds of target
regions to be analyzed, demonstrating the potential for large-scale single nucleotide polymorphism (SNP)
analysis using oligonucleotide microarrays. A prototype array-based assay that queries over 500 loci
throughout the human genome by pooling multiple multiplex amplification reactions has been
described.90 Each amplification includes primer pairs for as many as 46 loci, and the amplicons were
designed to be less than 100 bp to increase multiplex efficiency, produce similar product yields, and to
eliminate the need to fragment the amplicons prior to hybridization analysis. Detection labels are
incorporated following the initial amplification using biotinylated T3 and T7 primers in a second round of
PCR where the initial primer pairs included these promoter sequences at their 5'-ends. Further refinements
of this approach have allowed development of a commercially available SNP assay that simultaneously
queries approximately 1200 of the 1494 SNPs originally identified by Wang, et al.90 using 24 multiplex
reactions representing 50 to 100 loci. More recently, high-density, oligonucleotide array-based assays
that query approximately 10 000 or 100 000 SNPs distributed across the human genome have been made
available commercially. Large-scale SNP assays of this type using high-density DNA arrays have been
employed to determine loss of heterozygosity (LOH) within small-cell lung carcinomas,91 and to assess
LOH and allelic imbalance within esophageal adenocarcinomas.92 A detailed protocol for LOH
analysis using SNP microarrays has been published.93
The potential to use these approaches for genome-wide SNP analyses is limited only by the scale-up of
amplifications and microarray density. For example, using eighteen 300 000-element microarrays, large-
scale discovery and genotyping of SNPs in mice has been achieved.94 The complexity of the human
genome requires different amplification approaches from those described above in order to query large
numbers of genomic regions. For human SNP discovery and verification applications, human genomic
DNA has been digested with restriction endonucleases, the resultant fragments of a desired size separated
by agarose gel electrophoresis, and the gel slices excised and purified. Ligation of a single adaptor
to these families of fragments allows PCR amplification with a single primer that hybridizes to the adaptor
sequence, generating a family of amplicons with the expected size.95 This approach has been used to
identify 124 SNPs located in 190 kb of genomic DNA distributed across the human genome. Because the
amplicon products generated in this manner cannot be strictly controlled by design, probe arrays need to
be designed to query the specific families of products generated by any one restriction enzyme used. The
method could also be compromised due to the occasional, if not rare, presence of polymorphisms within
the restriction sites themselves.
An alternative approach is to target regions of interest with a series of long PCR amplifications to
examine SNP patterns across large, genomic regions. This approach has been used to define haplotype
diversity across human chromosome 21, where over 3000 long PCR reactions producing products
averaging approximately 10 kb and spanning ~22 Mb of contiguous chromosome 21 DNA have been
amplified in individual reactions from rodent-human somatic cell hybrids.96 These long PCR amplicons
are fragmented and labeled, then pooled prior to hybridization to one of 160 unique five-inch wafer
oligonucleotide arrays, each carrying 3.4 x 10^9 oligonucleotide probes. This approach has allowed
assessment of genetic variation across ~22 Mb of unique, nonrepetitive, chromosome 21 genomic DNA.
Already in use for pharmacogenomic discovery efforts in a commercial setting, the approach may allow
development of genome-wide SNP genotyping diagnostics in the future.97
6.2.1.4 Strand Bias Generation
Most of the variant analysis or genotyping arrays described to date have fragmented amplicon targets
prior to hybridization, because sister strand self-annealing can reduce amplicon detection. An alternative
approach has been employed with some microarray detection methods (e.g., with certain electronic
detection array systems).98 To mitigate the self-annealing of target amplicons, strand bias can be
introduced by using asymmetric primer concentrations during PCR to enrich for a single strand,
complementary to capture/detection probes. This approach is not generally recommended, since it
decreases the robustness of PCR substantially. Indeed, as asymmetric multiplex reactions can be quite
challenging to optimize, fragmentation of amplified target material may be a more straightforward and
general approach to eliminating self-annealing effects that also eliminates the impact of secondary
structures, such as those formed by palindromic sequences, that can arise within single-stranded targets.
6.2.1.5 Amplification Procedures for Microarray-based Gene Expression Analysis
For gene expression analyses, either total RNA or poly(A)+ purified mRNA can be employed. The large
amount of RNA required per hybridization to achieve adequate signal (e.g., fluorescence intensity) and
the low abundance of RNA per cell (typically 0.1 to 1 pmol) requires that either very large amounts of
tissue are available or that some form of amplification procedure is used. There are unique advantages
and disadvantages to each approach. Total RNA analyses suffer from the presence and
cross-hybridization of other types of RNA that are abundant in a sample (e.g., tRNAs, rRNA, Alu-like
transcripts, etc). Further, although rare transcripts and moderately expressed genes often make up a
relatively small proportion of total RNA present, they are responsible for the majority of the sequence
complexity of any given RNA sample. Nevertheless, total RNA remains the sample of choice for
microarray, hybridization-based expression profiling in most cases. Here oligo(dT) primers must be used
for first-strand synthesis. Poly(A)+ mRNA may also be used as a template, either for direct labeling or in
reactions primed with oligo(dT) or random primers. When oligo(dT) priming is employed, the
resultant cDNAs prepared are typically biased towards the mRNA 3' region. This 3' bias can be
eliminated by the use of random primers with the consequence that not only is an increased yield of
fluorescently labeled cDNA obtained, but the sequence complexity of target hybridized is also increased
with concomitant increases in background or cross-hybridization signals. The choice of reverse
transcription primer should also be considered in terms of the microarray probe type or design chosen.
Random priming is beneficial if cDNA or oligonucleotide arrays are designed to detect predicted open
reading frames or splice variants, whereas use of oligo(dT) will best be coupled with microarray probe
sequences that are based on mRNA 3' regions.
Various procedures and kits are commercially available for rapid isolation of high-quality mRNA from
tissues. During RNA purification, special attention should be given to removal of various contaminants
including polysaccharides, proteins, and other organic materials. Ideally, microgram quantities of high-
quality mRNA are isolated for direct labeling (e.g., fluorescent dye, radioisotope) of first-strand cDNA
during reverse transcription using a poly(dT) primer. As with molecular diagnostics designed to detect
RNA-containing viruses in low abundance, great care must be taken to avoid the contamination of
samples with RNases, since any degradation by RNase will lead to truncated cDNA products. In general,
a minimum of 50 to 200 µg of total RNA or 1 to 5 µg of poly(A)+ mRNA is required for reproducible
expression analysis without amplification.99,100 For mRNA present in moderate or low abundance, the
fluorescence signals generated may be near the lower limit of detection, and therefore may not be
detected on arrays without an amplification step. Posthybridization signal amplification using enzymatic
color generation has been reported,101 but is not frequently employed.
6.2.2 Reverse Transcription, Labeling, and Amplification of mRNA
Virtually all mRNA gene expression hybridization analyses using microarrays involve reverse
transcription in vitro to generate a cDNA rather than direct hybridization of the mRNA. When sufficient
high-quality RNA is available, detection labels may be incorporated during this transcription step.
Fluorescent labels have emerged as the approach of choice, though a variety of dyes and methods for
incorporation have been used. Cy3- and Cy5-labeled dNTPs have frequently been employed for direct
labeling of cDNAs, either during first-strand cDNA synthesis by a reverse transcriptase, or during a
second-strand cDNA synthesis using the Klenow fragment of E. coli DNA polymerase I. These bulky cyanine
dyes can be incorporated into cDNA with different efficiencies depending on which reverse transcriptase
is used, and when using dUTP-fluorescent nucleotides, may result in sequence-dependent incorporation
biases. A new series of Alexa dyes that exhibit some improved fluorescence intensity characteristics
relative to cyanine dyes are also available commercially.102 While direct labeling with reverse
transcriptase has the advantage of being relatively simple, it can result in reduced detection sensitivity
relative to indirect labeling approaches that utilize some form of signal amplification.
Indirect incorporation of labels during reverse transcription of RNA templates can reduce problems
associated with the lower efficiency or uneven incorporation of fluorescently labeled dNTPs. For
example, incorporation of amino-allyl dUTP into cDNA provides amine groups for subsequent dye
coupling using N-hydroxysuccinimidyl esters of dyes.103,104 (See the protocol online at
http://www.microarrays.org/pdfs/amino-allyl-protocol.pdf.) Alternatively, cDNA can be labeled with
biotinylated dNTPs and following hybridization can be detected by binding of streptavidin-dye
conjugates.104 These latter labeling methods, though somewhat more cumbersome than direct labeling,
have the advantages of better sensitivity and absence of dye-dependent biases.
When insufficient RNA is available for direct labeling during first-strand cDNA generation, the mRNA
must be amplified. For example, amplification methods allow analysis of cells numbering as few as 1000
to 2000, compared to a minimum of 50 000 to 100 000 cells required without amplification. Reverse-
transcription-PCR can be used to amplify cDNA by using primers that contain a unique arbitrary anchor
sequence 5' to the oligo(dT) sequence. Incorporation of a homopolymeric dG 3' tail following first-strand
cDNA synthesis using terminal deoxynucleotidyl transferase and subsequent second-strand synthesis
using oligo(dC) as a primer generate annealing sites for PCR amplification. Because cDNAs generated in
this manner vary substantially in length, PCR amplification can result in significant biases due to
differences in amplification efficiency and result in cDNA populations that do not accurately reflect
original abundance of mRNAs isolated. Therefore, it may be inappropriate to use targets generated by
many cycles of PCR amplification.
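
The way small differences in amplification efficiency distort relative abundance can be illustrated with a short calculation. The sketch below assumes the idealized exponential model N = N0 x (1 + E)^n, where E is the per-cycle efficiency and n the number of cycles. The function names and the efficiency values (0.95 and 0.85) are hypothetical and chosen only for illustration; they are not taken from any cited study.

```python
def pcr_yield(start_copies, efficiency, cycles):
    """Copies after idealized exponential PCR: N = N0 * (1 + E)^cycles."""
    return start_copies * (1.0 + efficiency) ** cycles


def abundance_ratio_after_pcr(n0_a, n0_b, eff_a, eff_b, cycles):
    """Apparent abundance ratio of transcript A to transcript B after PCR."""
    return pcr_yield(n0_a, eff_a, cycles) / pcr_yield(n0_b, eff_b, cycles)


if __name__ == "__main__":
    # Two cDNAs present at equal starting abundance (true ratio = 1.0),
    # amplified with slightly different, hypothetical efficiencies.
    for cycles in (5, 15, 25):
        ratio = abundance_ratio_after_pcr(1000, 1000, 0.95, 0.85, cycles)
        print(f"{cycles} cycles: apparent A/B ratio = {ratio:.2f}")
```

Under this model the distortion grows geometrically with cycle number, which is consistent with the caution above against using targets generated by many cycles of PCR.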
A number of different amplification methods that use a single primer have been developed for use with
heterogeneous mRNA populations.104 Most popular among these is a linear amplification of the cDNA
pioneered by Eberwine and colleagues.105 After mRNA conversion to cDNA pools using a poly(dT)
primer with any of several commercially available reverse transcriptases, second-strand cDNA synthesis
(conversion to double-stranded cDNA) is achieved through the action of RNase H and a DNA polymerase
(e.g., E. coli DNA polymerase I). Inclusion of a phage RNA polymerase promoter sequence 5'-extension
during reverse transcription allows subsequent linear amplification by T7 RNA polymerase transcription
of cRNA (or amplified RNA [aRNA]) in an in vitro transcription (IVT) reaction.106 This amplification
method has been described as a linear, unbiased amplification. However, partial degradation of mRNAs
and processivity limitations of the reverse transcription step often generate products that are not full
length and are 3'-biased, although this intrinsic bias should be uniform across all cRNAs produced. The
main shortcoming of this and other RNA amplification procedures is that they are lengthy and laborious.
When mRNA sample is extremely limited, such as following extraction from laser capture microdissected
(LCM) cells, it is possible to carry out two sequential rounds of this IVT labeling. This achieves an
overall amplification with minimal bias introduction, allowing the use of ten- to one hundredfold fewer
cells than are typically employed.107
An alternative approach for amplification of small quantities of mRNA for microarray expression analysis
combines aRNA amplification with template switching.104,108 In this procedure an oligo(dT)-T7 primer,
containing the T7 promoter sequence at its 5' end, is used to initiate first-strand cDNA synthesis. Second-
strand cDNA synthesis uses Mo-MLV reverse transcriptase, an enzyme with intrinsic terminal transferase
activity and capable of template switching. This enzyme inserts a few nucleotides (mostly
deoxycytosines) to the 3' end of the first-strand cDNA, which serve as a priming site for the template-
switching primer containing a few dG residues at its 3' end. Following a short extension of this primer
with reverse transcriptase to create double-stranded cDNA, treatment with RNase H removes the original
mRNA, and a thermostable DNA polymerase blend of Taq polymerase and a thermostable 3'-to-5'
proofreading polymerase is added to complete full-length, double-stranded cDNA. Purification of this
product yields a template for IVT by T7 RNA polymerase to generate aRNA. The advantage of this
method is that it reduces the 3' bias introduced in the method where terminal transferase is used to add
nucleotides to first-strand cDNAs irrespective of their length, and maintains the proportionality of the
original mRNA population throughout the amplification procedure.109 The disadvantage, however, is that
it requires multiple labor-intensive steps.
6.2.3 Other Enzymatic Reactions Associated With Microarray-based SNP Analyses
Fragmentation and Labeling of PCR Products. PCR amplicons are typically fragmented prior to
hybridization using limited DNase I digestion to reduce sister strand reannealing, and to reduce secondary
structure found in longer DNA sequences by fragmenting to an average size of about 100 nucleotides. To
facilitate subsequent 3'-end labeling with terminal transferase, alkaline phosphatase is typically included
during the digestion to remove remaining dNTPs.104 Fragmented PCR products can then be labeled
directly using terminal transferase to catalyze 3' addition of nucleotides labeled with a fluorophore or
hapten (e.g., biotin).104
HRP-catalyzed Tyramide Signal Amplification on Microarrays. In this procedure, reverse transcriptase is
used to incorporate biotin-labeled nucleotides that target binding of streptavidin-horseradish peroxidase
(HRP) conjugates to the hybridized biotin-labeled target. Upon incubation with hydrogen peroxide,
HRP catalyzes oxidation and binding of fluorescent tyramide compounds to the microarray surface. This
indirect method requires substantially less (twenty- to one hundredfold) RNA than direct labeling
methods and under optimal conditions can increase fluorescent signals up to one hundredfold over direct
labeling methods, allowing detection of human transcripts of relatively low abundance. A number of
tyramide fluorescent dye conjugates are commercially available.110,111
A number of enzymatic procedures and protocols have been developed for mutation analysis and
genotyping on microarrays. These assays often employ generic so-called “tag” arrays or “zip-code”
arrays. The specific oligonucleotides deposited or synthesized on the microarray act as tags to
bifunctional primers that contain a complementary tag-recognition sequence. Low-abundance mutation
detection has been achieved using such zip-code arrays coupled with multiplexed PCR/ligase detection
reactions.112 Following amplification of the gene of interest, a thermostable Tth ligase is used to link
adjacent allele-specific oligonucleotides under stringent conditions such that a single base mismatch
precludes ligation. The ligation reactions can be highly multiplexed without interference and are
subsequently hybridized and detected via feature-specific hybridization with zip-code or tag arrays.
The most frequently employed variation on this tag-array theme makes use of multiplex PCR coupled
with subsequent single base primer extension (SBE) or mini-sequencing on oligonucleotide microarrays
for SNP analysis.104,113,114 Multiplexed PCR products are purified to remove primers and unincorporated
dNTPs and annealed to locus-specific SBE primers that end one base 5' of the SNP or mutation of interest
and containing a tag oligonucleotide complementary sequence at their 5' ends. SBE reactions are carried
out with DNA polymerase in the presence of four unique fluorescently labeled ddNTPs, and hybridized to
the tag arrays for visualization. The genotype is determined based on detection of each of the four unique
fluorescent labels. Alternatively, allelic variants can be distinguished by incorporation of either
fluoresceinated or biotinylated ddNTPs, the latter being detected by staining with streptavidin-
phycoerythrin conjugates. Although this method could in principle detect not only SNPs but also
mutations in lower than 50% abundance, its use has to date been largely restricted to SNP analysis of
individual samples.
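
The genotype-calling step for a four-color SBE reaction can be sketched as follows. This is a minimal illustration only, assuming that each tag feature yields one background-subtracted intensity per labeled ddNTP. The function name, the minimum signal of 500 units, and the heterozygote fraction of 0.35 are hypothetical values introduced here; any real assay would need empirically validated thresholds.

```python
def call_genotype(signals, min_signal=500.0, het_fraction=0.35):
    """Call a genotype from four ddNTP channel intensities at one tag feature.

    signals: dict mapping base ("A", "C", "G", "T") to a background-subtracted
    intensity. Thresholds are illustrative assumptions, not validated values.
    """
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top), (second_base, second) = ranked[0], ranked[1]
    if top < min_signal:
        return "no call"  # feature too weak to interpret
    if second >= het_fraction * top:
        # Two channels carry comparable signal: heterozygous call.
        return "/".join(sorted((top_base, second_base)))
    return f"{top_base}/{top_base}"  # one dominant channel: homozygous call


if __name__ == "__main__":
    print(call_genotype({"A": 5000, "C": 120, "G": 4300, "T": 90}))
    print(call_genotype({"A": 100, "C": 6000, "G": 150, "T": 80}))
    print(call_genotype({"A": 40, "C": 60, "G": 55, "T": 30}))
```

A fixed ratio threshold of this kind is suited to germline genotyping of individual samples; detecting mutations present at well below 50% abundance would require a different decision rule.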
An alternative SNP/mutation detection approach that does not use tag arrays makes use of gene/allele-
specific oligonucleotide arrays that serve to anchor solid-phase, allele-specific amplification and
detection.115 In a single reaction unbound primers serve to amplify the gene region of interest while
bound allele-specific primers interrogate the SNP or mutation sites of interest. The products of the solid-
phase, allele-specific amplification reactions are detected by incorporation of fluorescently labeled
dNTPs. Like the aforementioned tag array methods, this procedure is capable of detecting mutations in
low abundance; however, it has only been demonstrated as low as 10 to 20% of total template.
6.3 Hybridization
6.3.1 Principle
DNA microarrays exploit a powerful feature of the DNA double helix: the sequence complementarity of
the two strands. It is noteworthy that a molecule of such a complex structure can reassemble with precise
fidelity from the separated strands. Early studies of duplex melting and reannealing carried out on DNA
solutions provided helpful information such as the dependence of the melting temperature (Tm) on G+C
content of the sequence, on salt concentration, and on the sequence complexity.116 During hybridization,
base pairing of complementary strands of nucleic acid by hydrogen bond formation occurs. The specific
binding through hybridization of a known DNA sequence (probe) to a specific nucleic acid sequence
(target) is the basic concept of addressing gene expression in a biological sample, and it is based on the
mechanism of duplex formation. This phenomenon is reversible. All base pairs of the duplex do not
form simultaneously: the process begins by the formation of a transient nucleation complex from the
interaction of very few base pairs.117
Many microarray hybridizations differ from classical Southern and northern blot analyses of DNA and
RNA used in diagnosis of some genetic diseases (MM1-A) in that they are reverse hybridization
procedures. That is, microarray analysis hybridization entails labeling DNA or RNA target sequences that
are incubated and hybridized with immobilized probes (e.g., oligonucleotides, cDNA) on the microarray.
Because three hydrogen bonds form between G-C base pairs but only two between A-T(U) base pairs,
complementary strands “melt” or dissociate in a sequence- and length-dependent manner. The melting
temperature (Tm) is defined as the temperature where half of the complementary strands are dissociated
and can be very roughly approximated by the formula Tm = 4 °C x (number of G:C bp) + 2 °C x (number of A:T bp) for
oligonucleotide sequences in a standard solution of ionic strength of 1 M NaCl.118 Several additional
algorithmic refinements that account for base stacking effects within specific sequence contexts have been
developed, significantly improving the ability to predict association/dissociation properties of any given
oligonucleotide duplex. Nevertheless, even the best programs available today do not predict Tm of DNA-
DNA or DNA-RNA hybrids with 100% accuracy. While Tm estimator programs can be quite useful for
the design of oligonucleotide microarrays and hybridization conditions that achieve optimal
discrimination of the targets being queried, the rules do not apply for long, complementary, nucleic acid
sequences such as cDNAs. Indeed, Tm considerations are significantly more important for hybridization
optimization and the design of oligonucleotide microarrays intended for discrimination of sequence
variation (genotyping or resequencing). Here the destabilizing effects of single base mismatches under a
given set of stringent hybridization conditions allow ready discrimination between perfect match vs.
mispaired duplexes.
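
The rough Tm approximation given above (4 °C per G:C pair plus 2 °C per A:T pair) can be expressed as a short function. This is a minimal sketch of that rule only; the function name is hypothetical, and the result carries the same limitations stated in the text (short oligonucleotides, 1 M NaCl, no base stacking or sequence context).

```python
def wallace_tm(sequence):
    """Rough Tm in degrees C: 4 per G:C pair plus 2 per A:T(U) pair."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "ATU")
    if gc + at != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T, U")
    return 4 * gc + 2 * at


if __name__ == "__main__":
    # A 20-mer with 10 G:C and 10 A:T pairs.
    probe = "ACGTACGTACGTACGTACGT"
    print(f"{probe}: approximately {wallace_tm(probe)} degrees C")
```

Because the rule counts only base composition, two probes with the same composition but different sequence receive the same estimate, which is one reason the refined algorithms mentioned above are preferred for probe design.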
Duplex formation proceeds, one base pair at a time, like a zippering process. Any time during the duplex
formation, the reaction can go in one of two directions, pairing or separation. If bases are complementary
and freely available for pairing, duplex formation is more likely to proceed; if bases are non-
complementary or a stable structure inhibits base pair formation, the block to the zippering process may
drive the nucleation complex to destabilize. Duplex formation, and therefore duplex yield, will be
determined by the stability of the nucleation complex and of intermediates up to the point in the zippering
process where the possibility of strand separation is insignificant.119 Hybridization can be performed with
both nucleic acid target and probe in solution, or with either one bound to a solid support such as a
membrane, glass, or permeation layer. For all these applications, the hybridization of the probe to target
nucleic acids should be as free as possible from interference from the solid support.
6.3.2 Nonspecific Binding of Nucleic Acids to the Solid Support
Binding of nucleic acids to the solid supports can occur through interactions other than hydrogen bonding,
such as electrostatic and other types of interactions. To prevent nonspecific hybridization to the support, a
prehybridization step is often performed to block the surface with macromolecules such as an unrelated
nucleic acid (e.g., yeast tRNA, poly(A) RNA, herring sperm DNA, or DNA sequences of the multiple
cloning site of plasmid vectors), Denhardt's solution, or BSA.120 Sophisticated blocking reagents
are also available from a number of commercial vendors, providing extremely low, nonspecific binding
and high signal-to-noise values.
6.3.3 Conditions that Affect Hybridization
Many factors are known to influence hybridization and melting temperatures that collectively are
described as hybridization stringency. For some microarray procedures a prehybridization step is included
to decrease nonspecific binding of labeled targets. Hybridization solutions typically contain buffered salt
solutions, sheared salmon sperm DNA, and Denhardt’s solution as a blocking agent, to promote
hybridization. For a given set of salt conditions, increasing temperature has the effect of enhancing the
destabilization of mispaired duplexes, therefore increasing the specificity of the hybridization and
reducing cross-hybridization of imperfectly matched complements. Reduced ionic strength has the same
effect as increased temperature, while increased salt stabilizes base pairing. For nucleic acid
hybridizations involving longer molecules (e.g., cDNAs), denaturants such as formamide are typically
used to reduce the Tms of duplexes and to increase stringency to minimize cross-hybridization. For
large-scale SNP analysis, additives such as tetramethylammonium chloride (TMACl), a quaternary
ammonium salt, or the zwitterion betaine have been used, because they have the effect of equalizing
the stability of G:C and A:T base pairs, therefore allowing Tm predictions to be
based almost solely on oligonucleotide length, rather than base composition and sequence.121
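
The direction of the salt and formamide effects described above can be illustrated with one commonly quoted empirical approximation for long DNA duplexes: Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 0.61(%formamide) - 500/L. This formula is not part of this guideline; it is introduced here as an assumption, the coefficients differ somewhat between sources, and it does not apply to short oligonucleotides. The function name is hypothetical.

```python
import math


def long_duplex_tm(na_molar, percent_gc, length_bp, percent_formamide=0.0):
    """Empirical Tm estimate (degrees C) for long DNA duplexes.

    Assumed form: 81.5 + 16.6*log10[Na+] + 0.41*(%GC)
                  - 0.61*(%formamide) - 500/length.
    Coefficients are an assumption; published variants differ slightly.
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)


if __name__ == "__main__":
    # Hypothetical 500 bp cDNA target, 50% G+C.
    for na, formamide in ((1.0, 0), (0.1, 0), (1.0, 50)):
        tm = long_duplex_tm(na, 50, 500, formamide)
        print(f"[Na+] = {na} M, formamide = {formamide}%: Tm about {tm:.1f} C")
```

In this model, lowering the salt concentration and adding formamide both reduce the estimated Tm, so either change raises stringency at a fixed hybridization temperature.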
The concentration of the target being hybridized to immobilized probes is crucial, because it influences
hybridization kinetics and therefore the absolute signal obtainable under specified hybridization
conditions. Target sequences should be in the greatest concentration possible to assure rapid
pseudo-first-order kinetics for annealing and sufficient signal intensity from any given feature or probe element, as
long as the target concentration does not exceed the probe concentration. If the target concentration is too
high it can lead to loss of dynamic range and decreased signal-to-noise ratios (due to increased
background), and a loss of quantitation by saturation of the elements or spots of the microarray.
Additionally, the complexity of the target material being hybridized greatly influences the time required
to reach equilibrium. Therefore, for relatively concentrated targets of low complexity (e.g., isolated PCR
products) hybridization equilibrium can be established in a matter of minutes (e.g., 30 minutes), while
total RNA or even mRNA hybridization reactions require several hours (e.g., 2 to 12 hours) to reach
equilibrium with their probe complements. DNA or RNA targets used with oligonucleotide microarrays
are sometimes subjected to limited fragmentation to increase the rate of hybridization equilibration, and to
reduce reannealing of double-stranded targets, or self-annealing of palindromic regions or nearby regions
with complementarity within single-stranded targets. Shearing the target mixtures also reduces secondary
structure formation during hybridization, which is a problem particularly for the RNA molecules.
6.3.3.1 Effects of the Hybridization Solution
Numerous solutions have been used to ensure specific duplex formation between the probe and target
molecules. These solutions differ with respect to the solvents and temperatures used. In general,
formamide-based hybridization at 42 °C works better than aqueous solutions at 65 °C, as it favors a
higher signal-to-noise ratio when using microarrays of fairly short (e.g., 25-mer) oligonucleotides. For
longer oligonucleotides and cDNA microarray applications, 65 °C works very well, because it allows
extremely rapid hybridization kinetics and melts secondary structure. The kinetics of hybridization in
formamide, however, are slower than in aqueous solution,122 making aqueous solutions the best
candidates for microarray analysis as long as background can be minimized. Hybridization solutions can
be supplemented with dextran sulfate or polyethylene glycol to allow better efficiency of duplex
formation for a target with low copy number.
Detailed protocols for hybridization of various types of microarrays are available at several websites and
in a recently published DNA microarray manual.104 There has been no standardization of mRNA
hybridization buffers or conditions, in part because different glass coatings and treatments (e.g.,
poly-lysine, aminosilane) can benefit from particular blocking components. A typical hybridization buffer
contains 2 to 5x SSC or SSPE, 0.1 to 0.25% SDS, 1% BSA or 5x Denhardt’s solution, 100 µg/mL salmon
sperm DNA, and 50% formamide; hybridizations are typically carried out at 42 to 65 °C for 12 to 20
hours. In contrast, oligonucleotide arrays designed to perform genotyping or resequencing generally
require some degree of empirical optimization of hybridization buffer concentrations and temperature
stringency to achieve the best possible discrimination. Although many of the buffer components are
identical or similar to those previously described, arrays and microarrays designed to genotype at large
numbers of loci benefit dramatically from the use of tetramethylammonium chloride (TMACl) to
normalize the maximal signal obtained with target-probe hybrids that have widely differing GC
content.90,104 This is not, however, a requirement, since other genotyping or resequencing microarray
hybridizations achieve excellent discrimination using rather conventional SSPE buffers with a nonionic
detergent and acetylated BSA.86,88,104
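
The arithmetic for assembling a hybridization buffer of the typical composition given above from concentrated stocks can be sketched as follows. The final concentrations are taken from the ranges in the text; the stock concentrations (20x SSC, 10% SDS, 50x Denhardt's solution, 10 mg/mL salmon sperm DNA, 100% formamide) and the function name are assumptions made for illustration. In practice the labeled target occupies part of the volume shown here as water.

```python
def buffer_volumes(final_volume_ul, components):
    """Volumes (in microliters) of each stock needed for a final buffer.

    components: name -> (stock_concentration, final_concentration), with
    both values in the same units for a given component.
    """
    volumes = {}
    for name, (stock, final) in components.items():
        if final > stock:
            raise ValueError(f"{name}: final concentration exceeds stock")
        volumes[name] = final_volume_ul * final / stock  # C1*V1 = C2*V2
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this final volume")
    volumes["water"] = water
    return volumes


# Hypothetical stocks paired with final concentrations from the text.
RECIPE = {
    "SSC (x)": (20.0, 5.0),
    "SDS (%)": (10.0, 0.1),
    "Denhardt's (x)": (50.0, 5.0),
    "salmon sperm DNA (mg/mL)": (10.0, 0.1),
    "formamide (%)": (100.0, 50.0),
}

if __name__ == "__main__":
    for name, volume in buffer_volumes(100.0, RECIPE).items():
        print(f"{name}: {volume:.1f} uL")
```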
6.3.3.2 Effects of Solid Support
Microarray probes can consist of presynthesized oligonucleotides, PCR products, or cDNA sequences that
can be arrayed by mechanical deposition. Deposition is the method of choice for long sequences, which
can be available as complete cDNA molecules. The technology for making spotted microarrays is
generally a more accessible approach for “in-house” manufacturing compared to in situ fabrication.123 In
situ synthesis shows a major advantage over deposition of presynthesized oligonucleotides, as
oligonucleotides can be built from a simple collection of four chemical building blocks. In situ synthesis
of oligonucleotide probes requires a chemical treatment of the support, since oligonucleotides cannot be
coupled directly to the atoms that comprise silicate glass and plastics. It is necessary to treat the surface
with a chemical group from which to initiate the growth of the oligonucleotide chain; for example, oligoethylene
glycols have been widely used for glass.11
Binding one end of an oligonucleotide to a surface improves the formation of duplex with a soluble target.
The bases nearest the solid support are less accessible than those furthest away. The packing of
oligonucleotides made by synthesis in situ or deposition on a glass or polypropylene surface can be made
so high that there is a steric blocking effect.119 This should be avoided to maximize hybridization signals.
The ammonia used to deprotect the bases appears to remove enough material from the glass surface
to relieve steric hindrance while leaving enough to yield an optimal hybridization signal.124 Hybridization
yields are increased by up to two orders of magnitude by introducing spacers between the surface and the
oligonucleotides for short oligonucleotide applications, presumably because the oligonucleotides are then
sufficiently removed from the surface to allow efficient hybridization. There is an optimum spacer length
beyond which hybridization yield declines.119
6.3.3.3 Effects of Probe Sequence and Composition
Base composition has a great effect on duplex yield in the aqueous solutions normally used for
hybridization. The effect is undeniably due to lower stability of A:T compared with G:C pairs. Short
oligonucleotides can have extreme biases in composition, and oligonucleotides of the same length have
correspondingly large differences in Tm. Roughly, adding an A:T base pair increases Tm by 2 °C,
compared with 4 °C for a G:C pair. Arrays offer opportunities to examine large numbers of sequence
interactions simultaneously. It has been reported that high concentrations of TMACl had a large effect on
hybridization yields. In addition, it has also been proven that terminal G:C pairs produce significantly
increased duplex yields.125 It has been demonstrated that sequences of the same composition but different
sequence give different duplex yields. Sequence effects are predictable, as base-stacking interactions, which
depend on nearest neighbors, significantly affect duplex stability. Moreover, it has been shown that
unpaired bases, which stack on the end of the duplex when the target overlaps the probe, might have a
strong effect on duplex yield.126
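
The observation that probes of identical composition can differ in duplex yield follows from their different sets of nearest-neighbor stacks. The sketch below shows this with two hypothetical 8-mers invented for illustration. It counts dinucleotide steps along one strand only and does not assign thermodynamic parameters, so it indicates that the stacks differ, not by how much stability differs.

```python
from collections import Counter


def base_composition(sequence):
    """Count of each base in the sequence."""
    return Counter(sequence.upper())


def nearest_neighbor_stacks(sequence):
    """Count of each dinucleotide step (5'->3') along one strand."""
    seq = sequence.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


if __name__ == "__main__":
    # Two hypothetical probes with the same base composition.
    probe_a = "GGCCAATT"
    probe_b = "GCATGCAT"
    print("same composition:",
          base_composition(probe_a) == base_composition(probe_b))
    print("stacks in A:", dict(nearest_neighbor_stacks(probe_a)))
    print("stacks in B:", dict(nearest_neighbor_stacks(probe_b)))
```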
Many modern microarray analyses are directed to complex targets, such as human genomic DNA and
messenger RNA. In general, it is advisable to reduce sequence complexity when possible to produce good
hybridization signals in a reasonable hybridization time. Consequently, amplification by PCR is a
standard part of target preparation when studying genomic DNA. In gene expression arrays, a preferential
procedure for producing single-stranded targets is to include a promoter for an RNA polymerase in an
oligo(dT) primer, from which RNA is transcribed.127 However, the RNA molecule has a tendency to form
a stable secondary structure, which may interfere with hybridization, as has been seen. Steps must be
taken to reduce these effects, such as fragmenting the RNA, preferably to a size close to that of the
oligonucleotides on the microarray.99
6.3.4 Membrane Hybridization
Southern blotting has been one of the gold standard techniques for nucleic acid hybridization.128 Its
potential expanded to blotting entire DNA libraries that can be arrayed on membranes as targets for
hybridization with labeled probes to identify genomic or cDNA clones. Note that in the original
applications of blotting, the probe is the labeled sample in solution and the target is attached to the membrane.
In recent usage, some researchers have reversed this nomenclature convention. The types of membranes
of choice for microarrays are nitrocellulose and charged nylon attached to planar supports such as glass.
Individual clones are applied as spots by robotic gridders, which can accurately dispense nanoliter- and
picoliter-scale drops of DNA at high throughput. Traditional blots and arrays were usually hybridized
with 32P- or 33P-labeled cDNA probes reverse transcribed from poly(A) RNA. The preferred label is 33P, as it
produces a more defined spot image than 32P when analyzed on a phosphorimager. Unincorporated label
is removed from the probes by gel filtration, and the products can be examined by gel electrophoresis (6%
TBE–urea gel). Microarrays typically use fluorescent labels to avoid the signal dispersion seen with
the radioactive labels used for arrays.
Prior to hybridization, the probe is quenched with poly(A) and COT-1 genomic DNA129 to suppress
signals from nonspecific hybridization. The resulting phosphorimager images are
analyzed using software that quantifies the signal of each spot corresponding to an individual clone, the
intensity being proportional to the amount of mRNA present in each sample. As a result of the power of
this technology, nylon membrane-based cDNA arrays are now an accepted tool for obtaining high-
throughput information on gene expression, though the trend in the field is clearly toward
miniaturized microarray assays. Nylon membrane arrays are appreciated as a relatively economical
alternative to other gene expression technologies and can be assembled rapidly. Usually, the DNA probe
is cross-linked to the membrane matrix by ultraviolet irradiation. Recent approaches use nylon,
nitrocellulose, and other membranes in a microarray format by attaching them as coatings to a
glass support.71,130
As a final step, the deposited DNA probe is rendered single-stranded by applying heat or alkali.11 The
bound DNA probe remains partially intrastrand cross-linked to the membrane, with multiple
constrained contacts with the matrix along its length, probably induced by drying the DNA onto the
membrane. It is therefore probably not an optimal hybridization probe. The extent of intrastrand cross-linkage is
proportional to the length of the probe; therefore, one could predict that oligonucleotides will have single
points of constraint at each chain end, rendering them more accessible for hybridization than long cDNA
sequences. It would be convenient to have a format in which the accessibility of a simply bound, single-
stranded probe could be coupled with the specificity of a long cDNA probe to improve
duplex yield.131
6.3.5 Glass Slide and Oligonucleotide Microarrays
Glass was first used for hybridization analyses in the mid-1980s. The primary advantages of glass-based
microarrays are that the substrate is rigid and planar, has a smooth, regular surface, and allows facile
fluorescent detection using a confocal microscope (laser scanner) or charge coupled device (CCD)
camera. Hybridization reactions with glass microarrays are carried out in buffers similar to those used for
membrane-based microarrays, although different blocking agents may be required during
prehybridization and hybridization, depending on the specific coatings or treatments and type of glass
substrate used. Of particular importance for hybridizing glass microarrays when fluorescence detection is
used is the need to keep the glass surface scrupulously clean and free of dust or contaminants that might
cause signal anomalies. Glass-based microarrays are usually prepared on microscope slides, which have
low inherent fluorescence. The slides are coated with polylysine, amino silanes, or amino-reactive
silanes,73 which enhance both the hydrophobicity of the slide and the adherence of the deposited DNA
target. They also limit the spread of the spotted DNA nanodrop on the slide. After fixation, residual
amines on the slide surface can be reacted with succinic anhydride to reduce the positive charge at the
surface, reducing nonspecific binding.
Glass supports have also been used for in situ synthesis of oligonucleotide probes. These microarrays
were made possible by the development and integration of two key technologies. The first is the fabrication
of hundreds of thousands of polynucleotides at high spatial resolution in precise locations on a surface.
The second, laser confocal fluorescence scanning, facilitates the measurement of the multiple
hybridizations on the microarray.99 This combinatorial strategy for the fabrication of large arrays of
oligonucleotides in few coupling steps includes the photochemical deprotection method, which leads to a
localized synthesis by confining chemicals physically using masks. This method enables manufacture of
“random access” arrays; that is, the oligonucleotide in any position can have any chosen sequence. The
targets for high-density oligonucleotide microarrays are labeled using reverse transcription from an
oligo(dT) primer. This has the advantage of producing a labeled product from the 3' end of the gene, directly
complementary to the immobilized probe. Typically, total RNA (rather than mRNA) is labeled, to
maximize the amount of message that can be obtained from a given amount of tissue and to eliminate the
experimental step of mRNA purification. The purity and integrity of RNA are critical factors in
hybridization performance, particularly when using fluorescence, as cellular protein, lipid, and
carbohydrate can mediate significant nonspecific binding of fluorescently labeled cDNAs to glass
surfaces. RNA integrity also bears directly on labeling efficiency and signal strengths. Scrupulous
technique should be used at all times to minimize ribonuclease contamination of the RNA.
The design of the oligonucleotide probe sequence is crucial to ensure highly efficient hybridization to the
target. Using as little as 200 to 300 bases of the target gene, cDNA or expressed sequence tag (EST),
independent 25-mer oligonucleotides are selected (mainly non-overlapping, or minimally overlapping) to
serve as unique sequence-specific detectors. Probe design is based upon complementarity to the selected
gene, and an absence of complementarity to other highly abundant RNA sequences (e.g., rRNAs, tRNAs,
Alu-like sequences, actin mRNA). In addition, specific hybridization can be verified by the use of
mismatch (MM) control probes that are identical to their perfect match (PM) counterparts except for a
single base difference in a central position. The MM probes act as specificity controls that allow the direct
subtraction of both background and cross-hybridization signals, and allow discrimination between 100%
complementary binding between the probe and the target, and nonspecific or semispecific hybridization
of the intended RNA molecules. In the presence of even low concentrations of RNA, hybridization to the
PM/MM probe pairs produces quantitative fluorescent patterns. The strength of these patterns directly
relates to the concentration and is used by the software algorithms132 to calculate signal intensity values
and partial concentrations of each RNA in the hybridized mixture.
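As a simplified illustration of how PM/MM probe pairs can be combined into a single signal value, the
following Python sketch averages the PM minus MM differences over the probe pairs for one gene. This is
an assumed, minimal scheme and not the software algorithm cited above; the function name and the
intensity values are hypothetical.

```python
from statistics import mean


def average_difference(pm: list[float], mm: list[float]) -> float:
    """Mean of (PM - MM) over the probe pairs of one gene.

    Subtracting each MM intensity removes the background and
    cross-hybridization signal estimated by that mismatch probe.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical fluorescence intensities for four PM/MM probe pairs
pm = [1200.0, 950.0, 1100.0, 1010.0]
mm = [300.0, 250.0, 400.0, 310.0]
signal = average_difference(pm, mm)
print(signal)
```

A probe set whose PM and MM intensities are similar yields a value near zero, which suggests that most
of the observed signal is nonspecific.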
6.3.6 Electrical Field Hybridization
DNA microarrays can be fabricated on impermeable, rigid substrates, such as glass, or on porous
membranes and gel pads. For the arrays and microarrays described above, DNA hybridizations are carried out
under conditions where the reaction rates and stringency conditions are controlled by target concentration,
temperature, and/or salt concentration of the solutions and washes. These correspond to classical passive
hybridization conditions. On the other hand, active microelectronic array and microarray devices are able
to utilize electric fields to enhance the transport and selective addressing of DNA probes to any position
on the surface and to accelerate the classic hybridization process. These devices, which provide
electronic hybridization, allow DNA microarray analysis to be performed for several different applications,
including DNA genotyping.
Active microelectronic planar microarrays have been designed to hold up to 10 000 microlocations or test
sites. The more commonly used chip contains 100 test sites, with each active test site area being about 2
mm2 in size. Selective electronic addressing of DNA probes and electronic hybridization on the
active microelectronic microarrays are carried out by applying a positive direct current (DC) bias to
the individual microelectrodes beneath the selected test sites while maintaining other selected
microelectrodes as negatively charged counter-electrodes. The application of the proper current and voltage levels produces
electrophoretic fields, which transport and concentrate the negatively charged nucleic acid molecules to
the selected positively biased test sites on the microarray. Therefore, known nucleic acid sequences
(oligonucleotides, DNA, RNA, PCR amplicons, polynucleotides, etc.) can be selectively and rapidly
immobilized and bound (covalently or noncovalently) within the permeation layer overlaying the
activated platinum microelectrode. Usually, the permeation layer is a 1 to 10 µm thick coating of a
hydrogel (agarose or polyacrylamide), and it serves as a matrix for the attachment of DNA probes and
target DNA sequences by impregnating the permeation layer with affinity binding substances such as
avidin or streptavidin for subsequent attachment of biotinylated probes.
It is essential to have a special buffer environment to facilitate both the rapid electrophoretic transport of
the DNA probes and their subsequent hybridization. Histidine has been found to be particularly effective for
electronic hybridization; in particular, its zwitterionic form at near neutral pH has a very low
conductivity, allowing rapid transport of DNA molecules at low field strengths, and still providing a
relatively good buffering capacity.133 The ability to hybridize DNA at the test sites is due to the buffering
effect of the histidine near the microelectrode surface. When the microarray is activated, the positively
charged microelectrode produces hydrogen ions (H+). The histidine molecules in this local environment
around the microelectrode, which buffer the acid, become protonated and gain a net positive charge. It is
this cationic histidine that stabilizes the hybridization of DNA probe in the test site.134 Overall, this
method offers a rapid concentration of DNA molecules at the microscopic test sites leading to a
significant reduction in immobilization and hybridization reaction times when compared to passive
hybridization techniques (seconds rather than hours). Reversal of the electric field potential (negative
bias) at the test site then causes the rapid removal of unhybridized DNA molecules. When the electric
field is adjusted to the appropriate levels, it can be used to effect the selective dehybridization of the DNA
sequences from the attached complementary probe. This parameter is called “electronic stringency,” and
it provides a powerful and rapid method for single-base match/mismatch discrimination analysis of target
DNA sequences. These active microelectronic DNA microarrays have been used to carry out various
genotyping applications such as SNP analyses.135
6.3.7 Long Oligonucleotide Arrays
6.3.7.1 Introduction
DNA microarrays using long oligonucleotides (typically 60 to 70 mers) contain probes of intermediate
length between 25 mers and cDNAs. An important feature of long probes is that they can be designed to
discriminate very well between desired complementary targets and other competing sequences in the
genome or transcriptome, yielding high hybridization specificity.136 One study comparing
hybridization of RNA targets to in situ synthesized oligonucleotides of lengths 20 to 60 bases found that
60 mers gave the best compromise between maximum sensitivity and specificity.137 Another study, which
used specially prepared representations of genomic DNA for array CGH, found that 70 mers were optimal
to maximize the signal-to-noise ratio.138
The high specificity of long oligonucleotide probes reduces cross-hybridization to untargeted sequences;
therefore, these probes can have very low, nonspecific backgrounds. This contributes to the low detection
limits (e.g., 1 transcript in 1 000 000) that have been reported for long oligonucleotide arrays.137,139 At the
same time, expression of a single gene can be reproducibly monitored using a single 60 mer probe.137
Long oligonucleotide arrays are widely used in gene expression studies, including studies leading to
clinically significant gene signatures. One study of clinical relevance that employed long oligonucleotide
arrays discovered a particular gene expression profile that could be used to predict the clinical outcome of
breast cancer.140, 141
Long oligonucleotide arrays have also been employed for high-resolution array CGH analysis. Arrays
of 60 mer oligonucleotides designed for CGH have demonstrated consistent single-copy loss
detection from single probes, using total genomic DNA samples142,143 and Phi29 amplified DNA.143
Phi29 amplification has been demonstrated to yield high-fidelity, total genomic DNA products with good
representation across the genome, allowing the use of nanogram quantities of genomic DNA for these
types of assays.144, 145
6.3.7.2 In situ Synthesis of Long Oligonucleotide Arrays
Ink-jet technology has proven an effective method for long oligonucleotide array manufacturing. Given
that step-wise yield of coupling at each nucleotide addition will determine the overall yield of full length
product in each feature on the array, in situ synthesis methods with high step yields are especially
important for creating in situ synthesized, long oligonucleotide arrays.
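The dependence of full-length product on step-wise coupling yield can be sketched numerically. The
Python example below assumes one coupling per nucleotide and the same yield at every step, so that the
full-length fraction is the step yield raised to the probe length; the function name and the example
yields are illustrative assumptions, not measured values.

```python
def full_length_fraction(step_yield: float, length: int) -> float:
    """Fraction of strands that reach full length after `length` couplings,
    assuming an identical, independent yield at every coupling step."""
    if not 0.0 <= step_yield <= 1.0:
        raise ValueError("step_yield must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return step_yield ** length


# Compare a 25 mer and a 60 mer at several hypothetical step yields
for y in (0.95, 0.98, 0.99):
    print(f"step yield {y:.2f}: "
          f"25 mer {full_length_fraction(y, 25):.3f}, "
          f"60 mer {full_length_fraction(y, 60):.3f}")
```

Under these assumptions, a 99% step yield leaves roughly half of the 60 mer strands at full length,
whereas a 95% step yield leaves fewer than 5%, which is why high step yields matter most for long probes.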
In situ synthesis with ink-jet technology creates oligonucleotide microarrays by depositing A, T, C, or G
phosphoramidites onto the surface of a specially prepared glass slide at any desired location, much in the
same way ink-jet printers deliver different colors of ink in different locations to create a desired picture.
Each individual droplet will react with a growing probe, extending it by one nucleotide. After the
deposition of all of the phosphoramidites for a given layer, a flow cell process is used to simultaneously
and uniformly complete the DNA synthesis cycle. This leaves the 5'-hydroxyls free to undergo further
extension. Layer by layer, any probe sequence can be synthesized at any location on the microarray,
enabling the production of custom microarrays. The ability of the ink-jet to reproducibly and accurately
perform contact-free delivery of appropriately sized droplets enables flexible, easily customizable designs
and the accurate placement of 44 000 or more features on a single 1 x 3-inch slide. Building on the high
efficiency of the standard phosphoramidite chemistry, oligonucleotide probes of 60 bases in length, and
even considerably longer, can be readily synthesized.137
6.3.7.3 Sample Prep and Labeling
The methodologies for nucleic acid sample preparation and quantification are very similar to those used
for other array types, as described in Sections 6 and 7.5. Labeling of RNAs for gene expression analysis
is also similar to other array types, and is described in Section 6.2.2. Direct incorporation of labeled
nucleotides during reverse transcription is commonly done using either poly(A)+ RNA or total RNA.
Efficient and cost-effective T7 polymerase-based amplification protocols have also been developed,
which can be used when the initial RNA sample is limiting.140,146
6.3.7.4 Hybridization and Wash
6.3.7.4.1 Probe Design
When designing oligonucleotide probes, the primary consideration is to maximize specificity by selecting
probes with no close homologs in the transcriptome or genome of interest. In general, the longer the
probe sequences, the easier it is to find such probes. Probes should also be chosen to minimize internal
structure. When possible, it is best to equalize the hybrid Tm or GC content of the probes on the array,
which makes it easier to select hybridization conditions simultaneously appropriate for all probes on the
array. Depending on the target preparation protocol, other considerations may arise in probe design. For
example, if cDNA targets are copied from mRNA using MMLV (Moloney murine leukemia virus) reverse
transcriptase with oligo(dT) primers (see Section 6.2.1.5), the resulting copies favor the 3′ end of the target message, and
probes to that region of the message are more likely to be representative. Oligonucleotide probe design
strategies have been summarized in recent publications.147, 148
6.3.7.4.2 Hybridization Protocols
The important considerations for hybridization are stringency and kinetics. The hybridization must be
carried out at a temperature at which complementary hybrids are stable, but competing cross-hybridized
targets are melted off. Hybridization stringency can be adjusted by changing the temperature, changing
the salt concentration of the hybridization buffer, or adding organic modifiers such as formamide or
urea.149
Hybridization kinetics are affected by temperature, target length, and probe concentration. Studies of the
kinetics of hybridization to surface-bound oligonucleotide probes show rates of hybridization comparable
to rates observed in solution.150, 151
Assuming a typical oligonucleotide surface probe density of 20 000 probes/µm², hybridization of
intermediate-length targets (100 nt) to a single array feature of 10⁴ µm², in a hybridization volume of 500
µL, proceeds with a half time of about 100 hours.152 Smaller hybridization volumes, larger numbers of
surface probes, and shorter target lengths all serve to speed up the hybridization. Depending on
conditions, the time required for hybridization to reach equilibrium can range from tens of minutes (for
multiple, replicated, large features in small volumes) to hundreds of hours (for single, small features in
large volumes).
Hybridization of amplified cRNA targets is often preceded by fragmentation to an average size of 50 to
100 nucleotides by heating at 60 °C in 10 mM ZnCl2.140 Hybridization has been reported in 1 M NaCl,
0.5% sodium sarcosine, 50 mM MES, pH 6.5, and 30% formamide at 40 °C for 16 to 24 hours,137,140 or in
an aqueous buffer containing LiCl and lithium lauryl sulfate at 60 °C for 16 hours.146,153 It is important
that hybridization chambers be well mixed during hybridization.
6.3.7.4.3 Wash Stringency and Wash Protocols
Proper wash stringency is as important as proper hybridization stringency, but for different reasons.
Wash stringency must be sufficient to remove weakly homologous but highly abundant labeled targets
nonspecifically bound to the probe sites.149 In general, wash stringency is more critical for arrays with
long or very long probes than it is for arrays with short probes. Longer probes offer more opportunities
for unwanted targets containing short (i.e., 6 to 12 nt) complementary sequences to bind. Although these
partial duplexes are melted off under typical hybridization conditions, they are so numerous that they
maintain a steady-state level of duplexes during the hybridization, and they can also rapidly form as the
array cools during disassembly. The washes must be sufficiently stringent to remove such opportunistic
mismatches. A good rule of thumb is that the wash conditions should be approximately as stringent as the
hybridization conditions.
Wash stringency depends on salt concentration, temperature, and wash duration. Preferably, wash
conditions will be chosen so that the specific signal is nearly stable over time, making wash duration
less critical. Usually the
last wash is the most stringent; initial washes serve primarily to flush away unbound labeled targets. The
final wash solution should be relatively low salt, to prevent the formation of residual salt crystals when
the array is dried, which can interfere with scanning. Adding small amounts of detergent to the final wash
buffer also helps promote uniform drying.
A typical wash protocol for 60 mer oligonucleotide arrays involves a first wash in 6x SSPE, 0.005% N-
lauroylsarcosine, at room temperature, followed by a second, more stringent wash in 0.06x SSPE, 0.005%
N-lauroylsarcosine at elevated temperature.137,146 Washes can be done in glass histology slide staining
dishes.
6.3.7.5 Experimental Design and Data Analysis
Either two-color assays (sample/reference, as described in Sections 8.3.2.1, 8.3.3.1) or one-color assays
(for which the reference sample is hybridized to a separate array, as described in Section 8.3.2.2) can be
performed, using either premade or custom long oligonucleotide microarrays.154 Two-color assays
generally give more reproducible results, but are more expensive to perform, and are subject to a degree
of systematic dye bias. Some workers use dye-swapped pairs of arrays to eliminate dye bias (see Section
9.1.2.2). The experimental designs and data analysis methods (Sections 8.3 and 8.4) for both two-color
experiments and one-color experiments are applicable to long oligonucleotide arrays, regardless of
whether they are manufactured commercially or by a home-brew process.
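The use of dye-swapped array pairs can be sketched as follows. This Python example assumes that each
array reports a log2(red/green) ratio for a probe and that dye bias adds the same constant to that log
ratio on both arrays; under those assumptions, half the difference of the two log ratios estimates the
sample/reference log ratio and half the sum estimates the dye bias. The function name, channel labels,
and values are hypothetical.

```python
def dye_swap_estimate(m_forward: float, m_swapped: float) -> tuple[float, float]:
    """Separate a sample/reference log ratio from an additive dye bias.

    m_forward: log2(red/green) with the sample in red, reference in green
    m_swapped: log2(red/green) with the sample in green, reference in red
    Returns (log_ratio, dye_bias).
    """
    log_ratio = (m_forward - m_swapped) / 2  # bias cancels in the difference
    dye_bias = (m_forward + m_swapped) / 2   # true ratio cancels in the sum
    return log_ratio, dye_bias


# Hypothetical probe: true log ratio 1.0 with a dye bias of 0.5
log_ratio, dye_bias = dye_swap_estimate(1.5, -0.5)
print(log_ratio, dye_bias)
```

If the bias differs between the two arrays or depends on intensity, this simple averaging only
reduces, rather than eliminates, the dye effect.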
To avoid compression or distortion of log ratios, especially of weakly expressed probes, it is important
that backgrounds are correctly estimated and subtracted. A common technique for background estimation
is to use the average of a number of “negative control” probes, included on the array. Negative control
probes are probes that have been designed and verified not to hybridize to any targets in the transcriptome
or genome of interest, because of either internal structure or low homology. Alternatively, weakly
expressed probes can be used to estimate background.155 Because the surface energy of feature spots is
very different from that of glass, backgrounds computed from interfeature glass regions are a poor
estimator of signal background for arrays printed on glass slides.
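The effect of background subtraction on log ratios can be illustrated with a short Python sketch that
estimates each channel's background as the mean of its negative control probes. The function name, the
floor value used to avoid taking the logarithm of a non-positive number, and all intensities are
assumptions made for illustration.

```python
from math import log2
from statistics import mean


def background_subtracted_log_ratio(sample, reference,
                                    neg_sample, neg_reference,
                                    floor=1.0):
    """log2(sample/reference) after subtracting per-channel background.

    Background in each channel is the mean of the negative control
    probe intensities; corrected signals are clipped at `floor`
    (an assumed safeguard against non-positive values).
    """
    s = max(sample - mean(neg_sample), floor)
    r = max(reference - mean(neg_reference), floor)
    return log2(s / r)


# Hypothetical weakly expressed probe and negative controls
neg_s = [90, 100, 110]
neg_r = [95, 100, 105]
raw = log2(300 / 200)
corrected = background_subtracted_log_ratio(300, 200, neg_s, neg_r)
print(raw, corrected)
```

In this example the uncorrected log ratio is smaller than the corrected one, showing the compression
toward zero that unsubtracted background produces for weak signals.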
6.4 Posthybridization
In contrast to the solutions used for microarray hybridization, the solutions used for posthybridization
washing are typically simple NaCl-based buffers with the addition of low concentrations of a detergent
(such as sodium dodecyl sulfate [SDS]) and a buffering agent. Specific formulations of posthybridization
wash solutions will vary depending on the stringency requirements of the target/probe hybrids.
Wash conditions are optimized empirically and depend on the time, temperature, and
ionic strength of the wash buffers. However, because the stringency of the hybridization is typically
one that generates only a low level of background or nonspecific binding of product to probe, wash
protocols usually require only low stringency and are performed at room temperature.11
Commercial microarrays are typically packaged in sealed cassettes or as glass chips without a cassette.
The cassette devices are designed with fluidic ports to handle washes with prescribed solutions and
volumes. In most cases, the wash buffers are NaCl-based solutions. One manufacturer, however, uses
dilute histidine-based solutions to effect the pH control needed to maintain the specificity of single
nucleotide discrimination of their oligonucleotide probes in the context of electronic microarrays.
Laboratory-developed (“home-brew”) microarray protocols are more likely to use one or more wash buffers
with a composition similar to those used for Southern transfer analysis. A common buffer is 0.5x SSPE
and 0.005% Triton X-100,156 or alternatively 1x SSC and 0.1% SDS.157 Either of these buffers offers a
low stringency wash when performed at room temperature.
A third choice for a wash solution has been used in assays that employ the ligation detection reaction to
create DNA products modified with oligonucleotide tails called “zip-codes” that hybridize to perfect
complements on the surface of the microarray. In this case, the perfect base-for-base match requires only
a gentle wash to remove fluorescent DNA signal from areas of the microarray not affixed with probe and
to clean up background noise. This wash solution uses 30 mM bicine at pH 8.0, 10 mM MgCl2, and 0.1%
SDS for ten minutes at room temperature.158
The table below lists one recommended procedure applicable for each of the above described wash
buffers:
Table 1. Posthybridization Wash Protocols for Nucleic Acid-based Microarrays

(1) Immediately following hybridization, remove the microarray(s) and place in a beaker with 400 mL of
1x SSC and 0.1% SDS.
(2) Wash with gentle agitation of the solution for 5 minutes at room temperature (RT).
(3) Transfer microarrays to a second beaker containing a fresh volume of 400 mL of 1x SSC and 0.1%
SDS at RT.
(4) Repeat the wash above a third time in fresh buffer at RT.
(5) Remove the microarrays and allow to air dry.
(6) Scan microarrays for fluorescence emission.
6.5 Signal Generation and Detection
6.5.1 Slide Readers
The slide reader illuminates the regions containing the hybridized, labeled target oligonucleotides (or
cDNA) and detects the resulting fluorescence from the fluorochrome. It is assumed that the resulting
intensity of the fluorescence is proportional to the number of fluorochrome molecules in the illuminated
region. Two techniques are used to generate the image of the distribution of fluorochromes on the
microarray slide: 1) a scanning procedure in which a focused laser beam sequentially illuminates different
regions of the slide; and 2) a wide-field illumination of a large portion of the slide and imaging of the
emitted light on a CCD detector.
6.5.1.1 Scanning Readers
A microarray scanner utilizes a laser to illuminate a small region of a glass substrate (~100 µm2). A laser
is used to provide the excitation light in order to get very high illumination intensities. The laser light is
usually reflected off a dichroic mirror. From there, the laser hits two mirrors that are mounted on motors;
these mirrors scan the laser across the sample. Fluorochromes in the sample fluoresce, and the emitted
light is reflected using the same mirrors that are used to direct the laser excitation light. The emitted light
passes through the dichroic mirror and is focused onto the detector input, such as a photomultiplier tube
(PMT) with a pinhole. The signal from the detector is sampled, digitized, and stored in a computer that
builds up the image, one pixel at a time. The scanning of the laser beam may be supplemented by
additional mechanical translation of the slide itself.
The quality of the image is determined by four principal factors159: spatial resolution, dynamic range,
signal-to-noise ratio (S/N), and temporal resolution. Spatial resolution is mainly determined by the beam
handling optics, and to some degree by the scan rate. The dynamic range of the PMT can be higher than
10⁵. The S/N is determined by the intensity of the illumination and the electronic noise. The temporal
resolution is determined by the sampling rate of the electronics.
6.5.1.2 Wide-field Imaging Readers
A large area (several square millimeters up to 1 cm2) of the microarray is illuminated uniformly, and the
emitted light from the entire region is collected and imaged on a CCD detector. The CCD detector
consists of an array (e.g., 512 x 512) of light-sensitive elements, each of which responds independently to
incident photons. The illuminating and emitted light pass through lenses and filters in a manner similar to
that in the scanning microscope, though no scanning of the illuminating light is used in “area detectors.”
The slide is moved and different regions are illuminated sequentially. With sensitive CCD detectors, the
collection time for each illuminated region can be a fraction of a second up to 30 seconds. As the slide is
translated to the next region, the CCD image is downloaded to a computer that assembles the image of the
entire slide computationally. Although the intensity of illumination is lower in wide-field imaging than in
scanning imaging, the time spent accumulating the signal from each pixel can be much longer in
wide-field imaging, since many pixels are accumulated simultaneously for a long
period. The ability to process many pixels simultaneously is an advantage of the CCD imaging technique.
The light source in wide-field imaging instruments is usually a lamp with a filter to select the appropriate
wavelength range for efficient excitation of the fluorochrome.
The quality of the CCD image depends on the quality of the optics and on the quality of the CCD. The
quantum efficiency of CCD pixels can be made close to 90% in the visible range. The pixels comprising
the CCD detector integrate the photon-induced charge during the entire exposure time. The pixels also
accumulate “dark” charge that needs to be measured independently and subtracted. The “dark” charge can
be reduced by cooling the CCD chip to well below 0 °C. The CCD reading process generally introduces
more noise than scanning and PMT systems, but both scanners and imagers work very well for
microarray applications.
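The independent measurement and subtraction of “dark” charge can be sketched as a pixel-by-pixel
operation. The Python example below assumes that a dark frame has been acquired with the shutter closed
at the same exposure time and chip temperature as the image; the function name, the clipping of negative
values to zero, and the pixel counts are illustrative assumptions.

```python
def subtract_dark(image, dark):
    """Subtract a dark frame from a CCD exposure, pixel by pixel.

    `image` and `dark` are lists of rows of pixel counts with the
    same shape; negative results are clipped to 0.
    """
    if len(image) != len(dark) or any(
        len(row) != len(drow) for row, drow in zip(image, dark)
    ):
        raise ValueError("image and dark frame must have the same shape")
    return [
        [max(p - d, 0) for p, d in zip(row, drow)]
        for row, drow in zip(image, dark)
    ]


# Hypothetical 2 x 2 exposure and matching dark frame
image = [[120, 130], [110, 500]]
dark = [[20, 25], [30, 20]]
print(subtract_dark(image, dark))
```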
6.5.2 Signal Calibration and Validation
Photodegradation is a problem with some fluorochromes. The excited states of fluorochromes are
inherently unstable and are subject to chemical reactions with oxygen and other molecules if excessive
light is used for illumination. Some of these reactions change the structure of the fluorochrome and
destroy the fluorescence. Furthermore, the emission properties of fluorochromes may change in the
presence of interfaces and ligands. Therefore, the calibration of the fluorescence intensity signal is best
accomplished using a “calibration” slide with fluorochrome environment similar to that in the actual assay
slide. Other “reference” slides, with photostable fluorochromes, may be used to validate the operation of
the instrument (for example, characterize the uniformity of response over the entire slide, measure the
dynamic range, and establish the detection threshold). Most of the current dyes used in microarray
analysis are much more stable than the dyes used in the early stages of the field.111
6.6 Laboratory-developed (“Home-Brew”) Microarray Assay
6.6.1 Introduction
Recent publications indicate that laboratory-developed (“home-brew”) DNA microarray-based
assays can be as reliable, precise, and clinically useful as many commercially prepared devices. The term
laboratory-developed (“home-brew”) refers most significantly to the part of the overall protocol
describing the preparation of the microarray devices. The other components of any laboratory-developed
assay are operational steps with features common to many other molecular diagnostic
protocols. As such, the methods leading to an optimized and verified set of assays are similar to other
methods, but more complex because of the added need to assess the quality measures associated
with microarray slide production. The following table outlines the approaches to the optimization and
verification of a laboratory-developed assay.
Table 2. Approaches for Optimization and Verification of a Laboratory-developed (“Home-Brew”) Based Assay

Preparation of Laboratory-Developed Components
• Design of microarray
• Microarray production
• Optimization of nucleic acid chemistry
• Optimization of hybridization and wash conditions

Components of the Laboratory-Developed Assay
• Sample preparation and nucleic acid extraction
• Gene chemistry: target amplification
• Hybridization to the microarray
• Data gathering and analysis by fluorescence detection
6.6.2 Sample Preparation
The preparation of the DNA sample for use in a laboratory-developed microarray assay is similar to that
for any other common molecular genetic assay. Please refer to Section 6.2 for details regarding
preparation of genomic DNA for PCR or other gene chemistry applications.
6.6.3 Nucleic Acid Chemistry
Laboratory-developed based microarray assays have been adapted to most of the common gene chemistry
platforms including PCR, ligase detection reaction, and isothermal linear signal amplification using
structure-specific oligonucleotide cleavage chemistry.13 The means of target DNA production is not a
limitation in the application to microarray assays; instead, the basis of gene amplification will determine
the assay’s level of sensitivity.160
6.6.4 Overview of Laboratory-developed Microarray Production
The basic principles of DNA immobilization and probe hybridization are relevant in the construction and
use of a DNA microarray.128,161 Specifically, the construction of DNA microarrays is analogous to the
commonly used analytic technique called “reverse dot-blot.” Reverse dot blot detection is based on the
use of a filter, typically made from nitrocellulose paper or nylon, onto which oligonucleotide probes are
affixed. The probes are attached to the filter through the application of the DNA probes in solution
spotted onto the surface, then dried thoroughly or cross-linked by means of exposure to ultraviolet
radiation. The result is the permanent attachment of the short DNA probes to the surface of the filter. The
process of producing a DNA microarray follows a similar logic, but the spotting of the probes usually
involves a process using special instrumentation that deposits the microdroplets either mechanically,
electrostatically, or chemically onto the surface of the array.162-165 Consequently, the chemical attachment
of DNA to the surface of the microarray is either noncovalent or covalent.166
For the production of commercial DNA arrays, highly specialized processes are used to either spot DNA
probes or to actually synthesize DNA in situ. These processes are often proprietary and require
considerable infrastructure. In the case of laboratory-developed microarray assays, the technology
is at the level that a research laboratory or a shared facility that produces oligonucleotide primers and
probes might afford the requisite instrumentation. In such a setting, the cost of producing glass-slide-based
DNA arrays can approach the $5 to $10 range.167 Based on experience with devices
manufactured at such facilities, many of the described laboratory-developed DNA assays have shown
acceptable analytic performance. In most cases, these assays involve arrays created on glass slides,
essentially identical to microscope slides used for histologic examination.168 The following is a synopsis
of the more common techniques to prepare such glass-slide-based arrays, the options for the application
of DNA probes, and the subsequent requirements for the quality management of these devices.
6.6.5 The Basis of Probe Attachment — Relative Merits of the Attachment Chemistry
Two general approaches are used currently to attach nucleic acids to surfaces: 1) in situ synthesis, and 2)
deposition of presynthesized oligonucleotides or biologically derived nucleic acids. In the case of in situ
synthesis of oligonucleotides, this most commonly combines photolithographic techniques with
photolabile or photogenerated, acid-labile, nucleic acid protecting groups. Either masks or a digital
micromirror device are used to direct light as needed to specific sites.25,58,59 Alternative schemes, not
requiring light, use either macroscopic masks that act as a stencil or microfabricated ink-jet pumps, which
deliver the reagent components of oligonucleotide synthesis to specified areas of the chip.163,169 For
probes that are robotically or manually deposited at defined sites, immobilization can be noncovalent,
covalent by cross-linking, or covalent by end-tethering. Noncovalent immobilization relies on ionic and
hydrophobic interactions of the probe with complementary functional groups introduced to the surface of
the array (see Figure 1).170 Covalent cross-linking immobilization is achieved by photo-induced radical
reactions of the nucleobase thymine to surface attached amines, or by coupling reactions with adenine,
cytosine, or guanine to surface diazonium ions.171 Probe immobilization based on these methods produces
attachments that are often multiple and ambiguous. Additionally, the orientation of the surface affixed
probes may be such that target hybridization is hindered.119,172 Consequently, recent development in the
routine production of DNA microarrays has focused on the methods of end-tethering presynthesized
oligonucleotides by means of a process that activates reactive functional groups on the surface. Although
more labor-intensive, such immobilization schemes give greater spatial accessibility to probes,
independent of the size of that probe.
Most of the probe attachment strategies are done on glass microscope slides. Glass is a preferred
material, because it is easily handled and has low intrinsic fluorescence. However, the loading capacity
of glass becomes a limiting factor, because the surface is planar (i.e., two-dimensional and nonporous).173
These features place limitations on the ability of these devices to bind target molecules. For this reason,
the analytic sensitivity and specificity of DNA arrays also depend on the quantity and
density of probes that successfully attach to the array.174 Although probes spotted at higher density may
increase the sensitivity of detecting weakly amplified targets, decreased target binding due to steric
hindrance has also been documented.
To address this limitation, preparation of the array surface by the addition of a gel layer, the use of
dendrimeric probes, and the use of inorganic substrates have each been shown to enhance signal output by
providing binding capacities more characteristic of three-dimensional (3D), microporous filter membranes.50,175,176
One protocol for the preparation of the glass slide prior to application of the oligonucleotide probes is
listed below:
6.6.6 Preparation of Glass-Slide-Based Microarrays
6.6.6.1 Overview
Briefly, glass slides are first cleaned with a strong base followed by an acid wash. The glass slides are
then silanized to permit the formation of covalent bonds with the overlying polymer coat. In the case of
gel-coated arrays, a third step involves application of an acrylamide/bisacrylamide/acrylic acid polymer
coating to each slide. A final process involves the activation of the carboxyl groups on the gel surface. Activated
slides are then ready to be spotted with oligonucleotides using any of a variety of different spotting
technologies. Oligonucleotide probes spotted to gel-coated arrays form a covalent amide bond between
their amine linker and the activated carboxyl groups.
6.6.6.2 Detailed Protocol for Glass Slide Preparation
(1) Cleaning Glass Slides — Standard glass microscope slides are cleaned by first submerging each
in a bath of 700 mL of concentrated ammonium hydroxide (NH4OH):30% hydrogen peroxide (H2O2):
Water (H2O) (1:1:5) at 95 ºC for five minutes, followed by thorough rinsing with distilled and deionized
water. Next, the slides are bathed in 700 mL of concentrated hydrochloric acid (HCl):30% hydrogen
peroxide:water (1:1:5) at 95 °C for five minutes. A series of rinses follows, first with ddH2O, second with
methanol, and finally with acetone. Slides are then allowed to dry at room temperature.
(2) Silanize Slides — After cleaning, slides are submerged into a mixture of 2% (v/v)
γ-methacryloxypropyltrimethoxysilane (MTS) and 0.2% (v/v) triethylamine in chloroform (CHCl3) at room
temperature for 30 minutes, followed by a final 15-minute wash in chloroform. Slides are again allowed
to dry at room temperature. Silanized slides are ready for spotting by several protocols. However,
improved probe attachment and higher sensitivity in fluorescent signal production have been observed with
arrays coated with polymeric gels.
(3) Gel Polymerization of Slides — The application of polymeric gels to the array surface is
described elsewhere and has been adapted as a manual technique, in which operator-to-operator variability in
success is likely.177 One approach is to use 20 µL of the polymer mixture [8% (w/v) acrylamide, 2% (v/v) acrylic acid, 0.02%
(w/v) N,N′-methylenebisacrylamide, and 0.8% (w/v) ammonium persulfate] cast onto silanized
microscope slides.158 Briefly, apply the droplet of nonpolymerized gel mixture to the center of the glass
slide. Overlay the slide with a 24 x 50 mm glass coverslip, allowing the polymer to spread laterally and
to cover the surface of the slide. Polymerization is initiated by placing the slides on a heat block at 65 ºC
and incubating for approximately four minutes. The glass coverslips are then carefully removed using a
razor blade to pry up one corner firmly and evenly, but with care so as not to disrupt the gel. The slides
are then rinsed with deionized water and allowed to dry at room temperature.
(4) Activation of the Slides — The free carboxyl groups randomly dispersed across the polymeric
surface are activated by covering the surface with a solution of 0.1 M
1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDC)/20 mM N-hydroxysuccinimide (NHS) in 0.1 M
potassium phosphate buffer (K2HPO4/KH2PO4), at pH 6.0.
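The reagent ratios in steps (1) and (2) reduce to simple arithmetic, which the following minimal sketch works through. The 700 mL bath volume and the 1:1:5, 2% (v/v), and 0.2% (v/v) figures come from the protocol; the 500 mL silanization batch volume and the function names are assumptions introduced only for illustration, and volumes are treated as additive.

```python
from fractions import Fraction


def split_by_ratio(total_ml, parts):
    """Split a total bath volume (mL) into components by integer ratio parts."""
    whole = sum(parts.values())
    return {name: float(Fraction(total_ml) * p / whole) for name, p in parts.items()}


def percent_vv(total_ml, percent):
    """Volume (mL) of a component present at the given % (v/v) of the total."""
    return total_ml * percent / 100.0


# Step (1): 700 mL baths mixed 1:1:5, as stated in the protocol.
base_bath = split_by_ratio(700, {"NH4OH (conc.)": 1, "H2O2 (30%)": 1, "H2O": 5})
acid_bath = split_by_ratio(700, {"HCl (conc.)": 1, "H2O2 (30%)": 1, "H2O": 5})

# Step (2): the protocol gives no batch volume; 500 mL is an assumed example.
SILANE_BATCH_ML = 500
mts_ml = percent_vv(SILANE_BATCH_ML, 2.0)   # 2% (v/v) MTS
tea_ml = percent_vv(SILANE_BATCH_ML, 0.2)   # 0.2% (v/v) triethylamine
chloroform_ml = SILANE_BATCH_ML - mts_ml - tea_ml

print(base_bath)
print(acid_bath)
print(mts_ml, tea_ml, chloroform_ml)
```

Under these assumptions each 700 mL bath contains 100 mL of the concentrated reagent, 100 mL of 30% hydrogen peroxide, and 500 mL of water.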
6.6.6.3 Spotting Activated Slides With Oligonucleotides and Other Fiducial DNA Probe Markers
One method to attach oligonucleotide probes covalently is to have each probe synthesized to contain an
amine linker on its 3′ end. Fiducial DNA markers, which serve as points of reference
when reading the DNA arrays, are each an amine-terminated 10-mer carrying a fluorescent Cy3
or Cy5 label on its 5′ end (5Cy3//iSp18/TTT TTT TTT T/3AmMC7/ and 5Cy5//iSp18/TTT TTT TTT
T/3AmMC7/). Such oligonucleotide probes are available from most commercial vendors of custom DNA
syntheses.
Spotting of the DNA probes is achieved by means of any of a variety of commercial microarray
instruments. In one example, each locus or address in the 2 x 5 array consisted of a set of four 100-µm
spots, created by depositing 10 to 50 nL aliquots of amine-modified oligonucleotides (1.5 mM in 0.2 M
K2HPO4/KH2PO4, pH 8.3) at room temperature onto the preactivated surfaces. As illustrated in Figure
2, ten positions were used for each array, and each slide had a series of eight arrays. This was achieved
by programming the movement of the spotting head containing a series of spotting pins. The protocol for
the spotting procedure will vary between the different manufacturers of the instrument as well as for the
“geography” of the spots on the array. However, the general experience with such instruments points to
the need to establish the spotting conditions empirically and to document the precise operating
procedure.
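The example geography described above can be enumerated programmatically when a spotting run is being planned or documented. The sketch below is illustrative only: it uses the figures given in the example (eight arrays per slide, a 2 x 5 grid of positions, four replicate spots per position, 10 to 50 nL at 1.5 mM), omits the fiducial spots, and uses hypothetical names that do not correspond to any spotter's software.

```python
from itertools import product

ARRAYS_PER_SLIDE = 8      # eight arrays per slide (from the example)
ROWS, COLS = 2, 5         # 2 x 5 positions per array
SPOTS_PER_POSITION = 4    # four 100-um replicate spots per position


def slide_layout():
    """List every probe spot on one slide (fiducial spots not included)."""
    return [
        {"array": a, "row": r, "col": c, "replicate": s}
        for a, r, c, s in product(
            range(ARRAYS_PER_SLIDE), range(ROWS), range(COLS), range(SPOTS_PER_POSITION)
        )
    ]


def pmol_deposited(volume_nl, conc_mm):
    """Amount deposited per spot: nL x mmol/L = 1e-9 L x 1e-3 mol/L = pmol."""
    return volume_nl * conc_mm


layout = slide_layout()
print(len(layout), "probe spots per slide")
print(pmol_deposited(10, 1.5), "to", pmol_deposited(50, 1.5), "pmol per spot")
```

A listing of this kind can be kept with the run record so that each spot address can be traced back to the probe that was intended for it.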
Following spotting, uncoupled DNA probe is removed by soaking the surface with a solution of 300 mM
bicine, pH 8.0, 300 mM NaCl, and 0.1% SDS, for 30 minutes at 65 °C, followed by rinsing with water
and drying in the air. The resulting arrays can then be stored in a desiccator at room temperature until
their use in the hybridization step of the procedure.
6.6.6.4 Hybridization of DNA Target Products to DNA Arrays
Typically, each position of the array can hybridize a volume of approximately 20 µL. DNA products
generated by means of PCR, LDR, or other techniques can be added to 20 µL of a 2x hybridization
buffer, resulting in a solution with a final concentration of 300 mM MES, pH 6.0, 10 mM MgCl2, and
0.1% SDS. The efficiency of the target hybridization to the probes on the array will depend on the
requirements of the specific assay, but can generally be achieved in incubation of less than one hour. In
the example of hybridizing short target DNA products (up to 80 bp in length), optimal conditions can be
achieved when arrays are preincubated for 15 minutes at 25 °C in 1x hybridization buffer alone, followed
by the hybridization with DNA product for one hour. To isolate individual positions on the array, a
variety of self-adhesive 8-, 10-, or 12-well perfusion chambers are applied to the array, and then
filled with the diluted DNA products. The arrays are then placed in a rotating hybridization oven or in a
heat-controlled humid chamber. Commercial devices are available that provide close control of the
hybridization temperature along with constant agitation. Following hybridization, the arrays may be
washed with a solution of 30 mM bicine, pH 8.0, 10 mM MgCl2, and 0.1% SDS for ten minutes at room
temperature.
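Because the DNA product is mixed with an equal volume of 2x hybridization buffer, the stock must be prepared at twice the stated final concentrations. The following sketch performs that calculation; it assumes that the DNA product contributes none of the buffer components, which should be checked for the amplification chemistry in use, and the function name is hypothetical.

```python
def stock_concentrations(final, buffer_ul, sample_ul):
    """Stock concentrations needed so that mixing buffer and sample gives `final`.

    Assumes the sample itself contributes none of the listed components.
    """
    if buffer_ul <= 0 or sample_ul < 0:
        raise ValueError("volumes must be positive")
    dilution = (buffer_ul + sample_ul) / buffer_ul
    return {name: value * dilution for name, value in final.items()}


# Final (1x) composition stated in the text.
FINAL_1X = {"MES (mM)": 300.0, "MgCl2 (mM)": 10.0, "SDS (%)": 0.1}

# 20 uL of DNA product added to 20 uL of 2x buffer.
stock_2x = stock_concentrations(FINAL_1X, buffer_ul=20.0, sample_ul=20.0)
print(stock_2x)
```

With equal volumes the dilution factor is 2, so the stock works out to 600 mM MES, 20 mM MgCl2, and 0.2% SDS.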
6.6.7 Data Analysis and Readout Systems
A variety of microarray readout systems are now available (see Figure 3). The adaptation of the
laboratory-developed microarray to commercial instrumentation for fluorescence detection is
described elsewhere (see Section 6.6.4).
6.6.8 Quality Control Measures and Tests for Reliability
Quality management of the production and performance of laboratory-developed microarray assays
represents the most significant challenge to the clinical laboratory. In general, two aspects of the quality
of the microarray devices should be monitored routinely when considering offering clinical results from such
devices. First, the quality of the probe attachment and the reliability and uniformity of the spotting
procedure are paramount. One approach to the monitoring of spot quality is to provide with each
microarray slide a series of spots containing fluorescently labeled probes across the slide that will
represent a sampling of the spotting process across each of the geographically localized microarrays.
Visual inspection of the printed microarrays by dissecting microscopy (e.g., 20x) provides a means of
determining the presence or absence of a microarray element at each location, and is highly recommended
as a quality control procedure.
Positional quality spots are sometimes referred to as “fiducial markings,” because they can also serve to
orient the slide microarray regarding left/right and top/bottom alignment. A related approach is to spot
each position on the microarray with an admixture of assay-specific probes, along with a small percentage
of fluorescently labeled probe of an irrelevant DNA sequence. Prior to hybridization with target DNA,
the microarray can be scanned for the intensity of the fluorochrome-labeled irrelevant probes to note the
uniformity, distribution, and quality of the fluorescent signal (Figure 4).
The second aspect of quality management of the slide microarrays is the monitoring of background
fluorescence, either due to nonspecific signal, or due to cross-contamination with fluorochrome-labeled
amplicons. In this case, a policy in the laboratory should be put forth that dictates when and how often
new slide microarrays should be scanned for ambiguous fluorescent noise within the printed spots or at
positions between the microarrays or for evidence of amplicon contamination. In general, the routine use
of negative and reagent controls should suffice to detect fluorescent noise.
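
A prehybridization scan of the fluorescently labeled quality spots can be summarized numerically. The sketch below computes the coefficient of variation of the spot intensities and flags spots that appear to be missing. The acceptance limits (`max_cv`, `min_signal`) are placeholders that each laboratory would need to establish for its own scanner and chemistry; they are not values taken from this guideline.

```python
from statistics import mean, stdev


def spot_uniformity(intensities, max_cv=0.2, min_signal=500.0):
    """Summarize fiducial-spot intensities from a prehybridization scan.

    max_cv and min_signal are assumed, laboratory-defined acceptance limits.
    """
    m = mean(intensities)
    cv = stdev(intensities) / m if m > 0 else float("inf")
    missing = [i for i, x in enumerate(intensities) if x < min_signal]
    return {"mean": m, "cv": cv, "missing": missing,
            "pass": cv <= max_cv and not missing}


good_slide = spot_uniformity([1000, 1050, 980, 1020])
bad_slide = spot_uniformity([1000, 1050, 100, 1020])
print(good_slide)
print(bad_slide)
```

The same summary can be applied to regions between the microarrays to track background fluorescence over successive slide lots.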
(a) Noncovalent immobilization: Ionic and/or hydrophobic interactions occur between complementary
functionalities introduced onto modified surfaces. Multiple fixation sites are possible; their location
is ambiguous. This model may be applicable to amine or polylysine surfaces that have not been UV
irradiated.
(b) Covalent cross-linking immobilization: Covalent bonds have been established between probe and
surface functional groups. As with (a), multiple fixation sites of ambiguous location are possible.
This model may be applicable to amine or polylysine surfaces that have been UV irradiated.
(c) Covalent end-tethering immobilization: The probe is covalently bound to the surface through a
functional group at the end of the probe. This model may be applicable to aldehyde (fixed by
reductive amination), activated amine, activated polylysine, and activated gel surfaces.
Figure 1. Probe Immobilization
Figure 2. Schematic of the Geography of a Glass-Slide-Based Microarray. Each microarray is
configured with ten independent probe positions, as well as corner-oriented fiducial spots. This combination of
spots represents the locus for the hybridization of a single sample of target DNA (sample). The slide, in turn,
is configured with eight independent loci, potentially serving the same number of patient tests.
[Figure panel labels: Low Magnification View of 8 Microarrays Prepared on Slides; Cy3; Cy5]
Figure 3. Photomicrograph of Fluorescent Spots. The photomicrograph illustrates the fluorescent spots
following the hybridization of a series of target DNA samples to each of the loci. Two fluorochromes are used,
illustrating that each position on the microarray can be designed to carry probes to two alleles of a particular gene
product.
Figure 4. Higher Power Photomicrograph of a Single Microarray. The photomicrograph demonstrates
the hybridization of target DNA to several probe positions. Additionally, the top and right-side rows of fluorescent
spots highlight the fiducial markings for the single microarray.
7 Genetic Data Analysis

Many genetic analyses use microarrays of oligonucleotide probes, and both hybridization and
enzymatic reactions (e.g., single base extension reactions) have been used to generate the basic data
elements. For human genetic analyses, the most informative type of sequence variation is the SNP.
SNPs known to be causative for disease or responsible for response to drug therapies can now be
readily assessed in a massively parallel fashion using high-density oligonucleotide microarrays.
Similarly, such microarrays can be tiled to detect known or predicted somatic mutations within gene
regions of interest. Applying the same principles in infectious disease allows detection and gene-based
typing of bacterial and viral species.
7.1 Data Elements
With few exceptions, the data collected in genetic analyses are fluorescent intensities associated with
specific microarray features (localized unique oligonucleotides of a single sequence). The simplest data
element can be described as the comparison of any two features that differ by a single nucleotide and that, as a
result, report on the presence of a biallelic variant within a sample. This approach builds on
well-established principles of allele-specific hybridization118 that have been employed for nearly two decades
in procedures such as colony hybridization, Southern blots, and reverse dot blots. Like reverse dot blots
or line arrays, microarrays designed to detect genetic variations carry probes immobilized to a solid
surface, while the target is generally uniquely amplified by PCR, labeled, and hybridized to the
immobilized probe.
One approach for interrogating any given polymorphic nucleotide position of interest makes use of paired
sets of four oligonucleotide probes (typically 18- to 25-mers) in which all probes in a set are identical except
for the single nucleotide position under interrogation, which is usually positioned centrally within the
oligonucleotide sequence (Figure 5). Additional redundancy can be added (as in the Block design shown
in Figure 5) by also querying positions flanking the polymorphic site. When two chromosomes are
interrogated simultaneously for allelic variation, as occurs when PCR or some other amplification method
is used to isolate a human genomic region of interest, either of two oligonucleotides will be perfectly
matched in a normal or homozygous mutant individual, or both in a heterozygous individual. It is
recommended that both strands be analyzed to add confidence to the determination, especially for regions
capable of forming secondary structures, or where less discriminating G-T mismatches arise on one of the
two strands.
Figure 5. SNP Array Design. (A) Design for querying a locus. Target sequences (lowercase) for both A and B
alleles are identical except for the polymorphic base (uppercase). Five positions at or near the polymorphic locus,
indicated by -4, -1, 0, +1, and +4 are queried. (Solid line) Probe sequences on the SNP array that are
complementary to the targets; (squares) set of four probes (each probe 20 bases in length), referred to as a tiling,
identical except for the simple base that is either A, C, G, or T; (closed squares) perfect match (PM) probes for the
target sample; (open squares) mismatch (MM) probes for the target sample. (B) Block design for genotyping of two
alleles. The A-allele (A) and B-allele (B) probes are arranged adjacent to each other at each position (-4, -1, 0, +1,
and +4). The A- and B-allele tiles at position -1, -4, +1, or +4 define a miniblock, whereas for the polymorphic base
(position 0) the single tile defines a miniblock. One strand of a marker is represented by these five miniblocks,
defining a block. (From Mei R, Galipeau PC, Prass C, et al. Genome-wide detection of allelic imbalance using
human SNPs and high-density DNA arrays. Genome Res. 2000;10:1126-1137. Copyright by Cold Spring Harbor
Laboratory Press.)
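
The comparison of two allele-specific features, with confirmation on the second strand, can be sketched as follows. This is a simplified illustration and not the algorithm of any commercial genotyping software: the relative allele signal is taken as A/(A+B) from background-corrected perfect-match intensities, and the cutoffs of 0.25 and 0.75 are hypothetical values that would have to be validated for a given assay.

```python
def relative_allele_signal(a, b):
    """Fraction of the combined signal contributed by the A-allele probe."""
    total = a + b
    if total <= 0:
        raise ValueError("no signal at either allele probe")
    return a / total


def call_genotype(a, b, low=0.25, high=0.75):
    """Call AA, AB, or BB from one strand; low/high are assumed cutoffs."""
    ras = relative_allele_signal(a, b)
    if ras >= high:
        return "AA"
    if ras <= low:
        return "BB"
    return "AB"


def call_both_strands(sense, antisense):
    """Report a genotype only when sense and antisense strands agree."""
    calls = {call_genotype(*sense), call_genotype(*antisense)}
    return calls.pop() if len(calls) == 1 else "no call"


print(call_both_strands((900, 100), (850, 120)))   # both strands favor allele A
print(call_both_strands((500, 480), (510, 450)))   # both strands show both alleles
print(call_both_strands((900, 100), (400, 420)))   # strands disagree
```

Requiring agreement between strands trades a lower call rate for greater confidence, which is the rationale given above for analyzing both strands.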
For resequencing arrays, a similar approach is employed using four oligonucleotide probes per site that
differ by the substitution of all four possible bases (A, C, G, T) near the middle of the oligonucleotide
(Figure 6). It is also relatively straightforward to tile for single base deletions by adding a fifth probe in
which the interrogated base position is deleted, or to detect single base insertions by adding four
oligonucleotides in which the interrogated base position has one of four possible bases inserted. By tiling
probe sets sequentially throughout the region of the gene to be sequenced, the sequence can then be read
using the probe(s) with highest intensity (perfect match) within each probe set. Robustness can be
improved where needed by adding additional probe set redundancy using different probe lengths or
varying the substitution position of the interrogated site. It is recommended that oligonucleotide
resequencing arrays be designed to query both sense and antisense strands, as is done in conventional
diagnostic DNA sequencing applications.
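The tiling and base-calling logic described above can be illustrated with a short sketch. For readability the probes are written in the same orientation as the reference sequence, whereas probes on an actual array are complementary to the labeled target. The 21-base probe length, the `min_ratio` cutoff, and the function names are assumptions chosen for the example.

```python
BASES = "ACGT"


def substitution_probes(reference, center, length=21):
    """Four probes identical except for the base at the central position."""
    half = length // 2
    start = center - half
    if start < 0 or center + half >= len(reference):
        raise ValueError("probe window falls outside the reference")
    window = reference[start:center + half + 1]
    return {b: window[:half] + b + window[half + 1:] for b in BASES}


def call_base(intensities, min_ratio=1.5):
    """Call the base with the highest intensity, or N if it is not clearly highest."""
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if second > 0 and top / second < min_ratio:
        return "N"
    return best


reference = "ACGTACGTACGTACGTACGTACGTA"
probes = substitution_probes(reference, center=12)
for base, probe in probes.items():
    print(base, probe)

print(call_base({"A": 1200, "C": 150, "G": 90, "T": 200}))
print(call_base({"A": 500, "C": 450, "G": 90, "T": 80}))
```

Repeating the call at successive positions, and on the opposite strand, gives the sequence read described in the text; an ambiguous position is reported as N rather than forced to a base.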
Figure 6. Example of p53 Probe Array Design (Image courtesy of Affymetrix, Inc.)
7.2 Heritable Changes
Oligonucleotide microarrays using hybridization-based discrimination can be designed to detect many
forms of heritable genetic variation, including single nucleotide polymorphisms, single or multiple base
insertions and deletions, gene conversion events, and repeats. The capacity of these arrays depends
primarily on whether the array queries multiple polymorphisms or mutations in a single gene (e.g., CFTR,
CYP2D6),78,82 or at multiple genetic loci throughout the genome, as is done with commercially available
SNP arrays (Lindblad-Toh protocol 9 in Bowtell and Sambrook, 2003).104 Polymorphism analysis can be
used to detect coding changes associated with a biologically relevant phenotype (e.g., drug metabolism
enzyme polymorphisms), for linkage analysis if performed on a genome-wide scale or in a dense
chromosomal region, or for assessment of loss of heterozygosity (LOH) in cancer.
An alternative to genetic variation discrimination by patterns of hybridization alone makes use of generic
oligonucleotide “tag” arrays coupled with enzymatic polymerization discrimination for the sequence
variation of interest113 (Protocols 6 and 7, Bowtell and Sambrook).104 Here the immobilized tag probes
serve to physically isolate signals from within multiplexed extension or ligation reactions such that they
can be measured fluorescently. For example, single base extension (SBE) primers are designed with 3' ends immediately adjacent
to a variable nucleotide position, and the variant(s) are detected by primer extension reactions
incorporating one or more detectable nucleoside triphosphates.178 SBE-tag array assays build on this
principle by incorporating unique sequence tags into the 5' end of each SBE primer. This allows
genotyping for many SNP loci by parallel or multiplexed SBE reactions with primers that are localized to
a single, unique tag site on the array during hybridization. As hybridization controls, and to enable
background and cross-hybridization subtraction, each tag probe (PM, perfect match) is paired with a
second probe that is identical in sequence except for a single base difference at the central position (MM,
mismatch).113 Therefore, literally thousands of SBE reactions can be run simultaneously and the data
deconvoluted by physical isolation to unique feature sites on the microarray. Critical factors to consider
in the design of SBE primers and probes are discussed extensively by Lindblad-Toh.104 An alternative to
SBE that can also be considered builds on the tag array concept but uses allele-specific ligation reactions
to detect variation.179
The most expansive genetic use of oligonucleotide hybridization arrays determines DNA sequence in a
contiguous linear region of a gene, a process known as “sequencing by hybridization” (SBH) or “variant
analysis.”180,181 Sequencing arrays have been designed to determine de novo sequences using all possible
sequences of a particular oligonucleotide length, for example, all 65 536 (4^8) 8-mers. Hybridization-positive
probe signals are scored, and a computer is used to assemble and align overlapping probe sequences to
determine the target sequence. Potential difficulties arise with this approach in gene areas where repeats
are encountered, so SBH is not recommended for routine diagnostic applications.
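The assembly step, and the way repeats defeat it, can be shown with a deliberately simple sketch. It assumes an idealized, error-free set of hybridization-positive probes and extends the sequence only when exactly one overlapping probe is available; it is not a production assembler, and the function names are illustrative.

```python
def spectrum(sequence, k):
    """Set of all k-mers in a sequence (the idealized hybridization-positive probes)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence from its k-mer set by following unique overlaps."""
    kmers = set(kmers)
    suffixes = {m[1:] for m in kmers}
    starts = [m for m in kmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting probe (repeat or circular overlap)")
    current = starts[0]
    sequence = current
    remaining = kmers - {current}
    while remaining:
        candidates = [m for m in remaining if m[:-1] == current[1:]]
        if len(candidates) != 1:
            raise ValueError("ambiguous or missing overlap")
        current = candidates[0]
        sequence += current[-1]
        remaining.remove(current)
    return sequence


print(4 ** 8, "possible 8-mers")
print(reconstruct(spectrum("ATGCGTAC", 3)))

try:
    reconstruct(spectrum("ATGATGC", 3))   # ATG occurs twice in the target
except ValueError as err:
    print("repeat-containing target:", err)
```

In the second example the repeated ATG is represented by a single probe, so the probe set no longer determines the target uniquely.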
The completion of the human genome sequence project affords the opportunity to resequence previously
characterized genes, an application of particular utility in assessing somatic mutations in oncogenes
and/or tumor suppressor genes in particular cancers. Oligonucleotide arrays for diagnostic applications in
cancer have been designed to resequence the p53 tumor suppressor gene, portions of the BRCA1 and
ATM cancer genes, and to detect mutations in mitochondrial DNA (mtDNA).84-86 Single base
substitutions are generally discriminated well by hybridization alone; however, frameshift mutations are more
challenging to detect robustly. This is because frameshift detection requires a minimum of five probes to
anticipate all possible single base insertions and the single base deletion at a particular nucleotide position.
Moreover, because hybridization conditions cannot be optimized for every probe set, considerable cross-
hybridization between perfectly matched and mismatched probe sets occurs. Therefore, it is
recommended that stringent hybridization and wash procedures (along with selective redundant tiling
approaches) be employed for optimal detection of frameshift and base substitution mutations in
resequencing array assays.
Multipatient microarrays provide an interesting alternative to oligonucleotide arrays for detecting
heritable changes in gene sequences.182,183 In the multipatient approach, the assay is turned “upside down”
relative to a traditional oligonucleotide microarray in that the patient DNA is attached to the microarray
substrate and probed with fluorescent oligonucleotides in solution to detect single nucleotide differences.
At the current printing densities, multipatient microarrays allow >100 000 patients to be screened in a
single hybridization, providing extremely cost-effective genotyping information. By separating different
loci for each patient, this format allows screening of multiple patients for multiple diseases in a single
test.
7.3 Methylation Analysis
The hypermethylation of gene promoter regions has been recognized as an important biological pathway
for repression of gene transcription in cancer and other human diseases. Methylation of cytosine within
CpG islands to form 5-methylcytosine is an epigenetic alteration that does not lead to a change in
nucleotide sequence. CpG island methylation is associated with histone deacetylation and transcriptional
silencing and is essential for normal embryonic development, genomic imprinting, and X-chromosome
inactivation. While hypermethylation of CpG islands has been shown to lead to suppression of gene
expression, hypomethylation can lead to higher levels of gene expression. Aberrant DNA methylation of
CpG islands in tumor suppressor genes has been implicated in tumorigenesis, and aberrant methylation of
imprinted genes is associated with several inherited human diseases. Methylation-specific oligonucleotide
microarrays are an emerging application of microarray technology for the detection of these epigenetic
alterations.184,185
7.3.1 Bisulfite Treatment
The basis for detection of methylated cytosine residues is the deamination of unmethylated cytosine to
uracil via bisulfite treatment, while 5-methylcytosine is unreactive and remains intact during this
chemical treatment. PCR amplification of bisulfite treated genomic DNA therefore results in pools of
products that may contain cytosine or thymine at methylation sites, the relative abundance of which can
be detected with allele-specific oligonucleotide probes. Unlike SNPs, where homozygous and
heterozygous states are found, methylation can range continuously from 0 to 100%. Therefore, the
proportion of methylation is reflected in the relative fluorescent signal intensity observed at
oligonucleotide probe features differing by a single nucleotide. Like SNP detection, detection can be
based on hybridization discrimination alone or through the use of SBE-tag array approaches.
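The relationship between relative signal intensity and the proportion of methylation can be written as a simple ratio. The sketch below assumes that the probe reporting cytosine (methylated) and the probe reporting thymine (unmethylated, converted) hybridize with equal efficiency; in practice that assumption should be verified against standards of known methylation. The function name and the optional background term are illustrative.

```python
def methylation_fraction(c_signal, t_signal, background=0.0):
    """Estimate the methylated fraction at one CpG site from paired probe signals.

    c_signal: intensity at the probe matching retained cytosine (methylated).
    t_signal: intensity at the probe matching thymine (unmethylated, converted).
    Assumes equal hybridization efficiency for the two probes.
    """
    c = max(c_signal - background, 0.0)
    t = max(t_signal - background, 0.0)
    if c + t == 0:
        raise ValueError("no signal above background at either probe")
    return c / (c + t)


print(methylation_fraction(750, 250))
print(methylation_fraction(1100, 350, background=100))
```

Because the result is continuous between 0 and 1, it is reported as a proportion and not forced into the discrete genotype classes used for SNPs.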
7.4 Pathogen Profiling
This section identifies potential pitfalls of using microarrays to detect, identify, and
quantitate pathogenic organisms. It is assumed that appropriate PCR amplification and primer design
have already been taken into account, though general references to these issues are made and representative
references are cited. Most of the guidelines that correspond to human gene detection using microarrays are
identical for pathogenic organisms. Specific handling guidelines for pathogens should be followed as
recommended by CDC186 and Clinical and Laboratory Standards Institute. In fact, much of the specific
information relating to the hurdles in the detection of pathogenic organisms has to do with accurate
quantitative PCR amplification187-189 and efficient multiplex PCR methods,114 though these topics are
outside of the scope of this document. Additionally, pathogen detection and identification requires
specific considerations for prokaryotes (e.g., bacteria) and viruses, which will be examined in more detail
below. (See the most current editions of CLSI/NCCLS documents MM3—Molecular Diagnostic Methods
for Infectious Diseases and MM14—Proficiency Testing (External Quality Assessment) for Molecular
Methods.)
7.4.1 Prokaryotic Nucleic Acid Detection
In prokaryotic species, the DNA itself is easier to work with than in eukaryotes, since there are
no intervening sequences (introns) and few DNA modifications, such as methylation, with which to
contend. Isolation of the nucleic acids can be more difficult, depending upon whether the cells are
vegetative or the nucleic acids are encapsulated in spores. One must find a balance between methods to
efficiently disrupt the cell or spore envelopes and minimization of nucleic acid damage/fragmentation.
One should therefore consider the biology of the organism with regard to the extent to which the DNA is
associated with the cell or spore envelope to evaluate the method that is chosen.
Quantification of mRNA poses a difficult problem, since prokaryotic mRNA lacks the extended
polyadenylated 3' ends found in eukaryotic mRNA, so alternative methods must be employed to
facilitate the efficient synthesis of cDNA by reverse transcriptase. The methods
most often used to circumvent the lack of an extended polyadenylated tail are
random primers or “universal bases.” The literature reports that “universal bases” are not always truly
universal: they often show a preference for associating with particular bases, they are not well
characterized for complementary binding to RNA, and they were shown to have reduced polymerase efficiency
when incorporated in primers.190 These factors all contribute to an uneven amplification of mRNA, thus
potentially severely affecting the accuracy of mRNA quantitation. Therefore, one should evaluate whether
appropriate amplification is being obtained at these steps for the sequences of interest.
7.4.2 Virus Nucleic Acid Detection
The nucleic acids contained in viral particles will be DNA, RNA, or a mixture of both. It is very
important that detection of viruses be sensitive, since very little nucleic acid material is present in
comparison to host nucleic acids. Additionally, viruses have relatively high mutation rates,191 so in
choosing microarray probe sequences, one should assess the probabilities of viral mutations in the areas
of the probes and have enough coverage of each gene, mRNA, or cDNA to identify the products of
interest despite mutations. There are a number of good reviews about quantitative viral PCR methods,
which could help provide additional guidance for particular platforms and chemistries.192,193 Overall, the
greatest consideration in the detection and identification of viruses by microarrays will be in the prudent
selection of probe sequences for both amplification and detection.
7.5 Detection of Gene Dosage Abnormalities Using Comparative Genomic Hybridization
7.5.1 Overview
Many human health problems are the result of abnormalities in the number of copies of segments of
genomic DNA. For example, developmental abnormalities such as Down syndrome result from the gain
or loss of whole chromosomes or chromosomal regions prior to or shortly after fertilization, while
alterations in gene dosage acquired in somatic cells are frequently found in malignancies. The magnitude
of the copy number changes ranges from single-copy increases or decreases relative to the normal two
copies per cell, to deletion of all copies, to high-level amplifications.
Comparative genomic hybridization (CGH, see Figure 7) was developed to measure alterations in dosage
of DNA sequences throughout the entire genome in a single experiment.194 As originally developed,
CGH employs the comparative hybridization of differentially labeled genomic DNA from two (or more)
cell populations to (usually normal) metaphase chromosomes. The DNAs are usually labeled with
fluorochromes. The ratio of the hybridization intensities along the chromosomes then gives a measure of
the relative copy number of sequences in the genomes that hybridize to each location on the
chromosomes. One of these genomic DNAs is usually from a normal genome (termed the “reference”) so
that ratios directly map copy number variations in the “test” genomes relative to the cytogenetic map
provided by the chromosomes. It is important to note that CGH by itself detects only copy number
variation, not absolute copy number. As a result, a perfectly tetraploid cell population will give constant
ratios across the genome and be indistinguishable from the result obtained with a diploid test cell
population.
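The point about ploidy can be illustrated with a short sketch. It assumes an idealized measurement in which signal is strictly proportional to copy number and each profile is scaled to its genome-wide median ratio; the function name and the toy copy number lists are hypothetical and are not taken from this guideline.

```python
from statistics import median

def normalized_ratios(test_copies, reference_copies):
    """Idealized CGH ratios: test/reference per locus, scaled so the median ratio is 1."""
    raw = [t / r for t, r in zip(test_copies, reference_copies)]
    mid = median(raw)
    return [x / mid for x in raw]

reference = [2, 2, 2, 2, 2]          # normal diploid reference
diploid_test = [2, 2, 2, 2, 2]       # balanced diploid genome
tetraploid_test = [4, 4, 4, 4, 4]    # perfectly tetraploid genome
gain_test = [2, 2, 3, 2, 2]          # single-copy gain at one locus

print(normalized_ratios(diploid_test, reference))
print(normalized_ratios(tetraploid_test, reference))
print(normalized_ratios(gain_test, reference))
```

Under these assumptions the diploid and tetraploid profiles are identical after normalization, while the single-copy gain stands out as a ratio above the median.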
[Figure 7 image. Labels: Test Genomic DNA; Reference Genomic DNA; Cot-1 DNA. Axes: Ratio vs. Position on Chromosome; Ratio vs. Position on Sequence.]
Figure 7. Comparative Genomic Hybridization. Left — Total genomic DNAs are isolated from a “test” and
a “reference” cell population, labeled with different fluorochromes, and hybridized to normal metaphase
chromosomes. Cot-1 DNA is used to suppress hybridization of repetitive sequences. The resulting ratio of the
fluorescence intensities of the two fluorochromes at a location on a chromosome is approximately proportional to
the ratio of the copy numbers of the corresponding DNA sequences in the test and reference genomes. A similar
hybridization to an array of mapped clones permits measurement of copy number with resolution determined by the
length of the clones and/or their map spacing. (Figure courtesy of Daniel Pinkel, PhD, University of California, San
Francisco)
Microarray formats for array CGH have been developed over the last several years (Solinas-Toldo, et al).195
Array CGH can provide improved quantitative accuracy, resolution, and dynamic range compared to
chromosome CGH, and the measurements can be referenced directly to positions on the genome
sequence. The use of a comparative hybridization strategy stabilizes the measurements against many
sources of variation. For example, the density of DNA available for hybridization in an array element
affects signal intensity, but since both test and reference signals are likely to be similarly affected, the
ratio is independent of this common variable in array production. Single-color approaches to DNA copy
number measurement are also being developed, but these require more control over the array
manufacturing process than for ratio techniques.
The performance level required from an array CGH platform depends on the type of aberrations that are
to be detected, the characteristics of the specimens, and the application of the results. For example, in a
research project one may be able to tolerate considerable noise since one may only be interested in getting
an overview of aberrations in a population of specimens, while in clinical applications one needs more
precision because of the necessity of getting the correct result for each patient. The difficulty of obtaining
the correct result depends strongly on the types of aberrations. It is much easier to detect the large
increases in copy number due to amplifications than single copy gains and losses. The difficulty increases
further as the number of array elements affected by an aberration decreases, since averaging over multiple
array elements becomes less effective at increasing statistical certainty. Finally, specimen heterogeneity
(for example, the presence of normal cells within tumor specimens) and tissue fixation add further
challenges.
Arrays for CGH have been made from the DNA of mapped BAC and other large insert genomic
clones,195-198 cDNAs,199 and oligonucleotides.200 Arrays made from large insert clones provide the best
performance when total genomic DNA from the specimens is being analyzed. Some platforms using
oligo arrays employ PCR techniques to amplify the genomic DNA and reduce its complexity in order to
increase signal intensities. All array CGH technologies are in continuing development, so the ultimate
performance that can be obtained from them is not yet established.
7.5.2 Measurement Considerations
DNA copy number aberrations typically occur through the gain or loss of chromosomal segments.
Therefore, in a homogeneous cell population the actual DNA copy number profile of the genome consists
of a series of “plateaus” of constant copy number, bounded by sharp transitions, as shown in Figure 8a.
The noise level of the measurement is evident from the ratio variation within a plateau. This is quite distinct
from expression analysis, where one has no a priori expectation on relative expression levels of genes and
no method to assess measurement quality for all analyses.
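As a rough illustration of reading the noise level from a plateau, the following sketch computes the mean and standard deviation of log2 ratios within each plateau. It assumes the plateau boundaries are already known; the function name, index ranges, and ratio values are invented for illustration.

```python
from statistics import mean, stdev

def plateau_noise(log2_ratios, plateaus):
    """Return (mean, standard deviation) of the log2 ratios within each plateau.

    plateaus is a list of (start, end) index pairs, end exclusive.
    The boundaries are assumed to be known in advance.
    """
    return [(mean(log2_ratios[s:e]), stdev(log2_ratios[s:e])) for s, e in plateaus]

# Toy profile: a loss plateau followed by a gain plateau (values are invented).
profile = [-0.52, -0.48, -0.50, -0.49, -0.51, 0.41, 0.39, 0.40, 0.42, 0.38]
for level, noise in plateau_noise(profile, [(0, 5), (5, 10)]):
    print(f"plateau mean {level:+.2f}, noise (SD) {noise:.3f}")
```

A similar standard deviation on every plateau would be consistent with all array elements responding in the same way to copy number changes.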
[Figure 8 image: three panels (a, b, c), each plotting log2ratio against position in kb.]
Figure 8. Copy Number Profile Alterations Mapped by Array CGH. Ratios on each arrayed BAC are
plotted relative to the position in the draft human sequence on a chromosome from the tip of the p arm to the tip of
the q arm. Note that the plot shows log2(ratio). Data are normalized so that the median copy number for the
genome is at 0. Each data point includes error bars showing the standard deviation of triplicate spots for each clone
on the array. In most cases, these are smaller than the size of the symbol. (a) Loss on the p arm and gain on the q
arm; (b) High-level amplification. Copy number of the peak is approximately $2^{6} = 64$ times average for the genome.
A deletion distal to the peak, ratio $2^{-1} = 0.5$, is also visible; (c) Homozygous deletion. The ratio is $2^{-3} \approx 0.12$
(Snijders, et al).197 (From Snijders AM, Nowak N, Seagraves R, et al. Assembly of microarrays for genome-wide
measurement of DNA copy number by CGH. Nature Genetics. 2001;29:263-264. Reprinted with permission.)
The view of genomic aberrations provided by array CGH depends on the measurement precision and the
density of the array elements along the measured portion of the genome. In general, arrays made using
large insert genomic clones such as BACs provide more precise measurements than arrays made using
oligonucleotides. With BAC arrays, the boundaries of single copy changes can be located to a fraction of
the length of a BAC, if the density of coverage is high enough so that the copy number transitions occur
within a BAC.201 With smaller array elements such as oligonucleotides, it may be necessary to average
over several array elements, with a concomitant loss in resolution, in order to get a sufficiently reliable
measurement.
Array CGH measurements must be performed and interpreted in light of some basic characteristics of the
genome sequence structure. Three are especially important:
High Copy Repetitive Sequences. A large portion of the genomic DNA is made up of a few families of
highly repeated sequences. Some of these are located in the vicinity of centromeres and telomeres, and
others are interspersed throughout the genome. The interspersed sequences have the most effect on array
CGH, since they may be contained in the DNA sequences used for the array elements (for example,
bacterial artificial chromosomes [BACs]). Therefore, the hybridization signal will include contributions from unique
sequences in the genome, which allow measurement of copy number at that location, and from repetitive
sequences that are located throughout the genome. In order to measure local copy number, the unique
sequence hybridization must dominate that of the repetitive sequences. Adequate suppression of the
repetitive sequence hybridization can be achieved by use of unlabeled Cot-1 DNA. An important
contribution to problems with array CGH is variability in the quality of Cot-1 preparations. Use of
oligonucleotide arrays for CGH provides the potential for designing array elements that do not contain
repeats.
Low Copy Shared Sequences. The human genome contains many stretches of sequence that occur more
than once in the haploid genome and are thus not strictly “unique.” These sequences include portions of
genes in gene families, and long (tens to hundreds of kilobases) sections of the genome. Locations of
many “duplicons” are indicated on the UCSC genome browser (http://genome.ucsc.edu/). Hybridization
of these sequences to the array is not blocked by Cot-1 DNA, so that the signal on an array element
containing such a sequence reflects the entire genomic abundance of that sequence. An array element
containing such a sequence will not be as responsive to a local copy number change as an element that
does not contain one. Furthermore, an array element that contains a shared sequence may show an
abnormal ratio in the absence of a true copy number change, if the location of the other copy(ies) of the
duplicon is involved in an aberration.
DNA Copy Number Polymorphisms. Polymorphisms in copy number of some DNA sequences are detectable by
array CGH (see Figure 9). Therefore, interpretation of a candidate abnormality must be performed in light
of the currently incompletely known genomic locations and population distribution of the polymorphic
sequences. In some cases, it may be necessary to determine if an apparent abnormality in an individual is
a polymorphism by studying other family members.
Figure 9. DNA Copy Number Polymorphisms. Comparison of two individuals to the same reference shows
one clone that shows different ratios in the two analyses. This clone on chromosome 6 contains a gene that has a
polymorphism in the number of repeats of a 5 kb sequence. (From Albertson DG, Pinkel D. Genomic microarrays
in human genetic disease and cancer. Human Molecular Genetics. 2003;12(2):R145-R152. Reprinted with
permission from the authors and Oxford University Press.)
Given the considerations above, and other aspects of array measurements that are now discussed, the
quantitative relationship between the measured ratios and DNA copy number variation can be complex.
Ideally, copy number ratio, $R_i$, on array element $i$ is given by:
$$R_i = K \, \frac{T_{ui}}{R_{ui}} \qquad (1)$$
where $T_{ui}$ and $R_{ui}$ are the signal intensities due to specific hybridization of the unique sequences in the
test and reference genomes respectively, and are proportional to DNA copy number if the entire
hybridization and measurement process behaves linearly. K is a proportionality constant that includes
such factors as the characteristics of the labeling reaction, the fluorescence properties of the labels, the
amounts of the two DNAs included in the hybridization, and the differential sensitivity of the imaging system,
etc. Characteristics of the array that are difficult to control during manufacture (such as variation in the
density of sequences available for hybridization among the array elements) cancel in the ratio. Ratios on
different array elements can be directly compared within an analysis, and a simple overall normalization
allows comparison among different hybridizations.
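The cancellation of array-element characteristics in the ratio can be sketched as follows. This is an idealized model, not a description of any particular platform: the function name, the `density` parameter, and the label factors `k_test` and `k_ref` (whose quotient plays the role of K) are assumptions made for illustration.

```python
def spot_signals(density, test_copies, ref_copies, k_test=1.3, k_ref=1.0):
    """Idealized test and reference signals on one array element.

    density is the amount of DNA available for hybridization on the spot;
    k_test and k_ref stand in for label- and imaging-specific factors.
    """
    return density * k_test * test_copies, density * k_ref * ref_copies

for density in (0.5, 1.0, 2.0):           # spot-to-spot manufacturing variation
    t, r = spot_signals(density, test_copies=3, ref_copies=2)
    print(density, t, r, t / r)            # intensities change, the ratio does not
```

Both intensities scale with the spot density, so the ratio depends only on the copy numbers and on the constant K, which a single overall normalization can remove.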
In practice, the signal intensities on the array elements contain contributions from sources other than
specific, unique sequence hybridization. The additional signal comes from the incompletely suppressed,
repetitive sequences discussed above, nonspecific binding, autofluorescence of the array elements, errors
in background corrections, etc. It is convenient to reference the magnitude of these additional signals to
the intensities $t_{ui}$ and $r_{ui}$ that would be found from the unique sequence hybridization with test and
reference genomes with two copies of all of the unique sequences. Therefore, the total test signal is given
by $T_{ui} + \varepsilon_i t_{ui}$, and the total reference signal is $R_{ui} + \delta_i r_{ui}$. Note that if a normal genome is used for the
test and/or reference in the hybridization, then $T_{ui} = t_{ui}$ and/or $R_{ui} = r_{ui}$, with appropriate modification for
the sex chromosomes. This nomenclature is convenient because ε and δ have the simple interpretation of
indicating magnitude of the “undesirable” signal on an array spot relative to the desired signal. Moreover,
many of the factors that affect the specific, unique sequence signal have equal impact on the additional
signals, so that ε and δ can be relatively stable over the array. Given this nomenclature, the measured
ratio, $\rho_i$, on array element $i$ can be written as:202
$$\rho_i = \frac{T_{ui} + \varepsilon_i t_{ui}}{R_{ui} + \delta_i r_{ui}} \qquad (2)$$

A major effect of nonzero values of ε and δ is to modify the relationship between the measured ratio and
the true copy number ratio. Figure 10a shows a linear plot of the relationship of the measured ratio
(Equation (2)) to true copy number as the copy number of the tested sequence changes from 0 to twice the median number of
copies per cell. The relationship is shown for several values of ε from 0 to 1.0. The plot assumes that ε
does not depend on the copy number variation at that locus, since it is primarily affected by the total
repeat sequence content of the test genome, autofluorescence of the array element, etc. Note that as ε
increases the slope of the relationship decreases, so that it will be harder to detect a copy number change
of a given magnitude. Figure 10b shows the dependence of the log2 of the measured ratio, since data are
frequently presented in this manner.
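The flattening of the response can be reproduced numerically with a minimal sketch of Equation (2). It assumes a normal reference genome, equal unique-sequence intensities for the two-copy test and reference signals, and, unless stated otherwise, ε equal to δ; these simplifications and the function name are assumptions for illustration only.

```python
import math

def measured_ratio(rel_copy, eps, delta=None):
    """Equation (2) for a test locus at rel_copy times the median copy number.

    Assumes a normal reference and equal two-copy intensities for test and
    reference, so the ratio reduces to (rel_copy + eps) / (1 + delta).
    delta defaults to eps.
    """
    if delta is None:
        delta = eps
    return (rel_copy + eps) / (1.0 + delta)

for eps in (0.0, 0.25, 1.0):
    slope = measured_ratio(2.0, eps) - measured_ratio(1.0, eps)
    loss = measured_ratio(0.5, eps)      # single-copy loss in a diploid genome
    print(f"eps={eps:.2f}  slope={slope:.2f}  "
          f"ratio at single-copy loss={loss:.3f}  log2={math.log2(loss):+.2f}")
```

As ε grows, the slope falls and the ratio for a single-copy loss moves closer to 1, which is why such changes become harder to detect.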
If ε’s and δ’s vary among array elements, each locus in an analysis may be operating on a different response
slope, and variation in ratio among different array elements with the same copy number will increase.
Therefore, it is important to reduce the magnitude and variation in these quantities as much as possible.
Note that in Figure 8a the ratio variation on each of the copy number plateaus is about the same,
indicating that all of these array elements are operating on very similar response curves. The curves of
Figure 10 also show the effect of an admixture of normal cells in a specimen, which can be viewed as a
contribution to ε.
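The effect of normal cell admixture can be sketched in the same simplified framework. The sketch assumes that normal cells have a relative copy number of 1 at the locus, that every cell contributes the same amount of DNA, and that ε equals δ; the function and parameter names are hypothetical.

```python
def ratio_with_normal_cells(rel_copy, tumor_fraction, eps=0.0):
    """Expected ratio when only tumor_fraction of the cells carry the aberration.

    Assumes normal cells have relative copy number 1, eps equals delta,
    and an otherwise ideal measurement.
    """
    effective_copy = tumor_fraction * rel_copy + (1.0 - tumor_fraction) * 1.0
    return (effective_copy + eps) / (1.0 + eps)

# Single-copy loss (relative copy number 0.5) at decreasing tumor content.
for fraction in (1.0, 0.7, 0.4):
    print(fraction, round(ratio_with_normal_cells(0.5, fraction), 3))
```

Under these assumptions, lowering the tumor fraction pulls the ratio toward 1 in the same way that a larger ε does.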
Figure 10 presents the simplest behavior of real array CGH measurements. In practice there can be
variations in the hybridization environment across the area of the array, which differentially affect the
behavior of the two genomic DNAs due to the chemical differences in their labels. The DNA can have
label-specific incorporation efficiencies that depend on sequence motifs, the imaging system may have
nonuniform sensitivity over the array and be nonlinear in its intensity response, background levels may
vary considerably from location to location, etc. These factors can substantially affect the relationship
between measured ratio and relative copy number in a manner that varies over the array. Variations of
this type are common in measurements of mRNA expression and have led to the adoption of spatial- and
intensity-dependent normalization procedures. If these computational procedures do not properly model
the defects in the measurement process, then the resulting modified data will be inaccurate. As discussed
above, the measurement precision required for many array CGH applications is much higher than for
most current work with gene expression, so that ad hoc analysis procedures that are commonly used in
expression analysis may not be appropriate for measurement of DNA copy number. Therefore, it is
suggested that computational correction of the data be limited to addressing those features for which there
is a verified model of the process that is being corrected, or for which independent evidence for validity
and stability of the correction can be obtained.
Figure 10. Relationship of Measured Ratio to DNA Copy Number. DNA copy number is measured
relative to the median for the genome. Therefore, a relative copy number of 1 indicates two copies in a diploid cell,
four copies in a tetraploid cell, etc. a) A linear plot of Equation (2) showing the dependence of the measured ratio for a
cell population as the DNA copy number of a locus increases from 0 (total deletion) to twice the median. It is
assumed that the nonspecific binding and hybridization of unsuppressed, repetitive sequences is the same for all of
these cases, because these effects are dominated by sequences throughout the test and reference genomes. The solid
heavy line shows the ideal situation in which the signal is due solely to hybridization of unique sequences (ε = 0),
and for cases where 10, 20, 30, 40, and 50% of the total signal at relative copy number = 1 is due to nonunique
sequences (corresponding to ε = 0.11, 0.25, 0.43, 0.67, and 1.0). These plots assume that other aspects of the
measurement, such as the imaging system, behave ideally. Note that the effect of having a heterogeneous
population (for example, a mixture of tumor and normal cells) can be included conceptually as a contribution to ε; b)
The same data plotted on a Log2(ratio) scale. (Figure courtesy of Daniel Pinkel, PhD, University of California, San
Francisco)
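The ε values quoted in the caption can be checked with a short derivation. The symbol $f$, denoting the fraction of the total signal at relative copy number 1 that is due to nonunique sequences, is introduced here for illustration and does not appear in the original text.

```latex
% f: fraction of the total signal at relative copy number 1 that is nonunique (assumed symbol)
f = \frac{\varepsilon\, t_{ui}}{t_{ui} + \varepsilon\, t_{ui}} = \frac{\varepsilon}{1 + \varepsilon}
\quad\Longrightarrow\quad
\varepsilon = \frac{f}{1 - f}
% f = 0.10, 0.20, 0.30, 0.40, 0.50 give
% \varepsilon \approx 0.11, 0.25, 0.43, 0.67, 1.0
```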
7.5.3 Validation of Array Performance
Hybridizations with cell populations that contain well-characterized chromosomal abnormalities offer one
effective route to experimentally test array performance. These allow one to assess the noise level of the
measurements; to determine differences in the behavior of different array elements; and to determine the
relationship of measured ratio, ρi, to true copy number.197 Figure 8 shows array CGH analysis of several
cell lines with different types of aberrations. Panel a shows deletion of a portion of 8p and gain of distal
8q in a polyploid cell line. Note that the data clearly show plateaus in copy number as one would expect
for aberrations affecting chromosomal segments. The noise level, the variation in ratio among the clones,
on all of the plateaus is approximately the same on this log2 plot, which is consistent with all clones
responding with the same slope to copy number changes (i.e., the measurement process is linear and the
δ’s and ε’s are very similar for all clones). Figure 8c shows a homozygous deletion of a small region of
the genome. The log2(ratio) of approximately -3 indicates that the values of δ and ε in this analysis are on the order
of 0.18 (approximately 85% of the signal on an array element is due to specific unique sequence
hybridization).
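The figures quoted for the homozygous deletion can be cross-checked with a few lines of arithmetic. The sketch takes ε = δ = 0.18 from the text and assumes equal two-copy intensities for test and reference, so that Equation (2) at a locus with no test copies reduces to ε / (1 + δ); the variable names are illustrative.

```python
import math

eps = delta = 0.18                        # value quoted in the text, eps equal to delta assumed
ratio_at_deletion = eps / (1.0 + delta)   # Equation (2) with no unique test signal
specific_fraction = 1.0 / (1.0 + eps)     # share of two-copy signal from unique sequence

print(f"ratio={ratio_at_deletion:.3f}  "
      f"log2={math.log2(ratio_at_deletion):.2f}  "
      f"specific={specific_fraction:.0%}")
```

With these inputs the log2 ratio comes out a little above -3 and the specific fraction is close to 85%, in line with the approximate values given above.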
The Coriell Institute (http://ccr.coriell.org/ccr/) has a range of human cell cultures with established single
copy gains and losses affecting many regions of the genome. Hybridization to an array with a battery of
these cell lines allows testing of the responses of all of the clones that are present in the aberrations. Such
tests are also very important as part of the procedures to validate that the identity of the clones has been
properly maintained during array manufacture, to detect possible errors in the genome sequence assembly
on which the mapping is based, and to determine if a clone may contain “duplicons” as discussed above,
so that its response is affected by sequences at multiple locations in the genome.197
7.5.4 Conclusion
DNA copy number measurements require high array performance to have maximum clinical utility,
because single copy changes in sequence content need to be detected with high reliability, sometimes in
heterogeneous cell populations such as tumors with intermixed normal cells. A number of approaches to
meet these requirements have been proposed and are under continuing development. Importantly, there
are easily recognized characteristics of the measurement results that indicate immediately the
measurement quality for each specimen. In addition, test specimens with defined aberrations are easily
available. As a result, performance of an array CGH platform can be directly established against
objective standards. Clinical application of array CGH awaits clarification of the regulatory requirements
for this technology, or the availability of commercial, FDA-approved array platforms.
8 Gene Expression Data Analysis
8.1 Overview
Gene expression changes result from transcription of DNA into RNA, and translation of the latter into
proteins. In many respects, RNA is an excellent analyte to use for monitoring changes in gene expression,
because its steady-state levels provide a good estimate of protein abundance. The sequence of bases
comprising a gene is the most specific marker for that gene. Moreover, highly sensitive tests exist for
detection of any nucleic acid sequence through various amplification technologies (PCR, LCR, NASBA,
etc.). The genes expressed by a tissue are one measure of the tissue’s “phenotype.”
Human disease is a process that results in changes in gene expression within affected tissues. These
changes in gene expression result in grossly detectable phenotypic changes that have served as the basis
for characterizing and stratifying patients with the disease for hundreds of years. A classic example is the
presence of an inflammatory process, which is characterized by redness, swelling, increased temperature
and pain. Such gross phenotypic changes reflect complex underlying processes involving migration by
“inflammatory cells” into the affected areas and changes in gene expression within the latter cells and
within cells indigenous to the affected region of the body.
Traditionally, classification of disease processes has utilized grossly detectable phenotypic changes. The
difficulties with this approach become apparent from considering the many etiologies that can produce the
signs of inflammation (i.e., redness, swelling, increased temperature, and pain). Such gross phenotypic
changes are many steps removed from the specific etiologies that drive the mechanisms leading to
them, which can diminish their diagnostic sensitivity and specificity for
identifying a disease process. As a result, current efforts in medical science have been directed to
identifying molecular markers that both have robust methods for measurement (i.e., high analytical
sensitivity and specificity) and are directly involved in the mechanism of the disease, which gives the marker
a high diagnostic sensitivity and specificity. A classic example of the latter is detection of
chimeric mRNAs resulting from translocated genes such as bcr-abl in the t(9;22) translocation found in
CML.
A limitation of molecular assays to date has been that while human cells from a given tissue
express on the order of 10 000 different genes, these assays detect only a very small fraction of these
potential phenotypic targets. Recently, through the use of microarrays, it has become possible to
simultaneously assay and quantitate the presence of 10 000 to 30 000 different genes. This has
enormously increased the possibility to stratify patients based not only on the presence or absence or level
of a given target sequence but on the presence of differences in the distribution of genes within diseased
tissue. However, the signals for these 10 000 to 30 000 different genes are determined simultaneously
and must be analyzed as a group, which raises a multiplicity of technical and data management problems.
While a complete understanding of all of the problems related to data analysis of microarrays is not
available, this section identifies some of the problems that are currently recognized, as well as some of the
possible solutions.
8.2 Data Elements
The data elements that are presented for evaluation in expression microarrays, at this juncture, arise from
two general schemas. These are 1) premade, high-density oligonucleotide microarrays that have been
commercialized, and 2) custom microarrays. Premade microarrays are created through synthesis of
oligonucleotides directly onto a glass surface using photolithographic processes similar to those used in
semiconductor manufacturing, or by conventional chemical synthesis approaches. Premade chips are
made by the commercial vendor and provided to the customer. Custom microarrays are created by the end
user, typically by transferring the specific probe (e.g., a cloned fragment of cDNA or long
oligonucleotide) onto the surface of a glass slide using a robotic spotting apparatus. Custom microarrays
can be created using commercial printing robots provided by a number of commercial vendors. Custom
microarraying equipment can also be “home-built” using information provided on public websites (e.g.,
http://cmgm.stanford.edu/pbrown/mguide/index.html).
The target sequences to be hybridized to either the premade or custom microarrays are labeled with a
fluorescent dye through processes described elsewhere in this guideline and then annealed to the probes
on the microarray. After washing to remove unbound target, the microarrays are scanned with a high-
resolution fluorescence scanner. The basic data element for all high density microarrays is a measure of
signal intensity, termed a “spot intensity” or “probe intensity,” which is considered a proxy for the
concentration of that sequence in the sample of RNA that is being examined. Spot or probe intensities
form the primary data elements that are subsequently evaluated in microarray studies.
It is important to be aware of a basic difference in the significance of spot or probe intensities as utilized
in different commercial, premade microarrays or the custom microarrays. For example, some
oligonucleotide probes are 25 mers, while long oligonucleotides and cDNAs (50 nucleotides to several
thousand nucleotides in length) are used in many other microarrays. This difference in size has obvious
implications for the analytical specificity of the probe. For example, in a microarray using short
oligonucleotide probes (i.e., 25 mers), the RNA from a given gene can be assessed through the use of a
panel of multiple probes (i.e., spots), whereas with longer probe sequences the RNA expressed by a given
gene may be assessed by the signal arising from evaluation of a single spot. Moreover, some microarrays
using short probes are designed to provide an internal correction for nonspecific hybridization. Such a
design includes an oligonucleotide probe that is perfectly matched to a portion of the target RNA
sequence and in an adjacent location on the array, another probe with a one base pair mismatch at a
specified location in the oligonucleotide probe. As discussed below, use of this approach to correct for
nonspecific hybridization has not been found to be entirely helpful in this regard.
8.2.1 Confounding Covariate Parameters
As noted in Section 8.1, what is ultimately of interest in the clinical use of expression microarrays is the
measurement of differences between disease and nondisease states and between different disease states.
Observed differences, however, are recognized as arising from two sources: interesting variation in signal
intensity and obscuring variation in signal intensity. Interesting variation refers to those differences
arising from biological differences between the states being measured. Obscuring or nuisance variation
can arise from a host of factors, some of which are listed in Table 3. In addition to obscuring variation,
differences in results can also be attributed to the selected statistical methods applied, such as intensity
extraction and data analysis methods.
Table 3. Sources of Obscuring Variation in Microarray Measurements

1. Sample handling (degree of physical manipulation, time from extirpation to freezing)
2. Microarray manufacture [printed microarrays] (missed spots, mix-ups, degraded probes)
3. Sample processing (extraction procedure, RNA integrity & purity, RNA labeling)
4. Processing differences (hybridization chambers, washing modules, scanners)
5. Personnel differences
6. Random differences in signal intensity in a data set that covary with the biological process
Because of the multiplicity of potential confounding covariate parameters, the assessment of gene
expression directly from the spot or probe signal intensities is complex. In essence, it is necessary to
create a model, which takes into account the various sources of variation, and to extract from the model
the “interesting” variation component. Fundamental to being able to evaluate the models is identification
and recording of the data that capture the measurement of confounding covariates. A standard that has
been put forth as a minimal collection of such parameters is Minimum Information About a Microarray
Experiment (MIAME), which is promoted by the Microarray Gene Expression Data Society (MGED;
http://www.mged.org/).
8.3 Low-Level Analysis
Extracting spot or probe intensity signals and correcting them for the sources of variation
listed in items 2 to 4 of Table 3 are part of low-level analysis. The first step in low-level
analysis should be to visually inspect the arrays for various quality problems, such as scratches, bubbles,
background haze, and/or edge effects.203 Thereafter, image analysis (Section 8.3.1), normalization
(Section 8.3.2), and expression summaries (Section 8.3.3) should follow.
8.3.1 Extraction of Spot or Probe Intensity Signals (Image Analysis)
Image analysis is required to translate the raw data from a microarray experiment (i.e., the set of images)
into a numerical form. General issues of image analysis include addressing, the process of identifying the
location or coordinates of the probes; segmentation, the process of classifying the pixels as either signal
or background; and intensity extraction, the process of calculating the intensity measures for each probe.
8.3.1.1 Custom Spotted Microarrays
The model usually employed for measuring gene expression with custom spotted microarrays is to
simultaneously hybridize to the microarrays two samples of labeled RNA, one of which is a “reference”
RNA and the other a “test” RNA. The “reference” RNA will be the same for all microarrays,
while the “test” RNA used on a microarray will generally vary, being different for each sample or
condition being tested. The “reference” and “test” RNAs are each labeled with a different fluorescent
dye (frequently Cy3 and Cy5) and are distinguishable by the instrument scanning the custom spotted
microarray. Therefore, the raw data for custom spotted microarrays consist of two images, one for each
dye.
Probes are typically spotted in the microarrays (slides or substrates) as circles; however, the spots may
vary somewhat in shape and size due to variations in the printing process. Although the layout of the
cDNA microarray is known and can be used for addressing, the known model must be matched to the
scanned image. Therefore, most software packages include both automatic and manual procedures for
addressing. Once the address of the spots has been identified, the pixels must be classified as signal vs.
background, or segmentation. Methods of segmentation include fixed circle, adaptive circle, and adaptive
shape segmentation. When using fixed circle segmentation, all probes in the image are assumed to have
the same constant diameter. To estimate a diameter separately for each spot, one would employ adaptive
circle segmentation. Adaptive shape segmentation allows for noncircularity of the spots. After identifying
the location, size, and shape of each probe, the signal and background are calculated as some function
(e.g., mean, median) of the pixel values within each segmented area.204
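
The fixed circle approach described above can be illustrated with a minimal sketch. It assumes the spot center is already known from addressing, treats every pixel within a fixed radius as signal, and treats the remaining pixels in a square box around the spot as local background. The function name, the box-shaped background region, and the toy image are illustrative assumptions, not part of any particular software package.

```python
from statistics import median


def fixed_circle_intensity(image, center_row, center_col, radius, box_half_width):
    """Return (signal, background) medians for one spot.

    image is a list of rows of pixel values. Pixels within `radius` of the
    spot center are classified as signal; the other pixels inside the square
    box around the spot are classified as local background (an assumption
    made for this sketch).
    """
    signal_pixels, background_pixels = [], []
    for r in range(center_row - box_half_width, center_row + box_half_width + 1):
        for c in range(center_col - box_half_width, center_col + box_half_width + 1):
            if not (0 <= r < len(image) and 0 <= c < len(image[0])):
                continue  # skip pixels that fall off the image
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (signal_pixels if inside else background_pixels).append(image[r][c])
    return median(signal_pixels), median(background_pixels)


# Toy 7 x 7 image: background of 10 with a bright circular spot of 100.
image = [[10] * 7 for _ in range(7)]
for r in range(7):
    for c in range(7):
        if (r - 3) ** 2 + (c - 3) ** 2 <= 4:
            image[r][c] = 100

signal, background = fixed_circle_intensity(image, 3, 3, radius=2, box_half_width=3)
corrected = signal - background  # background-subtracted spot intensity
```

Adaptive circle segmentation would estimate `radius` separately for each spot instead of passing one constant value for the whole image.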
Tools: ScanAlyze (available at http://rana.lbl.gov/EisenSoftware.htm)

R/Spot package – Bioconductor: http://www.bioconductor.org/
8.3.1.2 Commercially Available Microarrays
Most commercially available microarrays are accompanied by equipment and software which is used to
scan the microarray and save it as an image file. For example, the scanned image may be interpreted by
the superpositioning of a grid for feature cell addressing. Given the standard nature of manufactured
arrays, often such software performs cell addressing automatically but permits the user to manually adjust
the grid if it is not properly aligned. After identifying the location of each feature cell (addressing), the
pixels comprising each feature cell are used in the calculation of feature intensity. For example, the
software may take the median or 75th percentile of the pixel level intensities within each cell to represent
intensity for the given feature. The software also may estimate background using some function of the
pixel or feature level intensities such as the lowest 2% of the feature intensities. For manufactured
microarrays where several features interrogate the same transcript, often a method to calculate signal for
the transcript is employed after addressing and intensity extraction.
8.3.2 Background and Normalization
Background — Background represents a value of the measured signal intensity that is presumed to be due
to nonspecific binding of target to the probe and which must be removed from the signal intensity
measurement in order to accurately quantitate the amount of target RNA present in the sample.
Normalization — The purpose of normalization is to remove experimental artifacts of no direct interest;

that is, the removal of systematic effects other than differential expression such as minor differences in
labeling efficiency, scanning intensity, and so forth that produce small overall intensity differences
between multiple chips or between two channels on the same chip. Normalization procedures often
include background subtraction, detection of outliers, and removal of variation due to differences in
sample preparation, microarray differences, and scanning differences.
8.3.2.1 Custom Microarrays
The need for normalization has been clearly demonstrated in the cDNA microarray setting by examining
expression ratios from self-self hybridizations. Probe cDNA is spotted in circles using a series of pins;
however, there can be minor variations in the amount of deposited probe DNA among spots on the same
microarray (between-pin variability), variability in the amount of probe DNA printed between spots on
different microarrays (within-pin variability), or differences in the size and shape of the deposited spots.
High-quality commercial pins have virtually eliminated microarray variability owing to mechanical
differences between pins, offering significantly better quality than the home-made tools used in the early
stages of the technology. Additionally, two samples — namely, an experimental and reference sample —
are labeled and hybridized to one chip. Differences in the labeling process, either due to unequal
incorporation of the dye labels, unequal amounts of experimental and reference sample hybridized, or
differences in the labeling efficiencies can also be a source of signal variability.
For a single slide or substrate, creating an M vs. A (MVA) plot is helpful to achieve normalization. The
MVA plot consists of plotting, for each spot, the log ratio M (difference) against the average log
intensity A (average):
\[ M = \log_2\left(\frac{\text{Red signal}}{\text{Green signal}}\right) = \log_2(\text{Red signal}) - \log_2(\text{Green signal}) \]
\[ A = \frac{\log_2(\text{Red signal}) + \log_2(\text{Green signal})}{2} \]
For global normalization, a constant function equal to the mean or median of the log ratios M may be
subtracted from all spots. Alternatively, one may normalize by adjusting for important experimental
artifacts such as location and pin origin. This is usually accomplished using robust, locally weighted
regression to model the intensity log-ratios M on the predictor variables. Another alternative is to apply
robust, locally weighted regression of intensity log-ratios M on the average log-intensity A overall (global
loess). Rather than apply this global loess normalization method, within-pin-group loess is preferred,
whereby robust, locally weighted regression of intensity log-ratios M on the average log-intensity A
within pin groups is fit. Recently, model-based methods have been used, whereby the intensities from the
two separate color channels have been included as independent variables along with dye, microarray, and
other experimental conditions.205-207 The residuals from the fitted model have been used as the measure of
spot intensity (measurand).
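
As a minimal sketch of the simplest case above, the following computes M and A for each spot and applies global normalization by subtracting the median log ratio. The intensities are made-up values chosen as powers of two so the logarithms are exact; the function names are illustrative. Loess-based normalization (global or within pin group) would replace the constant shift with a fitted curve of M on A and is not shown here.

```python
from math import log2
from statistics import median


def ma_values(red, green):
    """Return per-spot (M, A) lists from background-corrected intensities."""
    m = [log2(r) - log2(g) for r, g in zip(red, green)]
    a = [(log2(r) + log2(g)) / 2 for r, g in zip(red, green)]
    return m, a


def global_median_normalize(m):
    """Subtract the median log ratio from every spot (global normalization)."""
    shift = median(m)
    return [value - shift for value in m]


# Hypothetical intensities: the red channel is uniformly twice as bright
# (a dye effect), and only the fourth spot is truly differentially expressed.
red = [512.0, 2048.0, 1024.0, 8192.0, 256.0]
green = [256.0, 1024.0, 512.0, 1024.0, 128.0]

m, a = ma_values(red, green)
m_normalized = global_median_normalize(m)
```

After normalization, the four unchanged spots have a log ratio of zero and the fourth spot retains a log ratio of 2, i.e., a fourfold difference.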
8.3.2.2 Premade Microarrays
The impact of normalization procedures on expression data may be visually examined using boxplots of
the signal intensities. For premade microarrays to which only one sample is hybridized, MVA plots are
useful when comparing two microarrays. Under these circumstances, MVA plots plot the log ratio of
the signal of one microarray against that of the second microarray, i.e.,
\[ \log_2\left(\frac{\text{signal}_1}{\text{signal}_2}\right), \]
vs. the average log intensities,
\[ \frac{\log_2(\text{signal}_1) + \log_2(\text{signal}_2)}{2}. \]
Ideally, these MVA plots should show a cloud of points about zero for technical replicates.208
8.3.3 Calculation of Expression Levels
Expression levels — The amount of target sequence present in a sample of RNA is generally assumed to
be linearly related to the amplitude of the signal intensity obtained from Section 8.3.1 above after
correction for background and normalization. For custom spotted microarrays, the target sequence is a
direct measure of the gene transcript, since the probe is generally a cDNA sequence. For short
oligonucleotide probes used in microarrays for internal correction of nonspecific hybridization, a more
complicated process must be performed to combine the signal intensities of the 11 to 20 probe pairs (i.e.,
perfect match and mismatch) into a single value that measures the expression of the RNA transcript of
interest. Microarrays with longer oligonucleotides offer sufficient specificity to allow a single
oligonucleotide to be used per gene.169,209
8.3.3.1 Custom Spotted Microarrays
Gene-level summaries from cDNA microarrays are most frequently reported as the relative signal of the
experimental sample to reference sample, calculated as the ratio of the red and green intensities for each
spot. Most often, the log2 transformation is applied; therefore the measurand is
\[ \log_2\left(\frac{\text{Red signal}}{\text{Green signal}}\right) \]
for each spot. Recently a model-based measurement procedure has been developed whereby the intensities
from the two separate color channels have been used.205-207
8.3.3.2 Premade Microarrays
Gene expression is obtained when the intensity values for the panel of multiple probes which interrogate
the same transcript are combined to provide a measure of gene expression. Commercial software
calculates a measure of gene expression using the perfectly matched probe and designed mismatched
probe data. Other methods of calculating gene expression for specific data are also available.210-213 Users
should follow the instructions when using premade microarrays from commercial vendors.
8.4 Gene Filtering and Identification of Differentially Expressed Genes
Gene filtering is performed on expression data that have already undergone low-level analysis in order to
remove as many sources of obscuring variation as possible.214 The filtering step may exclude spots or
probe sets with too many “bad” values on microarrays, exclude spots or probe sets that do not vary
significantly, or may exclude spots or probe sets with a fold-change less than an arbitrary specified
threshold. This is an important step prior to high-level data analysis, because most algorithms used in
such analyses do not perform well with thousands of variables.
Following low-level analysis, the resulting background adjusted normalized data must be examined to
identify which genes are most likely to yield insight into the biological process being evaluated. There
are two general approaches for identifying such genes. The first is to compute some measure of
variability, such as the coefficient of variation (standard deviation/mean) for each gene across all of the
microarrays and retain those that show the highest variability. The second is by gene to compute an index
for samples belonging to each of the phenotypic classes, then assess the likelihood that the indexes of the
different groups/phenotypes are significantly different. A test that may be applied when it is desirable to
exclude spots or probe sets that do not vary significantly is to calculate the variance for each spot 1,…,k;
then exclude those spots i where
\[ (n - 1)\, s_i^2 < \chi^2(1 - \alpha,\, n - 1) \times \operatorname{median}(s_1^2, \ldots, s_k^2). \]
In this equation, n represents the number of arrays, α is the desired significance level, χ²(1 − α, n − 1) is
the quantile associated with the chi-square distribution at the 1 − α level with n − 1 degrees of freedom,
and median(s_1², …, s_k²) is the median of the k variance estimates.
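
The variance filter above can be sketched as follows. To keep the example free of external libraries, the chi-square quantile is passed in by the caller (taken from a table or a statistics package) rather than computed; the value 9.488 used below is the 0.95 quantile with 4 degrees of freedom. The function name and the toy data are illustrative assumptions.

```python
from statistics import median, variance


def low_variance_spots(expression, chi2_quantile):
    """Return indices of spots that the variance filter would exclude.

    expression: one list of n values per spot (n = number of arrays).
    chi2_quantile: chi-square quantile at 1 - alpha with n - 1 degrees of
    freedom, supplied by the caller.
    """
    n = len(expression[0])
    variances = [variance(values) for values in expression]  # sample variance s_i^2
    threshold = chi2_quantile * median(variances)
    return [i for i, s2 in enumerate(variances) if (n - 1) * s2 < threshold]


# Three hypothetical spots measured on n = 5 arrays.
expression = [
    [1, 1, 1, 1, 1],      # no variation
    [1, 2, 3, 4, 5],      # moderate variation (the median variance)
    [0, 10, 0, 10, 0],    # large variation
]
excluded = low_variance_spots(expression, chi2_quantile=9.488)
```

Here the first two spots are excluded and only the highly variable third spot is retained for high-level analysis.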
Tools:
Significance Analysis of Microarrays: http://www-stat.stanford.edu/~tibs/SAM/index.html
R/Bioconductor: http://www.bioconductor.org/
NCI – BRB Array Tools: http://linus.nci.nih.gov/BRB-ArrayTools.html
Whitehead – GeneCluster2: http://www-genome.wi.mit.edu/cancer/software/software.html
8.5 High-Level Data Analysis – Unsupervised Learning Algorithms
Unsupervised learning or clustering involves the aggregation of samples into groups based on similarity
of their respective expression patterns. The aim is to identify structure in a complex data set without
making any a priori assumptions. However, because many different relationships are possible in a
complex data set, the structure uncovered by clustering may not reflect critical or biological distinctions
of interest. There are various clustering algorithms used for class discovery, including hierarchical, k-
means, and k-medoids. Inherent to all clustering procedures is that each method seeks to place objects
(genes) into clusters such that objects (genes) within the same cluster are more similar than objects
(genes) in different clusters. A detailed description of various unsupervised learning methods is provided
in Hastie, et al.215
Clustering methods require that a distance measure be specified. There are multiple methods that may be
used to measure the distance or similarity between genes from two samples. Distance between two
samples may be defined as the Euclidean distance. Given two arrays (samples), the expression levels can
be expressed in a matrix as
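\[
\begin{array}{cc}
\text{Microarray 1} & \text{Microarray 2} \\
g_{11} & g_{12} \\
g_{21} & g_{22} \\
\vdots & \vdots \\
g_{G1} & g_{G2}
\end{array}
\]
where there are G rows of gene expression levels and two columns, one for each microarray. The
Euclidean distance would be calculated as the square root of the sum of the G squared differences between
the genes for samples 1 and 2, symbolically,
\[ \sqrt{\sum_{i=1}^{G} (g_{i1} - g_{i2})^2}. \]
Other distance measures include the angle between the vectors that are formed between the data point of
the array and the center of the coordinate system:
\[ \cos \alpha_{1,2} = \frac{\sum_{i=1}^{G} g_{i1}\, g_{i2}}{\sqrt{\sum_{i=1}^{G} g_{i1}^2}\ \sqrt{\sum_{i=1}^{G} g_{i2}^2}} \]
and Pearson’s correlation
\[ \rho_{1,2} = \frac{\sum_{i=1}^{G} (g_{i1} - \bar{g}_{\cdot 1})(g_{i2} - \bar{g}_{\cdot 2})}{\sqrt{\sum_{i=1}^{G} (g_{i1} - \bar{g}_{\cdot 1})^2}\ \sqrt{\sum_{i=1}^{G} (g_{i2} - \bar{g}_{\cdot 2})^2}} \]
where \bar{g}_{\cdot 1} and \bar{g}_{\cdot 2} are the mean expression levels on microarrays 1 and 2.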
Note that calculating distance as the vector angle given a dataset that has already been normalized to
the means will yield Pearson’s correlation; therefore, Pearson’s correlation may “over-normalize” the data
if one is working with normalized data. Additionally, because the Euclidean distance will tend to cluster
genes with similar profiles but different absolute expression levels into different clusters, the vector angle
and Pearson’s correlation are considered to represent more biologically plausible distances to use when
clustering.
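
The relationship among these three measures can be checked with a short sketch. It computes the Euclidean distance, the cosine of the vector angle, and Pearson’s correlation for two hypothetical profiles that have the same shape but different absolute levels, and it implements Pearson’s correlation literally as the vector angle of the mean-centered data. The profile values are made up for illustration.

```python
from math import sqrt


def euclidean(x, y):
    """Square root of the sum of squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine(x, y):
    """Cosine of the angle between two vectors drawn from the origin."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson's correlation, computed as the cosine of the centered vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Two hypothetical profiles with the same shape, offset by 10 units.
profile_1 = [1.0, 2.0, 3.0, 4.0]
profile_2 = [11.0, 12.0, 13.0, 14.0]

d_euclid = euclidean(profile_1, profile_2)   # large: levels differ
r_cosine = cosine(profile_1, profile_2)      # below 1: offset changes the angle
r_pearson = pearson(profile_1, profile_2)    # approximately 1: same shape
```

The Euclidean distance is 20 even though the profiles are parallel, whereas Pearson’s correlation is essentially 1, which is why correlation-type measures tend to group genes by profile shape rather than by absolute level.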
Agglomerative hierarchical clustering has been widely used in microarray data analyses.216,217 At the first
step, all genes are essentially clusters of size 1. Once the distance between all pairs of clusters has been
calculated, hierarchical clustering groups the two clusters that are “most similar,” that is, the two closest
genes are combined to form the first cluster. Thereafter, the distance between all pairs of clusters is
recalculated, and the clustering algorithm continues to merge the two closest clusters until all genes are
assigned. Once a cluster includes more than one gene, there is some question regarding how one should
calculate the distance between clusters. One can calculate the distance using the nearest neighbor (single-
linkage), the center of the genes within the cluster (average linkage), or the maximum distance between
all members of the cluster (complete linkage). Genes are grouped by continuing this procedure until all
genes are merged into one large cluster. Results are typically displayed in a dendrogram with a red and
green scaled matrix.
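
The merging procedure can be sketched in a few lines for small data sets. This is a naive illustration under stated assumptions, not an efficient or definitive implementation: it recomputes all pairwise distances at every step, and "average" linkage is taken here to be the mean of all pairwise distances between members of the two clusters, which is the usual definition and differs slightly from a centroid-based calculation. The function and variable names are illustrative.

```python
from math import dist


def agglomerate(points, linkage="average"):
    """Naive agglomerative clustering; returns the list of merges.

    Each merge is (members_of_cluster_a, members_of_cluster_b, distance),
    where members are indices into `points`.
    """
    combine = {
        "single": min,                          # nearest neighbor
        "complete": max,                        # farthest pair
        "average": lambda d: sum(d) / len(d),   # mean pairwise distance
    }[linkage]
    clusters = [[i] for i in range(len(points))]  # every item starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairwise = [dist(points[p], points[q])
                            for p in clusters[i] for q in clusters[j]]
                d = combine(pairwise)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Four hypothetical one-dimensional profiles forming two obvious pairs.
points = [[0.0], [1.0], [10.0], [11.0]]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
average = agglomerate(points, "average")
```

All three linkages merge the same pairs first; they differ only in the height at which the final two clusters join (9, 11, and 10, respectively), which is the height a dendrogram would display.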
K-means clustering215 requires that K (the number of clusters) be prespecified. Once K has been
determined, each gene is randomly assigned to be a member of one of the K clusters. Then the method
iteratively proceeds as follows:
Step 1. The center (centroid) of each cluster is calculated.

Step 2. For each gene, the distance between the gene and the K centroids is calculated.
Step 3. The gene is assigned to the cluster that minimizes this distance.
Steps 1 to 3 are repeated until no assignments change. K-means uses the squared Euclidean distance as the
distance metric.
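
Steps 1 to 3 can be sketched as follows, starting from a random assignment as described above. The handling of an empty cluster (re-seeding it with a randomly chosen point), the iteration cap, and the fixed random seed are assumptions added so the sketch always terminates and is reproducible; they are not part of the description in this guideline.

```python
import random


def kmeans(points, k, seed=0, max_iterations=100):
    """Basic k-means using squared Euclidean distance."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # random initial assignment
    centroids = []
    for _ in range(max_iterations):
        # Step 1: compute the centroid of each cluster.
        centroids = []
        for cluster in range(k):
            members = [p for p, label in zip(points, labels) if label == cluster]
            if not members:  # assumption: re-seed an empty cluster
                members = [rng.choice(points)]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        # Steps 2 and 3: assign each point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
    return labels, centroids


# Six hypothetical two-dimensional profiles forming two tight groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
```

The cluster numbers themselves are arbitrary; only the grouping is meaningful. On less clearly separated data, different random starts can converge to different solutions, so repeated runs are commonly compared.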
K-medoids, also referred to as “partitioning around medoids,” is similar to the k-means clustering
algorithm, with the exception that the distance between each gene and the K clusters is taken to be the
sum of the distances between the gene and all members of the cluster. Again, the genes are iteratively
assigned to that cluster, which minimizes this distance until cluster assignments do not change.
Self-organizing maps (SOM) is another clustering method similar to the K-means clustering procedure;
however, the centroids are constrained to a two-dimensional space rather than N-dimensional space,
where N is the number of feature vectors (e.g., the number of genes).
The primary disadvantage in using clustering methods for data analysis is that there is not a clear measure
of success that can be ascertained. Therefore, any comparison of the effectiveness of different clustering
procedures is somewhat subjective, largely based on the biological plausibility of the resulting clusters.
Unsupervised methods:
• hierarchical clustering
• k-means
• k-medoids
• self-organizing maps
• principal components
• multidimensional scaling
Tools:
Eisen Lab Cluster and Treeview: http://rana.lbl.gov/EisenSoftware.htm
Whitehead – GeneCluster2: http://www-genome.wi.mit.edu/cancer/software/software.html
8.6 High-Level Data Analysis – Supervised Learning and Classification Procedures
8.6.1 Identification of Genes That are Differentially Expressed Between Known Classes
Supervised learning incorporates knowledge of the phenotypic class information of the samples being
examined. For example, in general situations when there are two known classes (phenotypes) and one
wishes to determine whether there is a significant difference in the quantitative level of Class 1 vs. Class
2, a two-sample t-test or a Wilcoxon rank sum test may be conducted. In the microarray setting, there are
often thousands of genes, each measured on a quantitative scale, and it may be desired to determine what
genes are “significant” or important. Although the two-sample t-test and Wilcoxon rank sum test may be
employed, it is important to make an adjustment for multiple comparisons.218 For example, when testing
10 000 genes each at a Type I error rate of 0.05, one would expect to find 500 genes to be significant by
chance alone (assuming genes are independent). The simplest approach to correcting for multiple
comparisons is to simply divide the desired Type I error rate by the number of comparisons being tested.
This is the Bonferroni adjustment and is likely the most well-known method for adjusting for multiple
comparisons. However, it is quite stringent for testing individual null hypotheses and likely leads to a loss
in power, i.e., a null hypothesis may not be rejected when in fact it is false. That is, the Bonferroni
adjustment may fail to declare a truly differentially expressed gene significant. Westfall and Young
developed another approach referred to as the “step-down” method for adjusting p-values. It is more
powerful, but like the Bonferroni method, this step-down method assumes each test is independent.
Additionally, if there are more than two classes of observations under study, ANOVA or the
Kruskal-Wallis test may be used. Tusher et al.219 developed a method of identifying differentially
expressed genes by controlling the false discovery rate. This latter method is referred to as “SAM”
(Significance Analysis of Microarrays).
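
The arithmetic behind the multiple-comparisons problem and the Bonferroni adjustment can be made concrete with a short sketch. The function names and the four example p-values are illustrative only.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests significant by chance alone."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test significance level after the Bonferroni adjustment."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Indices of tests that remain significant after adjustment."""
    cutoff = bonferroni_threshold(len(p_values), alpha)
    return [i for i, p in enumerate(p_values) if p <= cutoff]


chance_hits = expected_false_positives(10000, 0.05)   # about 500 genes
per_gene_alpha = bonferroni_threshold(10000, 0.05)    # about 0.000005

# Four hypothetical p-values tested together (cutoff 0.05 / 4 = 0.0125).
kept = bonferroni_significant([1e-7, 0.001, 0.04, 2e-6], alpha=0.05)
```

With 10 000 genes, a gene must reach a p-value of roughly 0.000005 to be declared significant, which illustrates why the adjustment is considered stringent.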
8.6.1.1 Estimating p-values Through Resampling and Permutation
In gene expression analyses, genes are likely correlated because they are coregulated. As a result, the
multiple tests of hypothesis are not independent. Resampling methods can be used to provide adjusted p-
values and are not dependent on the unknown joint distribution of the test statistics. Assume that the two
phenotypes/classes of samples are independent random samples drawn possibly from two distributions.
The null hypothesis is that there is no difference between the two groups with respect to gene expression.
Under the null, any of the gene expression levels could have come from either Group 1 or Group 2.
Therefore, the m Group 1 and n Group 2 expression levels are combined to form a single set of values of
size m + n. A sample of size m is drawn without replacement to represent Group 1, and the remaining n
observations represent Group 2. The test statistic θ̂ * is computed using these artificially created samples.
For example, a typical statistic is the difference of the means of the expression values for gene i for the
two artificially created samples divided by the estimated standard deviation of the two samples. The
permutation-achieved significance level (ASLperm) is
\[ \mathrm{ASL}_{\mathrm{perm}} = \mathrm{Prob}_{H_0}\{\hat{\theta}^* \ge \hat{\theta}\}. \]
If one creates as many
artificial samples as were possible given the m + n observations, then the probability that the test statistic
was greater than some predetermined value would be equal to the number of times that θ̂* exceeded the
predetermined value θ̂ divided by the total number of permutations, which is equivalent to
\[ \#\{\hat{\theta}^* \ge \hat{\theta}\} \Big/ \binom{m+n}{n}, \]
since there are \binom{m+n}{n} permutation replications. However, due to the large
number of permutation replications possible, Monte Carlo methods are typically used to approximate the
permutation distribution. These methods randomly select a subset of possible permutations to create the
distribution frequency of the statistic. After taking N permutations, the null hypothesis is rejected if the
observed value of the test statistic is among the largest 100α% permutation values. The bootstrap method
may also be used where the random sample is drawn with replacement independently from the m Group 1
and n Group 2 observations.
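
A Monte Carlo approximation of the permutation-achieved significance level for a single gene can be sketched as follows. For brevity, the statistic used here is the plain difference of group means rather than the standardized difference mentioned above; the function name, the number of permutations, the fixed seed, and the expression values are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_p_value(group1, group2, n_permutations=2000, seed=0):
    """Monte Carlo estimate of Prob{theta* >= theta} under the null."""
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    m = len(group1)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling without replacement
        statistic = mean(pooled[:m]) - mean(pooled[m:])
        if statistic >= observed:
            count += 1
    return count / n_permutations


# Hypothetical expression values for one gene in two phenotypic classes.
group1 = [8.1, 8.4, 7.9, 8.6, 8.3]
group2 = [5.0, 5.2, 4.9, 5.1, 5.3]
p_separated = permutation_p_value(group1, group2)

# Two groups with identical values: no evidence of a difference.
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```

For the clearly separated groups, only relabelings that reproduce the original split reach the observed statistic, so the estimated p-value is small. To adjust for multiple genes, the same relabeling would be applied to all genes at once so that the correlation among genes is preserved.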
8.6.2 Predicting Class Labels Based on Gene Expression Data
Aside from identifying lists of differentially expressed genes in known classes/phenotypes, researchers
are increasingly interested in predicting the G class labels from a set of gene expression measures, where
the G class labels are known. The standard approach is to identify a set of samples, which comprise the
training set for the analysis. Supervised learning procedures are applied to the training set to identify the
levels of gene expression, which best make the phenotypic class distinction. These features are then
applied to an independent test set to verify the ability of the features to make that distinction.
The first step in constructing a class predictor is to filter the data to avoid overfitting the model (see
Section 8.4, Gene Filtering and Identification of Differentially Expressed Genes). In predictive modeling,
more stringent filtering criteria must be applied to adequately decrease the number of model parameters.
Once filtered, a supervised learning method such as k-nearest neighbors, neural networks, compound
covariate prediction,220 or classification and regression trees may be employed. A list of some of the more
commonly used methods is given at the end of this section under the heading, Methods of Supervised
Learning. A detailed description of various supervised learning methods is provided in Hastie, et al.215
Two of these methods are explained in more detail below.
8.6.2.1 k-nearest Neighbor Classification
The method for k-nearest neighbors is as follows: for each microarray, identify the k closest microarrays
using a distance measure to form a group. The class prediction for the group is determined by majority
vote. For example, consider two genes x and y and their measured expression values. Assume that the
values for each of these genes are known for A number of cases with a good outcome and B number of
cases with a poor outcome. Next, for all A + B cases plot each case using the expression values of the x
and y genes as the x and y coordinates. Now move systematically from point to point in the xy plane. At
each point examine k-nearest neighbor cases and assign that value to the point that represents the majority
vote of the k cases. Note, odd values of k are typically used to avoid ties; when k is even and a tie occurs,
it is broken by a random number generator. Once the above process has been completed, draw a boundary
(or boundaries) that separate the regions of the xy plane which have majority votes for A and those for B.
The boundary(s) create the classifier for any new cases where the outcome (i.e., whether the case is an A
case or B case) is unknown. The new case is placed onto the xy plane at the location of its values of x and
y genes and obtains the classification associated with that region of the xy plane as determined by the
boundaries created from the training set. More computationally advanced methods of supervised learning
such as neural networks and support vector machines are available; due to their complexity they require
many more arrays (samples) than k-nearest neighbors.
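
A minimal sketch of k-nearest neighbor classification for the two-gene example follows. It classifies a new case directly from its k closest training cases rather than drawing explicit boundaries, which gives the same assignment. Euclidean distance is assumed as the distance measure, ties are not handled (an odd k with two classes avoids them), and the expression values and labels are made up for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(training_points, training_labels, new_point, k=3):
    """Classify new_point by majority vote of its k nearest training cases."""
    neighbors = sorted(
        zip(training_points, training_labels),
        key=lambda pair: dist(pair[0], new_point),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical (gene x, gene y) expression values with known outcomes.
training_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
training_labels = ["good", "good", "good", "poor", "poor", "poor"]

prediction_a = knn_predict(training_points, training_labels, (1.5, 1.5), k=3)
prediction_b = knn_predict(training_points, training_labels, (8.5, 8.5), k=3)
```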
8.6.2.2 Classification and Regression Trees (CART)
Classification and regression trees are methods for predicting class membership or outcome. If one is
interested in predicting survival time or disease-free survival time, tree-structured survival analysis, which
accounts for censored data, can be used to predict prognosis based on gene expression. There are two
general tree-based methods for modeling survival data. Using the first method, the log-rank test is used as
the splitting criterion. That is, all samples are placed in the root node. Thereafter, for each gene the
algorithm determines the expression level at which the log-rank test statistic is maximized (the splitting
value). Then among all genes, the gene reporting the smallest p-value is chosen as the first split. All
samples with an expression level less than the splitting value for that gene are placed in one daughter
node while the remaining samples, whose expression level for that gene is greater than the splitting
value, are placed in a second daughter node. Splitting of all subsequent nodes continues according to this
rule until there are fewer than n (e.g., 20) subjects observed in a terminal node or until
censoring becomes too heavy. One starts with growing an overly large tree, then applies pruning to select
(cut back to) the best subtree. Segal describes a pruning method used for tree-structured survival analyses,
in which the log-rank statistic is employed as the splitting criterion.221 This is analogous to selecting a
node and all its daughters with an insignificant p-value for pruning. Another tree-based method is
available that assumes the exponential distribution is appropriate for the survival times.
8.6.2.3 Assessing the Accuracy of Classification Algorithms
An extended discussion of this subject is given in Section 9.2, Diagnostic Utility. The main points are
summarized here. In developing a prediction rule, one requires multiple subjects from each of the
classes/phenotypes under study. As previously discussed, most class prediction methods do not work well
when there are a large number of independent variables (genes/spots/probe sets) in comparison to the
number of samples, as this typically results in overfitting the model. The implications of overfitting are
that the model works well for the data at hand, but may perform poorly when applied to an independent
data set. Therefore, filtering the set of spots/probe sets to reduce the dimensionality is essential. Once a
set of “important” genes has been identified, the multivariate function producing the prediction rule must
be defined. Examples of multivariate methods include weighted voting, support vector machines,
compound covariate prediction, k-nearest neighbors, and others as mentioned in this section. It is essential
that, once the prediction rule has been defined, an unbiased estimate of the true error rate be
calculated. Applying the prediction rule to an independent dataset and k-fold cross-validation are two
commonly used methods that provide a measure of the quality of a prediction algorithm. In a setting
where many observations are available, it is recommended that the dataset be randomly divided into two
parts, representing a training and test data set. The prediction algorithm is built using the training data set
and once a final model has been developed, the prediction rule is applied to the test data set to estimate
the generalization error. In the clinical diagnostic setting, it is recognized that often a laboratory may be
constrained by the number of samples, so that withholding a large portion of the data for validation
purposes may limit the ability of developing a prediction rule. Therefore, cross-validation has proven
useful for small data sets.
K-fold cross-validation requires one to randomly split the dataset into K equally sized groups. Thereafter,
the model is fit to K-1 parts of the data and the generalization error is calculated using the Kth remaining
part of the data. This procedure is repeated so that the generalization error is estimated for each of the K
parts of the data, providing an overall estimate of the generalization error and its associated standard
error. With a sample size of n ≥ 200, five- or tenfold cross-validation will provide an estimate of the
generalization error that is approximately unbiased. With smaller sample sizes, five or tenfold cross-
validation tends to overestimate the generalization error. Therefore, in small sample settings, N-fold
cross-validation, or “leave-one-out,” is recommended as it provides an unbiased estimate of the
generalization error. A permutation-based p-value for the cross-validation rate may be used to assess the
strength of the multivariate predictor. The procedure is as follows: for each random permutation of class
labels, the entire cross-validation procedure is repeated to determine the cross-validated misclassification
rate obtained from developing a multivariate predictor with two random classes. The final p-value is the
proportion of the random permutations that gave as small a cross-validated misclassification rate as was
obtained in the observed data. In summary, N-fold cross-validation is likely to be the most practical
method for assessing the generalization error of a prediction algorithm when modeling microarray data,
and the probability of the cross-validation error being as extreme or more extreme than the one observed
may be calculated using permutation testing.
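
The cross-validation and permutation procedures described above can be sketched together. The sketch assumes a simple 1-nearest-neighbor rule as the predictor, Euclidean distance, and a small made-up data set; any model-building step (including gene filtering) would need to be repeated inside each fold in the same way. Setting K equal to the number of samples gives leave-one-out cross-validation. All names and data are illustrative.

```python
import random
from math import dist


def nearest_neighbor(train_x, train_y, x):
    """Predict the label of x from its single closest training case."""
    return min(zip(train_x, train_y), key=lambda pair: dist(pair[0], x))[1]


def k_fold_error(xs, ys, k, seed=0):
    """Cross-validated misclassification rate over k folds."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k roughly equal parts
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in order if i not in held_out]
        train_y = [ys[i] for i in order if i not in held_out]
        errors += sum(nearest_neighbor(train_x, train_y, xs[i]) != ys[i]
                      for i in fold)
    return errors / len(xs)


def cv_permutation_p_value(xs, ys, k, n_permutations=200, seed=0):
    """Proportion of label permutations with a CV error as small as observed."""
    rng = random.Random(seed)
    observed = k_fold_error(xs, ys, k)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random class labels
        if k_fold_error(xs, shuffled, k) <= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical two-gene profiles for ten cases with known outcomes.
xs = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5),
      (8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5)]
ys = ["good"] * 5 + ["poor"] * 5

loo_error = k_fold_error(xs, ys, k=len(xs))           # leave-one-out
p_value = cv_permutation_p_value(xs, ys, k=len(xs))
```

Because the two classes are well separated in this toy example, the leave-one-out error is zero and very few random relabelings achieve an error that low.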
Methods of Supervised Learning
• linear or quadratic discriminants

• k-nearest neighbors
• compound covariate prediction
• weighted voting
• naïve Bayes
• neural networks
• support vector machines
• classification and regression trees
Tools:
9 Validation
Validation of analytical tests within the clinical laboratory has largely centered around measurement of a
set of parameters related to the analyte or analytes being measured in the assay including accuracy,
trueness, repeatability, linearity, effect of interfering substances, etc. Determining these parameters is
considered under the subject of “analytical validation” of a clinical test. In the field of microarrays,
analytical validation is greatly complicated by the large number of separate analytes being simultaneously
evaluated and by the tremendous amount of postanalytical data processing required to obtain a
measurement of the analyte. In fact, the steps required just to obtain the measurement of the analyte
depend upon how the image analysis, expression summaries, transformation of the data, normalization,
and predictive modeling steps are conducted. This will pose substantial challenges for laboratories
regulated by accrediting agencies or organizations that require proficiency testing to demonstrate the
accuracy of the test at least twice each year. See the current version of CLSI/NCCLS document GP29 —
Assessment of Laboratory Tests When Proficiency Testing is Not Available.
9.1 Analytical Validation
Analytical validation of diagnostic microarrays requires that one validate the microarray through
examination of hybridization controls, examining assay operating characteristics and validating gene
expression via independent assays.
9.1.1 Polymorphisms and Mutations (Changes Within a Genome)
Manufacturers of, or laboratories performing, microarray-based assays for sequence variation detection
must establish the analytical specificity and sensitivity for such tests. Due to the large number of
polymorphic or somatic mutation sites that can be simultaneously assessed using high-density
oligonucleotide microarrays, validation with well-characterized, positive genomic DNA controls for all
possible alleles or mutations may not be possible. For many loci, mutation-positive validation controls can
be obtained from cell repositories (e.g., NIGMS Human Genetic Cell Repository) in the form of cell lines
or isolated genomic DNA. In general, DNA sequence analysis is considered to be the "gold standard" for
assay comparison and validation purposes. When genomic positive controls are not available, it is
recommended that synthetic controls be produced by site-directed mutagenesis procedures. The identity
of such synthetic controls should also be verified by bidirectional DNA sequence analysis prior to use
in microarray assay validation. Reconstructions using pools of PCR targets generated from synthetic
and/or genomic samples should be performed to determine the sensitivity for detection of specific genetic
variation in the heterozygous and homozygous state. Further, assay specificity should also be assessed
with a significant number of normal samples.
For resequencing applications where hundreds of unique mutations can in principle be detected (e.g., p53
tumor suppressor gene), a practical approach for establishing assay sensitivity must include known
genomic samples carrying common, frequently occurring mutations. Additionally, some genomic or
synthetic samples representative of common or rare mutations that might be expected to be challenging
for routine detection (e.g., frameshifts, mutations within palindromic regions or homopolymer runs)
should also be included in analytical validations. Further, a range for limit of detection should be
established where somatic mutations are going to be measured in mixed cell populations (e.g., fine
needle aspirates) by dilutions of known particular mutant controls into normal control samples.
9.1.2 Gene Expression (Changes Within a Transcriptome)
Validation of an expression microarray assay will, in general, involve development of tolerance levels or
limits for a set of “internal control” analytes that are measured simultaneously with the test sample and
independent comparisons between measurements obtained for a predetermined set of genes from the
microarray assay and a “gold standard.” The latter will be part of the validation study that assesses the
operating characteristics of the assay.
9.1.2.1 Internal Control Analytes
Various methods to evaluate the trueness of intensity measures from a microarray exist. Internal controls
or spots/probe sets included multiple times on one microarray should be assessed. Manufacturing quality
control for some commercially available arrays evaluates three aspects: design, synthesis, and signal
intensity by using hybridization controls. As many as 100 probe sets that represent hybridization control
genes have been included on certain commercially available arrays.
9.1.2.2 Assay Operating Characteristics
Assay operating characteristics such as trueness, repeatability, linearity, etc. must be
evaluated for a defined set of analytes. The approach to the analytical validation of the selected analyte
should follow previously established principles for individual analytes. One method of assessing the
trueness of the measured intensities is to divide an RNA sample into multiple aliquots and hybridize this
same RNA sample to multiple microarrays, then perform a variance components analysis. In the cDNA
microarray setting, a “dye swap” experiment should be performed to exclude artifacts arising from the use
of different fluors. Use of a standard reference sample to assess stability of the system is essential; the
same lot of a standard reference material should be available for a number of years for testing new lots of
microarrays and process drift. Alternatively, an RNA sample may be split with another laboratory that
performs the same test. Significant genes could be validated using RT-PCR or another more traditional
gene expression assay.
The linearity of the intensities can be assessed using serial dilutions of the same RNA. (Refer to the most
current edition of CLSI/NCCLS document EP6 — Evaluation of the Linearity of Quantitative
Measurement Procedures: A Statistical Approach.)
9.1.2.3 Selection of a Gene Set for Test Validation
It is obviously not possible to perform an analytical validation for every probe (or probe set) that may be
present on a microarray. For clinical testing, it must be assumed that microarrays produced commercially
for this purpose will be provided from manufacturers that follow GMP with regard to microarray
production. For in-house manufactured cDNA custom spotted microarrays, careful documentation of the
manufacturing process will be necessary, including possibly sequencing the solutions being spotted for
the analytes used for assessment of assay operating characteristics.
At this juncture, selection of a set of analytes for assessment of assay operating characteristics will
depend on the clinical purpose for which the assay is to be used and the importance (weight) of the
analyte for affecting the outcome of the classification rule in which the analyte is utilized. Spots or probe
sets reported as significant with weights above a given threshold should be validated by use of an
independently established method such as real time RT-PCR. In addition, confirmatory studies may be
required that may be based on a different assay method requiring development of gene-specific reagents
such as antibodies or DNA probes.
9.2 Diagnostic Utility
The traditional model for judging the diagnostic utility of an assay is to test the assay using samples from
patients with and without a disease, and compute its diagnostic sensitivity (probability that the test is
positive if the patient has the disease; P(T+ | D+)) and diagnostic specificity (probability that the test is
negative if the patient does not have disease; P(T- | D-)). These values may then be used together with a
knowledge of the disease prevalence and Bayes’ Theorem to calculate a positive (P(D+ | T+)) and
negative (P(D- | T-)) predictive value for the assay. See the most current edition of CLSI/NCCLS
document EP12 — User Protocol for Evaluation of Qualitative Test Performance.
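As an illustration only, the calculation of predictive values from diagnostic sensitivity, diagnostic specificity, and disease prevalence via Bayes’ Theorem can be sketched as follows. The function name and the numerical inputs (sensitivity 0.90, specificity 0.95, prevalence 0.10) are hypothetical and are not taken from this guideline or from EP12.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed with Bayes' Theorem.

    sensitivity = P(T+ | D+), specificity = P(T- | D-), prevalence = P(D+).
    """
    true_pos = sensitivity * prevalence                    # P(T+, D+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(T+, D-)
    true_neg = specificity * (1.0 - prevalence)            # P(T-, D-)
    false_neg = (1.0 - sensitivity) * prevalence           # P(T-, D+)
    ppv = true_pos / (true_pos + false_pos)                # P(D+ | T+)
    npv = true_neg / (true_neg + false_neg)                # P(D- | T-)
    return ppv, npv


# Hypothetical inputs chosen only to show the effect of a low prevalence.
ppv, npv = predictive_values(0.90, 0.95, 0.10)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these assumed inputs, the positive predictive value is well below the diagnostic sensitivity, which shows why prevalence must accompany sensitivity and specificity when predictive values are reported.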
Microarray test results, however, introduce a new aspect to the above algorithm in that a clinical report
regarding the results from a microarray test is more likely to be in the form of the result of some
prediction rule (a categorical diagnosis or prognosis), rather than a report containing the expression, SNP,
or mutation data from thousands of spots/probe sets. Consequently, the new aspect that is introduced with
microarray studies is the requirement for the prediction rule or classifier. No longer are diagnostic
sensitivity and specificity directly tied to the analytical sensitivity and specificity of the assay that
measures the presence or absence of a specific analyte, but instead to the categorical “outcome”
calculated from multiple analytes.
To reiterate, microarrays can produce signal intensities for thousands or tens of thousands of genes. One
function of a classifier is to take the relevant data produced by the microarray and, based on the signal
intensities, allocate the case into a single category of a very limited number of categories, generally only
two. This is frequently achieved through creation of a univariate “risk score” or “index” using any of the
various statistical methods discussed in Section 8.6.2, such as logistic regression or the compound
covariate predictor method. In the logistic regression, for example, patients are commonly classified as
positive for the disease if the predicted probability of disease from the resulting model is ≥0.5, and
patients are classified as negative for the disease if the predicted probability of disease from the model is
<0.5. This value of 0.5 is a commonly used cutpoint, but it is not necessarily optimal as will be discussed
in Section 9.2.3.2. In the sections that follow the analytical relationships developed have assumed a
univariate index of value x that has been derived from the multivariate microarray data.
9.2.1 Development of a Prediction Rule (Classifier)
Development of a prediction rule or classifier is a supervised learning process. Creation of the classifier
requires a set of cases (training set) in which the outcome variable to be predicted by the classifier (e.g.,
presence or absence of disease) is known a priori. The k different possibilities for the outcome variable
are referred to as classes. Ultimately the accuracy of the classifier must be assessed. As summarized in
Section 8.6, the misclassification rate can be assessed by applying the prediction rule to an independent
data set or by cross-validation. The misclassification rate then can be used to judge the performance of the
classifier.
Classifiers can be viewed as arising from two general approaches.222 The first, sometimes referred to as
the sampling paradigm, is to create models of the distribution of the gene expression values for each
class and then construct rules that identify boundaries that optimally separate the classes. The second
approach, sometimes referred to as the diagnostic paradigm, attempts to directly model the conditional
probabilities of class k, that is, the likelihood that a case belongs to class k, given the observed gene
expression values. These conditional probabilities are typically referred to as posterior probabilities. The
model clearly depends on the distribution of gene expression values, but the analytical techniques depend
mainly on data in regions of overlap of the distributions of values from the respective classes. Having
some understanding of which approach is being used can be helpful in understanding the limitations of a
particular classification rule.
9.2.1.1 Basic Structure of Classifiers
Figure 11 shows a hypothetical training set of data for two classes of patients. Class 1 patients are
without disease and Class 2 represents patients with disease. The situation represented in Figure 11 is
that seen with many standard clinical laboratory tests. Class 1 and Class 2 patients are assumed to
represent a sample drawn from some population that includes both patients with and without disease. The
frequency curves for Class 1 and Class 2 cases can be thought of as being constructed in three steps. First,
draw a sample of patients from the population and independently assign patients to Class 1 (without
disease) or Class 2 (with disease) based on independent criteria. Next, measure the value of the gene or
other biomarker in each case. Finally, plot the number of patients in each class for each value of x
divided by the total number of cases in the sample.
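[Figure 11 graphic not reproduced. Vertical axis: Frequency of Cases. Horizontal axis: Expression Index (g) [x - units], with the points xL, xI, and xR marked. Curves: Frequency of Class 1 Cases (Class 1 = No Disease), P(X=x, C=1); Frequency of Class 2 Cases (Class 2 = Disease present), P(X=x, C=2); Frequency of ALL Cases, P(x) = [P(X=x, C=1) + P(X=x, C=2)]. The overlap region between xL and xR is labeled with false negatives and false positives.]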
Figure 11. Hypothetical Distributions of Expression Index Values g in a Population of Patients With
and Without Disease (Two Classes)
The resulting frequency curves represent the joint probability of a case having gene g with value x AND
being a member of Class k, where k = 1 or k = 2. The probability of observing gene g with expression
value x given the patient is from Class 1 is given by P(X=x | C=1 ). This conditional probability can be
calculated from the joint probability if one knows the proportion of the sample that comprises each of the
classes. The probability that someone in the sample population is a member of Class k is called the prior
probability for Class k, πk = P(C=k). Thus,
$$P(X = x \mid C = 1) = \frac{P(X = x, C = 1)}{P(C = 1)} = \frac{P(C = 1 \mid X = x)\, P(X = x)}{\pi_1} \tag{1}$$

and similarly for Class 2,

$$P(X = x \mid C = 2) = \frac{P(X = x, C = 2)}{P(C = 2)} = \frac{P(C = 2 \mid X = x)\, P(X = x)}{\pi_2} .$$
In the hypothetical data in Figure 11, the prior probabilities for the classes were both 0.50.
In viewing the data in Figure 11, it is clear that there will exist a range of values of x (denoted in Figure
11 as those values from xL to xR), where the value of x cannot unambiguously determine the class
membership of the patient being evaluated. There is then the issue of how one should proceed given that
there will be a region of ambiguity irrespective of the classification rule developed. To resolve this issue,
one must first identify the cost of a misclassification. To simplify this discussion, it will be assumed that
the misclassification costs are equal; that is, the cost of diagnosing a truly nondiseased person as having
disease is the same as diagnosing a truly diseased person as NOT having the disease. Let k equal the
correct class (k = 1 or k = 2, i.e., k ∈ {1,2}) for a given patient whose gene g expression value equals x.
Let l represent the value of the class assigned to a patient by the classifier based on the patient’s value of
x. Then the Loss(k,l) function can be represented as in Equation (2).
$$\text{Loss}(k, l) = \begin{cases} 0 & \text{if } l = k \\ 1 & \text{if } l \neq k \end{cases} \tag{2}$$
Intuitively, the cost of correctly classifying a subject ( l = k ) is zero, while the cost of misclassifying a
subject ( l ≠ k ) is 1. Next we will define a classifier (c(x)) which assigns a value l based on the value of x.
We need to know the risk associated with using this classifier c(x).
We will then define a second function, Risk(x | k), which represents the expected Loss. According to
Equation (2) above, the value of the loss function is one when c(x) makes an error; that is, c(x) predicts
class l ≠ k. In viewing the data in Figure 11, it is clear that low values of x should predict Class 1 whereas
high values of x should predict Class 2, with some degree of misclassification occurring in the region
from xL to xR. Applying Bayes’ theorem to Equation (1), which makes use of the multiplication rule and
the theorem of total probabilities, we can obtain the conditional probability P(C=k |X=x), which measures
the likelihood that a patient belongs to a specific Class k value given the value of x.
$$P(C = k \mid X = x) = \frac{P(C = k, X = x)}{P(X = x)} = \frac{P(X = x \mid C = k)\, P(C = k)}{\sum_{j=1}^{2} P(X = x \mid C = j)\, P(C = j)} = \frac{P(X = x \mid C = k)\, \pi_k}{\sum_{j=1}^{2} P(X = x \mid C = j)\, \pi_j} \tag{3}$$
The classifier that provides the lowest risk (i.e., the smallest expected loss) given the stipulation in
Equation (2) that the magnitudes of the losses for all types of misclassification are equal is shown by
Equation (4) and is referred to as Bayes’ Rule.
$$\text{Let } c(x) = k \text{ if } P(C = k \mid X = x) = \max_{j} P(C = j \mid X = x) \tag{4}$$
Equation (4) [Bayes’ Rule] assigns a case with value X=x to Class k if Class k has the largest conditional
probability (i.e., max_j P(C=j | X=x)) among all classes when X=x. Examining the data in Figure 11, we
will classify a case as belonging to Class 1 when x < xI and as belonging to Class 2 when x > xI. For
values where x = xI, the observation is randomly classified as Class 1 or Class 2. Using this rule, there
will obviously be some cases that are incorrectly classified (the false negatives and false positives).
However, Bayes’ rule provides the lowest rate of misclassification that is theoretically possible given the
data.
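The application of Equations (3) and (4) can be sketched as follows. This is a minimal illustration that assumes, purely for convenience, that the class-conditional distributions of x are normal; the means, standard deviations, priors, and function names are hypothetical and are not part of this guideline.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical class-conditional parameters and priors (assumed values).
CLASSES = {
    1: {"mean": 4.0, "sd": 1.0, "prior": 0.5},  # Class 1: no disease
    2: {"mean": 7.0, "sd": 1.0, "prior": 0.5},  # Class 2: disease present
}


def posteriors(x, classes=CLASSES):
    """Equation (3): posterior probability P(C=k | X=x) for every class k."""
    joint = {k: normal_pdf(x, c["mean"], c["sd"]) * c["prior"]
             for k, c in classes.items()}
    total = sum(joint.values())
    return {k: value / total for k, value in joint.items()}


def bayes_rule(x, classes=CLASSES):
    """Equation (4): assign the class with the largest posterior probability.

    Ties go to the first class listed; the text assigns ties at random.
    """
    post = posteriors(x, classes)
    return max(post, key=post.get)


for x in (3.0, 5.0, 6.0, 8.0):
    print(x, bayes_rule(x), posteriors(x))
```

Under these assumptions the two posterior probabilities are equal at x = 5.5, which plays the role of xI in Figure 11.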
To measure the risk associated with use of a classifier for a given value of x, one applies the rule to
predict the class for a new case based on the value of x and substitutes this value for l into the formula for
Loss in Equation (2). Then one multiplies this value times the likelihood that a case is Class 1 or Class 2
(i.e., the posterior probabilities) and sums over the total number of possible classes (Equation (5)).
$$\text{Risk}(x) = \sum_{j=1}^{2} \text{Loss}(c(x), j)\, P(C = j \mid x) \tag{5}$$

Where:

$$c(x) = k \text{ if } P(C = k \mid x) = \max_{j} P(C = j \mid x)$$
In order to assess total risk we must evaluate the risk over all values of x that occur in our population.
Therefore:
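$$
\begin{aligned}
\text{Bayes' Risk} &= \text{Risk}(-\infty < x < x_L) + \text{Risk}(x_L < x < x_I) + \text{Risk}(x_I < x < x_R) + \text{Risk}(x_R < x < +\infty) \\
&= (0 \times 1 + 1 \times 0) \\
&\quad + \bigl(0 \times P(C = 1 \mid x_L < x < x_I) + 1 \times P(C = 2 \mid x_L < x < x_I)\bigr) \\
&\quad + \bigl(1 \times P(C = 1 \mid x_I < x < x_R) + 0 \times P(C = 2 \mid x_I < x < x_R)\bigr) \\
&\quad + (1 \times 0 + 0 \times 1) \\
&= P(C = 2 \mid x_L < x < x_I) + P(C = 1 \mid x_I < x < x_R) \\
&= \frac{P(x_L < x < x_I \mid C = 2)\, \pi_2}{\sum_{j=1}^{2} P(x_L < x < x_I \mid C = j)\, \pi_j} + \frac{P(x_I < x < x_R \mid C = 1)\, \pi_1}{\sum_{j=1}^{2} P(x_I < x < x_R \mid C = j)\, \pi_j}
\end{aligned}
\tag{6a}
$$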
It is desirable to assess total risk in terms of a quantity that can be determined empirically. In
Figure 11, a Class 2 case is misclassified when xL<x<xI and Class 2 is the true class, so the probability
of misclassification of Class 2 cases, pmc(2), is P(xL<x<xI, C=2). Similarly, the probability of
misclassification of Class 1 cases, pmc(1), for the interval xI<x<xR is P(xI<x<xR, C=1). Using the
relationship between the joint and conditional probabilities from
Equation (1), the two misclassification probabilities can be expressed as:
$$\text{pmc}(2) = \frac{P(x_L < x < x_I \mid C = 2)\, \pi_2}{\sum_{j=1}^{2} P(x_L < x < x_I \mid C = j)\, \pi_j} \quad \text{and} \tag{6b}$$

$$\text{pmc}(1) = \frac{P(x_I < x < x_R \mid C = 1)\, \pi_1}{\sum_{j=1}^{2} P(x_I < x < x_R \mid C = j)\, \pi_j}$$
Comparing Equations (6a) and (6b), it is seen that the Bayes’ Risk can be rewritten for our classifier c(x)
given our definition of loss that occurs with misclassification as the sum of the misclassification rates for
Class 1 and Class 2 (6c).
$$\text{Bayes' Risk}(c(x)) = \sum_{j=1}^{2} \text{pmc}(j) \tag{6c}$$
Bayes’ Risk is the lowest value one can achieve if the prior probabilities (πk) and conditional probability
densities (P(X=x | C=k)) are known and provides a benchmark for all other classification procedures. In
the section to follow, which addresses judging the “goodness” of two classifiers, it is of limited value to
prove that one classifier with a misclassification rate of 18.5% is significantly better than a second at
22.8% if the Bayes’ Risk is of the order of 6%.
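The role of Bayes’ Risk as a benchmark can be illustrated numerically. The sketch below assumes normal class-conditional distributions with a common standard deviation and uses the joint-probability form of the misclassification probabilities described in the text (for example, P(xL<x<xI, C=2)); all parameter values and function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative normal distribution function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def total_misclassification(cut, mean1, mean2, sd, prior1=0.5):
    """Total misclassification rate when x > cut is called Class 2.

    pmc(1): Class 1 cases above the cutpoint (false positives).
    pmc(2): Class 2 cases below the cutpoint (false negatives).
    """
    prior2 = 1.0 - prior1
    pmc1 = prior1 * (1.0 - normal_cdf(cut, mean1, sd))
    pmc2 = prior2 * normal_cdf(cut, mean2, sd)
    return pmc1 + pmc2


# Hypothetical parameters: Class 1 ~ N(4, 1), Class 2 ~ N(7, 1), equal priors.
# With equal priors and equal SDs the Bayes' Rule cutpoint is the midpoint.
bayes_risk = total_misclassification(5.5, 4.0, 7.0, 1.0)
print(f"Bayes' Risk at cutpoint 5.5: {bayes_risk:.4f}")
for cut in (5.0, 6.5):
    rate = total_misclassification(cut, 4.0, 7.0, 1.0)
    print(f"cutpoint {cut}: {rate:.4f}")
```

Under these assumptions no cutpoint yields a total misclassification rate below the value obtained at 5.5, so that value serves as the reference against which other classifiers on the same data would be judged.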
9.2.1.2 Sample and Diagnostic Paradigms
Having determined the general principle that must be invoked to create a classifier, how does one actually
develop such a rule? One approach might be to assume that the data for Class 1 and Class 2 cases are
normally distributed, then use the data to derive a set of parameters that describe the “normal”
distributions, such as the means and standard deviations of the two distributions. Using these parameters
and the functions for the normal distributions, it should be possible to compute a boundary point between
the distributions (xI in Figure 11) at which the two classes of cases have equal probability of
misclassification and therefore, according to Equation (4), the point where class assignment k should change
from Class 1 to Class 2. In general, methods that develop their classifiers by approximation of the
conditional probability density functions of x given class C=k belong to the sampling paradigm approach.
Expressions derived for the conditional probability density functions for Class 1 and Class 2 cases have
an important value for potentially identifying outlier values for these two classes. Failure to identify
outliers can unduly influence the values of the parameters calculated for the distributions of the
conditional probabilities, and thereby alter the value of the boundary point between the distributions,
decreasing the accuracy of the classifier. The example from Figure 11, though fairly straightforward,
represents the use of a general statistical approach involving the estimation of finite mixture models,
either parametrically or non-parametrically. Such methods can also be applied when the model must be
more complex, such as when the variable x is multivariate rather than univariate and where the number of
classes is >2.
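A minimal sketch of the sampling paradigm follows: normal parameters are estimated for each class from a training set, and the boundary point is located where the two prior-weighted densities are equal. The training values, priors, and function names are hypothetical, and the bisection search assumes that the Class 1 mean lies below the Class 2 mean.

```python
import math
import statistics


def fit_normal(values):
    """Estimate (mean, standard deviation) for one class."""
    return statistics.mean(values), statistics.stdev(values)


def log_joint(x, mean, sd, prior):
    """Log of prior times normal density, dropping the shared constant."""
    return math.log(prior) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def boundary(params1, params2, prior1=0.5, prior2=0.5, tol=1e-9):
    """Bisection between the class means for the x where the two
    prior-weighted densities are equal (xI in Figure 11)."""
    lo, hi = params1[0], params2[0]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        diff = log_joint(mid, *params1, prior1) - log_joint(mid, *params2, prior2)
        if diff > 0:      # Class 1 still more probable: boundary lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical training values of the expression index for each class.
class1 = [3.1, 3.9, 4.2, 4.8, 4.0]
class2 = [6.2, 7.1, 6.8, 7.9, 7.0]
x_i = boundary(fit_normal(class1), fit_normal(class2))
print(f"estimated boundary point: {x_i:.3f}")
```

Because the means and standard deviations are estimated from every training value, a single outlying value in either class would shift the estimated boundary, which is the sensitivity to outliers noted above.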
While development of classifiers according to the sampling paradigm has certain advantages with regard
to identification of outliers, much of the effort may involve approximating regions of the conditional
probability density functions that contain little information directly relevant to creation of the classifier.
For example, as seen in Figure 11, the probability distributions for Class 1 and Class 2 cases involve
values of x extending over a broad range (theoretically from -∞ to +∞), but as seen in Figure 11 and in
Equations (6a) to (6c) above, most of the information required for an accurate classifier is confined to the
region of x for xL<x<xR. The diagnostic paradigm attempts to directly estimate posterior probabilities
P(C=k|X=x) without directly determining the separate conditional probability density functions for each of
the separate classes.
Figure 12 shows the posterior probabilities (P(C=k | X=x) for k ∈ {1,2}) for the hypothetical data in
Figure 11. In essence, Figure 12 shows that as x increases from -∞, the probability that a case belongs to
Class 1 (P(C=1|x)) is 100% until the value of x for the case falls into the range of xL<x<xR beyond which
the probability that a case belongs to Class 1 is 0%. A similar but inverse relationship holds for the
probability that a case belongs to Class 2. A conceptually simple but computationally intensive approach
to estimate the posterior probability P(C=k|X=x) for k = 1 or 2 (k ∈ {1,2}) would be to first divide the
interval xL<x<xR into an arbitrary number of closely spaced values of x; then, count the number of cases
of Class 1 or Class 2 within each interval of x; and finally, divide those values by the total number of
cases within that interval of x. This is analogous to the application of k-nearest neighbor (knn) analysis to
situations in which the values of x are multivariate rather than univariate. In the latter case, the
multidimensional space is divided up into consecutive sequential hypercubes. Then for each hypercube
the k cases with values of x closest to the value of x at the center of the hypercube are identified (k-nearest
neighbors). The fraction of these cases that belong to each class is then determined and that proportion
becomes the posterior probability for each of the classes for values of x contained within the interval
represented by the hypercube. Following the classification rule noted in Equation (4) above, all new
cases (from a test set or in which the class designation is unknown) with values of x that fall into the
interval of x within the hypercube would be classified according to the class having the greatest
posterior probability according to the knn estimation.
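The knn estimate of the posterior probabilities can be sketched for the univariate case as follows. This simplified illustration evaluates the k nearest training cases around each query value directly rather than on a grid of hypercubes; the training data, the choice k = 3, and the function names are hypothetical.

```python
def knn_posterior(x, training, k=3):
    """Estimate P(C=c | X=x) as the fraction of the k nearest
    training cases that belong to class c.

    training is a list of (value, class_label) pairs.
    """
    nearest = sorted(training, key=lambda case: abs(case[0] - x))[:k]
    labels = [label for _, label in nearest]
    classes = sorted({label for _, label in training})
    return {c: labels.count(c) / k for c in classes}


def knn_classify(x, training, k=3):
    """Equation (4) applied to the knn posterior estimates."""
    post = knn_posterior(x, training, k)
    return max(post, key=post.get)


# Hypothetical training set with overlapping classes.
TRAINING = [(3.1, 1), (3.9, 1), (4.2, 1), (4.8, 1), (5.6, 1),
            (5.2, 2), (6.2, 2), (6.8, 2), (7.1, 2), (7.9, 2)]

for x in (3.0, 5.4, 7.5):
    print(x, knn_posterior(x, TRAINING), knn_classify(x, TRAINING))
```

No distributional form is assumed here; the estimate depends only on which training cases are closest to the query value.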
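[Figure 12 graphic not reproduced. Title: Likelihood Class Equals 1 or 2 Given the Value of Index g [posterior probability P(C=k | x)]. Vertical axis: Probability of Class as a function of x, from 0 to 1. Horizontal axis: x, with the points xL, xI, and xR marked. Curves: Prob(k=1 | x) (Class 1; No Disease) and Prob(k=2 | x) (Class 2; Disease Present).]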
Figure 12. Posterior Probability Distributions (P(C=k|X=x)) for a Population of Two Classes of
Patients, One With (k=2) and One Without (k=1) Disease, Based on the Value of Biomarker x
The knn method is useful in that it requires no knowledge of the distributions of the various conditional
probability density functions of any of the classes. However, it can be shown215 that either its accuracy
degrades significantly or the number of cases required to maintain acceptable accuracy becomes limiting
as the dimensionality of the x variable increases (“curse of dimensionality”).
An alternative approach, and the one that is the major source of classifier development, is the use of
regression analysis. Examples of the use of this approach listed in Section 9 include: linear and logistic
discriminator analysis, logistic regression, and classification and regression trees. In this approach, one
must postulate some analytical relationship between the posterior probabilities and the variables
(univariate or multivariate x). The analytical relationship is referred to as the “model” and it contains
parameters whose values must be determined based on the values in the training set. One then uses some
statistical method such as the method of least squares (MLS) or maximum likelihood to estimate the
parameters from the data in the training set. This approach imposes certain restrictions on the type of
analytical functions that may be used to define the model. Generally speaking, the analytical functions
must be well behaved; that is, they must possess certain properties such as first and second derivatives
over the values of x for which the model is defined. Once the values for the parameters have been
determined, an analytical relationship between the posterior probabilities and the values of the
independent variables can be specified and used to predict the class of new cases. While the type of
analytical relationship used in
the model is somewhat arbitrary, as long as it is well behaved and fits the data, it is highly desirable that it
involves assumptions that can be justified on the basis of what is known regarding the biology of the
process under consideration.
To return to the example of Figure 11, a plot of the posterior probabilities for the two classes of cases is
shown in Figure 12. If one wishes to model these probabilities, one needs to obtain a function whose
values are confined between zero and one and which varies in a sigmoidal fashion as a function of x. One
such method that does meet this requirement as well as being well behaved mathematically is logistic
regression. Logistic regression is used to model the relationship between a categorical outcome variable,
which is usually dichotomous, and a set of predictor variables. Traditionally, logistic regression assumes
independent random sampling from a population, where the model is expressed as
$$C_i = \pi(x_i) + \varepsilon_i .$$

In this equation, Ci represents the classification variable; π(xi) represents the conditional probability of
class i given independent predictor variables xi, or Pr(Ci = 1 | xi); and εi represents the binomial random
error term. More formally, the conditional probability π(xi) as a function of the independent covariates
xi is expressed as

$$\pi(x_i) = \Pr(C_i = 1 \mid x_i) = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}}$$

where β = (β0, β1, β2, …, βp) are the model parameters to be estimated and p is the number of β terms in
the model. For one predictor variable X, this is usually expressed as

$$\pi(x_i) = P(C = k \mid X = x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} \tag{7}$$
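As a sketch only, Equation (7) can be fitted to a small training set by maximum likelihood using Newton-Raphson iterations. The training data, the coding of Class 2 as y = 1, the fixed iteration count, and the function names are assumptions made for illustration; the logistic function is written as 1/(1 + e^-(α+βx)), which is algebraically identical to Equation (7).

```python
import math


def logistic(alpha, beta, x):
    """Equation (7): predicted probability for a single predictor x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def fit_logistic(xs, ys, iterations=25):
    """Maximum likelihood estimates of (alpha, beta) by Newton-Raphson.

    ys holds 0/1 outcomes. The data must not be perfectly separable,
    otherwise the estimates do not converge to finite values.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(alpha, beta, x)
            w = p * (1.0 - p)
            g0 += y - p              # score for alpha
            g1 += (y - p) * x        # score for beta
            h00 += w                 # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


# Hypothetical training set: y = 1 denotes Class 2 (disease present).
xs = [3.1, 3.9, 4.2, 4.8, 5.6, 5.2, 6.2, 6.8, 7.1, 7.9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
alpha, beta = fit_logistic(xs, ys)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, cutpoint = {-alpha / beta:.3f}")
```

The value -α/β is the x at which the fitted probability equals 0.5, which corresponds to the commonly used cutpoint mentioned earlier in Section 9.2.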
Moreover, the logistic function can be used directly for estimating the odds ratio. For example, given a
dichotomous, independent variable X, the odds ratio for Class 1 when X is present vs. when X is absent is
given by Equation (8a).
$$\text{odds ratio} = \frac{\pi(1) / [1 - \pi(1)]}{\pi(0) / [1 - \pi(0)]} \tag{8a}$$
Examining Figure 12, one observes that near the value xL, P(C=1 | x) is close to one, and P(C=2 | x) is a
value close to zero. When x equals xI, the curves for P(C=2 | x) and P(C=1 | x) intersect, and thus the
value of the odds ratio at xI is one. At values of x near xR, P(C=1 | x) is close to zero with P(C=2 | x) close
to one. The natural logarithm of the odds ratio can be represented as a function of x (Equation (8b)):
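$$\ln(\text{odds ratio}) = \ln\left( \frac{\left( \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} \right) \Big/ \left( \dfrac{1}{1 + e^{\alpha + \beta x}} \right)}{\left( \dfrac{e^{\alpha}}{1 + e^{\alpha}} \right) \Big/ \left( \dfrac{1}{1 + e^{\alpha}} \right)} \right) = \ln\left( \frac{e^{\alpha + \beta x}}{e^{\alpha}} \right) = \beta x \tag{8b}$$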
The shapes of the resulting curves for the natural logarithm of the odds ratio of the posterior probabilities
are shown in Figure 13.
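[Figure 13 graphic not reproduced. Vertical axis: Ln{p(k|x)/[1 - p(k|x)]}, with tick marks at 100, 0, and -100. Horizontal axis: x, with the points xL, xI, and xR marked. Curves: Ln{p(k=2|x)/p(k=1|x)} and Ln{p(k=1|x)/p(k=2|x)}.]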
Figure 13. Natural Logarithm of Odds Ratio of Posterior Probabilities of Class 1 vs. Class 2 Cases
Over x
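The result of Equation (8b), that the natural logarithm of the odds ratio relative to x = 0 equals βx, can be checked numerically. The parameter values and function names below are hypothetical and serve only to illustrate the identity.

```python
import math


def pi(alpha, beta, x):
    """Equation (7) for a single predictor x."""
    e = math.exp(alpha + beta * x)
    return e / (1.0 + e)


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def log_odds_ratio(alpha, beta, x):
    """Natural log of the odds at x relative to the odds at x = 0."""
    return math.log(odds(pi(alpha, beta, x)) / odds(pi(alpha, beta, 0.0)))


# Hypothetical parameters: the result should equal beta * x = 0.7 * 3.0.
print(log_odds_ratio(-2.0, 0.7, 3.0))
```

The intercept α cancels in the ratio, so the result depends only on β and x.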
While the above example has been derived using x as a univariate variable, it is directly generalizable to
the multivariate situation. In the latter circumstance, x becomes a vector with the number of components
equal to the number of variables being investigated (for example, a set of expression values for a set of
genes that have been judged to differ significantly between Class 1 and Class 2 patients). Then α and β
become column matrices with the number of terms equal to the number of component variables in x plus
one term for the intercept α.
The requirement for the development of a prediction rule introduces a new step into the validation process
that impacts validation sample requirements. Under the old paradigm, the presence or absence of the test
analyte in disease and nondisease states determined diagnostic sensitivity and specificity of the assay and
so only one population of sufficient size was required in order to gain confidence in the calculation of the
diagnostic utility of the test. However, the requirement for a prediction rule implies evaluation
of two populations: first, a population of samples used to generate (train) the specific prediction rule using
some prescribed algorithm (a training set); and second, a population of samples in which the rule is
tested to determine its sensitivity and specificity (a testing set). In general, this is the approach described
by a number of investigators who have utilized many of the classification algorithms described in Section
8.6. However, specific details of the application of the algorithms are generally scanty, and at present no
standards regarding establishment of cutoff values are available.
9.2.2 Assessing the Performance of Prediction Rules (Classifiers)
As discussed in the preceding section, prediction rules or classifiers have inherent limitations for correctly
classifying a new case due to overlap in the distributions of the values for the gene expression values used
to distinguish cases of different classes. One obvious measure of performance of a classifier is its overall
misclassification rate (pmc). This may be estimated by applying the prediction rule to an independent
data set that is as large as the original data set, by using cross-validation (“leave a portion out” method),
or by permutation testing. In the clinical diagnostic setting, where a laboratory may be constrained by a
small number of samples, withholding a large portion of the data in order to create a test set may limit the
ability of developing a prediction rule. The cross-validation method where one leaves one out has proven
useful for small data sets.
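Leave-one-out cross-validation and a permutation test of the resulting error estimate can be sketched as follows. The nearest-class-mean classifier, the training data, the number of permutations, and the function names are hypothetical choices made for illustration; any of the supervised learning methods listed in Section 8.6 could take the place of the classifier.

```python
import random


def class_means(cases):
    """Mean index value per class; cases is a list of (x, label) pairs."""
    means = {}
    for label in sorted({lab for _, lab in cases}):
        values = [x for x, lab in cases if lab == label]
        means[label] = sum(values) / len(values)
    return means


def classify(x, means):
    """Assign the class whose training mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


def loocv_error(cases):
    """Leave one case out, train on the rest, test on the held-out case."""
    errors = 0
    for i, (x, label) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]
        if classify(x, class_means(training)) != label:
            errors += 1
    return errors / len(cases)


def permutation_p_value(cases, n_permutations=200, seed=0):
    """Fraction of label permutations whose cross-validation error is
    as small as or smaller than the observed error."""
    rng = random.Random(seed)
    observed = loocv_error(cases)
    xs = [x for x, _ in cases]
    labels = [lab for _, lab in cases]
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if loocv_error(list(zip(xs, labels))) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_permutations + 1)


# Hypothetical training set with overlapping classes.
CASES = [(3.1, 1), (3.9, 1), (4.2, 1), (4.8, 1), (5.6, 1),
         (5.2, 2), (6.2, 2), (6.8, 2), (7.1, 2), (7.9, 2)]
print("leave-one-out error:", loocv_error(CASES))
print("permutation p-value:", permutation_p_value(CASES))
```

The classifier is retrained inside every fold and inside every permutation, so that the held-out case never contributes to the rule that classifies it.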
As demonstrated in the preceding section (Equation (5)), when the loss resulting from incorrectly
assigning a case to a class is the same regardless of the type of misclassification, the risk is equivalent to
total pmc (i.e., pmc across all classes). Therefore, with regard to the example of Figure 11, risk is a
measure of misclassification rate when Bayes Rule (Equation (4)) was used as the classifier, because the
loss function considers the misclassification of a person with disease as being disease free (false
negative), and the misclassification of a person without disease as being affected by disease (false
positive) to be equivalent errors.
Loss functions will generally be structured to address a specific set of clinical circumstances. As a result,
a “screening loss function” might be designed to penalize a false-negative misclassification at a value of n
times that of a false-positive classification. In the example of Figure 11, the risk function (Equation (5))
would weight more heavily misclassifications that are false negatives. The decision rule would likely be
altered by moving the boundary from xI left toward xL. The distance between xL and the boundary point
would be determined by the value of n, i.e., the extra cost of false negatives. In terms of Equation (6b)
the value for pmc (2) would be reduced, while pmc (1) would be substantially increased, thereby
increasing the total misclassification rate for the screening classifier over that of the Bayes Rule classifier.
This example demonstrates the point that misclassification rates should be used to judge the adequacy of a
classifier only if they utilize a common measure of loss (or cost) of misclassification in the development of
the prediction rule. Also, in comparing two or more prediction rules, the correct comparison
would preferably use risk or, alternatively, one type of misclassification, i.e., pmc(k), where k is the
“critical” class that must be most accurately identified (e.g., in the above example, patients with disease),
rather than the total misclassification rate.
To judge the accuracy of the performance measurement, it is necessary to have some estimate of the
variability of the measurement. This in turn can be utilized to establish a confidence interval about the
value of interest. A misclassification rate is, in fact, a proportion of incorrectly classified cases, and the
number of misclassified patients contained in any sample of tested patients would be expected to have a
binomial distribution. Setting p = pmc and n = the number of subjects tested, the likelihood that the
sample of n would contain m patients that were misclassified is given by Fleiss,223 Pagano and
Gauvreau224:
$$P(\#\,\text{misclassified} = m) = \binom{n}{m}\, p^{m} (1 - p)^{n-m} \tag{9a}$$
If the sample size n is sufficiently large (that is, np>5 and n(1-p)>5), the binomial distribution can be
approximated by the normal distribution. This allows us to estimate the expected variability of pmc and
to estimate the number of cases that must be examined in order to achieve a level of accuracy for pmc
that is acceptable in the setting. The standard error of p is given by Equation (9b):
$$\text{s.e.}(p) = \sqrt{\frac{p(1 - p)}{n}} \tag{9b}$$
If α equals the acceptable probability of a Type 1 error and cα/2 is the corresponding positive value of the
α/2 percentile of the standard normal distribution, then the 100(1-α)% confidence interval for p (=pmc) is
approximately (Fleiss)223:
$$\left( p - c_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}},\; p + c_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} \right) \tag{9c}$$
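The standard error and normal-approximation confidence interval of Equations (9b) and (9c) can be sketched as follows. The function name and the example values (pmc of 5% measured on 400 cases) are illustrative assumptions, not values taken from this guideline.

```python
import math
from statistics import NormalDist

def pmc_confidence_interval(p, n, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval for a misclassification rate."""
    # Positive critical value of the standard normal distribution (1.96 for alpha = 0.05).
    c = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(p * (1 - p) / n)  # Equation (9b)
    return p - c * se, p + c * se    # Equation (9c)

# Hypothetical example: np = 20 and n(1 - p) = 380, so the approximation applies.
low, high = pmc_confidence_interval(0.05, 400)
```

With these assumed inputs the interval is roughly 2.9% to 7.1%, which illustrates how wide the uncertainty remains with a few hundred cases.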
The results of Equations (9b) and (9c) can be used to estimate the sample size n needed to ensure a
sufficiently narrow confidence interval. For example, if the measured pmc is 5% and one wishes to be
95% confident that the actual population pmc lies between 4% and 6% (i.e., 5±1%), one can set 1.96
times the standard error in Equation (9b) equal to 1%.
$$0.01 = 1.96 \sqrt{\frac{0.05(1 - 0.05)}{n}} \;\Rightarrow\; n = 1825 \tag{9d}$$
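The same calculation can be written as a short function by solving the half-width of the confidence interval for n. This is a sketch only; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_half_width(p, half_width, alpha=0.05):
    """Cases needed so that the confidence interval for p is p +/- half_width."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    # Set c * sqrt(p(1 - p)/n) equal to half_width and solve for n, rounding up.
    return math.ceil(c ** 2 * p * (1 - p) / half_width ** 2)

n_required = sample_size_for_half_width(0.05, 0.01)  # 1825, matching the worked example
```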
Although 1825 cases is a large number, it is less than one-third the number of cases that would be
needed for each sample if one were to apply the classifier to two independent samples and expect to
detect a difference between pmcs of 5% and 6% at the 95% confidence level in 95% of trials (α = β =
5%). The latter sample size may be estimated using the formula below224 and will be
found to be approximately 6700
(80% power would require a sample size of approximately 3940):
$$n = \left( \frac{c_{\alpha/2}\sqrt{p_0(1 - p_0)} + c_{\beta}\sqrt{p_1(1 - p_1)}}{p_1 - p_0} \right)^{2} \tag{9e}$$
As defined above, cα/2 is the corresponding value of the α/2 percentile from the standard normal
distribution, p0 is the proportion believed to represent the mean of one sample population, p1 the
proportion considered to represent the mean of the second population, and cβ is the corresponding β
percentile from the standard normal distribution, where β represents the probability of committing a Type
2 error.
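Equation (9e) can be evaluated as sketched below. The sketch assumes a two-sided α of 5% (critical value 1.96) and takes cβ as the standard normal percentile corresponding to the desired power; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p0, p1, alpha=0.05, power=0.95):
    """Sample size per Equation (9e) for distinguishing proportions p0 and p1."""
    c_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    c_beta = NormalDist().inv_cdf(power)           # 1.645 for beta = 0.05
    numerator = (c_alpha * math.sqrt(p0 * (1 - p0))
                 + c_beta * math.sqrt(p1 * (1 - p1)))
    return (numerator / (p1 - p0)) ** 2

n_95 = sample_size_two_proportions(0.05, 0.06, power=0.95)
n_80 = sample_size_two_proportions(0.05, 0.06, power=0.80)
```

Under these assumptions the results fall close to the approximate figures quoted above for 95% and 80% power.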
9.2.3 Comparing the Performance of Prediction Algorithms (Classifiers)
An important issue will be deciding what classification algorithm to utilize. Theoretically, this would be
determined by which algorithm provides the lowest level of risk when applied to a given data set. As
noted above, risk may be directly equated to misclassification rate when the loss associated with an
instance of misclassification is the same regardless of the category of misclassification. That is to say, in
the case of a two-class system where one class indicates the presence of disease and the second class the
absence of disease, risk and misclassification rates are identical if the loss associated with the
misclassification of a false positive equals that of a false negative.
As pointed out by Salzberg,225 determining whether an apparent difference between the performance of
classification algorithms is statistically significant is also a matter that requires further
investigation and the development of standards.
9.2.3.1 Assuming Equal Loss for All Instances of Misclassification
Under the assumption of equal loss, the general approach to judging the performance of two classification
rules is to measure whether one rule produces more accurate classification of the cases from a test set of
data than the other classification algorithm. To do this one may apply the two classification rules to a
sample, which produces p1 and p2 rates of misclassification for rules 1 and 2 respectively, and then
compare the proportions of misclassification:
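$$z = \frac{|p_2 - p_1| - \tfrac{1}{2}\left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)}{\sqrt{\bar{p}\,\bar{q}\left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)}}, \quad \text{where } \bar{p} = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2},\ \bar{q} = 1 - \bar{p} \tag{10a}$$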
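A brief sketch of the continuity-corrected z statistic for two independent misclassification rates is given here. The function name and the example rates and sample sizes are hypothetical.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Continuity-corrected z statistic for two independent proportions."""
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled misclassification rate
    q_bar = 1 - p_bar
    inv_n = 1 / n1 + 1 / n2
    return (abs(p2 - p1) - 0.5 * inv_n) / math.sqrt(p_bar * q_bar * inv_n)

# Hypothetical rules tested on two independent samples of 1000 cases each.
z = two_proportion_z(0.05, 1000, 0.08, 1000)
```

A value of z above 1.96 would indicate a difference at the two-sided 5% level.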
A drawback to this approach is the need for two probably quite large independent samples. An approach
that has been proposed which makes more efficient use of test set data is to compare both algorithms
using the same test set of data, then examine the cases correctly classified and incorrectly classified using
a McNemar test for examination of paired data.223 Table 4 below outlines the analysis for a paired data
comparison with two outcomes.
Table 4. Data on Two Outcomes From Two Classifiers on Paired Comparisons

                        Classifier 1
Classifier 2      Correct    Incorrect    Total
Correct           a          b            a+b
Incorrect         c          d            c+d
Total             a+c        b+d          n
McNemar’s test statistic makes use of the discrepant cases. Classifiers 1 and 2 both correctly
classified a cases and both incorrectly classified d cases. These a + d cases are
uninformative in comparing Classifiers 1 and 2, since the classification performance was the same.
However, Classifier 1 correctly classified c cases that Classifier 2 misclassified, while Classifier
2 correctly classified b cases that Classifier 1 misclassified. The test statistic is shown in
Equation (10b) and is approximately distributed as a Chi square random variable with one degree of
freedom.223
$$\chi^{2} = \frac{\left(|b - c| - 1\right)^{2}}{b + c} \tag{10b}$$
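McNemar's statistic and its approximate p-value can be sketched as below. The counts b = 15 and c = 5 are hypothetical, and the p-value uses the identity that a Chi square variable with one degree of freedom is the square of a standard normal variable.

```python
import math

def mcnemar_chi_square(b, c):
    """Continuity-corrected McNemar statistic computed from the discrepant counts."""
    return (abs(b - c) - 1) ** 2 / (b + c)

def chi_square_1df_p_value(stat):
    """Upper-tail probability for one degree of freedom: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical paired results: 15 cases only one classifier got right, 5 the other way.
stat = mcnemar_chi_square(15, 5)
p_value = chi_square_1df_p_value(stat)
```

Only the discrepant counts enter the calculation, so the concordant cells a and d can take any value without changing the result.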
The above approach can be extended to compare an arbitrary (m) number of classifiers that are applied to
N samples. In Table 5 below, X is 0 if the sample is correctly classified and 1 if it is misclassified.
Table 5. Presentation of Data From m Classifiers

                    Classifier
Sample        1      2      ...    m      Total
1             X11    X12    ...    X1m    S1
2             X21    X22    ...    X2m    S2
.             .      .      ...    .      .
.             .      .      ...    .      .
.             .      .      ...    .      .
N             XN1    XN2    ...    XNm    SN
Total         T1     T2     ...    Tm     T
Proportion    p1     p2     ...    pm     p
S1 is the sum of row 1, which represents the total number of misclassifications of Sample 1. Similarly, T1
is the sum of column 1, which represents the total number of misclassifications by classifier 1; T is the
overall total number of misclassifications. The values pj represent the proportion of samples misclassified
by classifier j:
$$p_j = \frac{T_j}{N} \tag{11a}$$
The average proportion of misclassifications is then:
$$\bar{p} = \frac{1}{m} \sum_{j=1}^{m} p_j = \frac{T}{Nm} \tag{11b}$$
The greater the difference in the number of mistakes (Tj) between classifiers, the greater the likelihood
that the classifiers are different. However, the more that the mistakes tend to cluster within the same
sample (Sn), the more likely the classifiers are to be the same. A test statistic (Q) that measures this
relationship, and that is approximately distributed as a Chi square random variable with (m-1) degrees of
freedom, is given in Equation (11c):
$$Q = (m - 1) \times \frac{m \sum_{j=1}^{m} T_j^{2} - T^{2}}{mT - \sum_{n=1}^{N} S_n^{2}} \tag{11c}$$
The Q statistic like that of Equation (10b) is unaffected by deletion of those cases in which either all m
responses are correct or all m are incorrect. Further, in order for the Q distribution to approximate the
Chi square distribution, the product of the number of classifiers and the number of samples after deletion
of those in which all responses are the same must be at least 24. Some additional constraints and
alternative approaches have also been proposed.223
In summary, standards for comparison of classifier performance are still evolving. One approach that has
been suggested222 is to compare classifiers based on discrepant misclassification of cases. This approach
has the advantage of requiring only one test set to evaluate performance of multiple separate prediction
rules.
9.2.3.2 Assuming Unequal Loss for Instances of Misclassification
Comparison of algorithm performance as judged from the statistical significance of differences in the
misclassification rates produced by the algorithms (i.e., algorithm accuracy) has potential significant
limitations in assessing the value of the diagnostic utility of an algorithm. As discussed earlier, the use of
misclassification rates as a metric of performance implies that the loss associated with misclassification in
all categories is equal (See Equations (2), and (6a-c), as well as Section 9.2.1.1). It is relatively easy to
generalize Equations (6a-c) for the situation in which loss of misclassification is not equal. In this case,
total risk for a classifier c(x), particularly for the two-class case, for unequal loss is as shown in Equations
12a-b below. Here, k is the category selected by the classifier given the actual category of class = l.
los(k|l) varies as a function of the value of the misclassified category k that is selected by the classifier:
$$\text{Loss}(k, l) = \begin{cases} 0 & \text{if } l = k \\ \text{los}(k \mid l) & \text{if } l \neq k \end{cases} \tag{12a}$$
Total Risk (c( x)) = los (1 | 2) * pmc(2) + los (2 | 1) * pmc(1) (12b)
Equation (12b) has intuitive appeal in that every classifier or prediction algorithm will be associated
with a measurable misclassification rate, which can be quantitated from a test set or by cross-validation.
Moreover, while the loss from a specific type of misclassification may not be knowable in absolute terms
based on clinical circumstances, it may be possible to quantitate it relative to the loss resulting from
misclassification of the opposite type (e.g., los(2|1) = 10*los(1|2)). As a result, evaluation of classifiers is
reduced to identifying the one that produces the lowest value of total risk as described by Equation (12b).
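As a sketch of how Equation (12b) ranks classifiers, the example below compares two hypothetical classifiers under an assumed relative loss in which misclassifying a Class 2 case is ten times as costly as misclassifying a Class 1 case. All rates and loss values are invented for illustration.

```python
def total_risk(loss_1_given_2, pmc_2, loss_2_given_1, pmc_1):
    """Total risk per Equation (12b)."""
    return loss_1_given_2 * pmc_2 + loss_2_given_1 * pmc_1

# Assumed relative losses: los(1|2) = 10, los(2|1) = 1.
# Classifier A: total pmc 5% (pmc(1) = 3%, pmc(2) = 2%).
risk_a = total_risk(10.0, 0.02, 1.0, 0.03)
# Classifier B: total pmc 7% (pmc(1) = 6%, pmc(2) = 1%).
risk_b = total_risk(10.0, 0.01, 1.0, 0.06)
```

Classifier B has the higher total misclassification rate but the lower total risk, because it makes fewer of the costlier errors.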
In Equation (12b) the probabilities of misclassification are dependent on the distribution of the values of x
for Class 1 and Class 2 cases, i.e., on the prior probabilities π1 and π2. It is useful, however, to separate
the concept of a misclassification rate (the likelihood that a Class 1 case will be classified as Class
2, or vice versa) from the likelihood that a case is from Class 1 or from Class 2. The probability of
misclassification can then be thought of as the product of a rate of misclassification (the likelihood
that a Class 1 case will be classified as Class 2) and the likelihood that the case is a Class 1 case, and vice
versa. An alternative representation of total risk (Total Risk2) is then:
Total Risk 2(c( x)) = los (1 | 2) * rate(1 | 2) * π 2 + los (2 | 1) * rate (2 | 1) * π1 (12c)
Using the convention that Class 2 cases have disease while Class 1 cases do not, one can refer to the rate of
misclassification of Class 2 cases as Class 1 (rate(1|2)) as the false-negative rate (FN) and the rate of
misclassification of Class 1 cases as Class 2 (rate(2|1)) as the false-positive rate (FP):
Total Risk 2(c( x)) = los (1 | 2) * FN * π 2 + los (2 | 1) * FP * π1 (12d)
The frequency with which a classifier correctly classifies a Class 2 case (e.g., a case with disease as in
fact belonging to that class) is referred to as the true positive (TP) rate for the classifier. Since a bivariate
classifier must classify a Class 2 case as either true positive or false negative, it follows that Equation
(12d) can be rewritten:
Total Risk 2(c( x ) ) = los(1 | 2 ) ∗ (1 − TP ) ∗ π 2 + los(2 | 1) ∗ FP ∗ π1 (12e)
9.2.3.2.1 Use of Receiver Operating Characteristic (ROC) Curves to Evaluate Prediction Algorithms
(Classifiers) Assuming Unequal Loss for Instances of Misclassification
At this juncture it is worthwhile to review some common terminology used in evaluating clinical
classifiers. Sensitivity and specificity are conditional probabilities that describe the performance of a
diagnostic test that is, in turn, a direct function of the classification rule that is applied to the interpretation
of the value of the test result. Sensitivity is the probability that a patient who has the disease will test
positive for the disease. It can also be referred to as the true positive rate (TP) for the test. Specificity is
the probability that a patient who does not have the disease will test negative for it and is referred to as the
true negative rate (TN) for the test. Sometimes it is convenient to report 1 – Specificity, which is the false
positive rate (FP). That is, 1 – Specificity is the probability that a patient who does NOT have the disease
tests positive. The calculations for these measures are reflected in Table 6 below.
Table 6. Sensitivity and Specificity of a Diagnostic Test

                        Disease
Diagnostic Test     Present    Absent
Positive            a          b
Negative            c          d

$$\text{Sensitivity} = \frac{a}{a + c}; \qquad \text{Specificity} = \frac{d}{b + d}$$
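The calculations in Table 6 amount to two ratios, sketched here with hypothetical cell counts.

```python
def sensitivity_specificity(a, b, c, d):
    """Sensitivity and specificity from the four cells of Table 6."""
    sensitivity = a / (a + c)  # true positive rate among patients with disease
    specificity = d / (b + d)  # true negative rate among patients without disease
    return sensitivity, specificity

# Hypothetical counts: 50 patients with disease, 100 without.
sens, spec = sensitivity_specificity(a=45, b=10, c=5, d=90)
false_positive_rate = 1 - spec
```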
In the example being considered here, the classifier is presumed to produce a “risk score” or index. In the
logistic regression example presented, suppose patients were classified as positive for the diagnosis if the
predicted probability of disease from the resulting model is ≥0.5, and patients are classified as negative
for the diagnosis if the predicted probability of disease from the model is <0.5. (Note that 0.5 is a
commonly used cutpoint but is not necessarily optimal.) Patients can be cross-classified according to their
true underlying disease status (present/absent) and whether the test (i.e., the logistic regression decision
rule) classifies them as positive or negative for the disease. Therefore, sensitivity and specificity may be
calculated for a cutpoint of 0.5. Suppose instead one wanted to use a different cutpoint, say 0.4 or 0.6.
One can likewise calculate sensitivity and specificity for such other chosen cutpoints. Imagine now that
sensitivity and specificity are calculated for all possible cutpoints. A plot of 1 – Specificity (false positive
rate) on the x-axis versus Sensitivity (true positive rate) on the y-axis is referred to as the receiver
operating characteristic (ROC) curve (Figure 14). The area under the ROC curve is a measure of the test’s
ability to discriminate diseased from nondiseased persons.
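The construction described above, in which sensitivity and 1 – Specificity are computed at every possible cutpoint and the area under the resulting curve is then measured, can be sketched as follows. The risk scores and disease labels are hypothetical, and the trapezoidal rule is one common way to compute the area rather than a method specified by this guideline.

```python
def roc_points(scores, has_disease):
    """(FP rate, TP rate) pairs for every cutpoint, from strictest to loosest."""
    n_pos = sum(has_disease)
    n_neg = len(has_disease) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        # A case tests positive when its score is at or above the cutpoint.
        tp = sum(1 for s, d in zip(scores, has_disease) if d and s >= cut)
        fp = sum(1 for s, d in zip(scores, has_disease) if not d and s >= cut)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical predicted probabilities of disease and true status (1 = disease).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
has_disease = [1, 1, 0, 1, 1, 0, 0, 0]
curve = roc_points(scores, has_disease)
area = auc(curve)  # 0.875 for these data
```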
The optimal cutpoint is that which results in the lowest risk to the patients being tested. In cases where
there is equal loss if a patient is misclassified as a false positive vs. a false negative, then it is the point
that produces the lowest misclassification rate and one that best distinguishes diseased patients from the
nondiseased. Alternatively, the cutpoint may be changed to increase the test’s sensitivity or specificity
depending upon the desired use of the test. Different approaches may be used to select a cutpoint that
achieves a desired level of specificity.226 For example, the test result corresponding to the specificity
quantile of interest among patients without the condition can be used; or the decision threshold may be
selected by finding a cutpoint with assurance that the sensitivity (true positive rate) is at a certain level
with a stated amount of confidence. Alternatively, a decision threshold may be selected after considering
the costs associated with each decision. The cost of a false positive and the cost of a false negative
can be used to identify an optimal decision threshold as the point that minimizes the total cost of a
decision.
Use of iso-performance lines227 provides one approach for identifying optimum cut points. From
Equation (12e) it can be shown directly that two points (TP1, FP1) and (TP2, FP2) in ROC space have the
same performance (Total Risk) if:
$$\frac{TP_2 - TP_1}{FP_2 - FP_1} = \frac{\pi_1 \cdot \text{los}(2 \mid 1)}{\pi_2 \cdot \text{los}(1 \mid 2)} \tag{12f}$$
In Figure 14, for example, one might be faced with a situation in which the loss associated with a false
negative (los(1|2); a person with cancer who goes undiagnosed) is considered to be ten times worse than that
from a false positive (los(1|2) = 10*los(2|1)), but where there are generally 20 patients without disease for
each person with disease (π1 = 20*π2). By Equation (12f), this leads to an iso-performance line with a slope
of 20/10 = 2, similar to that seen in Figure 14. The cutpoint that produces the lowest risk, given these values
for relative loss and relative prior probability of disease, is then the value of the risk index
at the point where the iso-performance line
is tangent to the ROC curve of each algorithm.
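The tangent-point idea can be checked numerically: for a set of candidate operating points on an ROC curve, the point with the lowest Total Risk2 from Equation (12e) is the one where the curve's local slope crosses the iso-performance slope of Equation (12f). The sketch below assumes the loss ratio and prior ratio used in the example above and a hypothetical list of ROC points.

```python
def iso_performance_slope(prior_1, prior_2, loss_2_given_1, loss_1_given_2):
    """Slope of an iso-performance line in ROC space, per Equation (12f)."""
    return (prior_1 * loss_2_given_1) / (prior_2 * loss_1_given_2)

def total_risk_2(tp, fp, prior_1, prior_2, loss_2_given_1, loss_1_given_2):
    """Total Risk2 per Equation (12e)."""
    return loss_1_given_2 * (1 - tp) * prior_2 + loss_2_given_1 * fp * prior_1

def lowest_risk_point(roc, **settings):
    """Return the (FP, TP) point on the ROC curve with the lowest Total Risk2."""
    return min(roc, key=lambda point: total_risk_2(point[1], point[0], **settings))

# 20 patients without disease per patient with disease; a false negative costs 10x.
settings = dict(prior_1=20 / 21, prior_2=1 / 21,
                loss_2_given_1=1.0, loss_1_given_2=10.0)
# Hypothetical ROC points as (FP rate, TP rate) pairs.
roc = [(0.0, 0.0), (0.05, 0.4), (0.1, 0.6), (0.2, 0.75), (0.4, 0.9), (1.0, 1.0)]
slope = iso_performance_slope(**settings)
best = lowest_risk_point(roc, **settings)
```

For these invented points the segment slopes on either side of (0.1, 0.6) are 4 and 1.5, which bracket the iso-performance slope of 2, so that point is selected.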
[Figure 14 plots True Positive rate (Sensitivity) on the y-axis against False Positive rate (1 – Specificity) on the x-axis, each from 0 to 1.0, showing ROC curves for Algorithm A and Algorithm B together with an iso-performance line.]
Figure 14. ROC Curves for Two Hypothetical Algorithms
There are other methods for comparing classifiers using ROC curves. Statistical hypothesis tests
comparing whether two ROC curves are the same, comparing whether two ROC curves agree at a
specified false-positive rate, and comparing the area under the ROC curves are available.226 Since ROC
curves may cross, classifier A may be preferable in one region while classifier B may be preferable in
another region, even when the areas under the two ROC curves are the same. Therefore, selection
of an appropriate hypothesis test for comparing ROC curves is essential.
9.2.4 Expression Analysis of Asynchronously Acquired Samples
The paradigm of the molecular diagnostic laboratory is to train an assay, then run test samples
asynchronously. The previous section pointed out that the results from microarrays depend on image
analysis, expression summaries, transformation of the data, normalization, and predictive modeling. In an
asynchronous testing environment, such methods must be applied in a nondata-based fashion. That is,
data-based methods of normalizing and expressing signal intensities, such as robust multichip average
(RMA) and model-based expression indexes (MBEI), which require all microarrays to be preprocessed in
a batch, may not be useful or may need to be modified in the clinical diagnostic setting, as the normalized
expression values are dependent upon the other microarrays included in such calculations. Expression
summaries produced by commercial software, for example, can handle data in an asynchronous testing
environment; however, some probe set summaries generated from commercial software tend to be more
variable than RMA and MBEI expression summaries.228,229 Therefore, in an asynchronous testing
environment, some modification of RMA expression summaries could include selecting a specified set of
chips for developing a standard reference normalization method; then, asynchronously acquired
microarrays would be normalized to that standard. The same specified set of chips would additionally be
used to develop an expression summary model; that model would then be applied to asynchronously
acquired chips. In addition, an implication of asynchronous testing for cDNA microarrays is that reference
designs using the same lot of reference material, rather than the alternative loop designs, are important.
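One way to picture a frozen reference is sketched below. It assumes that quantile normalization is the method being frozen, which is an assumption made for illustration (the text does not name a specific method), and it uses toy intensity values with no ties.

```python
def build_reference_distribution(training_arrays):
    """Average the sorted intensities of the training arrays, rank by rank."""
    sorted_arrays = [sorted(array) for array in training_arrays]
    n = len(sorted_arrays)
    return [sum(values) / n for values in zip(*sorted_arrays)]

def normalize_to_reference(array, reference):
    """Replace each intensity with the frozen reference value of the same rank."""
    order = sorted(range(len(array)), key=lambda i: array[i])
    normalized = [0.0] * len(array)
    for rank, index in enumerate(order):
        normalized[index] = reference[rank]
    return normalized

# The reference is computed once from the specified training set of chips...
reference = build_reference_distribution([[2.0, 4.0, 6.0], [4.0, 8.0, 12.0]])
# ...and each later array is normalized on its own, without reprocessing the batch.
new_array = normalize_to_reference([50.0, 10.0, 30.0], reference)
```

Because the reference never changes, the normalized values for a given array do not depend on which other arrays happen to be run alongside it.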
9.2.5 Information Used to Assess the Diagnostic Utility of the Microarray Gene Expression Test
As indicated in the preceding Sections 8 and 9.2, the end result of a gene expression microarray study in
the context of patient care is likely to be a quantitative statement regarding the likelihood of one or more
clinical outcomes. For example, in circumstances in which a patient’s cancer has been evaluated, possible
outcomes might include a diagnosis regarding the type of cancer present or the likelihood that a patient’s
cancer will be susceptible to a specific regimen of therapy, or a “gene expression stage” of the disease
analogous to “clinical” and “pathologic” stages currently obtained from routine clinical evaluations.
Given the complexity of the analytical process involved in generating such an outcome statement, many
details of the process by which the result was obtained must be specified. This is necessary in order to
enable comparisons between different laboratories. Information that should be collected is summarized in
Table 7. It is unrealistic to believe that all of these items will be listed in the patient’s report that is
provided by the clinical laboratory to the treating physician. However, the clinical laboratory will need to
have a process in place whereby knowledgeable clinicians, patients, or individuals asked to consult in the
case can obtain the details of the process used to arrive at the outcome result from the raw CEL file or
image file data. Establishing these processes and associated standards will be an ongoing activity for the
clinical laboratory employing microarray gene expression testing in its test menu.
Table 7. Specification of a Microarray Gene Expression Test With Diagnostic Utility

Item Description
Preanalytic
1 Type of tissue
2 Date/time specimen obtained
3 Date/time specimen frozen
4 Personnel handling sample
5 Clinical contributor (surgeon/clinician)
6 Patient demographics (age, sex, race, other)
7 Clinical diagnosis(es)
8 Histopathological/clinical laboratory findings
Analytic
1 Method for RNA extraction
2 Quality control variables (260/280 ratio, 18S/28S ratio, 3'/5' ratios, other)
3 Method of microarray analysis (MIAME)
a Specifications of array manufacture
b Probe labeling protocol
c Hybridization protocol
d Washing/staining protocol
e Scanning protocol
f Analyst
Post Analytic
1 Extraction of spot/probe intensity signals
2 Background, scaling & normalization
3 Calculation of expression levels
4 Method for identification of differentially expressed genes
5 Method to independently validate differentially expressed genes
6 Outcome measurements
7 Data set (data warehouse)
8 Prediction algorithm/classifier
9 Method of classifier validation (e.g., cross-validation, test set)
10 Classifier performance (sensitivity, specificity, loss function, prior
probabilities, other)
11 Analytical comparisons of classifier performance
12 Method for inclusion of asynchronously acquired data
10 Quality Control/Quality Assurance

At present four clinical microarray applications are available or in development:
• Comparative genomic hybridization arrays (CGHA) (cytogenetic analysis);
• Gene expression arrays (GEA);
• Single nucleotide polymorphism arrays (SNPA); and
• Sequencing by hybridization arrays (SBHA).
Although there are several thousand reports describing research applications, many of them clinically
relevant, few articles address quality control or quality assurance, and still fewer address clinically
relevant aspects of QC or QA. Section 7.5 gives a concise presentation of QC for CGHA. Section 9 notes
aspects of validation.
Formats differ in the support material (nylon filters, glass slides, chips, microfluidic devices, beads),
probe set, labeling method, imaging hardware, and analysis software. One dichotomous classification of
formats is especially important for QC: the sample and the reference sample are either hybridized
individually to separate arrays (single-color assay), or cohybridized to one array (two-color assay). Until
recently it has been asserted or assumed that the two signals for a given probe element in a two-color
GEA could only be analyzed as a ratio. Using gene-specific oligonucleotide controls and integrating data
from multiple scans at different gains, a method was described for relating ratios to absolute abundance
for GEA.230 Analysis of data for a “long oligo” format GEA shows that each color, analyzed separately,
gives results similar to parallel single-color assays: the labeled sample and reference sample are not
competing for a limiting amount of probe.231 These conclusions, if more widely verified, will markedly
simplify QC of two-color arrays.
Given this variety, detailed prescriptive guidelines are not feasible, but a conceptual framework can be
outlined. It can be anticipated that QA/QC for microarray assays will be broadly similar to QA/QC for
any clinical assay, from defining sample rejection criteria to participating in proficiency testing. Preparing
a protocol following the most current edition of CLSI/NCCLS document GP29—Assessment of
Laboratory Tests When Proficiency Testing is Not Available will bring key points to attention. The most
current editions of CLSI/NCCLS documents MM1—Molecular Diagnostic Methods for Genetic
Diseases; MM5—Nucleic Acid Amplification Assays for Molecular Hematopathology; MM13—
Collection, Transport, Preparation, and Storage of Specimens for Molecular Methods; and MM3—
Molecular Diagnostic Methods for Infectious Diseases give detailed specifications for numerous QC/QA
issues common to all molecular testing, such as storage of nucleic acids, sample preparation and testing
for infectious diseases. This section will emphasize aspects of particular concern to array analysis.
The multiplex nature of this testing raises QA issues which cannot be answered here:
• If the number of genes of interest is small, are microarrays efficient?
• If the number of genes of interest is modest, but the array en passant has thousands of probes, does
this present a potential privacy concern?
• What specific country regulations and/or requirements will apply to multiplex tests on this scale?
Microarray analysis also involves devices as well as reagents. Guidelines are evolving rapidly, but are
still in development.232
• For GEA, how should quality control limits for signal strength (of control samples) at a single
specific feature be adjusted for the fact that one might be monitoring the signal at thousands of
features?
For example, to assess only trisomy 21, FISH is faster and more economical. More focused arrays might
be less expensive, but whole genome surveys (CGHA, SNPA, GEA) generate valuable databases for
clinical investigation and permit reanalysis of prior samples with newer algorithms.
10.1 Preanalytical Considerations
10.1.1 Patient Heterogeneity (Biological Variability)
This is most problematic for gene expression studies — potential confounders include age, medication,
time of day, diet including preoperative fasting, anesthesia, and procedure duration. During validation one
should address the likely variables anticipated, such as whether specimens will be obtained by
endoscopic/laparoscopic biopsy or by surgical resection. For DNA sequence and cytogenetic variation,
the significance of a finding must be interpreted, as is customary for genetic testing, with attention to
family history, variable expressivity, penetrance, and ethnic variation; patient information will often not
be available to the laboratory.
• The range of acceptable samples should be specified in the protocol.
• Refer to the current edition of CLSI/NCCLS document MM1 — Molecular Diagnostic Methods for
Genetic Disease for recommendations:
-Family history and clinical data
-Informed consent
10.1.2 Sample Heterogeneity (Biological Variability)
Consider arrays (GEA, CGHA, SBH, possibly SNPA) used to classify breast cancer. The proportions of
the specimen represented by tumor and by the normal component (adipose tissue, scar tissue,
inflammatory cell infiltrate) vary widely. The laboratory must specify and verify acceptable sample
parameters, even if it requires a low-tech, semiquantitative method such as estimating the proportion of
tumor in representative histological sections. Laser capture microdissection for tissue and flow sorting for
cells (followed by DNA or RNA amplification) permit better control; however, they should not be used
for GEA unless comparable preparations were used in verification of the “gene expression signature.”
• The protocol should specify the method of determining heterogeneity and the acceptable level.
• Cell selection methods must be verified for gene expression arrays.
• Amplification methods must be verified, regardless of the application.
• For CGHA and SBHA analysis of possible somatic mutations, consider parallel analysis of genomic
DNA from a nonpathologic sample from the same subject.
10.1.3 Specimen Collection (Technical Variability)
Even postmortem tissues can be used for gene expression analysis by RT-PCR. If degradation is not too
extensive, very limited data suggest that array profiles are generally similar to profiles for fresh tissue,
but this has not been tested with an actual gene expression signature on clinical samples.233,234 Some
collection systems stabilize mRNA from blood or tissue prior to purification. Genomic DNA is
significantly more stable than mRNA.
• Verify the gene expression signature for any new RNA stabilization system.
• Establish the allowable time from collection to processing for accepting specimens.
• Refer to the current edition of CLSI/NCCLS document MM1 — Molecular Diagnostic Methods for
Genetic Disease for recommendations:
-Specimen identification
-Requisition form information
-Accessioning specimens
-Specimen transport and storage
-Specimen retention
10.2 Analytical Phases
10.2.1 DNA/RNA Preparation (Technical Variability)
Some protocols prepare DNA/RNA from whole blood; others prepare it from cell preparations.
Differences in expression profile are to be expected. Mononuclear cell preps typically exclude platelets,
neutrophils, apoptotic cells and cell-free DNA/RNA. RNA preparations can contain DNA; similarly DNA
preps can contain RNA.
• Specify the acceptable method(s) for sample prep determined during validation.
• Indicate which (if any) method is used to remove DNA (RNA).
• Indicate which method is used to assess DNA (or RNA) contamination.
Amplification methods enable the use of samples with RNA or DNA concentrations initially too low to
measure. Several studies have demonstrated that amplification of cDNA or of genomic DNA preserves
most, but not all, of the quantitative differences initially present.235-239 For validation it is not sufficient to
amplify and assay several aliquots of RNA/DNA diluted from a single, more concentrated preparation.
Needle biopsies, for example, sample such a small number of cells that (pseudo)clonality can distort the
results. It would be more appropriate to assay several needle biopsies from one lesion and compare the
results to the analysis of a much larger sample.
10.2.2 DNA/RNA Quantitation
Absorption at 260 nm and 280 nm gives information on the concentration and purity of RNA/DNA preps.
Quantification by dye binding methods works for smaller sample volumes and more dilute samples. The
two methods will NOT necessarily give similar results. UV absorption spectrophotometry does NOT
distinguish between DNA and RNA, nor between intact DNA/RNA and individual nucleotides. Dyes aim
to distinguish between DNA and RNA and do not measure purity or integrity.
• The spectrophotometer (or plate-reader for dye binding) should have an established program and
schedule for both calibration and maintenance.
• Controls of known concentration should be tested in every run.
Even though quantification by UV absorption is based solely on the measurements for a given sample
(i.e., not from a calibration curve), control samples (rarely used in research laboratories) verify that the
instrument is performing acceptably.
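A short sketch of UV quantitation follows. It relies on the widely used conversion factors for a 1 cm path length (an A260 of 1.0 corresponds to about 50 µg/mL of double-stranded DNA or about 40 µg/mL of RNA); these factors, the function names, and the example readings are assumptions for illustration and are not taken from this guideline.

```python
# Conventional conversion factors for a 1 cm path length (assumed, see lead-in).
UG_PER_ML_PER_A260 = {"dsDNA": 50.0, "RNA": 40.0}

def concentration_ug_per_ml(a260, nucleic_acid, dilution_factor=1.0):
    """Estimate concentration from absorbance at 260 nm."""
    return a260 * UG_PER_ML_PER_A260[nucleic_acid] * dilution_factor

def purity_ratio(a260, a280):
    """260/280 ratio, used as a rough indicator of protein contamination."""
    return a260 / a280

# Hypothetical reading of an RNA prep diluted 1:10 before measurement.
conc = concentration_ug_per_ml(0.5, "RNA", dilution_factor=10)
ratio = purity_ratio(0.5, 0.25)
```

Running a control of known concentration through the same calculation in every run gives a simple check that the instrument is performing acceptably.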
For additional information refer to the current edition of CLSI/NCCLS document MM1 — Molecular
Diagnostic Methods for Genetic Diseases.
10.2.3 DNA/RNA Integrity
RNA and DNA integrity is judged by the size distribution on gel electrophoresis. For analysis of samples
with limited yield there are miniaturized systems, often available in core facilities. For whole blood, the
purification method will be sufficiently standardized, so that it might not be necessary to check
DNA/RNA integrity, except in the event of a failed assay. In some applications a modest level of
degradation, especially for DNA, does not preclude analysis.
• Specify when integrity must be checked.
• Specify the method(s) for determining sample integrity.
• Specify the parameter(s) for rejecting/accepting a sample, for example:
- ratio of RNA peaks in a gel analysis; and
- size distribution of DNA.
• Amplified cDNA/gDNA should have a separate set of specifications.
10.2.4 Test Reagent Quality Control Program
For a summary of good laboratory practices, refer to the current edition of CLSI/NCCLS document MM1
— Molecular Diagnostic Methods for Genetic Disease for recommendations.
10.2.5 Target Labeling
Most methods use fluorescent labels (see Section 5). For gene expression arrays, SBH, CGH arrays, and
some SNP detection formats, labeling of the target occurs prior to hybridization (“off-array”). In this
format, labeled target should be purified away from unincorporated label, to minimize background
fluorescence on the array. Several limited studies have compared different labeling methods;
postsynthesis incorporation of fluor generally gives more uniform and reproducible results than
incorporation during probe synthesis, but numerous methods have been used successfully.240,241
• Purify target from unincorporated label.
• Measure the extent of labeling.
• Establish an acceptable range for incorporation.
Although normalization and scaling of array data can compensate for poor labeling, postarray analysis
should not do all the work.242
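One way to measure the extent of labeling for a fluorescent target is to estimate, from absorbance readings, the number of nucleotides per incorporated dye molecule. The sketch below assumes a dye with a known molar extinction coefficient. The value of 150 000 L/(mol·cm) in the example is the figure commonly cited for Cy3 and is an assumption here, since this guideline does not name a dye. The average nucleotide mass of 324.5 pg/pmol and the acceptance range of 20 to 60 nucleotides per dye are likewise assumptions; each laboratory would substitute the values established during its own validation.

```python
AVG_NUCLEOTIDE_MASS_PG_PER_PMOL = 324.5  # approximate average for DNA (assumption)


def dye_pmol(absorbance, extinction_coeff, volume_ul, path_cm=1.0):
    """Picomoles of dye in the sample, from the Beer-Lambert law."""
    molar = absorbance / (extinction_coeff * path_cm)  # mol/L
    return molar * volume_ul * 1e6  # mol/L x uL -> pmol


def nucleotides_per_dye(target_ng, dye_picomoles):
    """Average number of nucleotides per incorporated dye molecule."""
    nucleotide_pmol = target_ng * 1000.0 / AVG_NUCLEOTIDE_MASS_PG_PER_PMOL
    return nucleotide_pmol / dye_picomoles


def labeling_acceptable(ratio, low=20.0, high=60.0):
    """Hypothetical acceptance range; set the real limits during validation."""
    return low <= ratio <= high


if __name__ == "__main__":
    pmol = dye_pmol(absorbance=0.15, extinction_coeff=150_000, volume_ul=50)
    ratio = nucleotides_per_dye(target_ng=811.25, dye_picomoles=pmol)
    print(f"{pmol:.1f} pmol dye, {ratio:.1f} nucleotides per dye, "
          f"acceptable: {labeling_acceptable(ratio)}")
```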
In one SNP array format, Sequencing by Primer Extension, the labeling reactions occur “on-array”
following hybridization of the targets. A suitable PCR product binds to the probe and is extended by a
single labeled dideoxynucleotide. Reproducibility studies should establish, for a given input of DNA, the
acceptable range for the absolute and relative levels of the expected signals (homozygous wild type,
heterozygous, null) for each array element. It should be possible to distinguish actual homozygosity from
apparent homozygosity due to loss of one allele (deletion/mutation preventing primer or product binding).
Because of sequence-dependent differences in hybridization and extension efficiency, the level of signals
established will differ for each SNP.
10.2.6 Array Format
All arrays, as well as reagents for labeling and hybridization, should be manufactured in compliance with
Good Manufacturing Practice as detailed in the second portion of this section.
Probe Identity (Does the probe have the intended sequence?). For high-density gene expression 'chips',
manufacturers use complex algorithms to control the sequences synthesized. Other array formats use long
oligos (40 to 70 nucleotides), PCR-amplified cDNA, or genomic clones. Amplification, processing, and
tracking thousands of cDNA or genomic DNA clones is error-prone. Computer-directed oligo synthesis
can also make small- and large-scale errors.
• Documentation of the extent and nature of postproduction verification should be available from the
manufacturer.
Since probe design is often proprietary, documentation of the actual sequences might not be available.
The probe choice in GEA affects detection of isoforms and homologues but, independent of these
concerns, the verification process should have established performance of the GEA for a specific test
application. The nature (oligonucleotide, BAC) and density of probe coverage used in CGH will affect
resolution and signal strength, but these general parameters should be available from the manufacturer.
Probe Existence (Is the probe actually on the array?). For SNP analysis, SBH and CGH the "correct" or
"wild-type" signal is known for each array element, so controls (reference samples) monitor probe
existence in every run. For gene expression arrays, a weak or absent signal could be due to low expression
of the gene or to absence of the probe; hybridization of the typical reference sample is unlikely to
generate signals for every gene on large arrays. Reversible staining with sequence-independent dyes can
demonstrate the presence and measure the concentration of all probes for slide-based arrays, reportedly
without affecting subsequent performance. Alternatively, arrays can include a “nonsense” sequence in
each probe, which is detected with a corresponding oligo labeled with a third color. This also provides
quality control for uniformity of hybridization, but requires a suitable scanner.243,244
Probe Replication. Replication is desirable to control for variation in probe concentration, and for
uniformity of hybridization, washing, and imaging across an array. Cohybridization formats minimize the
impact of these variations, since both the sample and control/reference will be affected comparably at
each probe, but if both give a weak signal the CV for the ratio could increase to an unacceptable level.
Array formats which use multiple distinct oligos for each target provide replication, but do not necessarily
control for spatial variation.
• There is no simple rule to determine the necessary number of replicates.
• Recent CGHA studies have used triplicate probes successfully.238
• SBH should analyze both strands.
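The effect of weak signals on the CV of a cohybridization ratio can be illustrated with a small calculation. The sketch below uses made-up intensities for three replicate probes and a hypothetical helper, `ratio_cv`; it is meant only to show why a ratio of two weak signals is less reliable, and is not a prescribed acceptance test.

```python
from statistics import mean, stdev


def ratio_cv(sample_signals, reference_signals):
    """Coefficient of variation of sample/reference ratios across replicate probes."""
    ratios = [s / r for s, r in zip(sample_signals, reference_signals)]
    return stdev(ratios) / mean(ratios)


if __name__ == "__main__":
    # Strong signals: a few percent of noise barely moves the ratio.
    strong = ratio_cv([1000, 1050, 950], [500, 520, 480])
    # Weak signals: the same absolute noise dominates the ratio.
    weak = ratio_cv([40, 65, 25], [20, 22, 21])
    print(f"CV of ratio, strong signals: {strong:.1%}")
    print(f"CV of ratio, weak signals:   {weak:.1%}")
```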
Lot-to-Lot Variation. As for any assay, every new lot of every component must be tested for
performance. Although an array represents an unusually high proportion of the reagent cost, at least one
array of each lot should be checked with a validated reference/control sample. Since printing
reproducibility can vary from the beginning to the end of a print run for spotted arrays, consider
validating the first and last arrays if that information is available. If further experience confirms that each
color in a two-color assay can be analyzed independently, the use of a standard control/reference sample
will allow cross-over validation.
10.2.7 Hybridization
Details of the procedure have been published.245 The protocol(s) should specify several parameters for
each buffer/solution (prehybridization, hybridization, wash):
• composition (including accelerants, concentration of label for hybridization solution);
• volume;
• temperature;
• concentration; and
• order of processing (if multiple arrays are hybridized simultaneously).
The temperature should be monitored for each assay. Although serial hybridization at different levels of
stringency can provide useful information, it is unlikely that this labor-intensive procedure will be
practical in a clinical laboratory.245,246
10.2.8 Signal Detection
Sections 5, 6, 7, and 8 describe details of instrumentation and analysis. Briefly, scanners have three
essential components: light sources, detectors (CCD, photomultiplier tube [PMT], confocal microscope),
and motorized stages (or optics), each of which can subtly or strongly influence results.247,248
Light Production. The light source should be monitored, if possible, for temperature and power output.
Many scanners provide real-time monitoring and a self-adjusting mechanism to maintain stable output.
Monitoring should be part of the QC for every run.
Detection. For PMTs, sensitivity is in part a function of the “gain”; this is usually a user setting and should
be specified and recorded for each run. Repeated scanning can lead to loss of signal
through photobleaching; the number of acceptable rescans should be established during verification.
Calibration. During validation of the instrument one should:
• Determine background. This should be validated before every scan.
• Determine the linear range of response for each fluor.
• If possible, validate color compensation/cross-talk for two-color assays.
• Validate the spatial uniformity of detection.
• Specify the frequency with which this calibration should be repeated.
• Specify acceptable limits for the results of routine calibration.
Commercially available slides provide spots of graded intensity for a given fluor.249 Replicate spots on the
array assess uniformity of detection over the array for that assay. Blank “spots” provide a measure of
instrument background. Such an array will underestimate background from an actual hybridization but
provides a lower bound.
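One possible way to estimate the linear range from such a graded-intensity slide is sketched below. It assumes background-subtracted signals and known relative fluor amounts for each spot, and treats the response as linear wherever the signal per unit amount stays within a tolerance of the median value. The 10% tolerance, the function name, and the example values are assumptions for illustration only.

```python
from statistics import median


def linear_range(amounts, signals, tolerance=0.10):
    """Return (lowest, highest) amount of the longest contiguous run of spots whose
    response factor (signal / amount) lies within `tolerance` of the median factor.
    Returns None if no spot qualifies."""
    factors = [s / a for a, s in zip(amounts, signals)]
    ref = median(factors)
    in_range = [abs(f - ref) <= tolerance * ref for f in factors]

    best = None   # (start_index, end_index) of the longest run found so far
    start = None  # start index of the run currently being scanned
    for i, ok in enumerate(in_range + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    if best is None:
        return None
    return amounts[best[0]], amounts[best[1]]


if __name__ == "__main__":
    amounts = [1, 2, 4, 8, 16, 32, 64]                # relative fluor per spot
    signals = [130, 205, 400, 810, 1590, 3150, 4200]  # background-subtracted
    print("Linear range (relative amount):", linear_range(amounts, signals))
```

In this made-up series the lowest spot reads high (residual background) and the highest spot reads low (approaching saturation), so both fall outside the reported range.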
10.2.9 Image Analysis
Capable freeware programs are available, but image analysis software is usually bundled with scanners.
There is no universally accepted set of algorithms. It is important that the user be aware of which features
are implemented in the software, rather than simply accepting a list of present/absent calls. Software
should provide several features, including:120
• User inspection of the image for alignment, scratches, dust, or other artifacts.
• Quality metrics for the image of each probe, which could include:
- spot size (number of pixels with signal above background);
- signal intensity relative to local background;
- absolute signal intensity (compare to limit of detection); and
- variance of signal intensity among probe replicates.
• Quality metrics for the entire array, which could include:
- negative control probes distributed across the array;
- uniformity (or variance) in background across the array;
- the number of flagged probes;
- the number of saturated pixels;
- ratio of average foreground intensities to average background intensities;
- range of absolute level of signal for each channel; and
- total variance for signals across the array.
• Open format for data (and ability to save raw data).
Automated image analysis can make major errors (for example, grid misalignment) or subtle errors (for
example, biases arising from the image segmentation method).250,251 A given intensity value for a probe
could arise from a uniformly fluorescent spot or a blotchy spot. Quality scores use additional information
from the distribution of signal over the pixels of a probe to flag spots with potential problems. One simple
quality score is the signal-to-noise ratio, which is usually desired to be greater than three.
SNR = ([Average signal intensity] - [Average local background intensity]) / (Standard deviation of local background intensity)
An example of a more sophisticated quality score is ABACUS, which is described for SBH arrays.239 An
open data format enhances the prospect that future retrieval and analysis of archived raw data will NOT
require the original software.
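The signal-to-noise score can be computed directly from the pixel values of a spot and of its local background. The sketch below is a minimal illustration that flags spots whose ratio does not exceed three; the function names, the use of the sample standard deviation, and the pixel values are assumptions, and commercial software may define foreground and background regions differently.

```python
from statistics import mean, stdev


def signal_to_noise(spot_pixels, background_pixels):
    """(mean signal - mean local background) / SD of local background."""
    return (mean(spot_pixels) - mean(background_pixels)) / stdev(background_pixels)


def flag_low_snr(spots, threshold=3.0):
    """Return the names of spots whose signal-to-noise ratio is not above threshold."""
    return [
        name
        for name, (foreground, background) in spots.items()
        if signal_to_noise(foreground, background) <= threshold
    ]


if __name__ == "__main__":
    spots = {
        "probe_A": ([500, 520, 480, 500], [100, 110, 90, 100]),  # bright spot
        "probe_B": ([120, 130, 110, 120], [100, 120, 80, 100]),  # barely above background
    }
    print("Flagged spots:", flag_low_snr(spots))
```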
10.2.10 Data Analysis
Data analysis by a variety of methods has provided accurate results for SBHA, SNPA, and
CGHA.236,238,239,252 For GEA, data analysis is more problematic. The challenge begins with the nature of
DNA compared to mRNA. The DNA genome of a tissue is stable over time and conditions compared to
the levels of mRNA for all the genes. Furthermore, the complexity of the genome (basically two copies
of every portion) is much lower than the complexity of mRNA, for which levels vary over six orders of
magnitude. Alternative splicing adds an additional layer of complexity. To address these challenges,
many different array formats and data analysis programs have been developed, which, so far, are not easily
compared and often give discrepant results.
A popular format uses sets of multiple perfectly matched and mismatched oligonucleotide probes. The
manufacturer’s recommended data analysis program uses the information from both probe sets, but other
programs demonstrate superior performance using information from only the perfect-match
probes.211,253,254 Gene expression signatures developed using one format do not readily translate for
laboratories using a different format, even for an index such as counting the number of genes in the set
which show increased (or decreased) expression relative to a control. Comparative experiments looking
only at genes common to different formats, or at a single format in different laboratories, give results
which can be viewed as cause for optimism or despair.209,255-261 Reanalysis of data from a single
laboratory by another laboratory can show dramatic changes in the derived gene expression
signature.262,263 GEA analysis of tumors might also eventually require correlation with CGHA analysis of
the same tumors.264
10.3 Global QC Issues
Validation must assess repeatability, sensitivity, and reference ranges.
10.3.1 Repeatability
Given the cost of reagents and the labor-intensive nature of the assay, the number of repeated assays will
be smaller than for inexpensive automated clinical assays, BUT the laboratory must still demonstrate
intrarun and inter-run repeatability over the course of at least several days. Repeatability must test all
phases, including multiple independent DNA/RNA extractions from a given sample, labeling,
hybridization, imaging, and analysis. Assaying aliquots of a single purified RNA/DNA sample or of a
single labeled preparation would NOT be sufficient. Repeatability can be interpreted as clinical
repeatability, for example a normal karyotype. It can also be interpreted as analytical repeatability (see
Section 10.3.4.1).
10.3.2 Analytical Sensitivity
For SNPA and SBHA, analytical sensitivity is usually not an issue. For a standardized amount of input
DNA, an acceptable range for signal intensity should be established. If the laboratory analyzes cases in
which somatic mosaicism is a possibility, sensitivity must be determined. Mixing experiments are a
straightforward approach. For gene expression analysis, analytical sensitivity would most usefully be
interpreted as the level of sample heterogeneity which still permits correct classification of the sample.
For CGHA, the ability to detect small differences in copy number over small regions is a challenge
requiring complex computational as well as analytical input (see Section 7.5). Aneuploidy could also
affect SNPA analysis.
10.3.3 Reference Ranges
Databases describing the “normal” as well as numerous variants are available for genomic DNA sequence
and cytogenetics. The significance of new sequence variants or of previously cryptic chromosomal
alterations usually cannot be guessed. The report should indicate the level of knowledge.
For gene expression arrays, reference ranges will have to be established by analyzing samples for each
category to be identified. For a frequently proposed application such as prognostic classification of
tumors, verification would require many samples of fresh (frozen) tissue from patients with known
prognosis. The “gene expression signature” is an arithmetic or algebraic combination of gene expression
values (tens to hundreds of genes), which classifies a sample into one of two or more categories (e.g.,
benign or malignant). The transferability of such a “signature” among laboratories, even using the same
format, remains to be shown. The comparison of similar studies using different array formats (or even the
same format) is not encouraging.261 It will be essential for a laboratory introducing an assay to verify the
diagnostic validity of such gene expression signature(s). Typically, a gene expression signature is
developed after analyzing several hundred samples. The number of samples necessary to validate such an
assay (as opposed to “discovering” it) is an uncharted area of statistical quality control.
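To make the idea of a signature as an "arithmetic combination" concrete, the sketch below scores a sample as a weighted sum of expression values and assigns a category by comparing the score with a cutoff. The gene names, weights, cutoff, and category labels are entirely hypothetical; a real signature and its decision rule would come from the discovery study and would have to be verified as described above.

```python
def signature_score(expression, weights):
    """Weighted sum of expression values over the signature genes.
    Raises KeyError if the sample lacks a value for any signature gene."""
    return sum(weight * expression[gene] for gene, weight in weights.items())


def classify(expression, weights, cutoff=0.0):
    """Assign one of two categories by comparing the score with a cutoff."""
    return "malignant" if signature_score(expression, weights) > cutoff else "benign"


if __name__ == "__main__":
    # Hypothetical weights for a three-gene signature (log-ratio inputs).
    weights = {"GENE_A": 1.0, "GENE_B": -0.5, "GENE_C": 2.0}
    sample_1 = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 0.5}
    sample_2 = {"GENE_A": -1.0, "GENE_B": 1.0, "GENE_C": 0.0}
    for name, sample in (("sample_1", sample_1), ("sample_2", sample_2)):
        print(name, signature_score(sample, weights), classify(sample, weights))
```

Raising an error when a signature gene is missing is a deliberate choice in this sketch: a silently omitted gene would change the score without any visible warning.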
10.3.4 Controls
10.3.4.1 General
In clinical assays, every run typically contains “positive” and “negative” controls. Alternatively, one can
formulate this as a requirement to have controls at medically critical decision points. Many arrays assay
thousands of genes simultaneously; it is not possible to have control samples for each gene/feature in each
assay. For SNP, SBH, and CGHA “normal” genomic DNA controls are usually available. If there are
common clinically significant mutations or cytogenetic changes, corresponding samples should be used as
controls on a rotating basis in addition to the use of the normal control in every run.
At least one control (reference) sample will be repeatedly assayed on multiple arrays in each run for
several runs, performed over several days, as part of repeatability studies during validation. This data will
give a measure of variation within-array (probe replicates), between-array (array replicates), and between-
day. All replicates must use the same array and reagent lots. From this data one can calculate control
limits for signal variation at each array element, but the choice of optimal control rules is unexplored. The
first impulse of the clinical analyst is to apply Shewhart-style control rules. The use of two S.D. or three
S.D. control limits for each probe in a multiplex assay involving thousands of elements will result in a
large number of false rejections. The use of widely spaced replicate probes will reduce loss of information
when one probe is “out-of-control.” Elimination of data for one gene is unlikely to invalidate the run, but
the optimal QC rules remain to be determined.
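The scale of the false-rejection problem can be estimated with a short calculation. The sketch below assumes that the signals at each array element are independent and normally distributed, which is an idealization, and uses hypothetical function names. It computes the expected number of in-control elements that fall outside k S.D. limits by chance alone, and the probability that at least one does.

```python
import math


def two_sided_tail(k):
    """Probability that a normal value lies more than k S.D. from its mean."""
    return 1.0 - math.erf(k / math.sqrt(2.0))


def expected_false_flags(n_elements, k):
    """Expected number of in-control elements outside +/- k S.D. limits."""
    return n_elements * two_sided_tail(k)


def prob_any_false_flag(n_elements, k):
    """Probability that at least one in-control element is outside the limits."""
    return 1.0 - (1.0 - two_sided_tail(k)) ** n_elements


if __name__ == "__main__":
    for k in (2, 3):
        print(f"k = {k}: expected false flags on 10 000 elements = "
              f"{expected_false_flags(10_000, k):.0f}, "
              f"P(at least one) = {prob_any_false_flag(10_000, k):.4f}")
```

Under these assumptions an array with 10 000 elements would show roughly 455 chance exceedances at two S.D. and about 27 at three S.D., so a rule that rejects a run on any single exceedance would reject essentially every run.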
10.3.4.2 Controls for Gene Expression Arrays
For gene expression arrays the selection of control samples is more challenging. For example, if the assay
goal is to stratify survival in breast cancer, one might consider using samples from patients with long-
term (or short-term) survival as controls, but it is unusual for any one case to show all the genes of
interest appropriately up- or down-regulated or even present. If neither the control sample nor the patient
sample generates a signal for gene Y, there is no way to know if the corresponding probes performed
properly. Pools of breast cancer samples or cell lines can perform well as reference samples, but require
manipulation to ensure pools consistently express all the genes of interest (GOI).265 Assuming clinical
arrays will include a modest number of probes (tens or hundreds), the most useful reference sample would
be a pool of cRNA for each GOI, perhaps diluted with a complex nonmammalian RNA pool to mimic the
experimental samples. For two-color (cohybridization) formats the reference sample corrects for variation
in hybridization and imaging. If the signal for the reference sample is extracted alone, the data provides
an ongoing (inter-run) measure of quality control for each element.
Gene expression arrays can employ two additional types of controls:
Endogenous Controls. “Housekeeping” genes were once thought to be expressed at a uniform level in all
cells and conditions. The genes commonly proposed for this role can serve as a rough measure of
the adequacy of the entire assay. “Reference ranges” should be established for several such genes,
selecting ones at high, intermediate, and low levels of expression. The intended use is for QC, NOT for
normalization of signals.
Exogenous Controls. The RNA sample can be spiked with cRNAs for several genes NOT found in human
samples, typically bacterial or plant sequences. There must be corresponding probes on the array. These
spiked RNAs undergo reverse transcription, labeling, and hybridization in parallel with the human
sample, providing a control for each of these steps.266
10.4 Reporting
The report should indicate limitations of the method. For example, CGH array results could note that this
method does not (at present) detect inversions or translocations. Many chromosomal alterations now
being uncovered do not yet have pathological correlations; the report should reflect the current level of
knowledge for each abnormality. Reports for SNP and SBH arrays should indicate if the method can
distinguish actual homozygosity from real or apparent loss of one allele (the latter due to additional
mutations affecting PCR or hybridization). Reporting for CGHA, SNPA, and SBHA should largely
follow reporting for nonarray methods. CLSI/NCCLS document MM1 — Molecular Diagnostic Methods
for Genetic Diseases provides a useful checklist of report elements.
10.5 Quality Assurance (QA)
10.5.1 Confirmatory Testing
Depending on the clinical significance of a finding, it might be desirable to confirm the result. There are
established alternative methods for confirmation of some results from each of the array applications
(cytogenetics, FISH, traditional sequencing, alternative SNP detection formats, real-time RT-PCR). Given
the potential range of sequence and cytogenetic abnormalities, it is not possible to provide confirmatory
testing for all abnormal results. For example, CGHA could be used to screen prenatal specimens for
numerous abnormalities, but FISH could be used to confirm specific significant findings. Reextraction of
DNA or RNA from the original sample, where possible, for confirmatory testing, might be appropriate in
some instances to ensure correct linkage of specimen to result.
• The laboratory protocol should specify which results, if any, require confirmatory testing, either by
repeat analysis or by an independent method.
• The laboratory should have a plan and schedule for confirmatory testing of at least some samples as
part of QA, including criteria for selection. In addition, review of extant follow-up testing and clinical
outcome can be important components of a QA program.
10.5.2 Performance Improvement Collaboration
For proficiency testing, two cycles per year is common for unregulated analytes. The number of samples in each cycle is at the
discretion of the laboratory. Proficiency tests offered by proficiency testing providers might be suitable
for some assays (for example, SNP detection for Factor V Leiden). At present, it is most likely that
arrangements will have to be made with other laboratories performing similar assays to exchange masked
samples. As a last choice, one should retest the laboratory’s own samples in a blinded fashion. With any
system, the plan for proficiency testing should be explicitly documented, including the details of
arranging for blinded samples, the criteria required for passing, and the plan for corrective action if
necessary. For a complete description of the elements of a good proficiency program, refer to the most
current edition of the CLSI/NCCLS document MM14 — Proficiency Testing for Molecular Methods.
References
1
Grody WW. Ethical issues raised by genetic testing with oligonucleotide microarrays. Mol Biotechnol. 2003;23(2):127-138.
2
ISO. International Vocabulary of Basic and General Terms in Metrology. Geneva: International Organization for Standardization; 1993.
3
ISO. In Vitro Diagnostic Medical Devices – Measurement of Quantities in Biological Samples – Metrological Traceability of Values
Assigned to Calibrators and Control Materials. ISO/FDIS 17511. Geneva: International Organization for Standardization; 2002.
4
ISO. In Vitro Diagnostic Medical Devices – Measurement of Quantities in Biological Samples – Metrological Traceability of Values
Assigned to Catalytic Concentration of Enzymes in Calibrators and Control Materials. ISO/FDIS 18153. Geneva: International
Organization for Standardization; 2002.
5
ISO. Statistics-Vocabulary and Symbols – Part 1: Probability and General Statistical Terms. ISO 3534-1. Geneva: International
Organization for Standardization; 1993.
6
42 Federal Register. February 28, 1992. Rules and Regulations; Part 493, Subpart K, Subsection 493.1218 (b) (2): 7166.
7
ISO. Quality Management Systems – Fundamentals and Vocabulary. ISO 9000. Geneva: International Organization for Standardization;
2000.
8
WHO. Expert Committee on Biological Standardization. Glossary of Terms for Biological Substances Used for Texts of the
Requirements. WHO unpublished document BS/95.1793. Geneva: World Health Organization; 1995.
9
Beaucage SL. Strategies in the preparation of DNA oligonucleotide arrays for diagnostic applications. Curr Med Chem.
2001;8:1213-1244.
10
Blanchard A. Synthetic DNA arrays. Genet Eng (NY). 1998;20:111-123.
11
Cheung VG, Morley M, Aguilar F, Massimi A, Kucherlapati R, Childs G. Making and reading microarrays. Nat Genet. 1999; 21: 15-19.
12
Fortina P, Graves D, Stoeckert, Jr C, McKenzie S, Surrey S. Technology options and applications of microarrays. In: Cheng J, Kricka LJ,
eds. Biochip Technology. Philadelphia: Harwood Academic Publishers; 2001:185-215.
13
Heller MJ. DNA microarray technology: Devices, systems, and applications. Annu Rev Biomed Eng. 2002;4:129-153.
14
Holloway AJ, van Laar RK, Tothill RW, Bowtell DD. Options available--from start to finish--for obtaining data from DNA
microarrays II. Nat Genet. 2002;32 Suppl:481-489.
15
Jordan B (ed). DNA microarrays: Gene expression applications. Berlin: Springer; 2001.
16
Jordan B. Historical background and anticipated developments. Ann NY Acad Sci. 2002;975:24-32.
17
Kricka LJ, Fortina P. Microarrays technology and applications: An all-language literature survey, including books, and patents. Clin
Chem. 2001;47:1479-1482.
18
Nguyen DV, Arpat AB, Wang N, Carroll RJ. DNA microarray experiments: Biological and technological aspects. Biometrics.
2002;58:701-717.
19
Rampal JB (ed). DNA arrays. Totowa, NJ: Humana Press; 2001.
20
Schena M (ed). DNA microarrays. Oxford, United Kingdom: Oxford University Press; 1999.
21
Schena M (ed). Microarray biochip technology. Natick, MA: Eaton Publishing; 2000.
22
Wilson DS, Nock S. Recent developments in protein microarray technology. Angew Chem Intl. 2003;42:494-500.
23
Lennon GG, Lehrach H. Hybridization analyses of arrayed cDNA libraries. Trends Genet. 1991;7: 314-317.
24
Pietu G, Alibert O, Guichard V, et al. Novel gene transcripts preferentially expressed in human muscles revealed by quantitative
hybridization of a high density cDNA array. Genome Res. 1996;6:492-503.
25
Fodor SP, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D. Light-directed, spatially addressable parallel chemical synthesis. Science.
1991;251:767-773.
26
Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SP. Light-generated oligonucleotide arrays for rapid DNA sequence
analysis. Proc Natl Acad Sci USA. 1994;91:5022-5026.
27
Matson RS, Rampal J, Pentoney SL Jr, Anderson PD, Coassin P. Biopolymer synthesis on polypropylene supports: oligonucleotide arrays.
Anal Biochem. 1995;224:110-116.
28
Guschin D, Drobishev A, Dubiley S, Mirzabekov A. DNA analysis and diagnostics on oligonucleotide microchips. Proc Natl Acad Sci
USA. 1996;93:4913-4918.
29
Guschin D, Yershov G, Zaslavsky A, et al. Manual manufacturing of oligonucleotide, DNA, and protein microchips. Anal Biochem.
1997a;250:203-211.
30
Guschin DY, Mobarry BK, Proudnikov D, Stahl DA, Rittmann BE, Mirzabekov AD. Oligonucleotide microchips as genosensors for
determinative and environmental studies in microbiology. Appl Environ Microbiol. 1997b;63:2397-2402.
31
Heller MJ, Forster AH, Tu E. Active microelectronic chip devices which utilize controlled electrophoretic fields for multiplex DNA
hybridization and other genomic applications. Electrophoresis. 2000;21:157-164.
32
Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray.
Science. 1995;270:467-470.
33
Stillman BA, Tonkinson JL. FAST slides: a novel surface for microarrays. Biotechniques. 2000;29:630-635.
34
Sharpe AN, Michaud GL. Hydrophobic-grid membrane filters: A new approach to microbiological assays. Appl Microbiol.
1974;28:223-225.
35
Cheng J, Sheldon EL, Wu L, et al. Preparation and hybridization analysis of DNA/RNA from E. coli on microfabricated bioelectronic
chips. Nat Biotechnol. 1998;16:541-546.
36
Bidan G, Billon M, Galasso K, et al. Electropolymerization as a versatile route for immobilizing biological species onto surfaces.
Application to DNA biochips. Appl Biochem Biotechnol. 2000;89:183-193.
37
Livache T, Roget A, Dejean E, Barthet C, Bidan G, Teoule R. Preparation of a DNA matrix via an electrochemically directed
copolymerization of pyrrole and oligonucleotides bearing a pyrrole group. Nucleic Acids Res. 1994;22:2915-2921.
38
Livache T, Bazin H, Caillat P, Roget A. Electroconducting polymers for the construction of DNA or peptide arrays on silicon chips.
Biosens Bioelectron. 1998a; 13: 629-634.
39
Livache T, Fouque B, Roget A, et al. Polypyrrole DNA chip on a silicon device: example of hepatitis C virus genotyping. Anal Biochem.
1998b; 255: 188-194.
40
Lemmo AV, Rose DJ, Tisone TC. Inkjet dispensing technology: Applications in drug discovery. Curr Opin Biotechnol. 1998; 9: 615-617.
41
Stimpson DI, Cooley PW, Knepper SM, Wallace DB. Parallel production of oligonucleotide arrays using membranes and reagent jet
printing. Biotechniques. 1998; 25: 886-890.
42
Taylor S, Papen R. Non-contact microarraying technologies. In: Cheng J, Kricka LJ, eds. Biochip Technology. Philadelphia: Harwood
Academic Publishers; 2001:97-114.
43
Guo Z, Guilfoyle RA, Thiel AJ, Wang R, Smith LM. Direct fluorescence analysis of genetic polymorphisms by hybridization with
oligonucleotide arrays on glass supports. Nucleic Acids Res. 1994; 22: 5456-5465.
44
Southern EM, Maskos U, Elder JK. Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides:
evaluation using experimental models. Genomics. 1992; 13: 1008-1017.
45
Isola NR, Allman SL, Golovlev VV, Chen CH. MALDI-TOF mass spectrometric method for detection of hybridized DNA oligomers. Anal
Chem. 2001; 73: 2126-2131.
46
Sanguedolce LA, Chan V, McKenzie S, Surrey S, Fortina P, Graves D. Fundamental studies of DNA adsorption and hybridization on solid
surfaces. In: Dinh S, De Nuzzio J, Comfort AR, eds. Intelligent Materials for Controlled Release. Washington, DC: American Chemical
Society; 1999, 728:22-30.
47
Zou ZL, Wang SQ, Wang ZQ. [Preparation optimization and properties of the aldehyde microscopic slides for oligonucleotide microarray
fabrication]. Sheng Wu Gong Cheng Xue Bao. 2001; 17: 498-502.
48
Halliwell CM, Cass AEG. A factorial analysis of silanization conditions for the immobilization of oligonucleotides on glass surfaces. Anal
Chem. 2001; 73:2476-2483.
49
Bordoni R, Consolandi C, Castiglioni B, et al. Investigation of the multiple anchors approach in oligonucleotide microarray preparation
using linear and stem-loop structured probes. Nucleic Acids Res. 2002; 30: E34-4.
50
Proudnikov D, Timofeev E, Mirzabekov A. Immobilization of DNA in polyacrylamide gel for the manufacture of DNA and DNA-
oligonucleotide microchips. Anal Biochem. 1998;259:34-41.
51
Maskos U, Southern EM. Parallel analysis of oligodeoxyribonucleotide (oligonucleotide) interactions. I. Analysis of factors
influencing oligonucleotide duplex formation. Nucleic Acids Res. 1992a; 20: 1675-1678.
52
Maskos U, Southern EM. Oligonucleotide hybridizations on glass supports: a novel linker for oligonucleotide synthesis and
hybridization properties of oligonucleotides synthesised in situ. Nucleic Acids Res. 1992b; 20: 1679-1684.
53
Maskos U, Southern EM. A study of oligonucleotide reassociation using large arrays of oligonucleotides synthesised on a glass support.
Nucleic Acids Res. 1993a; 21: 4663-4669.
54
Maskos U, Southern EM. A novel method for the analysis of multiple sequence variants by hybridisation to oligonucleotides. Nucleic
Acids Res. 1993b; 21: 2267-2268.
55
Butler JH, Cronin M, Anderson KM, et al. In situ synthesis of oligonucleotide arrays by using surface tension. J Am Chem Soc. 2001; 123:
8887-8894.
56
Barone AD, Beecher JE, Bury PA, et al. Photolithographic synthesis of high-density oligonucleotide probe arrays. Nucleosides Nucleotides
Nucleic Acids. 2001; 20: 525-531.
57
McGall GH, Fidanza JA. Photolithographic synthesis of high-density oligonucleotide arrays. Methods Mol Biol. 2001; 170: 71-101.
58
Singh-Gasson S, Green RD, Yue Y, et al. Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror
array. Nat Biotechnol. 1999; 17: 974-978.
59
Gao X, LeProust E, Zhang H, et al. A flexible light-directed DNA chip synthesis gated by deprotection using solution photogenerated
acids. Nucleic Acids Res. 2001; 29: 4744-4750.
60
McGall G, Labadie J, Brock P, Wallraff G, Nguyen T, Hinsberg W. Light-directed synthesis of high-density oligonucleotide arrays using
semiconductor photoresists. Proc Natl Acad Sci USA. 1996; 93: 13555-13560.
61
Lipshutz RJ, Fodor SPA, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet. 1999; 21: 20-24.
62
Peck K, Sher Y-P. Application of enzyme colorimetry for cDNA microarray detection. In: Cheng J, Kricka LJ, eds. Biochip Technology.
Philadelphia: Harwood Academic Publishers; 2001:325-340.
63
Hacia JG, Edgemon K, Sun B, Stern D, Fodor SP, Collins FS. Two color hybridization analysis using high density oligonucleotide arrays
and energy transfer dyes. Nucleic Acids Res. 1998; 26:3865-3866.
64
Fortina P, Delgrosso K, Sakazume T, et al. Simple two-color array-based approach for mutation detection. Eur J Hum Genet. 2000; 8: 884-
894.
65
Waddell E, Wang Y, Stryjewski W, et al. High-resolution near-infrared imaging of DNA microarrays with time-resolved acquisition of
fluorescence lifetimes. Anal Chem. 2000; 72: 5907-5917.
66
Van DeRijke F, Zijlmans H, Li S, et al. Up-converting phosphor reporters for nucleic acid microarrays. Nat Biotechnol. 2001; 19: 273-276.
67
Akhavan-Tafti H, Reddy LV, Siripurapu S, Schoenfelner BA, Handley RS, Schaap AP. Chemiluminescent detection of DNA in low- and
medium-density arrays. Clin Chem. 1998; 44:2065-2066.
68
Bernard K, Auphan N, Granjeaud S, et al. Multiplex messenger assay: simultaneous, quantitative measurement of expression of many
genes in the context of T cell activation. Nucleic Acids Res. 1996; 24: 1435-1442.
69
Bertucci F, Bernard K, Loriod B, et al. Sensitivity issues in DNA array-based expression measurements and performance of nylon
microarrays for small samples. Hum Mol Genet. 1999; 8: 1715-1722.
70
Chen JJ, Wu R, Yang PC, et al. Profiling expression patterns and isolating differentially expressed genes by cDNA microarray system with
colorimetry detection. Genomics. 1998; 51: 313-324.
71
Lebrun S. Colorimetric microarray assays for rheumatoid arthritis using a recombinant proteome library. In: Schena M, ed. Protein Microarrays.
Sudbury, MA: Jones and Bartlett Publishers; 2005:119-135.
72
Shalon D, Smith SJ, Brown PO. A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe
hybridization. Genome Res. 1996; 6: 639-645.
73
Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW. Parallel human genome analysis: microarray-based expression monitoring
of 1000 genes. Proc Natl Acad Sci USA. 1996; 93: 10614-10619.
74
Pastinen T, Kurg A, Metspalu A, Peltonen L, Syvanen AC. Minisequencing: A specific tool for DNA analysis and diagnostics on
oligonucleotide arrays. Genome Res. 1997; 7: 606-614.
75
Shumaker JM, Metspalu A, Caskey CT. Mutation detection by solid phase primer extension. Hum Mutat. 1996; 7: 346-354.
76
Pastinen T, Raitio M, Lindroos K, Tainola P, Peltonen L, Syvanen AC. A system for specific, high-throughput genotyping by allele-
specific primer extension on microarrays. Genome Res. 2000; 10: 1031-1042.
77
Ladner DP, Leamon JH, Hamann S, et al. Multiplex detection of hotspot mutations by rolling circle-enabled universal microarrays. Lab
Invest. 2001; 81: 1079-1086.
78
Cronin MT, Fucini RV, Kim SM, Masino RS, Wespi RM, Miyada CG. Cystic fibrosis mutation detection by hybridization to light-
generated DNA probe arrays. Hum Mutat. 1996;7:244-55.
79
Murphy GM Jr, Pollock BG, Kirshner MA, et al. CYP2D6 genotyping with oligonucleotide microarrays and nortriptyline
concentrations in geriatric depression. Neuropsychopharmacology. 2001;25:737-743.
80
Yoon YR, Cha IJ, Shon JH, et al. Relationship of paroxetine disposition to metoprolol metabolic ratio and CYP2D6*10 genotype of Korean
subjects. Clin Pharmacol Ther. 2000;67:567-576.
81
Nikoloff D, Shim JC, Fairchild M, et al. Association between CYP2D6 genotype and tardive dyskinesia in Korean schizophrenics. The
Pharmacogenomics Journal. 2002;2:400-407.
82
Chou WH, Yan FX, Robbins-Weilert DK, et al. Comparison of two CYP2D6 genotyping methods and assessment of genotype-phenotype
relationships. Clin Chem. 2003;49:542-551.
83
Chee M, Yang R, Hubbell E, et al. Accessing genetic information with high-density DNA arrays. Science. 1996;274:610-614.
84
Hacia JG, Sun B, Hunt N, et al. Strategies for mutational analysis of the large multiexon ATM gene using high-density
oligonucleotide arrays. Genome Res. 1998;8:1245-1258.
85
Hacia JG, Brody LC, Chee MS, Fodor SP, Collins FS. Detection of heterozygous mutations in BRCA1 using high density
oligonucleotide arrays and two-colour fluorescence analysis. Nat Genet. 1996;14:441-447.
86
Ahrendt SA, Halachmi S, Chow JT, et al. Rapid p53 sequence analysis in primary lung cancer using an oligonucleotide probe array. Proc
Natl Acad Sci U S A. 1999;96:7382-7387.
87
Wen WH, Bernstein L, Lescallett J, et al. Comparison of TP53 mutations identified by oligonucleotide microarray and
conventional DNA sequence analysis. Cancer Res. 2000;60:2716-22.
88
Wikman FP, Lu ML, Thykjaer T, et al. Evaluation of the performance of a p53 sequencing microarray chip using 140 previously sequenced
bladder tumor samples. Clin Chem. 2000;46:1555-61.
89
Wu L, Patten N, Yamashiro CT, Chui B. Extraction and amplification of DNA from formalin-fixed, paraffin-embedded tissues. Appl
Immunohistochem Mol Morphol. 2002;10:269-274.
90
Wang DG, Fan JB, Siao CJ, et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human
genome. Science. 1998;280:1077-1082.
91
Lindblad-Toh K, Tanenbaum DM, Daly MJ, et al. Loss-of-heterozygosity analysis of small-cell lung carcinomas using single-nucleotide
polymorphism arrays. Nat Biotechnol. 2000;18:1001-1005.
92
Mei R, Galipeau PC, Prass C, et al. Genome-wide detection of allelic imbalance using human SNPs and high-density DNA arrays. Genome
Res. 2000;10:1126-1137.
93
Lindblad-Toh K. In: DNA Microarrays: A Molecular Cloning Manual. Bowtell D and Sambrook J, eds. Cold Spring Harbor, NY: Cold
Spring Harbor Press; 2003:439-452.
94
Lindblad-Toh K, Winchester E, Daly MJ, et al. Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse.
Nat Genet. 2000;24:381-386.
95
Dong S, Wang E, Hsie L, Cao Y, Chen X, Gingeras TR. Flexible use of high-density oligonucleotide arrays for single-nucleotide
polymorphism discovery and validation. Genome Res. 2001;11:1418-1424.
96
Patil N, Berno AJ, Hinds DA, et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human
chromosome 21. Science. 2001;294:1719-1723.
97
Roses R. Pharmacogenetics. Hum Mol Genet. 2001;10:2261-2267.
98
Umek RM, et al. Electronic detection of nucleic acids: a versatile platform for molecular diagnostics. J Mol Diagn. 2001;3(02):74-84.
99
Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet. 1999;21:20-24.
100
Pollack JR. In: DNA Microarrays: A Molecular Cloning Manual. Bowtell D and Sambrook J, eds. Cold Spring Harbor, NY: Cold Spring
Harbor Press; 2003: 168-177.
101
Chen JJ, Wu R, Yang PC, et al. Profiling expression patterns and isolating differentially expressed genes by cDNA microarray system with
colorimetry detection. Genomics. 1998;51:313-324.
102
Wildsmith SE, Archer GE, Winkley AJ, Lane PW, Bugelski PJ. Maximization of signal derived from cDNA microarrays. Biotechniques.
2001;30:202-208.
103
Randolph JB, Waggoner AS. Stability, specificity and fluorescent brightness of multiply-labeled fluorescent DNA probes. Nucleic Acids
Research. 1997;25:2923-2929.
104
Bowtell D, Sambrook J, eds. DNA Microarrays: A Molecular Cloning Manual. New York: Cold Spring Harbor Press; 2003.
105
Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD, Eberwine JH. Amplified RNA synthesized from limited quantities of
heterogeneous cDNA. Proc Natl Acad Sci USA. 1990;87:1663-1667.
106
Phillips J, Eberwine JH. Antisense RNA amplification: a linear amplification method for analyzing the mRNA population from single
living cells. Methods. 1996;10:283-288.
107
Luzzi V, Mahadevappa M, Raja R, Warrington JA, Watson MA. Accurate and reproducible gene expression profiles from laser
capture microdissection, transcript amplification, and high density oligonucleotide microarray analysis. J Mol Diagn. 2003;5(1):9-15.
108
Wang E, Miller ED, Ohnmacht GA, Liu ET, Marincola EM. High-fidelity mRNA amplification for gene profiling. Nat
Biotechnol. 2000; 18:457-459.
109
Ohnmacht GA, Wang E, Mocellin S, et al. Short-term kinetics of tumor antigen expression in vivo. J Immunol. 2001; 167:1809-1820.
110
Karsten SL, Van Deerlin VMD, Sabatti C, Gill LH, Geschwind DH. An evaluation of tyramide signal amplification and archived fixed and
frozen tissue in microarray gene expression analysis. Nucleic Acids Res. 2002;30:e4.
111
Schena M. Microarray Analysis. Hoboken, NJ: John Wiley & Sons, Inc. 2003.
112
Gerry NP, Witowski NE, Day J, Hammer RP, Barany G, Barany F. Universal DNA microarray method for multiplex detection of low
abundance point mutations. J Mol Biol. 1999;292:251-262.
113
Fan JB, Chen X, Halushka MK, et al. Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays. Genome
Res. 2000;10:853-860.
114
Pastinen T, Raitio M, Lindroos K, Tainola P, Peltonen L, Syvänen A-C. A system for specific, high-throughput genotyping by allele-specific
primer extension on microarrays. Genome Res. 2000;10:1031-1042.
115
Huber M, Losert D, Hiller R, Harwanegg C, Mueller MW, Schmidt WM. Detection of single base alterations in genomic DNA by solid
phase polymerase chain reaction on oligonucleotide microarrays. Anal Biochem. 2001;299:24-30.
116
Freier SM, Petersheim M, Hickey DR, Turner DH. Thermodynamic studies of RNA stability. J Biomol Struct Dyn. 1984; 1: 1229-1242.
117
Wetmur JG, Davidson N. Kinetics of renaturation of DNA. J Mol Biol. 1968; 31: 349-370.
118
Miyada CG, Wallace RB. Oligonucleotide hybridization techniques. Methods Enzymol. 1987; 154: 94-107.
119
Southern E, Mir K, Shchepinov M. Molecular interactions on microarrays. Nat Genet. 1999;21:5-9.
120
Sambrook J, Fritsch EF, Maniatis T. Molecular Cloning: A Laboratory Manual. Second edition, Cold Spring Harbor: Cold Spring Harbor
Laboratory; 1989.
121
Wood WI, Gitschier J, Lasky LA, Lawn RM. Base composition-independent hybridization in tetramethylammonium chloride: a method for
oligonucleotide screening of highly complex gene libraries. Proc Natl Acad Sci USA. 1985; 82: 1585-1588.
122
Casey J, Davidson N. Rates of formation and thermal stabilities of RNA:DNA and DNA:DNA duplexes at high concentrations of
formamide. Nucleic Acids Res. 1977; 4: 1539-1545.
123
Bowtell DL. Options available—from start to finish—for obtaining expression data by microarray. Nat Genet. 1999; 21: 25-32.
124
Matson RS, Rampal J, Pentoney SL Jr, Anderson PD, Coassin P. Biopolymer synthesis on polypropylene supports: Oligonucleotide arrays.
Anal Biochem. 1995; 224: 110-116.
125
Maskos U, Southern EM. A novel method for the analysis of multiple sequence variants by hybridisation to oligonucleotide arrays. Nucleic
Acids Res. 1993a; 21: 2267-2268.
126
Williams JC, Case-Green SC, Mir KU, Southern EM. Studies of oligonucleotide interactions by hybridisation to arrays: the influence of
dangling ends on duplex yield. Nucleic Acids Res. 1994; 22: 1365–1367.
127
Maskos U, Southern EM. A study of oligonucleotide reassociation using large arrays of oligonucleotides synthesized on a glass support.
Nucleic Acids Res. 1993b; 21: 4663-4669.
128
Southern EM. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol. 1975;98:503-517.
129
Speicher MR, Ballard SG, Ward DC. Karyotyping by combinatorial multi-fluor FISH. Nat Genet. 1996; 12: 368-375.
130
Espejo A, Cote J, Bednarek A, Richard S, Bedford MT. A protein-domain microarray identifies novel protein-protein interactions. Biochem
J. 2000;367:6697-6702.
131
Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM. Expression profiling using cDNA microarrays. Nat Genet. 1999; 21: 10-14.
132
Hubbell E, Liu W-M, Mei R. Robust estimators for expression analysis. Bioinformatics. 2002; 18: 1585-1592.
133
Edman CF, Raymond DE, Wu DJ, et al. Electric field directed nucleic acid hybridization on microchips. Nucleic Acids Res. 1997; 25:
4907-4914.
134
Heller MJ, Forster AH, Tu E. Active microelectronic chip devices which utilize controlled electrophoretic fields for multiplex DNA
hybridization and other genomic applications. Electrophoresis. 2000; 21: 157-164.
135
Gurtner C, Tu E, Jamshidi N, et al. Microelectronic array devices and techniques for electric field enhanced DNA hybridization in low-
conductance buffers. Electrophoresis. 2002; 23: 1543–1550.
136
Barrett JC, Kawasaki ES. Microarrays: the use of oligonucleotides and cDNA for the analysis of gene expression. Drug Discovery Today.
2003;8:134-141.
137
Hughes TR, Mao M, Jones AR, et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nature
Biotechnology. 2001;19:342-347.
138
Lucito R, Healy J, Alexander J, et al. Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy
number variation. Genome Research. 2003;13:2291-2305.
139
Moreau Y, Aerts S, De Moor B, De Strooper B, Dabrowski M. Comparison and meta-analysis of microarray data: From the bench to the
computer desk. Trends in Genetics. 2003;19:570-577.
140
van’t Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530-536.
141
Van de Vijver MJ, He YD, van’t Veer LJ, et al. A gene expression signature as a predictor of survival in breast cancer. New Eng J Med.
2002;347:1999-2009.
142
Brennan C, Zhang Y, Leo C, et al. High-resolution global profiling of genomic alterations with long oligonucleotide microarray. Cancer
Research. 2004;64:4744-4748.
143
Barrett MT, Scheffer A, Ben-Dor A, et al. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA.
Proc Natl Acad Sci USA. 2004; 101: 17765-17770.
144
Lage JM, Leamon JH, Pejovic T, et al. Whole genome analysis of genetic alterations in small DNA samples using hyperbranched strand
displacement amplification and array-CGH. Genome Research. 2003;13:294-307.
145
Paez JG, Lin M, Beroukhim R, et al. Genome coverage and sequence fidelity of phi29 polymerase-based multiple strand displacement
whole genome amplification. Nucleic Acids Res. 2004; 32: e71.
146
Nielsen TO, Hsu FD, Jensen K, et al. Immunohistochemical and clinical characterization of the basal-like subtype of invasive breast
carcinoma. Clinical Cancer Research. 2004; 10: 5367-5374.
147
Nielsen HB, Wernersson R, Knudsen S. Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome
arrays. Nucleic Acids Res. 2003; 31: 3491-3496.
148
Rouillard J-M, Zuker M, Gulari E. OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic
approach. Nucleic Acids Res. 2003; 31: 3057-3062.
149
Korkola JE, Estep ALH, Pejavar S, DeVries S, Jensen R, Waldman FM. Optimizing stringency for expression microarrays. Biotechniques.
2003; 35: 828-835.
150
Stillman B, Tonkinson J. Expression microarray hybridization kinetics depend on length of the immobilized DNA but are independent of
immobilization substrate. Anal Biochem. 2001; 295: 149-157.
151
Henry M, Stevens P, Sun J, Kelso D. Real-time measurements of DNA hybridization on microparticles with fluorescence resonance energy
transfer. Anal Biochem. 1999; 27: 204-214.
152
Dai H, Meyer M, Stepaniants S, Ziman M, Stoughton R. Use of hybridization kinetics for differentiating specific from non-specific binding
to oligonucleotide microarrays. Nucleic Acids Res. 2002; 30:e86.
153
Li J, Spletter ML, Johnson JA. Dissecting tBHQ induced ARE driven gene expression through long and short oligonucleotide arrays.
Physiol Genomics. In press (2005).
154
He YD, et al. Microarray standard data set and figures of merit for comparing data processing methods and experiment designs.
Bioinformatics. 2003;19:956-965.
155
Rocke DM, Durbin BJ. A model for measurement error for gene expression arrays. Comp Biol. 2002;8:557-569.
156
Winzeler EA, Schena M, et al. Fluorescence-based expression monitoring using microarrays. Methods in Enzymology. 1999; 306: 3-18.
157
Case-Green S, Pritchard C, et al. Use of Oligonucleotide Arrays in Enzymatic Assays: Assay Optimization. Oxford: Oxford University
Press; 1999.
158
Witowski NE, Leiendecker-Foster C, et al. Microarray-based detection of select cardiovascular disease markers. Biotechniques. 2000;29(5).
159
Murphy DB. Fundamentals of Light Microscopy and Electronic Imaging. New York: Wiley-Liss; 2001.
160
Watson SJ, Akil H. Gene chips and arrays revealed: a primer on their power and their uses. Biological Psychiatry. 1999; 45:533-543.
161
Yershov G, Barsky V, Belgovskiy A, et al. DNA analysis and diagnostics on oligonucleotide microchips. Proceedings of the National
Academy of Sciences of the United States of America. 1996; 93:4913-4918.
162
Boa Z, Ma WL, Hu ZY, Rong S, Shi YB, Zheng WL. A method for evaluation of the quality of DNA microarray spots. J Biochem Mol
Biol. 2002; 35:532-535.
163
Okamoto T, Suzuki T, Yamamoto N. Microarray fabrication with covalent attachment of DNA using bubble jet technology [see comments].
Nature Biotechnology. 2000; 18:438-441.
164
Weil MR, Macatee T, Garner HR. Toward a universal standard: comparing two methods for standardizing spotted microarray data.
Biotechniques. 2002; 32:1310-1314.
165
Morozov VN, Morozova T. Electrospray deposition as a method for mass fabrication of mono- and multicomponent microarrays of
biological and biologically active substances. Analytical Chemistry. 1999; 71:3110-3117.
166
Zammatteo N, Jeanmart L, Hamels S, et al. Comparison between different strategies of covalent attachment of DNA to glass surfaces to
build DNA microarrays. Analytical Biochemistry. 2000; 280:143-150.
167
Hodgson J. Shrinking DNA diagnostics to fill the markets of the future. Nature Biotechnology. 1998; 16:725-727.
168
Graves DJ, Su HJ, McKenzie SE, Surrey S, Fortina P. System for preparing microhybridization arrays on glass slides. Anal Chem. 1998;
70:5085-5092.
169
Hughes TR, Mao M, Jones AR, et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nature
Biotechnology. 2001; 19:342-347.
170
Beier M, Hoheisel JD. Versatile derivatisation of solid support media for covalent bonding on DNA-microchips. Nucleic Acids Research.
1999; 27:1970-1977.
171
Saito I, Sugiyama H, Furukawa N, Matsuura T. Photoreaction of thymidine with primary amines. Application to specific
modification of DNA. Nucleic Acids Symposium Series. 1981:61-64.
172
Steel AB, Levicky RL, Herne TM, Tarlov MJ. Immobilization of nucleic acids at solid surfaces: effect of oligonucleotide length on layer
assembly. Biophysical Journal. 2000; 79:975-981.
173
Diehl F, Grahlmann S, Beier M, Hoheisel JD. Manufacturing DNA microarrays of high spot homogeneity and reduced
background signal. Nucleic Acids Research. 2001; 29:E38.
174
Peterson AW, Heaton RJ, Georgiadis RM. The effect of surface probe density on DNA hybridization. Nucleic Acids Research. 2001;
29:5163-5168.
175
Dubiley S, Kirillov E, Mirzabekov A. Polymorphism analysis and gene detection by minisequencing on an array of gel-
immobilized primers. Nucleic Acids Research. 1999; 27:e19.
176
Vasiliskov AV, Timofeev EN, Surzhikov SA, Drobyshev AL, Shick VV, Mirzabekov AD. Fabrication of microarray of gel-
immobilized compounds on a chip by copolymerization. Biotechniques. 1999; 27:592-594, 596-598, 600 passim.
177
Gerry NP, Witowski NE, Day J, Hammer RP, Barany G, Barany F. Universal DNA microarray method for multiplex detection of low
abundance point mutations. J Mol Biol. 1999; 292:251-262.
178
Syvanen AC. Accessing genetic variation: Genotyping single nucleotide polymorphisms. Nature Rev Genet. 2001; 2:930–942.
179
Favis R, Barany F. Mutation detection in K-ras, BRCA1, BRCA2, and p53 using PCR/LDR and a universal DNA microarray. Ann N Y
Acad Sci. 2000;906:39-43.
180
Drmanac R, Drmanac S. Sequencing by hybridization arrays. Methods Mol Biol. 2001;170:39-51.
181
Warrington JA, Shah NA, Chen X, et al. New developments in high-throughput resequencing and variation detection using high density
microarrays. Hum Mutat. 2002;19:402-409.
182
Schena M, ed. Genetic Screening and Diagnostics in Microarray Analysis. Hoboken, NJ: Wiley & Sons; 2003:387-404.
183
Stears RL, Martinsky T, Schena M. Trends in microarray analysis. Nature Medicine. 2003; 9:140-145.
184
Gitan RS, Shi H, Chen C-M, Yan PS, Huang TH-M. Methylation-specific oligonucleotide microarray: a new potential for high
throughput methylation analysis. Genome Res. 2001; 12:158-164.
185
Shi H, Maier S, Nimmrich I, et al. Oligonucleotide-based microarray for DNA Methylation analysis: principles and applications. J Cell
Biochem. 2003;88:103-143.
186
CDC, NIH. Biosafety in Microbiological and Biomedical Laboratories. 4th ed. Washington, DC: U.S. Government Printing Office; 1999.
187
Orlando C, Pinzani P, Pazzagli M. Developments in quantitative PCR. Clin Chem Lab Med. 1998;36:255-269.
188
Freeman WM, Walker SJ, Vrana KE. Quantitative RT-PCR: Pitfalls and potential. BioTechniques. 1999;26:112-125.
189
Wang J. Nucleic Acids Res. 2000;28:3011-3016.
190
Loakes D. The applications of universal DNA base analogues. Nucleic Acids Res. 2001;29:2437-2447.
191
Drake JW, Charlesworth B, Charlesworth D, Crow JF. Rates of spontaneous mutation. Genetics. 1998;148:1667-1686.
192
Clementi M. Quantitative molecular analysis of virus expression and replication. J Clin Microbiol. 2000;38:2030-2036.
193
Mackay IM, Arden KE, Nitsche A. Real-time PCR in virology. Nucleic Acids Res. 2002;30:1292-1305.
194
Kallioniemi A, Kallioniemi OP, Sudar D, et al. Comparative genomic hybridization: a powerful new method for cytogenetic analysis of
solid tumors. Science. 1992; 258: 818-821.
195
Solinas-Toldo S, Lampel S, Stilgenbauer S, et al. Matrix-based comparative genomic hybridization: biochips to screen for genomic
imbalances. Genes, Chromosomes and Cancer. 1997; 20: 399-407.
196
Pinkel D, Segraves R, Sudar S, et al. High resolution analysis of DNA copy number variation in breast
cancer using comparative genomic hybridization to DNA microarrays. Nature Genetics. 1998; 20: 207-211.
197
Snijders AM, Nowak N, Segraves R, et al. Assembly of microarrays for genome-wide measurement of DNA copy number by CGH.
Nature Genetics. 2001; 29: 263-264.
198
Hodgson G, Hager JH, Volik S, et al. Genome scanning with array CGH delineates regional alterations in mouse islet carcinomas. Nature
Genetics. 2001; 29: 459-464.
199
Pollack JR, Perou CM, Alizadeh AA, et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nature
Genetics. 1999; 23: 41-46.
200
Lucito R, West J, Reiner A, et al. Detecting gene copy number fluctuations in tumor cells by microarray analysis of genomic
representations. Genome Research. 2000; 10: 1726-1736.
201
Albertson DG, Ulstra B, Segraves R, et al. Quantitative mapping of amplicon structure by array CGH identifies vitamin D-24 hydroxylase
(CYP24) as a candidate oncogene. Nature Genetics. 2000; 25: 144-146.
202
Mohapatra G, Moore DH, Kim DH, et al. Analyses of brain tumor cell lines confirm a simple model of relationships among fluorescence in
situ hybridization, DNA index, and comparative genomic hybridization. Genes Chromosomes Cancer. 1997;20:311-319.
203
Sawitzki G. Quality control and early diagnostics for cDNA microarrays. R News. 2002;2(1):6-10.
204
Yang YH, Buckley MJ, Dudoit S, Speed T. Comparison of methods for image analysis on cDNA microarray data. Technical report #584.
University of California at Berkeley; 2000.
205
Kerr MK, Martin M, Churchill GA. Analysis of variance for gene expression microarray data. J Computational Biology. 2000;7:819-837.
206
Lee MT, Kuo FC, Whitmore GA, Sklar J. Importance of replication in microarray gene expression studies: Statistical methods and evidence
from repetitive cDNA hybridizations. PNAS. 2000;97(18):9834-9839.
207
Wolfinger RD, Gibson G, Wolfinger ED, et al. Assessing gene significance from cDNA microarray expression data via mixed models.
Journal of Computational Biology 2001;8(6):625-637.
208
Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based
on variance and bias. Unpublished manuscript, submitted to Bioinformatics; 2002.
209
Barczak A, Rodriguez MW, Hanspers K, et al. Spotted long oligonucleotide arrays for human gene expression analysis. Genome Research.
2003;13:1775-1785.
210
Li C, Wong WH. Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error applications. Genome
Biology. 2001;2(8): 1-11.
211
Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data.
Accepted for publication in Biostatistics; 2002.
212
Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix Genechip probe level data. Nucleic Acids
Research. 2003;31(4):e15.
213
Lemon WJ, Palatini JJT, Krahe R, Wright FA. Theoretical and experimental comparisons of gene expression indexes for
oligonucleotide arrays. Bioinformatics. 2002;18(11): 1470-1476.
214
Kohane IS, Kho AT, Butte AJ. Microarrays for an Integrative Genomics. Cambridge, MA: MIT Press; 2003.
215
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer;
2001.
216
Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. PNAS. 1998; 95: 14863-14868.
217
Bhattacharjee A, Richards WG, Staunton J, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct
adenocarcinoma subclasses. PNAS. 2001;98(24):13790-13795.
218
Dudoit S, Yang YH, Callow MJ, Speed TP. Statistical methods for identifying differentially expressed genes in replicated cDNA
microarray experiments. Statistica Sinica. 2002;12:111-139.
219
Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the
National Academy of Sciences. 2001;98:5116-5121.
220
Radmacher MD, McShane LM, Simon R. A paradigm for class prediction using gene expression profiles. J Computational Biology.
2002;9:505-512.
221
Segal M. Regression trees for censored data. Biometrics. 1998;44:35-47.
222
Ripley BD. Pattern Recognition and Neural Networks. New York: Cambridge University Press; 1996.
223
Fleiss JL. Statistical Methods for Rates and Proportions, 2nd ed. New York: John Wiley & Sons; 1981.
224
Pagano M, Gauvreau K. Principles of Biostatistics. Belmont, CA: Duxbury Press; 1993.
225
Salzberg SL. On comparing classifiers: A critique of current research and methods. Data Mining and Knowledge Discovery. 1999;1:1-12.
226
Zhou X-H, Obuchowski NA, McClish DK. Statistical Methods in Diagnostic Medicine. New York: John Wiley & Sons; 2002.
227
Provost F, Fawcett T. Analysis and visualization of classifier performance: comparison under imprecise class and cost
distributions. Proc Third Internatl Conf on Knowledge Discovery and Data Mining (KDD-97). Huntington Beach, CA; c 1997.
228
Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data.
Biostatistics. 2003;4:249-264.
229
Li C, Wong WH. Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection.
Proceedings of the National Academy of Sciences. 2001;98(1):31-36.
230
Dudley A, Aach J, Steffen, Church GM. Measuring absolute expression with microarrays using a calibrated reference sample and an
extended signal intensity range. MASLINER (software) supplemental notes. 2002. http://arep.med.harvard.edu/.
231
Hoen PAC, Turk R, Boer JM, Sterrenburg E, et al. Intensity-based analysis of two-colour microarrays enables efficient and flexible
hybridization designs. Nucleic Acids Research. 2004; 32:e4.
232
Petricoin EF, Hackett JL, Lesko LJ, et al. Medical applications of microarray technologies: a regulatory science perspective. Nature
Genetics. 2002; 32(Supplement):474-479.
233
Schoor O, Weinschenk T, Hennenlotter J, et al. Moderate degradation does not preclude microarray analysis of small amounts of RNA.
BioTechniques. 2003;35:1192-1201.
234
Miller CL, Diglisic S, Leister F, et al. Evaluating RNA status for RT-PCR in extracts of postmortem human brain tissue.
BioTechniques. 2004; 36:628-633.
235
Gold D, Coombes K, Medhane D, et al. A comparative analysis of data generated using two different target preparation methods for
hybridization to high-density oligonucleotide microarrays. BMC Genomics. 2004;5:2.
236
Matsuzaki H, Loi H, Dong S, et al. Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high-density
oligonucleotide array. Genome Research. 2004; 14:414-425.
237
Spiess A-N, Mueller N, Ivell R. Amplified RNA degradation in T7-amplification methods results in biased microarray
hybridizations. BMC Genomics. 2003;4:44.
238
Lucito R, Healy J, Alexander J, et al. Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy
number variation. Genome Research. 2003;13:2291-2305.
239
Cutler DJ, Zwick ME, Carrasquillo MM, et al. High-throughput variation detection and genotyping using microarrays. Genome Research.
2001; 11:1913-1925.
240
Manduchi E, Scearce LM, Brestelli JE, et al. Comparison of different labeling methods for two-channel high-density microarray
experiments. Physiological Genomics. 2002;10:169-179.
241
Naderi A, Ahmed AA, Barbosa-Morais NL, et al. Expression microarray reproducibility is improved by optimizing purification steps in
RNA amplification and labelling. BMC Genomics. 2004; 5:9.
242
Simon RM, McShane LM, Wright GW, et al. Design and analysis of DNA microarray investigations. Springer-Verlag. 2004; 38-51.
243
Forster T, Costa Y, Roy D, et al. Triple-target microarray experiments: a novel experimental strategy. BMC Genomics. 2004; 5:13.
244
Hessner MJ, Singh VK, Wang X, et al. Utilization of a labeled tracking oligonucleotide for visualization and quality control of spotted
70-mer arrays. BMC Genomics. 2004;5:12.
245
Korkola JE, Estep ALH, Pejavar S, et al. Optimizing stringency for expression microarrays. BioTechniques. 2004;35:828-835.
246
Dai H, Meyer M, Stepaniants S, et al. Use of hybridization kinetics for differentiating specific from non-specific binding to
oligonucleotide microarrays. Nucleic Acids Research. 2002;30:e86.
247
Lyng H, Badiee A, Svendsrud DH, et al. Profound influence of microarray scanner characteristics on gene expression ratios: analysis and
procedure for correction. BMC Genomics. 2004;5:10.
248
Workshop on Fluorescence Standards for Microarray Assays. National Institute of Standards and Technology. December 10, 2002.
http://www.cstl.nist.gov/biotech/fluormicroarray/FluorMicroarrayWkshp12-10-2002.html (Accessed 4/24/04).
249
Zong Y, Wang Y, Zhang S, Shi Y. How to evaluate a microarray scanner. In: Hardiman G, ed. Microarrays Methods and Applications:
Nuts and Bolts. Eagleville, PA: DNA Press LLC; 2003: 97-114.
250
Ahmed AA, Vias M, Iyer NG, et al. Microarray segmentation methods significantly influence data precision. Nucleic Acids Research. 2004;
32:e50.
251
Marzolf B, Johnson MH. Validation of microarray image analysis accuracy. BioTechniques. 2004;36:304-308.
252
Ishkanian AS, Malloff CA, Watson SK, et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nature
Genetics. 2004;36:299-303.
253
Chen D-T, Lin S-H, Soong S-J. Gene selection for oligonucleotide array: An approach using PM probe level data. Bioinformatics.
2004;20:854-862.
254
Rosati B, Grau F, Kuehler A, Rodriguez S, McKinnon D. Comparison of different probe-level analysis techniques for
oligonucleotide microarrays. BioTechniques. 2004;36:316-322.
255
Etienne W, Meyer MH, Peppers J, et al. Comparison of mRNA gene expression by RT-PCR and DNA microarray. BioTechniques.
2004;36:618-626.
256
Kothapalli R, Yoder SJ, Mane S, Loughran TP. Microarray results: How accurate are they? BMC Bioinformatics. 2002;3:22.
257
Kuo WP, Jenssen T-K, Butte AJ, et al. Analysis of matched mRNA measurements from two different microarray technologies.
Bioinformatics. 2002;18:405-412.
258
Rhodes DR, Barrette TR, Rubin MA, et al. Meta-analysis of microarrays: Interstudy validation of gene expression profiles reveals pathway
dysregulation in prostate cancer. Cancer Research. 2002;63:4427-4433.
259
Sorlie T, Tibshirani R, Parker J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Nat’l
Acad Sci (USA). 2003;100:8418-8423.
260
Sotiriou C, Neo S-Y, McShane LM, et al. Breast cancer classification and prognosis based on gene expression profiles from a population-
based study. Proc Natl Acad Sci. 2003; 100:10393-10398.
261
Wigle DA, Tsao M, Jurisica I. Making sense of lung-cancer gene-expression profiles. Genome Biology. 2004;5:309-312.
262
Van de Vijver JC, Yudong DH, et al. A gene expression signature as a predictor of survival in breast cancer. NEJM. 2002; 25:1999-2009.
263
Tibshirani RJ, Efron B. Pre-validation and inference in microarrays. Statistical Applications in Genetics and Molecular Biology. 2002; 1:
Article 1.
264
Masayesva BG, Ha P, Garrett-Mayer E, et al. Gene expression alterations over large chromosomal regions in cancers include multiple genes
unrelated to malignant progression. Proc National Acad Sci (USA). Online May 20, 2004, 10.1073/pnas.0400027101
265
Novoradovskaya N, Whitfield MI, Basehore LS, et al. Universal reference RNA as a standard for microarray experiments. BMC
Genomics. 2004;5:20.
266
External RNA Control Consortium Workshop: Specifications for Universal External RNA Spike-In Controls. National Institute of Standards
and Technology. Gaithersburg, MD, USA. December 2, 2003. http://www.cstl.nist.gov/biotech/workshops/ERCC2003/index.html
(Accessed 4/24/04)
The Quality System Approach

Clinical and Laboratory Standards Institute subscribes to a quality management system approach in the development
of standards and guidelines, which facilitates project management; defines a document structure via a template; and
provides a process to identify needed documents. The approach is based on the model presented in the most current
version of CLSI/NCCLS document HS1—A Quality Management System Model for Health Care. The quality
management system approach applies a core set of “quality system essentials” (QSEs), basic to any organization, to
all operations in any healthcare service’s path of workflow (i.e., operational aspects that define how a particular
product or service is provided). The QSEs provide the framework for delivery of any type of product or service,
serving as a manager’s guide. The quality system essentials (QSEs) are:
Documents & Records
Organization
Personnel
Equipment
Purchasing & Inventory
Process Control
Information Management
Occurrence Management
Assessment
Process Improvement
Service & Satisfaction
Facilities & Safety
MM12-P addresses the quality system essentials (QSEs) indicated by an “X.” For a description of the other Clinical
and Laboratory Standards Institute documents listed in the grid, please refer to the Related CLSI/NCCLS
Publications section on the following page.
Grid columns: Documents & Records; Organization; Personnel; Equipment; Purchasing & Inventory; Process Control; Information Management; Occurrence Management; Assessment; Process Improvement; Service & Satisfaction; Facilities & Safety
Grid entries (column positions not recoverable): X; GP29; C24; EP10; MM14
Adapted from CLSI/NCCLS document HS1—A Quality Management System Model for Health Care.
Path of Workflow
A path of workflow is the description of the necessary steps to deliver the particular product or service that the
organization or entity provides. For example, CLSI/NCCLS document GP26⎯Application of a Quality
Management System Model for Laboratory Services defines a clinical laboratory path of workflow which consists of
three sequential processes: preexamination, examination, and postexamination. All clinical laboratories follow these
processes to deliver the laboratory’s services, namely quality laboratory information.
MM12-P addresses the clinical laboratory path of workflow steps indicated by an “X.” For a description of the other
Clinical and Laboratory Standards Institute documents listed in the grid, please refer to the Related CLSI/NCCLS
Publications section on the following page.
Preexamination: Examination ordering; Sample collection; Sample transport; Sample receipt/processing
Examination: Examination; Results review and follow-up; Interpretation
Postexamination: Results reporting and archiving; Sample management
X X X X X X X
MM6 MM1 MM1 MM1 MM1 MM1 MM6 MM1
MM2 MM3 MM2 MM2 MM2 MM3
MM5 MM6 MM3 MM3 MM3 MM5
MM6 MM5 MM5 MM5 MM6
MM6 MM6 MM6
Adapted from CLSI/NCCLS document HS1—A Quality Management System Model for Health Care.
Related CLSI/NCCLS Publications

C24-A2 Statistical Quality Control for Quantitative Measurements: Principles and Definitions; Approved
Guideline – Second Edition (1999). This guideline provides definitions of analytical intervals; plans for
quality control procedures; and guidance for quality control applications.
EP10-A2 Preliminary Evaluation of Quantitative Clinical Laboratory Methods; Approved Guideline - Second
Edition (2002). This guideline addresses experimental design and data analysis for preliminary evaluation of
the performance of an analytical method or device.
GP29-A Assessment of Laboratory Tests When Proficiency Testing is Not Available; Approved Guideline (2002).
This guideline suggests workable alternatives for evaluating the accuracy of an assay when standard
interlaboratory comparison programs are unavailable.
MM1-A Molecular Diagnostic Methods for Genetic Diseases; Approved Guideline (2000). This document
provides guidance for the use of molecular biologic techniques for clinical detection of heritable mutations
associated with genetic disease.
MM2-A2 Immunoglobulin and T-Cell Receptor Gene Rearrangement Assays; Approved Guideline (2002). This
document is a guideline for conducting molecular tests of immunoglobulin and T-cell receptor gene
rearrangements.
MM3-A Molecular Diagnostic Methods for Infectious Diseases; Approved Guideline (1995). This document
includes guidelines for use of nucleic acid probes and nucleic acid amplification techniques for detection of
target sequences specific to particular microorganisms; limitations; quality assurance; proficiency testing; and
interpretation of results.
MM5-A Nucleic Acid Amplification Assays for Molecular Hematopathology; Approved Guideline (2003). This
guideline addresses the performance and application of assays for gene translocations by both PCR and RT-
PCR techniques and includes information on specimen collection, sample preparation, test reporting, test
validation, and quality assurance.
MM6-A Quantitative Molecular Methods for Infectious Diseases; Approved Guideline (2003). This document
provides guidance for the development and use of quantitative molecular methods, such as nucleic acid
probes and nucleic acid amplification techniques of the target sequences specific to particular
microorganisms. It also presents recommendations for quality assurance, proficiency testing, and
interpretation of results.
MM14-P Quality Assessment Programs for Molecular Methods; Proposed Guideline. This document provides
guidelines for a quality proficiency testing program including reliable databases; design control in the choice
of materials and analytes; good manufacturing processes; documentation procedures; complaint handling;
corrective and preventive action plans; and responsive timing of reports.
Active Membership
(as of 1 July 2005)
Sustaining Members FDA Center for Devices and LifeScan, Inc. (a Johnson & Johnson California Pacific Medical Center
Radiological Health Company) Cambridge Memorial Hospital
Abbott Laboratories FDA Center for Veterinary Medicine Medical Device Consultants, Inc. (Cambridge, ON, Canada)
American Association for Clinical FDA Division of Anti-Infective Merck & Company, Inc. Canterbury Health Laboratories
Chemistry Drug Products Micromyx, LLC (New Zealand)
Bayer Corporation Iowa State Hygienic Laboratory National Pathology Accreditation Cape Breton Healthcare Complex
BD Maryland Dept. of Health & Mental Advisory Council (Australia) (Nova Scotia, Canada)
Beckman Coulter, Inc. Hygiene Nippon Becton Dickinson Co., Ltd. Carilion Consolidated Laboratory
bioMérieux, Inc. Massachusetts Department of Public Nissui Pharmaceutical Co., Ltd. (VA)
CLMA Health Laboratories Novartis Institutes for Biomedical Carolinas Medical Center (NC)
College of American Pathologists National Center of Infectious and Research Cathay General Hospital (Taiwan)
GlaxoSmithKline Parasitic Diseases (Bulgaria) Olympus America, Inc. Central Laboratory for Veterinarians
Ortho-Clinical Diagnostics, Inc. National Health Laboratory Service Optimer Pharmaceuticals, Inc. (BC, Canada)
Pfizer Inc (South Africa) Ortho-Clinical Diagnostics, Inc. Central Ohio Primary Care
Roche Diagnostics, Inc. National Institute of Standards and (Rochester, NY) Physicians
Technology Ortho-McNeil Pharmaceutical Central Texas Veterans Health Care
Professional Members National Pathology Accreditation (Raritan, NJ) System
Advisory Council (Australia) Oxoid Inc. Centro Diagnostico Italiano (Milano,
American Academy of Family New York State Department of Paratek Pharmaceuticals Italy)
Physicians Health Pfizer Animal Health Chang Gung Memorial Hospital
American Association for Clinical Ontario Ministry of Health Pfizer Inc (Taiwan)
Chemistry Pennsylvania Dept. of Health Powers Consulting Services Children’s Healthcare of Atlanta
American Association for Saskatchewan Health-Provincial Predicant Biosciences (GA)
Respiratory Care Laboratory Procter & Gamble Pharmaceuticals, Children’s Hospital (NE)
American Chemical Society Scientific Institute of Public Health; Inc. Children’s Hospital Central
American Medical Technologists Belgium Ministry of Social QSE Consulting California
American Society for Clinical Affairs, Public Health and the Radiometer America, Inc. Children’s Hospital & Clinics (MN)
Laboratory Science Environment Radiometer Medical A/S Childrens Hospital of Wisconsin
American Society for Microbiology Reliance Life Sciences Children’s Hospital Medical Center
American Society of Hematology Industry Members Replidyne (Akron, OH)
American Type Culture Collection, Roche Diagnostics GmbH Chinese Association of Advanced
Inc. AB Biodisk Roche Diagnostics, Inc. Blood Bankers (Beijing)
Assn. of Public Health Laboratories Abbott Diabetes Care Roche Diagnostics Shanghai Ltd. Christus St. John Hospital (TX)
Assoc. Micro. Clinici Italiani- Abbott Laboratories Roche Laboratories (Div. Hoffmann- City of Hope National Medical
A.M.C.L.I. Acrometrix Corporation La Roche Inc.) Center (CA)
British Society for Antimicrobial Advancis Pharmaceutical Roche Molecular Systems Clarian Health - Methodist Hospital
Chemotherapy Corporation Sanofi Pasteur (IN)
Canadian Society for Medical Affymetrix, Inc. Sarstedt, Inc. CLSI Laboratories (PA)
Laboratory Science - Société Ammirati Regulatory Consulting Schering Corporation Community College of Rhode Island
Canadienne de Science de Anna Longwell, PC Schleicher & Schuell, Inc. Community Hospital of Lancaster
Laboratoire Médical A/S ROSCO SFBC Anapharm (PA)
Canadian Standards Association AstraZeneca Pharmaceuticals Streck Laboratories, Inc. Community Hospital of the
Clinical Laboratory Management Axis-Shield POC AS SYN X Pharma Inc. Monterey Peninsula (CA)
Association Bayer Corporation - Elkhart, IN Sysmex Corporation (Japan) CompuNet Clinical Laboratories
COLA Bayer Corporation - Tarrytown, NY Sysmex Corporation (Long Grove, (OH)
College of American Pathologists Bayer Corporation - West Haven, IL) Covance Central Laboratory
College of Medical Laboratory CT TheraDoc Services (IN)
Technologists of Ontario BD Theravance Inc. Creighton University Medical Center
College of Physicians and Surgeons BD Diabetes Care Thrombodyne, Inc. (NE)
of Saskatchewan BD Diagnostic Systems THYMED GmbH Detroit Health Department (MI)
ESCMID BD VACUTAINER Systems Transasia Engineers DFS/CLIA Certification (NC)
International Council for Beckman Coulter, Inc. Trek Diagnostic Systems, Inc. Diagnostic Accreditation Program
Standardization in Haematology Beckman Coulter K.K. (Japan) Vicuron Pharmaceuticals Inc. (Vancouver, BC, Canada)
International Federation of Bio-Development S.r.l. Wyeth Research Diagnósticos da América S/A
Biomedical Laboratory Science Bio-Inova Life Sciences XDX, Inc. (Brazil)
International Federation of Clinical International YD Consultant Dianon Systems (OK)
Chemistry Biomedia Laboratories SDN BHD YD Diagnostics (Seoul, Korea) Dr. Everett Chalmers Hospital (New
Italian Society of Clinical bioMérieux, Inc. (MO) Brunswick, Canada)
Biochemistry and Clinical Biometrology Consultants Trade Associations Duke University Medical Center
Molecular Biology Bio-Rad Laboratories, Inc. (NC)
Japanese Committee for Clinical Bio-Rad Laboratories, Inc. – France AdvaMed Dwight David Eisenhower Army
Laboratory Standards Bio-Rad Laboratories, Inc. – Plano, Japan Association of Clinical Medical Center (GA)
Joint Commission on Accreditation TX Reagents Industries (Tokyo, Japan) Emory University Hospital (GA)
of Healthcare Organizations Blaine Healthcare Associates, Inc. Enzo Clinical Labs (NY)
National Academy of Clinical Bristol-Myers Squibb Company Associate Active Members Florida Hospital East Orlando
Biochemistry Cepheid Focus Technologies (CA)
National Association of Testing Chen & Chen, LLC 82 MDG/SGSCL (Sheppard Focus Technologies (VA)
Authorities - Australia Chiron Corporation AFB,TX) Foothills Hospital (Calgary, AB,
National Society for The Clinical Microbiology Institute Academisch Ziekenhuis -VUB Canada)
Histotechnology, Inc. Comprehensive Cytometric (Belgium) Franciscan Shared Laboratory (WI)
Ontario Medical Association Quality Consulting ACL Laboratories (WI) Fresno Community Hospital and
Management Program-Laboratory Copan Diagnostics Inc. All Children’s Hospital (FL) Medical Center
Service Cosmetic Ingredient Review Allegheny General Hospital (PA) Gamma Dynacare Medical
RCPA Quality Assurance Programs Cubist Pharmaceuticals Allina Health System (MN) Laboratories (Ontario, Canada)
PTY Limited Cumbre Inc. American University of Beirut Gateway Medical Center (TN)
Sociedad Espanola de Bioquimica Dade Behring Inc. - Cupertino, CA Medical Center (NY) Geisinger Medical Center (PA)
Clinica y Patologia Molecular Dade Behring Inc. - Deerfield, IL Anne Arundel Medical Center (MD) General Health System (LA)
Sociedade Brasileira de Analises Dade Behring Inc. - Glasgow, DE Antwerp University Hospital Guthrie Clinic Laboratories (PA)
Clinicas Dade Behring Inc. - Marburg, (Belgium) Hagerstown Medical Laboratory
Taiwanese Committee for Clinical Germany Arkansas Department of Health (MD)
Laboratory Standards (TCCLS) Dade Behring Inc. - Sacramento, CA Associated Regional & University Harris Methodist Fort Worth (TX)
Turkish Society of Microbiology David G. Rhoads Associates, Inc. Pathologists (UT) Hartford Hospital (CT)
Diagnostic Products Corporation Atlantic Health System (NJ) Headwaters Health Authority
Government Members Digene Corporation AZ Sint-Jan (Belgium) (Alberta, Canada)
Eiken Chemical Company, Ltd. Azienda Ospedale Di Lecco (Italy) Health Network Lab (PA)
Armed Forces Institute of Pathology Elanco Animal Health Barnes-Jewish Hospital (MO) Health Partners Laboratories (VA)
Association of Public Health Electa Lab s.r.l. Baxter Regional Medical Center High Desert Health System (CA)
Laboratories Enterprise Analysis Corporation (AR) Highlands Regional Medical Center
BC Centre for Disease Control F. Hoffman-La Roche AG Baystate Medical Center (MA) (FL)
Caribbean Epidemiology Centre Gavron Group, Inc. Bbaguas Duzen Laboratories Hoag Memorial Hospital
Centers for Disease Control and Gen-Probe (Turkey) Presbyterian (CA)
Prevention Genzyme Diagnostics BC Biomedical Laboratories (Surrey, Holy Cross Hospital (MD)
Centers for Medicare & Medicaid GlaxoSmithKline BC, Canada) Hôpital Maisonneuve - Rosemont
Services Greiner Bio-One Inc. Bo Ali Hospital (Iran) (Montreal, Canada)
Centers for Medicare & Medicaid Immunicon Corporation Bon Secours Hospital (Ireland) Hôpital Saint-Luc (Montreal,
Services/CLIA Program Instrumentation Laboratory Brazosport Memorial Hospital (TX) Quebec, Canada)
Chinese Committee for Clinical International Technidyne Broward General Medical Center Hospital Consolidated Laboratories
Laboratory Standards Corporation (FL) (MI)
Commonwealth of Pennsylvania I-STAT Corporation Cadham Provincial Laboratory Hospital de Sousa Martins (Portugal)
Bureau of Laboratories Johnson and Johnson Pharmaceutical (Winnipeg, MB, Canada) Hospital for Sick Children (Toronto,
Department of Veterans Affairs Research and Development, L.L.C. Calgary Laboratory Services ON, Canada)
Deutsches Institut für Normung K.C.J. Enterprises (Calgary, AB, Canada) Hotel Dieu Grace Hospital (Windsor,
(DIN) LabNow, Inc. ON, Canada)
Huddinge University Hospital Medical University of South Rhode Island Department of Health The Permanente Medical Group
(Sweden) Carolina Laboratories (CA)
Humility of Mary Health Partners Memorial Medical Center Robert Wood Johnson University Touro Infirmary (LA)
(OH) (Napoleon Avenue, New Orleans, Hospital (NJ) Tri-Cities Laboratory (WA)
Hunter Area Health Service LA) Sahlgrenska Universitetssjukhuset Tripler Army Medical Center (HI)
(Australia) Methodist Hospital (Houston, TX) (Sweden) Truman Medical Center (MO)
Hunterdon Medical Center (NJ) Methodist Hospital (San Antonio, St. Alexius Medical Center (ND) Tuen Mun Hospital (Hong Kong)
Indiana University TX) St. Agnes Healthcare (MD) UCLA Medical Center (CA)
Innova Fairfax Hospital (VA) Middlesex Hospital (CT) St. Anthony Hospital (CO) UCSF Medical Center (CA)
Institute of Medical and Veterinary Montreal Children’s Hospital St. Anthony’s Hospital (FL) UNC Hospitals (NC)
Science (Australia) (Canada) St. Barnabas Medical Center (NJ) Unidad de Patologia Clinica
International Health Management Montreal General Hospital (Canada) St. Christopher’s Hospital for (Mexico)
Associates, Inc. (IL) National Healthcare Group Children (PA) Union Clinical Laboratory (Taiwan)
Jackson Health System (FL) (Singapore) St-Eustache Hospital (Quebec, United Laboratories Company
Jacobi Medical Center (NY) National Serology Reference Canada) (Kuwait)
John H. Stroger, Jr. Hospital of Cook Laboratory (Australia) St. John Hospital and Medical Universita Campus Bio-Medico
County (IL) NB Department of Health & Center (MI) (Italy)
Johns Hopkins Medical Institutions Wellness (New Brunswick, St. John Regional Hospital University of Chicago Hospitals
(MD) Canada) (St. John, NB, Canada) (IL)
Kaiser Permanente (MD) The Nebraska Medical Center St. John’s Hospital & Health Center University of Colorado Hospital
Kantonsspital (Switzerland) Nevada Cancer Institute (CA) University of Debrecen Medical
Kimball Medical Center (NJ) New Britain General Hospital (CT) St. Joseph’s Hospital – Marshfield Health and Science Center
King Abdulaziz Medical City – New England Fertility Institute (CT) Clinic (WI) (Hungary)
Jeddah (Saudi Arabia) New York City Department of St. Jude Children’s Research University of Maryland Medical
King Faisal Specialist Hospital Health & Mental Hygiene Hospital (TN) System
(Saudi Arabia) NorDx (ME) St. Mary Medical Center (CA) University of Medicine & Dentistry,
LabCorp (NC) North Carolina State Laboratory of St. Mary of the Plains Hospital NJ University Hospital
Laboratoire de Santé Publique du Public Health (TX) University of MN Medical Center -
Quebec (Canada) North Central Medical Center (TX) St. Michael’s Hospital (Toronto, Fairview
Laboratorio Dr. Echevarne (Spain) North Coast Clinical Laboratory ON, Canada) University of the Ryukyus (Japan)
Laboratório Fleury S/C Ltda. (OH) St. Vincent’s University Hospital The University of the West Indies
(Brazil) North Shore - Long Island Jewish (Ireland) University of Virginia Medical
Laboratorio Manlab (Argentina) Health System Laboratories (NY) Ste. Justine Hospital (Montreal, PQ, Center
Laboratory Corporation of America North Shore University Hospital Canada) University of Washington
(NJ) (NY) San Francisco General Hospital US LABS, Inc. (CA)
Lakeland Regional Medical Center Northern Plains Laboratory (ND) (CA) USA MEDDAC-AK
(FL) Northwestern Memorial Hospital Santa Clara Valley Medical Center UZ-KUL Medical Center (Belgium)
Lawrence General Hospital (MA) (IL) (CA) VA (Tuskegee) Medical Center
Lewis-Gale Medical Center (VA) Ochsner Clinic Foundation (LA) Seoul Nat’l University Hospital (AL)
L'Hotel-Dieu de Quebec (Canada) Onze Lieve Vrouw Ziekenhuis (Korea) Virginia Beach General Hospital
Libero Instituto Univ. Campus (Belgium) Shands at the University of Florida (VA)
BioMedico (Italy) Orlando Regional Healthcare System South Bend Medical Foundation Virginia Department of Health
Lindy Boggs Medical Center (LA) (FL) (IN) Washington Adventist Hospital
Loma Linda Mercantile (CA) Ospedali Riuniti (Italy) South Western Area Pathology (MD)
Long Beach Memorial Medical The Ottawa Hospital Service (Australia) Washington State Public Health
Center (CA) (Ottawa, ON, Canada) Southern Maine Medical Center Laboratory
Long Island Jewish Medical Center Our Lady of the Resurrection Specialty Laboratories, Inc. (CA) Washoe Medical Center
(NY) Medical Center (IL) State of Connecticut Dept. of Public Laboratory (NV)
Los Angeles County Public Health Pathology and Cytology Health Wellstar Health Systems (GA)
Lab (CA) Laboratories, Inc. (KY) State of Washington Department of West China Second University
Maimonides Medical Center (NY) Pathology Associates Medical Health Hospital, Sichuan University (P.R.
Marion County Health Department Laboratories (WA) Stony Brook University Hospital China)
(IN) Phoenix College (AZ) (NY) West Jefferson Medical Center (LA)
Martin Luther King/Drew Medical Piedmont Hospital (GA) Stormont-Vail Regional Medical Wilford Hall Medical Center (TX)
Center (CA) Presbyterian Hospital of Dallas (TX) Center (KS) William Beaumont Army Medical
Massachusetts General Hospital Providence Health Care (Vancouver, Sun Health-Boswell Hospital (AZ) Center (TX)
(Microbiology Laboratory) BC, Canada) Sunnybrook Health Science Center William Beaumont Hospital (MI)
MDS Metro Laboratory Services Provincial Laboratory for Public (ON, Canada) Winn Army Community Hospital
(Burnaby, BC, Canada) Health (Edmonton, AB, Canada) Sunrise Hospital and Medical Center (GA)
Medical Centre Ljubljana (Slovinia) Quest Diagnostics Incorporated (NV) Winnipeg Regional Health
Medical College of Virginia (CA) Swedish Medical Center - Authority (Winnipeg, Canada)
Hospital Quintiles Laboratories, Ltd. (GA) Providence Campus (WA) York Hospital (PA)
Medical Research Laboratories Regional Health Authority Four Tenet Odessa Regional Hospital
International (KY) (NB, Canada) (TX)
Regions Hospital The Children’s University Hospital
Rex Healthcare (NC) (Ireland)
OFFICERS

Thomas L. Hearn, PhD, President, Centers for Disease Control and Prevention
Robert L. Habig, PhD, President Elect, Abbott Laboratories
Wayne Brinster, Secretary, BD
Gerald A. Hoeltge, MD, Treasurer, The Cleveland Clinic Foundation
Donna M. Meyer, PhD, Immediate Past President, CHRISTUS Health
Glen Fine, MS, MBA, Executive Vice President

BOARD OF DIRECTORS

Susan Blonshine, RRT, RPFT, FAARC, TechEd
Maria Carballo, Health Canada
Kurt H. Davis, FCSMLS, CAE, Canadian Society for Medical Laboratory Science
Russel K. Enns, PhD, Cepheid
Mary Lou Gantzer, PhD, Dade Behring Inc.
Lillian J. Gill, DPA, FDA Center for Devices and Radiological Health
J. Stephen Kroger, MD, MACP, COLA
Jeannie Miller, RN, MPH, Centers for Medicare & Medicaid Services
Gary L. Myers, PhD, Centers for Disease Control and Prevention
Klaus E. Stinshoff, Dr.rer.nat., Digene (Switzerland) Sàrl
James A. Thomas, ASTM International
Kiyoaki Watanabe, MD, Keio University School of Medicine
940 West Valley Road, Suite 1400, Wayne, PA 19087 USA; PHONE 610.688.0100;
FAX 610.688.0700; E-MAIL: exoffice@clsi.org; WEBSITE: www.clsi.org; ISBN 1-56238-578-X

