
Int. J. Cancer: 000, 000–000 (2009) © 2009 UICC

Implementation of a novel microarray-based diagnostic test for cancer of unknown primary


Ryan K. van Laar¹, Xiao-Jun Ma², Daphne de Jong³, Diederik Wehkamp¹, Arno N. Floore¹, Marc O. Warmoes¹, Iris Simon¹, Wilson Wang², Mark Erlander², Laura J. van 't Veer¹˒³ and Annuska M. Glas¹*

¹Agendia BV, SciencePark 406, 1098 XH Amsterdam, The Netherlands
²BioTheranostics, San Diego, CA
³Department of Pathology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
Patients with carcinoma of unknown primary (CUP) present with metastatic disease for which the primary site cannot be found, despite extensive standard investigation. Here, we describe the development and implementation of the first clinically available microarray-based test for this cancer type (CUPPrint), based on 633 individual tumors representing 30 carcinoma and 17 non-carcinoma classes. Tissue of origin prediction for either fresh frozen or paraffin-embedded tumor samples is achieved with a custom 8-pack 1.9k microarray and a robust classification algorithm. An expression profile of 495 genes was used to predict tumor origin by applying a k-nearest neighbor algorithm. Internal cross-validation and analysis of an independent, previously published, 229-sample dataset revealed that clinically informative predictions were made for up to 94% of samples analyzed. Analysis of 13 previously published CUP specimens yielded predicted tumor origins that supported the clinical suspicion in 12 cases (92%). Microarray profiling presents a promising tool to assist in the identification of the primary tumor and might direct a more tailored treatment for CUP patients. © 2009 UICC

Key words: cancer of unknown primary; microarray; molecular diagnostics; gene expression profiling; tumor of unknown origin

Approximately 3–5% of all cancer diagnoses fall into the category of cancer of unknown primary (CUP), indicating the difficulty of identifying the origin of metastatic tumors using currently available workup procedures.1,2 As the number of tissue-specific targeted therapies expands, accurate diagnosis of tumor origin becomes increasingly important to maximize each patient's chance of receiving the optimal treatment.3–7 The current standard of diagnostic effort for CUP employs extensive imaging, serum tests and pathological evaluation, including immunohistochemistry (IHC) with a growing panel of antibodies of different tumor specificity.1,8 However, the success rate of this approach is still limited to 20–30%.9–11 To date, most diagnostic protocols rely primarily on microscopy, single gene or protein biomarkers (IHC) and imaging techniques, such as MRI and PET scans.2,8,9,12 Unfortunately, these techniques all have limitations and may not provide an adequate answer for widely metastasized tumors, poorly differentiated malignancies, rare subtypes or unusual presentations of common cancers.13,14 It has been hypothesized that the information gained from gene expression profiling can be used in conjunction with existing diagnostic approaches, helping to confirm or refine the predicted tumor origin in a focused and efficient manner. Several studies using expression microarrays have demonstrated that the expression levels of thousands of genes can be used as a molecular fingerprint to classify a multitude of tumor types.12,15–21 Using support vector machine algorithms, Ramaswamy et al.17 achieved 78% accuracy in classifying 10 carcinoma and 4 non-carcinoma types, whereas Tothill et al.20 achieved 89% with 12 classes of carcinoma and 1 of non-carcinoma. Bloom et al.15 extended the coverage of tumor types to 10 carcinoma and 11 non-carcinoma classes by combining multiple datasets and developing a neural network classifier with 85% accuracy.
Publication of the International Union Against Cancer

Tissue specificity of microRNA expression has also been demonstrated recently by Rosenfeld et al.,22 who reported a predictive accuracy of 89% for 16 carcinoma and 6 non-carcinoma classes. Tothill et al.,20 Varadhachary et al.23 and Ma et al.24 have described polymerase chain reaction-based CUP assays that use FFPE tissue and classify samples into one of 5, 6 and 39 classes, respectively. Although previous studies have demonstrated that gene expression microarrays hold great promise as a powerful tool for cancer diagnosis, they have surveyed at most 22 tumor types. We have established a comprehensive gene expression database of human tumors and developed analytical methods and quality control systems that are suitable for routine clinical use. Our gene expression database was generated from both fresh frozen and paraffin-fixed malignancies, representing 30 diverse classes of carcinomas and 17 classes of non-carcinomas.24 To enable the translation of our database and classifier into a clinically available test, we describe the development and validation of a novel 8-pack microarray for routine diagnostic analysis of CUP using FFPE tissue. A similar platform was recently described as a device for gene expression-based breast cancer prognosis,25 which was the first of a new class of in vitro diagnostic multivariate index assays to be cleared by the US Food and Drug Administration (FDA).26 A 15-class classifier for CUP developed by Pathwork Diagnostics (Sunnyvale, CA) was also recently cleared by the FDA.
The reproducibility of that assay, which requires fresh tissue, was described by Dumur et al.21 We also present a number of in silico validations of our classifier, including internal cross-validation and analysis of an independent 229-sample cDNA microarray dataset representing 13 broad tumor classes, together with 13 individual CUP specimens with a well-described clinical and pathological history.20 The clinical utility of the specific classifier described herein to aid in the management of CUP, defined as the lack of a definitive diagnosis after standard diagnostic workup procedures, has been comprehensively evaluated by several recent studies.27–29

Additional Supporting Information may be found in the online version of this article.
Ryan K. van Laar's current address is: Regeneron Pharmaceuticals Inc, Tarrytown, NY.
Marc O. Warmoes's current address is: VU University Medical Center, Amsterdam, The Netherlands.
The first two authors contributed equally to this work.
*Correspondence to: Agendia BV, SciencePark 406, 1098 XH Amsterdam, The Netherlands. Fax: +31-20-4621505. E-mail: annuska.glas@agendia.com
Received 27 August 2008; Accepted after revision 20 March 2009
DOI 10.1002/ijc.24504
Published online 14 April 2009 in Wiley InterScience (www.interscience.wiley.com).

Materials and methods

Tumor samples

The acquisition, ethical board approval and quality control of the primary and metastatic tumors used in the development and validation of the gene expression database have been described previously.24 Briefly, this database consists of 497 fresh frozen and 146 formalin-fixed paraffin-embedded (FFPE) tumor samples (n = 643), which are annotated into 47 distinct tumor classes on the basis of tissue origin and histology. All cases were independently reviewed by 2 pathologists, and a secure database was created to store relevant clinical and pathological data.

RNA isolation, cRNA labeling and microarray hybridization

RNA isolation from fresh frozen and FFPE tumor samples was performed as described previously.24 Cy5-labeled sample cRNAs from the 643 fresh frozen and formalin-fixed paraffin-embedded tumor samples were hybridized on custom-designed 22k oligonucleotide microarrays against Cy3-labeled human tumor cell line reference RNA, as previously described.24 Fluorescence intensities on scanned images were quantified, corrected for background nonspecific hybridization and normalized using Agilent Feature Extraction software version 7.5.1 (Agilent Technologies). Data were further analyzed using R (http://www.r-project.org/) and Matlab version 7.1 (The Mathworks, Natick, MA).

Design and validation of the 1.9k 8-pack microarray

The datasets and workflow used during the development of the CUPPrint microarray are outlined in Figure 1. A novel algorithm called CorTrim (see Supporting Information Materials and Methods) was used with an intergene correlation cutoff of 0.35 to identify a subset of 1,545 differentially expressed genes. For this analysis, 400 (80%) of the frozen tumor samples profiled on the 22,000-gene (22k) microarray that were available at the time were used. The resulting gene set was combined with 355 control probes to create a custom 1,900-gene (1.9k) oligonucleotide 8-pack microarray (Agilent Technologies, Santa Clara, CA).
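CorTrim itself is specified only in the Supporting Information, so the sketch below is not CorTrim. It is a generic greedy redundancy filter that illustrates what an "intergene correlation cutoff" does, under two assumptions: genes are visited in some priority order (the hypothetical `ranking` argument, e.g. by variance), and two genes count as redundant when their absolute Pearson correlation across samples reaches the cutoff (0.35 here, as in the text).

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def prune_redundant_genes(
    expr: Dict[str, List[float]],
    ranking: List[str],
    cutoff: float = 0.35,
) -> List[str]:
    """Greedy redundancy filter (hypothetical stand-in for CorTrim).

    Walks genes in priority order and keeps a gene only if its absolute
    correlation with every gene already kept is below `cutoff`.
    """
    kept: List[str] = []
    for gene in ranking:
        if all(abs(pearson(expr[gene], expr[k])) < cutoff for k in kept):
            kept.append(gene)
    return kept


if __name__ == "__main__":
    toy = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.1],   # nearly collinear with g1 -> redundant
        "g3": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with g1 -> kept
    }
    print(prune_redundant_genes(toy, ["g1", "g2", "g3"]))  # ['g1', 'g3']
```

A greedy pass like this depends on the visiting order: whichever gene of a correlated pair is seen first survives. This is one reason a real selection procedure would fix the ranking criterion explicitly.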
Because the 22k and 1.9k microarrays were manufactured using the same technology and share the same probe sequences for the 1,545 genes, data generated on these 2 platforms were expected to be highly concordant (the mean correlation coefficient over the 1,545 genes for a series of 28 samples hybridized on both arrays was 0.95; data not shown). Therefore, a final step was performed to prepare the training set for future analysis of test samples profiled on the 1.9k microarray: after hybridization of the remaining 97 frozen and 146 FFPE tissues, raw data for the 1,545 genes were extracted from the 643 22k microarray profiles and renormalized to this common subset.

Classifier optimization

Following creation of the custom 8-pack microarray, a supervised gene selection process (Linear Models for Microarray Data; LIMMA), implemented in R, was used to identify the subset of the 1,545 genes most suitable for tumor origin prediction.30 This step used a subset (80%) of the frozen tumor samples, allowing the selected genes to be independently evaluated in the remaining 20% of the frozen samples plus the 146 FFPE tumor samples (Fig. 1). The k-nearest neighbor (kNN) algorithm was selected to predict the origin of test specimens based on their molecular similarity to tumors in our classification database.31 By adjusting the number (k) of samples used to identify a test sample, the algorithm can be tuned to achieve optimal performance. Multiple rounds of leave-one-out cross-validation (LOOCV), with values of k from 1 to 15, were performed to create a classifier with the lowest possible misclassification rate (Supporting Information Fig. 2). To obtain a single prediction of tumor origin when multiple tumor types are present among the k nearest neighbors of a given test sample, the tumor type with the largest summed weight, $\sum 1/\theta$, is selected, where $\theta$ is the kNN vector angle between the test sample and each neighbor of that type. Because of the complexity of CUP specimens, in routine application of the classifier the identities of the 5 neighbors with the largest weights are reported in addition to the specific tumor type identified by this calculation.
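The weighted vote and the k sweep described above can be sketched as follows. This is a minimal illustration, not the CUPPrint implementation. It assumes that expression profiles are plain lists of log-ratios and that the "kNN vector angle" is the angle between two profile vectors (so the smallest angle means the most similar tumor). The function names, the `eps` guard against a zero angle, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (expression profile, tumor class)


def vector_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle (radians) between two expression profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return math.pi / 2  # treat an all-zero profile as orthogonal
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp rounding noise


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 5,
                eps: float = 1e-9) -> Tuple[str, List[Tuple[str, float]]]:
    """Return the class with the largest summed 1/theta among the k nearest
    neighbours, plus those neighbours as (class, angle) pairs."""
    ranked = sorted((vector_angle(x, query), label) for x, label in train)
    neighbours = ranked[:k]
    weight: Dict[str, float] = defaultdict(float)
    for theta, label in neighbours:
        weight[label] += 1.0 / (theta + eps)  # eps avoids division by zero
    best = max(weight, key=weight.get)
    return best, [(label, theta) for theta, label in neighbours]


def loocv_accuracy(data: List[Sample], k: int) -> float:
    """Leave-one-out accuracy: each sample is predicted from all others."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        pred, _ = knn_predict(rest, x, k)
        hits += pred == label
    return hits / len(data)


def choose_k(data: List[Sample], ks=range(1, 16)) -> int:
    """Pick the k with the highest LOOCV accuracy (smallest k on ties)."""
    return max(ks, key=lambda k: (loocv_accuracy(data, k), -k))
```

Weighting each neighbor by the inverse of its angle lets one very close neighbor outvote several distant ones. This is why a single "winning" class can still be reported when the k neighbors span several tumor types. The neighbor list returned alongside it corresponds to the top-5 report described in the text.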
Independent assessment of classifier accuracy

Independent in silico validation was performed using a 229-sample dataset representing 13 tumor classes (10.5k custom cDNA microarrays), downloaded from EBI ArrayExpress (AEXP-113).20 This dataset contains both primary and metastatic tumors of known origin. The accuracy of our kNN classifier on patients diagnosed with CUP was evaluated using the 13 CUP samples also present in this dataset. Extensive clinical and pathological data, provided in the Supporting Information of the original publication, were used to evaluate our predictions.

Results

Gene selection and algorithm development for the diagnostic 8-pack microarray

We have previously described the development of a whole-genome tumor gene expression database representing a wide spectrum of tissue origins and histological subtypes.24 This database comprises 497 fresh frozen and 146 formalin-fixed paraffin-embedded tumors of known primary or metastatic origin, profiled on 22k oligonucleotide arrays. Here, we describe the distillation of this dataset and the creation of a custom 1.9k 8-pack microarray suitable for routine clinical application (see Fig. 1 for workflow). Using CorTrim, an intergene correlation threshold of 0.35 was determined to give the least redundant subset of genes from the 22k dataset that could be printed on the 1.9k microarray format. This threshold corresponded to a selection of 1,545 genes (see Supporting Information). An interim cross-validation analysis, using all tumor classes with 5 or more frozen specimens, showed that a kNN classifier trained on these 1,545 genes performed stably across a range of k values, with accuracies ranging from 82% to 85% (k = 5) (see Supporting Information Fig. 2). Supervised gene selection (limma),30 used to select the differentially expressed genes with the highest predictive ability, identified a subset of 495 genes, which achieved a maximum training-set LOOCV accuracy of 85% (Fig. 1 and Supporting Information). Comparison of kNN results using either the 1,545 (CorTrim) or the 495 (CorTrim + limma) gene set revealed no difference in performance when evaluated on data from frozen tissues. Use of the optimized 495-gene subset did, however, result in a 3% increase in performance for FFPE tissues (1,545 genes: 80%; 495 genes: 83%) (see Supporting Information Table 1). To preserve as much information from the 22k dataset as possible, the full 1,545-gene set was combined with 355 Agilent control probes to create the 1.9k 8-pack microarray. Validation experiments showed that data from the 8-pack microarray yielded classification accuracy equivalent to that of the original 22k microarrays (see Supporting Information). We therefore effectively distilled the 643-sample 22k microarray dataset to a high-throughput 1.9k 8-pack platform, with negligible loss of tumor-discriminating ability.

Finally, we conducted an additional pathology review of the 643 tumors, focusing on those frequently misclassified by LOOCV during algorithm development. Tumor class names were updated from those originally published to reflect standard medical nomenclature. Ten samples were excluded because of poor annotation or doubts about the accuracy of the original diagnosis. The final database consisted of 633 individual samples, representing 47 tumor classes (excluding adenocarcinoma of the ovary (unknown subtype)).

FIGURE 1 Datasets and workflow. (a) Construction of the 8-pack microarray from the original 22k microarray dataset. (b) Optimization and performance evaluation of the 495-gene classifier. (c) Independent in silico evaluation. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Functional analysis of differentially expressed genes

Using the DAVID 2007 functional annotation tool (http://david.abcc.ncifcrf.gov/), 1,382 DAVID IDs were identified within the genes selected for the 1.9k microarray. A diverse group of gene families is represented in the genes used by the classifier described in this study. Among the most significantly enriched gene ontologies are those involved in specific tissue functions, including hormonal regulation and organ-specific constituents (cytoskeleton-related genes and characteristic excreted cell products). Genes described in the oncogenic WNT and Hedgehog signaling pathways were also significantly overrepresented, reflecting their differential involvement in a range of tumor classes. Full ontology and pathway analysis results are shown in Supporting Information Table 2. When comparing the individual genes in our 495-gene set with those used by Tothill et al., only 3 genes were shared by both classifiers (ESR1, DLX5 and MSX1). Despite this, comparative gene ontology analysis (FatiGO; www.fatigo.org) of the 2 gene sets revealed no difference in gene ontology (GO) category, Biocarta or KEGG pathway representation at the 0.05 significance level. To visualize the complexity of the molecular variation present in the database and the relationships between individual tumor types, hierarchical clustering was performed on the final 633-sample, 495-gene dataset, as shown in Figure 2. Clustering of samples into disease types as described by the National Cancer Institute Thesaurus (http://bioportal.nci.nih.gov/) is apparent. Also notable is the higher-level clustering of carcinoma and non-carcinoma categories.

FIGURE 2 Hierarchical clustering of the final 495-gene (horizontal axis), 633-sample (vertical axis) database used for prediction of CUP origin. Co-clustering of samples into broad anatomical origins (number of NCI disease types = 11) and into carcinoma vs. non-carcinoma classes is evident. Specific tumour types are indicated along the lower edge of the heatmap. See Supplementary Information for a higher resolution version of this figure.
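A FatiGO-style comparison of two gene sets asks, for each GO term or pathway, whether the term is annotated to a different proportion of genes in one list than in the other. The text does not name the statistic. The sketch below assumes a two-sided Fisher's exact test on the per-term 2 × 2 table, a common choice for this kind of comparison. It omits the multiple-testing correction that a real term-by-term comparison would need.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: gene set 1 / gene set 2; columns: annotated with the term / not.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one count
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

As a hypothetical example, a term annotated to 12 of 495 genes in one list and 5 of 100 in the other would be tested as `fisher_exact_two_sided(12, 483, 5, 95)`. A p-value above 0.05 for every term would correspond to the "no difference" finding reported above.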

Classifier cross-validation

After establishing the final dataset (n = 633) and classifier (kNN, k = 5), we performed LOOCV on the entire dataset to assess the 495-gene classifier. From this, we determined the sensitivity, specificity, and positive and negative predictive values of each tumor type (Table I). As summarized in Tables II and III, 94% of samples analyzed in this final round of cross-validation were predicted as either: (i) the correct tumor type; (ii) a related subtype or otherwise clinically useful result; or (iii) a list of no more than 5 possibilities. Clinicians can interpret the results generated by this classifier in multiple ways, depending on a number of factors specific to each patient and on the complementary investigative procedures available. These may include considering each tumor type present in the top 5 neighbors as a possible origin, or evaluating the general anatomical theme present (e.g., gastrointestinal or gynecological) when the result contains multiple different, yet related, tumor classes.27 To reflect this, we have devised a system to evaluate the results of CUP classifiers, shown in Tables II and III.

TABLE I CLASS SENSITIVITY, SPECIFICITY, NEGATIVE AND POSITIVE PREDICTIVE VALUES OF EACH TUMOR TYPE PRESENT IN THE FINAL 633-SAMPLE DATABASE USED FOR ROUTINE CLINICAL ANALYSIS OF CUP (CUPPRINT)

Class | Class size | Sensitivity (%) | Specificity (%) | Positive predictive value (%) | Negative predictive value (%)
Adenocarcinoma of the bile duct | 1 | n/a | n/a | n/a | n/a
Adenocarcinoma of the breast | 53 | 96 | 100 | 100 | 100
Adenocarcinoma of the cervix | 8 | 38 | 100 | 100 | 99
Adenocarcinoma of the colon/rectum | 38 | 97 | 99 | 84 | 100
Adenocarcinoma of the endometrium | 23 | 74 | 100 | 89 | 99
Adenocarcinoma of the esophagus | 12 | 42 | 99 | 56 | 99
Adenocarcinoma of the lung (NSCLC) | 24 | 79 | 100 | 90 | 99
Adenocarcinoma of the ovary (clear cell) | 17 | 88 | 99 | 83 | 100
Adenocarcinoma of the ovary (endometrioid) | 1 | n/a | n/a | n/a | n/a
Adenocarcinoma of the ovary (mucinous) | 4 | 25 | 99 | 25 | 99
Adenocarcinoma of the ovary (serous) | 20 | 90 | 99 | 82 | 100
Adenocarcinoma of the ovary (unknown subtype) | 1 | n/a | n/a | n/a | n/a
Adenocarcinoma of the pancreas | 26 | 96 | 99 | 83 | 100
Adenocarcinoma of the prostate | 13 | 92 | 100 | 92 | 100
Adenocarcinoma of the small bowel | 3 | 0 | 100 |  | 99
Adenocarcinoma of the stomach | 12 | 67 | 99 | 62 | 99
Adrenal cortical carcinoma | 9 | 100 | 100 | 100 | 100
Cholangiocarcinoma | 4 | 100 | 100 | 100 | 100
Ewing's sarcoma | 2 | 50 | 100 | 50 | 100
Extrahepatic biliary duct/bladder carcinoma | 6 | 33 | 100 | 67 | 99
Fibrosarcoma | 3 | 67 | 100 | 100 | 100
Follicular carcinoma of the thyroid | 14 | 100 | 99 | 82 | 100
Gastrointestinal stromal cell tumor | 10 | 90 | 100 | 100 | 100
Germ cell tumor of the testis, nonseminoma | 15 | 100 | 100 | 94 | 100
Germ cell tumor of the testis, seminoma | 13 | 100 | 100 | 100 | 100
Glioma | 32 | 97 | 99 | 86 | 100
Hepatocellular carcinoma | 16 | 94 | 100 | 94 | 100
Leiomyosarcoma | 14 | 100 | 100 | 100 | 100
Liposarcoma | 5 | 80 | 100 | 100 | 100
Malignant fibrohistiocytoma | 13 | 92 | 99 | 80 | 100
Malignant lymphoma | 25 | 100 | 99 | 86 | 100
Malignant melanoma | 13 | 100 | 100 | 93 | 100
Medullary carcinoma of the thyroid | 7 | 100 | 100 | 100 | 100
Meningioma | 8 | 100 | 100 | 100 | 100
Mesothelioma | 12 | 100 | 100 | 92 | 100
Osteosarcoma | 10 | 100 | 100 | 100 | 100
Ovarian germ cell tumor | 9 | 22 | 100 | 67 | 99
Renal cell carcinoma | 15 | 93 | 100 | 88 | 100
Rhabdomyosarcoma | 3 | 67 | 100 | 100 | 100
Small cell lung cancer (SCLC) | 10 | 90 | 100 | 90 | 100
Small intestinal well-differentiated neuroendocrine carcinoma (carcinoid) | 10 | 80 | 100 | 100 | 100
Squamous cell carcinoma of the lung (NSCLC) | 12 | 67 | 99 | 62 | 99
Squamous cell carcinoma of the cervix | 14 | 86 | 99 | 67 | 100
Squamous cell carcinoma of the esophagus | 9 | 67 | 99 | 46 | 99
Squamous cell carcinoma of the larynx | 8 | 25 | 100 | 100 | 99
Squamous cell carcinoma of the skin | 18 | 78 | 98 | 64 | 99
Synoviosarcoma | 9 | 100 | 100 | 100 | 100
Urothelial carcinoma | 29 | 86 | 99 | 86 | 99
Mean (all classes) | 13 | 79 | 100 | 90 | 100
Mean (all classes ≥5 samples) | 16 | 83 | 100 | 86 | 100

The identity of each individual tumor was predicted in a leave-one-out cross-validation analysis, using an expression profile of 495 genes.

TABLE II CLASSIFICATION RESULT EVALUATION SCHEME

Result type | Description
A | Correct result: the weighted prediction of tumor origin is identical to the known true origin.
B | Related subtype: the weighted prediction of tumor origin is a related subtype of the true tumor origin, or indicates the correct anatomical tract. For CUP samples, such a result would direct further diagnostic procedures toward a definitive diagnosis. For example, adenocarcinoma of the endometrium predicted by the classifier as adenocarcinoma of the ovary.
C | Present in top 5: one or more of the (maximum of) 5 tumor types represented among the 5 nearest neighbors is the known origin of the tumor. For CUPs, a result of this type would reduce the number of possible tumor origins/tracts and yield a clinically relevant differential diagnosis.
D | Incorrect: the prediction missed entirely.

TABLE III COMPARISON OF PREDICTION ACCURACY OBTAINED FROM DATABASE LOOCV AND PREDICTION OF AN INDEPENDENT VALIDATION SET

Result type | Training set LOOCV (n = 633, 47 classes): no. samples | Cumulative percentage (%) | In silico independent validation set (n = 229, 13 classes): no. samples | Cumulative percentage (%)
A | 512 | 81 | 186 | 82
B | 51 | 89 | 6 | 83
C | 30 | 94 | 22 | 93
D | 40 | 100 | 15 | 100

The impact of tumor differentiation and metastatic origin on prediction accuracy

We examined our classification performance according to tumor differentiation status. For the subset of tumor samples with available differentiation information (n = 191), LOOCV prediction accuracies for well, moderately and poorly or undifferentiated tumors were 92, 84 and 76%, respectively. This indicates that predictability decreases as dedifferentiation increases (p = 0.04). However, our figure of 76% accuracy for poorly or undifferentiated tumors is substantially higher than the 30% reported previously for these traditionally difficult-to-diagnose tumors.17 These comparisons are based on type A results only. In daily practice, considering type B and C results may further increase the diagnostic yield for dedifferentiated CUPs.

We also tested whether prediction accuracies differ between primary and metastatic tumors. In the final round of LOOCV, the mean classification accuracies for primary and metastatic tumors were 86 and 82%, respectively. This difference was not statistically significant (p = 0.44), in agreement with findings reported previously.19

Independent in silico classifier validation

To verify the predictive ability of the final gene expression database, the selected genes and the kNN method of class prediction, data from Tothill et al.20 were downloaded from ArrayExpress (AMEXP-27). This dataset was generated from 229 tumor samples of 13 classes, plus 13 samples of well-annotated CUP, all hybridized on customized cDNA microarrays containing 10,500 genes. Based on UniGene annotation (Build #199), an overlap of 663 genes was identified between the 10.5k cDNA array and the 1,545 genes on our 1.9k 8-pack microarray. Of the 495-gene set used in our final classifier, 227 (46%) were present in this common subset. To compensate for potential differences between the 2 microarray platforms and for a high proportion of missing data points in the cDNA dataset, we retrained our kNN classifier on the 663 genes common to both arrays. No data from the validation samples were used in the retraining process. After application of the kNN classifier to the Tothill series, the correct tumor type was predicted in 186 cases (type A results; 81.2%). Six type B results (2.6%) and 22 type C results (9.6%) were also observed. Combining all result types, 93% of these independent samples received predictions that would assist in the identification of a primary tumor. These figures are highly comparable to those obtained from the internal LOOCV procedure (Table III).

For 9 of the 13 CUP specimens, our predictions were identical to the original predictions, as summarized in Table IV. Overall, our predictions of tumor origin were consistent with the published clinical and pathological information provided for 12 of the 13 CUPs.

TABLE IV PREDICTED TUMOR ORIGIN FOR A PREVIOUSLY PUBLISHED SERIES OF TRUE CUPS

Patient ID | Differential at initial presentation | Peter Mac result (score) | CupPrint result (score) | Comment

P000780 Lung (70) Breast (48) Adenocarcinoma of the pancreas (40) Adenocarcinoma of the Large Bowel (40)
Lung or thyroid
Lung (82)
Thyroid follicular papillary (100)
P000459
Clinical picture most consistent with lung but uncertain in a young non-smoker
P001169
Pathologist favoured ovary but could not exclude breast, lung, or gastrointestinal primary.
P001328 P001405 Colorectal (100) Lung (71) Breast (100) Lung adeno large cell (40) Adenocarcinoma of the breast (100) Adenocarcinoma of the colon/rectum (40)
Breast (100) Renal (88)
Adenocarcinoma of the breast (100) Kidney (100)
Thyroid tumors not present in Tothill et al. database, possibly indicating an incorrect prediction. High confidence CupPrint result. Biopsy site: Lymph nodes of axilla or arm. Lower confidence prediction; however, 3/5 tumors in CupPrint top 5 are of gastrointestinal origin. Biopsy site: Pelvic bones, sacrum, coccyx and associated joints. Result of Adenocarcinoma of the pancreas obtained after excluding all soft tissue malignancies from the top 25 neighbours returned by the CupPrint kNN, based on biopsy sites specified. Agreement between predictions. Agreement between predictions. Agreement between predictions.
P001245
P001382 P000563
Agreement between predictions. Agreement between predictions.
P001698
Ovarian (92)
Adenocarcinoma of the endometrium (40)
Endometrium and ovarian tumors combined as a single class in Tothill et al. CupPrint result is more supportive of the pathologist's comments. Agreement between predictions. Agreement between predictions, despite low confidence prediction by Tothill et al. Three of the top 5 neighbors in the CupPrint kNN were adenocarcinoma of the lung.
P001946 P002971
Lung (60) SCCo / Lung (0)
Adenocarcinoma of the Lung (NSCLC) (80) Adenocarcinoma of the Lung (60)
P002989
Renal (62)
Renal cell carcinoma (100)
Agreement between predictions.
P002864
Ovary, gastric or breast. Pathology review favored sarcomatoid renal cell cancer, but renal CT and MRI normal. Pathology review of liver lesion histology favored colon but could not exclude endometrial origin. Renal or lung. Pathologist in 1998 favored recurrent ovary or gastrointestinal primary. Clinical picture raised question of breast. Pathologist thought that morphology strongly suggested nonovarian origin (e.g., gastric, colorectal, pancreas, or lung). Clinical picture consistent with ovarian cancer. Lung or colorectal. Pathologist suggested possible primaries included lung, endometrium, breast and gastrointestinal. Clinical pattern of disease suggestive of lung or breast, and colon needed to be excluded based on PET finding. Renal favored with differential of adrenal or hepatocellular carcinoma. However, no renal mass identified and histology atypical. Skin, renal or hepatocellular. SCCo / Lung (0) Adenocarcinoma of the Lung (40)
Agreement between predictions. Low confidence prediction by both tests.

Differential diagnosis obtained from Supplementary Information Table 2 of Tothill et al. Result scores are shown in brackets after the predicted tumor type; higher values indicate greater result confidence (range: 0–100).

Discussion

This study describes the development and evaluation of a robust gene expression-based diagnostic assay to assist in the identification of the primary tumor in patients diagnosed with CUP. Several microarray-based studies of CUP have concluded that the technique offers great promise to assist clinicians in their workup procedures for these tumors.12,15–20,23 The authors of these studies also describe the compromises necessary to translate the scientific ability to discriminate between multiple tumor types using gene expression profiling into a robust, clinically useful test.20,32 To overcome this, we have developed a 1.9k 8-pack microarray designed for routine diagnostic prediction of tumor origin. We expanded on previous studies by increasing the number of tumor classes to 30 carcinomas and 17 non-carcinomas, and we demonstrated that accurate predictions of tumor origin can be achieved using RNA extracted from either frozen or FFPE tissue. Additionally, the robustness of the classifier was demonstrated with a large, independently generated validation dataset, including a subset of independently diagnosed CUPs. Results from our classifier are presented to the clinician in the form of a weighted, single tumor type, which is conclusive in 89% of cases. In addition, the identities of the top 5 nearest neighbors selected by the kNN algorithm are reported. This allows the user to consider any other tumor types present as candidate tumor origins, information that may be beneficial in conjunction with results from other diagnostic procedures. In evaluating results generated by the kNN algorithm, we introduce the concept of considering technically inexact results (i.e., types B and C), as they may offer useful information in the context of complementary diagnostic procedures.

This is particularly important in cases where the predicted tumor origin is a closely related subtype of the actual origin, as was the case in approximately 8% of samples tested by cross-validation. The inclusion of these results in our evaluation of the classifier's performance is based on its intended use, i.e., to assist in tumor origin identification in routine clinical practice. In many CUP cases, the number of possible tumor origins can make the use of traditional diagnostic methods costly in both financial and time terms. By using a gene expression-based test to rapidly reduce the number of possible origins, more conventional procedures, such as IHC or imaging techniques including PET and MRI, can then be focused on a specific area of the body and/or cell type, and a conclusive diagnosis can be achieved with greater time and cost efficiency. Interestingly, only one of the panel of 10 IHC markers used by Horlings et al.29 is represented in the set of 495 genes used by our classifier: the gene coding for the estrogen receptor (ESR1). Approximately 25% of the genes used by our classifier are known to be integral to the cell membrane, including claudin 8 and 10 (CLDN8, CLDN10), prolactin receptor (PRLR) and fibroblast growth factor receptor 2. Therefore, the majority of genes used are involved in internal cellular processes and not expressed on the surface of tumor cells. The bias towards intracellular localization in the 495-gene set reflects an advantage of mRNA-based diagnosis of CUP over IHC alone, which is limited to a smaller set of molecules that are translated and expressed on the cell surface. Analysis of previously published gene expression data generated from 13 CUP specimens resulted in predictions that supported the clinical picture in 12 cases (Table IV). Some differences between the original predictions made by Tothill et al.20 and those of our classifier were observed and may partially be explained by the broader range of tumor types present in our database. This highlights the need for a comprehensive database when developing assays for clinical application, particularly for a heterogeneous disease like CUP. It should be noted that a suboptimal gene set was used for this analysis, as less than half of the 495 genes used by our classifier in routine application (CupPrint) were present on the 10.5k cDNA microarray platform used by Tothill et al.20 Despite this, the results from this independent validation agreed with those obtained from our internal cross-validation: clinically useful predictions of tumor origin were observed in over 90% of tumors analyzed.

The effectiveness of integrating results from our microarray-based classifier with traditional diagnostic procedures was recently described by Ismael et al.27 Extensive IHC and other assessments performed on a patient with multiple hepatic metastases failed to identify a clear tumor origin. A biopsy of one metastasis was analyzed using CupPrint, and the result suggested an upper gastrointestinal origin. Based on this information, MRI cholangiopancreatography was used and identified an oval, polypoid image inside the gallbladder. Further analysis revealed that the largest of the liver metastases was physically attached to the gallbladder, suggesting this organ as the most likely primary site of the patient's disease. Following the recent development of a breast cancer-specific 8-pack diagnostic microarray, integrated quality system and data analysis system,25 we have now established a robust and comprehensive microarray-based classifier to assist in the timely diagnosis of CUP patients. Rather than attempting to replace traditional diagnostic procedures, gene expression profiling may help to focus these approaches on a substantially reduced number of possibilities, potentially increasing their utility in diagnosing CUPs. As the range of malignancies for which targeted molecular therapies are available continues to expand, we see a growing need for methods to assist clinicians in identifying the primary origin of all tumors with a high degree of sensitivity and specificity.5,33–35 For many CUP patients, access to gene expression-based classifiers, coupled with robust analytical algorithms, quality control and result reporting systems, may facilitate a rapid and more accurate diagnosis. A retrospective analysis of 21 CUP patients indicated that identification of the primary tumor using microarray analysis may have influenced patient management in 12 of the cases.28 For CUP patients in general, incorporation of a microarray test may prevent additional anxiety caused by ongoing invasive diagnostic procedures. It may also enable a course of treatment that is tailored toward a specific tumor type, or even clinical trial eligibility for some patients.

Acknowledgements

We thank the authors of the Tothill et al.20 study for making their expression and summarized clinical data publicly available.

References
1. Pavlidis N, Briasoulis E, Hainsworth J, Greco FA. Diagnostic and therapeutic management of cancer of an unknown primary. Eur J Cancer 2003;39:1990-2005.
2. Varadhachary GR, Abbruzzese JL, Lenzi R. Diagnostic strategies for unknown primary cancer. Cancer 2004;100:1776-85.
3. Pentheroudakis G, Briasoulis E, Pavlidis N. Cancer of unknown primary site: missing primary or missing biology? Oncologist 2007;12:418-25.
4. Ben-Yosef R, Starr A, Karaush V, Loew V, Lev-Ari S, Barnea I, Lidawi G, Shtabsky A, Greif Y, Yarden Y, Vexler A. ErbB-4 may control behavior of prostate cancer cells and serve as a target for molecular therapy. Prostate 2007;67:871-80.
5. Ciardiello F, Tortora G. Epidermal growth factor receptor (EGFR) as a target in cancer therapy: understanding the role of receptor expression and other molecular determinants that could influence the response to anti-EGFR drugs. Eur J Cancer 2003;39:1348-54.
6. Kioi M, Kawakami M, Shimamura T, Husain SR, Puri RK. Interleukin-13 receptor alpha2 chain: a potential biomarker and molecular target for ovarian cancer therapy. Cancer 2006;107:1407-18.
7. Lebedeva IV, Sarkar D, Su ZZ, Gopalkrishnan RV, Athar M, Randolph A, Valerie K, Dent P, Fisher PB. Molecular target-based therapy of pancreatic cancer. Cancer Res 2006;66:2403-13.
8. DeYoung BR, Wick MR. Immunohistologic evaluation of metastatic carcinomas of unknown origin: an algorithmic approach. Semin Diagn Pathol 2000;17:184-93.
9. van de Wouw AJ, Jansen RL, Griffioen AW, Hillen HF. Clinical and immunohistochemical analysis of patients with unknown primary tumour. A search for prognostic factors in UPT. Anticancer Res 2004;24:297-301.
10. Abbruzzese JL, Abbruzzese MC, Lenzi R, Hess KR, Raber MN. Analysis of a diagnostic strategy for patients with suspected tumors of unknown origin. J Clin Oncol 1995;13:2094-103.
11. Hainsworth JD, Greco FA. Management of patients with cancer of unknown primary site. Oncology (Huntingt) 2000;14:563-74.
12. Dennis JL, Hvidsten TR, Wit EC, Komorowski J, Bell AK, Downie I, Mooney J, Verbeke C, Bellamy C, Keith WN, Oien KA. Markers of adenocarcinoma characteristic of the site of origin: development of a diagnostic algorithm. Clin Cancer Res 2005;11:3766-72.
13. Burton EC, Troxclair DA, Newman WP III. Autopsy diagnoses of malignant neoplasms: how often are clinical diagnoses incorrect? JAMA 1998;280:1245-8.
14. Raab SS, Grzybicki DM, Janosky JE, Zarbo RJ, Meier FA, Jensen C, Geyer SJ. Clinical impact and frequency of anatomic pathology errors in cancer diagnoses. Cancer 2005;104:2205-13.
15. Bloom G, Yang IV, Boulware D, Kwong KY, Coppola D, Eschrich S, Quackenbush J, Yeatman TJ. Multi-platform, multi-site, microarray-based human tumor classification. Am J Pathol 2004;164:9-16.
16. Giordano TJ, Shedden KA, Schwartz DR, Kuick R, Taylor JM, Lee N, Misek DE, Greenson JK, Kardia SL, Beer DG, Rennert G, Cho KR, et al. Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles. Am J Pathol 2001;159:1231-8.
17. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T, Gerald W, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 2001;98:15149-54.
18. Shedden KA, Taylor JM, Giordano TJ, Kuick R, Misek DE, Rennert G, Schwartz DR, Gruber SB, Logsdon C, Simeone D, Kardia SL, Greenson JK, et al. Accurate molecular classification of human cancers based on gene expression using a simple classifier with a pathological tree-based framework. Am J Pathol 2003;163:1985-95.
19. Su AI, Welsh JB, Sapinoso LM, Kern SG, Dimitrov P, Lapp H, Schultz PG, Powell SM, Moskaluk CA, Frierson HF Jr, Hampton GM. Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res 2001;61:7388-93.
20. Tothill RW, Kowalczyk A, Rischin D, Bousioutas A, Haviv I, van Laar RK, Waring PM, Zalcberg J, Ward R, Biankin AV, Sutherland RL, Henshall SM, et al. An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin. Cancer Res 2005;65:4031-40.
21. Dumur CI, Lyons-Weiler M, Sciulli C, Garrett CT, Schrijver I, Holley TK, Rodriguez-Paris J, Pollack JR, Zehnder JL, Price M, Hagenkord JM, Rigl CT, et al. Interlaboratory performance of a microarray-based gene expression test to determine tissue of origin in poorly differentiated and undifferentiated cancers. J Mol Diagn 2008;10:67-77.
22. Rosenfeld N, Aharonov R, Meiri E, Rosenwald S, Spector Y, Zepeniuk M, Benjamin H, Shabes N, Tabak S, Levy A, Lebanony D, Goren Y, et al. MicroRNAs accurately identify cancer tissue origin. Nat Biotechnol 2008;26:462-9.
23. Varadhachary GR, Talantov D, Raber MN, Meng C, Hess KR, Jatkoe T, Lenzi R, Spigel DR, Wang Y, Greco FA, Abbruzzese JL, Hainsworth JD. Molecular profiling of carcinoma of unknown primary and correlation with clinical evaluation. J Clin Oncol 2008;26:4442-8.
24. Ma XJ, Patel R, Wang X, Salunga R, Murage J, Desai R, Tuggle JT, Wang W, Chu S, Stecker K, Raja R, Robin H, et al. Molecular classification of human cancers using a 92-gene real-time quantitative polymerase chain reaction assay. Arch Pathol Lab Med 2006;130:465-73.
25. Glas AM, Floore A, Delahaye LJ, Witteveen AT, Pover RC, Bakx N, Lahti-Domenici JS, Bruinsma TJ, Warmoes MO, Bernards R, Wessels LF, Van't Veer LJ. Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics 2006;7:278.
26. Couzin J. Diagnostics. Amid debate, gene-based cancer test approved. Science 2007;315:924.


27. Ismael G, de Azambuja E, Awada A. Molecular profiling of a tumor of unknown origin. N Engl J Med 2006;355:1071-2.
28. Bridgewater J, van Laar R, Floore A, Van TVL. Gene expression profiling may improve diagnosis in patients with carcinoma of unknown primary. Br J Cancer 2008;98:1425-30.
29. Horlings HM, van Laar RK, Kerst J-M, Helgason HH, Wesseling J, van der Hoeven JJM, Warmoes MO, Floore A, Witteveen A, Lahti-Domenici J, Glas AM, Van't Veer LJ, et al. Gene expression profiling to identify the histogenetic origin of metastatic adenocarcinomas of unknown primary. J Clin Oncol 2008;26:4435-41.
30. Wettenhall JM, Smyth GK. limmaGUI: a graphical user interface for linear modeling of microarray data. Bioinformatics 2004;20:3705-6.
31. Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z. Tissue classification with gene expression profiles. J Comput Biol 2000;7:559-83.
32. Pentheroudakis G, Golfinopoulos V, Pavlidis N. Switching benchmarks in cancer of unknown primary: from autopsy to microarray. Eur J Cancer 2007;43:2026-36.
33. Tamura K, Fukuoka M. Molecular target-based cancer therapy: tyrosine kinase inhibitors. Int J Clin Oncol 2003;8:207-11.
34. Wu JT, Kral JG. The NF-kappaB/IkappaB signaling system: a molecular target in breast cancer therapy. J Surg Res 2005;123:158-69.
35. Chen LA. [Molecular target therapy: a new method in treatment of lung cancer]. Zhonghua Yi Xue Za Zhi 2006;86:2599-602.
