Statistical Analysis of in Vivo Tumor Growth Experiments

Daniel F. Heitjan, Andrea Manni, and Richard J. Santen. Cancer Res 1993;53:6042-6050.

Downloaded from cancerres.aacrjournals.org on April 10, 2012 Copyright © 1993 American Association for Cancer Research

[CANCER RESEARCH 53, 6042-6050, December 15, 1993]

Statistical Analysis of in Vivo Tumor Growth Experiments¹

Daniel F. Heitjan,² Andrea Manni, and Richard J. Santen

Center for Biostatistics and Epidemiology [D. F. H.] and Department of Medicine [A. M.], Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, and Department of Medicine, Wayne State University, Detroit, Michigan 48201 [R. J. S.]

ABSTRACT

We review and compare statistical methods for the analysis of in vivo tumor growth experiments. The methods most commonly used are deficient in that they have either low power or misleading type I error rates. We propose a set of multivariate statistical modeling methods that correct these problems, illustrating their application with data from a study of the effect of α-difluoromethylornithine on growth of the BT-20 human breast tumor in nude mice. All the methods find significant differences between the α-difluoromethylornithine dose groups, but recommended sample sizes for a subsequent study are much smaller with the multivariate methods. We conclude that the multivariate methods are preferable and present guidelines for their use.

INTRODUCTION

The analysis of alterations in in vivo tumor growth is a powerful tool for studying the effects of potential cancer treatments. In a typical experiment, one randomizes tumor-bearing animals into various treatment groups, periodically observing tumor volumes. The resulting dataset consists of a series of volumes for each animal, which one analyzes to determine whether and how the treatment affects tumor growth. In this article we discuss the statistical methods that are available for analyzing such experiments.

The standard way of demonstrating treatment effects is to establish that intergroup differences are statistically significant; thus we focus on significance tests and their properties. Classical statistics evaluates tests in terms of type I error rate and power. The type I error rate is the chance of obtaining a significant result when there is no effect, and the power is the chance of obtaining a significant result when there truly is an effect. No worthwhile test can be foolproof in the sense of being always significant when there is an effect and never significant when there is none. The best one can do is to fix the performance at preselected levels, conventionally 5% for type I error and 90% for power.

Just as there can be many ways to measure a biological parameter, not all of which are equally efficient, there can be many ways to test a statistical hypothesis, not all of which are equally powerful. A more powerful method may find significance when a less powerful method does not. Thus the choice of statistical method, far from being irrelevant, can tangibly affect the efficiency of experimentation and the credibility of results. For a given type I error rate and actual difference, the power increases with the sample size. Thus a common method for determining the sample size is to fix the type I error rate, estimate the size of the effect (often from past data), and choose n to be just large enough for the power to exceed 90%; consequently, the minimum sample size required to achieve the desired power is smaller with a more powerful method.

To determine current statistical practices among in vivo experimenters, we surveyed two summer 1992 issues of each of seven leading journals that commonly report such studies: Breast Cancer Research and Treatment, Cancer Research, European Journal of Cancer, International Journal of Cancer, International Journal of Radiation Oncology, The Prostate, and Radiation Research. We selected articles that presented analyses of in vivo tumor growth data and reviewed the statistical methods used.

Our review revealed that a variety of methods are in use. Several authors (1-7) did a separate analysis of tumor volumes at each time point, indicating all the times at which differences were significant. The analysis at each time was either a t test, an ANOVA³ F test, a Mann-Whitney test, or a Kruskal-Wallis test. Of the articles where ANOVA was used, three (3-5) used the Duncan multiple-range test (17) to account for multiple comparisons. Others (8-11) executed tests only at the final measurement time or the final time when a substantial fraction of the animals were alive. Still others (9, 11-14) analyzed tumor regrowth or doubling, tripling, or quadrupling times. Two others (15, 16) analyzed animal survival times but not tumor volumes. One article (14) fit a Gompertz curve to tumor growth in an untreated control group, without any formal statistical testing.

Table 1 lists these methods along with a brief summary of their underlying assumptions and properties. The most popular method was to execute a test at each time point and report all the times where the test is significant. This procedure is attractive because it is simple and uses all the data. Its main weakness is that its type I error rate is not the conventional 5%. To see this, note that if there truly are no treatment differences, there is a 5% chance of significance (a type I error) at each time point. Consequently the overall chance of significance, being the chance of significance at any time, exceeds 5%.

A second popular strategy is to analyze only the data from the final measurement time. This procedure has the conventional type I error rate of 5% but is deficient in power, inasmuch as it compares only the ends of the curves and may miss real differences at intermediate times.

The experimenters who looked at doubling and regrowth times analyzed their data by ANOVA or its rank-based analogues. These methods generally have correct type I error rates but suboptimal power. The methods are not applicable if the times are subject to censoring, i.e., if tumors may fail to double or regrow by the end of the observation period. For this reason it is preferable to use methods that explicitly account for censoring, such as the logrank test (18).

A final criticism that applies to all the methods reviewed is that they generally yield little biological insight; with these methods one can state which groups are significantly different and possibly rank the groups, but otherwise, because no modeling is being done, it is difficult to relate results to underlying mechanisms.

The past three decades have seen the development of classes of statistical methods designed to avoid these criticisms. They use the entire data series and permit detailed modeling of growth curves and intraanimal correlation patterns, thus substantially improving the efficiency of testing and reducing sample size requirements. We call the methods "multivariate" because they treat the series of tumor volumes on an animal as a single multivariate observation. Our purpose in this article is to present a subset of these methods that we find best suited to the analysis of tumor growth experiments. The methods are not new to statistics or even to cancer research (see Ref. 19, Chap. 8), although evidently they are not well known to in vivo experimenters.

Received 3/9/93; accepted 10/12/93.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¹ Supported by USPHS Grant CA-40011.
² To whom requests for reprints should be addressed, at Center for Biostatistics and Epidemiology, Pennsylvania State University College of Medicine, Hershey, PA 17033.
³ The abbreviations used are: ANOVA, analysis of variance; DFMO, α-difluoromethylornithine; MANOVA, multivariate analysis of variance; RE/AR, random effect/autoregressive; LR, likelihood ratio; GEE, generalized estimating equations; d.f., degrees of freedom.
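The inflation of the type I error rate under testing at every time point, described in the Introduction, can be illustrated with a little arithmetic. The sketch below is only an illustration: it assumes the tests at different times are independent, which is not true of repeated measurements on the same animals, and the function names are hypothetical. With correlated measurements the actual rate will differ, but it is still at least the per-test rate and at most the Bonferroni bound.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one significant result among k tests when there
    are truly no differences, ASSUMING the k tests are independent."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_bound(alpha: float, k: int) -> float:
    """Upper bound on the same chance that holds for any dependence."""
    return min(1.0, alpha * k)


if __name__ == "__main__":
    # One test per posttreatment measurement time, each at the 5% level.
    for k in (1, 2, 5):
        print(k, round(familywise_error_rate(0.05, k), 4),
              round(bonferroni_bound(0.05, k), 4))
```

Under these assumptions, five tests at the 5% level already give an overall chance of a false positive above 20%.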

We illustrate the methods by applying them to previously published data on the effect of DFMO, a polyamine biosynthetic inhibitor, on the growth of BT-20 human breast cancer cells in nude mice.

Table 1 Methods for statistical analysis of in vivo tumor growth data

Data used: Volumes at all times
Analysis method: ANOVA or Kruskal-Wallis, repeat until significance
Critique: Inflated type I error rate

Data used: Volumes at final time
Analysis method: ANOVA (t test)
Key assumptions: Independence; normality; same variance in all treatment groups
Critique: Suboptimal power; sensitive to normality

Data used: Volumes at final time
Analysis method: Kruskal-Wallis (Mann-Whitney)
Key assumptions: Independence
Critique: Suboptimal power

Data used: Doubling times
Analysis method: Logrank test
Key assumptions: Independence; proportional hazards
Critique: Suboptimal power; sensitivity to hazards assumption

Data used: Entire curve
Analysis method: MANOVA
Key assumptions: Independence; normality; same variance in all treatment groups
Critique: Excludes cases having missing values; suboptimal power

Data used: Entire curve
Analysis method: Multivariate growth-curve analysis
Key assumptions: Independence; normality; same variance in all treatment groups; growth curve
Critique: Same as MANOVA, but more powerful when growth curve is correct

Data used: Entire curve
Analysis method: Regression with RE/AR errors
Key assumptions: Independence; normality; same variance in all groups and at all times; RE/AR correlations; growth curve
Critique: Most powerful, but potentially sensitive to assumptions; includes all cases

MATERIALS AND METHODS

Experimental Methods: The BT-20 Experiment

The objective of the experiment (20) was to determine whether hormone-independent human breast cancer cells growing in nude mice manifest sensitivity to the polyamine-biosynthetic inhibitor DFMO. Tumors from the BT-20 cell line were established in 4-6-week-old ovariectomized athymic Ncr-nu mice (National Cancer Institute, Bethesda, MD) by injecting 5 × 10^6 cells resuspended in 0.25 ml medium into two mammary fat pads per mouse. We randomized size-matched mice bearing established tumors to one of six DFMO dose groups: 0% (control), 0.5%, 1%, 2%, and 3% in drinking water, measuring tumor volume on days 0 (baseline), 3, 7, 10, 14, and 16 posttreatment. We continued treatment until the mice in the 0% group had to be sacrificed because of large tumor burden. We measured the length (l), width (w), and height (h) of the tumors with a Jamison caliper and calculated volume from the hemiellipsoid formula

$$ V = \pi l w h / 6 $$

Statistical Methods

In this section we describe three multivariate methods for analyzing tumor growth data (for details see the "Technical Appendix"). All three correct the main flaws of the currently popular methods by achieving their nominal type I error rates and using the entire volume series. Note that the methods require that the data be normally distributed. If the volumes are not normal they can often be made so by transformation; we assume in this section that the logarithmic transformation is appropriate.

The MANOVA Model. The multivariate linear model (21) (see Appendix Section A.1) takes the animal's vector of log tumor volumes to be the unit of data; it assumes that the animals are independent but that the observations within an animal may be correlated. Mathematically, the model is

$$ Y = X_M B_M + \varepsilon \qquad (A) $$

where $Y$ is the matrix of observed log volumes, $X_M$ is a design matrix, $B_M$ is a matrix of regression coefficients, and $\varepsilon$ is a matrix of random errors, the rows of which are independent and multivariate normal with mean 0. The underlying mean vector is the same for all animals in a treatment group, and the unknown variance-covariance matrix is the same for all animals in the population.

One can represent a number of tumor growth models with Equation A by appropriate selection of $X_M$. The model we consider here is called the MANOVA model. In it, the columns of $X_M$ are indicators of dose group membership. This model asserts that a separate ANOVA model obtains at each time, with the random errors of measurements within the same animal possibly correlated. In the BT-20 data, the matrix $B_M$ has as many columns as there are measurement times, with each column representing the mean log volume for the five groups at that time.

If there is a dose effect, between-group differences depend on the measurement time. In other words, the difference between the 0% mean and the 0.5% mean would be one thing on day 0, another on day 6 (a pattern of effects called a dose-by-time interaction). To test this, we translate it into a linear hypothesis about the coefficient matrix $B_M$ (Appendix Equations A.2 and A.3), which we test using the Hotelling-Lawley trace test (21). Henceforth we refer to this as the MANOVA dose-by-time interaction test. Its power function is complicated, although a noncentral F approximation is available (22). It can be executed in the GLM procedure of the SAS System (SAS Institute, Inc., Cary, NC) and other commercial statistical programs. If the test is significant one can proceed to univariate tests at each time: the requirement of a significant MANOVA pretest guarantees that the type I error rate is preserved at 5%. As commonly practiced, the test requires complete data on all animals; thus animals for which any volume measurements are missing are excluded from the analysis.

The Multivariate Growth Curve Model. This model (23) (see Appendix Section A.2) assumes that

$$ Y = X_G B_G P_G + \varepsilon \qquad (B) $$

where $Y$ is the matrix of observed log volume data, $X_G$ is a between-animals design matrix, analogous to the $X_M$ matrix in the MANOVA model of Equation A, $B_G$ is a matrix of regression coefficients, $P_G$ is a within-individuals design matrix, and $\varepsilon$ is a matrix of random errors, the rows of which are independent and multivariate normal. The matrix $X_G$ consists of indicator variables denoting group membership. The model states that in each treatment group the data follow the same kind of curve (specified by $P_G$), although the coefficients (the rows of $B_G$) may differ from group to group.

The growth curve model is not a special case of the general multivariate linear model, but it can be made so by transforming the log volume data. A popular approach is to compute a set of regression coefficients from each animal and analyze these by MANOVA, e.g., in SAS. For example, suppose that in each dose group the log volume curve is a straight line, with different coefficients in each group. To assess dose effects one simply tests the hypothesis that all the slopes are equal. The Hotelling-Lawley test of this hypothesis reduces to a univariate ANOVA F test applied to the within-animal slopes. Any good statistical package can do such a test, although computing the within-animal slopes may require some effort. The power can again be computed using the noncentral F (22). As in MANOVA, animals with observations missing are typically deleted, although some adaptations (24) retain all the data.

Regression with Random Effects/Autoregressive Errors. This model (25, 26) (see Appendix Section A.3) asserts that

$$ Y_i = X_i \beta_R + \varepsilon_i \qquad (C) $$

where for animal $i$, $Y_i$ is the vector of log tumor volumes, $X_i$ is the matrix of predictors, $\beta_R$ is a regression coefficient vector common to all animals, and $\varepsilon_i$ is the vector of random errors. The error is assumed to have a RE/AR variance-covariance matrix, as described below. We henceforth refer to this as the RE/AR regression model.

The RE/AR model assumes that the error term is the sum of an animal-specific random effect and an autoregressive process. By "random effect" we mean a random difference between animal i and the mean volume for all animals, i.e., the tendency for the tumor on one animal to be always larger or always smaller than the mean among all animals. An "autoregressive process" is a random process in which the correlation between observations decreases with increasing separation in time. In tumor growth series and other biological data, we commonly observe that the shorter the time between measurements, the higher is their correlation (27). Fitting an autoregressive error model is one way to model this phenomenon.

Hypotheses about the regression parameters can be tested in a number of ways; we have used LR tests, the power of which we approximate with the noncentral χ² (28). We have fit the RE/AR regression model using a program we have written in Fortran and S-Plus Version 3.1 (Statistical Sciences Inc., Seattle, WA). It can also be fit in BMDP program 5V (BMDP, Inc., Los Angeles, CA) and SAS procedure MIXED [available in Versions 6.07 and later (SAS Institute, Inc., Cary, NC)].

Comparison of the Multivariate Models. The MANOVA model is the most general of the multivariate models in that it places no restrictions on the shape of the growth curves or the variance matrix. The growth curve model is similar to MANOVA except that it restricts the mean growth curves to be of the same parametric type (in our example a straight line). The RE/AR model differs from the growth-curve model in allowing the dose groups to have common regression coefficients (in our example the intercept). Unlike the other models, it assumes that the standard deviation is the same at each time in all groups, and correlation within animals follows the RE/AR pattern. Incorporating these more restrictive assumptions makes the analysis more powerful, if they are justified. If not, both type I error rate and power may suffer. See "Discussion" for more on this point. The RE/AR model uses all the data, and consequently there is no need to exclude incompletely observed animals.

In the BT-20 study we want to fit a linear model with different slopes but a common intercept. The RE/AR model, unlike the MANOVA and growth curve models, can accommodate this assumption. To test for dose effects, we test whether the slopes are the same.

RESULTS

In this section we analyze the BT-20 data and compare the candidate methods in terms of power and type I error rate. To give an idea of the steps involved in a multivariate analysis, we present detailed results for the RE/AR model.

Regression Modeling of the BT-20 Data. The first step in RE/AR modeling is to verify that the data reflect the model assumptions, at least approximately. The key assumptions, outlined in Table 2, are that (a) the SDs are equal at all times in all groups, (b) the errors are symmetrical, (c) correlations follow the RE/AR model, and (d) the growth curve is adequately represented by the proposed parametric form (e.g., a straight line, a quadratic or a spline). Note that in assumption b we attempt to assess error symmetry rather than normality because it is difficult to formally test normality, and symmetry is presumably the critical feature of normality.

Table 2 Key assumptions of the RE/AR regression model

Assumption                               Diagnostic                               Action to take
Equal SD at all times in all groups      Spread-vs.-level plot (29)               Power transformation
Symmetrical errors                       Symmetry plot (29)                       Power transformation
RE/AR covariances                        Semivariogram (26)                       Adjust variance model
Shape of the curve                       Straightness plot (29);                  Power transformation;
                                         goodness-of-fit significance tests       select best-fitting model within appropriate family

A class of data-analytic techniques (29) is available for assessing assumptions a, b, and d. These tools highlight departures from the assumptions and suggest ways to transform the data to make the assumptions more nearly true. Application to the BT-20 data directed us to a range of possible transformations, including the log and the square root. We chose the log because it has been selected for many previous datasets, and slopes of log-scale data have a simple biological interpretation.

Fig. 1 displays boxplots (30) of tumor volume by time for the five dose groups. The comparable sizes of the boxes demonstrate that the log transformation has rendered the SDs nearly equal. The whiskers show that the distributions are roughly symmetrical, although there are occasional outliers (displayed as dots), most often on the low side.

A first question is the basic shape of curves to use in subsequent modeling. The plot of mean log growth curves (Fig. 1f) shows that growth is roughly log-linear. Solid tumor growth curves are often Gompertzian in that they start out nearly log-linear but later flatten at a limiting volume. We had expected to see flattened curves in this experiment, although we had some idea that the time sampling was so short that the curves would be nearly linear. To select an empirical best model we fit a sequence of models, comparing their fits via significance tests. We thus compared two key models: a "full linear" model with a common intercept and dose-specific slopes (Equations A.10-A.12), and a "full quadratic" model with a common intercept and dose-specific linear and quadratic terms. Although the quadratic model is not Gompertzian, it should be a much better approximation than the linear model. The full models are nested (i.e., one obtains the linear from the quadratic by setting the time-squared coefficients to 0) and thus can be compared by a LR test. For the BT-20 data the LR χ² statistic is 3.4 on 5 d.f., for a P value of 0.64. This suggests that departures from linearity are small relative to the discriminating power of the data.

To test RE/AR assumption c, we reestimated the variance and covariance empirically using the semivariogram (26), a plot of the covariance of observations 0 units apart minus the covariance at Δ units apart, as a function of Δ. Fig. 2 compares the empirical and best-fitting RE/AR semivariograms from the BT-20 data. Agreement is good. The semivariogram estimate of the variance of a single log volume and the model-based estimate (0.536 and 0.541) are in good agreement, suggesting that the RE/AR assumption is adequate for these data. If the fit had been less encouraging we could have adopted one of the more general covariance models proposed by Diggle (26).

As noted above, Fig. 1 suggests a monotonic dependence of the growth rate on the DFMO dose, with slope decreasing as dose increases. The slope estimates (Table 3) bear this out. To assess this empirically we conducted a series of LR tests, first comparing the full linear model to a linear model with a common slope, essentially a test for any dose effects. As Fig. 1 suggests, the full model fits significantly better (LR = 74.5 on 4 d.f., P = 3 × 10^-15).
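The P values attached to the LR tests above follow from the upper tail of the χ² distribution. As a check on the arithmetic, the sketch below evaluates that tail with standard closed-form series for integer degrees of freedom, using only the Python standard library. The function name `chi2_upper_tail` is hypothetical, and this is not the software the authors used.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    #         * sum_{j=0}^{(df-3)/2} x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


if __name__ == "__main__":
    print(chi2_upper_tail(3.4, 5))    # quadratic vs. linear growth
    print(chi2_upper_tail(74.5, 4))   # dose-specific vs. common slope
```

With the statistics quoted in the text, the two calls should give roughly 0.64 and roughly 3 × 10^-15, in line with the reported P values.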

Having established that the groups differ, we sought a simpler description of the dependence of the slope on the dose. We tried several possibilities, including a linear model, a quadratic model, and a linear model with an additional parameter for active versus placebo. None fit as well as the model with arbitrary slopes. Thus we concluded that, within the ability of these data to resolve such questions, (a) tumor growth rates differ between dose groups, and (b) the relationship between DFMO dose and growth rate is negative and monotonic, but (c) a simple linear or quadratic function cannot adequately describe the dependence of slope on dose.

The slope in a log-linear growth model can be interpreted as the proliferation rate minus the death rate. The fact that the slope decreases with increasing dose suggests that DFMO affects one or both of these rates in a dose-dependent manner. Interestingly, analysis of a parallel experiment involving the hormone-dependent cell line MCF-7 showed a nonmonotone dose effect, raising questions about the mechanisms of DFMO action in these cell lines.

Fig. 1. Boxplots (30) of mean tumor volume (log scale) versus time (days) since the beginning of DFMO therapy, by dose group [(a) 0.0%, (b) 0.5%, (c) 1.0%, (d) 2.0%, (e) 3.0%], and mean log growth curves for all five dose groups (f).

Comparison with Other Methods of Analysis. We also analyzed the BT-20 data using the other multivariate methods and the methods gleaned from the cancer literature. First we executed ANOVA and Kruskal-Wallis tests comparing the dose groups at each time point. The groups differed significantly at days 10 and beyond by ANOVA and days 7 and beyond by Kruskal-Wallis. A logrank test comparing the doubling time distributions was highly significant, as were similar tests for tripling and quadrupling times. The MANOVA dose-by-time interaction test (F = 3.8 with 24 and 222 d.f.) gave a P value of 5 × 10^-8. The growth curve slope ANOVA was also significant, with F = 15.1 on 4 and 62 d.f. (P = 1 × 10^-8). In short, all the methods conclude that the dose groups are significantly different.

Comparison of Type I Error Rates. For the multivariate methods, statistical theory tells us that the type I error rates are close to 5% as long as model assumptions are nearly correct. We estimated the error rates of the other tests (ANOVA or Kruskal-Wallis at all measurement times, and logrank on doubling times) by a Monte Carlo experiment. We generated data under the assumptions that (a) the true, underlying growth curves are all equal, with intercept and slope equal to estimates from the 0 dose group under the RE/AR model (from Table 3) and (b) the true underlying variance matrix is RE/AR (Equation A.13), with parameters equal to the estimates from the BT-20 data. We simulated 1000 independent datasets having the design of the BT-20 experiment. Each dataset consisted of 5 groups of 15 animals, each animal having tumor volumes measured on days 0, 3, 7, 10, 14, and 16. We applied all the tests to each simulated dataset at each sample size, n = 2, 3, ..., 15, by taking the first n units from each group. We estimated type I error rates as the fraction of simulated datasets where significance was attained.
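The Monte Carlo estimate of a type I error rate described above can be sketched in a few lines. What follows is a deliberately simplified illustration, not the authors' simulation. It assumes two groups rather than five, errors made of an animal-specific random effect plus independent noise (no autoregressive component), hypothetical SDs of 0.5, and a known-variance z test at each time in place of ANOVA or Kruskal-Wallis. Because the groups share the same mean curve, that curve cancels from the group differences and is omitted.

```python
import math
import random
from statistics import NormalDist


def simulate_any_time_error(n_per_group=15, n_times=6, sd_animal=0.5,
                            sd_noise=0.5, alpha=0.05, n_sim=1000, seed=1):
    """Fraction of null datasets in which a two-group z test is significant
    at one or more measurement times, plus the Monte Carlo SE of a rate
    that truly equals alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # SD of the difference between two group means (variance assumed known).
    se_diff = math.sqrt(2.0 * (sd_animal ** 2 + sd_noise ** 2) / n_per_group)
    hits = 0
    for _ in range(n_sim):
        means = []
        for _group in range(2):
            totals = [0.0] * n_times
            for _animal in range(n_per_group):
                effect = rng.gauss(0.0, sd_animal)  # shared across times
                for t in range(n_times):
                    totals[t] += effect + rng.gauss(0.0, sd_noise)
            means.append([s / n_per_group for s in totals])
        if any(abs(a - b) / se_diff > z_crit for a, b in zip(*means)):
            hits += 1
    return hits / n_sim, math.sqrt(alpha * (1.0 - alpha) / n_sim)


if __name__ == "__main__":
    rate, mc_se = simulate_any_time_error()
    print(rate, mc_se)
```

Even in this reduced setting the estimated rate lies well above 5% plus two Monte Carlo SEs, which is the pattern the text attributes to testing at each time point.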

Fig. 3 plots the type I error rates versus the sample size per group n. The solid lines indicate the target rate (5%) ± two Monte Carlo SEs [SE = √(0.05 × 0.95/1000)]; thus a symbol lying outside the two outer solid lines differs significantly from the target value. As expected, the type I error rate of testing at each time point (by either ANOVA or Kruskal-Wallis) considerably exceeds 5%. Although we expected the logrank test to have an error rate near 5%, we were concerned that it would be off somewhat because of the small sample sizes and discreteness in the doubling time distribution. However, Fig. 3 shows that the error rate of the test is never far from 5% and improves rapidly with increasing n.

Table 3 Estimated slopes of log tumor growth: BT-20 DFMO experiment

DFMO dose (%)    Estimated slope
0.0              0.0906
0.5              0.0723
1.0              0.0445
2.0              0.0411
3.0              0.0137

SEs of the estimated slopes range from 0.0058 to 0.0064.

Fig. 2. Empirical and model semivariograms (plotted against lag) based on the full regression model.

Comparison of Powers. Among tests having equal type I error rates, the most powerful is generally preferable. Thus we compared our tests by computing their powers under the BT-20 design for n = 2, 3, ..., 15. This time we assumed that (a) the true, underlying growth curves differed, with growth parameters equal to the RE/AR parameter estimates for the BT-20 data (Table 3), and (b) the true underlying variance matrix is RE/AR (Equation A.13), with parameters equal to the estimates from the RE/AR model for the BT-20 data. We computed powers for three tests (ANOVA of data from the final day, MANOVA, and ANOVA on the slopes) by noncentral F approximation (22). We computed the power of the LR test in the RE/AR model by a noncentral χ² approximation (28), and the power of the logrank test on doubling times by Monte Carlo simulation.

Fig. 4 plots the power of each test as a function of n, the sample size per group. The methods that use all the data and exploit the underlying model (in this case the RE/AR model and the ANOVA on slopes) have greatest power. The MANOVA dose-by-time interaction test and the logrank test on doubling times are less powerful because, although they use all the data, they do not exploit the linearity of the log-volume curves. The least powerful method is ANOVA on data from the final day, which uses only a small fraction of the data and totally ignores the shape of the curves. The minimum sample sizes required for 90% power reflect these differences: for RE/AR, the minimum n is 3; for the slope ANOVA it is 4; for MANOVA and the logrank test it is 7; and for ANOVA on the final day it is 11.

These comparisons do not imply that RE/AR is best in every situation. Although our diagnostic analyses suggest that log-linear growth and RE/AR covariance are reasonable assumptions for the BT-20 data, if they were not, the power advantage of RE/AR could be reduced or even reversed. On the other hand, it is generally true that the use of detailed model information leads to more powerful tests; thus it is best to use as much of this information as is available.

Although our calculations suggest that n = 3/group is adequate, in practice we would not run such a small study. First, our computations assume that all tumors grow and no animals die prematurely, whereas in reality such data losses are common and need to be provided for. Second, the power approximation for the RE/AR model assumes that the variance parameters are known a priori, which is never the case. The power of RE/AR is therefore somewhat overstated, although we suspect that the effect is to underestimate sample size by only one or two animals per group. Finally, n = 3/group may not give sufficient power for other outcomes of interest. For example, in the BT-20 experiment tumor polyamine levels were an important end point. Because these can be measured only at sacrifice, there is no alternative to a univariate analysis for this end point, and consequently a larger sample size is necessary.

DISCUSSION

Table 1 summarizes the methods we have discussed. We find the advantages of the multivariate methods (correct type I error probabilities, enhanced power, and the capacity to model data rather than just test significance) compelling. Among the multivariate models, the RE/AR model uses the data most efficiently but requires the most work to apply.
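The sample-size logic behind the minimum n values in the power comparison (fix the type I error rate, posit an effect, and take the smallest n whose power reaches 90%) can be illustrated with a textbook normal approximation. This sketch is not the noncentral F or χ² calculation used in the article. It assumes a two-group comparison of means with a known common SD and a two-sided z test, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def min_n_per_group(delta: float, sigma: float,
                    alpha: float = 0.05, power: float = 0.90) -> int:
    """Smallest n per group for a two-sided, two-group z test to detect a
    mean difference delta with the requested power, given known SD sigma."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_power = z.inv_cdf(power)              # quantile for the target power
    return math.ceil(2.0 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Halving the noise SD of the analyzed summary cuts the required n
    # roughly fourfold, which is why a more efficient analysis needs
    # fewer animals.
    print(min_n_per_group(delta=1.0, sigma=1.0))
    print(min_n_per_group(delta=1.0, sigma=0.5))
```

Under this approximation the required n scales with (sigma/delta)², so an analysis that uses the whole curve and thereby reduces the effective noise lowers the sample size in the same way the article reports.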

The practical price of the superior properties of the multivariate methods is the greater expense of applying them. Table 2 summarizes the assumptions of the models, the diagnostics that address their adequacy, and the alternatives available when there are problems. Successful execution of the diagnostics and the modeling requires a level of programming skill and statistical judgment beyond what one can acquire in a typical elementary statistics course. Thus many investigators will need professional statistical assistance to apply these models.

Fig. 3. Type I error rate as a function of sample size per group for three methods of analysis (logrank test on doubling times, ANOVA at each time, and Kruskal-Wallis at each time), with the target rate (5%) ± 2 SE.

Fig. 4. Power as a function of sample size per group for five methods of analysis whose type I error rate is exactly or approximately 5% (ANOVA on the final day, MANOVA interaction, ANOVA on slopes, RE/AR model, and logrank on doubling times), with the target power (90%).

Although we have concentrated on log-linear models, one can, and often should, use other shapes to describe the basic growth curve. For example, cancer treatments are often administered in pulses that reduce tumor volume transiently before a period of regrowth. In such cases the volume curves are not monotone and are better described by spline (piecewise polynomial) models (31). We have used this approach to analyze data from an in vivo study of androgen priming in prostate cancer (32).

Missing data, usually resulting from animal mortality or morbidity, is a common and potentially serious problem in growth analyses. If the missing data are ignorable, in the sense that the probability of the observed missingness pattern does not depend on the observed or unobserved data values, then it is appropriate to treat the missing data [...]

as though they were missing by design. If the missingness is not ignorable, then parameter estimates may be biased, and significance tests may have type I error rates exceeding their target values (33).

When missingness is ignorable, several analysis strategies are available. As indicated above, RE/AR modeling automatically uses all available data. Unfortunately, implementations of the MANOVA and growth curve models in the major statistical packages require balanced data, and to obtain it all animals are deleted for which there are any missing observations. This can result in a considerable loss of efficiency if many animals have missing data. Vonesh and Carter (24) have proposed a method for fitting these models that uses all the data.

When missingness is not ignorable, one must model both the growth data and the missingness pattern. Two approaches have been proposed. In the first (34, 35), one assumes that each animal has its own underlying slope and that the probability that the animal is lost depends on its slope. In the second (36), one assumes that the time of dropout can depend on previous and current values of tumor volume. Whichever model one uses, it is necessary to estimate the growth curve and missingness parameters simultaneously. Inferences under nonignorable models can be sensitive to the assumed model. For example, erroneous assumptions about the underlying distributions, usually very difficult to detect, can cause serious errors in inferences (37). Further theoretical and empirical research is needed to elucidate the proper methods for modeling incompleteness in tumor growth data.

A second problem with the analysis of tumor growth data involves selection of the transformation to normality and the regression model. In this study, for example, our choice of log-linear models was based on analyses of the same data; therefore a more rigorous analysis would adjust estimates and tests to account for this selection. Some statisticians claim that one should explicitly adjust the analysis if the data are used to select a transformation or model, whereas others argue that this is unnecessary (38). Yet it is common practice to ignore this problem, and there exist no practical methods for making such adjustments.

The methods we have presented are just a few of the many techniques available for analyzing tumor growth data. All our multivariate models assume normality and linearity, but other methods are available when such assumptions are restrictive or unwarranted. One approach is to base significance tests on the distribution of multivariate rank statistics (40) or the randomization distribution (41). These tests are reliable under assumptions more general than ours, with some loss of efficiency if a normal model is actually appropriate. The GEE approach (42) involves specifying only the shape of the growth curve and the covariance matrix, not the underlying distribution. GEE is robust to errors in the assumed correlation structure and can handle arbitrary patterns of missing data; like the rank and randomization tests, it is reliable but potentially inefficient. Another approach involves modeling tumor growth curves with nonlinear rather than linear models (43-46). This is more difficult to apply than linear modeling but can give greater insight into the biological processes of tumor growth. None of the methods cited in this paragraph is available in production versions of major statistical packages, although some good programs are publicly available.

As has been indicated, the RE/AR model is a special case of the longitudinal-data linear model of Laird and Ware (39). This model accommodates more general random effects (such as random slopes) and serial correlation structures (including higher-order autoregressions). Although these more general correlation models may be valuable in many applications, we believe that the RE/AR model captures the most important features of tumor growth data.

In summary, statisticians have developed an array of multivariate methods that can dramatically improve the analysis of tumor growth studies. Careful application of these methods will lead to more efficient and humane experiments and more valid and comprehensive data analyses.

TECHNICAL APPENDIX

The MANOVA model (21) states that

$$Y = X_M B_M + \epsilon \tag{A.1}$$

where $Y$ is an $N \times p$ matrix of observed log volume data, $X_M$ is an $N \times r$ design matrix, $B_M$ is an $r \times p$ matrix of regression coefficients, and $\epsilon$ is an $N \times p$ error matrix the rows of which are independent and multivariate normal with mean 0 and variance-covariance matrix $\Sigma$. Here $N$ refers to the number of animals, $p$ to the number of times each tumor is measured, and $r$ to the number of predictors in the design matrix. The $i$th row of $Y$ is the vector of log volumes for the $i$th animal, and the $i$th row of $X_M$ is the design for the $i$th animal. The columns of $B_M$ are the model coefficients, with one column for each of the $p$ measurement times.

To apply the multivariate linear model in the BT-20 example, suppose for the moment that there is $n = 1$ animal in each of the five dose groups. In this study, each animal's tumor volume is measured at six times; with five groups and one animal per group, $N = 5$ and $p = 6$. In terms of model (A.1), $X_M$ is the $5 \times 5$ identity matrix and $B_M$ is the $5 \times 6$ matrix where the row-$i$/column-$j$ element is the expected log tumor volume at the $j$th time for an animal in group $i$. When there are $n$ animals per group, $X_M$ is simply $n$ copies of the $5 \times 5$ identity matrix stacked vertically.

A simple model would assume a time effect (i.e., the tumors grow) and a group effect (the volume depends on the dose). There is a dose-by-time interaction if the time effects differ by group. To test for a dose-by-time interaction, we express the null hypothesis as a general multivariate linear hypothesis $C_M B_M U_M = 0$, where $C_M$ is a "between-units" contrast matrix and $U_M$ is a "within-units" contrast matrix. The hypothesis of parallel growth curves (i.e., no dose-time interaction) has contrast matrices

$$C_M = \begin{pmatrix} 1 & -1 & 0 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 & 0 \\ 1 & 0 & 0 & 0 & -1 \end{pmatrix} \tag{A.2}$$

and $U_M$ (A.3), the analogous $6 \times 5$ matrix of contrasts among the six measurement times. The Hotelling-Lawley test of this hypothesis (21) can be executed with the GLM procedure in the SAS System. The power can be approximated with the noncentral F (22).

The multivariate growth curve model (23) states that

$$Y = X_G B_G P_G + \epsilon \tag{A.4}$$

where $Y$ is an $N \times p$ matrix of observed log volume data, $X_G$ is an $N \times g$ between-animals design matrix, $B_G$ is a $g \times q$ matrix of regression coefficients, $P_G$ is a $q \times p$ within-individuals design matrix, and $\epsilon$ is an $N \times p$ error matrix the rows of which are independent and multivariate normal with mean 0 and variance-covariance matrix $\Sigma$. Here $N$ refers to the number of animals studied, $p$ to the number of times each animal is measured, $g$ to the number of treatment groups, and $q$ to the number of predictors in the within-individual design matrix. The $i$th row of $Y$ is the vector of log volumes for the $i$th animal and the $i$th row of $X_G$ is the between-animals design for animal $i$. The $j$th row of $B_G$ is the vector of regression coefficients for animals in the $j$th treatment group.
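The test of parallel growth curves, $C_M B_M U_M = 0$, can be sketched numerically. The code below is a minimal illustration in NumPy, not the SAS GLM analysis referred to above: it builds the stacked-identity design, the between-units contrasts, and one standard choice of within-units contrasts (first time versus each later time, an assumption), and then computes the Hotelling-Lawley trace on simulated log volumes. The number of animals per group, the simulated coefficients, and the noise level are hypothetical; the measurement times are those of the BT-20 within-animal design.

```python
import numpy as np


def hotelling_lawley_trace(Y, X, C, U):
    """Hotelling-Lawley trace for H0: C B U = 0 in the model Y = X B + error."""
    xtx_inv = np.linalg.inv(X.T @ X)
    B_hat = xtx_inv @ X.T @ Y                 # least-squares coefficients (r x p)
    resid = Y - X @ B_hat
    E = U.T @ (resid.T @ resid) @ U           # within-units error SSCP matrix
    theta = C @ B_hat @ U                     # estimated contrasts
    H = theta.T @ np.linalg.solve(C @ xtx_inv @ C.T, theta)  # hypothesis SSCP
    return np.trace(np.linalg.solve(E, H))


n_groups, n_times, n_per_group = 5, 6, 4      # n_per_group is hypothetical
X = np.kron(np.ones((n_per_group, 1)), np.eye(n_groups))  # n stacked identities

# Between-units contrasts: group 1 versus each other group (4 x 5).
C = np.hstack([np.ones((n_groups - 1, 1)), -np.eye(n_groups - 1)])
# Within-units contrasts: first time versus each later time (6 x 5); assumed form.
U = np.vstack([np.ones((1, n_times - 1)), -np.eye(n_times - 1)])

# Simulated log volumes: parallel curves (group shifts only) versus
# curves whose slopes differ by group. All numbers are made up.
rng = np.random.default_rng(0)
times = np.array([0, 3, 7, 10, 14, 16], dtype=float)
shift = np.arange(n_groups)[:, None] * 0.2
B_parallel = 4.0 + 0.10 * times[None, :] - shift
B_interact = 4.0 + (0.10 - 0.015 * np.arange(n_groups))[:, None] * times[None, :]
noise = rng.normal(scale=0.1, size=(n_groups * n_per_group, n_times))

T_parallel = hotelling_lawley_trace(X @ B_parallel + noise, X, C, U)
T_interact = hotelling_lawley_trace(X @ B_interact + noise, X, C, U)
print(T_parallel, T_interact)
```

The sketch stops at the trace statistic; a full analysis would convert it to an F approximation for a p-value, as statistical packages do. Any full-rank set of contrasts among the groups or among the times leads to the same trace, so the particular form of `U` is a matter of convenience.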

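The error structure of the RE/AR model, a random animal effect with variance $\tau^2$ plus an autoregressive process with autocorrelation $\rho$ and scale $\sigma^2$, implies a specific covariance matrix for the log volumes of one animal: $\tau^2 + \sigma^2 \rho^{|\Delta t|}/(1 - \rho^2)$ for two measurements $\Delta t$ time units apart. The following sketch constructs that matrix for the measurement times 0, 3, 7, 10, 14, and 16 of the BT-20 design. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def re_ar_covariance(times, tau2, sigma2, rho):
    """Covariance of one animal's errors: random effect plus AR(1) process."""
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])           # |t_j - t_k| in time units
    ar_part = sigma2 / (1.0 - rho**2) * rho**lags    # autoregressive component
    re_part = tau2 * np.ones((t.size, t.size))       # shared animal effect
    return re_part + ar_part


times = [0, 3, 7, 10, 14, 16]                # measurement times in the BT-20 design
# Hypothetical parameter values, for illustration only.
Sigma = re_ar_covariance(times, tau2=0.05, sigma2=0.02, rho=0.8)
corr = Sigma / np.sqrt(np.outer(np.diag(Sigma), np.diag(Sigma)))
print(np.round(corr, 3))
```

Printing the implied correlation matrix shows how the random effect keeps distant measurements on the same animal correlated while the autoregressive term adds extra correlation between measurements that are close in time.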
Jani. 1992. J. Mella. J. REFERENCES 1. Doxorubicin: monoclonal antibody conjugate for therapy of human cervical carcinoma. 1992. T. and Hiestand. Inc. the correlation of measurements at a distance of Ai time units is p'A". Y. and with five groups and one animal per group. Christensson. H. 2. Selective in vivo localization of daunorubicin small unilamellar vesicles in solid tumors. H.org on April 10.. S. J. Lack of therapeutic Cr. and Visser. Kwon.. 13. 1992. C. S. van Geel.. 20: 269-280. J.10-A. T. T. where p is the autocor relation. which one then analyzes using MANOVA. R. C. M... A. Zihlmann.5) The error term in this model is the sum of a random animal effect and an autoregressive process.. Cancer. U. Cancer Res. S.) and BMDP program 5V (BMDP.. Cancer Res. 0N (A. A. S. Breast Cancer Res. Schem.. 1992. = ß/01. Thus in Equation A..= and t/c is the 2X1 matrix (A. and Dahl. M. Szepeshazi. A general linear hypothesis for testing equality of slopes is CcBGUG= 0 where CG is the 4 X 5 contrast matrix :1 -1 10-100 100-10 1 0 0 0 0 0 0N (A. 1992. Yu.+ (A. and Newman. Therapy of an animal model of human gastric cancer using a combination of anti-eroB-2 monoclonal antibodies. K.. 21: 35-45. 1992. X¡ the is log is p¡ q matrix of predictors. Because it reduces the data this test involves some loss of information. 4. N = 5 and g = 5.. B. 7... One can also fit this model in the SAS MIXED procedure (SAS Institute. 52: 3022-3028.4 is not a special case of A. T. 130: 205-210. BG is the 5 X 2 matrix the /th row of which is the intercept and slope for group j. The error term is the sum of a random animal effect and an autoregressive process. Thermochemotherapy with cisplatin or carboplatin in the BT4 rat glioma in vitro and in vivo. p¡ 6. Kasprzyk.6) The transformed data satisfy (A. Hasegawa. Int. hence we call this the RE/AR regression model. A. K. Altermatt. T. and Sebti.. Milovanovic. P.'05'. Assuming tumor volume is log-linear in time.. Cancer Res. Korkut. M-H. 
Inhibition of growth of PC-82 human prostate cancer line xenografts in nude mice by bombesin antagonist RC-3095 or combination of agonist [D-Trp6]-luteinizing hormone-releasing hormone and somatostatin analog RC-160.. Di Fiore. Forssen. Yano.12. P. Ichikawa. L. Pharmacologie interactions between the resistance-modifying cyclosporine SDZ PSC 833 and etoposide (VP 16-213) enhance in vivo cytostatic activity and toxicity. Morris. 8....TUMOR GROWTH EXPERIMENTS: STATISTICAL ANALYSIS In the BT-20 example. Pop. 9.I. 1992.. H.. S. Inc. Keller. = There are five treatment groups with a common intercept and a separate slope for each group. ß1(3')r (A..e..13) With n animals per group. Y.. T. G. P.. Milovanovic. J. K. D. 1992. Int. M. Milovanovic. Radiât. J. L.. 52: 2771-2776.. Pinski. 52: 2931-2937. Groot. Treat. C. in this case 111111 0 3 7 10 14 16 (A.5. P.. ßK a X r is cXl regression coefficient common to all animals. 52: 3255-3261. in Equation A. C. and blood flow of the Dunning R3327 prostatic adenocarcinoma. One can compute the exact power from the noncentral F (22). We assume that the random effects are normal with mean 0 and variance T2. and Landström.aacrjournals. and Isaacs. S.. J. Inhibition of growth of MCF-7 Mill human breast carcinoma in nude mice by treatment with agonists of LH-RH. but depending on the model and the reduction the loss can be small.. In vivo circumvention of human colon carcinoma resistance to bleomycin. = Xß„ e. Cancer Res. We fit the RE/AR model in a program we wrote in S-Plus (Statistical Sciences. Biol. the error term e. Pinski. Phys.. Cancer Res. Maehara.. R..'*( = 0.. For example. O.. Coulter. is the vector of p¡andom errors. XGis merely n copies of the 5X5 identity matrix stacked vertically.. We test hypotheses about the regression parameters using likelihood-ratio tests. P.10 is multivariate normal with mean 0 and variance-covariance.p2) = IP3P7P10P14P16PJ1p4P7 (A. E. the hypothesis of equal slopes is CKßK 0. 
K.14) where the rows of S are independent and multivariate normal with mean 0 and variance-covariance matrix F = tP^ÕH.'2'.3.. 11. XG is the 5 X 5 identity matrix. B-C. 51: 274-282. H. 20: 297-310. A. The power of this test can be approximated with the noncentral x2 (28). 20: 187-197. Comaru-Schally.10) where for animal i. A. Deurloo.. J. E..= (A. Yeh. but it can be made so by appropriate reduction of the volume data. Radulovic. Lazo.. again take n = 1 animal in each of the five dose groups. Lyons. ß. J. I-K. Because there are six tumor volumes per animal. C.. J. Int. M-Y. Cancer. V. R.). Sugimachi.. C. the tumor growth curve in each group is described by two parameters. Laissue. Because each animal's tumor volume is measured on six occasions. Radiât. V. Levendag. instead of analyzing Y. Baba.) riJ7 + S2/(1.4. P.. Damber. O. S. 23: 109-114. 5. and Proffitt. Increase in tumor oxygénationand radiosensitivity caused by pentoxifylline.12) 6049 Downloaded from cancerres. S. 14... Song. Model A. Inc... q = 2. and PG is the withinanimal design. Daehlin.. Roffler. 3. A. Donatsch. A. 26) asserts that Y. P. 52: 3306-3309.9) With these contrast matrices the Hotelling-Lawley test reduces to a univariate ANOVAF test on the within-animal slopes. Groot.. Mistry. K. R. 57: 433^(38. 12. 1992. In an autoregressive process.. J.ll) An animal in the dose group DFMO = 2% therefore has design matrix '1000 1000 1000 1000 X. J. M. 10. Y¡ the column vector of p¡ tumor volumes. where = 1-1 10-100 100-10 1 0 0 0 0 0-1. where (A. VarÃ-e. ß/". 2012 Copyright © 1993 American Association for Cancer Research . Groot..Oncol. and Schally. R. R. V. 6. Kusumoto. i. V. and e.Res. A. P.. T.. Bergh.0.8) -1. J-E. H. R.. therefore q = 6—oneparameter (ß0) the intercept and the is five others (ß..2. S. and King. S.3) are the slopes: d ßR (ß„. 
Effect of microcapsules of luteinizing hormone-releasing hormone antagonist SB-75 and somatostatin analog RC-160 on endocrine status and tumor growth in the Dun ning R-3327H rat prostate cancer model. morphology.. Davies. p = 6. and Schally.ß. W. 1992. P. Hartley-Asp.. Effects of y-methylene progesterone on growth.. Sakaguchi. and Levitt. C. K. Song. R. Lamb. G. Prostate. 1992.. ractically. Prostate..7) where y is a 6 X 1 matrix of Is. a slope and intercept. An autoregressive process is parameterized by p and a scale parameter c. I. With the model of Equations A. Szepeshazi. 1992.1. Prostate. 1992. In the BT-20 study we assume a linear model with dose-specific slopes and a common intercept. Clarke. Yano. J. and Schally. Under the BT-20 design. one ana lyzes Y"" = YH. Flavone acetic acid increases the antitumor effect of hyperthermia in mice. P. this means reducing each P animal's data from a vector of log volumes to an animal-specific intercept and slope. The RE/AR regression model (25. A.). The antitumor effects of quinoline-3-carboxamide linomide on Dunning R-3327 rat pros tatic cancers. Petrow.

D. Hoaglin. D. 1981. H. Am. 45: 939-955. D. Wu. 1983. New York: John Wiley & Sons. J. Chap. C. 1986.. Med.. V. Biometrics. R. and Ramey. E. A. A distribution-free test for tumor-growth curve analyses with application to an animal tumor immunotherapy experiment. F. J. and Stuart. Stat. Maxwell. M. Colmerauer. 1984. 51: 313-326. J.. P. Int. 87: 272-283. 24. 1988. Shoemaker. Assoc. Cancer Lett. Inc. 1986. D. Longitudinal data analysis using generalized linear models. 1992.. 1992. 1980. Kalbfleisch. D.. and Santen. 43. 1990. H. New York: John Wiley & Sons. and Bates. Muller. S. 1979. 1994. Krewski. France: International Agency for Research on Cancer. Heitjan. J. Babbar. 3. Koziol. 43: 617-628. 23. Stat. C. Potthoff. Jones. R. Biometrika. 28A: 1471-1474. Manni. F. J. J. J. Vonesh. Durrleman. 1989.. 39. L. 38: 963-974. F. L. 33. F. Int. 51: 1760-1765. Biometrics. In: D. Stat. M. (With discussion. Morrison.aacrjournals.org on April 10. Understanding Robust and Exploratory Data Analysis.. S.. Evaluation of metastatic human tumor burden and response to therapy in a nude mouse xenograft model using a molecular probe for repetitive human DNA se quences. 31. 3.. and Ware. J. 21. 2. K. Little.. Multivariate Statistical Methods. Vol... Serial correlation in unbalanced mixed models. N. Tukey (eds. O. Ed.. Cancer Res. J. 1992.. 1992. and Prentice. Lindstrom. Laird. New York: John Wiley & Sons. Randomization analysis of the completely randomized design extended to growth and response curves. N.. 41. D. M. 1989. Beneficial effects of androgen primed chemotherapy in the Dunning R3327 G model of prostatic cancer. 1981. Generalized Norton-Simon models of tumor growth. 1982. 46: 673-687. A. 45. N. L. and Pilch. D.. S. Biometrics. 43: in press. Assoc. G. D. 44. The Advanced Theory of Statistics. 40. R. Miller. F. M. D. Mosteller. Stat. Inc. J. D. 36. Am. 27. S. H. Biometrics. R. Mosteller. 1988. Tarane. R. Assoc. 37: 383-390.. 42. 52: 2791-2796... 16. N. 77: 237-250. Gart. A. Stat. 
84: 452^159. J. 1991.. and Carter. S. G. 26. D. A generalized multivariate analysis of variance model useful especially in growth curve problems. F... Biometrics. 25. 73: 13-22. London: Charles Griffin. Cressie. L. Lasic. Ramey. M. 11: 1861-1870. New York: McGraw-Hill Book Co. M. 1982. Nonlinear mixed effects models for repeated measures data. F. New York: Springer-Verlag. J. Hoaglin. R. J. Heitjan. F.. and Bailey.). Methods for the analysis of informatively censored longitudinal data. Stat. G. Balaschak. R. 2. S. Mixed-effects nonlinear regression for unbalanced repeated measures. 87: 1209-1226. R. W. M. M. Emerson.. Models for nonresponse in sample surveys. An approach to the analysis of repeated measurements. and J. L. Biometrika. J. Hinkley.. A. A. M.. A. Ed. Am.. L. Understanding Robust and Exploratory Data Analysis. C. 1992. F. 1983..) Appi. J. 7: 305-315. 1989. H. E. The Statistical Analysis of Failure Time Data. 10: 1075-1088. Role of polyamines in the growth of hormone-responsive and -resistant human breast cancer cells in nude mice. L. Cancer. 34. Zerbe.. 1987.. Diggle. E. W. G. M. 22. Kendall. Smythe. Statistical Methods in Cancer Research. Simultaneous Statistical Inference.. M. Lyon.. R. Med. R. Liang... 17. Lancaster.. Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. M.. Ed. Efficient inference for random-coefficient growth curve models with unbalanced data.. and Demers. Cancer.. J. J.. 35. J. Biometrics.TUMOR GROWTH EXPERIMENTS: STATISTICAL ANALYSIS 15. Stat. L. Assoc. 37. 32. 1987. 48: 1-17. P. E. Pharmacokinetics and antitumor activity of epirubicin encapsulated in long-circulating liposomes incorporating a polyethylene glycol-derivatized phospholipid. and Boyd... Proc. Stat. English. J. Y. J. Vol. M. and J. 46th Session. A spatial statistical analysis of tumor growth. D. and Martin. Boxplots and batch comparison. and Reinsei. 1964. Eur. Inst. N. and Carter. R. J. Assoc. Tukey (eds. 
and Roy.. 46. and Wahrendorf. Bull. Am. 66: 1-9. Power calculations for general linear multivariate models including repeated measures applications. 18. 30. Fukushima. Am. H. 20.. Med. Informative dropout in longitudinal data analysis. F. R. Mayhew. Chap. Diggle. Vonesh. Assoc. J. M. T. gain when low dose rate interstitial radiotherapy is combined with cisplatin in an animal tumour model. C. LaVange.. F. Lee. Badger. and Strenio. G. J. Biomet rics. Random-effects models for longitudinal data. E. 1992. 8. Schluchter. Inc. 2012 Copyright © 1993 American Association for Cancer Research . E. 1992. 8: 551-561. Models for longitudinal data with random effects and AR(1) errors. E. 6050 Downloaded from cancerres. Laird. D. Martel... C. 1979. 44: 959-971... Y.. Am. J. 1991. Stat. and Simon. Wu. K. 4. Chi. and Runger. 2. Stat. L. Flexible regression models with cubic splines. Mathematical aspects of transformation.. M.. The analysis of transformed data.). Missing data in longitudinal studies. and Zeger. Stat. Stat. Cancer Res.. 28. and Hulling. 19. B. R. 1976. Med.. Emerson. and Kenward. 1992. G. 29.. S. 79: 302-320. 74: 215-221. M. 4: 105-122. 51: 302-309.. 38. K. P.