Methods and Applications of Statistics in Clinical Trials, Volume 2: Planning, Analysis, and Inferential Methods

About this ebook

Methods and Applications of Statistics in Clinical Trials, Volume 2: Planning, Analysis, and Inferential Methods includes updates of established literature from the Wiley Encyclopedia of Clinical Trials as well as original material based on the latest developments in clinical trials. Prepared by a leading expert, the second volume includes numerous contributions from current prominent experts in the field of medical research. In addition, the volume features:

• Multiple new articles exploring emerging topics, such as evaluation methods with threshold, empirical likelihood methods, nonparametric ROC analysis, over- and under-dispersed models, and multi-armed bandit problems

• Up-to-date research on the Cox proportional hazard model, frailty models, trial reports, intrarater reliability, conditional power, and the kappa index

• Key qualitative issues including cost-effectiveness analysis, publication bias, and regulatory issues, which are crucial to the planning and data management of clinical trials

Language: English
Publisher: Wiley
Release date: Jun 16, 2014
ISBN: 9781118595961

    Methods and Applications of Statistics in Clinical Trials, Volume 2 - N. Balakrishnan

    Chapter 2

    Analysis of Variance (ANOVA)

    Jörg Kaufman

    2.1 Introduction

    The development of analysis of variance (ANOVA) methodology has in turn had an influence on the types of experimental research being carried out in many fields. ANOVA is one of the most commonly used statistical techniques, with applications across the full spectrum of experiments in agriculture, biology, chemistry, toxicology, pharmaceutical research, clinical development, psychology, social science, and engineering. The procedure involves the separation of total observed variation in the data into individual components attributable to various factors as well as those caused by random or chance fluctuation. It allows performing hypothesis tests of significance to determine which factors influence the outcome of the experiment. However, although hypothesis testing is certainly a very useful feature of the ANOVA, it is by no means the only aspect. The methodology was originally developed by Sir Ronald A. Fisher [1], the pioneer and innovator of the use and applications of statistical methods in experimental design, who coined the name Analysis of Variance—ANOVA.

    For most biological phenomena, inherent variability exists within the response processes of treated subjects as well as among the conditions under which treatment is received. This results in sampling variability: results for a subject included in a study will differ to some extent from those of other subjects in the affected population. Thus, the sources of variability must be investigated and suitably taken into account if data from comparative studies are to be evaluated correctly. Clinical studies are a particularly fruitful field for the application of this methodology.

    The basis for generalizability of a successful clinical trial is strengthened when the coverage of a study is as broad as possible with respect to geographical area, patient demographics, and pretreatment characteristics as well as other factors that are potentially associated with the response variables. At the same time, heterogeneity among patients becomes more extensive and conflicts with the precision of statistical estimates, which is usually enhanced by homogeneity of subjects. The methodology of the ANOVA is a means to structure the data and their validation by accounting for the sources of variability such that homogeneity is regained in subsets of subjects and heterogeneity is attributed to the relevant factors. The ANOVA method is based on the use of sums of squares of the deviation of the observations from respective means (→ Linear Model).

    The tradition of arraying sums of squares and resulting F-statistics in an ANOVA table is so firmly entrenched in the analysis of balanced data that extension of the analysis to unbalanced data is necessary. For unbalanced data, many different sums of squares can be defined and then used in the numerators of F-statistics, providing tests for a wide variety of hypotheses. To provide a practically relevant and useful approach, the ANOVA through the cell means model is introduced below.

    The concept of the cell means model was introduced by Searle [2,3], Hocking and Speed [4], and Hocking [5] to resolve some of the confusion associated with ANOVA models with unbalanced data. The simplicity of such a model is readily apparent: No confusion exists about which functions are estimable, what their estimators are, and what hypotheses can be tested. The cell means model is conceptually easier, it is useful for understanding the ANOVA models, and it is, from the sampling point of view, the appropriate model to use.

    In many applications, the statistical analysis is characterized by the fact that a number of detailed questions need to be answered. Even if an overall test is significant, further analyses are, in general, necessary to assess specific differences in the treatments. The cell means model provides, within the ANOVA framework, the appropriate model for correct statistical inference and thus supports honest statements on statistical significance in a clinical investigation.

    2.2 Factors, Levels, Effects, and Cells

    One of the principal uses of statistical models is to explain variation in measurements. This variation may be caused by the variety of factors of influence, and it manifests itself as variation from one experimental unit to another. In well-controlled clinical studies, the sponsor deliberately changes the levels of experimental factors (e.g., treatment) to induce variation in the measured quantities to lead to a better understanding of the relationship between those experimental factors and the response. Those factors are called independent and the measured quantities are called dependent variables. For example, consider a clinical trial in which three different diagnostic imaging modalities are used on both men and women in different centers. Table 1 shows schematically how the resulting data could be arrayed in a tabular fashion.

    Table 1: Factors, Levels, and Cells (i j k)

    The three elements used for classification (center, sex, and treatment) identify the source of variation of each datum and are called factors. The individual classes of the classifications are the levels of the factor (e.g., the three different treatments T1, T2, and T3 are the three levels of the factor treatment). Male and female are the two levels of the factor sex, and center1, center2, and center3 are the three levels of the factor center. A subset of the data present for a combination of one level of each factor under investigation is considered a cell of the data. Thus, with the three factors center (3 levels), sex (2 levels), and treatment (3 levels), 3 × 2 × 3 = 18 cells exist, indexed by the triple (i, j, k). Repeated measurements in one cell may exist and usually do. Unbalanced data occur when the numbers of repeated observations per cell, nijk, differ for at least some of the indices (i, j, k). In clinical research, this occurrence is the rule rather than the exception. One obvious reason could be missing data in an experiment. Restricted availability of patients for a specific factor combination is another frequently encountered reason.
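    The cell structure described above can be sketched in a few lines of Python. The factor levels come from the text; the cell counts, the helper name is_balanced, and the single dropout are illustrative assumptions, not data from the trial.

```python
from itertools import product

# Factor levels taken from the example in the text.
centers = ["center1", "center2", "center3"]
sexes = ["male", "female"]
treatments = ["T1", "T2", "T3"]

# One cell per combination of one level of each factor.
cells = list(product(centers, sexes, treatments))


def is_balanced(counts):
    """Return True if every cell holds the same number of observations."""
    return len(set(counts.values())) == 1


# Hypothetical cell counts n_ijk: 5 patients per cell, then one dropout.
counts = {cell: 5 for cell in cells}
balanced_before = is_balanced(counts)
counts[("center2", "female", "T3")] = 4
balanced_after = is_balanced(counts)

print(len(cells), balanced_before, balanced_after)
```

    A single missing observation in one of the 18 cells is enough to make the layout unbalanced.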

    2.3 Cell Means Model

    A customary practice since the seminal work of R. A. Fisher has been that of writing a model equation as a vehicle for describing ANOVA procedures. The cell means model is now introduced via a simple example in which only two treatments and no further factors are considered. Suppose that yir, i = 1, 2, r = 1, …, ni, represents random samples from two normal populations with means μ1 and μ2 and common variance σ². The data point yir denotes the rth observation in the sample of size ni from the ith population, and its value is assumed to follow a Gaussian normal distribution: yir ~ N(μi, σ²). If the sample sizes n1 and n2 differ, a situation of unbalanced data exists.

    In linear model form, it is written

    (1)  $y_{ir} = \mu_i + e_{ir}, \quad i = 1, 2; \quad r = 1, \ldots, n_i$

    where the errors eir are independent and identically distributed (i.i.d.) normal N(0, σ²) variables. Note that a model consists of more than just a model equation: It is an equation such as Equation 1 plus statements that describe the terms of the equation. In the example above, μi is defined as the population mean of the ith population and is equal to the expectation of yir

    (2)  $E(y_{ir}) = \mu_i, \quad i = 1, 2; \quad r = 1, \ldots, n_i$

    The difference yir − E(yir) = yir − μi = eir is the deviation of the observed yir value from the expected value E(yir). This deviation, denoted eir, is called the error term or residual error term, and from the introduction of the model above, it is a random variable with expectation zero and variance v(eir) = σ². Note that, in the model above, the variance is assumed to be the same for all eir. The cell means model can now be summarized as follows:

    (3)  $y_{ir} = \mu_i + e_{ir}, \quad E(e_{ir}) = 0, \quad v(e_{ir}) = \sigma^2 \ \text{for all } i \text{ and } r$

    Note that Equation 3 does not explicitly assume the Gaussian normal distribution. In fact, one can formulate the cell means model more generally by only specifying means and variances. Below, however, it is restricted to the special assumption eir i.i.d. N(0, σ²). It will be shown that the cell means model can be used to describe any of the models that are classically known as ANOVA models.

    2.4 One-Way Classification

    Let us begin with the case of a cell means model in which the populations are identified by a single factor with I levels and ni observations at the ith level for i = 1, …, I.

    2.4.1 Example 1

    A clinical study was conducted to compare the effectiveness of three different doses of a new drug and placebo for treating patients with high blood pressure. For the study, 40 patients were included. To control for unknown sources of variation, 10 patients each were assigned at random to the four treatment groups. As response, the study considered the difference in diastolic blood pressure between baseline (pre-value) and the measurement 4 weeks after administration of treatment. The response measurements yir, sample means $\bar{y}_{i\cdot}$, and sample variances si² are shown in Table 2.

    Table 2: Example 1, Data for a Dose Finding Study

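    The cell means model to analyze the data of Example 1 is then given as

    (4)  $y_{ir} = \mu_i + e_{ir}, \quad e_{ir} \sim \text{i.i.d. } N(0, \sigma^2), \quad i = 1, \ldots, 4; \quad r = 1, \ldots, 10$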

    where μi defines the population mean at the ith level as E(yir) = μi and σ² is the common variance of the eir.

    2.5 Parameter Estimation

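    The μi in Equation 4 are usually estimated by the method of least squares. The estimator for μi (i = 1, …, I) is then given by

    (5)  $\hat{\mu}_i = \bar{y}_{i\cdot} = \frac{1}{n_i} \sum_{r=1}^{n_i} y_{ir}$

    and the common error variance σ² is estimated by

    (6)  $\hat{\sigma}^2 = \frac{1}{n_\cdot - I} \sum_{i=1}^{I} \sum_{r=1}^{n_i} (y_{ir} - \bar{y}_{i\cdot})^2$

    where

    $n_\cdot = \sum_{i=1}^{I} n_i$

    The sample means and sample variances for each treatment and the estimated common error variance $\hat{\sigma}^2$ are given in Table 2.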
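    As a rough illustration of least-squares estimation in the cell means model, the following Python sketch takes the sample mean of each level as the estimate of μi and pools the within-level squared deviations over n. − I degrees of freedom for the common variance, where n. is the total number of observations. The function name cell_means_fit and the data are hypothetical; they are not the values of Table 2.

```python
def cell_means_fit(groups):
    """Least-squares fit of y_ir = mu_i + e_ir.

    groups: list of lists, one list of responses per factor level.
    Returns (estimated level means, pooled error variance).
    """
    means = [sum(g) / len(g) for g in groups]
    # Within-level squared deviations, summed over all levels.
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    n_total = sum(len(g) for g in groups)
    sigma2_hat = sse / (n_total - len(groups))
    return means, sigma2_hat


# Hypothetical unbalanced data with three levels.
groups = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 10.0]]
means, sigma2_hat = cell_means_fit(groups)
print(means, sigma2_hat)
```

    The unequal group sizes cause no difficulty here: each level mean uses only its own observations, and the degrees of freedom adjust automatically.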

    2.6 The R(.) Notation—Partitioning Sum of Squares

    The ANOVA procedure can be summarized as follows: Given n observations yi, one defines the total sum of squared deviations from the mean by

    (7)  $\text{Total SS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$

    with

    $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

    The ANOVA technique partitions the variation among observations into two parts: the sum of squared deviations from the model to the overall mean

    (8)  $\text{Model SS} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

    and the sum of squared deviations from the observed values yi to the model

    (9)  $\text{Error SS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

    where $\hat{y}_i$ denotes the value fitted by the model. These two parts are called the sum of squares due to the model and the residual or error sum of squares, respectively. Thus, Total SS = Model SS + Error SS.

    The Total SS always has the same value for a given set of data because it is nothing other than the sum of squares of all data points relative to the common mean. However, the partitioning into Model SS and Error SS depends on model selection. Generally, the addition of a new factor to a model will increase the Model SS and, correspondingly, reduce the Error SS. When two models are considered, each sum of squares can be expressed as the difference between the sums of squares of the two models. Therefore, working with such differences of sums of squares makes it very easy to compare two ANOVA models.

    In the one-way classification in Equation 4, the total sum of squares of the observations is

    (10)  $SST = \sum_{i=1}^{I} \sum_{r=1}^{n_i} y_{ir}^2$

    The error sum of squares after fitting the model E(yir) = μi is

    (11)  $SSE(\mu_i) = \sum_{i=1}^{I} \sum_{r=1}^{n_i} (y_{ir} - \bar{y}_{i\cdot})^2$

    R(μi) = SST − SSE(μi) is denoted the reduction in sum of squares due to fitting the model

    $E(y_{ir}) = \mu_i$

    Fitting the simplest of all linear models, the constant mean model

    $E(y_{ir}) = \mu$

    the estimate of E(yir) would be $\bar{y}_{\cdot\cdot}$, the mean of all observations, and the error sum of squares results as

    $SSE(\mu) = \sum_{i=1}^{I} \sum_{r=1}^{n_i} (y_{ir} - \bar{y}_{\cdot\cdot})^2$

    R(μ) = SST − SSE(μ) is denoted the reduction in sum of squares due to fitting the model E(yir) = μ. The two models E(yir) = μi and E(yir) = μ can now be compared in terms of their respective reductions in sum of squares given by R(μi) and R(μ). The difference R(μi) − R(μ) is the extent to which fitting E(yir) = μi brings about a greater reduction in sum of squares than does fitting E(yir) = μ.

    Obviously, the R(.) notation is a useful mnemonic for comparing different linear models in terms of the extent to which fitting each accounts for a different reduction in the sum of squares. The works of Searle [2,3], Hocking [5], and Littel et al. [6] are recommended for deeper insight.

    It is now very easy to partition the total sum of squares SST into the terms that appear in the ANOVA. For this purpose, the identity

    (12)  $SST = R(\mu) + [R(\mu_i) - R(\mu)] + SSE(\mu_i) = R(\mu) + R(\mu_i/\mu) + SSE(\mu_i)$

    or

    $SST - R(\mu) = R(\mu_i/\mu) + SSE(\mu_i)$

    is used with R(μi/μ) = R(μi) − R(μ). Table 3a corresponds to the first line of Equation 12 and Table 3b to the last line.

    Table 3a: ANOVA—Partitioning the Total Sum of Squares 1-Way Classification, Cell Means Model

    Table 3b: Partitioning the Total Sum of Squares Adjusted for Mean

    Table 3a displays the separation into the components attributable to the model μ in the first line, to the model μi over and above μ in the second line, and to the error term in the third line, with the total sum of squares in the last line.

    Table 3b displays only the separation into the two components attributable to the model μi over and above μ in the first line and to the error term in the second line.
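    The partition can be checked numerically. The sketch below assumes that SST denotes the uncorrected sum of squares of the observations, so that R(μ) equals n. times the squared overall mean and R(μi) equals the sum over levels of ni times the squared level mean. The function name partition and the data are hypothetical.

```python
def partition(groups):
    """Return SST, R(mu), R(mu_i) and SSE(mu_i) for one-way data."""
    all_y = [y for g in groups for y in g]
    n = len(all_y)
    sst = sum(y * y for y in all_y)                    # uncorrected total
    r_mu = sum(all_y) ** 2 / n                         # n * (overall mean)^2
    r_mui = sum(sum(g) ** 2 / len(g) for g in groups)  # sum of n_i * (level mean)^2
    # Error sum of squares computed directly from the deviations.
    sse = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return sst, r_mu, r_mui, sse


# Hypothetical unbalanced data with three levels.
groups = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 10.0]]
sst, r_mu, r_mui, sse = partition(groups)
r_mui_given_mu = r_mui - r_mu  # R(mu_i / mu)

# SST = R(mu) + R(mu_i / mu) + SSE(mu_i), up to rounding error.
print(sst, r_mu + r_mui_given_mu + sse)
```

    Because the error sum of squares is computed from the deviations rather than by subtraction, agreement of the two printed values is a genuine check of the identity.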

    2.7 ANOVA—Hypothesis of Equal Means

    Consider the following inferences about the cell means. This analysis includes the initial null hypothesis of equal means (the global hypothesis that all means are simultaneously equal), the so-called ANOVA hypothesis, together with pairwise comparisons, contrasts, and other linear functions, in the form of either hypothesis tests or confidence intervals.

    Starting from the model E(yir) = μi, the global null hypothesis

    $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$

    is of general interest. A suitable F-statistic can be used for testing this hypothesis H0 (see standard References [2–6]). The F-statistic testing H0 and the sums of squares of Equation 12 are tabulated in Table 3a and Table 3b, columns 3–5.

    The primary goal of the experiment in Example 1 was to show that the new drug is effective compared with placebo. At first, one may test the global null hypothesis

    $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

    with E(yir) = μi. Table 3c shows information for testing the hypothesis H0.

    Table 3c: ANOVA Example 1

    R(μi/μ) = R(μi) − R(μ) = 67.81 is the difference between the respective reductions in sum of squares for the two models E(yir) = μi and E(yir) = μ. From the mean square R(μi/μ)/(I − 1) = 22.60 and the error mean square SSE(μi)/(n. − I) = $\hat{\sigma}^2$ = 1.46, one obtains the F-statistic F = 15.5 for testing the null hypothesis H0, with P-value Pr > F below 0.0001. As this probability is less than the significance level α = 0.05, the hypothesis H0 can be rejected in favor of the alternative μi ≠ μj for at least one pair i and j of the four treatments.
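    The arithmetic behind the F-statistic can be retraced from the figures quoted above. This sketch only reuses the reported values (67.81, 1.46, four groups of 10 patients); the variable names are illustrative, and the P-value is not recomputed because that would require the F distribution with 3 and 36 degrees of freedom from a statistics library.

```python
# Values reported in the text for Example 1.
r_mui_given_mu = 67.81  # R(mu_i / mu): model sum of squares adjusted for the mean
n_levels = 4            # three doses plus placebo
n_total = 40            # 10 patients per treatment group
ms_error = 1.46         # estimated common error variance

df_model = n_levels - 1
df_error = n_total - n_levels
ms_model = r_mui_given_mu / df_model
f_stat = ms_model / ms_error

print(df_model, df_error, round(ms_model, 2), round(f_stat, 1))
```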

    Rejection of the null hypothesis H0 indicates that differences exist among treatments, but it does not show where the differences are located. Investigators' interest is rarely restricted to this overall test; it usually extends to comparisons among the doses of the new drug and placebo. As a consequence, multiple comparisons of the three doses with placebo are required.

    2.8 Multiple Comparisons

    In many clinical trials, more than two drugs or more than two levels of one drug are considered. Having rejected the global hypothesis of equal treatment means (e.g., when the probability of the F-statistic in Table 3c, last column, is less than 0.05), questions related to picking out drugs that are different from others or determining what dose level is different from the others and placebo have to be addressed. These analyses generally require many (multiple) further comparisons among the treatments in order to detect effects of prime interest to the researcher. The excessive use of multiple significance tests in clinical trials can greatly increase the chance of false-positive findings. A large amount of statistical research has been devoted to multiple-comparison procedures and the control of false-positive results caused by multiple testing. Each procedure usually has the objective of controlling the experiment-wise or family-wise error rate. A multiple test controls the experiment-wise or family-wise multiple level α, if the probability to reject at least one of the true null hypotheses does not exceed α, irrespective of the number of hypotheses and which of them is in fact true.

    For ANOVA, several types of multiple-comparison procedures exist that adjust the critical value of the test statistic, for example, the Scheffé, Tukey, and Dunnett tests, procedures that adjust the comparison-wise P-values (e.g., the Bonferroni-Holm procedure), and the more general closed-test procedures [7,8]. Marcus et al. [9] introduced these so-called closed multiple test procedures, which keep the family-wise multiple level α under control. The closed-test principle requires a special structure among the set of null hypotheses, and it can be viewed as a general tool for deriving a multiple test.
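    As one concrete instance of a procedure that adjusts comparison-wise P-values, the Bonferroni-Holm step-down test can be sketched as follows. The ordered P-values are compared with the increasingly lenient thresholds α/m, α/(m − 1), …, α, and testing stops at the first non-rejection. The function name holm and the P-values are hypothetical and are not results from Example 1.

```python
def holm(p_values, alpha=0.05):
    """Bonferroni-Holm step-down test.

    Returns a list of booleans in the order of p_values,
    True where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    reject = [False] * m
    for step, k in enumerate(order):
        # The (step + 1)-th smallest p-value is compared with alpha / (m - step).
        if p_values[k] <= alpha / (m - step):
            reject[k] = True
        else:
            break  # stop at the first non-rejection
    return reject


# Hypothetical comparison-wise p-values for three doses versus placebo.
p_values = [0.020, 0.001, 0.060]
decisions = holm(p_values)
print(decisions)
```

    With these values a plain Bonferroni threshold of 0.05/3 would reject only the second hypothesis, whereas the step-down version also rejects the first.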

    In general, the use of an appropriate multiple-comparison test for inference concerning treatment comparisons is indicated [10]

    1. to make an inference concerning a particular comparison that has been selected on the basis of how the data have turned out;

    2. to make an inference that requires the simultaneous examination of several treatment comparisons, for example, the minimum effective dose in dose finding studies; and

    3. to perform so-called data dredging, namely, assembling the data in various ways in the hope that some interesting differences will be observable.

    2.9 Two-Way Crossed Classification

    Two basic aspects of the design of experiments are error control and the optimal structuring of the treatment groups. A general way of reducing the effect of uncontrolled variations on the error of treatment comparisons is grouping the experimental units (patients) into sets of units that are as alike (uniform) as possible. All comparisons are then made within and between sets of similar units [11]. Randomization schemes (e.g., complete randomization, randomized blocks) are part of the error control of the experimental design.

    On the other hand, the structure of treatments, that is, which factor(s) and factor levels are to be observed, is called the treatment design [6]. The factorial treatment design is one of the most important and widely used treatment structures. Here one distinguishes between treatment and classification factors. The factorial design can be used with any randomization scheme. Compared with the one-factor-at-a-time approach, factorial experiments have the advantage of giving greater precision in estimating overall factor effects. They enable the exploration of interactions between the different factors being considered and allow an extension of the range of validity of the conclusions by the insertion of additional factors.

    Factorial designs enhance the basis for any generalizability of the trial conclusion with respect to geographical area, patient demographics, and pretreatment characteristics as well as other factors that are potentially associated with the response variable. At the same time, the more extensive heterogeneity among patients conflicts with the precision of statistical estimates, which is usually enhanced by requiring homogeneity of subjects [12]. Two study design strategies are followed:

    1. Stratified assignment of treatments to subjects who are matched for similarity on one or more block factors, such as gender, race, age, initial severity, or on strata, such as region or centers. Separately and independently within each stratum, subjects are randomly assigned to treatment groups in a block randomization design [11]. When a stratified assignment is used, treatment comparisons are usually based on appropriate within-stratum/center differences [13].

    2. A popular alternative to the block randomization design is the stratification of patients according to their levels of prognostic factors in the analysis phase, after complete randomization. This alternative strategy, termed post-stratification, leads to exactly the same kind of statistical analysis, but it has disadvantages compared with prestratification. Prestratification guards by design against unlikely but devastating differences between the groups in their distributions on the prognostic factors and in sample sizes within strata. With prestratification, these will be equal, or at least close to equal, by design. With post-stratification, because of randomization, they will be equal only in a long-term average sense [14].

    2.10 Balanced and Unbalanced Data

    When the number of observations in each cell is the same, the data are described as balanced data. They typically come from well-designed factorial trials that have been executed as planned. The analysis of balanced data is relatively easy and has been extensively described [15]. When the number of observations in the cells is not uniform, the data are described as unbalanced data.

    This is illustrated by a numerical example (Example 2). Suppose interest exists in the comparison of two contrast media, T = 1, 2, with regard to a liver enzyme, together with a second factor B, with categories B = 1, 2, describing the initial severity of the liver impairment (B = 1 not impaired, B = 2 impaired). Table 4 shows the data of a clinical trial designed for this purpose.

    Table 4: Liver Enzyme (Log Transform)—2-Way Crossed Classification Treatment (T), Liver Impairment (B)

    Let xijk be the kth observation on treatment i at liver impairment level j, where i = 1, 2, j = 1, 2, and k = 1, …, nij, with nij being the number of observations on treatment i and impairment j. For the sake of generality, treatment is referred to as row-factor A with levels i = 1, …, I and liver impairment as column-factor B with levels j = 1, …, J. Note that the data of Table 4 are represented as logarithms yijk = log(xijk) of the liver enzyme data xijk.

    The cell means model is as follows:

    (13)  $y_{ijk} = \mu_{ij} + e_{ijk}, \quad i = 1, \ldots, I; \quad j = 1, \ldots, J; \quad k = 1, \ldots, n_{ij}$

    A normal (Gaussian) distribution is assumed for the log-transformed measurements

    $y_{ijk} \sim N(\mu_{ij}, \sigma^2)$

    It follows, as for the one-way classification, that the cell means μij are estimated by the mean of the observations in the appropriate cell (i, j)

    (14)  $\hat{\mu}_{ij} = \bar{y}_{ij\cdot} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} y_{ijk}$

    and the residual error variance is estimated as

    (15)  $\hat{\sigma}^2 = \frac{SSE}{N - s}$

    where

    $SSE = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar{y}_{ij\cdot})^2$

    and where s = I × J − (number of empty cells), and N = ∑i ∑j nij is the total sample size.
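    A minimal Python sketch of these estimates, under the assumption that empty cells are simply skipped and that the error degrees of freedom are N − s with s the number of filled cells. The function name two_way_cell_means and the data are hypothetical; they are not the liver enzyme values of Table 4.

```python
def two_way_cell_means(cells):
    """Estimate cell means and the pooled error variance.

    cells: dict mapping (i, j) to the list of observations y_ijk in that
    cell; an empty list marks an empty cell.
    """
    filled = {key: ys for key, ys in cells.items() if ys}
    means = {key: sum(ys) / len(ys) for key, ys in filled.items()}
    sse = sum((y - means[key]) ** 2 for key, ys in filled.items() for y in ys)
    n_total = sum(len(ys) for ys in filled.values())
    s = len(filled)  # I * J minus the number of empty cells
    return means, sse / (n_total - s)


# Hypothetical 2 x 2 layout with unequal cell sizes and one empty cell.
cells = {
    (1, 1): [1.0, 3.0],
    (1, 2): [2.0, 2.0, 5.0],
    (2, 1): [4.0, 6.0],
    (2, 2): [],
}
cell_means, sigma2_hat = two_way_cell_means(cells)
print(cell_means, sigma2_hat)
```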

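    An important difference between balanced and unbalanced data is the definition of a mean over rows or columns, respectively. For example, the row mean of the cell means μij in row i is straightforward

    (16)  $\bar{\mu}_{i\cdot} = \frac{1}{J} \sum_{j=1}^{J} \mu_{ij}$

    and its estimator is

    (17)  $\hat{\bar{\mu}}_{i\cdot} = \frac{1}{J} \sum_{j=1}^{J} \bar{y}_{ij\cdot}$

    with variance estimate

    $\frac{\hat{\sigma}^2}{J^2} \sum_{j=1}^{J} \frac{1}{n_{ij}}$

    For Example 2, the values 1.22 and 1.15 are obtained. A different row mean of the cell means μij in row i, sometimes thought to be an interesting alternative to $\bar{\mu}_{i\cdot}$, is the weighted mean with the weights $n_{ij}/n_{i\cdot}$:

    (18)  $\bar{\mu}'_{i\cdot} = \sum_{j=1}^{J} \frac{n_{ij}}{n_{i\cdot}} \mu_{ij}$

    estimated by the mean over all observations in row i,

    (19)  $\hat{\bar{\mu}}'_{i\cdot} = \bar{y}_{i\cdot\cdot} = \frac{1}{n_{i\cdot}} \sum_{j=1}^{J} \sum_{k=1}^{n_{ij}} y_{ijk}$

    with variance estimate

    $\frac{\hat{\sigma}^2}{n_{i\cdot}}$

    For Example 2, the values 1.16 and 1.05 are obtained.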

    The difference between these two row means $\bar{\mu}_{i\cdot}$ and $\bar{\mu}'_{i\cdot}$ is that $\bar{\mu}_{i\cdot}$ is independent of the number of data points in the cells, whereas $\bar{\mu}'_{i\cdot}$ depends on the number of observations in each cell. As the sample sizes of participating centers in a multicenter study are mostly different, a weighted mean over the centers is a better overall estimator for treatment comparisons [13,14,16]. If it can be ascertained that the population in a stratum or block is representative of a portion of the general population of interest, for example, in post-stratification, then it is natural to use a weighted mean as an overall estimator of a treatment effect. Many clinical trials result in unbalanced two-way factorial designs. Therefore, it is appropriate to define row or column means, respectively, as weighted means, which results in different sums of squares.
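    The contrast between the two kinds of row mean is easy to see numerically. The sketch below computes, for one row, the unweighted mean of the cell means and the mean weighted by cell frequencies. The function name row_means and the numbers are hypothetical; they are not the values of Table 4.

```python
def row_means(cell_means, cell_counts):
    """Unweighted and frequency-weighted means over the columns of one row."""
    unweighted = sum(cell_means) / len(cell_means)
    n_row = sum(cell_counts)
    weighted = sum(n * m for n, m in zip(cell_counts, cell_means)) / n_row
    return unweighted, weighted


# Hypothetical row of a two-column design.
cell_means = [1.0, 2.0]  # estimated cell means in row i
cell_counts = [30, 10]   # cell frequencies n_i1 and n_i2
unweighted, weighted = row_means(cell_means, cell_counts)
print(unweighted, weighted)

# With equal cell frequencies the two definitions coincide.
print(row_means(cell_means, [5, 5]))
```

    The weighted mean is pulled toward the cell with more observations, which is why hypotheses built on the two definitions agree only for balanced data.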

    If weights other than these two possibilities are wanted (e.g., Cochran–Mantel–Haenszel weights), one may consider a general weighting of the form

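    (20)  $\bar{\mu}''_{i\cdot} = \sum_{j=1}^{J} t_{ij} \mu_{ij}$

    with ∑j tij = 1 and the estimator $\hat{\bar{\mu}}''_{i\cdot} = \sum_{j=1}^{J} t_{ij} \bar{y}_{ij\cdot}$ with variance estimate $\hat{\sigma}^2 \sum_{j=1}^{J} t_{ij}^2 / n_{ij}$.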

    2.11 Interaction Between Rows and Columns

    Consider an experiment with two factors A (with two levels) and B (with three levels). The levels of A may be thought of as two different hormone treatments and the three levels of B as three different races. Suppose that no uncontrolled variation exists and that observations as in Figure 1 are obtained. These observations can be characterized in various equivalent ways:

    Figure 1: Two-way classification, no interaction, 2 rows, 3 columns

    1. the difference between the observations corresponding to the two levels of A is the same for all three levels of B;

    2. the difference between the observations for any two levels of B is the same for the two levels of A; and

    3. the effects of the two factors are additive.

    For levels i and i′ of factor A and levels j and j′ of factor B, consider the relation among means given by

    (21)  $\mu_{ij} - \mu_{i'j} = \mu_{ij'} - \mu_{i'j'}$

    If this relation holds, the difference in the means for levels i and i′ of factor A is the same for levels j and j′ of factor B and, vice versa, the change from level j to level j′ of factor B is the same for both levels of factor A. If Equation 21 holds for all pairs of levels of factor A and all pairs of levels of factor B, or when conditions 1, 2, or 3 above are satisfied, one can say that factor A does not interact with factor B: No interaction exists between factors A and B. If the relation in Equation 21 or conditions 1, 2, or 3 are not satisfied, then an interaction exists between factor A and factor B. Interaction can occur in many ways. One particular case is shown in Figure 2.

    Figure 2: Two-way classification, interaction, 2 rows, 3 columns

    The presence of an interaction can mislead conclusions about the treatment effects in terms of row effects. It is therefore advisable to assess the presence of interaction before making conclusions, which can be done by testing an interaction hypothesis. The test may indicate that the data are consistent with the absence of interaction but may not prove that no real interaction exists.
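    For a table of cell means without random error, the no-interaction condition can be checked directly: the difference between any two rows must be constant across columns. The following sketch applies this check to two hypothetical 2 × 3 tables; the function name is_additive and the numbers are illustrative and are not read from Figures 1 and 2.

```python
def is_additive(means, tol=1e-9):
    """Check that row differences are constant across all columns.

    means: list of rows, each a list of cell means of equal length.
    """
    first_row = means[0]
    for row in means[1:]:
        diffs = [a - b for a, b in zip(first_row, row)]
        if max(diffs) - min(diffs) > tol:
            return False
    return True


# Hypothetical cell means (rows: factor A, columns: factor B).
no_interaction = [[10.0, 12.0, 15.0],
                  [13.0, 15.0, 18.0]]
interaction = [[10.0, 12.0, 15.0],
               [13.0, 12.0, 11.0]]

print(is_additive(no_interaction), is_additive(interaction))
```

    With real data the cell means are estimates subject to sampling error, so an exact check of this kind is replaced by a test of the interaction hypothesis.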

    2.12 Analysis of Variance Table

    ANOVA tables have been successful for balanced data. They are generally available, widely documented, and ubiquitously accepted [15]. For unbalanced data, often no unique, unambiguous method for presenting the results exists. Several methods are available, but they are often not as easily interpretable as methods for balanced data. In the context of hypothesis testing, or of arraying sums of squares in an ANOVA format, a variety of sums of squares are often used. The problem in interpreting the output of specific computer programs is to identify those sums of squares that are useful and those that are misleading. The information for conducting tests for row effects, column effects, and interactions between rows and columns is summarized in an extended ANOVA table. Since the work of Yates [17], various computational methods have existed for generating sums of squares in an ANOVA table (Table 5).

    Table 5: Sample Size, Means

    The advantage of using the μij-model notation introduced above is that all μij are clearly defined. Thus, a hypothesis stated in terms of the μij is easily understood. Speed et al. [18], Searle [2, 3], and Pendleton et al. [16] gave the interpretations of four different types of sums of squares computed (e.g., by the SAS, SPSS, and other systems).

    To illustrate the essential points, use the model in Equation 13, assuming all nij > 0. For reference, six hypotheses of weighted means are listed in Table 6 that will be related to the different methods [16,18].

    Table 6: Cell Means Hypotheses

    A typical method might refer to H1, H2, or H3 as the main effect of A (row effect). Hypotheses H4 and H5 are counterparts of H2 and H3 generally associated with the main effect of B (column effect). The hypothesis of no interaction is H6, and it is seen to be common to all methods under the assumption nij > 0. Hypotheses H1, H2, and H3 agree in the balanced case (i.e., if nij = n for all i and j) but not otherwise. The hypothesis H3 does not depend on the nij. All means have the same weight 1/J and are easy to interpret. It states that no difference exists in the levels of factor A when averaged over all levels of factor B (Equation 16).

    Hypotheses H1 and H2 represent comparisons of weighted averages (Equations 18 and 20) with the weights being a function of the cell frequencies. A hypothesis weighted by the cell frequencies might be appropriate if the frequencies reflected population sizes but would not be considered as the standard hypothesis. Table 7 specifies the particular hypothesis tested by each type of ANOVA sums of squares.

    Table 7: Cell Means Hypotheses Being Tested

    Table 8 shows three different analysis of variance tables for the liver enzyme data described in Section 2.10, computed with SAS PROC GLM.

    Table 8: Liver Enzymes—Two-Way Classification with Interaction Term Treatment (T), Impairment (B), Interaction (T*B)

    The hypothesis for the interaction term T*B is given by H6. The test is the same for Type I, II, and III sums of squares. In this example, no evidence of an interaction between treatment and liver impairment is found, P-value Pr > F = 0.94.

    The hypothesis for main effect A—the treatment comparison—is given by H1, H2, or H3, corresponding to the different weighting means (Table 6 and Table 7).

    The results for the treatment comparison differ by type of sums of squares (Table 8, source T):

    1. for Type I, the P-value, Pr > F = 0.0049, is less than 0.005 (highly significant)

    2. for Type II, the P-value, Pr > F = 0.074, is between 0.05 and 0.1 (not significant)

    3. for Type III, the P-value, Pr > F = 0.142, is greater than 0.1 (not significant).

    The hypothesis H2 (i.e., Type II) is appropriate for treatment effect in the analysis of this example—a two-way design with unbalanced data.

    No interaction exists between treatment and liver impairment. The different test results for H1, H2, and H3 result from the unbalanced data and factor liver impairment.

    Any rules for combining centers, blocks, or strata in the analysis should be set up prospectively in the protocol. Decisions concerning this approach should always be taken blind to treatment. All features of the statistical model to be adopted for the comparison of treatments should be described in advance in the protocol. Hypothesis H2 is appropriate in the analysis of multicenter trials when treatment differences over all centers [13,14,19] are considered. The essential point emphasized here is that the justification of a method should be based on the hypotheses being tested and not on heuristic grounds or computational convenience.

    In the presence of a significant interaction, the hypotheses of main effects may not be of general interest and more specialized hypotheses might be considered.

    With regard to missing cells, the hypotheses being tested can be somewhat complex for the various procedures or types of ANOVA tables. Complexities associated with those models are simply aggravated when dealing with models for more than two factors. No matter how many factors exist or how many levels each factor has, the mean of the observations in each filled cell is an estimator of the population mean for that cell. Any linear hypothesis about the cell means of nonempty cells is testable; see the work of Searle [Reference 3, pp. 384–415].

    References

    [1] R. A. Fisher, Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd, 1925.

    [2] S. R. Searle, Linear Models. New York: John Wiley & Sons, 1971.

    [3] S. R. Searle, Linear Models for Unbalanced Data. New York: John Wiley & Sons, 1987.

    [4] R. R. Hocking and F. M. Speed, A full rank analysis of some linear model problems. JASA 1975; 70: 706–712.

    [5] R. R. Hocking, Methods and Applications of Linear Models—Regression and the Analysis of Variance. New York: John Wiley & Sons, 1996.

    [6] R. C. Littell, W. W. Stroup, and R. J. Freund, SAS for Linear Models. Cary, NC: SAS Institute Inc., 2002.

    [7] P. Bauer, Multiple primary treatment comparisons on closed tests. Drug Inform. J. 1993; 27: 643–649.

    [8] P. Bauer, On the assessment of the performance of multiple test procedures. Biomed. J. 1987; 29(8): 895–906.

    [9] R. Marcus, E. Peritz, and K. R. Gabriel, On closed testing procedures with special reference to ordered analysis of variance. Biometrika 1976; 63: 655–660.

    [10] C. W. Dunnett and C. H. Goldsmith, When and how to do multiple comparison statistics in the pharmaceutical industry. In: C. R. Buncher and J. Y. Tsay, eds. Statistics and Monograph, vol. 140. New York: Dekker, 1994.

    [11] D. R. Cox, Planning of Experiments. New York: John Wiley & Sons, 1992.

    [12] G. G. Koch and W. A. Sollecito, Statistical considerations in the design, analysis, and interpretation of comparative clinical studies. Drug Inform. J. 1984; 18: 131–151.

    [13] J. Kaufmann and G. G. Koch, Statistical considerations in the design of clinical trials, weighted means and analysis of covariance. Proc. Conference in Honor of Shayle R. Searle, Biometrics Unit, Cornell University, 1996.

    [14] J. L. Fleiss, The Design and Analysis of Clinical Experiments. New York: John Wiley & Sons, 1985.

    [15] H. Sahai and M. I. Ageel, The Analysis of Variance—Fixed, Random and Mixed Models. Boston: Birkhäuser, 2000.

    [16] O. J. Pendleton, M. von Tress, and R. Bremer, Interpretation of the four types of analysis of variance tables in SAS. Commun. Statist.-Theor. Meth. 1986; 15: 2785–2808.

    [17] F. Yates, The analysis of multiple classifications with unequal numbers in the different classes. JASA 1934; 29: 51–56.

    [18] F. M. Speed, R. R. Hocking, and O. P. Hackney, Methods of analysis of linear models with unbalanced data. JASA 1978; 73: 105–112.

    [19] S. Senn, Some controversies in planning and analysing multi-centre trials. Stat. Med. 1998; 17: 1753–1765.

    Chapter 3

    Assessment of Health-Related Quality of Life

    C. S. Wayne Weng

    3.1 Introduction

    Randomized clinical trials are the gold standard for evaluating new therapies. The primary focus of clinical trials has traditionally been evaluation of efficacy and safety. As clinical trials evolved beyond the traditional efficacy and safety assessment of new therapies, clinicians became interested in an overall evaluation of the clinical impact of these new therapies on patient daily functioning and well-being as measured by health-related quality of life (HRQOL). As a result, the use of HRQOL assessments in clinical trials rose steadily throughout the 1990s and has continued to rise in the twenty-first century.

    What is HRQOL? Generally, quality of life encompasses four major domains [1]:

    1. Physical status and functional abilities

    2. Psychological status and well-being

    3. Social interactions

    4. Economic or vocational status and factors

    The World Health Organization (WHO) defines health [2] as a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity. HRQOL focuses on the parts of quality of life that are related to an individual’s health. The key components of this definition of HRQOL include (1) physical functioning, (2) mental functioning, and (3) social well-being, and a well-balanced HRQOL instrument should include these three key components. For example, the Medical Outcomes Study Short Form-36 (SF-36), a widely used HRQOL instrument, includes a profile of eight domains: (1) Physical Functioning, (2) Role-Physical, (3) Bodily Pain, (4) Vitality, (5) General Health, (6) Social Functioning, (7) Role-Emotional, and (8) Mental Health. These eight domains can be further summarized by two summary scales: the Physical Component Summary (PCS) and Mental Component Summary (MCS) scales.

    This article is intended to provide an overview of assessment of HRQOL in clinical trials. For more specific details on a particular topic mentioned in this article, readers should consult the cited references. The development of a new HRQOL questionnaire and its translation into various languages are separate topics and are not covered in this article.

    3.2 Choice of HRQOL Instruments

    HRQOL instruments can be classified into two types: generic instruments and disease-specific instruments. The generic instrument is designed to evaluate general aspects of a person’s HRQOL, which should include physical functioning, mental functioning, and social well-being. A generic instrument can be used to evaluate the HRQOL of a group of people in the general public or a group of patients with a specific disease. As such, data collected with a generic instrument allow comparison of HRQOL among different disease groups or against a general population.

    Because a generic instrument is designed to cover a broad range of HRQOL issues, it may be less sensitive to issues that are important for a particular disease or condition. Disease-specific instruments focus assessment in a more detailed manner on a particular disease. A more specific instrument allows detection of changes in disease-specific areas that a generic instrument is not sufficiently sensitive to detect. For example, the Health Assessment Questionnaire (HAQ) was developed to measure the functional status of patients with rheumatic disease. The HAQ assesses the ability to function in eight areas of daily life: dressing and grooming, arising, eating, walking, hygiene, reach, grip, and activities.

    Table 1 [3–30] includes a list of generic and disease-specific HRQOL instruments for common diseases or conditions.

    Table 1: Generic and Disease-Specific HRQOL Instruments for Common Diseases or Conditions

    A comprehensive approach to assessing HRQOL in clinical trials can be achieved in two ways: a battery approach, which uses several questionnaires when a single questionnaire does not address all relevant HRQOL components, or a module approach, which includes a core measure of HRQOL domains supplemented in the same questionnaire by a disease- or treatment-specific set of items. The battery approach combines a generic HRQOL instrument and a disease-specific questionnaire. For example, in a clinical trial on rheumatoid arthritis (RA), one can include the SF-36 and HAQ to evaluate treatment effect on HRQOL. The SF-36 allows comparison of the burden of RA on patients’ HRQOL with that of other diseases as well as with the general population. The HAQ, being a disease-specific instrument, measures patients’ ability to perform activities of daily life and is more sensitive to changes in an RA patient’s condition. The module approach has been widely adopted in oncology, as different tumors impact patients in different ways. The most popular cancer-specific HRQOL questionnaires, EORTC QLQ-C30 and FACT, both include core instruments that measure physical functioning, mental functioning, and social well-being as well as common cancer symptoms, supplemented with a list of tumor- and treatment-specific modules.

    In certain diseases, a disease-specific HRQOL instrument is used alone in a trial because the disease’s impact on general HRQOL is so small that a generic HRQOL instrument will not be sensitive enough to detect changes in disease severity. For example, the disease burden of allergic rhinitis on generic HRQOL is relatively small compared with the general population. Most published HRQOL studies in allergic rhinitis use Juniper’s Rhinoconjunctivitis Quality of Life Questionnaire (RQLQ), a disease-specific HRQOL questionnaire for allergic rhinitis.

    3.3 Establishment of Clear Objectives in HRQOL Assessments

    A clinical trial is usually designed to address one hypothesis or a small number of hypotheses, evaluating a new therapy’s efficacy, safety, or both. When considering whether to include HRQOL assessment in a study, one must ask what additional information the HRQOL assessment will provide. As estimated by Moinpour [31], the total cost per patient is $443 to develop an HRQOL study, monitor HRQOL form submission, and analyze HRQOL data. Sloan et al. [32] have revisited the issue of the cost of HRQOL assessment in a number of settings including clinical trials and suggest a wide cost range depending on the comprehensiveness of the assessment. This is not a trivial sum of money to spend in a study without a clear objective for the HRQOL assessment.

    The objective of HRQOL assessment is usually focused on one of the four possible outcomes: (1) improvement in efficacy leads to improvement in HRQOL, (2) treatment side effects may cause deterioration in HRQOL, (3) the combined effect of (1) and (2) on HRQOL, and (4) similar efficacy with an improved side effect profile leads to improvement in HRQOL.

    After considering the possible HRQOL outcomes, one can decide whether HRQOL assessment should be included in the trial. In many published studies, HRQOL was included without a clear objective. These studies generated HRQOL data that provided no additional information at the completion of the studies. Goodwin et al. [33] provide an excellent review of HRQOL measurement in randomized clinical trials in breast cancer. They suggest that, given the existing HRQOL database for breast cancer, it is not necessary to measure HRQOL in every trial, at least until ongoing trials are reported. An exception is interventions with a psychosocial focus, where HRQOL must be the primary outcome.

    3.4 Methods for HRQOL Assessment

    The following components should be included in a study protocol with an HRQOL objective:

    Rationale for assessing HRQOL objective(s) and for the choice of HRQOL instrument(s):

    To help study personnel understand the importance of HRQOL assessment in the study, inclusion of a clear and concise rationale for HRQOL assessment is essential, along with a description of the specific HRQOL instrument(s) chosen.

    HRQOL hypotheses:

    The study protocol should also specify hypothesized HRQOL outcomes with respect to general and specific domains. It is helpful to identify the primary domain and secondary domains for HRQOL analysis in the protocol.

    Frequency of HRQOL assessment:

    In a clinical trial, the minimum number of HRQOL assessments required is two, at baseline and at the end of the study, for studies with a fixed treatment duration where most patients are expected to complete the treatment. One or two additional assessments should be considered between baseline and the study end point, depending on the length of the study, so that a patient’s data will still be useful if end point data are not collected. More frequent assessments should be considered if the treatment’s impact on HRQOL may change over time. Three or more assessments are necessary to characterize patterns of change for individual patients. In oncology trials, it is common to assess HRQOL at every treatment cycle, as patients’ HRQOL is expected to change over time. However, assessment burden can be minimized if specific time points associated with expected clinical effects are of interest and can be specified by clinicians (e.g., assess HRQOL after the minimum number of cycles of therapy required to observe clinical activity of an agent). Another factor to be considered in the frequency of HRQOL assessment is the recall period for a particular HRQOL instrument. The recall period is the period over which a subject is asked to base his/her responses to an HRQOL questionnaire. The most common recall periods are 1 week, 2 weeks, and 4 weeks.

    Administering HRQOL questionnaires:

    To objectively evaluate HRQOL, one needs to minimize physician and study nurse influence on patient’s response to HRQOL questions. Therefore, the protocol should indicate that the patient is to complete the HRQOL questionnaire in a quiet place in the doctor’s office at the beginning of his/her office visit, prior to any physical examination and clinical evaluation by the study nurse and physician.

    Specify the magnitude of difference in HRQOL domain score that can be detected with the planned sample size:

    This factor is especially important when the HRQOL assessment is considered a secondary end point. As the sample size is based on the primary end point, it may provide only enough power to detect a relatively large difference in HRQOL scores. The question of whether to increase the sample size to cover HRQOL assessment often depends on how many additional patients are needed and the importance of the HRQOL issue for the trial. Collecting HRQOL data when power will be insufficient to detect effects of interest is a waste of clinical resources and the patient’s time.

    Specify how HRQOL scores are to be calculated and analyzed in the statistical analysis section:

    Calculation of HRQOL domain scores should be stated clearly, including how missing items will be handled. Given the nature of oncology studies, especially in late-stage disease, patients will stop treatment at different time points because of disease progression, intolerance to treatment side effects, or death and will therefore fail to complete the HRQOL assessment schedule. If data are missing because of deteriorating patient health, for example, the study estimates of effect on HRQOL will be biased in favor of better HRQOL. This phenomenon is known as informative missing data and must be handled with care. Fairclough [34] has written a book on various longitudinal methods to analyze this type of HRQOL data. However, analyzing and interpreting HRQOL data in this setting remain a challenge.

    Strategies to improve HRQOL data collection:

    Education at the investigators’ meeting and during the site initiation visit:

    It is important to have investigators and study coordinators committed to the importance of HRQOL assessment. Without this extra effort, HRQOL assessment is likely to be unsuccessful, simply because collecting HRQOL data is not part of routine clinical trial conduct.

    Emphasize the importance of the HRQOL data:

    Baseline HRQOL forms should be required in order to register a patient to a trial. Associate grant payment per patient with submission of a patient’s efficacy data. Tying some portion of the grant payment to the submission of HRQOL forms has significantly increased the HRQOL completion rate in the author’s clinical experience.

    Establish a prospective reminder system for upcoming HRQOL assessments and a system for routine monitoring of forms at the same time clinical monitoring is being conducted.

    The checklist in Table 2 may be helpful when considering inclusion of HRQOL assessment in a clinical trial protocol.

    Table 2: Checklist for HRQOL Assessment

    3.5 HRQOL as the Primary End Point

    To use HRQOL as the primary end point in a clinical trial, prior information must demonstrate at least comparable efficacy of the study treatment to its control. In this context, to design a study with HRQOL as the primary end point, the sample size will have to be large enough to ensure adequate power to detect meaningful differences in HRQOL between treatment groups. Another context for a primary HRQOL end point is in the setting of treatment palliation. In this case, treatment efficacy is shown by the agent’s ability to palliate disease-related symptoms and overall HRQOL without incurring treatment-related toxicities. For example, patient report of pain reduction can document the achievement of palliation (e.g., see the Tannock et al. [35] example below).

    An HRQOL instrument usually has several domains to assess various aspects of HRQOL. Some HRQOL instruments also provide for an overall or total score. The HRQOL end point should specify a particular domain, or the total score, as the primary end point of the HRQOL assessment in order to avoid multiplicity issues.

    If HRQOL is included as a secondary end point, it is a good practice to identify a particular domain as the primary focus of the HRQOL assessment. This practice forces specification of the expected outcomes of HRQOL assessments.

    Some investigators have applied multiplicity adjustments to HRQOL assessments. The approach may be statistically prudent, but it does not provide practical value. The variability of HRQOL domain scores is generally large. With multiple domains being evaluated, only a very large difference between groups will achieve the required statistical significance level. When evaluating HRQOL as a profile of a therapy’s impact on patients, clinical judgment of the magnitude of HRQOL changes should be more important than statistical significance. However, this exploratory analysis perspective should also be tempered with the recognition that some significant results may be only marginally significant and may have occurred by chance.
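    The cost of a multiplicity adjustment can be made concrete with a small calculation. The Python sketch below assumes a Bonferroni adjustment, which the text does not name, and a normal-approximation formula for comparing two group means; the choice of eight domains simply echoes the SF-36 profile, and the function names are illustrative.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha):
    """Standard normal critical value for a two-sided test at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_inflation(n_domains, alpha=0.05, power=0.80):
    """Factor by which the per-arm sample size grows when each of n_domains
    is tested at the Bonferroni level alpha / n_domains (an assumed adjustment),
    holding the target difference and the standard deviation fixed."""
    z_power = NormalDist().inv_cdf(power)
    z_plain = two_sided_critical_value(alpha)
    z_adjusted = two_sided_critical_value(alpha / n_domains)
    return ((z_adjusted + z_power) / (z_plain + z_power)) ** 2


adjusted_alpha = 0.05 / 8          # per-domain level for eight domains
inflation = sample_size_inflation(8)
print(f"per-domain alpha: {adjusted_alpha:.5f}")
print(f"sample size inflation factor: {inflation:.2f}")
```

    Read the other way around, at a fixed sample size the smallest difference that reaches significance grows with the number of domains tested, which is the practical concern raised above.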

    3.6 Interpretation of HRQOL Results

    Two approaches have been used to interpret the meaningfulness of observed HRQOL differences between two treatment groups in a clinical trial: distribution-based and anchor-based approaches. The most widely used distribution-based approach is the effect size, among other methods listed in Table 3 [36–45]. Based on the effect size, an observed difference is judged against three benchmarks: (1) 0.2, a small difference; (2) 0.5, a moderate difference; and (3) 0.8, a large difference. Advocating the use of the effect size to facilitate the interpretation of HRQOL data, Sloan et al. [46] suggested a difference of 0.5 standard deviation as a reasonable benchmark for a clinically meaningful difference on a 0–100 scale. This suggestion is consistent with Cohen’s [47] suggestion of one-half of a standard deviation as indicating a moderate effect and therefore clinically meaningful.
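    As a minimal sketch of the distribution-based approach, the Python functions below compute an effect size as the observed mean difference divided by a standard deviation and compare it with the 0.2, 0.5, and 0.8 benchmarks. The function names, the example numbers, and the rule of assigning a value to the highest benchmark it reaches are illustrative assumptions; which standard deviation to use (baseline, pooled, or other) is a choice the analyst must state.

```python
def effect_size(mean_difference, standard_deviation):
    """Standardized difference: observed mean difference in SD units."""
    if standard_deviation <= 0:
        raise ValueError("standard_deviation must be positive")
    return mean_difference / standard_deviation


def classify_effect_size(es):
    """Label an effect size by the highest benchmark it reaches (assumed rule)."""
    magnitude = abs(es)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "below small"


# Hypothetical example: a 10-point difference on a 0-100 scale with an SD of 20.
es = effect_size(10.0, 20.0)
print(es, classify_effect_size(es))
```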

    Table 3: Common Methods Used to Measure a Questionnaire’s Responsiveness to Change

    The anchor-based approach compares observed differences relative to an external standard. Investigators have used this approach to define the minimum important difference (MID). For example, Juniper and Guyatt [26] suggested that a 0.5 change in RQLQ be the MID (RQLQ items are scored on a 7-point scale). Osoba et al. [48] suggested that a 10-point change in the EORTC QLQ-C30 questionnaire would be an MID. Both of these MIDs are based on group average scores. How these MIDs apply to individual patients is still an issue. Another issue in using an MID is related to the starting point of patients’ HRQOL scores. Guyatt et al. [49] provide a detailed overview of various strategies to interpret HRQOL results.

    3.7 Examples

    3.7.1 HRQOL in Asthma

    To evaluate salmeterol’s effect on quality of life, patients with nocturnal asthma were enrolled into a double-blind, parallel group, placebo-controlled, multicenter study [50]. The study rationale was that patients with nocturnal asthma who are clinically stable have been found to have poorer cognitive performance and poorer subjective and objective sleep quality compared with normal, healthy patients. To assess salmeterol’s effect on reducing the impact of nocturnal asthma on patients’ daily functioning and well-being, patients were randomized to receive salmeterol 42 μg or placebo twice daily. Patients were allowed to continue theophylline, inhaled corticosteroids, and as-needed albuterol. Treatment duration was 12 weeks, with a 2-week run-in period. The primary study objective was to assess the impact of salmeterol on asthma-specific quality of life using the validated Asthma Quality of Life Questionnaire [24] (AQLQ). Patients were to return to the clinic every 4 weeks. Randomized patients were to complete an AQLQ at day 1; weeks 4, 8, 12; and at the time of withdrawal from the study for any reason. Efficacy (FEV1, PEF, nighttime awakenings, asthma symptoms, and albuterol use) and safety assessments were also conducted at these clinic visits. Scheduling HRQOL assessment prior to efficacy and safety evaluations at office visits minimizes investigator bias and missing HRQOL evaluation forms.

    The AQLQ is a 32-item, self-administered, asthma-specific instrument that assesses quality of life over a 2-week time interval. Each item is scored using a scale from 1 to 7, with lower scores indicating greater impairment and higher scores indicating less impairment in quality of life. Items are grouped into four domains: (1) activity limitation (assesses the amount of limitation of individualized activities that are important to the patient and are affected by asthma); (2) asthma symptoms (assesses the frequency and degree of discomfort of shortness of breath, chest tightness, wheezing, chest heaviness, cough, difficulty breathing out, fighting for air, heavy breathing, difficulty getting a good night’s sleep); (3) emotional function (assesses the frequency of being afraid of not having medications, concerned about medications, concerned about having asthma, frustrated); and (4) environmental exposure (assesses the frequency of exposure to and avoidance of irritants such as cigarette smoke, dust, and air pollution). Individual domain scores and a global score are calculated. A change of 0.5 (for both global and individual domain scores) is considered the smallest difference that patients perceive as meaningful [51].

    Achieving 80% power to detect a difference of 0.5 in AQLQ between two treatment arms would require only 80 patients per arm at a significance level of 0.05. However, this study was designed to enroll 300 patients per arm so that it could also provide 80% power to detect differences in efficacy variables (e.g., FEV1, nighttime awakening) between two treatment arms at a significance level of 0.05.
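    The sample size arithmetic can be sketched with the usual normal-approximation formula for comparing two means. The text does not report the standard deviation that was assumed for the AQLQ score, so the value of 1.125 used below is purely an assumption, chosen because it leads to roughly the 80 patients per arm quoted above; the function name is also illustrative.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-sample comparison of means,
    using the normal approximation n = 2 * ((z_alpha + z_beta) * sd / delta)**2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# delta = 0.5 is the difference quoted in the text; sd = 1.125 is an assumed value.
print(patients_per_arm(delta=0.5, sd=1.125))
```

    Because the required sample size scales with the square of the standard deviation, a modest change in the assumed variability shifts the answer noticeably, which is one reason the assumed value should be stated in the protocol.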

    A total of 474 patients were randomly assigned to treatment. Mean change from baseline for the AQLQ global and each of the four domain scores was significantly greater (P < 0.005) with salmeterol compared with placebo, first observed at week 4 and continuing through week 12. In addition, differences between salmeterol and placebo groups were greater than 0.5 at all visits except at week 4 and week 8 for the environmental exposure domain. At week 12, salmeterol significantly (P < 0.001 compared with placebo) increased mean change from baseline in FEV1, morning and evening PEF, percentage of symptom-free days, percentage of nights with no awakenings due to asthma, and the percentage of days and nights with no supplemental albuterol use. This study demonstrated that salmeterol’s improvement of patients’ asthma symptoms translated into a more profound improvement in patients’ daily activity and well-being.

    3.7.2 HRQOL in Seasonal Allergic Rhinitis

    A randomized, double-blind, placebo-controlled study was conducted to evaluate the effects on efficacy, safety, and quality of life of two approved therapies (fexofenadine HCl 120 mg and loratadine 10 mg) for the treatment of seasonal allergic rhinitis (SAR) [52]. Clinical efficacy was based on a patient’s evaluation of SAR symptoms: (1) sneezing; (2) rhinorrhea; (3) itchy nose, palate, or throat; and (4) itchy, watery, or red eyes. The primary efficacy end point was the total score for the patient symptom evaluation, defined as the sum of the four individual symptom scores. Each of the symptoms was evaluated on a 5-point scale (0 to 4), with higher scores indicating more severe symptoms. Treatment duration was 2 weeks, with a run-in period of 3–7 days. After randomization at study day 1, patients were to return to the clinic every week. During these visits, patients were to be evaluated for the severity of SAR symptoms and to complete a quality of life questionnaire.

    Patient-reported quality of life was evaluated using a validated disease-specific questionnaire, the Rhinoconjunctivitis Quality of Life Questionnaire (RQLQ) [26]. The RQLQ is a 28-item instrument that assesses quality of life over a 1-week time interval. Each item is scored using a scale from 0 (i.e., not troubled) to 6 (i.e., extremely troubled), with higher scores indicating greater impairment and lower scores indicating less impairment in quality of life. Items are grouped into seven domains: (1) sleep, (2) practical problems, (3) nasal symptoms, (4) eye symptoms, (5) non-nose/eye symptoms, (6) activity limitations, and (7) emotional function. Individual domain scores and an overall score are calculated. A change of 0.5 (for both overall and individual domain scores) is considered the smallest difference that patients perceive as meaningful [53]. The RQLQ assessment was a secondary end point.

    No sample size and power justification was mentioned in the published paper. A total of 688 patients were randomized to receive fexofenadine HCl 120 mg, loratadine 10 mg, or placebo once daily. Mean 24-hour total symptom score (TSS) as evaluated by the patient was significantly reduced by both fexofenadine HCl and loratadine from baseline (P ≤ 0.001) compared with placebo. The difference between fexofenadine HCl and loratadine was not statistically significant. For overall quality of life, a significant improvement from baseline occurred for all three treatment groups (mean improvement was 1.25, 1.00, and 0.93 for fexofenadine HCl, loratadine, and placebo, respectively). The improvement in the fexofenadine HCl group was significantly greater than that in either the loratadine (P ≤ 0.03) or placebo (P ≤ 0.005) groups. However, the magnitude of differences among the treatment groups was less than the minimal important difference of 0.5.

    The asthma example demonstrates that salmeterol not only significantly improved patients’ asthma-related symptoms, both statistically and clinically, but also relieved their asthma-induced impairments on daily functioning and well-being. On the other hand, the SAR example demonstrates that both fexofenadine HCl and loratadine were effective in relief of SAR symptoms. The difference between fexofenadine HCl and loratadine in HRQOL was statistically significant but not clinically meaningful. However, Hays and Woolley [54] have cautioned investigators about the potential for oversimplification when applying a single minimal clinically important difference (MCID).
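    The comparison against the minimal important difference in the SAR example is simple arithmetic and can be sketched as follows. The mean improvements (1.25, 1.00, and 0.93) and the MID of 0.5 are the values quoted in the text; the variable and function names are illustrative.

```python
MID = 0.5  # minimal important difference for the RQLQ, as quoted in the text

# Mean improvement from baseline in overall RQLQ score, as quoted in the text.
mean_improvement = {"fexofenadine": 1.25, "loratadine": 1.00, "placebo": 0.93}


def difference_vs_mid(group_a, group_b, improvements=mean_improvement, mid=MID):
    """Return the between-group difference in mean improvement and whether
    its magnitude reaches the minimal important difference."""
    difference = round(improvements[group_a] - improvements[group_b], 2)
    return difference, abs(difference) >= mid


for comparator in ("loratadine", "placebo"):
    diff, meaningful = difference_vs_mid("fexofenadine", comparator)
    print(f"fexofenadine vs {comparator}: difference {diff}, reaches MID: {meaningful}")
```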

    3.7.3 Symptom Relief for Late-Stage Cancers

    Although the main objective for the treatment of early-stage cancers is to eradicate the cancer cells and prolong survival, it may not be achievable in late-stage cancers. More often, the objective for the treatment of late-stage cancers is palliation, mainly through relief of cancer-related symptoms. As the relief of cancer-related symptoms represents a clinical benefit to patients, the objective of some clinical trials in late-stage cancer is relief of a specific cancer-related symptom such as pain.

    To investigate the benefit of mitoxantrone in patients with symptomatic hormone-resistant prostate cancer, hormone-refractory patients with pain were randomized to receive mitoxantrone plus prednisone or prednisone alone [35]. The primary end point was a palliative response defined as a two-point decrease in pain as assessed by a six-point pain scale completed by patients (or complete loss of pain if initially 1+) without an increase in analgesic medication and maintained for two consecutive evaluations at least 3 weeks apart. Palliative response was observed in 23 of 80 patients (29%; 95% confidence interval, 19–40%) who received mitoxantrone plus prednisone and in 10 of 81 patients (12%; 95% confidence interval, 6–22%) who received prednisone alone (P = 0.01). No difference existed in overall survival.
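    Confidence intervals of this kind for a response rate can be computed from the counts alone. The text does not say which interval method the published report used; the Python sketch below assumes the exact (Clopper-Pearson) binomial interval, found here by simple bisection, and the function names are illustrative. Under that assumption the results should land close to the published ranges.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(func, lo=0.0, hi=1.0, iterations=60):
    """Root of an increasing function on [lo, hi] by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k responders out of n patients."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1 - binomial_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - binomial_cdf(k, n, p))
    return lower, upper


for responders, patients in ((23, 80), (10, 81)):
    low, high = exact_binomial_ci(responders, patients)
    print(f"{responders}/{patients}: {responders / patients:.1%} "
          f"(95% CI {low:.1%} to {high:.1%})")
```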

    In another study assessing gemcitabine’s effect on relief of pain [55], 162 patients with advanced symptomatic pancreatic cancer completed a lead-in period to characterize and stabilize pain and were randomized to receive either gemcitabine 1000 mg/m2 weekly × 7 followed by 1 week of rest, then weekly × 3 every 4 weeks thereafter, or to fluorouracil (5-FU) 600 mg/m2 once weekly. The primary efficacy measure was clinical benefit response, which was a composite of measurements of pain (analgesic consumption and pain intensity), Karnofsky performance status, and weight. Clinical benefit required a sustained (≥4 weeks) improvement in at least one parameter without worsening in any others. Clinical benefit response was experienced by 23.8% of gemcitabine-treated patients compared with 4.8% of 5-FU-treated patients (P = 0.0022). In addition, the median survival durations were 5.65 and 4.41 months for gemcitabine-treated and 5-FU-treated patients, respectively (P = 0.0025). Regarding the use of composite variables, researchers have urged investigators to report descriptive results for all components so that composite results do not obscure potential negative results for
