

Encyclopedia of Biopharmaceutical Statistics


Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713172960

Analysis of Repeated Measures Data with Missing Values: An Overview of Methods


K. C. Carrière (a), Taesung Park (b), Yuanyuan Liang (a). (a) University of Alberta, Edmonton, Alberta, Canada; (b) Seoul National University, Seoul, South Korea. Online Publication Date: 18 July 2005.

To cite this section: Carrière, K. C., Park, Taesung and Liang, Yuanyuan (2005) 'Analysis of Repeated Measures Data with Missing Values: An Overview of Methods', Encyclopedia of Biopharmaceutical Statistics, 1:1, 1-7. DOI: 10.1081/E-EBS-120023806.




Analysis of Repeated Measures Data with Missing Values: An Overview of Methods


K. C. Carrière
University of Alberta, Edmonton, Alberta, Canada

Taesung Park
Seoul National University, Seoul, South Korea

Yuanyuan Liang
University of Alberta, Edmonton, Alberta, Canada

INTRODUCTION

The missing data problem, which persists in much of empirical scientific investigation, is particularly common in repeated measures data. One reason for this is that the same subjects are used repeatedly over time. One of the main goals in dealing with missing data is to remove biases and reduce the estimation variances in an attempt to improve the overall estimation efficiency, thereby increasing study power. With current advancements in computer technology, many computational analysis methods have been developed, but most have a major drawback: they rely heavily on large sample theory.[1–3] In their overview of missing data methods for approximately normally distributed repeated measures data in small samples, Carrière et al.[4] discuss noniterative procedures for data with compound symmetry and unstructured covariance matrices, as well as the use of proxy information. In general, the approach to incomplete data involves identifying appropriate missing data mechanisms.

In this work, which supplements the previous work done by Carrière et al.,[4] we discuss the merits and drawbacks of available small sample approaches for approximately normally distributed repeated measures data with missing values. The discussion is limited to cases where the missing data occur on the outcome variable. In particular, this entry expands the discussion to include multiple imputation approaches. We use a numerical example to demonstrate the practical implications.

MISSING DATA MECHANISMS AND IMPLICATIONS

Most missing data methods are based on the assumption that the missing entries hide true values that are meaningful for analysis.[5]

Therefore, we must make an effort to reveal those true values. Obviously, procedures based only on the complete subset data can create serious biases and result in inefficient analyses.[1,5]

We consider repeated measures data $y_{ij}$ for subject $j = 1, \ldots, N$ in period $i = 1, \ldots, p$ in the presence of treatment effects, along with a covariate or design vector $x_{ij}$, with $m_{ij} = 0$ if $y_{ij}$ is missing and $m_{ij} = 1$ if $y_{ij}$ is observed. The goal is to efficiently estimate the contrast of treatment effects in the presence of missing values.

Little and Rubin[5] define three types of missing data mechanisms that occur in different situations: missing completely at random (MCAR), missing at random (MAR), and nonignorable missing (NIM).

In the MCAR situation, cases with complete data are indistinguishable from cases with incomplete data, so that $E(y_{ij} \mid m_{ij} = 1) = E(y_{ij} \mid m_{ij} = 0)$. Also, we have $P(m_{ij} = 1 \mid x_{ij}, y_{ij}) = P(m_{ij} = 1)$. This implies that the investigator can get consistent results by examining only the complete subset data. There is no danger of biased estimation, whether or not incomplete pairs receive appropriate attention. However, data can be missing because of uncontrolled events in the course of data collection, and missing data are often associated with study variables, both the outcome and the covariates. The MCAR assumption is therefore too strong in reality.

In the MAR situation, cases with incomplete data differ from cases with complete data. The probability of observing a missing value depends on the observed values, but not on the unobserved values (both the covariates and the outcome). Here, we have $E(y_{ij} \mid x_{ij}, m_{ij} = 1) = E(y_{ij} \mid x_{ij}, m_{ij} = 0)$, and within some subclasses of the data, the observed cases are still a random sample of cases.[5] Therefore, missing values are traceable or predictable from other variables already observed in the database. Investigators have approached this situation in a variety of ways, including imputation,[6–11] weighting,[5,12,13] resampling methods,[7,14] data augmentation,[15] and the Gibbs sampler.[16] The success of these techniques depends on correctly specifying the missingness mechanism and/or the imputation model.

Little and Rubin[5] show that, in the case of MAR data, likelihood inference produces consistent and efficient results even if the missing data mechanism is ignored. In the context of a generalized estimating equations (GEE) approach, Liang and Zeger[17] have found that the normal-model GEE is consistent when the missing data depend on any number of previous observations (i.e., MAR), provided the mean function is correctly specified. It appears that both the bias and the standard error of estimates can be reduced as long as all available cases are used, for both MCAR and MAR data, especially when using likelihood-based methods.[5] Little[10] and Park and Lee[18,19] advocated simple pattern-mixture models to test the dependence of responses on the missing data mechanisms, assuming that the missing data are ignorable. If the test is rejected, the missing data are determined to be not of type MCAR, but MAR.

In the NIM situation, the missing data mechanism is not completely random (non-MCAR), nor is it predictable from other variables in the database (non-MAR). Here, the reason for the missing data is explainable but unmeasurable, because the very variable causing the data to be missing is unobservable.[5] For valid analysis results, it is important to handle nonignorable missing data appropriately. Imputation, or a sampling-importance resampling approach, has been used to achieve the desired efficiency and unbiased analysis.[12] Sheng and Carrière[14] conclude that, in item response data analysis, the bootstrap-under-imputation approach can accommodate all missing data mechanisms, including the NIM situation, efficiently and without bias. However, this procedure is based on large sample theory.

In summary, consistent and efficient analysis depends on the investigator's ability to choose an appropriate missing data mechanism. It is especially important to devise a model for the nonignorable missing values that accommodates the particular situation. Since any model for nonignorable data is specific to a given study and cannot be discussed in general terms, we have limited this discussion to the MAR and MCAR situations.
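As a concrete, hedged illustration of the practical difference between MCAR and MAR (this sketch is ours, not part of the original article; the variable names and missingness model are illustrative assumptions), the simulation below deletes second-period responses either completely at random or depending on the observed first-period response, and shows that the complete-case mean of the second period stays close to the truth under MCAR but not under MAR:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
y1 = rng.normal(0.0, 1.0, n)                 # period-1 response, always observed
y2 = 0.8 * y1 + rng.normal(0.0, 0.6, n)      # period-2 response, subject to missingness

# MCAR: the probability of missingness does not depend on the data at all
miss_mcar = rng.random(n) < 0.3

# MAR: the probability that y2 is missing depends only on the observed y1
p_miss = 1.0 / (1.0 + np.exp(-(y1 - 0.5)))   # larger y1 -> more likely to be missing
miss_mar = rng.random(n) < p_miss

print(f"true mean of y2:           {y2.mean():6.3f}")
print(f"complete-case mean, MCAR:  {y2[~miss_mcar].mean():6.3f}")  # close to the truth
print(f"complete-case mean, MAR:   {y2[~miss_mar].mean():6.3f}")   # noticeably biased
```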



AVAILABLE DATA ANALYSIS

Many investigators have developed methods that utilize all available data rather than discarding incomplete pairs. Little and Rubin[5] noted that available case analysis methods generally lack practical appeal because of the imbalance problem of varying sample bases.

When the sample bases vary from one variable to another within a study, obtaining estimates of the parameters, or even their asymptotic standard errors, can become very complicated. However, this is not an issue when using the ML method under an MCAR or MAR missing data mechanism with large samples, as valid standard errors and tests are available based on large sample theory.[3] For small sample cases, though, further work is still needed. Several studies, although limited in scope, have indicated that the available case analysis method is superior to complete subset analysis.[1,2,20]

Approximate inference procedures have also been proposed for small samples to utilize all available data, if not all available cases. Carrière et al.[4] present an overview of missing data strategies, with a focus on available data analysis methods for approximately normal repeated measures data. Their concern is with small sample data, and they discuss approximate distributions of estimators for making inferences about the parameters of interest.

In particular, Carrière[1,2] has developed small sample testing procedures based on the maximum likelihood method for two particular structures of the within-subject covariance matrix: when it has a compound symmetry pattern and when it is unstructured. Approximate solutions for small sample inference were found upon obtaining explicit forms of the standard errors of the parameter estimators of interest. Carrière[1,2] also suggested an approximate degrees of freedom approach based on the Satterthwaite[21] approximation method and on the assumption that the variability of higher-order terms is negligible. Although rather complex, these methods have been demonstrated to work well in computer simulation studies.

Other small sample available data analysis methods can also be applied using common software (for example, PROC MIXED in SAS[22]), with approximation techniques similar to those suggested by Carrière,[1,2] but based on different assumptions. Comparing her procedure with the SAS procedure, Carrière[1] noted that inference based on the available data method depends on the design structure. PROC MIXED in SAS approximates the distribution of the estimators uniformly to make inferences for all model parameters. It appears that caution is advised if the design is not orthogonal, as the analysis results can be more liberal or more conservative than expected.[1]

Although the Carrière procedure[1,2] can generally apply to any repeated measures data with missing values occurring monotonically and at random, the estimation methods are applicable to only two very specific cases. The procedure can be extended to other forms of covariance structures in an attempt to obtain approximate or even asymptotic distributions of the estimators.



But because of the complexity involved, it may not be practical or even possible to use the procedure without making many unrealistic assumptions. For this reason, we echo Little and Rubin,[5] who concluded that the current state of available data/case analysis is not generally satisfactory for small sample repeated measures data. Further development is needed.
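To make the "all available data" idea concrete with standard software, here is a minimal sketch (ours; the simulated design, column names, and effect sizes are assumptions) that fits a random-intercept mixed model, the analogue of a compound symmetry covariance, to long-format repeated measures data in which rows with missing outcomes are dropped rather than whole subjects being discarded. Note that the standard errors and tests it reports rest on large-sample approximations, which is exactly the small-sample concern raised above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_subj, n_per = 20, 3
subject = np.repeat(np.arange(n_subj), n_per)
period = np.tile(np.arange(n_per), n_subj)
treat = rng.integers(0, 2, n_subj)[subject]          # subject-level treatment indicator
b = rng.normal(0.0, 0.7, n_subj)[subject]            # random intercept -> compound symmetry
y = 1.0 + 0.5 * treat + 0.2 * period + b + rng.normal(0.0, 1.0, n_subj * n_per)

long = pd.DataFrame({"subject": subject, "period": period, "treat": treat, "y": y})
long.loc[rng.random(len(long)) < 0.2, "y"] = np.nan  # induce some missing outcomes

# available-data analysis: drop missing observations, keep every observed row
avail = long.dropna(subset=["y"])
fit = smf.mixedlm("y ~ treat + period", avail, groups=avail["subject"]).fit(reml=True)
print(fit.summary())
```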

IMPUTATIONS

Imputations (both single and multiple) require creating a predictive distribution based on the observed data, either implicitly or explicitly.[9] Examples include mean, regression, stochastic regression, hot or cold deck imputations, and substitutions. Composite methods combining some of these imputations with other techniques can also be used. Typically, imputation builds on the conditional mean of the missing outcome given the observed data. Practically, imputation approaches are attractive because they are easy to implement and computationally convenient, because the data are completely filled in even if the values are artificial, and because the pseudocomplete data can be analyzed using standard software.

Single Imputations

Single imputation approaches impute a single data point for each missing value. Single imputations were used in empirical research even before the theory had been formally developed. Using a single imputation approach, a complete data set $y = (y_{obs}, y^{I}_{mis})$ is obtained by combining the observed data set $y_{obs}$ and the imputed data set for the missing values $y^{I}_{mis}$. The most popular of these single imputations is probably the mean substitution based on a specific covariate pattern. See Rubin[9] for details.

From a slightly different perspective, Huang et al.[23] discussed a single imputation strategy in which the imputation uses proxy information from caregivers or family members of a respondent who can provide approximate information on behalf of the respondent. This proxy information can then be used for the analysis instead of dealing with the missing responses. The principal idea of this approach is implicit, in that it assumes an underlying model where the respondents and their proxies share a common mean and variance. Possible differences in the mean and variance can be accommodated. Huang et al.[23] also provided an approximate degrees of freedom solution to the hypothesis testing problem for missing repeated measures data with proxy information.

Huang et al.[23] reported that a single imputation with proxy information can play a significant role in missing data analyses. They also discussed design implications in order to provide general guidelines for utilizing available proxy information to improve data collection strategies for missing data situations. However, when the variability of the actual and the proxy data sets is different, it is better to use other available case analyses so as to maintain the Type I error rate and to increase the power of testing the parameters of interest.
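As a minimal sketch of the mean substitution mentioned above (ours; the toy data and column names are illustrative assumptions), each missing outcome is replaced by the observed mean of its treatment-by-period cell:

```python
import numpy as np
import pandas as pd

long = pd.DataFrame({
    "treat":  [0, 0, 0, 1, 1, 1],
    "period": [1, 2, 2, 1, 2, 2],
    "y":      [1.2, np.nan, 0.9, 2.1, 1.8, np.nan],
})

# single imputation: fill each missing y with the mean of its treat-by-period cell
cell_mean = long.groupby(["treat", "period"])["y"].transform("mean")
long["y_imputed"] = long["y"].fillna(cell_mean)
print(long)
```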

Multiple Imputations

In the multiple imputation strategy, first proposed by Rubin,[8] the imputed values are supposed to represent repeated random draws under a given model for each missing value. Overall inferences can then be drawn by combining results from the completed data sets. Multiple imputation involves drawing multiple values for each missing data point, and there are various methods available for drawing the values to impute.[9] For each draw of the data under a chosen imputation method, a complete data set $y^{(m)} = (y_{obs}, y^{(m)}_{mis})$ is created by combining $y_{obs}$ and $y^{(m)}_{mis}$, $m = 1, \ldots, M$, as defined earlier. Substantial empirical work[24–26] has shown that multiple imputation with $M = 3$ or 5 works well with typical fractions (< 30%) of missing data in surveys. Rubin[9] suggests using standard inference methods to analyze each of the complete data sets $y^{(m)}$ and then combining them for the overall analysis results.

In this section, we discuss the imputation strategy for repeated measures data. We assume that missing data occur monotonically, so that the data can be clustered into $L$ groups according to the number of periods the subjects completed. Carrière[1] has noted that some intermediate missing values, which are frequently few in number, can be imputed or removed before proceeding with the analysis without much loss of information.

Let $y_j = (y_{1j}, \ldots, y_{pj})^T$ be the vector of responses from subject $j$, where $y_{ij}$ is from subject $j$ in period $i$, $i = 1, \ldots, p$ and $j = 1, \ldots, N$. We can express the data from this repeated measurement design by the model

$y_j = \mu_j + e_j = x_j \beta + e_j$    (1)


Here $\beta$ is a $q \times 1$ vector of parameters linking the responses to some design or covariate matrix $x$.

Richardson and Flack[3] propose a multiple imputation strategy for analyzing the data based on regression models. Called the residual draw (RD) method, this approach is based on Rubin[9] and is specific to three-period crossover trial data. The RD method imputes the conditional predictive mean of the incomplete case with additional noise. Basically, the investigator draws the noise from the empirical distribution, and the method is therefore less sensitive to violations of the normality assumption. The RD method is implemented separately for each group/sequence. It involves estimating the least squares estimators to fit a regression of the observed $(p+1)$th period data on the observed first $p$ periods of data, obtaining $\hat{\beta}$ for each regression model. Then, the error variance is set as $\sigma^2 = SSE/W$, where $SSE$ is the residual sum of squares from the least squares fit and $W$ is a draw from a chi-square distribution with the degrees of freedom of $SSE$. Next, a $\beta^*$ is drawn from $N(\hat{\beta}, \sigma^2 (x^T x)^{-1})$. Finally, to impute for an incomplete case $i$, the method imputes the value $Y_i = x_i^T \beta^* + r_0$, where the quantities with a subscript 0 are the data values observed for a complete case drawn at random from the subjects in the given sequence, and $r_0 = Y_0 - x_0^T \beta^*$ is the residual of that case when $\beta = \beta^*$. Through simulations, Richardson and Flack[3] show that their strategy works well. However, their simulation uses relatively large sample data.
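The sketch below is our own simplified reading of the residual-draw idea, not the authors' code: it assumes a single sequence, an intercept-plus-previous-periods design matrix, and missingness confined to the last period. It walks through the main steps: fit the regression among completers, draw an error variance and a coefficient vector, and add the residual of a randomly chosen complete case.

```python
import numpy as np

rng = np.random.default_rng(3)

def residual_draw_impute(Y, rng):
    """Y: (n, p+1) array for one sequence; NaNs only in the last column (monotone)."""
    obs = ~np.isnan(Y[:, -1])
    X = np.column_stack([np.ones(obs.sum()), Y[obs, :-1]])   # intercept + first p periods
    y = Y[obs, -1]
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    df = X.shape[0] - X.shape[1]
    sse = resid @ resid
    sigma2 = sse / rng.chisquare(df)                          # sigma^2 = SSE / W, W ~ chi^2_df
    cov = sigma2 * np.linalg.inv(X.T @ X)
    beta_star = rng.multivariate_normal(beta_hat, cov)        # beta* ~ N(beta_hat, sigma^2 (X'X)^-1)

    Y_imp = Y.copy()
    for i in np.flatnonzero(~obs):
        donor = rng.choice(np.flatnonzero(obs))               # random complete case in the sequence
        r0 = Y[donor, -1] - np.r_[1.0, Y[donor, :-1]] @ beta_star
        Y_imp[i, -1] = np.r_[1.0, Y[i, :-1]] @ beta_star + r0
    return Y_imp

# toy data: 12 subjects, 3 periods, last-period value missing for 3 of them
Y = rng.normal(size=(12, 3)).cumsum(axis=1)
Y[:3, -1] = np.nan
print(residual_draw_impute(Y, rng))
```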


Huang and Carrière[27] suggest that a less parametric approach than the one proposed by Richardson and Flack would be more robust. They impute the missing values using the conditional distribution of the missing data, given the observed data in the previous periods. The $p_1$-vector $y_{p_1 j} = (y_{1j}, \ldots, y_{p_1 j})^T$ for the complete subset data up to the first $p_1$ periods is assumed to be distributed as multivariate normal with mean $\mu_{p_1} = (\mu_1, \ldots, \mu_{p_1})^T$ and covariance matrix $\Sigma_{11}$, the $p_1 \times p_1$ subcovariance matrix for the first $p_1$ periods. Then, the conditional distribution of $y_{p_1+1, j}$ given $y_{p_1 j}$ is normal with mean $\mu_{p_1+1} + \sigma_{21} \Sigma_{11}^{-1} (y_{p_1 j} - \mu_{p_1})$ and variance $\sigma_{22} - \sigma_{21} \Sigma_{11}^{-1} \sigma_{12}$, where $\Sigma_{11}$ is the submatrix of $\Sigma$ for the first $p_1$ rows and $p_1$ columns, $\sigma_{21}$ is that for the $(p_1+1)$th row and the first $p_1$ columns, with $\sigma_{12} = \sigma_{21}^T$, and $\sigma_{22}$ is the $(p_1+1)$th diagonal element of $\Sigma$. For subsequent periods, replace $p_1$ with $p_1 + 1$ and repeat the process. See also Carrière.[2]

Since the covariance matrix is usually unknown, the investigator uses the respective sample estimates, obtained from complete subsets of the data and denoted by $s_{21}$, $s_{22}$, and $S_{11}$. Huang and Carrière[27] then extended the imputation strategy that Rubin[9] suggested for a univariate normal model to a multivariate normal model. This strategy involves estimating the conditional mean $\hat{\mu}_{p_1+1} = \bar{y}_{p_1+1} + s_{21} S_{11}^{-1} (y_{p_1 j} - \bar{y}_{p_1 \cdot})$ and the variance $\hat{\sigma}^2 = s_{22} - s_{21} S_{11}^{-1} s_{12}$. Then, the conditional estimators are updated by drawing a chi-square random variable $g$ with degrees of freedom $N_l - s$ and a random variable $z$ from a standard normal distribution, as $\sigma^* = \hat{\sigma}\sqrt{(N_l - s)/g}$ and $\mu^*_{p_1+1} = \hat{\mu}_{p_1+1} + \sigma^* z / \sqrt{N_l}$, where $N_l$ is the total number of observations at missing data stage $l$ and $s$ is the number of sequences. Then, a random variable $z$ is drawn from a standard normal distribution to impute for the missing values in period $p_1 + 1$, as $y_{p_1+1, j} = \mu^*_{p_1+1} + \sigma^* z$. This is repeated for all missing components in period $p_1 + 1$. Treating the imputed values as if they were actual values, the steps are repeated for the next periods, with $p_1$ replaced by $p_1 + 1$. This whole process is repeated $M$ times to create $M$ multiply imputed data sets.
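Below is a compact sketch of one pass of this kind of sequential conditional-normal imputation (ours, with simplifying assumptions: a single missing-data stage, sequences pooled when estimating the sample moments, and $N_l$ taken as the number of completers at that stage); repeating it over stages and $M$ times would give the multiply imputed data sets described above.

```python
import numpy as np

rng = np.random.default_rng(4)

def impute_next_period(Y_complete, y_partial, n_seq, rng):
    """
    Y_complete: (n, p1+1) responses from subjects observed through period p1+1.
    y_partial:  (m, p1) responses from subjects missing period p1+1 (monotone pattern).
    Returns imputed period-(p1+1) values for the m incomplete subjects.
    """
    p1 = y_partial.shape[1]
    ybar = Y_complete.mean(axis=0)
    S = np.cov(Y_complete, rowvar=False)
    S11_inv = np.linalg.inv(S[:p1, :p1])
    s21 = S[p1, :p1]
    s22 = S[p1, p1]

    # conditional mean and variance of period p1+1 given the first p1 periods
    mu_hat = ybar[p1] + (y_partial - ybar[:p1]) @ S11_inv @ s21
    sigma2_hat = s22 - s21 @ S11_inv @ s21

    # propagate estimation uncertainty before imputing
    N_l = Y_complete.shape[0]
    g = rng.chisquare(N_l - n_seq)
    sigma_star = np.sqrt(sigma2_hat * (N_l - n_seq) / g)
    mu_star = mu_hat + sigma_star * rng.standard_normal() / np.sqrt(N_l)

    return mu_star + sigma_star * rng.standard_normal(len(mu_star))

# toy example: 10 completers over 3 periods, 4 subjects missing period 3
Y_c = rng.multivariate_normal([0.0, 0.2, 0.4], 0.5 * np.eye(3) + 0.5, size=10)
y_p = rng.multivariate_normal([0.0, 0.2], 0.5 * np.eye(2) + 0.5, size=4)
print(impute_next_period(Y_c, y_p, n_seq=2, rng=rng))
```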


The multiply imputed data sets are then analyzed to obtain an overall result. First, the usual data analysis is performed for each of the $M$ imputed data sets. The $M$ analysis results are then combined to give a repeated-imputation inference as follows. Let $\hat{\beta}_m$ and $W_m$ be the estimator and its associated variance-covariance matrix for $\beta$ from the complete data set $m$, $m = 1, \ldots, M$. The overall estimator of $\beta$ from the $M$ data sets is obtained as $\bar{\beta}_M = \sum_{m=1}^{M} \hat{\beta}_m / M$, and its variance as $V(\bar{\beta}_M) = T_M = \bar{W}_M + (1 + 1/M) B_M$. The within-imputation variances $W_m$ are averaged to obtain $\bar{W}_M = \sum_{m=1}^{M} W_m / M$, and $B_M = \sum_{m=1}^{M} (\hat{\beta}_m - \bar{\beta}_M)(\hat{\beta}_m - \bar{\beta}_M)^T / (M - 1)$ is the between-imputation variance.

To test a hypothesis about a linear contrast $\theta = l^T \beta$, Rubin[9] considers an approximate distribution for $\theta$ given by

$(\theta - \bar{\theta}_M)(l^T T_M l)^{-1/2} \sim t_{\tilde{\nu}}$, where $\bar{\theta}_M = l^T \bar{\beta}_M$,    (2)

and the degrees of freedom $\tilde{\nu}$ is

$\tilde{\nu} = \nu_0 f(\nu_0)(1 - r_M)\{1 + \nu_0/\nu\}^{-1}$,    (3)

where $f(\nu_0) = (\nu_0 + 1)/(\nu_0 + 3)$, $r_M = (1 + 1/M)\,\mathrm{tr}(B_M T_M^{-1})/q$, $\nu = (M - 1) r_M^{-2}$ is the usual large-sample repeated-imputation degrees of freedom, and $\nu_0$ is the degrees of freedom based on the complete subset data. The quantity $r_M$ estimates the fraction of information on $\beta$ that is missing due to nonresponse. This fraction can be no larger than the fraction of missing data.[9]
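To make the combining step concrete, the following sketch (ours; it assumes the $M$ point estimates and within-imputation variances for a scalar contrast have already been computed, and it uses the adjusted degrees of freedom as reconstructed in Eq. (3)) pools the results for a single parameter:

```python
import numpy as np
from scipy import stats

def pool_scalar(estimates, variances, nu0):
    """Pool a scalar contrast over M imputed data sets (Rubin-style rules).

    estimates: the M completed-data point estimates of l'beta
    variances: the M completed-data variances of those estimates
    nu0:       degrees of freedom based on the complete subset data
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    M = len(est)
    theta_bar = est.mean()                           # overall estimate
    W_bar = var.mean()                               # within-imputation variance
    B = est.var(ddof=1)                              # between-imputation variance
    T = W_bar + (1.0 + 1.0 / M) * B                  # total variance
    r = (1.0 + 1.0 / M) * B / T                      # estimated fraction of missing information
    nu_large = (M - 1) / r**2 if r > 0 else np.inf   # large-sample repeated-imputation df
    f0 = (nu0 + 1.0) / (nu0 + 3.0)
    nu = nu0 * f0 * (1.0 - r) / (1.0 + nu0 / nu_large)  # small-sample adjusted df, Eq. (3)
    t_stat = theta_bar / np.sqrt(T)
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=nu)
    return theta_bar, np.sqrt(T), nu, p_value

# hypothetical results from M = 5 imputed analyses of a treatment contrast
print(pool_scalar([0.39, 0.36, 0.41, 0.37, 0.40],
                  [0.028, 0.031, 0.027, 0.030, 0.029], nu0=11))
```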

IMPUTATION OR AVAILABLE CASE ANALYSIS?

The available data analysis method is not always satisfactory, for the reasons stated in the section Available Data Analysis above. The imputation approach might then be one alternative. There are others, but most of them build on simulation methods (for example, data augmentation[15] and the Gibbs sampler[16]). This entry compares available data analysis methods and imputations, among other possible approaches.

Many investigators favor multiple imputation over single imputation because it can generate random variation in the imputed artificial data.


However, many studies have also found that this method does not perform as effectively as expected. For example, in the context of generalized linear models, Xie and Paik[13] considered four multiple imputation strategies and sample average imputations. They conclude that as long as one uses all available data, all approaches are consistent and efficient. For continuous responses, Huang and Carrière[27] found that there is no real advantage in creating multiple complete data sets, as the analysis method that uses only the available data performs as well as or better than the multiple imputation method.

Specifically, Huang and Carrière[27] note that the corresponding asymptotic distributions appear to fit reasonably well for the two approaches they considered (ML and MI) as long as all available data are used. In that study, the multiple imputation method performed as well as the ML method[1,2] for all situations considered. However, the ML method was found generally superior to the multiple imputation method, especially when the correlation is large and the covariance structure is unspecified.

In light of the technical limitations noted earlier regarding the available data method, Huang and Carrière[27] suggest adopting the multiple imputation method when types of covariance structures other than compound symmetry or unstructured covariance matrices provide a good fit to the data in small samples. Although their implementation of the multiple imputation method for small samples was not entirely satisfactory in terms of maintaining the Type I error rate and testing power, they reasoned that the violation may be far less serious than that incurred by the ML approach under large sample theory assumptions. Also, calculating the asymptotic standard errors of the ML estimators can be considerably more complicated than using multiple imputation. See also Richardson and Flack.[3]

As noted in the previous section, the real advantage of using multiple imputation is not in obtaining efficient estimators, but in obtaining unbiased estimators of the parameters by attempting to reveal the true values of the missing data. A further advantage of multiple imputation (and of any other imputation technique) is its capacity to use standard statistical analysis software. As long as all available data are used, both the ML and MI approaches seem to be valid for univariate data as well as for repeated measures data analyses. Further work needs to be done on deriving asymptotic small sample procedures for available data analysis. There is also a need to investigate the sample sizes required for the asymptotic ML procedures to be validly used with standard software.

This study has not dealt with the NIM case. We suggest that the alternative imputation strategy using proxy information, as in Huang et al.,[23] could be effective.

This approach would likely work if proxy providers carefully assess reasonable responses from the respondents, with a full understanding of the reasons for the missing data in the given situation.

ANALYSIS OF BRONCHIAL ASTHMA DATA

To contrast the methods discussed in this entry, we analyzed the bronchial asthma data in Patel,[28] which come from the traditional two-period, two-treatment, two-sequence design with eight and nine patients in the two sequences, respectively, so that $N = 17$. We conducted an exploratory data analysis to determine the covariance structure and observed that the compound symmetry covariance structure provides an adequate fit to the data. We analyzed the original complete data using PROC MIXED in SAS and compared the (exact) tests for the treatment and carryover effects to the $t$ distribution with the degrees of freedom computed from the complete actual data, $N - 2 = 15$. Grizzle[29] suggested a two-stage analysis for assessing the significance of the residual effects, using a $P$-value of 0.15 to determine insignificance of a residual effect. Based on the two-stage analysis, we did not remove the residual effect, and thus concluded that, in the presence of a nonnegligible residual effect, the test of a direct treatment effect was significant at 0.05 (first row of Table 1).

Next, we induced missing values by deleting the measurements in the second period from four subjects in the BA treatment sequence to produce MAR data. We then compared the analysis based on the complete subset data, omitting both measurements from the four subjects with missing observations in the second period, to the $t$ distribution with the exact degrees of freedom computed from the complete subset data of $17 - 4 = 13$ patients (second row of Table 1). The results are similar to the original data analysis, with a slightly lower estimate for the treatment effect and a higher estimate for the residual effect, but higher standard errors for both.

Applying the available case analysis method of Carrière,[1] the tests are compared against $t_{11}$, using the approximate degrees of freedom $N_2 - 2$, where $N_2$ is the number of subjects with complete observations. We obtained results similar to the original data analysis (third row of Table 1) and see improved power over the complete subset analysis. A slightly different approximation procedure by SAS PROC MIXED produced results similar to the complete subset analysis (fourth row of Table 1). In particular, the residual effect is now insignificant according to Grizzle.[29] Some comparisons of these two available case methods can be found in Carrière.[1]

Applying the single imputation approach of Huang et al.,[23] we considered: 1) choosing values close to their first period values, taking into account that they tend to be smaller in period 2 than in period 1 (proxy I = (2.5, 1.5, 3.0, 1.0)); and 2) slightly overestimating the missing data (proxy II = (3.0, 2.0, 3.5, 1.25)).



Table 1. Analysis results of the bronchial asthma data.

| Method | Treatment estimate | SE | df | P-value | Carryover estimate | SE | df | P-value |
|---|---|---|---|---|---|---|---|---|
| Full original data | 0.384 | 0.169 | 15 | 0.038 | 0.512 | 0.315 | 15 | 0.125 |
| Complete subset data | 0.404 | 0.178 | 11 | 0.044 | 0.503 | 0.323 | 11 | 0.148 |
| Incomplete method (a) | 0.384 | 0.152 | 11 | 0.028 | 0.471 | 0.285 | 11 | 0.127 |
| Incomplete method (b) | 0.384 | 0.163 | 10 | 0.040 | 0.470 | 0.309 | 10 | 0.159 |
| Proxy I (c) | 0.384 | 0.177 | 13 | 0.049 | 0.468 | 0.340 | 13 | 0.193 |
| Proxy II (c) | 0.384 | 0.180 | 13 | 0.053 | 0.468 | 0.348 | 13 | 0.206 |
| Proxy I (d) | 0.384 | 0.166 | 13 | 0.038 | 0.468 | 0.319 | 13 | 0.166 |
| Proxy II (d) | 0.384 | 0.169 | 13 | 0.041 | 0.468 | 0.326 | 13 | 0.174 |
| MI (e) | 0.384 | 0.164 | 9.429 | 0.043 | 0.554 | 0.333 | 6.554 | 0.143 |

(a) Method by Carrière.[1]
(b) Method by PROC MIXED of SAS.
(c) Method by Huang et al.[23]
(d) Method by PROC MIXED of SAS with the df adjustment of Huang et al.[23]
(e) Multiple imputation method by Huang and Carrière.[27]

We first examined the bias and possible variance heterogeneity of the proxy data relative to the actual responses. The variances of the proxy data sets are a little larger than those of the actual data, but the test of a common variance is not rejected. Hence, the tests are compared against $t_{13}$, with approximate degrees of freedom $N - 4$, assuming a common variance for the proxy and the real data. Rows 5 and 6 in Table 1 show the results of the analysis described in Huang et al.,[23] and rows 7 and 8 show those using SAS PROC MIXED. When using PROC MIXED, we adjusted the degrees of freedom as suggested by Huang et al.[23] These proxy approaches reflected the mechanism used in choosing the proxy values, with slightly less sensitive results for proxy II than for proxy I, and with an indication that the residual effects are nonexistent. Huang et al.[23] note that tests of treatment effects are affected by the presence of proxy data, resulting in a less sensitive outcome than for the nonproxy approaches in this case. However, if we remove the insignificant residual effects from the model, we reject the null hypothesis of equal treatment effects in all cases and conclude that the treatment effects are significantly different, as in the analysis of the full original data.

Finally, we applied the non-regression-based multiple imputation approach of Huang and Carrière.[27] The results are reported in row 9 of Table 1. The advantage of this approach is the convenience of using any standard software of choice for the analysis, but the penalty is quite high, as reflected in the adjusted degrees of freedom. The qualitative results are similar to those from the original data analysis and the available case analysis (see rows 1 and 3).
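For readers who want to reproduce this style of analysis with open-source software, the sketch below (ours; the simulated 2x2 crossover data, column names, and effect sizes are assumptions, not Patel's data) fits the usual fixed effects for period, direct treatment, and carryover, with the carryover (residual) effect carried by the sequence term as in the classical Grizzle model, plus a subject-level random intercept corresponding to a compound symmetry covariance. It is an analogue of, not a substitute for, the PROC MIXED analyses referred to above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
rows = []
for seq, n_subj in zip(["AB", "BA"], [8, 9]):        # two sequences, 8 and 9 patients
    for s in range(n_subj):
        subj_id = f"{seq}{s}"
        u = rng.normal(0.0, 0.6)                     # subject effect -> compound symmetry
        for period in (1, 2):
            treat = seq[period - 1]                  # treatment received in this period
            carry = (period == 2 and seq[0] == "A")  # carryover of treatment A into period 2
            y = (1.0 + 0.3 * (period == 2) + 0.4 * (treat == "A")
                 + 0.2 * carry + u + rng.normal(0.0, 0.8))
            rows.append({"subject": subj_id, "sequence": seq, "period": period,
                         "treat": treat, "y": y})

crossover = pd.DataFrame(rows)
# the sequence effect serves as the test for carryover in the 2x2 crossover
fit = smf.mixedlm("y ~ C(period) + C(treat) + C(sequence)", crossover,
                  groups=crossover["subject"]).fit(reml=True)
print(fit.summary())
```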

CONCLUSIONS

The purpose of this entry is twofold: first, to supplement Carrière et al.[4] by including imputation procedures for small sample repeated measures data, and second, to compare the implications of various incomplete data methods. Available data analyses are said to be generally unsatisfactory, because the calculation of the asymptotic standard errors of the estimators is quite complex even under MCAR. However, when available, they are found to be more powerful than multiple imputations. Imputation-based procedures fill in the missing values so that the completed data can be analyzed by standard methods. When desired, the multiple imputation strategy discussed by Huang and Carrière[27] may be used for small sample repeated measures data; its overall performance was not substantially worse than that of the alternative. The discussion was limited to the MCAR and MAR cases; future work dealing with the NIM case will be forthcoming.

ACKNOWLEDGMENTS

This work was funded by grants from the Natural Sciences and Engineering Research Council of Canada, the Alberta Heritage Foundation for Medical Research, and the Korea Federation of Science and Technology Societies (Brain Pool) to K.C. Carrière, and by the Korean Research Foundation (KRF-2004-015-C0086) to T.S. Park.


REFERENCES

1. Carrière, K.C. Incomplete repeated measures data analysis in the presence of treatment effects. J. Am. Statist. Assoc. 1994, 89, 680–686.
2. Carrière, K.C. Methods for repeated measures data analysis with missing values. J. Statist. Plann. Inference 1999, 77, 221–236.
3. Richardson, B.A.; Flack, V.F. The analysis of incomplete data in the three-period two-treatment cross-over design for clinical trials. Stat. Med. 1996, 15 (2), 127–143.
4. Carrière, K.C.; Huang, R.; Sheng, X.; Liang, Y. Missing values in repeated measures designs. In Encyclopedia of Biopharmaceutical Statistics, 2nd Ed.; Marcel Dekker Inc., in press.
5. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley: New York, 1987.
6. Barnard, J.; Rubin, D.B. Small-sample degrees of freedom with multiple imputation. Biometrika 1999, 86 (4), 948–955.
7. Efron, B. Missing data, imputation, and the bootstrap. J. Am. Statist. Assoc. 1994, 89, 463–478.
8. Gelfand, A.E.; Smith, A.F.M. Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 1990, 85, 398–409.
9. Rubin, D.B. Multiple Imputation for Nonresponse in Surveys; John Wiley: New York, 1987.
10. Little, R. Pattern-mixture models for multivariate incomplete data. J. Am. Statist. Assoc. 1993, 88, 125–134.
11. Rubin, D.B.; Schenker, N. Interval estimation from multiple imputed data: a case study using agriculture industry codes. J. Official Statist. 1987, 3, 375–387.
12. Paik, M.C. The generalized estimating equation approach when data are not missing completely at random. J. Am. Statist. Assoc. 1997, 92, 1320–1329.
13. Xie, F.; Paik, M.C. Generalized estimating equation model for binary outcomes with missing covariates. Biometrics 1997, 53, 1458–1466.
14. Sheng, X.; Carrière, K.C. Strategies for analyzing missing item response data with an application. Biom. J., in press.
15. Tanner, M.A.; Wong, W.H. The calculation of posterior distributions by data augmentation (C/R: p. 541–550). J. Am. Statist. Assoc. 1987, 82, 528–540.
16. Gelman, A.; Rubin, D.B. Inference from iterative simulation using multiple sequences (Disc: p. 483–501, 503–511). Statist. Sci. 1992, 7, 457–472.
17. Liang, K.-Y.; Zeger, S. Longitudinal data analysis using generalized linear models. Biometrika 1986, 73, 13–22.
18. Park, T.; Lee, S.Y. A test of missing completely at random for longitudinal data with missing observations. Statist. Med. 1997, 16, 1859–1871.
19. Park, T.; Lee, S.Y. Simple pattern-mixture models for longitudinal data with missing observations. Statist. Med. 1999, 18, 2933–2941.
20. Kim, J.O.; Curry, J. The treatment of missing data in multivariate analysis. Sociol. Methods Res. 1977, 6, 215–240.
21. Satterthwaite, F.E. An approximate distribution of estimates of variance components. Biometrics Bull. 1946, 2, 110–114.
22. SAS Institute. SAS Technical Report P-229 (6.07); SAS Institute, Inc.: Cary, NC, 2002.
23. Huang, R.; Liang, Y.Y.; Carrière, K.C. The role of proxy information in missing data analysis. Statist. Methods Med. Res., in press.
24. Li, K.H.; Raghunathan, T.E.; Rubin, D.B. Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. J. Am. Statist. Assoc. 1991, 86, 1065–1073.
25. Li, K.H.; Meng, X.L.; Raghunathan, T.E.; Rubin, D.B. Significance levels from repeated p-values with multiply-imputed data. Statist. Sinica 1991, 1, 65–92.
26. Meng, X.L.; Rubin, D.B. Performing likelihood ratio tests with multiply-imputed data sets. Biometrika 1992, 79, 103–111.
27. Huang, R.; Carrière, K.C. Comparison of methods for incomplete repeated measures data analysis in small samples. J. Statist. Plann. Inference, in press.
28. Patel, H.I. Analysis of incomplete data from a clinical trial with repeated measurements. Biometrika 1991, 78, 609–619.
29. Grizzle, J.E. The two-period change-over design and its use in clinical trials. Biometrics 1965, 21, 467–480.

