# Overview of multivariate analysis techniques in research and publication

Pradeep Deshmukh, Professor, Department of Community Medicine, Mahatma Gandhi Institute of Medical Sciences, Sewagram 442102

Multivariate analysis is a collection of techniques appropriate for situations in which the random variation in several variables has to be studied simultaneously. These techniques help model reality, where each situation, product, or decision involves more than a single variable. The sheer quantity of available data makes it difficult to obtain a clear picture and make intelligent decisions. In such complex situations, multivariate analysis can be used to process the information in a meaningful fashion.

In health research, we come across different types of research questions. The type of research question, together with the characteristics of the data set, determines which multivariate technique(s) should be used. Research questions can be categorized into:

1. Degree of relationship among variables
2. Significance of group differences
3. Prediction of group membership
4. Structure
5. Time course of events

A few important and commonly used techniques are introduced below.

## 1. Degree of relationship among variables

If the major purpose of the analysis is to assess the associations among two or more variables, some form of correlation or regression is appropriate. The choice among techniques is made by determining the number of independent and dependent variables, the nature of the variables, and whether any of the independent variables are best conceptualized as covariates.

1.1 Multiple correlation: Multiple correlation assesses the degree to which one continuous variable (the dependent variable) is related to a set of other, usually continuous, variables (the independent variables) that have been combined to create a new, composite variable. It is the bivariate correlation between the original dependent variable and the composite variable created from the independent variables.

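
The definition of multiple correlation given in 1.1 can be sketched in a few lines. This is a minimal illustration on simulated data, assuming NumPy is available and assuming the composite is the ordinary least-squares combination of the independent variables; the function name `multiple_correlation` and the simulated variables are hypothetical and not from the original text.

```python
import numpy as np

def multiple_correlation(X, y):
    """Correlation between y and the least-squares composite of the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Add an intercept column so the composite is the usual regression prediction.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    composite = design @ coef  # the new, composite variable
    # Multiple R is the plain bivariate correlation between y and the composite.
    return float(np.corrcoef(y, composite)[0, 1])

# Simulated (assumed) data: three independent variables, one dependent variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)
R = multiple_correlation(X, y)
print(f"multiple R = {R:.3f}")
```

Because the composite is chosen to fit the dependent variable as closely as possible, R is never smaller than the absolute correlation between the dependent variable and any single independent variable.
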
1.2 Multiple regression: In multiple regression, the general approach is to express the mean value of the dependent variable in terms of the independent variables (also called predictors, explanatory variables, or covariates). Each beta coefficient estimates the effect of its variable on the dependent variable, holding the other variables constant. Multiple regression is often used as a forecasting tool.

1.3 Canonical correlation: In canonical correlation, there are several continuous dependent variables as well as several continuous independent variables, and the goal is to assess the relationship between the two sets of variables. It is the correlation between two canonical (latent) variables, one representing the set of independent variables and the other representing the set of dependent variables.
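
As a rough sketch of the canonical correlation described in 1.3, the correlations between the paired canonical variables can be obtained from a QR decomposition of each centered variable set followed by a singular value decomposition. This is one standard numerical route, chosen here for brevity; it assumes NumPy, assumes neither set contains perfectly collinear variables, and uses simulated data. The function name `canonical_correlations` is hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the column sets X and Y, largest first."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each variable so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the spaces spanned by each set of variables.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx' Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

# Simulated (assumed) data: both sets share one latent variable z.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
rho = canonical_correlations(X, Y)
print(np.round(rho, 3))
```

The number of canonical correlations equals the size of the smaller set (two here). The first one reflects the shared latent variable, while the second should be close to zero because the remaining variables are unrelated noise.
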

## 2. Significance of Group Differences

When subjects are randomly assigned to groups (treatments), the major research question usually is the extent to which statistically significant mean differences on dependent variables are associated with group membership. The choice among techniques depends on the number of independent and dependent variables and on whether some variables are conceptualized as covariates.

2.1 One-way analysis of covariance (ANCOVA): One-way analysis of covariance is designed to assess group differences on a single dependent variable after the effects of one or more covariates are statistically removed. For example, an ANCOVA question might be: are there mean differences in outcome associated with type of educational therapy after adjusting for differences in age and degree of reading disability?

2.2 One-way MANOVA: Multivariate analysis of variance evaluates differences among centroids (composite means) for a set of dependent variables when there are two or more levels of the independent variable (groups).

2.3 One-way MANCOVA: In addition to dealing with multiple dependent variables, multivariate analysis of variance can be applied to problems in which there are one or more covariates. In this case, MANOVA becomes multivariate analysis of covariance (MANCOVA).

2.4 Factorial MANOVA / MANCOVA: Factorial MANOVA is the extension of MANOVA to designs with more than one independent variable and multiple dependent variables. It is sometimes desirable to incorporate one or more covariates into a factorial MANOVA design to produce factorial MANCOVA.

## 3. Prediction of Group Membership

In research where groups are identified, the emphasis is frequently on predicting group membership from a set of variables. Discriminant analysis, logit analysis, and logistic regression are designed to accomplish this prediction.

3.1 Discriminant analysis: In discriminant analysis, the goal is to predict membership in groups (the dependent variable) from a set of independent variables. The analysis tells us whether group membership is predicted at a rate that is significantly better than chance. Group membership serves as the independent variable in MANOVA and as the dependent variable in discriminant analysis.

3.2 Logit analysis: Logit analysis may be used to predict group membership when all of the predictors are discrete. This technique allows evaluation of the odds that a case is in one group based on its membership in various categories of the predictors.

3.3 Logistic regression: Logistic regression allows prediction of group membership when predictors are continuous, discrete, or a combination of the two. Thus, it is an alternative to both discriminant analysis and logit analysis.

## 4. Structure

Another set of questions is concerned with the latent structure underlying a set of variables.

4.1 Factor analysis: When there is a theory about the underlying structure, or the researcher wants to understand that structure, factor analysis is often used. In this case, the researcher believes that responses to many different questions are driven by just a few underlying structures called factors. Principal component analysis is a closely related and commonly used technique here.

4.2 Cluster analysis: The purpose of cluster analysis is to reduce a large data set to meaningful subgroups of individuals or objects. The division is accomplished on the basis of the similarity of the objects across a set of specified characteristics. The sample should be representative of the population, and it is desirable to have uncorrelated factors.
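
To make the idea of grouping objects by similarity in 4.2 concrete, here is a minimal sketch using k-means with squared Euclidean distance. K-means is only one of several clustering techniques and is an assumption made for illustration; the original text does not name a specific algorithm. The function names, the simulated data, and the choice of two clusters are likewise hypothetical.

```python
import numpy as np

def assign(points, centers):
    """Label each point with the index of its nearest center (squared Euclidean)."""
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: alternate between assigning points and updating centers."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers

# Simulated (assumed) data: two well-separated subgroups on two characteristics.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
group_b = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
points = np.vstack([group_a, group_b])
labels, centers = kmeans(points, k=2)
print(np.round(centers, 2))
```

In practice the number of clusters is rarely known in advance, and characteristics measured on different scales are usually standardized first so that no single one dominates the distance.
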

## 5. Time Course of Events

Two techniques focus on the time course of events. Survival analysis asks how long it takes for something to happen. Time-series analysis looks at the change in a dependent variable over the course of time.

5.1 Survival analysis: Survival analysis is a family of techniques dealing with the time it takes for something to happen: a cure, a failure, an employee leaving, a relapse, a death, and so on. For example, what is the life expectancy of someone diagnosed with breast cancer? Is the life expectancy longer with chemotherapy?

5.2 Time-series analysis: Time-series analysis is used when the dependent variable is measured over a very large number of time periods (at least 50); time is the major independent variable. Time-series analysis is used to forecast future events based on a long series of past events. It is also used to evaluate the effect of an intervention.
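
For the survival question in 5.1 (how long it takes for something to happen), a common starting point is the Kaplan-Meier estimate of the survival curve. The original text does not name an estimator, so this choice is an assumption, as are the function name `kaplan_meier` and the small follow-up data set, which is invented purely for illustration. The sketch uses only the Python standard library.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up duration for each subject
    events : 1 if the event was observed at that time, 0 if the subject was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set afterwards.
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up in months; 0 marks a subject censored at that time.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

Censored subjects contribute to the number at risk up to the time they leave the study, which is what lets the method use incomplete follow-up instead of discarding it. Comparing two such curves, for example with and without chemotherapy, is the usual next step.
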