MULTIVARIATE ANALYSIS

Classification of MV techniques
• Dependence methods
• Interdependence methods

Classification of MV techniques
Are some of the variables dependent on others?
• Yes → Dependence methods
• No → Interdependence methods

What are dependence methods?
• If a multivariate technique attempts to explain or predict one or more dependent variables on the basis of two or more independent variables, then we are analyzing dependence.

Techniques
The following are all dependence methods:
• Multiple regression analysis
• Multiple discriminant analysis
• Multivariate analysis of variance (MANOVA)
• Canonical correlation analysis

Analysis of Interdependence
• The goal of interdependence methods is to give meaning to a set of variables or to group similar things together. No single variable or subset of variables is predicted from, or explained by, the others.

Some methods
• The most common of these methods are factor analysis, cluster analysis and multidimensional scaling. A manager might use these techniques to identify profitable market segments or clusters. They can also be used to classify similar cities on the basis of population size, income distribution, etc.

• As in other forms of data analysis, the nature of the measurement scales will determine which MV technique is appropriate for the data.

Classification of dependence methods
How many variables are dependent?
• One dependent variable
  – Metric → Multiple regression
  – Non-metric → Multiple discriminant analysis
• Several dependent variables
  – Metric → MANOVA
  – Non-metric → Conjoint analysis
• Multiple dependent and independent variables → Canonical analysis

Analysis of dependence: Multiple regression
• An extension of bivariate regression analysis.
• Allows for the simultaneous investigation of the effect of two or more independent variables on a single interval-scaled dependent variable.
• In reality, several factors are likely to affect such a dependent variable.

The model
• An example of a multiple regression equation is
  Y = β0 + β1X1 + β2X2 + β3X3 + … + βnXn + e
  where
  • β0 = a constant, the value of Y when all X values = 0
  • βi = the slope of the regression surface; each βi is the regression coefficient associated with its Xi
  • e = an error term, normally distributed about a mean of 0
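To make the equation concrete, here is a minimal sketch of fitting such a model by ordinary least squares with statsmodels; the data, coefficient values and predictor count are all hypothetical, chosen only for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: 20 observations of three predictors X1..X3
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 100 + 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=20)

# add_constant appends the intercept column (the B0 term)
fit = sm.OLS(y, sm.add_constant(X)).fit()

print(fit.params)    # b0 (constant) and b1..b3, the partial regression coefficients
print(fit.rsquared)  # coefficient of multiple determination, R^2
print(fit.fvalue)    # F statistic for the overall regression
```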

Example
• Y = 102.18 + 387X1 + 115.2X2 + 6.73X3
• R² = 0.845
• F value = 14.6
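Reading the fitted equation: with hypothetical predictor values chosen purely for illustration, say X1 = 2, X2 = 1 and X3 = 10, the predicted Y would be 102.18 + 387(2) + 115.2(1) + 6.73(10) = 1,058.68.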

Regression coefficients
• The regression coefficients can be stated either in raw-score units (actual X values) or as standardized coefficients, expressed in standard-deviation units.

Interpretation
• When regression coefficients are standardized they are called beta weights (β), and their values indicate the relative importance of the associated X variables, especially when the predictors are uncorrelated.
• If β1 = .60 and β2 = .20, then X1 has three times the influence on Y that X2 has.
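A minimal sketch of obtaining beta weights by z-scoring every variable before fitting; the data are the same hypothetical set used above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 100 + 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=20)

# z-score each variable so coefficients come out in standard-deviation units
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yz = (y - y.mean()) / y.std(ddof=1)

betas = sm.OLS(yz, sm.add_constant(Xz)).fit().params[1:]
print(betas)  # beta weights; their relative sizes compare predictor importance
```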

• In multiple regression the coefficients β1, β2, etc. are called coefficients of partial regression because each independent variable is correlated with the other independent variables.

• The coefficient of multiple determination indicates the percentage of variation in Y explained by the variation in the independent variables.
• R² = .845 tells us that variation in the independent variables accounted for 84.5% of the variance in the dependent variable.

• Adding more independent variables to the equation explains more of the variation in Y (R² can only increase as variables are added).

• To test for statistical significance, an F-test comparing the different sources of variation is necessary. The F-test compares the relative magnitudes of the sum of squares due to the regression (SSR) and the error sum of squares (SSE), each with its appropriate degrees of freedom.
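As a sketch, the F statistic can be recovered from those two sums of squares; the attribute names below are statsmodels conventions and the data are the same hypothetical set as above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 100 + 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=20)

fit = sm.OLS(y, sm.add_constant(X)).fit()

SSR = fit.ess        # sum of squares due to the regression (explained)
SSE = fit.ssr        # error (residual) sum of squares
k = fit.df_model     # number of predictors
df_e = fit.df_resid  # n - k - 1

F = (SSR / k) / (SSE / df_e)
print(F, fit.fvalue)  # the two values agree
```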

• A continuous, interval-scaled dependent variable is required in multiple regression, as in bivariate regression.
• Interval scaling is also required for the independent variables.
• However, dummy variables, such as the binary variable in our example, may be utilized (a coding sketch follows).
• A dummy variable is one that takes two distinct levels, coded 0 and 1.
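A minimal sketch of dummy coding with pandas; the "region" variable and its categories are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "south", "west", "north"]})

# One 0/1 column per category, dropping the first to avoid perfect collinearity
dummies = pd.get_dummies(df["region"], drop_first=True, dtype=int)
print(dummies)  # columns 'south' and 'west'; 'north' is the baseline level
```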

Uses of multiple regression
• It is often used to develop a self-weighting estimating equation by which to predict values for a criterion variable (DV) from the values of several predictor variables (IVs).

Uses of multiple regression
• A descriptive application of multiple regression calls for controlling for confounding variables to better evaluate the contribution of other variables, e.g. controlling for brand to study the effect of price alone.

Uses of multiple regression
• To test and explain causal theories. In this application, referred to as path analysis, regression is used to describe an entire structure of linkages that have been advanced from causal theories. It is also used as an inference tool to test hypotheses and estimate population values.

How can equations be built?
• A regression equation can be built with:
  – specific variables
  – all variables
  – a method that sequentially adds or removes variables.

Methods for adding and removing variables
1. Forward selection starts with the constant and adds the variables that produce the largest increases in R².
2. Backward elimination begins with a model containing all independent variables and removes the variable that changes R² the least.

Stepwise selection
• The most popular method; it combines the two. The independent variable that contributes the most to explaining the dependent variable is added first.
• Subsequent variables are added based on their incremental contribution over the first variable, whenever they meet the criterion for entering the equation (e.g. a significance level of .01).
• Variables may be removed at each step if they meet the removal criterion, which is a larger significance level than that for entry (see the sketch below).
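A minimal sketch of the forward-selection part of this procedure, on hypothetical data; a full stepwise routine would also re-check already-entered variables against the removal criterion at each step:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=40)

entry_alpha = 0.01  # hypothetical significance level for entry
selected, remaining = [], list(range(X.shape[1]))

while remaining:
    # Try each remaining variable and note its p-value in the expanded model
    pvals = {}
    for j in remaining:
        cols = selected + [j]
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals[j] = fit.pvalues[-1]  # p-value of the candidate (last column)
    best = min(pvals, key=pvals.get)
    if pvals[best] >= entry_alpha:
        break  # no remaining candidate meets the entry criterion
    selected.append(best)
    remaining.remove(best)

print("selected predictor columns:", selected)
```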

Collinearity and multicollinearity
• This is a situation where two or more of the independent variables are highly correlated, and it can have a damaging effect on multiple regression.
• When this condition exists, the estimated regression coefficients can fluctuate widely from sample to sample, making it risky to interpret a coefficient as an indicator of the importance of a predictor variable.

Just how high can acceptable correlations between independent variables be?
• There is no definitive answer, but correlations of .80 or higher should be dealt with in one of the following two ways:
1. Choose one of the variables and delete the other.

Just how high can acceptable correlations between independent variables be?
2. Create a new variable that is a composite of the highly intercorrelated variables, and use this variable in place of its components (see the sketch below). Making this decision with a correlation matrix alone is not sufficient.
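One simple way to build such a composite, sketched here on hypothetical variables, is to average the z-scores of the correlated predictors:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=30)
x2 = x1 + rng.normal(scale=0.2, size=30)  # x2 highly correlated with x1

# Average of z-scores replaces the two correlated predictors
def z(v):
    return (v - v.mean()) / v.std(ddof=1)

composite = (z(x1) + z(x2)) / 2
```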

Just how high can acceptable correlations between independent variables be?
• The exhibit shows a VIF (variance inflation factor) index. This is a measure of the effect of the other independent variables on a regression coefficient. Large values, 10 or more, suggest collinearity or multicollinearity. With only three predictors, this is not a problem here.

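A minimal sketch of computing VIFs with statsmodels, again on hypothetical data with collinearity deliberately induced:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=40)  # induce collinearity

exog = sm.add_constant(X)
# VIF for each predictor column (skip the constant at index 0)
for i in range(1, exog.shape[1]):
    print(f"predictor {i}: VIF = {variance_inflation_factor(exog, i):.1f}")
```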

Difficulties in regression
• Another difficulty with regression occurs when researchers fail to evaluate the equation with data beyond those used originally to calculate it. A solution is to set aside a portion of the data and use only the remainder to estimate the equation. The portion set aside is called a holdout sample.

Difficulties in regression
• One then uses the equation on the holdout data to calculate R². This can be compared with the original R² to determine how well the equation predicts beyond its original database.
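A minimal sketch of this holdout check, assuming hypothetical data and an arbitrary 70/30 split:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 5 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=100)

# 70/30 split: estimate on the first part, hold out the rest
X_est, X_hold = X[:70], X[70:]
y_est, y_hold = y[:70], y[70:]

fit = sm.OLS(y_est, sm.add_constant(X_est)).fit()
pred = fit.predict(sm.add_constant(X_hold))

# R^2 on the holdout sample: 1 - SSE/SST computed on the held-out data
sse = np.sum((y_hold - pred) ** 2)
sst = np.sum((y_hold - y_hold.mean()) ** 2)
print("estimation R^2:", fit.rsquared, "holdout R^2:", 1 - sse / sst)
```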