# MULTIVARIATE ANALYSIS

Multivariate analysis (MVA) is a statistical procedure for analyzing data that involve more than one type of measurement or observation. It may also mean solving problems in which more than one dependent variable is analyzed simultaneously with other variables. MVA is based on the principles of multivariate statistics, which involve the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest.

## Multivariate Analysis Techniques
### Multiple Regression Analysis

Multiple regression is the most commonly used multivariate technique. It examines the relationship between a single metric dependent variable and two or more metric independent variables. The technique determines the linear relationship with the lowest sum of squared residuals (the squared differences between observed and predicted values); therefore, the assumptions of normality, linearity, and equal variance should be checked carefully. The beta coefficients (weights) are the marginal effects of each variable; when the weights are standardized, their sizes can be compared directly to judge each variable's relative importance. Multiple regression is often used as a forecasting tool.

### Discriminant Analysis

The purpose of discriminant analysis is to correctly classify observations or people into homogeneous groups defined by a categorical dependent variable. The independent variables must be metric and should have a high degree of normality. Discriminant analysis builds a linear discriminant function, which can then be used to classify the observations. Overall fit is assessed by the degree to which the group means differ (Wilks' lambda or Mahalanobis D²) and by how well the model classifies. To determine which variables have the most impact on the discriminant function, look at the partial F values: the higher the partial F, the more impact that variable has on the discriminant function. This tool helps categorize people, for example into buyers and nonbuyers.
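The two techniques above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the assignment's own method: the function names and the tiny data sets are hypothetical, the regression uses ordinary least squares via `numpy.linalg.lstsq`, and the classifier is Fisher's two-group linear discriminant with a midpoint cut-off (equal priors and equal group covariances assumed).

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Ordinary least squares fit; returns (intercept, slope coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # column of 1s for the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # minimizes the sum of squared residuals
    return beta[0], beta[1:]

def standardized_betas(X, y, slopes):
    """Rescale slopes by SD(x)/SD(y) so predictors on different scales are comparable."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)

def fisher_discriminant(group_a, group_b):
    """Two-group Fisher linear discriminant: weight vector w and midpoint cut-off."""
    A = np.asarray(group_a, dtype=float)
    B = np.asarray(group_b, dtype=float)
    mean_a, mean_b = A.mean(axis=0), B.mean(axis=0)
    # Pooled within-group covariance matrix (assumes equal covariances in both groups)
    pooled = ((len(A) - 1) * np.cov(A, rowvar=False)
              + (len(B) - 1) * np.cov(B, rowvar=False)) / (len(A) + len(B) - 2)
    w = np.linalg.solve(pooled, mean_a - mean_b)
    cutoff = (w @ (mean_a + mean_b)) / 2
    return w, cutoff

def classify(x, w, cutoff, labels=("buyer", "nonbuyer")):
    """Assign to the first group if the discriminant score exceeds the cut-off."""
    return labels[0] if np.asarray(x, dtype=float) @ w > cutoff else labels[1]

# Hypothetical data: y is constructed as 1 + 2*x1 + 3*x2, so OLS should recover those weights.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
intercept, slopes = fit_multiple_regression(X, y)
print(intercept, slopes, standardized_betas(X, y, slopes))

# Hypothetical two-variable profiles for buyers and nonbuyers.
buyers = [[4, 5], [5, 6], [6, 5], [5, 4]]
nonbuyers = [[1, 2], [2, 1], [1, 1], [2, 2]]
w, cutoff = fisher_discriminant(buyers, nonbuyers)
print(classify([5, 5], w, cutoff), classify([1, 1], w, cutoff))
```

Libraries such as statsmodels (`OLS`) and scikit-learn (`LinearDiscriminantAnalysis`) provide fuller implementations with diagnostics; the sketch only shows the core arithmetic.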

### Multivariate Analysis of Variance (MANOVA)

This technique examines the relationship between several categorical independent variables and two or more metric dependent variables. Whereas analysis of variance (ANOVA) assesses the differences between groups (by using t-tests for two means and F-tests for three or more means), MANOVA examines the dependence relationship between a set of dependent measures across a set of groups. Typically this analysis is used in experimental design, and usually a hypothesized relationship between the dependent measures is used. As in ANOVA, the independent variables are categorical; the difference is that there are several metric dependent variables rather than one.

### Factor Analysis

When there are many variables in a research design, it is often helpful to reduce the variables to a smaller set of factors. This is an independence technique, in which there is no dependent variable. Rather, the researcher is looking for the underlying structure of the data matrix. Ideally, the independent variables are normal and continuous, with at least three to five variables loading onto a factor.

There are two main factor analysis methods: common factor analysis, which extracts factors based on the variance shared by the factors, and principal component analysis, which extracts factors based on the total variance of the factors. Common factor analysis is used to look for the latent (underlying) factors, whereas principal component analysis is used to find the fewest number of variables that explain the most variance. The first factor extracted explains the most variance.

### Cluster Analysis

The purpose of cluster analysis is to reduce a large data set to meaningful subgroups of individuals or objects. The division is accomplished on the basis of similarity of the objects across a set of specified characteristics. Outliers are a problem with this technique, often caused by too many irrelevant variables. The sample should be representative of the population, and it is desirable to have uncorrelated factors. There are three main clustering methods: hierarchical, which is a treelike process appropriate for smaller data sets; nonhierarchical, which requires specification of the number of clusters a priori; and a combination of both. There are four main rules for developing clusters: the clusters should be different, they should be reachable, they should be measurable, and they should be profitable (big enough to matter). This is a great tool for market segmentation.

### Conjoint Analysis

Conjoint analysis is often referred to as "trade-off analysis," since it allows for the evaluation of objects and the various levels of the attributes to be examined. It is both a compositional technique and a dependence technique, in that a level of preference for a combination of attributes and levels is developed. A part-worth, or utility, is calculated for each level of each attribute, and combinations of attributes at specific levels are summed to develop the overall preference for the attribute at each level. Models can be built that identify the ideal levels and combinations of attributes for products and services.

# Z-TEST

A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two-tailed), which makes it more convenient than the Student's t-test, which has separate critical values for each sample size. Therefore, many statistical tests can be conveniently performed as approximate Z-tests if the sample size is large or the population variance is known. If the population variance is unknown (and therefore has to be estimated from the sample itself) and the sample size is not large (n < 30), the Student t-test may be more appropriate.

The formula for calculating the z statistic is:

$$z = \frac{\bar{x} - \Delta}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, Δ is a specified value to be tested, σ is the population standard deviation, and n is the size of the sample. For example, with the null hypothesis H0: μ = 5 and the alternative hypothesis Ha: μ > 5, Δ = 5 and the test is one-tailed.
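A minimal Python sketch of the z statistic described in this section, using only the standard library (`statistics.NormalDist`, Python 3.8+). The function names are hypothetical helpers, and the sample figures are invented for illustration: they reuse the H0: μ = 5 versus Ha: μ > 5 setup with an assumed known σ.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def z_statistic(sample_mean, delta, sigma, n):
    """z = (sample mean - delta) / (sigma / sqrt(n)), with sigma the known population SD."""
    return (sample_mean - delta) / (sigma / sqrt(n))

def upper_tail_p(z):
    """One-tailed p-value for Ha: mu > delta, i.e. P(Z >= z)."""
    return 1.0 - STANDARD_NORMAL.cdf(z)

def two_tailed_p(z):
    """Two-tailed p-value, P(|Z| >= |z|)."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))

# The single two-tailed 5% critical value mentioned in the text (about 1.96).
critical = STANDARD_NORMAL.inv_cdf(0.975)

# Hypothetical sample: mean 5.6 from n = 40 observations, population sigma assumed to be 2.
z = z_statistic(sample_mean=5.6, delta=5.0, sigma=2.0, n=40)
print(f"critical = {critical:.2f}, z = {z:.3f}, one-tailed p = {upper_tail_p(z):.4f}")
```

For this hypothetical sample z is about 1.90, so the one-tailed p-value falls below .05 and H0 would be rejected at the 5% level; with a small sample and an estimated σ, the t-test would be used instead.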

Example: How many subjects will be needed to find the average age of students at Fisher College to within plus or minus one year, with a 95 percent confidence level and a population standard deviation of 3.5?

$$n = \left(\frac{z\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 3.5}{1}\right)^2 \approx 47.06$$

Interpretation: Rounding up, a sample of 48 students would be sufficient to determine students' mean age plus or minus one year. Note that the confidence interval width is always double the "plus or minus" figure (here, two years).

# T-TEST

A t-test is a statistical examination of two population means. A two-sample t-test examines whether two samples are different and is commonly used when the variances of two normal distributions are unknown and when an experiment uses a small sample size. For two groups of equal size, the formula for calculating t is:

$$t = \frac{M_x - M_y}{\sqrt{\dfrac{\sum (x - M_x)^2 + \sum (y - M_y)^2}{n(n-1)}}}$$

where M = group mean, x and y = individual scores, and n = number of scores per group.

Example: Sam Sleep, a researcher, hypothesizes that people who are allowed to sleep for only four hours will score significantly lower than people who are allowed to sleep for eight hours on a cognitive skills test. He brings sixteen participants into his sleep lab and randomly assigns them to one of two groups. In one group he has participants sleep for eight hours and in the other group he has them sleep for four. The next morning he administers the SCAT (Sam's Cognitive Ability Test) to all participants. (Scores on the SCAT range from 1 to 9, with high scores representing better performance.)

SCAT scores:

| 8 hours sleep group (X) | (x − Mx)² | 4 hours sleep group (Y) | (y − My)² |
|---|---|---|---|
| 5 | 0 | 8 | 16 |
| 7 | 4 | 1 | 9 |
| 5 | 0 | 4 | 0 |
| 3 | 4 | 6 | 4 |
| 5 | 0 | 6 | 4 |
| 3 | 4 | 4 | 0 |
| 3 | 4 | 1 | 9 |
| 9 | 16 | 2 | 4 |
| Σx = 40, Mx = 5 | Σ(x − Mx)² = 32 | Σy = 32, My = 4 | Σ(y − My)² = 46 |
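The SCAT computation can be checked with a short Python sketch. It implements the pooled-variance two-sample t for equal group sizes, the form used in this example; the function name `pooled_t` is a hypothetical helper, and only the scores come from the example.

```python
from math import sqrt

def pooled_t(x, y):
    """Two-sample t with pooled variance, written for groups of equal size n."""
    n = len(x)
    if len(y) != n:
        raise ValueError("this form of the formula assumes equal group sizes")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((v - mean_x) ** 2 for v in x)   # sum of squared deviations, group X
    ss_y = sum((v - mean_y) ** 2 for v in y)   # sum of squared deviations, group Y
    return (mean_x - mean_y) / sqrt((ss_x + ss_y) / (n * (n - 1)))

eight_hours = [5, 7, 5, 3, 5, 3, 3, 9]   # group X
four_hours = [8, 1, 4, 6, 6, 4, 1, 2]    # group Y
t = pooled_t(eight_hours, four_hours)
df = len(eight_hours) + len(four_hours) - 2
print(f"t = {t:.3f}, df = {df}")  # prints: t = 0.847, df = 14
```

If SciPy is installed, `scipy.stats.ttest_ind(eight_hours, four_hours)` (which pools the variances by default) should report the same t together with a two-tailed p-value; either way, t falls well short of the table value of 2.145.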

With Σ(x − Mx)² + Σ(y − My)² = 32 + 46 = 78 and n(n − 1) = 8 × 7 = 56, t = (5 − 4)/√(78/56) ≈ 0.85.* *(According to the t significance/probability table with df = 14, t must be at least 2.145 to reach p < .05, so this difference is not statistically significant.)

Interpretation: Sam's hypothesis was not confirmed. He did not find a significant difference between those who slept for four hours and those who slept for eight hours on cognitive test performance.

# CHI-SQUARE TEST

Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test always tests what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed results.

The formula for calculating chi-square (χ²) is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E is the expected frequency.

Example: Thai, the manager of a car dealership, did not want to stock cars that were bought less frequently because of their unpopular color. The five colors that she ordered were red, yellow, green, blue, and white. According to Thai, the expected frequencies, or numbers of customers choosing each color, should follow the percentages of last year. She felt 20% would choose yellow, 30% would choose red, 10% would choose green, 10% would choose blue, and 30% would choose white. She then took a random sample of 150 customers and asked them their color preferences. The results of this poll are shown in Table 1 under the column labeled "Observed Frequencies."

Table 1. Color Preference for 150 Customers for Thai's Superior Car Dealership

| Color | Observed Frequencies | Expected Frequencies |
|---|---|---|
| Yellow | 35 | 30 |
| Red | 50 | 45 |
| Green | 30 | 15 |
| Blue | 10 | 15 |
| White | 25 | 45 |

We are now ready to use our formula for χ² and find out whether there is a significant difference between the observed and expected frequencies for the customers in choosing cars. We will set up a worksheet; then you will follow the directions to form the columns and solve the formula.

| Category | O | E | O − E | (O − E)² | (O − E)²/E |
|---|---|---|---|---|---|
| Yellow | 35 | 30 | 5 | 25 | 0.83 |
| Red | 50 | 45 | 5 | 25 | 0.56 |
| Green | 30 | 15 | 15 | 225 | 15.00 |
| Blue | 10 | 15 | −5 | 25 | 1.67 |
| White | 25 | 45 | −20 | 400 | 8.89 |
| | | | | χ² = | 26.95 |

The table value for chi-square in the correct box of 4 df (5 categories − 1) and the P = .05 level of significance is 9.49. If the calculated chi-square value for the set of data you are analyzing (26.95) is equal to or greater than the table value (9.49), reject the null hypothesis: there IS a significant difference between the data sets that cannot be due to chance alone. If the number you calculate is LESS than the number you find on the table, then you can probably say that any differences are due to chance alone.
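The chi-square worksheet can be reproduced with a small Python sketch. The observed counts and last year's percentages come from Thai's example; the helper name `chi_square` is hypothetical. The critical value 9.49 is the table value quoted in the text rather than something computed here.

```python
observed = {"yellow": 35, "red": 50, "green": 30, "blue": 10, "white": 25}
expected_percent = {"yellow": 20, "red": 30, "green": 10, "blue": 10, "white": 30}

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

total = sum(observed.values())  # 150 customers in the poll
expected = {k: pct * total / 100 for k, pct in expected_percent.items()}
stat = chi_square(observed, expected)
df = len(observed) - 1          # 5 categories -> 4 degrees of freedom
critical = 9.49                 # table value for df = 4 at the .05 level (from the text)
print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {stat >= critical}")
```

The unrounded statistic is about 26.94; the worksheet's 26.95 comes from summing terms that were already rounded to two decimals. With SciPy, `scipy.stats.chisquare(f_obs, f_exp)` performs the same goodness-of-fit test and also returns a p-value.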

Interpretation: In this situation, the rejection of the null hypothesis means that the differences between the expected frequencies (based upon last year's car sales) and the observed frequencies (based upon this year's poll taken by Thai) are not due to chance. That is, they are not due to chance variation in the sample Thai took; there is a real difference between them. Therefore, in deciding what color autos to stock, it would be to Thai's advantage to pay careful attention to the results of her poll!

# ANOVA

The technique of analysis of variance is referred to as ANOVA. It is used to test whether the means of a number of populations are equal. A table showing the source of variation, the sum of squares, degrees of freedom, mean square (variance), and the F-ratio is known as an ANOVA table. The formulas for calculating a one-way ANOVA are:

$$
\begin{aligned}
SS_{\text{total}} &= \sum x^2 - \frac{\left(\sum x\right)^2}{N} \\
SS_{\text{among}} &= \sum_{i=1}^{r} \frac{\left(\sum x_i\right)^2}{n} - \frac{\left(\sum x\right)^2}{N} \\
SS_{\text{within}} &= SS_{\text{total}} - SS_{\text{among}} \\
df_{\text{among}} &= r - 1, \qquad df_{\text{within}} = N - r \\
MS &= \frac{SS}{df}, \qquad F = \frac{MS_{\text{among}}}{MS_{\text{within}}}
\end{aligned}
$$

where x = individual observation, $\sum x_i$ = sum of the observations in group i, r = number of groups, N = total number of observations (all groups), and n = number of observations in a group.

One-Way ANOVA example: Susan Sound predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all. She randomly divides twenty-four students into three groups of eight. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume. Those in group 2 study with noise that changes volume periodically. Those in group 3 study with no sound at all. After studying, all students take a 10-point multiple-choice test over the material. Their scores follow:

| Group | Test scores (x) | x² | Σx | (Σx)² | M | Σx² |
|---|---|---|---|---|---|---|
| 1) constant sound | 7, 4, 6, 8, 6, 6, 2, 9 | 49, 16, 36, 64, 36, 36, 4, 81 | 48 | 2304 | 6 | 322 |
| 2) random sound | 5, 5, 3, 4, 4, 7, 2, 2 | 25, 25, 9, 16, 16, 49, 4, 4 | 32 | 1024 | 4 | 148 |
| 3) no sound | 2, 4, 7, 1, 2, 1, 5, 5 | 4, 16, 49, 1, 4, 1, 25, 25 | 27 | 729 | 3.375 | 125 |

Here Σx² = 322 + 148 + 125 = 595, (Σx)²/N = 107²/24 ≈ 477.04, and Σ(Σxᵢ)²/n = (2304 + 1024 + 729)/8 ≈ 507.13, so:

$$
\begin{aligned}
SS_{\text{total}} &= 595 - 477.04 = 117.96 \\
SS_{\text{among}} &= 507.13 - 477.04 = 30.08 \\
SS_{\text{within}} &= 117.96 - 30.08 = 87.88
\end{aligned}
$$

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Among | 30.08 | 2 | 15.04 | 3.59* |
| Within | 87.88 | 21 | 4.18 | |

*(According to the F significance/probability table with df = (2, 21), F must be at least 3.4668 to reach p < .05, so the F score is statistically significant.)

Interpretation: Susan can conclude that her hypothesis may be supported. The means are as she predicted, in that the constant sound group has the highest score. However, the significant F only indicates that at least two means are significantly different from one another; she can't know which specific mean pairs significantly differ until she conducts a post hoc analysis.
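Susan's one-way ANOVA can be reproduced with the computational sum-of-squares formulas used in the worked example. This is a minimal Python sketch; the function name `one_way_anova` is a hypothetical helper, only the scores come from the example, and the F critical value 3.4668 is taken from the text rather than computed.

```python
groups = {
    "constant sound": [7, 4, 6, 8, 6, 6, 2, 9],
    "random sound": [5, 5, 3, 4, 4, 7, 2, 2],
    "no sound": [2, 4, 7, 1, 2, 1, 5, 5],
}

def one_way_anova(samples):
    """Return the ANOVA table quantities for a list of groups of scores."""
    all_scores = [x for group in samples for x in group]
    N, r = len(all_scores), len(samples)
    correction = sum(all_scores) ** 2 / N                 # (sum of all x)^2 / N
    ss_total = sum(x * x for x in all_scores) - correction
    ss_among = sum(sum(g) ** 2 / len(g) for g in samples) - correction
    ss_within = ss_total - ss_among
    df_among, df_within = r - 1, N - r
    ms_among, ms_within = ss_among / df_among, ss_within / df_within
    return {
        "SS": (ss_among, ss_within), "df": (df_among, df_within),
        "MS": (ms_among, ms_within), "F": ms_among / ms_within,
    }

table = one_way_anova(list(groups.values()))
critical_f = 3.4668  # F table value for df = (2, 21) at p < .05, quoted in the text
print(f"F = {table['F']:.2f}, significant: {table['F'] >= critical_f}")
```

If SciPy is available, `scipy.stats.f_oneway(*groups.values())` should give the same F; like the hand calculation, it does not say which pairs of means differ, so a post hoc test is still needed.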

ASSIGNMENT ON MULTIVARIATE ANALYSIS, SUBMITTED BY P.J CERLIN PAJILA, M.Phil Management, 26/11/2012
