Multivariate Data Analysis: A Typical Solution

1. Checking Research Design of a Multiple Regression

The data included 80 responses from the customer base and 13 independent variables. With this sample size, the analysis can detect relationships with an R-square of approximately 29% at a power of 0.80 with the significance level set at 0.01.

2. Checking Assumptions of Model

a. Linearity: Scatter plots did not indicate any nonlinear relationship between the dependent and independent variables.

b. Homoscedasticity: Heteroscedasticity was tested with the Levene test, which compares the variances of the metric variables (X6 to X18) across the levels of the non-metric variables (X1 to X5). At a significance level of 0.05, only X1 showed patterns of heteroscedasticity.

c. Normality: Both empirical measures and normal probability plots were used to evaluate the normality of the metric variables. The K-S and Shapiro-Wilk tests indicated that six variables (X6, X7, X12, X13, X16, X17) violate the normality assumption. Initial regressions were conducted without any transformation. To check whether the normality violations were affecting the regression model, transformations were applied and another regression was carried out.

3. Estimating the Regression Model and Assessing the Overall Model Fit

A stepwise regression procedure was employed to select variables for inclusion in the regression model. The stepwise procedure first enters the independent variable with the highest bivariate correlation with the dependent variable and then adds further variables according to their partial correlations. The table below summarizes the results obtained from the stepwise regression.

Model Summary

Model  R     R Square  Adjusted R Square  Std Error  R Square Change  F Change  df1   df2    Sig. F Change
1      0.54  0.29      0.28               1.01       0.29             32.04     1.00  78.00  0.00
2      0.71  0.50      0.49               0.85       0.21             32.22     1.00  77.00  0.00
3      0.88  0.77      0.76               0.59       0.27             86.86     1.00  76.00  0.00
4      0.89  0.78      0.77               0.57       0.02             5.89      1.00  75.00  0.02
5      0.89  0.80      0.78               0.56       0.01             4.21      1.00  74.00  0.04

Durbin-Watson: 2.40

a. Predictors: (Constant), X9 - Complaint Resolution
b. Predictors: (Constant), X9 - Complaint Resolution, X12 - Salesforce Image
c. Predictors: (Constant), X9 - Complaint Resolution, X12 - Salesforce Image, X6 - Product Quality
d. Predictors: (Constant), X9 - Complaint Resolution, X12 - Salesforce Image, X6 - Product Quality, X7 - E-Commerce Activities
e. Predictors: (Constant), X9 - Complaint Resolution, X12 - Salesforce Image, X6 - Product Quality, X7 - E-Commerce Activities, X18 - Delivery Speed
f. Dependent Variable: X19 Satisfaction

Table 1: Overall Model Fit Results of Stepwise Regression

As the table shows, the first three variables explain 77% of the variance in the dependent variable. Adding two more variables increased R-square by only 3 percentage points.

Overall Model Fit: The first variable to be entered was X9 (Complaint Resolution), as it had the highest bivariate correlation with the dependent variable (0.54). The variables added subsequently were X12, X6, X7 and X18. The final regression model with five independent variables explains almost 80 percent of the variance of X19

(Customer Satisfaction). The adjusted R-square of 0.78 is close to the R-square of 0.80, which suggests that the model has not been overfitted and that the results are generalizable. The table below provides the regression coefficients and their significance.

Coefficients

Model  Variable                     B      Std. Error  Beta   t      Sig.  Zero-order  Partial  Part   Tolerance  VIF
1      (Constant)                   3.80   0.57               6.72   0.00
       X9 - Complaint Resolution    0.56   0.10        0.54   5.66   0.00  0.54        0.54     0.54   1.00       1.00
2      (Constant)                   1.56   0.62               2.51   0.01
       X9 - Complaint Resolution    0.49   0.09        0.47   5.73   0.00  0.54        0.55     0.46   0.98       1.03
       X12 - Salesforce Image       0.52   0.09        0.46   5.68   0.00  0.54        0.54     0.46   0.98       1.03
3      (Constant)                   1.86   0.56               3.31   0.00
       X9 - Complaint Resolution    0.41   0.06        0.39   6.96   0.00  0.54        0.62     0.39   0.96       1.05
       X12 - Salesforce Image       0.60   0.06        0.54   9.49   0.00  0.54        0.74     0.53   0.96       1.05
       X6 - Product Quality         0.44   0.05        0.53   9.32   0.00  0.51        0.73     0.52   0.97       1.04
4      (Constant)                   1.48   0.57               2.61   0.01
       X9 - Complaint Resolution    0.42   0.06        0.40   7.28   0.00  0.54        0.64     0.39   0.95       1.05
       X12 - Salesforce Image       0.79   0.10        0.71   7.99   0.00  0.54        0.68     0.43   0.37       2.70
       X6 - Product Quality         0.44   0.05        0.52   9.56   0.00  0.51        0.74     0.51   0.97       1.04
       X7 - E-Commerce Activities   -0.37  0.15        -0.21  -2.43  0.02  0.35        -0.27    -0.13  0.37       2.68
5      (Constant)                   -1.88  0.59               -3.20  0.00
       X9 - Complaint Resolution    0.23   0.11        0.22   2.10   0.04  0.54        0.24     0.11   0.26       3.90
       X12 - Salesforce Image       0.77   0.10        0.69   7.89   0.00  0.54        0.68     0.41   0.37       2.73
       X6 - Product Quality         0.45   0.05        0.54   9.96   0.00  0.51        0.76     0.52   0.94       1.06
       X7 - E-Commerce Activities   -0.36  0.15        -0.21  -2.42  0.02  0.35        -0.27    -0.13  0.37       2.69
       X18 - Delivery Speed         0.36   0.18        0.21   2.05   0.04  0.52        0.23     0.11   0.26       3.91

a. Dependent Variable: X19 Satisfaction

Table 2: Regression Coefficients with their significance and correlation statistics

Estimated Coefficients: All five regression coefficients are significant at the .05 level, but X9, X7 and X18 are not significant at the .01 level.

Multicollinearity: Multicollinearity is significant for the model. Of the five variables used by the regression model, four have tolerance values less than 0.5, indicating that over half of their variance is accounted for by the other variables in the equation. For example, Complaint Resolution, which has the highest bivariate correlation (0.54) among all the variables, has a very low part correlation of 0.11, which may be due to the effect of multicollinearity. Its VIF (3.90) is among the highest of the variables used in the regression model. This may be because Complaint Resolution is related to several other variables such as Technical Support (X8), Product Quality (X6) and Salesforce Image (X12). Only Product Quality shows little or no multicollinearity.

Examining the partial correlations of the variables not included in the model shows that none of the remaining variables has a partial correlation significant at the 0.05 level needed for entry.

4. Evaluating the Variate for the Assumptions of Regression Analysis

The measure used to evaluate the regression variate is the residual. The assumptions that have been checked are:

Linearity: The scatter plot of the studentized residuals against the standardized predicted values (provided in the chart below) shows no nonlinear pattern in the residuals, which indicates that the overall equation is linear. Partial regression plots were also obtained for each independent variable in the equation to check whether each variable's relationship with the dependent variable is linear. The plots show that the relationships for X6, X7 and X12 are reasonably well defined, whereas those for X9 and X18 are less well defined.

Homoscedasticity: Analysis of the residual plot does not indicate any pattern of heteroscedasticity.

Independence of Residuals: Assuming that the identification number represents the order in which the responses were collected, the residuals were plotted against the IDs to determine whether any relationship is present. No consistent pattern can be observed in the plot, indicating that the residuals are independent.
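The independence check above is visual; Table 1 also reports a Durbin-Watson value of 2.40 for the stepwise model. The following is a minimal sketch of how that statistic is computed from residuals listed in collection order. The residual values are hypothetical and are not taken from the study data.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in collection order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals (assumption: not the study data).
alternating = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(alternating))  # 3.0
```

Values near 2 indicate little first-order autocorrelation; values toward 0 or 4 indicate positive or negative autocorrelation respectively.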

Figure 1: Studentized residual plot and p-p plot for the dependent variable

Normality: The normality of the error term of the variate was checked by visual inspection of the normal probability plot of the residuals (given in the chart above). As seen in the probability plot, most of the values fall along the diagonal with no systematic departures, indicating that the residuals are approximately normally distributed.

Identification of Outliers: One way to identify outliers is to examine the studentized residuals. Observations whose studentized residuals fall outside the 95% confidence limits (i.e. beyond 1.96) are considered outliers. As can be seen from the studentized residual plot below, IDs 10, 20, 25 and 71 have significant residuals and can be classified as outliers.
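The outlier rule described above can be sketched as a simple filter. This is an illustration only: it assumes the studentized residuals have already been computed, it uses the absolute value so that large negative residuals are flagged too, and the IDs and residual values are hypothetical.

```python
def flag_outliers(studentized, cutoff=1.96):
    """Return the IDs whose studentized residual exceeds the cutoff in absolute value."""
    return sorted(obs_id for obs_id, r in studentized.items() if abs(r) > cutoff)


# Hypothetical studentized residuals keyed by observation ID (assumption).
residuals_by_id = {1: 0.4, 2: -2.3, 3: 1.1, 4: 2.05, 5: -1.9}
print(flag_outliers(residuals_by_id))  # [2, 4]
```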

Figure 2: Studentized residual plot used to identify outliers

5. Interpreting Regression Variate

The model obtained from stepwise regression can be written as

X19 (Customer Satisfaction) = -1.88 + 0.23 * X9 (Complaint Resolution) + 0.77 * X12 (Salesforce Image) + 0.45 * X6 (Product Quality) - 0.36 * X7 (E-Commerce Activities) + 0.36 * X18 (Delivery Speed)

In this model, all the coefficients, including the constant, are significant. The constant can be interpreted as the average customer satisfaction when all the variables in the model have the value zero; it also reflects the influence of variables that have not been included in the current model.

Variable Importance: All variables except X7 (E-Commerce Activities) have a positive relationship with customer satisfaction. The negative coefficient for X7 suggests that increased e-commerce activity reduces customer satisfaction. This is puzzling because the bivariate correlation between X7 and X19 is positive (0.35). To compare the variables directly, their standardized Beta coefficients were examined. Salesforce Image (X12) was found to be the most important variable, followed by Product Quality (X6), Complaint Resolution (X9), Delivery Speed (X18) and E-Commerce Activities (X7).

Multicollinearity: Tolerance values range from 0.94 to 0.26, indicating a wide range of multicollinearity effects. VIF values range from 1.06 to 3.91, which is below the VIF cutoff of 4 that would have indicated a serious multicollinearity problem. As multicollinearity affects the interpretation of coefficients, it may be the cause of the negative sign of E-Commerce Activities.

6. Validating the Results

The validity of the results was assessed by the two methods described below.

Examining adjusted R-Square: There is little difference between the adjusted R-square (0.78) and the R-square (0.80), which indicates that adding variables has not overfitted the model.

Split Sample Validation: The sample of 80 observations was split into two subsamples and stepwise regression was applied to each subsample separately. The comparison of overall fit (given in the table below) shows a high level of similarity in terms of R-square and adjusted R-square. Differences appear, however, when individual coefficients are compared. X9 and X12 fail to enter the model for subsample 2, whereas X18 is dropped from the regression model for subsample 1. Both models also differ from the model obtained for the entire sample. (Models provided in Appendix.)
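The split-sample step can be sketched as follows. The text does not say how the 80 observations were divided, so the random split, the seed and the use of IDs 1 to 80 are all assumptions made for illustration; the stepwise regression would then be rerun on each half separately.

```python
import random


def split_sample(ids, seed=0):
    """Randomly split observation IDs into two halves of (nearly) equal size."""
    shuffled = list(ids)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Hypothetical IDs 1..80 standing in for the 80 responses (assumption).
sample_1, sample_2 = split_sample(range(1, 81))
print(len(sample_1), len(sample_2))  # 40 40
```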

Overall Model Fit Parameters               Sample 1  Sample 2
Multiple R                                 0.905     0.871
Coefficient of Determination (R-Square)    0.82      0.759
Adjusted R-Square                          0.799     0.739
Std Error                                  0.5518    0.5715

Table 3: Overall model fit for split sample validation

7. Assessing the Impact of Non-Normality of Independent Variables

As noted in section 2, some of the independent variables did not satisfy the normality condition. They were transformed according to the shape characteristics they displayed. The table below lists the transformations used.

Variable  Transformation Required
X6        Squared Term
X7        Logarithm
X12       Not Required
X13       Cubed Term
X16       Squared Term
X17       Inverse

Table 4: Transformations used to get over non-normality

Transformation did not improve the overall model fit: the adjusted R-square obtained was around 0.78, similar to that of the earlier model. The constant and the X9 coefficient were significant in the new model. The coefficient for X7 was still negative in the new model. Overall, transformation did not have any significant effect on the model, so it was decided to continue with the model obtained without any transformation.

8. Evaluating Alternative Regression Models

Including all the variables: A confirmatory regression was carried out by including all 13 perceptual measures as independent variables. The stepwise regression model and the full regression model were then compared, with the results given below.

1. Overall Model Fit: The overall model fit decreases when all the variables are included. Even though the multiple R and R-square increased slightly, the adjusted R-square decreased for the full model. The standard error also increased in the full model.

Overall Model Fit Parameters               Full    Stepwise
Multiple R                                 0.901   0.892
Coefficient of Determination (R-Square)    0.811   0.795
Adjusted R-Square                          0.774   0.782
Std Error                                  0.5675  0.558

2. Variate Interpretation: Only three variables (X6, X7 and X12) were found to be significant. In the stepwise model, X11 was the least significant variable, and it was rendered insignificant in the full model due to multicollinearity. Multicollinearity was found to be very significant for at least four variables (VIF > 4), which would render the model useless. X11, X17 and X18 have VIFs of roughly 50 or more (52.55, 49.36 and 60.49), with tolerance values less than 0.05.

Including Dummy Variables: The model obtained through stepwise regression can be further improved by including non-metric variables. Five non-metric variables are available for inclusion in the model. Stepwise regression was carried out using the dummy variable for Firm Size. The overall model fit results are provided in the table below and the full model is given in the Appendix.

1. Overall model fit was better than that of the previous model. The adjusted R-square was 0.811, which is higher than what was obtained in the stepwise regression, and the standard error was lower in the new model.

Overall Model Fit Parameters               Values
Multiple R                                 0.908
Coefficient of Determination (R-Square)    0.825
Adjusted R-Square                          0.811
Std Error                                  0.5191

Table 5: Overall Model Fit when dummy variable was included

2. All the coefficients were found to be significant. The coefficient for firm size was positive, which indicates that large firms have higher customer satisfaction. The other variables included in the model are Complaint Resolution (X9), Salesforce Image (X12), Product Quality (X6), Delivery Speed (X18) and Competitive Pricing (X13).

9. Managerial Overview

The regression models achieve very high levels of predictive accuracy, as the amount of variance explained is about 80%. The variables with the most significant impact on customer satisfaction were Salesforce Image (X12) and Product Quality (X6). Increases in either of these variables will increase customer satisfaction. The other three variables (Delivery Speed X18, E-Commerce Activities X7 and Complaint Resolution X9) also have an impact on customer satisfaction, but it is much smaller. X7 has a reversed sign, a result that is contrary to what is expected.
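The adjusted R-square values used to compare the stepwise, full and dummy-variable models can be reproduced from the reported R-square values with the standard adjustment formula. The sketch below assumes n = 80 observations and k equal to the number of predictors in each model (5, 13 and 6); small differences from the reported figures in the third decimal come from the rounding of R-square.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# R-square values as reported for each model; n and k are assumptions
# based on the sample size and the number of predictors described in the text.
stepwise = adjusted_r2(0.795, n=80, k=5)
full = adjusted_r2(0.811, n=80, k=13)
dummy = adjusted_r2(0.825, n=80, k=6)
print(round(stepwise, 3), round(full, 3), round(dummy, 3))  # 0.781 0.774 0.811
```

The full model has the highest R-square of the first two but the lower adjusted value, because the penalty for 13 predictors outweighs the small gain in explained variance.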

APPENDIX

1. Levene's Test Results

Metric variables: X6 - Product Quality, X7 - E-Commerce Activities, X8 - Technical Support, X9 - Complaint Resolution, X10 - Advertising, X11 - Product Line, X12 - Salesforce Image, X13 - Competitive Pricing, X14 - Warranty & Claims, X15 - New Products, X16 - Order & Billing, X17 - Price Flexibility, X18 - Delivery Speed

Each entry below is Levene Statistic / Significance.

X1: Customer Type    16.29 / 0.00    3.43 / 0.04
X2: Industry Type    0.00 / 0.95     0.52 / 0.47
X3: Firm Size        0.05 / 0.82     0.25 / 0.62
X4: Region           12.94 / 0.00    0.00 / 0.98
X4: Region           0.01 / 0.94     0.42 / 0.52

Remaining entries: 0.01 / 0.99, 0.02 / 0.90, 0.50 / 0.48, 0.23 / 0.63, 0.03 / 0.86, 6.32 / 0.00, 0.54 / 0.46, 0.01 / 0.92, 0.33 / 0.57, 0.05 / 0.82, 0.38 / 0.68, 8.46 / 0.00, 3.21 / 0.08, 0.42 / 0.52, 1.24 / 0.27
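For reference, the Levene statistic reported in this part of the appendix can be computed as in the sketch below. It uses the mean-centered form of the test as an assumption (the text does not state which centering was used), and the two groups are hypothetical values rather than study data. The significance would be obtained by comparing W with an F distribution with (k - 1, N - k) degrees of freedom, which is not done here.

```python
def levene_w(groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = []
    for g in groups:
        mean = sum(g) / len(g)
        z.append([abs(x - mean) for x in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group variation of the absolute deviations.
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups (assumption): the second has twice the spread of the first.
print(levene_w([[1, 2, 3], [2, 4, 6]]))  # approximately 0.8
```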

2. K-S and Shapiro-Wilk Test Results

Tests of Normality

Variable                      Kolmogorov-Smirnov (a)    Shapiro-Wilk
                              Statistic  Sig.           Statistic  Sig.
X6 - Product Quality          0.1        0.02           0.9        0.00
X7 - E-Commerce Activities    0.1        0.01           1.0        0.02
X8 - Technical Support        0.1        0.20           1.0        0.23
X9 - Complaint Resolution     0.1        0.20           1.0        0.30
X10 - Advertising             0.1        0.20           1.0        0.12
X11 - Product Line            0.1        0.20           1.0        0.23
X12 - Salesforce Image        0.1        0.02           1.0        0.32
X13 - Competitive Pricing     0.1        0.00           0.9        0.00
X14 - Warranty & Claims       0.1        0.20           1.0        0.65
X15 - New Products            0.1        0.20           1.0        0.48
X16 - Order & Billing         0.1        0.02           1.0        0.26
X17 - Price Flexibility       0.1        0.05           1.0        0.04
X18 - Delivery Speed          0.1        0.07           1.0        0.35
X19 - Satisfaction            0.1        0.20           1.0        0.11

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.

3. Collinearity

Variables: X19 - Satisfaction, X6 - Product Quality, X7 - E-Commerce Activities, X8 - Technical Support, X9 - Complaint Resolution, X10 - Advertising, X11 - Product Line, X12 - Salesforce Image, X13 - Competitive Pricing, X14 - Warranty & Claims, X15 - New Products, X16 - Order & Billing, X17 - Price Flexibility

Correlations (lower triangle; each line lists one column of the matrix, starting at its diagonal entry):

X19: 1.00 0.51 0.35 0.02 0.54 0.30 0.47 0.54 0.12 0.08 0.07 0.44 0.08
X6:  1.00 0.10 0.07 0.12 0.07 0.46 0.12 0.41 0.09 0.07 0.10
X7:  1.00 0.00 0.15 0.46 0.04 0.79 0.24 0.04 0.02 0.11 0.25
X8:  1.00 0.04 0.08 0.14 0.07 0.24 0.77 0.10 0.05
X9:  1.00 0.12 0.55 0.16 0.02 0.04 0.06 0.71 0.39
X10: 1.00 0.06 0.54 0.16 0.01 0.05 0.10 0.29
X11: 1.00 0.08 0.46 0.26 0.05 0.34
X12: 1.00 0.33 0.03 0.03 0.11 0.31
X13: 1.00 0.26 0.04 0.02 0.54
X14: 1.00 0.04 0.13
X15: 1.00 0.10 0.10
X16: 1.00 0.43
X17: 1.00

0.23 0.08

4. Stepwise Regression Results

The coefficients for the five stepwise models are those reported in Table 2.

6. Checking Heteroscedasticity

7. Validation: Split Sample Check Sample 1 Variables Entered Regression Coefficients Std. B Beta Error -1.91 0.53 0.88 0.39 -0.54 0.75 0.07 0.15 0.08 0.22 0.59 0.77 0.37 0.32 Statistical Significance t 2.56 7.88 5.85 4.95 2.48 Sig 0.02 0.00 0.00 0.00 0.02 0.67 0.13 0.43 5.03 0.00 Sample 2 Regression Coefficients Std. B Beta Error -1.90 0.41 0.60 0.88 0.06 0.09 0.54 0.57 Statistical Significance t 2.17 6.49 6.73 Sig 0.04 0.00 0.00

(Constant) X6 - Product Quality X12 - Salesforce Image X9 - Complaint Resolution X7 - E-Commerce Activities X18 - Delivery Speed

8. Stepwise Regression Results for Full Model

Coefficients

Variable                     B      Std. Error  Beta   t      Sig.  Zero-order  Partial  Part   Tolerance  VIF
(Constant)                   -1.58  1.27               -1.24  0.22
X6 - Product Quality         0.41   0.06        0.49   7.10   0.00  0.51        0.66     0.38   0.61       1.63
X7 - E-Commerce Activities   -0.40  0.15        -0.23  -2.60  0.01  0.35        -0.31    -0.14  0.36       2.78
X8 - Technical Support       0.06   0.07        0.07   0.82   0.42  -0.02       0.10     0.04   0.35       2.88
X9 - Complaint Resolution    0.16   0.12        0.15   1.31   0.19  0.54        0.16     0.07   0.22       4.57
X10 - Advertising            -0.01  0.07        -0.01  -0.11  0.91  0.30        -0.01    -0.01  0.67       1.50
X11 - Product Line           0.33   0.36        0.35   0.91   0.37  0.47        0.11     0.05   0.02       52.55
X12 - Salesforce Image       0.85   0.11        0.76   7.66   0.00  0.54        0.69     0.41   0.29       3.44
X13 - Competitive Pricing    -0.08  0.06        -0.11  -1.53  0.13  -0.12       -0.18    -0.08  0.53       1.89
X14 - Warranty & Claims      -0.16  0.14        -0.11  -1.13  0.26  0.08        -0.14    -0.06  0.33       3.05
X15 - New Products           -0.01  0.04        -0.01  -0.17  0.86  0.07        -0.02    -0.01  0.91       1.10
X16 - Order & Billing        0.13   0.12        0.09   1.09   0.28  0.44        0.13     0.06   0.40       2.53
X17 - Price Flexibility      0.29   0.37        0.30   0.79   0.43  0.08        0.10     0.04   0.02       49.36
X18 - Delivery Speed         -0.24  0.71        -0.14  -0.35  0.73  0.52        -0.04    -0.02  0.02       60.49
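The tolerance and VIF columns of the full model are two views of the same quantity: tolerance is one minus the R-square obtained when a predictor is regressed on the other predictors, and VIF is its reciprocal. The sketch below illustrates the relationship with a hypothetical R-square of 0.98 and then applies the VIF cutoff of 4 used in the main text to the VIF values reported for the full model.

```python
def tolerance_and_vif(r2_j):
    """r2_j: R-square from regressing predictor j on the other predictors."""
    tolerance = 1.0 - r2_j
    return tolerance, 1.0 / tolerance


# Hypothetical R-square of 0.98 (assumption), similar to X11, X17 and X18.
tol, vif = tolerance_and_vif(0.98)
print(round(tol, 2), round(vif, 1))  # 0.02 50.0

# VIF values as reported for the full model in this appendix.
reported_vif = {
    "X6": 1.63, "X7": 2.78, "X8": 2.88, "X9": 4.57, "X10": 1.50,
    "X11": 52.55, "X12": 3.44, "X13": 1.89, "X14": 3.05, "X15": 1.10,
    "X16": 2.53, "X17": 49.36, "X18": 60.49,
}
flagged = [name for name, value in reported_vif.items() if value > 4]
print(flagged)  # ['X9', 'X11', 'X17', 'X18']
```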

9. Stepwise Regression Results when Dummy Variable Included

Coefficients

Variable                     B
(Constant)                   -1.32  (Std. Error 0.64, t = -2.06, Sig. 0.04)
X9 - Complaint Resolution    0.22
X12 - Salesforce Image       0.55
X6 - Product Quality         0.43
X3 - Firm Size               0.54
X13 - Competitive Pricing    0.13
X18 - Delivery Speed         0.37

