
SPSS Output Interpretation

Basics
From "Descriptive statistics" to Histogram

Correlations
→ Quantify the relationship between two variables

Corr = 0 → no correlation
Corr > 0 → positive correlation (0.1 weak; 0.3 moderate; 0.5 strong)
Corr < 0 → negative correlation
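
As a minimal sketch (not part of the SPSS output itself), the correlation coefficient and its p-value can also be computed in Python with scipy; the two variables below are made up.

```python
# Minimal sketch: Pearson correlation (hypothetical data)
from scipy import stats

x = [2, 4, 5, 7, 9, 11]        # e.g., advertising spend (made up)
y = [10, 14, 15, 19, 24, 27]   # e.g., turnover (made up)

r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p_value:.3f}")  # r of 0.5 or above would be read as a strong positive correlation
```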

One-sample t-test

→ Determine whether the mean of a population, represented by a sample, significantly differs from a specific value
Example: Check the following statement: "The average German bitumen-producing company has more than 60 employees"
Sig. = 0.021 < 0.05 → reject H0 → "The average German … has more than 60 employees"
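
For reference, the same kind of test can be sketched outside SPSS; the employee counts below are hypothetical, only the test value of 60 comes from the example.

```python
# Minimal sketch: one-sample t-test against a test value of 60 (hypothetical sample)
from scipy import stats

employees = [72, 65, 80, 58, 90, 77, 69, 85, 61, 74]   # made-up employee counts

t_stat, p_value = stats.ttest_1samp(employees, popmean=60)   # two-tailed, as in the SPSS default
if p_value < 0.05:
    print("Reject H0: the mean number of employees differs significantly from 60")
else:
    print("Cannot reject H0")
```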
Independent samples t-test
→ Determine whether two populations, represented by samples, are significantly different in terms of their means
Example: Determine whether Shell-owned companies generate higher turnover than other companies

If Sig. < 0.05 → significant result (reject H0, support H1)
If Sig. > 0.05 → insignificant result (we cannot reject H0)
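
A minimal sketch of the same test in Python, assuming two hypothetical turnover samples (the figures are invented).

```python
# Minimal sketch: independent-samples t-test (hypothetical turnover figures)
from scipy import stats

turnover_shell = [4.1, 3.8, 5.2, 4.7, 4.9]   # made-up values
turnover_other = [3.2, 3.5, 2.9, 3.8, 3.1]

t_stat, p_value = stats.ttest_ind(turnover_shell, turnover_other, equal_var=False)  # Welch variant
if p_value < 0.05:
    print("Significant result: reject H0, support H1")
else:
    print("Insignificant result: we cannot reject H0")
```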

Cluster Analysis
Overall objective: maximize variation between clusters, minimize variation within clusters

Clustering Variables:
- Significant differences between variables across the clusters
- Relation between the sample size and the number of clustering variables (m) as well as clusters (k):
  o Clusters of equal size: n_min = 10 * m * k
  o General recommendation: n_min = 70 * m
- Avoid using highly correlated variables (correlation of 0.9 or higher)
- Data of high quality

Step 1: Decide on clustering variables


(SPSS output: Correlations table)

Step 2: Measure of (dis-)similarity & clustering algorithm


Algorithm: single linkage (nearest neighbour), complete linkage (furthest neighbour), average linkage (between-groups)
Distance measure: Euclidean distance (most widely used, recommended), …
→ Data must be standardized (range -1 to 1)
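
To make the procedure concrete, here is a small sketch of average-linkage clustering on standardized data with scipy; the observations are invented, and the z-standardization shown is one common choice, not necessarily the exact scaling used in the course.

```python
# Minimal sketch: hierarchical (agglomerative) clustering with average linkage (hypothetical data)
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.array([[1.2, 3.4], [1.3, 3.5], [5.0, 1.1], [5.2, 0.9], [9.0, 7.5]])  # made-up observations

# Standardize so that no variable dominates the Euclidean distance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

distances = pdist(X_std, metric="euclidean")      # pairwise distances ("proximity matrix", condensed form)
Z = linkage(distances, method="average")          # average linkage (between-groups)
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(labels)
```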

Proximity matrix: shows the Euclidean distances between the objects

→ Look for the smallest distance (e.g., Peugeot 207 & Fiat Grande Punto) and merge (cluster) these objects

Agglomeration schedule

→ A different clustering algorithm (e.g., single linkage vs. average linkage) would lead to a different agglomeration schedule
Step 3: Decide on the number of clusters

→ Dendrogram and icicle plot show the different clustering possibilities

Cluster Membership: shows the cluster allocation of a single solution or a range of solutions

→ Deciding (subjectively) how many clusters are appropriate

← Shows the differences in the variables between the clusters; helps to name the clusters

Step 4: Validate the cluster solution (k-means analysis)

→ Reassignment of the objects until the overall within-cluster variation is minimized

Identical solution: the initial partitioning of the objects was retained
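
A minimal sketch of this validation step with scikit-learn, assuming the standardized data and the hierarchical labels below (all values are made up).

```python
# Minimal sketch: k-means as a validation step for a hierarchical solution (hypothetical data)
import numpy as np
from sklearn.cluster import KMeans

X_std = np.array([[0.1, 0.9], [0.2, 1.0], [1.8, -0.5], [2.0, -0.4], [0.0, 1.1]])  # standardized data (made up)
hierarchical_labels = np.array([1, 1, 2, 2, 1])   # labels from the hierarchical step (made up)

km = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = km.fit_predict(X_std)

# If the k-means partitioning matches the hierarchical one, the initial solution was retained
print(hierarchical_labels, kmeans_labels)
```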
Factor Analysis
→ A set of methods to reduce complex data structures (typically induced by the large number of variables) by identifying a smaller number of unifying variables, called factors, that represent the original variables in the best possible way.

Step 1: Check assumptions

- Scale level: variables should be measured on an interval or ratio scale
- Sample size: minimum number of (valid) observations ≥ 10 * number of items/variables
- Dependence of observations: observations need to be independent (i.e., only one observation per individual, company, country, …)
- Correlation between items/variables: variables must be sufficiently correlated

Corr = 0 → no correlation; Corr > 0 → positive correlation (0.1 weak; 0.3 moderate; 0.5 strong); Corr < 0 → negative correlation

Anti-image covariance: covariance that is independent of the other variables.
If more than 25 % of the absolute values are greater than 0.09, the variables may be inappropriate for factor analysis.
Measure of sampling adequacy: KMO criterion and Bartlett's test of sphericity
→ Evaluation of the correlation matrix as a whole

KMO should be at least 0.6 to continue with the factor analysis

H0: the variables of the population are uncorrelated
p-value < α (0.05) → H0 can be rejected
→ Data appropriate for factor analysis
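
Outside SPSS, both statistics can be obtained with the third-party Python package factor_analyzer (my choice, not part of the course material); the items below are simulated.

```python
# Minimal sketch: KMO criterion and Bartlett's test of sphericity (simulated items,
# using the third-party `factor_analyzer` package)
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

rng = np.random.default_rng(0)
base = rng.normal(size=100)
items = pd.DataFrame({
    "item1": base + rng.normal(scale=0.3, size=100),   # items 1 and 2 are built to correlate
    "item2": base + rng.normal(scale=0.3, size=100),
    "item3": rng.normal(size=100),
})

chi2, p_value = calculate_bartlett_sphericity(items)   # H0: the variables are uncorrelated
kmo_per_item, kmo_total = calculate_kmo(items)

print(f"Bartlett p = {p_value:.3f}")   # p < 0.05 -> reject H0, data suitable for factor analysis
print(f"KMO = {kmo_total:.2f}")        # should be at least 0.6
```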

Step 2: Extract the factors


→ Transforming the original variables into new, uncorrelated variables

Step 3: Determine the number of factors


→ Standard approach: Kaiser criterion → eigenvalues > 1
→ Alternative approach: scree plot

Always extract one factor less than the number indicated by the distinct break ("elbow")
Step 4: Interpret the factor solution

→ Assign each variable to a certain factor based on its maximum absolute factor loading
→ Find an umbrella term for each factor that best describes the set of variables associated with that factor

Some variables might be highly correlated with more than one factor → better use the rotated factor solution

→ Never use the unrotated solution when interpreting factors!
→ The unrotated solution is only used to determine the number of extracted factors (eigenvalues)!

Factor 1 (e.g., satisfaction with the hotel room)
Factor 2 (e.g., satisfaction with the service/personnel)

How much of the total variance is explained by the factor solution?

Example: "Furnishing of the hotel room" = 0.832² + (-0.353)² = 0.818

Total variance = sum of diagonal elements in the (original) correlation matrix (i.e., correlations of the variables with themselves): R = 1 + 1 + 1 + 1 + 1 = 5
Reproduced variance = sum of diagonal elements in the reproduced correlation matrix (i.e., communalities): R_repr. = 0.821 + 0.818 + 0.796 + 0.902 + 0.912 = 4.25

→ Proportion of total variance explained by the factors = R_repr./R = 4.25/5 = 85 %
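
The same arithmetic written out directly; the loading pair and the communalities below are the values quoted in the example above.

```python
# Minimal sketch: communalities and explained variance, using the values from the hotel example
import numpy as np

# Communality of a single variable = sum of its squared loadings across the extracted factors
furnishing = 0.832**2 + (-0.353)**2           # ≈ 0.82 ("Furnishing of the hotel room")

# Communalities of all five variables (diagonal of the reproduced correlation matrix)
communalities = np.array([0.821, 0.818, 0.796, 0.902, 0.912])

total_variance = len(communalities)           # each standardized variable contributes 1 -> R = 5
explained_share = communalities.sum() / total_variance
print(f"{furnishing:.2f}, explained share = {explained_share:.0%}")   # ≈ 85 %
```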


Step 5: Evaluate the goodness-of-fit of the factor solution
→ Residuals as a measure of the goodness-of-fit: differences between the observed correlations and the reproduced correlations can be examined to determine model fit

A share of more than 50 % should raise concerns. If less than 50 % of the residuals have absolute values greater than 0.05, we can presume a good model fit!

→ Communalities as a measure of the goodness-of-fit:

- Amount of variance a variable shares with all the other variables being considered (equals the proportion of variance explained by the extracted factors)

81.8 % of the variable's variance is reproduced by extracting two factors

Problem: if the factor solution accounts for less than 30 % of a variable's variance (i.e., a communality of less than 0.3), reconsider the set-up!

Cross Tabulations
→ Verification of the hypothesis about the existence of a correlation between two nominally scaled variables

Information about the joint frequency distribution of two variables and the absolute, relative and expected frequencies
The more the expected frequencies ĥij and the observed frequencies hij differ, the stronger the suspected dependency between X and Y

Pearson's chi-squared test

→ Tests whether the observed frequencies deviate significantly from the frequencies expected if the two variables were independent

χ² value of 79.277; p-value (.000) < α (= 0.05)
→ H0 (there is no correlation between the variables) can be rejected. A conclusion about causality is not possible.

The likelihood ratio test is based on the maximum-likelihood method and delivers, at large sample sizes, the same result as the Pearson chi-squared test

Correlation measures

Higher Phi → stronger correlation (between 0 and 1); values > 0.3 indicate a correlation
Cramér's V → between 0 and 1 (0 = no correlation, 1 = perfect correlation)
Contingency coefficient → estimation of the strength of the correlation of more than two variable characteristics
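
A chi-squared test on a cross tabulation, together with Cramér's V, can also be sketched in Python; the 2x2 table below is invented and is not the table that produced the χ² of 79.277.

```python
# Minimal sketch: Pearson chi-squared test and Cramér's V on a hypothetical cross tabulation
import numpy as np
from scipy import stats

observed = np.array([[30, 10],    # made-up 2x2 frequency table
                     [15, 45]])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

n = observed.sum()
cramers_v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, Cramer's V = {cramers_v:.2f}")
```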

Analysis of Variance (ANOVA)

→ Analysing the effect of one or multiple independent nominal criteria (single or combined) on one metric dependent variable.
Tests whether the means of two or more populations differ (the two-sample t-test is not applicable here)

One-way ANOVA
→ Examine mean differences between more than two groups (one metric dependent variable)
Step 1: Check the assumptions
a) The dependent variable should be normally distributed

H0: The variables are normally distributed (→ does not need to be rejected!)
If n < 50 → Shapiro-Wilk
If n > 50 → Kolmogorov-Smirnov
(if the test is significant, normality can still be assumed, because n > 30!)
No significance → H0 is not rejected → normal distribution (p-value > 0.05)

b) Homogeneity of variances: the variances in the different groups of the design are identical
→ Levene test: tests the null hypothesis that the variances in the different groups are identical (→ should not be rejected!)

0.409 > 0.05 → no significance → variances are homogeneous → continue with the ANOVA output

If the variances are not homogeneous → continue with the Welch test (robust test of equality of means, which does not assume homogeneity!)

c) Independence of all observations

→ not given with repeated measures (otherwise mostly given)

Step 2: Calculate the test statistic: Output

Shows the means, minimum and maximum as well as the standard deviation of the groups

Given that the variances are homogeneous, the significance (p-value < 0.05) leads to the rejection of H0 ("There are no significant mean differences between the groups")

→ Supporting H1: there are significant differences in the means of the groups

Given that the variances are inhomogeneous, the significance (p-value < 0.05) leads to the rejection of H0 ("There are no significant mean differences between the groups")

→ Not applicable in this example, since homogeneity is given!
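
A compact sketch of steps 1b and 2 outside SPSS; the sales figures per promotional tool are made up.

```python
# Minimal sketch: Levene test followed by a one-way ANOVA (hypothetical sales per promotional tool)
from scipy import stats

sales_flyer   = [120, 135, 128, 140, 122]
sales_poster  = [150, 162, 158, 149, 155]
sales_display = [130, 128, 141, 137, 133]

lev_stat, lev_p = stats.levene(sales_flyer, sales_poster, sales_display)
print(f"Levene p = {lev_p:.3f}")   # > 0.05 -> variances homogeneous, continue with the classic ANOVA

f_stat, p_value = stats.f_oneway(sales_flyer, sales_poster, sales_display)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")   # p < 0.05 -> reject H0 (no mean differences between groups)
```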


Step 3: Carry out post-hoc tests: Output

→ Shows information about the significant differences

Gives information about the differences between the groups
→ If the p-value (Sig.) < 0.05 → significant difference in the dependent variable depending on the respective independent variable

Example: a significant difference in sales (dependent variable) depending on the form of promotional tool (independent variable).
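
SPSS offers several post-hoc procedures; as one common example (my choice, not named in the notes), Tukey's HSD can be sketched with statsmodels, reusing the hypothetical sales data from the ANOVA sketch above.

```python
# Minimal sketch: Tukey HSD post-hoc comparisons (hypothetical sales/promotion data)
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

sales = np.array([120, 135, 128, 140, 122,
                  150, 162, 158, 149, 155,
                  130, 128, 141, 137, 133])
promotion = np.array(["flyer"] * 5 + ["poster"] * 5 + ["display"] * 5)

result = pairwise_tukeyhsd(endog=sales, groups=promotion, alpha=0.05)
print(result)   # rows with reject=True mark pairs of groups with significantly different means
```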

The Homogeneous Subsets output is produced by a request for post-hoc tests and addresses the same question as the Multiple Comparisons table for post-hoc analysis, i.e. which pairs of groups have significantly different means on the dependent variable.

Shows the effects/influence of the (combined) factors on the dependent variable (e.g. the influence of promotion and store size on sales)

→ No significance → no significant influence

If Sig. < 0.05 → significant effect/influence on the dependent variable

Two-way ANOVA

→ Examine mean differences between groups defined by two factors (two nominal independent variables, one metric dependent variable)
… What is the impact of the two factors?
… Is there any interaction between factor 1 and factor 2?

Small effect

Partial eta squared: effect size (rule of thumb: take the square root and interpret it like a normal correlation)

Observed power: probability of not making a Type II error (→ the smaller the effect size, the smaller the power)

If Sig. < 0.05 → significant effect/influence on the dependent variable (no significance in this example → Sig. = 0.967 > 0.05 → no significant influence)
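
A two-way design with main effects and an interaction term can be sketched with statsmodels; the promotion/store-size data below are invented.

```python
# Minimal sketch: two-way ANOVA with interaction term (hypothetical promotion x store-size data)
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "sales":     [120, 135, 150, 162, 130, 128, 140, 122, 158, 149, 141, 137],
    "promotion": ["flyer", "flyer", "poster", "poster", "display", "display"] * 2,
    "storesize": ["small"] * 6 + ["large"] * 6,
})

model = smf.ols("sales ~ C(promotion) * C(storesize)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)   # F-tests for both main effects and the interaction
print(anova_table)
```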

Ordinary Least Squares (OLS) – Regression analysis

→ Analysis of the interrelation between one dependent variable Y (measured on a metric scale) and one or more independent variables X1, …, Xm
→ Indicates whether the independent variables have a significant relationship with the dependent variable
→ Make predictions (develop scenarios)

Example: prediction of the change in sales caused by increasing/decreasing the marketing costs

Requirements:

a) Large sample size (n ≥ 50 + 8*k to test the overall relationship (k = number of independent variables); n ≥ 104 + k to test for individual parameter effects)
b) The sample is representative of the overall population
c) The variables show variation (they are not constant)
d) The dependent variable is interval or ratio scaled (if this is not met → logistic regression)
e) No linear dependence between the independent variables (no multicollinearity)

Example: strong correlation between customer expectations and perceived value

→ Analysis of the pairwise correlations
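
Besides inspecting pairwise correlations, multicollinearity is often checked with variance inflation factors (my addition, not shown in the SPSS output); the predictors below are simulated so that two of them are strongly correlated.

```python
# Minimal sketch: variance inflation factors as a multicollinearity check (simulated predictors)
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(1)
expectations = rng.normal(size=100)
X = pd.DataFrame({
    "expectations":    expectations,
    "perceived_value": expectations + rng.normal(scale=0.2, size=100),  # strongly correlated with expectations
    "price":           rng.normal(size=100),
})

X_const = add_constant(X)
for i, name in enumerate(X_const.columns):
    if name != "const":
        # A common rule of thumb: VIF values above roughly 10 signal problematic multicollinearity
        print(name, round(variance_inflation_factor(X_const.values, i), 1))
```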
Test of model assumptions:

a) Linear model
→ Diagnosis: visual inspection

a: linear
b: log(x) transformation
c: x² transformation

b) No systematic errors

OLS will always satisfy this assumption

c) Homoscedasticity (constant variance) of the error terms

If this assumption is violated, it can be addressed by using generalized least squares (GLS) or weighted least squares (WLS) regression models

d) No autocorrelation

Autocorrelation: positive or negative correlation of the error terms over time
May occur if several observations are collected from a single respondent at different points in time
Diagnosis: Durbin-Watson test (page 328, exercise)
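
The Durbin-Watson statistic is also available outside SPSS, e.g. in statsmodels; the regression below uses simulated data.

```python
# Minimal sketch: Durbin-Watson statistic on OLS residuals (simulated data)
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
x = rng.normal(size=80)
y = 2 + 0.5 * x + rng.normal(size=80)

model = sm.OLS(y, sm.add_constant(x)).fit()
print(f"Durbin-Watson = {durbin_watson(model.resid):.2f}")  # values near 2 suggest no autocorrelation
```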

e) Normal distribution

H0: The variables are normally distributed (→ does not need to be rejected!)
If n < 50 → Shapiro-Wilk
If n > 50 → Kolmogorov-Smirnov
(if the test is significant, normality can still be assumed, because n > 30!)
No significance → H0 is not rejected → normal distribution (p-value > 0.05)

Output – model interpretation

Significance of the regression model (F-test):

The F-value is significant → the overall model is significantly appropriate

Goodness of fit (R²)

R² (between 0 and 1) shows how much of the variance in the dependent variable can be explained by the independent variables → the higher the R², the better the model fit (e.g., R² = 1 → the model perfectly estimates the observed values)

If the number of independent variables is very high, use the adjusted R²

Significance of the regression coefficients (t-tests):

The constant (y-intercept) and the regression coefficients of the different independent variables:
A significant constant (a) means that the y-intercept is not zero (y = a + β1x1 + … + βmxm)
A significant regression coefficient (β) means a significant effect of the corresponding independent variable on the dependent variable (y)
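
As a wrap-up, the whole interpretation chain (F-test, R², t-tests of the coefficients) can be reproduced with statsmodels; the marketing-cost/sales data below are simulated.

```python
# Minimal sketch: OLS regression with F-test, R^2 and coefficient t-tests (simulated data)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
marketing_costs = rng.uniform(10, 100, size=60)
sales = 50 + 1.8 * marketing_costs + rng.normal(scale=15, size=60)

X = sm.add_constant(marketing_costs)     # constant a (y-intercept) plus one independent variable
model = sm.OLS(sales, X).fit()

print(model.summary())                   # F-test of the model, R^2 / adjusted R^2, t-tests of a and beta
print(model.predict([[1.0, 120.0]]))     # scenario: predicted sales for marketing costs of 120
```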

