Basics
From descriptive statistics to histograms
Correlations
→ Quantify the relationship between two variables
Corr = 0 → no correlation
Corr > 0 → positive correlation
(|Corr| ≈ 0.1 – weak; 0.3 – moderate; 0.5 – strong)
Corr < 0 → negative correlation
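The rule of thumb above can be sketched with a small Pearson correlation computation; the data values here are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical example data: x could be advertising spend, y sales.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson correlation: covariance scaled by both standard deviations.
r = np.corrcoef(x, y)[0, 1]

# Interpret with the rule of thumb above: |r| >= 0.5 counts as strong.
strength = "strong" if abs(r) >= 0.5 else "moderate" if abs(r) >= 0.3 else "weak"
print(r, strength)
```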
One-sample t-test
If Sig. < 0.05 → significant result (reject H0, support H1)
If Sig. > 0.05 → insignificant result (we cannot reject H0)
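A minimal sketch of this decision rule with `scipy.stats.ttest_1samp`; the sample values and the hypothesized mean of 500 are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical sample: measured fill weights, testing H0: population mean = 500.
sample = np.array([498.2, 501.5, 499.9, 502.3, 497.8, 500.4, 501.1, 499.0])

t_stat, p_value = stats.ttest_1samp(sample, popmean=500.0)

# Decision rule from the notes: reject H0 only if Sig. (the p-value) < 0.05.
reject_h0 = p_value < 0.05
print(t_stat, p_value, reject_h0)
```

Here the sample mean is very close to 500, so the p-value is large and H0 is not rejected.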
Cluster Analysis
Overall objective: maximize variation between clusters, minimize variation within
clusters
Clustering variables:
- Significant differences between variables across the clusters
- Relation between sample size and the number of clustering variables (m) as well as clusters (k):
o Clusters of equal size: n_min = 10 * m * k
o General recommendation: n_min = 70 * m
- Avoid using highly correlated variables (correlation 0.9 or higher)
- Data of strong quality
Agglomeration schedule
→ Different clustering algorithms (e.g. single linkage vs. average linkage) lead to different agglomeration schedules
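This difference can be seen directly with `scipy.cluster.hierarchy.linkage`; the six 2-D observations below are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical 2-D observations forming two visually separated groups.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])

# Different linkage criteria produce different agglomeration schedules.
single_schedule = linkage(X, method="single")
average_schedule = linkage(X, method="average")

# Each schedule row: (cluster i, cluster j, merge distance, new cluster size).
print(single_schedule)
print(average_schedule)
```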
Step 3: Decide on the number of clusters
← shows the differences in variables between clusters; helps to name the clusters
Identical solution: the initial partitioning of objects was retained
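One common way to decide on the number of clusters is to compare within-cluster variation for several values of k. A minimal sketch with scikit-learn's `KMeans` on hypothetical data (the overall objective above: within-cluster variation, i.e. inertia, should be small):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data with two clearly separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])

# Within-cluster variation (inertia) for k = 1, 2, 3.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in (1, 2, 3)}

# A large drop from k=1 to k=2 and only a small one afterwards suggests k=2.
print(inertias)
```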
Factor Analysis
→ A set of methods to reduce complex data structures (typically induced by a large number of variables) by identifying a smaller number of unifying variables, called factors, that represent the original variables in the best possible way.
Anti-image covariance: the part of a variable's covariance that is independent of the other variables.
If more than 25 % of the absolute values are greater than 0.09, the variables may be inappropriate for factor analysis.
Measure of sampling adequacy: KMO criterion and Bartlett's Test of Sphericity
→ Evaluate whether the correlation matrix as a whole is suitable for factor analysis
KMO should be at least 0.6 to continue with the factor analysis
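A minimal sketch of the KMO criterion computed from scratch (the data are hypothetical, generated from one shared latent factor). The partial correlations come from the inverse of the correlation matrix; KMO compares squared correlations with squared partial correlations:

```python
import numpy as np

# Hypothetical data: four indicators driven by one common latent factor.
rng = np.random.default_rng(1)
common = rng.normal(size=200)
X = np.column_stack([common + rng.normal(scale=0.5, size=200) for _ in range(4)])

R = np.corrcoef(X, rowvar=False)
R_inv = np.linalg.inv(R)
d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
partial = -R_inv / d                       # partial correlation matrix

off = ~np.eye(R.shape[0], dtype=bool)      # off-diagonal elements only
kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())
print(kmo)  # rule of thumb above: continue with factor analysis if KMO >= 0.6
```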
à Assign each variable to a certain factor based in its maximum absolute factor loading
à Find an umbrella term for each factor that best describes the set of variables
associated with that factor
Total variance = sum of the diagonal elements in the (original) correlation matrix (i.e., correlations of the variables with themselves): R = 1+1+1+1+1 = 5
Reproduced variance = sum of the diagonal elements in the reproduced correlation matrix (i.e., the communalities): R_repr. = 0.821 + 0.818 + 0.796 + 0.902 + 0.912 = 4.25
A value > 50 % should raise concerns. If fewer than 50 % of the residuals have absolute values greater than 0.05, we can presume a good model fit!
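The arithmetic above can be reproduced directly, using the communalities stated in the notes:

```python
# Communalities from the notes; the diagonal of the reproduced correlation matrix.
communalities = [0.821, 0.818, 0.796, 0.902, 0.912]

total_variance = 5.0            # five variables, each with unit variance
reproduced = sum(communalities) # reproduced variance = sum of communalities
share = reproduced / total_variance

print(reproduced, share)        # about 4.25, i.e. roughly 85 % of total variance
```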
Cross Tabulations
→ Test the hypothesis that there is an association between two nominally scaled variables
Information about the joint frequency distribution of two variables and the absolute, relative, and expected frequencies
The more the expected frequencies ĥ_ij differ from the observed frequencies h_ij, the stronger the suspected dependency between X and Y
The likelihood ratio test is based on the maximum-likelihood method and, at large sample sizes, delivers the same result as the Pearson chi-squared test
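A minimal sketch of a cross tabulation with the Pearson chi-squared test via `scipy.stats.chi2_contingency`; the 2×2 table (e.g. gender vs. brand preference) is hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed frequencies h_ij: rows = gender, columns = brand.
observed = np.array([[30, 10],
                     [20, 40]])

# Returns the test statistic, p-value, degrees of freedom, and the
# expected frequencies under independence.
chi2, p_value, dof, expected = chi2_contingency(observed)

# The observed cells differ strongly from the expected ones here,
# so the test indicates a dependency between the two variables.
print(chi2, p_value, expected)
```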
Correlation measures
One-way ANOVA
→ Examine mean differences between more than two groups (one metric dependent variable)
Step 1: Check the assumptions
a) The dependent variable should be normally distributed
b) Variances should be homogeneous across groups
If variances are not homogeneous → continue with Welch (robust test of equality of means; it does not assume homogeneity of variances)
→ Supporting H1: there are significant differences in the means of the groups
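A minimal one-way ANOVA sketch with scipy, including Levene's test for the homogeneity assumption; the three groups of scores are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups (e.g. three advertising campaigns).
g1 = np.array([5.1, 4.9, 5.3, 5.0, 5.2])
g2 = np.array([6.8, 7.1, 6.9, 7.2, 7.0])
g3 = np.array([5.0, 5.2, 4.8, 5.1, 4.9])

# Levene's test checks homogeneity of variances (an ANOVA assumption);
# a large p-value means the assumption is not rejected.
lev_stat, lev_p = stats.levene(g1, g2, g3)

# Classic one-way ANOVA; p < 0.05 supports H1 (group means differ).
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(lev_p, f_stat, p_value)
```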
Two-way ANOVA
→ Examine mean differences between groups defined by two factors (one metric dependent variable, two nominally scaled factors)
… What is the impact of the two factors?
… Is there any interaction between factor 1 and factor 2?
Partial eta squared: effect size (rule of thumb: take the square root and interpret it like a normal correlation)
Observed power: probability of not making a Type II error (→ the smaller the effect size, the smaller the power)
If Sig. < 0.05 → significant effect/influence on the dependent variable (no significance in this example → Sig. = 0.967 > 0.05 → no significant influence)
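A minimal two-way ANOVA sketch with statsmodels; the design (factors "ad" and "region", dependent variable "sales") and all data are hypothetical, with an effect built in for the "ad" factor only:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical balanced design: two factors, one metric dependent variable.
rng = np.random.default_rng(2)
ad = np.repeat(["A", "B"], 20)
region = np.tile(np.repeat(["North", "South"], 10), 2)
sales = rng.normal(100, 5, 40) + np.where(ad == "A", 10, 0)  # only "ad" matters

df = pd.DataFrame({"ad": ad, "region": region, "sales": sales})

# Two-way ANOVA with interaction: main effects plus the ad:region term.
model = smf.ols("sales ~ C(ad) * C(region)", data=df).fit()
table = anova_lm(model, typ=2)
print(table)  # expect a significant main effect of "ad" only
```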
Requirements:
→ Analysis of pairwise correlations
Test of model assumptions:
a) Linear model
→ Diagnosis: visual inspection
a: linear
b: log(x) transformation
c: x² transformation
b) No systematic errors
c) Homoscedasticity: if the error variance is not constant, this can be addressed by using generalized least squares (GLS) or weighted least squares (WLS) regression models
d) No Autocorrelation
e) Normal distribution
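A minimal OLS sketch that checks two of the assumptions above on the residuals: normality (e) via the Shapiro-Wilk test and autocorrelation (d) via the Durbin-Watson statistic. The data are hypothetical, generated from a true linear relationship:

```python
import numpy as np
from scipy import stats

# Hypothetical data with a true linear relationship y = 2x + 1 + noise.
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + rng.normal(0, 1, 100)

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
residuals = y - (slope * x + intercept)

# e) Normal distribution of the errors: Shapiro-Wilk test on the residuals
# (a large p-value means no evidence against normality).
shapiro_p = stats.shapiro(residuals).pvalue

# d) No autocorrelation: Durbin-Watson statistic; values near 2 mean none.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(slope, shapiro_p, dw)
```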