Week 2:
Descriptive Statistics:
Mean – arithmetic average of a finite set of numbers
Calculation: mean = sum of all observed values divided by the number of observations
Mode – value that appears most often in a dataset
Median – middle value of the sorted dataset, separating the higher half of the observations from the lower half (with an even number of observations, the average of the two middle values)
Variance – expectation of the squared deviation of a random variable from its mean value
High variance indicates a lot of variation
Low variance indicates not much variation
Standard deviation – square root of the variance; measures the amount of variation in a sample in the same units as the data
Low SD indicates that values are close to the mean
High SD indicates that values are spread out over a wider range
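The measures above can be computed with Python's standard `statistics` module. This is a minimal sketch on a made-up list of scores. Note that `statistics.variance` and `statistics.stdev` use the sample formula, which divides by n - 1. The population formula (`statistics.pvariance`) divides by n.

```python
import statistics

# Hypothetical sample of seven survey scores (illustrative data only)
scores = [4, 7, 7, 8, 9, 10, 11]

mean = sum(scores) / len(scores)          # sum of all values / number of observations
median = statistics.median(scores)        # middle value of the sorted data
mode = statistics.mode(scores)            # most frequent value
sample_var = statistics.variance(scores)  # sample variance: squared deviations / (n - 1)
sample_sd = statistics.stdev(scores)      # square root of the sample variance
pop_var = statistics.pvariance(scores)    # population variance: divides by n instead

print(f"mean={mean}, median={median}, mode={mode}")
print(f"sample variance={sample_var:.3f}, SD={sample_sd:.3f}, population variance={pop_var:.3f}")
```

For these scores the mean and the median are both 8 and the mode is 7. The squared deviations from the mean sum to 32, so the sample variance is 32/6 ≈ 5.33 and the SD is about 2.31.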
Experimental design:
- One or more independent variables are manipulated (the treatment) while all other
influences remain constant
- Because it is rarely possible to hold all other influences constant, control variables can be added
- Data is collected on the outcome variable
- The effect of the treatment/manipulation on the outcome variable is investigated
Analysis of Variance:
Week 3:
Regression Analysis:
Y = a + b*X + e
Y – dependent variable
X – independent variable
a – constant/intercept (where the regression line crosses the Y axis, i.e. the predicted Y when X = 0)
b – regression coefficient (slope of the regression line)
e – error (badness of fit: distance between observations and estimated regression line)
R-squared – share of the variance in Y that is explained by X (from 0% to 100%)
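The terms of Y = a + b*X + e can be made concrete with a small from-scratch least-squares fit. This is a minimal sketch on hypothetical data. In practice a statistics package reports these values. The slope is b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², the intercept is a = ȳ - b·x̄, and R-squared is 1 - (sum of squared errors / total sum of squares).

```python
# Hypothetical data: x = advertising spend, y = sales (illustrative only)
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 7, 8]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope b and intercept a
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Errors e: distance between each observation and the fitted line
fitted = [a + b * xi for xi in x]
errors = [yi - fi for yi, fi in zip(y, fitted)]

# R-squared = 1 - (unexplained variation / total variation)
sse = sum(e ** 2 for e in errors)
sst = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(f"Y = {a:.2f} + {b:.2f}*X + e,  R-squared = {r_squared:.1%}")
```

With these numbers b = 1.2 and a = 1.8. X explains about 84% of the variance in Y.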
Data:
- Outcome/dependent variable needs to be continuous
- Independent variables can be nominal, ordinal, or interval/ratio scaled
- For independent variables that are not continuous we need to rely on dummy
variables (values of 0 and 1)
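Dummy coding can be sketched in a few lines. The category names and the helper `dummy_code` are hypothetical. A nominal variable with k categories is usually turned into k - 1 dummies. The left-out reference category is captured by the constant a. Including all k dummies together with the constant would make them perfectly collinear (the "dummy variable trap").

```python
# Hypothetical nominal variable: customer segment (illustrative data only)
segments = ["student", "professional", "retired", "student", "retired"]

def dummy_code(values, reference):
    """Return one 0/1 column per non-reference category (k - 1 dummies)."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# "student" is the reference category: it is coded 0 on every dummy
dummies = dummy_code(segments, reference="student")
for name, column in dummies.items():
    print(name, column)
```

Each dummy coefficient in the regression is then read as the difference from the reference group.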
Objectives:
- Uncovering the (assumed causal) relationship between one dependent variable and one or more independent variables
Assumption 4: Multicollinearity
- If two or more independent variables are highly correlated with each other, they explain the same variance in the dependent variable. Coefficient estimates then become unstable, standard errors are inflated, and significance tests become unreliable
- Test whether multicollinearity is an issue (correlations between independent
variables, variance inflation factors)
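A variance inflation factor for predictor j is VIF_j = 1 / (1 - R_j²). Here R_j² is the R-squared from regressing X_j on all other independent variables. Equivalently, VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix, and this minimal sketch uses that form. The data and the helper names `pearson`, `invert`, and `vif` are hypothetical. A common rule of thumb treats VIF values above 10 (some use 5) as a sign of problematic multicollinearity.

```python
def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]

# Hypothetical independent variables (illustrative data only)
price = [1, 2, 3, 4, 5]
quality = [2, 1, 4, 3, 5]
service = [3, 5, 1, 4, 2]

for name, value in zip(["price", "quality", "service"], vif([price, quality, service])):
    print(f"{name}: VIF = {value:.1f}")
```

In this made-up data the three predictors are strongly interrelated, so all three VIFs come out well above 10. With only two predictors, the formula reduces to 1 / (1 - r²), where r is their correlation.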
Interpreting regression results:
1. Look at R-squared: how well does the model explain the data?
2. Is R-squared significantly different from 0? (F-statistic in the ANOVA table)
3. Look at the coefficients: are they significant?
4. Are the coefficients positive or negative?
- To find out which independent variable has the strongest impact on satisfaction, we need to compare standardized coefficients (unstandardized coefficients depend on each variable's units)
- The independent variable with the highest absolute standardized coefficient has the greatest effect and should be the first priority when providing recommendations
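Steps 2 and 4 above can be sketched from scratch. The sketch fits OLS on z-scored variables, so the coefficients are the standardized ones. It then computes the F-statistic F = (R² / k) / ((1 - R²) / (n - k - 1)), which tests whether R-squared differs from 0, for k predictors and n observations. The driver names, the data, and the helpers `zscores`, `solve`, and `standardized_regression` are hypothetical. A statistics package reports the same quantities along with p-values.

```python
import statistics

def zscores(values):
    """Standardize a list to mean 0 and (sample) standard deviation 1."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

def solve(a, b):
    """Solve the linear system a * x = b with Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def standardized_regression(y, predictors):
    """OLS on z-scored variables: returns standardized coefficients, R-squared, F."""
    zy = zscores(y)
    zx = [zscores(col) for col in predictors]
    k, n = len(zx), len(zy)
    # Normal equations X'X beta = X'y; no intercept needed since all z-scores have mean 0
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    betas = solve(xtx, xty)
    fitted = [sum(betas[j] * zx[j][i] for j in range(k)) for i in range(n)]
    sse = sum((obs - fit) ** 2 for obs, fit in zip(zy, fitted))
    sst = sum(v ** 2 for v in zy)
    r2 = 1 - sse / sst
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return betas, r2, f_stat

# Hypothetical satisfaction drivers (illustrative data only)
satisfaction = [3, 4, 6, 5, 8, 9]
drivers = {
    "price": [2, 3, 3, 4, 5, 5],
    "quality": [1, 3, 2, 4, 4, 6],
    "service": [5, 3, 4, 2, 4, 3],
}
betas, r2, f_stat = standardized_regression(satisfaction, list(drivers.values()))
print(f"R-squared = {r2:.2f}, F = {f_stat:.2f}")
for name, beta in zip(drivers, betas):
    print(f"{name}: standardized coefficient = {beta:.2f}")
# Priority = driver with the largest absolute standardized coefficient
strongest = max(zip(drivers, betas), key=lambda pair: abs(pair[1]))[0]
print("Strongest driver:", strongest)
```

With a single predictor, the standardized coefficient equals the Pearson correlation between X and Y.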
Importance-Performance Analysis:
Week 4:
Moderation Analysis:
- A moderator variable explains differences in the effect of an independent variable on
a dependent variable
- A moderator can be a continuous or a categorical variable (dummy variable)
- Multiple moderators and three-way interactions (moderated moderation) are possible
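These notes do not specify how moderation is estimated. A common approach, assumed in this sketch, adds the product X*M as an extra regression term: Y = b0 + b1*X + b2*M + b3*(X*M) + e. A significant b3 means the effect of X depends on M. With a 0/1 dummy moderator, the slope of X is b1 in group 0 and b1 + b3 in group 1. The data is hypothetical and noise-free, generated from known coefficients so the fit can be checked. `fit_ols` is a hypothetical helper. Continuous moderators are often mean-centered before the product is formed.

```python
def fit_ols(y, columns):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    m = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (m[i][k] - sum(m[i][j] * coef[j] for j in range(i + 1, k))) / m[i][i]
    return coef

# Hypothetical noise-free data: x = independent variable, moderator = dummy (0/1)
x = [1, 2, 3, 4, 1, 2, 3, 4]
moderator = [0, 0, 0, 0, 1, 1, 1, 1]
# Generated from y = 1 + 2x + 0.5m + 1.5(x*m) so the recovered coefficients can be checked
y = [1 + 2 * xi + 0.5 * mi + 1.5 * xi * mi for xi, mi in zip(x, moderator)]

interaction = [xi * mi for xi, mi in zip(x, moderator)]
b0, b1, b2, b3 = fit_ols(y, [x, moderator, interaction])

# Effect of x on y within each moderator group ("simple slopes")
print(f"slope of x when moderator = 0: {b1:.2f}")
print(f"slope of x when moderator = 1: {b1 + b3:.2f}")
```

Here the effect of X is 2.0 in group 0 and 3.5 in group 1. The moderator changes the strength of the X-Y relationship.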
Mediation Analysis:
- A mediator explains how or why the independent variable affects the dependent variable (the mechanism through which the effect passes)
- Unlike a moderator, it does not explain differences in the strength of the main effect; it explains the (average) effect itself
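These notes do not name an estimation method for mediation. This sketch uses the classic product-of-coefficients approach. Path a is the effect of X on the mediator. Path b is the effect of the mediator on Y, controlling for X. The indirect effect is a*b, and c' is the remaining direct effect of X. For OLS on the same data, the total effect c equals c' + a*b. In practice the significance of a*b is usually judged with a bootstrapped confidence interval, which is not shown here. The data and the helper `fit_ols` are hypothetical.

```python
def fit_ols(y, columns):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    m = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (m[i][k] - sum(m[i][j] * coef[j] for j in range(i + 1, k))) / m[i][i]
    return coef

# Hypothetical data: x = independent variable, med = mediator, y = dependent variable
x = [1, 2, 3, 4, 5, 6]
med = [2, 3, 5, 4, 6, 7]
y = [3, 4, 6, 6, 8, 9]

_, a = fit_ols(med, [x])              # path a: X -> mediator
_, c_prime, b = fit_ols(y, [x, med])  # path b (mediator -> Y) and direct effect c'
_, c = fit_ols(y, [x])                # total effect c of X on Y

indirect = a * b
print(f"a = {a:.3f}, b = {b:.3f}, indirect effect a*b = {indirect:.3f}")
print(f"direct effect c' = {c_prime:.3f}, total effect c = {c:.3f}")
```

The split of the total effect into a direct part (c') and an indirect part (a*b) is what distinguishes mediation from moderation.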
Week 8:
Week 9:
Week 10: