
Week 1:

About the Module:


- Consumer data to actionable insights

Week 2:
Descriptive Statistics:
Mean – arithmetic average of a finite set of numbers
Calculation: mean = sum of all the values of the observations divided by the number of
observations
Mode – value that appears most often in a dataset
Median – middle value of a dataset separating the greater half of the observations from the
lower half of the observations
Variance – expectation of the squared deviation of a random variable from its mean value
High variance indicates a lot of variation
Low variance indicates not much variation
Standard deviation – measure of the amount of variation in a sample (the square root of the variance)
Low SD indicates that values are close to the mean
High SD indicates that values are spread out over a wider range
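
The measures above can be computed with Python's standard `statistics` module. This is a minimal sketch: the `spend` data is made up for illustration, and `statistics.variance` / `statistics.stdev` use the sample formulas (dividing by n - 1); `pvariance` / `pstdev` would give the population versions.

```python
import statistics

# Hypothetical sample: weekly spend of nine customers (illustrative values only).
spend = [12, 15, 15, 18, 20, 22, 15, 30, 24]

mean = statistics.mean(spend)          # sum of the values / number of values
mode = statistics.mode(spend)          # value that appears most often
median = statistics.median(spend)      # middle value of the sorted data
variance = statistics.variance(spend)  # sample variance (divides by n - 1)
sd = statistics.stdev(spend)           # square root of the sample variance

print("mean:", mean)
print("mode:", mode)
print("median:", median)
print("variance:", round(variance, 2))
print("standard deviation:", round(sd, 2))
```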

Obama’s campaign 2008:


- Team executed 500 A/B tests
- Testing increased donation conversions by 49% and sign-up conversions by 161%
- It brought in an additional 2,880,000 email addresses
- These translated into an additional $60 million in donations

Experimental design:
- One or more independent variables are manipulated (the treatment) while all other
influences remain constant
- Given that it is not possible that all other influences remain constant, control
variables can be added
- Data is collected on the outcome variable
- The effect of the treatment/manipulation on the outcome variable is investigated

Types of experimental designs:


- Laboratory experiment: takes place in a controlled setting where the effect of
influential but irrelevant variables is kept to a minimum
- Field experiment: takes place in a real-world setting, and participants are often not
aware that they are taking part; the manipulation is inside the control of the investigator
- Natural experiment: takes place in the real world but examines what happens when
the independent variables change naturally; the change is outside the control of the
investigator

What we are testing:


- Investigating how the dependent variable differs between two groups (difference
between mean values)
- Investigating whether the difference between the mean values of both groups is
significantly different from 0
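
One way to check whether a difference in means is significantly different from 0 can be sketched as below. The notes do not name a specific test; this sketch uses an exact permutation test because it needs only the standard library (a t-test is the more common choice in practice). The function name and the `control` / `treatment` values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Return (observed mean difference B - A, exact two-sided p-value)."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = mean(group_b) - mean(group_a)

    extreme = 0
    count = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = (total - sum_a) / n_b - sum_a / n_a
        # Count relabellings at least as extreme as the observed difference
        # (small tolerance guards against floating-point noise).
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        count += 1
    return observed, extreme / count


# Hypothetical outcome values for a control group and a treatment group.
control = [10, 12, 11, 13, 12]
treatment = [15, 17, 16, 18, 16]

diff, p_value = permutation_test(control, treatment)
print(f"difference in means = {diff:.2f}, p = {p_value:.4f}")
```

A small p-value means that a difference this large would rarely arise if group labels were assigned at random, i.e. the difference is unlikely to be 0. The enumeration grows quickly with sample size, so this form is only practical for small groups.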

Evaluating the validity of experiments:


- Internal validity: the extent to which the observed results (difference in mean values)
are caused solely by the experiment manipulation
- External validity: the extent to which the observed results hold beyond the
experimental setting (whether the results remain the same in another context)

Analysis of Variance:
Week 3:
Regression Analysis:
Y = a + b*X + e

Y – dependent variable
X – independent variable
a – constant (intercept: the point where the regression line crosses the Y axis)
b – regression coefficient (slope of the regression line)
e – error (badness of fit: distance between observations and estimated regression line)
R-squared – share of the variance in Y that is explained by X (from 0% to 100%)
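
For a single independent variable, the constant, the slope, and R-squared can be estimated by ordinary least squares. The sketch below uses the notation from the notes (a, b, e); the function name and the `ad_spend` / `sales` data are made up for illustration.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Least-squares estimates of a and b in Y = a + b*X + e, plus R-squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    b = sxy / sxx          # slope of the regression line
    a = y_bar - b * x_bar  # constant (intercept)

    # e: distance between each observation and the estimated line.
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return a, b, r_squared


# Hypothetical data: advertising spend (X) and sales (Y).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 6, 9, 10]

a, b, r2 = fit_simple_regression(ad_spend, sales)
print(f"a = {a:.2f}, b = {b:.2f}, R-squared = {r2:.3f}")
```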

Data:
- Outcome/dependent variable needs to be continuous
- Independent variables can be nominal, ordinal, or interval/ratio scaled
- For independent variables that are not continuous we need to rely on dummy
variables (values of 0 and 1)
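
Dummy coding can be sketched as follows. The helper `dummy_code` and the `channel` data are hypothetical. Leaving out one reference category (so that k categories give k - 1 dummies) is an assumption taken from common practice, not something stated in the notes.

```python
def dummy_code(values, reference):
    """Turn a categorical variable into 0/1 dummy columns.

    The reference category gets no column of its own: it is the case
    in which every dummy equals 0.
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical nominal variable with three categories.
channel = ["online", "store", "phone", "store", "online"]

dummies = dummy_code(channel, reference="store")
for name, column in dummies.items():
    print(name, column)
```

Each dummy coefficient is then read as the difference from the reference category.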

Objectives:
- Uncovering causality between one dependent variable and one or more independent
variables

Assumption 1: Linear relationship between independent and dependent variables:


- If the relationship is not linear, coefficients and standard errors can be biased leading
to incorrect significance tests
- Verify theoretically that the relationship is linear. Test whether quadratic effects are
significant
Assumption 2: Normal distribution of the error term (e):
- Significance tests rely on the normal distribution; if the error term is not
normally distributed, they may come to false conclusions
- Test whether the error term is distributed normally

Assumption 3: Omitted variables bias:


- If one or more of the relevant variables are omitted from the model, regression
coefficients and standard errors might be biased leading to incorrect significance
tests
- Include relevant control variables into your model

Assumption 4: Multicollinearity:
- If two or more independent variables are highly correlated with each other, they may
explain the same variance in the dependent variable, so that coefficient estimates
become unstable, standard errors are inflated, and significance tests become unreliable
- Test whether multicollinearity is an issue (correlations between independent
variables, variance inflation factors)
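
Both checks can be sketched for the simplest case of two independent variables, where the variance inflation factor reduces to 1 / (1 - r^2), with r the correlation between the two. The function names and rating data are hypothetical; with more than two predictors, each VIF would instead come from regressing one predictor on all the others.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two variables."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical ratings used as two independent variables.
price_rating = [1, 2, 3, 4, 5]
quality_rating = [2, 1, 4, 3, 5]

r = pearson_r(price_rating, quality_rating)
vif = vif_two_predictors(price_rating, quality_rating)
print(f"correlation = {r:.2f}, VIF = {vif:.2f}")
```

A VIF of 1 means no overlap between the predictors. Commonly cited rules of thumb treat values above roughly 5 or 10 as a warning sign, though the threshold is a convention rather than a strict cutoff.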
Interpreting regression results:
1. Look at R-squared: how well does the model explain the data?
2. Is R-squared significantly different from 0? (F-statistic in the ANOVA table)
3. Look at the coefficients: are they significant?
4. Are the coefficients positive or negative?

- To find out which independent variable has the strongest impact on the dependent
variable (e.g. satisfaction), we need to compare standardized coefficients
- The independent variable with the largest standardized coefficient in absolute value
has the greatest effect and should be the first priority when providing recommendations
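
A standardized coefficient rescales the unstandardized coefficient b by the standard deviations of the two variables: beta = b * SD(X) / SD(Y). The sketch below shows how the ranking can change after standardizing. All numbers and names (`drivers`, `sd_satisfaction`) are hypothetical, not results from the module.

```python
def standardized_coefficient(b, sd_x, sd_y):
    """Convert an unstandardized coefficient into a standardized one."""
    return b * sd_x / sd_y


# Hypothetical regression output: unstandardized coefficients and the
# standard deviation of each independent variable.
drivers = {
    "price": {"b": 0.50, "sd_x": 1.2},
    "service": {"b": 0.30, "sd_x": 2.5},
}
sd_satisfaction = 1.5  # hypothetical SD of the dependent variable

betas = {
    name: standardized_coefficient(d["b"], d["sd_x"], sd_satisfaction)
    for name, d in drivers.items()
}

# Rank by absolute size, so that strong negative effects are not overlooked.
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)
for name in ranked:
    print(f"{name}: beta = {betas[name]:.2f}")
```

Here price has the larger unstandardized coefficient, but service has the larger standardized one, which is why the comparison should be made on standardized values.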
Importance-Performance Analysis:

Week 4:
Moderation Analysis:
- A moderator variable explains differences in the effect of an independent variable on
a dependent variable
- A moderator can be a continuous or a categorical variable (dummy variable)
- Multiple moderators and three-way moderations are possible
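
A common way to model moderation, though the notes do not spell it out, is to add an interaction term: Y = a + b1*X + b2*M + b3*X*M + e. The effect of X then depends on the moderator M and equals b1 + b3*M. The sketch below uses hypothetical coefficients and a hypothetical dummy moderator (loyalty-programme member: 0 = no, 1 = yes).

```python
def conditional_slope(b_x, b_interaction, moderator_value):
    """Effect of X on Y at a given value of the moderator M.

    Assumes the model Y = a + b_x*X + b_m*M + b_interaction*X*M + e.
    """
    return b_x + b_interaction * moderator_value


# Hypothetical estimates, for illustration only.
b_x = 0.40            # effect of X when the moderator equals 0
b_interaction = 0.25  # change in the effect of X per unit of the moderator

slopes = {m: conditional_slope(b_x, b_interaction, m) for m in (0, 1)}
for m, slope in slopes.items():
    print(f"member = {m}: effect of X on Y = {slope:.2f}")
```

If the interaction coefficient is not significantly different from 0, there is no evidence that the effect of X differs across values of the moderator.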
Mediation Analysis:
- A mediator explains the relationship between the independent variable and the
dependent variable
- Unlike a moderator, it does not explain differences in the effect but explains how
the average effect comes about

Quality → Customer Satisfaction → Repurchase Intention
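
The chain above can be written as two regressions. This is a sketch under the usual regression-based view of mediation, which the notes do not state explicitly. The symbols $p_1$, $p_2$, $c'$, $i_1$, $i_2$ are introduced here for illustration and are unrelated to the a and b of the Week 3 regression equation.

```latex
\begin{align}
\text{Satisfaction} &= i_1 + p_1\,\text{Quality} + e_1 \\
\text{Repurchase}   &= i_2 + c'\,\text{Quality} + p_2\,\text{Satisfaction} + e_2 \\
\text{Indirect effect} &= p_1 \cdot p_2 \\
\text{Total effect}    &= c' + p_1 \cdot p_2
\end{align}
```

The indirect effect is the part of the effect of Quality on Repurchase Intention that runs through Customer Satisfaction; $c'$ is the direct effect that remains once the mediator is in the model.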


Week 5:
Week 6:
Week 7:

Week 8:

Week 9:

Week 10:
