
A/B-TESTS

WHY A/B-TESTS?
We make a change → effect?
What is the problem with this?

⏰ Time: we make a change, then compare the period before with the period after.

But many other things change between "before" and "after":
📆 Weekday
🎉 Holidays
🛠 Product changes
📺 Content changes in the product
📰 Press
🌦 Weather
👻 Something is trending on social media
📢 Marketing
💥 Campaign

We see statistical significance in the effect, but which change caused it?
With an A/B-test the change exists simultaneously with the control group, and therefore minimizes the number of variables that can affect the outcome.

Group A: gets no change (the control group)
Group B: gets a change

Our change → statistical significance in the effect.
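A common way to split users into groups A and B is to assign them deterministically, for example by hashing the user ID together with an experiment name. This is a hypothetical sketch, not a method from the deck; the function and parameter names are illustrative.

import hashlib

def assign_group(user_id: str, experiment: str, variants=("A", "B")) -> str:
    # Hash the user id together with the experiment name so the assignment
    # is stable across sessions, while different experiments get independent splits.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user always lands in the same group for a given experiment.
print(assign_group("user-123", "new-product-image"))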
WHEN A/B-TESTS?
Different purposes

1. PURE OPTIMIZATION
Which version gives the best effect?
Example: product image, title optimization.

2. RELEASES
To make sure a "package of changes" doesn't have a negative/unexpected effect.

3. VALIDATE A HYPOTHESIS
A/B tests are perfect for this!
The primary reason to work with A/B tests.
If we make [change], then we'll achieve [effect].

VALIDATED LEARNING
Learning by validating the connection between change and effect.
TIPS & PITFALLS
FAIL: Not achieving statistical significance within a reasonable time.

Make sure there is enough traffic for your A/B test.
The amount of traffic needed depends on:
• Number of variants
• Baseline
• Minimum Detectable Effect (MDE)

Use a sample size calculator (a rough version of the calculation is sketched after these links):
• https://conversionxl.com/ab-test-calculator/
• https://www.optimizely.com/sample-size-calculator/
• https://abtestguide.com/bayesian/
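For reference, a minimal sketch of the kind of calculation such calculators do for a classic two-variant, two-proportion test. The defaults (alpha = 0.05, power = 0.8) and the example numbers are assumptions, not values from the deck.

from statistics import NormalDist

def sample_size_per_variant(baseline: float, mde: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Rough sample size per variant for a two-proportion test.

    baseline: expected conversion rate of the control group (e.g. 0.05)
    mde:      minimum detectable effect as an absolute lift (e.g. 0.01)
    """
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2) + 1

# Example: 5% baseline, detect an absolute lift of 1 percentage point.
print(sample_size_per_variant(0.05, 0.01))  # about 8,150 users per variant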
FAIL: Not knowing how to evaluate the A/B test; lacking a clear, relevant and measurable goal.

Make sure there are clear goals and KPIs that you can affect.
To know what effect we achieve, we need to measure the effect with relevant KPIs.
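One way to evaluate a conversion-style KPI is a two-proportion z-test. This is a sketch under the assumption that the KPI is a simple conversion rate; the numbers are made up, and the Bayesian calculator linked above is an alternative approach.

from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in conversion rate between A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Made-up numbers: 500/10,000 conversions in A vs 570/10,000 in B.
print(two_proportion_p_value(500, 10_000, 570, 10_000))  # ≈ 0.028, significant at the 5% level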
FAIL: We achieve positive results from our A/B tests, but don't realise we're suboptimizing the product, or cannibalising other parts.

Don't forget to align with overall goals.
Make sure you have measurable overall goals.
FAIL: Not knowing what actually affected the results because of too many changes. Hard to learn anything.

Only test one change per variant in order to be able to understand what change actually affects the result.
You can afford to learn when you have enough traffic.
If you're nowhere near your goal, you just need to innovate and change everything.
FAIL: Slow learning tempo and small learning leaps.

Test wide. Don't chicken out!
To maximize and speed up learning, try many different variants, different types of solutions, big differences instead of small details.
Small changes = small effects = small learnings.
Big changes = big effects = big learnings.
FAIL: It takes a long time to build/create A/B tests. Too complex solutions slow down the learning tempo.

Build as little as possible.
The change should be good enough to show real users, but don't forget it's just an experiment.
Only when we've validated a solution do we build it "for real"; otherwise we risk unnecessary waste.
FAIL: Incorrect results (and decisions) because of short or incomplete experiment periods.

WEEKLY VARIATIONS
[Chart: a metric plotted over several weeks, with each Sunday marked to show the weekly variation]
1. Always test whole cycles, and at least two cycles.
In this case: test for whole weeks, and at least two full weeks, due to weekly variations.
THE CHANGE CURVE
[Chart: conversion rate over time after a change; users react to the change (😱 🤬) before the curve stabilizes]
2. Test long enough for the change curve to stabilize.
May vary slightly depending on context and type of change.
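A small sketch of how the traffic requirement and the whole-cycles rule combine into a test duration. The helper name and example numbers are made up.

import math

def test_duration_days(sample_per_variant: int, variants: int,
                       daily_traffic: int, cycle_days: int = 7,
                       min_cycles: int = 2) -> int:
    """Days needed to collect enough traffic, rounded up to whole cycles,
    and never fewer than two full cycles (here: weeks)."""
    days_for_traffic = math.ceil(sample_per_variant * variants / daily_traffic)
    cycles = max(math.ceil(days_for_traffic / cycle_days), min_cycles)
    return cycles * cycle_days

# Made-up example: ~8,200 users per variant, 2 variants, 2,000 eligible visitors per day.
print(test_duration_days(8_200, 2, 2_000))  # 14 days: two full weeks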
FAIL: Incorrect results (and decisions) because of the influence of external variables.

Even with Group A (gets no change) and Group B (gets a change), external variables can still influence the results:
📆 Weekday
🎉 Holidays
🛠 Product changes
📺 Content changes in the product
📰 Press
💥 Campaign
👻 Something is trending on social media
🌦 Weather
📢 Marketing

Think about what might affect the results.
Are we testing under good conditions? Is the period representative of "normal use" of the service?
FAIL: Assuming that a positive outcome of an A/B test directly means that the hypothesis is validated.

BIAS
• There may be biases in the data, e.g. that the groups have been unevenly distributed.
• There may also be biases in us who read the data, e.g. that we actively sought out data that fit preconceived conclusions (consciously or unconsciously).
• Research as if you are wrong.

Do a follow-up test! To ensure reproducibility and exclude bias.
A/A-TEST

• When: a new A/B test tool, or A/B testing on a new platform.
• A/A testing ensures that the distribution of users in groups really is random and representative.
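One check often run on A/A data is a sample-ratio check: do the group sizes match the intended 50/50 split? A minimal sketch with made-up numbers, using a normal approximation of the binomial; the function name is illustrative.

from math import sqrt
from statistics import NormalDist

def sample_ratio_p_value(n_a: int, n_b: int) -> float:
    """Two-sided p-value for the group sizes deviating from a 50/50 split."""
    z = (n_a - n_b) / sqrt(n_a + n_b)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Made-up A/A run: the split should be close to 50/50.
print(sample_ratio_p_value(10_210, 9_790))  # ≈ 0.003: this imbalance would be a warning sign

The conversion-rate test from the evaluation sketch can also be run on the A/A data itself; if the groups really are comparable, it should not come out significant.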
SUMMARY
CHECKLIST BEFORE A/B TESTING

• Do we have enough traffic for statistical certainty in the results?
• How do we measure/evaluate the A/B test? Is there a suitable measuring point?
• Do all measuring points work?
• Do we know what we're testing? Are we testing widely?
• Have we included an overall goal?
• Will we run the A/B test long enough?
• Is there anything that can affect the result? Is it an appropriate time to run the test?

CHECKLIST AFTER A/B TESTING

• Do a follow-up test if the results show potential.
• Document learnings, share learnings, demonstrate learnings! So that learnings can be used as input for future A/B tests.
REMEMBER

• Very few A/B tests will be WINS.


• Test as much and as fast as you can to maximize learnings.
• Learning how to formulate good hypotheses, tests and measuring
points is also part of learning.
• Make sure to create the conditions to be able to test as much as
you can. We need tools and skills to lower the thresholds.
• Try to get analysts/growth people on the team. They are needed throughout the pipeline when working with validated learning.
• Make sure you also learn from your "fails".
THANK YOU!
Jasmin Yaya
