Challenges in A/B Testing Mobile Native Apps

Eric Florenzano, RuPy Brno 2012

Why am I interested in this?

• Started a company to make iOS apps
• Ended up building A/B testing tools
• Twitter acquired our IP
• All open sourced: github.com/clutchio

Show of hands

• Who does “web stuff?”
• Keep your hand up if you’ve done A/B testing on the web.

Show of hands

• Who does iOS or Android apps?
• Keep your hand up if you’ve done A/B testing on these apps.

What is A/B Testing?

• Show users different variations of your app
• Track those users and their actions
• Analyze their actions to find the most effective variation

Examples of A/B Tests
• Button text: do people understand “Register” better than “Sign Up”?
• On a mobile website, try replacing the e-mail field with a phone number field and verifying by text message.
• Does sign-up conversion go up or down if we give people more introductory reading material?

A/B Tests on the Web Today

• Backend determines A/B test bucket
• Front-end changes display

Example of A/B Test on Web
{% ab user "phone-signup" %}
  <input name="phone" type="tel" placeholder="Phone Number" />
{% else %}
  <input name="email" type="email" placeholder="E-Mail Address" />
{% endab %}
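
On the backend, one common way to back a tag like `ab` is to hash the user and the test name into a bucket, so the same user always gets the same answer without a lookup. A minimal Python sketch, assuming a hypothetical `ab_bucket` helper and a 50/50 split (neither comes from the talk):

```python
import hashlib

def ab_bucket(user_id, test_name, fraction=0.5):
    """Return True if this user falls in the test variation.

    Hypothetical helper: hashing user and test name together keeps the
    answer stable across requests without storing anything.
    """
    key = f"{test_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    point = int.from_bytes(digest[:4], "big") / 2**32
    return point < fraction
```

Including the test name in the hash means a user's bucket in one test says nothing about their bucket in another.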

Now Mobile

if (ab(user, @"phone-signup")) {
    label.text = @"Phone Number";
} else {
    label.text = @"E-Mail Address";
}

However...

• How does this ab() function work on mobile?
• What can our latency be on this call?
• Can it talk to a database?
• What happens when you are offline?

Solution: Manifest Upfront

• Download manifest of all A/B tests on launch
• Client must be smart enough to make its own decision, immediately, offline
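
To make the idea concrete, here is a rough Python sketch of a client deciding locally from a manifest. The manifest layout (`variations`, `name`, `weight`) and the `choose_variation` function are assumptions for illustration; the talk does not specify a format.

```python
import json
import random

# Hypothetical manifest format, as it might arrive from the server.
MANIFEST_JSON = """
{"phone-signup": {"variations": [
    {"name": "email", "weight": 3},
    {"name": "phone", "weight": 1}
]}}
"""

def choose_variation(manifest, test_name, rng=random):
    """Pick a variation locally using the weights in the manifest."""
    variations = manifest[test_name]["variations"]
    names = [v["name"] for v in variations]
    weights = [v["weight"] for v in variations]
    return rng.choices(names, weights=weights, k=1)[0]

manifest = json.loads(MANIFEST_JSON)
print(choose_variation(manifest, "phone-signup"))
```

No network call happens at decision time, so the choice is immediate and works offline.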

Problem: First Launch
• Manifest downloads asynchronously at launch
• What about the first launch?
• What about A/B tests on that first screen?
• One possible answer: bundle a manifest with the app. It sucks, though: it adds an extra step to the build process.
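
The bundled-manifest fallback might look something like this in Python. Both file paths and the `load_manifest` name are hypothetical; the point is only the lookup order, which never waits on the network.

```python
import json
import os

def load_manifest(cached_path, bundled_path):
    """Return the best manifest available without touching the network.

    cached_path is where a previous launch saved the downloaded manifest;
    bundled_path is the copy shipped inside the app (both hypothetical).
    """
    for path in (cached_path, bundled_path):
        if os.path.exists(path):
            try:
                with open(path, encoding="utf-8") as f:
                    return json.load(f)
            except (OSError, ValueError):
                continue  # unreadable or corrupt; try the next source
    return {}  # no manifest at all: run no tests, show defaults
```

On first launch only the bundled copy exists, so tests on the first screen still get a decision.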

Goal Tracking

• How do we determine success or failure?
• When is a test completed?

Goal Tracking

goal_reached(user, "phone-signup")

Goal Tracking

• Must have capability on frontend and backend
• Frontend example: registration page completed successfully
• Backend example: e-mail verified successfully

Goal Tracking Mobile Gotcha

What happens when the phone is offline?

Goal Tracking on Mobile

• Store all results in a local phone database
• Upload all of the information periodically
• If possible, upload everything when the app is quit
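
A small Python sketch of that store-then-upload loop, using SQLite as the local database. The table layout, the function names, and the caller-supplied `upload` callback are all assumptions made for illustration.

```python
import sqlite3
import time
import uuid

def open_queue(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id TEXT PRIMARY KEY, test TEXT, kind TEXT, ts REAL)"
    )
    return db

def record(db, test_name, kind):
    """Store an event locally; this works with no network."""
    db.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), test_name, kind, time.time()),
    )
    db.commit()

def flush(db, upload):
    """Send pending events; delete them only if the upload succeeds."""
    rows = db.execute("SELECT id, test, kind, ts FROM events").fetchall()
    if not rows:
        return 0
    payload = [dict(zip(("id", "test", "kind", "ts"), r)) for r in rows]
    if not upload(payload):  # upload is a caller-supplied function
        return 0  # keep everything and retry on the next flush
    db.executemany("DELETE FROM events WHERE id = ?", [(r[0],) for r in rows])
    db.commit()
    return len(rows)
```

Calling `flush` on a timer and again when the app quits covers the periodic and on-quit uploads; each event carries its own id so the server can recognize a repeat.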

Note: Complications

• May receive some data twice, so you have to query to double-check first
• May receive very old data, so the working set of data is now very wide
• How long is too long to wait for data to come back?
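
One way to handle both complications on the server is sketched below in Python. Note that it swaps the query-first check for a unique event id with `INSERT OR IGNORE`, which has the same effect in a single statement. The schema, the `ingest` function, and the 90-day cutoff are assumptions, not something the talk prescribes.

```python
import sqlite3

def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE results (event_id TEXT PRIMARY KEY, test TEXT, ts REAL)"
    )
    return db

def ingest(db, events, now, max_age_seconds=90 * 24 * 3600):
    """Insert uploaded events, skipping duplicates and stale data.

    Returns the number of events actually stored.
    """
    stored = 0
    for e in events:
        if now - e["ts"] > max_age_seconds:
            continue  # too old to count; the cutoff is an arbitrary assumption
        cur = db.execute(
            "INSERT OR IGNORE INTO results VALUES (?, ?, ?)",
            (e["id"], e["test"], e["ts"]),
        )
        stored += cur.rowcount  # 0 when the id was already present
    db.commit()
    return stored
```

The age cutoff is where the "how long is too long" question turns into a number someone has to pick.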

Side-Note: User/Test Consistency
• It’s important for a user to have a consistent experience
• Once a user is placed in a test bucket, they should remain there for that session
• How long is a “session”? On mobile, a good heuristic is “forever, until they update their app.”
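
A minimal Python sketch of pegging a user to a bucket until the app version changes. The JSON state file, its path, and the `sticky_variation` function are hypothetical.

```python
import json
import os
import random

def sticky_variation(path, app_version, test_name, names, weights):
    """Return the stored bucket for this test, choosing one only once.

    Assignments live in a small JSON file and are discarded when
    app_version changes, per the "until they update" heuristic.
    """
    state = {"version": app_version, "buckets": {}}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            saved = json.load(f)
        if saved.get("version") == app_version:
            state = saved
    if test_name not in state["buckets"]:
        # First time this test is seen on this version: decide and persist.
        state["buckets"][test_name] = random.choices(names, weights=weights)[0]
        with open(path, "w", encoding="utf-8") as f:
            json.dump(state, f)
    return state["buckets"][test_name]
```

The random choice runs once per test per app version; every later call reads the saved answer.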

So...Minimum Requirements
• Download manifest up-front
• Make weighted random decisions offline
• Track goals and store progress in a local db
• Peg users to the same bucket during the session
• Periodically upload progress

Problem: Slow Release Cycle

• Each release might take weeks or even months
• A/B testing at this time scale is frustrating
• Can anything be done to improve it?

Solution: Parameterized Tests

• Instead of a simple boolean if-else, pass back data
• Now you can change the tests on the server
• Still need to think ahead, but this can add lots of flexibility

Parameterized Test Example

[AB test:@"login" data:^(NSDictionary *data) {
    btn.title = [data objectForKey:@"title"];
}];

Note

• Remember, this decision still needs to be made instantly, and offline
• So all of this data now becomes part of the manifest
• Data must be kept compact, so it can’t store e.g. a lot of binary
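
As a rough illustration of data riding along in the manifest, here is a Python sketch. The manifest layout, the example titles, and the `variation_data` function are assumptions; the built-in defaults cover keys the server never sent.

```python
# Hypothetical manifest entry: each variation carries a small data payload.
MANIFEST = {
    "login": {
        "variations": [
            {"name": "control", "weight": 1, "data": {"title": "Log In"}},
            {"name": "friendly", "weight": 1, "data": {"title": "Welcome back"}},
        ]
    }
}

def variation_data(manifest, test_name, variation, defaults):
    """Return the data for a variation, falling back to built-in defaults."""
    for v in manifest.get(test_name, {}).get("variations", []):
        if v["name"] == variation:
            # Server-supplied values override the defaults compiled into the app.
            return {**defaults, **v.get("data", {})}
    return dict(defaults)
```

Changing a title on the server then only requires a new manifest, not a new release, as long as the app already reads that key.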

Interpreting the Data

• We care about two things:
• Which variation is winning?
• How confident are we about it?

Which variation is winning?

• Can easily calculate a ratio for each variation:
• How many people have seen this variation?
• How many of those people have reached the goal?
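
In Python the ratio is a one-liner. The counts below are made up purely to show the shape of the calculation.

```python
def conversion_rate(seen, reached):
    """Fraction of users who saw a variation and then reached the goal."""
    if seen == 0:
        return 0.0
    return reached / seen

# Made-up counts: (users who saw the variation, users who reached the goal)
counts = {"email": (1000, 120), "phone": (980, 147)}
rates = {name: conversion_rate(s, r) for name, (s, r) in counts.items()}
leader = max(rates, key=rates.get)
```

The leader on raw ratio is only a candidate; whether the gap is real is the confidence question.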

How confident are we about it?
• Statistics. My worst subject in school.
• Must choose a “p-value” threshold: the higher the threshold, the fewer results you need, but the lower the accuracy
• Now compute a confidence interval
• I’m told the Agresti-Coull interval is a good choice for calculating the confidence interval
• The open source JavaScript ABBA library is great!

ABBA Example
http://www.thumbtack.com/labs/abba/

Questions?

• @ericflo on Twitter
• github.com/clutchio