You are on page 1of 7

Introduction

A/B tests are very commonly performed by data analysts and data scientists. It is important that you
get some practice working with the difficulties of these.

For this project, you will be working to understand the results of an A/B test run by an e-commerce
website. Your goal is to work through this notebook to help the company understand if they should
implement the new page, keep the old page, or perhaps run the experiment longer to make their
decision.

As you work through this notebook, follow along in the classroom and answer the
corresponding quiz questions associated with each question. The labels for each classroom
concept are provided for each question. This will assure you are on the right track as you work
through the project, and you can feel more confident in your final submission meeting the criteria. As
a final check, assure you meet all the criteria on the RUBRIC.

Part I - Probability

To get started, let's import our libraries.


In [2]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
random.seed(100)
%matplotlib inline

1. Now, read in the ab_data.csv data. Store it in df. Use your dataframe to answer the


questions in Quiz 1 of the classroom.

a. Read in the dataset and take a look at the top few rows here:
In [3]:
# take a look at the first five rows of the data
df = pd.read_csv('/home/kesci/input/ab_testing_data9749/ab_data.csv')
df.head()
Out[3]:
l
a
t c
n
u i o
d
s m g n
i
e e r v
n
r s o e
g
_ t u r
_
i a p t
p
d m e
a
p d
g
e

2
0
1
7
-
0
1
-
2
1 o
c
8 l
o
5 2 d
n
1 2 _
0 t 0
1 : p
r
0 1 a
o
4 1 g
l
: e
4
8
.
5
5
6
7
3
9

1 8 2 c o 0
0 0 o l
4 1 n d
l
a
t c
n
u i o
d
s m g n
i
e e r v
n
r s o e
g
_ t u r
_
i a p t
p
d m e
a
p d
g
e

7
-
0
1
-
1
2

0
8 _
t
2 : p
r
2 0 a
o
8 1 g
l
: e
4
5
.
1
5
9
7
3
9

2 6 2 t n 0
6 0 r e
1 1 e w
5 7 a _
9 - t p
0 0 m a
l
a
t c
n
u i o
d
s m g n
i
e e r v
n
r s o e
g
_ t u r
_
i a p t
p
d m e
a
p d
g
e

1
-
1
1

1
6
:
5
e
5 g
n
: e
t
0
6
.
1
5
4
2
1
3

3 8 2 t n 0
5 0 r e
3 1 e w
5 7 a _
4 - t p
1 0 m a
1 e g
- n e
0
l
a
t c
n
u i o
d
s m g n
i
e e r v
n
r s o e
g
_ t u r
_
i a p t
p
d m e
a
p d
g
e

1
8
:
2
8
:
0 t
3
.
1
4
3
7
6
5

4 8 2 c o 1
6 0 o l
4 1 n d
9 7 t _
7 - r p
5 0 o a
1 l g
- e
2
1

0
l
a
t c
n
u i o
d
s m g n
i
e e r v
n
r s o e
g
_ t u r
_
i a p t
p
d m e
a
p d
g
e

1
:
5
2
:
2
6
.
2
1
0
8
2
7

b. Use the below cell to find the shape of rows in the dataset.
In [4]:
# take a look at the shape of the data
df.shape
Out[4]:
(294478, 5)

In [5]:
# take a look at the data types of the columns
df.dtypes
Out[5]:
user_id int64
timestamp object
group object
landing_page object
converted int64
dtype: object
c. The number of unique users in the dataset.
In [6]:
# the number of the unique user
# len(df['user_id'].unique())
df['user_id'].nunique()
Out[6]:
290584

In [7]:
# take a look at the description of the data
# df.describe()

d. The proportion of users converted.


In [8]:
# the proportion of user converted
df[df['converted'] == 1]['user_id'].nunique() / df['user_id'].nunique()
Out[8]:
0.12104245244060237

You might also like