Professional Documents
Culture Documents
Cohort Analysis
Cohort Analysis
0.1.3 Dataset
The dataset contains user interaction data, including metrics such as the number of new and
returning users, and their engagement durations on Day 1 and Day 7. Key columns in the dataset
are: * Date: The specific dates of user interactions. * New Users: The count of new users for
each date. * Returning Users: The count of users returning on each date. * Duration Day 1: The
average duration (possibly in minutes or seconds) of user interaction on their first day. * Duration
Day 7: The average duration of user interaction on their seventh day.
0.1.4 Task:
1. Identify trends in the acquisition of new users and the retention of returning users on a weekly
basis.
2. Understand how user engagement, as indicated by the average duration of interaction, evolves
from the first day to the seventh day of usage.
1
3. Detect any significant weekly patterns or anomalies in user behavior and engagement, and
investigate the potential causes behind these trends.
4. Explore the relationship between user retention (returning users) and engagement (duration
metrics), to assess the effectiveness of user engagement strategies.
5. Provide actionable insights that can guide marketing efforts, content strategies, and user
experience improvements.
[16]: Date New users Returning users Duration Day 1 Duration Day 7
0 25/10/2023 3461 1437 202.156977 162.523809
1 26/10/2023 3777 1554 228.631944 258.147059
2 27/10/2023 3100 1288 227.185841 233.550000
3 28/10/2023 2293 978 261.079545 167.357143
4 29/10/2023 2678 1082 182.567568 304.350000
Date 0
New users 0
Returning users 0
Duration Day 1 0
Duration Day 7 0
dtype: int64
Date object
New users int64
Returning users int64
Duration Day 1 float64
Duration Day 7 float64
dtype: object
2
0.2 Data Preparation
The Date column is in object (string) format. For effective analysis, I would convert this to a
datetime format.
[19]: # Convert 'Date' column to datetime format
data['Date'] = pd.to_datetime(data['Date'], format='%d/%m/%Y')
Duration Day 7
count 30.000000
mean 136.037157
min 0.000000
25% 68.488971
50% 146.381667
75% 220.021875
max 304.350000
std 96.624319
• New Users: The average number of new users is around 3,418 with a standard deviation of
approximately 677. The minimum and maximum new users recorded are 1,929 and 4,790,
respectively.
• Returning Users: On average, there are about 1,353 returning users, with a standard deviation
of around 247. The minimum and maximum are 784 and 1,766, respectively.
• Duration Day 1: The average duration on the first day is about 208 seconds with a consider-
able spread (standard deviation is around 65).
• Duration Day 7: The average 7-day duration is lower, around 136 seconds, with a larger
standard deviation of about 97. The range is from 0 to 304
# New Users
3
plt.plot(data['Date'], data['New users'], marker='o', linestyle='-', label='New␣
↪Users')
# Returning Users
plt.plot(data['Date'], data['Returning users'], marker='o', linestyle='-',␣
↪label='Returning Users')
# Add legend
plt.legend()
# Show plot
plt.grid(True)
plt.tight_layout()
plt.show()
4
0.4 Trend of duration over time
[10]: # Plot
plt.figure(figsize=(10, 6))
# Duration Day 1
plt.plot(data['Date'], data['Duration Day 1'], marker='o', linestyle='-',␣
↪label='Duration Day 1')
# Duration Day 7
plt.plot(data['Date'], data['Duration Day 7'], marker='o', linestyle='-',␣
↪label='Duration Day 7')
# Add legend
plt.legend()
# Show plot
plt.grid(True)
plt.tight_layout()
plt.show()
5
0.5 Correlation
[11]: # Correlation matrix
correlation_matrix = data.corr()
• The strongest correlation is between the number of new and returning users, indicating a
potential trend of new users converting to returning users.
6
0.6 Cohorts
Grouping the data by the week of the year to create cohorts. Then, for each cohort (week), I’ll
calculate the average number of new and returning users, as well as the average of Duration Day 1
and Duration Day 7.
0.7 Grouping the data by week and calculating the necessary metrices
[12]: # Grouping data by week
data['Week'] = data['Date'].dt.isocalendar().week
weekly_averages.head()
[12]: Week New users Returning users Duration Day 1 Duration Day 7
0 43 3061.800000 1267.800000 220.324375 225.185602
1 44 3503.571429 1433.142857 189.088881 168.723200
2 45 3297.571429 1285.714286 198.426524 143.246721
3 46 3222.428571 1250.000000 248.123542 110.199609
4 47 4267.750000 1616.250000 174.173330 0.000000
0.8 Weekly average of the new and returning users and the duration
[13]: # Plot 1: Weekly Average of New vs. Returning Users
plt.figure(figsize=(10, 6))
# New users
plt.plot(weekly_averages['Week'], weekly_averages['New users'], marker='o',␣
↪linestyle='-', label='New users')
# Returning users
plt.plot(weekly_averages['Week'], weekly_averages['Returning users'],␣
↪marker='o', linestyle='-', label='Returning users')
# Add legend
7
plt.legend()
# Show plot
plt.grid(True)
plt.tight_layout()
plt.show()
# Duration Day 1
plt.plot(weekly_averages['Week'], weekly_averages['Duration Day 1'],␣
↪marker='o', linestyle='-', label='Duration Day 1')
# Duration Day 7
plt.plot(weekly_averages['Week'], weekly_averages['Duration Day 7'],␣
↪marker='o', linestyle='-', label='Duration Day 7')
# Add legend
plt.legend()
# Show plot
plt.grid(True)
plt.tight_layout()
plt.show()
8
0.9 Calculating Metrics
Determining which metrics to analyze over time. Common metrics for cohort analysis include
retention rate, average duration, etc.
9
0.10 Cohort chart to understand the cohort matrix of weekly averages
In the cohort chart, each row will correspond to a week of the year, and each column will represent
a different metric.
• Average number of new users.
• Average number of returning users.
• Average duration on Day 1.
• Average duration on Day 7.
[14]: # Creating a cohort matrix
cohort_matrix = weekly_averages.set_index('Week')
We can see that the number of new users and returning users fluctuates from week to week. Notably,
there was a significant increase in both new and returning users in Week 47.
10
The average duration of user engagement on Day 1 and Day 7 varies across the weeks. The durations
do not follow a consistent pattern about the number of new or returning users, suggesting that other
factors might be influencing user engagement.
Benefits of Cohort Analysis
• Cohort analysis helps group customers for targeted strategies.
• Reveals patterns in user behavior over time.
• Informs improvements based on cohort responses.
• Evaluates campaign effectiveness for optimized resource allocation.
• Guides strategies for improved customer loyalty and retent.
[ ]:
11