DPDZero Assessment

The dataset contains loan data for borrowers in a loan portfolio.
The columns are as follows:
Amount Pending - The EMI amount
State - The borrower’s state
Tenure - Total tenure of the borrower’s loan
Interest Rate - Interest rate
City - The borrower’s city
Bounce String
• A string that describes the customer’s bounce behaviour since the disbursal of the loan. A bounce means that the customer did not make the payment.
• S or H - No bounce in that month
• B or L - Bounce in that month
• FEMI - First EMI; no known behaviour
• The first character denotes the first month on book and the last character denotes the most recent month. For example, SSB means that the customer has been on book for 3 months and bounced in the last month.
Disbursed Amount - The total disbursed amount
Loan Number - The loan number
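The bounce string encoding described above can be illustrated with a small helper. This is a minimal sketch and not part of the original notebook: the function name `summarize_bounce_string` and the sample strings are hypothetical, and returning `None` for FEMI is an assumption about how "no known behaviour" could be represented.

```python
def summarize_bounce_string(bounce_string: str) -> dict:
    """Decode one bounce string into simple facts (illustrative helper)."""
    if bounce_string == "FEMI":
        # First EMI: no repayment behaviour is known yet.
        return {"months_on_book": None, "bounces": None, "bounced_last_month": None}
    # Treat L like B (bounce) and H like S (no bounce).
    normalized = bounce_string.replace("L", "B").replace("H", "S")
    return {
        "months_on_book": len(normalized),          # one character per month
        "bounces": normalized.count("B"),           # total missed payments
        "bounced_last_month": normalized.endswith("B"),
    }


print(summarize_bounce_string("SSB"))
print(summarize_bounce_string("HLS"))
```

The last character is the most recent month, so `endswith("B")` is the check for a bounce in the latest month.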
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import os
df = pd.read_csv(r"C:\Users\Mohit\Downloads\Data_Analyst_Assignment_Dataset.csv")
df.head()
   Amount Pending      State  Tenure  Interest Rate       City Bounce String  \
0             963  Karnataka      11           7.69  Bangalore           SSS
1            1194  Karnataka      11           6.16  Bangalore           SSB
2            1807  Karnataka      14           4.24     Hassan           BBS
3            2451  Karnataka      10           4.70  Bangalore           SSS
4            2611  Karnataka      10           4.41     Mysore           SSB

   Disbursed Amount Loan Number
0             10197       JZ6FS
1             12738       RDIOY
2             24640       WNW4L
3             23990       6LBJS
4             25590       ZFZUA
df.shape
(24582, 8)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24582 entries, 0 to 24581
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Amount Pending 24582 non-null int64
1 State 24582 non-null object
2 Tenure 24582 non-null int64
3 Interest Rate 24582 non-null float64
4 City 24582 non-null object
5 Bounce String 24582 non-null object
6 Disbursed Amount 24582 non-null int64
7 Loan Number 24582 non-null object
dtypes: float64(1), int64(3), object(4)
memory usage: 1.5+ MB
df.columns
Index(['Amount Pending', 'State', 'Tenure', 'Interest Rate', 'City',
       'Bounce String', 'Disbursed Amount', 'Loan Number'],
      dtype='object')
df.describe(include='all')
        Amount Pending        State        Tenure  Interest Rate   City  \
count     24582.000000        24582  24582.000000   24582.000000  24582
unique             NaN            7           NaN            NaN    186
top                NaN  Maharashtra           NaN            NaN   Pune
freq               NaN         6793           NaN            NaN   1780
mean       1791.172687          NaN      9.415263       0.934960    NaN
std         937.565507          NaN      3.238904       3.114732    NaN
min         423.000000          NaN      7.000000       0.000000    NaN
25%        1199.000000          NaN      8.000000       0.000000    NaN
50%        1593.000000          NaN      8.000000       0.000000    NaN
75%        2083.000000          NaN     11.000000       0.000000    NaN
max       13349.000000          NaN     24.000000      37.920000    NaN

       Bounce String  Disbursed Amount Loan Number
count          24582      24582.000000       24582
unique           413               NaN       24579
top                S               NaN       T7WLO
freq            3615               NaN           2
mean             NaN      17705.195468         NaN
std              NaN      14192.671509         NaN
min              NaN       2793.000000         NaN
25%              NaN       9857.750000         NaN
50%              NaN      13592.000000         NaN
75%              NaN      19968.000000         NaN
max              NaN     141072.000000         NaN
df.isnull().sum()
Amount Pending 0
State 0
Tenure 0
Interest Rate 0
City 0
Bounce String 0
Disbursed Amount 0
Loan Number 0
dtype: int64
df.skew(numeric_only=True)
Amount Pending 3.077099
Tenure 3.199011
Interest Rate 4.910253
Disbursed Amount 3.282953
dtype: float64
df.duplicated().sum()
# Normalise the bounce string: L is treated as a bounce (B) and H as no bounce (S)
df['Bounce String'] = df['Bounce String'].str.replace('L','B')
df['Bounce String'] = df['Bounce String'].str.replace('H','S')
Task 1 - Calculating the risk labels for all the borrowers.

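# Low-risk candidates: no bounce at all for loans up to 6 months on book,
# or no bounce in the last 6 months for longer-running loans.
condition2 = df[(df['Bounce String'] != 'FEMI') &
                ((df['Bounce String'].str.count('B') == 0) & (df['Bounce String'].str.len() <= 6)) |
                ((df['Bounce String'].str[-6:].str.count('B') == 0) & (df['Bounce String'].str.len() > 6))]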
condition2['Bounce String'].unique()
array(['SSS', 'SS', 'S', 'SSSSS', 'SSSS', 'BSSSSSSS', 'SSSSSSSS',
       'BSSSSSS', 'SSSSSSS', 'SSSSSS', 'SSSSSSSSS', 'BBSSSSSS',
       'BSSSSSSSSS', 'SSSSSSSSSS'], dtype=object)
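# Medium-risk candidates: exactly one bounce, not in the latest month, for loans up to
# 6 months on book, or exactly one bounce in the last 6 months for longer-running loans.
condition3 = df[(df['Bounce String'] != 'FEMI') & (df['Bounce String'].str[-1] != 'B') &
                ((df['Bounce String'].str.count('B') == 1) & (df['Bounce String'].str.len() <= 6)) |
                ((df['Bounce String'].str[-6:].str.count('B') == 1) & (df['Bounce String'].str.len() > 6))]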
condition3['Bounce String'].unique()
array(['BS', 'BSSSSS', 'SBSSS', 'SSSBS', 'BSSSS', 'SSBSS', 'BSSS', 'SSBS',
       'SBSS', 'SBS', 'SSSSSSSB', 'SSBSSSSS', 'BBBSSSSS', 'BSBSSSSS',
       'BSSSSSBS', 'BBSSSSS', 'BSSSSBS', 'BSSSSSB', 'SBSSSSS', 'SSBSSSS',
       'SSSSSBS', 'SSSBSSS', 'SSSSBSS', 'SSSSBS', 'SSSBSS', 'SSBSSS',
       'BSSBSSSS', 'SBBSSSSS', 'SSSSSSBS', 'SSSSSSB', 'BSSSBSS', 'SBSSSS',
       'BBBBBSSSSS', 'SBSSBSSSSS', 'BBBBSSSSS', 'SSSBSSSSS', 'BSSBSSS'],
      dtype=object)
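# Unknown risk: first EMI, no behaviour yet
condition1 = (df['Bounce String'] == 'FEMI')
# Low risk: no bounce (whole history if <= 6 months, otherwise the last 6 months)
condition2 = ((df['Bounce String'] != 'FEMI') &
              ((df['Bounce String'].str.count('B') == 0) & (df['Bounce String'].str.len() <= 6)) |
              ((df['Bounce String'].str[-6:].str.count('B') == 0) & (df['Bounce String'].str.len() > 6)))
# Medium risk: exactly one bounce (not in the latest month for histories of <= 6 months)
condition3 = ((df['Bounce String'] != 'FEMI') & (df['Bounce String'].str[-1] != 'B') &
              ((df['Bounce String'].str.count('B') == 1) & (df['Bounce String'].str.len() <= 6)) |
              ((df['Bounce String'].str[-6:].str.count('B') == 1) & (df['Bounce String'].str.len() > 6)))
# High risk: everything else
condition4 = ~(condition1 | condition2 | condition3)
conditions = [condition1, condition2, condition3, condition4]
values = ['Unknown risk', 'Low risk', 'Medium Risk', 'High Risk']
df['Risk Label'] = np.select(conditions, values, default='Unknown')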
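df.head(2)
   Amount Pending      State  Tenure  Interest Rate       City Bounce String  \
0             963  Karnataka      11           7.69  Bangalore           SSS
1            1194  Karnataka      11           6.16  Bangalore           SSB

   Disbursed Amount Loan Number Risk Label
0             10197       JZ6FS   Low risk
1             12738       RDIOY  High Risk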
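The labelling rules can also be written as a plain function on a single bounce string, which makes them easier to test. This is a sketch of the rules as they can be read from the conditions and the printed bounce strings above, not the notebook's own code; the function name `risk_label` is hypothetical.

```python
def risk_label(bounce_string: str) -> str:
    """Risk label for one bounce string (illustrative re-statement of the rules)."""
    if bounce_string == "FEMI":
        return "Unknown risk"
    # Same normalisation as the notebook: L counts as B, H counts as S.
    s = bounce_string.replace("L", "B").replace("H", "S")
    # For histories of up to 6 months this window is the whole string.
    recent_bounces = s[-6:].count("B")
    if recent_bounces == 0:
        return "Low risk"
    if len(s) <= 6:
        # Short history: one bounce is tolerated only if it is not the latest month.
        if recent_bounces == 1 and not s.endswith("B"):
            return "Medium Risk"
    elif recent_bounces == 1:
        # Long history: exactly one bounce in the last six months.
        return "Medium Risk"
    return "High Risk"


for example in ["SSS", "SSB", "BBS", "FEMI", "BBSSSSS"]:
    print(example, "->", risk_label(example))
```

The sample strings are taken from the outputs shown in this notebook, so the printed labels can be compared against the `Risk Label` column.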
df['Risk Label'].value_counts().plot(kind= 'bar')
<Axes: >
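# One character per month on book. Note that 'FEMI' has length 4,
# so first-EMI loans are recorded as 4 months on book.
df['Months on Book'] = df['Bounce String'].str.len()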
df
       Amount Pending           State  Tenure  Interest Rate       City  \
0                 963       Karnataka      11           7.69  Bangalore
1                1194       Karnataka      11           6.16  Bangalore
2                1807       Karnataka      14           4.24     Hassan
3                2451       Karnataka      10           4.70  Bangalore
4                2611       Karnataka      10           4.41     Mysore
...               ...             ...     ...            ...        ...
24577             899  Andhra Pradesh       8           0.00   Chittoor
24578                                                           Krishna
24579                                                           Krishna
24580                                                            Guntur
24581            2254  Andhra Pradesh      11           0.00    Kurnool

      Bounce String  Disbursed Amount Loan Number    Risk Label  \
0               SSS             10197       JZ6FS      Low risk
1               SSB             12738       RDIOY     High Risk
2               BBS             24640       WNW4L     High Risk
3               SSS             23990       6LBJS      Low risk
4               SSB             25590       ZFZUA     High Risk
...             ...               ...         ...           ...
24577          FEMI              7192       EAX5C  Unknown risk
24578          FEMI             21592       5MCE9  Unknown risk
24579          FEMI             12320       9HO4Q  Unknown risk
24580          FEMI              6592       3VV72  Unknown risk
24581          FEMI             24794       18XBC  Unknown risk

       Months on Book
0                   3
1                   3
2                   3
3                   3
4                   3
...               ...
24577               4
24578               4
24579               4
24580               4
24581               4

[24582 rows x 10 columns]
Task 2: Labeling customers based on where they are in their tenure.

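# Early: at most 3 months on book
Cond1 = df['Months on Book'] <= 3
# Late: at most 3 months of tenure remaining (np.select checks this after Cond1)
Cond2 = (df['Tenure'] - df['Months on Book']) <= 3
# Mid: everything else
Cond3 = ~(Cond1 | Cond2)
conditions = [Cond1, Cond2, Cond3]
values = ['Early Tenure', 'Late Tenure', 'Mid Tenure']
df['Tenure Label'] = np.select(conditions, values, default='Unknown')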
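df.head(2)
   Amount Pending      State  Tenure  Interest Rate       City Bounce String  \
0             963  Karnataka      11           7.69  Bangalore           SSS
1            1194  Karnataka      11           6.16  Bangalore           SSB

   Disbursed Amount Loan Number Risk Label  Months on Book  Tenure Label
0             10197       JZ6FS   Low risk               3  Early Tenure
1             12738       RDIOY  High Risk               3  Early Tenure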
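The order of the conditions matters: a loan with 3 or fewer months on book is labelled early even if it is also within 3 months of its end. The following sketch restates the rule for a single loan; the function name `tenure_label` is hypothetical, and the sample inputs are (Tenure, Months on Book) pairs that appear in this notebook's outputs.

```python
def tenure_label(tenure: int, months_on_book: int) -> str:
    """Tenure stage for one loan, checked in the same order as np.select."""
    if months_on_book <= 3:
        return "Early Tenure"
    if tenure - months_on_book <= 3:
        return "Late Tenure"
    return "Mid Tenure"


for tenure, months in [(11, 3), (7, 6), (11, 4), (8, 7)]:
    print(tenure, months, "->", tenure_label(tenure, months))
```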
df['Tenure Label'].value_counts().plot(kind= 'bar')
<Axes: >
Task 3: Segmenting borrowers based on ticket size
# Sort the DataFrame by 'Amount Pending'
df = df.sort_values(by='Amount Pending').reset_index(drop=True)
# Calculate the cumulative sum of 'Amount Pending'
df['Cumulative Sum'] = df['Amount Pending'].cumsum()
# Split the cumulative sum into three quantile-based cohorts
df['Ticket Size Quantile'] = pd.qcut(df['Cumulative Sum'], q=3,
                                     labels=['Low Ticket Size', 'Medium Ticket Size', 'High Ticket Size'])
# Display the distribution of borrowers in each cohort
print(df['Ticket Size Quantile'].value_counts())
Low Ticket Size       8194
Medium Ticket Size    8194
High Ticket Size      8194
Name: Ticket Size Quantile, dtype: int64
df.head()
   Amount Pending        State  Tenure  Interest Rate          City  \
0             423  Maharashtra      11          11.84        Sangli
1             444   Tamil Nadu      11          12.23  VIRUDHUNAGAR
2             451  Maharashtra       7          37.92          Pune
3             522    Karnataka      11          12.83      Bagalkot
4             522  Maharashtra      11          12.83          Pune

  Bounce String  Disbursed Amount Loan Number    Risk Label  Months on Book  \
0          FEMI              4389       HEMS0  Unknown risk               4
1          FEMI              4598       1BYJD  Unknown risk               4
2        BSSSSB              2793       7COLC     High Risk               6
3          FEMI              5390       587TX  Unknown risk               4
4             S              5390       5QJN0      Low risk               1

   Tenure Label  Cumulative Sum Ticket Size Quantile
0    Mid Tenure             423      Low Ticket Size
1    Mid Tenure             867      Low Ticket Size
2   Late Tenure            1318      Low Ticket Size
3    Mid Tenure            1840      Low Ticket Size
4  Early Tenure            2362      Low Ticket Size
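Because the cumulative sum of a sorted, positive column is strictly increasing, `pd.qcut` on it yields cohorts with equal borrower counts, which is why each cohort above has 8194 rows. The sketch below uses made-up amounts to contrast this with `pd.cut` on fixed thirds of the total, which instead gives cohorts holding equal shares of the total EMI amount. Which of the two the task intends is not stated here, so the second variant is only an assumption; the column names `By Count` and `By Amount` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up EMI amounts: six small tickets, two medium, one large.
amounts = pd.Series([500] * 6 + [1500, 1500, 3000], name="Amount Pending")
toy = amounts.sort_values().reset_index(drop=True).to_frame()
toy["Cumulative Sum"] = toy["Amount Pending"].cumsum()
labels = ["Low Ticket Size", "Medium Ticket Size", "High Ticket Size"]

# Variant used in the notebook: quantiles of the cumulative sum (equal row counts).
toy["By Count"] = pd.qcut(toy["Cumulative Sum"], q=3, labels=labels)

# Alternative (assumption): cut the cumulative sum at 1/3 and 2/3 of the total amount.
edges = np.linspace(0, toy["Amount Pending"].sum(), 4)
toy["By Amount"] = pd.cut(toy["Cumulative Sum"], bins=edges, labels=labels)

count_sizes = toy["By Count"].value_counts().to_dict()
amount_totals = toy.groupby("By Amount", observed=True)["Amount Pending"].sum().to_dict()
print(count_sizes)     # borrowers per cohort under qcut
print(amount_totals)   # rupees per cohort under cut
```

On this toy data the `qcut` cohorts each hold three borrowers, while the `pd.cut` cohorts each hold 3000 rupees of EMI but six, two, and one borrowers respectively.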
Task 4: Channel spend recommendations
# Low-risk customers
Low_risk_df = df[df['Risk Label'] == 'Low risk']
# First-EMI customers
First_EMI_df = df[df['Bounce String'] == 'FEMI']
# Low EMI customers
Low_EMI_df = df[df['Ticket Size Quantile'] == 'Low Ticket Size']
# Low or medium EMI customers
# (isin is required here: `'a' or 'b'` evaluates to 'a', so == would match only the first value)
Low_Med_EMI_df = df[df['Ticket Size Quantile'].isin(['Low Ticket Size', 'Medium Ticket Size'])]
# English-speaking customers
Eng_df = df[df['City'].isin(["Mumbai", "Pune", "Delhi", "Ahmedabad", "Surat",
                             "Chennai", "Kolkata", "Bangalore", "Hyderabad"])]
# Hindi-speaking customers
Hindi_df = df[df['State'].isin(['Maharashtra', 'Madhya Pradesh'])]
# Hindi- or English-speaking customers
Hindi_Eng_df = pd.concat([Eng_df, Hindi_df]).drop_duplicates()
# Low bounce behaviour customers
Low_bounce_df = df[df['Risk Label'] == 'Medium Risk']
# Define the cost of each channel
channel_costs = {'Whatsapp bot': 5, 'Voice bot': 10, 'Human calling': 50}
# Assign each borrower to a channel based on the defined criteria
def assign_channel(row):
    if (row['Risk Label'] == 'Low risk' or row['Bounce String'] == 'FEMI'
            or row['Ticket Size Quantile'] == 'Low Ticket Size'):
        return 'Whatsapp bot'
    elif (row['City'] in ["Mumbai", "Pune", "Delhi", "Ahmedabad", "Surat",
                          "Chennai", "Kolkata", "Bangalore", "Hyderabad"]
            or row['State'] in ['Maharashtra', 'Madhya Pradesh']
            or row['Risk Label'] == 'Medium Risk'):
        return 'Voice bot'
    else:
        return 'Human calling'
# Apply the function to assign channels
df['Channel'] = df.apply(assign_channel, axis=1)
df
       Amount Pending        State  Tenure  Interest Rate          City  \
0                 423  Maharashtra      11          11.84        Sangli
1                 444   Tamil Nadu      11          12.23  VIRUDHUNAGAR
2                 451  Maharashtra       7          37.92          Pune
3                 522    Karnataka      11          12.83      Bagalkot
4                 522  Maharashtra      11          12.83          Pune
...               ...          ...     ...            ...           ...
24577           12500  Maharashtra       8           0.00      Kolhapur
24578           12500  Maharashtra       8           0.00          Pune
24579           12500       Kerala       8           0.00    MALAPPURAM
24580           12500  Maharashtra       8           0.00        Sangli
24581           13349  Maharashtra       8           0.00        Nagpur

      Bounce String  Disbursed Amount Loan Number    Risk Label  \
0              FEMI              4389       HEMS0  Unknown risk
1              FEMI              4598       1BYJD  Unknown risk
2            BSSSSB              2793       7COLC     High Risk
3              FEMI              5390       587TX  Unknown risk
4                 S              5390       5QJN0      Low risk
...             ...               ...         ...           ...
24577       BBSSSSS            100000       8MQRY   Medium Risk
24578             S            100000       1R840      Low risk
24579             S            100000       QUV9D      Low risk
24580             S            100000       66HA4      Low risk
24581             S            106792       HZ6XJ      Low risk

       Months on Book  Tenure Label  Cumulative Sum Ticket Size Quantile  \
0                   4    Mid Tenure             423      Low Ticket Size
1                   4    Mid Tenure             867      Low Ticket Size
2                   6   Late Tenure            1318      Low Ticket Size
3                   4    Mid Tenure            1840      Low Ticket Size
4                   1  Early Tenure            2362      Low Ticket Size
...               ...           ...             ...                  ...
24577               7   Late Tenure        43979758     High Ticket Size
24578               1  Early Tenure        43992258     High Ticket Size
24579               1  Early Tenure        44004758     High Ticket Size
24580               1  Early Tenure        44017258     High Ticket Size
24581               1  Early Tenure        44030607     High Ticket Size

            Channel
0      Whatsapp bot
1      Whatsapp bot
2      Whatsapp bot
3      Whatsapp bot
4      Whatsapp bot
...             ...
24577     Voice bot
24578  Whatsapp bot
24579  Whatsapp bot
24580  Whatsapp bot
24581  Whatsapp bot

[24582 rows x 14 columns]
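The row-wise `apply` works, but the same if/elif logic can be expressed on whole columns with boolean masks and `np.select`, which picks the first matching condition. The following is a minimal sketch on made-up borrowers, not the notebook's code; the names `ENGLISH_CITIES`, `HINDI_STATES`, `toy`, `whatsapp`, and `voice` are hypothetical.

```python
import numpy as np
import pandas as pd

# City and state lists copied from the channel rules above.
ENGLISH_CITIES = ["Mumbai", "Pune", "Delhi", "Ahmedabad", "Surat",
                  "Chennai", "Kolkata", "Bangalore", "Hyderabad"]
HINDI_STATES = ["Maharashtra", "Madhya Pradesh"]

# Made-up borrowers, one per interesting case.
toy = pd.DataFrame({
    "Risk Label": ["Low risk", "High Risk", "Medium Risk",
                   "High Risk", "Unknown risk", "High Risk"],
    "Bounce String": ["SSS", "BBB", "SBS", "SBB", "FEMI", "BBB"],
    "Ticket Size Quantile": ["High Ticket Size", "High Ticket Size", "Medium Ticket Size",
                             "High Ticket Size", "High Ticket Size", "Low Ticket Size"],
    "City": ["Kurnool", "Pune", "Kurnool", "Kurnool", "Kurnool", "Kurnool"],
    "State": ["Andhra Pradesh", "Maharashtra", "Andhra Pradesh",
              "Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh"],
})

# Cheapest channel: low risk, first EMI, or low ticket size.
whatsapp = (
    (toy["Risk Label"] == "Low risk")
    | (toy["Bounce String"] == "FEMI")
    | (toy["Ticket Size Quantile"] == "Low Ticket Size")
)
# Voice bot: listed city, listed state, or medium risk.
voice = (
    toy["City"].isin(ENGLISH_CITIES)
    | toy["State"].isin(HINDI_STATES)
    | (toy["Risk Label"] == "Medium Risk")
)
# np.select takes the first condition that is True, like the if/elif chain.
toy["Channel"] = np.select([whatsapp, voice], ["Whatsapp bot", "Voice bot"],
                           default="Human calling")
print(toy[["Risk Label", "City", "Channel"]])
```

Using `isin` for the city and state checks avoids the `'a' or 'b'` pitfall, where Python evaluates the expression to its first operand before the comparison runs.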
# Total cost for each channel
whatsapp_cost = df[df['Channel'] == 'Whatsapp bot'].shape[0] * 5
voice_cost = df[df['Channel'] == 'Voice bot'].shape[0] * 10
human_cost = df[df['Channel'] == 'Human calling'].shape[0] * 50
total_cost = whatsapp_cost + voice_cost + human_cost
print(f"Total cost for Whatsapp Bot: {whatsapp_cost} rupees")
print(f"Total cost for Voice Bot: {voice_cost} rupees")
print(f"Total cost for Human Calling: {human_cost} rupees")
print(f"Total overall cost: {total_cost} rupees")
Total cost for Whatsapp Bot: 96640 rupees
Total cost for Voice Bot: 39610 rupees
Total cost for Human Calling: 64650 rupees
Total overall cost: 200900 rupees
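As a consistency check, the printed totals can be divided by the unit costs to recover the number of borrowers per channel, which should add up to the 24582 rows in the dataset. This is a small sketch that uses only the figures printed above; the variable names `reported_totals` and `borrowers` are hypothetical.

```python
import pandas as pd

# Unit cost per contact, as defined in channel_costs above.
channel_costs = {"Whatsapp bot": 5, "Voice bot": 10, "Human calling": 50}
# Totals printed above, in rupees.
reported_totals = {"Whatsapp bot": 96640, "Voice bot": 39610, "Human calling": 64650}

costs = pd.Series(channel_costs)
totals = pd.Series(reported_totals)
# Borrowers per channel implied by the totals (total / unit cost).
borrowers = totals // costs
print(borrowers.to_dict())
print("borrowers:", int(borrowers.sum()), "total cost:", int(totals.sum()))
```

On the real frame, `df['Channel'].map(channel_costs).sum()` would give the overall cost directly from the dictionary instead of the hard-coded 5, 10, and 50.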
