
DPDZero Assessment

The dataset contains loan data for various borrowers in a loan portfolio.

The columns are as follows:

Amount Pending - This is the EMI amount

State - The borrower’s state

Tenure - Total tenure of the loan (in months)

Interest Rate - Interest rate

City - The city of the borrower

Bounce String

• This is a string that explains the customer’s bounce behaviour since the disbursal of the loan. A bounce means that the customer did not make the payment.
• S or H - No bounce in that month
• B or L - Bounce in that month
• FEMI - First EMI - no known behaviour
• The first character denotes the first month on book and the last character denotes the latest month. For example, SSB means that the customer has been on book for 3 months and bounced in the last month.
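Here is a minimal sketch of how the encoding can be read programmatically. It assumes the code table above (S/H = paid, B/L = bounced) and treats FEMI as having no monthly history yet. `decode_bounce_string` and `BOUNCE_CODES` are hypothetical names and are not part of the original notebook.

```python
from __future__ import annotations

# Hypothetical lookup: True means the customer bounced in that month.
BOUNCE_CODES = {"S": False, "H": False, "B": True, "L": True}


def decode_bounce_string(bounce_string: str) -> list[bool]:
    """Return one flag per month on book, first month first (True = bounce)."""
    if bounce_string == "FEMI":
        # First EMI not yet due: there is no monthly behaviour to decode.
        return []
    return [BOUNCE_CODES[ch] for ch in bounce_string]


flags = decode_bounce_string("SSB")
print(len(flags), flags[-1])  # 3 True -> 3 months on book, bounce in the latest month
```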

Disbursed Amount - the total disbursed amount

Loan Number - The loan number

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import os

df = pd.read_csv(r"C:\Users\Mohit\Downloads\Data_Analyst_Assignment_Dataset.csv")
df.head()

   Amount Pending      State  Tenure  Interest Rate       City Bounce String
0             963  Karnataka      11           7.69  Bangalore           SSS
1            1194  Karnataka      11           6.16  Bangalore           SSB
2            1807  Karnataka      14           4.24     Hassan           BBS
3            2451  Karnataka      10           4.70  Bangalore           SSS
4            2611  Karnataka      10           4.41     Mysore           SSB

   Disbursed Amount Loan Number
0             10197       JZ6FS
1             12738       RDIOY
2             24640       WNW4L
3             23990       6LBJS
4             25590       ZFZUA

df.shape

(24582, 8)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24582 entries, 0 to 24581
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Amount Pending 24582 non-null int64
1 State 24582 non-null object
2 Tenure 24582 non-null int64
3 Interest Rate 24582 non-null float64
4 City 24582 non-null object
5 Bounce String 24582 non-null object
6 Disbursed Amount 24582 non-null int64
7 Loan Number 24582 non-null object
dtypes: float64(1), int64(3), object(4)
memory usage: 1.5+ MB

df.columns

Index(['Amount Pending', 'State', 'Tenure', 'Interest Rate', 'City',
       'Bounce String', 'Disbursed Amount', 'Loan Number'],
      dtype='object')

df.describe(include='all')

        Amount Pending        State        Tenure  Interest Rate   City
count     24582.000000        24582  24582.000000   24582.000000  24582
unique             NaN            7           NaN            NaN    186
top                NaN  Maharashtra           NaN            NaN   Pune
freq               NaN         6793           NaN            NaN   1780
mean       1791.172687          NaN      9.415263       0.934960    NaN
std         937.565507          NaN      3.238904       3.114732    NaN
min         423.000000          NaN      7.000000       0.000000    NaN
25%        1199.000000          NaN      8.000000       0.000000    NaN
50%        1593.000000          NaN      8.000000       0.000000    NaN
75%        2083.000000          NaN     11.000000       0.000000    NaN
max       13349.000000          NaN     24.000000      37.920000    NaN

        Bounce String  Disbursed Amount  Loan Number
count           24582      24582.000000        24582
unique            413               NaN        24579
top                 S               NaN        T7WLO
freq             3615               NaN            2
mean              NaN      17705.195468          NaN
std               NaN      14192.671509          NaN
min               NaN       2793.000000          NaN
25%               NaN       9857.750000          NaN
50%               NaN      13592.000000          NaN
75%               NaN      19968.000000          NaN
max               NaN     141072.000000          NaN

df.isnull().sum()

Amount Pending 0
State 0
Tenure 0
Interest Rate 0
City 0
Bounce String 0
Disbursed Amount 0
Loan Number 0
dtype: int64

df.skew(numeric_only=True)

Amount Pending      3.077099
Tenure              3.199011
Interest Rate       4.910253
Disbursed Amount    3.282953
dtype: float64

df.duplicated().sum()
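`df.describe(include='all')` reports 24,579 unique Loan Number values across 24,582 rows, with T7WLO appearing twice. A few loan numbers therefore repeat. The output above does not show whether those rows are exact duplicates, because the `df.duplicated().sum()` cell has no recorded output. Below is a minimal sketch for listing repeated IDs, run on a hypothetical toy frame rather than the real data.

```python
import pandas as pd

# Hypothetical toy frame: loan "L002" appears on two rows with different amounts.
loans = pd.DataFrame({
    "Loan Number": ["L001", "L002", "L003", "L002"],
    "Amount Pending": [963, 1194, 1807, 1250],
})

# keep=False flags every occurrence of a repeated ID, not only the later ones.
repeated = loans[loans["Loan Number"].duplicated(keep=False)]
print(repeated)

# duplicated() compares whole rows, so it can report 0 even when IDs repeat.
print(loans.duplicated().sum())  # 0
```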

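# Normalise the bounce codes: L is treated as B (bounce) and H as S (no bounce)
df['Bounce String'] = df['Bounce String'].str.replace('L', 'B', regex=False)
df['Bounce String'] = df['Bounce String'].str.replace('H', 'S', regex=False)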
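A quick sanity check after the replacement: every string should now be either FEMI or made up only of S and B. The sketch applies the same two replacements to a few hypothetical strings and validates them with `Series.str.fullmatch`.

```python
import pandas as pd

# Hypothetical bounce strings mixing both code variants.
bounce = pd.Series(["SHL", "HHB", "FEMI", "SSS"])

# Same normalisation as above: L -> B (bounce), H -> S (no bounce).
normalised = (bounce.str.replace("L", "B", regex=False)
                    .str.replace("H", "S", regex=False))

# Each entry must be FEMI or consist only of S and B characters.
valid = (normalised == "FEMI") | normalised.str.fullmatch(r"[SB]+")
print(normalised.tolist(), valid.all())  # ['SSB', 'SSB', 'FEMI', 'SSS'] True
```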

Task 1: Calculating the risk labels for all borrowers


# Preview: borrowers with no bounce in the last 6 months (or in the whole history, if shorter)
condition2 = df[((df['Bounce String'] != 'FEMI')
                 & (df['Bounce String'].str.count('B') == 0)
                 & (df['Bounce String'].str.len() <= 6))
                | ((df['Bounce String'].str[-6:].str.count('B') == 0)
                   & (df['Bounce String'].str.len() > 6))]

condition2['Bounce String'].unique()

array(['SSS', 'SS', 'S', 'SSSSS', 'SSSS', 'BSSSSSSS', 'SSSSSSSS',
       'BSSSSSS', 'SSSSSSS', 'SSSSSS', 'SSSSSSSSS', 'BBSSSSSS',
       'BSSSSSSSSS', 'SSSSSSSSSS'], dtype=object)

# Preview: exactly one bounce in the last 6 months.
# Grouping made explicit: the FEMI and last-month checks apply only to the <= 6 branch.
condition3 = df[((df['Bounce String'] != 'FEMI')
                 & (df['Bounce String'].str[-1] != 'B')
                 & (df['Bounce String'].str.count('B') == 1)
                 & (df['Bounce String'].str.len() <= 6))
                | ((df['Bounce String'].str[-6:].str.count('B') == 1)
                   & (df['Bounce String'].str.len() > 6))]

condition3['Bounce String'].unique()

array(['BS', 'BSSSSS', 'SBSSS', 'SSSBS', 'BSSSS', 'SSBSS', 'BSSS', 'SSBS',
       'SBSS', 'SBS', 'SSSSSSSB', 'SSBSSSSS', 'BBBSSSSS', 'BSBSSSSS',
       'BSSSSSBS', 'BBSSSSS', 'BSSSSBS', 'BSSSSSB', 'SBSSSSS', 'SSBSSSS',
       'SSSSSBS', 'SSSBSSS', 'SSSSBSS', 'SSSSBS', 'SSSBSS', 'SSBSSS',
       'BSSBSSSS', 'SBBSSSSS', 'SSSSSSBS', 'SSSSSSB', 'BSSSBSS', 'SBSSSS',
       'BBBBBSSSSS', 'SBSSBSSSSS', 'BBBBSSSSS', 'SSSBSSSSS', 'BSSBSSS'],
      dtype=object)
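In `condition3`, Python's `&` binds more tightly than `|`. The expression is therefore parsed as `(not FEMI & last month not B & one B & length <= 6) | (one B in last 6 & length > 6)`. As a result, the FEMI and last-month checks apply only to strings of six months or fewer. This is why `SSSSSSSB`, `BSSSSSB` and `SSSSSSB`, each of which bounced in the latest month, appear in the output above. The sketch below reproduces the effect on hypothetical strings and contrasts it with a grouping that applies both checks to every string. Whether that stricter grouping is the intended rule is an assumption.

```python
import pandas as pd

s = pd.Series(["SSBS", "SSSB", "SSSSSSSB", "SSBSSSSS"])  # hypothetical bounce strings

not_femi = s != "FEMI"
last_ok = s.str[-1] != "B"
short_one_b = (s.str.count("B") == 1) & (s.str.len() <= 6)
long_one_b = (s.str[-6:].str.count("B") == 1) & (s.str.len() > 6)

# As written: & is evaluated before |, i.e. (not_femi & last_ok & short_one_b) | long_one_b
as_written = not_femi & last_ok & short_one_b | long_one_b
# Stricter reading (assumption): both checks apply to both length branches.
regrouped = not_femi & last_ok & (short_one_b | long_one_b)

print(s[as_written].tolist())  # ['SSBS', 'SSSSSSSB', 'SSBSSSSS']
print(s[regrouped].tolist())   # ['SSBS', 'SSBSSSSS']
```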

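# Unknown risk: first EMI, no repayment behaviour yet
condition1 = (df['Bounce String'] == 'FEMI')
# Low risk: no bounce in the last 6 months (or in the whole history, if shorter)
condition2 = (((df['Bounce String'] != 'FEMI')
               & (df['Bounce String'].str.count('B') == 0)
               & (df['Bounce String'].str.len() <= 6))
              | ((df['Bounce String'].str[-6:].str.count('B') == 0)
                 & (df['Bounce String'].str.len() > 6)))
# Medium risk: exactly one bounce in the last 6 months; as grouped here,
# the FEMI and last-month checks apply only to strings of length <= 6
condition3 = (((df['Bounce String'] != 'FEMI')
               & (df['Bounce String'].str[-1] != 'B')
               & (df['Bounce String'].str.count('B') == 1)
               & (df['Bounce String'].str.len() <= 6))
              | ((df['Bounce String'].str[-6:].str.count('B') == 1)
                 & (df['Bounce String'].str.len() > 6)))
# High risk: everything else
condition4 = ~(condition1 | condition2 | condition3)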

conditions = [
condition1,
condition2,
condition3,
condition4
]

values = ['Unknown risk', 'Low risk', 'Medium Risk', 'High Risk']


df['Risk Label'] = np.select(conditions, values, default='Unknown')
df.head(2)

   Amount Pending      State  Tenure  Interest Rate       City Bounce String
0             963  Karnataka      11           7.69  Bangalore           SSS
1            1194  Karnataka      11           6.16  Bangalore           SSB

   Disbursed Amount Loan Number Risk Label
0             10197       JZ6FS   Low risk
1             12738       RDIOY  High Risk
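`np.select` returns the value of the first condition that is true, so the labels are tried in the order Unknown, Low, Medium, High. A plain-Python version of the same ordering is useful for spot-checking individual strings. This sketch assumes the stricter reading of the Medium Risk rule, in which a bounce in the latest month always means High Risk, whatever the string length. For strings longer than six characters, that differs from `condition3` as written above. The sketch also relies on the fact that `s[-6:]` returns the whole string when it has six characters or fewer, so the two length branches collapse into one. `risk_label` is a hypothetical helper.

```python
def risk_label(bounce_string: str) -> str:
    """Label one normalised bounce string (only S/B, or FEMI); rules tried in order."""
    if bounce_string == "FEMI":
        return "Unknown risk"
    # Slicing returns the whole string when it has 6 characters or fewer.
    bounces_last_6 = bounce_string[-6:].count("B")
    if bounces_last_6 == 0:
        return "Low risk"
    if bounces_last_6 == 1 and bounce_string[-1] != "B":
        return "Medium Risk"
    return "High Risk"


for s in ["FEMI", "SSS", "BSSSSSSS", "SSBS", "SSB", "SSSSSSSB", "BBS"]:
    print(s, "->", risk_label(s))
```

`SSS`, `SSB` and `BBS` receive the same labels as rows 0 to 2 of the notebook's output. `SSSSSSSB` becomes High Risk under the stricter reading.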

df['Risk Label'].value_counts().plot(kind='bar')

[Figure: bar chart of borrower counts per Risk Label]

df['Months on Book'] = df['Bounce String'].str.len()

df

       Amount Pending           State  Tenure  Interest Rate       City
0                 963       Karnataka      11           7.69  Bangalore
1                1194       Karnataka      11           6.16  Bangalore
2                1807       Karnataka      14           4.24     Hassan
3                2451       Karnataka      10           4.70  Bangalore
4                2611       Karnataka      10           4.41     Mysore
...               ...             ...     ...            ...        ...
24577             899  Andhra Pradesh       8           0.00   Chittoor
24578            2699  Andhra Pradesh       8           0.00    Krishna
24579            1540  Andhra Pradesh       8           0.00    Krishna
24580             824  Andhra Pradesh       8           0.00     Guntur
24581            2254  Andhra Pradesh      11           0.00    Kurnool

      Bounce String  Disbursed Amount Loan Number    Risk Label
0               SSS             10197       JZ6FS      Low risk
1               SSB             12738       RDIOY     High Risk
2               BBS             24640       WNW4L     High Risk
3               SSS             23990       6LBJS      Low risk
4               SSB             25590       ZFZUA     High Risk
...             ...               ...         ...           ...
24577          FEMI              7192       EAX5C  Unknown risk
24578          FEMI             21592       5MCE9  Unknown risk
24579          FEMI             12320       9HO4Q  Unknown risk
24580          FEMI              6592       3VV72  Unknown risk
24581          FEMI             24794       18XBC  Unknown risk

       Months on Book
0                   3
1                   3
2                   3
3                   3
4                   3
...               ...
24577               4
24578               4
24579               4
24580               4
24581               4

[24582 rows x 10 columns]
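`str.len()` counts characters, so every FEMI row gets Months on Book = 4 (the length of the word "FEMI"), as rows 24577 to 24581 above show. Yet FEMI means no EMI has been observed yet. The minimal sketch below assigns 0 months to FEMI rows instead. The value 0 is an assumption; with it, Cond1 in Task 2 would label these borrowers Early Tenure.

```python
import pandas as pd

bounce = pd.Series(["SSS", "SSB", "FEMI", "S"])  # hypothetical bounce strings

# Character count for S/B histories; FEMI rows set to 0 months (assumption).
months_on_book = bounce.str.len().mask(bounce == "FEMI", 0)
print(months_on_book.tolist())  # [3, 3, 0, 1]
```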

Task 2: Labeling customers based on where they are in their tenure.


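# Early: 3 or fewer months on book; Late: 3 or fewer months of tenure remaining; Mid: the rest
Cond1 = df['Months on Book'] <= 3
Cond2 = (df['Tenure'] - df['Months on Book']) <= 3
Cond3 = ~(Cond1 | Cond2)

conditions = [Cond1, Cond2, Cond3]
values = ['Early Tenure', 'Late Tenure', 'Mid Tenure']

# np.select uses the first condition that is true for each row
df['Tenure Label'] = np.select(conditions, values, default='Unknown')
df.head(2)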

   Amount Pending      State  Tenure  Interest Rate       City Bounce String
0             963  Karnataka      11           7.69  Bangalore           SSS
1            1194  Karnataka      11           6.16  Bangalore           SSB

   Disbursed Amount Loan Number Risk Label  Months on Book  Tenure Label
0             10197       JZ6FS   Low risk               3  Early Tenure
1             12738       RDIOY  High Risk               3  Early Tenure

df['Tenure Label'].value_counts().plot(kind='bar')

[Figure: bar chart of borrower counts per Tenure Label]

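Below, the tenure rules are restated as a plain-Python function and checked in the same order as `np.select`, where the first true condition wins. The worked values reuse tenure / months-on-book combinations from rows shown in this notebook. `tenure_label` is a hypothetical helper.

```python
def tenure_label(tenure: int, months_on_book: int) -> str:
    """Mirror of Cond1/Cond2/Cond3: the first matching rule wins, as in np.select."""
    if months_on_book <= 3:
        return "Early Tenure"
    if tenure - months_on_book <= 3:
        return "Late Tenure"
    return "Mid Tenure"


print(tenure_label(11, 3))  # Early Tenure
print(tenure_label(7, 6))   # Late Tenure (1 month of tenure left)
print(tenure_label(11, 4))  # Mid Tenure (7 months of tenure left)
```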
Task 3: Segmenting borrowers based on ticket size
# Sort the DataFrame by 'Amount Pending'
df = df.sort_values(by='Amount Pending').reset_index(drop=True)

# Calculate the cumulative sum of 'Amount Pending'
df['Cumulative Sum'] = df['Amount Pending'].cumsum()

# Split the cumulative sum into tertiles (three cohorts with equal borrower counts)
df['Ticket Size Quantile'] = pd.qcut(df['Cumulative Sum'], q=3,
                                     labels=['Low Ticket Size', 'Medium Ticket Size', 'High Ticket Size'])

# Display the distribution of borrowers in each cohort
print(df['Ticket Size Quantile'].value_counts())

Low Ticket Size       8194
Medium Ticket Size    8194
High Ticket Size      8194
Name: Ticket Size Quantile, dtype: int64
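'Amount Pending' is always positive (the minimum is 423), so the cumulative sum rises strictly with sort position. As a result, `pd.qcut` on it produces three equal-count groups, exactly as cutting on rank would, which is why each cohort holds 8,194 borrowers. If the cohorts were instead meant to carry roughly a third of the total amount pending each, one option is to cut on the cumulative share. The sketch compares both on hypothetical amounts; the share-based split is an assumption, not the notebook's method.

```python
import pandas as pd

# Hypothetical EMI amounts, already sorted ascending.
amounts = pd.Series([500, 600, 700, 800, 900, 1000, 5000, 9000, 12000])
labels = ["Low Ticket Size", "Medium Ticket Size", "High Ticket Size"]

# Notebook approach: tertiles of the cumulative sum -> equal borrower counts.
by_count = pd.qcut(amounts.cumsum(), q=3, labels=labels)

# Alternative (assumption): thirds of the cumulative share of total amount pending.
share = amounts.cumsum() / amounts.sum()
by_amount = pd.cut(share, bins=[0, 1/3, 2/3, 1], labels=labels)

print(by_count.value_counts().tolist())  # [3, 3, 3]
print(by_amount.value_counts())
```

With skewed amounts like these, the share-based split puts most borrowers in the low cohort (7 of 9 here) and only the largest EMIs in the upper ones.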

df.head()

   Amount Pending        State  Tenure  Interest Rate          City
0             423  Maharashtra      11          11.84        Sangli
1             444   Tamil Nadu      11          12.23  VIRUDHUNAGAR
2             451  Maharashtra       7          37.92          Pune
3             522    Karnataka      11          12.83      Bagalkot
4             522  Maharashtra      11          12.83          Pune

  Bounce String  Disbursed Amount Loan Number    Risk Label  Months on Book
0          FEMI              4389       HEMS0  Unknown risk               4
1          FEMI              4598       1BYJD  Unknown risk               4
2        BSSSSB              2793       7COLC     High Risk               6
3          FEMI              5390       587TX  Unknown risk               4
4             S              5390       5QJN0      Low risk               1

   Tenure Label  Cumulative Sum Ticket Size Quantile
0    Mid Tenure             423      Low Ticket Size
1    Mid Tenure             867      Low Ticket Size
2   Late Tenure            1318      Low Ticket Size
3    Mid Tenure            1840      Low Ticket Size
4  Early Tenure            2362      Low Ticket Size

Task 4: Channel spend recommendations
# Low-risk customers
Low_risk_df = df[df['Risk Label'] == 'Low risk']

# First EMI customers
First_EMI_df = df[df['Bounce String'] == 'FEMI']

# Low EMI customers
Low_EMI_df = df[df['Ticket Size Quantile'] == 'Low Ticket Size']

# Low or Medium EMI customers
Low_Med_EMI_df = df[df['Ticket Size Quantile'].isin(['Low Ticket Size', 'Medium Ticket Size'])]

# English-speaking customers
Eng_df = df[df['City'].isin(["Mumbai", "Pune", "Delhi", "Ahmedabad", "Surat",
                             "Chennai", "Kolkata", "Bangalore", "Hyderabad"])]

# Hindi-speaking customers
Hindi_df = df[df['State'].isin(['Maharashtra', 'Madhya Pradesh'])]

# Hindi- or English-speaking customers
Hindi_Eng_df = pd.concat([Eng_df, Hindi_df]).drop_duplicates()

# Low bounce behaviour customers
Low_bounce_df = df[df['Risk Label'] == 'Medium Risk']
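Python's `or` returns its first truthy operand. An expression such as `("Mumbai" or "Pune")` therefore evaluates to `"Mumbai"` before pandas ever sees it, and comparing a column to it with `==` matches that single value only. `Series.isin` instead tests membership in a list, element by element. Here is a small demonstration on a hypothetical city column:

```python
import pandas as pd

city = pd.Series(["Pune", "Mumbai", "Nagpur", "Chennai"])  # hypothetical city column

print("Mumbai" or "Pune")  # Mumbai: `or` short-circuits to the first truthy string

wrong = city == ("Mumbai" or "Pune")   # same as city == "Mumbai"
right = city.isin(["Mumbai", "Pune"])  # element-wise membership test

print(wrong.tolist())  # [False, True, False, False]
print(right.tolist())  # [True, True, False, False]
```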

# Define the cost of each channel (rupees per borrower)
channel_costs = {'Whatsapp bot': 5, 'Voice bot': 10, 'Human calling': 50}

# Assign each borrower to a channel based on the defined criteria
def assign_channel(row):
    # Cheapest channel: low-risk, first-EMI or low-ticket borrowers
    if (row['Risk Label'] == 'Low risk' or row['Bounce String'] == 'FEMI'
            or row['Ticket Size Quantile'] == 'Low Ticket Size'):
        return 'Whatsapp bot'
    # Voice bot: English- or Hindi-speaking regions, or medium-risk borrowers
    elif (row['City'] in ["Mumbai", "Pune", "Delhi", "Ahmedabad", "Surat",
                          "Chennai", "Kolkata", "Bangalore", "Hyderabad"]
          or row['State'] in ['Maharashtra', 'Madhya Pradesh']
          or row['Risk Label'] == 'Medium Risk'):
        return 'Voice bot'
    # Everyone else gets a human call
    else:
        return 'Human calling'

# Apply the function to assign channels
df['Channel'] = df.apply(assign_channel, axis=1)
df

       Amount Pending        State  Tenure  Interest Rate          City
0                 423  Maharashtra      11          11.84        Sangli
1                 444   Tamil Nadu      11          12.23  VIRUDHUNAGAR
2                 451  Maharashtra       7          37.92          Pune
3                 522    Karnataka      11          12.83      Bagalkot
4                 522  Maharashtra      11          12.83          Pune
...               ...          ...     ...            ...           ...
24577           12500  Maharashtra       8           0.00      Kolhapur
24578           12500  Maharashtra       8           0.00          Pune
24579           12500       Kerala       8           0.00    MALAPPURAM
24580           12500  Maharashtra       8           0.00        Sangli
24581           13349  Maharashtra       8           0.00        Nagpur

      Bounce String  Disbursed Amount Loan Number    Risk Label
0              FEMI              4389       HEMS0  Unknown risk
1              FEMI              4598       1BYJD  Unknown risk
2            BSSSSB              2793       7COLC     High Risk
3              FEMI              5390       587TX  Unknown risk
4                 S              5390       5QJN0      Low risk
...             ...               ...         ...           ...
24577       BBSSSSS            100000       8MQRY   Medium Risk
24578             S            100000       1R840      Low risk
24579             S            100000       QUV9D      Low risk
24580             S            100000       66HA4      Low risk
24581             S            106792       HZ6XJ      Low risk

       Months on Book  Tenure Label  Cumulative Sum Ticket Size Quantile
0                   4    Mid Tenure             423      Low Ticket Size
1                   4    Mid Tenure             867      Low Ticket Size
2                   6   Late Tenure            1318      Low Ticket Size
3                   4    Mid Tenure            1840      Low Ticket Size
4                   1  Early Tenure            2362      Low Ticket Size
...               ...           ...             ...                  ...
24577               7   Late Tenure        43979758     High Ticket Size
24578               1  Early Tenure        43992258     High Ticket Size
24579               1  Early Tenure        44004758     High Ticket Size
24580               1  Early Tenure        44017258     High Ticket Size
24581               1  Early Tenure        44030607     High Ticket Size

            Channel
0      Whatsapp bot
1      Whatsapp bot
2      Whatsapp bot
3      Whatsapp bot
4      Whatsapp bot
...             ...
24577     Voice bot
24578  Whatsapp bot
24579  Whatsapp bot
24580  Whatsapp bot
24581  Whatsapp bot

[24582 rows x 14 columns]
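`df.apply(..., axis=1)` calls `assign_channel` once per row in Python, which can be slow on large frames. The same three rules can be written with `Series.isin` and `np.select`, which returns the first matching choice per row, just like the `if/elif/else` chain. This sketch uses hypothetical rows and assumes the same column names and label strings as above.

```python
import numpy as np
import pandas as pd

ENGLISH_CITIES = ["Mumbai", "Pune", "Delhi", "Ahmedabad", "Surat",
                  "Chennai", "Kolkata", "Bangalore", "Hyderabad"]
HINDI_STATES = ["Maharashtra", "Madhya Pradesh"]

# Hypothetical rows covering each branch of the rules.
toy = pd.DataFrame({
    "Risk Label": ["Low risk", "High Risk", "Medium Risk", "High Risk"],
    "Bounce String": ["S", "SSB", "SBS", "BBS"],
    "Ticket Size Quantile": ["High Ticket Size", "Medium Ticket Size",
                             "High Ticket Size", "High Ticket Size"],
    "City": ["Pune", "Mumbai", "Kochi", "Kochi"],
    "State": ["Maharashtra", "Maharashtra", "Kerala", "Kerala"],
})

whatsapp = ((toy["Risk Label"] == "Low risk")
            | (toy["Bounce String"] == "FEMI")
            | (toy["Ticket Size Quantile"] == "Low Ticket Size"))
voice = (toy["City"].isin(ENGLISH_CITIES)
         | toy["State"].isin(HINDI_STATES)
         | (toy["Risk Label"] == "Medium Risk"))

# First true condition wins per row; rows matching neither fall back to the default.
toy["Channel"] = np.select([whatsapp, voice], ["Whatsapp bot", "Voice bot"],
                           default="Human calling")
print(toy["Channel"].tolist())
# ['Whatsapp bot', 'Voice bot', 'Voice bot', 'Human calling']
```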

# Total cost for each channel, using the unit prices in channel_costs
channel_counts = df['Channel'].value_counts()
whatsapp_cost = channel_counts.get('Whatsapp bot', 0) * channel_costs['Whatsapp bot']
voice_cost = channel_counts.get('Voice bot', 0) * channel_costs['Voice bot']
human_cost = channel_counts.get('Human calling', 0) * channel_costs['Human calling']
total_cost = whatsapp_cost + voice_cost + human_cost

print(f"Total cost for Whatsapp Bot: {whatsapp_cost} rupees")
print(f"Total cost for Voice Bot: {voice_cost} rupees")
print(f"Total cost for Human Calling: {human_cost} rupees")
print(f"Total overall cost: {total_cost} rupees")

Total cost for Whatsapp Bot: 96640 rupees
Total cost for Voice Bot: 39610 rupees
Total cost for Human Calling: 64650 rupees
Total overall cost: 200900 rupees
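The printed totals can be cross-checked against the unit prices in `channel_costs`. Dividing each total by its price gives the number of borrowers per channel, and these counts add up to the 24,582 rows in the dataset. The sketch below uses only the figures printed above.

```python
channel_costs = {"Whatsapp bot": 5, "Voice bot": 10, "Human calling": 50}
# Totals copied from the output above (rupees).
total_costs = {"Whatsapp bot": 96640, "Voice bot": 39610, "Human calling": 64650}

# Borrowers per channel = total spend / unit cost.
counts = {ch: total_costs[ch] // channel_costs[ch] for ch in channel_costs}
n_borrowers = sum(counts.values())
overall = sum(total_costs.values())

print(counts)       # {'Whatsapp bot': 19328, 'Voice bot': 3961, 'Human calling': 1293}
print(n_borrowers)  # 24582
print(round(overall / n_borrowers, 2))  # 8.17 rupees per borrower on average
# For comparison: routing every borrower to human calling.
print(n_borrowers * channel_costs["Human calling"])  # 1229100
```

The channel mix therefore averages about 8.17 rupees per borrower, against 50 rupees if every borrower were called by a human.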
