Dpdzero Assessment

dpdzero-assessment
March 31, 2024
1 DPDZero Assesment
The dataset contains Loan data for various borrowers in a loan portfolio
The columns are as follows
Amount Pending - This is the EMI amount
State - The borrower’s state
Tenure - Total tenure of the borrower
Interest Rate - Interest rate
City - The city of the borrower
Bounce String
• This is a string that explain’s customer’s bounce behaviour since the disbursal of the loan -
bounce means that the customer did not end up making the payment
• S or H- No bounce in that month
• B or L - Bounce in that month
• FEMI - first EMI - no known behaviour
• Last character denotes the last month - first character denotes the first month on book - for
example SSB means that customer was on book for 3 months and he has bounced the in the
last month
Disbursed Amount - the total disbursed amount
Loan Number - The loan number
[166]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import os
[167]: df = pd.read_csv(r"C:\Users\Mohit\Downloads\Data_Analyst_Assignment_Dataset.
↪csv")
df.head()
1
[167]: Amount Pending State Tenure Interest Rate City Bounce String \
0 963 Karnataka 11 7.69 Bangalore SSS
1 1194 Karnataka 11 6.16 Bangalore SSB
2 1807 Karnataka 14 4.24 Hassan BBS
4 2611 Karnataka 10 4.41 Mysore SSB
Disbursed Amount Loan Number

0 10197 JZ6FS
1 12738 RDIOY
2 24640 WNW4L
3 23990 6LBJS
4 25590 ZFZUA
[168]: df.shape
[168]: (24582, 8)
[169]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24582 entries, 0 to 24581
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Amount Pending 24582 non-null int64
1 State 24582 non-null object
2 Tenure 24582 non-null int64
3 Interest Rate 24582 non-null float64
4 City 24582 non-null object
5 Bounce String 24582 non-null object
6 Disbursed Amount 24582 non-null int64
7 Loan Number 24582 non-null object
dtypes: float64(1), int64(3), object(4)
memory usage: 1.5+ MB
[170]: df.columns
[170]: Index(['Amount Pending', 'State', 'Tenure', 'Interest Rate', 'City',

'Bounce String', 'Disbursed Amount', 'Loan Number'],
dtype='object')
[171]: df.describe(include='all')
[171]: Amount Pending State Tenure Interest Rate City \

count 24582.000000 24582 24582.000000 24582.000000 24582
unique NaN 7 NaN NaN 186
2
top NaN Maharashtra NaN NaN Pune
freq NaN 6793 NaN NaN 1780
mean 1791.172687 NaN 9.415263 0.934960 NaN
std 937.565507 NaN 3.238904 3.114732 NaN
min 423.000000 NaN 7.000000 0.000000 NaN
25% 1199.000000 NaN 8.000000 0.000000 NaN
50% 1593.000000 NaN 8.000000 0.000000 NaN
75% 2083.000000 NaN 11.000000 0.000000 NaN
max 13349.000000 NaN 24.000000 37.920000 NaN
Bounce String Disbursed Amount Loan Number

count 24582 24582.000000 24582
unique 413 NaN 24579
top S NaN T7WLO
freq 3615 NaN 2
mean NaN 17705.195468 NaN
std NaN 14192.671509 NaN
min NaN 2793.000000 NaN
25% NaN 9857.750000 NaN
50% NaN 13592.000000 NaN
75% NaN 19968.000000 NaN
max NaN 141072.000000 NaN
[172]: df.isnull().sum()
[172]: Amount Pending 0

State 0
Tenure 0
Interest Rate 0
City 0
Bounce String 0
Disbursed Amount 0
Loan Number 0
dtype: int64
[173]: df.skew(numeric_only=True)
[173]: Amount Pending 3.077099

Tenure 3.199011
Interest Rate 4.910253
Disbursed Amount 3.282953
dtype: float64
[174]: df.duplicated().sum()
[174]: 0
3
[175]: df['Bounce String'] = df['Bounce String'].str.replace('L','B')
df['Bounce String'] = df['Bounce String'].str.replace('H','S')
1.0.1 Task 1 - Calculating the risk labels for all the borrowers.
[176]: condition2 = df[(df['Bounce String'] != 'FEMI') & ((df['Bounce String'].str.

↪count('B') == 0) & (df['Bounce String'].str.len() <= 6)) |
((df['Bounce String'].str[-6:].str.count('B') == 0) & (df['Bounce␣

↪String'].str.len() > 6))]
[177]: condition2['Bounce String'].unique()
[177]: array(['SSS', 'SS', 'S', 'SSSSS', 'SSSS', 'BSSSSSSS', 'SSSSSSSS',

'BSSSSSS', 'SSSSSSS', 'SSSSSS', 'SSSSSSSSS', 'BBSSSSSS',
'BSSSSSSSSS', 'SSSSSSSSSS'], dtype=object)
[178]: condition3 = df[(df['Bounce String'] != 'FEMI') & (df['Bounce String'].str[-1] !

↪= 'B')& ((df['Bounce String'].str.count('B') == 1) & (df['Bounce String'].
↪str.len() <= 6)) |

↪String'].str.len() > 6))]
[179]: condition3['Bounce String'].unique()
[179]: array(['BS', 'BSSSSS', 'SBSSS', 'SSSBS', 'BSSSS', 'SSBSS', 'BSSS', 'SSBS',

'SBSS', 'SBS', 'SSSSSSSB', 'SSBSSSSS', 'BBBSSSSS', 'BSBSSSSS',
'BSSSSSBS', 'BBSSSSS', 'BSSSSBS', 'BSSSSSB', 'SBSSSSS', 'SSBSSSS',
'SSSSSBS', 'SSSBSSS', 'SSSSBSS', 'SSSSBS', 'SSSBSS', 'SSBSSS',
'BSSBSSSS', 'SBBSSSSS', 'SSSSSSBS', 'SSSSSSB', 'BSSSBSS', 'SBSSSS',
'BBBBBSSSSS', 'SBSSBSSSSS', 'BBBBSSSSS', 'SSSBSSSSS', 'BSSBSSS'],
dtype=object)
[180]: condition1 = (df['Bounce String'] == 'FEMI')

condition2 = ((df['Bounce String'] != 'FEMI') & ((df['Bounce String'].str.
↪count('B') == 0) & (df['Bounce String'].str.len() <= 6)) |

↪String'].str.len() > 6)))
condition3 = ((df['Bounce String'] != 'FEMI') & (df['Bounce String'].str[-1] !=␣

↪'B') & ((df['Bounce String'].str.count('B') == 1) & (df['Bounce String'].str.
↪len() <= 6)) |

↪String'].str.len() > 6)))
condition4 = ~(condition1 | condition2 | condition3)
conditions = [
condition1,
4
condition2,
condition3,
condition4
]
values = ['Unknown risk', 'Low risk', 'Medium Risk', 'High Risk']

df['Risk Label'] = np.select(conditions, values, default='Unknown')
df.head(2)
Disbursed Amount Loan Number Risk Label

0 10197 JZ6FS Low risk
1 12738 RDIOY High Risk
[203]: df['Risk Label'].value_counts().plot(kind= 'bar')
[203]: <Axes: >
5
[181]: df['Months on Book'] = df['Bounce String'].str.len()
[182]: df

0 963 Karnataka 11 7.69 Bangalore
2 1807 Karnataka 14 4.24 Hassan
4 2611 Karnataka 10 4.41 Mysore
… … … … … …
24577 899 Andhra Pradesh 8 0.00 Chittoor
24578 2699 Andhra Pradesh 8 0.00 Krishna
24579 1540 Andhra Pradesh 8 0.00 Krishna
24580 824 Andhra Pradesh 8 0.00 Guntur
24581 2254 Andhra Pradesh 11 0.00 Kurnool
Bounce String Disbursed Amount Loan Number Risk Label \

0 SSS 10197 JZ6FS Low risk
1 SSB 12738 RDIOY High Risk
2 BBS 24640 WNW4L High Risk
3 SSS 23990 6LBJS Low risk
4 SSB 25590 ZFZUA High Risk
… … … … …
24577 FEMI 7192 EAX5C Unknown risk
24578 FEMI 21592 5MCE9 Unknown risk
24579 FEMI 12320 9HO4Q Unknown risk
24580 FEMI 6592 3VV72 Unknown risk
24581 FEMI 24794 18XBC Unknown risk
Months on Book
0 3
1 3
2 3
3 3
4 3
… …
24577 4
24578 4
24579 4
24580 4
24581 4
[24582 rows x 10 columns]
6
1.0.2 Task 2: Labeling customers based on where they are in their tenure.
[183]: Cond1 = df['Months on Book'] <=3

Cond2 = (df['Tenure']-df['Months on Book'])<=3
Cond3 = ~(Cond1 | Cond2)
conditions = [
Cond1,
Cond2,
Cond3 ]
values = ['Early Tenure', 'Late Tenure', 'Mid Tenure']

df['Tenure Label'] = np.select(conditions, values, default='Unknown')
df.head(2)
Disbursed Amount Loan Number Risk Label Months on Book Tenure Label
0 10197 JZ6FS Low risk 3 Early Tenure
1 12738 RDIOY High Risk 3 Early Tenure
[204]: df['Tenure Label'].value_counts().plot(kind= 'bar')
[204]: <Axes: >
7
1.0.3 Task 3: Segmenting borrowers based on ticket size
[184]: # Sort the DataFrame by 'Amount Pending'

df = df.sort_values(by='Amount Pending').reset_index(drop=True)
# Calculate the cumulative sum of 'Amount Pending'

df['Cumulative Sum'] = df['Amount Pending'].cumsum()
# Calculate the quantiles for the cumulative sum

df['Ticket Size Quantile'] = pd.qcut(df['Cumulative Sum'], q=3, labels=['Low␣
↪Ticket Size', 'Medium Ticket Size', 'High Ticket Size'])
# Display the distribution of borrowers in each cohort

print(df['Ticket Size Quantile'].value_counts())
Low Ticket Size 8194

Medium Ticket Size 8194
High Ticket Size 8194
8
Name: Ticket Size Quantile, dtype: int64
[185]: df.head()

0 423 Maharashtra 11 11.84 Sangli
1 444 Tamil Nadu 11 12.23 VIRUDHUNAGAR
2 451 Maharashtra 7 37.92 Pune
3 522 Karnataka 11 12.83 Bagalkot
Bounce String Disbursed Amount Loan Number Risk Label Months on Book \
0 FEMI 4389 HEMS0 Unknown risk 4
1 FEMI 4598 1BYJD Unknown risk 4
2 BSSSSB 2793 7COLC High Risk 6
3 FEMI 5390 587TX Unknown risk 4
4 S 5390 5QJN0 Low risk 1
Tenure Label Cumulative Sum Ticket Size Quantile

0 Mid Tenure 423 Low Ticket Size
2 Late Tenure 1318 Low Ticket Size
4 Early Tenure 2362 Low Ticket Size
1.0.4 Task 4: Chhannel spend recommendations

[199]: # low risk Customers
Low_risk_df = df[df['Risk Label']=='Low risk']
# Firt EMI Customers

First_EMI_df = df[df['Bounce String']=='FEMI']
# Low EMI Customers

Low_EMI_df = df[df['Ticket Size Quantile']=='Low Ticket Size']
# Low or Medium EMI Customers

Low_Med_EMI_df = df[df['Ticket Size Quantile']== ('Low Ticket Size' or 'Medium␣
↪Ticket Size')]
# English Speaking Customers

Eng_df = df[df['City'] == ( "Mumbai" or "Pune" or "Delhi" or "Ahmedabad" or␣
↪"Surat" or "Chennai" or "Kolkata" or "Bangalore" or "Hyderabad")]
# Hindi Speaking Customers

Hindi_df = df[df['State'] == ('Maharashtra' or 'Madhya Pradesh')]
9
# Hindi or English Speaking Customers
Hindi_Eng_df = pd.concat([Eng_df, Hindi_df]).drop_duplicates()
# Low Bounce Behaviour Customers

Low_bounce_df = df[df['Risk Label']=='Medium Risk']
[200]: # Define the cost of each channel

channel_costs = {'Whatsapp bot': 5, 'Voice bot': 10, 'Human calling': 50}
# Assign each borrower to a channel based on the defined criteria

def assign_channel(row):
if row['Risk Label'] == 'Low risk' or row['Bounce String'] == 'FEMI' or␣
↪row['Ticket Size Quantile'] == 'Low Ticket Size':
return 'Whatsapp bot'

elif row['City'] in ["Mumbai", "Pune", "Delhi", "Ahmedabad", "Surat",␣
↪"Chennai", "Kolkata", "Bangalore", "Hyderabad"] or row['State'] in␣
↪['Maharashtra', 'Madhya Pradesh'] or row['Risk Label'] == 'Medium Risk':
return 'Voice bot'

else:
return 'Human calling'
# Apply the function to assign channels

df['Channel'] = df.apply(assign_channel, axis=1)
df

1 444 Tamil Nadu 11 12.23 VIRUDHUNAGAR
3 522 Karnataka 11 12.83 Bagalkot
… … … … … …
24577 12500 Maharashtra 8 0.00 Kolhapur
24579 12500 Kerala 8 0.00 MALAPPURAM
24581 13349 Maharashtra 8 0.00 Nagpur
Bounce String Disbursed Amount Loan Number Risk Label \

0 FEMI 4389 HEMS0 Unknown risk
1 FEMI 4598 1BYJD Unknown risk
2 BSSSSB 2793 7COLC High Risk
3 FEMI 5390 587TX Unknown risk
4 S 5390 5QJN0 Low risk
… … … … …
24577 BBSSSSS 100000 8MQRY Medium Risk
24578 S 100000 1R840 Low risk
10
24579 S 100000 QUV9D Low risk
24580 S 100000 66HA4 Low risk
24581 S 106792 HZ6XJ Low risk
Months on Book Tenure Label Cumulative Sum Ticket Size Quantile \

0 4 Mid Tenure 423 Low Ticket Size
2 6 Late Tenure 1318 Low Ticket Size
4 1 Early Tenure 2362 Low Ticket Size
… … … … …
24577 7 Late Tenure 43979758 High Ticket Size
24578 1 Early Tenure 43992258 High Ticket Size
Channel
0 Whatsapp bot
1 Whatsapp bot
2 Whatsapp bot
3 Whatsapp bot
4 Whatsapp bot
… …
24577 Voice bot
24578 Whatsapp bot
24579 Whatsapp bot
24580 Whatsapp bot
24581 Whatsapp bot
[24582 rows x 14 columns]
[201]: # Total cost for each channel
whatsapp_cost = df[df['Channel'] == 'Whatsapp bot'].shape[0] * 5

voice_cost = df[df['Channel'] == 'Voice bot'].shape[0] * 10
human_cost = df[df['Channel'] == 'Human calling'].shape[0] * 50
total_cost = whatsapp_cost + voice_cost + human_cost
print(f"Total cost for Whatsapp Bot: {whatsapp_cost} rupees")

print(f"Total cost for Voice Bot: {voice_cost} rupees")
print(f"Total cost for Human Calling: {human_cost} rupees")
print(f"Total overall cost: {total_cost} rupees")
Total cost for Whatsapp Bot: 96640 rupees

Total cost for Voice Bot: 39610 rupees
Total cost for Human Calling: 64650 rupees
11
Total overall cost: 200900 rupees
[ ]:
12

Dpdzero Assessment

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Dpdzero Assessment

Uploaded by

Copyright:

Available Formats

dpdzero-assessment

March 31, 2024

Disbursed Amount Loan Number

[170]: Index(['Amount Pending', 'State', 'Tenure', 'Interest Rate', 'City',

[171]: Amount Pending State Tenure Interest Rate City \

Bounce String Disbursed Amount Loan Number

[172]: Amount Pending 0

[173]: Amount Pending 3.077099

[176]: condition2 = df[(df['Bounce String'] != 'FEMI') & ((df['Bounce String'].str.

((df['Bounce String'].str[-6:].str.count('B') == 0) & (df['Bounce␣

[177]: condition2['Bounce String'].unique()

[177]: array(['SSS', 'SS', 'S', 'SSSSS', 'SSSS', 'BSSSSSSS', 'SSSSSSSS',

[178]: condition3 = df[(df['Bounce String'] != 'FEMI') & (df['Bounce String'].str[-1] !

↪str.len() <= 6)) |

((df['Bounce String'].str[-6:].str.count('B') == 1) & (df['Bounce␣

[179]: condition3['Bounce String'].unique()

[179]: array(['BS', 'BSSSSS', 'SBSSS', 'SSSBS', 'BSSSS', 'SSBSS', 'BSSS', 'SSBS',

[180]: condition1 = (df['Bounce String'] == 'FEMI')

((df['Bounce String'].str[-6:].str.count('B') == 0) & (df['Bounce␣

condition3 = ((df['Bounce String'] != 'FEMI') & (df['Bounce String'].str[-1] !=␣

↪len() <= 6)) |

((df['Bounce String'].str[-6:].str.count('B') == 1) & (df['Bounce␣

condition4 = ~(condition1 | condition2 | condition3)

values = ['Unknown risk', 'Low risk', 'Medium Risk', 'High Risk']

Disbursed Amount Loan Number Risk Label

[203]: df['Risk Label'].value_counts().plot(kind= 'bar')

[203]: <Axes: >

[182]: Amount Pending State Tenure Interest Rate City \

Bounce String Disbursed Amount Loan Number Risk Label \

[24582 rows x 10 columns]

[183]: Cond1 = df['Months on Book'] <=3

values = ['Early Tenure', 'Late Tenure', 'Mid Tenure']

[204]: df['Tenure Label'].value_counts().plot(kind= 'bar')

[204]: <Axes: >

[184]: # Sort the DataFrame by 'Amount Pending'

# Calculate the cumulative sum of 'Amount Pending'

# Calculate the quantiles for the cumulative sum

# Display the distribution of borrowers in each cohort

Low Ticket Size 8194

[185]: Amount Pending State Tenure Interest Rate City \

Tenure Label Cumulative Sum Ticket Size Quantile

1.0.4 Task 4: Chhannel spend recommendations

# Firt EMI Customers

# Low EMI Customers

# Low or Medium EMI Customers

# English Speaking Customers

# Hindi Speaking Customers

# Low Bounce Behaviour Customers

[200]: # Define the cost of each channel

# Assign each borrower to a channel based on the defined criteria

return 'Whatsapp bot'

↪['Maharashtra', 'Madhya Pradesh'] or row['Risk Label'] == 'Medium Risk':

return 'Voice bot'

# Apply the function to assign channels

[200]: Amount Pending State Tenure Interest Rate City \

Bounce String Disbursed Amount Loan Number Risk Label \

Months on Book Tenure Label Cumulative Sum Ticket Size Quantile \

[24582 rows x 14 columns]

[201]: # Total cost for each channel

whatsapp_cost = df[df['Channel'] == 'Whatsapp bot'].shape[0] * 5

print(f"Total cost for Whatsapp Bot: {whatsapp_cost} rupees")

Total cost for Whatsapp Bot: 96640 rupees

You might also like