
Technical Audit / Proposal

31st July 2022

Executive Summary
  Gaps identified
  Kashat / TensorGraph engagement options:
    Consultancy agreement to support LFD close the identified gaps
    Technical Partnership to transition the LFD assets to TensorGraph

The LFD current ML approach
  Limitations : Summary
  Limitations : Details
    The good, the bad and the ugly
    Model performance provided by LFD
    LFD Model cost

TensorGraph Technical proposal
  Overview
  Model cost reduction

Executive Summary
TensorGraph is a Machine Learning Solution provider hired by Kashat to audit the Machine Learning
technology and processes built and supported by LFD.

The scope of this document is to present:

1. the technical gaps identified
2. Kashat/TensorGraph potential engagement options to close the identified gaps

The document does not cover legal or IP assessment.

Gaps identified
After reviewing the LFD ML models (Algo or classifier) and processes, we identified 10 gaps, which we
categorized by criticality:

High:
1. ML model optimization focused more on reducing the cost associated with the default rate than on
reducing the cost of lost opportunities. LFD and Kashat have not developed a
comprehensive KPI to track both costs.
2. The correlation between the quality of data coming from traffic generation (marketing campaigns) and
classifier performance is not analyzed or tracked.
3. No ML model versioning system is currently used.
4. No versioned gold test dataset is currently used.
5. No data partner or expert data labeller (professional underwriter) was engaged in building test
datasets, and monitoring ML model performance.

Medium:
6. No applicant profile similarity metrics were used to analyze LFD historical data
7. No systematic A/B testing was conducted for ML model benchmarking
8. No focused study for feature correlation analysis (affects classifier performance)

Low:
9. No academic partner
10. No focused study for feature reduction

We estimated the cost of using the current ML model to be around 22% of the amount disbursed.
Our calculation is based on:
● 10,000 loans/month
● Average ticket size of 400 EGP and interest rate of 28%
● Traffic with 43% bad applicants

The credit score of 55-60 chosen by LFD to achieve a default rate below 25% for new
customers (red line) comes at the expense of a high cost of lost opportunities (green line).

We believe that the ratio of the LFD model's total cost to the disbursed amount (estimated to be 22%)
will increase as you drive more traffic to the Kashat application, which will result in a higher
rejection rate and a higher customer acquisition cost. The high rejection rate in June 2022 could
be attributed to the issue explained above.

Kashat / TensorGraph engagement options:

● Consultancy agreement to support LFD close the identified gaps
Estimated cost: ~$15,000/month
Kashat will continue to use LFD as its ML vendor, and TensorGraph will support the Kashat team in
requesting and reviewing the right KPIs, processes, and artifacts from LFD.

● Technical Partnership to transition the LFD assets to TensorGraph
Timeline: 9 months for the transition, in three milestones
Estimated cost during transition: ~$30,000/month
Estimated cost post transition, up to 500,000 disbursed loans: ~$40,000 - 60,000/month
Estimated cost post transition, up to 1 million disbursed loans: ~$70,000 - 120,000/month
This is a high-level estimate; if there is interest in exploring it in further detail, TensorGraph will
prepare a detailed plan. TensorGraph has run a similar type of engagement, and it was very successful.
TensorGraph will help Kashat hire a small team (ML engineer, data scientist, and ML architect), and
the rest of the team will be part of the TensorGraph workforce.

The LFD current ML approach
The LFD model is based on a logistic regression classifier with ~220 features for the new customers.

X: loan requester
Y: [gender, marital_status, accounts_name, accounts_type, calls, ...., app_install_data_missing]
Z: score from 0 to 100 representing the likelihood of defaulting
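
To make the mapping from features to score concrete, the sketch below shows how a logistic regression could turn a feature vector into a 0-100 score. It is an illustration only, not LFD's implementation: the weights, the bias, the numeric encoding of the features, and the function name `default_score` are all hypothetical, and only two of the ~220 features are used.

```python
import math


def default_score(features, weights, bias):
    """Return a 0-100 score from a logistic regression over numeric features.

    The linear term is squashed by the sigmoid into a probability of default,
    then scaled to the 0-100 range.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return 100.0 * probability


# Hypothetical weights and applicant; the real model has ~220 features.
weights = {"calls": -0.02, "app_install_data_missing": 1.1}
applicant = {"calls": 40.0, "app_install_data_missing": 1.0}

score = default_score(applicant, weights, bias=-0.3)
print(round(score, 2))
```

Because the score is a monotonic function of a single weighted sum, each feature pushes the score in only one direction, which is what makes the model easy to explain and also what limits it on non-linear relationships.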

Limitations : Summary
After analyzing the artifacts provided by LFD, we noticed 3 main limitations:

Process limitation
Summary:
- Documentation readiness
- MLOps
- Iterative data-centric approach
- Accurate labeled data and gold standard
- Limited or incomplete KPIs
Impact on business:
- Harder to calculate actual AI capacity cost
- Harder to judge AI performance
- Harder to make an informed decision on conservativeness vs scaling

Strategy limitations
Summary:
- Low confidence in current AI-selected strategies/approaches to scale up and capture more
complex relationships for the credit bureau
Impact on business:
- High risk of losing potential good new customers (new business opportunities)
- Backed by a (simple) credit bureau

Performance limitations
Summary:
- New-customer classifier performance is ~62% accurate. To mitigate the risk, the chosen
threshold must be aggressive enough to reject any low-confidence client, which comes at the
expense of losing many potential good customers.
Impact on business:
- Cost of lost opportunity
- Cost of customer retention
- Marketing cost

Limitations : Details

The good, the bad and the ugly


The good
- Easy-to-implement algorithm
- Most popular white-box algorithm for credit scoring
- Easy to interpret and explain
- Very fast
- Performs very well if the data is linearly separable
- Has wide community support
- Doesn't require heavy infrastructure to be trained
- Good performance if the correlation between parameters is simple
The bad
- It is hard to capture complex and non-linear relationships using logistic regression.
- Poor handling of multicollinearity among variables
The ugly
LR mainly draws a linear decision boundary between the 2 classes assumed.
When the relationship between the variables is
- Complex
- Multicollinear
- Non-linear
the algorithm tends to fail to model the true relationship between the features and the outcome,
which makes it better suited to highly separable problems. For the algorithm to work well, it needs
- Deep study of feature importance
- Deep study of feature correlation
- Explicit feature selection and feature reduction
LR doesn't take into account differences in the historical circumstances governing the data
used to train a given version of the model. This limits the algorithm's performance and produces wrong
predictions when these circumstances change.
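
A feature correlation study of the kind listed above can start with something as simple as flagging strongly correlated feature pairs. The sketch below does this with a hand-written Pearson correlation. The 0.8 threshold, the helper names, and the toy columns (including the `sms` and `accounts` features) are assumptions made for illustration, not values taken from the LFD data.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy data: "calls" and "sms" move together, "accounts" does not.
columns = {
    "calls": [10, 20, 30, 40, 50],
    "sms": [11, 19, 32, 41, 48],
    "accounts": [3, 1, 4, 1, 5],
}

flagged = correlated_pairs(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Pairs flagged this way are candidates for feature reduction. Pairwise correlation does not detect multicollinearity that involves three or more features at once, so a fuller study would need an additional measure such as the variance inflation factor.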

Model performance provided by LFD

According to the results shared by LFD, it is clear that the algorithm is still struggling with new customer
datasets and features, which suggests the need for detailed:
- Feature importance analysis
- Feature correlation analysis
- Feature reduction and correlation resolution

LFD Model cost
Our calculation is based on:
● 10,000 loans/month
● Average ticket size of 400 EGP and interest rate of 28%
● Traffic with 43% bad applicants

Score     Cost of bad borrowers    Cost of lost opportunity    Total cost
0-5       2,216,960                0                           2,216,960.00
5-10      2,216,960                0                           2,216,960.00
10-15     2,216,960                0                           2,216,960.00
15-20     2,216,738                64                          2,216,801.81
20-25     2,204,915                613                         2,205,528.34
25-30     2,119,578                7,482                       2,127,059.64
30-35     1,879,732                38,903                      1,918,634.90
35-40     1,447,867                106,721                     1,554,588.38
40-45     1,033,516                182,402                     1,215,917.66
45-50     770,448                  247,720                     1,018,167.72
50-55     545,208                  322,656                     867,864.76
55-60     286,412                  425,645                     712,056.79
60-65     120,478                  518,706                     639,184.07
65-70     35,528                   584,684                     620,211.55
70-75     6,677                    619,029                     625,705.96
75-80     0                        633,472                     633,472.00
80-85     0                        635,040                     635,040.00
85-90     0                        635,040                     635,040.00
90-95     0                        635,040                     635,040.00
95-100    0                        635,040                     635,040.00
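
The sketch below re-derives the table's totals and checks them against the stated inputs. It rests on an assumption that the source does not state explicitly: each accepted bad applicant costs the ticket plus interest (400 + 112 = 512 EGP), and each rejected good applicant costs the forgone interest (112 EGP). Under that reading, the two end rows of the table imply 4,330 bad and 5,670 good applicants, which sums to the stated 10,000 loans/month and is close to the stated 43% bad traffic. All variable names are illustrative.

```python
TICKET_EGP = 400
INTEREST_PCT = 28

# Assumed per-applicant costs (inferred from the table, not stated in the source).
INTEREST_EGP = TICKET_EGP * INTEREST_PCT // 100  # 112 EGP per rejected good applicant
DEFAULT_LOSS_EGP = TICKET_EGP + INTEREST_EGP     # 512 EGP per accepted bad applicant

# (score band, cost of bad borrowers, cost of lost opportunity), copied from the table.
ROWS = [
    ("0-5", 2_216_960, 0), ("5-10", 2_216_960, 0), ("10-15", 2_216_960, 0),
    ("15-20", 2_216_738, 64), ("20-25", 2_204_915, 613),
    ("25-30", 2_119_578, 7_482), ("30-35", 1_879_732, 38_903),
    ("35-40", 1_447_867, 106_721), ("40-45", 1_033_516, 182_402),
    ("45-50", 770_448, 247_720), ("50-55", 545_208, 322_656),
    ("55-60", 286_412, 425_645), ("60-65", 120_478, 518_706),
    ("65-70", 35_528, 584_684), ("70-75", 6_677, 619_029),
    ("75-80", 0, 633_472), ("80-85", 0, 635_040), ("85-90", 0, 635_040),
    ("90-95", 0, 635_040), ("95-100", 0, 635_040),
]

# Applicant counts implied by the two extreme rows (accept everyone / reject everyone).
implied_bad = ROWS[0][1] / DEFAULT_LOSS_EGP   # 4330.0
implied_good = ROWS[-1][2] / INTEREST_EGP     # 5670.0

# Combined KPI: total cost per score band, and the band where it is lowest.
totals = {band: bad + lost for band, bad, lost in ROWS}
lowest_band = min(totals, key=totals.get)

# Share of the total cost that is lost opportunity at the 55-60 band.
chosen_bad, chosen_lost = next((b, l) for band, b, l in ROWS if band == "55-60")
lost_share_at_chosen = chosen_lost / (chosen_bad + chosen_lost)

print(implied_bad, implied_good, lowest_band, round(lost_share_at_chosen, 3))
```

Tracking both columns in one total is one simple form of the comprehensive KPI mentioned among the identified gaps. The totals here ignore any other constraint on the threshold, such as a target default rate.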

10
TensorGraph Technical proposal
Overview
Working towards the goal of minimizing the overall cost of AI classifiers while maximizing the revenue
generated using the system, we propose the following 3 pillars to track and maintain.

Model cost reduction


- Minimizing the cost of lost opportunities without compromising the risk level of the business
model
- Unlocking the potential of the AI credit bureau by using more advanced ML techniques
- Solid, explainable MLOps (ML management and operations process) to
  - Monitor
  - Track
  - Version
  - Explain
  - Iteratively enhance AI capacity
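
As one illustration of the "Track" and "Version" steps, the sketch below derives a deterministic version id from the content of a model's parameters and of a gold test dataset, and stores both next to the evaluation metrics. It is a minimal sketch under stated assumptions, not a description of TensorGraph's tooling: the function names, the record layout, and all sample values are hypothetical placeholders.

```python
import hashlib
import json


def content_version(payload):
    """Return a short, deterministic version id for JSON-serializable content.

    Keys are sorted so that the same content always hashes to the same id.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def registry_entry(model_name, model_params, gold_dataset, metrics):
    """Bundle a model version, the gold dataset version it was tested on, and its metrics."""
    return {
        "model": model_name,
        "model_version": content_version(model_params),
        "gold_dataset_version": content_version(gold_dataset),
        "metrics": metrics,
    }


# Hypothetical placeholders, not real model parameters, data, or results.
params = {"bias": -0.3, "weights": {"calls": -0.02}}
gold = [{"applicant_id": 1, "label": "good"}, {"applicant_id": 2, "label": "bad"}]

entry = registry_entry("new_customer_classifier", params, gold, {"accuracy": 0.5})
print(entry["model_version"], entry["gold_dataset_version"])
```

Because the ids are derived from content, any change to the parameters or to the gold dataset produces a new id, so two reported metrics can be compared only when their `gold_dataset_version` values match.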

We believe that the key to a successful AI-based commercial application/solution is not the outcome alone,
but also the MLOps process and the ecosystem governing the procedure that produces the final outcome,
the ML model.

It’s a collaborative process and a learning journey between all stakeholders:
- Kashat management
- Kashat Marketing team
- Kashat Underwriters
- Investors and board of directors

Given the above premises, we propose our MLOps iterative process of model production that will be used to
● Develop
● Analyze
● Interpret and explain
each model iteration, which could be summarized as follows.
