Business Analytics Project


BY
Meghana Anaparthy
Netid: ma6781
Course Id: MG-GY 8413 Business Analytics
Mentor: Prof. Mukul Pareek
Date: 04/29/2022

Contents
Overview
Analysis
Deeper Analysis and proposal
Technical explanations
Summary

Analytics report

To: President of the Association


Company: Big Bangs Association
Branch: New York
Date: 4/29/2022

Overview:
The Consumer Financial Protection Bureau (CFPB) is a U.S. government agency that collects consumer
complaints about financial products and services from various sources and routes them to the
companies concerned to ensure that they respond. We took the complaints data published on the CFPB
website and analyzed it to draw meaningful insights for the association. This report presents the
positives and negatives found in that analysis, examines in depth the complaints against the
Big Bangs Association companies, and closes with a summary and suggestions for reducing the cost
of resolution.
The central aim of the report is to propose a model that saves the association a substantial
amount of money and reduces the number of complaints that end up in dispute.

Analysis:
Based on the data from the CFPB website, we drew some conclusions about the association and made
recommendations for its member companies to consider.
Below are a few of the insights we discerned from the data:

1. Timely response: One positive finding is that the companies responded to most complaints in a
timely manner. A timely response can be critical for customers complaining about a product or
service; negligence or a late response can lead to lost customers and, eventually, to financial
loss. However, the analysis also surfaced other aspects that the association needs to
address.

2. Company response to the consumer: Most complaints were closed with an explanation. This
suggests that the companies take consumer complaints seriously and are interested in
resolving them.

3. The mode of submission: Most complaints were submitted through the web, and far fewer by
email.

4. Emotion of customers: After performing sentiment analysis on the customer complaint
narratives, we found that 52.4% of words were negative, 47.6% were neutral, and 0% were
positive. This is expected for complaint text and mainly confirms the obvious.

However, there are several things the association needs to focus on; a few are listed below:
1. The focus company: Of the five companies, Bank of America, National Association received
the most complaints, with a count of 65,440, and U.S. Bancorp the fewest, with a count of
just 12,198.

2. The focus product/service: From the visualization, the Mortgage product/service received
the most complaints, with a count of 101,680, whereas the second-highest count is just
44,594, less than half that of Mortgage.

3. The focus issues: Most complaints fell under the issue category Loan modification,
collection, and foreclosure.

4. The focus regions: The heat map shows the states where most complaints came from. The
top 10 states were California, Florida, New York, Texas, New Jersey, Georgia, Illinois,
Maryland, Pennsylvania, and Virginia.

Deeper Analysis and proposal:


The chart below shows the number of complaints that customers disputed. Even though the number
looks small compared with the non-disputed complaints, if the association spends an extra $1,500
on every disputed complaint, the total comes to $67,354,500 (1,500 × 44,903).
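
The arithmetic behind this figure can be checked with a short Python sketch. The two inputs come straight from the report; the function name is only illustrative.

```python
# Figures quoted in the report: 44,903 disputed complaints,
# each costing the association an extra $1,500.
EXTRA_COST_PER_DISPUTE = 1_500
DISPUTED_COMPLAINTS = 44_903


def total_dispute_cost(n_disputed: int, cost_per_dispute: int) -> int:
    """Return the extra spend attributable to disputed complaints."""
    return n_disputed * cost_per_dispute


total = total_dispute_cost(DISPUTED_COMPLAINTS, EXTRA_COST_PER_DISPUTE)
print(f"${total:,}")  # $67,354,500
```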

A proposed solution to reduce the cost:


1. Machine learning algorithms can predict which complaints are likely to be disputed, so the
companies can address the issue beforehand; prevention is better than cure.
2. We built a model using the XGBoost algorithm that lowers the complaint-related costs.
3. Our priority is to have few false negatives, i.e., to catch as many of the complaints that
are prone to dispute as possible. We therefore aim for high recall, which is what cuts the
expenses.
4. We adjust the classification threshold, i.e., the predicted probability above which a
complaint is flagged as likely to be disputed, to achieve the required recall at low cost. To
illustrate, XGBoost with the default threshold costs $6,350,010 after prediction, whereas the
same algorithm with a tweaked threshold (say 0.13) reduces the cost to $2,914,710, a reduction
of 54.10%.
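
As a minimal sketch of the idea in points 3 and 4, the Python below shows how lowering the threshold raises recall, and recomputes the percentage reduction from the two costs quoted above. The labels and probabilities are made-up toy values, not the CFPB data, and the helper names are hypothetical. The 0.5 default is an assumption about the usual classifier default; the report does not state it.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, FN, TN when flagging every probability >= threshold."""
    tp = fp = fn = tn = 0
    for actual, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if flagged and actual == 1:
            tp += 1
        elif flagged and actual == 0:
            fp += 1
        elif not flagged and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of truly disputed complaints that the model flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (hypothetical): 1 = complaint was disputed, 0 = it was not.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.62, 0.40, 0.18, 0.09, 0.22, 0.14, 0.05, 0.55]

tp_hi, fp_hi, fn_hi, tn_hi = confusion_counts(y_true, y_prob, 0.5)   # assumed default
tp_lo, fp_lo, fn_lo, tn_lo = confusion_counts(y_true, y_prob, 0.13)  # lowered threshold

recall_default = recall(tp_hi, fn_hi)  # misses two of the three disputes
recall_tuned = recall(tp_lo, fn_lo)    # catches all three, with more false positives

# Percentage reduction between the two costs quoted in the report.
cost_default, cost_tuned = 6_350_010, 2_914_710
reduction_pct = (cost_default - cost_tuned) / cost_default * 100
print(f"{reduction_pct:.2f}%")  # 54.10%
```

The lower threshold trades precision for recall: more complaints are flagged unnecessarily, but fewer disputes are missed.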

Technical explanations:
1. The dataset was cleaned by removing null and unwanted values.
2. We chose consumer disputed as the target variable (Y) and grouped the relevant remaining
variables into a feature set.
3. We tried several algorithms: XGBoost, logistic regression, and random forest. Below are the
results before and after altering the threshold.
4. After comparison, the best model was XGBoost with a threshold between 0.1 and 0.2,
preferably 0.13.
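
One way to search for the cost-minimizing threshold is to sweep a grid of candidate values and total the cost at each. The sketch below assumes a simple cost model that is not taken from this report: every flagged complaint costs a hypothetical $100 for early intervention, and every missed dispute costs the $1,500 quoted earlier. The data are toy values and the function names are illustrative.

```python
def total_cost(y_true, y_prob, threshold, cost_missed=1_500, cost_flag=100):
    """Total cost at one threshold under the assumed (hypothetical) cost model."""
    cost = 0
    for actual, prob in zip(y_true, y_prob):
        if prob >= threshold:
            cost += cost_flag      # flagged: pay for early intervention
        elif actual == 1:
            cost += cost_missed    # missed dispute: pay the full dispute cost
    return cost


def best_threshold(y_true, y_prob, thresholds):
    """Return the (threshold, cost) pair with the lowest total cost."""
    costs = [(t, total_cost(y_true, y_prob, t)) for t in thresholds]
    return min(costs, key=lambda pair: pair[1])


# Toy data (hypothetical): 1 = complaint was disputed, 0 = it was not.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.62, 0.40, 0.18, 0.09, 0.22, 0.14, 0.05, 0.55]

# Candidate thresholds 0.05, 0.10, ..., 0.95.
grid = [i / 100 for i in range(5, 96, 5)]
threshold, cost = best_threshold(y_true, y_prob, grid)
print(threshold, cost)  # 0.1 600
```

Because a missed dispute is assumed to cost far more than an intervention, the minimum lands at a low threshold, which matches the direction of the result reported here.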

XGBoost results with default threshold:

XGBOOST CONFUSION MATRIX WITH DEFAULT THRESHOLD

COST CALCULATION AND RESULT WITH DEFAULT THRESHOLD



XGBoost results after tweaking threshold:

LEAST POSSIBLE COST WITH HIGH RECALL VALUES

MINIMUM-COST POINT: (0.13, $2,914,710)
THIS PLOT SHOWS WHICH THRESHOLD VALUE GAVE US THE MINIMUM COST

CONFUSION MATRIX AT THRESHOLD 0.13



5. ROC-AUC curve: The curve below looks average because we compromised on precision to achieve
high recall. This means we did a good job of reducing the false-negative rate, which was our
primary focus.

6. Logistic regression gave results almost as good as XGBoost, but XGBoost edged it out with a
slightly lower minimum cost. Logistic regression is therefore the second-best option.
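
For reference, the area under the ROC curve can be computed without a plotting library: it equals the probability that a randomly chosen disputed complaint receives a higher score than a randomly chosen non-disputed one. The sketch below applies that definition to toy scores, not to the model's actual output, and the function name is illustrative.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = complaint was disputed, 0 = it was not.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.62, 0.40, 0.18, 0.09, 0.22, 0.14, 0.05, 0.55]

auc = roc_auc(y_true, y_score)
print(round(auc, 2))  # 0.6
```

The pairwise loop is quadratic in the number of complaints, so it suits small checks only; a library routine would be the practical choice for the full dataset.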

Summary:
The complaints data published on the CFPB website helped us analyze the situation of the five
companies that are part of the Big Bangs Association. We suggest that the association make
changes and focus on the areas highlighted in the visualizations to improve its operations and
reduce incoming customer complaints. We also suggest that the association use our XGBoost model
with a tweaked threshold to predict early which complaints may end up in dispute and save a
significant amount of money.
