INDUSTRY INTERNSHIP REPORT
On
Exploring Banking Trends using Business Analytics
Report submitted in partial fulfillment of the requirement for the award of
Post Graduate Diploma in Management
(2022-2024)
Submitted by
Name: A. Sekhar
INTERNSHIP CERTIFICATE
DECLARATION
ACKNOWLEDGEMENTS
This project would not have been completed successfully without the help of
my mentor and guide, Professor Mr. T. Subash Tej, Assistant Professor, Dept.
of Data Sciences, who reviewed the entire project and provided invaluable
advice, ongoing support, helpful recommendations, and encouragement from
the beginning to the end.
TABLE OF CONTENTS
CHAPTER NO     CONTENT                              PAGE NO
               Industry and Brief Report on OJT     1-2
               Company Introduction                 3
Chapter I      Introduction                         5-6
Chapter II     Review of Literature                 8-10
Chapter III    Research Methodology                 12-13
Chapter IV     Data Analysis                        15-33
Chapter V      Findings and Conclusion              34-35
               References                           36
LIST OF FIGURES
SL.NO   TITLE OF THE FIGURE                                        PAGE NO
1       Histogram Distribution                                     15
2       Average Duration                                           16
3       Marital Status Distribution                                17
4       Trend Line of Age and Marriage                             18
5       Relationship Between a Campaign and Duration and Age       19
        Loan Duration Group by Age                                 20
6       Top 5 Bank Clients Age and Job Boxplot                     21
7       Distribution of Some Numerical Feature in Two Categories   22
8       Distribution of Customers by Months                        23
9       Line Chart                                                 25
10      Job Features by Clients                                    26
11      Marital Status                                             26
13      Education Feature                                          27
14      Loan                                                       28
15      Month                                                      28
16      Age and Balance Distribution                               29
17      Random Forest Model                                        30-31
18      Random Forest Model-2                                      32-33
INDUSTRY PROFILE:
The e-learning and knowledge-sharing industry has exploded from its early days of mail-
based courses and virtual classrooms to encompass online degrees and widespread adoption
by schools, businesses, and individuals alike. Driven by factors like increased internet access,
mobile learning, and upskilling needs, the market is projected to reach $375 billion by 2025.
Key segments include corporate training, education for all ages, and professional
development. Advancements in AI, VR, and personalized learning are shaping the future of
education, making it more accessible, engaging, and tailored to individual needs. Whether it's
for career advancement, skill development, or simply pursuing knowledge, e-learning offers a
flexible and effective solution for learners in today's dynamic world.
Historical Growth:
E-learning has evolved from early forms like distance courses and mail-based
learning to virtual classrooms and online degrees.
The term "e-learning" was coined in 1998 and its adoption has surged since then,
reaching 77% of companies by 2011.
Current Landscape:
E-learning is used by schools, colleges, businesses, and individuals for education and
training.
It offers benefits like accessibility, flexibility, cost-effectiveness, variety, and
engaging experiences.
The global market size is estimated to reach $375 billion by 2025, driven by factors
like increased internet penetration, mobile learning, and upskilling/reskilling needs.
Key Segments and Trends:
Key segments include corporate training, K-12 education, higher education, and
professional development.
Growth drivers include technological advancements like AI, VR, and AR,
microlearning, personalization, blended learning, and gamification.
Focus is on tailoring content to individual needs, providing bite-sized learning
modules, mobile learning, social learning, and AI-powered features.
Benefits for Learners:
ON JOB TRAINING (OJT)
About the company:
Mozo Hunt Pvt Ltd is a cloud-based digital publishing and distribution platform that helps
publishers, authors, students, teachers, content aggregators, service providers, educational
institutes, corporates, etc. to produce, import, sell, manage, and deliver content across
devices in digitally accessible formats, in a secure environment. Mozo Hunt supports rich,
interactive content, fixed-layout and reflowable ePub with rich media content, and provides a
seamless user experience in both online modes.
Mission:
Mozo Hunt’s mission is to empower individuals and organizations to easily access knowledge
and information. They strive to create a platform where users can discover, share, and learn
from each other in a collaborative environment.
Vision:
Mozo Hunt envisions a world where everyone has access to just the right amount of
educational and learning resources. They believe that knowledge should be immediately
available to everyone who seeks it, regardless of their background or circumstances.
Values:
Mozo Hunt is committed to the following values.
Innovation: They are constantly working to develop new and innovative ways to make
knowledge sharing more efficient and effective.
Quality: They are committed to providing users with high-quality information that is accurate,
reliable, and up to date.
Access: They believe that knowledge should be accessible to everyone regardless of their
background or circumstances.
Discussion: Users are encouraged to share their knowledge and expertise with others.
Community: Users are given a sense of community by providing a space where they can
connect and interact with each other.
Role and Responsibilities:
During my internship at Mozo Hunt Pvt Ltd as a Marketing and Business Analyst, I had the
opportunity to gain valuable experience in marketing research and analysis.
CHAPTER-1
INTRODUCTION
In the dynamic landscape of the banking industry, the utilization of advanced analytics has
become indispensable for uncovering valuable insights and navigating the evolving needs of
customers. Visual analytics, a potent tool in this realm, allows banks to explore trends and
patterns in customer data, offering a deeper understanding of client behavior. This research
delves into the extensive dataset obtained from Kaggle, employing visual analytics
techniques to shed light on crucial aspects of the banking sector.
The exploration begins with histograms and boxplots, revealing the distribution of customer
age and the correlation between customer duration and loan default status. These
visualizations uncover trends, emphasizing the significance of age and duration in predicting
loan default. Marital status distribution, presented through bar graphs and line graphs, further
enriches our understanding, showcasing the predominant presence of married individuals and
dynamic shifts in the distribution with age.
The exploration extends to the impact of marketing campaigns on different age groups,
illustrated through scatter plots. Additionally, the relationship between loan duration and
default status is analyzed, highlighting a nuanced correlation that demands attention. The
investigation delves into the top five client categories, offering insights into the distribution
of age and job roles through boxplots.
A heatmap emerges as a powerful tool, unveiling strong relationships between various
customer characteristics. The correlation between age and loan default, as well as income and
loan amount, surfaces as key insights with implications for strategic decision-making. The
study further explores the distribution of customers across months, suggesting potential
seasonal variations.
The analysis culminates in the development and utilization of Random Forest models,
emphasizing the role of machine learning techniques in predicting outcomes and discerning
patterns within the dataset. The findings, presented in Chapter 5, underscore essential trends,
including a positive correlation between age and loan default, the influence of marital status,
and the impact of larger loan amounts on default likelihood.
Changes in Banking:
Banks have changed considerably over time. Traditional ways of banking have given way to
dynamic methods backed by technology.
Understanding Customers:
In this busy market, understanding what customers want and how they behave is critical.
Banks are turning to tools that break down customer data.
risks efficiently. Loan default prediction has become a critical area of attention,
necessitating a deep dive into customer traits, demographics, and historical behaviors.
Strategic Decision-Making:
In an environment where strategic decision-making is fundamental to a bank's
success, visual analytics provides a holistic approach to interpreting complex
datasets.
CHAPTER-2
REVIEW OF LITERATURE
David Jonker; Scott Langevin; Peter Schretlen; Casey Canfield (2012)
This paper outlines the rapid development of Aperture, a specialized cyber situational
awareness and analysis application designed for the 2012 IEEE VAST Mini-Challenge 1
(MC1) on Cyber Situation Awareness. The noteworthy aspect of this project lies in its focus
on creating a tailored solution for a "big data" application. Aperture stands out as an open,
adaptable, and extensible Web 2.0 visualization framework, enabling the generation of
visualizations for analysts and decision-makers accessible through common web browsers.
The framework employs a unique layer-based approach to visualization assembly and
features a data mapping API, streamlining the transformation of data or analytic results into
visual representations with specific properties.
metrics. Additionally, we have developed a Data Visualization RShiny App for Data Science
and Management, specifically designed for customer churn analysis. This tool facilitates a
comprehensive understanding of the data trends, enabling the bank to proactively address
customer attrition and implement targeted retention efforts.
violations and the identification of multidimensional abnormalities. A real-world case study
involving a company providing accounting services and a healthcare industry client illustrates
the framework's application, specifically in improving the reliability of payroll audits. The
paper contributes to CCM literature by advocating for the use of machine learning and
interactive data visualization to address data overload for managers. It also presents evidence
supporting the economic and behavioral advantages of the proposed control monitoring
approach, showcasing how advanced technology enhances risk assessment, anomaly
identification, and loss prevention with increased efficiency and accuracy. Guidelines for
artifact production and utilization further contribute to the field of control monitoring.
CHAPTER-3
RESEARCH METHODOLOGY
OBJECTIVES
METHODOLOGY
The following methodology was used to achieve the objectives of this study:
DATA COLLECTION:
Secondary data was collected from various sources, such as bank and government websites and a
data repository platform. The data set contains 49,732 samples of customer data, including
age, gender, occupation, loan amount, and loan default status.
DATA PREPARATION
The data was prepared by removing null values, missing values, unnecessary columns, and
outliers. The following steps were taken:
1. Null value removal: Null values were removed from the data set.
2. Missing value imputation: Missing values were imputed using the mean or median of the
corresponding variable.
3. Unnecessary column removal: Unnecessary columns, such as the customer ID column, were
removed from the data set.
4. Outlier detection and removal: Outliers were detected using the interquartile range (IQR)
method. Observations were removed from the data set if they were more than 1.5 IQRs below the
first quartile or more than 1.5 IQRs above the third quartile.
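The preparation steps above can be sketched in pandas as follows. This is a minimal illustration, not the notebook code used in the study: the column names (customer_id, age, balance) and the toy values are assumptions, and the sketch imputes missing values with the median only.

```python
import pandas as pd

# Toy data standing in for the customer data set (all values are illustrative).
raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "age": [25, 31, None, 42, 38, 29, 35, 95],
    "balance": [1200.0, 800.0, 950.0, None, 1100.0, 700.0, 1000.0, 900.0],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the ID column, impute missing values, and remove IQR outliers."""
    out = df.drop(columns=["customer_id"])            # unnecessary column
    out = out.fillna(out.median(numeric_only=True))   # median imputation
    q1 = out.quantile(0.25)
    q3 = out.quantile(0.75)
    iqr = q3 - q1
    # Keep rows where every column lies within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    within = ((out >= q1 - 1.5 * iqr) & (out <= q3 + 1.5 * iqr)).all(axis=1)
    return out[within].reset_index(drop=True)


clean = prepare(raw)
print(clean)
```

In this toy data only the row with age 95 falls outside the 1.5 IQR fences, so it is the only row dropped.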
DATA ANALYSIS
Python was used to analyze the data in Jupyter Notebook. The analysis involved the following steps:
1. Exploratory data analysis (EDA): EDA was performed to understand the data
distribution and identify any patterns or trends. This was done by creating
histograms, boxplots, and correlation matrices.
2. Feature selection: Features that were most relevant to the research objectives were
selected for further analysis.
3. Model building: A regression model was built to predict loan default. The following
steps were taken to build the model:
The age distribution of customers is skewed towards the younger age groups.
There is a positive correlation between customer age and loan default status, meaning
that older customers are more likely to default on their loans.
The regression model was able to predict loan default with an accuracy of 75%.
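The model-building step can be illustrated with a short scikit-learn sketch. The report does not state which regression model was used, so logistic regression is an assumption here, as are the two features (age and duration) and the synthetic data; the accuracy printed by this sketch is unrelated to the 75% reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic features (assumed for the demo): customer age and duration.
age = rng.integers(18, 80, size=n)
duration = rng.integers(1, 60, size=n)

# Synthetic label: default probability rises with age and duration.
logit = 0.04 * (age - 45) + 0.03 * (duration - 30)
default = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([age, duration])
X_train, X_test, y_train, y_test = train_test_split(
    X, default, test_size=0.25, random_state=0
)

# Fit on the training split and score on the held-out split.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Hold-out accuracy: {accuracy:.2f}")
```

Accuracy is measured on a held-out quarter of the data so that the score reflects unseen customers rather than the rows the model was fitted on.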
CHAPTER-4
DATA ANALYSIS
Visual analytics is a powerful tool that can be used to explore banking trends and identify
patterns in customer data. By using visual analytics tools, banks can gain insights into their
customers' needs and preferences and develop strategies to improve their products and
services.
One way to use visual analytics to explore banking trends is to create histograms and
boxplots. Histograms show the distribution of a variable, such as customer age or loan
amount. Boxplots show the median, quartiles, and outliers of a variable. By creating
histograms and boxplots, banks can identify trends in their customer data and identify areas
where they can improve.
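The numbers behind a histogram and a boxplot can be computed directly, as in the NumPy sketch below. The ages are made-up values chosen for illustration, not data from the study.

```python
import numpy as np

ages = np.array([22, 25, 27, 29, 31, 33, 34, 38, 41, 45, 52, 60])  # illustrative

# Histogram: count customers in each 10-year age bin.
counts, edges = np.histogram(ages, bins=[20, 30, 40, 50, 60, 70])
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{int(lo)}-{int(hi) - 1}: {c}")

# Boxplot statistics: quartiles and the 1.5*IQR fences used to flag outliers.
q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]
print(f"median={median}, Q1={q1}, Q3={q3}, outliers={outliers.tolist()}")
```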
Another way to use visual analytics to explore banking trends is to create correlation
matrices. Correlation matrices show the correlation between different variables. For example,
a correlation matrix could show the correlation between customer age and loan default status.
By creating correlation matrices, banks can identify relationships between different variables
and develop strategies to improve their products and services.
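A correlation matrix of this kind can be produced with pandas, as sketched below. The column names and values are assumptions for illustration; default is encoded as 1 for customers who defaulted and 0 otherwise.

```python
import pandas as pd

# Illustrative records only; not taken from the study's data set.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 60, 44],
    "balance": [500, 1200, 3000, 2800, 1500, 700, 4000, 2100],
    "default": [0, 0, 1, 0, 0, 0, 1, 0],  # 1 = defaulted
})

# Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))
```

Each cell lies between -1 and 1, and the diagonal is always 1 because every variable is perfectly correlated with itself.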
Data columns
Histogram distribution
The histogram shows the distribution of customer age in a banking data set. The age
distribution is right-skewed, with a tail extending toward older ages, which indicates that
there are more younger customers than older customers. The most common age group is 20-29
years old, followed by the 30-39 years old age group.
The histogram also shows that there are some outliers, which are customers who are
significantly older or younger than the majority of customers. These outliers could be due to a
variety of factors, such as customers who have recently opened accounts or customers who
are closing their accounts.
Average Duration
The boxplot shows the distribution of customer duration by loan default status. The median
duration of customers who defaulted on their loans is higher than the median duration of
customers who did not default on their loans. This suggests that there is a positive association
between customer duration and loan default status, meaning that customers with longer
durations are more likely to default on their loans.
The boxplot also shows that there is more variability in the duration of customers who
defaulted on their loans than in the duration of customers who did not default on their loans.
This suggests that a wider range of factors contributes to loan default among customers
with longer durations.
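The comparison read from the boxplot (median and spread of duration per default group) can be reproduced numerically. The sketch below uses made-up values and assumed column names (default, duration).

```python
import pandas as pd

# Illustrative records only; not taken from the study's data set.
df = pd.DataFrame({
    "default": ["no", "no", "no", "no", "yes", "yes", "yes"],
    "duration": [120, 180, 150, 200, 300, 90, 420],
})

# Median and interquartile range of duration within each default group.
summary = df.groupby("default")["duration"].agg(
    median="median",
    iqr=lambda s: s.quantile(0.75) - s.quantile(0.25),
)
print(summary)
```

A higher median in the "yes" row corresponds to the higher box in the plot, and a larger IQR corresponds to the taller box.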
The bar graph reveals that the majority of bank customers in the dataset are married, with a
substantial count exceeding 25,000. Following closely, individuals classified as single
represent the second-largest group, while the divorced category shows a count surpassing
5,000. Consequently, it can be concluded that the predominant portion of the bank's customer
base consists of married individuals based on the provided data.
The line graph shows the relationship between age and marital status. The percentage of
people who are married increases with age, while the percentage of people who are single
decreases with age. The percentage of people who are divorced remains relatively constant
with age.
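The percentages plotted in such a line graph can be derived by binning age and normalizing the marital-status counts within each bin. The sketch below is illustrative: the ages, statuses, and 10-year bins are assumptions.

```python
import pandas as pd

# Illustrative records only; not taken from the study's data set.
df = pd.DataFrame({
    "age": [23, 27, 29, 34, 36, 41, 45, 52, 58, 63],
    "marital": ["single", "single", "married", "single", "married",
                "married", "divorced", "married", "married", "divorced"],
})

# Bin ages into 10-year groups, then compute the share of each status per group.
df["age_group"] = pd.cut(df["age"], bins=[20, 30, 40, 50, 60, 70], right=False)
shares = pd.crosstab(df["age_group"], df["marital"], normalize="index") * 100
print(shares.round(1))
```

Each row sums to 100, so the columns can be plotted as one line per marital status across the age groups.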
RELATIONSHIP BETWEEN A CAMPAIGN AND DURATION AND AGE
The scatter plot shows that the people in the group are relatively young. The median age is
30 years old and the 25th and 75th percentiles are 25 and 35 years old, respectively, so about
half of the group is aged 30 or below.
The scatter plot also shows that there are a few people in the group who are significantly
older or younger than the median age. These people are likely outliers, and their ages may be
due to a variety of factors, such as retirement or having children at a young age.
Loan duration group by age
The scatter plot shows that longer-term loans are more likely to default than shorter-term
loans. This is evident from the fact that there is a weak positive correlation between the two
variables.
There are a few possible explanations for this correlation. One possibility is that people are
more likely to take out longer-term loans if they have a lot of debt or if they are struggling to
make ends meet. This could mean that they are more likely to default on their loans, as they
may have difficulty making the monthly payments.
Another possibility is that lenders are more likely to offer longer-term loans to people who
have poor credit scores. This is because lenders are less likely to be able to recoup their losses
if the borrower defaults on a longer-term loan.
Top 5 bank clients age and job boxplot
The plot shows that among the top-5 client categories, the oldest customers are in the
management category, and the largest numbers of outliers are in the admin. and technician
categories.
A Heat Map allows you to look at the distribution of some numerical feature in two
categories. We visualize the distribution of clients on family status and the type of
employment.
Distribution of some numerical feature in two categories
The heatmap shows that there are a few strong relationships between different customer
characteristics in a banking data set.
One of the strongest relationships is between customer age and loan default status. Customers
who are older are more likely to default on their loans. This is likely due to a number of
factors, such as older customers being more likely to have health problems or being more
likely to have retired from their jobs.
Another strong relationship is between customer income and loan amount. Customers with
higher incomes are more likely to take out larger loans. This is likely due to customers with
higher incomes being able to afford to make higher monthly payments.
Distribution of customers by months
The boxplot shows that borrowers who take out larger loans are more likely to default on
those loans. This is evident from the fact that the median loan amount for customers who
defaulted on their loans is higher than the median loan amount for customers who did not
default on their loans.
There are a few possible explanations for this correlation. One possibility is that borrowers
who take out larger loans are more likely to be struggling financially in the first place. This
could mean that they may have difficulty making the monthly payments on their loans.
The dataset reveals that the majority of loan applicants, numbering 44,396, fall into the "NO"
category for loan default, while 815 applicants are classified as "YES" for default. In terms of
housing status, there are 25,130 applicants with households, and 20,081 without. Regarding
communication preferences, 29,285 applicants use cell phones, 13,020 have unknown
communication methods, and 2,906 prefer telephone communication.
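As a quick consistency check, the three breakdowns above should describe the same set of applicants. The sketch below uses only the counts quoted in the text; the dictionary keys are labels chosen here for readability.

```python
# Counts quoted in the text above.
default_counts = {"no": 44_396, "yes": 815}
housing_counts = {"with_household": 25_130, "without_household": 20_081}
contact_counts = {"cell_phone": 29_285, "unknown": 13_020, "telephone": 2_906}

# All three breakdowns should add up to the same number of applicants.
total = sum(default_counts.values())
totals_match = total == sum(housing_counts.values()) == sum(contact_counts.values())

# Share of applicants classified as "YES" for default.
default_rate = default_counts["yes"] / total
print(total, totals_match, f"{default_rate:.2%}")
```

The three totals agree at 45,211 applicants, and defaulters make up about 1.8% of them. A split this uneven is worth keeping in mind when reading accuracy figures for a default-prediction model.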
The line chart shows that borrowers who are taking out loans are becoming more indebted,
and that this may be increasing the risk of default. This is evident from the fact that the
average loan amount for customers who defaulted on their loans has been increasing over
time, while the average loan amount for customers who did not default on their loans has
remained relatively constant.
There are a few possible explanations for this trend. One possibility is that borrowers are
taking out larger loans to finance major purchases, such as a home or a college education.
These types of purchases can be expensive, and borrowers may need to take out larger loans
to afford them.
Job features by clients
Marital Feature
The bank was more interested in married and single people than in divorced people. The three
categories are presented in descending order. The number of samples is directly related to the
target column: the larger number of "married" samples meant more subscribers.
Education Feature
More people with higher education degrees subscribed, a proportional relationship. More
secondary-education profiles meant more term deposits were sold.
Default Feature
A high proportion of non-defaulters corresponds to the total of term deposit takers. It seems
reasonable that people with credit do not want to subscribe to a new bank offer.
Loan
As with housing loans, people without a personal loan were more willing to take a term deposit
(a higher proportion than for housing loans). Only a few people with a personal loan decided to
subscribe, a directly proportional relationship.
Month
May had the highest number of bank customers, followed by July, and then the remaining
months.
Random Forest Model
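A Random Forest classifier of the kind named in this section can be sketched with scikit-learn as follows. The three features, the synthetic data, and the hyperparameters are assumptions for illustration, not the configuration used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600

# Hypothetical numeric features: age, balance, and duration.
X = np.column_stack([
    rng.integers(18, 80, size=n),
    rng.normal(1500, 800, size=n),
    rng.integers(10, 900, size=n),
])
# Synthetic target driven mainly by duration, for demonstration only.
y = (X[:, 2] + rng.normal(0, 150, size=n) > 500).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit an ensemble of 100 decision trees on the training split.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

rf_accuracy = accuracy_score(y_test, forest.predict(X_test))
importances = dict(zip(["age", "balance", "duration"], forest.feature_importances_))
print(f"Hold-out accuracy: {rf_accuracy:.2f}")
print(importances)
```

The feature importances sum to 1 and give a rough ranking of which inputs the forest relied on. In this synthetic setup duration dominates because the target was generated from it.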
Random Forest Model 2
CHAPTER -5
FINDINGS AND CONCLUSION
1. Bank customer age distribution skews towards younger age groups, with 20-29 years old
being the most common, followed by 30-39 years old.
2. Positive correlation between customer age and loan default status; older customers are more
likely to default on loans.
3. Regression model developed for loan default prediction with 75% accuracy.
7. Scatter plot suggests majority of people targeted by campaign are younger than 30 years
old.
9. Management represents most senior customers among top 5 client categories; admin and
technician categories have higher number of outliers.
10. Strong relationships between customer age and loan default, as well as between customer
income and loan amount, identified through heat map analysis.
12. Average loan amount for defaulters increasing over time, potentially increasing default
risk.
13. May has highest number of bank customers, followed by July, indicating potential
seasonal variations.
14. Bank shows interest in married and single individuals with proportional relationship.
15. People with higher education degrees more likely to subscribe to term deposits.
Conclusion
The analysis of the bank's customer data points to several noteworthy trends. The customer
base is predominantly younger, with a focus on the 20-29 age group. However, outliers
indicate a varied age distribution.
A key finding is the positive correlation between customer age and loan default, supported by
a regression model with a 75% accuracy. This suggests that older customers are more prone
to defaulting on their loans.
The duration of customer relationships also plays a role, with defaulters showing a longer
median duration. This implies a potential link between prolonged customer relationships and
a higher likelihood of loan default.
Marital status analysis reveals a majority of married customers, increasing with age. The
bank appears to target both married and single individuals, with divorced individuals forming
a smaller portion of the customer base.
The campaign seems geared toward a younger demographic, as indicated by the scatter plot.
Additionally, there's a weak positive correlation between longer-term loans and loan default,
emphasizing the importance of considering loan duration in risk assessment.
Top client categories highlight that management represents the most senior customers, with
certain job categories showing financial variability.
Heat map analysis underscores strong relationships between customer age and loan default,
as well as between customer income and loan amount. Larger loans are associated with a
higher default likelihood, and the average loan amount for defaulters has been increasing over
time.
May consistently has the highest customer numbers, potentially indicating seasonal
variations. Furthermore, individuals with higher education levels are more likely to subscribe
to term deposits.
REFERENCES
1. Anil Lamba, "Uses Of Different Cyber Security Service To Prevent Attack On Smart
Home Infrastructure", International Journal for Technological Research in Engineering,
Volume 1, Issue 11, pp. 5809-5813, 2014.
2. https://en.wikipedia.org/wiki/Mobile_banking
3. Dr. (Smt.) Rajeshwari M. Shettar, "Digital Banking an Indian Perspective", IOSR Journal
of Economics and Finance (IOSR-JEF), vol. 10, no. 3, 2019, pp. 01-05.
4. https://www.rbi.org.in/
5. https://builtin.com/blockchain
6. https://hbr.org/2017/01/the-truth-about-blockchain
7. https://www.npci.org.in/what-we-do/upi/product-overview
8. https://ibsintelligence.com/ibsi-news/5-applications-of-artificial-intelligence-in-banking/
9. https://ibsintelligence.com/product/applications-of-artificial-intelligence-in-banking-2021-2/
10. https://bfsi.economictimes.indiatimes.com/
11. Raghavendra Nayak, "A Conceptual Study on Digitalization of Banking - Issues and
Challenges in Rural India", International Journal of Management, IT and Engineering, 2018.
12. K. Suma Vally and K. Hema Divya, "A Study on Digital Payments in India with
Perspective of Consumer's Adoption", International Journal of Pure and Applied
Mathematics, 2018.