(PROJECT PHASE- I)
submitted in partial fulfillment of the requirements
for the award of the degree in
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
by
DEPARTMENT OF
COMPUTER SCIENCE AND ENGINEERING
DECEMBER 2023
FORM NO. - F/ EP - E & T / 041 Rev.00 Date 01.01.2014
DECLARATION
1.
2.
DATE:
3.
BONAFIDE CERTIFICATE
This is to certify that this Project Report (Project Phase-I) is the bonafide work of Mr./Ms.
G.BHANU PRAVEEN Reg. No 201191101016, Mr./Ms. CV SAKETH Reg. No
201191101059, Mr./Ms. N. SAMBA SIVA REDDY Reg. No 201191101038 who carried out
the project entitled “CUSTOMER CHURN PREDICTION USING MACHINE
LEARNING” under our supervision from June 2023 to Dec 2023.
ACKNOWLEDGEMENT
We would first like to thank our beloved Chancellor Thiru. Dr. A.C. Shanmugam, B.A.,
B.L., President Er. A.C.S. Arunkumar, B.Tech., and Secretary Thiru A. Ravikumar for
all the encouragement and support extended to us during the tenure of this project. We also
express our heartfelt thanks to our Head of the Department, Prof. Dr. S. Geetha, who
has been actively involved and very influential from the start till the completion of our
project.
Our sincere thanks to our Project Coordinators Dr. T.V Ananthan & Fawaz Abdulkader
and Project guide C.SUBALAKSHMI for their continuous guidance and encouragement.
We would also like to thank all the teaching and non-teaching staff of the Computer Science and
Engineering department for their constant support and the encouragement given to us.
ABSTRACT
In an era of intense competition and rapidly evolving markets, businesses face the critical
challenge of retaining their customer base. Customer churn, the phenomenon of customers
discontinuing their engagement with a company, has a significant impact on a company's
revenue and profitability. This project addresses the pressing need for businesses to
proactively identify potential churners and implement targeted retention strategies.
The objective of this project is to develop and implement a machine learning-based solution
for customer churn prediction. We leverage a diverse dataset comprising customer
demographics, transaction history, and behavioral data. Various machine learning
algorithms, including logistic regression, decision trees, random forests, and gradient
boosting, are explored and compared for their effectiveness in predicting customer churn.
The project employs rigorous data preprocessing techniques to handle missing values,
outliers, and feature scaling. Feature engineering is performed to create meaningful variables
that enhance the predictive power of the models. Furthermore, model evaluation metrics
such as accuracy, precision, recall, F1-score, and ROC-AUC are utilized to assess the
performance of the churn prediction models.
The outcomes of this project promise to illuminate the effectiveness of machine learning in
predicting customer churn. By doing so, they will provide businesses with invaluable
insights, enabling them to take prompt and well-informed actions to retain at-risk customers.
Furthermore, the project will discuss the real-world applicability, practical implications, and
implementation challenges associated with integrating churn prediction models into business
operations.
In conclusion, this project underscores the importance of data-driven decision-making in
customer retention strategies and demonstrates the potential of machine learning in
identifying and mitigating customer churn. The insights gained from this project serve as a
valuable resource for businesses seeking to enhance customer satisfaction and reduce
revenue loss due to churn.
1. INTRODUCTION
1.1 INTRODUCTION
In today's highly competitive business landscape, customer loyalty and retention are
paramount to the sustained success of any enterprise. As companies strive to adapt to
evolving consumer preferences and market dynamics, understanding and mitigating
customer churn has become a critical imperative. Customer churn, the phenomenon where
customers cease their engagement with a company or brand, can inflict substantial financial
losses and disrupt growth trajectories.
In response to this challenge, organizations are turning to data-driven strategies to identify
and retain customers at risk of churning. Machine learning, a powerful subset of artificial
intelligence, has emerged as a valuable tool for predicting customer churn. By harnessing
the vast amounts of customer data generated through transactions, interactions, and behavior,
businesses can proactively identify signs of disengagement and implement targeted retention
efforts.
This project centers on the development and implementation of a machine learning-based
solution for customer churn prediction. We delve into a diverse array of data sources,
including customer demographics, historical transaction records, and behavioral patterns, to
discern meaningful patterns and early warning signals. Leveraging a range of machine
learning algorithms and data preprocessing techniques, we aim to construct predictive
models capable of forecasting customer churn with accuracy and precision.
This report chronicles our journey in building a customer churn prediction system. We
discuss the dataset and data preprocessing steps undertaken to ensure the quality and integrity
of our analysis. Subsequently, we delve into the application of various machine learning
algorithms and model evaluation metrics to assess the efficacy of our churn prediction
models.
The insights garnered from this project have far-reaching implications for businesses of all
sizes and sectors. By harnessing the predictive power of machine learning, companies can
not only identify customers at risk of churning but also tailor retention strategies that resonate
with individual preferences and needs. Ultimately, this project underscores the
transformative potential of data analytics and machine learning in empowering businesses to
make informed decisions and secure the loyalty of their valued customers.
1.2 Importance of the Project:
Customer churn prediction using machine learning holds immense significance in today's
business landscape. Several key factors underscore its importance:
Revenue Protection: Customer churn can result in significant revenue losses for
businesses. Identifying potential churners early enables companies to take proactive
measures to retain valuable customers.
Competitive Edge: Businesses that can successfully reduce churn and improve customer
retention gain a competitive advantage in crowded markets.
Data-Driven Decision-Making: The project aligns with the contemporary trend of data-
driven decision-making, where insights from data analytics inform strategic actions.
Operational Efficiency: By proactively addressing churn, businesses can reduce the strain
on customer service and support teams. Fewer customer complaints and issues translate
into improved operational efficiency.
Risk Mitigation: Churn prediction also serves as a risk mitigation strategy. By identifying
and addressing potential churners, businesses can prevent revenue losses and financial
instability.
Strategic Insights: The project generates strategic insights into customer behavior,
preferences, and pain points. This information can guide product development, marketing
campaigns, and service improvements.
1.3 OBJECTIVES
Our foremost objective in this Customer Churn Prediction project is to significantly reduce
customer churn within our organization. This entails developing advanced machine
learning models that can effectively predict which customers are most likely to discontinue
their relationship with our services or products. By identifying potential churners before
they actually leave, we can proactively engage with them and take targeted actions to retain
their loyalty.
Our project also seeks to optimize marketing efforts and resource allocation. By directing
our marketing campaigns towards customers identified as high-risk for churn, we aim to
improve the efficiency of our marketing spend and enhance the return on investment
(ROI). We plan to conduct A/B testing to assess the effectiveness of different marketing
strategies and use data-driven insights to inform our decision-making.
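One possible way to carry out the A/B assessment mentioned above is a two-proportion z-test on the retention rates of two campaign groups. The sketch below is illustrative only: the function name, the group sizes, and the counts are assumptions, not project results.

```python
import math

def two_proportion_z_test(retained_a, n_a, retained_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups retain at the same rate."""
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    # Pooled retention rate under the null hypothesis
    p_pool = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 1000 high-risk customers per campaign variant
z, p_value = two_proportion_z_test(retained_a=700, n_a=1000, retained_b=745, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that the difference in retention between the two campaigns is unlikely to be due to chance alone; the significance level to use is a business decision.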
2. SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
In the existing system, basic analytics are performed on the collected customer data, primarily focusing on descriptive
analytics. These analyses provide insights into customer behavior, including metrics like
customer retention rates and average customer lifetime. However, the existing system often
lacks the capability to predict customer churn or proactively intervene before customers
decide to discontinue their relationship with the organization.
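To make the descriptive metrics above concrete, here is a minimal sketch that computes a retention rate and an average customer lifetime with pandas. The column names (`tenure_months`, `churned`) and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical customer table: one row per customer
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "tenure_months": [2, 30, 12, 48, 8],
    "churned": [True, False, True, False, False],
})

# Retention rate: share of customers who have not churned
retention_rate = 1 - customers["churned"].mean()

# Average customer lifetime, measured here as mean tenure in months
average_lifetime = customers["tenure_months"].mean()

print(f"Retention rate: {retention_rate:.0%}")
print(f"Average lifetime: {average_lifetime:.1f} months")
```

Figures like these summarize what has already happened; they do not indicate which individual customers are likely to leave next.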
Customer segmentation is a common practice within the existing system, where customers
are grouped based on criteria such as demographics, purchase behavior, or geographic
location. This segmentation helps tailor marketing efforts and customer service interactions
to some extent, but it may not be highly granular or predictive.
Marketing campaigns are executed based on general customer segments and historical data.
However, these campaigns may not be highly targeted or personalized, missing the
opportunity to engage customers in a more tailored manner.
In summary, the existing system provides valuable customer data and insights but falls short
in terms of predictive capabilities and personalized interventions to reduce churn effectively.
Implementing a more advanced Customer Churn Prediction project, leveraging machine
learning and predictive analytics, can enhance the organization's ability to identify at-risk
customers and take proactive measures to retain their loyalty.
2.1.1. DISADVANTAGES
Lack of Predictive Power: One of the primary disadvantages of the existing system is its
limited predictive capability. It typically relies on historical data and basic analytics, which
may not effectively predict future customer churn. As a result, organizations may miss
opportunities to intervene and retain customers before they decide to leave.
Reactive Approach: The existing system often follows a reactive approach to churn
management. It identifies churn after it has occurred but lacks the ability to proactively
prevent it. This can result in customer losses that could have been avoided with more
advanced predictive models.
Limited Personalization: Customer segmentation and marketing efforts are based on broad
criteria, resulting in limited personalization. Customers may receive generic marketing
campaigns and interactions, reducing their engagement and loyalty to the organization.
Manual Reporting: Reports generated by the existing system are typically created
manually. This process can be time-consuming, prone to errors, and may not provide real-
time insights needed for agile decision-making.
2.2 PROPOSED SYSTEM
The proposed system for managing customer churn is a substantial advancement over the
existing system. At its core, it introduces predictive churn models that use historical
customer data, behavior patterns, and machine learning algorithms to forecast which
customers are at heightened risk of churning in the near future. These predictions enable
personalized retention strategies and real-time monitoring, allowing for timely interventions.
Marketing efforts become more targeted and efficient, while customer service interactions
are enhanced. The proposed system also provides agility in responding to evolving customer
needs and shifts in behavior. Together, these capabilities help organizations reduce churn,
enhance customer satisfaction, and optimize resource allocation in a highly competitive
market.
3. REQUIREMENTS SPECIFICATION
REQUIREMENT SPECIFICATION:-
Requirement specification is a critical step in the software development process, whether it's
for a customer churn prediction system or any other project. It involves defining both user
and system requirements, as well as considerations for both development and runtime
environments.
These software requirements form the foundation for building a robust and efficient
Customer Churn Prediction system using machine learning. The specific tools and
technologies chosen may vary based on the project's complexity, budget, and the
development team's expertise.
3.2 HARDWARE REQUIREMENTS:-
The hardware requirements for a Customer Churn Prediction project using machine learning
will depend on various factors, including the size of your dataset, the complexity of your
machine learning models, and your performance expectations.
CPU: A multicore CPU (e.g., Intel Core i5 or AMD Ryzen) is recommended for training
machine learning models efficiently. More cores can significantly speed up model training,
especially for large datasets.
RAM: The amount of RAM you need depends on the size of your dataset and the complexity
of your models. At a minimum, 8 GB of RAM is recommended, but for larger datasets and
complex models, 16 GB or more may be necessary.
Storage: Fast storage is crucial for handling large datasets and model checkpoints
efficiently. A Solid State Drive (SSD) is recommended for better read and write speeds.
Allocate enough storage space for your data, model files, and any necessary backups.
GPU (Graphics Processing Unit): If your machine learning models are computationally
intensive, consider using a GPU for model training. GPUs can significantly accelerate
training times. NVIDIA GPUs are commonly used for deep learning tasks.
The specific GPU model will depend on your budget and requirements. For deep learning,
NVIDIA's GeForce RTX or NVIDIA Tesla GPUs are popular choices.
Security: Implement security measures to protect your hardware and data, including
firewalls, antivirus software, and access controls.
Cooling: Machine learning tasks can put a significant load on the CPU and GPU. Adequate
cooling, such as high-quality fans or liquid cooling systems, can help maintain system
stability during extended training sessions.
Backup and Redundancy: Implement a backup solution for critical data and ensure system
redundancy to minimize downtime in case of hardware failures.
It's essential to monitor system performance during development and adjust hardware
resources as needed to meet project goals and performance targets.
4. DESIGN
The design process involves the following steps:
❖ Data Collection
❖ Feature Engineering
❖ Model Selection
❖ Model Validation
4.1.1 Data collection
Data collection is the foundational step where relevant customer data is gathered from
various sources. These sources can include customer databases, transaction records,
feedback forms, and external data sets. Data quality is crucial during this phase, as it
impacts the accuracy of subsequent analysis. Once collected, the data needs to be cleaned
and preprocessed to handle missing values, outliers, and inconsistencies. After cleaning,
data from different sources may be integrated into a single, comprehensive dataset, which
is then stored in a structured format in a data repository or database for analysis.
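The cleaning and integration steps described above could look roughly like the following pandas sketch. The two source tables, their column names, and the choice of median imputation and IQR-based clipping are assumptions made for illustration; the project may handle these cases differently.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data from two sources, keyed by customer_id
profiles = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "tenure": [5, 40, np.nan, 12, 24, 3, 60, 18],
})
billing = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6, 7, 8],
    # The last value is an obvious outlier
    "monthly_charges": [29.9, 70.5, 55.0, 45.0, 60.0, 80.0, 35.0, 9999.0],
})

# Integrate the sources into a single dataset
df = profiles.merge(billing, on="customer_id", how="inner")

# Missing values: fill numeric gaps with the column median
df["tenure"] = df["tenure"].fillna(df["tenure"].median())

# Outliers: clip values that fall more than 1.5 * IQR outside the quartiles
q1, q3 = df["monthly_charges"].quantile([0.25, 0.75])
iqr = q3 - q1
df["monthly_charges"] = df["monthly_charges"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(df)
```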
4.1.2 Feature Engineering
Feature engineering involves preparing the data for modeling by selecting, transforming,
or creating relevant features (variables). This process includes identifying and selecting
features that are likely to have an impact on churn prediction. It also encompasses feature
transformation, such as one-hot encoding for categorical variables or scaling for numerical
features, to ensure compatibility with machine learning algorithms. Additionally, feature
creation may involve generating new features based on domain knowledge or mathematical
transformations to enhance model performance. Finally, assessing the importance of
features using techniques like feature importance scores helps in understanding which
variables contribute significantly to churn predictions.
4.1.3 Model Selection
Model selection is a critical step in choosing the right machine learning algorithms for churn
prediction. Organizations may consider a variety of models, such as logistic regression,
decision trees, random forests, support vector machines, or neural networks, based on the
nature of their data and the problem at hand. The selection process involves evaluating
candidate models using appropriate evaluation metrics (e.g., accuracy, precision, recall, F1-
score, ROC AUC) on validation data. Ensemble methods, like stacking or boosting, may
be employed to combine multiple models to achieve improved predictive performance. It's
also essential to assess the interpretability of the selected models to gain insights into the
factors driving churn predictions.
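The comparison of candidate models on several metrics can be sketched as follows. The data here is synthetic (generated with `make_classification` as a stand-in for a churn dataset), and the three candidates and their settings are assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for a churn dataset (about 25% positives)
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.75, 0.25], random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]

results = {}
for name, model in candidates.items():
    scores = cross_validate(model, X, y, cv=5, scoring=metrics)
    # Average each metric over the five validation folds
    results[name] = {m: scores[f"test_{m}"].mean() for m in metrics}

for name, row in results.items():
    print(name, {m: round(v, 3) for m, v in row.items()})
```

With imbalanced churn data, accuracy alone can look good for a model that rarely predicts churn, which is why recall, F1, and ROC AUC are reported alongside it.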
4.1.4 Model Validation
Model validation is the phase where the performance of the selected models is rigorously
assessed. Cross-validation, such as k-fold cross-validation, is often employed to evaluate
how well the model generalizes to different subsets of the data. A holdout dataset is reserved
for final model validation, ensuring that the model performs well on entirely unseen data,
mimicking real-world scenarios. Performance metrics relevant to churn prediction (e.g.,
precision-recall curves) are used to gauge model effectiveness. Moreover, models should
be scrutinized for potential bias and fairness concerns to ensure equitable predictions across
different demographic groups. Model deployment readiness is also verified, ensuring that
the selected model meets predefined validation criteria before moving forward with
deployment.
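A minimal sketch of this validation flow, combining a reserved holdout set, k-fold cross-validation, and a precision-recall curve, is shown below. It uses synthetic data and logistic regression purely as placeholders; the split size and k = 5 are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for a churn dataset
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.75, 0.25], random_state=1)

# Reserve a holdout set that is never touched during cross-validation
X_dev, X_holdout, y_dev, y_holdout = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

model = LogisticRegression(max_iter=1000)

# k-fold cross-validation (k = 5) on the development data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
cv_scores = cross_val_score(model, X_dev, y_dev, cv=cv, scoring="average_precision")
print(f"CV average precision: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Final check on the holdout set
model.fit(X_dev, y_dev)
holdout_probs = model.predict_proba(X_holdout)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_holdout, holdout_probs)
holdout_ap = average_precision_score(y_holdout, holdout_probs)
print(f"Holdout average precision: {holdout_ap:.3f}")
```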
In conclusion, the design methodology for our Customer Churn Prediction project is a
structured and iterative approach that guides us through the various stages of this critical
endeavor. Beginning with a clear definition of project objectives and scope, we
systematically collect, preprocess, and engineer features from our data sources. We then
leverage machine learning models, meticulously fine-tuning them through hyperparameter
tuning and rigorous validation. This methodology not only ensures the development of
accurate and effective churn prediction models but also emphasizes fairness and ethics in
our approach. With a solid foundation in design methodology, we are well-equipped to
address the complexities of customer churn, proactively retain customers, and drive
long-term business success.
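The hyperparameter tuning mentioned above could, for example, be done with a cross-validated grid search. The sketch below is one such approach under stated assumptions: the data is synthetic, and the random forest estimator, the search grid, and the ROC AUC scoring are illustrative choices rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a churn dataset
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.75, 0.25], random_state=2)

# Hypothetical search space; sensible ranges depend on the real dataset
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=2),
    param_grid,
    cv=5,
    scoring="roc_auc",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated ROC AUC: {search.best_score_:.3f}")
```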
5. IMPLEMENTATION
# Import libraries
import streamlit as st
import pandas as pd
import numpy as np
from PIL import Image

# ASSUMPTION: this listing uses `model`, `preprocess`, and `image`, but the lines
# that define them are not part of it. The joblib import, the module name, and
# both file paths below are hypothetical placeholders, not the project's values.
import joblib
from preprocessing import preprocess  # hypothetical module providing preprocess()

MODEL_PATH = 'model.sav'  # hypothetical path to the trained model
IMAGE_PATH = 'app.jpg'    # hypothetical path to the sidebar image
model = joblib.load(MODEL_PATH)


def main():
    # Setting Application title
    st.title('Telco Customer Churn Prediction App')

    # Sidebar: choose single-customer ("Online") or file-based ("Batch") prediction
    image = Image.open(IMAGE_PATH)
    add_selectbox = st.sidebar.selectbox(
        "How would you like to predict?", ("Online", "Batch"))
    st.sidebar.info('This app is created to predict Customer Churn')
    st.sidebar.image(image)

    if add_selectbox == "Online":
        st.info("Input data below")
        # Based on our optimal features selection
        st.subheader("Demographic data")
        seniorcitizen = st.selectbox('Senior Citizen:', ('Yes', 'No'))
        dependents = st.selectbox('Dependent:', ('Yes', 'No'))

        st.subheader("Payment data")
        tenure = st.slider('Number of months the customer has stayed with the company',
                           min_value=0, max_value=72, value=0)
        contract = st.selectbox('Contract', ('Month-to-month', 'One year', 'Two year'))
        paperlessbilling = st.selectbox('Paperless Billing', ('Yes', 'No'))
        PaymentMethod = st.selectbox('PaymentMethod', ('Electronic check', 'Mailed check',
                                     'Bank transfer (automatic)', 'Credit card (automatic)'))
        monthlycharges = st.number_input('The amount charged to the customer monthly',
                                         min_value=0, max_value=150, value=0)
        totalcharges = st.number_input('The total amount charged to the customer',
                                       min_value=0, max_value=10000, value=0)

        # ASSUMPTION: the next six inputs are referenced in `data` below but are missing
        # from this listing; the labels and options are reconstructed placeholders.
        phoneservice = st.selectbox('Does the customer have phone service', ('Yes', 'No'))
        multiplelines = st.selectbox('Does the customer have multiple lines',
                                     ('Yes', 'No', 'No phone service'))
        internetservice = st.selectbox('Internet service type', ('DSL', 'Fiber optic', 'No'))
        onlinesecurity = st.selectbox('Does the customer have online security',
                                      ('Yes', 'No', 'No internet service'))
        onlinebackup = st.selectbox('Does the customer have online backup',
                                    ('Yes', 'No', 'No internet service'))
        techsupport = st.selectbox('Does the customer have tech support',
                                   ('Yes', 'No', 'No internet service'))
        streamingtv = st.selectbox("Does the customer stream TV",
                                   ('Yes', 'No', 'No internet service'))
        streamingmovies = st.selectbox("Does the customer stream movies",
                                       ('Yes', 'No', 'No internet service'))

        data = {
            'SeniorCitizen': seniorcitizen,
            'Dependents': dependents,
            'tenure': tenure,
            'PhoneService': phoneservice,
            'MultipleLines': multiplelines,
            'InternetService': internetservice,
            'OnlineSecurity': onlinesecurity,
            'OnlineBackup': onlinebackup,
            'TechSupport': techsupport,
            'StreamingTV': streamingtv,
            'StreamingMovies': streamingmovies,
            'Contract': contract,
            'PaperlessBilling': paperlessbilling,
            'PaymentMethod': PaymentMethod,
            'MonthlyCharges': monthlycharges,
            'TotalCharges': totalcharges
        }
        # One-row DataFrame holding the values entered in the form
        features_df = pd.DataFrame([data])
        st.markdown("<h3></h3>", unsafe_allow_html=True)
        st.write('Overview of input is shown below')
        st.markdown("<h3></h3>", unsafe_allow_html=True)
        st.dataframe(features_df)

        # Preprocess inputs
        preprocess_df = preprocess(features_df, 'Online')
        prediction = model.predict(preprocess_df)

        if st.button('Predict'):
            # model.predict returns an array; a single input row yields one element
            if prediction[0] == 1:
                st.warning('Yes, the customer will terminate the service.')
            else:
                st.success('No, the customer is happy with Telco Services.')
    else:
        st.subheader("Dataset upload")
        uploaded_file = st.file_uploader("Choose a file")
        if uploaded_file is not None:
            data = pd.read_csv(uploaded_file)
            # Get overview of data
            st.write(data.head())
            st.markdown("<h3></h3>", unsafe_allow_html=True)
            # Preprocess inputs
            preprocess_df = preprocess(data, "Batch")
            if st.button('Predict'):
                # Get batch prediction
                prediction = model.predict(preprocess_df)
                prediction_df = pd.DataFrame(prediction, columns=["Predictions"])
                prediction_df = prediction_df.replace({
                    1: 'Yes, the customer will terminate the service.',
                    0: 'No, the customer is happy with Telco Services.'})
                st.markdown("<h3></h3>", unsafe_allow_html=True)
                st.subheader('Prediction')
                st.write(prediction_df)


if __name__ == '__main__':
    main()
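The listing calls `preprocess(df, option)` without showing its definition. A minimal sketch of what such a helper might do is given below. The function body, the `BINARY_COLUMNS` and `CATEGORICAL_COLUMNS` lists, the `customerID` column, and the encoding choices are all assumptions for illustration and not the project's actual preprocessing.

```python
import pandas as pd

# Hypothetical column groups, named after fields collected by the app
BINARY_COLUMNS = ["SeniorCitizen", "Dependents", "PaperlessBilling"]
CATEGORICAL_COLUMNS = ["Contract", "PaymentMethod"]

def preprocess(df, option):
    """Encode raw Yes/No and categorical inputs as numeric features (illustrative only)."""
    df = df.copy()
    if option == "Batch":
        # Uploaded files may carry an identifier column that is not a feature
        df = df.drop(columns=["customerID"], errors="ignore")
    # Map Yes/No answers to 1/0
    for col in BINARY_COLUMNS:
        df[col] = df[col].map({"Yes": 1, "No": 0})
    # One-hot encode multi-valued categories
    df = pd.get_dummies(df, columns=CATEGORICAL_COLUMNS, dtype=int)
    return df

sample = pd.DataFrame([{
    "SeniorCitizen": "No", "Dependents": "Yes", "PaperlessBilling": "Yes",
    "Contract": "One year", "PaymentMethod": "Mailed check",
    "tenure": 12, "MonthlyCharges": 50, "TotalCharges": 600,
}])
processed = preprocess(sample, "Online")
print(processed.T)
```

Note that one-hot encoding a single row only produces columns for the categories present in that row. A real helper would therefore need to align its output with the exact feature columns the model was trained on, for example by reindexing against a saved column list.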
5.2 SCREENSHOTS
6. CONCLUSION
In conclusion, the Customer Churn Prediction project represents a strategic initiative that
leverages advanced data analytics, machine learning, and a well-defined design methodology
to address the critical challenge of customer retention. Throughout this project, we have
systematically collected, cleaned, and processed valuable customer data, transforming it into
actionable insights. By selecting appropriate machine learning models, optimizing
hyperparameters, and rigorously validating our predictions, we have established a robust
framework for proactively identifying and retaining at-risk customers.
As we move forward, we recognize the ever-evolving nature of customer behavior and market
dynamics. Therefore, the project's iterative design allows us to continuously refine our churn
prediction models, adapt our strategies, and remain agile in responding to emerging trends. By
embracing this holistic approach, we are poised to not only mitigate churn effectively but also
foster long-term customer loyalty and sustainable business growth.