Dataset:
You can use the Telco Customer Churn dataset from Kaggle. This dataset contains information
about telecom customers, including various features like contract type, monthly charges, and
whether the customer churned or not.
Objective:
The goal of this project is to predict customer churn (whether a customer will leave the telecom
service) using a model stacking approach. Model stacking involves training multiple models and
combining their predictions using another model.
https://www.linkedin.com/in/yennhi95zz/ || https://github.com/yennhi95zz || https://medium.com/@yennhi95zz
Steps:
1. Import Libraries: Import the necessary libraries and initialize Comet ML.
2. Load and Explore Data: Load the dataset and perform exploratory data analysis (EDA).
3. Preprocessing: Encode categorical features and scale numerical features.
4. Model Training: Train multiple machine learning models, including Logistic Regression, Random Forest, Gradient Boosting, and Support Vector Machine, then stack them.
This project will give you insights into dealing with classification problems, handling imbalanced
datasets (if applicable), and utilizing model stacking to enhance predictive performance.
0. Import Libraries
!pip install -q optuna comet_ml
import optuna
import comet_ml
from comet_ml import Experiment
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from kaggle_secrets import UserSecretsClient  # Kaggle-provided client, used below to read the Comet API key
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score
1. Initialize Comet ML
user_secrets = UserSecretsClient()
comet_api_key = user_secrets.get_secret("Comet API Key")
experiment = Experiment(
    api_key=comet_api_key,
project_name="customer-churn",
workspace="yennhi95zz"
)
2. Load Data
# Load the dataset
data = pd.read_csv("/kaggle/input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")
data.head()
(output truncated: the first five rows of the DataFrame; the trailing Churn column reads No, No, Yes, No, Yes)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB
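One thing the `info()` output reveals: `TotalCharges` is stored as `object`, because a handful of rows hold blank strings (customers with a tenure of 0). The notebook excerpt does not show this step, but a typical cleanup, illustrated on a tiny stand-in frame, is:

```python
import pandas as pd

# TotalCharges arrives as strings; blank entries correspond to tenure == 0
df = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5"], "tenure": [1, 0, 34]})

# Coerce to numeric: blank strings become NaN instead of raising an error
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Fill the missing values (here with 0, matching zero tenure)
df["TotalCharges"] = df["TotalCharges"].fillna(0)
print(df["TotalCharges"].tolist())  # → [29.85, 0.0, 1889.5]
```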
for p in ax.patches:
    ax.annotate(f'{int(round(p.get_height()))}',
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', fontsize=12, color='black',
                xytext=(0, 5), textcoords='offset points')
plt.tight_layout()
plt.show()
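The annotation loop above refers to an `ax` created in a part of the cell lost in export. A self-contained sketch, using a small synthetic Churn column in place of the real data, might look like:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small synthetic frame standing in for the Telco data
df = pd.DataFrame({"Churn": ["No"] * 7 + ["Yes"] * 3})

fig, ax = plt.subplots(figsize=(4, 3))
sns.countplot(x="Churn", data=df, ax=ax)

# Annotate each bar with its height, as in the notebook cell
for p in ax.patches:
    ax.annotate(f"{int(round(p.get_height()))}",
                (p.get_x() + p.get_width() / 2.0, p.get_height()),
                ha="center", va="center", fontsize=12, color="black",
                xytext=(0, 5), textcoords="offset points")
plt.tight_layout()
```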
plt.tight_layout()
plt.show()
These plots provide insights into how different categories of customers (e.g., seniors vs. non-seniors, customers with partners vs. without) are distributed in terms of churn. You can identify potential customer segments that are more likely to churn.
experiment.log_figure(figure=plt)
plt.tight_layout()
plt.show()
plt.tight_layout()
plt.show()
It appears that customers who have higher Total Charges are less likely to churn. This suggests
that long-term customers who spend more are more loyal. You can use this insight to focus on
retaining high-value, long-term customers by offering loyalty programs or incentives. These
business insights derived from EDA can guide feature engineering and model selection for your
churn prediction project. They help you understand the data's characteristics and make informed
decisions to optimize customer retention strategies.
plt.tight_layout()
plt.show()
4. Preprocessing
# Encode categorical features, scale numerical features
X_train_encoded = encoder.fit_transform(X_train[categorical_features])
X_val_encoded = encoder.transform(X_val[categorical_features])
X_train_scaled = scaler.fit_transform(X_train[numerical_features])
X_val_scaled = scaler.transform(X_val[numerical_features])
Model Stacking
In this project, I stack models such as random forests, gradient boosting, and support vector machines, which have different characteristics and capture different aspects of the customer churn problem. This ensemble approach can yield a more accurate and robust churn prediction model, ultimately leading to better customer retention strategies and business outcomes.
    gb_params = {
        'n_estimators': trial.suggest_int('gb_n_estimators', 100, 300),
        'learning_rate': trial.suggest_float('gb_learning_rate', 0.01, 0.2),
        'max_depth': trial.suggest_categorical('gb_max_depth', [3, 4, 5]),
    }
    svm_params = {
        'C': trial.suggest_categorical('svm_C', [0.1, 1, 10]),
        'kernel': trial.suggest_categorical('svm_kernel', ['linear', 'rbf']),
    }
    rf = RandomForestClassifier(**rf_params)
    gb = GradientBoostingClassifier(**gb_params)
    svm = SVC(probability=True, **svm_params)

    rf_roc_auc = roc_auc_score(y_val, rf.predict_proba(X_val_processed)[:, 1])
    gb_roc_auc = roc_auc_score(y_val, gb.predict_proba(X_val_processed)[:, 1])
    svm_roc_auc = roc_auc_score(y_val, svm.predict_proba(X_val_processed)[:, 1])

    stacking_classifier = StackingClassifier(estimators=estimators,
                                             final_estimator=LogisticRegression())
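The excerpt above omits the fitting calls and the construction of `estimators`. Under the assumption that `estimators` pairs names with the tuned base models, the end-to-end flow looks roughly like this (toy data from `make_classification`, not the Telco set):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy processed matrices standing in for the encoded/scaled Telco features
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train_processed, X_val_processed, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit each base model, then score it on the validation split
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train_processed, y_train)
gb = GradientBoostingClassifier(random_state=0).fit(X_train_processed, y_train)
svm = SVC(probability=True, random_state=0).fit(X_train_processed, y_train)
rf_roc_auc = roc_auc_score(y_val, rf.predict_proba(X_val_processed)[:, 1])

# The (name, estimator) pairs feed the stacking classifier, which refits
# clones of the base models internally during stack.fit
estimators = [("random_forest", rf), ("gradient_boosting", gb), ("svm", svm)]
stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stack.fit(X_train_processed, y_train)
stacking_roc_auc = roc_auc_score(y_val, stack.predict_proba(X_val_processed)[:, 1])
```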
experiment.log_parameters({
    'rf_min_samples_split': rf_params['min_samples_split'],
    'rf_min_samples_leaf': rf_params['min_samples_leaf'],
    'gb_n_estimators': gb_params['n_estimators'],
    'gb_learning_rate': gb_params['learning_rate'],
    'gb_max_depth': gb_params['max_depth'],
    'svm_C': svm_params['C'],
    'svm_kernel': svm_params['kernel']
})
experiment.log_metrics({
    'rf_accuracy': rf_accuracy,
    'gb_accuracy': gb_accuracy,
    'svm_accuracy': svm_accuracy,
    'rf_roc_auc': rf_roc_auc,
    'gb_roc_auc': gb_roc_auc,
    'svm_roc_auc': svm_roc_auc,
    'stacking_accuracy': stacking_accuracy,
    'stacking_roc_auc': stacking_roc_auc
})
Clarify the optimization goal: state whether you are minimizing or maximizing a specific metric. In the code, direction='minimize' tells Optuna to minimize the objective's return value, so the objective must return a loss or error metric (or a negated score, such as negative accuracy). If the objective returns accuracy or ROC AUC directly, use direction='maximize' instead.
print(f"Best RF Hyperparameters:\n{best_rf_params}")
print(f"Best Accuracy: {best_accuracy}")
Best RF Hyperparameters:
+----------------------+---------------------+
| Parameter | Value |
+======================+=====================+
| rf_n_estimators | 300 |
+----------------------+---------------------+
| rf_max_depth | 20 |
+----------------------+---------------------+
| rf_min_samples_split | 8 |
+----------------------+---------------------+
| rf_min_samples_leaf | 2 |
+----------------------+---------------------+
| gb_n_estimators | 139 |
+----------------------+---------------------+
| gb_learning_rate | 0.09345289942291049 |
+----------------------+---------------------+
| gb_max_depth | 4 |
+----------------------+---------------------+
| svm_C | 1 |
+----------------------+---------------------+
| svm_kernel | rbf |
+----------------------+---------------------+
Best Accuracy: 0.7917555081734187
experiment.end()
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.com/yennhi95zz/customer-churn/ce4189deb57943d281df0405dab75687
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     gb_accuracy [100]       : (0.7640369580668088, 0.7924662402274343)
COMET INFO:     gb_roc_auc [100]        : (0.8042899814154298, 0.8275504604728453)
COMET INFO:     rf_accuracy [100]       : (0.7604832977967306, 0.783226723525231)
COMET INFO:     rf_roc_auc [100]        : (0.7839880209762333, 0.8187654979267074)
COMET INFO:     stacking_accuracy [100] : (0.7782515991471215, 0.7917555081734187)
COMET INFO:     stacking_roc_auc [100]  : (0.8136417992348748, 0.8247718342815433)
COMET INFO:     svm_accuracy [100]      : (0.7732764747690121, 0.7860696517412935)
COMET INFO:     svm_roc_auc [100]       : (0.7664970414813818, 0.8130671788208375)
COMET INFO:   Parameters:
COMET INFO:     C                                  : 1.0
COMET INFO:     bootstrap                          : True
COMET INFO:     break_ties                         : False
COMET INFO:     cache_size                         : 200
COMET INFO:     categories                         : auto
COMET INFO:     ccp_alpha                          : 0.0
COMET INFO:     class_weight                       : 1
COMET INFO:     coef0                              : 0.0
COMET INFO:     constant                           : 1
COMET INFO:     copy                               : True
COMET INFO:     criterion                          : friedman_mse
COMET INFO:     cv                                 : 1
COMET INFO:     decision_function_shape            : ovr
COMET INFO:     degree                             : 3
COMET INFO:     drop                               : 1
COMET INFO:     dtype                              : <class 'numpy.float64'>
COMET INFO:     dual                               : False
COMET INFO:     estimators                         : [('random_forest', RandomForestClassifier(max_depth=20, min_samples_leaf=2, min_samples_split=9, n_estimators=295)), ('gradient_boosting', GradientBoostingClassifier(learning_rate=0.11487120000946097, max_depth=4, n_estimators=113)), ('svm', SVC(C=1, kernel='linear', probability=True))]
COMET INFO:     final_estimator                    : LogisticRegression()
COMET INFO:     final_estimator__C                 : 1.0
COMET INFO:     final_estimator__class_weight      : 1
COMET INFO:     final_estimator__dual              : False
COMET INFO:     final_estimator__fit_intercept     : True
COMET INFO:     final_estimator__intercept_scaling : 1
COMET INFO:     final_estimator__l1_ratio          : 1
References:
• GitHub Repository
• Kaggle Project
• Medium Article
👏 If you found this article interesting, your support in the comments with your own insights will help me spread the knowledge to others.
❗ Found the articles helpful? Get UNLIMITED access to every story on Medium with just $1/week HERE.