Questions to think through when preparing for the topic of customer segmentation:

1. Explain the rationale behind using customer segmentation algorithms and how
they contribute to the overall business strategy.
2. Describe a specific customer segmentation algorithm you have worked with in the
past. What were the key parameters or features considered, and how did it enhance
the understanding of customer behavior?
3. How do you handle challenges related to data quality and completeness when
implementing a customer segmentation algorithm, and what impact can such challenges
have on the results?
4. Can you discuss a scenario where you used clustering techniques for customer
segmentation? What were the main challenges, and how did you evaluate the
effectiveness of the segmentation?
5. In the context of customer segmentation, how do you ensure that the algorithm's
outputs are interpretable and actionable for the marketing or sales teams?
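
For question 3, one way to make an answer concrete is to measure how much data is missing before clustering and to compare what each handling strategy keeps. The sketch below is only an illustration: the `customers` table and its values are made up, and dropping rows versus median imputation are just two of several possible choices.

```python
import numpy as np
import pandas as pd

# Made-up customer table with gaps; the values are illustrative only
customers = pd.DataFrame({
    'Age': [22.0, np.nan, 35.0, 58.0, np.nan, 41.0],
    'Fare': [7.25, 71.28, np.nan, 26.55, 8.05, 13.00],
})

# Share of missing values per column
missing_share = customers.isna().mean()

# Option A: drop incomplete rows (simple, but shrinks and may bias the sample)
dropped = customers.dropna()

# Option B: fill gaps with the column median (keeps all rows, but reduces variance)
imputed = customers.fillna(customers.median())

print(missing_share)
print(len(dropped), len(imputed))
```

Either choice changes which customers end up in which segment, so it is worth rerunning the segmentation under both and checking whether the clusters stay stable.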

Below is a simple customer segmentation example built on the well-known Titanic
dataset from Kaggle. In this case, we'll use K-means clustering to segment
passengers based on their age and fare. Please note that customer segmentation in
real-world scenarios usually requires more features and preprocessing.

import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Load the Titanic dataset (here from a public GitHub copy of the CSV)
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
titanic_data = pd.read_csv(url)

# Select relevant features for clustering, dropping rows with a missing Age or Fare
features = titanic_data[['Age', 'Fare']].dropna()

# Standardize the features so Age and Fare contribute on the same scale
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Determine the optimal number of clusters using the Elbow Method:
# fit K-means for k = 1..10 and record the inertia (within-cluster sum of squares)
inertia = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(features_scaled)
    inertia.append(kmeans.inertia_)

# Plot the Elbow Method graph
plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method for Optimal k')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia')
plt.show()

# Choose the optimal number of clusters (k) based on the elbow method (e.g., k=3)
optimal_k = 3

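# Apply K-means clustering with the chosen k
kmeans = KMeans(n_clusters=optimal_k, random_state=42)
labels = kmeans.fit_predict(features_scaled)

# Rows with a missing Age or Fare were dropped before clustering, so align the
# labels on the index of `features`; the dropped rows get NaN in 'Cluster'
titanic_data['Cluster'] = pd.Series(labels, index=features.index)

# Display the first few rows with cluster assignments
print(titanic_data[['Age', 'Fare', 'Cluster']].head())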

This code uses the Titanic dataset to perform K-means clustering on passengers
based on their age and fare. Passengers with a missing age or fare are left out
of the clustering. The Elbow Method plot is used to judge a reasonable number of
clusters (k=3 is used here as an example), and the resulting cluster labels are
stored in the 'Cluster' column of the dataset.
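
The elbow plot is read by eye, so it can help to cross-check the choice of k with a numeric score and then summarize each segment in plain units that a marketing or sales team can read. The following is a minimal sketch of that idea, not part of the script above: it uses a synthetic stand-in for the Age and Fare columns (three made-up groups) so that it runs offline, and the names `demo`, `silhouette_by_k`, and `profile` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Age/Fare: three made-up groups of 50 passengers each
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'Age': np.concatenate([rng.normal(22, 3, 50), rng.normal(40, 3, 50), rng.normal(62, 3, 50)]),
    'Fare': np.concatenate([rng.normal(10, 3, 50), rng.normal(90, 8, 50), rng.normal(15, 3, 50)]),
})

demo_scaled = StandardScaler().fit_transform(demo)

# Silhouette score needs at least two clusters; it ranges from -1 to 1, higher is better
silhouette_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(demo_scaled)
    silhouette_by_k[k] = silhouette_score(demo_scaled, labels)

best_k = max(silhouette_by_k, key=silhouette_by_k.get)

# Refit with the selected k and profile each segment in the original units
demo['Cluster'] = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(demo_scaled)
profile = demo.groupby('Cluster').agg(
    passengers=('Age', 'size'),
    mean_age=('Age', 'mean'),
    mean_fare=('Fare', 'mean'),
)

print(silhouette_by_k)
print(profile.round(1))
```

A per-cluster table of sizes and averages like `profile` is usually easier to act on than raw cluster labels, since each row can be given a descriptive name such as "younger, low fare".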
