Group Name C
Check target variable distribution: A simple yet crucial step is to check whether the dataset
has a class imbalance. The dataset is imbalanced, with a much higher proportion of active
customers than churned ones.
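A minimal sketch of this check with pandas is shown below. The column name Churn, its Yes/No coding and the toy rows are assumptions made for illustration, not the team's actual data or code.

```python
import pandas as pd

# Hypothetical toy data; the real dataset and column name may differ.
df = pd.DataFrame({"Churn": ["No", "No", "No", "Yes", "No", "Yes", "No", "No"]})

# Absolute counts and relative shares of each class.
counts = df["Churn"].value_counts()
shares = df["Churn"].value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

A large gap between the two shares is the signal of class imbalance described above.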
Identify unique values: We inspected the unique values of each categorical variable. This
shows that customers are either on a month-to-month rolling contract or on a fixed contract
of one or two years. They pay their bills via credit card, bank transfer or electronic
check.
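One way to list the unique values per categorical variable is sketched below. The column names Contract and PaymentMethod and the sample rows are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data; column names are assumed for illustration.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "PaymentMethod": ["Electronic check", "Credit card", "Bank transfer", "Electronic check"],
    "MonthlyCharges": [70.5, 20.0, 99.9, 45.0],
})

# Columns treated as categorical in this sketch.
categorical_cols = ["Contract", "PaymentMethod"]

# Map each categorical column to its sorted list of distinct values.
unique_values = {col: sorted(df[col].unique().tolist()) for col in categorical_cols}

for col, values in unique_values.items():
    print(col, values)
```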
A few observations can be made based on the bar charts and histograms for numerical
variables:
Gender distribution: The dataset features a roughly equal proportion of male and female
customers, with about half of the customers in each group.
Most of the customers are younger people: the dataset contains 5901 younger customers and
1142 senior citizens.
Most of the customers seem to have phone service, and three-quarters of them have opted for
paperless billing.
Monthly charges range from $18 to $118 per customer, with a large proportion of customers
in the $20 segment.
There are many new customers in the organization (tenure of less than 10 months), followed
by a loyal customer segment with a tenure of more than 70 months.
Distribution of contract type: Most of the customers seem to have a prepaid connection with
the telecom company, while roughly equal proportions of customers are on 1-year and 2-year
contracts.
Distribution of payment method type: The dataset indicates that customers most often pay
their bills electronically, followed by bank transfer, credit card and mailed check.
Most of the customers have phone service, and almost half of those customers have
multiple lines.
Three-quarters of the customers have opted for internet service via fiber optic or DSL
connections, and almost half of the internet users subscribe to streaming TV and movies.
Customers who have subscribed to the Online Backup, Device Protection, Technical Support and
Online Security features are a minority.
Churn Rate by Payment Method Type: Customers who pay via bank transfers seem to have
the lowest churn rate among all the payment method segments.
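A churn rate per payment method can be computed as the share of churned customers in each group. The sketch below assumes columns named PaymentMethod and Churn (Yes/No); the names and rows are hypothetical and do not reproduce the report's figures.

```python
import pandas as pd

# Hypothetical toy data; not the report's actual dataset.
df = pd.DataFrame({
    "PaymentMethod": [
        "Electronic check", "Electronic check",
        "Bank transfer", "Bank transfer",
        "Credit card", "Mailed check",
    ],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Encode the target as 0/1 so that the group mean equals the churn rate.
churned = df["Churn"].eq("Yes").astype(int)
churn_rate = churned.groupby(df["PaymentMethod"]).mean().sort_values()

print(churn_rate)
```

Sorting the result puts the segment with the lowest churn rate first.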
The features most positively correlated with the target variable are Monthly Charges and
Age, whilst Partner, Dependents and Tenure are negatively correlated with it.
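Correlations of this kind can be obtained by coding the target as 0/1 and computing Pearson correlations against the numeric features. The sketch below assumes columns named tenure, MonthlyCharges and Churn, uses made-up rows, and relies on pandas' default Pearson method; the report does not state which correlation measure was used.

```python
import pandas as pd

# Hypothetical toy data chosen so that churners have short tenure and high charges.
df = pd.DataFrame({
    "tenure": [1, 2, 5, 30, 60, 72],
    "MonthlyCharges": [95.0, 89.5, 80.0, 55.0, 30.0, 20.0],
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "No"],
})

# Code the target as 1 (churned) / 0 (active).
df["ChurnFlag"] = df["Churn"].eq("Yes").astype(int)

# Pearson correlation of each numeric feature with the coded target.
corr = (
    df[["tenure", "MonthlyCharges", "ChurnFlag"]]
    .corr()["ChurnFlag"]
    .drop("ChurnFlag")
    .sort_values()
)

print(corr)
```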
Combining the insights from three parameters (Tenure, Monthly Charges and Total Charges)
makes the picture clearer: a higher Monthly Charge at a lower tenure results in a lower Total
Charge. Hence, all three factors (higher Monthly Charge, lower Tenure and lower Total
Charge) are linked to high churn.
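A small worked example may help here. It assumes that Total Charges is roughly Monthly Charges multiplied by tenure in months; this approximation and the figures below are illustrative assumptions, not values taken from the dataset.

```python
def approx_total_charges(monthly_charge, tenure_months):
    """Rough total billed so far, assuming a constant monthly charge."""
    return monthly_charge * tenure_months


# A high-paying customer who leaves after two months...
short_stay = approx_total_charges(100.0, 2)
# ...versus a low-paying customer who stays for five years.
long_stay = approx_total_charges(40.0, 60)

print(short_stay, long_stay)
```

Under this assumption the short-tenure customer has the higher Monthly Charge but by far the lower Total Charge, which is the pattern linked to churn above.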
The difference in churn between month-to-month and annual contracts is large, which suggests
that annual contracts are better at retaining clients; loyalty promotions could perhaps help
to reduce the churn rate.
Lastly, consider the tenure of the churned clients. Most of them used the service for only
one month. It seems that these clients either used the service to check its quality or could
not afford the charges: their Monthly Charges were high and their Total Charges were small,
as they stayed only a short time.
The type of contract is strongly related to churn: month-to-month contracts with high
charges could lead a client to leave the service.
In the coming week, the analytics team will start working on feature engineering:
transforming raw data into features that better represent the underlying problem to the
predictive models, resulting in improved model accuracy on unseen data. This includes
preparing a proper input dataset that is compatible with the requirements of the machine
learning algorithm.
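As one possible illustration of such preparation, the sketch below one-hot encodes a categorical column and codes the target as 0/1. The report does not specify which transformations the team will apply, so the technique, the column names Contract, tenure and Churn, and the rows are all assumptions.

```python
import pandas as pd

# Hypothetical toy data; not the team's actual input.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year"],
    "tenure": [1, 24, 60],
    "Churn": ["Yes", "No", "No"],
})

# Target coded as 1 (churned) / 0 (active).
y = df["Churn"].eq("Yes").astype(int)

# One-hot encode the categorical column; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns="Churn"), columns=["Contract"], dtype=int)

print(X)
print(y.tolist())
```

The resulting X is fully numeric, which is the form most machine learning algorithms expect as input.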