Christina's 5 Analytical Questions

Q1: Who were our most loyal customers?

Q2: Did longer calls yield higher sales?

Q3: On average, were males more likely to call
than females? If so, how much more? Knowing
this could help target the company's marketing
campaigns and tailor the message to the
listener's gender. For example, the script
could use wording aimed at specific
occupations, at mothers, at businessmen who
like things to be concise, or at people who
need extra time to understand the full scope
of the products or services offered before
making an informed decision. This would help
the telemarketer relate better to the listener,
and thus increase sales from calls.

Q4: Did married couples close more sales due
to their combined incomes, or was there an
equal distribution between singles and married
couples in the number of calls made?
Q5: Out of the 30-40 year olds who were
mainly targeted, and the 50-60 year olds who
were the next group to be targeted, which
coverage plan gained the most popularity?
Was it selected for its price or for an added
health feature that compelled clients to
choose one plan over the other? #Note to self: See
my notebook on Google Drive.
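
For Q3, one way to put a number on "how much more" is to compare each gender's share of calls. This is a minimal sketch, not the notebook's actual analysis: the sample rows are made up, the "M"/"F" codes in Cust_Sex are an assumption, and each row is assumed to represent one call.

```python
import pandas as pd

# Hypothetical sample; the real rows would come from data_post2021.csv.
calls = pd.DataFrame({
    "Cust_Sex": ["M", "F", "M", "M", "F", "M"],  # "M"/"F" codes are assumed
})

# Share of calls per gender (each row is assumed to be one call).
share = calls["Cust_Sex"].value_counts(normalize=True)

# Ratio of male to female calls answers "how much more?"
male_to_female = share["M"] / share["F"]

print(share)
print(f"Male-to-female call ratio: {male_to_female:.2f}")
```

A ratio near 1 would suggest gender-specific scripts are unlikely to matter much for call volume; a ratio well above or below 1 would support targeting.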
Graphs to Consider:
#1. Bar graph - great way to show relative sizes; could depict the most
popular vs. least popular coverage plan

#2. Box and Whisker plot - great for depicting numerical data (such as
number of sales made) through the quartiles
sns.boxplot( x=df["Sale_Status"], y=df["Verified_Date"] )

#3. Heat map - appropriate to use for conversion rate and revenue for
Qs 2, 3, 4.
A graphical representation of data where each value of a matrix is represented as a color.
# Create a dataset
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
# Default heatmap
p1 = sns.heatmap(df)

#4. Violin plot - comparing Marital_Status vs Age. Formula:

sns.violinplot(data=df, x="Marital_Status_x", y="Age")

#5) Plot bar graph. EX:

df_calls = df.groupby(['CampaignID']).count()
df_calls = df_calls.drop(df_calls.columns.difference(['Cust_ID']), axis=1)
df_calls = df_calls.rename(columns={"Cust_ID": "Total # of calls"})
df_calls.sort_values("Total # of calls", ascending=False, inplace=True)
df_calls.head(20)

#6) Correlogram

# library & dataset
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('data_post2021.csv')  # sns.load_dataset only fetches seaborn's bundled sample datasets
# Basic correlogram
sns.pairplot(df)
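
The heat map idea in #3 could be applied to conversion rate roughly as follows. This is only a sketch under stated assumptions: the rows are invented, and "Sale" / "No Sale" are assumed values for Sale_Status, which the notes do not spell out. The helper column `converted` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows; "Sale" / "No Sale" are assumed Sale_Status values.
df = pd.DataFrame({
    "Cust_Sex": ["M", "M", "F", "F", "M", "F"],
    "Marital_Status_x": ["Married", "Single", "Married", "Single", "Married", "Married"],
    "Sale_Status": ["Sale", "No Sale", "Sale", "Sale", "No Sale", "No Sale"],
})

# Conversion flag: 1 when the call ended in a sale, 0 otherwise.
df["converted"] = (df["Sale_Status"] == "Sale").astype(int)

# The mean of the flag per (gender, marital status) cell is the conversion rate.
rates = df.pivot_table(index="Cust_Sex", columns="Marital_Status_x",
                       values="converted", aggfunc="mean")

# Fixing vmin/vmax to 0..1 keeps the color scale comparable between plots.
ax = sns.heatmap(rates, annot=True, vmin=0, vmax=1, cmap="Blues")
ax.set_title("Conversion rate by gender and marital status")
plt.tight_layout()
```

Unlike the random 5x5 matrix, each cell here has a direct reading: the fraction of calls in that group that closed a sale.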

#Useful column names to consider: 1) Cover_Level 2) Family_To_Cover
3) Cust_Sex 4) Policy_Status 5) Premium 6) Product_Category (use this one)
7) Benefit_Level 8) HistoryID 9) Sale_Status (use this one)
10) Verified_Date

Keep: 'Call_Result', 'avg_est_income',
'avg_bal_01', 'avg_bal_avail',
'Marital_Status_x', 'Postal_Code', 'Cust_Sex',
'Batch_ID', 'Age', 'Policy_no'

Notes on the Keep list:
'Batch_ID' - super helpful. This column has 58 unique entries and indicates the sequence
in which we dialled the leads for the campaign. The exact definition of the column and of
each of its entries will be beneficial.
'Policy_no' - useful b/c it's part of the Customer Data History dataset, which holds the
information on policies sold successfully to clients over the phone. It is the history of
customers with policies previously sold to them, and it indicates whether the policy is
active or not, plus other relevant information.
Policy_Status was the only one that needed clarification:
A - Active policy: based on feedback from the client, these policies are still active
(premium paying) on their policy admin system.
C - Cancelled policy: based on feedback from the client, these policies have either lapsed
or been cancelled on their policy admin system.

Drop: 'CampaignID', 'Cust_ID', 'Call_Start',
'Call_End', 'Connection_ID', 'Emp_ID',
'Call_Time_seconds', 'wage_earner', 'ID_No',
'Lang_x', 'InceptionDateCorrected',
'Campaign_Type', 'Team_ID',
'EmploymentDate', 'Employee_Gender', 'Race'
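
A Drop list like this can be applied in one call. The sketch below uses a small made-up frame with only a few of the listed columns, and passes `errors="ignore"` so the cell does not fail if a name is missing or was already dropped on an earlier run; whether that leniency is wanted is a judgment call.

```python
import pandas as pd

# Hypothetical frame with a few of the columns named in the notes.
df = pd.DataFrame({
    "Cust_ID": [1, 2],
    "CampaignID": [10, 10],
    "Age": [34, 52],
    "Cust_Sex": ["F", "M"],
})

drop_cols = ["CampaignID", "Cust_ID", "Call_Start", "Emp_ID"]

# errors="ignore" skips names that are not present, so the cell can be
# re-run safely after some columns have already been dropped.
trimmed = df.drop(columns=drop_cols, errors="ignore")

print(list(trimmed.columns))
```

Dropping without `errors="ignore"` is stricter and will surface misspelled column names as a `KeyError`, which can be useful while the list is still being finalized.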
In [16]:
# <font color='#9531A9'> Q1) Who were our most loyal customers? </font>

In [33]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [34]:
Batch_ID = pd.read_csv('data_post2021.csv')
Batch_ID.head() #This column has 58 unique entries. The exact definition of this
#This column indicates the sequence we dialled the leads for the campaign

In [35]:
Batch_ID.shape #loading and inspecting data

In [20]:
Batch_ID.dtypes #loading and inspecting data

In [21]:
Batch_ID.columns #loading and inspecting data

In [22]:
Batch_ID.apply('nunique') #loading and inspecting data

In [23]:
Batch_ID = Batch_ID.drop([['CampaignID', 'Cust_ID', 'Effective_Date','Call_Start'

In [24]:
Batch_ID = Batch_ID.drop(['CampaignID'], axis=1)

In [25]:
Batch_ID = Batch_ID.rename(columns={"Cust_Sex": "Cust_Gender"})

In [26]:
Batch_ID = Batch_ID.rename(columns={"Avg_est_income": "Avg_income"})

In [27]:
print(Batch_ID.shape) #removing duplicates
duplicate_rows_df = Batch_ID[Batch_ID.duplicated()] #rows containing duplicate data


In [28]:
Batch_ID = Batch_ID.drop_duplicates(keep='first') #keep accepts 'first', 'last' or False
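
A quick way to check the de-duplication step is to count duplicates before dropping and compare shapes afterwards. This is a sketch on an invented three-row frame; in pandas, the `keep` argument of `drop_duplicates` takes 'first', 'last', or False, while a column name would go in `subset` instead.

```python
import pandas as pd

# Hypothetical rows: the first two are exact duplicates of each other.
df = pd.DataFrame({"Policy_no": [1, 1, 2], "Age": [30, 30, 45]})

# duplicated() marks every repeat of an earlier row as True.
n_dupes = int(df.duplicated().sum())

# keep="first" retains the first occurrence of each duplicated row.
deduped = df.drop_duplicates(keep="first")

print(f"{n_dupes} duplicate row(s); shape {df.shape} -> {deduped.shape}")
```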

In [29]:
Batch_ID.dtypes #data types

In [30]:
Batch_ID = Batch_ID.drop(["Verified_Date", "Postal_Cde","Effective_Date"], axis

In [31]:
Batch_ID['Verified_Date'] = pd.to_datetime(Batch_ID['Verified_Date']) #needed to be

In [32]:
Batch_ID["Postal_Code"] = pd.to_numeric(Batch_ID["Postal_Code"]) #needed to be converted to a numeric type

In [40]:
Batch_ID["Postal_Code"] = Batch_ID["Postal_Code"].astype(int)

In [41]:
Batch_ID["Postal_Code"] = Batch_ID["Postal_Code"].astype(int)

In [ ]:
Batch_ID["Cover_Amount"] = Batch_ID["Cover_Amount"].astype("int64") #needed to be converted to int64
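
Integer conversion with `astype(int)` raises if the column holds missing or non-numeric entries. One possible workaround, sketched here on a made-up series rather than the real Postal_Code or Cover_Amount columns, is to coerce with `pd.to_numeric` and then use pandas' nullable `Int64` dtype.

```python
import pandas as pd

# Hypothetical raw values, including a missing entry and a bad string.
raw = pd.Series(["1001", "2002", None, "bad"])

# errors="coerce" turns anything unparseable into NaN instead of raising.
as_num = pd.to_numeric(raw, errors="coerce")

# The nullable "Int64" dtype (capital I) can hold integers alongside <NA>.
as_int = as_num.astype("Int64")

print(as_int)
```

Coercion hides bad values rather than fixing them, so it is worth counting how many entries became missing before relying on the result.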

In [ ]:
print(Batch_ID.isnull().sum()) #missing values
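
Raw missing-value counts are easier to act on next to a percentage. The following is a small sketch on invented data; the column names are borrowed from the notes, but the values are not from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some gaps.
df = pd.DataFrame({
    "Age": [34, np.nan, 52, 41],
    "Premium": [120.0, 95.5, np.nan, np.nan],
    "Cust_Sex": ["F", "M", "M", "F"],
})

# Count and percentage of missing values per column, worst column first.
missing = pd.DataFrame({
    "n_missing": df.isnull().sum(),
    "pct_missing": df.isnull().mean() * 100,
}).sort_values("n_missing", ascending=False)

print(missing)
```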

In [36]:
! pip install missingno

In [37]:
import missingno as msno


In [39]:
Batch_ID = Batch_ID([])

In [38]:
Batch_ID = Batch_ID.drop(["Verified_Date"], axis=1) #Verified_Date - doesnt look lik
Batch_ID = Batch_ID.drop(["Effective_Date"], axis=1) #Effective_Date -had 00:00.0 in
Batch_ID = Batch_ID.drop(["Date_of_Debit"], axis=1) #Date_of_Debit had 00:00.0 in ev

In [42]:

