Project (Batch - ID)

Final Elucidate AI project (Batch_ID) 11/14/21, 9:01 PM
Christina's 5 Analytical Questions
Q1: Who were our most loyal customers?
Q2: Did longer calls yield higher sales?
Q3: On average, were males more likely to call
than females? If so, by how much? Knowing
this could help with targeting the company's
marketing campaigns, and be a more effective
way to tailor the messages according to one's
gender. For example, one could use specific
words aimed at their occupations, at
mothers, at businessmen who like things to be
concise, or at people who need extra time to
understand the full scope of products or
services offered before making an informed
decision. This would help the telemarketer
relate better to the listener, and thus increase
sales.
Q4: Did married couples close more sales due
to their combined incomes, or was there an
equal distribution between singles and married
couples in the number of calls made?
Q5: Out of the 30-40 year olds who were
mainly targeted, and the 50-60 year olds who
were the next group to be targeted, which
coverage plan gained the most popularity?
Was it selected by the price or by an added
health feature that compelled clients to
choose one over the other? #Note to self: See
my notebook on Google Drive.
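A minimal sketch of how Q5 could be approached, assuming the data has an `Age` column and a `Cover_Level` column (both names appear in the column notes; the sample rows and plan labels here are made up for illustration). It bins ages with `pd.cut` and picks the most frequent plan in each targeted band.

```python
import pandas as pd

# Hypothetical sample rows; the real rows would come from data_post2021.csv.
sample = pd.DataFrame({
    "Age": [32, 35, 38, 52, 55, 58, 45],
    "Cover_Level": ["Basic", "Plus", "Plus", "Basic", "Basic", "Plus", "Plus"],
})

# Bin ages into bands. With right=False the bands are [30, 40), [40, 50), [50, 60),
# so someone aged exactly 40 lands in "40-50" (an assumption about the intended cut-off).
bands = pd.cut(sample["Age"], bins=[30, 40, 50, 60],
               labels=["30-40", "40-50", "50-60"], right=False)
sample["Age_Band"] = bands.astype(str)

# Keep only the two targeted bands.
targeted = sample[sample["Age_Band"].isin(["30-40", "50-60"])]

# Most frequent cover plan within each targeted band.
top_plan = (targeted.groupby("Age_Band")["Cover_Level"]
            .agg(lambda s: s.value_counts().idxmax()))
print(top_plan)
```

Whether price or a health feature drove the choice cannot be read from counts alone; that would need the `Premium` and `Benefit_Level` columns compared across the same bands.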
Graphs to Consider:
#1. Bar graph - great way to show relative sizes: could depict most
popular vs least popular cover plan
sns.barplot
#2. Box and whisker plot - great for depicting numerical data (such as
number of sales made) through the quartiles
sns.boxplot(x=df["Sale_Status"], y=df["Verified_Date"])
#3. Heat map - appropriate to use for conversion rate and revenue for
Qs 2,3,4
Graphical representation of data where each value of a matrix is represented as a color.
Create a dataset:
df = pd.DataFrame(np.random.random((5,5)), columns=["a","b","c","d","e"])
Default heatmap:
p1 = sns.heatmap(df)
#4. Violin plot - comparing Marital_Status vs Age. Formula:
sns.violinplot(data=df, x="Marital_Status_x", y="Age")
#5. Plot bar graph. EX:
df_calls = df.groupby(['CampaignID']).count()
df_calls = df_calls.drop(df_calls.columns.difference(['Cust_ID']), axis=1)
df_calls = df_calls.rename(columns={"Cust_ID": "Total # of calls"})
df_calls.sort_values("Total # of calls", ascending=False, inplace=True)
df_calls.plot.bar()
df_calls.head(20).plot.bar()
#6. Correlogram
# library & dataset
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('data_post2021.csv')  # sns.load_dataset only fetches seaborn's example datasets
# Basic correlogram
sns.pairplot(df)
plt.show()
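The heat map idea in #3 needs a matrix of conversion rates before anything can be drawn. A rough sketch of building one with `pivot_table`, assuming `Sale_Status` holds a value such as "Sale" for closed sales (that encoding and the sample rows are assumptions, not taken from the data):

```python
import pandas as pd

# Hypothetical sample rows using column names from the notes.
calls = pd.DataFrame({
    "Cust_Sex": ["M", "M", "F", "F", "F", "M"],
    "Marital_Status_x": ["Married", "Single", "Married", "Married", "Single", "Married"],
    "Sale_Status": ["Sale", "No Sale", "Sale", "No Sale", "No Sale", "Sale"],
})

# 1 for a closed sale, 0 otherwise; the mean of this flag is the conversion rate.
calls["is_sale"] = (calls["Sale_Status"] == "Sale").astype(int)

# Rows: marital status, columns: gender, cells: share of calls that ended in a sale.
rate = calls.pivot_table(index="Marital_Status_x", columns="Cust_Sex",
                         values="is_sale", aggfunc="mean")
print(rate)

# With seaborn imported as sns, the matrix could then be drawn with:
# sns.heatmap(rate, annot=True, vmin=0, vmax=1)
```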
#Useful column names to consider: 1) Cover_Level 2) Family_To_Cover 3)
Cust_Sex 4) Policy_Status 5) Premium 6) Product_Category (use this one)
7) Benefit_Level 8) HistoryID 9) Sale_Status (use this one) 10)
Verified_Date
Keep: 'Call_Result', 'avg_est_income',
'avg_bal_01', 'avg_bal_avail',
'Marital_Status_x', 'Postal_Code', 'Cust_Sex',
'Batch_ID' (super helpful. This column has 58
unique entries. The exact definition of this
column and each of its entries will be
beneficial. This column indicates the sequence
we dialled the leads for the campaign), 'Age',
'ListSegment',
'Policy_no' (useful b/c it's part of the Customer Data History dataset, which represents the
information of the policy sold successfully to clients over the phone. This is the history of the
customers with policy information that has previously been sold to the customer; it will
indicate if the policy is active or not and other relevant information).
Policy_Status was the only one that needed clarification:
A - Active policy: based on feedback from the client, these policies are still active (premium paying) on their policy admin system.
C - Cancelled policy: based on feedback from the client, these policies have either lapsed or have been cancelled on their policy admin system.
Drop: 'CampaignID', 'Cust_ID', 'Call_Start',
'Call_End', 'Connection_ID', 'Emp_ID',
'Call_Time_seconds', 'wage_earner', 'ID_No',
'Lang_x', 'InceptionDateCorrected',
'Campaign_Type', 'Team_ID',
'EmploymentDate', 'Employee_Gender', 'Race'
In [16]:
# <font color='#9531A9'> Q1) Who were our most loyal customers? </font>
In [33]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
In [34]:
Batch_ID = pd.read_csv('data_post2021.csv')
Batch_ID.head() #This column has 58 unique entries. The exact definition of this column and each of its entries will be beneficial.
#This column indicates the sequence we dialled the leads for the campaign
---------------------------------------------------------------------------
IsADirectoryError Traceback (most recent call last)
<ipython-input-34-72523357077b> in <module>
----> 1 Batch_ID = pd.read_csv('data_post2021.csv')
2 Batch_ID.head() #This column has 58 unique entries. The exact defini
tion of this column and each of its entries will be beneficial.
3 #This column indicates the sequence we dialled the leads for the campa
ign
/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read_csv(fi
lepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze,
prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values
, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, n
a_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_
date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression
, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapec
har, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whites
pace, low_memory, memory_map, float_precision, storage_options)
608 kwds.update(kwds_defaults)
609
--> 610 return _read(filepath_or_buffer, kwds)
611
612
/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in _read(filep
ath_or_buffer, kwds)
460
461 # Create the parser.
--> 462 parser = TextFileReader(filepath_or_buffer, **kwds)
463
464 if chunksize or iterator:
/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in __init__(se
lf, f, engine, **kwds)
817 self.options["has_index_names"] = kwds["has_index_names"]
818
--> 819 self._engine = self._make_engine(self.engine)
820
821 def close(self):
/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in _make_engin
e(self, engine)
1048 )
1049 # error: Too many arguments for "ParserBase"
-> 1050 return mapping[engine](self.f, **self.options) # type: ignore
[call-arg]
1051
1052 def _failover_to_python(self):
/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in __init__(se
lf, src, **kwds)
1865
1866 # open handles
-> 1867 self._open_handles(src, kwds)
1868 assert self.handles is not None
1869 for key in ("storage_options", "encoding", "memory_map", "comp
ression"):
/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in _open_handl
es(self, src, kwds)
1360 Let the readers open IOHanldes after they are done with their
potential raises.
1361 """
-> 1362 self.handles = get_handle(
1363 src,
1364 "r",
/opt/anaconda3/lib/python3.8/site-packages/pandas/io/common.py in get_handle(p
ath_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_
options)
640 errors = "replace"
641 # Encoding
--> 642 handle = open(
643 handle,
644 ioargs.mode,
IsADirectoryError: [Errno 21] Is a directory: 'data_post2021.csv'
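The `IsADirectoryError` says that `data_post2021.csv` is a folder rather than a file, so `read_csv` has nothing to open. One way to check what is actually there before loading is sketched here with the standard library only; `find_csv_files` is a hypothetical helper name, not part of pandas.

```python
from pathlib import Path


def find_csv_files(path_str):
    """Return the CSV file at path_str, or the CSV files inside it if it is a directory."""
    path = Path(path_str)
    if path.is_dir():
        # A directory named like a CSV: look for the real files inside it.
        return sorted(p for p in path.iterdir()
                      if p.is_file() and p.suffix.lower() == ".csv")
    if path.is_file():
        return [path]
    return []  # nothing at that location


candidates = find_csv_files("data_post2021.csv")
print(candidates)
# Once a real file is found, it could be loaded with pd.read_csv(candidates[0]).
```

Every later `NameError` in this notebook follows from this failed load, because `Batch_ID` was never assigned.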
In [35]:
Batch_ID.shape #loading and inspecting data
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-35-373762c67413> in <module>
----> 1 Batch_ID.shape #loading and inspecting data
NameError: name 'Batch_ID' is not defined
In [20]:
Batch_ID.dtypes #loading and inspecting data
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-20-148267869b91> in <module>
----> 1 Batch_ID.dtypes #loading and inspecting data
NameError: name 'Batch_ID' is not defined
In [21]:
Batch_ID.columns #loading and inspecting data
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-21-ef58acf42f5c> in <module>
----> 1 Batch_ID.columns #loading and inspecting data
NameError: name 'Batch_ID' is not defined
In [22]:
Batch_ID.apply('nunique') #loading and inspecting data
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-22-8ec1553fa572> in <module>
----> 1 Batch_ID.apply('nunique') #loading and inspecting data
NameError: name 'Batch_ID' is not defined
In [23]:
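Batch_ID = Batch_ID.drop([['CampaignID', 'Cust_ID', 'Effective_Date','Call_Start', 'Verified_Date','Call_End', 'Connection_ID', 'Emp_ID', 'Call_Time_seconds', 'wage_earner', 'ID_No', 'Lang_x','InceptionDateCorrected','Campaign_Type', 'Team_ID', 'EmploymentDate', 'Employee_Gender', 'Race'], axis=1)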
File "<ipython-input-23-a94b346ac738>", line 1

Batch_ID = Batch_ID.drop([['CampaignID', 'Cust_ID', 'Effective_Date','Call
_Start', 'Verified_Date','Call_End', 'Connection_ID', 'Emp_ID', 'Call_Time_sec
onds', 'wage_earner', 'ID_No', 'Lang_x','InceptionDateCorrected','Campaign_Typ
e', 'Team_ID', 'EmploymentDate', 'Employee_Gender', 'Race'], axis=1)
^
SyntaxError: invalid syntax
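The `SyntaxError` comes from the doubled opening bracket `[[`, which is closed by only one `]`. A sketch of the same drop with a single flat list follows; the small sample frame is hypothetical, and `errors="ignore"` is an added choice so that columns missing from the frame are skipped instead of raising a `KeyError`.

```python
import pandas as pd

# Column names taken from the Drop list in the notes.
cols_to_drop = [
    'CampaignID', 'Cust_ID', 'Effective_Date', 'Call_Start', 'Verified_Date',
    'Call_End', 'Connection_ID', 'Emp_ID', 'Call_Time_seconds', 'wage_earner',
    'ID_No', 'Lang_x', 'InceptionDateCorrected', 'Campaign_Type', 'Team_ID',
    'EmploymentDate', 'Employee_Gender', 'Race',
]

# Hypothetical stand-in for the loaded data.
frame = pd.DataFrame({
    "CampaignID": [1, 2],
    "Cust_ID": [10, 11],
    "Age": [34, 51],
    "Cust_Sex": ["M", "F"],
})

# One flat list, passed via columns=; absent names are ignored.
trimmed = frame.drop(columns=cols_to_drop, errors="ignore")
print(trimmed.columns.tolist())
```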
In [24]:
Batch_ID = Batch_ID.drop(['CampaignID'], axis=1)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-24-99d56e58c6a8> in <module>
----> 1 Batch_ID = Batch_ID.drop(['CampaignID'], axis=1)
NameError: name 'Batch_ID' is not defined
In [25]:
Batch_ID = Batch_ID.rename(columns={"Cust_Sex": "Cust_Gender"})
Batch_ID.head()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-25-2d577942fb8f> in <module>
----> 1 Batch_ID = Batch_ID.rename(columns={"Cust_Sex": "Cust_Gender"})
2 Batch_ID.head()
NameError: name 'Batch_ID' is not defined
In [26]:
Batch_ID = Batch_ID.rename(columns={"Avg_est_income": "Avg_income"})
Batch_ID.head()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-26-e69b42e36748> in <module>
----> 1 Batch_ID = Batch_ID.rename(columns={"Avg_est_income": "Avg_income"})
2 Batch_ID.head()
NameError: name 'Batch_ID' is not defined
In [27]:
print(Batch_ID.shape) #removing duplicates
duplicate_rows_df = Batch_ID[Batch_ID.duplicated()] #rows containing duplicate data
print(duplicate_rows_df.shape)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-27-c4c486b6b206> in <module>
----> 1 print(Batch_ID.shape) #removing duplicates
2 duplicate_rows_df = Batch_ID[Batch_ID.duplicated()] #rows containing duplicate data
3
4 print(duplicate_rows_df.shape)
NameError: name 'Batch_ID' is not defined
In [28]:
Batch_ID = Batch_ID.drop_duplicates(keep='Verified_Date')
print(Batch_ID.shape)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-28-db91953c01d4> in <module>
----> 1 Batch_ID = Batch_ID.drop_duplicates(keep='Verified_Date')
2 print(Batch_ID.shape)
NameError: name 'Batch_ID' is not defined
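In pandas, the `keep` argument of `drop_duplicates` accepts only `'first'`, `'last'`, or `False`, so a column name such as `'Verified_Date'` belongs in `subset` or in a sort step instead. A small sketch of both readings, using made-up rows (the intent behind the original call is a guess):

```python
import pandas as pd

# Hypothetical records; Policy_no and Verified_Date are column names from the notes.
records = pd.DataFrame({
    "Policy_no": [101, 101, 102, 103, 103],
    "Verified_Date": ["2021-01-05", "2021-01-05", "2021-02-01",
                      "2021-03-10", "2021-03-12"],
})

# Rows that repeat an earlier row exactly.
exact_dupes = records[records.duplicated()]

# Reading 1: drop exact duplicate rows, keeping the first occurrence.
deduped = records.drop_duplicates(keep="first")

# Reading 2: keep one row per policy, the one with the latest Verified_Date.
# ISO-formatted date strings sort correctly as text.
one_per_policy = (records.sort_values("Verified_Date")
                  .drop_duplicates(subset="Policy_no", keep="last"))

print(exact_dupes.shape, deduped.shape, one_per_policy.shape)
```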
In [29]:
Batch_ID.dtypes #data types
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-29-1cdb81cf53b0> in <module>
----> 1 Batch_ID.dtypes #data types
NameError: name 'Batch_ID' is not defined
In [30]:
Batch_ID = Batch_ID.drop(["Verified_Date", "Postal_Cde","Effective_Date"], axis=1)
Batch_ID.head()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-30-964043ab9a00> in <module>
----> 1 Batch_ID = Batch_ID.drop(["Verified_Date", "Postal_Cde","Effective_Date"], axis=1)
2 Batch_ID.head()
NameError: name 'Batch_ID' is not defined
In [31]:
Batch_ID['Verified_Date'] = pd.to_datetime(Batch_ID['Verified_Date']) #needed to be renamed
Batch_ID.info()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-31-946ff7e96323> in <module>
----> 1 Batch_ID['Verified_Date'] = pd.to_datetime(Batch_ID['Verified_Date']) #needed to be renamed
2 Batch_ID.info()
NameError: name 'Batch_ID' is not defined
In [32]:
Batch_ID.Postal_Code = pd.to_int(Batch_ID["Postal_Code"]) #needed to be renamed
print(Batch_ID.dtypes)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-32-6cdeb9dea063> in <module>
----> 1 Batch_ID.Postal_Code = pd.to_int(Batch_ID["Postal_Code"]) #needed to b
e renamed
2 print(Batch_ID.dtypes)
/opt/anaconda3/lib/python3.8/site-packages/pandas/__init__.py in __getattr__(n
ame)
242 return _SparseArray
243
--> 244 raise AttributeError(f"module 'pandas' has no attribute '{name}'")
245
246
AttributeError: module 'pandas' has no attribute 'to_int'
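pandas has no `to_int` function; the usual routes are `Series.astype` or `pd.to_numeric`. The sketch below assumes the column may contain missing or non-numeric entries (the sample values are invented), in which case a plain `astype(int)` would fail and the nullable `Int64` dtype is one option.

```python
import pandas as pd

# Hypothetical raw values as they might arrive from a CSV.
frame = pd.DataFrame({
    "Postal_Code": ["2001", "8000", None, "n/a"],
    "Cover_Amount": ["10000", "25000.0", "15000", None],
})

# errors="coerce" turns unparseable entries into NaN instead of raising.
# "Int64" (capital I) is pandas' nullable integer dtype, so NaN becomes <NA>.
frame["Postal_Code"] = pd.to_numeric(frame["Postal_Code"], errors="coerce").astype("Int64")

# Amounts can stay as floats, which already allow missing values.
frame["Cover_Amount"] = pd.to_numeric(frame["Cover_Amount"], errors="coerce")

print(frame.dtypes)
```

If postal codes can start with a zero, keeping them as strings may be the safer choice, since an integer conversion drops leading zeros.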
In [40]:
Batch_ID["Postal_Code"] = Batch_ID["Postal_Code”].astype(int)
File "<ipython-input-40-1b2da4f23f34>", line 1

Batch_ID["Postal_Code"] = Batch_ID["Postal_Code”].astype(int)
^
SyntaxError: EOL while scanning string literal
In [41]:
Batch_ID["Postal_Code"] = Batch_ID["Postal_Code"].astype(int)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-41-bb0351a0cb47> in <module>
----> 1 Batch_ID["Postal_Code"] = Batch_ID["Postal_Code"].astype(int)
NameError: name 'Batch_ID' is not defined
In [ ]:
Batch_ID["Cover_Amount"] = Batch_ID["Cover_Amount"].astype("int64") #needed to be converted to an integer type
print(Batch_ID.dtypes)
In [ ]:
print(Batch_ID.isnull().sum()) #missing values
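The raw counts from `isnull().sum()` are easier to act on next to the share of rows they represent. A short sketch on invented rows, using column names from the notes:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with a few gaps.
frame = pd.DataFrame({
    "Age": [34, np.nan, 51, 29],
    "Cust_Sex": ["M", "F", None, None],
    "Premium": [120.0, 95.5, 80.0, 110.0],
})

missing = frame.isnull().sum()

# Count and percentage of missing values per column, worst first.
summary = pd.DataFrame({
    "n_missing": missing,
    "pct_missing": (missing / len(frame) * 100).round(1),
}).sort_values("n_missing", ascending=False)

print(summary)
```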
In [36]:
! pip install missingno
Requirement already satisfied: missingno in /opt/anaconda3/lib/python3.8/site-

packages (0.5.0)
Requirement already satisfied: matplotlib in /opt/anaconda3/lib/python3.8/site
-packages (from missingno) (3.3.4)
Requirement already satisfied: scipy in /opt/anaconda3/lib/python3.8/site-pack
ages (from missingno) (1.6.2)
Requirement already satisfied: seaborn in /opt/anaconda3/lib/python3.8/site-pa
ckages (from missingno) (0.11.1)
Requirement already satisfied: numpy in /opt/anaconda3/lib/python3.8/site-pack
ages (from missingno) (1.20.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /op
t/anaconda3/lib/python3.8/site-packages (from matplotlib->missingno) (2.4.7)
Requirement already satisfied: pillow>=6.2.0 in /opt/anaconda3/lib/python3.8/s
ite-packages (from matplotlib->missingno) (8.2.0)
Requirement already satisfied: cycler>=0.10 in /opt/anaconda3/lib/python3.8/si
te-packages (from matplotlib->missingno) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in /opt/anaconda3/lib/pyth
on3.8/site-packages (from matplotlib->missingno) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/anaconda3/lib/python3
.8/site-packages (from matplotlib->missingno) (1.3.1)
Requirement already satisfied: six in /opt/anaconda3/lib/python3.8/site-packag
es (from cycler>=0.10->matplotlib->missingno) (1.15.0)
Requirement already satisfied: pandas>=0.23 in /opt/anaconda3/lib/python3.8/si
te-packages (from seaborn->missingno) (1.2.4)
Requirement already satisfied: pytz>=2017.3 in /opt/anaconda3/lib/python3.8/si
te-packages (from pandas>=0.23->seaborn->missingno) (2021.1)
In [37]:
import missingno as msno
msno.matrix(Batch_ID);
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-37-9fa12fbf4e1c> in <module>
1 import missingno as msno
2
----> 3 msno.matrix(Batch_ID);
NameError: name 'Batch_ID' is not defined
In [39]:
Batch_ID = Batch_ID([])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-39-3200018f45d5> in <module>
----> 1 Batch_ID = Batch_ID([])
NameError: name 'Batch_ID' is not defined
In [38]:
Batch_ID = Batch_ID.drop(["Verified_Date"], axis=1 #Verified_Date - doesnt look lik
Batch_ID = Batch_ID.drop(["Effective_Date"], axis=1 #Effective_Date -had 00:00.0 in entire column
Batch_ID = Batch_ID.drop(["Date_of_Debit"], axis=1 #Date_of_Debit had 00:00.0 in ev
File "<ipython-input-38-816c887da580>", line 2

Batch_ID = Batch_ID.drop(["Effective_Date"], axis=1 #Effective_Date -had 0
0:00.0 in entire column
^
SyntaxError: invalid syntax
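Each `drop(...)` call in this cell is missing its closing parenthesis before the comment, which is what triggers the `SyntaxError`. Since the stated reason for dropping is that a column holds `00:00.0` in every row, an alternative is to find such constant columns programmatically. This is a sketch on invented rows, not the notebook's own method:

```python
import pandas as pd

# Hypothetical rows: two date columns carry the same placeholder everywhere.
frame = pd.DataFrame({
    "Effective_Date": ["00:00.0", "00:00.0", "00:00.0"],
    "Date_of_Debit": ["00:00.0", "00:00.0", "00:00.0"],
    "Age": [34, 51, 29],
})

# A column with at most one distinct value (missing values counted) carries no information.
constant_cols = [c for c in frame.columns if frame[c].nunique(dropna=False) <= 1]

# Single call with balanced parentheses.
cleaned = frame.drop(columns=constant_cols)

print(constant_cols)
print(cleaned.columns.tolist())
```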
In [42]:
Batch_ID.dtypes
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-42-459ddd2e979d> in <module>
----> 1 Batch_ID.dtypes
NameError: name 'Batch_ID' is not defined
In [ ]:
df.
In [ ]:

