You are on page 1of 23

comcast

May 30, 2020

1 By: Alok Kumar


Email:kumar.alok988@gmail.com

Mob:+918800699746

1.0.1 Check current working directory

[44]: pwd

[44]: 'C:\\Users\\91880\\Desktop\\comcast_py'

1.0.2 Set working directory to working path

[45]: cd C:\Users\91880\Desktop\comcast_py

C:\Users\91880\Desktop\comcast_py

[46]: pwd

[46]: 'C:\\Users\\91880\\Desktop\\comcast_py'

1
2 Comcast Telecom Consumer Complaints
2.1 Analysis Task
2.1.1 - Import data into Python environment.
2.1.2 - Provide the trend chart for the number of complaints at monthly and daily
granularity levels.
2.1.3 - Provide a table with the frequency of complaint types.
2.1.4 Which complaint types are maximum i.e., around internet, network issues, or
across any other domains.
2.1.5 - Create a new categorical variable with value as Open and Closed. Open &
Pending is to be categorized as Open and Closed & Solved is to be categorized
as Closed.
2.1.6 - Provide state wise status of complaints in a stacked bar chart. Use the cate-
gorized variable from Q3. Provide insights on:
2.1.7 Which state has the maximum complaints
2.1.8 Which state has the highest percentage of unresolved complaints
2.1.9 - Provide the percentage of complaints resolved till date, which were received
through the Internet and customer care calls.
2.2 STEP 1 :Import Python Libraries
2.2.1 For numerical python

[47]: import numpy as np

2.2.2 For Managing Datasets

[48]: import pandas as pd

2.2.3 For Scientific Computing

[49]: import scipy as sp

2.2.4 Matplotlib for data visualisation

[50]: from matplotlib import pyplot as plt

2.2.5 For Styling Visualisations

[51]: from matplotlib import style

2
2.2.6 To Display Plot Within Jupiter Notebook

[52]: %matplotlib inline

2.2.7 To plot count plot

[53]: import seaborn as sns

2.3 STEP 2:Import And Understand Data


[94]: comcast=pd.read_csv("Comcast_telecom_complaints_data.csv")
comcast

[94]: Ticket # Customer Complaint Date \


0 250635 Comcast Cable Internet Speeds 22-04-15
1 223441 Payment disappear - service got disconnected 04-08-15
2 242732 Speed and Service 18-04-15
3 277946 Comcast Imposed a New Usage Cap of 300GB that … 05-07-15
4 307175 Comcast not working and no service to boot 26-05-15
… … … …
2219 213550 Service Availability 04-02-15
2220 318775 Comcast Monthly Billing for Returned Modem 06-02-15
2221 331188 complaint about comcast 06-09-15
2222 360489 Extremely unsatisfied Comcast customer 23-06-15
2223 363614 Comcast, Ypsilanti MI Internet Speed 24-06-15

Date_month_year Time Received Via City State \


0 22-Apr-15 3:53:50 PM Customer Care Call Abingdon Maryland
1 04-Aug-15 10:22:56 AM Internet Acworth Georgia
2 18-Apr-15 9:55:47 AM Internet Acworth Georgia
3 05-Jul-15 11:59:35 AM Internet Acworth Georgia
4 26-May-15 1:25:26 PM Internet Acworth Georgia
… … … … … …
2219 04-Feb-15 9:13:18 AM Customer Care Call Youngstown Florida
2220 06-Feb-15 1:24:39 PM Customer Care Call Ypsilanti Michigan
2221 06-Sep-15 5:28:41 PM Internet Ypsilanti Michigan
2222 23-Jun-15 11:13:30 PM Customer Care Call Ypsilanti Michigan
2223 24-Jun-15 10:28:33 PM Customer Care Call Ypsilanti Michigan

Zip code Status Filing on Behalf of Someone


0 21009 Closed No
1 30102 Closed No
2 30101 Closed Yes
3 30101 Open Yes
4 30101 Solved No
… … … …
2219 32466 Closed No

3
2220 48197 Solved No
2221 48197 Solved No
2222 48197 Solved No
2223 48198 Open Yes

[2224 rows x 11 columns]

Above command Import data into Python environment.

2.3.1 Check For Missing Values

[95]: comcast[comcast.isnull()].count()

[95]: Ticket # 0
Customer Complaint 0
Date 0
Date_month_year 0
Time 0
Received Via 0
City 0
State 0
Zip code 0
Status 0
Filing on Behalf of Someone 0
dtype: int64

[97]: comcast.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2224 entries, 0 to 2223
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Ticket # 2224 non-null object
1 Customer Complaint 2224 non-null object
2 Date 2224 non-null object
3 Date_month_year 2224 non-null object
4 Time 2224 non-null object
5 Received Via 2224 non-null object
6 City 2224 non-null object
7 State 2224 non-null object
8 Zip code 2224 non-null int64
9 Status 2224 non-null object
10 Filing on Behalf of Someone 2224 non-null object
dtypes: int64(1), object(10)
memory usage: 191.2+ KB

4
From above we conclude that there is not any missing values present in COMCAST
data

2.3.2 Explain Data Briefly

[57]: comcast.describe(include=['object'])

[57]: Ticket # Customer Complaint Date Date_month_year Time \


count 2224 2224 2224 2224 2224
unique 2224 1841 91 91 2190
top 366714 Comcast 24-06-15 24-Jun-15 12:43:14 PM
freq 1 83 218 218 2

Received Via City State Status \


count 2224 2224 2224 2224
unique 2 928 43 4
top Customer Care Call Atlanta Georgia Solved
freq 1119 63 288 973

Filing on Behalf of Someone


count 2224
unique 2
top No
freq 2021

[58]: comcast.describe(include=['number'])

[58]: Zip code


count 2224.000000
mean 47994.393435
std 28885.279427
min 1075.000000
25% 30056.500000
50% 37211.000000
75% 77058.750000
max 99223.000000

[59]: comcast.describe(include='all')

[59]: Ticket # Customer Complaint Date Date_month_year Time \


count 2224 2224 2224 2224 2224
unique 2224 1841 91 91 2190
top 366714 Comcast 24-06-15 24-Jun-15 12:43:14 PM
freq 1 83 218 218 2
mean NaN NaN NaN NaN NaN
std NaN NaN NaN NaN NaN
min NaN NaN NaN NaN NaN

5
25% NaN NaN NaN NaN NaN
50% NaN NaN NaN NaN NaN
75% NaN NaN NaN NaN NaN
max NaN NaN NaN NaN NaN

Received Via City State Zip code Status \


count 2224 2224 2224 2224.000000 2224
unique 2 928 43 NaN 4
top Customer Care Call Atlanta Georgia NaN Solved
freq 1119 63 288 NaN 973
mean NaN NaN NaN 47994.393435 NaN
std NaN NaN NaN 28885.279427 NaN
min NaN NaN NaN 1075.000000 NaN
25% NaN NaN NaN 30056.500000 NaN
50% NaN NaN NaN 37211.000000 NaN
75% NaN NaN NaN 77058.750000 NaN
max NaN NaN NaN 99223.000000 NaN

Filing on Behalf of Someone


count 2224
unique 2
top No
freq 2021
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN

Clearly from above we get many useful info about data. For
Numeric columns(count,min,max,mean,IQR info) and for string
columns(count,unique,top,freq) are depicted. For the columns with strings, NaN was
returned for numeric operations.

2.3.3 Provide the trend chart for the number of complaints at monthly and daily
granularity levels
Cast Date_Month_Year to date type
[60]: comcast['Date_month_year'] = pd.to_datetime(comcast['Date_month_year'])
comcast['Date_month_year']

[60]: 0 2015-04-22
1 2015-08-04
2 2015-04-18
3 2015-07-05

6
4 2015-05-26

2219 2015-02-04
2220 2015-02-06
2221 2015-09-06
2222 2015-06-23
2223 2015-06-24
Name: Date_month_year, Length: 2224, dtype: datetime64[ns]

month details by month as numeric value


[61]: comcast['month_by_num']=comcast['Date_month_year'].apply(lambda x: x.month)
comcast['month_by_num']

[61]: 0 4
1 8
2 4
3 7
4 5
..
2219 2
2220 2
2221 9
2222 6
2223 6
Name: month_by_num, Length: 2224, dtype: int64

[62]: import calendar


comcast['month_by_name']=comcast['month_by_num'].apply(lambda x: calendar.
,→month_abbr[x])

comcast['month_by_name']

[62]: 0 Apr
1 Aug
2 Apr
3 Jul
4 May

2219 Feb
2220 Feb
2221 Sep
2222 Jun
2223 Jun
Name: month_by_name, Length: 2224, dtype: object

To plot on monthly basis

7
[63]: month_basis = comcast.groupby('month_by_num').count().reset_index()
month_basis

[63]: month_by_num Ticket # Customer Complaint Date Date_month_year Time \


0 1 55 55 55 55 55
1 2 59 59 59 59 59
2 3 45 45 45 45 45
3 4 375 375 375 375 375
4 5 317 317 317 317 317
5 6 1046 1046 1046 1046 1046
6 7 49 49 49 49 49
7 8 67 67 67 67 67
8 9 55 55 55 55 55
9 10 53 53 53 53 53
10 11 38 38 38 38 38
11 12 65 65 65 65 65

Received Via City State Zip code Status Filing on Behalf of Someone \
0 55 55 55 55 55 55
1 59 59 59 59 59 59
2 45 45 45 45 45 45
3 375 375 375 375 375 375
4 317 317 317 317 317 317
5 1046 1046 1046 1046 1046 1046
6 49 49 49 49 49 49
7 67 67 67 67 67 67
8 55 55 55 55 55 55
9 53 53 53 53 53 53
10 38 38 38 38 38 38
11 65 65 65 65 65 65

month_by_name
0 55
1 59
2 45
3 375
4 317
5 1046
6 49
7 67
8 55
9 53
10 38
11 65

[64]: month_basis_name = comcast.groupby('month_by_name').count().reset_index()


month_basis_name

8
[64]: month_by_name Ticket # Customer Complaint Date Date_month_year Time \
0 Apr 375 375 375 375 375
1 Aug 67 67 67 67 67
2 Dec 65 65 65 65 65
3 Feb 59 59 59 59 59
4 Jan 55 55 55 55 55
5 Jul 49 49 49 49 49
6 Jun 1046 1046 1046 1046 1046
7 Mar 45 45 45 45 45
8 May 317 317 317 317 317
9 Nov 38 38 38 38 38
10 Oct 53 53 53 53 53
11 Sep 55 55 55 55 55

Received Via City State Zip code Status Filing on Behalf of Someone \
0 375 375 375 375 375 375
1 67 67 67 67 67 67
2 65 65 65 65 65 65
3 59 59 59 59 59 59
4 55 55 55 55 55 55
5 49 49 49 49 49 49
6 1046 1046 1046 1046 1046 1046
7 45 45 45 45 45 45
8 317 317 317 317 317 317
9 38 38 38 38 38 38
10 53 53 53 53 53 53
11 55 55 55 55 55 55

month_by_num
0 375
1 67
2 65
3 59
4 55
5 49
6 1046
7 45
8 317
9 38
10 53
11 55

[65]: plt.figure(figsize=(18,8))
plt.plot('month_by_num', 'Customer Complaint',data=month_basis,color='green',␣
,→linestyle='solid', linewidth = 3,

marker='h', markerfacecolor='red', markersize=12)


plt.xlabel('Month')

9
plt.ylabel('Complaints')
plt.title('number of complaints at monthly levels')
plt.show()

[66]: plt.figure(figsize=(18,8))
sns.countplot(x='month_by_name', data = comcast, order =␣
,→comcast['month_by_name'].value_counts().index)

plt.title('number_of_complaints_month')
plt.show()

#### From above trend graph it clearly shows that complaint are most in month of June(mid
year).

10
For trend at daily levels
[67]: comcast['Daily_basis'] = comcast['Date_month_year'].apply(lambda x: x.day)
comcast['Daily_basis']

[67]: 0 22
1 4
2 18
3 5
4 26
..
2219 4
2220 6
2221 6
2222 23
2223 24
Name: Daily_basis, Length: 2224, dtype: int64

[68]: comcast['week_day'] = comcast['Date_month_year'].apply(lambda x: x.dayofweek)


comcast['week_day']

[68]: 0 2
1 1
2 5
3 6
4 1
..
2219 2
2220 4
2221 6
2222 1
2223 2
Name: week_day, Length: 2224, dtype: int64

[69]: daymap = {0:'Mon',1:'Tue',2:'Wed',3:'Thur',4:'Fri',5:'Sat',6:'Sun'}


comcast['week_day']=comcast['week_day'].map(daymap)
comcast.head(5)

[69]: Ticket # Customer Complaint Date \


0 250635 Comcast Cable Internet Speeds 22-04-15
1 223441 Payment disappear - service got disconnected 04-08-15
2 242732 Speed and Service 18-04-15
3 277946 Comcast Imposed a New Usage Cap of 300GB that … 05-07-15
4 307175 Comcast not working and no service to boot 26-05-15

Date_month_year Time Received Via City State \


0 2015-04-22 3:53:50 PM Customer Care Call Abingdon Maryland
1 2015-08-04 10:22:56 AM Internet Acworth Georgia

11
2 2015-04-18 9:55:47 AM Internet Acworth Georgia
3 2015-07-05 11:59:35 AM Internet Acworth Georgia
4 2015-05-26 1:25:26 PM Internet Acworth Georgia

Zip code Status Filing on Behalf of Someone month_by_num month_by_name \


0 21009 Closed No 4 Apr
1 30102 Closed No 8 Aug
2 30101 Closed Yes 4 Apr
3 30101 Open Yes 7 Jul
4 30101 Solved No 5 May

Daily_basis week_day
0 22 Wed
1 4 Tue
2 18 Sat
3 5 Sun
4 26 Tue

[70]: daily_basis = comcast.groupby('Daily_basis').count().reset_index()


daily_basis

[70]: Daily_basis Ticket # Customer Complaint Date Date_month_year Time \


0 4 206 206 206 206 206
1 5 131 131 131 131 131
2 6 272 272 272 272 272
3 13 68 68 68 68 68
4 14 54 54 54 54 54
5 15 58 58 58 58 58
6 16 65 65 65 65 65
7 17 60 60 60 60 60
8 18 69 69 69 69 69
9 19 50 50 50 50 50
10 20 51 51 51 51 51
11 21 41 41 41 41 41
12 22 66 66 66 66 66
13 23 225 225 225 225 225
14 24 249 249 249 249 249
15 25 126 126 126 126 126
16 26 90 90 90 90 90
17 27 81 81 81 81 81
18 28 79 79 79 79 79
19 29 87 87 87 87 87
20 30 86 86 86 86 86
21 31 10 10 10 10 10

Received Via City State Zip code Status Filing on Behalf of Someone \
0 206 206 206 206 206 206

12
1 131 131 131 131 131 131
2 272 272 272 272 272 272
3 68 68 68 68 68 68
4 54 54 54 54 54 54
5 58 58 58 58 58 58
6 65 65 65 65 65 65
7 60 60 60 60 60 60
8 69 69 69 69 69 69
9 50 50 50 50 50 50
10 51 51 51 51 51 51
11 41 41 41 41 41 41
12 66 66 66 66 66 66
13 225 225 225 225 225 225
14 249 249 249 249 249 249
15 126 126 126 126 126 126
16 90 90 90 90 90 90
17 81 81 81 81 81 81
18 79 79 79 79 79 79
19 87 87 87 87 87 87
20 86 86 86 86 86 86
21 10 10 10 10 10 10

month_by_num month_by_name week_day


0 206 206 206
1 131 131 131
2 272 272 272
3 68 68 68
4 54 54 54
5 58 58 58
6 65 65 65
7 60 60 60
8 69 69 69
9 50 50 50
10 51 51 51
11 41 41 41
12 66 66 66
13 225 225 225
14 249 249 249
15 126 126 126
16 90 90 90
17 81 81 81
18 79 79 79
19 87 87 87
20 86 86 86
21 10 10 10

13
[71]: plt.figure(figsize=(18,8))
plt.plot('Daily_basis', 'Customer Complaint',data=daily_basis,color='green',␣
,→linestyle='solid', linewidth = 3,

marker='.', markerfacecolor='red', markersize=12)


plt.xlabel('Daily_Basis')
plt.ylabel('Complaints')
plt.title('number of complaints at daily levels')
plt.show()

[72]: plt.figure(figsize=(18,8))
sns.countplot(x='week_day', data = comcast, order = comcast['week_day'].
,→value_counts().index)

plt.title('number of complaints by days')


plt.show()

14
Above trend shows tuesday and wednesday are not so good as more number of com-
plaints are on Tuesday and wednesday

2.3.4 Provide a table with the frequency of complaint types

[73]: comcast['Customer Complaint'] = comcast['Customer Complaint'].str.capitalize()


complaint_type_freq = comcast['Customer Complaint'].value_counts()
complaint_type_freq

[73]: Comcast 102


Comcast data cap 30
Comcast internet 29
Comcast data caps 21
Comcast billing 18

Internet offer rescinded 1
I am so fed up with comcast 1
I had sent out a check payment comcast 1
Comcast, ypsilanti mi internet speed 1
Continue to be scared by comcast 1
Name: Customer Complaint, Length: 1740, dtype: int64

2.3.5 Which complaint types are maximum i.e., around internet, network issues, or
across any other domains

[74]: plt.figure(figsize=(18,8))
complaint_type_freq.head(20).plot(kind="bar")
plt.show()

15
Customers Are Mainly complaining about the Comcast,Data Caps, Internet Speed,
Billing Methods.As per above graph “comcast” have maximum complaints.

2.3.6 Create a new categorical variable with value as Open and Closed. Open &
Pending is to be categorized as Open and Closed & Solved is to be categorized
as Closed.
[75]: comcast['Status_updated'] = comcast['Status'].map({'Solved':'Closed', 'Closed':
,→'Closed', 'Pending':'Open', 'Open':'Open'})

comcast['Status_updated']

[75]: 0 Closed
1 Closed
2 Closed
3 Open
4 Closed

2219 Closed
2220 Closed
2221 Closed
2222 Closed
2223 Open
Name: Status_updated, Length: 2224, dtype: object

[76]: comcast['Status_updated'].unique()

16
[76]: array(['Closed', 'Open'], dtype=object)

2.3.7 Provide state wise status of complaints in a stacked bar chart.

[77]: state_wise = pd.crosstab(comcast['State'], comcast['Status_updated'])


state_wise

[77]: Status_updated Closed Open


State
Alabama 17 9
Arizona 14 6
Arkansas 6 0
California 159 61
Colorado 58 22
Connecticut 9 3
Delaware 8 4
District Of Columbia 14 2
District of Columbia 1 0
Florida 201 39
Georgia 208 80
Illinois 135 29
Indiana 50 9
Iowa 1 0
Kansas 1 1
Kentucky 4 3
Louisiana 12 1
Maine 3 2
Maryland 63 15
Massachusetts 50 11
Michigan 92 23
Minnesota 29 4
Mississippi 23 16
Missouri 3 1
Montana 1 0
Nevada 1 0
New Hampshire 8 4
New Jersey 56 19
New Mexico 11 4
New York 6 0
North Carolina 3 0
Ohio 3 0
Oregon 36 13
Pennsylvania 110 20
Rhode Island 1 0
South Carolina 15 3
Tennessee 96 47
Texas 49 22

17
Utah 16 6
Vermont 2 1
Virginia 49 11
Washington 75 23
West Virginia 8 3

[78]: state_wise.sort_values('Closed',axis = 0,ascending=True).


,→plot(kind="barh",figsize=(12,14), stacked=True)

plt.show()

2.3.8 Use the categorized variable from Q3. Provide insights on:
Which state has the maximum complaints

18
[79]: max_complaints=comcast.groupby(['State']).size().sort_values(ascending=False).
,→to_frame().rename({0: "Complaint count"}, axis=1)[:20]

max_complaints

[79]: Complaint count


State
Georgia 288
Florida 240
California 220
Illinois 164
Tennessee 143
Pennsylvania 130
Michigan 115
Washington 98
Colorado 80
Maryland 78
New Jersey 75
Texas 71
Massachusetts 61
Virginia 60
Indiana 59
Oregon 49
Mississippi 39
Minnesota 33
Alabama 26
Utah 22

[80]: max_complaints.head(20).plot(kind="bar",figsize=(18,8))
plt.show()

19
[81]: comcast.groupby(['State']).size().sort_values(ascending=False).to_frame().
,→rename({0: "Complaint count"}, axis=1)[:1]

[81]: Complaint count


State
Georgia 288

[82]: Highest_complaint = comcast.groupby(['State','Status_updated']).size().


,→unstack().fillna(0)

Highest_complaint.sort_values('Closed',axis = 0,ascending=False)[:1]

[82]: Status_updated Closed Open


State
Georgia 208.0 80.0

highest percentage of unresolved complaints


[83]: state_wise["Total"] = state_wise.Open + state_wise.Closed

[84]: Highest_complaint['Resolved_cmp_prct'] = Highest_complaint['Closed']/


,→state_wise["Total"]*100

Highest_complaint['Unresolved_cmp_prct'] = Highest_complaint['Open']/
,→state_wise["Total"]*100

[85]: Percent=Highest_complaint.sort_values('Unresolved_cmp_prct',axis =␣
,→0,ascending=False)[:20]

Percent

[85]: Status_updated Closed Open Resolved_cmp_prct Unresolved_cmp_prct


State
Kansas 1.0 1.0 50.000000 50.000000
Kentucky 4.0 3.0 57.142857 42.857143
Mississippi 23.0 16.0 58.974359 41.025641
Maine 3.0 2.0 60.000000 40.000000
Alabama 17.0 9.0 65.384615 34.615385
Vermont 2.0 1.0 66.666667 33.333333
Delaware 8.0 4.0 66.666667 33.333333
New Hampshire 8.0 4.0 66.666667 33.333333
Tennessee 96.0 47.0 67.132867 32.867133
Texas 49.0 22.0 69.014085 30.985915
Arizona 14.0 6.0 70.000000 30.000000
Georgia 208.0 80.0 72.222222 27.777778
California 159.0 61.0 72.272727 27.727273
Colorado 58.0 22.0 72.500000 27.500000
Utah 16.0 6.0 72.727273 27.272727
West Virginia 8.0 3.0 72.727273 27.272727

20
New Mexico 11.0 4.0 73.333333 26.666667
Oregon 36.0 13.0 73.469388 26.530612
New Jersey 56.0 19.0 74.666667 25.333333
Missouri 3.0 1.0 75.000000 25.000000

[86]: Percent.head(20).plot(kind="bar",figsize=(18,8))
plt.show()

[87]: Highest_complaint.sort_values('Unresolved_cmp_prct',axis = 0,ascending=False)[:


,→1]

[87]: Status_updated Closed Open Resolved_cmp_prct Unresolved_cmp_prct


State
Kansas 1.0 1.0 50.0 50.0

Clearly kansas has highest percentage of unresolved complaints

2.3.9 Provide the percentage of complaints resolved till date, which were received
through the Internet and customer care calls.

[88]: complaints_resolved = pd.crosstab(comcast['Received Via'],␣


,→comcast['Status_updated'])

[89]: complaints_resolved['Tot']=complaints_resolved.Open+complaints_resolved.Closed
complaints_resolved

21
[89]: Status_updated Closed Open Tot
Received Via
Customer Care Call 864 255 1119
Internet 843 262 1105

[90]: complaints_resolved['resolved'] =complaints_resolved['Closed']/


,→complaints_resolved['Tot']*100

complaints_resolved['resolved']

[90]: Received Via


Customer Care Call 77.211796
Internet 76.289593
Name: resolved, dtype: float64

[91]: slices = [864,255]


activities = ['resolved','unresolved']
cols = ['c','m']

plt.pie(slices,
labels=activities,
colors=cols,
startangle=90,
shadow= True,
explode=(0,0.07),
autopct='%1.1f%%')

plt.title('Status of Complaints via customer care call')


plt.show()

22
[92]: slices = [843,262]
activities = ['resolved','unresolved']
cols = ['r','b']

plt.pie(slices,
labels=activities,
colors=cols,
startangle=90,
shadow= True,
explode=(0,0.07),
autopct='%1.1f%%')

plt.title('Status of Complaints via internet')


plt.show()

Above are the complaints resolves which received via internet and customer care calls

23

You might also like