Mob:+918800699746
[44]: pwd
[44]: 'C:\\Users\\91880\\Desktop\\comcast_py'
[45]: cd C:\Users\91880\Desktop\comcast_py
C:\Users\91880\Desktop\comcast_py
[46]: pwd
[46]: 'C:\\Users\\91880\\Desktop\\comcast_py'
2 Comcast Telecom Consumer Complaints
2.1 Analysis Task
2.1.1 - Import data into Python environment.
2.1.2 - Provide the trend chart for the number of complaints at monthly and daily
granularity levels.
2.1.3 - Provide a table with the frequency of complaint types.
2.1.4 Which complaint types are maximum i.e., around internet, network issues, or
across any other domains.
2.1.5 - Create a new categorical variable with value as Open and Closed. Open &
Pending is to be categorized as Open and Closed & Solved is to be categorized
as Closed.
2.1.6 - Provide state wise status of complaints in a stacked bar chart. Use the
categorized variable from Q3. Provide insights on:
2.1.7 Which state has the maximum complaints
2.1.8 Which state has the highest percentage of unresolved complaints
2.1.9 - Provide the percentage of complaints resolved till date, which were received
through the Internet and customer care calls.
2.2 STEP 1: Import Python Libraries
2.2.1 For numerical python
2.2.6 To Display Plot Within Jupyter Notebook
2220 48197 Solved No
2221 48197 Solved No
2222 48197 Solved No
2223 48198 Open Yes
[95]: comcast.isnull().sum()
[95]: Ticket # 0
Customer Complaint 0
Date 0
Date_month_year 0
Time 0
Received Via 0
City 0
State 0
Zip code 0
Status 0
Filing on Behalf of Someone 0
dtype: int64
[97]: comcast.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2224 entries, 0 to 2223
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Ticket # 2224 non-null object
1 Customer Complaint 2224 non-null object
2 Date 2224 non-null object
3 Date_month_year 2224 non-null object
4 Time 2224 non-null object
5 Received Via 2224 non-null object
6 City 2224 non-null object
7 State 2224 non-null object
8 Zip code 2224 non-null int64
9 Status 2224 non-null object
10 Filing on Behalf of Someone 2224 non-null object
dtypes: int64(1), object(10)
memory usage: 191.2+ KB
From the above we conclude that there are no missing values in the Comcast
data.
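As a cross-check on the missing-value count, here is a minimal sketch on a small hypothetical frame (the `toy` data is invented for illustration and is not part of the Comcast file). It shows that `isnull().sum()` counts missing cells per column, while masking the frame with its own null mask and calling `count()` reports 0 whether or not data is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value in each column.
toy = pd.DataFrame({
    "City": ["Acworth", None, "Atlanta"],
    "Zip code": [30101.0, 30102.0, np.nan],
})

# isnull() marks each missing cell as True; summing counts them per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# toy[toy.isnull()] keeps only cells that are already missing, and count()
# skips missing cells, so this form reports 0 for every column.
masked_count = toy[toy.isnull()].count()
print(masked_count)
```

On the real data the two approaches agree only because `comcast.info()` already shows 2224 non-null entries in every column.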
[57]: comcast.describe(include=['object'])
[58]: comcast.describe(include=['number'])
[59]: comcast.describe(include='all')
25% NaN NaN NaN NaN NaN
50% NaN NaN NaN NaN NaN
75% NaN NaN NaN NaN NaN
max NaN NaN NaN NaN NaN
The output above gives a useful summary of the data. For numeric columns,
describe() reports count, mean, std, min, max and the quartiles (25%, 50%, 75%);
for string columns it reports count, unique, top and freq. With include='all',
string columns show NaN for the numeric statistics.
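The behaviour of `describe(include='all')` can be seen on a small frame. This is a sketch only; the three rows below are hypothetical and are not taken from the Comcast file.

```python
import pandas as pd

# Hypothetical rows: one string column and one numeric column.
toy = pd.DataFrame({
    "State": ["Georgia", "Georgia", "Florida"],
    "Zip code": [30101, 30102, 32003],
})

summary = toy.describe(include="all")
print(summary)

# Statistics that do not apply to a column's type come back as NaN.
state_mean_is_nan = pd.isna(summary.loc["mean", "State"])
zip_top_is_nan = pd.isna(summary.loc["top", "Zip code"])
print(state_mean_is_nan, zip_top_is_nan)
```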
2.3.3 Provide the trend chart for the number of complaints at monthly and daily
granularity levels
Cast Date_month_year to a datetime type
[60]: comcast['Date_month_year'] = pd.to_datetime(comcast['Date_month_year'])
comcast['Date_month_year']
[60]: 0 2015-04-22
1 2015-08-04
2 2015-04-18
3 2015-07-05
4 2015-05-26
…
2219 2015-02-04
2220 2015-02-06
2221 2015-09-06
2222 2015-06-23
2223 2015-06-24
Name: Date_month_year, Length: 2224, dtype: datetime64[ns]
[61]: 0 4
1 8
2 4
3 7
4 5
..
2219 2
2220 2
2221 9
2222 6
2223 6
Name: month_by_num, Length: 2224, dtype: int64
comcast['month_by_name']
[62]: 0 Apr
1 Aug
2 Apr
3 Jul
4 May
…
2219 Feb
2220 Feb
2221 Sep
2222 Jun
2223 Jun
Name: month_by_name, Length: 2224, dtype: object
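The cells that create month_by_num and month_by_name are not visible above. One way such columns might be derived, assuming the `.dt` accessor is used on the datetime column, is sketched below on the first five dates from the Date_month_year output. The variable names mirror the column names; the derivation itself is an assumption.

```python
import pandas as pd

# First five dates from the Date_month_year output above.
dates = pd.to_datetime(pd.Series(
    ["2015-04-22", "2015-08-04", "2015-04-18", "2015-07-05", "2015-05-26"]
))

# Month as a number (1 = January ... 12 = December).
month_by_num = dates.dt.month

# Abbreviated month name; "%b" follows the active locale (English assumed).
month_by_name = dates.dt.strftime("%b")

print(month_by_num.tolist())
print(month_by_name.tolist())
```

The results agree with the first five rows of the month_by_num and month_by_name outputs shown above.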
[63]: month_basis = comcast.groupby('month_by_num').count().reset_index()
month_basis
Received Via City State Zip code Status Filing on Behalf of Someone \
0 55 55 55 55 55 55
1 59 59 59 59 59 59
2 45 45 45 45 45 45
3 375 375 375 375 375 375
4 317 317 317 317 317 317
5 1046 1046 1046 1046 1046 1046
6 49 49 49 49 49 49
7 67 67 67 67 67 67
8 55 55 55 55 55 55
9 53 53 53 53 53 53
10 38 38 38 38 38 38
11 65 65 65 65 65 65
month_by_name
0 55
1 59
2 45
3 375
4 317
5 1046
6 49
7 67
8 55
9 53
10 38
11 65
[64]: month_by_name Ticket # Customer Complaint Date Date_month_year Time \
0 Apr 375 375 375 375 375
1 Aug 67 67 67 67 67
2 Dec 65 65 65 65 65
3 Feb 59 59 59 59 59
4 Jan 55 55 55 55 55
5 Jul 49 49 49 49 49
6 Jun 1046 1046 1046 1046 1046
7 Mar 45 45 45 45 45
8 May 317 317 317 317 317
9 Nov 38 38 38 38 38
10 Oct 53 53 53 53 53
11 Sep 55 55 55 55 55
Received Via City State Zip code Status Filing on Behalf of Someone \
0 375 375 375 375 375 375
1 67 67 67 67 67 67
2 65 65 65 65 65 65
3 59 59 59 59 59 59
4 55 55 55 55 55 55
5 49 49 49 49 49 49
6 1046 1046 1046 1046 1046 1046
7 45 45 45 45 45 45
8 317 317 317 317 317 317
9 38 38 38 38 38 38
10 53 53 53 53 53 53
11 55 55 55 55 55 55
month_by_num
0 375
1 67
2 65
3 59
4 55
5 49
6 1046
7 45
8 317
9 38
10 53
11 55
[65]: plt.figure(figsize=(18,8))
plt.plot('month_by_num', 'Customer Complaint',data=month_basis,color='green',
         linestyle='solid', linewidth = 3)
plt.ylabel('Complaints')
plt.title('number of complaints at monthly levels')
plt.show()
[66]: plt.figure(figsize=(18,8))
sns.countplot(x='month_by_name', data = comcast,
              order = comcast['month_by_name'].value_counts().index)
plt.title('number_of_complaints_month')
plt.show()
The trend graph above clearly shows that complaints peak in June (mid-year).
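The June peak can also be confirmed numerically. The sketch below re-enters the monthly counts from the month_basis table above (row 0 is January) and finds the busiest month and its share of all complaints. The names `monthly_counts`, `peak_month` and `peak_share` are illustrative.

```python
import pandas as pd

# Monthly complaint counts copied from the month_basis table above,
# indexed by month number (1 = January ... 12 = December).
monthly_counts = pd.Series(
    [55, 59, 45, 375, 317, 1046, 49, 67, 55, 53, 38, 65],
    index=range(1, 13),
    name="complaints",
)

# Month with the most complaints and its percentage of the yearly total.
peak_month = monthly_counts.idxmax()
peak_share = monthly_counts.max() / monthly_counts.sum() * 100

print(peak_month, round(peak_share, 1))
```

The counts sum to 2224, matching the number of rows reported by comcast.info().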
For the trend at the daily level:
[67]: comcast['Daily_basis'] = comcast['Date_month_year'].apply(lambda x: x.day)
comcast['Daily_basis']
[67]: 0 22
1 4
2 18
3 5
4 26
..
2219 4
2220 6
2221 6
2222 23
2223 24
Name: Daily_basis, Length: 2224, dtype: int64
[68]: 0 2
1 1
2 5
3 6
4 1
..
2219 2
2220 4
2221 6
2222 1
2223 2
Name: week_day, Length: 2224, dtype: int64
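The cell that creates week_day is not shown. A plausible derivation, assuming pandas' Monday = 0 convention, is sketched here on the first five dates of the dataset. Both the numeric form and the abbreviated names that appear later in the notebook are produced.

```python
import pandas as pd

# First five dates from the Date_month_year output earlier in the notebook.
dates = pd.to_datetime(pd.Series(
    ["2015-04-22", "2015-08-04", "2015-04-18", "2015-07-05", "2015-05-26"]
))

# Day of week as a number (Monday = 0 ... Sunday = 6).
week_day_num = dates.dt.dayofweek

# Abbreviated day name; "%a" follows the active locale (English assumed).
week_day_name = dates.dt.strftime("%a")

print(week_day_num.tolist())
print(week_day_name.tolist())
```

These values agree with the first five rows of the week_day output above and with the day names shown further down.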
2 2015-04-18 9:55:47 AM Internet Acworth Georgia
3 2015-07-05 11:59:35 AM Internet Acworth Georgia
4 2015-05-26 1:25:26 PM Internet Acworth Georgia
Daily_basis week_day
0 22 Wed
1 4 Tue
2 18 Sat
3 5 Sun
4 26 Tue
Received Via City State Zip code Status Filing on Behalf of Someone \
0 206 206 206 206 206 206
1 131 131 131 131 131 131
2 272 272 272 272 272 272
3 68 68 68 68 68 68
4 54 54 54 54 54 54
5 58 58 58 58 58 58
6 65 65 65 65 65 65
7 60 60 60 60 60 60
8 69 69 69 69 69 69
9 50 50 50 50 50 50
10 51 51 51 51 51 51
11 41 41 41 41 41 41
12 66 66 66 66 66 66
13 225 225 225 225 225 225
14 249 249 249 249 249 249
15 126 126 126 126 126 126
16 90 90 90 90 90 90
17 81 81 81 81 81 81
18 79 79 79 79 79 79
19 87 87 87 87 87 87
20 86 86 86 86 86 86
21 10 10 10 10 10 10
[71]: plt.figure(figsize=(18,8))
plt.plot('Daily_basis', 'Customer Complaint',data=daily_basis,color='green',
         linestyle='solid', linewidth = 3)
[72]: plt.figure(figsize=(18,8))
sns.countplot(x='week_day', data = comcast,
              order = comcast['week_day'].value_counts().index)
The trend above shows that Tuesday and Wednesday are the worst days, with more
complaints received than on any other day of the week.
2.3.5 Which complaint types are maximum i.e., around internet, network issues, or
across any other domains
[74]: plt.figure(figsize=(18,8))
complaint_type_freq.head(20).plot(kind="bar")
plt.show()
Customers mainly complain about Comcast itself, data caps, internet speed and
billing methods. As per the graph above, “comcast” has the most complaints.
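The cell that builds complaint_type_freq is not visible in this export. One way such a frequency table might be built is sketched below on hypothetical complaint texts; the normalization step and the keyword list are assumptions, not the notebook's own method.

```python
import pandas as pd

# Hypothetical complaint texts; the real column is 'Customer Complaint'.
complaints = pd.Series([
    "Comcast Internet", "comcast internet", "Comcast Data Cap",
    "Comcast Billing", "COMCAST INTERNET",
])

# Lower-casing first merges entries that differ only in capitalization.
normalized = complaints.str.lower().str.strip()
complaint_type_freq = normalized.value_counts()
print(complaint_type_freq)

# Counting keyword mentions gives a rough view of complaint domains.
keywords = ["internet", "billing", "data cap"]
keyword_counts = {
    k: int(normalized.str.contains(k, regex=False).sum()) for k in keywords
}
print(keyword_counts)
```

Keyword counting helps answer the "internet, network issues, or other domains" question more directly than raw text frequencies, since free-text complaints rarely repeat word for word.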
2.3.6 Create a new categorical variable with value as Open and Closed. Open &
Pending is to be categorized as Open and Closed & Solved is to be categorized
as Closed.
[75]: comcast['Status_updated'] = comcast['Status'].map({'Solved':'Closed', 'Closed':'Closed',
                                                        'Pending':'Open', 'Open':'Open'})
comcast['Status_updated']
[75]: 0 Closed
1 Closed
2 Closed
3 Open
4 Closed
…
2219 Closed
2220 Closed
2221 Closed
2222 Closed
2223 Open
Name: Status_updated, Length: 2224, dtype: object
[76]: comcast['Status_updated'].unique()
[76]: array(['Closed', 'Open'], dtype=object)
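One property of `Series.map` with a dictionary is worth keeping in mind: any status not listed in the dictionary becomes NaN. The sketch below uses a hypothetical extra status, "Escalated", which does not occur in the Comcast data, to show how unmapped values could be detected.

```python
import pandas as pd

# The four real status values plus one hypothetical value ("Escalated").
status = pd.Series(["Solved", "Closed", "Pending", "Open", "Escalated"])

status_map = {"Solved": "Closed", "Closed": "Closed",
              "Pending": "Open", "Open": "Open"}

# Values missing from the dictionary are mapped to NaN.
status_updated = status.map(status_map)

# List any original values that were left unmapped.
unmapped = status[status_updated.isna()].unique().tolist()

print(status_updated.tolist())
print(unmapped)
```

In the notebook, `unique()` returning only 'Closed' and 'Open' confirms that every status was covered by the mapping.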
Utah 16 6
Vermont 2 1
Virginia 49 11
Washington 75 23
West Virginia 8 3
plt.show()
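The code that produced the state-wise table and stacked bar chart is not visible here. A minimal sketch of one possible approach, using `pd.crosstab` on hypothetical rows, follows. The toy counts are invented and do not match the real Utah or Vermont figures.

```python
import pandas as pd

# Hypothetical rows; the real frame has one row per complaint.
toy = pd.DataFrame({
    "State": ["Utah", "Utah", "Utah", "Vermont", "Vermont"],
    "Status_updated": ["Closed", "Closed", "Open", "Closed", "Open"],
})

# Rows are states, columns are Closed/Open, cells are complaint counts.
state_wise = pd.crosstab(toy["State"], toy["Status_updated"])
print(state_wise)

# With matplotlib available, the stacked chart could then be drawn with:
# state_wise.plot(kind="bar", stacked=True, figsize=(18, 8))
```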
2.3.8 Use the categorized variable from Q3. Provide insights on:
Which state has the maximum complaints
[79]: max_complaints=comcast.groupby(['State']).size().sort_values(ascending=False).to_frame().rename({0: "Complaint count"}, axis=1)[:20]
max_complaints
[80]: max_complaints.head(20).plot(kind="bar",figsize=(18,8))
plt.show()
[81]: comcast.groupby(['State']).size().sort_values(ascending=False).to_frame().rename({0: "Complaint count"}, axis=1)[:1]
Highest_complaint.sort_values('Closed',axis = 0,ascending=False)[:1]
Highest_complaint['Unresolved_cmp_prct'] = Highest_complaint['Open']/state_wise["Total"]*100
[85]: Percent=Highest_complaint.sort_values('Unresolved_cmp_prct',axis = 0,ascending=False)[:20]
Percent
New Mexico 11.0 4.0 73.333333 26.666667
Oregon 36.0 13.0 73.469388 26.530612
New Jersey 56.0 19.0 74.666667 25.333333
Missouri 3.0 1.0 75.000000 25.000000
[86]: Percent.head(20).plot(kind="bar",figsize=(18,8))
plt.show()
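The unresolved percentage is Open / (Open + Closed) * 100. The sketch below recomputes it for the four states whose rows are visible in the output above. Only those four rows are used, so the ranking here says nothing about the states not shown.

```python
import pandas as pd

# Closed and Open counts for the four states visible in the output above.
counts = pd.DataFrame(
    {"Closed": [11, 36, 56, 3], "Open": [4, 13, 19, 1]},
    index=["New Mexico", "Oregon", "New Jersey", "Missouri"],
)

# Share of each state's complaints that are still open.
total = counts["Closed"] + counts["Open"]
counts["Unresolved_cmp_prct"] = counts["Open"] / total * 100

ranked = counts.sort_values("Unresolved_cmp_prct", ascending=False)
print(ranked)
```

The computed values match the last column of the printed rows (26.67, 26.53, 25.33 and 25.00 percent).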
2.3.9 Provide the percentage of complaints resolved till date, which were received
through the Internet and customer care calls.
[89]: complaints_resolved['Tot']=complaints_resolved.Open+complaints_resolved.Closed
complaints_resolved
[89]: Status_updated Closed Open Tot
Received Via
Customer Care Call 864 255 1119
Internet 843 262 1105
complaints_resolved['resolved']
plt.pie(slices,
labels=activities,
colors=cols,
startangle=90,
shadow= True,
explode=(0,0.07),
autopct='%1.1f%%')
[92]: slices = [843,262]
activities = ['resolved','unresolved']
cols = ['r','b']
plt.pie(slices,
labels=activities,
colors=cols,
startangle=90,
shadow= True,
explode=(0,0.07),
autopct='%1.1f%%')
The charts above show the share of resolved complaints among those received via
the Internet and via customer care calls.
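The resolved percentages behind the pie charts can be recomputed from the complaints_resolved table. This sketch re-enters the counts printed above. The formula for the 'resolved' column is an assumption (Closed / Tot * 100), since the cell defining it is not visible, and `overall_resolved` is an illustrative name.

```python
import pandas as pd

# Counts copied from the complaints_resolved output above.
complaints_resolved = pd.DataFrame(
    {"Closed": [864, 843], "Open": [255, 262]},
    index=["Customer Care Call", "Internet"],
)
complaints_resolved["Tot"] = complaints_resolved["Open"] + complaints_resolved["Closed"]

# Assumed definition: percentage of each channel's complaints that are closed.
complaints_resolved["resolved"] = (
    complaints_resolved["Closed"] / complaints_resolved["Tot"] * 100
)

# Resolved percentage across both channels combined.
overall_resolved = (
    complaints_resolved["Closed"].sum() / complaints_resolved["Tot"].sum() * 100
)

print(complaints_resolved.round(2))
print(round(overall_resolved, 2))
```

Under this definition roughly 77.2% of customer care call complaints and 76.3% of Internet complaints are resolved, about 76.8% overall.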