
Data Science with R

Project 2: Comcast Telecom Consumer Complaints

Name Surname: Alhamouieh Dima


Business Scenario:

Comcast is an American global telecommunications company. The firm has been providing
terrible customer service and continues to fall short despite repeated promises to improve.
Only last month (October 2016), the authority fined the company $2.3 million after receiving
over 1000 consumer complaints.
The existing database will serve as a repository of public customer complaints filed against
Comcast.
It will help to pin down what is wrong with Comcast's customer service.
Data Dictionary
• Ticket #: Ticket number assigned to each complaint
• Customer Complaint : Description of complaint
• Date : Date of complaint
• Time : Time of complaint
• Received Via: Mode of communication of the complaint
• City : Customer city
• State : Customer state
• Zipcode : Customer zip
• Status : Status of complaint
• Filing on behalf of someone

Analysis Task

- Import data into R environment.


- Provide the trend chart for the number of complaints at monthly and daily granularity
levels.
- Provide a table with the frequency of complaint types.
- Which complaint types are maximum i.e., around internet, network issues, or across
any other domains?
- Create a new categorical variable with value as Open and Closed. Open & Pending is
to be categorized as Open and Closed & Solved is to be categorized as Closed.
- Provide state wise status of complaints in a stacked bar chart. Use the categorized
variable from Q3.

- Provide insights on :
o Which state has the maximum complaints
o Which state has the highest percentage of unresolved complaints
- Provide the percentage of complaints resolved till date, which were received through
the Internet and customer care calls.
Code

# Comcast Telecom Consumer Complaints Analysis

# Install Packages

install.packages("stringi")
install.packages("lubridate")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("ggpubr")
library("lubridate")
library("stringi")
library("dplyr")
library("ggplot2")
library("ggpubr")

# Selecting work directory

setwd("C:/Users/Dima/Desktop")
getwd()

# Importing Dataset

comcast <- read.csv("Copy of Comcast Telecom Complaints data.csv", header = TRUE, sep = ";")


View(comcast)
str(comcast)

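# Let us check if there is any missing data
# is.na() must be given the data frame itself; is.na("comcast") would only
# test the literal string "comcast" and always return FALSE

comcastna <- is.na(comcast)
sum(comcastna)
# A result of 0 means there are no missing values in the dataset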

comcast$Date <- dmy(comcast$Date)

# Let us extract the monthly and daily ticket counts

monthly_tickets <- summarise(group_by(comcast, month = as.integer(month(Date))), count = n())
daily_tickets <- summarise(group_by(comcast, Date), count = n())

# Let us remove the NA groups (rows whose date is missing or could not be parsed)

monthly_tickets <- na.omit(monthly_tickets)
daily_tickets <- na.omit(daily_tickets)
monthly_tickets <- arrange(monthly_tickets, month)

# Let us plot the monthly and daily complaints to perform a comparison

library(ggplot2)

ggplot(data = monthly_tickets, aes(month, count, label = count)) +
  geom_line() +
  geom_point(size = 0.5) +
  geom_text() +
  scale_x_continuous(breaks = monthly_tickets$month) +
  labs(title = "Monthly Ticket Count", x = "Months", y = "No. of Tickets") +
  theme(plot.title = element_text(hjust = 0.5))

ggplot(data = daily_tickets, aes(as.POSIXct(Date), count)) +
  geom_line() +
  geom_point(size = 1) +
  scale_x_datetime(date_breaks = "1 week", date_labels = "%d/%m") +
  labs(title = "Daily Ticket Count", x = "Days", y = "No. of Tickets") +
  theme(axis.text.x = element_text(angle = 75),
        plot.title = element_text(hjust = 0.5))

# Complaint Type Processing
# grepl() returns one TRUE/FALSE per complaint, so it can index rows directly
# (contains() is a column-selection helper and is not meant for searching text)

network_issues <- grepl("network", comcast$Customer.Complaint, ignore.case = TRUE)
internet_issues <- grepl("internet", comcast$Customer.Complaint, ignore.case = TRUE)
billing_issues <- grepl("billing", comcast$Customer.Complaint, ignore.case = TRUE)
charges_issues <- grepl("charge", comcast$Customer.Complaint, ignore.case = TRUE)
email_issues <- grepl("email", comcast$Customer.Complaint, ignore.case = TRUE)

# Start with "Others" so the column exists with one value per row, then
# overwrite the matches; a complaint matching several keywords keeps the last label

comcast$ComplaintType <- "Others"
comcast$ComplaintType[internet_issues] <- "Internet"
comcast$ComplaintType[network_issues] <- "Network"
comcast$ComplaintType[billing_issues] <- "Billing"
comcast$ComplaintType[charges_issues] <- "Charges"
comcast$ComplaintType[email_issues] <- "Email"

table(comcast$ComplaintType)

# Let us create a new categorical variable with value as Open and Closed

open_complaints <- (comcast$Status == "Open"| comcast$Status=="Pending")


closed_complaints <- (comcast$Status=="Closed"|comcast$Status=="Solved")
comcast$ComplaintStatus[open_complaints] <- "Open"
comcast$ComplaintStatus[closed_complaints] <- "Closed"

# Count the missing values again; rows whose Status matched none of the four
# values above have an NA ComplaintStatus and are dropped

sum(is.na(comcast))
comcast <- subset(comcast, !is.na(ComplaintStatus))

# Let us plot in a bar chart the State and Status of Tickets

# Summarise into a separate object so that comcast itself stays ungrouped

chart_data <- summarise(group_by(comcast, State, ComplaintStatus), Count = n())
ggplot(as.data.frame(chart_data), mapping = aes(State, Count)) +
  geom_col(aes(fill = ComplaintStatus), width = 0.95) +
  theme(axis.text.x = element_text(angle = 90),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        title = element_text(size = 16, colour = "#0073C2FF"),
        plot.title = element_text(hjust = 0.5)) +
  labs(title = "Ticket Status",
       x = "States", y = "No of Tickets",
       fill = "Status")

# Provide the percentage of complaints resolved till date, which were received
# through the Internet and customer care calls

resolved <- group_by(comcast, ComplaintStatus)


total_resolved <- summarise(resolved,percentage=(n()/nrow(resolved)))
resolved <- group_by(comcast,Received.Via,ComplaintStatus)

Category_resloved<- summarise(resolved ,percentage =(n()/nrow(resolved)))

# Let us plot this in a pie chart
# (par(mfrow) only affects base graphics, so the two ggplot objects are
# placed side by side with ggarrange() at the end instead)

total <- ggplot(total_resolved,
                aes(x = "", y = percentage, fill = ComplaintStatus)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  geom_text(aes(label = paste0(round(percentage * 100), "%")),
            position = position_stack(vjust = 0.5)) +
  labs(x = NULL, y = NULL, fill = NULL) +
  theme_classic() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())

# Pie Chart for Category wise Ticket Status

category <- ggplot(Category_resloved,
                   aes(x = "", y = percentage, fill = ComplaintStatus)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  geom_text(aes(label = paste0(Received.Via, "-", round(percentage * 100), "%")),
            position = position_stack(vjust = 0.5)) +
  labs(x = NULL, y = NULL, fill = NULL) +
  theme_classic() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())
ggarrange(total, category, nrow = 1, ncol = 2)

Analysis with screenshots

Data was loaded into the R environment and no missing data was identified as per the
screenshot below:
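
A quick way to double-check the missing-data result is to count the NA cells per column. The following is only a minimal sketch on a toy data frame (the values and the name `toy` are hypothetical, not the real dataset). Note that `read.csv()` reads blank text fields as empty strings rather than NA by default, so a count of 0 does not rule out empty cells.

```r
# Toy stand-in for the complaints data (hypothetical values, not the real dataset)
toy <- data.frame(
  Ticket = c("1", "2", "3"),
  City   = c("Atlanta", NA, "Miami"),
  Status = c("Open", "Closed", NA),
  stringsAsFactors = FALSE
)

# is.na() is applied to the data frame itself, giving one TRUE/FALSE per cell
total_missing  <- sum(is.na(toy))       # total number of NA cells
missing_by_col <- colSums(is.na(toy))   # NA count for each column

print(total_missing)
print(missing_by_col)
```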

Provide the trend chart for the number of complaints at monthly and daily granularity
levels.

As shown in the table below, the daily and monthly ticket counts were extracted, and two graphs
were plotted to compare the monthly and daily trends.

As the first graph below shows, the number of tickets starts to increase in April and May
and then rises drastically in June. We can therefore assume that there is a significant
reason behind this turning point.

The second graph shows that the number of tickets starts to increase drastically during
the second half of June.
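
The busiest month and day read off the graphs can also be located directly from the counts. This is a minimal sketch using base R only; the vector `dates` is hypothetical toy data, not the real complaint dates.

```r
# Hypothetical complaint dates (toy data, not the real dataset)
dates <- as.Date(c("2015-04-10", "2015-05-02", "2015-06-24",
                   "2015-06-24", "2015-06-25", "2015-06-24"))

# Count complaints per calendar month and per day
monthly <- table(format(dates, "%Y-%m"))
daily   <- table(dates)

# Labels of the busiest month and the busiest day
peak_month <- names(monthly)[which.max(monthly)]
peak_day   <- names(daily)[which.max(daily)]

print(peak_month)
print(peak_day)
```
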
Let us dive into more detail to check which category of complaints the company receives
most often.
Provide a table with the frequency of complaint types.

As shown in the table below, most of the complaints are related to Internet issues. Many
other kinds of complaints were grouped under the “Others” category.
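
To see at a glance which complaint type is the maximum, the frequency table can be sorted and converted into percentage shares. The sketch below assumes a small hypothetical vector of complaint texts and only three keywords; it illustrates the idea and is not the full classification used in the project.

```r
# Hypothetical complaint descriptions (toy data, not the real dataset)
complaints <- c("Comcast internet speed slow", "Billing error",
                "Unfair charges", "No internet for a week", "Poor service")

# Keyword-based labelling: everything starts as "Others"
type <- rep("Others", length(complaints))
type[grepl("internet", complaints, ignore.case = TRUE)] <- "Internet"
type[grepl("billing",  complaints, ignore.case = TRUE)] <- "Billing"
type[grepl("charge",   complaints, ignore.case = TRUE)] <- "Charges"

# Frequency table sorted from most to least common, plus percentage shares
freq  <- sort(table(type), decreasing = TRUE)
share <- round(100 * prop.table(freq), 1)

print(freq)
print(share)
```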

Create a stacked bar chart for complaints based on state and status

As we can observe in the chart, the states with the highest number of tickets are
Georgia and Florida.
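
The two state-level insights can also be computed rather than read off the chart. The following is a minimal sketch on hypothetical toy vectors (`state` and `status` are made-up values, not the real dataset). It also illustrates that the state with the most complaints is not necessarily the one with the highest percentage of unresolved complaints, since a state with very few tickets can have a high open rate.

```r
# Hypothetical state and status values (toy data, not the real dataset)
state  <- c("Georgia", "Georgia", "Georgia", "Florida", "Florida", "Kansas")
status <- c("Open", "Closed", "Closed", "Closed", "Closed", "Open")

# Cross-tabulate complaints by state and status
counts <- table(state, status)

# State with the maximum number of complaints
total_by_state  <- rowSums(counts)
most_complaints <- names(which.max(total_by_state))

# State with the highest percentage of unresolved (Open) complaints
pct_open           <- 100 * counts[, "Open"] / total_by_state
highest_unresolved <- names(which.max(pct_open))

print(most_complaints)
print(highest_unresolved)
```
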
As depicted in the pie charts below, 77% of all complaints are resolved: 38% were received
via the Internet and 39% via customer care calls. The remaining 23% of complaints are still
unresolved: 12% were received via the Internet and 11% via customer care calls.
Conclusion:

As per the above analysis, Comcast received a high number of complaints in the second half
of June. Most of the complaints are related to Internet service issues, and the highest
number of complaints came from the state of Georgia. The highest number of unresolved
complaints also came from Georgia. In total, 77% of all complaints are resolved: 38% were
received via the Internet and 39% via customer care calls.
