
CUSTOMER ANALYSIS PROJECT

PROJECT DESCRIPTION

IN THIS PROJECT, THE GIVEN DATASET DESCRIBES THE COMPLAINTS FILED BY THE POLICE
DEPARTMENT IN THE U.S. AND THE TIME TAKEN TO RESPOND TO AND CLOSE EACH COMPLAINT.
BEFORE PERFORMING THE DETAILED ANALYSIS, THE DATA IS CLEANED BY REMOVING NULL
VALUES AND UNUSED COLUMNS. ONLY CLOSED COMPLAINTS ARE CONSIDERED IN THE
STATISTICAL ANALYSIS, SO PENDING COMPLAINTS ARE REMOVED FROM THE DATASET. TO
CHECK WHETHER THE MEAN RESOLUTION TIME DIFFERS BETWEEN COMPLAINT TYPES, A
ONE-WAY ANOVA TEST IS CONDUCTED BETWEEN THE COMPLAINTS.

APPROACH:

THE PYTHON PROGRAMMING CODE IS USED TO PERFORM THE ANALYSIS.

INITIALLY, THE REQUIRED LIBRARIES ARE IMPORTED.

Dataset Name and File Name

311_Service_Requests_from_2010_to_Present.csv

customer analysis.ipynb

IMPORTING THE DATASET

cust_dat=pd.read_csv('311_Service_Requests_from_2010_to_Present.csv', header=0, sep=',',
                     parse_dates=['Created Date', 'Closed Date', 'Resolution Action Updated Date'],
                     index_col='Unique Key')

CHECKING FOR NULL VALUES

null_values=cust_dat.isna().sum()

FREQUENCY PLOT SHOWING NULL VALUES

plt.figure(figsize=(30,10))

x=cust_dat.columns

y=null_values

plt.xticks(rotation=70)
plt.bar(x,y)

plt.show()

REMOVING THE RECORDS OF UNCLOSED COMPLAINTS

cust_dat.dropna(subset=['Closed Date'],inplace=True)

cust_dat.dropna(subset=['Resolution Action Updated Date'],inplace=True)

THE COMPLAINT DURATION IS DIVIDED INTO TWO PARTS: RESPONSE TIME AND CLOSING TIME.

THE RESPONSE TIME IS THE TIME BETWEEN THE CREATED TIME AND THE RESOLUTION ACTION
UPDATED TIME.

THE CLOSING TIME IS THE TIME BETWEEN THE RESOLUTION ACTION UPDATED TIME AND THE
CLOSED TIME.

LATER, THE AVERAGE RESPONSE TIME AND THE AVERAGE CLOSING TIME ARE CALCULATED FOR
THE STATISTICAL ANALYSIS.
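THE TWO DURATIONS DEFINED ABOVE CAN BE ILLUSTRATED ON A SMALL EXAMPLE. THIS IS A MINIMAL SKETCH: THE DATAFRAME toy AND ITS THREE RECORDS ARE HYPOTHETICAL, AND ONLY THE COLUMN NAMES ARE TAKEN FROM THE DATASET.

```python
import pandas as pd

# Hypothetical toy records; the third complaint is still open (no closing dates).
toy = pd.DataFrame({
    'Created Date': pd.to_datetime(
        ['2015-12-31 22:00', '2015-12-31 23:00', '2015-12-31 23:30']),
    'Resolution Action Updated Date': pd.to_datetime(
        ['2015-12-31 23:00', '2016-01-01 01:00', None]),
    'Closed Date': pd.to_datetime(
        ['2015-12-31 23:30', '2016-01-01 01:00', None]),
})

# Keep only complaints that have been closed.
closed = toy.dropna(subset=['Closed Date', 'Resolution Action Updated Date']).copy()

# Response time: created -> resolution action updated (in seconds).
closed['RESPONSE_TIME'] = (closed['Resolution Action Updated Date']
                           - closed['Created Date']).dt.total_seconds()

# Closing time: resolution action updated -> closed (in seconds).
closed['CLOSING_TIME'] = (closed['Closed Date']
                          - closed['Resolution Action Updated Date']).dt.total_seconds()

avg_response = closed['RESPONSE_TIME'].mean()
avg_closing = closed['CLOSING_TIME'].mean()
print(avg_response, avg_closing)
```

A CLOSING TIME OF ZERO SIMPLY MEANS THE COMPLAINT WAS CLOSED AT THE SAME MOMENT THE RESOLUTION ACTION WAS UPDATED.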

cust_dat['City']=cust_dat['City'].fillna('UNKNOWN CITY')

THE CITY COLUMN HAS NULL VALUES, WHICH ARE IMPUTED WITH 'UNKNOWN CITY' USING
THE FILLNA FUNCTION.

BAR GRAPH TO SHOW THE FREQUENCY OF THE COMPLAINTS:

cust_dat['Complaint Type'].value_counts().plot(kind='bar',figsize=(10,5))

A NEW DATAFRAME IS CREATED USING CROSSTAB, WITH CITY AS COLUMNS AND
COMPLAINT TYPES AS ROWS

new_df=pd.crosstab(index=cust_dat['Complaint Type'],columns=cust_dat['City'])

TO SHOW THE DIFFERENT COMPLAINT TYPES IN DIFFERENT COLORS, WE PLOT A STACKED BAR CHART OF THE CROSSTAB, THIS TIME WITH CITY AS ROWS

new_df1=pd.crosstab(index=cust_dat['City'],columns=cust_dat['Complaint Type'])

new_df1.plot(kind='bar',figsize=(30,10),stacked=True,colormap="Paired")

plt.show()

NEW YORK AND BROOKLYN HAVE A LARGE NUMBER OF COMPLAINTS.

NOISE - STREET/SIDEWALK AND BLOCKED DRIVEWAY ARE THE MAJOR COMPLAINTS IN MOST OF THE CITIES.

PIE GRAPH TO SHOW THE BOROUGH:

cust_dat['Borough'].nunique()

THE ABOVE CODE DISPLAYS THE NUMBER OF UNIQUE VALUES IN BOROUGH, WHICH HELPS IN
MAKING THE PIE GRAPH

plt.figure(figsize=(20,10))

explode=(0.15,0.05,0,0,0)

# slice labels default to the borough names in the value_counts() index
cust_dat['Borough'].value_counts().head(5).plot(kind='pie',explode=explode,autopct='%1.1f%%',startangle=70)

plt.axis('equal')

AVERAGE RESPONSE TIME AND AVERAGE CLOSING TIME

cust_dat['RESPONSE_TIME'] = (cust_dat['Resolution Action Updated Date'] - cust_dat['Created Date']).dt.total_seconds()

cust_dat['CLOSING_TIME'] = (cust_dat['Closed Date'] - cust_dat['Resolution Action Updated Date']).dt.total_seconds()

THE AVERAGE RESPONSE TIME IS CALCULATED AS THE MEAN OF THE RESPONSE TIME.

THE AVERAGE CLOSING TIME IS CALCULATED AS THE MEAN OF THE CLOSING TIME, ALSO CALLED THE
RESOLUTION TIME.

A HISTOGRAM IS PLOTTED TO CHECK THE FREQUENCY DISTRIBUTION OF THE RESPONSE TIME

plt.figure(figsize=(10,6))

sns.histplot(cust_dat['RESPONSE_TIME'],kde=False)

plt.title('RESPONSE TIME')

plt.show()

A HISTOGRAM IS PLOTTED TO CHECK THE FREQUENCY DISTRIBUTION OF THE CLOSING TIME, OR
RESOLUTION TIME

plt.figure(figsize=(10,6))

sns.histplot(cust_dat['CLOSING_TIME'],kde=False)

plt.title("CLOSING TIME")

plt.show()
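
THE SIGNIFICANT VARIABLES ASSOCIATED WITH THE RESOLUTION TIME:

# CLOSING_TIME is the resolution-time column computed above; numeric_only=True skips text columns
COLS=cust_dat.corr(numeric_only=True).nlargest(10,'CLOSING_TIME')['CLOSING_TIME'].index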

WE DRAW THE HEAT MAP TO SEE THE CORRELATION BETWEEN THE VARIABLES

plt.figure(figsize=(10,6))

sns.heatmap(cust_dat[COLS].corr(),annot=True)

plt.show()

THERE ARE SEVEN SIGNIFICANT VARIABLES ASSOCIATED WITH THE RESOLUTION TIME.

TO COMPARE THE MEAN RESOLUTION TIME BETWEEN COMPLAINT TYPES, WE GROUP THE DATA BY
COMPLAINT TYPE AND COMPUTE THE AVERAGE RESOLUTION TIME OF EACH GROUP.

AVG_RESOLUTION_TIME = cust_dat.groupby('Complaint Type').CLOSING_TIME.mean()
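A SMALL SKETCH OF THE SAME GROUPING, SORTED SO THAT THE COMPLAINT TYPES WITH THE LARGEST AVERAGES STAND OUT. THE DATAFRAME toy AND ITS VALUES ARE HYPOTHETICAL, AND THE CONVERSION FROM SECONDS TO HOURS IS ADDED HERE ONLY FOR READABILITY.

```python
import pandas as pd

# Hypothetical toy data; closing times are in seconds, as in the analysis.
toy = pd.DataFrame({
    'Complaint Type': ['Blocked Driveway', 'Blocked Driveway',
                       'Derelict Vehicle', 'Derelict Vehicle',
                       'Posting Advertisement'],
    'CLOSING_TIME': [3600.0, 7200.0, 18000.0, 36000.0, 1800.0],
})

# Mean closing time per complaint type, slowest first, converted to hours.
avg_hours = (toy.groupby('Complaint Type')['CLOSING_TIME']
                .mean()
                .sort_values(ascending=False) / 3600)
print(avg_hours)
```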

ONE-WAY-ANOVA

ONE WAY ANALYSIS OF VARIANCE IS CONDUCTED BETWEEN THE COMPLAINTS WHICH SHOW
MAJOR VARIATIONS IN THE AVG_RESOLUTION TIME

FOUR MAJOR COMPLAINTS :

DERELICT VEHICLE

AGENCY ISSUES

NOISE - STREET/SIDEWALK

POSTING ADVERTISEMENT

1. fvalue, pvalue = stats.f_oneway(PA,DV)

pvalue

2. fvalue, pvalue = stats.f_oneway(AI,NSS)

pvalue

3. fvalue, pvalue = stats.f_oneway(PA,AI)

pvalue

4. fvalue, pvalue = stats.f_oneway(PA,NSS)

pvalue

5. fvalue, pvalue = stats.f_oneway(PA,NSS)

pvalue
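THE GROUP VARIABLES PA, DV, AI AND NSS ARE NOT DEFINED IN THIS WRITE-UP. A MINIMAL SKETCH OF HOW SUCH GROUPS COULD BE BUILT AND TESTED IS GIVEN HERE, ASSUMING EACH ONE HOLDS THE CLOSING_TIME VALUES OF A SINGLE COMPLAINT TYPE. THE HELPER closing_times AND THE DATAFRAME toy ARE HYPOTHETICAL.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data standing in for cust_dat.
toy = pd.DataFrame({
    'Complaint Type': ['Posting Advertisement'] * 4 + ['Derelict Vehicle'] * 4,
    'CLOSING_TIME': [1000.0, 1200.0, 1100.0, 900.0,
                     20000.0, 26000.0, 23000.0, 21000.0],
})

def closing_times(df, complaint_type):
    """Return the CLOSING_TIME values for one complaint type (hypothetical helper)."""
    return df.loc[df['Complaint Type'] == complaint_type, 'CLOSING_TIME']

PA = closing_times(toy, 'Posting Advertisement')
DV = closing_times(toy, 'Derelict Vehicle')

# One-way ANOVA: H0 is that both groups have the same mean closing time.
fvalue, pvalue = stats.f_oneway(PA, DV)
print(f"F = {fvalue:.2f}, p = {pvalue:.4g}")

if pvalue < 0.05:
    print("Reject H0: the mean closing times differ")
else:
    print("Fail to reject H0: no evidence that the means differ")
```

stats.f_oneway ALSO ACCEPTS MORE THAN TWO GROUPS, SO ALL FOUR COMPLAINT TYPES COULD BE COMPARED IN A SINGLE CALL INSTEAD OF PAIR BY PAIR.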

ANOVA TABLE:

# column name without spaces, so it can be used directly in the model formula
cust_dat['Complaint_Type']=cust_dat['Complaint Type']

TEST_RESULT = cust_dat.loc[:, ['Complaint_Type','CLOSING_TIME']]

# Ordinary Least Squares (OLS) model; ols is statsmodels.formula.api.ols and sm is statsmodels.api

model = ols('CLOSING_TIME ~ Complaint_Type', data=TEST_RESULT).fit()

anova_table = sm.stats.anova_lm(model, typ=2)

anova_table

FINALLY, THE NULL HYPOTHESIS IS NOT REJECTED, SINCE P > 0.05.

THE APPROXIMATE P-VALUE IS 0.988.

CHI-SQUARE TEST

THE FOLLOWING OUTPUT IS OBTAINED FROM THE CHI-SQUARE TEST

dof=1166
[[4.48019550e-03 1.27298646e-01 1.43773546e-02 ... 4.92006924e-02
7.10110987e-02 2.38264942e-03]
[5.70328887e+00 1.62051176e+02 1.83023725e+01 ... 6.26324814e+01
9.03971286e+01 3.03311272e+00]
[7.46699250e-04 2.12164410e-02 2.39622577e-03 ... 8.20011540e-03
1.18351831e-02 3.97108237e-04]
...
[3.30265078e+00 9.38403184e+01 1.05985066e+01 ... 3.62691104e+01
5.23470149e+01 1.75640973e+00]
[4.38312460e-01 1.24540508e+01 1.40658453e+00 ... 4.81346774e+00
6.94725249e+00 2.33102535e-01]
[2.80534908e+00 7.97101687e+01 9.00262024e+00 ... 3.08078336e+01
4.44647829e+01 1.49193565e+00]]
probability=0.950, critical=1246.552, stat=122561.494
Dependent (reject H0)
significance=0.050, p=0.000
Dependent (reject H0)

RESULT

SINCE P IS BELOW THE 0.05 SIGNIFICANCE LEVEL, THE NULL HYPOTHESIS IS REJECTED.
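THE CODE THAT PRODUCED THE CHI-SQUARE OUTPUT IS NOT SHOWN. A MINIMAL SKETCH THAT PRINTS RESULTS IN THE SAME LAYOUT IS GIVEN HERE. WHICH TWO COLUMNS WERE TESTED IS AN ASSUMPTION (COMPLAINT TYPE AGAINST CITY), AND THE DATAFRAME toy IS HYPOTHETICAL.

```python
import pandas as pd
from scipy.stats import chi2, chi2_contingency

# Hypothetical toy data; the choice of columns is an assumption.
toy = pd.DataFrame({
    'Complaint Type': ['Blocked Driveway'] * 30 + ['Illegal Parking'] * 30,
    'City': (['BROOKLYN'] * 25 + ['NEW YORK'] * 5
             + ['BROOKLYN'] * 5 + ['NEW YORK'] * 25),
})

# Contingency table of observed counts, then the chi-square test of independence.
table = pd.crosstab(toy['Complaint Type'], toy['City'])
stat, p, dof, expected = chi2_contingency(table)
print('dof=%d' % dof)
print(expected)

# Decision 1: compare the test statistic with the critical value.
prob = 0.95
critical = chi2.ppf(prob, dof)
print('probability=%.3f, critical=%.3f, stat=%.3f' % (prob, critical, stat))
print('Dependent (reject H0)' if abs(stat) >= critical
      else 'Independent (fail to reject H0)')

# Decision 2: compare the p-value with the significance level.
alpha = 1.0 - prob
print('significance=%.3f, p=%.3f' % (alpha, p))
print('Dependent (reject H0)' if p <= alpha
      else 'Independent (fail to reject H0)')
```

BOTH DECISIONS ARE EQUIVALENT: A STATISTIC ABOVE THE CRITICAL VALUE CORRESPONDS TO A P-VALUE BELOW THE SIGNIFICANCE LEVEL.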
