
Import libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sn
from pandas.plotting import scatter_matrix
from matplotlib import pyplot as plt
from sklearn.svm import SVC

Data Science Course Project

Tasks to Perform
1. Understand the dataset:
1.1 Import the dataset

1.2 Visualize the dataset

1.3 Print the columns of the DataFrame

1.4 Identify the shape of the dataset

1.5 Identify the variables with null values

1.1 Import the dataset


In [6]:
CS_Dataset=pd.read_csv("311_Service_Requests_from_2010_to_Present.csv", encoding='latin2')

C:\Users\hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3165: DtypeWarning: Columns (48,49) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
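
The DtypeWarning above means pandas inferred different types for different chunks of columns 48 and 49. A minimal sketch of two common ways to avoid it, using a small in-memory CSV as a stand-in for the real file (the sample values are illustrative assumptions, not taken from the dataset):

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV; "Ferry Direction" is mostly empty with one text value.
csv_text = "Unique Key,Ferry Direction\n1,\n2,Manhattan Bound\n3,\n"

# Option 1: read the file in a single pass so each column gets one inferred dtype.
df_single_pass = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: declare the dtype of the problem column explicitly.
df_explicit = pd.read_csv(io.StringIO(csv_text), dtype={"Ferry Direction": "string"})

print(df_explicit.dtypes)
```

Option 2 is usually the more predictable choice when the problem columns are known, because it does not depend on what values happen to appear in the file.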

1.2 Visualize the dataset


In [7]:
# Display all records
CS_Dataset

Out[7]:
        Unique Key  Created Date            Closed Date             Agency  Agency Name                      Complaint Type           Descriptor                     Location Type    Incident Zip  Incident Address          ...
0       32310363    12/31/2015 11:59:45 PM  01/01/2016 12:55:15 AM  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10034.0       71 VERMILYEA AVENUE       ...
1       32309934    12/31/2015 11:59:44 PM  01/01/2016 01:26:57 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11105.0       27-07 23 AVENUE           ...
2       32309159    12/31/2015 11:59:29 PM  01/01/2016 04:51:03 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10458.0       2897 VALENTINE AVENUE     ...
3       32305098    12/31/2015 11:57:46 PM  01/01/2016 07:43:13 AM  NYPD    New York City Police Department  Illegal Parking          Commercial Overnight Parking   Street/Sidewalk  10461.0       2940 BAISLEY AVENUE       ...
4       32306529    12/31/2015 11:56:58 PM  01/01/2016 03:24:42 AM  NYPD    New York City Police Department  Illegal Parking          Blocked Sidewalk               Street/Sidewalk  11373.0       87-14 57 ROAD             ...
...     ...         ...                     ...                     ...     ...                              ...                      ...                            ...              ...           ...                       ...
364553  29609918    01/01/2015 12:04:44 AM  01/01/2015 10:22:31 AM  NYPD    New York City Police Department  Illegal Parking          Blocked Hydrant                Street/Sidewalk  11421.0       84-25 85 ROAD             ...
364554  29608392    01/01/2015 12:04:28 AM  01/01/2015 02:25:02 AM  NYPD    New York City Police Department  Noise - Vehicle          Car/Truck Horn                 Street/Sidewalk  10468.0       2555 SEDGWICK AVENUE      ...
364555  29607589    01/01/2015 12:01:30 AM  01/01/2015 12:20:33 AM  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10031.0       508 WEST 139 STREET       ...
364556  29610889    01/01/2015 12:01:29 AM  01/01/2015 02:42:22 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10466.0       931 EAST 226 STREET       ...
364557  29611816    01/01/2015 12:00:50 AM  01/01/2015 02:47:50 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11420.0       123-19 135 STREET         ...

(Columns shown after the ellipsis: Bridge Highway Name, Bridge Highway Direction, Road Ramp, Bridge Highway Segment; all NaN in the rows displayed.)

364558 rows × 53 columns


In [8]:
# Display the first 10 records
CS_Dataset.head(10)

Out[8]:
        Unique Key  Created Date            Closed Date             Agency  Agency Name                      Complaint Type           Descriptor                     Location Type    Incident Zip  Incident Address          ...
0       32310363    12/31/2015 11:59:45 PM  01/01/2016 12:55:15 AM  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10034.0       71 VERMILYEA AVENUE       ...
1       32309934    12/31/2015 11:59:44 PM  01/01/2016 01:26:57 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11105.0       27-07 23 AVENUE           ...
2       32309159    12/31/2015 11:59:29 PM  01/01/2016 04:51:03 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10458.0       2897 VALENTINE AVENUE     ...
3       32305098    12/31/2015 11:57:46 PM  01/01/2016 07:43:13 AM  NYPD    New York City Police Department  Illegal Parking          Commercial Overnight Parking   Street/Sidewalk  10461.0       2940 BAISLEY AVENUE       ...
4       32306529    12/31/2015 11:56:58 PM  01/01/2016 03:24:42 AM  NYPD    New York City Police Department  Illegal Parking          Blocked Sidewalk               Street/Sidewalk  11373.0       87-14 57 ROAD             ...
5       32306554    12/31/2015 11:56:30 PM  01/01/2016 01:50:11 AM  NYPD    New York City Police Department  Illegal Parking          Posted Parking Sign Violation  Street/Sidewalk  11215.0       260 21 STREET             ...
6       32306559    12/31/2015 11:55:32 PM  01/01/2016 01:53:54 AM  NYPD    New York City Police Department  Illegal Parking          Blocked Hydrant                Street/Sidewalk  10032.0       524 WEST 169 STREET       ...
7       32307009    12/31/2015 11:54:05 PM  01/01/2016 01:42:54 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10457.0       501 EAST 171 STREET       ...
8       32308581    12/31/2015 11:53:58 PM  01/01/2016 08:27:32 AM  NYPD    New York City Police Department  Illegal Parking          Posted Parking Sign Violation  Street/Sidewalk  11415.0       83-44 LEFFERTS BOULEVARD  ...
9       32308391    12/31/2015 11:53:58 PM  01/01/2016 01:17:40 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11219.0       1408 66 STREET            ...

(Columns shown after the ellipsis: Bridge Highway Name, Bridge Highway Direction, Road Ramp, Bridge Highway Segment, Garage Lot Name; all NaN in the rows displayed.)

10 rows × 53 columns

In [9]:
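print("The Customer Request service dataset Information ")
print("==========================================================")
# info() prints its report directly and returns None, so print() also emits the "None" line seen after the report
print(CS_Dataset.info(10))
print(CS_Dataset.describe())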

The Customer Request service dataset Information


==========================================================
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 364558 entries, 0 to 364557
Data columns (total 53 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 364558 non-null int64
1 Created Date 364558 non-null object
2 Closed Date 362177 non-null object
3 Agency 364558 non-null object
4 Agency Name 364558 non-null object
5 Complaint Type 364558 non-null object
6 Descriptor 358057 non-null object
7 Location Type 364425 non-null object
8 Incident Zip 361560 non-null float64
9 Incident Address 312859 non-null object
10 Street Name 312859 non-null object
11 Cross Street 1 307370 non-null object
12 Cross Street 2 306753 non-null object
13 Intersection Street 1 51120 non-null object
14 Intersection Street 2 50512 non-null object
15 Address Type 361306 non-null object
16 City 361561 non-null object
17 Landmark 375 non-null object
18 Facility Type 362169 non-null object
19 Status 364558 non-null object
20 Due Date 364555 non-null object
21 Resolution Description 364558 non-null object
22 Resolution Action Updated Date 362156 non-null object
23 Community Board 364558 non-null object
24 Borough 364558 non-null object
25 X Coordinate (State Plane) 360528 non-null float64
26 Y Coordinate (State Plane) 360528 non-null float64
27 Park Facility Name 364558 non-null object
28 Park Borough 364558 non-null object
29 School Name 364558 non-null object
30 School Number 364558 non-null object
31 School Region 364557 non-null object
32 School Code 364557 non-null object
33 School Phone Number 364558 non-null object
34 School Address 364558 non-null object
35 School City 364558 non-null object
36 School State 364558 non-null object
37 School Zip 364557 non-null object
38 School Not Found 364558 non-null object
39 School or Citywide Complaint 0 non-null float64
40 Vehicle Type 0 non-null float64
41 Taxi Company Borough 0 non-null float64
42 Taxi Pick Up Location 0 non-null float64
43 Bridge Highway Name 297 non-null object
44 Bridge Highway Direction 297 non-null object
45 Road Ramp 262 non-null object
46 Bridge Highway Segment 262 non-null object
47 Garage Lot Name 0 non-null float64
48 Ferry Direction 1 non-null object
49 Ferry Terminal Name 2 non-null object
50 Latitude 360528 non-null float64
51 Longitude 360528 non-null float64
52 Location 360528 non-null object
dtypes: float64(10), int64(1), object(42)
memory usage: 147.4+ MB
None
Unique Key Incident Zip X Coordinate (State Plane) \
count 3.645580e+05 361560.000000 3.605280e+05
mean 3.106595e+07 10858.496659 1.005043e+06
std 7.331531e+05 578.263114 2.196362e+04
min 2.960737e+07 83.000000 9.133570e+05
25% 3.049938e+07 10314.000000 9.919460e+05
50% 3.108795e+07 11209.000000 1.003470e+06
75% 3.167433e+07 11238.000000 1.019134e+06
max 3.231065e+07 11697.000000 1.067186e+06

Y Coordinate (State Plane) School or Citywide Complaint Vehicle Type \


count 360528.000000 0.0 0.0
mean 203425.305782 NaN NaN
std 29842.192857 NaN NaN
min 121185.000000 NaN NaN
25% 182945.000000 NaN NaN
50% 201023.000000 NaN NaN
75% 222790.000000 NaN NaN
max 271876.000000 NaN NaN

Taxi Company Borough Taxi Pick Up Location Garage Lot Name \


count 0.0 0.0 0.0
mean NaN NaN NaN
std NaN NaN NaN
min NaN NaN NaN
25% NaN NaN NaN
50% NaN NaN NaN
75% NaN NaN NaN
max NaN NaN NaN

Latitude Longitude
count 360528.000000 360528.000000
mean 40.724980 -73.924946
std 0.081907 0.079213
min 40.499040 -74.254937
25% 40.668742 -73.972253
50% 40.718406 -73.930643
75% 40.778166 -73.874098
max 40.912869 -73.700715

In [10]:
CS_Dataset.tail(10)

Out[10]:
        Unique Key  Created Date            Closed Date             Agency  Agency Name                      Complaint Type           Descriptor                     Location Type    Incident Zip  Incident Address          ...
364548  29613386    01/01/2015 12:08:34 AM  01/01/2015 02:42:23 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10467.0       800 EAST 219 STREET       ...
364549  29610965    01/01/2015 12:08:02 AM  01/01/2015 01:17:43 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11368.0       NaN                       ...
364550  29610950    01/01/2015 12:06:43 AM  01/01/2015 06:05:18 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10473.0       616 COMMONWEALTH AVENUE   ...
364551  29607567    01/01/2015 12:06:02 AM  01/01/2015 12:43:41 AM  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10453.0       NaN                       ...
364552  29610051    01/01/2015 12:05:05 AM  01/01/2015 01:22:10 AM  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10002.0       NaN                       ...
364553  29609918    01/01/2015 12:04:44 AM  01/01/2015 10:22:31 AM  NYPD    New York City Police Department  Illegal Parking          Blocked Hydrant                Street/Sidewalk  11421.0       84-25 85 ROAD             ...
364554  29608392    01/01/2015 12:04:28 AM  01/01/2015 02:25:02 AM  NYPD    New York City Police Department  Noise - Vehicle          Car/Truck Horn                 Street/Sidewalk  10468.0       2555 SEDGWICK AVENUE      ...
364555  29607589    01/01/2015 12:01:30 AM  01/01/2015 12:20:33 AM  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10031.0       508 WEST 139 STREET       ...
364556  29610889    01/01/2015 12:01:29 AM  01/01/2015 02:42:22 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10466.0       931 EAST 226 STREET       ...
364557  29611816    01/01/2015 12:00:50 AM  01/01/2015 02:47:50 AM  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11420.0       123-19 135 STREET         ...

(Columns shown after the ellipsis belong to the Bridge Highway and Road Ramp group; all NaN in the rows displayed.)

10 rows × 53 columns

1.3 Print the columns of the DataFrame


In [11]:
CS_Dataset.columns

Out[11]: Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location'],
dtype='object')

1.4 Identify the shape of the dataset


In [12]:
CS_Dataset.shape

Out[12]: (364558, 53)

1.5 Identify the variables with null values


In [13]:
# Count the null values in each column (isnull() is an equivalent alias of isna())
CS_Dataset.isna().sum()

Out[13]: Unique Key 0


Created Date 0
Closed Date 2381
Agency 0
Agency Name 0
Complaint Type 0
Descriptor 6501
Location Type 133
Incident Zip 2998
Incident Address 51699
Street Name 51699
Cross Street 1 57188
Cross Street 2 57805
Intersection Street 1 313438
Intersection Street 2 314046
Address Type 3252
City 2997
Landmark 364183
Facility Type 2389
Status 0
Due Date 3
Resolution Description 0
Resolution Action Updated Date 2402
Community Board 0
Borough 0
X Coordinate (State Plane) 4030
Y Coordinate (State Plane) 4030
Park Facility Name 0
Park Borough 0
School Name 0
School Number 0
School Region 1
School Code 1
School Phone Number 0
School Address 0
School City 0
School State 0
School Zip 1
School Not Found 0
School or Citywide Complaint 364558
Vehicle Type 364558
Taxi Company Borough 364558
Taxi Pick Up Location 364558
Bridge Highway Name 364261
Bridge Highway Direction 364261
Road Ramp 364296
Bridge Highway Segment 364296
Garage Lot Name 364558
Ferry Direction 364557
Ferry Terminal Name 364556
Latitude 4030
Longitude 4030
Location 4030
dtype: int64
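
Raw counts are easier to judge next to the share of rows they represent. A minimal sketch on a toy frame (the values are made up for illustration; `null_summary` is a hypothetical name, not part of the notebook):

```python
import pandas as pd

# Toy frame standing in for CS_Dataset; values are illustrative only.
toy = pd.DataFrame({
    "Unique Key": [1, 2, 3, 4],
    "Closed Date": ["01/01/2016", None, "01/02/2016", "01/03/2016"],
    "Landmark": [None, None, None, "PARK"],
})

null_counts = toy.isna().sum()
null_summary = pd.DataFrame({
    "null_count": null_counts,
    "null_pct": (null_counts / len(toy) * 100).round(1),
})

# Keep only the columns that contain at least one null, largest first.
null_summary = null_summary[null_summary["null_count"] > 0].sort_values(
    "null_count", ascending=False
)
print(null_summary)
```

Applied to the real data, the same summary would make it obvious that columns such as Vehicle Type or Garage Lot Name are entirely empty.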

2. Perform basic data exploratory analysis:


2.1 Draw a frequency plot to show the number of null values in each column of the DataFrame

2.2 Missing value treatment


2.2.1 Remove the records whose Closed Date values are null
2.1 Draw a frequency plot to show the number of null values in each column of the DataFrame
In [15]:
CS_Dataset.isnull().sum().plot.bar()
plt.show()
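
With 53 columns on the x-axis the default bar chart is hard to read. One possible refinement, sketched here with a handful of counts copied from the Out[13] listing, is to plot only the columns that actually contain nulls, sorted by count, with labelled axes (the figure size and titles are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few null counts copied from the Out[13] listing.
null_counts = pd.Series({
    "Closed Date": 2381,
    "Descriptor": 6501,
    "Incident Address": 51699,
    "Landmark": 364183,
    "Unique Key": 0,
})

# Drop columns without nulls and sort so the tallest bars come first.
to_plot = null_counts[null_counts > 0].sort_values(ascending=False)

ax = to_plot.plot.bar(figsize=(8, 4))
ax.set_xlabel("Column")
ax.set_ylabel("Number of null values")
ax.set_title("Null values per column")
plt.tight_layout()
```

In the notebook itself, `null_counts` would simply be `CS_Dataset.isnull().sum()`, followed by `plt.show()`.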

2.2 Missing value treatment


2.2.1 Remove the records whose Closed Date values are null

In [23]:
# Drop rows where Closed Date is null
CS_Dataset=CS_Dataset.dropna(subset=['Closed Date'])
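
The effect of `dropna(subset=...)` is easy to verify by comparing row counts: in the notebook, 364558 - 362177 = 2381 rows are removed, which matches the Closed Date null count reported in 1.5. A small sketch of the same check on toy data (values are illustrative; resetting the index is an optional extra step, not something the notebook does):

```python
import pandas as pd

toy = pd.DataFrame({
    "Unique Key": [10, 11, 12],
    "Closed Date": ["01/01/2016 12:55:15 AM", None, "01/01/2016 01:26:57 AM"],
})

rows_before = len(toy)

# Drop only the rows where Closed Date is missing, then renumber the index.
closed_only = toy.dropna(subset=["Closed Date"]).reset_index(drop=True)

print(f"dropped {rows_before - len(closed_only)} open request(s)")
```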

In [18]:
CS_Dataset.to_csv('Customer _Service_Requests.csv',index=False)

In [18]:
CS_Dataset=pd.read_csv("Customer _Service_Requests.csv",encoding='latin2')

C:\Users\hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3165: DtypeWarning: Columns (17,43,44,45,46) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,

In [24]:
CS_Dataset.shape

Out[24]: (362177, 55)

In [26]:
CS_Dataset.columns

Out[26]: Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location',
'Time Elapsed1', 'Time Elapsed2'],
dtype='object')

2.3 Analyze the date column, and remove entries that have an incorrect timeline
2.3.1 Calculate the time elapsed between the creation and closed dates

2.3.2 Convert the calculated date to seconds to get a better representation

2.3.3 View the descriptive statistics for the newly created column

2.3.4 Check the number of null values in the Complaint_Type and City columns

2.3.5 Impute the NA value with Unknown City

2.3.6 Draw a frequency plot for the complaints in each city

2.3.7 Create a scatter and hexbin plot of the concentration of complaints across Brooklyn
2.3.1 Calculate the time elapsed between the creation and closed dates
In [27]:
CS_Dataset

Out[27]:
        Unique Key  Created Date         Closed Date          Agency  Agency Name                      Complaint Type           Descriptor                     Location Type    Incident Zip  Incident Address          ...
0       32310363    2015-12-31 23:59:45  2016-01-01 00:55:15  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10034.0       71 VERMILYEA AVENUE       ...
1       32309934    2015-12-31 23:59:44  2016-01-01 01:26:57  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11105.0       27-07 23 AVENUE           ...
2       32309159    2015-12-31 23:59:29  2016-01-01 04:51:03  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10458.0       2897 VALENTINE AVENUE     ...
3       32305098    2015-12-31 23:57:46  2016-01-01 07:43:13  NYPD    New York City Police Department  Illegal Parking          Commercial Overnight Parking   Street/Sidewalk  10461.0       2940 BAISLEY AVENUE       ...
4       32306529    2015-12-31 23:56:58  2016-01-01 03:24:42  NYPD    New York City Police Department  Illegal Parking          Blocked Sidewalk               Street/Sidewalk  11373.0       87-14 57 ROAD             ...
...     ...         ...                  ...                  ...     ...                              ...                      ...                            ...              ...           ...                       ...
362172  29609918    2015-01-01 00:04:44  2015-01-01 10:22:31  NYPD    New York City Police Department  Illegal Parking          Blocked Hydrant                Street/Sidewalk  11421.0       84-25 85 ROAD             ...
362173  29608392    2015-01-01 00:04:28  2015-01-01 02:25:02  NYPD    New York City Police Department  Noise - Vehicle          Car/Truck Horn                 Street/Sidewalk  10468.0       2555 SEDGWICK AVENUE      ...
362174  29607589    2015-01-01 00:01:30  2015-01-01 00:20:33  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10031.0       508 WEST 139 STREET       ...
362175  29610889    2015-01-01 00:01:29  2015-01-01 02:42:22  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10466.0       931 EAST 226 STREET       ...
362176  29611816    2015-01-01 00:00:50  2015-01-01 02:47:50  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11420.0       123-19 135 STREET         ...

(Columns shown after the ellipsis: Road Ramp, Bridge Highway Segment, Garage Lot Name, Ferry Direction; all NaN in the rows displayed.)

362177 rows × 55 columns

In [28]:
# Convert Created Date and Closed Date to datetime
CS_Dataset[['Created Date','Closed Date']] = CS_Dataset[['Created Date','Closed Date']].apply(pd.to_datetime)

# Time elapsed between Created Date and Closed Date, expressed in days
CS_Dataset['Time Elapsed1'] = (CS_Dataset['Closed Date'] - CS_Dataset['Created Date']) / np.timedelta64(1, 'D')

# View the new column
CS_Dataset['Time Elapsed1']

Out[28]: 0 0.038542
1 0.060567
2 0.202477
3 0.323229
4 0.144259
...
362172 0.429016
362173 0.097616
362174 0.013229
362175 0.111725
362176 0.115972
Name: Time Elapsed1, Length: 362177, dtype: float64
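
Task 2.3 also asks for entries with an incorrect timeline to be removed, a step the cells above do not show. A minimal sketch of one way to do it, assuming "incorrect" means a request closed before it was created; the first two rows reuse timestamps from the outputs above, the third is invented to trigger the filter, and the explicit `format` string is an assumption about the raw 12-hour date strings:

```python
import pandas as pd

toy = pd.DataFrame({
    "Created Date": ["12/31/2015 11:59:45 PM", "01/01/2015 12:04:44 AM", "06/01/2015 09:00:00 AM"],
    "Closed Date": ["01/01/2016 12:55:15 AM", "01/01/2015 10:22:31 AM", "05/31/2015 08:00:00 AM"],
})

# An explicit format avoids per-row format guessing on the raw strings.
fmt = "%m/%d/%Y %I:%M:%S %p"
for col in ["Created Date", "Closed Date"]:
    toy[col] = pd.to_datetime(toy[col], format=fmt)

elapsed = toy["Closed Date"] - toy["Created Date"]

# Keep only requests closed at or after their creation time.
valid = toy[elapsed >= pd.Timedelta(0)].copy()
valid["Time Elapsed1"] = elapsed[valid.index] / pd.Timedelta(days=1)

print(valid)
```

The descriptive statistics in 2.3.3 report a minimum of 61 seconds, so on this dataset the filter would most likely not remove any rows, but it documents the assumption explicitly.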

2.3.2 Convert the calculated date to seconds to get a better representation


Solution 1 for task 2.3.2
In [29]:
CS_Dataset['Time Elapsed2'] = (CS_Dataset['Closed Date']- CS_Dataset['Created Date']).dt.total_seconds()
CS_Dataset['Time Elapsed2'].head(20)

Out[29]: 0 3330.0
1 5233.0
2 17494.0
3 27927.0
4 12464.0
5 6821.0
6 7102.0
7 6529.0
8 30814.0
9 5022.0
10 28120.0
11 40031.0
12 8996.0
13 30649.0
14 37785.0
15 56007.0
16 17559.0
17 3078.0
18 10589.0
19 2856.0
Name: Time Elapsed2, dtype: float64
Solution 2 for task 2.3.2
In [30]:
CS_Dataset['Time Elapsed2'] =CS_Dataset['Time Elapsed1']*24*60*60
CS_Dataset['Time Elapsed2'].head(20)

Out[30]: 0 3330.0
1 5233.0
2 17494.0
3 27927.0
4 12464.0
5 6821.0
6 7102.0
7 6529.0
8 30814.0
9 5022.0
10 28120.0
11 40031.0
12 8996.0
13 30649.0
14 37785.0
15 56007.0
16 17559.0
17 3078.0
18 10589.0
19 2856.0
Name: Time Elapsed2, dtype: float64
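
Both solutions print the same values. A short sketch that checks the equivalence on the first two rows of the dataset (the timestamps are taken from the outputs above); because the second route passes through a fractional number of days, the comparison uses a tolerance rather than exact equality:

```python
import numpy as np
import pandas as pd

created = pd.to_datetime(pd.Series(["2015-12-31 23:59:45", "2015-12-31 23:59:44"]))
closed = pd.to_datetime(pd.Series(["2016-01-01 00:55:15", "2016-01-01 01:26:57"]))

delta = closed - created

seconds_direct = delta.dt.total_seconds()                     # Solution 1
seconds_from_days = (delta / np.timedelta64(1, "D")) * 86400  # Solution 2

# Floating-point days may differ from exact seconds by a tiny rounding error.
print(np.allclose(seconds_direct, seconds_from_days))
```

Solution 1 is the more direct of the two, since it never leaves the timedelta representation until the final conversion.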

2.3.3 View the descriptive statistics for the newly created column
In [31]:
CS_Dataset['Time Elapsed2'].describe()

Out[31]: count 3.621770e+05


mean 1.511330e+04
std 2.110255e+04
min 6.100000e+01
25% 4.533000e+03
50% 9.616000e+03
75% 1.887800e+04
max 2.134342e+06
Name: Time Elapsed2, dtype: float64
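
The statistics above are in seconds, which is hard to read at this scale. Converting them shows a mean of roughly 4.2 hours, a median of roughly 2.7 hours, and a longest request of roughly 24.7 days. A sketch of the conversion, with the values copied from Out[31]:

```python
import pandas as pd

# Summary values copied from the Out[31] listing, in seconds.
stats_seconds = pd.Series({
    "mean": 1.511330e+04,
    "min": 6.100000e+01,
    "50%": 9.616000e+03,
    "max": 2.134342e+06,
})

stats_hours = stats_seconds / 3600        # seconds -> hours
max_days = stats_seconds["max"] / 86400   # seconds -> days

print(stats_hours.round(2))
print(round(max_days, 1))
```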

2.3.4 Check the number of null values in the Complaint_Type and City columns
In [32]:
CS_Dataset.isnull().sum()

Out[32]: Unique Key 0


Created Date 0
Closed Date 0
Agency 0
Agency Name 0
Complaint Type 0
Descriptor 6496
Location Type 130
Incident Zip 675
Incident Address 51686
Street Name 51686
Cross Street 1 55331
Cross Street 2 55464
Intersection Street 1 311549
Intersection Street 2 311673
Address Type 929
City 674
Landmark 361802
Facility Type 18
Status 0
Due Date 1
Resolution Description 0
Resolution Action Updated Date 39
Community Board 0
Borough 0
X Coordinate (State Plane) 1707
Y Coordinate (State Plane) 1707
Park Facility Name 0
Park Borough 0
School Name 0
School Number 0
School Region 1
School Code 1
School Phone Number 0
School Address 0
School City 0
School State 0
School Zip 1
School Not Found 0
School or Citywide Complaint 362177
Vehicle Type 362177
Taxi Company Borough 362177
Taxi Pick Up Location 362177
Bridge Highway Name 361880
Bridge Highway Direction 361880
Road Ramp 361915
Bridge Highway Segment 361915
Garage Lot Name 362177
Ferry Direction 362177
Ferry Terminal Name 362177
Latitude 1707
Longitude 1707
Location 1707
Time Elapsed1 0
Time Elapsed2 0
dtype: int64
Count the NaN values in the Complaint Type and City columns
In [33]:
CS_Dataset[['Complaint Type','City']].isnull().sum()

Out[33]: Complaint Type 0


City 674
dtype: int64

2.3.5 Impute the NA value with Unknown City


In [34]:
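# fillna() returns a new Series, so assign it back for the imputation to take effect
# (the outputs recorded below were produced without this assignment, so City still shows 674 nulls there)
CS_Dataset['City'] = CS_Dataset['City'].fillna('Unknown City')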
CS_Dataset.head()

Out[34]:
        Unique Key  Created Date         Closed Date          Agency  Agency Name                      Complaint Type           Descriptor                     Location Type    Incident Zip  Incident Address          ...
0       32310363    2015-12-31 23:59:45  2016-01-01 00:55:15  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party               Street/Sidewalk  10034.0       71 VERMILYEA AVENUE       ...
1       32309934    2015-12-31 23:59:44  2016-01-01 01:26:57  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  11105.0       27-07 23 AVENUE           ...
2       32309159    2015-12-31 23:59:29  2016-01-01 04:51:03  NYPD    New York City Police Department  Blocked Driveway         No Access                      Street/Sidewalk  10458.0       2897 VALENTINE AVENUE     ...
3       32305098    2015-12-31 23:57:46  2016-01-01 07:43:13  NYPD    New York City Police Department  Illegal Parking          Commercial Overnight Parking   Street/Sidewalk  10461.0       2940 BAISLEY AVENUE       ...
4       32306529    2015-12-31 23:56:58  2016-01-01 03:24:42  NYPD    New York City Police Department  Illegal Parking          Blocked Sidewalk               Street/Sidewalk  11373.0       87-14 57 ROAD             ...

(Columns shown after the ellipsis: Road Ramp, Bridge Highway Segment, Garage Lot Name, Ferry Direction, Ferry Terminal Name; all NaN in the rows displayed.)

5 rows × 55 columns
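
Note that `Series.fillna` returns a new Series rather than modifying the column in place, so the result has to be assigned back for the imputation to persist. A minimal sketch of the difference on toy data (values are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({"City": ["BROOKLYN", None, "ASTORIA", None]})

# Without assignment the frame is unchanged: fillna returns a new Series.
toy["City"].fillna("Unknown City")
nulls_without_assignment = toy["City"].isna().sum()

# Assigning the result back is what actually stores the imputed values.
toy["City"] = toy["City"].fillna("Unknown City")
nulls_after_assignment = toy["City"].isna().sum()

print(nulls_without_assignment, nulls_after_assignment)
```

A quick `isna().sum()` after the imputation, as in the sketch, is a cheap way to confirm that the step took effect.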

In [35]:
CS_Dataset.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 362177 entries, 0 to 362176
Data columns (total 55 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 362177 non-null int64
1 Created Date 362177 non-null datetime64[ns]
2 Closed Date 362177 non-null datetime64[ns]
3 Agency 362177 non-null object
4 Agency Name 362177 non-null object
5 Complaint Type 362177 non-null object
6 Descriptor 355681 non-null object
7 Location Type 362047 non-null object
8 Incident Zip 361502 non-null float64
9 Incident Address 310491 non-null object
10 Street Name 310491 non-null object
11 Cross Street 1 306846 non-null object
12 Cross Street 2 306713 non-null object
13 Intersection Street 1 50628 non-null object
14 Intersection Street 2 50504 non-null object
15 Address Type 361248 non-null object
16 City 361503 non-null object
17 Landmark 375 non-null object
18 Facility Type 362159 non-null object
19 Status 362177 non-null object
20 Due Date 362176 non-null object
21 Resolution Description 362177 non-null object
22 Resolution Action Updated Date 362138 non-null object
23 Community Board 362177 non-null object
24 Borough 362177 non-null object
25 X Coordinate (State Plane) 360470 non-null float64
26 Y Coordinate (State Plane) 360470 non-null float64
27 Park Facility Name 362177 non-null object
28 Park Borough 362177 non-null object
29 School Name 362177 non-null object
30 School Number 362177 non-null object
31 School Region 362176 non-null object
32 School Code 362176 non-null object
33 School Phone Number 362177 non-null object
34 School Address 362177 non-null object
35 School City 362177 non-null object
36 School State 362177 non-null object
37 School Zip 362176 non-null object
38 School Not Found 362177 non-null object
39 School or Citywide Complaint 0 non-null float64
40 Vehicle Type 0 non-null float64
41 Taxi Company Borough 0 non-null float64
42 Taxi Pick Up Location 0 non-null float64
43 Bridge Highway Name 297 non-null object
44 Bridge Highway Direction 297 non-null object
45 Road Ramp 262 non-null object
46 Bridge Highway Segment 262 non-null object
47 Garage Lot Name 0 non-null float64
48 Ferry Direction 0 non-null float64
49 Ferry Terminal Name 0 non-null float64
50 Latitude 360470 non-null float64
51 Longitude 360470 non-null float64
52 Location 360470 non-null object
53 Time Elapsed1 362177 non-null float64
54 Time Elapsed2 362177 non-null float64
dtypes: datetime64[ns](2), float64(14), int64(1), object(38)
memory usage: 154.7+ MB
In [ ]:
CS_Dataset.to_csv('Customer _Service_Requests.csv',index=False)

In [36]:
CS_Dataset=pd.read_csv("Customer _Service_Requests.csv")

C:\Users\hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3165: DtypeWarning: Columns (17,43,44,45,46) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,

In [37]:
print(CS_Dataset.shape)
print(CS_Dataset.columns )

(362177, 55)
Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location',
'Time Elapsed1', 'Time Elapsed2'],
dtype='object')

2.3.6 Draw a frequency plot for the complaints in each city


In [38]:
CS_Dataset['Complaint Type'].unique()

Out[38]: array(['Noise - Street/Sidewalk', 'Blocked Driveway', 'Illegal Parking',


'Derelict Vehicle', 'Noise - Commercial',
'Noise - House of Worship', 'Posting Advertisement',
'Noise - Vehicle', 'Animal Abuse', 'Vending', 'Traffic',
'Drinking', 'Bike/Roller/Skate Chronic', 'Panhandling',
'Noise - Park', 'Homeless Encampment', 'Urinating in Public',
'Graffiti', 'Disorderly Youth', 'Illegal Fireworks',
'Agency Issues', 'Squeegee', 'Animal in a Park'], dtype=object)

In [39]:
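# Count complaints per (City, Complaint Type) and pivot the complaint types into columns.
# Note: this overwrites CS_Dataset with the count table; section 2.3.7 reloads the raw data from the CSV.
CS_Dataset=CS_Dataset.groupby(['City','Complaint Type']).size().unstack().fillna(0)
CS_Dataset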

Out[39]:
Complaint Type       Animal  Blocked   Derelict  Disorderly  Drinking  Graffiti  Homeless    Illegal  Noise -     Noise - House  ...  Noise -  Panhandling  Traffic  Urinating  Vending
City                 Abuse   Driveway  Vehicle   Youth                           Encampment  Parking  Commercial  of Worship          Vehicle                        in Public
ARVERNE              46.0    50.0      32.0      2.0         1.0       1.0       4.0         62.0     2.0         14.0           ...  10.0     1.0          1.0      1.0        1.0
ASTORIA              170.0   3436.0    426.0     5.0         43.0      4.0       32.0        1340.0   1653.0      21.0           ...  236.0    2.0          60.0     10.0       57.0
Astoria              0.0     159.0     14.0      0.0         0.0       0.0       0.0         277.0    310.0       0.0            ...  0.0      0.0          0.0      0.0        0.0
BAYSIDE              53.0    514.0     231.0     2.0         1.0       3.0       2.0         638.0    47.0        3.0            ...  24.0     0.0          9.0      0.0        2.0
BELLEROSE            15.0    138.0     120.0     2.0         1.0       0.0       1.0         132.0    38.0        1.0            ...  11.0     1.0          9.0      1.0        0.0
BREEZY POINT         2.0     3.0       3.0       0.0         1.0       0.0       0.0         16.0     4.0         0.0            ...  1.0      0.0          0.0      0.0        0.0
BRONX                1971.0  17062.0   2402.0    66.0        206.0     15.0      275.0       9889.0   2944.0      90.0           ...  3556.0   20.0         427.0    54.0       433.0
BROOKLYN             3191.0  36445.0   6257.0    79.0        291.0     60.0      948.0       33532.0  13855.0     389.0          ...  5965.0   49.0         1258.0   155.0      575.0
CAMBRIA HEIGHTS      15.0    177.0     148.0     0.0         0.0       0.0       6.0         113.0    19.0        2.0            ...  100.0    0.0          7.0      0.0        0.0
CENTRAL PARK         0.0     0.0       0.0       0.0         0.0       0.0       0.0         5.0      0.0         0.0            ...  0.0      0.0          0.0      0.0        0.0
COLLEGE POINT        35.0    597.0     223.0     1.0         1.0       2.0       3.0         449.0    38.0        2.0            ...  140.0    0.0          16.0     0.0        1.0
CORONA               104.0   3597.0    72.0      6.0         34.0      4.0       26.0        791.0    281.0       3.0            ...  110.0    1.0          14.0     7.0        65.0
EAST ELMHURST        85.0    1925.0    136.0     1.0         9.0       3.0       2.0         1092.0   41.0        25.0           ...  82.0     0.0          24.0     6.0        9.0
ELMHURST             59.0    1992.0    94.0      2.0         13.0      1.0       34.0        760.0    85.0        6.0            ...  69.0     3.0          18.0     10.0       25.0
East Elmhurst        0.0     0.0       2.0       0.0         0.0       0.0       0.0         28.0     0.0         0.0            ...  0.0      0.0          0.0      0.0        0.0
FAR ROCKAWAY         111.0   383.0     215.0     1.0         4.0       0.0       16.0        339.0    59.0        1.0            ...  83.0     0.0          11.0     1.0        10.0
FLORAL PARK          7.0     33.0      74.0      1.0         1.0       0.0       0.0         72.0     3.0         0.0            ...  2.0      0.0          0.0      0.0        0.0
FLUSHING             191.0   3640.0    532.0     2.0         47.0      6.0       26.0        2250.0   222.0       5.0            ...  147.0    2.0          59.0     12.0       37.0
FOREST HILLS         78.0    873.0     71.0      1.0         1.0       3.0       18.0        627.0    163.0       1.0            ...  70.0     6.0          65.0     2.0        10.0
FRESH MEADOWS        66.0    682.0     347.0     0.0         2.0       0.0       6.0         1158.0   21.0        0.0            ...  97.0     1.0          15.0     1.0        1.0
GLEN OAKS            5.0     48.0      57.0      0.0         0.0       0.0       0.0         95.0     84.0        0.0            ...  4.0      0.0          3.0      2.0        19.0
HOLLIS               39.0    442.0     162.0     1.0         3.0       0.0       9.0         181.0    54.0        215.0          ...  52.0     0.0          11.0     2.0        0.0
HOWARD BEACH         51.0    215.0     172.0     1.0         4.0       0.0       3.0         384.0    258.0       1.0            ...  10.0     2.0          9.0      0.0        5.0
Howard Beach         0.0     1.0       0.0       0.0         0.0       0.0       0.0         0.0      0.0         0.0            ...  0.0      0.0          0.0      0.0        0.0
JACKSON HEIGHTS      50.0    703.0     41.0      0.0         10.0      1.0       11.0        240.0    619.0       2.0            ...  75.0     1.0          13.0     3.0        86.0
JAMAICA              317.0   3620.0    1132.0    9.0         40.0      3.0       93.0        1698.0   552.0       15.0           ...  337.0    3.0          632.0    37.0       24.0
KEW GARDENS          26.0    429.0     16.0      0.0         1.0       0.0       5.0         276.0    203.0       1.0            ...  23.0     0.0          10.0     3.0        1.0
LITTLE NECK          21.0    174.0     73.0      2.0         1.0       0.0       0.0         322.0    77.0        0.0            ...  8.0      0.0          20.0     1.0        0.0
LONG ISLAND CITY     40.0    1052.0    220.0     2.0         8.0       3.0       10.0        987.0    269.0       0.0            ...  124.0    2.0          83.0     3.0        31.0
Long Island City     0.0     55.0      4.0       0.0         0.0       0.0       0.0         64.0     19.0        0.0            ...  0.0      0.0          0.0      0.0        0.0
MASPETH              56.0    1000.0    510.0     2.0         9.0       1.0       11.0        1234.0   57.0        2.0            ...  26.0     0.0          71.0     2.0        7.0
MIDDLE VILLAGE       36.0    663.0     366.0     0.0         2.0       0.0       5.0         1104.0   13.0        0.0            ...  45.0     0.0          14.0     0.0        0.0
NEW HYDE PARK        1.0     76.0      14.0      0.0         0.0       0.0       0.0         32.0     4.0         0.0            ...  2.0      0.0          0.0      0.0        0.0
NEW YORK             1941.0  2705.0    695.0     81.0        321.0     25.0      3060.0      14549.0  18686.0     222.0          ...  6294.0   206.0        1769.0   264.0      2638.0
OAKLAND GARDENS      29.0    177.0     117.0     1.0         2.0       0.0       1.0         337.0    2.0         0.0            ...  7.0      0.0          6.0      0.0        2.0
OZONE PARK           72.0    1681.0    479.0     4.0         20.0      0.0       8.0         774.0    125.0       4.0            ...  81.0     7.0          21.0     4.0        1.0
QUEENS               1.0     3.0       2.0       0.0         0.0       0.0       2.0         10.0     6.0         1.0            ...  2.0      0.0          2.0      1.0        0.0
QUEENS VILLAGE       90.0    772.0     478.0     0.0         5.0       1.0       19.0        669.0    49.0        2.0            ...  54.0     1.0          27.0     5.0        2.0
REGO PARK            33.0    780.0     94.0      0.0         4.0       1.0       6.0         640.0    82.0        1.0            ...  60.0     0.0          16.0     1.0        3.0
RICHMOND HILL        55.0    1099.0    200.0     0.0         10.0      1.0       30.0        489.0    249.0       0.0            ...  69.0     0.0          8.0      5.0        15.0
RIDGEWOOD            154.0   2161.0    507.0     3.0         10.0      3.0       26.0        2235.0   491.0       2.0            ...  249.0    0.0          50.0     9.0        9.0
ROCKAWAY PARK        33.0    80.0      19.0      4.0         23.0      0.0       4.0         337.0    72.0        0.0            ...  29.0     0.0          7.0      1.0        2.0
ROSEDALE             44.0    270.0     247.0     0.0         2.0       2.0       4.0         326.0    28.0        2.0            ...  25.0     0.0          25.0     0.0        19.0
SAINT ALBANS         43.0    318.0     248.0     1.0         3.0       0.0       11.0        237.0    36.0        1.0            ...  50.0     0.0          14.0     1.0        2.0
SOUTH OZONE PARK     74.0    1202.0    425.0     2.0         14.0      2.0       5.0         602.0    82.0        5.0            ...  97.0     0.0          36.0     2.0        5.0
SOUTH RICHMOND HILL  40.0    1946.0    356.0     2.0         25.0      0.0       12.0        596.0    223.0       3.0            ...  93.0     0.0          12.0     1.0        24.0
SPRINGFIELD GARDENS  42.0    330.0     267.0     0.0         6.0       0.0       7.0         291.0    38.0        1.0            ...  48.0     2.0          12.0     3.0        1.0
STATEN ISLAND        786.0   2845.0    2184.0    25.0        188.0     6.0       77.0        6224.0   783.0       18.0           ...  424.0    13.0         229.0    19.0       25.0
SUNNYSIDE            40.0    278.0     17.0      2.0         12.0      1.0       12.0        167.0    238.0       0.0            ...  53.0     0.0          17.0     2.0        15.0
WHITESTONE           43.0    279.0     279.0     1.0         3.0       1.0       0.0         631.0    21.0        0.0            ...  31.0     0.0          32.0     0.0        1.0
WOODHAVEN            57.0    1363.0    369.0     0.0         4.0       0.0       10.0        896.0    209.0       3.0            ...  81.0     1.0          7.0      2.0        6.0
In [40]:
CS_Dataset.plot.bar(figsize=(14,10), stacked=True)
plt.ylabel('Number of Complaints')
plt.title('Number of complaints vs. City')

Out[40]: Text(0.5, 1.0, 'Number of complaints vs. City')


2.3.7 Create a scatter and hexbin plot of the concentration of complaints across Brooklyn
Concentration of complaints across Brooklyn
In [41]:
CS_Dataset=pd.read_csv("Customer _Service_Requests.csv")

C:\Users\hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3165: DtypeWarning: Columns (17,43,44,45,46) have mixed types.Specify dtype option on import or set low_memory=False.
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
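
The DtypeWarning above is raised when pandas infers column types chunk by chunk and finds mixed values. A minimal sketch of two common ways to avoid it, using a tiny in-memory CSV as a stand-in for the real file (the column names and values here are illustrative assumptions, not taken from the dataset):

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV; 'Landmark' mixes numbers and text.
csv_text = "Unique Key,Landmark\n1,100\n2,CITY HALL\n3,\n"

# Option 1: read the whole file before inferring dtypes.
df_whole = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: state the dtype explicitly for the columns named in the warning.
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={"Landmark": str})

print(df_whole.dtypes)
print(df_typed.dtypes)
```

Either option could be passed to the `pd.read_csv` call in the cell above; which columns to type explicitly depends on the real file.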

In [42]:
brooklyn = CS_Dataset.loc[CS_Dataset['City'] == 'BROOKLYN']
brooklyn

Out[42]: Bridge Garage


Unique Created Closed Agency Complaint Incident Incident Road Ferry
Agency Descriptor Location Type ... Highway Lot Terminal
Key Date Date Name Type Zip Address Ramp Direction
Segment Name

Posted
New York
2015-12-31 2016-01-01 Illegal Parking 260 21
5 32306554 NYPD City Police Street/Sidewalk 11215.0 ... NaN NaN NaN NaN
23:56:30 01:50:11 Parking Sign STREET
Department
Violation

New York
2015-12-31 2016-01-01 Blocked 1408 66
9 32308391 NYPD City Police No Access Street/Sidewalk 11219.0 ... NaN NaN NaN NaN
23:53:58 01:17:40 Driveway STREET
Department

Posted
New York
2015-12-31 2016-01-01 Illegal Parking 38 COX
13 32305074 NYPD City Police Street/Sidewalk 11208.0 ... NaN NaN NaN NaN
23:47:58 08:18:47 Parking Sign PLACE
Department
Violation

New York 622


2015-12-31 2016-01-01 Noise - Loud Club/Bar
17 32310273 NYPD City Police 11217.0 DEGRAW ... NaN NaN NaN NaN
23:44:52 00:36:10 Commercial Music/Party /Restaurant
Department STREET

New York 2192


2015-12-31 2016-01-01 Noise - Loud Club/Bar
18 32306617 NYPD City Police 11234.0 FLATBUSH ... NaN NaN NaN NaN
23:40:59 02:37:28 Commercial Music/Party /Restaurant
Department AVENUE

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

New York 229


2015-01-01 2015-01-01 Blocked
362158 29608505 NYPD City Police No Access Street/Sidewalk 11201.0 DUFFIELD ... NaN NaN NaN NaN
00:23:55 02:58:38 Driveway
Department STREET

New York
2015-01-01 2015-01-01 Blocked 27 HOPE
362160 29612697 NYPD City Police No Access Street/Sidewalk 11211.0 ... NaN NaN NaN NaN
00:19:22 02:41:10 Driveway STREET
Department

New York 242


2015-01-01 2015-01-01 Noise - Loud
362163 29613295 NYPD City Police Store/Commercial 11217.0 FLATBUSH ... NaN NaN NaN NaN
00:17:48 03:24:48 Commercial Music/Party
Department AVENUE

New York 1373


2015-01-01 2015-01-01 Blocked
362164 29613456 NYPD City Police No Access Street/Sidewalk 11237.0 DECATUR ... NaN NaN NaN NaN
00:17:47 00:51:13 Driveway
Department STREET

New York 19
2015-01-01 2015-01-01 Blocked
362165 29613402 NYPD City Police No Access Street/Sidewalk 11218.0 MICIELI ... NaN NaN NaN NaN
00:15:45 02:04:54 Driveway
Department PLACE

118849 rows × 55 columns

Concentration of complaints across Brooklyn using Scatter plot


In [43]:
brooklyn[['Longitude', 'Latitude']].plot(kind = 'scatter', x='Longitude', y='Latitude',
title = 'Concentration of Complaints across Brooklyn', figsize = (15, 10));
plt.xlabel("Longitude in degrees (°)")
plt.ylabel("Latitude in degrees (°)")

Out[43]: Text(0, 0.5, 'Latitude in degrees (°)')


Concentration of complaints across Brooklyn using hexbin plot
In [44]:
#hexbin
brooklyn[['Longitude', 'Latitude']].plot(kind = 'hexbin', x='Longitude', y='Latitude', gridsize=25,
colormap = 'Paired', mincnt=1, title = 'Concentration of Complaints across Brooklyn', figsize = (15, 10));
plt.xlabel("Longitude in degrees (°)")
plt.ylabel("Latitude in degrees (°)")

Out[44]: Text(0, 0.5, 'Latitude in degrees (°)')
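
The hexbin plot colors each cell by the number of complaints that fall inside it. As a rough numeric analogue, the sketch below counts points per rectangular cell with `np.histogram2d`. The coordinates are made-up stand-ins for the Brooklyn rows, and the rectangular grid is an assumption that only approximates hexagonal binning:

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates standing in for the Brooklyn rows.
coords = pd.DataFrame({
    "Longitude": [-73.99, -73.98, -73.98, -73.95, np.nan],
    "Latitude": [40.66, 40.67, 40.67, 40.70, 40.65],
})

# Rows without coordinates cannot be placed on the map, so drop them.
coords = coords.dropna(subset=["Longitude", "Latitude"])

# Count complaints per rectangular cell of a 2 x 2 grid.
counts, lon_edges, lat_edges = np.histogram2d(
    coords["Longitude"], coords["Latitude"], bins=2
)
print(counts)
```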

3. Find major types of complaints:


3.1 Plot a bar graph to show the types of complaints

3.2 Check the frequency of various types of complaints for New York City

3.3 Find the top 10 complaint types

3.4 Display the various types of complaints in each city

3.5 Create a DataFrame, df_new, which contains cities as columns and complaint types in rows
3.1 Plot a bar graph to show the types of complaints
In [45]:
#Complaint Types
CS_Dataset['Complaint Type'].unique()

Out[45]: array(['Noise - Street/Sidewalk', 'Blocked Driveway', 'Illegal Parking',


'Derelict Vehicle', 'Noise - Commercial',
'Noise - House of Worship', 'Posting Advertisement',
'Noise - Vehicle', 'Animal Abuse', 'Vending', 'Traffic',
'Drinking', 'Bike/Roller/Skate Chronic', 'Panhandling',
'Noise - Park', 'Homeless Encampment', 'Urinating in Public',
'Graffiti', 'Disorderly Youth', 'Illegal Fireworks',
'Agency Issues', 'Squeegee', 'Animal in a Park'], dtype=object)

In [46]:
# Display complaint types by counts
CS_Dataset['Complaint Type'].value_counts()

Out[46]: Blocked Driveway 100624


Illegal Parking 91716
Noise - Street/Sidewalk 51139
Noise - Commercial 43751
Derelict Vehicle 21518
Noise - Vehicle 19301
Animal Abuse 10530
Traffic 5196
Homeless Encampment 4879
Vending 4185
Noise - Park 4089
Drinking 1404
Noise - House of Worship 1068
Posting Advertisement 679
Urinating in Public 641
Bike/Roller/Skate Chronic 475
Panhandling 325
Disorderly Youth 315
Illegal Fireworks 172
Graffiti 157
Agency Issues 8
Squeegee 4
Animal in a Park 1
Name: Complaint Type, dtype: int64

The complaint types are plotted as a bar graph, as shown below


In [47]:
# Display frequency of complaint types
CS_Dataset['Complaint Type'].value_counts().plot(kind = 'bar',
title = 'Types of complaints');
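
Alongside the raw counts, the share of each complaint type can be easier to compare. A small sketch on toy data (the values are illustrative, and `summary` is a hypothetical name):

```python
import pandas as pd

# Toy stand-in for CS_Dataset['Complaint Type'].
complaints = pd.Series(
    ["Blocked Driveway", "Illegal Parking", "Blocked Driveway", "Noise - Commercial"],
    name="Complaint Type",
)

# Absolute counts and percentage share per complaint type.
counts = complaints.value_counts()
shares = complaints.value_counts(normalize=True).mul(100).round(1)

summary = pd.DataFrame({"count": counts, "share_pct": shares})
print(summary)
```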

3.2 Check the frequency of various types of complaints for New York City
Displaying only the complaints recorded for New York City
In [48]:
New_York_City = CS_Dataset.loc[CS_Dataset['City'] == 'NEW YORK']
New_York_City

Out[48]: Bridge Garage


Unique Created Closed Agency Complaint Incident Incident Road Ferry
Agency Descriptor Location Type ... Highway Lot Terminal
Key Date Date Name Type Zip Address Ramp Direction
Segment Name

New York 71
2015-12-31 2016-01-01 Noise - Loud
0 32310363 NYPD City Police Street/Sidewalk 10034.0 VERMILYEA ... NaN NaN NaN NaN
23:59:45 00:55:15 Street/Sidewalk Music/Party
Department AVENUE

New York 524 WEST


2015-12-31 2016-01-01 Blocked
6 32306559 NYPD City Police Illegal Parking Street/Sidewalk 10032.0 169 ... NaN NaN NaN NaN
23:55:32 01:53:54 Hydrant
Department STREET

New York 264 WEST


2015-12-31 2016-01-01 Noise - Loud
19 32308195 NYPD City Police Street/Sidewalk 10026.0 118 ... NaN NaN NaN NaN
23:40:55 00:28:31 Street/Sidewalk Music/Party
Department STREET

Double
New York 133 WEST
2015-12-31 2016-01-01 Parked
23 32308765 NYPD City Police Illegal Parking Street/Sidewalk 10030.0 134 ... NaN NaN NaN NaN
23:32:46 00:25:21 Blocking
Department STREET
Vehicle

New York 452 WEST


2015-12-31 2015-12-31 Noise - House Loud House of
26 32305916 NYPD City Police 10031.0 147 ... NaN NaN NaN NaN
23:26:41 23:53:31 of Worship Music/Party Worship
Department STREET

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

New York 565 WEST


2015-01-01 2015-01-01 Noise - Loud
362161 29607990 NYPD City Police Street/Sidewalk 10040.0 190 ... NaN NaN NaN NaN
00:19:20 03:17:10 Street/Sidewalk Music/Party
Department STREET

New York 565 WEST


2015-01-01 2015-01-01 Noise - Loud
362162 29609631 NYPD City Police Street/Sidewalk 10040.0 190 ... NaN NaN NaN NaN
00:18:49 03:17:11 Street/Sidewalk Music/Party
Department STREET

New York
2015-01-01 2015-01-01 Noise - Loud LUDLOW
362166 29608295 NYPD City Police Street/Sidewalk 10002.0 ... NaN NaN NaN NaN
00:15:33 00:56:37 Street/Sidewalk Music/Party STREET
Department

New York
2015-01-01 2015-01-01 Noise - Loud
362171 29610051 NYPD City Police Street/Sidewalk 10002.0 NaN ... NaN NaN NaN NaN
00:05:05 01:22:10 Street/Sidewalk Music/Party
Department

New York 508 WEST


2015-01-01 2015-01-01 Noise - Loud

To check the frequency, I used two approaches, described below


In [49]:
# solution 1: check the total count of each complaint type
New_York_City['Complaint Type'].value_counts()

Out[49]: Noise - Street/Sidewalk 22245


Noise - Commercial 18686
Illegal Parking 14549
Noise - Vehicle 6294
Homeless Encampment 3060
Blocked Driveway 2705
Vending 2638
Animal Abuse 1941
Traffic 1769
Noise - Park 1243
Derelict Vehicle 695
Drinking 321
Urinating in Public 264
Bike/Roller/Skate Chronic 254
Noise - House of Worship 222
Panhandling 206
Disorderly Youth 81
Posting Advertisement 49
Illegal Fireworks 38
Graffiti 25
Squeegee 4
Name: Complaint Type, dtype: int64

In [50]:
# solution 2: plot the counts with seaborn's countplot
plt.figure(figsize=(15,10))
sn.countplot(data=New_York_City,y='Complaint Type')
plt.title('Count by Complaint Type for New York City')
# y='Complaint Type' draws horizontal bars, so counts run along the x-axis
plt.xlabel('Count')
plt.ylabel('Complaint Type')

Out[50]: Text(0, 0.5, 'Complaint Type')


3.3 Find the top 10 complaint types
In [51]:
top10_ctypes=CS_Dataset.groupby(['Complaint Type']).size().nlargest(10)
top10_ctypes

Out[51]: Complaint Type


Blocked Driveway 100624
Illegal Parking 91716
Noise - Street/Sidewalk 51139
Noise - Commercial 43751
Derelict Vehicle 21518
Noise - Vehicle 19301
Animal Abuse 10530
Traffic 5196
Homeless Encampment 4879
Vending 4185
dtype: int64
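
The same ranking can also be read straight from `value_counts()`, which already sorts in descending order. A small sketch on toy data, taking the top 2 instead of the top 10 (names and values are illustrative):

```python
import pandas as pd

# Toy stand-in for the complaint data.
toy = pd.DataFrame({"Complaint Type": ["A", "B", "A", "C", "A", "B"]})

# Approach used above: group sizes, then keep the n largest.
top_by_groupby = toy.groupby("Complaint Type").size().nlargest(2)

# Alternative: value_counts() sorts descending, so head(n) gives the top n.
top_by_value_counts = toy["Complaint Type"].value_counts().head(2)

print(top_by_groupby)
print(top_by_value_counts)
```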

3.4 Display the various types of complaints in each city


In [52]:
cl=CS_Dataset['City'].unique()
cl

Out[52]: array(['NEW YORK', 'ASTORIA', 'BRONX', 'ELMHURST', 'BROOKLYN',


'KEW GARDENS', 'JACKSON HEIGHTS', 'MIDDLE VILLAGE', 'REGO PARK',
'SAINT ALBANS', 'JAMAICA', 'SOUTH RICHMOND HILL', nan, 'RIDGEWOOD',
'HOWARD BEACH', 'FOREST HILLS', 'STATEN ISLAND', 'OZONE PARK',
'RICHMOND HILL', 'WOODHAVEN', 'FLUSHING', 'CORONA',
'QUEENS VILLAGE', 'OAKLAND GARDENS', 'HOLLIS', 'MASPETH',
'EAST ELMHURST', 'SOUTH OZONE PARK', 'WOODSIDE', 'FRESH MEADOWS',
'LONG ISLAND CITY', 'ROCKAWAY PARK', 'SPRINGFIELD GARDENS',
'COLLEGE POINT', 'BAYSIDE', 'GLEN OAKS', 'FAR ROCKAWAY',
'BELLEROSE', 'LITTLE NECK', 'CAMBRIA HEIGHTS', 'ROSEDALE',
'SUNNYSIDE', 'WHITESTONE', 'ARVERNE', 'FLORAL PARK',
'NEW HYDE PARK', 'CENTRAL PARK', 'BREEZY POINT', 'QUEENS',
'Astoria', 'Long Island City', 'Woodside', 'East Elmhurst',
'Howard Beach'], dtype=object)
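
The list above contains the same places in two spellings ('ASTORIA' and 'Astoria', 'HOWARD BEACH' and 'Howard Beach'), which splits their counts across separate groups. A minimal sketch of one way to merge them, assuming upper-casing is an acceptable normalization for this column:

```python
import pandas as pd

# Toy stand-in for CS_Dataset['City'], including a missing value.
cities = pd.Series(["ASTORIA", "Astoria", "HOWARD BEACH", "Howard Beach", None])

# Strip stray whitespace and upper-case so both spellings fall into one group.
normalized = cities.str.strip().str.upper()

print(normalized.value_counts())
```

Applied to the real column, this would be an assignment back to `CS_Dataset['City']` before any grouping by city.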

In [53]:
for c in cl:
    df_c = CS_Dataset.loc[CS_Dataset['City'] == c]
    df_c['Complaint Type'].value_counts().plot(kind='bar', figsize=(18, 10))
    plt.title(f"Count by Complaint Type for %s" %c)
    plt.xlabel('Complaint Type')
    plt.ylabel('Count')
    plt.show()
    print("=================================================================")
    print("/////////////////////////////////////////////////////////////////")

=================================================================
/////////////////////////////////////////////////////////////////
=================================================================
/////////////////////////////////////////////////////////////////

=================================================================
/////////////////////////////////////////////////////////////////
=================================================================
/////////////////////////////////////////////////////////////////

=================================================================
/////////////////////////////////////////////////////////////////
=================================================================
/////////////////////////////////////////////////////////////////

=================================================================
/////////////////////////////////////////////////////////////////
=================================================================
/////////////////////////////////////////////////////////////////

=================================================================
/////////////////////////////////////////////////////////////////
=================================================================
/////////////////////////////////////////////////////////////////

=================================================================
/////////////////////////////////////////////////////////////////
=================================================================
/////////////////////////////////////////////////////////////////
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-53-d02a9c8fb227> in <module>
1 for c in cl:
2 df_c = CS_Dataset.loc[CS_Dataset['City'] == c]
----> 3 df_c['Complaint Type'].value_counts().plot(kind='bar', figsize=(18, 10))
4 plt.title(f"Count by Complaint Type for %s" %c)
('Complaint Type')
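
The IndexError above most likely comes from the `nan` entry in `cl`: the loop completes 12 cities, and `nan` is the 13th element. Comparing a column to NaN never matches, so `df_c` is empty and plotting an empty series fails. A minimal sketch of the cause and one way around it, on toy data (the frame and the `per_city_counts` name are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Toy stand-in with one missing city name.
toy = pd.DataFrame({
    "City": ["BRONX", "BRONX", np.nan, "ASTORIA"],
    "Complaint Type": ["Illegal Parking", "Blocked Driveway", "Traffic", "Blocked Driveway"],
})

# NaN never compares equal to anything, so this selection is empty.
empty = toy.loc[toy["City"] == np.nan]
print(len(empty))

# Iterate over real city names only.
per_city_counts = {}
for city in toy["City"].dropna().unique():
    city_rows = toy.loc[toy["City"] == city]
    per_city_counts[city] = city_rows["Complaint Type"].value_counts()
    # In the notebook, the bar plot for this city would be drawn here.

print(per_city_counts["BRONX"])
```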
3.5 Create a DataFrame, df_new, which contains cities as columns and complaint types in rows
In [54]:
df_new = CS_Dataset[['Complaint Type', 'City']]

df_new
# as a data frame

Out[54]: Complaint Type City

0 Noise - Street/Sidewalk NEW YORK

1 Blocked Driveway ASTORIA

2 Blocked Driveway BRONX

3 Illegal Parking BRONX

4 Illegal Parking ELMHURST

... ... ...

362172 Illegal Parking WOODHAVEN

362173 Noise - Vehicle BRONX

362174 Noise - Street/Sidewalk NEW YORK

362175 Blocked Driveway BRONX

362176 Blocked Driveway SOUTH OZONE PARK

362177 rows × 2 columns

In [55]:
df_new=df_new.set_index('Complaint Type')
df_new

Out[55]: City

Complaint Type

Noise - Street/Sidewalk NEW YORK

Blocked Driveway ASTORIA

Blocked Driveway BRONX

Illegal Parking BRONX

Illegal Parking ELMHURST


... ...

Illegal Parking WOODHAVEN

Noise - Vehicle BRONX

Noise - Street/Sidewalk NEW YORK

Blocked Driveway BRONX

Blocked Driveway SOUTH OZONE PARK
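
The task asks for cities as columns and complaint types as rows, while the frame above still holds one row per complaint. One possible way to get that layout is a cross-tabulation. The sketch below uses toy data, and `df_new_wide` is a hypothetical name:

```python
import pandas as pd

# Toy stand-in for CS_Dataset[['Complaint Type', 'City']].
toy = pd.DataFrame({
    "Complaint Type": ["Blocked Driveway", "Blocked Driveway", "Illegal Parking",
                       "Blocked Driveway", "Noise - Vehicle"],
    "City": ["BRONX", "BRONX", "BRONX", "ASTORIA", "ASTORIA"],
})

# Rows are complaint types, columns are cities, cells are complaint counts.
df_new_wide = pd.crosstab(toy["Complaint Type"], toy["City"])
print(df_new_wide)
```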

4. Visualize the major types of complaints in each city


4.1 Draw another chart that shows the types of complaints in each city in a single chart, where different colors show the different types of complaints
In [56]:
# Complaint type per city

df_complainttypes=CS_Dataset.groupby(['City','Complaint Type']).size().unstack().fillna(0)
df_complainttypes

Out[56]:
Complaint Type  Animal Abuse  Blocked Driveway  Derelict Vehicle  Disorderly Youth  Drinking  Graffiti  Homeless Encampment  Illegal Parking  Noise - Commercial  Noise - House of Worship  ...  Noise - Vehicle  Panhandling  Traffic  Urinating in Public  Vending

City

ARVERNE 46.0 50.0 32.0 2.0 1.0 1.0 4.0 62.0 2.0 14.0 ... 10.0 1.0 1.0 1.0 1.0

ASTORIA 170.0 3436.0 426.0 5.0 43.0 4.0 32.0 1340.0 1653.0 21.0 ... 236.0 2.0 60.0 10.0 57.0

Astoria 0.0 159.0 14.0 0.0 0.0 0.0 0.0 277.0 310.0 0.0 ... 0.0 0.0 0.0 0.0 0.0

BAYSIDE 53.0 514.0 231.0 2.0 1.0 3.0 2.0 638.0 47.0 3.0 ... 24.0 0.0 9.0 0.0 2.0

BELLEROSE 15.0 138.0 120.0 2.0 1.0 0.0 1.0 132.0 38.0 1.0 ... 11.0 1.0 9.0 1.0 0.0

BREEZY
2.0 3.0 3.0 0.0 1.0 0.0 0.0 16.0 4.0 0.0 ... 1.0 0.0 0.0 0.0 0.0
POINT

BRONX 1971.0 17062.0 2402.0 66.0 206.0 15.0 275.0 9889.0 2944.0 90.0 ... 3556.0 20.0 427.0 54.0 433.0

BROOKLYN 3191.0 36445.0 6257.0 79.0 291.0 60.0 948.0 33532.0 13855.0 389.0 ... 5965.0 49.0 1258.0 155.0 575.0

CAMBRIA
15.0 177.0 148.0 0.0 0.0 0.0 6.0 113.0 19.0 2.0 ... 100.0 0.0 7.0 0.0 0.0
HEIGHTS

CENTRAL
0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
PARK

COLLEGE
35.0 597.0 223.0 1.0 1.0 2.0 3.0 449.0 38.0 2.0 ... 140.0 0.0 16.0 0.0 1.0
POINT

CORONA 104.0 3597.0 72.0 6.0 34.0 4.0 26.0 791.0 281.0 3.0 ... 110.0 1.0 14.0 7.0 65.0

EAST
85.0 1925.0 136.0 1.0 9.0 3.0 2.0 1092.0 41.0 25.0 ... 82.0 0.0 24.0 6.0 9.0
ELMHURST

ELMHURST 59.0 1992.0 94.0 2.0 13.0 1.0 34.0 760.0 85.0 6.0 ... 69.0 3.0 18.0 10.0 25.0

East Elmhurst 0.0 0.0 2.0 0.0 0.0 0.0 0.0 28.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0

FAR
111.0 383.0 215.0 1.0 4.0 0.0 16.0 339.0 59.0 1.0 ... 83.0 0.0 11.0 1.0 10.0
ROCKAWAY

FLORAL PARK 7.0 33.0 74.0 1.0 1.0 0.0 0.0 72.0 3.0 0.0 ... 2.0 0.0 0.0 0.0 0.0

FLUSHING 191.0 3640.0 532.0 2.0 47.0 6.0 26.0 2250.0 222.0 5.0 ... 147.0 2.0 59.0 12.0 37.0

FOREST HILLS 78.0 873.0 71.0 1.0 1.0 3.0 18.0 627.0 163.0 1.0 ... 70.0 6.0 65.0 2.0 10.0

FRESH
66.0 682.0 347.0 0.0 2.0 0.0 6.0 1158.0 21.0 0.0 ... 97.0 1.0 15.0 1.0 1.0
MEADOWS

GLEN OAKS 5.0 48.0 57.0 0.0 0.0 0.0 0.0 95.0 84.0 0.0 ... 4.0 0.0 3.0 2.0 19.0

HOLLIS 39.0 442.0 162.0 1.0 3.0 0.0 9.0 181.0 54.0 215.0 ... 52.0 0.0 11.0 2.0 0.0

HOWARD
51.0 215.0 172.0 1.0 4.0 0.0 3.0 384.0 258.0 1.0 ... 10.0 2.0 9.0 0.0 5.0
BEACH

Howard
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
Beach

JACKSON
50.0 703.0 41.0 0.0 10.0 1.0 11.0 240.0 619.0 2.0 ... 75.0 1.0 13.0 3.0 86.0
HEIGHTS

JAMAICA 317.0 3620.0 1132.0 9.0 40.0 3.0 93.0 1698.0 552.0 15.0 ... 337.0 3.0 632.0 37.0 24.0

KEW
26.0 429.0 16.0 0.0 1.0 0.0 5.0 276.0 203.0 1.0 ... 23.0 0.0 10.0 3.0 1.0
GARDENS

LITTLE NECK 21.0 174.0 73.0 2.0 1.0 0.0 0.0 322.0 77.0 0.0 ... 8.0 0.0 20.0 1.0 0.0

LONG ISLAND
40.0 1052.0 220.0 2.0 8.0 3.0 10.0 987.0 269.0 0.0 ... 124.0 2.0 83.0 3.0 31.0
CITY

Long Island
0.0 55.0 4.0 0.0 0.0 0.0 0.0 64.0 19.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
City

MASPETH 56.0 1000.0 510.0 2.0 9.0 1.0 11.0 1234.0 57.0 2.0 ... 26.0 0.0 71.0 2.0 7.0

MIDDLE
36.0 663.0 366.0 0.0 2.0 0.0 5.0 1104.0 13.0 0.0 ... 45.0 0.0 14.0 0.0 0.0
VILLAGE

NEW HYDE
1.0 76.0 14.0 0.0 0.0 0.0 0.0 32.0 4.0 0.0 ... 2.0 0.0 0.0 0.0 0.0
PARK

NEW YORK 1941.0 2705.0 695.0 81.0 321.0 25.0 3060.0 14549.0 18686.0 222.0 ... 6294.0 206.0 1769.0 264.0 2638.0

OAKLAND
29.0 177.0 117.0 1.0 2.0 0.0 1.0 337.0 2.0 0.0 ... 7.0 0.0 6.0 0.0 2.0
GARDENS

OZONE PARK 72.0 1681.0 479.0 4.0 20.0 0.0 8.0 774.0 125.0 4.0 ... 81.0 7.0 21.0 4.0 1.0

QUEENS 1.0 3.0 2.0 0.0 0.0 0.0 2.0 10.0 6.0 1.0 ... 2.0 0.0 2.0 1.0 0.0

QUEENS
90.0 772.0 478.0 0.0 5.0 1.0 19.0 669.0 49.0 2.0 ... 54.0 1.0 27.0 5.0 2.0
VILLAGE

REGO PARK 33.0 780.0 94.0 0.0 4.0 1.0 6.0 640.0 82.0 1.0 ... 60.0 0.0 16.0 1.0 3.0

RICHMOND
55.0 1099.0 200.0 0.0 10.0 1.0 30.0 489.0 249.0 0.0 ... 69.0 0.0 8.0 5.0 15.0
HILL

RIDGEWOOD 154.0 2161.0 507.0 3.0 10.0 3.0 26.0 2235.0 491.0 2.0 ... 249.0 0.0 50.0 9.0 9.0

ROCKAWAY
33.0 80.0 19.0 4.0 23.0 0.0 4.0 337.0 72.0 0.0 ... 29.0 0.0 7.0 1.0 2.0
PARK

ROSEDALE 44.0 270.0 247.0 0.0 2.0 2.0 4.0 326.0 28.0 2.0 ... 25.0 0.0 25.0 0.0 19.0

SAINT
43.0 318.0 248.0 1.0 3.0 0.0 11.0 237.0 36.0 1.0 ... 50.0 0.0 14.0 1.0 2.0
ALBANS

SOUTH
74.0 1202.0 425.0 2.0 14.0 2.0 5.0 602.0 82.0 5.0 ... 97.0 0.0 36.0 2.0 5.0
OZONE PARK

SOUTH
RICHMOND 40.0 1946.0 356.0 2.0 25.0 0.0 12.0 596.0 223.0 3.0 ... 93.0 0.0 12.0 1.0 24.0
HILL

SPRINGFIELD
42.0 330.0 267.0 0.0 6.0 0.0 7.0 291.0 38.0 1.0 ... 48.0 2.0 12.0 3.0 1.0
GARDENS

STATEN
786.0 2845.0 2184.0 25.0 188.0 6.0 77.0 6224.0 783.0 18.0 ... 424.0 13.0 229.0 19.0 25.0
ISLAND

SUNNYSIDE 40.0 278.0 17.0 2.0 12.0 1.0 12.0 167.0 238.0 0.0 ... 53.0 0.0 17.0 2.0 15.0

WHITESTONE 43.0 279.0 279.0 1.0 3.0 1.0 0.0 631.0 21.0 0.0 ... 31.0 0.0 32.0 0.0 1.0

WOODHAVEN 57.0 1363.0 369.0 0.0 4.0 0.0 10.0 896.0 209.0 3.0 81.0 1.0 7.0 2.0 6.0

In [57]:
df_complainttypes.plot.bar(figsize=(15,10), stacked=True, colormap='Paired')
plt.ylabel('Number of Complaints')
plt.title('Number of complaints vs. City')

Out[57]: Text(0.5, 1.0, 'Number of complaints vs. City')


4.2 Sort the complaint types based on the average Request_Closing_Time, grouping them for different locations
In [58]:
locations=CS_Dataset['Location'].unique()
locations

Out[58]: array(['(40.86568153633767, -73.92350095571744)',


'(40.775945312321085, -73.91509393898605)',
'(40.870324522111424, -73.88852464418646)', ...,
'(40.860067825505645, -73.85211211113571)',
'(40.64643889447912, -73.98197140465561)',
'(40.73884743426441, -73.86375174412073)'], dtype=object)

In [59]:
CS_Dataset.columns

Out[59]: Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location',
'Time Elapsed1', 'Time Elapsed2'],
dtype='object')

In [60]:
cl

Out[60]: array(['NEW YORK', 'ASTORIA', 'BRONX', 'ELMHURST', 'BROOKLYN',


'KEW GARDENS', 'JACKSON HEIGHTS', 'MIDDLE VILLAGE', 'REGO PARK',
'SAINT ALBANS', 'JAMAICA', 'SOUTH RICHMOND HILL', nan, 'RIDGEWOOD',
'HOWARD BEACH', 'FOREST HILLS', 'STATEN ISLAND', 'OZONE PARK',
'RICHMOND HILL', 'WOODHAVEN', 'FLUSHING', 'CORONA',
'QUEENS VILLAGE', 'OAKLAND GARDENS', 'HOLLIS', 'MASPETH',
'EAST ELMHURST', 'SOUTH OZONE PARK', 'WOODSIDE', 'FRESH MEADOWS',
'LONG ISLAND CITY', 'ROCKAWAY PARK', 'SPRINGFIELD GARDENS',
'COLLEGE POINT', 'BAYSIDE', 'GLEN OAKS', 'FAR ROCKAWAY',
'BELLEROSE', 'LITTLE NECK', 'CAMBRIA HEIGHTS', 'ROSEDALE',
'SUNNYSIDE', 'WHITESTONE', 'ARVERNE', 'FLORAL PARK',
'NEW HYDE PARK', 'CENTRAL PARK', 'BREEZY POINT', 'QUEENS',
'Astoria', 'Long Island City', 'Woodside', 'East Elmhurst',
'Howard Beach'], dtype=object)

In [61]:
df_groupby_city = CS_Dataset.groupby(['City','Complaint Type'])['Time Elapsed2'].mean()
df_groupby_location = CS_Dataset.groupby(['Location','Complaint Type'])['Time Elapsed2'].mean()
print(df_groupby_location)

Location Complaint Type


(40.49904035820963, -74.24392674874807) Illegal Parking 2571.0
(40.49913462101514, -74.24348482977875) Illegal Parking 2039.5
(40.49967332981336, -74.2379063249761) Derelict Vehicle 1951.0
(40.499823835142145, -74.24465233636964) Animal Abuse 529.0
(40.49994886080869, -74.23740031497493) Illegal Parking 39541.0
...
(40.91220586223159, -73.90075187169981) Illegal Parking 8705.0
(40.91234427543014, -73.902133732632) Noise - Street/Sidewalk 22717.0
(40.912827652752576, -73.90250567520697) Blocked Driveway 14830.0
(40.912868795316655, -73.90247305278565) Illegal Parking 46614.5
Noise - Vehicle 1732.0
Name: Time Elapsed2, Length: 178163, dtype: float64

In [62]:
# group by City
print(df_groupby_city)

City Complaint Type


ARVERNE Animal Abuse 8399.195652
Blocked Driveway 8318.840000
Derelict Vehicle 11394.000000
Disorderly Youth 12928.500000
Drinking 859.000000
...
Woodside Blocked Driveway 15566.185185
Derelict Vehicle 19994.500000
Illegal Parking 17293.459677
Noise - Commercial 8619.000000
Noise - Street/Sidewalk 12285.600000
Name: Time Elapsed2, Length: 777, dtype: float64
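
The grouped means above are in index order rather than sorted by closing time. A minimal sketch of sorting complaint types by their average within each city, on toy data (the values and the `avg_sorted` name are illustrative):

```python
import pandas as pd

# Toy stand-in with the columns used above.
toy = pd.DataFrame({
    "City": ["BRONX", "BRONX", "BRONX", "ASTORIA", "ASTORIA"],
    "Complaint Type": ["Traffic", "Vending", "Traffic", "Vending", "Traffic"],
    "Time Elapsed2": [100.0, 50.0, 300.0, 400.0, 20.0],
})

# Average closing time per (city, complaint type) pair.
avg = toy.groupby(["City", "Complaint Type"])["Time Elapsed2"].mean().reset_index()

# Sort by city first, then by ascending average closing time within each city.
avg_sorted = avg.sort_values(["City", "Time Elapsed2"]).reset_index(drop=True)
print(avg_sorted)
```

The same pattern should work with 'Location' in place of 'City'.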

5. See whether the average response time across different complaint types is similar (overall)
5.1 Visualize the average of Request_Closing_Time
In [63]:
# Resolution time according to complaint type
CS_Dataset.groupby('Complaint Type')['Time Elapsed2'].mean().sort_values()

Out[63]: Complaint Type


Posting Advertisement 7.286256e+03
Illegal Fireworks 1.011348e+04
Noise - Commercial 1.108576e+04
Noise - House of Worship 1.139109e+04
Noise - Park 1.222606e+04
Noise - Street/Sidewalk 1.223130e+04
Traffic 1.230912e+04
Disorderly Youth 1.236375e+04
Noise - Vehicle 1.256180e+04
Urinating in Public 1.295929e+04
Bike/Roller/Skate Chronic 1.312369e+04
Drinking 1.382130e+04
Vending 1.436628e+04
Squeegee 1.456025e+04
Homeless Encampment 1.545138e+04
Illegal Parking 1.565044e+04
Panhandling 1.585355e+04
Blocked Driveway 1.623252e+04
Animal Abuse 1.803256e+04
Agency Issues 1.828912e+04
Graffiti 2.327634e+04
Derelict Vehicle 2.535960e+04
Animal in a Park 1.212634e+06

Average response time by complaint type


In [64]:
df_mrt=CS_Dataset.groupby('Complaint Type')['Time Elapsed2'].mean().fillna(0).to_frame()
df_mrt

Out[64]: Time Elapsed2

Complaint Type

Agency Issues 1.828912e+04

Animal Abuse 1.803256e+04

Animal in a Park 1.212634e+06

Bike/Roller/Skate Chronic 1.312369e+04

Blocked Driveway 1.623252e+04

Derelict Vehicle 2.535960e+04

Disorderly Youth 1.236375e+04

Drinking 1.382130e+04

Graffiti 2.327634e+04

Homeless Encampment 1.545138e+04

Illegal Fireworks 1.011348e+04

Illegal Parking 1.565044e+04

Noise - Commercial 1.108576e+04

Noise - House of Worship 1.139109e+04

Noise - Park 1.222606e+04

Noise - Street/Sidewalk 1.223130e+04

Noise - Vehicle 1.256180e+04

Panhandling 1.585355e+04

Posting Advertisement 7.286256e+03

Squeegee 1.456025e+04

Traffic 1.230912e+04

Urinating in Public 1.295929e+04

Vending 1.436628e+04

In [65]:
df_mrt.plot(kind='bar', figsize=(15, 10))
plt.title('Average Response Time by Complaint Type')
plt.xlabel('Complaint Type')
plt.ylabel('Average Response Time in seconds')
plt.show()
In [66]:
# another option
plt.figure(figsize=(15,8))
plt.xticks(rotation = 45)
sn.boxplot(data=CS_Dataset, x="Time Elapsed2", y="Complaint Type")

Out[66]: <AxesSubplot:xlabel='Time Elapsed2', ylabel='Complaint Type'>

Here the single 'Animal in a Park' record has a large impact on the result


In [67]:
animal=CS_Dataset[CS_Dataset['Complaint Type']=='Animal in a Park']
animal

Out[67]: Bridge Garage Ferry


Unique Created Closed Agency Complaint Location Incident Incident Road Ferry
Agency Descriptor ... Highway Lot Terminal Latitude
Key Date Date Name Type Type Zip Address Ramp Direction
Segment Name Name

New York
2015-04-18 2015-05-02 Animal in Animal
281061 30427220 NYPD City Police Park NaN NaN ... NaN NaN NaN NaN NaN
09:44:55 10:35:29 a Park Waste
Department

1 rows × 55 columns
So we remove this record
In [68]:
CS_Dataset.drop(labels=281061, axis=0, inplace=True)
animal=CS_Dataset[CS_Dataset['Complaint Type']=='Animal in a Park']
animal

Out[68]: Bridge Garage Ferry


Unique Created Closed Agency Complaint Location Incident Incident Road Ferry
Agency Descriptor ... Highway Lot Terminal Latitude Longitude Location
Key Date Date Name Type Type Zip Address Ramp Direction
Segment Name Name

0 rows × 55 columns
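
Dropping by the hard-coded index label 281061 works only while that label exists. A sketch of an alternative that filters on the complaint type instead, using toy data (the frame and the `filtered` name are illustrative):

```python
import pandas as pd

# Toy stand-in that keeps the same index label for the outlier row.
toy = pd.DataFrame(
    {
        "Complaint Type": ["Traffic", "Animal in a Park", "Vending"],
        "Time Elapsed2": [100.0, 1212634.0, 50.0],
    },
    index=[10, 281061, 20],
)

# Keep every row whose complaint type is not the outlier category.
filtered = toy[toy["Complaint Type"] != "Animal in a Park"]
print(filtered)
```

Unlike `drop(labels=...)`, this can be re-run without raising a KeyError once the row is gone.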

Then plot again


In [69]:
df_mrt=CS_Dataset.groupby('Complaint Type')['Time Elapsed2'].mean().fillna(0).to_frame()
df_mrt

Out[69]: Time Elapsed2

Complaint Type

Agency Issues 18289.125000

Animal Abuse 18032.556030

Bike/Roller/Skate Chronic 13123.688421

Blocked Driveway 16232.521516

Derelict Vehicle 25359.600102

Disorderly Youth 12363.749206

Drinking 13821.300570

Graffiti 23276.343949

Homeless Encampment 15451.384505

Illegal Fireworks 10113.482558

Illegal Parking 15650.435671

Noise - Commercial 11085.760531

Noise - House of Worship 11391.087079

Noise - Park 12226.055515

Noise - Street/Sidewalk 12231.295411

Noise - Vehicle 12561.800010

Panhandling 15853.550769

Posting Advertisement 7286.256259

Squeegee 14560.250000

Traffic 12309.120092

Urinating in Public 12959.293292

Vending 14366.278375

In [70]:
df_mrt.plot(kind='bar', figsize=(15, 10))
plt.title('Average Response Time by Complaint Type')
plt.xlabel('Complaint Type')
plt.ylabel('Average Response Time in seconds')
plt.show()
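
The values in Time Elapsed2 appear to be in seconds. The 'Animal in a Park' record shown earlier runs from 2015-04-18 09:44:55 to 2015-05-02 10:35:29, which is 1,212,634 seconds and matches the mean of 1.212634e+06 reported for that type. The sketch below checks that arithmetic and converts a few of the averages to hours for easier reading (`avg_seconds` and `avg_hours` are hypothetical names):

```python
import pandas as pd

# Timestamps copied from the 'Animal in a Park' record.
created = pd.Timestamp("2015-04-18 09:44:55")
closed = pd.Timestamp("2015-05-02 10:35:29")

# Elapsed time in seconds between creation and closing.
elapsed_seconds = (closed - created).total_seconds()
print(elapsed_seconds)

# A few averages from the table above, converted from seconds to hours.
avg_seconds = pd.Series({
    "Posting Advertisement": 7286.256259,
    "Blocked Driveway": 16232.521516,
    "Derelict Vehicle": 25359.600102,
})
avg_hours = (avg_seconds / 3600).round(2)
print(avg_hours)
```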
6. Identify the significant variables by performing statistical analysis using p-values
In [71]:
print(CS_Dataset.columns)
print(CS_Dataset.info())

Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',


'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location',
'Time Elapsed1', 'Time Elapsed2'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
Int64Index: 362176 entries, 0 to 362176
Data columns (total 55 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 362176 non-null int64
1 Created Date 362176 non-null object
2 Closed Date 362176 non-null object
3 Agency 362176 non-null object
4 Agency Name 362176 non-null object
5 Complaint Type 362176 non-null object
6 Descriptor 355680 non-null object
7 Location Type 362046 non-null object
8 Incident Zip 361502 non-null float64
9 Incident Address 310491 non-null object
10 Street Name 310491 non-null object
11 Cross Street 1 306846 non-null object
12 Cross Street 2 306713 non-null object
13 Intersection Street 1 50628 non-null object
14 Intersection Street 2 50504 non-null object
15 Address Type 361248 non-null object
16 City 361502 non-null object
17 Landmark 375 non-null object
18 Facility Type 362159 non-null object
19 Status 362176 non-null object
20 Due Date 362175 non-null object
21 Resolution Description 362176 non-null object
22 Resolution Action Updated Date 362137 non-null object
23 Community Board 362176 non-null object
24 Borough 362176 non-null object
25 X Coordinate (State Plane) 360470 non-null float64
26 Y Coordinate (State Plane) 360470 non-null float64
27 Park Facility Name 362176 non-null object
28 Park Borough 362176 non-null object
29 School Name 362176 non-null object
30 School Number 362176 non-null object
31 School Region 362176 non-null object
32 School Code 362176 non-null object
33 School Phone Number 362176 non-null object
34 School Address 362176 non-null object
35 School City 362176 non-null object
36 School State 362176 non-null object
37 School Zip 362176 non-null object
38 School Not Found 362176 non-null object
39 School or Citywide Complaint 0 non-null float64
40 Vehicle Type 0 non-null float64
41 Taxi Company Borough 0 non-null float64
42 Taxi Pick Up Location 0 non-null float64
43 Bridge Highway Name 297 non-null object
44 Bridge Highway Direction 297 non-null object
45 Road Ramp 262 non-null object
46 Bridge Highway Segment 262 non-null object
47 Garage Lot Name 0 non-null float64
48 Ferry Direction 0 non-null float64
49 Ferry Terminal Name 0 non-null float64
50 Latitude 360470 non-null float64
51 Longitude 360470 non-null float64
52 Location 360470 non-null object
53 Time Elapsed1 362176 non-null float64
54 Time Elapsed2 362176 non-null float64
dtypes: float64(14), int64(1), object(40)
memory usage: 154.7+ MB
None
In [72]:
CS_Dataset['Ferry Direction'].head()

Out[72]: 0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
Name: Ferry Direction, dtype: float64

In [73]:
# Drop the columns that contain no non-null values
empty_columns = ['Ferry Direction', 'Ferry Terminal Name', 'Garage Lot Name',
                 'School or Citywide Complaint', 'Vehicle Type',
                 'Taxi Company Borough', 'Taxi Pick Up Location']
CS_Dataset = CS_Dataset.drop(columns=empty_columns)
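
The empty columns dropped above were picked out by reading the info() listing. As a possible alternative, pandas can locate fully empty columns itself. The sketch below is only an illustration on a small made-up frame: `toy`, its column values, and the variable names `all_null` and `cleaned` are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for CS_Dataset; the values are illustrative only.
toy = pd.DataFrame({
    "Unique Key": [1, 2, 3],
    "Ferry Direction": [np.nan, np.nan, np.nan],   # entirely empty
    "Landmark": [np.nan, "PARK", np.nan],          # sparse but not empty
})

# Names of the columns in which every value is null
all_null = toy.columns[toy.isna().all()].tolist()

# how="all" removes only the columns that hold no data at all
cleaned = toy.dropna(axis=1, how="all")

print(all_null)
print(cleaned.columns.tolist())
```

Sparse columns such as Landmark survive this step, which matches the manual choice made in the notebook.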

In [75]:
print(CS_Dataset.columns)
print(CS_Dataset.info())

Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'Bridge Highway Name',
'Bridge Highway Direction', 'Road Ramp', 'Bridge Highway Segment',
'Latitude', 'Longitude', 'Location', 'Time Elapsed1', 'Time Elapsed2'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
Int64Index: 362176 entries, 0 to 362176
Data columns (total 48 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 362176 non-null int64
1 Created Date 362176 non-null object
2 Closed Date 362176 non-null object
3 Agency 362176 non-null object
4 Agency Name 362176 non-null object
5 Complaint Type 362176 non-null object
6 Descriptor 355680 non-null object
7 Location Type 362046 non-null object
8 Incident Zip 361502 non-null float64
9 Incident Address 310491 non-null object
10 Street Name 310491 non-null object
11 Cross Street 1 306846 non-null object
12 Cross Street 2 306713 non-null object
13 Intersection Street 1 50628 non-null object
14 Intersection Street 2 50504 non-null object
15 Address Type 361248 non-null object
16 City 361502 non-null object
17 Landmark 375 non-null object
18 Facility Type 362159 non-null object
19 Status 362176 non-null object
20 Due Date 362175 non-null object
21 Resolution Description 362176 non-null object
22 Resolution Action Updated Date 362137 non-null object
23 Community Board 362176 non-null object
24 Borough 362176 non-null object
25 X Coordinate (State Plane) 360470 non-null float64
26 Y Coordinate (State Plane) 360470 non-null float64
27 Park Facility Name 362176 non-null object
28 Park Borough 362176 non-null object
29 School Name 362176 non-null object
30 School Number 362176 non-null object
31 School Region 362176 non-null object
32 School Code 362176 non-null object
33 School Phone Number 362176 non-null object
34 School Address 362176 non-null object
35 School City 362176 non-null object
36 School State 362176 non-null object
37 School Zip 362176 non-null object
38 School Not Found 362176 non-null object
39 Bridge Highway Name 297 non-null object
40 Bridge Highway Direction 297 non-null object
41 Road Ramp 262 non-null object
42 Bridge Highway Segment 262 non-null object
43 Latitude 360470 non-null float64
44 Longitude 360470 non-null float64
45 Location 360470 non-null object
46 Time Elapsed1 362176 non-null float64
47 Time Elapsed2 362176 non-null float64
dtypes: float64(7), int64(1), object(40)
memory usage: 135.4+ MB
None
In [76]:
#Find columns with same values
columns_with_same_values = CS_Dataset.columns[CS_Dataset.nunique() == 1]
print(columns_with_same_values)

Index(['Agency', 'Facility Type', 'Park Facility Name', 'School Name',
'School Number', 'School Region', 'School Code', 'School Phone Number',
'School Address', 'School City', 'School State', 'School Zip',
'School Not Found'],
dtype='object')

In [77]:
# Drop every column that holds a single repeated value
CS_Dataset = CS_Dataset.drop(columns=columns_with_same_values)
CS_Dataset.columns

Out[77]: Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Status', 'Due Date', 'Resolution Description',
'Resolution Action Updated Date', 'Community Board', 'Borough',
'X Coordinate (State Plane)', 'Y Coordinate (State Plane)',
'Park Borough', 'Bridge Highway Name', 'Bridge Highway Direction',
'Road Ramp', 'Bridge Highway Segment', 'Latitude', 'Longitude',
'Location', 'Time Elapsed1', 'Time Elapsed2'],
dtype='object')
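
One caveat about the nunique() == 1 filter: by default nunique() ignores missing values, so a column with one distinct value plus some nulls is also reported as constant. Facility Type, for example, has 362159 non-null values out of 362176 rows and was still dropped. The sketch below shows the difference on a made-up frame; `toy` and both result names are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative data only, not taken from the 311 dataset
toy = pd.DataFrame({
    "Agency": ["NYPD", "NYPD", "NYPD"],        # truly constant
    "Landmark": ["PARK", np.nan, np.nan],      # one value plus missing entries
    "Borough": ["BRONX", "QUEENS", "BRONX"],   # varies
})

# Default behaviour: NaN is ignored, so "Landmark" also looks constant
constant_default = toy.columns[toy.nunique() == 1].tolist()

# Counting NaN as its own value keeps "Landmark" out of the list
constant_strict = toy.columns[toy.nunique(dropna=False) == 1].tolist()

print(constant_default)
print(constant_strict)
```

Which variant is appropriate depends on whether the pattern of missing values is itself considered informative.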

In [78]:
print(CS_Dataset.columns)
print(CS_Dataset.info())

Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Status', 'Due Date', 'Resolution Description',
'Resolution Action Updated Date', 'Community Board', 'Borough',
'X Coordinate (State Plane)', 'Y Coordinate (State Plane)',
'Park Borough', 'Bridge Highway Name', 'Bridge Highway Direction',
'Road Ramp', 'Bridge Highway Segment', 'Latitude', 'Longitude',
'Location', 'Time Elapsed1', 'Time Elapsed2'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
Int64Index: 362176 entries, 0 to 362176
Data columns (total 35 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 362176 non-null int64
1 Created Date 362176 non-null object
2 Closed Date 362176 non-null object
3 Agency Name 362176 non-null object
4 Complaint Type 362176 non-null object
5 Descriptor 355680 non-null object
6 Location Type 362046 non-null object
7 Incident Zip 361502 non-null float64
8 Incident Address 310491 non-null object
9 Street Name 310491 non-null object
10 Cross Street 1 306846 non-null object
11 Cross Street 2 306713 non-null object
12 Intersection Street 1 50628 non-null object
13 Intersection Street 2 50504 non-null object
14 Address Type 361248 non-null object
15 City 361502 non-null object
16 Landmark 375 non-null object
17 Status 362176 non-null object
18 Due Date 362175 non-null object
19 Resolution Description 362176 non-null object
20 Resolution Action Updated Date 362137 non-null object
21 Community Board 362176 non-null object
22 Borough 362176 non-null object
23 X Coordinate (State Plane) 360470 non-null float64
24 Y Coordinate (State Plane) 360470 non-null float64
25 Park Borough 362176 non-null object
26 Bridge Highway Name 297 non-null object
27 Bridge Highway Direction 297 non-null object
28 Road Ramp 262 non-null object
29 Bridge Highway Segment 262 non-null object
30 Latitude 360470 non-null float64
31 Longitude 360470 non-null float64
32 Location 360470 non-null object
33 Time Elapsed1 362176 non-null float64
34 Time Elapsed2 362176 non-null float64
dtypes: float64(7), int64(1), object(27)
memory usage: 99.5+ MB
None
In [79]:
CS_Dataset.to_csv('Customer _Service_Requests1.csv',index=False)

In [80]:
CS_Dataset=pd.read_csv("Customer _Service_Requests1.csv")

C:\Users\hp\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3165: DtypeWarning: Columns (16,26,27,28,29) have mixed types.Specify dtype option on import or set low_memory=False.
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
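
The warning refers to columns 16 and 26 to 29, which in the listing above are Landmark and the four bridge/highway columns: mostly empty text columns whose type pandas guesses chunk by chunk. One way to silence it, sketched here under assumptions, is to declare those columns as strings when reading. The CSV text below is made up and `text_columns` is a hypothetical name.

```python
import io
import pandas as pd

# Tiny CSV standing in for the exported file; the contents are invented.
csv_text = "Unique Key,Landmark,Road Ramp\n1,,Ramp\n2,PARK,\n3,,\n"

# Declaring the sparse text columns as str avoids per-chunk type guessing
text_columns = {"Landmark": str, "Road Ramp": str}
df = pd.read_csv(io.StringIO(csv_text), dtype=text_columns, low_memory=False)

print(df.dtypes)
```

Either option on its own (dtype or low_memory=False) is what the warning message suggests; both are shown together only for illustration.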

In [81]:
print(CS_Dataset.columns)
print(CS_Dataset.info())

Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Status', 'Due Date', 'Resolution Description',
'Resolution Action Updated Date', 'Community Board', 'Borough',
'X Coordinate (State Plane)', 'Y Coordinate (State Plane)',
'Park Borough', 'Bridge Highway Name', 'Bridge Highway Direction',
'Road Ramp', 'Bridge Highway Segment', 'Latitude', 'Longitude',
'Location', 'Time Elapsed1', 'Time Elapsed2'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 362176 entries, 0 to 362175
Data columns (total 35 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 362176 non-null int64
1 Created Date 362176 non-null object
2 Closed Date 362176 non-null object
3 Agency Name 362176 non-null object
4 Complaint Type 362176 non-null object
5 Descriptor 355680 non-null object
6 Location Type 362046 non-null object
7 Incident Zip 361502 non-null float64
8 Incident Address 310491 non-null object
9 Street Name 310491 non-null object
10 Cross Street 1 306846 non-null object
11 Cross Street 2 306713 non-null object
12 Intersection Street 1 50628 non-null object
13 Intersection Street 2 50504 non-null object
14 Address Type 361248 non-null object
15 City 361502 non-null object
16 Landmark 375 non-null object
17 Status 362176 non-null object
18 Due Date 362175 non-null object
19 Resolution Description 362176 non-null object
20 Resolution Action Updated Date 362137 non-null object
21 Community Board 362176 non-null object
22 Borough 362176 non-null object
23 X Coordinate (State Plane) 360470 non-null float64
24 Y Coordinate (State Plane) 360470 non-null float64
25 Park Borough 362176 non-null object
26 Bridge Highway Name 297 non-null object
27 Bridge Highway Direction 297 non-null object
28 Road Ramp 262 non-null object
29 Bridge Highway Segment 262 non-null object
30 Latitude 360470 non-null float64
31 Longitude 360470 non-null float64
32 Location 360470 non-null object
33 Time Elapsed1 362176 non-null float64
34 Time Elapsed2 362176 non-null float64
dtypes: float64(7), int64(1), object(27)
memory usage: 96.7+ MB
None

Complaint Type vs Response time


In [82]:
# One histogram of response times per complaint type
for t in CS_Dataset['Complaint Type'].unique():
    CS_Dataset[CS_Dataset['Complaint Type'] == t]['Time Elapsed2'].hist(range=(0, 5000))
    plt.title('Response Time by Complaint Type: ' + t)
    plt.xlabel('Response Time in seconds')
    plt.ylabel('Number of requests')
    plt.show()
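
The per-type histograms can be complemented by a numeric summary, which is easier to compare across many complaint types. This is a minimal sketch on an invented frame; `toy` and `summary` are hypothetical names and the durations are illustrative.

```python
import pandas as pd

# Illustrative response times in seconds
toy = pd.DataFrame({
    "Complaint Type": ["Blocked Driveway", "Blocked Driveway",
                       "Illegal Parking", "Illegal Parking", "Illegal Parking"],
    "Time Elapsed2": [5233.0, 17494.0, 27927.0, 12464.0, 6821.0],
})

# Count, median and mean response time per complaint type, fastest first
summary = (toy.groupby("Complaint Type")["Time Elapsed2"]
              .agg(["count", "median", "mean"])
              .sort_values("median"))

print(summary)
```

The median is included because response times are strongly right-skewed, so the mean alone can be misleading.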
In [83]:
# Apply a log transformation so that each group's distribution is closer to Gaussian

df_ct = {}
for t in CS_Dataset['Complaint Type'].unique():
    df_ct[t] = np.log(CS_Dataset[CS_Dataset['Complaint Type'] == t]['Time Elapsed2'])
df_ct

Out[83]: {'Noise - Street/Sidewalk': 0 8.110728
12 9.104535
19 7.957177
38 7.477604
54 8.596928
...
362161 9.278186
362165 7.809541
362169 7.722678
362170 8.439232
362173 7.041412
Name: Time Elapsed2, Length: 51139, dtype: float64,
'Blocked Driveway': 1 8.562740
2 9.769613
7 8.784009
9 8.521584
10 10.244236
...
362166 9.130106
362167 8.338306
362168 9.976506
362174 9.175024
362175 9.212338
Name: Time Elapsed2, Length: 100624, dtype: float64,
'Illegal Parking': 3 10.237349
4 9.430600
5 8.827761
6 8.868132
8 10.335724
...
362103 9.689056
362116 10.335530
362122 8.093157
362148 8.932741
362171 10.520482
Name: Time Elapsed2, Length: 91716, dtype: float64,
'Derelict Vehicle': 14 10.539667
151 9.562686
255 8.499640
256 9.607706
295 7.905442
...
361859 10.368133
361879 9.545812
361900 10.846849
361946 9.070618
362095 8.919587
Name: Time Elapsed2, Length: 21518, dtype: float64,
'Noise - Commercial': 17 8.032035
18 9.267571
22 8.433377
29 9.106423
30 8.876684
...
362144 6.480045
362146 9.551658
362152 10.217751
362156 7.097549
362162 9.325453
Name: Time Elapsed2, Length: 43751, dtype: float64,
'Noise - House of Worship': 26 7.383989
127 8.693161
572 8.419801
2639 8.149024
3057 9.076466
...
359937 8.159089
360351 9.506139
361093 7.627544
361539 8.231376
361586 8.889446
Name: Time Elapsed2, Length: 1068, dtype: float64,
'Posting Advertisement': 39 8.941545
42 8.961623
46 8.973732
49 8.993055
51 9.005037
...
348685 7.629490
349016 7.188413
355089 9.321703
355813 8.480114
357309 7.775696
Name: Time Elapsed2, Length: 679, dtype: float64,
'Noise - Vehicle': 87 10.042989
156 8.889308
172 9.334238
221 9.299907
319 7.869019
...
361902 9.424161
361986 8.010028
361987 8.889033
362113 9.702411
362172 9.040026
Name: Time Elapsed2, Length: 19301, dtype: float64,
'Animal Abuse': 89 7.358831
140 8.344267
164 8.999002
189 9.166179
247 8.357494
...
361916 10.053587
361941 9.554639
361959 10.150621
362024 9.335739
362063 7.948385
Name: Time Elapsed2, Length: 10530, dtype: float64,
'Vending': 98 9.340228
142 9.919213
341 10.031001
375 8.921458
393 7.556951
...
361818 8.562549
361821 8.611230
361822 7.335634
361856 9.136909
361931 8.718009
Name: Time Elapsed2, Length: 4185, dtype: float64,
'Traffic': 130 9.685518
311 9.065661
334 7.737180
336 7.844633
337 9.091444
...
361066 9.526683
361133 7.950502
361172 8.676076
361669 10.034121
361674 9.814110
Name: Time Elapsed2, Length: 5196, dtype: float64,
'Drinking': 180 7.936303
466 7.544861
644 9.455402
679 6.642487
796 9.373649
...
361618 7.939515
361657 8.677440
361731 9.782223
361978 8.814628
362020 6.612041
Name: Time Elapsed2, Length: 1404, dtype: float64,
'Bike/Roller/Skate Chronic': 313 9.444147
1843 8.537192
3771 10.296138
3837 8.637817
6296 7.650169
...
353797 9.111514
353883 8.483223
358480 10.610316
360463 8.520787
361223 7.004882
Name: Time Elapsed2, Length: 475, dtype: float64,
'Panhandling': 374 9.451795
2070 9.501741
3036 8.731498
3913 8.537584
5506 10.401137
...
353289 9.553788
353680 10.699620
355894 9.054972
355961 8.417152
360033 10.706520
Name: Time Elapsed2, Length: 325, dtype: float64,
'Noise - Park': 389 6.259581
592 7.691657
1355 10.698650
3151 11.027833
3818 8.833317
...
355354 10.357616
360857 7.234898
360886 7.150701
360892 7.149132
361942 9.274348
Name: Time Elapsed2, Length: 4089, dtype: float64,
'Homeless Encampment': 392 7.895808
434 9.747418
458 8.978787
562 8.733272
565 9.342070
...
361334 10.170189
361404 7.446001
361654 8.111928
361793 9.660333
361972 9.258273
Name: Time Elapsed2, Length: 4879, dtype: float64,
'Urinating in Public': 585 9.949034
652 8.065265
1148 9.785605
4757 8.980550
6057 8.679142
...
359137 7.478735
360513 8.077758
360603 10.140179
361042 9.118992
361707 6.697034
Name: Time Elapsed2, Length: 641, dtype: float64,
'Graffiti': 2314 8.656433
6278 10.036094
9391 9.700147
23225 9.107865
25588 9.981143
...
354684 9.429637
359177 8.313117
359735 9.194821
360556 9.249753
360868 11.137039
Name: Time Elapsed2, Length: 157, dtype: float64,
'Disorderly Youth': 4651 6.569481
8984 8.434898
11959 7.760041
12107 9.873801
17080 8.831858
...
347643 8.983942
347667 7.387090
348140 8.664406
361071 9.825472
362140 7.900266
Name: Time Elapsed2, Length: 315, dtype: float64,
'Illegal Fireworks': 23390 7.039660
34023 10.682881
37725 8.586719
62401 7.298445
64404 9.367173
...
276854 7.443664
303944 8.970559
315108 9.393162
332779 10.740584
362154 8.109225
Name: Time Elapsed2, Length: 172, dtype: float64,
'Agency Issues': 184636 10.206920
186456 10.528918
205740 9.183586
238330 10.113992
244503 8.312135
277038 9.271247
300720 10.335854
320102 7.550135
Name: Time Elapsed2, dtype: float64,
'Squeegee': 187686 9.997752
213914 10.103649
280566 8.353497
295549 8.934192
Name: Time Elapsed2, dtype: float64}
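
The same dictionary of log-transformed series can be built in one pass with groupby. The sketch below also filters out non-positive durations first, since np.log(0) evaluates to -inf; whether the real data contains such rows is not shown in the notebook, so the zero in the toy data is purely hypothetical, as are the names `toy`, `positive` and `log_by_type`.

```python
import numpy as np
import pandas as pd

# Illustrative data; one zero is included on purpose
toy = pd.DataFrame({
    "Complaint Type": ["Squeegee", "Squeegee", "Graffiti", "Graffiti"],
    "Time Elapsed2": [22000.0, 0.0, 5750.0, 22840.0],
})

# Keep strictly positive durations, because log(0) is -inf
positive = toy[toy["Time Elapsed2"] > 0]

# One pass over the groups instead of one boolean filter per complaint type
log_by_type = {
    name: np.log(group)
    for name, group in positive.groupby("Complaint Type")["Time Elapsed2"]
}

print(log_by_type)
```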
In [84]:
# ANOVA analysis
# Null hypothesis: the average response time does not differ across complaint types
# Alternate hypothesis: the average response time differs for at least one complaint type

from scipy.stats import f_oneway

# Collect the log-transformed series, one per complaint type
lis = []
for t in CS_Dataset['Complaint Type'].unique():
    lis.append(df_ct[t])

lis

Out[84]: [0 8.110728
12 9.104535
19 7.957177
38 7.477604
54 8.596928
...
362161 9.278186
362165 7.809541
362169 7.722678
362170 8.439232
362173 7.041412
Name: Time Elapsed2, Length: 51139, dtype: float64,
1 8.562740
2 9.769613
7 8.784009
9 8.521584
10 10.244236
...
362166 9.130106
362167 8.338306
362168 9.976506
362174 9.175024
362175 9.212338
Name: Time Elapsed2, Length: 100624, dtype: float64,
3 10.237349
4 9.430600
5 8.827761
6 8.868132
8 10.335724
...
362103 9.689056
362116 10.335530
362122 8.093157
362148 8.932741
362171 10.520482
Name: Time Elapsed2, Length: 91716, dtype: float64,
14 10.539667
151 9.562686
255 8.499640
256 9.607706
295 7.905442
...
361859 10.368133
361879 9.545812
361900 10.846849
361946 9.070618
362095 8.919587
Name: Time Elapsed2, Length: 21518, dtype: float64,
17 8.032035
18 9.267571
22 8.433377
29 9.106423
30 8.876684
...
362144 6.480045
362146 9.551658
362152 10.217751
362156 7.097549
362162 9.325453
Name: Time Elapsed2, Length: 43751, dtype: float64,
26 7.383989
127 8.693161
572 8.419801
2639 8.149024
3057 9.076466
...
359937 8.159089
360351 9.506139
361093 7.627544
361539 8.231376
361586 8.889446
Name: Time Elapsed2, Length: 1068, dtype: float64,
39 8.941545
42 8.961623
46 8.973732
49 8.993055
51 9.005037
...
348685 7.629490
349016 7.188413
355089 9.321703
355813 8.480114
357309 7.775696
Name: Time Elapsed2, Length: 679, dtype: float64,
87 10.042989
156 8.889308
172 9.334238
221 9.299907
319 7.869019
...
361902 9.424161
361986 8.010028
361987 8.889033
362113 9.702411
362172 9.040026
Name: Time Elapsed2, Length: 19301, dtype: float64,
89 7.358831
140 8.344267
164 8.999002
189 9.166179
247 8.357494
...
361916 10.053587
361941 9.554639
361959 10.150621
362024 9.335739
362063 7.948385
Name: Time Elapsed2, Length: 10530, dtype: float64,
98 9.340228
142 9.919213
341 10.031001
375 8.921458
393 7.556951
...
361818 8.562549
361821 8.611230
361822 7.335634
361856 9.136909
361931 8.718009
Name: Time Elapsed2, Length: 4185, dtype: float64,
130 9.685518
311 9.065661
334 7.737180
336 7.844633
337 9.091444
...
361066 9.526683
361133 7.950502
361172 8.676076
361669 10.034121
361674 9.814110
Name: Time Elapsed2, Length: 5196, dtype: float64,
180 7.936303
466 7.544861
644 9.455402
679 6.642487
796 9.373649
...
361618 7.939515
361657 8.677440
361731 9.782223
361978 8.814628
362020 6.612041
Name: Time Elapsed2, Length: 1404, dtype: float64,
313 9.444147
1843 8.537192
3771 10.296138
3837 8.637817
6296 7.650169
...
353797 9.111514
353883 8.483223
358480 10.610316
360463 8.520787
361223 7.004882
Name: Time Elapsed2, Length: 475, dtype: float64,
374 9.451795
2070 9.501741
3036 8.731498
3913 8.537584
5506 10.401137
...
353289 9.553788
353680 10.699620
355894 9.054972
355961 8.417152
360033 10.706520
Name: Time Elapsed2, Length: 325, dtype: float64,
389 6.259581
592 7.691657
1355 10.698650
3151 11.027833
3818 8.833317
...
355354 10.357616
360857 7.234898
360886 7.150701
360892 7.149132
361942 9.274348
Name: Time Elapsed2, Length: 4089, dtype: float64,
392 7.895808
434 9.747418
458 8.978787
562 8.733272
565 9.342070
...
361334 10.170189
361404 7.446001
361654 8.111928
361793 9.660333
361972 9.258273
Name: Time Elapsed2, Length: 4879, dtype: float64,
585 9.949034
652 8.065265
1148 9.785605
4757 8.980550
6057 8.679142
...
359137 7.478735
360513 8.077758
360603 10.140179
361042 9.118992
361707 6.697034
Name: Time Elapsed2, Length: 641, dtype: float64,
2314 8.656433
6278 10.036094
9391 9.700147
23225 9.107865
25588 9.981143
...
354684 9.429637
359177 8.313117
359735 9.194821
360556 9.249753
360868 11.137039
Name: Time Elapsed2, Length: 157, dtype: float64,
4651 6.569481
8984 8.434898
11959 7.760041
12107 9.873801
17080 8.831858
...
347643 8.983942
347667 7.387090
348140 8.664406
361071 9.825472
362140 7.900266
Name: Time Elapsed2, Length: 315, dtype: float64,
23390 7.039660
34023 10.682881
37725 8.586719
62401 7.298445
64404 9.367173
...
276854 7.443664
303944 8.970559
315108 9.393162
332779 10.740584
362154 8.109225
Name: Time Elapsed2, Length: 172, dtype: float64,
184636 10.206920
186456 10.528918
205740 9.183586
238330 10.113992
244503 8.312135
277038 9.271247
300720 10.335854
320102 7.550135
Name: Time Elapsed2, dtype: float64,
187686 9.997752
213914 10.103649
280566 8.353497
295549 8.934192
Name: Time Elapsed2, dtype: float64]
In [85]:
F, p = f_oneway(*lis)
print("p-value for significance is: ", p)
if p < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

p-value for significance is: 0.0
reject null hypothesis

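The null hypothesis is rejected, so the data support the alternate hypothesis:

the average response time differs across complaint types.

The printed p-value of 0.0 means the value is too small to be represented in floating point, not that it is exactly zero.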
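
To make the decision rule concrete, here is a small self-contained run of the same one-way ANOVA on synthetic data. The three groups, their means and the seed are assumptions chosen for illustration and are unrelated to the 311 dataset.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Three synthetic groups of log response times; the third has a shifted mean
group_a = rng.normal(loc=8.5, scale=1.0, size=200)
group_b = rng.normal(loc=8.5, scale=1.0, size=200)
group_c = rng.normal(loc=9.5, scale=1.0, size=200)

f_stat, p_value = f_oneway(group_a, group_b, group_c)

# A small p-value is evidence against equal group means
if p_value < 0.05:
    decision = "reject null hypothesis"
else:
    decision = "fail to reject null hypothesis"

print(f_stat, p_value, decision)
```

Note that ANOVA only says that at least one group mean differs; it does not identify which complaint types differ from which.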

Let's check for any correlation between Complaint Type and Location
In [86]:
df_loc= CS_Dataset[['Complaint Type','Location','City','Borough']]

ccolumns = df_loc.describe(include="O").columns
ccolumns.tolist()

Out[86]: ['Complaint Type', 'Location', 'City', 'Borough']

In [87]:
# Label encoding: replace each category with its integer code
for col in ccolumns:
    df_loc[col] = df_loc[col].astype("category").cat.codes

df_loc

<ipython-input-87-910586728146>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


df_loc[col] = df_loc[col].astype("category").cat.codes
Out[87]:
        Complaint Type  Location  City  Borough
0                   14    139904    33        2
1                    3    114140     1        3
2                    3    141251     6        0
3                   10    130296     6        0
4                   10     88722    13        3
...                ...       ...   ...      ...
362171              10     63918    50        3
362172              15    140565     6        0
362173              14    125130    33        2
362174               3    144566     6        0
362175               3     45957    44        3
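
The SettingWithCopyWarning above arises because df_loc is a slice of CS_Dataset. A common way to avoid it, sketched here on a made-up frame, is to take an explicit copy before assigning. The sketch also shows that cat.codes assigns -1 to missing values, which is relevant because Location and City contain nulls. The names `toy` and `subset` are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative data only
toy = pd.DataFrame({
    "Complaint Type": ["Noise - Park", "Blocked Driveway", "Noise - Park"],
    "Borough": ["QUEENS", "BRONX", np.nan],
    "Unique Key": [1, 2, 3],
})

# .copy() makes the subset an independent frame, so assigning into it is safe
subset = toy[["Complaint Type", "Borough"]].copy()

for col in subset.columns:
    # Codes follow the sorted category order; missing values become -1
    subset[col] = subset[col].astype("category").cat.codes

print(subset)
```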

In [88]:
cor=df_loc.corr(method='pearson')
cor

Out[88]:
                Complaint Type  Location      City   Borough
Complaint Type        1.000000  0.143785  0.102547 -0.056562
Location              0.143785  1.000000  0.051755 -0.156765
City                  0.102547  0.051755  1.000000  0.701269
Borough              -0.056562 -0.156765  0.701269  1.000000

In [89]:
# None of the coefficients is large enough to establish a correlation between Complaint Type and location

# stopped as it was taking forever


plt.figure(figsize=(10,6))
sn.heatmap(cor, annot=True, cmap = 'viridis')

Out[89]: <AxesSubplot:>
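
Pearson correlation on label-encoded columns depends on the arbitrary order of the integer codes, so a low coefficient does not by itself rule out an association between two nominal variables. A different technique, not used in the original notebook, is a chi-square test of independence on a contingency table, optionally summarized by Cramér's V. The sketch below uses synthetic data in which complaint type depends on borough by construction; all names and counts are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic data: complaint type depends on borough by construction
toy = pd.DataFrame({
    "Borough": ["BRONX"] * 50 + ["QUEENS"] * 50,
    "Complaint Type": (["Blocked Driveway"] * 40 + ["Noise - Park"] * 10
                       + ["Blocked Driveway"] * 10 + ["Noise - Park"] * 40),
})

# Contingency table of counts: rows are boroughs, columns are complaint types
table = pd.crosstab(toy["Borough"], toy["Complaint Type"])
chi2, p_value, dof, expected = chi2_contingency(table)

# Cramér's V rescales chi-square to the range [0, 1]
n = table.to_numpy().sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(p_value, cramers_v)
```

On the real data, Borough or City would be a more practical grouping than Location, which has well over a hundred thousand distinct values.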

7. Perform a Kruskal-Wallis H test


7.1 Fail to reject H0: All sample distributions are equal

7.2 Reject H0: One or more sample distributions are not equal
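
The two outcomes listed above can be sketched with scipy.stats.kruskal. Unlike ANOVA, this test works on ranks, so it can be applied to the raw, skewed response times without a log transformation. The groups below are synthetic log-normal samples with an assumed seed and are unrelated to the 311 data.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(1)

# Skewed synthetic response times in seconds; the third group is shifted upward
group_a = rng.lognormal(mean=8.5, sigma=1.0, size=300)
group_b = rng.lognormal(mean=8.5, sigma=1.0, size=300)
group_c = rng.lognormal(mean=9.5, sigma=1.0, size=300)

h_stat, p_value = kruskal(group_a, group_b, group_c)

if p_value < 0.05:
    verdict = "Reject H0: one or more sample distributions are not equal"
else:
    verdict = "Fail to reject H0: all sample distributions are equal"

print(h_stat, p_value, verdict)
```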
In [90]:
# Collect the raw response times of each complaint type as NumPy arrays
l = []
for t in CS_Dataset['Complaint Type'].unique():
    l.append(CS_Dataset[CS_Dataset['Complaint Type'] == t]['Time Elapsed2'].values)
l

Out[90]: [array([3330., 8996., 2856., ..., 2259., 4625., 1143.]),
array([ 5233., 17494., 6529., ..., 21515., 9653., 10020.]),
array([27927., 12464., 6821., ..., 3272., 7576., 37067.]),
array([37785., 14224., 4913., ..., 51372., 8696., 7477.]),
array([ 3078., 10589., 4598., ..., 27385., 1209., 11220.]),
array([1610., 5962., 4536., ..., 2054., 3757., 7255.]),
array([ 7643., 7798., 7893., 8047., 8144., 3931., 4096., 4190.,
4344., 4439., 755., 810., 868., 834., 869., 7175.,
4796., 4895., 4935., 5029., 5118., 4437., 4458., 6201.,
6289., 6316., 3084., 3450., 3546., 3649., 3503., 5046.,
5141., 5164., 5673., 5759., 3026., 2955., 2792., 2683.,
2636., 2237., 2336., 2428., 2452., 2570., 19521., 18746.,
1583., 3970., 4148., 4317., 5026., 5182., 8607., 21517.,
7622., 1460., 1612., 475., 255., 402., 2208., 609.,
759., 932., 600., 1022., 1900., 1998., 2108., 2227.,
2263., 1588., 1901., 2025., 2110., 2215., 384., 293.,
512., 808., 922., 611., 704., 864., 975., 1072.,
1461., 1625., 1801., 2002., 2173., 3824., 2182., 3154.,
2561., 2648., 3149., 1593., 2499., 2578., 2659., 3174.,
1954., 2059., 2180., 2347., 2366., 22038., 14141., 14240.,
14541., 14580., 14868., 21350., 21445., 21543., 21648., 22651.,
750., 1588., 1674., 2150., 2252., 249., 467., 528.,
430., 1010., 20960., 1878., 1990., 2156., 12797., 12953.,
2842., 3930., 3966., 4409., 4127., 10803., 5065., 5223.,
5327., 5430., 5526., 4086., 1062., 3052., 3217., 3321.,
2960., 3107., 3205., 3305., 3411., 353., 396., 361.,
401., 394., 2379., 2462., 2568., 2689., 2783., 13939.,
9537., 4411., 4285., 4262., 4205., 3895., 3016., 3137.,
3185., 5769., 24424., 1800., 3100., 3139., 3259., 3339.,
1736., 2018., 2193., 2358., 20498., 3309., 3418., 3524.,
1702., 1413., 1661., 1709., 1809., 1906., 2005., 42892.,
2101., 455., 554., 491., 452., 420., 542., 11352.,
535., 645., 817., 859., 1066., 224., 146., 198.,
297., 184., 16947., 5792., 7634., 240., 340., 436.,
5091., 357., 441., 566., 859., 527., 150., 248.,
218., 329., 490., 494., 671., 823., 289., 957.,
1049., 44961., 747., 1194., 1206., 1688., 1814., 2441.,
3617., 3663., 3647., 3614., 3480., 6823., 6769., 6633.,
6673., 6589., 3115., 3214., 3400., 3583., 3710., 5530.,
7638., 17409., 17700., 17884., 17997., 6921., 7841., 9971.,
10221., 10319., 10438., 10583., 3469., 3640., 3802., 4120.,
3741., 1247., 1290., 197., 1618., 1651., 7690., 6351.,
6455., 6556., 6659., 870., 6766., 4653., 4775., 4898.,
19064., 2982., 3061., 8136., 8325., 8477., 8636., 8785.,
152., 286., 254., 213., 329., 2080., 2955., 90313.,
6828., 6922., 6995., 7096., 7297., 10485., 10631., 10764.,
10906., 11035., 1976., 387., 639., 600., 645., 8799.,
8869., 9017., 9143., 3951., 16454., 8522., 2966., 8080.,
445., 669., 888., 1122., 1341., 3823., 16124., 884.,
997., 1094., 1208., 10429., 10557., 10685., 10803., 10921.,
6691., 1302., 1290., 1542., 1708., 1569., 2301., 2573.,
2736., 2897., 3068., 1290., 1533., 1704., 1931., 2235.,
4191., 4313., 4473., 4692., 4891., 18882., 387., 339.,
26327., 26373., 26414., 4738., 2424., 4505., 4700., 2231.,
2403., 2867., 3102., 3694., 10294., 10410., 10523., 10572.,
10817., 34609., 19211., 19395., 19603., 19518., 19882., 395.,
562., 1176., 1360., 2062., 265., 28283., 28997., 29516.,
29719., 30285., 3577., 4731., 7558., 4712., 4836., 4956.,
5072., 13461., 13576., 13747., 13880., 14002., 8533., 8706.,
8835., 9007., 9107., 2092., 1752., 1922., 2661., 2816.,
2993., 3125., 3241., 396., 4156., 3966., 20561., 21022.,
21021., 21130., 21267., 1731., 2739., 2933., 3117., 3236.,
4094., 16376., 14315., 7333., 24621., 7404., 27975., 936.,
2433., 17539., 4242., 4355., 4518., 4657., 636., 7309.,
2735., 2781., 2898., 3004., 3217., 6446., 6642., 6858.,
7043., 16146., 14809., 56070., 1049., 1286., 1395., 1468.,
26718., 4226., 388., 4462., 4590., 4653., 7019., 7133.,
7248., 7378., 7485., 1204., 1413., 1544., 1784., 1910.,
12857., 4471., 261., 12782., 283., 4920., 2971., 2975.,
19191., 6640., 6797., 6907., 7085., 7131., 7628., 7865.,
7971., 8315., 8382., 20324., 20558., 20710., 21127., 8583.,
8669., 8836., 8967., 8974., 36577., 2192., 2433., 2560.,
2616., 1170., 12738., 12910., 13037., 13198., 13373., 1229.,
670., 1797., 14386., 14658., 14835., 15047., 15237., 21335.,
1587., 1940., 2220., 2434., 2650., 818., 16528., 16701.,
17266., 17408., 17642., 6393., 6847., 7001., 7716., 8278.,
11934., 12139., 12376., 12583., 12796., 9162., 9707., 9641.,
10248., 10429., 5544., 6815., 7043., 7494., 7804., 10267.,
6884., 7131., 7276., 7501., 9342., 35034., 10422., 1560.,
2350., 3981., 827., 1148., 26016., 26302., 12248., 5559.,
5680., 5158., 6666., 12304., 544., 15040., 34404., 567.,
1434., 2687., 3167., 8675., 5980., 8199., 7248., 36903.,
22068., 46923., 3083., 1350., 22608., 3541., 10848., 5944.,
42781., 3050., 23832., 30187., 7262., 5382., 13144., 20834.,
4984., 4157., 9065., 11022., 895., 585., 2554., 8136.,
7014., 1636., 4027., 14109., 18918., 45786., 50108., 8030.,
2221., 2640., 46767., 8324., 14773., 4630., 21132., 949.,
42905., 7260., 2279., 1872., 1140., 1447., 932., 37474.,
37758., 21356., 4149., 23444., 2055., 26411., 5931., 4758.,
12044., 709., 5123., 8012., 2705., 3530., 4144., 39895.,
523., 17755., 2058., 1324., 11178., 4818., 2382.]),
array([22994., 7254., 11319., ..., 7252., 16357., 8434.]),
array([ 1570., 4206., 8095., ..., 25607., 11336., 2831.]),
array([11387., 20317., 22720., ..., 1534., 9292., 6112.]),
array([16083., 8653., 2292., ..., 5861., 22791., 18290.]),
array([ 2797., 1891., 12777., ..., 17716., 6732., 744.]),
array([ 12634., 5101., 29618., 5641., 2101., 7443., 2694.,
397., 7682., 40677., 56594., 5123., 2971., 5619.,
19789., 7389., 36231., 754., 5169., 17079., 3047.,
14088., 12381., 15232., 29078., 3348., 3567., 6017.,
10264., 33057., 1241., 8645., 2226., 7940., 8408.,
1832., 917., 1314., 14395., 17225., 9419., 31457.,
2222., 2072., 6412., 4713., 4949., 3534., 1988.,
6773., 1613., 1363., 3704., 5609., 10362., 5170.,
2200., 21968., 734., 19999., 26735., 8161., 36798.,
2222., 21625., 18480., 7405., 22821., 3858., 9992.,
2955., 29735., 8687., 1467., 118507., 24950., 39249.,
7657., 1006., 6710., 13272., 18877., 22921., 1350.,
122092., 9947., 15090., 13451., 16668., 4465., 22663.,
620., 32433., 609., 1972., 2724., 3110., 7287.,
8460., 7123., 14483., 19707., 31807., 2835., 36364.,
25875., 2142., 43085., 816., 841., 16855., 19770.,
19012., 11940., 5847., 7199., 20580., 1116., 555.,
11659., 58655., 111277., 10514., 14434., 467., 4196.,
743., 17986., 14162., 829., 5569., 2500., 6725.,
10088., 1244., 4667., 25244., 12712., 13163., 9052.,
540., 628., 10173., 35829., 29706., 6761., 20624.,
6509., 2144., 256., 8677., 50434., 31598., 5377.,
87847., 4664., 8514., 8316., 5952., 39167., 46428.,
4923., 910., 56456., 6099., 8697., 3725., 24770.,
10447., 13854., 412., 1082., 2861., 39391., 36583.,
38376., 8946., 1643., 41279., 2788., 64795., 21011.,
6512., 4499., 17415., 4560., 9204., 10666., 14317.,
3214., 33019., 13169., 8332., 13790., 3276., 17399.,
36235., 35880., 18709., 38900., 2849., 2983., 17616.,
25294., 15613., 22180., 1437., 3399., 37492., 12304.,
36612., 5802., 3216., 5761., 9914., 36615., 9807.,
4968., 5902., 518., 3790., 5032., 18761., 6322.,
18695., 13486., 36251., 5248., 17795., 15206., 55117.,
18813., 4200., 6247., 706., 44012., 8475., 32847.,
2658., 678., 19832., 10690., 10733., 27302., 2745.,
32405., 6937., 1349., 2169., 11681., 7535., 9806.,
25867., 689., 15017., 7387., 4863., 5672., 10320.,
4095., 35408., 16956., 4180., 34530., 5777., 3667.,
2344., 16889., 22732., 16167., 17482., 34042., 643.,
3064., 17362., 4847., 27031., 9817., 12749., 11685.,
7005., 879., 4534., 28817., 13617., 7211., 21586.,
15331., 16351., 19534., 8888., 11278., 916., 32712.,
7864., 9638., 747., 18668., 2315., 15603., 3935.,
6081., 4010., 8948., 3016., 8672., 10397., 19146.,
19773., 4937., 18019., 7277., 20165., 10617., 13259.,
13544., 22247., 21267., 5729., 10793., 8728., 8958.,
14598., 18769., 7517., 618., 1391., 2442., 717.,
28049., 8615., 13188., 21634., 23347., 7274., 1486.,
3110., 7888., 5913., 3340., 908., 2214., 7553.,
43806., 6229., 1787., 3769., 22990., 4827., 448.,
3056., 6616., 2614., 17514., 7136., 615., 5214.,
8113., 7514., 11668., 36691., 1144., 977., 7599.,
3547., 13171., 7072., 13459., 22417., 7814., 6355.,
9939., 4485., 21229., 16002., 1716., 4082., 10246.,
8300., 14970., 13176., 43987., 9182., 9900., 23838.,
12997., 9046., 8091., 15845., 14005., 8390., 27435.,
4633., 3245., 6195., 14254., 7401., 17982., 18007.,
12935., 12427., 25226., 1392., 9315., 3183., 583.,
2266., 4548., 1104., 33913., 2815., 19558., 31476.,
26958., 10601., 9468., 7138., 8914., 41993., 1440.,
2052., 4655., 30096., 9456., 1138., 3854., 5095.,
5800., 3436., 4771., 15707., 15506., 4564., 7307.,
2936., 4534., 5875., 7694., 27231., 2497., 4203.,
1765., 24865., 18957., 3248., 10002., 15972., 7119.,
690., 2560., 42381., 3584., 753., 2204., 2148.,
2684., 17383., 6022., 9896., 13143., 1843., 19293.,
16891., 6354., 13460., 36777., 988., 6370., 484.,
13937., 9059., 4833., 40551., 5018., 1102.]),
array([1.27310e+04, 1.33830e+04, 6.19500e+03, 5.10300e+03, 3.28970e+04,
3.30130e+04, 1.27900e+03, 2.55600e+03, 1.61230e+04, 6.45200e+03,
2.31600e+03, 1.50420e+04, 3.08150e+04, 1.93000e+03, 1.66460e+04,
3.26120e+04, 2.37900e+03, 4.42800e+03, 1.50810e+04, 7.58400e+03,
3.85400e+03, 1.74720e+04, 1.79930e+04, 3.34800e+03, 1.18710e+04,
5.22296e+05, 6.55300e+03, 2.74700e+03, 1.86090e+04, 1.70960e+04,
4.25160e+04, 1.38930e+04, 3.52500e+03, 5.93100e+03, 1.29260e+04,
5.38000e+02, 7.17400e+03, 3.57740e+04, 2.62400e+04, 2.87330e+04,
3.19510e+04, 7.19000e+03, 5.87400e+03, 3.39600e+03, 5.58400e+03,
1.12080e+04, 4.82800e+03, 5.81600e+03, 3.93240e+04, 4.90900e+03,
9.85100e+03, 1.25620e+04, 6.42600e+03, 2.46310e+04, 1.55700e+04,
4.08900e+03, 8.15600e+03, 3.27330e+04, 6.07400e+03, 4.61000e+03,
6.89200e+03, 1.53160e+04, 1.15940e+04, 6.15030e+04, 5.59230e+04,
2.27000e+04, 2.22630e+04, 4.67370e+04, 1.01800e+03, 3.91200e+03,
2.76290e+04, 7.67000e+03, 2.20400e+03, 1.85100e+03, 2.74910e+04,
2.52940e+04, 1.56420e+04, 2.09200e+03, 1.70120e+04, 1.35200e+03,
9.58900e+03, 1.16420e+04, 4.15600e+03, 4.28600e+03, 1.61310e+04,
1.62320e+04, 1.62750e+04, 1.65100e+03, 7.34760e+04, 2.43600e+03,
9.95500e+03, 1.10170e+04, 1.12300e+04, 1.66640e+04, 1.80590e+04,
8.35100e+03, 4.09090e+04, 1.07660e+04, 1.25360e+04, 5.65530e+04,
1.31590e+04, 1.13000e+04, 2.20670e+04, 1.31810e+04, 2.06100e+03,
3.39470e+04, 4.83100e+03, 1.40200e+03, 2.19260e+04, 6.98700e+03,
4.69600e+03, 5.56500e+03, 5.62700e+03, 2.78660e+04, 4.71400e+03,
2.89300e+03, 2.55400e+03, 1.77960e+04, 3.67300e+03, 1.00820e+04,
5.42810e+04, 4.80700e+03, 1.69100e+04, 2.99500e+03, 2.82500e+03,
3.11100e+03, 1.35000e+03, 2.08350e+04, 1.77590e+04, 1.42050e+04,
1.47940e+04, 2.95850e+04, 2.54100e+04, 2.81030e+04, 2.94300e+03,
3.68700e+03, 1.23750e+04, 8.35000e+02, 2.91570e+04, 3.05720e+04,
5.47200e+03, 1.86220e+04, 6.44500e+03, 1.25500e+04, 2.98920e+04,
1.26220e+04, 2.50000e+03, 5.32200e+03, 3.14840e+04, 4.33500e+03,
4.36100e+03, 3.66080e+04, 8.58100e+03, 4.69900e+03, 2.85660e+04,
2.85420e+04, 2.07960e+04, 3.21050e+04, 3.05900e+03, 5.55900e+03,
1.14500e+03, 1.81040e+04, 1.91200e+03, 2.14650e+04, 3.35580e+04,
6.02200e+03, 1.88160e+04, 5.27400e+03, 2.96700e+04, 8.91700e+03,
4.00830e+04, 1.62240e+04, 2.21000e+03, 4.41500e+03, 2.59360e+04,
2.55400e+03, 2.69660e+04, 3.64600e+03, 1.00000e+03, 1.06600e+03,
5.68700e+03, 1.34380e+04, 1.86200e+03, 1.00510e+04, 1.07950e+04,
1.69590e+04, 1.44660e+04, 5.82300e+03, 1.12880e+04, 1.76120e+04,
3.50920e+04, 2.52750e+04, 1.64400e+04, 2.05910e+04, 1.88800e+03,
1.30480e+04, 2.84010e+04, 4.96100e+03, 2.33070e+04, 1.71620e+04,
4.22900e+03, 3.23300e+03, 1.57490e+04, 2.72900e+03, 1.02350e+04,
2.20410e+04, 1.47960e+04, 6.01200e+03, 3.45490e+04, 3.50700e+03,
4.26200e+03, 5.14400e+03, 3.25100e+03, 1.99400e+03, 3.06330e+04,
3.45300e+04, 2.99060e+04, 1.00330e+04, 1.47580e+04, 5.93700e+03,
4.62200e+03, 9.68100e+03, 5.68270e+04, 6.43880e+04, 5.44700e+03,
7.11000e+02, 2.48500e+03, 5.44000e+03, 2.28310e+04, 6.48900e+03,
1.95320e+04, 1.16560e+04, 1.19400e+04, 8.12800e+03, 7.17000e+02,
7.80300e+03, 4.68800e+03, 1.49290e+04, 1.20980e+04, 3.46070e+04,
4.39100e+03, 4.46300e+03, 2.01750e+04, 1.05000e+04, 1.89900e+03,
2.10340e+04, 2.11420e+04, 7.15100e+03, 4.95800e+03, 1.73600e+04,
1.97700e+04, 6.26100e+03, 1.88900e+03, 5.76900e+03, 6.01600e+03,
3.33600e+03, 5.79100e+03, 5.03300e+03, 3.56990e+04, 5.65300e+03,
5.88500e+03, 1.87800e+03, 1.02340e+04, 6.82000e+03, 1.81200e+03,
9.99200e+03, 7.96000e+03, 2.67700e+03, 1.44400e+04, 4.33600e+03,
6.21700e+03, 6.87940e+04, 6.97100e+03, 6.69500e+03, 4.24800e+03,
2.27000e+03, 2.01990e+04, 1.20980e+04, 4.43200e+04, 2.61200e+03,
5.28700e+03, 5.28300e+03, 1.70560e+04, 1.71690e+04, 1.09020e+04,
9.74100e+03, 2.17300e+03, 3.11100e+03, 3.34600e+03, 7.16200e+03,
5.70000e+02, 6.81900e+03, 1.01100e+04, 3.98000e+03, 3.21400e+03,
7.50000e+02, 2.31600e+03, 8.44300e+03, 5.89300e+03, 2.30680e+04,
2.01215e+05, 5.71000e+02, 1.48900e+03, 5.45400e+03, 6.53200e+03,
8.89800e+03, 2.23230e+04, 1.71420e+04, 2.11330e+04, 4.40400e+03,
2.84090e+04, 3.85700e+03, 2.82500e+03, 1.81280e+04, 8.87000e+02,
2.41660e+04, 3.31000e+02, 4.34460e+04, 3.18880e+04, 6.89000e+03,
1.40980e+04, 4.43390e+04, 8.56100e+03, 4.52400e+03, 4.46460e+04]),
array([ 523., 2190., 44296., ..., 1275., 1273., 10661.]),
array([ 2686., 17110., 7933., ..., 3334., 15683., 10491.]),
array([ 20932., 3182., 17776., 7947., 5879., 1072., 11984.,
7919., 1042., 11079., 9523., 6057., 4669., 830.,
8509., 7701., 35594., 10060., 19515., 17444., 32717.,
16272., 10138., 2971., 28707., 20123., 18854., 8879.,
5036., 2121., 12160., 4097., 8501., 2509., 7562.,
2491., 14897., 21776., 4559., 15368., 17900., 15772.,
2540., 4190., 15433., 6972., 14633., 20885., 5113.,
2047., 18096., 2186., 559., 147738., 26647., 34264.,
4824., 18923., 2074., 9948., 4478., 31650., 6359.,
4418., 4037., 6241., 1868., 5913., 45634., 3285.,
2047., 8146., 3756., 12831., 2366., 20315., 3387.,
9874., 11186., 15166., 4095., 8191., 3852., 13020.,
5842., 17842., 6039., 30429., 10524., 4846., 7817.,
25109., 2437., 633., 11073., 2842., 1745., 4683.,
12356., 6159., 14184., 8507., 1107., 1752., 21545.,
5085., 1057., 6792., 292278., 15229., 6954., 7308.,
914., 19918., 3006., 80880., 3565., 24723., 4773.,
4726., 1483., 48565., 11678., 886., 12616., 11443.,
6100., 20673., 1971., 37143., 10481., 808., 12071.,
4820., 23654., 10512., 1239., 3946., 3206., 7120.,
5541., 14704., 19185., 35537., 10523., 18739., 3858.,
8303., 6137., 1594., 8494., 6952., 46550., 11653.,
7897., 27201., 15891., 2120., 31272., 15226., 30451.,
1850., 18776., 37002., 14473., 20484., 24371., 27074.,
9584., 6101., 7069., 4586., 2348., 1873., 21505.,
934., 1663., 3271., 1084., 6667., 4574., 32596.,
1799., 3067., 6012., 30703., 24779., 11359., 2458.,
3475., 3178., 13206., 15619., 6246., 6278., 29127.,
60812., 9946., 18727., 14207., 18944., 2180., 38996.,
20245., 13350., 5803., 9166., 18280., 8666., 7416.,
10140., 2899., 3485., 40336., 3046., 3861., 5007.,
6164., 16411., 4659., 4226., 9611., 32054., 74127.,
4694., 13604., 33124., 7096., 2459., 125020., 13476.,
15065., 2778., 5367., 13707., 3914., 3793., 18190.,
566., 11510., 3624., 38920., 7300., 20012., 4251.,
9402., 1912., 21353., 672., 994., 23544., 8542.,
79143., 3398., 19059., 16728., 2603., 5405., 755.,
5760., 19633., 24080., 1837., 3405., 3896., 12900.,
3637., 14945., 2458., 8501., 1505., 1165., 8513.,
16873., 14663., 5370., 4277., 5040., 925., 2298.,
1323., 8751., 27149., 19602., 3143., 7031., 27576.,
61864., 4701., 16357., 4372., 22322., 1980., 7391.,
7184., 40050., 3069., 1229., 910., 10583., 1909.,
2960., 8619., 25643., 5306., 5999., 33720., 7237.,
13317., 22853., 9165., 9848., 3285., 2464., 14293.,
14590., 5167., 1164., 16864., 6772., 2195., 16355.,
30890., 3419., 4923., 11561., 4885., 6355., 13380.,
4330., 12095., 7531., 1711., 8337., 23457., 24345.,
3072., 6771., 1708., 741., 9748., 3765., 11005.,
6455., 2446., 5203., 1604., 16559., 4642., 26700.,
3495., 14391., 14350., 739., 5671., 2595., 2779.,
5512., 5358., 668., 9852., 27155., 4018., 22146.,
9495., 977., 16198., 11594., 3424., 17682., 18606.,
2138., 19448., 10532., 5989., 45618., 34244., 3908.,
16275., 612., 4956., 26268., 3611., 635., 975.,
997., 44748., 985., 5586., 40500., 1627., 64821.,
4948., 15902., 15109., 3296., 41226., 4753., 3264.,
801., 5874., 13825., 40475., 11993., 25794., 9583.,
4272., 14415., 6836., 23176., 1144., 11833., 22914.,
15145., 1226., 7723., 1818., 12261., 5409., 2092.,
4256., 1636., 1070., 20503., 4190., 2637., 8968.,
1561., 27731., 1020., 1147., 6251., 15733., 5985.,
24512., 30403., 1590., 1822., 36325., 2468., 2667.,
6163., 5636., 15523., 1910., 10528., 16791., 3922.,
9328., 5693., 24971., 52376., 5244., 19760., 6600.,
12679., 5622., 19295., 7814., 10256., 5680., 17396.,
16321., 653., 26352., 2763., 7255., 7112., 11843.,
1606., 13391., 1375., 19231., 21584., 516., 13325.,
3199., 21807., 23768., 5417., 21208., 19870., 6577.,
1643., 3640., 22366., 44784., 1682., 10169., 37447.,
12059., 4229., 13965., 23448., 1217., 8249., 19288.,
51630., 2387., 11886., 4439., 17918., 5300., 13437.,
3279., 23497., 27033., 3607., 3937., 3428., 5057.,
6817., 6309., 7245., 3469., 48134., 3991., 1218.,
18542., 14848., 41093., 20508., 7540., 10598., 58490.,
23482., 1091., 6856., 22916., 8346., 7527., 4625.,
15302., 4592., 8344., 3608., 12292., 4099., 28948.,
5420., 5634., 6254., 9282., 25764., 18785., 937.,
1597., 2453., 1000., 16700., 5115., 63388., 9811.,
4322., 11193., 20443., 3354., 5291., 42629., 1341.,
17610., 27994., 28392., 5646., 5733., 29563., 20150.,
2559., 957., 3563., 9801., 15846., 7023., 793.,
2020., 31952., 13117., 13605., 18629., 32530., 3318.,
3045., 1709., 27109., 46359., 597., 9054., 24022.,
36474., 37638., 9123., 12146., 5841., 7481., 28333.,
19841., 1757., 40540., 7305., 40514., 6860., 3528.,
4774., 20674., 5696., 8973., 8948., 4184., 17574.,
1646., 20684., 6813., 1592., 13192., 966., 29766.,
31156., 35395., 9067., 12727., 3628., 1712., 4998.,
5677., 4878., 4652., 5294., 2893., 11509., 16618.,
2317., 17529., 566., 22600., 29914., 6465., 1770.,
3222., 25341., 9127., 810.]),
array([ 5747., 22836., 16320., 9026., 21615., 80633., 31507.,
46209., 15082., 17455., 19016., 13998., 1587., 24224.,
9024., 4937., 38363., 28198., 15861., 3411., 54519.,
11332., 24538., 7822., 34741., 16423., 17025., 22277.,
4007., 18636., 1100., 5340., 25119., 4649., 3530.,
12408., 3627., 11211., 16607., 140636., 6480., 9546.,
19835., 25867., 7783., 11191., 65249., 563., 8736.,
8836., 14138., 21410., 11699., 9891., 40467., 42711.,
37137., 2109., 26786., 196603., 2133., 46378., 8834.,
54266., 35327., 11982., 12700., 95646., 16074., 5514.,
9071., 91177., 21798., 191995., 15590., 61159., 1764.,
53194., 10632., 18251., 33909., 5280., 14580., 17220.,
2665., 7043., 3012., 17609., 31642., 2330., 29462.,
22580., 73797., 13675., 18079., 56631., 57074., 6026.,
27298., 35729., 6850., 5508., 25598., 48624., 14836.,
8672., 42277., 9019., 9119., 24360., 25295., 11381.,
10888., 41892., 8823., 24731., 12501., 12599., 12861.,
7274., 11136., 15481., 3373., 3987., 14963., 20862.,
11881., 14588., 28261., 16672., 9010., 1311., 7760.,
10673., 22565., 15235., 23310., 10595., 1450., 13495.,
24713., 7664., 8910., 29897., 39624., 26719., 27312.,
27500., 27761., 10496., 29747., 2123., 12452., 4077.,
9846., 10402., 68668.]),
array([ 713., 4605., 2345., 19415., 6849., 12311., 7713.,
12556., 33350., 6826., 7444., 32800., 2649., 40264.,
3738., 5637., 1867., 27335., 3007., 6514., 6474.,
12869., 10703., 11831., 13228., 6081., 2706., 3116.,
14674., 3121., 11988., 4219., 1037., 13047., 9239.,
619., 5677., 3558., 20694., 13913., 2177., 23905.,
16895., 8385., 21415., 22529., 6388., 27298., 22597.,
19795., 1445., 9490., 36842., 13165., 6789., 57706.,
1137., 5340., 11895., 19954., 11400., 17096., 9595.,
3293., 5211., 363., 6188., 65869., 5368., 22685.,
794., 14519., 7985., 26463., 14947., 7547., 21631.,
6333., 13057., 74167., 5569., 9583., 7926., 19345.,
4204., 27275., 3287., 4440., 8046., 16319., 13326.,
14308., 7085., 12575., 4886., 23326., 16905., 37589.,
9359., 30695., 12251., 2229., 16486., 1024., 16599.,
2047., 4532., 4346., 15120., 4806., 70779., 11557.,
16035., 7961., 469., 19605., 17768., 4436., 3777.,
24863., 8617., 32447., 14795., 680., 24862., 9711.,
7666., 9094., 1843., 2334., 13050., 15084., 11759.,
1416., 2160., 5989., 9355., 16470., 9927., 25222.,
6054., 11003., 10679., 5900., 5478., 13881., 2463.,
101007., 16974., 12800., 15815., 5145., 13416., 42252.,
24825., 31631., 35957., 22598., 5458., 5649., 1329.,
9181., 1958., 20500., 2021., 5096., 7087., 7125.,
1404., 24524., 798., 1799., 4644., 18908., 3881.,
799., 7437., 8472., 55077., 30631., 5217., 10613.,
8773., 1635., 2981., 3307., 3998., 13259., 4276.,
1920., 2526., 6112., 11034., 11758., 9166., 9172.,
4099., 15853., 17089., 6813., 2273., 2890., 9441.,
23832., 24563., 1325., 1043., 1223., 1574., 16797.,
1364., 12117., 11730., 2657., 18653., 5428., 7509.,
28849., 1466., 3741., 2314., 12364., 64150., 36868.,
8046., 13211., 6620., 3323., 2562., 8341., 5645.,
461., 6453., 9810., 15812., 13078., 9907., 2195.,
13899., 5710., 13253., 7371., 64707., 14508., 4380.,
24823., 6429., 6855., 5628., 20207., 24683., 17196.,
10115., 6250., 10693., 21069., 959., 17379., 17533.,
6991., 1048., 35605., 21208., 9459., 62285., 7049.,
1437., 4046., 24749., 10810., 19749., 5759., 18501.,
11129., 15001., 2360., 10266., 21092., 14613., 19862.,
5608., 5587., 8147., 6287., 6518., 12611., 9781.,
3215., 8255., 1954., 4617., 5059., 5082., 3425.,
1875., 1226., 2358., 20217., 8236., 2761., 5427.,
3009., 3087., 22086., 10655., 16312., 9124., 19939.,
25468., 909., 7974., 1615., 5793., 18499., 2698.]),
array([ 1141., 43603., 5360., 1478., 11698., 17102., 969.,
1175., 1217., 20151., 8891., 2090., 16265., 2937.,
52911., 5901., 1772., 26544., 37908., 22973., 9055.,
1535., 22484., 23570., 1956., 14023., 10605., 1377.,
7899., 1236., 3110., 3766., 2520., 1924., 100270.,
11689., 11757., 2173., 2816., 1191., 3510., 7847.,
19589., 16887., 5063., 10596., 7967., 29052., 36828.,
16863., 6344., 1868., 895., 3630., 1074., 30671.,
4682., 8927., 3270., 29708., 2605., 33210., 8073.,
11010., 2135., 4516., 6762., 1125., 12123., 38515.,
555., 4485., 20135., 3473., 547., 2992., 585.,
3990., 18741., 1993., 1432., 15163., 3041., 5420.,
8273., 12596., 6013., 8211., 29690., 25369., 642.,
12847., 1351., 14501., 2141., 13565., 2175., 2868.,
11430., 2974., 4535., 16388., 23987., 2553., 2427.,
21063., 7820., 852., 1468., 54900., 39695., 4491.,
1124., 12345., 3812., 519., 5309., 9383., 13698.,
17622., 15737., 5612., 4205., 2779., 2556., 5504.,
13211., 19289., 691., 2455., 486., 762., 5510.,
14379., 5812., 6090., 26930., 7314., 14749., 938.,
6798., 2268., 4049., 1111., 504., 2159., 4298.,
29081., 1821., 7782., 1627., 24278., 5839., 1687.,
3185., 7636., 6489., 545., 20003., 1181., 4617.,
1867., 1411., 1805., 4233., 510., 3049., 1709.,
7868., 12006., 46193., 3325.]),
array([27090., 37381., 9736., 24686., 4073., 10628., 30818., 1901.]),

In [91]:
from scipy import stats
stats.kruskal(*l)

Out[91]: KruskalResult(statistic=11985.335740628494, pvalue=0.0)
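
For reference, the sketch below shows one way a list of per-group arrays such as `l` can be built and passed to `stats.kruskal`. The data are synthetic, and the column name `response_seconds` is hypothetical; it stands in for whatever response-time column the notebook actually uses.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the 311 data (illustration only): three complaint
# types whose response times, in seconds, are drawn with different scales.
demo = pd.DataFrame({
    "Complaint Type": np.repeat(
        ["Blocked Driveway", "Illegal Parking", "Noise - Vehicle"], 200
    ),
    "response_seconds": np.concatenate([
        rng.exponential(scale=5000, size=200),
        rng.exponential(scale=15000, size=200),
        rng.exponential(scale=30000, size=200),
    ]),
})

# One array of response times per complaint type, with missing values dropped.
groups = [
    g.dropna().to_numpy()
    for _, g in demo.groupby("Complaint Type")["response_seconds"]
]

# Kruskal-Wallis H-test: the null hypothesis is that all groups
# come from the same distribution.
result = stats.kruskal(*groups)
alpha = 0.05
reject_null = result.pvalue < alpha
print(f"H = {result.statistic:.2f}, p = {result.pvalue:.3g}, reject H0: {reject_null}")
```

A p-value printed as 0.0, as in Out[91], most likely means the value is too small to represent as a float rather than exactly zero.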

In [92]:
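# Assuming a significance level of 5% (alpha = 0.05), the p-value is below alpha,
# so the result falls in the critical region and we reject the null hypothesis:
# at least one complaint type's response-time distribution differs from the others.
# In other words, response time differs across complaint types.
# Next, try a chi-square test of independence for Complaint Type vs City.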

df_ct = pd.crosstab(CS_Dataset['Complaint Type'], CS_Dataset['City'], margins=True)

df_ct

Out[92]:
City                        ARVERNE  ASTORIA  Astoria  BAYSIDE  BELLEROSE  BREEZY  BRONX  BROOKLYN  CAMBRIA  CENTRAL  ...  SOUTH     SOUTH  SPRINGFIELD  STATEN
                                                                            POINT                   HEIGHTS     PARK       OZONE  RICHMOND      GARDENS  ISLAND
                                                                                                                            PARK      HILL
Complaint Type
Animal Abuse                     46      170        0       53         15       2   1971      3191       15        0  ...     74        40           42     786
Bike/Roller/Skate Chronic         0       16        0        0          1       0     22       124        0        0  ...      1         1            0      10
Blocked Driveway                 50     3436      159      514        138       3  17062     36445      177        0  ...   1202      1946          330    2845
Derelict Vehicle                 32      426       14      231        120       3   2402      6257      148        0  ...    425       356          267    2184
Disorderly Youth                  2        5        0        2          2       0     66        79        0        0  ...      2         2            0      25
Drinking                          1       43        0        1          1       1    206       291        0        0  ...     14        25            6     188
Graffiti                          1        4        0        3          0       0     15        60        0        0  ...      2         0            0       6
Homeless Encampment               4       32        0        2          1       0    275       948        6        0  ...      5        12            7      77
Illegal Fireworks                 0        4        0        0          1       0     24        61        1        0  ...      1         2            1      11
Illegal Parking                  62     1340      277      638        132      16   9889     33532      113        5  ...    602       596          291    6224
Noise - Commercial                2     1653      310       47         38       4   2944     13855       19        0  ...     82       223           38     783
Noise - House of Worship         14       21        0        3          1       0     90       389        2        0  ...      5         3            1      18
Noise - Park                      2       64        0        4          1       0    548      1575        0        0  ...      4         2            1      67
Noise - Street/Sidewalk          29      409      145       17         13       1   9144     13982       29      105  ...    108        93           42     885
Noise - Vehicle                  10      236        0       24         11       1   3556      5965      100        0  ...     97        93           48     424
Panhandling                       1        2        0        0          1       0     20        49        0        0  ...      0         0            2      13
Posting Advertisement             0        3        0        0          1       0     18        58        0        0  ...      1         0            2     516
Squeegee                          0        0        0        0          0       0      0         0        0        0  ...      0         0            0       0
Traffic                           1       60        0        9          9       0    427      1258        7        0  ...     36        12           12     229
Urinating in Public               1       10        0        0          1       0     54       155        0        0  ...      2         1            3      19

In [93]:
chi2, p, dof, ex = stats.chi2_contingency(df_ct)

print(f'Chi_square value {chi2}\n\np value {p}\n\ndegrees of freedom {dof}\n\n expected {ex}')

Chi_square value 131574.49431914085

p value 0.0

degrees of freedom 1113

expected [[7.54355716e+00 2.32743495e+02 2.63587615e+01 ... 1.26900689e+02
4.83486675e+00 1.05290000e+04]
[3.38883326e-01 1.04556628e+01 1.18412900e+00 ... 5.70082876e+00
2.17199352e-01 4.73000000e+02]
[7.20309763e+01 2.22239201e+03 2.51691249e+02 ... 1.21173345e+03
4.61665717e+01 1.00538000e+05]
...
[4.59247805e-01 1.41693020e+01 1.60470758e+00 ... 7.72564744e+00
2.94344153e-01 6.41000000e+02]
[2.99764870e+00 9.24873002e+01 1.04744095e+01 ... 5.04276270e+01
1.92127291e+00 4.18400000e+03]
[2.59000000e+02 7.99100000e+03 9.05000000e+02 ... 4.35700000e+03
1.66000000e+02 3.61502000e+05]]

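The p-value is reported as 0.0 (too small to represent as a float), which is below the 5% significance level, so we reject the null hypothesis of independence: Complaint Type and City are NOT independent.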
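
One caveat worth checking: `df_ct` was built with `margins=True`, so the `All` row and column appear to have been passed to `chi2_contingency` together with the counts (the last row of `expected` above holds whole-number column totals). The sketch below uses made-up counts, not the 311 data, and assumes the usual approach of testing the table of counts only.

```python
import pandas as pd
from scipy import stats

# Made-up observations, for illustration only (not the 311 data).
demo = pd.DataFrame({
    "Complaint Type": ["Noise"] * 30 + ["Parking"] * 10
                      + ["Noise"] * 10 + ["Parking"] * 30,
    "City": ["BRONX"] * 40 + ["BROOKLYN"] * 40,
})

# Crosstab with margins, as in the notebook, and the same table
# with the 'All' row and column removed.
with_margins = pd.crosstab(demo["Complaint Type"], demo["City"], margins=True)
counts_only = with_margins.drop(index="All", columns="All")

chi2_m, p_m, dof_m, _ = stats.chi2_contingency(with_margins)
# correction=False disables Yates' continuity correction, which SciPy applies
# only when dof == 1, so the two statistics can be compared directly.
chi2_c, p_c, dof_c, expected = stats.chi2_contingency(counts_only, correction=False)

# Degrees of freedom are (rows - 1) * (columns - 1), so the margins inflate them.
print(f"with margins: chi2 = {chi2_m:.2f}, dof = {dof_m}, p = {p_m:.3g}")
print(f"counts only:  chi2 = {chi2_c:.2f}, dof = {dof_c}, p = {p_c:.3g}")
```

In this toy case the statistic is the same either way, because the margin cells equal their expected values exactly, but the degrees of freedom are overstated when margins are included, which changes the p-value. With a p-value as small as the one reported above, the conclusion is unlikely to change, though dropping the margins before testing is the safer habit.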