In [1]:
import pandas as pd
import numpy as np
import seaborn as sn
from pandas.plotting import scatter_matrix
from matplotlib import pyplot as plt
from sklearn.svm import SVC
Tasks to Perform
1. Understand the dataset:
1.1 Import the dataset
...
5  32306554  12/31/2015 11:56:30 PM  01/01/2016 01:50:11 AM  NYPD  New York City Police Department  Illegal Parking  Posted Parking Sign Violation  Street/Sidewalk  11215.0  260 21 STREET             ...  NaN  NaN  NaN  NaN  NaN
8  32308581  12/31/2015 11:53:58 PM  01/01/2016 08:27:32 AM  NYPD  New York City Police Department  Illegal Parking  Posted Parking Sign Violation  Street/Sidewalk  11415.0  83-44 LEFFERTS BOULEVARD  ...  NaN  NaN  NaN  NaN  NaN
10 rows × 53 columns
In [9]:
print("Customer Service Requests dataset information")
print("==========================================================")
CS_Dataset.info()  # info() prints its report directly and returns None
print(CS_Dataset.describe())
Latitude Longitude
count 360528.000000 360528.000000
mean 40.724980 -73.924946
std 0.081907 0.079213
min 40.499040 -74.254937
25% 40.668742 -73.972253
50% 40.718406 -73.930643
75% 40.778166 -73.874098
max 40.912869 -73.700715
In [10]:
CS_Dataset.tail(10)
10 rows × 53 columns
Out[11]: Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location'],
dtype='object')
In [23]:
# Drop rows that have a null value in the selected column
CS_Dataset=CS_Dataset.dropna(subset=['Closed Date'])
In [18]:
CS_Dataset.to_csv('Customer _Service_Requests.csv',index=False)
In [18]:
CS_Dataset=pd.read_csv("Customer _Service_Requests.csv",encoding='latin2')
In [24]:
CS_Dataset.shape
In [26]:
CS_Dataset.columns
Out[26]: Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location',
'Time Elapsed1', 'Time Elapsed2'],
dtype='object')
2.3 Analyze the date column, and remove entries that have an incorrect timeline
2.3.1 Calculate the time elapsed between the creation date and the closed date
2.3.3 View the descriptive statistics for the newly created column
2.3.7 Create a scatter and hexbin plot of the concentration of complaints across Brooklyn
2.3.1 Calculate the time elapsed between the creation date and the closed date
In [27]:
CS_Dataset
0       32310363  2015-12-31 23:59:45  2016-01-01 00:55:15  NYPD  New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party  Street/Sidewalk  10034.0  71 VERMILYEA AVENUE  ...  NaN  NaN  NaN  NaN
1       32309934  2015-12-31 23:59:44  2016-01-01 01:26:57  NYPD  New York City Police Department  Blocked Driveway         No Access         Street/Sidewalk  11105.0  27-07 23 AVENUE      ...  NaN  NaN  NaN  NaN
4       32306529  2015-12-31 23:56:58  2016-01-01 03:24:42  NYPD  New York City Police Department  Illegal Parking          Blocked Sidewalk  Street/Sidewalk  11373.0  87-14 57 ROAD        ...  NaN  NaN  NaN  NaN
...
362172  29609918  2015-01-01 00:04:44  2015-01-01 10:22:31  NYPD  New York City Police Department  Illegal Parking          Blocked Hydrant   Street/Sidewalk  11421.0  84-25 85 ROAD        ...  NaN  NaN  NaN  NaN
362176  29611816  2015-01-01 00:00:50  2015-01-01 02:47:50  NYPD  New York City Police Department  Blocked Driveway         No Access         Street/Sidewalk  11420.0  123-19 135 STREET    ...  NaN  NaN  NaN  NaN
In [28]:
#convert Created Date and Closed Date to datetime
CS_Dataset[['Created Date','Closed Date']] = CS_Dataset[['Created Date','Closed Date']].apply(pd.to_datetime)
Out[28]: 0 0.038542
1 0.060567
2 0.202477
3 0.323229
4 0.144259
...
362172 0.429016
362173 0.097616
362174 0.013229
362175 0.111725
362176 0.115972
Name: Time Elapsed1, Length: 362177, dtype: float64
Out[29]: 0 3330.0
1 5233.0
2 17494.0
3 27927.0
4 12464.0
5 6821.0
6 7102.0
7 6529.0
8 30814.0
9 5022.0
10 28120.0
11 40031.0
12 8996.0
13 30649.0
14 37785.0
15 56007.0
16 17559.0
17 3078.0
18 10589.0
19 2856.0
Name: Time Elapsed2, dtype: float64
Solution 2 for question 2.3.2
In [30]:
# Time Elapsed1 is in days; multiply by 24*60*60 to express it in seconds
CS_Dataset['Time Elapsed2'] = CS_Dataset['Time Elapsed1']*24*60*60
CS_Dataset['Time Elapsed2'].head(20)
Out[30]: 0 3330.0
1 5233.0
2 17494.0
3 27927.0
4 12464.0
5 6821.0
6 7102.0
7 6529.0
8 30814.0
9 5022.0
10 28120.0
11 40031.0
12 8996.0
13 30649.0
14 37785.0
15 56007.0
16 17559.0
17 3078.0
18 10589.0
19 2856.0
Name: Time Elapsed2, dtype: float64
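The elapsed-time calculation from 2.3.1 and 2.3.2, together with the removal of entries that have an incorrect timeline, can also be sketched in one pass. This is only a minimal illustration, not the notebook's own cell: the column name `Elapsed Seconds` and the third (deliberately inconsistent) row are assumptions; the first two rows reuse the timestamps of rows 0 and 1 shown above.

```python
import pandas as pd

# Rows 0 and 1 use timestamps from the output above; the third row is made up
# so that its closed date precedes its created date.
df = pd.DataFrame({
    "Created Date": ["2015-12-31 23:59:45", "2015-12-31 23:59:44", "2015-06-01 12:00:00"],
    "Closed Date":  ["2016-01-01 00:55:15", "2016-01-01 01:26:57", "2015-06-01 11:00:00"],
})
df[["Created Date", "Closed Date"]] = df[["Created Date", "Closed Date"]].apply(pd.to_datetime)

# Subtracting two datetime columns gives a timedelta; convert it to float seconds
df["Elapsed Seconds"] = (df["Closed Date"] - df["Created Date"]).dt.total_seconds()

# Keep only rows whose timeline is plausible (closed at or after created)
valid = df[df["Elapsed Seconds"] >= 0]
print(valid["Elapsed Seconds"].tolist())
```

The two retained values match the first two entries of `Time Elapsed2` above (3330.0 and 5233.0 seconds).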
2.3.3 View the descriptive statistics for the newly created column
In [31]:
CS_Dataset['Time Elapsed2'].describe()
2.3.4 Check the number of null values in the Complaint_Type and City columns
In [32]:
CS_Dataset.isnull().sum()
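Since the task only asks about two columns, the null count can be restricted to them. The sketch below uses a small made-up frame (not the real dataset) purely to show the pattern.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing City value
df = pd.DataFrame({
    "Complaint Type": ["Illegal Parking", "Blocked Driveway", "Noise - Commercial"],
    "City": ["BROOKLYN", np.nan, "NEW YORK"],
})

# Select the two columns of interest before counting nulls
null_counts = df[["Complaint Type", "City"]].isnull().sum()
print(null_counts)
```

On the real data, the `info()` output in this notebook lists 361503 non-null `City` values out of 362177 rows, which corresponds to 674 nulls, while `Complaint Type` has none.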
0  32310363  2015-12-31 23:59:45  2016-01-01 00:55:15  NYPD  New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party  Street/Sidewalk  10034.0  71 VERMILYEA AVENUE  ...  NaN  NaN  NaN  NaN  NaN
1  32309934  2015-12-31 23:59:44  2016-01-01 01:26:57  NYPD  New York City Police Department  Blocked Driveway         No Access         Street/Sidewalk  11105.0  27-07 23 AVENUE      ...  NaN  NaN  NaN  NaN  NaN
4  32306529  2015-12-31 23:56:58  2016-01-01 03:24:42  NYPD  New York City Police Department  Illegal Parking          Blocked Sidewalk  Street/Sidewalk  11373.0  87-14 57 ROAD        ...  NaN  NaN  NaN  NaN  NaN
5 rows × 55 columns
In [35]:
CS_Dataset.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 362177 entries, 0 to 362176
Data columns (total 55 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 362177 non-null int64
1 Created Date 362177 non-null datetime64[ns]
2 Closed Date 362177 non-null datetime64[ns]
3 Agency 362177 non-null object
4 Agency Name 362177 non-null object
5 Complaint Type 362177 non-null object
6 Descriptor 355681 non-null object
7 Location Type 362047 non-null object
8 Incident Zip 361502 non-null float64
9 Incident Address 310491 non-null object
10 Street Name 310491 non-null object
11 Cross Street 1 306846 non-null object
12 Cross Street 2 306713 non-null object
13 Intersection Street 1 50628 non-null object
14 Intersection Street 2 50504 non-null object
15 Address Type 361248 non-null object
16 City 361503 non-null object
17 Landmark 375 non-null object
18 Facility Type 362159 non-null object
19 Status 362177 non-null object
20 Due Date 362176 non-null object
21 Resolution Description 362177 non-null object
22 Resolution Action Updated Date 362138 non-null object
23 Community Board 362177 non-null object
24 Borough 362177 non-null object
25 X Coordinate (State Plane) 360470 non-null float64
26 Y Coordinate (State Plane) 360470 non-null float64
27 Park Facility Name 362177 non-null object
28 Park Borough 362177 non-null object
29 School Name 362177 non-null object
30 School Number 362177 non-null object
31 School Region 362176 non-null object
32 School Code 362176 non-null object
33 School Phone Number 362177 non-null object
34 School Address 362177 non-null object
35 School City 362177 non-null object
36 School State 362177 non-null object
37 School Zip 362176 non-null object
38 School Not Found 362177 non-null object
39 School or Citywide Complaint 0 non-null float64
40 Vehicle Type 0 non-null float64
41 Taxi Company Borough 0 non-null float64
42 Taxi Pick Up Location 0 non-null float64
43 Bridge Highway Name 297 non-null object
44 Bridge Highway Direction 297 non-null object
45 Road Ramp 262 non-null object
46 Bridge Highway Segment 262 non-null object
47 Garage Lot Name 0 non-null float64
48 Ferry Direction 0 non-null float64
49 Ferry Terminal Name 0 non-null float64
50 Latitude 360470 non-null float64
51 Longitude 360470 non-null float64
52 Location 360470 non-null object
53 Time Elapsed1 362177 non-null float64
54 Time Elapsed2 362177 non-null float64
dtypes: datetime64[ns](2), float64(14), int64(1), object(38)
memory usage: 154.7+ MB
In [ ]:
CS_Dataset.to_csv('Customer _Service_Requests.csv',index=False)
In [36]:
CS_Dataset=pd.read_csv("Customer _Service_Requests.csv")
In [37]:
print(CS_Dataset.shape)
print(CS_Dataset.columns )
(362177, 55)
Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location',
'Time Elapsed1', 'Time Elapsed2'],
dtype='object')
In [39]:
CS_Dataset=CS_Dataset.groupby(['City','Complaint Type']).size().unstack().fillna(0)
CS_Dataset
Out[39]:
Complaint Type  Animal Abuse  Blocked Driveway  Derelict Vehicle  Disorderly Youth  Drinking  Graffiti  Homeless Encampment  Illegal Parking  Noise - Commercial  Noise - House of Worship  ...  Noise - Vehicle  Panhandling  Traffic  Urinating in Public  Vending
City
ARVERNE  46.0 50.0 32.0 2.0 1.0 1.0 4.0 62.0 2.0 14.0 ... 10.0 1.0 1.0 1.0 1.0
ASTORIA  170.0 3436.0 426.0 5.0 43.0 4.0 32.0 1340.0 1653.0 21.0 ... 236.0 2.0 60.0 10.0 57.0
Astoria  0.0 159.0 14.0 0.0 0.0 0.0 0.0 277.0 310.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
BAYSIDE  53.0 514.0 231.0 2.0 1.0 3.0 2.0 638.0 47.0 3.0 ... 24.0 0.0 9.0 0.0 2.0
BELLEROSE  15.0 138.0 120.0 2.0 1.0 0.0 1.0 132.0 38.0 1.0 ... 11.0 1.0 9.0 1.0 0.0
BREEZY POINT  2.0 3.0 3.0 0.0 1.0 0.0 0.0 16.0 4.0 0.0 ... 1.0 0.0 0.0 0.0 0.0
BRONX  1971.0 17062.0 2402.0 66.0 206.0 15.0 275.0 9889.0 2944.0 90.0 ... 3556.0 20.0 427.0 54.0 433.0
BROOKLYN  3191.0 36445.0 6257.0 79.0 291.0 60.0 948.0 33532.0 13855.0 389.0 ... 5965.0 49.0 1258.0 155.0 575.0
CAMBRIA HEIGHTS  15.0 177.0 148.0 0.0 0.0 0.0 6.0 113.0 19.0 2.0 ... 100.0 0.0 7.0 0.0 0.0
CENTRAL PARK  0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
COLLEGE POINT  35.0 597.0 223.0 1.0 1.0 2.0 3.0 449.0 38.0 2.0 ... 140.0 0.0 16.0 0.0 1.0
CORONA  104.0 3597.0 72.0 6.0 34.0 4.0 26.0 791.0 281.0 3.0 ... 110.0 1.0 14.0 7.0 65.0
EAST ELMHURST  85.0 1925.0 136.0 1.0 9.0 3.0 2.0 1092.0 41.0 25.0 ... 82.0 0.0 24.0 6.0 9.0
ELMHURST  59.0 1992.0 94.0 2.0 13.0 1.0 34.0 760.0 85.0 6.0 ... 69.0 3.0 18.0 10.0 25.0
East Elmhurst  0.0 0.0 2.0 0.0 0.0 0.0 0.0 28.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
FAR ROCKAWAY  111.0 383.0 215.0 1.0 4.0 0.0 16.0 339.0 59.0 1.0 ... 83.0 0.0 11.0 1.0 10.0
FLORAL PARK  7.0 33.0 74.0 1.0 1.0 0.0 0.0 72.0 3.0 0.0 ... 2.0 0.0 0.0 0.0 0.0
FLUSHING  191.0 3640.0 532.0 2.0 47.0 6.0 26.0 2250.0 222.0 5.0 ... 147.0 2.0 59.0 12.0 37.0
FOREST HILLS  78.0 873.0 71.0 1.0 1.0 3.0 18.0 627.0 163.0 1.0 ... 70.0 6.0 65.0 2.0 10.0
FRESH MEADOWS  66.0 682.0 347.0 0.0 2.0 0.0 6.0 1158.0 21.0 0.0 ... 97.0 1.0 15.0 1.0 1.0
GLEN OAKS  5.0 48.0 57.0 0.0 0.0 0.0 0.0 95.0 84.0 0.0 ... 4.0 0.0 3.0 2.0 19.0
HOLLIS  39.0 442.0 162.0 1.0 3.0 0.0 9.0 181.0 54.0 215.0 ... 52.0 0.0 11.0 2.0 0.0
HOWARD BEACH  51.0 215.0 172.0 1.0 4.0 0.0 3.0 384.0 258.0 1.0 ... 10.0 2.0 9.0 0.0 5.0
Howard Beach  0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
JACKSON HEIGHTS  50.0 703.0 41.0 0.0 10.0 1.0 11.0 240.0 619.0 2.0 ... 75.0 1.0 13.0 3.0 86.0
JAMAICA  317.0 3620.0 1132.0 9.0 40.0 3.0 93.0 1698.0 552.0 15.0 ... 337.0 3.0 632.0 37.0 24.0
KEW GARDENS  26.0 429.0 16.0 0.0 1.0 0.0 5.0 276.0 203.0 1.0 ... 23.0 0.0 10.0 3.0 1.0
LITTLE NECK  21.0 174.0 73.0 2.0 1.0 0.0 0.0 322.0 77.0 0.0 ... 8.0 0.0 20.0 1.0 0.0
LONG ISLAND CITY  40.0 1052.0 220.0 2.0 8.0 3.0 10.0 987.0 269.0 0.0 ... 124.0 2.0 83.0 3.0 31.0
Long Island City  0.0 55.0 4.0 0.0 0.0 0.0 0.0 64.0 19.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
MASPETH  56.0 1000.0 510.0 2.0 9.0 1.0 11.0 1234.0 57.0 2.0 ... 26.0 0.0 71.0 2.0 7.0
MIDDLE VILLAGE  36.0 663.0 366.0 0.0 2.0 0.0 5.0 1104.0 13.0 0.0 ... 45.0 0.0 14.0 0.0 0.0
NEW HYDE PARK  1.0 76.0 14.0 0.0 0.0 0.0 0.0 32.0 4.0 0.0 ... 2.0 0.0 0.0 0.0 0.0
NEW YORK  1941.0 2705.0 695.0 81.0 321.0 25.0 3060.0 14549.0 18686.0 222.0 ... 6294.0 206.0 1769.0 264.0 2638.0
OAKLAND GARDENS  29.0 177.0 117.0 1.0 2.0 0.0 1.0 337.0 2.0 0.0 ... 7.0 0.0 6.0 0.0 2.0
OZONE PARK  72.0 1681.0 479.0 4.0 20.0 0.0 8.0 774.0 125.0 4.0 ... 81.0 7.0 21.0 4.0 1.0
QUEENS  1.0 3.0 2.0 0.0 0.0 0.0 2.0 10.0 6.0 1.0 ... 2.0 0.0 2.0 1.0 0.0
QUEENS VILLAGE  90.0 772.0 478.0 0.0 5.0 1.0 19.0 669.0 49.0 2.0 ... 54.0 1.0 27.0 5.0 2.0
REGO PARK  33.0 780.0 94.0 0.0 4.0 1.0 6.0 640.0 82.0 1.0 ... 60.0 0.0 16.0 1.0 3.0
RICHMOND HILL  55.0 1099.0 200.0 0.0 10.0 1.0 30.0 489.0 249.0 0.0 ... 69.0 0.0 8.0 5.0 15.0
RIDGEWOOD  154.0 2161.0 507.0 3.0 10.0 3.0 26.0 2235.0 491.0 2.0 ... 249.0 0.0 50.0 9.0 9.0
ROCKAWAY PARK  33.0 80.0 19.0 4.0 23.0 0.0 4.0 337.0 72.0 0.0 ... 29.0 0.0 7.0 1.0 2.0
ROSEDALE  44.0 270.0 247.0 0.0 2.0 2.0 4.0 326.0 28.0 2.0 ... 25.0 0.0 25.0 0.0 19.0
SAINT ALBANS  43.0 318.0 248.0 1.0 3.0 0.0 11.0 237.0 36.0 1.0 ... 50.0 0.0 14.0 1.0 2.0
SOUTH OZONE PARK  74.0 1202.0 425.0 2.0 14.0 2.0 5.0 602.0 82.0 5.0 ... 97.0 0.0 36.0 2.0 5.0
SOUTH RICHMOND HILL  40.0 1946.0 356.0 2.0 25.0 0.0 12.0 596.0 223.0 3.0 ... 93.0 0.0 12.0 1.0 24.0
SPRINGFIELD GARDENS  42.0 330.0 267.0 0.0 6.0 0.0 7.0 291.0 38.0 1.0 ... 48.0 2.0 12.0 3.0 1.0
STATEN ISLAND  786.0 2845.0 2184.0 25.0 188.0 6.0 77.0 6224.0 783.0 18.0 ... 424.0 13.0 229.0 19.0 25.0
SUNNYSIDE  40.0 278.0 17.0 2.0 12.0 1.0 12.0 167.0 238.0 0.0 ... 53.0 0.0 17.0 2.0 15.0
WHITESTONE  43.0 279.0 279.0 1.0 3.0 1.0 0.0 631.0 21.0 0.0 ... 31.0 0.0 32.0 0.0 1.0
WOODHAVEN  57.0 1363.0 369.0 0.0 4.0 0.0 10.0 896.0 209.0 3.0 ... 81.0 1.0 7.0 2.0 6.0
In [40]:
CS_Dataset.plot.bar(figsize=(14,10), stacked=True)
plt.ylabel('Number of Complaints')
plt.title('Number of complaints vs. City')
In [42]:
brooklyn = CS_Dataset.loc[CS_Dataset['City'] == 'BROOKLYN']
brooklyn
5       32306554  2015-12-31 23:56:30  2016-01-01 01:50:11  NYPD  New York City Police Department  Illegal Parking   Posted Parking Sign Violation  Street/Sidewalk  11215.0  260 21 STREET     ...  NaN  NaN  NaN  NaN
9       32308391  2015-12-31 23:53:58  2016-01-01 01:17:40  NYPD  New York City Police Department  Blocked Driveway  No Access                      Street/Sidewalk  11219.0  1408 66 STREET    ...  NaN  NaN  NaN  NaN
13      32305074  2015-12-31 23:47:58  2016-01-01 08:18:47  NYPD  New York City Police Department  Illegal Parking   Posted Parking Sign Violation  Street/Sidewalk  11208.0  38 COX PLACE      ...  NaN  NaN  NaN  NaN
...
362160  29612697  2015-01-01 00:19:22  2015-01-01 02:41:10  NYPD  New York City Police Department  Blocked Driveway  No Access                      Street/Sidewalk  11211.0  27 HOPE STREET    ...  NaN  NaN  NaN  NaN
362165  29613402  2015-01-01 00:15:45  2015-01-01 02:04:54  NYPD  New York City Police Department  Blocked Driveway  No Access                      Street/Sidewalk  11218.0  19 MICIELI PLACE  ...  NaN  NaN  NaN  NaN
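Task 2.3.7 asks for a scatter and a hexbin plot of complaint concentration across Brooklyn. A minimal sketch of how that could be drawn from the `Longitude` and `Latitude` columns follows. It runs on synthetic points, not on the `brooklyn` frame above: the name `brooklyn_demo`, the bounding box, and the styling parameters are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Brooklyn subset: random points in a rough, made-up bounding box
rng = np.random.default_rng(0)
brooklyn_demo = pd.DataFrame({
    "Longitude": rng.uniform(-74.04, -73.86, 500),
    "Latitude": rng.uniform(40.57, 40.74, 500),
})

fig, (ax_scatter, ax_hex) = plt.subplots(1, 2, figsize=(14, 6))

# Scatter plot: one small, semi-transparent marker per complaint
brooklyn_demo.plot.scatter(x="Longitude", y="Latitude", s=2, alpha=0.5, ax=ax_scatter)
ax_scatter.set_title("Complaints across Brooklyn (scatter)")

# Hexbin plot: colour encodes how many complaints fall into each hexagonal cell
brooklyn_demo.plot.hexbin(x="Longitude", y="Latitude", gridsize=30, cmap="viridis", ax=ax_hex)
ax_hex.set_title("Complaint concentration across Brooklyn (hexbin)")

fig.tight_layout()
```

Inside a notebook the `matplotlib.use("Agg")` line would normally be dropped so the figure renders inline, and `brooklyn_demo` would be replaced by the real subset after removing rows with missing coordinates.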
3.2 Check the frequency of various types of complaints for New York City
3.5 Create a DataFrame, df_new, which contains cities as columns and complaint types in rows
3.1 Plot a bar graph to show the types of complaints
In [45]:
#Complaint Types
CS_Dataset['Complaint Type'].unique()
In [46]:
# Display complaint types by counts
CS_Dataset['Complaint Type'].value_counts()
3.2 Check the frequency of various types of complaints for New York City
Display the complaint types for New York City only
In [48]:
New_York_City = CS_Dataset.loc[CS_Dataset['City'] == 'NEW YORK']
New_York_City
        Unique Key  Created Date         Closed Date          Agency  Agency Name                      Complaint Type           Descriptor                      Location Type    Incident Zip  Incident Address     ...  Road Ramp  Bridge Highway Segment  Garage Lot Name  Ferry Direction  Ferry Terminal Name
0       32310363    2015-12-31 23:59:45  2016-01-01 00:55:15  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party                Street/Sidewalk  10034.0       71 VERMILYEA AVENUE  ...  NaN        NaN                     NaN              NaN              NaN
23      32308765    2015-12-31 23:32:46  2016-01-01 00:25:21  NYPD    New York City Police Department  Illegal Parking          Double Parked Blocking Vehicle  Street/Sidewalk  10030.0       133 WEST 134 STREET  ...  NaN        NaN                     NaN              NaN              NaN
...
362166  29608295    2015-01-01 00:15:33  2015-01-01 00:56:37  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party                Street/Sidewalk  10002.0       LUDLOW STREET        ...  NaN        NaN                     NaN              NaN              NaN
362171  29610051    2015-01-01 00:05:05  2015-01-01 01:22:10  NYPD    New York City Police Department  Noise - Street/Sidewalk  Loud Music/Party                Street/Sidewalk  10002.0       NaN                  ...  NaN        NaN                     NaN              NaN              NaN
In [50]:
# Solution 2: use a seaborn count plot
plt.figure(figsize=(15,10))
sn.countplot(data=New_York_City,y='Complaint Type')
plt.title('Count by Complaint Type for New York City')
# With y='Complaint Type' the bars are horizontal, so counts run along the x-axis
plt.xlabel('Count')
plt.ylabel('Complaint Type')
In [53]:
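# cl is assumed to hold the city names, e.g. CS_Dataset['City'].dropna().unique()
for c in cl:
    df_c = CS_Dataset.loc[CS_Dataset['City'] == c]
    if df_c.empty:
        # Skip values with no matching rows (e.g. NaN); plotting an empty Series can raise IndexError
        continue
    df_c['Complaint Type'].value_counts().plot(kind='bar', figsize=(18, 10))
    plt.title(f"Count by Complaint Type for {c}")
    plt.xlabel('Complaint Type')
    plt.ylabel('Count')
    plt.show()
    print("=================================================================")
    print("/////////////////////////////////////////////////////////////////")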
[12 bar charts, one per city, each titled "Count by Complaint Type for <city>" with x-axis "Complaint Type" and y-axis "Count"; each chart is followed by the two separator lines below]
=================================================================
/////////////////////////////////////////////////////////////////
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-53-d02a9c8fb227> in <module>
      1 for c in cl:
      2     df_c = CS_Dataset.loc[CS_Dataset['City'] == c]
----> 3     df_c['Complaint Type'].value_counts().plot(kind='bar', figsize=(18, 10))
      4     plt.title(f"Count by Complaint Type for %s" %c)
      5     plt.xlabel('Complaint Type')
3.5 Create a DataFrame, df_new, which contains cities as columns and complaint types in rows
In [54]:
df_new = CS_Dataset[['Complaint Type', 'City']]
df_new
# as a data frame
In [55]:
df_new=df_new.set_index('Complaint Type')
df_new
Out[55]:                 City
         Complaint Type
         ...              ...
In [56]:
df_complainttypes=CS_Dataset.groupby(['City','Complaint Type']).size().unstack().fillna(0)
df_complainttypes
Out[56]:
Complaint Type  Animal Abuse  Blocked Driveway  Derelict Vehicle  Disorderly Youth  Drinking  Graffiti  Homeless Encampment  Illegal Parking  Noise - Commercial  Noise - House of Worship  ...  Noise - Vehicle  Panhandling  Traffic  Urinating in Public  Vending
City
ARVERNE  46.0 50.0 32.0 2.0 1.0 1.0 4.0 62.0 2.0 14.0 ... 10.0 1.0 1.0 1.0 1.0
ASTORIA  170.0 3436.0 426.0 5.0 43.0 4.0 32.0 1340.0 1653.0 21.0 ... 236.0 2.0 60.0 10.0 57.0
Astoria  0.0 159.0 14.0 0.0 0.0 0.0 0.0 277.0 310.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
BAYSIDE  53.0 514.0 231.0 2.0 1.0 3.0 2.0 638.0 47.0 3.0 ... 24.0 0.0 9.0 0.0 2.0
BELLEROSE  15.0 138.0 120.0 2.0 1.0 0.0 1.0 132.0 38.0 1.0 ... 11.0 1.0 9.0 1.0 0.0
BREEZY POINT  2.0 3.0 3.0 0.0 1.0 0.0 0.0 16.0 4.0 0.0 ... 1.0 0.0 0.0 0.0 0.0
BRONX  1971.0 17062.0 2402.0 66.0 206.0 15.0 275.0 9889.0 2944.0 90.0 ... 3556.0 20.0 427.0 54.0 433.0
BROOKLYN  3191.0 36445.0 6257.0 79.0 291.0 60.0 948.0 33532.0 13855.0 389.0 ... 5965.0 49.0 1258.0 155.0 575.0
CAMBRIA HEIGHTS  15.0 177.0 148.0 0.0 0.0 0.0 6.0 113.0 19.0 2.0 ... 100.0 0.0 7.0 0.0 0.0
CENTRAL PARK  0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
COLLEGE POINT  35.0 597.0 223.0 1.0 1.0 2.0 3.0 449.0 38.0 2.0 ... 140.0 0.0 16.0 0.0 1.0
CORONA  104.0 3597.0 72.0 6.0 34.0 4.0 26.0 791.0 281.0 3.0 ... 110.0 1.0 14.0 7.0 65.0
EAST ELMHURST  85.0 1925.0 136.0 1.0 9.0 3.0 2.0 1092.0 41.0 25.0 ... 82.0 0.0 24.0 6.0 9.0
ELMHURST  59.0 1992.0 94.0 2.0 13.0 1.0 34.0 760.0 85.0 6.0 ... 69.0 3.0 18.0 10.0 25.0
East Elmhurst  0.0 0.0 2.0 0.0 0.0 0.0 0.0 28.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
FAR ROCKAWAY  111.0 383.0 215.0 1.0 4.0 0.0 16.0 339.0 59.0 1.0 ... 83.0 0.0 11.0 1.0 10.0
FLORAL PARK  7.0 33.0 74.0 1.0 1.0 0.0 0.0 72.0 3.0 0.0 ... 2.0 0.0 0.0 0.0 0.0
FLUSHING  191.0 3640.0 532.0 2.0 47.0 6.0 26.0 2250.0 222.0 5.0 ... 147.0 2.0 59.0 12.0 37.0
FOREST HILLS  78.0 873.0 71.0 1.0 1.0 3.0 18.0 627.0 163.0 1.0 ... 70.0 6.0 65.0 2.0 10.0
FRESH MEADOWS  66.0 682.0 347.0 0.0 2.0 0.0 6.0 1158.0 21.0 0.0 ... 97.0 1.0 15.0 1.0 1.0
GLEN OAKS  5.0 48.0 57.0 0.0 0.0 0.0 0.0 95.0 84.0 0.0 ... 4.0 0.0 3.0 2.0 19.0
HOLLIS  39.0 442.0 162.0 1.0 3.0 0.0 9.0 181.0 54.0 215.0 ... 52.0 0.0 11.0 2.0 0.0
HOWARD BEACH  51.0 215.0 172.0 1.0 4.0 0.0 3.0 384.0 258.0 1.0 ... 10.0 2.0 9.0 0.0 5.0
Howard Beach  0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
JACKSON HEIGHTS  50.0 703.0 41.0 0.0 10.0 1.0 11.0 240.0 619.0 2.0 ... 75.0 1.0 13.0 3.0 86.0
JAMAICA  317.0 3620.0 1132.0 9.0 40.0 3.0 93.0 1698.0 552.0 15.0 ... 337.0 3.0 632.0 37.0 24.0
KEW GARDENS  26.0 429.0 16.0 0.0 1.0 0.0 5.0 276.0 203.0 1.0 ... 23.0 0.0 10.0 3.0 1.0
LITTLE NECK  21.0 174.0 73.0 2.0 1.0 0.0 0.0 322.0 77.0 0.0 ... 8.0 0.0 20.0 1.0 0.0
LONG ISLAND CITY  40.0 1052.0 220.0 2.0 8.0 3.0 10.0 987.0 269.0 0.0 ... 124.0 2.0 83.0 3.0 31.0
Long Island City  0.0 55.0 4.0 0.0 0.0 0.0 0.0 64.0 19.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
MASPETH  56.0 1000.0 510.0 2.0 9.0 1.0 11.0 1234.0 57.0 2.0 ... 26.0 0.0 71.0 2.0 7.0
MIDDLE VILLAGE  36.0 663.0 366.0 0.0 2.0 0.0 5.0 1104.0 13.0 0.0 ... 45.0 0.0 14.0 0.0 0.0
NEW HYDE PARK  1.0 76.0 14.0 0.0 0.0 0.0 0.0 32.0 4.0 0.0 ... 2.0 0.0 0.0 0.0 0.0
NEW YORK  1941.0 2705.0 695.0 81.0 321.0 25.0 3060.0 14549.0 18686.0 222.0 ... 6294.0 206.0 1769.0 264.0 2638.0
OAKLAND GARDENS  29.0 177.0 117.0 1.0 2.0 0.0 1.0 337.0 2.0 0.0 ... 7.0 0.0 6.0 0.0 2.0
OZONE PARK  72.0 1681.0 479.0 4.0 20.0 0.0 8.0 774.0 125.0 4.0 ... 81.0 7.0 21.0 4.0 1.0
QUEENS  1.0 3.0 2.0 0.0 0.0 0.0 2.0 10.0 6.0 1.0 ... 2.0 0.0 2.0 1.0 0.0
QUEENS VILLAGE  90.0 772.0 478.0 0.0 5.0 1.0 19.0 669.0 49.0 2.0 ... 54.0 1.0 27.0 5.0 2.0
REGO PARK  33.0 780.0 94.0 0.0 4.0 1.0 6.0 640.0 82.0 1.0 ... 60.0 0.0 16.0 1.0 3.0
RICHMOND HILL  55.0 1099.0 200.0 0.0 10.0 1.0 30.0 489.0 249.0 0.0 ... 69.0 0.0 8.0 5.0 15.0
RIDGEWOOD  154.0 2161.0 507.0 3.0 10.0 3.0 26.0 2235.0 491.0 2.0 ... 249.0 0.0 50.0 9.0 9.0
ROCKAWAY PARK  33.0 80.0 19.0 4.0 23.0 0.0 4.0 337.0 72.0 0.0 ... 29.0 0.0 7.0 1.0 2.0
ROSEDALE  44.0 270.0 247.0 0.0 2.0 2.0 4.0 326.0 28.0 2.0 ... 25.0 0.0 25.0 0.0 19.0
SAINT ALBANS  43.0 318.0 248.0 1.0 3.0 0.0 11.0 237.0 36.0 1.0 ... 50.0 0.0 14.0 1.0 2.0
SOUTH OZONE PARK  74.0 1202.0 425.0 2.0 14.0 2.0 5.0 602.0 82.0 5.0 ... 97.0 0.0 36.0 2.0 5.0
SOUTH RICHMOND HILL  40.0 1946.0 356.0 2.0 25.0 0.0 12.0 596.0 223.0 3.0 ... 93.0 0.0 12.0 1.0 24.0
SPRINGFIELD GARDENS  42.0 330.0 267.0 0.0 6.0 0.0 7.0 291.0 38.0 1.0 ... 48.0 2.0 12.0 3.0 1.0
STATEN ISLAND  786.0 2845.0 2184.0 25.0 188.0 6.0 77.0 6224.0 783.0 18.0 ... 424.0 13.0 229.0 19.0 25.0
SUNNYSIDE  40.0 278.0 17.0 2.0 12.0 1.0 12.0 167.0 238.0 0.0 ... 53.0 0.0 17.0 2.0 15.0
WHITESTONE  43.0 279.0 279.0 1.0 3.0 1.0 0.0 631.0 21.0 0.0 ... 31.0 0.0 32.0 0.0 1.0
WOODHAVEN  57.0 1363.0 369.0 0.0 4.0 0.0 10.0 896.0 209.0 3.0 ... 81.0 1.0 7.0 2.0 6.0
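Task 3.5 asks for cities as columns and complaint types in rows, whereas the table above has cities as rows, and some cities appear twice under different capitalisation (for example ASTORIA and Astoria). One possible way to get the requested orientation while merging those spellings is sketched below with `pd.crosstab`. The tiny input frame and the name `df_new_demo` are made up for illustration, and upper-casing the city names is an assumption about how the duplicates should be merged.

```python
import pandas as pd

# Made-up sample containing the same city under two capitalisations
df = pd.DataFrame({
    "City": ["ASTORIA", "Astoria", "BROOKLYN", "BROOKLYN"],
    "Complaint Type": ["Blocked Driveway", "Blocked Driveway",
                       "Illegal Parking", "Blocked Driveway"],
})

# Normalise the spelling so 'Astoria' and 'ASTORIA' count as one city
city = df["City"].str.upper()

# Rows: complaint types, columns: cities, values: number of complaints
df_new_demo = pd.crosstab(df["Complaint Type"], city)
print(df_new_demo)
```

Transposing the existing result (`df_complainttypes.T`) would give the same orientation without the case normalisation.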
In [57]:
df_complainttypes.plot.bar(figsize=(15,10), stacked=True, colormap='Paired')
plt.ylabel('Number of Complaints')
plt.title('Number of complaints vs. City')
In [59]:
CS_Dataset.columns
Out[59]: Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
'Intersection Street 1', 'Intersection Street 2', 'Address Type',
'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
'Resolution Description', 'Resolution Action Updated Date',
'Community Board', 'Borough', 'X Coordinate (State Plane)',
'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
'School Name', 'School Number', 'School Region', 'School Code',
'School Phone Number', 'School Address', 'School City', 'School State',
'School Zip', 'School Not Found', 'School or Citywide Complaint',
'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location',
'Time Elapsed1', 'Time Elapsed2'],
dtype='object')
In [60]:
cl
In [61]:
df_groupby_city = CS_Dataset.groupby(['City','Complaint Type'])['Time Elapsed2'].mean()
df_groupby_location = CS_Dataset.groupby(['Location','Complaint Type'])['Time Elapsed2'].mean()
print(df_groupby_location)
In [62]:
# group by City
print(df_groupby_city)
5. See whether the average response time across different complaint types is
similar (overall)
5.1 Visualize the average of Request_Closing_Time
In [63]:
# Resolution time according to complaint type
CS_Dataset.groupby('Complaint Type')['Time Elapsed2'].mean().sort_values()
Complaint Type
Drinking 1.382130e+04
Graffiti 2.327634e+04
Panhandling 1.585355e+04
Squeegee 1.456025e+04
Traffic 1.230912e+04
Vending 1.436628e+04
In [65]:
# df_mrt holds the mean 'Time Elapsed2' per complaint type, which is measured in seconds
df_mrt.plot(kind='bar', figsize=(15, 10))
plt.title('Average Response Time by Complaint Type')
plt.xlabel('Complaint Type')
plt.ylabel('Average Response Time in seconds')
plt.show()
In [66]:
# another option
plt.figure(figsize=(15,8))
plt.xticks(rotation = 45)
sn.boxplot(data=CS_Dataset, x="Time Elapsed2", y="Complaint Type")
281061  30427220  2015-04-18 09:44:55  2015-05-02 10:35:29  NYPD  New York City Police Department  Animal in a Park  Animal Waste  Park  NaN  NaN  ...  NaN  NaN  NaN  NaN  NaN
1 rows × 55 columns
This complaint type has only one record, so remove it.
In [68]:
CS_Dataset.drop(labels=281061, axis=0, inplace=True)
animal=CS_Dataset[CS_Dataset['Complaint Type']=='Animal in a Park']
animal
0 rows × 55 columns
Complaint Type
Drinking 13821.300570
Graffiti 23276.343949
Panhandling 15853.550769
Squeegee 14560.250000
Traffic 12309.120092
Vending 14366.278375
In [70]:
df_mrt.plot(kind='bar', figsize=(15, 10))
plt.title('Average Response Time by Complaint Type')
plt.xlabel('Complaint Type')
# 'Time Elapsed2' is measured in seconds, so the averages are in seconds too
plt.ylabel('Average Response Time in seconds')
plt.show()
6. Identify the significant variables by performing statistical analysis using
p-values
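One common way to obtain such a p-value is a one-way ANOVA that compares response times across complaint types. The sketch below is only an illustration under stated assumptions: it relies on SciPy's `scipy.stats.f_oneway`, uses randomly generated log response times for three hypothetical complaint types rather than the real data, and applies the conventional 0.05 threshold. Nothing here is confirmed to be the method used in this notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Made-up log response times for three hypothetical complaint types
samples = {
    "Type A": rng.normal(loc=8.0, scale=1.0, size=200),
    "Type B": rng.normal(loc=9.0, scale=1.0, size=200),
    "Type C": rng.normal(loc=9.5, scale=1.0, size=200),
}

# One-way ANOVA: the null hypothesis is that all groups share the same mean
f_stat, p_value = stats.f_oneway(*samples.values())
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")

# A small p-value suggests the mean response time differs between complaint types
significant = p_value < 0.05
```

With real data, each group would be the log of `Time Elapsed2` for one complaint type, after dropping zero or negative durations so the logarithm is defined.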
In [71]:
print(CS_Dataset.columns)
print(CS_Dataset.info())
In [73]:
# Drop columns with 0 non-null values
CS_Dataset = CS_Dataset.drop(columns=[
    'Ferry Direction', 'Ferry Terminal Name', 'Garage Lot Name',
    'School or Citywide Complaint', 'Vehicle Type',
    'Taxi Company Borough', 'Taxi Pick Up Location',
])
In [75]:
print(CS_Dataset.columns)
print(CS_Dataset.info())
In [77]:
# Drop all columns that contain a single repeated value
for c in columns_with_same_values:
    CS_Dataset = CS_Dataset.drop(c, axis=1)
CS_Dataset.columns
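The cell that builds `columns_with_same_values` is not shown in this export. A minimal sketch of one way such a list could be derived is given below, using `nunique` on each column. The sample frame and the name `constant_cols` are illustrative assumptions, not the notebook's own code.

```python
import pandas as pd

# Made-up sample: 'Agency' and 'Facility Type' never vary, 'City' does
df = pd.DataFrame({
    "Agency": ["NYPD", "NYPD", "NYPD"],
    "City": ["BROOKLYN", "NEW YORK", "BRONX"],
    "Facility Type": ["Precinct", "Precinct", "Precinct"],
})

# A column is constant when it has at most one distinct value (NaN counted as a value)
constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]

# Drop all constant columns in a single call
df = df.drop(columns=constant_cols)
print(constant_cols, list(df.columns))
```

Passing `dropna=False` means a column that mixes one value with missing entries is kept; whether that is desirable depends on how missing values should be treated.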
In [78]:
print(CS_Dataset.columns)
print(CS_Dataset.info())
In [80]:
CS_Dataset=pd.read_csv("Customer _Service_Requests1.csv")
In [81]:
print(CS_Dataset.columns)
print(CS_Dataset.info())
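# Log-transformed response times, keyed by complaint type
df_ct={}
for t in CS_Dataset['Complaint Type'].unique():
    df_ct[t]= np.log(CS_Dataset[CS_Dataset['Complaint Type']==t]['Time Elapsed2'])
df_ct
# Collect the per-type Series into a list, one entry per complaint type
lis = []
for t in CS_Dataset['Complaint Type'].unique():
    lis.append(df_ct[t])
lis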
Out[84]: [0 8.110728
12 9.104535
19 7.957177
38 7.477604
54 8.596928
...
362161 9.278186
362165 7.809541
362169 7.722678
362170 8.439232
362173 7.041412
Name: Time Elapsed2, Length: 51139, dtype: float64,
1 8.562740
2 9.769613
7 8.784009
9 8.521584
10 10.244236
...
362166 9.130106
362167 8.338306
362168 9.976506
362174 9.175024
362175 9.212338
Name: Time Elapsed2, Length: 100624, dtype: float64,
3 10.237349
4 9.430600
5 8.827761
6 8.868132
8 10.335724
...
362103 9.689056
362116 10.335530
362122 8.093157
362148 8.932741
362171 10.520482
Name: Time Elapsed2, Length: 91716, dtype: float64,
14 10.539667
151 9.562686
255 8.499640
256 9.607706
295 7.905442
...
361859 10.368133
361879 9.545812
361900 10.846849
361946 9.070618
362095 8.919587
Name: Time Elapsed2, Length: 21518, dtype: float64,
17 8.032035
18 9.267571
22 8.433377
29 9.106423
30 8.876684
...
362144 6.480045
362146 9.551658
362152 10.217751
362156 7.097549
362162 9.325453
Name: Time Elapsed2, Length: 43751, dtype: float64,
26 7.383989
127 8.693161
572 8.419801
2639 8.149024
3057 9.076466
...
359937 8.159089
360351 9.506139
361093 7.627544
361539 8.231376
361586 8.889446
Name: Time Elapsed2, Length: 1068, dtype: float64,
39 8.941545
42 8.961623
46 8.973732
49 8.993055
51 9.005037
...
348685 7.629490
349016 7.188413
355089 9.321703
355813 8.480114
357309 7.775696
Name: Time Elapsed2, Length: 679, dtype: float64,
87 10.042989
156 8.889308
172 9.334238
221 9.299907
319 7.869019
...
361902 9.424161
361986 8.010028
361987 8.889033
362113 9.702411
362172 9.040026
Name: Time Elapsed2, Length: 19301, dtype: float64,
89 7.358831
140 8.344267
164 8.999002
189 9.166179
247 8.357494
...
361916 10.053587
361941 9.554639
361959 10.150621
362024 9.335739
362063 7.948385
Name: Time Elapsed2, Length: 10530, dtype: float64,
98 9.340228
142 9.919213
341 10.031001
375 8.921458
393 7.556951
...
361818 8.562549
361821 8.611230
361822 7.335634
361856 9.136909
361931 8.718009
Name: Time Elapsed2, Length: 4185, dtype: float64,
130 9.685518
311 9.065661
334 7.737180
336 7.844633
337 9.091444
...
361066 9.526683
361133 7.950502
361172 8.676076
361669 10.034121
361674 9.814110
Name: Time Elapsed2, Length: 5196, dtype: float64,
180 7.936303
466 7.544861
644 9.455402
679 6.642487
796 9.373649
...
361618 7.939515
361657 8.677440
361731 9.782223
361978 8.814628
362020 6.612041
Name: Time Elapsed2, Length: 1404, dtype: float64,
313 9.444147
1843 8.537192
3771 10.296138
3837 8.637817
6296 7.650169
...
353797 9.111514
353883 8.483223
358480 10.610316
360463 8.520787
361223 7.004882
Name: Time Elapsed2, Length: 475, dtype: float64,
374 9.451795
2070 9.501741
3036 8.731498
3913 8.537584
5506 10.401137
...
353289 9.553788
353680 10.699620
355894 9.054972
355961 8.417152
360033 10.706520
Name: Time Elapsed2, Length: 325, dtype: float64,
389 6.259581
592 7.691657
1355 10.698650
3151 11.027833
3818 8.833317
...
355354 10.357616
360857 7.234898
360886 7.150701
360892 7.149132
361942 9.274348
Name: Time Elapsed2, Length: 4089, dtype: float64,
392 7.895808
434 9.747418
458 8.978787
562 8.733272
565 9.342070
...
361334 10.170189
361404 7.446001
361654 8.111928
361793 9.660333
361972 9.258273
Name: Time Elapsed2, Length: 4879, dtype: float64,
585 9.949034
652 8.065265
1148 9.785605
4757 8.980550
6057 8.679142
...
359137 7.478735
360513 8.077758
360603 10.140179
361042 9.118992
361707 6.697034
Name: Time Elapsed2, Length: 641, dtype: float64,
2314 8.656433
6278 10.036094
9391 9.700147
23225 9.107865
25588 9.981143
...
354684 9.429637
359177 8.313117
359735 9.194821
360556 9.249753
360868 11.137039
Name: Time Elapsed2, Length: 157, dtype: float64,
4651 6.569481
8984 8.434898
11959 7.760041
12107 9.873801
17080 8.831858
...
347643 8.983942
347667 7.387090
348140 8.664406
361071 9.825472
362140 7.900266
Name: Time Elapsed2, Length: 315, dtype: float64,
23390 7.039660
34023 10.682881
37725 8.586719
62401 7.298445
64404 9.367173
...
276854 7.443664
303944 8.970559
315108 9.393162
332779 10.740584
362154 8.109225
Name: Time Elapsed2, Length: 172, dtype: float64,
184636 10.206920
186456 10.528918
205740 9.183586
238330 10.113992
244503 8.312135
277038 9.271247
300720 10.335854
320102 7.550135
Name: Time Elapsed2, dtype: float64,
187686 9.997752
213914 10.103649
280566 8.353497
295549 8.934192
Name: Time Elapsed2, dtype: float64]
In [85]:
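from scipy.stats import f_oneway

# one-way ANOVA on log response time across complaint types
F, p = f_oneway(*lis)
print("p-value for significance is: ", p)
if p < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")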
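The one-way ANOVA above compares mean log response time across complaint types. The following sketch shows the same test on synthetic data, building the groups with `groupby` rather than a dictionary. The frame `toy`, the column `log_elapsed`, and all numbers are assumptions made up for illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Synthetic stand-in for CS_Dataset (values are made up for illustration).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Complaint Type": np.repeat(
        ["Blocked Driveway", "Illegal Parking", "Graffiti"], 200
    ),
    "Time Elapsed2": np.concatenate([
        rng.lognormal(mean=8.0, sigma=0.5, size=200),
        rng.lognormal(mean=9.0, sigma=0.5, size=200),
        rng.lognormal(mean=9.5, sigma=0.5, size=200),
    ]),
})

# Log-transform once, then split into one array per complaint type.
toy["log_elapsed"] = np.log(toy["Time Elapsed2"])
groups = [g.to_numpy() for _, g in toy.groupby("Complaint Type")["log_elapsed"]]

# H0: all complaint types share the same mean log response time.
F, p = f_oneway(*groups)
alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"F = {F:.2f}, p = {p:.3g}: {decision}")
```

A p-value above the threshold would only mean the data do not contradict H0, which is why the non-significant branch reads "fail to reject" rather than "accept".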
Let's check for any correlation between Complaint Type and Location
In [86]:
df_loc= CS_Dataset[['Complaint Type','Location','City','Borough']]
ccolumns = df_loc.describe(include="O").columns
ccolumns.tolist()
In [87]:
# label encoding: replace each category with its integer code
for col in ccolumns:
    df_loc[col] = df_loc[col].astype("category").cat.codes
df_loc
<ipython-input-87-910586728146>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
        Complaint Type  Location  City  Borough
0                   14    139904    33        2
1                    3    114140     1        3
2                    3    141251     6        0
3                   10    130296     6        0
4                   10     88722    13        3
...                ...       ...   ...      ...
362171              10     63918    50        3
362172              15    140565     6        0
362173              14    125130    33        2
362174               3    144566     6        0
362175               3     45957    44        3
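The SettingWithCopyWarning shown above is raised because `df_loc` is a column slice of `CS_Dataset`. A sketch of how an explicit copy avoids it, using a small made-up frame (`toy` and `toy_loc` are hypothetical names):

```python
import pandas as pd

# Hypothetical stand-in for CS_Dataset.
toy = pd.DataFrame({
    "Complaint Type": ["Noise", "Graffiti", "Noise", "Squeegee"],
    "Borough": ["QUEENS", "BROOKLYN", "BRONX", "QUEENS"],
    "Unique Key": [1, 2, 3, 4],
})

# .copy() yields an independent frame, so assigning into it does not warn
# and the original frame is left untouched.
toy_loc = toy[["Complaint Type", "Borough"]].copy()

for col in toy_loc.columns:
    # Categories are sorted and numbered from 0; the codes are arbitrary labels.
    toy_loc[col] = toy_loc[col].astype("category").cat.codes

print(toy_loc)
```

Because the integer codes only reflect sort order of the category names, their numeric size carries no meaning of its own.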
In [88]:
cor=df_loc.corr(method='pearson')
cor
In [89]:
### None of the values is significant enough to establish a correlation between Complaint Type and location
Out[89]: <AxesSubplot:>
7.2 Reject H0: One or more sample distributions are not equal
In [90]:
# collect one array of Time Elapsed2 values per complaint type
l = []
for t in CS_Dataset['Complaint Type'].unique():
    l.append(CS_Dataset[CS_Dataset['Complaint Type'] == t]['Time Elapsed2'].values)
In [91]:
from scipy import stats
stats.kruskal(*l)
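The Kruskal-Wallis test is rank-based, so it can be run on the raw response times without a log transform. A minimal sketch on synthetic, skewed data (the frame `toy` and its numbers are assumptions for illustration only):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, right-skewed response times for three complaint types.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "Complaint Type": np.repeat(
        ["Blocked Driveway", "Illegal Parking", "Graffiti"], 150
    ),
    "Time Elapsed2": np.concatenate([
        rng.exponential(scale=3000, size=150),
        rng.exponential(scale=9000, size=150),
        rng.exponential(scale=20000, size=150),
    ]),
})

# One array of raw values per complaint type.
samples = [g.to_numpy() for _, g in toy.groupby("Complaint Type")["Time Elapsed2"]]

# H0: all samples come from the same distribution.
H, p = stats.kruskal(*samples)
print(f"H = {H:.2f}, p = {p:.3g}")
```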
In [92]:
# Assuming a significance level of 5% (alpha = 0.05): p-value < 0.05,
# so the result is in the critical region, i.e. one or more sample distributions are not equal.
# Average response time differs across complaint types.
# Try a chi-square test for Complaint Type vs City
df_ct
City                       ARVERNE  ASTORIA  Astoria  BAYSIDE  BELLEROSE  BREEZY  BRONX  BROOKLYN  CAMBRIA  CENTRAL  ...  SOUTH     SOUTH  SPRINGFIELD  STATEN
                                                                           POINT                   HEIGHTS     PARK       OZONE  RICHMOND      GARDENS  ISLAND
                                                                                                                           PARK      HILL
Complaint Type
Bike/Roller/Skate Chronic        0       16        0        0          1       0     22       124        0        0  ...      1         1            0      10
Blocked Driveway                50     3436      159      514        138       3  17062     36445      177        0  ...   1202      1946          330    2845
Derelict Vehicle                32      426       14      231        120       3   2402      6257      148        0  ...    425       356          267    2184
Disorderly Youth                 2        5        0        2          2       0     66        79        0        0  ...      2         2            0      25
Graffiti                         1        4        0        3          0       0     15        60        0        0  ...      2         0            0       6
Homeless Encampment              4       32        0        2          1       0    275       948        6        0  ...      5        12            7      77
Illegal Fireworks                0        4        0        0          1       0     24        61        1        0  ...      1         2            1      11
Illegal Parking                 62     1340      277      638        132      16   9889     33532      113        5  ...    602       596          291    6224
Noise - Commercial               2     1653      310       47         38       4   2944     13855       19        0  ...     82       223           38     783
Noise - House of Worship        14       21        0        3          1       0     90       389        2        0  ...      5         3            1      18
Noise - Street/Sidewalk         29      409      145       17         13       1   9144     13982       29      105  ...    108        93           42     885
Panhandling                      1        2        0        0          1       0     20        49        0        0  ...      0         0            2      13
Posting Advertisement            0        3        0        0          1       0     18        58        0        0  ...      1         0            2     516
Squeegee                         0        0        0        0          0       0      0         0        0        0  ...      0         0            0       0
Urinating in Public              1       10        0        0          1       0     54       155        0        0  ...      2         1            3      19
In [93]:
chi2, p, dof, ex = stats.chi2_contingency(df_ct)
p value 0.0
A p-value of 0.0 rejects the null hypothesis of independence, so the two variables (Complaint Type and City) are NOT independent
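The cell that builds the Complaint Type by City table is not shown here. One way such a table could be produced is `pd.crosstab`. The sketch below applies it and the chi-square test to a small made-up frame (`toy`, `table`, and the counts are illustrative assumptions):

```python
import pandas as pd
from scipy import stats

# Hypothetical data: complaint mix differs between the two cities.
toy = pd.DataFrame({
    "Complaint Type": ["Blocked Driveway"] * 60 + ["Illegal Parking"] * 60,
    "City": ["BROOKLYN"] * 45 + ["BRONX"] * 15
            + ["BROOKLYN"] * 20 + ["BRONX"] * 40,
})

# Contingency table: rows are complaint types, columns are cities.
table = pd.crosstab(toy["Complaint Type"], toy["City"])
print(table)

# H0: Complaint Type and City are independent.
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3g}")
```

`chi2_contingency` raises a `ValueError` when any expected frequency is zero, so rows or columns whose counts are all zero need to be removed before calling it.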