
A Spatial Analysis of Assaults in Chicago, Illinois

Ewa Kielasinska

21 March 2014

GISC9308-D4b

Mr. Ian D. Smith, OLS, OLIP
Environmental Studies c/o Niagara College
135 Taylor Road
Niagara-on-the-Lake, ON L0S 1J0

Dear Mr. Smith,

RE: A Spatial Analysis of Assault in Chicago, Illinois, 2011

Please accept this as the final deliverable for GISC 9308 Spatial Statistics. This deliverable consists of two parts, the first of which was delivered to you on February 3rd, 2014. This report is the second and final portion.

The purpose of this assignment was to gather and report on a set of spatial data (Smith, 2014). This report focuses on the incidence of assault in Chicago, Illinois in 2011. The data were collected from the City of Chicago Data Portal. They were first explored and then processed before being used to create two predictive surfaces, one by Inverse Distance Weighting and one by Kriging. After comparing the two models with each other and with a Hot Spot Analysis, it was determined that Kriging produced the more favourable result from the point data. Additional efforts were made to draw inferences based on demographic characteristics of the City.

I look forward to receiving your comments.

Kindest Regards,

Ewa Kielasinska, M.A.
GIS Analyst, Parallel Spatial Solutions

EK/
Enclosures: A Spatial Analysis of Assault in Chicago, Illinois [Report]


Contents
1. Introduction
2. Purpose
3. Background
3.1. Study Area
3.2. Social Environment
3.3. Defining Assault
4. Methodology
4.1. Data
4.1.1. Spatial Data
4.1.2. Demographic Data
4.1.3. Assault Data
4.2. Analysis and Results
4.2.1. Data Exploration
4.2.2. Inverse Distance Weighting
4.2.3. Kriging
4.2.4. Hot Spot Analysis
5. Discussion
6. Conclusions
Works Cited

Figures
Figure 1: City of Chicago, Illinois - Basemap with overlayed Census Tracts
Figure 2: Individual Assault Events - Chicago, Illinois, 2011
Figure 3: Assaults per census tract, 2011 (Equal Interval)
Figure 4: Assaults per census tract, 2011 (Natural Breaks)
Figure 5: Integrated assault events, 500 m
Figure 6: Collected assault events, 500 m
Figure 7: Distribution of collected events
Figure 8: Normal QQ plot of collected events
Figure 9: Semivariogram cloud for collected events
Figure 10: Directionality in the semivariogram
Figure 11: Inverse Distance Weighted Surface, Collected Events 500m
Figure 12: Error for the predicted IDW surface
Figure 13: Ordinary Kriging Surface, 500m Collected Events
Figure 14: Kriged (ordinary) semivariogram
Figure 15: Kriged (ordinary) covariance
Figure 16: Cross-validated Kriging Model
Figure 17: Hot Spot Analysis
Figure 18: IDW vs. HSA
Figure 19: Kriging vs. HSA
Figure 20: Proportion of the population identifying as White, 2010 US Census (Equal Interval)
Figure 21: Proportion of the population identifying as Black, 2010 US Census (Equal Interval)

1. Introduction
Inverse Distance Weighting and Kriging are two of the ways that Geographic Information Systems (GIS) users can turn point data into predictive surfaces. Predictive surfaces use interpolation to estimate values at unsampled locations from known points with meaningful data attached to them, allowing inferences to be made across a study area. This study considers the incidence of assault in the City of Chicago, Illinois for 2011. Two predictive surfaces are created and compared. A Hot Spot Analysis is also conducted and compared to each of the interpolated surfaces. Additionally, a short discussion of possible social factors driving a phenomenon such as assault is included.

2. Purpose
The goal of this report is to highlight the spatial patterning of violent crime, specifically assault, in the City of Chicago, based on data collected for 2011. Using methods of spatial interpolation, the analysis aims to communicate whether spatial patterning of assaults exists in the City. It also attempts to make inferences about the social characteristics of place that may be driving those patterns, should they exist.

3. Background
3.1. Study Area
The City of Chicago is located on the south-western shore of Lake Michigan, in the north-eastern corner of the State of Illinois. Illinois is bordered by Wisconsin, Indiana, Iowa, Missouri, and Kentucky. The census population of the City was 2,695,598 in 2010, with an estimated population density of 11,842 persons per square mile (Chicago QuickFacts, 2014). The City of Chicago is made up of 801 census tracts and has a surface area of approximately 235 square miles. Figure 1 shows where Chicago is situated, as well as its census tract boundaries.


Figure 1: City of Chicago, Illinois - Basemap with overlayed Census Tracts

3.2. Social Environment


The population of the City of Chicago was an estimated 2,705,248 people in 2011 (American FactFinder, 2013). It is also estimated that the population of the City swells by approximately 200,000 every day as people commute into the city for work (Chicago, Illinois, 2013). The average age of a Chicago resident is 32.9 years, nearly ten years younger than the average age for Illinois. The population is primarily Caucasian, African-American, and Asian; together these groups account for nearly 95% of the City's residents (Chicago, Illinois, 2013). Eighty percent of the City's inhabitants over the age of 25 have graduated from high school, and 33% have earned a Bachelor's degree or higher (Chicago, Illinois, 2013). The median household income for the City's residents is below that of the state, at approximately $43,000 compared to $53,000. It is reported that approximately 24% of Chicago's population is living in poverty, with a breakdown of 11% Caucasian, 34% African-American, 26% Hispanic or Latino, and 29% Other (Chicago, Illinois, 2013).


3.3. Defining Assault


For the purposes of this analysis, all types of assault are treated as one and the same. However, the data received from the City of Chicago are highly detailed and include a specific definition of assault for each event. The types of assault considered by the City of Chicago include:

Criminal Sexual Assault: Any act of a sexual nature directed against another person, forcibly or against that person's will, or in cases where the other person is incapable of giving consent. This includes acts against children by family members, and can be aggravated or non-aggravated, predatory, and with or without a weapon (Definition & Description of Crimes, 2014).

Aggravated Assault: Any attack by one person on another where the offender threatens the victim with a weapon, i.e. handgun, knife, other firearm, or other dangerous weapon (Definition & Description of Crimes, 2014).

Simple Assault: Any physical attack by one person on another where neither of those involved experiences any obvious sign of injury. Such injuries might include, but are not limited to, loss of teeth, broken bones, severe lacerations, or internal injury (Definition & Description of Crimes, 2014).

4. Methodology
4.1. Data
The data used for this analysis come from a number of sources, depending on their purpose. Each dataset is discussed individually.

4.1.1. Spatial Data


For the purposes of this study, the investigation of assaults in Chicago occurs at the census tract level. Census tract data were collected from the United States Census Bureau (TIGERLine, 2014). Census tracts are relatively stable statistical divisions of a larger area, such as a county or state, and are the basic geographic unit by which census data are collected in the United States (2010 Geographic Terms and Concepts, 2014). This level of geography is used so that any patterns in assault identified in this analysis can be compared against demographic data and possible relationships inferred.

4.1.2. Demographic Data


The census tract data discussed in Section 4.1.1 came with a variety of census attributes. Each census tract has associated demographic information from the 2010 United States Census. This census occurs at a ten-year interval and collects information such as race, income, household status, number of people per household, and more. For the purpose of this analysis, demographic data are important for making possible relational inferences as to why spatial patterning of assault events may occur. There is evidence throughout the literature that supports these relationships (Kielasinska, 2012).

4.1.3. Assault Data


The data used in this analysis were collected from the City of Chicago Data Portal (Crimes 2011, 2011). This is an open collection of data providing specific information on the crimes reported in 2011, including, but not limited to, the associated case number, type of offence, description, and whether an arrest was made. Most importantly, each incident is accompanied by a coordinate, which allows interested groups to conduct geospatial analyses. For the purpose of this report, the focus remains on assault incidents occurring in 2011. The total number of events for 2011 was 20,091. Seventeen events were omitted from this analysis because they did not have associated geographic coordinates, leaving a total of 20,074 assaults.

There are inherent errors built into these data, however. The data represent only those events which were formally reported to police. Events not reported to police are not included in this analysis, so the data may present fewer events than occurred in reality. This may be especially damaging if there is consistent underreporting of crime in general, and assault more specifically, in certain parts of the city. Unfortunately, given the nature of the events being explored, there is little that can be done to account for underreporting.

Based on the 20,091 individual assault events reported in 2011, the calculated rate of assault for the City of Chicago is approximately 742 per 100,000. However, crime statistics for the city report a significantly lower figure of about 518 assaults per 100,000 (Chicago, Illinois, 2013).

In order to analyze these data at the census tract level, the individual events were aggregated to census tracts by way of a spatial join. A spatial join attaches the features of one feature class to another based on their spatial relationship; the input here is the set of individual assault points (Figure 2).
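The assault rate quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes the denominator was the 2011 population estimate given in Section 3.2; the function name is illustrative only.

```python
def rate_per_100k(events, population):
    """Crude incidence rate per 100,000 residents."""
    return events / population * 100_000


reported_assaults = 20_091    # all assault records for 2011, as stated in the text
population_2011 = 2_705_248   # 2011 estimate from Section 3.2 (assumed denominator)

assault_rate = rate_per_100k(reported_assaults, population_2011)
print(f"{assault_rate:.1f} assaults per 100,000")
```

The result falls between 742 and 743, consistent with the figure reported here and well above the 518 per 100,000 quoted from the city statistics.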


Figure 2: Individual Assault Events - Chicago, Illinois, 2011

In this case, the point data are aggregated to the census tract level, creating a new polygon feature class that includes a count of total events per tract, as seen in Figure 3.
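The aggregation step can be illustrated outside of the GIS software. The sketch below is not the ArcGIS spatial join itself; it is a simplified stand-in that counts points per polygon with a ray-casting test. The tract names, square polygons, and event coordinates are all hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices in order."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_polygon(points, polygons):
    """Return {polygon name: number of points falling inside it}."""
    counts = Counter({name: 0 for name in polygons})
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # each event is assigned to at most one tract
    return dict(counts)


# Hypothetical tracts (unit-less squares) and hypothetical event locations.
tracts = {
    "tract_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
events = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

tract_counts = count_points_per_polygon(events, tracts)
print(tract_counts)
```

The last event falls outside both tracts and is not counted, which mirrors how events without a usable location drop out of a tract-level summary.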


Figure 3: Assaults per census tract, 2011 (Equal Interval)

Aggregating the data to the census tract level allows the events to be visualized spatially. Though some spatial patterning can be inferred from a visualization like this, there is little statistical evidence to support any conclusions a reader might draw. Additionally, depending on how the data are classified, the patterning can appear completely different (Figure 4).
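The effect of the classification scheme can be shown with a small example. The sketch below compares equal-interval class limits with a natural-breaks style classification that minimises within-class squared deviation. It is an illustration under stated assumptions, not the ArcGIS implementation, and the tract counts are hypothetical.

```python
def equal_interval_breaks(values, k):
    """Upper class limits for k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    breaks = [lo + width * i for i in range(1, k + 1)]
    breaks[-1] = hi  # guard against floating-point shortfall on the last class
    return breaks


def natural_breaks(values, k):
    """Upper class limits minimising within-class squared deviation."""
    xs = sorted(values)
    n = len(xs)
    # Prefix sums give the squared deviation of any run xs[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):          # number of classes used so far
        for j in range(c, n + 1):      # classes cover xs[:j]
            for i in range(c - 1, j):  # last class is xs[i:j]
                cost = best[c - 1][i] + ssd(i, j)
                if cost < best[c][j]:
                    best[c][j], cut[c][j] = cost, i
    breaks, j = [], n
    for c in range(k, 0, -1):          # walk the cut points backwards
        breaks.append(xs[j - 1])
        j = cut[c][j]
    return sorted(breaks)


def class_counts(values, breaks):
    """Number of values falling in each class (upper limits inclusive)."""
    counts = [0] * len(breaks)
    for v in values:
        for idx, upper in enumerate(breaks):
            if v <= upper:
                counts[idx] += 1
                break
    return counts


tract_counts = [1, 2, 3, 10, 11, 12, 50, 52]  # hypothetical assaults per tract
ei = equal_interval_breaks(tract_counts, 3)
nb = natural_breaks(tract_counts, 3)
print(class_counts(tract_counts, ei), class_counts(tract_counts, nb))
```

With skewed counts, equal intervals place most tracts in the lowest class and leave the middle class empty, while natural breaks separate the low, middle, and high groups. This is the kind of difference seen between Figure 3 and Figure 4.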


Figure 4: Assaults per census tract, 2011 (Natural Breaks)

4.2. Analysis and Results
4.2.1. Data Exploration


In order to maximize the benefits of a predictive surface analysis using event data, such as Inverse Distance Weighting or Kriging, the data must first be processed. Because there are so many events in this dataset (Figure 2), it is difficult to determine whether there is an underlying pattern, even when a predictive surface is created before any processing. To make the spatial patterning of assault events clearer, the data underwent an integration and collection process using a 500 metre (m) distance. The Integrate tool, part of ArcGIS's Data Management toolbox, modifies the data to make features coincident, or identical, if they fall within a specified distance of one another (Integrate, 2013). In other words, assault events occurring within 500 m of one another are assigned the same coordinate, which reduces the number of distinct points in the study area once they are collected (Figure 5).


Figure 5: Integrated assault events, 500 m

The results of the integration procedure, however, require more processing. At this stage, all that has been accomplished is that assaults within 500 m of one another now share a single location. In order for each point to have a value, or weight, the events must be collected. Collect Events, a utility in the ArcGIS Spatial Statistics toolbox, combines all of the coincident points that were assigned the same location through the integration process. It creates a new feature class with a count for each location in the attribute table, which makes it possible to begin visualizing any spatial patterning (Figure 6).
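The two-step process can be sketched in a few lines. Note the assumption: the ArcGIS Integrate tool works with a cluster tolerance, whereas this simplified stand-in snaps each point to the centre of a 500 m grid cell. The counting step then plays the role of collecting events. Coordinates are hypothetical and assumed to be in metres.

```python
from collections import Counter


def integrate_to_grid(points, tolerance):
    """Snap each (x, y) to the centre of its tolerance-sized grid cell."""
    snapped = []
    for x, y in points:
        col = int(x // tolerance)
        row = int(y // tolerance)
        snapped.append(((col + 0.5) * tolerance, (row + 0.5) * tolerance))
    return snapped


def collect_events(points):
    """Count coincident points: {(x, y): number of events at that location}."""
    return Counter(points)


# Hypothetical event coordinates in metres.
events = [
    (120.0, 80.0),
    (430.0, 410.0),
    (510.0, 20.0),
    (990.0, 499.0),
    (1600.0, 1700.0),
]

collected = collect_events(integrate_to_grid(events, 500.0))
for location, count in sorted(collected.items()):
    print(location, count)
```

Five individual events become three weighted locations, and the total count is preserved. The count attached to each location is the value later used for interpolation.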


Figure 6: Collected assault events, 500 m

Before moving forward with the surface analyses, it is important to explore the data to understand them better. This exploration is done on the already processed data. The histogram for the data shows a positive skew (Figure 7).


Figure 7: Distribution of collected events

The positive skew in the histogram is also evident in the Normal QQ Plot (Figure 8).

Figure 8: Normal QQ plot of collected events

A positive skew is common in crime data, and usually indicates some level of clustering (Mapping Crime, 1999). In the case of the data for this analysis, the goal is to determine whether there is an element of clustering in the assault data, and make inferences about the social environment that might be driving that pattern.
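Skew can also be checked numerically rather than read from a histogram. The following sketch computes the moment-based skewness coefficient for a set of hypothetical collected counts; the values are illustrative and are not taken from the Chicago dataset.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical counts per collected location: many small values, one large.
counts = [1, 1, 2, 2, 3, 3, 4, 5, 8, 40]

skew = sample_skewness(counts)
print(f"skewness = {skew:.2f}, mean = {mean(counts)}, median = {median(counts)}")
```

A positive coefficient, together with a mean well above the median, is the numeric counterpart of the long right tail visible in a histogram.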


The spatial interpolation of data is based on Tobler's first law (the first law of geography), which states that everything is related to everything else, but near things are more related than distant things (Chen & Liu, 2012). Exploring the data in the form of a semivariogram cloud allows users to investigate whether there is any spatial dependence among the points. If this dataset were highly spatially autocorrelated, the spread of values along the Y-axis would be expected to grow as distance from 0 increases, indicating greater dissimilarity between more distant pairs. However, Figure 9 shows little change with distance from 0 (Spatial Autocorrelation, n.d.).
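Each point in a semivariogram cloud represents one pair of locations, plotted by separation distance on the X-axis and by half the squared difference of their values on the Y-axis. A minimal sketch, using hypothetical sample locations and counts:

```python
from itertools import combinations
from math import hypot


def semivariogram_cloud(samples):
    """samples: list of (x, y, value). Returns (distance, semivariance) pairs."""
    cloud = []
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        distance = hypot(x2 - x1, y2 - y1)
        semivariance = 0.5 * (z1 - z2) ** 2
        cloud.append((distance, semivariance))
    return cloud


# Hypothetical collected locations (metres) with event counts.
samples = [(0.0, 0.0, 10), (3.0, 4.0, 14), (6.0, 8.0, 20)]

cloud = semivariogram_cloud(samples)
for distance, semivariance in cloud:
    print(distance, semivariance)
```

With n locations the cloud contains n(n-1)/2 pairs, which is why a cloud built from several hundred collected points is dense and hard to read without binning.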

Figure 9: Semivariogram cloud for collected events

To check whether this holds in multiple directions, the semivariogram was also examined directionally. Figure 10 shows a spread similar to that seen in Figure 9.

Figure 10: Directionality in the semivariogram


The collected points are now ready for use in further analyses. Two interpolation methods are applied in this report: Inverse Distance Weighting and Kriging.

4.2.2. Inverse Distance Weighting


Inverse Distance Weighting (IDW) calculates the value of an unknown point as a weighted average of known points, with nearer points weighted more heavily than distant ones. Interpolating between the data in this way ultimately creates a smooth and informative surface (Chen & Liu, 2012). Figure 11 shows the IDW surface created from the collected count of events processed for this analysis.
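The weighted average described above can be written out directly. In this sketch each known point receives a weight of 1 / distance^power. A power of 2 is assumed because it is a common default; the report does not state the power or search neighbourhood actually used. The sample values are hypothetical.

```python
from math import hypot


def idw_predict(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: list of (x, y, value); weight of each sample is 1 / d**power.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        d = hypot(x - sx, y - sy)
        if d == 0.0:
            return float(value)  # exactly on a known point: return its value
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical collected locations with counts of 10 and 30 events.
known = [(0.0, 0.0, 10), (10.0, 0.0, 30)]

midpoint_estimate = idw_predict(5.0, 0.0, known)
near_first_estimate = idw_predict(2.0, 0.0, known)
print(midpoint_estimate, near_first_estimate)
```

Halfway between the two points the estimate is their simple average. Closer to the first point the estimate is pulled strongly toward its value, which is why high-count locations show up as localized peaks on an IDW surface.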

Figure 11: Inverse Distance Weighted Surface, Collected Events 500m

The surface has identified a few hot spots in various regions of Chicago. Based on the graduated symbology seen earlier in Figure 6, these hot spots were anticipated. The larger the number of events at a location, the more strongly it raises the surrounding surface. The surface is not particularly smooth, but it does provide some insight into the concentration and spread of assault events across the city. Figure 12 shows the error in the predictive surface generated.


Figure 12: Error for the predicted IDW surface

The Geostatistical Analyst automatically outputs information regarding the error in the predictive surface created. For the IDW surface, the Root Mean Square (RMS) error is 59.56. RMS error summarizes the difference between the known values (input data) and the values predicted by the surface (RMS Error, n.d.), and is used as an indicator of how well the IDW model is able to interpolate the input data. The larger the RMS error, the greater the discrepancy between known and predicted values.
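One common way to obtain such an error figure is leave-one-out cross-validation: each known point is removed in turn, predicted from the remaining points, and the prediction is compared with the measured value. The sketch below assumes this procedure and a power of 2; it is not a reproduction of the software's exact settings, and the sample values are hypothetical.

```python
from math import hypot, sqrt


def idw_predict(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (x, y, value) samples."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        d = hypot(x - sx, y - sy)
        if d == 0.0:
            return float(value)
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def leave_one_out_rmse(samples, power=2.0):
    """Root mean square of (predicted - measured) with each point held out."""
    squared_errors = []
    for i, (x, y, measured) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        predicted = idw_predict(x, y, others, power)
        squared_errors.append((predicted - measured) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical collected locations with event counts.
two_points = [(0.0, 0.0, 10), (10.0, 0.0, 30)]
uniform = [(0.0, 0.0, 7), (5.0, 0.0, 7), (0.0, 5.0, 7)]

print(leave_one_out_rmse(two_points), leave_one_out_rmse(uniform))
```

The error is expressed in the same units as the input values, here events per collected location, so it should be judged against the range of counts in the data.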

4.2.3. Kriging
Kriging uses more information about a dataset in order to create a predictive surface. Unlike IDW, whose weights depend only on distance, kriging assumes that surrounding points have some level of spatial correlation and takes that correlation into account when building the model, which makes it especially effective (How Kriging Works, 2011). For the purposes of this analysis, ordinary kriging was used. The surface generated by ordinary kriging can be seen in Figure 13.


Figure 13: Ordinary Kriging Surface, 500m Collected Events

The kriging method produced a surface similar to the one produced by the IDW method (Figure 11), but with a smoother result. The semivariogram for the kriged model supports the first law of geography, that things which are closer are more related than things that are further away. The spread of the data is smaller close to the origin and becomes more scattered further away. This is reflected in the gradual rise of the model from the nugget (the point at which the model intersects the Y-axis) to the sill (the semivariance value on the Y-axis at which the model plateaus), which is reached at the range (the distance along the X-axis at which the model plateaus). Observations separated by less than the range are the ones that are spatially correlated (Figure 14) (How Kriging Works, 2011).


Figure 14: Kriged (ordinary) semivariogram

The covariance confirms this relationship (Figure 15): as distance increases, the correlation between points decreases.
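The relationship between the nugget, sill, range, and covariance can be made concrete with a fitted model. The sketch below uses a spherical model as an assumption, since the report does not state which model type was fitted, and the parameter values are hypothetical. The sill here means the total sill, including the nugget.

```python
def spherical_semivariogram(h, nugget, sill, range_):
    """Spherical semivariogram model; sill is the total sill (nugget included)."""
    if h <= 0:
        return 0.0      # by definition a point has no variance with itself
    if h >= range_:
        return sill     # beyond the range the model stays at the sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def covariance(h, nugget, sill, range_):
    """Covariance implied by the model: C(h) = sill - semivariance(h)."""
    return sill - spherical_semivariogram(h, nugget, sill, range_)


# Hypothetical parameters: nugget 10, sill 100, range 1000 m.
params = dict(nugget=10.0, sill=100.0, range_=1000.0)

for lag in (1.0, 250.0, 500.0, 1000.0, 2000.0):
    print(lag, spherical_semivariogram(lag, **params), covariance(lag, **params))
```

The semivariance rises from just above the nugget at short lags to the sill at the range, while the covariance falls to zero over the same distance. This mirrors the opposite shapes of the curves in Figure 14 and Figure 15.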

Figure 15: Kriged (ordinary) covariance

To assess the quality of the predictive surface created by ordinary kriging, the Geostatistical Wizard provides the user with prediction errors. The RMS error for the kriged model is 57.99, which is very close to the RMS error produced with IDW (59.56). On this basis alone it would be difficult to say that one model is better than the other. However, the additional output provides more support. Well-suited models have a Mean Standardized Error close to 0 (0.006 here), an Average Standard Error close to the RMS error (58.05 vs. 57.99), and a Standardized RMS close to 1 (1.00) (How Kriging Works, 2011). Based on those criteria, the model produced is a good fit. A summary of the cross-validation outcomes can be found in Figure 16.
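These diagnostics are all derived from three quantities per point: the measured value, the cross-validated prediction, and the prediction standard error. The sketch below follows the usual geostatistical definitions, which may differ in detail from the software's own calculations. The function name and input values are hypothetical.

```python
from math import sqrt


def cross_validation_summary(observed, predicted, std_errors):
    """Summary statistics from cross-validated kriging predictions."""
    n = len(observed)
    errors = [p - o for p, o in zip(predicted, observed)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "mean_error": sum(errors) / n,
        "rms": sqrt(sum(e * e for e in errors) / n),
        "mean_standardized": sum(standardized) / n,
        "rms_standardized": sqrt(sum(z * z for z in standardized) / n),
        "average_standard_error": sqrt(sum(s * s for s in std_errors) / n),
    }


# Hypothetical measured counts, predictions, and prediction standard errors.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
std_errors = [2.0, 2.0, 2.0]

summary = cross_validation_summary(observed, predicted, std_errors)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the standardized RMS is below 1 and the average standard error exceeds the RMS error, which would suggest the model overestimates its own uncertainty. The values reported for the Chicago model sit much closer to the ideal.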


Figure 16: Cross-validated Kriging Model

4.2.4. Hot Spot Analysis


An additional analysis was done in order to assess how well the IDW and Kriged surfaces compare to the statistically supported output of a Hot Spot Analysis (HSA). HSA is done using the Hot Spot Analysis tool in ArcGIS, and produces an output of statistically significant hot and cold spots based on weighted inputs. The results of the HSA can be seen in Figure 17.
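The ArcGIS Hot Spot Analysis tool is based on the Getis-Ord Gi* statistic, which returns a z-score for each feature. The sketch below computes Gi* with a simple fixed distance band and binary weights. The band width and sample values are hypothetical, and the tool's actual conceptualization of spatial relationships for this report is not stated.

```python
from math import hypot, sqrt


def getis_ord_gi_star(samples, distance_band):
    """Gi* z-score per sample; samples is a list of (x, y, value).

    Binary weights: 1 for every sample within distance_band (itself included).
    """
    n = len(samples)
    values = [v for _, _, v in samples]
    mean = sum(values) / n
    s = sqrt(max(sum(v * v for v in values) / n - mean ** 2, 0.0))
    scores = []
    for xi, yi, _ in samples:
        weights = [
            1.0 if hypot(xi - xj, yi - yj) <= distance_band else 0.0
            for xj, yj, _ in samples
        ]
        sum_w = sum(weights)
        sum_w2 = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * sum_w
        denominator = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical locations: a high-count cluster and a low-count cluster.
samples = [
    (0.0, 0.0, 50), (1.0, 0.0, 60), (2.0, 0.0, 55),
    (10.0, 0.0, 2), (11.0, 0.0, 3), (12.0, 0.0, 1),
]

z_scores = getis_ord_gi_star(samples, distance_band=2.5)
for (x, y, value), z in zip(samples, z_scores):
    print(x, y, value, round(z, 2))
```

Locations surrounded by high counts receive positive z-scores (hot spots), and those surrounded by low counts receive negative ones (cold spots). A score beyond roughly 1.96 in absolute value corresponds to significance at the 95% level.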

Figure 17: Hot Spot Analysis


Even without creating a surface from this output, some of the same patterning can be seen. Using this output, we can also comment on how well each of the previous methods compares to the HSA by simply overlaying the points on the surfaces. Figure 18 shows the comparison between the IDW predictive surface and the results of the HSA.

Figure 18: IDW vs. HSA

It can be seen, based on the statistically significant hot spots identified through HSA, that the IDW has underestimated the spread of the clustering in the assault data. Based on the HSA, the spread should be much greater. This is especially true for statistically significant hot spots, identified by the IDW as less significant. The same classification was used for both the HSA and the IDW in order to facilitate this type of comparison.


The comparison between Kriging and the HSA is seen in Figure 19.

Figure 19: Kriging vs. HSA

The surface prediction made by Kriging compares much better with the results of the HSA. This is especially true for the cluster of events to the north. Again, as with the IDW comparison, some of the areas are not identified appropriately according to the results of the HSA; however, the two methods are inherently different.

5. Discussion
Using predictive surfaces alongside demographic data can allow analysts to make some inferences about the underlying causes of events such as assaults. This discussion considers race as a factor associated with violent crime. We begin by mapping the proportion of the population who identify as White (Figure 20) and those who identify as Black (Figure 21), based on the 2010 United States Census. These two groups are used because there is a stark spatial contrast between them.


Figure 20: Proportion of the population identifying as White, 2010 US Census (Equal Interval)

Figure 21: Proportion of the population identifying as Black, 2010 US Census (Equal Interval)


Without doing any statistical modelling, it can be seen that the spatial clustering of Black populations is very similar to the clustering we see in the spatial distribution of assaults. This is not to suggest that assaults only occur in neighbourhoods characterised by Black populations. A quantitative analysis would need to be undertaken and other confounding variables included. An argument can be made, however, for the inclusion of race as one of those variables based on the spatial patterning seen in this discussion.

6. Conclusions
Creating predictive surfaces from point data can be a useful investigative tool for determining whether spatial patterning exists. In this investigation, assault data for the City of Chicago were used. Both the IDW and Kriging surfaces showed varying degrees of clustering, which was anticipated with this type of data. Kriging provided a sounder result, offering more tools for validating the model, and also produced a smoother surface. Kriging also produced a better fit when compared with the statistically significant hot and cold spots generated by a Hot Spot Analysis. Finally, a short discussion provided some insight into how the clustering of assault events compares against some demographic variables for the City. Further analysis would provide better insight into the driving forces behind the incidence of crime in a large city such as Chicago.


Works Cited
2010 Geographic Terms and Concepts. (2014, March). Retrieved from United States Census Bureau: http://www.census.gov/geo/reference/gtc/gtc_ct.html
American FactFinder. (2013, May). Retrieved from United States Census Bureau: http://factfinder2.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk
Chen, F.-W., & Liu, C.-W. (2012). Estimation of the spatial rainfall distribution using inverse. Paddy and Water Environment, 209-222.
Chicago QuickFacts. (2014, 01 07). Retrieved February 1, 2014, from United States Census Bureau: http://quickfacts.census.gov/qfd/states/17/1714000.html
Chicago, Illinois. (2013). Retrieved from City-Data: http://www.city-data.com/city/ChicagoIllinois.html
Crimes 2011. (2011). Retrieved from City of Chicago Data Portal: https://data.cityofchicago.org/Public-Safety/Crimes-2011/qnrb-dui6
Definition & Description of Crimes. (2014). Retrieved February 2, 2014, from Chicago Police Department: http://gis.chicagopolice.org/clearmap/crime_types.html
How Kriging Works. (2011). Retrieved from ArcGIS Resource Centre: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//009z00000076000000.htm
Integrate. (2013). Retrieved from ArcGIS Help 10.1: http://resources.arcgis.com/en/help/main/10.1/index.html#//00170000002s000000
Kielasinska, E. (2012). The Geography of Urban Arson in Toronto. Retrieved from Open Access Dissertations and Theses: http://digitalcommons.mcmaster.ca/opendissertations/6563
Mapping Crime. (1999, December). Retrieved from National Criminal Justice Reference Service: https://www.ncjrs.gov/html/nij/mapping/ch2_6.html
RMS Error. (n.d.). Retrieved from ESRI GIS Dictionary: http://support.esri.com/en/knowledgebase/GISDictionary/term/RMS%20error
Smith, I. (2014, January). Geostatistical Analysis of Student Collected Spatial Data. GISC9308 Spatial Statistics.
Spatial Autocorrelation. (n.d.). Retrieved from Introduction to Spatial Analysis: planet.botany.uwc.ac.za/nisl/GIS/spatial/chap_1_39.htm
TIGERLine. (2014, March). Retrieved from United States Census Bureau: http://www.census.gov/geo/maps-data/data/tiger-data.html
