Professional Documents
Culture Documents
October 4, 2019
Abstract
This report outlines the analysis of relation between inspection for each property in United State
and violation records. The data documented in this report contains a list of businesses that have at
least 1 violation record, the number of each type of violation based on violation code as well as the
minimum, maximum and average violations point of every month from 7-2015 until 12-2017. Some
serial number contain multiples of violation which may lead to a great deduction of points. From
the findings, we can understand that even though the business was rated as A grade, it violates
The purpose of this project was to process a great deal of food inspection and health
violation data and produce a precise summary for each task. By storing the raw data into a
database, so users can execute SQLite queries to interact with database or perform analysis.
As long as the user know how to manipulate the SQL query syntax, it could be time saving
method to get the data they want. Representation of data by using MatPlotLib can be used to
Database Structure
Throughout this assignment, I created 3 tables which are “inspections”, “violations” and
For the violations tables, I also used similar function to provide attributes type for the data.
Firstly , I connect the python file with database I created before with using sqlite3. Then, I
retrieve all data in the serial_number. Use the retrieved serial_number in new sql query as a
condition. The method I used to calculate the number of each type of violation based on
violation code was if the violation code first time show up , then set the violation code as
key and provide value of 1 and store it in dictionary. If the violation code appeared before,
then do increment.
The result of “count” dictionary will be (I am only using first 3000 data of violation and
In order to get total point for the specific serial number , I had ran sql query to get total
violation point for a serial number and store it in a dictionary which called “point”. At
below, there are few screenshot of sql query command and its result.
From the screenshot above, it represents that every serial number contain different points.
Later, I ran another sql query to get data of serial number for each zipcode in different time.
Result in https://sqliteonline.com/:
number.
I used another nested dictionary (zip_code_point) to store array of data, the index 0 represent the
zip code that contain the highest violation point , the index 1 represent the highest violation points;
the index 2 represent the zip code that contain the least violation point , the index 3 represent the
least violation points. Finally, the index 4 represent the average violation points in that month. I
also deleted the test key which is used for initial comparison.
I only used first 3000 records data from both excel files. ( inspection and violation). Those data only