
Name

APPLIED DATA SCIENCE


2nd Qtr SY 2019-2020

WORKSHEET #4: CLEANING DATA IN PYTHON

1
Import dob_job_application_filings.csv. Write code below to determine the first five columns of the dataset. How
many records are in the dataset?
Code
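One possible approach, as a minimal sketch: the rows and column names below are a small hypothetical stand-in so the example runs on its own. With the real file, pd.read_csv("dob_job_application_filings.csv") would replace the io.StringIO call.

```python
import io
import pandas as pd

# Hypothetical stand-in for dob_job_application_filings.csv (made-up rows)
csv_text = """Job #,Doc #,Borough,House #,Street Name,Block,Lot
1,1,MANHATTAN,10,MAIN ST,100,1
2,1,BROOKLYN,20,OAK AVE,200,2
3,1,QUEENS,30,ELM RD,300,3
"""
df = pd.read_csv(io.StringIO(csv_text))

# Names of the first five columns
first_five = df.columns[:5]
print(first_five)

# Number of records (rows) in the dataset
n_records = df.shape[0]
print(n_records)
```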

Answers to Questions

2 Date:
Print out the value count for the boroughs. Write the code and the output.
Code

Output
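A sketch of how the counts might be obtained, assuming the borough column is named "Borough" (an assumption; check df.columns on the real file). The rows are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data comes from the CSV file
df = pd.DataFrame({"Borough": ["MANHATTAN", "BROOKLYN", "MANHATTAN", "QUEENS", None]})

# dropna=False also counts missing entries, which helps when cleaning data
borough_counts = df["Borough"].value_counts(dropna=False)
print(borough_counts)
```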


3 Date:
Create a histogram for the column ‘Existing Zoning Sqft’ of the same dataset. Rotate the axis labels by 70 degrees and
use a log scale for both axes. Write the code here and submit a copy of the plot through WS04-03: P04 Cleaning Data in Python
(#3).
Code Output
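One way this could be done with the pandas plotting interface, as a sketch only: the values are hypothetical stand-ins, and the Agg backend is chosen here just so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import pandas as pd

# Hypothetical stand-in values for the 'Existing Zoning Sqft' column
df = pd.DataFrame({"Existing Zoning Sqft": [1, 10, 100, 1000, 10000, 100, 10, 1]})

# rot rotates the axis labels; logx and logy put both axes on a log scale
ax = df["Existing Zoning Sqft"].plot(kind="hist", rot=70, logx=True, logy=True)

# The figure can then be written to disk, e.g. fig.savefig("hist.png")
fig = ax.get_figure()
```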

4 Date:
Import airquality.csv. Melt the columns Ozone, Solar.R, Wind and Temp into rows and assign to
airquality_melt. Rename the default variable column to measurement and the default value column to reading. Print
head() of airquality_melt. Write the code here and submit a copy of the output through WS04-04: P04 Cleaning Data in
Python (#4).
Code Output
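A minimal sketch of the melt step, assuming Month and Day are the identifier columns that stay fixed. The three rows are stand-ins used only for illustration; with the real file, pd.read_csv("airquality.csv") would replace the io.StringIO call.

```python
import io
import pandas as pd

# Stand-in for airquality.csv (illustrative rows only)
csv_text = """Ozone,Solar.R,Wind,Temp,Month,Day
41,190,7.4,67,5,1
36,118,8.0,72,5,2
12,149,12.6,74,5,3
"""
airquality = pd.read_csv(io.StringIO(csv_text))

# Melt the four measurement columns into rows, keeping Month and Day fixed
airquality_melt = pd.melt(
    airquality,
    id_vars=["Month", "Day"],
    value_vars=["Ozone", "Solar.R", "Wind", "Temp"],
    var_name="measurement",   # replaces the default name 'variable'
    value_name="reading",     # replaces the default name 'value'
)
print(airquality_melt.head())
```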

5 Date:
Pivot airquality_melt from #4, with the rows indexed by ‘Month’ and ‘Day’, the columns taken from
‘measurement’, and the values taken from ‘reading’. Assign this to airquality_pivot. Print out the head of airquality_pivot. Write
the code here and submit a copy of the output through WS04-05: P04 Cleaning Data in Python (#5).
Code Output
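A sketch of the pivot, using a small hand-built airquality_melt with hypothetical values so the example stands alone. pivot_table is used here as one option; it averages duplicate Month/Day/measurement combinations by default.

```python
import pandas as pd

# Hypothetical melted data in the shape produced by #4
airquality_melt = pd.DataFrame({
    "Month": [5] * 8,
    "Day": [1] * 4 + [2] * 4,
    "measurement": ["Ozone", "Solar.R", "Wind", "Temp"] * 2,
    "reading": [41.0, 190.0, 7.4, 67.0, 36.0, 118.0, 8.0, 72.0],
})

# Rows indexed by Month and Day; one column per measurement; cells hold readings
airquality_pivot = airquality_melt.pivot_table(
    index=["Month", "Day"],
    columns="measurement",
    values="reading",
)
print(airquality_pivot.head())
```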


6 Date:
Import the following files: uber_apr.csv, uber_may.csv, uber_jun.csv. Concatenate these files into a single DataFrame, uber.
Print out the head of uber. Write the code here and submit a copy of the output through WS04-06: P04 Cleaning Data in Python
(#6).
Code Output
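A sketch of row-wise concatenation. The column names and rows are hypothetical stand-ins for the three monthly files; with the real data, each frame would come from pd.read_csv on the corresponding file name.

```python
import io
import pandas as pd

# Hypothetical stand-ins for uber_apr.csv, uber_may.csv and uber_jun.csv
apr = pd.read_csv(io.StringIO("Date/Time,Lat,Lon,Base\n4/1/2014 0:11:00,40.77,-73.95,B02512\n"))
may = pd.read_csv(io.StringIO("Date/Time,Lat,Lon,Base\n5/1/2014 0:02:00,40.75,-74.00,B02512\n"))
jun = pd.read_csv(io.StringIO("Date/Time,Lat,Lon,Base\n6/1/2014 0:00:00,40.73,-73.99,B02512\n"))

# Stack the frames on top of each other; ignore_index gives a fresh 0..n-1 index
uber = pd.concat([apr, may, jun], ignore_index=True)
print(uber.head())
```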

7 Date:
Merge the following files: ‘site.csv’ and ‘visited.csv’. The output should be as follows:

Code
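A sketch of a merge on a shared site identifier. The expected output is not available here, so the column names (name, lat, long in site; ident, site, dated in visited), the join keys, and the rows are all assumptions made for illustration.

```python
import io
import pandas as pd

# Assumed layouts for site.csv and visited.csv (illustrative rows)
site = pd.read_csv(io.StringIO(
    "name,lat,long\nDR-1,-49.85,-128.57\nDR-3,-47.15,-126.72\nMSK-4,-48.87,-123.40\n"
))
visited = pd.read_csv(io.StringIO(
    "ident,site,dated\n619,DR-1,1927-02-08\n734,DR-3,1939-01-07\n837,MSK-4,1932-01-14\n"
))

# The key has a different name in each file, so left_on and right_on are given
merged = pd.merge(left=site, right=visited, left_on="name", right_on="site")
print(merged)
```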


8 Date:
Import tips.csv. Write the name and data type of each of the seven columns of this dataset.
Code
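A sketch of how column names and data types might be inspected. The column names and rows below are an assumed stand-in for tips.csv; with the real file, pd.read_csv("tips.csv") would replace the io.StringIO call.

```python
import io
import pandas as pd

# Assumed stand-in for tips.csv (hypothetical rows)
csv_text = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
"""
tips = pd.read_csv(io.StringIO(csv_text))

# dtypes lists every column name alongside its data type
print(tips.dtypes)
```

tips.info() gives the same information together with non-null counts.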

Answer to Question

9 Date:
Convert the sex and smoker columns to ‘category’.
Code
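A minimal sketch of the conversion with astype, using hypothetical rows in place of tips.csv.

```python
import pandas as pd

# Hypothetical stand-in for the tips data
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "sex": ["Female", "Male", "Male"],
    "smoker": ["No", "No", "Yes"],
})

# Columns with a few repeated labels can be stored as the 'category' dtype
tips["sex"] = tips["sex"].astype("category")
tips["smoker"] = tips["smoker"].astype("category")
print(tips.dtypes)
```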

10 Date:
Import tips_1.csv. Convert the total_bill and tip columns to ‘numeric’.
Code
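A sketch using pd.to_numeric, assuming tips_1.csv stores some entries as non-numeric text (the "missing" marker below is a hypothetical example). With errors="coerce", values that cannot be parsed become NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical stand-in for tips_1.csv, with numbers read in as text
tips_1 = pd.DataFrame({
    "total_bill": ["16.99", "missing", "21.01"],
    "tip": ["1.01", "1.66", "3.50"],
})

# Convert text to numbers; unparseable entries are coerced to NaN
tips_1["total_bill"] = pd.to_numeric(tips_1["total_bill"], errors="coerce")
tips_1["tip"] = pd.to_numeric(tips_1["tip"], errors="coerce")
print(tips_1.dtypes)
```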
