DS100-3

Worksheet 2.5
APPLIED DATA SCIENCE
CLEANING DATA IN PYTHON

Name:

Page 1 of 1

Write code in a Jupyter notebook as required by each problem. Take a screenshot of both the code and its output and paste
it here.

1 Import the following files: uber_apr.csv, uber_may.csv, uber_jun.csv. Concatenate them into a single DataFrame,
uber. Print the first five rows of the resulting DataFrame.
Code and Output
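A minimal sketch of one possible approach, using pd.concat. The three small CSV strings below are hypothetical stand-ins for the real files, and their column names are assumptions; with the actual files, each io.StringIO(...) argument would be replaced by the file name.

```python
import io
import pandas as pd

# Hypothetical stand-ins for uber_apr.csv, uber_may.csv and uber_jun.csv.
# The column names are assumptions, not taken from the real files.
apr_csv = """Date/Time,Lat,Lon,Base
4/1/2014 0:11:00,40.7690,-73.9549,B02512
4/1/2014 0:17:00,40.7267,-74.0345,B02512
"""
may_csv = """Date/Time,Lat,Lon,Base
5/1/2014 0:02:00,40.7521,-73.9914,B02512
5/1/2014 0:06:00,40.6965,-73.9715,B02512
"""
jun_csv = """Date/Time,Lat,Lon,Base
6/1/2014 0:00:00,40.7293,-73.9920,B02512
6/1/2014 0:01:00,40.7131,-74.0097,B02512
"""

# With the real files: pd.read_csv("uber_apr.csv"), and so on.
uber_apr = pd.read_csv(io.StringIO(apr_csv))
uber_may = pd.read_csv(io.StringIO(may_csv))
uber_jun = pd.read_csv(io.StringIO(jun_csv))

# Stack the three DataFrames row-wise; ignore_index=True renumbers the
# rows 0..n-1 instead of repeating each file's own index labels.
uber = pd.concat([uber_apr, uber_may, uber_jun], ignore_index=True)

print(uber.head())
```

Without ignore_index=True, the row labels would restart at 0 for each month, which can make label-based selection ambiguous.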

2 Import tuberculosis.csv. Print the first five lines. Melt the DataFrame, keeping the country and year columns fixed.
Print the first four lines of the melted DataFrame.
Code and Output
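A minimal sketch of the melting step, assuming a toy DataFrame in place of tuberculosis.csv. The measurement column names (m014, f65) are assumptions about how the real file encodes gender and age group.

```python
import pandas as pd

# Toy stand-in for tuberculosis.csv; with the real file use
# tb = pd.read_csv("tuberculosis.csv"). Column names other than
# country and year are assumptions.
tb = pd.DataFrame({
    "country": ["AD", "AE", "AF"],
    "year": [2000, 2000, 2000],
    "m014": [0.0, 2.0, 52.0],
    "f65": [1.0, 4.0, 80.0],
})
print(tb.head())

# Keep country and year fixed; every other column is unpivoted into
# a pair of columns named "variable" and "value" (the pandas defaults).
tb_melt = pd.melt(tb, id_vars=["country", "year"])

print(tb_melt.head(4))
```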

3 Use the melted DataFrame in the previous problem. Create (and populate) a gender and an age column from the variable
column. Print the first five lines of the resulting DataFrame. Convert the age column to a numeric data type. Hint: use
pd.to_numeric, with the errors parameter equal to 'coerce'. Show evidence that this column has indeed been
converted to a numeric data type.
Code and Output
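A minimal sketch, assuming the variable column holds codes whose first character is the gender and whose remaining characters are the age group (for example m014). That encoding and the toy rows below are assumptions; the real data may need different slicing.

```python
import pandas as pd

# Toy melted DataFrame; the codes in "variable" are assumed to be
# <gender letter><age digits>, with "fu" standing for an unknown age.
tb_melt = pd.DataFrame({
    "country": ["AD", "AE", "AD", "AE"],
    "year": [2000, 2000, 2000, 2000],
    "variable": ["m014", "m014", "f65", "fu"],
    "value": [0.0, 2.0, 1.0, 4.0],
})

# First character -> gender, the rest -> age (still text at this point).
tb_melt["gender"] = tb_melt["variable"].str[0]
tb_melt["age"] = tb_melt["variable"].str[1:]
print(tb_melt.head())

# errors="coerce" turns values that cannot be parsed (such as "u") into NaN
# instead of raising an error.
tb_melt["age"] = pd.to_numeric(tb_melt["age"], errors="coerce")

# Evidence of the conversion: the dtype of the column is now numeric.
print(tb_melt["age"].dtype)
print(tb_melt.dtypes)
```

Calling tb_melt.info() is another way to show the column types.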

4 Merge the files site.csv and visited.csv into a single DataFrame. Join on the name column of site and the site
column of visited. Make sure that the index labels are in order. Print the resulting DataFrame.
Code and Output
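A minimal sketch of the merge, using toy DataFrames in place of site.csv and visited.csv. All columns other than name (in site) and site (in visited) are assumptions. Resetting the index is one reading of "index labels are in order"; sorting by an existing index would be another.

```python
import pandas as pd

# Toy stand-ins; with the real files use pd.read_csv("site.csv") and
# pd.read_csv("visited.csv"). Values and extra columns are assumptions.
site = pd.DataFrame({
    "name": ["DR-1", "DR-3", "MSK-4"],
    "lat": [-49.85, -47.15, -48.87],
    "long": [-128.57, -126.72, -123.40],
})
visited = pd.DataFrame({
    "ident": [619, 622, 734, 837],
    "site": ["DR-1", "DR-1", "DR-3", "MSK-4"],
    "dated": ["1927-02-08", "1927-02-10", "1939-01-07", "1932-01-14"],
})

# The key columns have different names, so left_on / right_on are needed.
merged = pd.merge(left=site, right=visited, left_on="name", right_on="site")

# Renumber the rows 0..n-1 so the index labels are consecutive.
merged = merged.reset_index(drop=True)

print(merged)
```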

5 Import tips.csv. This dataset has a column named sex. Write a function named recode_gender that has one
parameter (gender) and will recode Male to 0 and Female to 1, and will return np.nan if the value is neither Male nor
Female. Apply this function to the column sex of tips using apply(). Print the first five lines of the new dataframe.
Code and Output
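A minimal sketch of the recoding function, applied to a toy DataFrame in place of tips.csv. The new column name recode and the sample rows are assumptions; the problem does not say where the recoded values should be stored.

```python
import numpy as np
import pandas as pd

# Toy stand-in for tips.csv; with the real file use pd.read_csv("tips.csv").
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Unknown", "Female"],
})

def recode_gender(gender):
    """Return 0 for Male, 1 for Female, and NaN for anything else."""
    if gender == "Male":
        return 0
    elif gender == "Female":
        return 1
    else:
        return np.nan

# apply() calls recode_gender once per value in the sex column.
# Storing the result in a new column named "recode" is an assumption.
tips["recode"] = tips["sex"].apply(recode_gender)

print(tips.head())
```

Because NaN is a floating-point value, a column that contains it is stored as floats, so the codes may display as 0.0 and 1.0.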
