Details of Task: This assessment evaluates your skills in applying data cleansing techniques to a
data set and explaining what you did. It is an individual assignment.
Download the raw data file from Moodle. Each student receives a different data file. The file and
solutions are linked to your name, so you should not swap or share files. You may, however, work
together on defining strategies and cleaning the data.
Clean the data using techniques discussed in lectures and practised in tutorials. You may use any
software.
Report the errors that you found using the template below. You can copy/paste the corrected data into
rows of the table. You may leave rows blank or insert more rows as needed. In the description column,
describe in a few words the error you fixed (5 marks).
Write a description of the strategy you developed for cleaning the data in under 300 words (dot points
are fine). This strategy should be applicable to any data set you encounter in the future (4 marks).
Business language and presentation format are important. Ensure you use correct terminology
appropriate to data and databases, and make sure that your table fits on the landscape page. Your
explanation should be in portrait format on the second page (1 mark).
The reporting template follows (do not submit this instructions page).
Notes: Gender is coded as 0 or 1, but the code may not match the first name. Ignore these cases. The
data is from a parallel universe where gender assignment may not match preconceived notions of
male/female names.
Describe your assumptions when correcting data (see the picture below).
Your student ID:
Your name:
☐ I have read the conditions in the cover sheet and agree that I have complied with them
My General Strategy for Cleaning Data in 300 words or less