
ACF5320: Assignment 1: Data Cleaning

Assessment task title: Data Cleaning


Due Date: Week 6
Weighting/Value: 10%
Word limit: 600 word equivalent
Presentation requirements: You will submit on Moodle.
Estimated return date: Week 8
Criteria for marking: You will be assessed on how clean your data is and your explanations for the
techniques you applied.
Learning objectives assessed: This assessment task is designed to test your achievement of
learning objectives 2 to 5, with particular emphasis on 2 and 4.
Submission details: Submit a soft copy via Moodle before the due date.
Penalties for late lodgement: A penalty of one mark will be deducted for each hour the assignment is
late.

Details of Task: The assessment evaluates your skill in applying data cleansing techniques to a
data set and in explaining the techniques you used. It is an individual assignment.

Download the raw data file from Moodle. Each student receives a different data file. The file and
solutions are linked to your name, so you should not swap or share files. You may, however, work
together on defining strategies and cleaning the data.

Clean the data using techniques discussed in lectures and practised in tutorials. You may use any
software.
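As an illustration only, the sketch below shows the kind of checks that could be automated, here in Python. The column names `ID#`, `Counter`, `First_Name` and `Age` come from the reporting template; the sample rows, the function names and the 0 to 120 age range are assumptions made for this example, not requirements of the task.

```python
# Hypothetical sample rows; a real data file has all the template columns.
records = [
    {"ID#": "1001", "Counter": "1", "First_Name": "Ana", "Age": "34"},
    {"ID#": "1002", "Counter": "2", "First_Name": "Ben", "Age": "-5"},
    {"ID#": "1001", "Counter": "3", "First_Name": "Ana", "Age": "34"},
    {"ID#": "1003", "Counter": "4", "First_Name": "", "Age": "abc"},
]


def find_duplicate_ids(rows, key="ID#"):
    """Return the ID values that occur more than once, in first-seen order."""
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return [value for value, n in counts.items() if n > 1]


def find_missing(rows, field):
    """Return the Counter values of rows where `field` is blank."""
    return [row["Counter"] for row in rows if not row.get(field, "").strip()]


def find_bad_ages(rows, low=0, high=120):
    """Return Counter values of rows whose Age is non-numeric or outside [low, high].

    The 0-120 range is an assumed plausibility limit, not part of the brief.
    """
    bad = []
    for row in rows:
        text = row.get("Age", "").strip()
        if not text.isdigit() or not (low <= int(text) <= high):
            bad.append(row["Counter"])
    return bad


duplicate_ids = find_duplicate_ids(records)
missing_names = find_missing(records, "First_Name")
bad_ages = find_bad_ages(records)
print(duplicate_ids, missing_names, bad_ages)
```

Checks like these only flag candidate errors. Deciding how to correct each record, and stating the assumption behind the correction, remains a manual step.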

Report the errors that you found using the template below. You can copy and paste the corrected
data into rows of the table. You may have blank rows or need to insert more rows. In the
description column, describe in a few words what error you fixed (5 marks).

Write a description of the strategy you developed for cleaning the data in under 300 words (dot points
are fine). This strategy should be applicable to any data set you encounter in the future (4 marks).

Business language and presentation format are important. Ensure you use correct terminology
appropriate to data and databases, and make sure that your table fits on the landscape page. Your
explanation should be in portrait format on the second page (1 mark).

The reporting template follows (do not submit this instructions page).

Notes: Gender is coded 0 or 1, but the code may not match the first name. Ignore these cases: the
data is from a parallel universe where gender assignment may not match preconceived notions of
male/female names.
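A minimal Python sketch of how this note might translate into a check: the `Gender` value is validated only against the allowed codes and is never compared with `First_Name`. The sample rows and the function name are hypothetical, and treating a blank or non-0/1 value as an error is an assumption made for this example.

```python
# Allowed codes, per the note above.
VALID_GENDER_CODES = {"0", "1"}

# Hypothetical sample rows for illustration only.
sample = [
    {"Counter": "1", "First_Name": "Mary", "Gender": "0"},
    {"Counter": "2", "First_Name": "John", "Gender": "0"},  # name/code mismatch: ignored
    {"Counter": "3", "First_Name": "Lee", "Gender": "M"},   # not a valid code
    {"Counter": "4", "First_Name": "Kim", "Gender": ""},    # blank
]


def invalid_gender_rows(rows):
    """Return Counter values of rows whose Gender is not one of the allowed codes."""
    return [
        row["Counter"]
        for row in rows
        if row.get("Gender", "").strip() not in VALID_GENDER_CODES
    ]


flagged = invalid_gender_rows(sample)
print(flagged)
```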

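Describe your assumption when correcting data (see the picture below). The picture labels the
template fields as follows:

Student ID, name: your student ID and your name
Data File Name: the file you used
ID#: ID# from data
Counter: counter from data
First_Name to Transaction_Ref: name, address, etc. from data
Description: what you did to correct this record (e.g. delete duplicate ID#, assume correct age, etc.)
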
ID# | Counter | First_Name | Last_Name | Street address | City | Ph_Number | Credit_Card | Total_Spent | Credit_Rating | Gender | Age | Transaction_Ref | Description

Student ID --------------------------------------- Family name, First name ----------------------------------------------------------------

Data File Name -------------------------------------

☐ I have read the conditions in the cover sheet and agree that I have complied with them
My General Strategy for Cleaning Data in 300 words or less
