
The University of the South Pacific

School of Computing, Information and Mathematical Sciences

IS328: Data Mining


Assignment II Semester 2, 2020
Total Marks: 15%
Due Date: as shown on Moodle
This assignment covers both the theoretical and practical aspects of this course. The marking rubric is based on Teamwork concepts/issues and Data
& Information Management, in line with the course outline and the BSE program map. The rubrics have been taken from ACS-SCIMS rubrics
V1.0. This assessment covers the following course learning outcomes:

CLO 3: Apply various data mining methods for interpreting results


CLO 5: Apply data mining to solve a real-world problem using a data mining tool.

Overview
The goals of this assignment (Assignment II, the project) are:
• To apply the concepts learnt in this course to a chosen scenario using appropriate data mining tools.
• As a team of two members, to propose and implement a project.
For project ideas and data sources, you may visit:
www.kdnuggets.com
www.kaggle.com
• To write a report in the format given below.
• This assignment is an important part of the course and counts for 15% of your final grade. Grades will be based on the choice of topic,
the completeness of your project, and the quality of the project report and its application.

IS328: Assignment 2 Page 1 of 8 Dr.Vani Vasudevan


Grading
• This assignment is worth 15 marks.
• Late delivery without prior notification and permission from the instructor will result in a loss of 10% of the points for that deliverable per
day late.
• Plagiarism and cheating in any form are strictly prohibited. If detected, the entire Assignment II will be nullified.

Plagiarism
For all assignment and project work, it is essential that you avoid plagiarism. Not only do you expose yourself to possibly serious disciplinary
consequences, but you also cheat yourself of a proper understanding of the concepts emphasized in the assignment.
• It is not plagiarism to discuss the assignment with your friends and consider solutions to the problems together. However, it is plagiarism
for you to copy all or part of each other’s solutions.

Project Report (15 Marks)


<Report size: 15–20 pages including tables and figures, single-spaced. Body text: Times New Roman, size 12. Headings: Times New
Roman, size 14. The report must contain the following project contents.>

Project Contents

<Title of the Project Chosen>


<Member Names and IDs >
<The Title and Member information must be specified on a separate page>

• Abstract
• Table of Contents
• List of Figures
• List of Tables
• Introduction
  - Motivation
  - Problem Domain
  - Aim and Objectives
• Data Mining Techniques
• Dataset Used
• Data Mining Tasks
• Data Preprocessing
• Methods and Models
• Assessment
• Result Analysis and Interpretation
• Conclusion
• Lessons learned
• Future Work
• References

Important Guidelines:
Abstract

<The abstract conveys the most important messages about your project: What did you set out to do? How did you do it? What results
were obtained? Where can they be applied? For your project proposal, however, state only what you set out to do, in one paragraph, and
complete the remaining parts in the project report.>

Table of Contents
<List the main topics in the report >
List of Figures
<Include all the figures included under each topic with proper numbering>
List of Tables
<Include all the tables included under each topic with proper numbering>
Introduction
<Provide a brief overview of data mining. Describe what your proposal is about and how the rest of the proposal is organized. State whether
you will be performing data mining tasks, implementing a new algorithm in R or Weka (or a combination of both), or modifying some other system to
incorporate data mining features. In short, describe the nature of your project. This section should be a page or less in length.>
Motivation
<Write a paragraph describing what led you to choose this topic.>
Problem Domain
<Describe the problem domain that you have chosen. You may refer to the sample domains listed in this document when choosing your project.
This section should be a page or less in length.>
Aim and Objectives
<For example: to apply a data mining approach to a problem in a chosen domain such as health, education, science and so forth.>

Data Set Used


<Describe the data set(s) you will be using in your project. Include the origin of the data set, an overview of the data set organization, attributes
of the data, and challenges of the data set you've selected. Include any information you have about missing values in the data set. This should be
about one page in length.>
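To make the missing-value description concrete, a minimal Python sketch that counts missing cells per attribute in a CSV export of the data set could look like this. It assumes missing values are written either as empty cells or as `?` (the marker used in Weka's ARFF format); the sample data and the function name `missing_value_summary` are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; replace with the contents of your own CSV file.
SAMPLE_CSV = """age,income,region
34,52000,North
,61000,South
45,?,North
29,48000,
"""

# Assumption: missing values appear as empty cells or as "?" (Weka's ARFF marker).
MISSING_MARKERS = {"", "?"}


def missing_value_summary(text):
    """Return (row_count, Counter of missing cells per attribute)."""
    reader = csv.DictReader(io.StringIO(text))
    counts = Counter({name: 0 for name in reader.fieldnames})
    rows = 0
    for row in reader:
        rows += 1
        for name in reader.fieldnames:
            # Short rows are padded with None by DictReader, so treat None as "".
            value = (row.get(name) or "").strip()
            if value in MISSING_MARKERS:
                counts[name] += 1
    return rows, counts


if __name__ == "__main__":
    n, counts = missing_value_summary(SAMPLE_CSV)
    for name, k in counts.items():
        print(f"{name}: {k}/{n} missing ({100 * k / n:.1f}%)")
```

Reporting both the count and the percentage per attribute makes it easier to justify, in the Data Preprocessing section, whether an attribute should be imputed or dropped.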
Data Mining Task
<Provide the specific tasks you will perform on the data set, including the specific questions you will investigate and the goals of each task. This should
be independent of the specific techniques you will use to achieve your goals. This section should be a page or less.>
Methods and Models
<Describe in detail the data mining methods and models you plan to employ to achieve the goals you set in the Data Mining Task section of your
document. Include some mention of any necessary data transformations. If you are implementing a technique, you should have some idea of how it will
be implemented and incorporated into Weka (or some other data mining tool). If you are combining techniques, explain how you intend to use the
output of one technique as input to another. This section should be up to 5 pages in length. Be detailed, and include how you
will select the best model from the model space.>
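As one example of a data transformation worth mentioning here, min-max normalisation rescales a numeric attribute linearly into a target range: v' = new_min + (v - min) * (new_max - new_min) / (max - min). The sketch below is a hypothetical plain-Python helper, not part of any tool; Weka's unsupervised `Normalize` filter performs a comparable rescaling to [0, 1] by default.

```python
def min_max_normalise(values, new_min=0.0, new_max=1.0):
    """Rescale numeric values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: map every value to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


if __name__ == "__main__":
    # Hypothetical attribute values, e.g. ages taken from a data set.
    print(min_max_normalise([18, 30, 42, 66]))
```

Normalisation matters mainly for distance-based methods such as k-nearest-neighbour and k-means, where an attribute with a large numeric range would otherwise dominate the distance.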
Assessment
<Discuss the assessment methodology you will use to validate that you have found meaningful patterns. Will you use n-fold cross-validation,
confidence intervals for accuracy, etc.? How will you create your training and test sets? What baseline models will you use? This section should
be one to two pages in length.>
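The assessment questions above can be illustrated with a small, self-contained Python sketch: shuffled k-fold cross-validation, a majority-class baseline (the idea behind Weka's ZeroR), one simple candidate model (1-nearest-neighbour), and a 95% normal-approximation confidence interval for the pooled accuracy. All function names and the toy data are hypothetical; in practice Weka or R would perform these steps for you, and the interval shown is only a rough guide for small data sets.

```python
import math
import random
from collections import Counter


def k_fold_splits(n, k, seed=0):
    """Shuffle indices 0..n-1 and yield (train, test) index lists for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def majority_baseline(train_X, train_y, test_X):
    """ZeroR-style baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


def one_nearest_neighbour(train_X, train_y, test_X):
    """Simple candidate model: 1-NN with squared Euclidean distance."""
    preds = []
    for x in test_X:
        best = min(range(len(train_X)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(x, train_X[j])))
        preds.append(train_y[best])
    return preds


def cross_validated_accuracy(X, y, model, k=10, seed=0):
    """Pool predictions over k folds and return (accuracy, 95% CI)."""
    correct = 0
    for train, test in k_fold_splits(len(X), k, seed):
        preds = model([X[j] for j in train], [y[j] for j in train],
                      [X[j] for j in test])
        correct += sum(p == y[j] for p, j in zip(preds, test))
    n = len(X)
    acc = correct / n
    # Normal-approximation (Wald) interval; unreliable for small n or accuracy near 0 or 1.
    half = 1.96 * math.sqrt(acc * (1 - acc) / n)
    return acc, (max(0.0, acc - half), min(1.0, acc + half))


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D points.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5),
         (5, 5), (5, 6), (6, 5), (6, 6)]
    y = ["a"] * 6 + ["b"] * 4
    for name, model in [("baseline", majority_baseline),
                        ("1-NN", one_nearest_neighbour)]:
        acc, (lo, hi) = cross_validated_accuracy(X, y, model, k=5)
        print(f"{name}: accuracy={acc:.2f}, 95% CI=[{lo:.2f}, {hi:.2f}]")
```

A candidate model is only worth reporting if its cross-validated accuracy clearly exceeds the baseline's; heavily overlapping intervals suggest the difference may not be meaningful.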



Result Analysis and Interpretation
<Discuss each data mining algorithm applied and the results obtained. Analyse the results and draw the necessary inferences (similar to Assignment
1). This section should be up to 5 pages.>
Conclusion
<Based on the stated aim, objectives and results obtained, summarize the findings for the chosen domain and dataset. This section should be a page
or less.>
Lessons learned
<Describe your learning experience while choosing and implementing this assignment/project. This section should be a page or less.>
Future Work
<Briefly explain the possible future directions of your work. This section should be a page or less.>
References
<List the bibliographic information for all references cited throughout the proposal. You should have 5–10 references.>

Domains
Choose any of these, but you are not limited to them:
Health
Business
Education
Science
Security/Crimes
Entertainment/Sports
Real Estate
Weather
Data Mining Techniques
Classification
Prediction
Clustering
Association
Outlier Detection
Choose whichever technique is appropriate for the chosen problem domain and data sets. Ideally, each group should choose a different combination of data
domain and data mining task. For example:
Classification – Education
Prediction – Health
Association – Supermarket
Outliers – Security
Clustering - Business
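For the Association – Supermarket example, the two measures most association-rule tools report are support (the fraction of transactions containing an itemset) and confidence (support of A and C together divided by support of A). The minimal brute-force Python sketch below works over hypothetical transactions and considers only single-item rules A -> C; it illustrates the measures and is not the Apriori algorithm that Weka's `Apriori` associator implements. The thresholds and function names are assumptions.

```python
from itertools import combinations

# Hypothetical supermarket transactions (each a set of items bought together).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # Antecedent never occurs, so the rule has no evidence.
    return support(set(antecedent) | set(consequent), transactions) / base


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force all single-item rules A -> C whose item pair is frequent."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in combinations(items, 2):
        pair_support = support({a, c}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, c), (c, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, s, c in pair_rules(TRANSACTIONS):
        print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

In this toy data, bread appears in 4 of 5 transactions and bread with milk in 3, so the rule bread -> milk has support 0.6 and confidence 0.6 / 0.8 = 0.75.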



Marking Rubrics

Criterion (CBOK): Data and Information Management
Marks allocated: 13    Marks attained: ______
  Unsatisfactory (0%–49%):
    I. Did not accurately identify any of the data quality problems.
    II. Did not perform all required tasks correctly and consistently.
    III. Provided inaccurate and/or incomplete reports.
  Satisfactory (50%–75%):
    I. Accurately identified some of the data quality problems.
    II. Performed most of the required tasks correctly and consistently.
    III. Provided relatively accurate and complete reports.
  Good (76%–100%):
    I. Accurately identified most of the data quality problems.
    II. Performed all the required tasks correctly and consistently.
    III. Provided accurate and complete reports.

Criterion (CBOK): Teamwork concepts & issues
Marks allocated: 2    Marks attained: ______
  Unsatisfactory (0%–49%):
    I. Inappropriate task distribution and/or failure to complete tasks in the given timeframe.
    II. Delay in submission of the assignment.
  Satisfactory (50%–75%):
    I. Appropriate task distribution and completion on time.
    II. Submission of the assignment on time.
  Good (76%–100%):
    I. Appropriate task distribution and completion on time.
    II. Individual work integrated successfully.

Sub-total & comments: ______



Submission Instructions:
1. Complete the Mark Allocation Sheet in full and submit it with your assignment. Failing to do so may result in a deduction of 50% of the marks.
2. This assignment is to be submitted in groups of two members. Assign a group leader and submit the assignment through the group leader’s
Moodle account. You must submit two files for your project: (1) the Project Report and (2) the datasets (both the original data and the data
refined by the preprocessing tasks). The files must be named A2_Report_Sxxx_Syyy.docx and A2_DataSet_Sxxx_Syyy.xlsx, where Sxxx and Syyy are the
student IDs of the group members; for example, A2_Report_S11003232_S01004488.docx. Incorrect submissions will incur a heavy penalty.

Mark Allocation Sheet

Having discussed this as a group, we recommend the following mark allocation for each group member, based on their contribution (or lack of it)
throughout the assignment.

Group Name ________________________

Project manager ________________________

Member ID Percentage contribution of allocated task

Certification

ID Member Name Signature

