
SE 416 Data Mining

Term Project
Deadlines: Intermediate: April 15th, 2019; Final: May 6th, 2019

1. Goal of the Project

In this term project, you are expected to design and develop a predictive data mining model,
that is, you will perform a classification (or regression, if you like) task. Then, you are
expected to document your work in a formal report, and to present it in front of the class.

2. Dataset

You can obtain the dataset from one of several sources, for example:


 You can use one of the published datasets at kaggle.com.
 You can use one of the published datasets at UCI ML data repository.
 You can collect authentic data from an original source.

Your dataset must not be very small. It should contain roughly 1000 objects or more, with
roughly 10 attributes or more. In addition, the dataset should contain both categorical and
numerical attributes so that you can apply several transformations.
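As a rough illustration, the size and attribute-type requirements above could be checked with a few lines of pandas before you commit to a dataset. This is only a sketch: the thresholds simply restate the numbers above, the function name check_dataset is hypothetical, and treating every non-numeric column as categorical is a simplifying assumption that may not fit your data.

```python
import pandas as pd

MIN_ROWS = 1000     # "roughly 1000 objects or more"
MIN_COLUMNS = 10    # "roughly 10 attributes or more"


def check_dataset(df: pd.DataFrame) -> dict:
    """Summarize whether a DataFrame roughly meets the dataset requirements."""
    # Assumption: numeric dtypes are numerical attributes, everything else
    # (strings, booleans, dates) is counted as categorical.
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "numeric": numeric_cols,
        "categorical": categorical_cols,
        "large_enough": len(df) >= MIN_ROWS and df.shape[1] >= MIN_COLUMNS,
        "mixed_types": bool(numeric_cols) and bool(categorical_cols),
    }


# Synthetic stand-in data: 9 numeric columns and 1 categorical column.
demo = pd.DataFrame({f"x{i}": range(1200) for i in range(9)})
demo["color"] = ["red", "blue", "green"] * 400
report = check_dataset(demo)
print(report["rows"], report["columns"], report["large_enough"], report["mixed_types"])
```

In a real project you would replace the synthetic frame with your own data, for example one loaded with pd.read_csv.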

3. Algorithms

You can use any algorithms you like. However, you are expected to try as many
algorithms as you can, with several options each, and then compare their predictive
performance in your report. Your aim is to create the model with the best predictive
performance. To do this, apply whatever data preprocessing methods may help.
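One possible way to organize such a comparison in Python is sketched below, assuming scikit-learn is your library of choice. The data is synthetic, and the column names, the three algorithms, their options, and the use of 5-fold cross-validated accuracy are all illustrative assumptions, not requirements of the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with numerical and categorical attributes.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50, 15, n),
    "city": rng.choice(["A", "B", "C"], n),
})
y = ((X["age"] + X["income"] + rng.normal(0, 5, n)) > 90).astype(int)

# Preprocessing: scale numerical columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Several algorithms, some tried with more than one option.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree_depth3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "decision_tree_depth8": DecisionTreeClassifier(max_depth=8, random_state=0),
    "knn_k5": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in candidates.items():
    pipeline = Pipeline([("preprocess", preprocess), ("model", model)])
    # Mean accuracy over 5 cross-validation folds.
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()

# Comparison table, best model first.
comparison = pd.Series(results).sort_values(ascending=False)
print(comparison)
```

Putting the preprocessing inside the pipeline means it is refit on each training fold, so the cross-validated scores are not influenced by the held-out fold.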

4. Tools and Programming Languages

You must use Python, R, or Julia to perform everything in this project, including data
processing, model development, and testing. You are free to use any common data science
libraries. If you like, you can use GUI tools such as WEKA during your work, but in the end
you must have working code that performs the whole task when you run it. You also need to
append your code to your report.
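To illustrate what "working code that performs the whole task when you run it" might look like, here is a minimal single-file sketch in Python. It is only an assumed layout: the function names load_data and run are hypothetical, the model and split ratio are arbitrary choices, and the small dataset bundled with scikit-learn is used purely as a stand-in (it does not meet the dataset requirements of this project).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def load_data():
    """Load the data. Stand-in only: replace with your own dataset loading."""
    data = load_breast_cancer(as_frame=True)
    return data.data, data.target


def run(random_state=42):
    """Run the whole workflow: load, split, train, and evaluate."""
    X, y = load_data()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    # Running the file executes every step from raw data to test score.
    print(f"Test accuracy: {run():.3f}")
```

Fixing the random seed, as done here, makes the reported numbers reproducible when the script is run again.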

5. Delivery of Project Report and Presentation

There will be two reporting and presentation deliveries: an intermediate and a final one.

In the intermediate delivery, you have to prepare a report and a presentation with the
following outline:

1. Introduction
 Provide a small introduction to the project.
2. Problem
Explain the problem you are going to solve in this study.
3. Dataset
 Explain the dataset, giving relevant statistics, correlation charts etc.
 Explain the possible preprocessing that may be helpful.
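The statistics and correlations mentioned for the Dataset section could be computed along the following lines in Python. This is a sketch on synthetic data with made-up column names; which statistics are actually relevant depends on your dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: two correlated numerical columns and one categorical.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"height_cm": rng.normal(170, 10, n)})
df["weight_kg"] = 0.9 * df["height_cm"] - 85 + rng.normal(0, 5, n)
df["group"] = rng.choice(["a", "b"], n)

# Count, mean, standard deviation, and quartiles of the numerical attributes.
summary = df.describe()

# Frequency of each value of a categorical attribute.
category_counts = df["group"].value_counts()

# Number of missing values per column, a hint at needed preprocessing.
missing = df.isna().sum()

# Pearson correlation matrix of the numerical attributes.
correlations = df.select_dtypes(include="number").corr()

print(summary)
print(category_counts)
print(missing)
print(correlations)
```

The correlation matrix can then be drawn as a heatmap with a plotting library of your choice to obtain a correlation chart for the report.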

In the final delivery, you have to extend your report and presentation with the following titles:
4. Data Mining Algorithms Used
 Explain the algorithms you have used, including why you have selected them, and their
parameters.
5. Experiments and Results
 Explain the details of the experiments, and present your results in a clear and organized way
with tables and charts.
 Interpret the results.
6. Conclusion
Appendix
Add your project code in this section.

6. Project Teams

You can do this project individually or in a team of at most two people.
