
Big data project (Special Topics in Computer Science - CSC649)

To begin your big data project, form a group of three or four members. The project is divided
into two parts:

1. Data analysis project
2. Data science project

Your project should involve substantial data, comprising more than 500 original (self-collected)
records. Avoid using data downloaded from the Internet.

For the data analysis project, analyze your data by generating various reports using visualization
tools such as Matplotlib. Include your exploratory data analysis (EDA) findings in your technical
report. The marks for the data analysis project are distributed as follows:

Data preprocessing (data mining):

a. data collection: >500 records 2 marks
b. data cleansing 2 marks
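
The cleansing step can be sketched as follows. This is only a minimal illustration, assuming the records are held in a pandas DataFrame; the column names (respondent_id, age, hours_online) and the cleanse helper are hypothetical, and the generated rows are placeholders for your own self-collected data.

```python
import pandas as pd


def cleanse(df: pd.DataFrame, min_records: int = 500) -> pd.DataFrame:
    """Drop duplicate and incomplete rows, then check the record count."""
    cleaned = (
        df.drop_duplicates()      # remove rows that are exact repeats
        .dropna()                 # remove rows with any missing value
        .reset_index(drop=True)
    )
    # The brief asks for MORE than 500 records, so exactly 500 is not enough.
    if len(cleaned) <= min_records:
        raise ValueError(
            f"only {len(cleaned)} records remain; more than {min_records} are required"
        )
    return cleaned


# Placeholder data (hypothetical columns) standing in for a self-collected survey.
raw = pd.DataFrame({
    "respondent_id": list(range(600)),
    "age": [20 + i % 30 for i in range(600)],
    "hours_online": [1.0 + (i % 8) for i in range(600)],
})
raw.loc[5, "hours_online"] = float("nan")                  # one missing value
raw = pd.concat([raw, raw.iloc[[0]]], ignore_index=True)   # one duplicated row

clean = cleanse(raw)
print(f"{len(raw)} raw records -> {len(clean)} clean records")
```

Dropping incomplete rows is the simplest policy; imputing missing values may suit your data better, as long as the report states which approach was used.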

Data analysis (EDA):

a. Numerical data (min. 8 charts) 8 marks
b. Categorical data (min. 8 charts) 8 marks
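
As an illustration of the two chart types, the sketch below draws one chart for a numerical column and one for a categorical column with Matplotlib. The columns (hours_online, device) and the eight sample rows are hypothetical placeholders, not real data. Both charts are labelled and titled, as the rubric expects.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows (hypothetical columns); use your own cleaned data instead.
df = pd.DataFrame({
    "hours_online": [1.5, 2.0, 3.5, 4.0, 2.5, 6.0, 5.5, 3.0],
    "device": ["phone", "laptop", "phone", "tablet",
               "phone", "laptop", "phone", "tablet"],
})

fig, (ax_num, ax_cat) = plt.subplots(1, 2, figsize=(10, 4))

# Numerical data: a histogram shows the distribution of values.
ax_num.hist(df["hours_online"], bins=5, edgecolor="black")
ax_num.set_title("Distribution of daily hours online")
ax_num.set_xlabel("Hours per day")
ax_num.set_ylabel("Number of respondents")

# Categorical data: a bar chart shows the count of each category.
device_counts = df["device"].value_counts()
ax_cat.bar(device_counts.index.tolist(), device_counts.values)
ax_cat.set_title("Respondents by main device")
ax_cat.set_xlabel("Device")
ax_cat.set_ylabel("Number of respondents")

fig.tight_layout()
```

Calling fig.savefig with a file name writes the figure to disk for inclusion in the technical report. Each chart should be followed by a short written analysis of what it shows.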

For the data science project, build a classification model using a machine learning (ML)
algorithm, trained on the same data used in the data analysis project. The model must have at
least ten features and one target (label). Use K-fold cross-validation (CV-KFold) to confirm that
the model's accuracy exceeds 60%. Include your classification model findings in your technical
report. The marks for the data science project are distributed as follows:

Data preprocessing (data mining):

a. data split 2 marks
b. data selection (features and label) 2 marks
c. data transformation 2 marks
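
The three preprocessing steps above might look like the sketch below, which assumes scikit-learn and pandas. The column names and generated rows are hypothetical, and only three features are used to keep the example short; your own model needs at least ten.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Placeholder data (hypothetical columns) standing in for your cleaned records.
df = pd.DataFrame({
    "age": [18 + i % 40 for i in range(100)],
    "hours_online": [1.0 + (i % 8) for i in range(100)],
    "device": ["phone", "laptop", "tablet", "phone"] * 25,
    "heavy_user": [int(i % 8 >= 4) for i in range(100)],
})

# Data selection: choose the feature columns and the single target (label).
feature_cols = ["age", "hours_online", "device"]
target_col = "heavy_user"
X, y = df[feature_cols], df[target_col]

# Data split: hold out 20% as a test set, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Data transformation: scale numerical columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "hours_online"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["device"]),
])
X_train_t = preprocess.fit_transform(X_train)  # fit on the training data only
X_test_t = preprocess.transform(X_test)        # reuse the fitted transformation

print(X_train_t.shape, X_test_t.shape)
```

The transformer is fitted on the training split only, so that no information from the test set leaks into the model.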

Classification model (ML algorithm):

a. cross-validation (KFold) 8 marks
b. testing & performance analysis 8 marks
c. Product/App delivery 8 marks
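
Cross-validation and testing can be sketched as below, assuming scikit-learn. The synthetic data from make_classification is only a stand-in so that the example runs on its own; it does not replace your self-collected records. Logistic regression is an arbitrary choice of algorithm, not one prescribed by this brief.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in data: 600 records, ten features, one binary label.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_redundant=0, random_state=0
)

# Keep a test set aside; cross-validation uses the training portion only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold is used once as the validation set.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring="accuracy")

for fold, score in enumerate(fold_scores, start=1):
    print(f"fold {fold}: accuracy = {score:.3f}")
print(f"mean = {fold_scores.mean():.3f}, std = {fold_scores.std():.3f}")

# Testing: refit on the full training set and score the held-out test set.
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy = {test_accuracy:.3f}")
```

Cross-validation estimates accuracy; it does not raise it. If the mean fold accuracy is at or below 60%, revisit the features, the transformation, or the algorithm before testing.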

Submit your technical report, which covers both the data analysis and data science projects,
by the lecture in Week 14. The technical report should not exceed 25 pages. Details of the
required report items and format are in the Group Project Evaluation Form, which also gives
the breakdown of marks for the technical report.

Due date: Week 14


Rubric

Each item is marked in one of four bands: 1-2, 3-4, 5-6 or 7-8 marks.

Numerical data

1-2: Graphs are either missing or incorrect.
3-4: Knowledge >= 6. Some graphs contain errors in knowledge representation. No result
     analysis.
5-6: Knowledge >= 8. The graphs are visually appealing, and the knowledge is accurately
     represented. Graphs are labelled and titled. Contain some result analysis.
7-8: Knowledge >= 10. The graphs appear professional, and the knowledge is accurately
     represented. Graphs are labelled and titled. Critical analysis of the results.

Categorical data

1-2: Graphs are either missing or incorrect.
3-4: Knowledge >= 6. Some graphs contain errors in knowledge representation. No result
     analysis.
5-6: Knowledge >= 8. The graphs are visually appealing, and the knowledge is accurately
     represented. Graphs are labelled and titled. Contain some result analysis.
7-8: Knowledge >= 10. The graphs appear professional, and the knowledge is accurately
     represented. Graphs are labelled and titled. Critical analysis of the results.

Cross-validation (KFold)

1-2: KFold >= 2. CV-KFold table analysis is not available. >= 20 sets of experiments.
3-4: KFold >= 3. CV-KFold table analysis is available but incorrect. >= 30 sets of
     experiments.
5-6: KFold >= 4. CV-KFold table analysis is available and correct. >= 40 sets of
     experiments.
7-8: KFold >= 5. CV-KFold table analysis is available, correct and looks professional.
     >= 50 sets of experiments.

Testing & performance analysis

1-2: Accuracy >60% for test set.
3-4: Accuracy >70% for test set.
5-6: Accuracy >80% for test set.
7-8: Accuracy >90% for test set.

Product/App delivery

1-2: Product does not meet the project objective. Product makes predictions incorrectly.
     Product not relevant to the community and practitioners.
3-4: Product almost meets the project objective. Product makes predictions almost correct.
     Product adequately relevant to the community and practitioners.
5-6: Product meets the project objective. Product able to make predictions correctly.
     Product relevant to the community and practitioners.
7-8: Product meets the project objective perfectly. Product able to make predictions
     excellently. Product highly relevant to the community and practitioners.
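
The cross-validation criterion in the rubric asks for a CV-KFold table covering several sets of experiments. One possible way to build such a table is sketched below. It assumes that a "set of experiments" means one combination of a K value and a model setting, which is an interpretation and not something the rubric defines. The data is synthetic, and the decision tree with its max_depth values is an arbitrary illustrative choice.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 600 records, ten features, one binary label.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_redundant=0, random_state=0
)

rows = []
for n_splits in (2, 3, 4, 5):          # number of folds (K)
    for max_depth in (2, 4, 6):        # illustrative model setting
        cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
        model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        rows.append({
            "k": n_splits,
            "max_depth": max_depth,
            "mean_accuracy": scores.mean(),
            "std_accuracy": scores.std(),
        })

# One row per experiment: this is the table to analyse in the report.
cv_table = pd.DataFrame(rows)
print(cv_table.round(3).to_string(index=False))
```

This sketch produces 12 rows; extending either loop with more values brings the count up to the number of experiments the target band requires.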
