To initiate your big data project, you must form a group consisting of a minimum of 3 and a
maximum of 4 members. The project is divided into two parts, as outlined below:
Your project should involve substantial data, comprising more than 500 original (self-collected)
records. Avoid using data downloaded from the Internet.
For the data analysis project, analyze your data by generating various reports using visualization
tools such as Matplotlib. Include your exploratory data analysis (EDA) findings in your technical
report. The marks for the data analysis project are distributed as follows:
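The EDA reporting described above can be sketched with a single labelled and titled bar chart of one categorical column. This is only a minimal illustration: the `transport_modes` sample, the axis text, and the output filename are hypothetical placeholders, not part of the assignment, and should be replaced with your own self-collected records.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample of one categorical column; replace with your own records.
transport_modes = ["bus", "car", "bus", "walk", "car", "bus", "bike", "walk", "bus"]

# Count how many records fall into each category.
counts = Counter(transport_modes)
labels = sorted(counts)                       # fixed, alphabetical category order
values = [counts[label] for label in labels]  # record count per category

# Draw a bar chart with a title and axis labels.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(labels, values)
ax.set_title("Records by transport mode (illustrative data)")
ax.set_xlabel("Transport mode")
ax.set_ylabel("Number of records")
fig.tight_layout()

# Save the figure so it can be placed in the technical report.
fig.savefig("transport_mode_counts.png", dpi=150)
```

Each chart in the report would normally be followed by a short written interpretation of what it shows.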
For the data science project, create a classification model utilizing a machine learning (ML)
algorithm. This model should be based on the same data used in the data analysis project. Ensure
your classification model has at least ten features and one target (label). Use k-fold
cross-validation (CV-KFold) to verify that the model accuracy exceeds 60%. Include your classification model findings in your
technical report. The marks for the data science project are distributed as follows:
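One possible way to run the k-fold check described above is with scikit-learn's `KFold` and `cross_val_score`. The sketch below is an assumption-laden illustration: the assignment does not name a library or an algorithm, so the random forest, the five folds, and the synthetic data from `make_classification` (600 records, ten features, one binary label) are stand-ins for your own choices and data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the project data (assumption): 600 records,
# 10 features, 1 binary target. Replace X and y with your own dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, random_state=42
)

# Any classifier could be used here; a random forest is just an example choice.
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation with shuffling; the fold count is an assumption.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold, each computed on the fold held out from training.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()

print("Accuracy per fold:", scores.round(3))
print(f"Mean accuracy: {mean_accuracy:.3f}")
print("Exceeds the 60% requirement:", mean_accuracy > 0.60)
```

Cross-validation estimates accuracy rather than guaranteeing it, so reporting the per-fold scores alongside the mean gives a clearer picture of how stable the model is.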
Submit your technical report, which encompasses both the data analysis and data science projects,
by the lecture in week 14. The technical report should not exceed 25 pages. Detailed
information about the technical report items and format can be found in the Group Project
Evaluation Form, which includes a breakdown of the marks for the technical report.
Categorical data
    Graphs are either missing or incorrect.
    Knowledge >= 6: Some graphs contain errors in knowledge representation. No result analysis.
    Knowledge >= 8: The graphs are visually appealing, and the knowledge is accurately represented. Graphs are labelled and titled. Contain some result analysis.
    Knowledge >= 10: The graphs appear professional, and the knowledge is accurately represented. Graphs are labelled and titled. Critical analysis of the results.
Testing & performance analysis
    Accuracy >60% for test set
    Accuracy >70% for test set
    Accuracy >80% for test set
    Accuracy >90% for test set
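Because the testing and performance criterion refers to accuracy on a test set, a held-out evaluation could look like the sketch below. Everything here other than the four accuracy thresholds is an assumption for illustration: the 80/20 split, the logistic regression model, and the synthetic data are placeholders for your own choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project data (assumption): 600 records, 10 features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, random_state=0
)

# Hold out 20% of the records as a test set; the ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit on the training portion only, then score on the unseen test portion.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))

# Accuracy thresholds taken from the rubric: >60%, >70%, >80%, >90%.
thresholds = [0.60, 0.70, 0.80, 0.90]
thresholds_met = [t for t in thresholds if test_accuracy > t]

print(f"Test-set accuracy: {test_accuracy:.3f}")
print("Rubric thresholds exceeded:", thresholds_met)
```

Keeping the test set untouched until the final evaluation avoids overstating the accuracy reported in the technical report.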