
MBIS5018 Assessment 3

Final Assignment: Clustering and Text Analytics

Overview:
For your final assignment, you will work in groups of up to four students to analyze a dataset using
clustering and text analytics techniques. The dataset should be sourced from Kaggle to ensure
relevance and contextual understanding. You will use the KNIME analytics platform, following
an architecture and workflow similar to those used in the lab sessions.

Objectives:
- Apply clustering techniques to identify patterns and groupings within the dataset.
- Perform text analytics to extract meaningful insights from textual data.
- Demonstrate proficiency in using the KNIME analytics platform.

Assignment Details:

1. Dataset:
- Use a dataset downloaded from Kaggle (https://www.kaggle.com/)
- The dataset should be sufficient to support both clustering and text analytics.

2. Tasks:
Data Preprocessing:
o Cleanse the dataset and handle missing values.
o Perform any necessary feature engineering.

Clustering:
o Apply at least one clustering algorithm (e.g., k-means) to identify natural groupings
in the data.
o Evaluate the clustering results using appropriate metrics.
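
The assignment itself is to be built in KNIME. Purely as an illustration of what the two clustering steps above compute, here is a minimal plain-Python sketch of k-means followed by a silhouette score. The toy points, the function names, and the choice of the silhouette coefficient as the evaluation metric are assumptions made for this example; the brief does not prescribe a particular metric. The starting centroids are passed in by hand so the run is reproducible, whereas real tools usually pick them randomly.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means. Returns (labels, centroids).

    Assumption for this sketch: the caller supplies the starting centroids.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # convention: a singleton cluster scores 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
score = silhouette_score(points, labels)
print(labels, round(score, 3))
```

A silhouette score close to 1 suggests compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points. Comparing the score across several values of k is one common way to justify the chosen number of clusters in the report.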

Text Analytics:
o Perform text preprocessing (e.g., tokenization, stop-word removal,
stemming/lemmatization).
o Apply text analytics techniques such as sentiment analysis or topic modeling.
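
To make the preprocessing steps listed above concrete, the following plain-Python sketch chains tokenization, stop-word removal, and a very crude suffix-stripping stemmer. It is an illustration only, not part of the KNIME workflow: the stop-word list is a tiny hand-picked sample, the stemmer is a simplified stand-in for a real algorithm such as Porter's, and the function names and the sample sentence are invented for this example.

```python
import re

# Tiny illustrative stop-word list (an assumption); real lists are far longer.
STOP_WORDS = {
    "a", "an", "and", "are", "for", "in", "is", "it", "of", "on",
    "the", "this", "to", "was", "were", "with",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def crude_stem(token):
    """Strip one common English suffix, keeping at least 3 characters.

    A deliberately naive stand-in for a real stemmer; it will over- and
    under-stem many words.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


# Hypothetical review sentence used only for demonstration.
sample = "The parcels were delivered quickly and the packaging was damaged."
print(preprocess(sample))
```

The resulting stems are what a later step, such as a term-frequency count, a sentiment lexicon lookup, or a topic model, would take as input. Stemming trades readability for consolidation: "delivered" and "delivering" collapse to the same stem, while lemmatization would instead map them to the dictionary form "deliver".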

Integration and Analysis:
o Integrate the results from clustering and text analytics to derive meaningful
insights.
o Present the findings in a clear and concise manner.

3. Submission Requirements:
- Submit a comprehensive report detailing the analysis process, the findings (including the
evaluation criteria applied), and the conclusions.
- Include visualizations (e.g., charts, graphs) to support the analysis.
- Provide a detailed description of the dataset and the rationale behind the chosen
approaches.
- KNIME Workflow: Create a KNIME workflow that covers the entire data analysis
process, from data preprocessing through clustering and text analytics.
- Report: Prepare a comprehensive report detailing your analysis process, findings, and
insights. Include visualizations, tables, and charts to support your conclusions.
- Presentation: Prepare a brief presentation (5-10 minutes) summarizing your findings.
Presentations will be delivered during a designated class session.

4. Evaluation Criteria:
- Clarity and coherence of the report.
- Depth and relevance of the analysis.
- Accuracy and effectiveness of the applied techniques.
- Collaborative effort and individual contributions of group members.
- Accuracy and thoroughness of the analysis.
- Clarity and organization of the report and presentation.
- Proper use of KNIME and relevant techniques.
- Relevance and significance of insights derived from the analysis.

5. Submission Guidelines:
- Submit your KNIME workflow, report, and any additional materials as a single ZIP file.
- Include a README file with instructions on how to run your KNIME workflow.
- Submit your group's work through the online learning platform by the specified deadline.

6. Presentation:
- Prepare a presentation summarizing the key findings and insights.
- Each group member should present a part of the analysis.

7. Important Notes:
Report Submission:
o Group assessment
o 2,500 words
o Due in Session 8 at 5:00 pm
o Marks contribution: 30%
Presentation:
o Group presentation: 10%
o Individual Q&A session: 20%
o 15 minutes for the presentation
o Due in Session 8, in class
o Total: 30%

8. Conclusion:
- This assignment provides an opportunity for students to apply clustering and text analytics
techniques in a real-world context. By working collaboratively in a group, students
will gain valuable insights into these essential aspects of data analysis.

