
Individual – Experimentation Report

Data
The “PLOD-CW” dataset will be split into training, validation, and testing subsets with a ratio of 70%, 15%, and 15%,
respectively. The optional “PLOD-filtered” dataset will also be used to expand the training and testing
subsets by 20% in total.
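
The 70/15/15 split described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the report's actual pipeline: the function name `split_dataset`, the fixed seed, and the placeholder records are all assumptions, and the 20% expansion from “PLOD-filtered” is not modelled here.

```python
import random


def split_dataset(records, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle records and split them into train/validation/test lists.

    The test subset receives whatever remains after the train and
    validation subsets are taken, so no record is dropped.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder records standing in for PLOD-CW examples (hypothetical data).
records = [f"sentence_{i}" for i in range(100)]
train, val, test = split_dataset(records)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three subsets identical across notebook runs, which makes results from different experiments comparable.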

Individual Task

Experiment Setup                 Experiment Settings

Data Preprocessing               Experiment 1A: Raw Data vs. Basic Text Cleaning (e.g., removing special characters, lowercasing).
Algorithms and Techniques        Experiment 2A: Support Vector Machine (SVM) vs. Naive Bayes.
Text Embeddings and Encoding     Experiment 3A: Word2Vec vs. GloVe embeddings.
Hyperparameter Optimization      Experiment 4A: Different Learning Rates (e.g., 0.01 vs. 0.001).
Loss Functions and Optimisers    Experiment 5B: Binary Cross-Entropy Loss with Adam Optimizer.
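
As an illustration of the "Basic Text Cleaning" arm of Experiment 1A, the sketch below lowercases text and removes special characters. The function name `basic_clean`, the exact character set that is kept, and the sample sentence are assumptions made for this example; the report does not specify the cleaning rules beyond the two examples it names.

```python
import re


def basic_clean(text):
    """Lowercase text and replace anything other than letters, digits, and whitespace."""
    lowered = text.lower()
    # Replace special characters with a space so adjacent words do not merge.
    no_special = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", no_special).strip()


# Hypothetical sample sentence, not taken from the dataset.
raw = "The EPI (Echo-Planar Imaging) scan was run!"
cleaned = basic_clean(raw)
print(raw)
print(cleaned)
```

The "Raw Data" arm of the experiment would simply skip this function and pass the original text through unchanged.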

Development Environment

Programming Language       Python 3.11

Framework                  PyTorch, NLTK, AllenNLP, spaCy, gensim, Scikit-learn, Pandas, NumPy, Matplotlib

Programming Environment    Google Colab

Version Control System     GitHub

Deliverables

You will need to submit an appropriately documented Colab notebook, and the submission must be
provided in two formats:

1. “.ipynb” notebook file (plus any helper files you might use) [mandatory File 1, inside ZIP]

2. PDF report (your report on experiments) with each experiment described in detail. It should
contain the results either in tables or as screenshots from the notebook. It must include your
error analysis of mispredictions made by the best model, and any other relevant sections
based on [mandatory File 2, outside ZIP]

3. DO NOT print the entire dataset or any entire list, dictionary, etc. in the notebook. Instead, print only
the first few (about 10) records.

4. DO NOT submit the dataset(s), just add reference(s)

5. DO NOT submit the trained model(s)
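
One way to follow the rule about printing only the first few records is sketched below. The helper name `preview` and the generated records are hypothetical; the limit of 10 follows the suggestion in the list above.

```python
from itertools import islice


def preview(records, limit=10):
    """Return at most `limit` records without reading the rest of the iterable."""
    return list(islice(records, limit))


# Hypothetical stand-in for a large dataset; a generator avoids building it in memory.
dataset = ({"id": i, "tokens": ["token"] * 3} for i in range(100_000))

for row in preview(dataset):
    print(row)
```

Because `islice` stops after `limit` items, the same helper works for lists, generators, and dictionary views alike.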

The notebook should contain visuals (where appropriate) to support tasks such as: label data
distribution, histogram comparisons, text samples, classification accuracy curve, confusion matrix,
etc. Additional notebooks or Python files can also be included, but make sure you zip the files
together before submitting. If there are library dependencies, please also include a requirements
file.
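
As a sketch of one of the visuals mentioned above, the block below builds a confusion matrix as a plain nested list. It uses only the standard library so that it runs anywhere; the label names and predictions are hypothetical placeholders, not results from any experiment. In the notebook, the resulting matrix could then be plotted with one of the libraries listed under Development Environment.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return rows indexed by true label and columns indexed by predicted label."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels and predictions, for illustration only.
labels = ["O", "AC", "LF"]
y_true = ["O", "AC", "LF", "O", "AC", "LF"]
y_pred = ["O", "AC", "O", "O", "LF", "LF"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>3}", row)
```

Off-diagonal cells identify which label pairs the model confuses, which is a natural starting point for the error analysis of mispredictions required in the report.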
