
Applied AI Research Seminar (6414M0712Y)

Individual Assignment

In this individual assignment, you will demonstrate the ability to identify an interesting
problem, or a set of problems, that could be addressed using deep learning. You can
work on one of the Group projects that your team did not work on before. Alternatively, you
may use the same problem that your team worked on for the Group project, as long
as you take a different direction from the one previously explored.

Plagiarism

The University of Amsterdam strictly monitors that students do not commit plagiarism
or collusion in written assignments and exams. All papers will be checked for plagiarism
using URKUND. All cases of plagiarism will be reported to the Examinations Board,
who will determine the imposed sanction. Please make sure that you always cite
your sources. For a guide on citing and sourcing, please check the UvA website.

Submission
The assignment will count towards your grade and should be submitted through Canvas
before the lab session on 03.02.2023. Your submission consists of:
• A Jupyter Notebook named code-<student id>.ipynb
• A presentation in PDF format named presentation-<student id>.pdf
• A report in PDF format named report-<student id>.pdf
To test the assignments, we will use Anaconda Python 3.9. During the final session
on 03.02.2023, each student will have 10 minutes to present their work. The report
should not be longer than 10 pages, including the title page, figures, and references. You
will also have to provide the data that your code uses to perform the analysis. If the data
is too large to upload to Canvas, you can provide us with a URL from which it can be
downloaded, packed in a single file.

Grading
You can get at most 60 points for this project, which is 60% of your final grade.
Contact: g.sidiropoulos@uva.nl

1 Datasets
Public Health and Health Care
In the Netherlands, the Rijksinstituut voor Volksgezondheid en Milieu (RIVM) publishes
public health datasets through the platform Volksgezondheidenzorg.info [1]. The
collections hosted on the website include, among others, mortality from cardiovascular
diseases and cancer per municipality and per community health services region
(GGD-regio), as well as related EU statistics. When searching for answers to your
research questions, you could combine these datasets with, e.g., CBS Neighbourhood
Statistics [2].
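
Combining two such sources usually comes down to joining them on a shared region identifier. The sketch below is only an illustration using the Python standard library: the column names (municipality_code, cvd_mortality_per_100k, population), the codes, and all values are made up for the example and do not come from the RIVM or CBS files, whose actual layout you will need to check.

```python
import csv
import io


def join_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key column."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two rows; on a column-name clash the right-hand value wins.
            joined.append({**row, **match})
    return joined


# Tiny made-up tables standing in for the real downloads (hypothetical columns and values).
health_csv = """municipality_code,cvd_mortality_per_100k
GM0001,210
GM0002,185
GM0003,240
"""
cbs_csv = """municipality_code,population
GM0001,12000
GM0003,54000
"""

health = list(csv.DictReader(io.StringIO(health_csv)))
cbs = list(csv.DictReader(io.StringIO(cbs_csv)))

# Only municipalities present in both tables survive the inner join.
combined = join_on_key(health, cbs, "municipality_code")
for row in combined:
    print(row)
```

Note that csv.DictReader returns every value as a string, so numeric columns still need to be converted before any analysis. Rows without a match are dropped here; whether that is acceptable depends on your research question.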

Amazon Sustainability Data Initiative

The Amazon Sustainability Data Initiative (ASDI) [3] has many available datasets tackling sustainability challenges.
They are all listed on the main web page, and each has its own page with a more detailed
description. You can assign yourself a task different from the suggested one, or e.g. work
on a (randomly sampled) reduced dataset if there is too much data for your laptop.
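
One possible way to draw a randomly sampled reduced dataset without loading everything into memory is reservoir sampling. The sketch below is a minimal illustration of that technique using only the standard library; the function name reservoir_sample and its parameters are chosen for this example and are not part of any ASDI tooling.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1),
            # replacing a uniformly chosen row already in the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Stand-in for a large stream of records; any iterable works, e.g. an open file.
subset = reservoir_sample(range(1_000_000), k=1000)
print(len(subset))
```

Because the function only iterates once over its input, it can be fed an open file object line by line. If you work on a reduced dataset, it is worth stating the sample size and the seed in your report so that the analysis can be reproduced.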

ACL Anthology Corpus


The Association for Computational Linguistics is a scientific and professional organization
for people working on natural language processing. The ACL Anthology currently hosts
80971 papers on the study of computational linguistics and natural language processing
[4]. You can use deep learning to find interesting topics and trends, or to carry out any
other analysis that is helpful to the community.

Named Entity Recognition


Polyglot-NER [5] is a training dataset for named entity recognition, automatically
generated from Wikipedia and Freebase. It contains the basic Wikipedia-based training
data for 40 languages. Can you use this dataset to build a multilingual model for named
entity recognition? The dataset is available at
https://huggingface.co/datasets/polyglot_ner.
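
When inspecting such data or a model's predictions, it often helps to turn token-level tags into entity spans. The sketch below assumes that each example provides a list of tokens and a parallel list of tag strings, with plain labels such as PER, LOC, and ORG and O for non-entity tokens (no B-/I- prefixes). This layout is an assumption for illustration; please verify it against the dataset card before relying on it. The function name extract_entities and the toy sentence are made up for the example.

```python
def extract_entities(words, tags, outside="O"):
    """Group consecutive tokens that share the same non-outside tag into entity spans."""
    entities = []
    current_words, current_tag = [], outside
    for word, tag in zip(words, tags):
        if tag != current_tag:
            # The tag changed: close the span that was open, if any.
            if current_tag != outside:
                entities.append((" ".join(current_words), current_tag))
            current_words, current_tag = [], tag
        if tag != outside:
            current_words.append(word)
    # Close a span that runs until the end of the sentence.
    if current_tag != outside:
        entities.append((" ".join(current_words), current_tag))
    return entities


# Toy example (not taken from the dataset).
words = ["Ada", "Lovelace", "visited", "Amsterdam", "."]
tags = ["PER", "PER", "O", "LOC", "O"]
entities = extract_entities(words, tags)
print(entities)
```

Under this tagging assumption, two adjacent entities of the same type cannot be told apart and are merged into one span. Joining tokens with a space is also a simplification that does not suit languages written without whitespace.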

Fingerspelling Corpus
Fingerspelling is a way of spelling out the words of a language using hand gestures. The
fingerspelling alphabet is part of sign language and is used when no sign exists for a
word, e.g., for the names of people and places. It can also clarify a sign for someone who
cannot read the signer. As with spoken languages, there are different fingerspelling
alphabets, such as American Fingerspelling (ASL)¹ ² [6]. You can explore these datasets
and perform tasks like fingerspelling recognition, generation, or other tasks that you
think would be helpful for hearing-impaired people.
¹ https://www.kaggle.com/datasets/ayuraj/asl-dataset
² https://github.com/marlondcu/ISL

2 Your own dataset
You are welcome to try machine learning on a dataset of your own. In this case, you
should contact the lecturers to make sure that your dataset and task are appropriate.

3 References
[1] Volksgezondheid en zorg. https://www.volksgezondheidenzorg.info/.

[2] CBS. Kerncijfers wijken en buurten 2016.
https://www.cbs.nl/nl-nl/maatwerk/2016/30/kerncijfers-wijken-en-buurten-2016.

[3] Amazon Sustainability Data Initiative.
https://sustainability.aboutamazon.com/tech-for-good/asdi.

[4] Shaurya Rohatgi. ACL Anthology corpus with full text. GitHub, 2022.

[5] Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, and Steven Skiena. Polyglot-NER:
Massive multilingual named entity recognition. In Proceedings of the 2015 SIAM
International Conference on Data Mining, pages 586–594. SIAM, 2015.

[6] Marlon Oliveira, Houssem Chatbri, Ylva Ferstl, Mohamed Farouk, Suzanne Little,
Noel O’Connor, and A. Sutherland. A dataset for Irish Sign Language recognition.
2017.
