Buyume ML Data Associate Assignment

The candidate is required to complete the tasks listed below to qualify for the
next round. At a minimum, the first 2 tasks must be completed within the allotted
time.

1. Data Collection: Curate a dataset of 50-100 images containing cars and buses. To
achieve this, create a web scraping script using Python tools such as Beautiful
Soup or Scrapy to collect images from the web (use search keywords like "street
car" or "bus").
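
As a rough illustration of the collection step, the sketch below pulls image URLs out of a page's HTML and saves them to disk. It is a minimal sketch, not a prescribed solution: it uses only Python's standard library (html.parser and urllib) instead of Beautiful Soup so that it runs without extra installs, and the names extract_image_urls, download_images, and ImageSrcParser are hypothetical. With Beautiful Soup, the same extraction would be done with soup.find_all("img").

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

# Assumption: only these file types are wanted in the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


class ImageSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return de-duplicated absolute image URLs with a known extension."""
    parser = ImageSrcParser(base_url)
    parser.feed(html)
    seen, result = set(), []
    for url in parser.urls:
        ext = Path(urlparse(url).path).suffix.lower()
        if ext in IMAGE_EXTENSIONS and url not in seen:
            seen.add(url)
            result.append(url)
    return result


def download_images(urls, out_dir, limit=100):
    """Save up to `limit` images into out_dir and return the saved paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls[:limit]):
        ext = Path(urlparse(url).path).suffix.lower()
        target = out / f"img_{i:03d}{ext}"
        try:
            req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
            with urlopen(req, timeout=10) as resp:
                target.write_bytes(resp.read())
            saved.append(target)
        except OSError:
            # Skip images that fail to download rather than abort the run.
            continue
    return saved
```

A typical run would fetch a search-results page for each keyword, pass its HTML to extract_image_urls, and then call download_images on the combined list. Scraped images usually need a manual pass afterwards to drop duplicates and irrelevant results.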

2. Data Annotation: After collecting the dataset, annotate the objects in bounding
box format. You can use any open-source annotation tool such as Roboflow or
LabelImg to manually draw bounding boxes around each car and bus in the images.
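
A bounding box annotation amounts to a class label plus four coordinates. The sketch below shows how a pixel box (xmin, ymin, xmax, ymax) maps to a YOLO-style label line, where the centre, width, and height are normalised by the image size. The class order in CLASS_IDS and the function name to_yolo_line are assumptions made for illustration; annotation tools such as LabelImg can write this format directly.

```python
# Assumption: class ids follow this order; any consistent order works.
CLASS_IDS = {"car": 0, "bus": 1}


def to_yolo_line(label, box, img_w, img_h):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to a YOLO label line."""
    xmin, ymin, xmax, ymax = box
    if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
        raise ValueError(f"box {box} does not fit a {img_w}x{img_h} image")
    # YOLO stores the box centre and size as fractions of the image size.
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{CLASS_IDS[label]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


print(to_yolo_line("bus", (100, 50, 300, 250), 400, 500))
# prints: 1 0.500000 0.300000 0.500000 0.400000
```

The bounds check doubles as a simple consistency test: a box that extends past the image edge or has zero area is rejected instead of being written out.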

3. Data Export: Export the annotated dataset in a suitable format that can be used
for training an object detection model. Common formats include YOLO format, Pascal
VOC format, or COCO format. The candidate should organise the dataset into training
and validation sets.
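
One way to organise the exported data into training and validation sets is sketched below. It assumes YOLO-format labels (one .txt file per image, sharing the image's file stem) and the images/{train,val} plus labels/{train,val} folder layout that YOLOv5 and YOLOv8 expect. The function name split_dataset, the 80/20 ratio, and the fixed seed are illustrative choices, not requirements of the assignment.

```python
import random
import shutil
from pathlib import Path


def split_dataset(image_dir, label_dir, out_dir, val_fraction=0.2, seed=42):
    """Copy image/label pairs into images/{train,val} and labels/{train,val}."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
    )
    # Keep only images that have a matching label file.
    pairs = [(img, Path(label_dir) / f"{img.stem}.txt") for img in images]
    pairs = [(img, lbl) for img, lbl in pairs if lbl.exists()]

    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(pairs)
    n_val = max(1, round(len(pairs) * val_fraction)) if pairs else 0
    splits = {"val": pairs[:n_val], "train": pairs[n_val:]}

    for split, items in splits.items():
        img_out = Path(out_dir) / "images" / split
        lbl_out = Path(out_dir) / "labels" / split
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for img, lbl in items:
            shutil.copy2(img, img_out / img.name)
            shutil.copy2(lbl, lbl_out / lbl.name)
    return {split: len(items) for split, items in splits.items()}
```

Images without a label file are left out of both sets, which also surfaces any images that were missed during annotation.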

4. Model Training: Utilise Google Colab to train an object detection model on the
annotated dataset. For example, you can choose a popular YOLO variant (YOLOv5
through YOLOv8) and fine-tune it on the dataset.
Useful Links: YOLOv8 Github, YOLOv8 Colab, YOLOv5 Github, YOLOv5 Doc.
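
For orientation, a fine-tuning run with the ultralytics package (the YOLOv8 implementation) might look like the sketch below. It assumes the package is installed in the Colab runtime (pip install ultralytics) and that the dataset uses the images/train and images/val layout. The helper names build_data_yaml and train_and_validate, the dataset path, the yolov8n.pt starting checkpoint, and the hyperparameter values are illustrative assumptions rather than recommended settings.

```python
from pathlib import Path


def build_data_yaml(dataset_root, class_names):
    """Return the text of a dataset config in the Ultralytics YAML layout."""
    lines = [
        f"path: {Path(dataset_root).as_posix()}",
        "train: images/train",
        "val: images/val",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


def train_and_validate(data_yaml, epochs=50, imgsz=640):
    """Fine-tune a pretrained YOLOv8 model and report validation mAP."""
    # Imported lazily so the config helper works without the package.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # small pretrained checkpoint as a starting point
    model.train(data=str(data_yaml), epochs=epochs, imgsz=imgsz)
    metrics = model.val()  # evaluates on the "val" split from the config
    return {"mAP50": metrics.box.map50, "mAP50-95": metrics.box.map}
```

In a notebook, one would write the config with Path("data.yaml").write_text(build_data_yaml("/content/dataset", ["car", "bus"])) and then call train_and_validate("data.yaml"). The class order must match the ids used in the label files.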

5. Documentation: Prepare a brief document that describes the steps taken for each
task and the results of the trained model (hyperparameters used, performance
metrics).

The candidate is expected to submit the following:

1. A link to a Google Drive folder (2 Days) containing:

The web scraping Python script used to collect images.
The annotated dataset in the chosen format.
Any additional files or scripts related to data preprocessing, if necessary.

2. A link to a Colab notebook (or a Jupyter notebook) (2 Days) containing:

Code for training the object detection model (with comments and explanations).
Evaluation metrics (e.g. precision, recall, mAP) on the validation set.
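
To make the metrics concrete, the sketch below computes intersection over union (IoU) and then precision and recall for one image and one class, using greedy matching at an IoU threshold of 0.5. The function names and the threshold are assumptions chosen for illustration; training frameworks report these values automatically. mAP extends the idea by averaging the area under the precision-recall curve across classes.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Precision and recall for (box, score) predictions against true boxes.

    Predictions are matched greedily in order of confidence, and each
    ground-truth box can be matched at most once.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predictions) - tp  # predictions with no matching object
    fn = len(ground_truth) - tp  # objects the model missed
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall
```

For example, with two true boxes and three predictions of which two overlap a true box well, precision is 2/3 and recall is 1.0.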

Evaluation Criteria:
Quality and relevance of the curated dataset.
Accuracy and consistency of the annotations.
Thoroughness and clarity of the documentation.
Innovative approaches or solutions to overcome challenges during the task.
