
Glasgow Caledonian University

Department of Computing

Big Data Landscape (MMI223994)


Coursework 1 Diet 1 2021/2022

This coursework is worth 40% of the overall module mark and is an individual coursework,
not a group coursework.

Specification
The purpose of this coursework is to allow you to demonstrate your understanding of material
covered in the Big Data Landscape module.
Initially, you are required to gain an understanding of your allocated public dataset. This is
followed by analysis and visualization, guided by specific questions of your own choosing that
you consider, if answered, would provide value from your dataset.
The work to be undertaken is split into the tasks listed in this specification.

Task 1
Explore and study your allocated public dataset and provide:
1. An overview of the dataset.
2. An appraisal of the quality of the data in terms of the integrity rules covered in lectures.
3. Two questions that you consider, if answered, will provide value along with an explanation
of who might benefit from that value and why.

Task 2
Driven by your questions, undertake the following analysis work on your allocated public dataset:
1. Use the Google BigQuery UI to work with your public dataset to create all the appropriate
SQL statements required to enable you to answer your questions.
2. Create a Google Colab notebook that executes your SQL statements using BigQuery and
stores the results of your queries in Google Cloud Storage as CSV files.
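As a rough illustration of Task 2, step 2, the sketch below shows one possible shape for a notebook cell that runs a query and writes the result to Cloud Storage as CSV. It assumes the google-cloud-bigquery and google-cloud-storage client libraries as available in Colab. The project ID, bucket name, object name, example query, and the helper names rows_to_csv, query_to_gcs and run_in_colab are hypothetical placeholders and are not part of this specification.

```python
import csv
import io


def rows_to_csv(field_names, rows):
    """Serialise a header plus an iterable of row tuples to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(field_names)
    writer.writerows(rows)
    return buffer.getvalue()


def query_to_gcs(bq_client, gcs_client, sql, bucket_name, blob_name):
    """Run `sql` in BigQuery and upload the result to Cloud Storage as CSV."""
    result = bq_client.query(sql).result()  # blocks until the query finishes
    field_names = [field.name for field in result.schema]
    csv_text = rows_to_csv(field_names, (row.values() for row in result))
    blob = gcs_client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(csv_text, content_type="text/csv")
    return csv_text


def run_in_colab():
    """Example wiring for a Colab cell; every name below is a placeholder."""
    from google.colab import auth
    from google.cloud import bigquery, storage

    auth.authenticate_user()  # interactive sign-in inside Colab
    project_id = "my-project-id"  # assumption: replace with your own project
    bq_client = bigquery.Client(project=project_id)
    gcs_client = storage.Client(project=project_id)
    sql = "SELECT 1 AS example_column"  # assumption: replace with your own SQL
    query_to_gcs(bq_client, gcs_client, sql, "my-bucket", "results/question1.csv")
```

Passing the two clients in as arguments keeps the query-and-upload logic separate from authentication, so the same function can be reused for each of your questions with a different SQL string and object name.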

Task 3
Complete the following steps to visualize the data that represents the value you have obtained:
1. Using your CSV files, visualize the data they contain in a manner of your choosing that you
consider clearly communicates your findings and the value obtained.
2. Load your CSV files into a Google Cloud SQL database using an approach of your
choosing.
3. Create appropriate SQL statements that can be used to query your Google Cloud SQL
database to view your findings in tabular form.
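Task 3, steps 2 and 3, can be pictured with the minimal sketch below, which loads CSV text into a table and then queries it back in tabular form. It deliberately uses Python's built-in sqlite3 module as a local stand-in, not Google Cloud SQL itself; with a real Cloud SQL instance you would connect through a MySQL or PostgreSQL driver instead, and the placeholder syntax may differ. The table name, column names, sample rows, and the helpers load_csv and as_table are made up for illustration.

```python
import csv
import io
import sqlite3


def load_csv(connection, table_name, csv_text):
    """Create `table_name` from the CSV header and insert every data row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f'CREATE TABLE "{table_name}" ({columns})')
    # Values are inserted as text because the CSV carries no type information.
    connection.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
    )
    connection.commit()


def as_table(cursor):
    """Format a query result as plain-text rows under a header line."""
    header = [column[0] for column in cursor.description]
    lines = [" | ".join(header)]
    lines += [" | ".join(str(value) for value in row) for row in cursor]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "category,total\nA,10\nB,25\n"  # invented sample data
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "findings", sample)
    cur = conn.execute(
        "SELECT category, total FROM findings "
        "ORDER BY CAST(total AS INTEGER) DESC"
    )
    print(as_table(cur))
```

In a real submission you would normally declare explicit column types in the CREATE TABLE statement rather than relying on untyped columns and a CAST at query time.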

BH/BDL/CW1/Diet1/2021_2022 Page 1 of 2
Documentation

A report is required, documenting the work you undertook to complete each coursework
task. You should also provide all supporting files created while working on the coursework.
Please note that supporting files should not be provided in lieu of narrative in your report.

Submission

Copy all your submission files to a directory named using the following format:
<student_id>_BDL_CW1, e.g. S1712345_BDL_CW1

Compress the directory into a zip file named using the format:
<student_id>_BDL_CW1.zip, e.g. S1712345_BDL_CW1.zip
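
The naming convention above can be applied with a short script such as the following sketch, which uses only the Python standard library. It assumes the submission directory already exists under the given parent directory; the function name zip_submission is a hypothetical helper, and any ordinary zip tool would do the same job.

```python
import shutil
from pathlib import Path


def zip_submission(student_id, parent_dir="."):
    """Zip <student_id>_BDL_CW1 into <student_id>_BDL_CW1.zip beside it."""
    name = f"{student_id}_BDL_CW1"
    parent = Path(parent_dir).resolve()
    if not (parent / name).is_dir():
        raise FileNotFoundError(f"expected directory {parent / name}")
    # base_dir keeps the directory itself as the top-level entry in the zip.
    archive = shutil.make_archive(
        str(parent / name), "zip", root_dir=parent, base_dir=name
    )
    return Path(archive)
```

For example, zip_submission("S1712345") run from the directory that contains S1712345_BDL_CW1 would produce S1712345_BDL_CW1.zip next to it.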

Submit your zip file to GCULearn using the link provided.

The submission deadline is 5pm on Friday 26th November 2021, via GCULearn.

Do not leave it until the last minute to submit your report, as the system may be heavily
loaded. It is your responsibility to submit by the stated deadline.

You are warned that any plagiarism or any failure to submit by the submission date will be dealt
with according to the relevant sections of your Programme Handbook, which details the relevant
University Assessment Regulations.

Marking Scheme

This coursework is marked out of 100 and accounts for 40% of the overall BDL module mark.
For each task, marks will be awarded, as appropriate to the task, for the work undertaken to
complete the task and your account of the work in your report.

Specifically, more marks will be allocated for narrative that is clear, provides a complete
description of the work undertaken and is supported within the report by code and
screenshots providing evidence of the work.

The following table shows the available marks:

Tasks     Marks Available
Task 1    30
Task 2    35
Task 3    35
Total     100
