
UEL-CN-7031

Resit Summative Assessment: Final Project 60%
Resit Final Project Presentation 40%

Resit Submission instructions


• Cover sheet to be attached to the front of the assignment when
submitted
• Question paper to be attached to assignment when submitted
• All pages to be numbered sequentially

Dear students,
This is your Resit Assessment. Prior to attempting this assignment, please carefully read the
relevant materials found in the Module Revision Material of the course shell.
Assessment Resit Submission Rules and Important Notes:
1. You are able to use the feedback and constructive comments provided by your
tutor in order to improve/enhance your work. Please ensure that you work on your initial
piece of assignment and improve it based on the feedback received.

2. During the resit period you are given the opportunity to revise and resubmit
originally failed module assessment(s), but no further academic instruction will be
provided. However, you are able to use the feedback and constructive comments
provided by your tutor in order to improve/enhance your work.

3. Your assessment should be submitted via the appropriate VLE Submission Link
by 11:59 PM (VLE) time, at the end of the resit week, the specific date of which has been
provided to you on the day your access was granted to the resit module. You may
request confirmation of your deadline in a timely manner via resubmission@unicaf.org.

4. Assignments submitted up to 24 hours late will be accepted, but the assignment
mark will be subject to a deduction of 5 marks from the mark awarded. Work submitted
more than 24 hours after the submission deadline will be recorded as 0%.

5. We are here to help and support you during the resit period, so if you need any
non-tutor technical assistance for any issues affecting your ability to submit your resit
assignment please contact the Resubmission Services via resubmission@unicaf.org as
a first step to getting in touch and allow 48 hours for us to answer you before moving on
to Student Support.

6. The maximum mark attainable for the components upon reassessment will be
50%. Please write your solutions clearly and concisely. If you do not explain your answer
you will be given no credit. You must write your own solution. Copying someone else’s
solution will be considered plagiarism and may result in failing the whole course.

UEL-CN-7031 - Big Data Analytics

This coursework (CRWK) must be attempted as individual work. This coursework is
divided into two sections: (1) Big Data analytics on a real case study and (2) presentation.

Overall mark for CRWK comes from two main activities as follows:
1- Big Data Analytics report (around 5,000 words, with a tolerance of ± 10%) (60%)
2- Presentation (around 1000 words, with a tolerance of + 10%) (40%)

Marking Scheme: Big Data Analytics report

Topic                           Total mark  Remarks (breakdown of marks for each sub-task)
Big Data Analytics using HIVE   30          (10) Providing big data queries using HIVE.
                                            (10) Using Built-in (Date, Math, Conditional, and String) Functions in HIVE.
                                            (10) Visualizing the results of queries into the graphical representations and be able to interpret them
Big Data Analytics using Spark  50          (15) Analyzing the dataset through statistical analysis methods.
                                            (35) Designing single- and multi-class classifiers and evaluate and visualize the accuracy/performance.
Individual assessment           10          (10) Find alternative solutions for high level languages and analytics approaches (use references), and Express findings from big data analytics with the relevant theories.
Documentation                   10          (10) Write down a scientific report.
Total                           100


Big Data Analytics using Hadoop and Spark



Tasks:

(1) Understanding Dataset: UNSW-NB15

The raw network packets of the UNSW-NB15 dataset [1] were created by the IXIA PerfectStorm
tool in the Cyber Range Lab of the Australian Centre for Cyber Security
(ACCS) for generating a hybrid of real modern normal activities and synthetic contemporary
attack behaviours. The Tcpdump tool was used to capture 100 GB of the raw traffic (e.g., Pcap files). This
data set has nine types of attacks, namely, Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic,
Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used and twelve
algorithms were developed to generate a total of 49 features with the class label.

a) The features are described here.

b) The number of attacks and their sub-categories is described here.

c) In this coursework, we use a total of 10 million records stored in
the CSV file (download). The total size is about 600MB, which is big enough to
employ big data methodologies for analytics. As big data specialists, we first
want to read and understand its features, then apply modeling techniques. If
you want to see a few records of this dataset, you can import it into Hadoop HDFS,
then make a Hive query for printing the first 5-10 records for your understanding.
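Before the HDFS import, a quick local look at the file can confirm the delimiter and the number of columns per record. The sketch below is a stand-in for the Hive "first 5-10 records" query, not a replacement for it. The function name head_records and the sample rows are made up for illustration, and the code assumes a comma-delimited file.

```python
import csv
import io
from itertools import islice


def head_records(lines, n=5):
    """Return the first n parsed CSV records without reading the whole input."""
    reader = csv.reader(lines)
    return list(islice(reader, n))


# Hypothetical rows standing in for the real CSV file; with the real data,
# pass an open file object instead, e.g. open(csv_path, newline="").
sample = io.StringIO(
    "10.0.0.1,1390,10.0.0.2,53,udp,0\n"
    "10.0.0.3,33661,10.0.0.4,1024,udp,0\n"
    "10.0.0.5,1464,10.0.0.6,53,udp,1\n"
)

for record in head_records(sample, n=2):
    # Print the column count next to each record to spot ragged rows early.
    print(len(record), record)
```

Because islice stops after n records, the same call stays cheap on a file of several hundred megabytes.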

(2) Big Data Query & Analysis by Apache Hive [30 marks]
This task uses Apache Hive to convert big raw data into useful information for
end users. To do so, first study the dataset carefully. Then, write at least 4 Hive
queries (refer to the marking scheme). Apply appropriate visualization tools to present
your findings numerically and graphically. Briefly interpret your findings.

Finally, include screenshots of your outcomes (e.g., tables and plots) together with the
scripts/queries in the report.

Tip: The mark for this section depends on the complexity of your HIVE queries; for
instance, a simple SELECT query alone will not earn full marks.
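As an illustration of a query that goes beyond a simple SELECT, the sketch below combines aggregation with a conditional (CASE WHEN), a math function (ROUND) and a string function (UPPER). To keep it runnable without a cluster, it uses SQLite as a local stand-in for Hive; the query text is restricted to constructs that both dialects share, but Hive's date functions and its remaining built-ins are not covered. The table name flows, the column names proto, dur and label, and the toy rows are assumptions made for this example.

```python
import sqlite3

# Only constructs available in both HiveQL and SQLite are used here:
# CASE WHEN (conditional), ROUND (math), UPPER (string), COUNT/AVG, GROUP BY.
QUERY = """
SELECT UPPER(proto) AS protocol,
       CASE WHEN label = 1 THEN 'attack' ELSE 'normal' END AS traffic,
       COUNT(*) AS n_records,
       ROUND(AVG(dur), 2) AS avg_duration
FROM flows
GROUP BY UPPER(proto), CASE WHEN label = 1 THEN 'attack' ELSE 'normal' END
ORDER BY protocol, traffic
"""


def run_prototype(rows):
    """Load toy rows into an in-memory table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flows (proto TEXT, dur REAL, label INTEGER)")
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Made-up rows: (protocol, duration in seconds, 1 = attack / 0 = normal)
toy_rows = [("tcp", 0.5, 0), ("tcp", 1.5, 0), ("tcp", 2.0, 1), ("udp", 0.25, 1)]

for row in run_prototype(toy_rows):
    print(row)
```

Prototyping on a handful of rows makes it easy to check the grouping logic by hand before running the same statement in Hive against the full table.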

1 source: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
(3) Advanced Analytics using PySpark [50 marks]
In this section, you will conduct advanced analytics using PySpark.

3.1. Analyze and Interpret Big Data (15 marks)


We need to learn and understand the data through at least 4 analytical methods
(descriptive statistics, correlation, hypothesis testing, density estimation, etc.).
You need to present your work numerically and graphically. Apply tooltip text, legends,
titles, X-Y labels, etc. accordingly to help end users gain insights.
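Two of the methods named above, descriptive statistics and correlation, can be sketched in plain Python to show what the numbers mean before they are computed at scale. This is an illustration on made-up values, not PySpark code; PySpark exposes comparable summaries through DataFrame.describe() and DataFrame.stat.corr(). The feature names duration and src_bytes are hypothetical.

```python
import math
import statistics


def describe(values):
    """Count, mean, sample standard deviation, min and max of a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for two hypothetical numeric features
duration = [1.0, 2.0, 3.0, 4.0]
src_bytes = [10.0, 20.0, 30.0, 40.0]

print(describe(duration))
print(pearson(duration, src_bytes))
```

A coefficient near 1 or -1 indicates a strong linear relationship between two features, while a value near 0 indicates little linear relationship; it says nothing about non-linear dependence.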

3.2. Design and Build a Classifier (35 marks)

a) Design and build a binary classifier over the dataset. Explain your algorithm and its
configuration. Present your findings in both numerical and graphical
representations. Evaluate the performance of the model and verify the accuracy
and the effectiveness of your model. [15 marks]

b) Apply a multi-class classifier to classify data into ten classes (categories): one
normal and nine attacks (e.g., Fuzzers, Analysis, Backdoors, DoS, Exploits,
Generic, Reconnaissance, Shellcode and Worms). Briefly explain your model with
supportive statements on its parameters, accuracy and effectiveness. [20 marks]
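For the binary classifier in part a), accuracy alone can mislead when one class dominates, so it helps to report precision, recall and F1 as well. The sketch below derives all four from confusion-matrix counts. It is independent of any particular model or library; the function name binary_metrics and the counts in the usage lines are made up for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives, with "attack" treated as the positive class.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration only
metrics = binary_metrics(tp=80, fp=20, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.97 while the precision is 0.80, which shows how a high accuracy can coexist with a noticeable share of false alarms.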

Tip: you can use this link (https://spark.apache.org/docs/2.2.0/ml-classification-regression.html)
for more information on modelling.
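For the multi-class classifier in part b), a per-class view of performance is usually more informative than a single accuracy figure, especially if some attack categories have few records. The sketch below computes per-class F1 and the macro-averaged F1 from lists of true and predicted labels. It is plain Python for illustration; the function names and the five example labels are made up, and in practice the label pairs would come from the model's predictions.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 score for every class that appears in the true or predicted labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = pairs[(cls, cls)]
        fp = sum(n for (t, p), n in pairs.items() if p == cls and t != cls)
        fn = sum(n for (t, p), n in pairs.items() if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Made-up labels for illustration only
y_true = ["Normal", "Normal", "DoS", "DoS", "Worms"]
y_pred = ["Normal", "DoS", "DoS", "DoS", "Normal"]

print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro averaging gives every class the same weight, so a class the model never predicts correctly pulls the score down even when overall accuracy looks acceptable.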

(4) Individual Assessment [10 marks]


Discuss (1) what alternative technologies are available for tasks 2 and 3 and how
they differ (use academic references), and (2) what surprising new thinking was
evoked and/or neglected on your part.

Tip: add the individual assessment of each member to the same report.

(5) Documentation [10 marks]


Document all your work. Your final report must follow the 5 sections detailed in the “format of
final submission” section (refer to the next page). Your work must demonstrate
appropriate understanding of academic writing and integrity.

Marking Scheme for the Presentation

Topic                                           Total Marks  Remarks
Content                                         50           Covers topic in-depth with details.
Presentation design & Animations & transitions  20           Makes excellent use of fonts, colors, graphics, effects, layout features, transitions to enhance the presentation.
Length                                          10           Correct use of number of slides, Word Count (1000 words)?
Organization                                    20           Students present information in a logical, interesting sequence that the audience can follow.
Total                                           100

This is the second submission, which is located at a different submission link. Here you
will submit a presentation based on the report above. It carries a weight of 40% of your
Final Grade.


FORMAT OF FINAL SUBMISSION


• You need to prepare one single file in PDF format as your
coursework with the following sections:

1. Use ONLY one Cover Page

2. Table of Contents

3. Report of the tasks (with sub-sections for tasks where needed)

4. References (if any)

• And one PDF file for the presentation

SUBMISSION
Submit a single PDF into Turnitin in Moodle, by the end of Week 12.

Submit a single PDF into Turnitin in Moodle at the second submission link for the presentation, by the end
of Week 12.

PLAGIARISM
The University defines an assessment offence as any action(s) or behaviour likely to confer
an unfair advantage in assessment, whether by advantaging the alleged offender or
disadvantaging (deliberately or unconsciously) another or others. A number of
examples are set out in the Regulations and these include:
“D.5.7.1 (e) the submission of material (written, visual or oral), originally produced by another
person or persons, without due acknowledgement, so that the work could be assumed the
student’s own. For the purposes of these Regulations, this includes incorporation of
significant extracts or elements taken from the work of (an) other(s), without
acknowledgement or reference, and the submission of work produced in collaboration for an
assignment based on the assessment of individual work. (Such offences are typically
described as plagiarism and collusion.)”.
