
COURSE GUIDE

Big Data
UNIVERSITY COLLEGE GRONINGEN
Academic Year 2020/2021, semester Ib
1. General information
Title Big Data
Course code UCGSC219
Level Bachelor
Faculty University College Groningen
Schedule See rooster.rug.nl (UG schedule generator)
Language English
Coordinator Dr. Muhamed Amin
Lecturer(s) Dr. Muhamed Amin
Entry requirements Admission to UCG
Number of ECTS 5

2. Course overview
With 2.5 quintillion bytes of data generated daily, efficient algorithms and software for
manipulating these data are essential to extract patterns and information and to learn from
them. In this course, the concept of big data will be introduced and explained. Python tools
and libraries used in data science will be taught. Because practical examples and assignments
are essential for gaining hands-on experience in data science, the course will be based on
practical sessions. Version control tools will be used to work on and submit assignments.

3. Learning objectives

1- Students will learn the meaning of "Data Science"
2- Students will learn the programming tools to manipulate large amounts of data efficiently
3- Students will be able to extract patterns from big data
4- Students will learn how to clean, process and visualize big data

4. Literature
Recommended literature (digitally available)
Title Python For Everyone, 3rd Edition
Author(s) Cay Horstmann, Rance Necaise
ISBN 978-1-119-49853-7
Price $84

Title The Coder’s Apprentice: Learning Programming with Python 3
Author(s) Pieter Spronck
This is a free textbook that can be downloaded from http://www.spronck.net/pythonbook/

Title Think Python: How to Think Like a Computer Scientist
Author(s) Allen B. Downey
This is a free textbook that can be downloaded from
http://www.greenteapress.com/thinkpython/thinkpython.html

https://www.w3schools.com/python/python_getstarted.asp

Version 2019-2020 v1.0

5. Schedule

Week  Topics                          Activity              Literature (The Coder’s Apprentice)  Assignment

1     Introduction to Data Science    Lecture/Computer Lab  Ch 1
      Introduction to Linux commands  Lecture/Computer Lab

2     NumPy Arrays                    Lecture/Computer Lab  Ch 2
      NumPy Arrays                    Lecture/Computer Lab                                       Problem Set-1 due Feb 22, 11:00

3     HDF5 files                      Lecture/Computer Lab  Ch 3
      HDF5 files                      Lecture/Computer Lab                                       Problem Set-2 due Mar 1, 11:00

4     Searching Big Data              Lecture/Computer Lab  Ch 4
      Searching Big Data              Lecture/Computer Lab                                       Problem Set-3 due Mar 8, 11:00

5     Practice Exercises              Computer Lab          Ch 5
      Midterm Exam                    Lecture/Computer Lab                                       Problem Set-4 due Mar 15, 11:00

6     Pandas-Cleaning Data            Lecture/Computer Lab  Ch 6
      Pandas-Cleaning Data            Lecture/Computer Lab                                       Problem Set-5 due Mar 22, 11:00

7     SQL-MySQL                       Lecture/Computer Lab  Ch 7
      SQL-MySQL                       Lecture/Computer Lab

8     NoSQL-MongoDB                   Lecture/Computer Lab  Ch 8
      Map-Reduce and Hadoop           Lecture/Computer Lab                                       Problem Set-6 due Apr 5, 18:00

9     Final Exam                      Written Exam

6. Instructions for assignments/exams


Problem Sets (Assignments)
Assignments will be given during the block. You may discuss an assignment with other
students, but you are expected to write your code independently and to list the names of all
collaborators in your file. Any similarity detected between submitted code will be treated
as plagiarism.

Exams
All course material will be assessed in the midterm and final exams. The exams will consist of
open questions that test basic programming knowledge and ask you to explain or correct
problems in given example code. The exams will be taken on your laptop.

Resit
You are allowed to resit the course during the resit period (July 1-5, 2019).

Project
Students will be divided into 4 groups. Each group will work on a project, which will be
assigned after the midterm.

7. Assessment

Assessment method  Contribution to final grade  Further information

Assignments        10%                          There will be 8 computer lab sessions during the
                                                block. You are expected to work on the exercises
                                                during the practical sessions. For each activity
                                                you will earn up to 100 points for attendance,
                                                participation and completeness of the work.
                                                Please note that, if you don’t have a valid excuse
                                                to miss a computer lab, you will receive a score
                                                of zero for that session.

Midterm Exam       45%                          The midterm exam is an open book open web exam

Final Exam         45%                          The final exam is an open book open web exam
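
As an illustration only, the weighting above can be written as a short Python sketch. The percentages come from the assessment table; the function name `final_grade`, the dictionary keys and the assumption that every component is marked on the same scale are hypothetical and not part of the official grading procedure.

```python
# Weights taken from the assessment table; all names here are illustrative.
WEIGHTS = {"assignments": 0.10, "midterm": 0.45, "final": 0.45}


def final_grade(scores):
    """Return the weighted average of the component scores.

    Assumption: all components are marked on the same scale (e.g. 0-100).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, used only to show the calculation.
example = final_grade({"assignments": 80, "midterm": 70, "final": 60})
print(round(example, 1))
```

Because the two exams together carry 90% of the grade, the assignment score shifts the result by at most a tenth of its own range.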

8. Availability of the lecturer(s)


Dr. Muhamed Amin
Email address: m.a.a.amin@rug.nl
Office: UCG Room 134 (via Skype)
Office hours: By appointment

9. Student workload
Activity Required number of hours

Contact hours 34

Assignments 50

Preparation for the exam 20

Self-study 36

Number of ECTS = 5 / Total number of required hours = 140 (1 ECTS = 28 hours)
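
The arithmetic behind this total can be checked with a few lines of Python. This is a minimal sketch: the hours and the 28-hours-per-ECTS conversion are taken from the table above, while the variable names are chosen here for illustration.

```python
# Hours per activity, copied from the workload table.
workload = {
    "Contact hours": 34,
    "Assignments": 50,
    "Preparation for the exam": 20,
    "Self-study": 36,
}

ECTS = 5
HOURS_PER_ECTS = 28  # conversion stated in the guide

total_hours = sum(workload.values())

# The activities should add up to the hours implied by the credit count.
assert total_hours == ECTS * HOURS_PER_ECTS
print(total_hours)  # 140
```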

10. Attendance policy


UCG expects active participation from its students. Attending all the classes of a course is
part of this expected participation. However, it will not always be possible to achieve a 100%
attendance rate (for instance, due to illness). Nevertheless, the UCG requires a minimum of
80% attendance by students for all of its classes in order for students to pass. Should any
students’ attendance drop below this percentage, the course coordinator can decide whether
and how to provide an opportunity to compensate for the absences (for instance, via a repair
assignment). Lecturers are expected to explain the attendance policy for a course during the
first meeting/seminar/lecture.

Appendix A: Policy on fraud and plagiarism


Cheating is an act or omission by a student designed to partly or wholly hinder the forming of
a correct assessment of his or her own or someone else’s knowledge, understanding and skills.
Cheating also includes plagiarism, which means copying someone else’s work without correct
reference to the source and includes self-plagiarism, which means reusing significant,
identical, or nearly identical portions of one's own work without acknowledging that one is
doing so or citing the original work.

All written assignments will be handed in via Nestor and will be scanned by Ephorus for
plagiarism.

If suspected fraud or plagiarism is discovered (either during the assessment or after the
assessment has taken place), the examiner of the course has to inform the Board of
Examiners. No definitive assessment or registration of results will take place when an
examiner suspects fraud or plagiarism. The Board of Examiners has the sole responsibility for
investigating suspected cases of fraud and for deciding on any sanctions.

For more information on the UCG policy on fraud and plagiarism, please consult the
Teaching and Examination Regulations (TER) and the Rules and Regulations (R&R).

Appendix B: Position in the programme
Relation to other courses in the programme
Introduction to Programming is a first year elective course, which can be a part of
major/minor in sciences and social sciences. The concepts and skills gained from the course
can be used for 2nd/3rd year courses or any research that may require programming skills.
The course is appropriate for the students without any previous programming experience.

Programme learning outcomes related to the course


Course learning objectives                            Programme learning outcomes

recognize basics of programming such as data types,   1.1: Has broad understanding of the fundamental
  variables, operations, and control structures       paradigms, concepts and models of the academic
interpret the syntax and semantics of the Python      disciplines within science and medical sciences,
  programming language                                humanities and social sciences;

apply decision/repetition structures and implement    1.4: Has in depth understanding of paradigms,
  functions in the design of simple programs          concepts and models used in one of the majors;

design, implement and debug simple programs to        4.3: Engages effectively in oral, written and
  solve basic computational problems                  electronic communication with peers, experts and
demonstrate the use of fundamental data structures    engaged laymen (written report, poster,
  and dictionaries                                    presentation, debate, film, Facebook, Twitter);

perform basic input/output to process data sets       4.5: Communicates ideas, vision and research
  stored in text files                                results clearly and discusses them openly.
develop problem solving skills to computationally
  analyse simple problems

Appendix C: Rubrics for grading/assessment
Problem Set #:
Name:
Correctness (60 points)
  Poor (0-55): Program produces incorrect results.
  Satisfactory (55-70): Significant details of the program specifications are violated.
    Program often produces incorrect results.
  Good (71-85): Minor details of the program specifications are violated. Program performs
    incorrectly for some inputs.
  Excellent (86-100): Program always works correctly and meets all of the specifications.
  Score:

Readability (20 points)
  Poor (0-55): Program is poorly organized and very difficult to read. There is no use of
    white space and/or variable names are ambiguous.
  Satisfactory (55-70): At least one major issue with the use of whitespace, variable names,
    or organization. Program is readable only by someone who knows what it is supposed to
    be doing.
  Good (71-85): Minor issues with the use of whitespace, variable naming, or general
    organization. Program is fairly easy to read.
  Excellent (86-100): Program is well organized, clear, and understandable. The use of white
    space and variable naming make the code easier to read.
  Score:

Documentation (15 points)
  Poor (0-55): No or very limited comments present.
  Satisfactory (55-70): The code is lacking meaningful comments. The comments embedded in
    the code do not help the reader to understand the code.
  Good (71-85): One or two places that could benefit from comments are missing or the
    program is overly commented.
  Excellent (86-100): Program contains appropriate documentation for all variables, control
    structures and functions. The comments clearly explain what the code is accomplishing.
  Score:

Assignment Specifications (5 points)
  Poor (0-55): No name, date, or assignment title included. The file was not named according
    to the instructions.
  Satisfactory (55-70): Minor issues with the name, date, assignment title or the file was
    not named according to the instructions.
  Good (71-85): NA
  Excellent (86-100): Includes name, date, and assignment title. The file was named according
    to the instructions.
  Score:

Total Score

Further Comments:
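
The rubric does not state how the band scores are combined into the total. One plausible reading, sketched below in Python, is that each criterion receives a band score from 0 to 100, which is then scaled by that criterion's maximum points. This scaling rule, the function name `total_score` and the dictionary keys are assumptions made for illustration; only the maximum points per criterion come from the rubric.

```python
# Maximum points per criterion, taken from the rubric.
MAX_POINTS = {
    "correctness": 60,
    "readability": 20,
    "documentation": 15,
    "assignment_specifications": 5,
}


def total_score(band_scores):
    """Combine per-criterion band scores (0-100) into a total out of 100.

    Assumption: each band score is scaled by the criterion's maximum points.
    """
    for name in MAX_POINTS:
        score = band_scores[name]  # raises KeyError if a criterion is missing
        if not 0 <= score <= 100:
            raise ValueError(f"{name}: score {score} is outside 0-100")
    return sum(MAX_POINTS[n] * band_scores[n] for n in MAX_POINTS) / 100


# Hypothetical band scores, used only to show the calculation.
example = total_score({
    "correctness": 90,
    "readability": 80,
    "documentation": 70,
    "assignment_specifications": 100,
})
print(example)
```

Under this reading, correctness dominates the result: a program that scores poorly there cannot reach the "Good" band overall, however readable or well documented it is.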
