Module 3: Big Data and Databases 2022-2023

Unit 1: Big Data

1.1. Pre-Reading: Data Quiz


1. What type of data displays are these?
a. Line charts
b. Bar charts
c. Pie charts
d. Box and Whisker Plot

2. What is the measure of centre, also known as the average, of a data set?
a. Median
b. Mean Absolute Deviation
c. Mean
d. Mode

3. Values in a data set that are separated from the other data values.
a. Frequency
b. Outliers
c. Mode
d. Whiskers

4. Middle value in a data set.


a. Upper Quartile
b. Mode
c. Mean
d. Median

5. The median of the lower half of a data set.


a. Mode
b. Frequency
c. Upper Quartile
d. Lower Quartile

6. The distance between the highest and lowest data values


a. Outlier
b. Interquartile Range
c. Frequency
d. Range
7. How many times a number is in the data set.
a. Mode
b. Frequency
c. Range
d. Median

8. The number that appears most often in a data set.


a. Interquartile Range
b. Frequency
c. Mode
d. Range

9. This process shows the accuracy or consistency of a data set.
a. Interquartile Range
b. Mode
c. Mean Absolute Deviation
d. Upper Quartile

10. To find the Median, write the data in number order and find the _______ number.
a. highest
b. lowest
c. middle
d. end

11. To find the Mode, first I need the ______ of all numbers in the data set.
a. range
b. outliers
c. mean absolute deviation
d. frequency

12. Before finding the upper quartile of a data set, I need to find the ______.
a. whiskers
b. median
c. outliers
d. range

13. When finding the IQR, I need to find the distance between the upper and lower ______.
a. ranges
b. extremes
c. quartiles
d. whiskers

14. Find the mean of the data set. 35, 56, 34, 44, 52, 12, 34, 45

15. Find the Median of the data set. 32, 23, 22, 33, 33, 23, 32, 23, 22

16. Find the range of the data set. 38, 89, 61, 61, 27, 69, 79, 82, 45, 79

17. Find the Mode of the data set. 87, 67, 91, 47, 78, 18, 83, 78, 91, 82

18. Define Big Data

19. Define Web Analytics

20. Define Digital Analytics
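
The measures named in this quiz can also be checked with a few lines of code. The sketch below is only an illustration: it uses a hypothetical data set (not one of the quiz sets) and assumes the common classroom convention that the quartiles are the medians of the lower and upper halves, leaving out the middle value when the number of values is odd.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical data set, chosen only for illustration (not one of the quiz sets).
data = [2, 4, 4, 5, 7, 8, 12]

# Write the data in number order, then split it into a lower and an upper half.
ordered = sorted(data)
half = len(ordered) // 2
lower_half = ordered[:half]
# With an odd number of values, the middle value belongs to neither half.
upper_half = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]

data_mean = mean(data)

summary = {
    "mean": data_mean,                      # the average
    "median": median(data),                 # the middle value
    "mode": mode(data),                     # the most frequent value
    "frequency": Counter(data),             # how many times each value appears
    "range": max(data) - min(data),         # highest minus lowest
    "lower_quartile": median(lower_half),   # median of the lower half
    "upper_quartile": median(upper_half),   # median of the upper half
    # Mean absolute deviation: average distance of each value from the mean.
    "mean_absolute_deviation": mean([abs(x - data_mean) for x in data]),
}
# Interquartile range: distance between the upper and lower quartiles.
summary["iqr"] = summary["upper_quartile"] - summary["lower_quartile"]

for name, value in summary.items():
    print(name, value)
```

Other textbooks and software packages define quartiles slightly differently, so results for small data sets may not match this convention exactly.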

1.2. Reading: Big Data


Insert these words in the spaces in the text: accurately, care, harness, improve, insight, pickups and
drop-offs, raw, records, relied, retail, retailers, store, treatment, uncover, utilities

What is Big Data?

Big data is a term that describes the large volume of data – both structured and unstructured – that
inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s
what organizations do with the data that matters. Data can add value to your organisation by
improving the day-to-day running of the organisation. For example, UPS (a logistics and transport
company) stores a large amount of data – much of which comes from sensors in its vehicles. That
data not only monitors daily performance, but also triggered a major redesign of UPS drivers' route
structures. The initiative was called ORION (On-Road Integrated Optimization and Navigation), and
was arguably the world's largest operations research project. It 1) __________ heavily on online map
data to reconfigure a driver's 2) _____________________ in real time. The project led to savings of
more than 41.6 million litres of fuel by cutting 136 million kilometres off of daily routes. UPS
estimates that saving only one daily kilometre per driver saves the company $30 million. It’s
important to remember that the primary value from big data comes not from the data in its
3) _____________ form, but from the processing and analysis of it and the insights, products, and
services that emerge from analysis.

Who uses Big Data?

1. Education: Educators armed with data-driven 4) ________ can make a significant impact on
school and university systems, students and curriculums. By analyzing big data, they can
identify at-risk students, make sure students are making adequate progress, and can
implement a better system for evaluation and support of teachers.
2. Government: When government agencies are able to 5) ________ and apply analytics to
their big data, they gain significant ground when it comes to managing 6) ______________,
running agencies, dealing with traffic congestion or preventing crime.
3. Health Care: Patient 7) __________. 8) _______________ plans. Prescription information.
When it comes to health care, everything needs to be done quickly, 9) ______________.
When big data is managed effectively, health care providers can uncover hidden insights
that 10) _________ patient 11) _______________.
4. Retail: Customer relationship building is critical to the 12) _________ industry – and the best
way to manage that is to manage big data. 13) ___________ need to know the best way to
market to customers.

The sources for big data generally fall into one of three categories:
1. Streaming data
This category includes data that reaches your IT systems from a web of connected devices.
You can analyze this data as it arrives and make decisions on what data to keep, what not to
keep and what requires further analysis.
2. Social media data
The data on social interactions is an increasingly attractive set of information, particularly for
marketing, sales and support functions. It's often in unstructured or semi-structured forms,
so it poses a unique challenge when it comes to consumption and analysis.

3. Publicly available sources


Massive amounts of data are available through open data sources like the US government’s
data.gov or the European Union Open Data Portal.
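
The idea behind the first category, streaming data, is that each item is examined as it arrives and sorted into "keep", "do not keep" or "needs further analysis". A minimal sketch of that idea follows. The function name `triage`, the sensor values and the three thresholds are all hypothetical and chosen only for illustration; a real system would use its own rules.

```python
def triage(readings, low=0.0, high=100.0, alert=80.0):
    """Yield (value, decision) pairs one at a time, as each reading arrives.

    The thresholds are hypothetical: values outside [low, high] are treated
    as implausible, and plausible values at or above `alert` as unusual.
    """
    for value in readings:
        if value < low or value > high:
            decision = "discard"   # what not to keep
        elif value >= alert:
            decision = "analyze"   # what requires further analysis
        else:
            decision = "keep"      # what to keep
        yield value, decision


# A hypothetical stream of sensor readings; iter() stands in for a live feed.
stream = iter([21.5, 250.0, 85.0, -3.0, 40.0])
decisions = [decision for _, decision in triage(stream)]
print(decisions)
```

Because `triage` is a generator, it never needs the whole data set in memory, which is the main difference between processing a stream and processing a stored file.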

To begin making use of the information in big data, you will need to carry out three steps which
include: How to 14) ____________ and manage it; How to analyze it; How to use any insights you
15) _________________.

Reading Comprehension:

1. How did big data help the logistics company UPS?

2. How can educational institutions use big data?

3. How can governments use big data?

4. How can health care institutions use big data?

5. How can the retail industry use big data?

6. What are the three sources for big data?

7. What are the three steps to obtain information from big data?

Vocabulary

Match the following words with their synonyms:


1. adequate a) acceptable, satisfactory
2. apparatus b) argue
3. available c) believe sb
4. avoid d) carefully
5. bother e) deal with
6. breakdown f) device
7. improve g) different
8. inundate h) eat
9. massive i) excellent
10. outstanding j) flood
11. quarrel k) huge
12. rely on/upon sb/sth l) make an effort
13. sort m) make better
14. source n) mental illness
15. store o) obtainable
16. swallow p) order
17. tackle q) origin
18. take sb's word for it r) prevent
19. thoroughly s) save
20. unlike t) trust, depend

1.3. Narrative Tenses

In pairs, you are going to put a narrative in order. The sentences of the narrative are the following:

1. But that was quite normal as he usually took long lunches.


2. He left a few minutes later and asked me to tell his dad he'd been in and would call him later.
3. He looked very surprised and said he hadn't got a son.
4. He said he needed to pick something up from Arthur’s office and went in.
5. He then looked in the office and found that his briefcase and some cash had been stolen.
6. I told him Arthur had gone out and asked if I could help.
7. I was working for a small family-owned business in London.
8. I’d been doing the stocktaking all day and I was tired.
9. I’d just finished my tea and was about to go back to work
10. It was a Friday afternoon and I was taking a tea break.
11. The boss had gone out at lunchtime and hadn’t come back yet.
12. Then he said, ‘Oh, you’re new, aren’t you?’
13. This happened about two years ago when
14. when a young man walked into the office and asked for ‘dad.’
15. When I asked him who he meant, he said, ‘Arthur, the boss’.
16. When the boss came back I told him his son had been in.
17. Which was true as I had only been working there a few weeks.

The sentences on this page make a complete story so read all of them before you start choosing the
correct verb forms.

1 This happened about ten years ago. I _______________ with a college friend, Sarah, in the
country.

was staying

stayed

had stayed

had been staying

2 They ____________ an old country house a few miles from Cambridge

recently bought

were recently buying

had recently bought

had been buying

3 and she ______________me down to see it.

had invited

invited

had been inviting

was inviting

4 Anyway, it was a dark, winter afternoon and we _____________ in the sitting room.

had been chatting

were chatting

chatted

had chatted

5 Sarah's mum ____________________ a little while earlier, so we were alone in the house.

went shopping

had been shopping

had gone shopping

had been going shopping

6 Then, to our surprise we ____________ someone walking around in the room above.

were hearing

had heard

heard

had been hearing

7 But Sarah said, 'Oh I expect Mum _________________ something - she always does.'

had been forgetting

had forgotten

forgot

was forgetting

8 So we took no notice and ___________________ talking.

had carried on

were carrying on

had been carrying on

carried on

9 Imagine our surprise when five minutes later we looked out the window and _________Sarah's
mum in her car driving up to the house!

had seen

saw

had been seeing

seen

10 So who...or what...______________ around upstairs?

had walked

walked

had been walking

was walking

Listening Practice of the Past Tenses


Here’s the transcript to a mystery story, but with some of the verbs ‘gapped’. Try to put them in the
correct tense.
The Mystery Story
Last night I _________________ (walk) home next to the river Thames, when something strange
_________________ (happen) to me. It was late at night and I _________________ (have) a long
and difficult day at work. There was a large full moon in the sky and everything was quiet. I was tired
and lonely and I _________________ (just have) a few pints of beer in my local pub, so I decided to
stop by the riverside and look at the moon for a while.

I _________________ (sit) on some steps very close to the water’s edge and looked up at the big
yellow moon and wondered if it really was made of cheese. I felt very tired so I _________________
(close) my eyes and after a few minutes, I _________________ (fall) asleep. When I woke up, the
moon _________________ (move) behind a cloud and it was very dark and cold. The wind
_________________ (blow) and an owl _________________ (hoot) in a tree above me. I rubbed my
eyes and started to get up, when suddenly I _________________ (hear) a splash. I
_________________ (look) down at the water and saw something. Something terrible and
frightening, and unlike anything I’d ever seen before. Something _________________ (come) out of
the water and _________________ (move) towards me. Something green and strange and ugly. It
was a long green arm and it _________________ (stretch) out from the water to grab my leg. I was
so scared that I couldn’t move. I _________________ (never be) so scared in my whole life. The cold
green hand _________________ (move) closer and closer when suddenly there was a blue flash and
a strange noise from behind me. Someone _________________ (jump) onto the stairs next to me.
He _________________ (wear) strange clothes and he had a crazy look in his eyes. He shouted “Get
Back!” and _________________ (point) something at the monster in the water. There was a bright
flash and the monster hissed and disappeared.

I looked up at the man. He looked strange, but kind. “Don’t fall asleep by the river when there’s a full
moon”, he said. “The Moon Goblins will get you.” I _________________ (never hear) of moon goblins
before. I didn’t know what to do. “Who… who are you?” I asked him. “You can call me… The Doctor,”
he said. I _________________ (try) to think of something else to say when he turned around and
said, “Watch the stars at night, and be careful of the full moon”. I was trying to understand what he
meant, when there was another blue flash and I closed my eyes. When I opened them again, he
_________________ (go).

I couldn’t believe what _________________(happen). What on earth were Moon Goblins, and who
was the mysterious Doctor? And why had he saved me? I was determined to find the answers to
these strange questions. I stood up, looked at the moon and quickly walked home.

1.4. Listening: What is Big Data?

Pre-Listening: Speaking
1. Who do you think uses social media most (men or women)?
2. Who has the largest military budget in the world?
3. How wealthy is the USA?
4. Which countries spend the largest percentage of their GDP on their military budget?
5. Which country has the most soldiers?
6. What do you think would happen if you worked out the percentage of soldiers over
the total population?

7. What do you think this expression means?


Data is oil:

Listening: What is Big Data?


(https://www.youtube.com/watch?v=7D1CQ_LOizA)
1. Explaining Big Data
Write a definition (according to the video):
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

2. The Big Data Explosion


What kind of data is being generated?

a) ____________________ databases
b) _________________________________________________________________________
c) _________________________________________________________________________
d) _________________________________________________________________________
e) _________________________________________________________________________
f) _________________________________________________________________________

3. Big Data Characteristics


Big data is often characterized using the three Vs:_________________________________________

Traditional Data Big Data

4. Toward Smart Organizations

What does the video mean when it says organizations have to “excrete” large quantities of
information or when it talks about “data exhaust”?

__________________________________________________________________________________

What does the video say about retailer loyalty cards?

__________________________________________________________________________________

What does the video say about hospital images/video data?

__________________________________________________________________________________

5. Big Data Technologies


What kind of library is Hadoop?

__________________________________________________________________________________

Hadoop distributes__________________________________________________________________

__________________________________________________________________________________

What are the two key components of Hadoop?

__________________________________________________________________________________

How does MapReduce function?

__________________________________________________________________________________

What does Amazon host?

__________________________________________________________________________________

6. Big Data Opportunities


How can big data help farmers?

__________________________________________________________________________________

How can big data help governments?

__________________________________________________________________________________

What kind of savings may be produced by big data in the US and Europe (according to the video)?

__________________________________________________________________________________

7. Zettabyte Horizons
How much is a zettabyte? ____________________________________________________________

1.5. Speaking: Big Data


Look at the figure below and discuss in groups what you think it represents:

Write a short description of the process here:

__________________________________________________________________________________

__________________________________________________________________________________

__________________________________________________________________________________

__________________________________________________________________________________

1.6. Speaking: Analysing Big Data

Directions:

• With a partner, analyse the tools in the list of web sites below.
• Choose three to analyse.
• Determine what the tool is showing.
• Find the source of the data it allows you to explore.
• Complete the table below.

Web Sites:

1. Web archive: http://www.archive.org
2. Measure of America: http://www.measureofamerica.org/maps/
3. Wind Sensor network: http://earth.nullschool.net/
4. Twitter sentiment: https://www.csc.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
5. Alternative Fuel Locator: http://www.afdc.energy.gov/locator/stations/

Website Name
1.
2.
3.

What is this website potentially useful for? What kinds of problems could the provided information be
used to solve?
1.
2.
3.

Is the provided visualization useful? Does it provide insight into the data? How does it help you look
at a lot of information at once? How could it be improved?
1.
2.
3.

Where is the data coming from? Check “About”, “Download”, or “API”.
• Is the data from one source or many? Is it static or live?
• Is the source reputable? Why or why not?
1.
2.
3.

Do you consider this “big” data? Explain your reasoning.
1.
2.
3.

Open Data: You might be interested in looking at some of the publicly available datasets provided at
these sites.

• https://www.data.gov/
• https://www.ons.gov.uk/
• https://ec.europa.eu/eurostat
• https://www.ine.es/en/welcome.shtml

Google Maps Traffic: Another big data resource that students may use every day

• Go to maps.google.com and zoom in on your town or city.
• Turn on the Live Traffic view for your area or a nearby town or city.
• The map should show real-time traffic data.

1.7. Use of English


1. Multiple Choice Cloze
For questions 1-8, read the text below and decide which answer (A, B, C or D) best fits each gap.

The Netherlands
Welcome to the Netherlands, a tiny country that only extends, at its broadest, 312 km north to
south, and 264 km east to west - (1) ... the land area increases slightly each year as a (2) ... of
continuous land reclamation and drainage. With a lot of heart and much to offer, 'Holland,' as it
is (3) ... known to most of us abroad - a name stemming (4) ... its once most prominent provinces -
has more going on per kilometre than most countries, and more English-speaking natives. You'll be
impressed by its (5) ... cities and charmed by its countryside and villages, full of contrasts. From the
exciting variety (6) ... offer, you could choose a romantic canal boat tour in Amsterdam, a Royal Tour
by coach in The Hague, or a hydrofoil tour around the biggest harbour in the world - Rotterdam. In
season you could visit the dazzling bulb fields, enjoy a full day on a boat, or take a bike tour through
the pancake-flat countryside spiced with windmills. The possibilities are countless and the
nationwide tourist office is on hand to give you information and (7) ... reservations. You'll
have (8) ... language problems here, as the Dutch are true linguists and English is spoken here almost
universally.

1. a) so b) despite c) in spite of d) although

2. a) whole b) consequently c) rule d) result

3. a) regularly b) occasionally c) commonly d) unusually

4. a) in b) from c) on d) of

5. a) historic b) historical c) historically d) historian

6. a) at b) in c) on d) for

7. a) sit b) catch c) do d) make

8. a) few b) a few c) little d) a little

2. Open Cloze
For questions 1-8, read the text below and think of the word which fits each gap. Use only one word
in each gap.

Cats

Cats of all kinds are present in the legends, religion, mythology, and history of (1) ________ different
cultures. Cave paintings created by early humans display different types of wild cats (2) __________
are now extinct, or no longer around. Many of these great beasts saw humans as food, but were
hunted by humans in return. Cats similar (3) __________ the ones kept as pets today started
showing up in artwork thousands of years ago. For example, the ancient Egyptians believed cats
were the sacred, or special, animal of a goddess named Bast. They believed that Bast often appeared
as a cat, so many ancient Egyptians respected and honoured cats and kittens. (4) __________, other
cultures feared cats or thought that they brought illnesses and bad luck. Today, with millions kept as
pets in homes around the world, cats have become important members of many families. No one
knows for sure when or (5) __________ cats became very popular household pets. It's possible that
people noticed how cats hunted mice and rats, (6) __________ they set food and milk out to keep
the cats near their homes. This helped to prevent (7) __________ of these rodents (8) ___________
coming into homes and eating people's food or spreading sickness.

3. Word Formation
For questions 1-8, read the text below. Use the word given in capitals at the end of some of the lines
to form a word that fits in the gap in the same line.

About Fish and Aquariums


There are more than 200,000 species of fish inhabiting many (1) __________ waters. New     DIFFER
species of fish are discovered every year. From the deepest part of the seas thousands of
feet down in total (2) __________, to the beautiful aqua-blue waters of the coral reefs, to     DARK
the streams, lakes, and ponds of freshwater found throughout the world, fish have
adapted an incredible variety of life-forms, styles, and (3) __________. The group of     BEHAVE
aquatic animals we call fishes has evolved for over 400 million years to be the
most (4) __________ and diverse of the major vertebrate groups. Forty-one percent of     NUMBER
the world's fish species inhabit only fresh water. This is pretty (5) __________ considering     AMAZE
that fresh water covers only 1 percent of the world's surface. As you probably already
know salt water covers 70 percent of the earth's surface. So the number
and (6) __________ of fresh water species to marine or saltwater species is all the more     VARY
mind-boggling. While they inhabit the smallest amount of water, they have, in fact,
adapted to a much (7) __________ range of habitats and to a greater variety of water     WIDE
conditions. Let's take a closer look at the unique adaptations of fish that have allowed
them to live so (8) __________ in the medium we call water.     SUCCESS

4. Keyword Transformation
For questions 1-10, complete the second sentence so that it has a similar meaning to the first sentence,
using the word given. Do not change the word given. You must use between two and five words,
including the word given.

1) It wasn't Mark that you met in the shop.


HAVE
It __________________________ Mark that you met in the shop.

2) She was just going to have her breakfast when the phone rang.
ABOUT
She was just __________________________ breakfast when the phone rang.

3) Steve didn't manage to complete his work.


FAILED
Steve __________________________ his work.

4) How long has she been studying English?


BEGIN
When _________________________ studying English?

5) George wrote his last novel five years ago.


WAS
It _________________________ George wrote his last novel.

6) Nobody took any notice of his bad behaviour.


ATTENTION
Nobody _________________________ his bad behaviour.

7) Was it necessary for her to spend so much money on it?


HAVE
Did ________________________ spend so much money on it?

8) She's driving too fast for me to keep up with her.


ENOUGH
She ________________________ for me to keep up with her.

9) It's possible that he hasn't been informed about his uncle's death.
MIGHT
He ________________________ informed about his uncle's death.

10) Mark is very patient, he'll never give up.


TOO
Mark is ________________________ give up.

1.8. Reading & Speaking: You and your data

Preparation task

Match the definitions (a–h) with the vocabulary (1–8).

Vocabulary Definition

1. ...... data
2. ...... to be aware of
3. ...... consent
4. ...... to keep track / to track
5. ...... a scandal
6. ...... targeted
7. ...... to regulate
8. ...... to compromise

a) directed at a particular person or group


b) permission to do something
c) to risk having a harmful effect on something
d) to control an activity or process, especially with rules
e) information, especially facts or numbers, that is collected for a future purpose
f) to study or record someone’s behaviour over time
g) to have noticed or know about something
h) a public feeling of shock and disapproval

You and your data

As the internet and digital technology become a bigger part of our lives, more of our
data becomes publicly accessible, leading to questions about privacy. So, how do we
interact with the growing digital world without compromising the security of our
information and our right to privacy?

Imagine that you want to learn a new language. You search ‘Is German a difficult
language?’ on your phone. You click on a link and read an article with advice for learning
German. There’s a search function to find German courses, so you enter your city name.
It asks you to activate location services to find courses near you. You click ‘accept’. You
then message a German friend to ask for her advice. When you look her up on social
media, an advertisement for a book and an app called German for Beginners instantly
pops up. Later the same day, while you’re sending an email, you see an advert offering
you a discount at a local language school. How did they know? The simple answer is
online data. At all stages of your search, your devices, websites and applications were
collecting data on your preferences and tracking your behaviour online. ‘They’ have
been following you.

Who uses our data and why?


In the past, it was easy for people to keep track of their personal information. Like their
possessions, people’s information existed mostly in physical form: on paper, kept in a
folder, locked in a cupboard or an office. Today, our personal information can be
collected and stored online, and it’s accessible to more people than ever before. Many
of us share our physical location, our travel plans, our political opinions, our shopping
interests and our family photos online – as key services like ordering a takeaway meal,
booking a plane, taking part in a poll or buying new clothes now take place online and
require us to give out our data.

Every search you make, service you use, message you send and item you buy is part of
your ‘digital footprint’. Companies and online platforms use this ‘footprint’ to track
exactly what we are doing, from what links we click on to how much time we spend on a
website. Based on your online activity, they can guess what you are interested in and
what things you might want to buy. Knowing so much about you gives online platforms
and companies a lot of power and a lot of money. By selling your data or providing
targeted content, companies can turn your online activity into profit. This is the
foundation of the growing industry of digital marketing.
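
To make the idea of a ‘digital footprint’ more concrete, here is a minimal sketch of how time spent on different pages could be added up to guess a user's interests. It is only an illustration under simple assumptions: the event list, the topic labels and the function name `guess_interests` are hypothetical, and real tracking systems are far more complex.

```python
from collections import Counter

# Hypothetical browsing events: (topic of the page, seconds spent on it).
events = [
    ("german courses", 120),
    ("news", 30),
    ("german courses", 90),
    ("travel", 45),
    ("german grammar", 60),
]


def guess_interests(events, top_n=2):
    """Return the top_n topics ranked by total time spent on them."""
    seconds_by_topic = Counter()
    for topic, seconds in events:
        seconds_by_topic[topic] += seconds
    return [topic for topic, _ in seconds_by_topic.most_common(top_n)]


interests = guess_interests(events)
print(interests)
```

A list like this is the kind of information that could then be used to choose which targeted advert to show.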

Can you protect your data?


Yes … and no! Some of the time our personal data is shared online with our consent. We
post our birthday, our photographs and even our opinions online on social media. We
know that this information is publicly accessible. However, our data often travels further
than we realise, and can be used in ways that we did not intend. Certain news scandals
about data breaches, where personal data has been lost, leaked or shared without
consent, have recently made people much more aware of the potential dangers of
sharing information online. So, can we do anything to protect our data? Or should we
just accept that in fact nothing is ‘free’ and sharing our data is the price we have to pay
for using many online services? As people are increasingly aware of and worried about
data protection, governments and organisations are taking a more active role in
protecting privacy. For example, the European Union passed the General Data
Protection Regulation (GDPR), which regulates how personal information is collected online. However,
there is still much work to be done. As internet users, we should all have a say in how
our data is used. It is important that we pay more attention to how data is acquired,
where it is stored and how it is used. As the ways in which we use the internet continue
to grow and change, we will need to stay informed and keep demanding new laws and
regulations, and better information about how to protect ourselves.

Task 1

Are the sentences true or false?

1. Information about you is collected when you look at websites. True or False
2. Using different devices (for example, your phone and your laptop) makes it
impossible for companies to track you. True or False
3. The trail of information you leave online is called your ‘digital footprint’. True or
False
4. Companies use your digital footprint to make money. True or False
5. This issue has not been in the news, so most people are completely unaware of
it. True or False
6. European law on the protection of online data has changed. True or False
7. The writer thinks the new law has solved the problem. True or False
8. The article concludes by saying individuals should stay up to date and know how
their information is used. True or False

Task 2

Complete the sentences with the following words:

Aware, compromise, consent, data, regulates, scandal, targeted, track

1. Until recently, many people were not ............................. of how much of their
personal information was collected and shared.
2. Our devices, websites and applications collect ....................... about our online
behaviour.
3. Information about products you are interested in is used to create ..................
advertising.
4. The news of how certain applications used people’s private information
caused a .................................... .
5. People felt their information had been used for purposes that they had not
agreed to, without their .................................... .
6. The General Data Protection Regulation (GDPR) .................................... how personal data is
collected online.
7. When private information was stored physically, on paper, it was easier to
keep ............................ of where your data went.
8. If you want to use many online apps and services, you still have to ..................
your right to privacy.

Discussion

In pairs, make a list of all the data you think is stored about you on Google, Facebook
etc.

What do you do to protect your data?
