
BIG DATA IN 25 MINUTES

A Little history (1/3) – Storage and Processing

Not long ago, storage and massive data processing solutions were very expensive, proprietary,
and hardly scalable

Contract cost of a Teradata disk system for the US government


A Little history (2/3) – Storage and Processing

With Big Data, we now use cheap, highly redundant, scalable, and easy-to-operate servers

Hadoop cluster of the SurfSara Science Park in Amsterdam, with 170 nodes, totalling 1370 cores and 2.3 PB of storage
A Little history (2/3) - Software

Before, we used software that was proprietary, inflexible, expensive, and hard to maintain
A Little history (2/3) - Software

With Big Data, we use software which is highly scalable, open source, free, and highly specific
to each industry
A Little history (3/3) – Databases

Before Big Data, database solutions required fields of a rigid size and format to be
defined in advance, which limited both their scalability and the format of the
data that could be stored in them
A Little history (3/3) – Databases

With Big Data, we now have the so-called “NoSQL” databases. These store data in structures of
variable size, and can thus receive data of any size and format
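
As a rough illustration of "structures of variable size", here is a minimal sketch in plain Python. It does not use any real NoSQL product; the device names and fields are hypothetical, and JSON strings stand in for stored documents.

```python
import json

# Hypothetical records: each one carries only the fields it actually has.
records = [
    {"id": 1, "device": "meter-01", "kwh": 0.42},
    {"id": 2, "device": "cam-07", "tags": ["street", "night"], "size_bytes": 183204},
    {"id": 3, "device": "phone-3f", "position": {"lat": 52.36, "lon": 4.95}},
]

# Each record is kept as a self-describing document, so no common list of
# columns with fixed sizes has to be declared beforehand.
stored = [json.dumps(record) for record in records]

# Reading back: the set of fields differs from one document to the next.
fields_per_doc = [sorted(json.loads(doc).keys()) for doc in stored]
for fields in fields_per_doc:
    print(fields)
```

In a fixed-schema table, the second and third records would have needed new columns to be added before they could be stored.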
End result = Lots of Data

The end result of all that is that the amount of data grew exponentially, confirming the
need to store and process massive amounts of data quickly and inexpensively
The original problem, however, still remains…

However, the original problem remains: data arrive with “Volume”,
“Velocity”, and “Variety”, but still need “Veracity” (the so-called “4V”)
Big Data: central part of a 3-level pyramid

In our vision, Big Data is really the central part of a three-level pyramid: Data
Capture (IOT), Storage and Processing (Big Data), and Data Analysis (AI)
IOT (Internet of Things) – Data Capture

The base of the pyramid is IOT, or the “Internet of Things”. It is made of thousands
of devices, sensors, etc., which capture data and transmit them over the internet
or other networks. Thus, IOT generates the data which will be processed
Big Data – Storage and Processing

In the middle we find Big Data, which is responsible for storing and processing
the data obtained by IOT, so they can then be analyzed by AI
AI – Predicting the future

Finally, at the top is AI (Artificial Intelligence), which is about finding patterns in the data captured by IOT
and stored/processed by Big Data, to be able to predict the future and thus react accordingly

Hari Seldon, fictional character created by Isaac Asimov, who could predict the future with mathematical formulas
Impact – Greater competition

But the most striking effect of Big Data is that, just as the internet allowed small
enterprises to compete with large ones in foreign markets, Big Data will allow them to know their
processes and customers as well as large enterprises do and, again, compete effectively.
Need: Multidisciplinary team

Again, just as the internet showed the need for a multidisciplinary team which included
Marketing, Management, etc., the same happens with Big Data, which generally requires:
1 – SysAdmin and SysOps
2 – Programmers
3 – Mathematicians and Statisticians
4 – System Architects
5 – Marketing Managers
Examples of Big Data projects (1/4) – Power use prediction

FACTS: The entire power grid in Spain must have Smart Meters before 31 December
2018, as required by the government

CHALLENGE: An effective implementation means a huge amount of data, as each
Smart Meter is expected to send data every 20 seconds

EXAMPLE: Power company Endesa is leading the change in Spain with over 3.5 million
meters installed: over 30% of its client base. Similarly, Swedish company Sweco
processed 5 billion lines of data from 200 thousand clients in 3 years

BENEFITS: Makes it possible to follow the level of usage of the grid in real time, redirecting
power as needed. It also makes it possible to predict increases in power needs, so that
the equipment required to handle the future load can be added proactively
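
To give a sense of scale, the figures in this slide (one reading every 20 seconds, 3.5 million meters) can be turned into a daily volume. This is a back-of-the-envelope sketch only; the size of 100 bytes per reading is an assumption made for illustration, not a figure from Endesa.

```python
SECONDS_PER_DAY = 24 * 60 * 60
REPORT_INTERVAL_S = 20       # from the slide: one reading every 20 seconds
METERS = 3_500_000           # from the slide: meters installed by Endesa
BYTES_PER_READING = 100      # ASSUMPTION, for illustration only

readings_per_meter_per_day = SECONDS_PER_DAY // REPORT_INTERVAL_S
readings_per_day = readings_per_meter_per_day * METERS
bytes_per_day = readings_per_day * BYTES_PER_READING

print(f"{readings_per_meter_per_day:,} readings per meter per day")
print(f"{readings_per_day:,} readings per day")
print(f"{bytes_per_day / 1e12:.3f} TB per day (under the assumed reading size)")
```

Each meter sends 4,320 readings per day, so 3.5 million meters produce about 15 billion readings per day.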
Examples of Big Data projects (2/4) – Corporate image

FACTS: In 2010, 1.5 million people saw a video promoted by Greenpeace on how Nestlé’s Kit
Kat chocolate bar was in fact killing orangutans. The company only reacted after receiving
over 200 thousand protest emails, and tried to erase the video and the comments on
YouTube.

CHALLENGE: “Sentiment Analysis”, or the automated reading of online comments by
customers, is still in its infancy, and thus the libraries are often not available in the
language used in the comments

EXAMPLE: Today Nestlé uses Sentiment Analysis, which allowed it to move from the 16th to
the 12th position in the “Most Respected Company” index in the world. Similarly, Exxon
developed with IHS a system which analyzed the public’s opinion on fracking in over
20 thousand tweets

BENEFITS: Improvement in the company’s corporate image and thus in the number of customers
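
The language problem named in the CHALLENGE can be seen in a minimal, lexicon-based sketch. This is not how Nestlé or Exxon do it; the tiny English word list, the `score` helper, and both sample comments are hypothetical.

```python
# Tiny English-only lexicon with hypothetical weights; real ones are far larger.
LEXICON = {"love": 1, "great": 1, "good": 1, "hate": -1, "terrible": -1, "bad": -1}

def score(comment: str) -> tuple[int, float]:
    """Return (sentiment score, share of words found in the lexicon)."""
    words = [w.strip(".,!?¡¿").lower() for w in comment.split()]
    words = [w for w in words if w]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    coverage = len(hits) / len(words) if words else 0.0
    return sum(hits), coverage

english = score("I hate this terrible chocolate bar!")
spanish = score("¡Odio esta barra de chocolate, es malísima!")

print("English comment:", english)
print("Spanish comment:", spanish)
```

The English comment scores as negative, while the equally negative Spanish comment gets a neutral score of zero because none of its words are in the lexicon.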
Examples of Big Data projects (3/4) – Fleet Management

FACTS: Corporate car fleets are frequently victims of unauthorized use, which causes delays in
deliveries, customer support, etc., increased fuel costs, and a decrease in the vehicles’ value

CHALLENGE: Geopositioning is inherently imprecise and thus requires much processing in
order to avoid errors

EXAMPLE: The city of Boston was able to eliminate potholes in its streets thanks to an app
which reads the accelerometer in the user’s mobile phone and thus identifies each time their
car hits a pothole

BENEFITS: Reduction in fuel use, increase in the time the city’s fleet is available, decrease in
the amount of fraud and fines, and an improvement in the city’s image among tourists
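
The idea behind the accelerometer example can be sketched as a simple outlier test on vertical acceleration. This is an illustrative assumption, not the algorithm used by the Boston app; the `detect_bumps` helper, the threshold `k`, and the sample readings are all hypothetical.

```python
from statistics import mean, pstdev

def detect_bumps(vertical_accel, k=3.0):
    """Return indices whose deviation from the mean exceeds k standard deviations."""
    mu = mean(vertical_accel)
    sigma = pstdev(vertical_accel)
    if sigma == 0:
        return []  # perfectly flat signal: nothing to flag
    return [i for i, a in enumerate(vertical_accel) if abs(a - mu) > k * sigma]

# Synthetic readings in m/s^2 around gravity, with one sharp jolt at index 6.
samples = [9.8, 9.7, 9.9, 9.8, 9.8, 9.7, 15.5, 9.8, 9.9, 9.8, 9.7, 9.8]
bumps = detect_bumps(samples)
print(bumps)
```

A real system would also have to combine each flagged sample with an imprecise position fix and filter out false alarms such as speed bumps, which is where the heavy processing mentioned in the CHALLENGE comes in.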
Examples of Big Data projects (4/4) – Churn Control

FACTS: Being unable to predict when a customer plans to leave results in lost
revenue

CHALLENGE: Any prediction system requires a large amount of pre-collected data from
several diverse sources

EXAMPLE: T-Mobile developed a system which calculates the customer lifetime value based
on 3 variables: Billing analysis, Dropped call analysis, and Sentiment analysis. The end result is
that churn decreased by 50% in only 3 months

BENEFITS: Reduction in the number of customers lost, increase in market share, and
increase in customer satisfaction
Reality of Big Data today (1/4)

Sadly, however, most Big Data implementations today simply gather
and store data without any clear need or focus

Episode of the TV show “South Park”, where gnomes collect underwear to make money off of them, but without knowing how
Reality of Big Data today (2/4)

In fact, a study from December 2013 by Teradata found that half of the companies do not
know if they actually got any benefit from Big Data
And that hasn’t changed: a later study from November 2015 by PwC showed that only 4% of
companies actually obtain benefits from Big Data
Reality of Big Data today (3/4)

The huge growth in the number of technologies available created a shortage of
qualified professionals and a large increase in their salaries

2012
Reality of Big Data today (4/4)

Most companies still program for the 1st generation, when we are already at the 4th
Conclusions

1 – Big Data is not just about storing data, but about finding trends
that allow more revenue, lower costs, and better service

2 – For that, Big Data allows the development of highly stable,
scalable systems at a minimum cost

3 – Big Data thus allows small enterprises to have technologies which
only large ones could afford, thus providing them with the tools to
effectively compete

4 – However, Big Data requires a highly specialized and diverse team
which does not limit itself to the IT staff and, thus, should be led
directly by the company’s manager

5 – Thus, Big Data is not just a fad reserved for a chosen few,
but a new technology which in a few years will be available to
everyone, just as happened with the internet
Thank you for your time

Synthetic Data
Email: info@syntheticdata.eu
Web: http://www.syntheticdata.eu
