Materi-Is 502-M01-Perkenalan Konsep Maha Data-Gsl 2020

Session 1
Introduction to
Big Data Concepts
Reference book:
Akhtar, S. M. F. (2018). Big Data Architect’s Handbook: A guide to building proficiency in tools and systems
used by leading big data experts. Packt Publishing Ltd.
Definition of Big Data
What is Big Data?
Big data Definitions
◉ In simple terms, big data is a huge volume of data that cannot be
stored and processed using the traditional approach.
◉ More precisely, big data is data that is massive in volume relative
to the processing system, with a variety of structured and
unstructured data containing different data patterns to be analyzed.
Characteristics of Big Data
1 Volume
◉ In a big data context, volume is an amount of data so massive
relative to the processing system that it cannot be gathered, stored,
and processed using the traditional approach.
◉ It includes both data at rest that has already been collected and
streaming data that is continuously being generated.
Volume of data/information created worldwide from 2010 to 2024 (in zettabytes)
Source: statista.com
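One way to picture the difference between data at rest and streaming data is to process both in fixed-size batches, so that the full dataset never has to fit in memory at once. The sketch below is only an illustration: the names `chunked` and `running_total` and the batch sizes are assumptions, not something taken from the reference book.

```python
from typing import Iterable, Iterator, List


def chunked(records: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive batches of at most `size` records (hypothetical helper)."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def running_total(records: Iterable[int], size: int = 1000) -> int:
    """Sum all records one batch at a time instead of loading them all at once."""
    total = 0
    for batch in chunked(records, size):
        total += sum(batch)
    return total


# Data at rest: already collected and held in a list.
at_rest = list(range(10))
# Streaming data: produced lazily, one record at a time.
streaming = (n for n in range(10))
```

Both `running_total(at_rest, 3)` and `running_total(streaming, 3)` go through the same batching code, which is the point of the sketch: the processing logic does not need to know whether the data was collected beforehand or is still arriving.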
2 Velocity
◉ The rate at which the data is being generated, or how fast the data
is coming in.
◉ Velocity also covers the period of time during which the data makes
sense and remains valuable.
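The two aspects of velocity above can be sketched as two small functions: one estimates how fast events arrive, the other checks whether an event is still inside the time window in which it is considered valuable. The function names, the use of seconds, and the window length are illustrative assumptions.

```python
from typing import List


def arrival_rate(timestamps: List[float]) -> float:
    """Average arrivals per second between the first and last event.

    Returns 0.0 when there are fewer than two events, and infinity when
    all events share the same timestamp.
    """
    if len(timestamps) < 2:
        return 0.0
    span = max(timestamps) - min(timestamps)
    if span == 0:
        return float("inf")
    return (len(timestamps) - 1) / span


def still_valuable(event_time: float, now: float, window_seconds: float) -> bool:
    """True if the event is no older than the (assumed) value window."""
    return 0 <= now - event_time <= window_seconds
```

For example, five events at seconds 0, 1, 2, 3, and 4 give an arrival rate of one event per second, and an event recorded at second 100 is still valuable at second 130 under a 60-second window but not at second 200.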
3 Variety
◉ The classification of data: structured or unstructured data.
◉ Structured data is information that has a predefined schema or a
data model with predefined columns, data types, and so on.
◉ Unstructured data doesn’t have any of those characteristics. This
kind of data still needs to be processed, either for a better user
experience or to generate revenue for the company itself.
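A minimal sketch of the contrast: structured data can be loaded straight into a predefined schema, while unstructured text has to be processed before anything can be read from it. The `Order` schema, the column names, and the word-count processing are hypothetical examples chosen for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Order:
    """Predefined schema: fixed columns with fixed data types (illustrative)."""
    order_id: int
    amount: float


def parse_structured(csv_text: str) -> List[Order]:
    """Structured data maps directly onto the schema, column by column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [Order(int(row["order_id"]), float(row["amount"])) for row in reader]


def word_counts(text: str) -> Dict[str, int]:
    """Unstructured text has no columns, so some processing is needed first."""
    counts: Dict[str, int] = {}
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts
```

Here `parse_structured("order_id,amount\n1,9.5\n2,20\n")` yields two typed `Order` records, whereas a review such as "Great phone. Great price!" only becomes analyzable after a step like `word_counts` has extracted something countable from it.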
4 Veracity
◉ This vector deals with the uncertainty of data, which may stem from
poor data quality or from noise in the data.
◉ Veracity is all about uncertainty and how much trust you have in
your data; in the big data context, however, we may have to
redefine what counts as trusted data.
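One simple, assumed way to put a number on veracity is to measure what fraction of the readings are present and fall inside a plausible range. The function name `trusted_fraction` and the temperature bounds used in the example are illustrative only; real data quality checks are usually richer than this.

```python
from typing import List, Optional


def trusted_fraction(readings: List[Optional[float]], low: float, high: float) -> float:
    """Share of readings that are present and within [low, high].

    Missing values (None) and out-of-range values are treated as untrusted.
    Returns 0.0 for an empty list.
    """
    if not readings:
        return 0.0
    valid = [r for r in readings if r is not None and low <= r <= high]
    return len(valid) / len(readings)
```

With four temperature readings `[20.5, None, 21.0, 999.0]` and an assumed plausible range of -40 to 60, two readings are trusted, so the fraction is 0.5. Changing the range is one way of "redefining trusted data" for a given context.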
5 Variability
◉ This vector of big data derives from the lack of consistency or
fixed patterns in data.
◉ If the meaning and understanding of data keeps changing, it will
have a huge impact on your analysis and attempts to identify
patterns.
6 Value
◉ After addressing all the other Vs, which takes a lot of time, effort,
and resources, it is time to decide whether it is worth storing
that data and investing in infrastructure, either on premises or in
the cloud.
◉ One aspect of value is that you have to store a huge amount of data
before you can use it to get valuable information in return.
Big Data Approach
Solution-based approach to data
Clustered Computing
◉ A set of computers connected to each other in such a way that they
act as a single server to the end user.
◉ A cluster can be configured with different characteristics that
enable high availability, load balancing, and parallel processing.
◉ Each computer in these configurations is called a node.

Illustration of a computer clustered environment
Benefits
◉ High availability: Cluster computing provides fault tolerance tools
and mechanisms to provide maximum uptime without affecting
performance, so that everyone has their data ready for analysis and
processing.
◉ Resource pooling: Not just data storage capacity is shared; CPU and
memory pooling can also be utilized in individual computers to
process different tasks independently and then merge outputs to
produce a result.
◉ Easy scalability: To add additional storage capacity or
computational power, just add new machines with the required
hardware to the group.
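The idea of processing different tasks independently and then merging the outputs can be sketched on a single machine. In this sketch, worker threads stand in for cluster nodes, which is a deliberate simplification: a real cluster distributes work across separate computers. All function names and the sum-of-squares task are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split(data: List[int], nodes: int) -> List[List[int]]:
    """Deal the records out into `nodes` roughly equal partitions."""
    return [data[i::nodes] for i in range(nodes)]


def node_task(partition: List[int]) -> int:
    """Work done independently on one (simulated) node."""
    return sum(x * x for x in partition)


def cluster_sum_of_squares(data: List[int], nodes: int = 4) -> int:
    """Split the data, process each partition in parallel, then merge."""
    partitions = split(data, nodes)
    # Threads are only a stand-in for separate machines in a cluster.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial_results = list(pool.map(node_task, partitions))
    # Merge step: combine the per-node outputs into one result.
    return sum(partial_results)
```

Calling `cluster_sum_of_squares(list(range(1, 11)), nodes=3)` returns the same total as a single-machine loop. Raising `nodes` mirrors the scalability point above: more workers share the same data without any change to the per-node task.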
The Big Data Difference
How does it make a difference?
◉ Big data solutions focus on combining all the data dimensions that were
previously ignored or considered of minimum value, taking all the available
sources and types into consideration and analyzing them for different and
difficult-to-identify patterns.
◉ Big data solutions are not just about the data itself or other characteristics of
data; they are also about affordability, making it easier for organizations to
store all of their data for analysis, in real time if required.
◉ Big data solutions comprise clustered computing mechanisms, which involve
commodity hardware with no high-end servers or resources and can easily be
scaled up or down.
Big data solutions - Cloud vs. On-Premises infrastructure
◉ Cost
◉ Security
◉ Current capabilities
◉ Scalability
Thanks!
