
Session 1

Introduction to Big Data Concepts

Reference book:
Akhtar, S. M. F. (2018). Big Data Architect’s Handbook: A guide to building proficiency in tools and systems
used by leading big data experts. Packt Publishing Ltd.
Definition of Big Data
What is Big Data?
Big data Definitions

◉ In the simplest definition, big data is a volume of data so large
that it cannot be stored and processed using the traditional
approach.

◉ In a more precise definition, big data is data that is massive in
volume relative to the processing system, and that contains a
variety of structured and unstructured data with different
patterns to be analyzed.
Characteristics of Big Data
1 Volume

◉ In a big data context, volume is an amount of data so massive
relative to the processing system that it cannot be gathered,
stored, and processed using a traditional approach.

◉ It includes both data at rest that has already been collected and
streaming data that is continuously being generated.
Figure: Volume of data/information created worldwide from 2010 to 2024 (in zettabytes)
Source: statista.com
2 Velocity

◉ The rate at which the data is being generated, or how fast the data
is coming in.

◉ The period of time during which data will make sense and be
valuable.
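
The two readings of velocity above (arrival rate, and the window during which data stays valuable) can be sketched in a few lines of Python. This is only an illustration: the function names, the synthetic event stream, and the 5-second window are assumptions made for the example, not values from the reference book.

```python
from datetime import datetime, timedelta

def events_per_second(timestamps):
    """Average arrival rate over the time span covered by the timestamps."""
    if len(timestamps) < 2:
        return 0.0
    span = (max(timestamps) - min(timestamps)).total_seconds()
    return (len(timestamps) - 1) / span if span > 0 else 0.0

def still_valuable(event_time, now, window):
    """True if the event is recent enough to still be worth acting on."""
    return now - event_time <= window

# Hypothetical stream: one event every 0.5 seconds over 10 seconds.
start = datetime(2024, 1, 1, 12, 0, 0)
arrivals = [start + timedelta(seconds=0.5 * i) for i in range(21)]

rate = events_per_second(arrivals)  # how fast the data is coming in
now = arrivals[-1]
window = timedelta(seconds=5)       # assumed period during which data is valuable
fresh = [t for t in arrivals if still_valuable(t, now, window)]
```

Here `rate` measures how fast data arrives, while `fresh` keeps only the events that fall inside the assumed value window.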
3 Variety

◉ The classification of data: structured or unstructured data.

◉ Structured data refers to information that has a predefined
schema or a data model with predefined columns, data
types, and so on.

◉ Unstructured data doesn't have any of those characteristics. This
kind of data still needs to be processed, either for a better user
experience or to generate revenue for the company itself.
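
As a rough illustration of the difference, the Python sketch below puts a structured record next to a piece of unstructured text. The `Order` fields, the sample message, and the `extract_order_id` helper are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass

@dataclass
class Order:
    # Structured data: predefined fields with fixed data types.
    order_id: int
    customer: str
    amount: float

structured = Order(order_id=1001, customer="Ani", amount=250000.0)

# Unstructured data: free text with no schema or predefined columns.
unstructured = "Hi, this is Ani. My order 1001 for 250000 arrived late!"

def extract_order_id(text):
    """Pull an order number out of free text, or return None if absent."""
    match = re.search(r"order\s+(\d+)", text, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None
```

The structured record can be queried directly (`structured.order_id`), while the unstructured message has to be processed first (`extract_order_id(unstructured)`) before it yields the same information.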
4 Veracity

◉ This vector deals with the uncertainty of data, which may come
from poor data quality or from noise in the data.

◉ Veracity is all about uncertainty and how much trust you have in
your data. In a big data context, however, we may have to
redefine what counts as trusted data.
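
One simple way to make "trust" concrete is to measure what fraction of records pass a basic quality check. The sketch below assumes hypothetical sensor readings and an arbitrary plausible range of -50 to 60 degrees Celsius. Both are illustrative choices, not rules from the reference book.

```python
def is_trustworthy(record):
    """Accept a reading only if the value is present and physically plausible."""
    temp = record.get("temperature_c")
    return isinstance(temp, (int, float)) and -50 <= temp <= 60

# Hypothetical readings containing a missing value and a noisy outlier.
readings = [
    {"sensor": "a", "temperature_c": 27.5},
    {"sensor": "b", "temperature_c": None},   # poor quality: missing value
    {"sensor": "c", "temperature_c": 999.0},  # noise: implausible value
    {"sensor": "d", "temperature_c": 30.1},
]

trusted = [r for r in readings if is_trustworthy(r)]
trust_ratio = len(trusted) / len(readings)
```

A low `trust_ratio` signals that the data needs cleaning, or that the acceptance rule itself needs to be redefined for the data source at hand.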
5 Variability

◉ This vector of big data derives from the lack of consistency or
fixed patterns in data.

◉ If the meaning and understanding of data keep changing, it will
have a huge impact on your analysis and on attempts to identify
patterns.
6 Value

◉ After addressing all the other Vs, which takes a lot of time, effort,
and resources, it is time to decide whether it is worth storing
that data and investing in infrastructure, either on premises or in
the cloud.

◉ One aspect of value is that you have to store a huge amount of
data before you can use it to get valuable information in return.
The Big Data Approach
Solution-based approach of data
Clustered Computing

◉ A set of computers connected to each other in such a way that
they act as a single server to the end user.

◉ It can be configured to work with different characteristics that
enable high availability, load balancing, and parallel processing.

◉ Each computer in these configurations is called a node.

Figure: Illustration of a clustered computing environment
Benefits

◉ High availability: Cluster computing provides fault tolerance
tools and mechanisms to provide maximum uptime without affecting
performance, so that everyone has their data ready for analysis
and processing.

◉ Resource pooling: Not just data storage capacity is shared; CPU
and memory pooling can also be utilized in individual computers to
process different tasks independently and then merge outputs to
produce a result.

◉ Easy scalability: To add additional storage capacity or
computational power, just add new machines with the required
hardware to the group.
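
The "process independently, then merge" idea behind resource pooling can be sketched on a single machine. In the Python example below, threads stand in for cluster nodes. That is an assumption made purely for illustration, since a real cluster distributes the work across separate machines. The function names and sample lines are likewise hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Task run independently by one 'node' on its share of the data."""
    return Counter(word.lower() for line in chunk for word in line.split())

def split_into_chunks(lines, n_nodes):
    """Deal the lines out round-robin, producing one chunk per node."""
    return [lines[i::n_nodes] for i in range(n_nodes)]

def cluster_word_count(lines, n_nodes=3):
    """Count words by processing chunks in parallel and merging the outputs."""
    chunks = split_into_chunks(lines, n_nodes)
    # Each worker thread plays the role of a node (illustrative stand-in).
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge the partial outputs into one result
    return total

lines = ["big data is big", "data at rest", "streaming data", "big clusters scale"]
totals = cluster_word_count(lines)
```

Changing `n_nodes` mirrors the scalability benefit: the work is divided differently, but the merged result stays the same.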
The Big Data Difference
How does it make a difference?
◉ Big data solutions focus on combining all the data dimensions that were
previously ignored or considered of minimal value, taking all the available
sources and types into consideration and analyzing them for different and
difficult-to-identify patterns.

◉ Big data solutions are not just about the data itself or its
characteristics; they are also about affordability, making it easier for
organizations to store all of their data for analysis, in real time if required.

◉ Big data solutions comprise clustered computing mechanisms, which run on
commodity hardware with no high-end servers or resources and can easily be
scaled up or down.
Big data solutions - Cloud vs. On-Premises infrastructure

◉ Cost
◉ Security
◉ Current capabilities
◉ Scalability
Thanks!
