
BIG DATA Management

Big Data Fundamental Concepts and Ecosystem
Business Intelligence Course, Odd Semester 2022/2023
Learning Objective

• Understand the big data ecosystem for enterprises.

2
Outline
• Big Data for enterprises
• Big Data Ecosystem:
– Big Data Platform
– Big Data and Data Science
– Skills for Data Scientists
– Data Science Process

3
Why do we learn Big Data?
Because everybody does.

4
What is Big Data?

“Big Data is high-volume, high-velocity, and/or high-variety
information assets that demand
cost-effective, innovative forms of
information processing that enable
enhanced insight, decision making
and process automation.”
– Gartner
7
What is Big Data?
“Big Data refers to the dynamic, large and
disparate volumes of data being created by
people, tools and machines. It requires new,
innovative, and scalable technology to
collect, host and analytically process the vast
amount of data gathered in order to derive
real-time business insights that relate to
consumers, risk, profit, performance,
productivity management and enhanced
shareholder value.”
– Ernst & Young (EY)

8
What is Big Data?
• Big Data: an enhancement of data warehousing and
business analytics/intelligence (Erl, 2016)
• Driven by:
– Computing perfect storm: four global trends, namely
Moore’s Law, mobile computing, social networking,
and cloud computing
– Data perfect storm: the volume, velocity, and
variety of data
– Convergence perfect storm: data management
and analytics software and hardware + open
source + commodity hardware → new Big Data
analytics

9
What is Big Data?
• IT veterans: Big Data is not a new
concept
• John Meister, Group Executive of
Data Warehouse Tech. at MasterCard
Worldwide, manages billions of
data records in a weekend
– New innovations and technologies
enable new ways of managing Big
Data

10
What is Big Data?
DIKAR (Data – Information – Knowledge – Action – Results)
[Figure: DIKAR model. Labels: Data, Information, Data Processing; levels: strategic, tactical, transactional]

11
Data Analytics

12
Data Analytics
• Descriptive: answers questions about events that
have already occurred
– What was the sales volume over the past 12
months?
– What is the monthly commission earned by
each sales agent?
• Diagnostic: determines the cause of a phenomenon that
occurred in the past, i.e. the reason behind the event
– Why were Q2 sales lower than Q1 sales?
– Why was there an increase in the patient
readmission rate over the past 3 months?
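The descriptive questions above come down to aggregating historical records. A minimal sketch, assuming a hypothetical list of sales records; the agent names, amounts, and the flat 5% commission rate are illustrative assumptions, not data from the slides:

```python
from collections import defaultdict

# Hypothetical sales records: (month, sales agent, sale amount).
SALES = [
    ("2022-01", "Ani", 1200.0),
    ("2022-01", "Budi", 800.0),
    ("2022-02", "Ani", 1000.0),
    ("2022-02", "Budi", 1500.0),
]
COMMISSION_RATE = 0.05  # assumed flat rate, for illustration only


def sales_volume_by_month(records):
    """Descriptive: total sales volume per month."""
    totals = defaultdict(float)
    for month, _agent, amount in records:
        totals[month] += amount
    return dict(totals)


def monthly_commission(records, rate=COMMISSION_RATE):
    """Descriptive: commission earned by each agent in each month."""
    totals = defaultdict(float)
    for month, agent, amount in records:
        totals[(month, agent)] += amount * rate
    return dict(totals)
```

A diagnostic question would typically start from the same aggregates and then break them down further (by product, region, or agent) to look for the cause of a change.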
13
Data Analytics: Diagnostic

14
Data Analytics
• Predictive: Attempt to determine the outcome of
an event that might occur in the future
– If a customer has purchased Products A and B,
what are the chances that they will also
purchase Product C?
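One simple way to approach the Product C question is to estimate a conditional probability from past purchases. A minimal sketch, assuming a hypothetical list of shopping baskets; real predictive models are usually more elaborate, and the function name and data are illustrative only:

```python
def purchase_probability(baskets, given, target):
    """Estimate P(target | given): the share of baskets containing
    every item in `given` that also contain `target`."""
    given = set(given)
    matching = [b for b in baskets if given <= set(b)]
    if not matching:
        return None  # no evidence in the purchase history
    hits = sum(1 for b in matching if target in b)
    return hits / len(matching)


# Hypothetical purchase history: one set of products per customer.
BASKETS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
]

chance_of_c = purchase_probability(BASKETS, {"A", "B"}, "C")
```

Here three customers bought both A and B, and two of them also bought C, so the estimate is 2/3.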

15
Data Analytics
• Prescriptive: builds on predictive analytics to
prescribe actions that maximize profit and
mitigate risk
– When is the best time to trade a particular
stock?
– Among three drugs, which one provides the
best results?
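The drug question above can be sketched as choosing the option with the best predicted outcome after accounting for risk. This is only an illustration: the outcome figures, the scoring rule, and the `risk_weight` parameter are assumptions, not clinical data or a method from the slides.

```python
# Hypothetical predicted outcomes per drug (not real clinical data).
PREDICTIONS = {
    "drug_1": {"recovery_rate": 0.70, "adverse_rate": 0.10},
    "drug_2": {"recovery_rate": 0.80, "adverse_rate": 0.25},
    "drug_3": {"recovery_rate": 0.75, "adverse_rate": 0.05},
}


def score(outcome, risk_weight=1.0):
    """Benefit minus weighted risk; higher is better."""
    return outcome["recovery_rate"] - risk_weight * outcome["adverse_rate"]


def prescribe(predictions, risk_weight=1.0):
    """Prescriptive step: recommend the option with the highest score."""
    return max(predictions, key=lambda name: score(predictions[name], risk_weight))
```

With the default weight the sketch recommends `drug_3`; ignoring risk entirely (`risk_weight=0.0`) it recommends `drug_2`, which shows how the prescription depends on the chosen trade-off.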

16
17
Business Intelligence and Big Data

18
Business Intelligence + Big Data

19
5 V’s of Big Data

20
Value
• The value generated by applying Big Data
can take the form of:
– Material gain (profit)
– Non-material gain (medical & social
benefit)
– Satisfaction (customer/employee/personal
satisfaction)

21
Big Data Platform
The 6 main aspects of a Big Data Platform:
1. Integration
Big Data manages integrated data from internal, external, unstructured,
and archived data sources.
Example: a data warehouse or the Hadoop Distributed File System (HDFS)
stores data from many locations and sources to create centralized
data storage.

2. Analysis
Big Data analysis is performed to provide services that match
what is needed, based on trends or tendencies in the available
data. It can be descriptive, diagnostic, predictive, or
prescriptive.
Example: Walmart uses a search engine to analyze user
behavior and suggest items that the user is likely
to be interested in.
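The integration aspect can be illustrated by merging records from two differently formatted sources into one centralized store. A minimal sketch, assuming a hypothetical internal CSV export and a hypothetical external JSON feed keyed by `customer_id`; the field names and values are invented for illustration:

```python
import csv
import io
import json

# Hypothetical internal source (CSV) and external source (JSON).
INTERNAL_CSV = "customer_id,name\n1,Ani\n2,Budi\n"
EXTERNAL_JSON = (
    '[{"customer_id": 2, "city": "Jakarta"},'
    ' {"customer_id": 3, "city": "Bandung"}]'
)


def integrate(csv_text, json_text):
    """Merge both sources into one store keyed by customer_id."""
    store = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = int(row["customer_id"])  # CSV values arrive as strings
        store.setdefault(key, {})["name"] = row["name"]
    for item in json.loads(json_text):
        store.setdefault(item["customer_id"], {})["city"] = item["city"]
    return store
```

Customers known to only one source still appear in the merged store, with whichever attributes that source supplied.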

22
Big Data Platform

3. Visualization
Big Data must be presentable in graphical form so that
anyone can understand it easily, especially
non-technical users.
Example:
A region's temperature visualized on a map is
easier for users to understand than
the same data shown only in a
table.

23
Big Data Platform
4. Workload Optimization
Optimizing open source resources to
improve the efficiency of data storage and
processing.

5. Security
Every person's data privacy must be guaranteed and protected as
Big Data adoption grows. Any Big Data application must
always consider how data is obtained,
used, processed, and represented. In addition,
organizations/companies must enforce control rules
and privacy policies to ensure the security of data and users.

24
Big Data Platform

The 6 main aspects of a Big Data Platform (cont’d):
6. Governance
Managing and organizing Big Data requires 3 things:
• Automated integration
provides easy access to data wherever the data is
located
• Visual context
makes categorizing, indexing, and discovery in Big Data
easier, to optimize its use
• Agile governance
defines and executes how data is managed and how it is
used

25
Big Data and Data Science
Technical Skills for Data Scientists:
• Hadoop
Open source platform for distributed computing on massive data
• Apache Oozie
Job scheduling system for Hadoop
• Apache Hive
Data warehouse software project for data query and analysis
• Apache Flume
Software for collecting, aggregating, and moving log data (ETL)
• Apache Spark
Open-source distributed general-purpose cluster-computing
framework (programming interface)
• Apache Pig
Platform for developing programs that run on Hadoop
• Apache Sqoop
Command-line interface (CLI) tool for transferring data between
relational databases and Hadoop

26
Big Data: Hadoop

• Open source software using the Hadoop Distributed
File System (HDFS) and parallel processing
(MapReduce)
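The MapReduce idea can be sketched in plain Python as a word count. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop's actual API; on a cluster, the map and reduce calls would run in parallel over data blocks stored in HDFS. All function names here are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for doc in documents:  # each split would be mapped in parallel on a cluster
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))
```

For example, `word_count(["big data", "Big Data platform"])` counts "big" and "data" twice each and "platform" once.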

Some Hadoop Distributions


1. Cloudera
2. Amazon Web Services (AWS) Elastic MapReduce
3. Hortonworks
4. IBM InfoSphere BigInsights
5. MapR
6. Microsoft Distribution (Azure)

[Figures: Cloudera, Hadoop, Amazon Web Services]

Data Science Process
“Data science is the process of cleaning, mining, and
analyzing data to derive insights of value from it”

31
Data Science Process
• Extracting insight from data, for
decision making.

32
Data Science Skills

33
Individual Assignment
If you were the CEO of a large company in one of the
following sectors (please choose 1):
– Retail
– Manufacturing
– Energy
– Government
Define a question for each of:
• Descriptive Analytics
• Diagnostic Analytics
• Predictive Analytics
• Prescriptive Analytics
34
How to develop Business
Intelligence + Big Data?
• BI Architecture:

Data Warehouse Environment:
ORDERS, SHIPPING, INVENTORY → Extract, Clean, Transform, Transfer, Load → DATA WAREHOUSE

Analytical Environment:
DATA WAREHOUSE → Query, Report, Analyze, Mine, Visualize
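The flow in this architecture (extract, clean, transform, load, then query) can be sketched with an in-memory SQLite database standing in for the data warehouse. The source rows, the table name `order_facts`, and the function names are assumptions made for illustration; a production BI stack would use dedicated ETL and warehouse tools.

```python
import sqlite3

# Hypothetical raw rows from an ORDERS source system: (product, quantity).
ORDERS = [(" widget ", "3"), ("gadget", "2"), ("widget", "4"), ("gizmo", None)]


def extract():
    """Extract: read raw rows from the source system."""
    return list(ORDERS)


def clean(rows):
    """Clean: drop rows with a missing quantity."""
    return [row for row in rows if row[1] is not None]


def transform(rows):
    """Transform: normalize product names and convert quantities to int."""
    return [(product.strip(), int(quantity)) for product, quantity in rows]


def load(conn, rows):
    """Load: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_facts (product TEXT, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO order_facts VALUES (?, ?)", rows)
    conn.commit()


def query_totals(conn):
    """Analytical side: report total quantity per product."""
    cursor = conn.execute(
        "SELECT product, SUM(quantity) FROM order_facts "
        "GROUP BY product ORDER BY product"
    )
    return cursor.fetchall()


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    load(conn, transform(clean(extract())))
    return query_totals(conn)
```

Keeping each stage as its own function mirrors the diagram: the warehouse side prepares and stores the data, and the analytical side only queries it.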


35
Basic BI components

36
How a BI system works.

37
Business Intelligence Tools

38
Data Science and Machine Learning

39
References
• Michael Minelli, Big Data, Big Analytics, Wiley, 2013
• Thomas Erl, Big Data Fundamentals: Concepts, Drivers & Techniques,
1st Edition, The Pearson Service Technology Series from Thomas Erl, 2016

40
