Outline
• Big Data for enterprises
• Big Data Ecosystem:
– Big Data Platform
– Big Data and Data Science
– Skills for Data Scientists
– Data Science Process
Why do we learn Big Data
Because everybody does
What is Big Data?
• Big Data: an enhancement of data warehousing and
business analytics/intelligence (Erl, 2016)
• Driven by three "perfect storms":
– Computing perfect storm: four global trends:
Moore’s Law, mobile computing, social networking,
and cloud computing
– Data perfect storm: the volume, velocity, and
variety of data
– Convergence perfect storm: data management and
analytics software and hardware + open source +
commodity hardware → new Big Data analytics
• IT veterans: Big Data is not a new
concept
• John Meister, Group Executive of
Data Warehouse Technologies at MasterCard
Worldwide, manages billions of
data records over a weekend
– New innovations and technologies
enable new ways of managing Big
Data
DIKAR (Data – Information – Knowledge – Action – Results)
[Figure: data processing turns data into information at the transactional, tactical, and strategic levels]
Data Analytics
• Descriptive: Answers questions about events that
have already occurred
– What was the sales volume over the past 12
months?
– What is the monthly commission earned by
each sales agent?
• Diagnostic: Determines the cause of a phenomenon that
occurred in the past, i.e. the reason behind the event
– Why were Q2 sales lower than Q1 sales?
– Why was there an increase in the patient
re-admission rate over the past 3 months?
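The descriptive and diagnostic questions above can be sketched in a few lines of Python. This is only an illustration: the sales records, the region field, and the flat 5% commission rate are made-up assumptions, not data from the referenced books.

```python
from collections import defaultdict

# Hypothetical sales records: (month, agent, region, amount). All values are invented.
SALES = [
    ("2023-01", "alice", "north", 1200.0),
    ("2023-01", "bob", "south", 800.0),
    ("2023-02", "alice", "north", 1500.0),
    ("2023-04", "alice", "north", 900.0),
    ("2023-04", "bob", "south", 300.0),
    ("2023-05", "bob", "south", 400.0),
]
COMMISSION_RATE = 0.05  # assumed flat commission rate


def monthly_sales_volume(records):
    """Descriptive: total sales amount per month."""
    totals = defaultdict(float)
    for month, _agent, _region, amount in records:
        totals[month] += amount
    return dict(totals)


def monthly_commission(records, rate=COMMISSION_RATE):
    """Descriptive: commission earned per (month, agent)."""
    totals = defaultdict(float)
    for month, agent, _region, amount in records:
        totals[(month, agent)] += amount * rate
    return dict(totals)


def quarter_of(month):
    """Map a 'YYYY-MM' string to its quarter number (1 to 4)."""
    return (int(month[5:7]) - 1) // 3 + 1


def quarter_change_by_region(records, q_from=1, q_to=2):
    """Diagnostic: sales change between two quarters, broken down by region."""
    change = defaultdict(float)
    for month, _agent, region, amount in records:
        quarter = quarter_of(month)
        if quarter == q_to:
            change[region] += amount
        elif quarter == q_from:
            change[region] -= amount
    return dict(change)


volume = monthly_sales_volume(SALES)
commission = monthly_commission(SALES)
change = quarter_change_by_region(SALES)
```

In this toy data set the regional breakdown shows that most of the Q1 to Q2 drop comes from the "north" region, which is the kind of drill-down a diagnostic question calls for.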
• Predictive: Attempts to determine the outcome of
an event that might occur in the future
– If a customer has purchased Products A and B,
what are the chances that they will also
purchase Product C?
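One simple way to approach the Product C question is to estimate a conditional probability from past purchase baskets. The sketch below assumes a small, invented list of baskets; the function name `purchase_probability` is hypothetical, and a real predictive model would typically use far more data and features.

```python
def purchase_probability(baskets, given, target):
    """Estimate P(target is bought | all items in `given` are bought).

    Returns None when no basket contains every item in `given`.
    """
    given = set(given)
    matching = [basket for basket in baskets if given <= set(basket)]
    if not matching:
        return None
    with_target = sum(1 for basket in matching if target in basket)
    return with_target / len(matching)


# Hypothetical purchase history: one set of products per customer.
BASKETS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
    {"A", "B", "C", "D"},
]

chance_of_c = purchase_probability(BASKETS, {"A", "B"}, "C")
print(chance_of_c)  # 3 of the 4 baskets with both A and B also contain C
```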
• Prescriptive: Builds on predictive analytics to
prescribe actions that maximize profit and
mitigate risk
– When is the best time to trade a particular
stock?
– Among three drugs, which one provides the
best results?
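The "which of three drugs" question can be read as picking the option with the best expected outcome once predictions are available. The sketch below assumes hypothetical predicted probabilities and a simple utility (benefit of success minus a penalty for adverse effects); the numbers and weights are illustrative only, not clinical data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    p_success: float  # predicted probability of a good outcome (assumed)
    p_adverse: float  # predicted probability of an adverse effect (assumed)


def expected_utility(option, benefit=1.0, harm=2.0):
    """Weigh the predicted benefit against the predicted risk."""
    return benefit * option.p_success - harm * option.p_adverse


def prescribe(options, **weights):
    """Prescriptive step: recommend the option with the highest expected utility."""
    return max(options, key=lambda option: expected_utility(option, **weights))


DRUGS = [
    Option("drug_1", p_success=0.80, p_adverse=0.20),
    Option("drug_2", p_success=0.70, p_adverse=0.05),
    Option("drug_3", p_success=0.60, p_adverse=0.02),
]

best = prescribe(DRUGS)
best_ignoring_risk = prescribe(DRUGS, harm=0.0)
```

With the default weights the recommendation is `drug_2`; if risk is ignored (`harm=0.0`) it shifts to `drug_1`, which shows how the prescription depends on the chosen trade-off between profit and risk.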
Business Intelligence and Big Data
5 V’s of Big Data
Value
• The value generated by applying Big Data
can take the form of:
– Material gain (profit)
– Non-material gain (medical & social
benefit)
– Satisfaction (customer/employee/personal
satisfaction)
Big Data Platform
The 6 main aspects of a Big Data platform:
1. Integration
Big data manages integrated data from internal, external, unstructured,
and archived data sources.
Example: a data warehouse or the Hadoop Distributed File System (HDFS)
stores data from many locations and sources to create centralized
data storage.
2. Analysis
Big data analysis is performed to deliver services that
match actual needs, based on trends or
tendencies in the available data. It can be
descriptive, diagnostic, predictive, or prescriptive.
Example: Walmart uses a search engine to analyze
user behavior and suggest items
the user might like.
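The integration aspect (item 1) can be illustrated with a small sketch that maps records from two differently shaped sources into one centralized store. The source formats, field names, and sample rows are assumptions made for the example; a real platform would use a data warehouse or HDFS rather than an in-memory dictionary.

```python
import csv
import io
import json

# Hypothetical internal source (CSV) and external source (JSON).
INTERNAL_CSV = "customer_id,name\n1,Ani\n2,Budi\n"
EXTERNAL_JSON = '[{"id": 2, "full_name": "Budi"}, {"id": 3, "full_name": "Citra"}]'


def from_internal(text):
    """Map internal CSV rows to the shared schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer_id": int(row["customer_id"]), "name": row["name"], "source": "internal"}


def from_external(text):
    """Map external JSON items to the shared schema."""
    for item in json.loads(text):
        yield {"customer_id": item["id"], "name": item["full_name"], "source": "external"}


def integrate(*streams):
    """Merge all streams into one store keyed by customer_id, tracking provenance."""
    store = {}
    for stream in streams:
        for record in stream:
            entry = store.setdefault(record["customer_id"], {"name": record["name"], "sources": []})
            entry["sources"].append(record["source"])
    return store


central = integrate(from_internal(INTERNAL_CSV), from_external(EXTERNAL_JSON))
```

Keeping the list of sources per record is a deliberate choice here: it lets later analysis see which systems contributed to each centralized entry.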
3. Visualization
Big data must be presentable in graphical form
so that anyone can understand it easily, especially
non-technical users.
[Figure: example visualization]
4. Workload Optimization
Optimizing open-source resources to
improve the efficiency of data storage and
processing
5. Security
As big data adoption grows, every individual's data privacy
must be guaranteed and protected. Any big data
deployment must always consider how data is obtained,
used, processed, and represented. In addition,
organizations/companies must enforce control rules
and privacy policies to ensure the security of data and users.
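One common privacy control that fits the security aspect above is pseudonymizing identifiers before data reaches analysts. The sketch below is a minimal illustration using Python's standard `hmac` and `hashlib` modules; the key, the field names, and the choice of which fields count as identifying are assumptions, and it is not a complete privacy policy.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a managed key store.
SECRET_KEY = b"replace-with-a-managed-secret"

# Fields assumed to identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(user_id, key=SECRET_KEY):
    """Replace an identifier with a keyed SHA-256 digest (stable, hard to reverse without the key)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identity(record, key=SECRET_KEY):
    """Drop direct identifiers and pseudonymize the user id."""
    safe = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    safe["user_id"] = pseudonymize(record["user_id"], key)
    return safe


record = {"user_id": "u-1001", "name": "Ani", "email": "ani@example.com", "basket_total": 42.0}
safe_record = strip_identity(record)
```

Because the digest is keyed and deterministic, the same user maps to the same pseudonym across data sets, so analysis can still join records without exposing the original identifier.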
Big Data and Data Science
Technical Skills for Data Scientists:
• Hadoop
Open-source platform for distributed computing over massive data sets
• Apache Oozie
Job scheduling system for Hadoop
• Apache Hive
Data warehouse software for querying and analyzing data
• Apache Flume
Software for collecting, aggregating, and moving log data (ETL)
• Apache Spark
Open-source, general-purpose distributed cluster-computing
framework (programming interface)
• Apache Pig
Platform for developing programs that run on Hadoop
• Apache Sqoop
Command-line interface (CLI) tool for transferring relational data
to and from Hadoop
Big Data: Hadoop
[Figure: Cloudera, Hadoop, Amazon Web Services]
Data Science Process
“Data science is the process of cleaning, mining, and
analyzing data to derive insights of value from it”
• Extracting insight from data, for
decision making.
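The "cleaning, mining, and analyzing" steps in the definition above can be sketched as a three-stage pipeline. The raw readings, the anomaly threshold, and the function names are hypothetical; the point is only to show how each stage feeds the next on the way to an insight.

```python
from statistics import mean

# Hypothetical raw readings, including blanks, text noise, and an outlier.
RAW = [" 12.5", "n/a", "14.0", "", "13.5 ", "400"]


def clean(raw):
    """Cleaning: keep only entries that parse as numbers."""
    values = []
    for item in raw:
        try:
            values.append(float(item.strip()))
        except ValueError:
            continue  # skip entries such as "n/a" or empty strings
    return values


def mine(values, limit=100.0):
    """Mining: separate typical readings from anomalies (assumed threshold)."""
    typical = [value for value in values if value <= limit]
    anomalies = [value for value in values if value > limit]
    return typical, anomalies


def analyze(typical, anomalies):
    """Analyzing: summarize the findings for a decision maker."""
    return {"mean_typical": mean(typical), "anomaly_count": len(anomalies)}


cleaned = clean(RAW)
typical, anomalies = mine(cleaned)
insight = analyze(typical, anomalies)
```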
Data Science Skills
Individual Assignment
If you were the CEO of a big company in one of the
following sectors (please choose one):
– Retail
– Manufacturing
– Energy
– Government
Define one question for each of:
• Descriptive Analytics
• Diagnostic Analytics
• Predictive Analytics
• Prescriptive Analytics
How to develop Business
Intelligence + Big Data?
• BI Architecture:
– Sources: ORDERS, SHIPPING, INVENTORY
– ETL: Extract, Clean, Transform, Transfer, Load
– Storage: DATA WAREHOUSE
– Use: Query, Report, Analyze, Mine, Visualize
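The BI architecture above (sources, ETL, data warehouse, query) can be sketched end to end with Python's built-in `sqlite3` module standing in for the warehouse. The ORDERS and SHIPPING rows, the table layout, and the function names are assumptions for illustration; a production warehouse would use dedicated ETL tooling.

```python
import sqlite3

# Hypothetical source rows: orders as (order_id, product, quantity), shipping as (order_id, status).
ORDERS = [(1, " widget ", 2), (2, "gadget", 1), (3, "Widget", None), (4, "WIDGET", 3)]
SHIPPING = [(1, "shipped"), (2, "pending"), (4, "shipped")]


def extract():
    """Extract: read rows from the operational sources."""
    return list(ORDERS), list(SHIPPING)


def clean(orders):
    """Clean: drop orders with a missing quantity."""
    return [row for row in orders if row[2] is not None]


def transform(orders):
    """Transform: normalize product names to trimmed lowercase."""
    return [(order_id, product.strip().lower(), qty) for order_id, product, qty in orders]


def load(conn, orders, shipping):
    """Load: write the prepared rows into the warehouse tables."""
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)")
    conn.execute("CREATE TABLE shipping (order_id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.executemany("INSERT INTO shipping VALUES (?, ?)", shipping)
    conn.commit()


def shipped_quantity_by_product(conn):
    """Query/Report: total shipped quantity per product."""
    return conn.execute(
        "SELECT o.product, SUM(o.quantity) FROM orders o "
        "JOIN shipping s ON s.order_id = o.order_id "
        "WHERE s.status = 'shipped' GROUP BY o.product ORDER BY o.product"
    ).fetchall()


warehouse = sqlite3.connect(":memory:")  # in-memory database standing in for the warehouse
raw_orders, raw_shipping = extract()
load(warehouse, transform(clean(raw_orders)), raw_shipping)
report = shipped_quantity_by_product(warehouse)
```

Cleaning and normalizing before the load means the report groups " widget " and "WIDGET" as one product, which is the main reason the ETL stage sits in front of the warehouse.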
How a BI system works
Business Intelligence Tools
Data Science and Machine Learning
References
• Michael Minelli et al., Big Data, Big Analytics, Wiley, 2013
• Thomas Erl et al., Big Data Fundamentals: Concepts, Drivers & Techniques,
1st ed., The Pearson Service Technology Series from Thomas Erl, 2016