Professional Documents
Culture Documents
Collect, Analyze
Visualize, Report
Learning, Optimizing, Monitoring, Pattern detection
New business models
In 2003, 2004, 2005 1. Google File System (GFS)
Google storing all web content
Google released three
academic papers 2. Map-Reduce
describing Google’s Google calculating PageRank and web search index
technology for massive
data processing 3. BigTable
Google storing Crawling data Analytics, Earth and
Personalized Search in columnar database
HADOOP
In 2004/5 Doug Cutting developed Nutch
open source web search engine struggling
with huge data processing issues
2007
Google
file system
reimpleme
ntation
Big Data technical stack
Document databases
Business analytics tools Data Mining /Modeling tools
Data integration
Columnar databases
Business analytics Data Mining / Modeling
Machine- On-line
SQL Real-time
learning Database
Batch/Map-
Data Sources Reduce
Script Search In-Memory Output
… Batch data Metadata management (HCatalog) …
integration
… …
Hadoop cluster of hosts
.. Streaming data ….
flow Cluster management / monitoring (Ambari)
.. ….
HDFS
Relational Data vs BIG DATA
Relational data DATA BIG DATA
management management
Apply data schema
(ETL)
Store data
Store in Relational
database Apply data schema
Schema
on read
Apply analytics Apply analytics
Algorithms Algorithms
Hadoop based
data Data Warehouse
warehouse
GPRS /
Customer
internet CRM, SCM, Legacy data 3rd party
CDR data behavioural
sessions ERP sources data sources
data
data
Big Data values for Industry 4.0
Efficient technology Data capture and analysis New opportunities