
Big Data Solutions
    simple example
        data transfer engine (file/relational)
        on-disk storage device
        processing engine (batch)
        query engine
        workflow engine
    complex example
        data transfer engine (event/file/relational)
        storage device (on-disk/in-memory)
        processing engine (batch/realtime)
        analytics engine
        serialization engine
        compression engine
        workflow engine

Storage Device Characteristics
    scalability
    redundancy & availability
    fast access
    long-term storage
    schema-less storage
    inexpensive storage

On-Disk Storage
    distributed file system
    database
        RDBMS
        NoSQL
            key-value
            column-family
            document
            graph
        NewSQL

In-Memory Storage Devices
    in-memory data grid (IMDG)
        read-through
        write-through
        write-behind
        refresh-ahead
    in-memory database (IMDB)
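
The IMDG entries listed above (read-through, write-through, write-behind) name ways an in-memory data grid can be kept in step with an on-disk storage device. The following is a minimal sketch of those three approaches, not a reference implementation: the `BackingStore` and `Grid` classes and all of their method names are hypothetical and chosen only for illustration. Refresh-ahead is left out because it needs a timer or background worker.

```python
class BackingStore:
    """Hypothetical stand-in for an on-disk storage device."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)

    def save(self, key, value):
        self.writes += 1
        self.rows[key] = value


class Grid:
    """Hypothetical single-node stand-in for an in-memory data grid."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.pending = {}  # writes not yet pushed to the backing store

    def get(self, key):
        # Read-through: on a miss, load from the backing store and keep a copy.
        if key not in self.cache:
            self.cache[key] = self.store.load(key)
        return self.cache[key]

    def put_write_through(self, key, value):
        # Write-through: the backing store is updated as part of the same call.
        self.store.save(key, value)
        self.cache[key] = value

    def put_write_behind(self, key, value):
        # Write-behind: update memory now and queue the disk write for later.
        self.cache[key] = value
        self.pending[key] = value

    def flush(self):
        # Push all queued write-behind updates to the backing store.
        for key, value in self.pending.items():
            self.store.save(key, value)
        self.pending.clear()


store = BackingStore()
store.rows["a"] = 1
grid = Grid(store)

grid.get("a")
grid.get("a")                    # second read is served from memory
grid.put_write_through("b", 2)   # visible on disk immediately
grid.put_write_behind("c", 3)    # visible on disk only after flush()
```

In this sketch write-through trades write latency for immediate durability, while write-behind does the opposite and can lose queued writes if the process stops before `flush()` runs.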
Advanced MapReduce
    map task
        map
        combine (optional)
        partition
    shuffle and sort
    reduce task
        reduce

MapReduce Algorithms
    associative property
    commutative property
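
The MapReduce stages listed above (map, optional combine, partition, shuffle and sort, reduce) can be sketched as a single-process word count. This is an illustration under stated assumptions rather than how any particular engine implements them: every function name here is hypothetical, the "splits" are plain Python lists, and the partitioner is a toy one. The combine stage is only safe because addition is associative and commutative, which is why those two properties matter for MapReduce algorithms.

```python
from collections import defaultdict


def map_fn(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    # Combine (optional): pre-aggregate on the map side to cut shuffle volume.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    # Partition: deterministic toy partitioner that assigns a key to a reducer.
    return sum(ord(ch) for ch in key) % num_reducers


def shuffle_and_sort(map_outputs, num_reducers):
    # Shuffle and sort: group values by key per reducer, then sort the keys.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


def reduce_fn(key, values):
    # Reduce: fold all values for one key into a single result.
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # One map task per split: map each line, then combine that task's output.
    map_outputs = [
        combine([pair for line in split for pair in map_fn(line)])
        for split in splits
    ]
    result = {}
    for bucket in shuffle_and_sort(map_outputs, num_reducers):
        for key, values in bucket:
            out_key, total = reduce_fn(key, values)
            result[out_key] = total
    return result


splits = [["big data big"], ["data lab"]]
word_counts = run_job(splits)
```

Changing `num_reducers` changes how keys are spread across reduce tasks but not the final counts.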

Central topic: Module 9 Big Data Engineering Lab

Big Data Pipeline
    stages
        data collection
        data refinement
        data consumption

Big Data Storage Terminology & Concepts
    sharding
    replication
        master-slave
        peer-to-peer
    CAP theorem
        consistency
        availability
        partition tolerance
    ACID
        atomicity
        consistency
        isolation
        durability
    BASE
        basically available
        soft state
        eventual consistency

Fundamental Big Data Processing
    cluster
    batch mode
    realtime mode

Bulk Synchronous Parallel (BSP) Processing Engine
    aggregators
    combiners
    superstep stages
        local processing
        communication
        barrier synchronization

graph types
    cyclic graph
    acyclic graph
    weighted graph
    unweighted graph
    uni-directed graph
    directed graph
    bi-directed graph
    undirected graph

Big Data Solutions Design Process
    1. establish and evaluate data inputs and outputs
    2. determine data wrangling requirements
    3. select data representation format
    4. assess processing engine suitability
    5. develop data processing routines
    6. develop visualizations
    7. automate solution execution

Processing Engine Characteristics
    distributed/parallel data processing
    schema-less data processing
    multi-workload support
    scalability
    redundancy & fault-tolerance
    low cost

Realtime Big Data Processing
    extract-load-transform (ELT)
    event stream processing (ESP)
    complex event processing (CEP)
    SCV
        speed
        consistency
        volume

Advanced Mechanisms
    serialization engine
    compression engine

Module 9: Big Data Engineering Lab Big Data Science Certified Professional (BDSCP) Program
Official Mind Map Supplement Copyright © Arcitura Education Inc. www.arcitura.com
