Jiaul Paik
Lecture 3
Old Tools for Big Data Processing
• Programming models
  • Shared memory (pthreads)
  • Message passing (MPI)
• Design Patterns
  • Master-slaves
  • Producer-consumer flows
  • Shared work queues
[Diagrams: shared memory (processes P1–P5 sharing one memory) vs. message passing (P1–P5 exchanging messages); a master feeding slaves through a work queue; a producer-consumer flow.]
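The master-slaves and shared-work-queue patterns above can be combined in a few lines. A minimal single-machine sketch using Python's `threading` and `queue` modules (the function name `run_master_slaves` and the squaring "work" are illustrative assumptions, not from the slides):

```python
# Sketch of the master/slaves pattern with a shared work queue:
# the master enqueues work items, slave threads consume and process them.
import queue
import threading

def run_master_slaves(items, num_workers=4):
    work_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def slave():
        while True:
            item = work_queue.get()
            if item is None:                 # sentinel: no more work
                work_queue.task_done()
                return
            with results_lock:
                results.append(item * item)  # the "work": square the item
            work_queue.task_done()

    workers = [threading.Thread(target=slave) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in items:                       # master produces work
        work_queue.put(item)
    for _ in workers:                        # one sentinel per slave
        work_queue.put(None)
    work_queue.join()                        # wait until all work is done
    for w in workers:
        w.join()
    return sorted(results)

print(run_master_slaves([1, 2, 3, 4, 5]))   # [1, 4, 9, 16, 25]
```

The queue gives the slaves automatic load balancing; the sentinels let the master signal shutdown without extra synchronization.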
Some Difficulties
• Concurrency is difficult to reason about
• At the scale of datacenters and across datacenters
• In the presence of failures
• In terms of multiple interacting services
• Debugging: Even more difficult
• The reality:
• Lots of one-off solutions, custom code
• Write your own dedicated library, then program with it
• Burden on the programmer to explicitly manage everything
Source: MIT Open Courseware
The datacenter is the computer!
Source: Google
The datacenter is the computer
• It’s all about the right level of abstraction
• Needs new “instruction set” for datacenter computers
[Diagram: the “Work” is partitioned into pieces w1, w2, w3; their results r1, r2, r3 are combined into the “Result”.]
Building Blocks
• You have 100 billion web pages in millions of files. Your goal is to
compute the count of each word appearing in the collection.
Standard Solution
• Use multiple interconnected machines
[Diagram: two racks of nodes, each with a rack switch, connected through a top-level switch; 1 Gbps between any pair of nodes.]
• Node failures
• Data loss
• Network bottleneck
• Why?
  • The programmer has to manage too many low-level things apart from writing
code for the task
Other Challenges
• Thus, we need:
  • Semaphores / locks (lock, unlock)
  • Condition variables (wait, notify, broadcast)
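To make the lock/condition-variable machinery concrete, here is a minimal bounded-buffer sketch where Python's `threading.Condition` stands in for a pthread mutex plus condition variable (the `BoundedBuffer` class is an illustrative assumption, not from the slides):

```python
# A bounded producer-consumer buffer built from a lock + condition variable.
import threading
from collections import deque

class BoundedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.cond = threading.Condition()   # lock + condition variable

    def put(self, item):
        with self.cond:                     # lock
            while len(self.items) >= self.capacity:
                self.cond.wait()            # wait until there is room
            self.items.append(item)
            self.cond.notify_all()          # broadcast: wake waiting consumers

    def get(self):
        with self.cond:                     # lock
            while not self.items:
                self.cond.wait()            # wait until an item arrives
            item = self.items.popleft()
            self.cond.notify_all()          # broadcast: wake waiting producers
            return item

# Usage: a producer thread fills a capacity-1 buffer while we drain it.
buf = BoundedBuffer(1)
producer = threading.Thread(target=lambda: [buf.put(i) for i in range(3)])
producer.start()
got = [buf.get() for _ in range(3)]
producer.join()
print(got)   # [0, 1, 2]
```

Note the `while` (not `if`) around each `wait()`: a woken thread must re-check the condition, exactly as with pthread condition variables.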
The datacenter is the computer
[Diagram: moving all the data blocks to a centralized processor vs. shipping the code to the nodes where the data resides.]
Programming Model
(Hadoop Map-reduce)
Simple programming model: Map-reduce
• Simple programming model
• Mainly uses two functions (map, reduce)
Programmer’s responsibility:
define only two functions, Map and Reduce, suitable for your problem
Word Count Using MapReduce: Pseudocode
map(key, value):
  // key: document name; value: text of the document
  for each word w in value:
    emit(w, 1)

reduce(key, values):
  // key: a word; values: list of counts for that word
  result = 0
  for each count v in values:
    result += v
  emit(key, result)
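The pseudocode above can be run end-to-end as a single-process sketch in Python; the explicit `shuffle` step mimics the grouping that the MapReduce framework performs between the two phases (a real Hadoop job distributes the same logic across machines):

```python
# Single-process simulation of MapReduce word count: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(documents):
    """map(key, value): emit (word, 1) for each word in each document."""
    for doc_name, text in documents.items():
        for word in text.split():
            yield (word, 1)

def shuffle(pairs):
    """Group intermediate values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """reduce(key, values): sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = {"d1": "the cat sat", "d2": "the cat ran"}
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)   # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```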
Programming Model
• Splitting the job
• Intermediate combining
• Final combining
Map-reduce Data Flow
[Diagram: map tasks read HDFS data blocks; processing may be off-rack; HDFS replication supplies the block copies.]
• Commodity hardware
  • Uses a cluster of commodity hardware
  • Thus the chance of node failure is high
  • HDFS is designed to handle such failures without noticeable interruption
When HDFS does not work well
• Low-latency data access
  • Applications that require access in the tens-of-milliseconds range
  • HDFS provides high throughput, but at the expense of low-latency access
• Datanodes, or workers
Namenode
• The namenode manages the filesystem namespace.
• It maintains the filesystem tree and the metadata for all the files and
directories in the tree.
• This information is stored persistently on the local disk in two files:
  • the namespace image and the edit log.
• The namenode also knows the datanodes on which all the blocks for a
given file are located.
• Datanodes store and retrieve blocks when they are told to (by clients or
the namenode).
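How the namespace image and the edit log together reconstruct the filesystem tree can be sketched in a few lines. This is a toy model, not the real HDFS on-disk format: the image is a checkpoint of path metadata, and the edit log records the operations applied since that checkpoint.

```python
# Toy sketch: rebuild the namespace by replaying the edit log over the image.
def load_namespace(image, edit_log):
    """image: dict of path -> metadata; edit_log: list of (op, path) entries."""
    namespace = dict(image)          # start from the checkpointed image
    for op, path in edit_log:        # replay the logged operations in order
        if op == "create":
            namespace[path] = {"blocks": []}
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

image = {"/foo/bar": {"blocks": ["3df2"]}}
log = [("create", "/foo/baz"), ("delete", "/foo/bar")]
print(load_namespace(image, log))   # {'/foo/baz': {'blocks': []}}
```

Logging each mutation and periodically checkpointing keeps restarts fast: the namenode loads the last image, then replays only the recent edits.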
• Datanode failure
• Won’t be a problem
• Data blocks are stored in many machines
• Can be recovered from another machine
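Why a datanode failure is tolerable can be shown with a toy block map (the helper names and the 3-replica default are illustrative assumptions; HDFS's actual replication target is configurable): after a node fails, the namenode finds blocks that dropped below the replication target and copies them from surviving replicas.

```python
# Toy sketch: find blocks that must be re-replicated after a datanode fails.
def surviving_replicas(block_map, failed_node):
    """block_map: block id -> set of nodes holding a replica."""
    return {blk: nodes - {failed_node} for blk, nodes in block_map.items()}

def blocks_to_rereplicate(block_map, failed_node, target=3):
    remaining = surviving_replicas(block_map, failed_node)
    return sorted(blk for blk, nodes in remaining.items() if len(nodes) < target)

blocks = {"b1": {"n1", "n2", "n3"}, "b2": {"n2", "n3", "n4"}}
# If n1 fails, only b1 drops below 3 replicas and must be copied again.
print(blocks_to_rereplicate(blocks, "n1"))   # ['b1']
```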
HDFS (Hadoop) Architecture
namenode = master node
[Diagram: the application uses the HDFS client, which sends (file name, block id) to the HDFS namenode and gets back (block id, block location) from the file namespace (e.g. /foo/bar -> block 3df2). The client then requests (block id, byte range) from an HDFS datanode and receives the block data. The namenode sends instructions to the datanodes and receives datanode state; each datanode stores blocks on its local Linux file system, running on a slave node.]
Block Caching
• Generally, datanodes read blocks from the disk
• Frequently accessed blocks can be cached in a datanode’s memory
• The job scheduler tries to run the code on the node where the block is cached
HDFS Federation
• This means that, for a very large cluster, the namenode may run out of
memory to hold the metadata
• How to handle this?
  • Maintain a replica of the metadata on another, passive machine
  • When the active namenode fails, the admin can start the passive namenode
  • It needs to load the namespace into memory before it starts
Filesystem Operations
• Major Filesystem operations:
• reading files, creating directories, moving files, deleting data, and listing
directories.
hadoop fs -help
Filesystem Operations
• Copying a file from the local filesystem to HDFS