
Presenters: Abhishek Verma, Nicolas Zea

 MapReduce
 Clean abstraction
 Extremely rigid two-stage group-by aggregation
 Code reuse and maintenance difficult
 Google → MapReduce, Sawzall
 Yahoo → Hadoop, Pig Latin
 Microsoft → Dryad, DryadLINQ
 Improving MapReduce in heterogeneous environments
[Figure: MapReduce dataflow. Input records are split; map emits (key, value) pairs; each partition is locally sorted (quicksort); the shuffle groups pairs by key; reduce produces the output records.]
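To make the two-stage shape concrete, here is a minimal word-count sketch in C# (illustrative only; the names are mine, not from the slides):

using System;
using System.Collections.Generic;
using System.Linq;

static class WordCount
{
    // Map: emit a (word, 1) pair for every word in an input record.
    public static IEnumerable<(string Word, int One)> Map(string line) =>
        line.Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .Select(w => (w, 1));

    // Reduce: after the shuffle groups pairs by key, fold each group.
    public static (string Word, int Count) Reduce(string word, IEnumerable<int> ones) =>
        (word, ones.Sum());
}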
 Extremely rigid data flow: M → R
 Other flows hacked in: stages (M → R → M → R), joins, splits
 Common operations must be coded by hand
 Join, filter, projection, aggregates, sorting, distinct
 Semantics hidden inside map and reduce functions
 Difficult to maintain, extend, and optimize
Pig Latin: A Not-So-Foreign Language for Data Processing
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins

Yahoo! Research
 Pigs Eat Anything
 Can operate on data with or without metadata: relational, nested, or unstructured
 Pigs Live Anywhere
 Not tied to one particular parallel framework
 Pigs Are Domestic Animals
 Designed to be easily controlled and modified by its users
 UDFs: transformation functions, aggregates, grouping functions, and conditionals
 Pigs Fly
 Processes data quickly (?)

 Dataflow language
 Procedural: different from SQL
 Quick Start and Interoperability
 Nested Data Model
 UDFs as First-Class Citizens
 Parallelism Required
 Debugging Environment

 Data Model
 Atom: 'cs'
 Tuple: ('cs', 'ece', 'ee')
 Bag: { ('cs', 'ece'), ('cs') }
 Map: [ 'courses' → { ('523', '525', '599') } ]
 Expressions
 Fields by position: $0
 Fields by name: f1
 Map lookup: # (see the sketch below)

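A rough C# analogue of this nested data model (illustration only; these are not Pig's actual types):

using System.Collections.Generic;

object atom = "cs";                             // Atom: a simple value
object[] tuple = { "cs", "ece", "ee" };         // Tuple: an ordered list of fields
var bag = new List<object[]>                    // Bag: a collection of tuples
{
    new object[] { "cs", "ece" },
    new object[] { "cs" },
};
var map = new Dictionary<string, object>        // Map: string key → value of any type
{
    ["courses"] = new object[] { "523", "525", "599" },
};
// Access mirrors the expressions above: tuple[0] ≈ $0, map["courses"] ≈ #'courses'.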
Find the top 10 most visited pages in each category

Visits:
User  URL         Time
Amy   cnn.com     8:00
Amy   bbc.com     10:00
Amy   flickr.com  10:05
Fred  cnn.com     12:00

URL Info:
URL         Category  PageRank
cnn.com     News      0.9
bbc.com     News      0.8
flickr.com  Photos    0.7
espn.com    Sports    0.9
Conceptual dataflow:
 Load Visits; group by url; foreach url, generate count
 Load Url Info
 Join the two on url
 Group by category
 Foreach category, generate top10 urls
visits = load '/data/visits' as (user, url, time);
gVisits = group visits by url;
visitCounts = foreach gVisits generate url, count(visits);

urlInfo = load '/data/urlInfo' as (url, category, pRank);

visitCounts = join visitCounts by url, urlInfo by url;

gCategories = group visitCounts by category;

topUrls = foreach gCategories generate top(visitCounts, 10);

store topUrls into '/data/topUrls';


Highlights of the script above:
 Operates directly over files
 Schemas optional; can be assigned dynamically
 UDFs can be used in every construct
 LOAD: specifying input data
 FOREACH: per-tuple processing
 FLATTEN: eliminate nesting
 FILTER: discarding unwanted data
 COGROUP: getting related data together (see the sketch after this list)
 GROUP, JOIN
 STORE: asking for output
 Other: UNION, CROSS, ORDER, DISTINCT

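Since COGROUP is the least familiar of these, here is a rough C# analogy (my illustration, not Pig or LINQ syntax): for each key, collect the matching tuples of both inputs side by side. JOIN is then a COGROUP followed by flattening the cross product of each pair of bags.

using System.Linq;

var visits = new[] { (User: "Amy", Url: "cnn.com"), (User: "Fred", Url: "cnn.com") };
var urlInfo = new[] { (Url: "cnn.com", Category: "News"), (Url: "espn.com", Category: "Sports") };

// COGROUP visits BY url, urlInfo BY url:
var cogrouped = visits.Select(v => v.Url)
    .Union(urlInfo.Select(u => u.Url))
    .Select(url => (Url: url,
                    VisitGroup: visits.Where(v => v.Url == url).ToList(),
                    InfoGroup: urlInfo.Where(u => u.Url == url).ToList()));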
 Every group or join operation forms a map-reduce boundary
 Other operations are pipelined into the map and reduce phases
[Figure: the top-10 URLs plan compiled into three chained map-reduce jobs.
Map1: Load Visits → Reduce1: Group by url;
Map2: Foreach url generate count; Load Url Info → Reduce2: Join on url;
Map3: Group by category → Reduce3: Foreach category, generate top10 urls]
 Write-run-debug cycle
 Sandbox dataset
 Objectives:
 Realism
 Conciseness
 Completeness
 Problems:
 UDFs

 Optional “safe” query optimizer
 Performs only high-confidence rewrites
 User interface
 Boxes and arrows UI
 Promote collaboration, sharing of code fragments and UDFs
 Tight integration with a scripting language
 Use loops and conditionals of the host language
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey
[Figure: Dryad architecture. The job manager (control plane) consults a name server (NS) and per-machine process daemons (PD) to schedule the job's vertices (V) onto the cluster; the data plane moves data between vertices over files, TCP, and FIFOs.]
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
 Partitioning: Hash, Range, RoundRobin
[Figure: a distributed collection is a set of partitions of C# objects]
 Apply, Fork
 Hints
[Figure: the LINQ query above is compiled into vertex code plus a query plan, forming a Dryad job that runs C# worker processes over the partitioned collection to produce results.]
[Figure: DryadLINQ execution flow. The client program builds a query expression; ToDryadTable hands it to DryadLINQ, which compiles a Dryad query plan; the Dryad job manager (JM) executes the plan over the input tables in the data center; the output tables come back to the client as a DryadTable whose C# objects are consumed with foreach.]
 LINQ expressions converted to an execution plan graph (EPG)
 Similar to a database query plan
 A DAG
 Annotated with metadata properties (see the sketch below)
 The EPG is the skeleton of the Dryad dataflow graph
 As long as native operations are used, properties can propagate, helping optimization
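A rough C# sketch of what one annotated plan node might carry (type and field names are mine, not DryadLINQ's internals):

using System.Collections.Generic;

// One DAG vertex of the EPG, annotated with metadata properties
// that the optimizer can propagate through native operators.
class EpgNode
{
    public string Operator = "";           // e.g. "Where", "GroupBy"
    public List<EpgNode> Inputs = new();   // DAG edges; multiple inputs allowed
    public string? PartitionKey;           // hash/range partition key, if known
    public bool IsSorted;                  // ordering property
}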
 Pipelining
 Multiple operations in a single process
 Removing redundancy
 Eager aggregation
 Move aggregations in front of partitionings (see the sketch below)
 I/O reduction
 Try to use TCP and in-memory FIFOs instead of disk
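A minimal sketch of eager (partial) aggregation in plain LINQ terms (my example, not the paper's code): counting visits per URL can aggregate within each partition first, so only one (url, count) pair per distinct URL crosses the partitioning step.

using System.Collections.Generic;
using System.Linq;

var localPartition = new List<(string User, string Url)>
{
    ("Amy", "cnn.com"), ("Fred", "cnn.com"), ("Amy", "bbc.com"),
};

// Aggregate locally before the hash partitioning on Url...
var partial = localPartition
    .GroupBy(v => v.Url)
    .Select(g => (Url: g.Key, Count: g.Count()));

// ...then combine the partial counts after repartitioning:
var total = partial
    .GroupBy(p => p.Url)
    .Select(g => (Url: g.Key, Count: g.Sum(p => p.Count)));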
 As information from the job becomes available, mutate the execution graph
 Dataset-size-based decisions
▪ Intelligent partitioning of data
 Aggregation can turn into a tree to improve I/O, based on locality
 Example: if part of the computation is done locally, it can be aggregated before being sent across the network
 TeraSort - scalability
 240-computer cluster of 2.6 GHz dual-core AMD Opterons
 Sort 10 billion 100-byte records on a 10-byte key
 Each computer stores 3.87 GB
 DryadLINQ vs Dryad - SkyServer
 Dryad is hand-optimized
 No dynamic optimization overhead
 DryadLINQ is 10% native code
 High-level and data-type transparent
 Automatic optimization friendly
 Manual optimizations using the Apply operator
 Leverages any system running the LINQ framework
 Support for interacting with SQL databases
 Single-computer debugging made easy
 Strong typing, narrow interface
 Deterministic replay execution
 Dynamic optimizations appear data-intensive
 What kind of overhead?
 EPG analysis overhead → high latency
 No real comparison with other systems
 Progress tracking is difficult
 No speculation
 Will Solid State Drives diminish the advantages of MapReduce?
 Why not use Parallel Databases?
 MapReduce vs Dryad
 How different from Sawzall and Pig?
Language             Sawzall              Pig Latin                        DryadLINQ
Built by             Google               Yahoo                            Microsoft
Programming          Imperative           Imperative & Declarative Hybrid  Imperative
Resemblance to SQL   Least                Moderate                         Most
Execution Engine     Google MapReduce     Hadoop                           Dryad
Performance *        Very Efficient       5-10 times slower                1.3-2 times slower
Implementation       Internal (Google)    Open Source (Apache License)     Internal (Microsoft)
Model                Operate per record   Sequence of MR                   DAGs
Usage                Log Analysis         + Machine Learning               + Iterative computations
Improving MapReduce Performance in Heterogeneous Environments
Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica

University of California at Berkeley
 Speculative tasks are executed only if there are no failed or waiting tasks available
 Notion of progress
 Three phases of reduce execution:
1. Copy phase
2. Sort phase
3. Reduce phase
 Each phase weighted by % of data processed (sketched below)
 Determines whether a task has failed or is a straggler available for speculation
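A minimal C# sketch of this scoring heuristic as the LATE paper describes it (function names are mine):

static class HadoopSpeculation
{
    // Progress score of a reduce task: each completed phase counts 1/3,
    // plus the fraction of data processed in the current phase.
    public static double ReduceProgressScore(int phasesComplete, double fractionOfPhase) =>
        (phasesComplete + fractionOfPhase) / 3.0;

    // Hadoop marks a task as a straggler (eligible for speculation) when
    // its score falls more than 0.2 below the average for its category.
    public static bool IsStraggler(double score, double categoryAverage) =>
        score < categoryAverage - 0.2;
}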
1. Nodes can perform work at exactly the same rate
2. Tasks progress at a constant rate throughout time
3. There is no cost to launching a speculative task on an idle node
4. The three phases of execution take approximately the same time
5. Tasks with a low progress score are stragglers
6. Maps and Reduces require roughly the same amount of work
 Virtualization breaks down homogeneity
 Amazon EC2 - multiple VMs on the same physical host
 Compete for memory/network bandwidth
 Ex: two map tasks can compete for disk bandwidth, causing one to be a straggler
 Progress threshold in Hadoop is fixed and assumes low progress = faulty node
 Too many speculative tasks executed
 Speculative execution can harm running tasks
 A task's phases are not equal
 Copy phase typically the most expensive due to network communication cost
 Causes a rapid jump from 1/3 progress to 1 for many tasks, creating fake stragglers
 Real stragglers get usurped
 Unnecessary copying due to fake stragglers
 Since the threshold is the average progress score minus 0.2 and scores cannot exceed 1, tasks past 80% progress are never speculatively executed
 Longest Approximate Time to End
 Primary assumption: the best task to speculate is the one that will finish furthest into the future
 Secondary: tasks make progress at an approximately constant rate
 ProgressRate = ProgressScore / T
 T = time the task has run for
 Estimated time to completion = (1 - ProgressScore) / ProgressRate (sketched below)
 Launch speculative tasks on fast nodes
 Best chance to overcome a straggler vs. using the first available node
 Cap on the total number of speculative tasks
 'Slowness' minimum threshold
 Does not take data locality into account
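A minimal C# sketch of the estimate (names are mine, not the paper's code):

static class Late
{
    // Under the constant-rate assumption: rate = ProgressScore / T,
    // so the estimated time left is (1 - ProgressScore) / rate.
    public static double EstimatedTimeLeft(double progressScore, double secondsRunning)
    {
        double progressRate = progressScore / secondsRunning;
        return (1.0 - progressScore) / progressRate;
    }
}

Subject to the cap and slowness threshold above, LATE speculates the running task with the largest estimated time left, placing the copy on a fast node.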
 EC2 test cluster
 1.0-1.2 GHz Opteron/Xeon with 1.7 GB memory
 Manually slowed down 8 VMs with background processes
[Charts: response times for the Sort, Grep, and WordCount benchmarks]
1. Make decisions early
2. Use finishing times
3. Nodes are not equal
4. Resources are precious
 Is focusing the work on small VMs fair?
 Would it be better to pay for a large VM and implement the system with more customized control?
 Could this be used in other systems?
 Progress tracking is key
 Is this a fundamental contribution, or just an optimization?
 “Good” research?
