
Data-parallel programming

Goals of this lecture


- Understand design decisions in programming and query languages that facilitate parallelization.
- Foundations of (higher-order) functional programming: the general story of map and reduce.
- Scala programming with collections
- Monad algebra and nested relational algebra (Pig Latin)
- Pregel
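
As a small illustration of the map and reduce story, here is a minimal Python sketch (not the Scala collections code used in the lecture). The helper `chunked_reduce` and the sample data are hypothetical. It assumes the combining operator is associative and that `zero` is its identity, which is what allows the reduction to be split into independent chunks.

```python
from functools import reduce
from operator import add

words = ["map", "reduce", "map", "fold"]

# map: apply a function to every element independently,
# so each element could be handled by a different worker.
lengths = list(map(len, words))   # [3, 6, 3, 4]

# reduce: combine all elements with a binary operator.
total = reduce(add, lengths, 0)   # 16

def chunked_reduce(op, xs, zero, chunk_size):
    """Reduce each chunk separately, then reduce the partial results.

    Gives the same answer as a sequential reduce only if `op` is
    associative and `zero` is its identity element.
    """
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    partials = [reduce(op, chunk, zero) for chunk in chunks]  # parallelizable
    return reduce(op, partials, zero)

print(chunked_reduce(add, lengths, 0, 2))
```

The design point is that neither `map` nor an associative `reduce` fixes an evaluation order, so a runtime is free to distribute the work.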

Nested relational algebra (NRA) and monad algebra


- NRA: relational algebra + nest, unnest.
- Monad algebra: see separate slides.
- Found query languages on explicit data-parallel operations such as map.
- Work with complex values: (serialized) objects, XML, JSON.
- Programming language embedding fixes the impedance mismatch.
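
To make nest and unnest concrete, here is a minimal Python sketch over a two-column relation. The function names, the sample rows, and the use of lists instead of sets are assumptions made for illustration only.

```python
# A flat relation of (student, course) pairs.
flat = [("alice", "db"), ("alice", "pl"), ("bob", "db")]

def nest(rows):
    """Group the second column by the first: one row per key,
    with a collection-valued attribute holding the grouped values."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return list(groups.items())

def unnest(nested):
    """Flatten the collection-valued attribute again (a flat-map)."""
    return [(key, value) for key, values in nested for value in values]

nested = nest(flat)
print(nested)
print(unnest(nested))
```

Here `unnest(nest(flat))` returns the original rows. Going the other way, unnest followed by nest loses any row whose nested collection is empty.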

Pig Latin
Hatway hriscay Olstonlay oesday otnay antway ouyay otay nowkay: Igpay atinlay islay ustjay estednay elationalray algebrajay, othingnay oremay.

Classroom team task #1

PageRank
Input: Weighted web graph/stochastic matrix M.
- Perform a random walk following edges probabilistically.
- PR: probability of being at any given node at a given time.
- Principal eigenvector of the Web graph (fixed point of the equation M*p = p).

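$$
\begin{pmatrix} 0 & 1 & 1 \\ 0.5 & 0 & 0 \\ 0.5 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
$$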

PageRank, ctd.
The eigenvector can be computed by starting with a random vector p and repeatedly multiplying it by M. The random walk on the Web graph is a Markov chain (MC), and some MCs have bad properties: they are not ergodic, so the iteration does not converge.
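
The three-node example matrix from the previous slide happens to show the problem. The sketch below is plain Python, with hypothetical helper names and a uniform start vector chosen for reproducibility instead of a random one. It runs the iteration on that matrix. The walk alternates between node x and the pair {y, z}, so the iterates oscillate with period 2 instead of converging, even though a fixed point exists.

```python
# Example matrix M: column v holds the outgoing transition
# probabilities of node v, so every column sums to 1.
M = [
    [0.0, 1.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]

def mat_vec(m, p):
    """Compute the matrix-vector product m * p."""
    return [sum(m_uv * p_v for m_uv, p_v in zip(row, p)) for row in m]

def power_iterate(m, p, steps):
    """Return the list [p, M p, M^2 p, ...] with steps + 1 entries."""
    history = [p]
    for _ in range(steps):
        p = mat_vec(m, p)
        history.append(p)
    return history

uniform = [1 / 3, 1 / 3, 1 / 3]
history = power_iterate(M, uniform, 4)
for step, p in enumerate(history):
    print(step, p)   # alternates between two different vectors

# The fixed point of M * p = p, normalized to sum to 1.
fixed_point = [0.5, 0.25, 0.25]
print(mat_vec(M, fixed_point))
```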

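Trick: make the random surfer sometimes stop surfing and jump to a random node. The Web graph becomes complete, but we want the matrix to remain sparse, so the jump is added as a separate term, summing over the nodes v that link to u:
$$
PR(u) = (1 - \lambda) \sum_{v} PR(v) \, w(u,v) + \frac{\lambda}{N}
$$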
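
A minimal Python sketch of this update follows, reading the formula as a sum over all nodes v with an edge into u. The edge-list representation, the function name, the value lambda = 0.15, and the stopping tolerance are assumptions for illustration. Only the existing edges are stored, and the random jump appears as the constant term lambda / N, so the complete graph is never materialized.

```python
# Sparse representation: in_edges[u] lists (v, w(u, v)) for each edge v -> u.
# Same three-node graph as the example matrix M.
in_edges = {
    "x": [("y", 1.0), ("z", 1.0)],
    "y": [("x", 0.5)],
    "z": [("x", 0.5)],
}

def pagerank(in_edges, lam=0.15, tol=1e-12, max_steps=1000):
    """Iterate PR(u) = (1 - lam) * sum_v PR(v) * w(u, v) + lam / N."""
    n = len(in_edges)
    pr = {u: 1.0 / n for u in in_edges}   # uniform start vector
    for _ in range(max_steps):
        nxt = {
            u: (1 - lam) * sum(pr[v] * w for v, w in incoming) + lam / n
            for u, incoming in in_edges.items()
        }
        # Stop once no entry changed by more than the tolerance.
        if max(abs(nxt[u] - pr[u]) for u in pr) < tol:
            return nxt
        pr = nxt
    return pr

ranks = pagerank(in_edges)
print(ranks)
```

With lambda > 0 the iteration settles on this graph, whereas the undamped iteration from the same start vector keeps oscillating.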

DryadLINQ

Google's Pregel
- Bulk-synchronous parallel (BSP) programming model.
- Supersteps: in each superstep, each node's compute function is called (in parallel).
- The compute function may send messages to other nodes, which are received in the next superstep.
- A node's compute function processes the messages received from the previous superstep.
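
The superstep loop can be sketched as a single-process Python simulation. This is not Pregel's actual API: `run_supersteps`, the `compute` signature, the stop-when-no-messages rule, and the maximum-propagation example are all assumptions for illustration. Messages sent in one superstep go into an outbox that only becomes the inbox of the next superstep, so the order in which nodes are processed within a superstep does not matter.

```python
from collections import defaultdict

def run_supersteps(values, edges, compute, max_supersteps=30):
    """values: node id -> value; edges: node id -> list of neighbor ids.
    Returns the number of supersteps executed."""
    inbox = defaultdict(list)
    for step in range(max_supersteps):
        outbox = defaultdict(list)

        def send(dest, msg):
            outbox[dest].append(msg)   # delivered in the next superstep

        for node in values:            # conceptually in parallel
            values[node] = compute(step, values[node], edges[node],
                                   inbox[node], send)
        if not outbox:                 # no messages sent: nothing left to do
            return step + 1
        inbox = outbox
    return max_supersteps

def propagate_max(step, value, neighbors, messages, send):
    """Each node adopts the largest value it has seen and tells its
    neighbors whenever its own value changed (or in superstep 0)."""
    new_value = max([value] + messages)
    if step == 0 or new_value > value:
        for n in neighbors:
            send(n, new_value)
    return new_value

# Hypothetical chain graph a - b - c - d.
values = {"a": 3, "b": 6, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
supersteps = run_supersteps(values, edges, propagate_max)
print(values, supersteps)
```

After the run, every node holds the maximum of the initial values.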

Pregel
