Figure 2-3 shows wide dependencies between partitions. In this case the child
partitions (shown at the bottom of Figure 2-3) depend on an arbitrary set of parent
partitions. The wide dependencies (displayed as red arrows) cannot be known fully before
the data is evaluated. In contrast to the coalesce operation, data is partitioned
according to its value. The dependency graph for any operations that cause a shuffle
(such as groupByKey, reduceByKey, sort, and sortByKey) follows this pattern.
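
To make the idea of value-dependent partitioning concrete, here is a minimal sketch in plain Python. It is a toy model and not Spark's API: the helper `shuffle_dependencies` and the sample data are hypothetical, and `key % num_children` is only a stand-in for a hash partitioner on integer keys. The sketch records which parent partitions each child partition would have to read from after a shuffle.

```python
from collections import defaultdict


def shuffle_dependencies(parent_partitions, num_children):
    """Map each child partition index to the parent indices it reads from.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    deps = defaultdict(set)
    for parent_idx, records in enumerate(parent_partitions):
        for key, _value in records:
            # Assumption: integer keys and a modulo rule stand in for a
            # hash partitioner that places each record by its key.
            child_idx = key % num_children
            deps[child_idx].add(parent_idx)
    return {child: sorted(parents) for child, parents in sorted(deps.items())}


# Two data sets with the same shape: three parent partitions, two records each.
parents_a = [[(0, "a"), (1, "b")], [(2, "c"), (3, "d")], [(4, "e"), (5, "f")]]
parents_b = [[(0, "a"), (2, "b")], [(4, "c"), (6, "d")], [(1, "e"), (3, "f")]]

deps_a = shuffle_dependencies(parents_a, 2)
deps_b = shuffle_dependencies(parents_b, 2)

print(deps_a)  # every child reads from every parent
print(deps_b)  # same layout, different keys, different dependencies
```

Both inputs have the same number of partitions and records, yet the resulting dependencies differ because they are determined by the keys themselves. This is the sense in which wide dependencies cannot be known fully before the data is evaluated.
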
The join functions are a bit more complicated, since they can have wide or narrow
dependencies depending on how the two parent RDDs are partitioned. We illustrate
the dependencies in different scenarios for the join operation in “Core Spark Joins”
on page 73.
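
As a rough illustration of why the partitioning of the parents matters for a join, the sketch below continues in plain Python rather than Spark. `ToyRDD`, `partition_by`, and `join_dependency` are hypothetical names, and the rule is deliberately simplified under one assumption: a join is treated as narrow only when both parents carry the same known partitioner, and as wide in every other case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Record = Tuple[int, str]


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: partitions plus an optional partitioner."""
    partitions: List[List[Record]]
    # Number of partitions of a known key-modulo partitioner, or None if unknown.
    partitioner: Optional[int] = None


def partition_by(records: List[Record], num_partitions: int) -> ToyRDD:
    """Place each record by key and remember the partitioner that was used."""
    parts: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[key % num_partitions].append((key, value))
    return ToyRDD(parts, partitioner=num_partitions)


def join_dependency(left: ToyRDD, right: ToyRDD) -> str:
    """Simplified rule: narrow only if both sides share a known partitioner."""
    if left.partitioner is not None and left.partitioner == right.partitioner:
        return "narrow"  # matching keys already sit in partitions with the same index
    return "wide"        # at least one side must be repartitioned by key


left = partition_by([(1, "a"), (2, "b"), (3, "c")], 2)
right_same = partition_by([(1, "x"), (3, "y")], 2)
right_unknown = ToyRDD([[(1, "x")], [(3, "y")]])  # no known partitioner

kind_same = join_dependency(left, right_same)
kind_unknown = join_dependency(left, right_unknown)
print(kind_same, kind_unknown)
```

When both sides were placed by the same rule, records with equal keys already share a partition index, so each child partition needs only one partition from each parent. Without that guarantee, matching keys could be anywhere, and the join needs a shuffle.
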