Degree of Parallelism
Fan-out Flow
• Open figure-04.
• Create a copy of the Reformat and the Simple-Out dataset (use Edit...Copy and Edit...Paste).
Degree of Parallelism
(Abstract)
• Open figure-04.
• Run the application and examine the results (use the “Partition”
option in View Data).
Input records               Totals by city
0345Smith  Bristol    56
0121Forth  Bristol     7    Bristol    63
0322Jones  Compton    12    Compton    12
0212Spade  London      8
0492West   London     23    London     31
0221Black  New York   42    New York   42
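The city totals shown beside the records above are per-city sums of the last field. A minimal Python sketch of that aggregation follows; the field split (ID, name, city, amount) and the function name `rollup_by_city` are assumptions made for illustration, not part of the exercise.

```python
# Records from the listing above, split into (id, name, city, amount).
# The field boundaries are an assumption based on the visible layout.
records = [
    ("0345", "Smith", "Bristol", 56),
    ("0121", "Forth", "Bristol", 7),
    ("0322", "Jones", "Compton", 12),
    ("0212", "Spade", "London", 8),
    ("0492", "West", "London", 23),
    ("0221", "Black", "New York", 42),
]


def rollup_by_city(rows):
    """Sum the amount field for each city, keeping first-seen city order."""
    totals = {}
    for _id, _name, city, amount in rows:
        totals[city] = totals.get(city, 0) + amount
    return totals


totals = rollup_by_city(records)
for city, total in totals.items():
    print(city, total)
```

Because all records for a given city must meet in one place to be summed, a key-based aggregation like this is only correct in parallel when the data is partitioned by the same key.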
[Figure: expanded and global views of three Score partitions (Score 1, Score 2, Score 3), a Departition component, and an Output File.]
[Figure legend: free space, used space.]
Sorted data:
49Jane 02241 2
44Bob 02116 8
43Mark 02114 9
47Bill 02114 14
46Rick 02116 23
42John 02116 30
48Mary 02116 38
45Sue 02241 92
Blocked components
• Not key-based.
• Result ordering is by partition.
• Serializes pipelined computation.
• Useful for:
  • creating serial flow from partitioned data
  • appending headers and trailers
  • writing DML
• Used infrequently.
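The behavior in the list above (read each input flow to its end before starting the next) can be sketched in Python as follows. This is only an illustration of concatenation semantics; the function name `concatenate` and the sample records are hypothetical.

```python
from itertools import chain


def concatenate(flows):
    """Read every record of flow 0, then flow 1, and so on.

    Later flows are not read until the earlier ones are exhausted,
    which is why this style of departitioning serializes the pipeline.
    """
    return list(chain.from_iterable(flows))


# Result ordering is by partition.
partitions = [["p0-a", "p0-b"], ["p1-a"], ["p2-a", "p2-b"]]
serial = concatenate(partitions)

# Appending a header and a trailer around a body flow.
with_header_trailer = concatenate([["HEADER"], serial, ["TRAILER"]])
print(with_header_trailer)
```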
Blocked components
• Key-based.
• Result ordering is sorted if each input is sorted.
• Possibly synchronizes pipelined computation; may even serialize.
• Useful for creating ordered data flows.
• Used more often than concatenate, but still infrequently.
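A key-based combination that yields sorted output when each input is already sorted is what is commonly called a merge. The sketch below uses Python's standard `heapq.merge` to show the idea; the sample flows and the choice of the numeric field as the key are assumptions for illustration.

```python
import heapq


def merge(flows, key):
    """Combine flows that are each sorted on `key` into one sorted flow.

    Only the head record of each flow is compared at any time, so a flow
    that is slow to produce its next record can hold up the others.
    """
    return list(heapq.merge(*flows, key=key))


# Each flow is already sorted on the numeric field.
flows = [
    [("A", 1), ("B", 3), ("C", 5)],
    [("A", 2), ("F", 2), ("G", 7)],
    [("A", 4), ("D", 6)],
]
merged = merge(flows, key=lambda rec: rec[1])
print(merged)
```

If any input is not sorted on the key, the output is not guaranteed to be sorted either, which matches the condition stated in the list above.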
Reading flows in round-robin sequence
• Not key-based.
• Result ordering is the inverse of round-robin.
• Synchronizes pipelined computation.
• Useful for restoring original order following a record-independent parallel computation partitioned by round-robin.
• Used in rare circumstances.
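To see why reading flows in round-robin sequence restores the original order, the following Python sketch partitions a list round-robin, applies a record-independent computation to each partition, and then interleaves the results. The function names are hypothetical and the squaring step is just a stand-in for any per-record transformation.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for partitions that run out one record early


def partition_round_robin(records, n):
    """Deal records to n partitions in turn: 0, 1, ..., n-1, 0, 1, ..."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def interleave(parts):
    """Take one record from each partition in turn, the inverse of the deal."""
    out = []
    for group in zip_longest(*parts, fillvalue=_MISSING):
        out.extend(rec for rec in group if rec is not _MISSING)
    return out


records = list(range(10))
parts = partition_round_robin(records, 3)
processed = [[x * x for x in part] for part in parts]  # record-independent step
restored = interleave(processed)
print(restored)
```

Each read must wait for the next partition in the cycle, so the partitions advance in lockstep; that is the synchronization noted above.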
Reading flows as data is available
• Not key-based.
• Result ordering is unpredictable.
• Neither serializes nor synchronizes pipelined computation.
• Useful for efficient collection of data from multiple partitions and for repartitioning.
• Used most frequently.
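Reading records as they become available can be approximated in Python with one producer thread per partition writing to a shared queue. This is a sketch under the assumption that threads stand in for parallel component processes; the names `gather` and `_DONE` are hypothetical.

```python
import queue
import threading

_DONE = object()  # marker a producer sends when its partition is exhausted


def gather(partitions):
    """Collect records from all partitions in arrival order."""
    shared = queue.Queue()

    def producer(part):
        for rec in part:
            shared.put(rec)
        shared.put(_DONE)

    threads = [threading.Thread(target=producer, args=(p,)) for p in partitions]
    for t in threads:
        t.start()

    result = []
    remaining = len(threads)
    while remaining:
        item = shared.get()  # whichever producer got there first
        if item is _DONE:
            remaining -= 1
        else:
            result.append(item)

    for t in threads:
        t.join()
    return result


partitions = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
gathered = gather(partitions)
print(gathered)  # order across partitions may differ from run to run
```

No producer waits for another, so nothing is serialized or synchronized; the price is that the output order across partitions is unpredictable.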
Blocking on read
Blocking on write
Partition by Key:
Gather:
C 5 B 3 A 4
B 5 E 2 D 6
D 5 F 2 B 6
G 7 A 1 E 6
F 7 A 2 D 4
C 5 A 3 D 6
Sort:
C 5 A 1 A 4
C 5 A 2 D 4
B 5 E 2 B 6
D 5 F 2 E 6
G 7 A 3 D 6
F 7 B 3 D 6
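In the listings above, every record with a given number sits in the same column, which is consistent with partitioning on the numeric field and then sorting within each partition. The Python sketch below illustrates that pattern. The modulo-based assignment is an assumption and does not reproduce the exact column assignment shown; the function name `partition_by_key` is hypothetical.

```python
def partition_by_key(records, n, key):
    """Send each record to the partition chosen by hashing its key.

    Records with equal keys always land in the same partition.
    """
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(key(rec)) % n].append(rec)
    return parts


# The 18 records from the Gather listing, read row by row.
records = [
    ("C", 5), ("B", 3), ("A", 4), ("B", 5), ("E", 2), ("D", 6),
    ("D", 5), ("F", 2), ("B", 6), ("G", 7), ("A", 1), ("E", 6),
    ("F", 7), ("A", 2), ("D", 4), ("C", 5), ("A", 3), ("D", 6),
]


def by_number(rec):
    return rec[1]


parts = partition_by_key(records, 3, key=by_number)
sorted_parts = [sorted(part, key=by_number) for part in parts]

for i, part in enumerate(sorted_parts):
    print(i, part)
```

Each partition is sorted independently, so the data is ordered within a partition but not across partitions.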
[Figure: serial and parallel layouts. Labels: files on Node X; file on Node X; file on Node W; 3-way multifile on Node X,Y,Z.]
Propagate (default)
Bind layout to that of another component
Construct layout manually
Run on these hosts
Phase 0 Phase 1
Host
GDE Agent Agent
• Component Execution
  • Component processes do their jobs.
  • Component processes communicate directly with datasets and each other to move data around.
• Agent Termination
  • When all of an Agent’s Component processes exit, the Agent informs the Host process that those components are finished.
  • The Agent process then exits.
• Host Termination
  • When all Agents have exited, the Host process informs the GDE that the job is complete.
  • The Host process then exits.
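The normal shutdown sequence described above (components exit, each Agent reports to the Host and exits, the Host reports completion and exits) can be modeled in Python as a small simulation. Threads stand in for the real processes, the return value stands in for the message to the GDE, and all names here are hypothetical; this is not how the actual runtime is implemented.

```python
import queue
import threading


def component(name):
    """A component process: do the work, then exit by returning."""
    return None


def agent(name, components, host_inbox):
    """Start the components, wait for all to exit, report, then exit."""
    workers = [threading.Thread(target=component, args=(c,)) for c in components]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    host_inbox.put((name, components))  # tell the Host these are finished


def host(layout):
    """Start one agent per node and wait until every agent has reported."""
    inbox = queue.Queue()
    agents = [
        threading.Thread(target=agent, args=(node, comps, inbox))
        for node, comps in layout.items()
    ]
    for a in agents:
        a.start()
    finished = {}
    for _ in agents:
        node, comps = inbox.get()
        finished[node] = comps
    for a in agents:
        a.join()  # all agents have exited
    return "job complete", finished  # what the Host would tell the GDE


layout = {"node-x": ["reformat", "sort"], "node-y": ["gather"]}
status, finished = host(layout)
print(status, finished)
```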
• Agent Termination
  • When every Component process of an Agent has been killed, the Agent informs the Host process that those components are finished.
  • The Agent process then exits.
• Host Termination
  • When all Agents have exited, the Host process informs the GDE that the job failed.
  • The Host process then exits.