
Parallel Computing in TensorFlow

Shusen Wang
TensorFlow Strategies

• MirroredStrategy
• TPUStrategy
• MultiWorkerMirroredStrategy
• CentralStorageStrategy
• ParameterServerStrategy
• OneDeviceStrategy
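All of these strategies are provided by the tf.distribute module. As a minimal sketch of how the single-machine ones are constructed (exact namespaces vary slightly across TensorFlow 2.x versions; several of the multi-machine strategies live under tf.distribute.experimental):

import tensorflow as tf

# Synchronous data parallelism across all visible GPUs of one machine.
mirrored = tf.distribute.MirroredStrategy()

# Run everything on one named device (handy for debugging).
one_device = tf.distribute.OneDeviceStrategy(device="/gpu:0")

# Multi-machine strategies additionally need a cluster description
# (e.g., the TF_CONFIG environment variable) before they can be created.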
Parallel Training of a CNN on MNIST
Find Devices
In [5]:
from tensorflow.python.client import device_lib

device_lib.list_local_devices()
Out[5]:
For example, my server has 1 CPU and 4 GPUs:
[name: "/device:CPU:0"
• device_type:
/device:CPU:0"CPU"
• memory_limit:
/device:GPU:0 268435456
• locality {
/device:GPU:1
}
• /device:GPU:2
incarnation: 13474655953084031604, name: "/device:XLA_CP
• device_type:
/device:GPU:3"XLA_CPU"
memory_limit: 17179869184
locality {
number of test samples: 10000
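device_lib is a private module; in TensorFlow ≥ 2.1 the same information is available through the public tf.config API. A sketch:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')  # e.g. 4 entries on this server
cpus = tf.config.list_physical_devices('CPU')
print('GPUs:', len(gpus), 'CPUs:', len(cpus))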
Mirrored Strategy
In [4]:
from tensorflow import distribute

strategy = distribute.MirroredStrategy()

In [5]:
# number of replicas (one per device) kept in sync
m = strategy.num_replicas_in_sync

print('Number of devices: {}'.format(m))

Number of devices: 4
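By default, MirroredStrategy mirrors the model on every visible GPU. To use only a subset of them, the device names can be passed explicitly; a sketch (strategy_2gpu is an illustrative name):

from tensorflow import distribute

# Restrict the strategy to the first two GPUs only.
strategy_2gpu = distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
print('Number of devices: {}'.format(strategy_2gpu.num_replicas_in_sync))  # 2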
Load and Process MNIST Data
In [1]:

import tensorflow as tf

def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

In [2]:

import tensorflow_datasets as tfds

datasets, info = tfds.load(name='mnist',
                           with_info=True,
                           as_supervised=True)
mnist_train = datasets['train'].map(scale).cache()
mnist_test = datasets['test'].map(scale)
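The info object returned by tfds.load records the split sizes, which gives a quick way to confirm how much data each split holds. A sketch:

print('number of training samples:', info.splits['train'].num_examples)  # 60000
print('number of test samples:', info.splits['test'].num_examples)       # 10000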
Load and Process MNIST Data


In [6]:
BUFFER_SIZE = 10000
BATCH_SIZE_PER_REPLICA = 128
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * m

data_train = mnist_train.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
data_test = mnist_test.batch(BATCH_SIZE)

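With model.fit, Keras splits every global batch of size BATCH_SIZE across the replicas automatically, so each GPU processes BATCH_SIZE_PER_REPLICA examples per step. If you iterate over the data yourself, the strategy can do the sharding explicitly; a sketch:

# Each element of dist_train is a per-replica value: one shard of the
# global batch for every GPU in the strategy.
dist_train = strategy.experimental_distribute_dataset(data_train)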

Build Neural Network

In [7]:

from tensorflow import keras


from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

with strategy.scope():
    model = keras.Sequential()
    model.add(Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D())
    model.add(Conv2D(64, 3, activation='relu'))
    model.add(MaxPooling2D())
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(10, activation='softmax'))

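Because the layers are created inside strategy.scope(), every weight is mirrored: one copy per GPU, kept identical by all-reducing the gradients. A quick check (sketch; the exact class name is an implementation detail of tf.distribute):

first_kernel = model.weights[0]
print(type(first_kernel).__name__)  # typically 'MirroredVariable' under MirroredStrategy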
In [8]:
model.summary()
Build Neural Network
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 1600) 0
_________________________________________________________________
dense (Dense) (None, 64) 102464
_________________________________________________________________
dense_1 (Dense) (None, 10) 650
=================================================================
Total params: 121,930
Trainable params: 121,930
Non-trainable params: 0
_________________________________________________________________
Compile the Model

In [9]:
with strategy.scope():
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=keras.optimizers.RMSprop(learning_rate=1E-3),
                  metrics=['accuracy'])

INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:loc

Train the Model


In [10]:
model.fit(data_train, epochs=10)
Epoch 1/10
118/118 [==============================] - 18s 151ms/step - loss: 0.4800 - accuracy: 0.8558
Epoch 2/10
118/118 [==============================] - 1s 9ms/step - loss: 0.1229 - accuracy: 0.9632
Epoch 3/10
118/118 [==============================] - 1s 9ms/step - loss: 0.0772 - accuracy: 0.9764
Epoch 4/10
118/118 [==============================] - 1s 9ms/step - loss: 0.0583 - accuracy: 0.9821
Epoch 5/10
118/118 [==============================] - 1s 9ms/step - loss: 0.0448 - accuracy: 0.9859
Epoch 6/10
118/118 [==============================] - 1s 9ms/step - loss: 0.0364 - accuracy: 0.9888
Epoch 7/10
118/118 [==============================] - 1s 9ms/step - loss: 0.0311 - accuracy: 0.9905
Epoch 8/10
118/118 [==============================] - 1s 9ms/step - loss: 0.0258 - accuracy: 0.9919
Epoch 9/10
118/118 [==============================] - 1s 9ms/step - loss: 0.0220 - accuracy: 0.9931
Epoch 10/10
118/118 [==============================] - 1s 9ms/step - loss: 0.0187 - accuracy: 0.9940

Out[10]:
<tensorflow.python.keras.callbacks.History at 0x7fec1e322828>

Evaluate the Model on Test Set
In [11]:

eval_loss, eval_acc = model.evaluate(data_test)

20/20 [==============================] - 2s 92ms/step - loss: 0.0255 - accuracy: 0.9896

In [12]:

with strategy.scope():
    eval_loss, eval_acc = model.evaluate(data_test)

20/20 [==============================] - 2s 92ms/step - loss: 0.0255 - accuracy: 0.9896

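Under the hood, MirroredStrategy runs one replica of the model per GPU and combines their gradients with an all-reduce, which is the subject of the rest of this lecture. For reference, here is a sketch of an explicit training step under the same strategy, following the standard custom-training-loop pattern (strategy.run requires TensorFlow ≥ 2.2; it reuses model, strategy, data_train, and BATCH_SIZE from above):

loss_fn = keras.losses.SparseCategoricalCrossentropy(
    reduction=keras.losses.Reduction.SUM)

# Let the strategy shard each global batch across the replicas.
dist_train = strategy.experimental_distribute_dataset(data_train)

@tf.function
def train_step(dist_inputs):
    def step_fn(inputs):
        images, labels = inputs
        with tf.GradientTape() as tape:
            probs = model(images, training=True)
            # Divide by the global batch size so the summed per-replica
            # losses match the usual mean loss.
            loss = loss_fn(labels, probs) / BATCH_SIZE
        grads = tape.gradient(loss, model.trainable_variables)
        # apply_gradients sums (all-reduces) the gradients across replicas.
        model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    # strategy.run executes step_fn once on every replica.
    per_replica_losses = strategy.run(step_fn, args=(dist_inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

for batch in dist_train:
    train_step(batch)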
Ring All-Reduce
Reduce vs. All-Reduce

• Reduce: the server gets the result of the reduce operation (e.g., sum, mean, count).
• All-Reduce: every node gets a copy of the result of the reduce operation.
• All-reduce can be implemented in several ways, e.g.:
• all-reduce via reduce + broadcast;
• all-reduce via all-to-all communication;
• ring all-reduce (this lecture).

[Figure: workers 1–4 hold local outputs 7, 3, 4, 1. A reduce (sum) leaves only the server with 7 + 3 + 4 + 1 = 15; an all-reduce (sum) leaves every worker holding 15.]
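A tiny pure-Python simulation of the two primitives, using the hypothetical worker outputs 7, 3, 4, 1 from the figure:

worker_outputs = [7, 3, 4, 1]           # local result on each of the 4 workers

def reduce_sum(outputs):
    # Reduce: only the server ends up with the combined result.
    return sum(outputs)

def all_reduce_sum(outputs):
    # All-reduce via reduce + broadcast: every worker gets a copy of the result.
    total = sum(outputs)                 # reduce on the server
    return [total] * len(outputs)        # broadcast back to all workers

print(reduce_sum(worker_outputs))        # 15, known only to the server
print(all_reduce_sum(worker_outputs))    # [15, 15, 15, 15]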
Ring All-Reduce

[Figure: four GPUs connected in a ring: GPU:0 → GPU:1 → GPU:2 → GPU:3 → GPU:0. GPU:i holds its locally computed gradient gᵢ, for i = 0, 1, 2, 3.]

Goal: compute g = g₀ + g₁ + g₂ + g₃ and send it to all the GPUs.
Ring All-Reduce (Naïve Approach)

• Reduce around the ring: GPU:0 sends g₀ to GPU:1; GPU:1 computes g₀ + g₁ and sends it to GPU:2; GPU:2 computes g₀ + g₁ + g₂ and sends it to GPU:3; GPU:3 adds g₃ to obtain g = g₀ + g₁ + g₂ + g₃.
• Broadcast around the ring: GPU:3 sends g to GPU:0, GPU:0 forwards it to GPU:1, and GPU:1 forwards it to GPU:2, so that every GPU ends up holding g.
• Update: every GPU applies w ← w − α · g.

[Figure sequence: the partial sums travel hop by hop around the ring GPU:0 → GPU:1 → GPU:2 → GPU:3; the full sum g then travels around the ring once more.]
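A short NumPy sketch of the naïve scheme (the arrays stand in for per-GPU gradient buffers; in reality every addition is a network transfer over a single link of the ring):

import numpy as np

def naive_ring_all_reduce(grads):
    m = len(grads)
    # Reduce phase: the partial sum travels GPU:0 -> GPU:1 -> ... -> GPU:m-1,
    # so only one link of the ring is busy at a time.
    total = grads[0].copy()
    for i in range(1, m):
        total = total + grads[i]
    # Broadcast phase: the full sum travels around the ring once more.
    return [total.copy() for _ in range(m)]

grads = [np.full(8, float(i)) for i in range(4)]   # hypothetical gradients
print(naive_ring_all_reduce(grads)[2])             # [6. 6. ...]: g0+g1+g2+g3 on every GPU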
What is wrong with the naïve approach?

• Most of the network links are idle: at any moment, only one link of the ring is transferring data.
• Communication time: m · d / b (ignoring latency), where
• m: number of GPUs.
• d: number of parameters.
• b: network bandwidth.
Ring All-Reduce (Efficient Approach)

• Partition each gradient into m blocks (here m = 4), one block per GPU:
  g₀ = [a₀; b₀; c₀; d₀],  g₁ = [a₁; b₁; c₁; d₁],  g₂ = [a₂; b₂; c₂; d₂],  g₃ = [a₃; b₃; c₃; d₃].
• Scatter-reduce: in each of m − 1 rounds, every GPU simultaneously sends one block to the next GPU in the ring and adds the block it receives to its own copy. After m − 1 rounds, each GPU holds the complete sum of one block (e.g., one GPU holds a₀ + a₁ + a₂ + a₃, another holds b₀ + b₁ + b₂ + b₃, and so on).
• All-gather: in another m − 1 rounds, each GPU passes the block it has completed around the ring, so that every GPU ends up with all four summed blocks, i.e., with the full gradient g.

[Figure sequence: the blocks aᵢ, bᵢ, cᵢ, dᵢ circulate around the ring; after the scatter-reduce phase each GPU owns one fully reduced block, and after the all-gather phase every GPU holds a₀+a₁+a₂+a₃, b₀+b₁+b₂+b₃, c₀+c₁+c₂+c₃, and d₀+d₁+d₂+d₃.]
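The same NumPy-style sketch for the efficient scheme; the two nested loops mimic the m − 1 scatter-reduce rounds and the m − 1 all-gather rounds, in each of which every GPU sends one block to its ring neighbor:

import numpy as np

def ring_all_reduce(grads):
    """Sketch of ring all-reduce: scatter-reduce, then all-gather."""
    m = len(grads)
    # chunks[i][j] is block j of the gradient held by GPU i.
    chunks = [np.array_split(g.astype(float), m) for g in grads]

    # Scatter-reduce: in round t, GPU i sends block (i - t) % m to GPU i+1,
    # which adds it to its own copy. All m links are busy in every round.
    for t in range(m - 1):
        for i in range(m):
            b = (i - t) % m
            dst = (i + 1) % m
            chunks[dst][b] = chunks[dst][b] + chunks[i][b]
    # Now GPU i holds the fully reduced block (i + 1) % m.

    # All-gather: pass the completed blocks around the ring so that
    # every GPU ends up with every fully reduced block.
    for t in range(m - 1):
        for i in range(m):
            b = (i + 1 - t) % m
            dst = (i + 1) % m
            chunks[dst][b] = chunks[i][b].copy()

    return [np.concatenate(c) for c in chunks]

grads = [np.arange(8.0) + 10 * i for i in range(4)]    # hypothetical gradients
out = ring_all_reduce(grads)
print(all(np.allclose(o, sum(grads)) for o in out))    # True: every GPU holds the full sum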
Comparisons

Naïve Algorithm:
• Most of the network links are idle.
• Communication time: m · d / b.

Efficient Algorithm (Ring All-Reduce):
• No idle network links.
• Communication time: d / b.

(m: number of GPUs; d: number of parameters; b: network bandwidth; latency ignored.)
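Plugging hypothetical numbers into the two formulas (m = 4 GPUs, d = 10⁸ parameters, b = 10⁹ parameters per second of effective bandwidth):

m, d, b = 4, 1e8, 1e9      # hypothetical: GPUs, parameters, bandwidth

naive_time = m * d / b     # 0.4 s: the gradient crosses one link at a time, m times in sequence
ring_time = d / b          # 0.1 s: all links busy, each carries about d parameters in total
print(naive_time, ring_time)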
Thank you!
