Ntust Kuberun and DL

Dev. Environment Setup & Deep Learning: A Concise Intro.
HonghuTech 翁啟閎
@NTUST
2022/12/15
Docker
Introduction
Docker Command Lines
Official Docker Images for Deep Learning (TensorFlow, PyTorch, etc)
Build docker images by yourself
KubeRun
Introduction
User Interface for Admin. & for User
Customize Your Own Image
Initialize a Jupyter Dev. Environment
Submit GPU Jobs
• We aim to let you know how to quickly deploy environments for development
Deep Learning
Convolutional Neural Network: some Basic Concepts
Build a ResNet from Scratch
Distributed Training Principle and a Demo
• We aim to give you a good starting point for AI model construction / training
• Theoretical details or recent developments are not covered here
THE ECOSYSTEM OF NVIDIA…
[Figure: software stack, top to bottom: Frameworks; Libraries (NCCL, cuDNN, cuBLAS); CUDA; NVIDIA GPUs]
Docker vs. Virtual Machine
• Developed by Docker, Inc. (2013 until now)
• Aims at fast deployment of apps
Virtual Machine: resources are pre-allocated
Virtual Machine vs. Docker
https://www.hitechnectar.com/blogs/hypervisor-vs-docker/
Docker image: a pile of layers
https://ragin.medium.com/docker-what-it-is-how-images-are-structured-docker-vs-vm-and-some-tips-part-1-d9686303590f?source=post_internal_links---------1----------------------------
Docker and Deep Learning?
• Easy to deploy a variety of DL environments (official TensorFlow and PyTorch images are available)
• Can deploy the same environment everywhere
• Version control (CUDA, cuDNN, NCCL, DL frameworks)
• Ensures that you and your teammates work with the same version of the DL environment
NVIDIA Container Toolkit
• NVIDIA container runtime: exposes the GPU drivers installed on the host to the running container.
https://github.com/NVIDIA/nvidia-docker
Common usage
docker images # list the Docker images available locally
docker search [possible_image_name] # search for an image
docker pull [image_name] # download an image
nvidia-docker run [options] [image_name] # start an image (it becomes a container)
docker stop [container_name_or_id] # stop a container
docker start [container_name_or_id] # start a stopped container
docker rm [container_name_or_id] # remove a container
docker rmi [image_name] # delete an image
docker ps # list the containers that are currently running
docker inspect [container_name] # inspect a container
docker logs [container_name] # show a container's logs
docker exec -it [container_name_or_id] /bin/bash # open a terminal inside the container
docker build -f [dockerfile_path] . # build a new image from a Dockerfile
Fetch an image from Docker Hub via:
docker pull maintainer/repository:tag
Share your App on Docker Hub
https://hub.docker.com/
Run containers via docker / nvidia-docker run
Google TensorFlow:
docker run -it --rm --gpus '"device=0,1"' -p 8888:8888 --name tf -v $HOME:/workspace --shm-size 32gb nvcr.io/nvidia/tensorflow:22.10.1-tf2-py3 bash
NVIDIA Digits:
docker run -d --gpus=all -p 5000:5000 --name digits -v $HOME:/workspace nvidia/digits
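-it: interactive tty
--rm: remove the container when it exits
-v: mount a volume
--name: name the container
-d: run as a daemon (detached, in the background)
-p: publish (map) a port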
docker exec [command]: execute a command inside the running container
Execute the BASH of a running container:
docker exec -it tf /bin/bash

Build your own image: nvidia-docker build
# Create a folder
mkdir build_tf
# Create the `Dockerfile` (the instruction file) inside the created folder
echo '# Add `seaborn, polars` to the Google TensorFlow image
FROM tensorflow/tensorflow:2.10.1-gpu-jupyter
MAINTAINER Chi-Hung Weng <chihung@honghutech.com>
RUN apt update && apt install -y vim
RUN pip3 install seaborn polars
' > build_tf/Dockerfile
# Enter the folder and build the image based on the created `Dockerfile`
cd build_tf && docker build -t="me/my_tf" .
# Run the created image
docker run -it --rm --gpus '"device=0,1"' -p 8888:8888 --name my_tf -v $HOME:/workspace --shm-size 32gb me/my_tf bash

Deep-learning images provided by NVIDIA GPU Cloud (NGC)
https://ngc.nvidia.com/
• Maintained by NVIDIA; updated approximately monthly.
• You must agree to the NVIDIA GPU CLOUD TERMS OF USE…

Deep-learning images provided by DockerHub
https://hub.docker.com/r/tensorflow/tensorflow/
• Maintained by the TensorFlow Community
Kubernetes (K8s)
• Released by Google; now hosted by the Cloud Native Computing Foundation.
• Can deploy, scale, manage containers across cluster nodes
• If you want to get started, visit here.

K8s for Deep Learning? Why?
• Scalable.
• GPU jobs can be scheduled!
• We have released KubeRun, a web UI for job scheduling.

KubeRun
Theory of ML
A Typical Deep Learning procedure
1. Choose a model that corresponds to your needs
ResNet? Transformer? Etc.? Check paperswithcode.com
2. Define the Loss Function
Linear regression: Least Squares Loss
Binary classification: Cross Entropy Loss
Multi-class classification: Cross Entropy Loss
3. Loss Optimization
Use something such as Stochastic Gradient Descent for loss minimization
4. Evaluate Model Performance
Accuracy, Precision, Recall, F1, AP, AUC, …
5. If you observe Overfitting
Say, consider adding an L1 or L2 penalty
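The loss functions named in step 2 and the penalty in step 5 can be sketched in a few lines. This is only an illustration under simple assumptions: the function names are hypothetical, the least squares loss is averaged over samples, and the cross entropy is written for a single sample.

```python
import math

def least_squares_loss(y_true, y_pred):
    """Mean of squared differences, as used for linear regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy_loss(true_class, probs, eps=1e-12):
    """Cross entropy for one sample: -log(probability given to the true class)."""
    return -math.log(max(probs[true_class], eps))

def l2_penalty(weights, lam):
    """L2 penalty lam * sum(w^2), added to the loss to discourage overfitting."""
    return lam * sum(w * w for w in weights)

print(least_squares_loss([1.0, 2.0], [1.0, 4.0]))  # 2.0
print(cross_entropy_loss(0, [0.5, 0.5]))           # log(2), about 0.693
```

The same `cross_entropy_loss` covers the binary and the multi-class case, since both only need the probability assigned to the correct class.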
All rights reserved; infringement will be prosecuted.
Computer Vision
The Convolutional Layer
Width, Height (W, H)
Padding (P): number of zeros padded on each side (left and right)
Stride (S): sliding step
Filter size (F): size of the filter
Depth (D): number of filters in this Conv layer
Output width: $W' = \frac{W + 2P - F}{S} + 1$
W=5, P=1, S=1, F=3: W' = (5 + 2*1 - 3)/1 + 1 = 5
W=5, P=1, S=2, F=3: W' = (5 + 2*1 - 3)/2 + 1 = 3
CS231n, Andrej Karpathy & Fei-Fei Li
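The output-size rule of this slide can be checked with a tiny helper. This is a sketch: `conv_output_size` is a hypothetical name, and raising an error when the filter does not tile the padded input evenly is an assumption made here for clarity (frameworks usually round down instead).

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width of a conv/pooling layer: (W + 2P - F) / S + 1."""
    span = w + 2 * p - f
    if span < 0 or span % s != 0:
        # Assumption of this sketch: reject sizes that do not divide evenly.
        raise ValueError("filter does not tile the padded input evenly")
    return span // s + 1

print(conv_output_size(5, 3, p=1, s=1))  # 5
print(conv_output_size(5, 3, p=1, s=2))  # 3
```

Both printed values reproduce the two worked examples with W=5, P=1, F=3.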
Computer Vision
The Convolutional Layer
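CS231n, Andrej Karpathy & Fei-Fei Li:
Typical-looking filters on the first CONV layer of a trained AlexNet
• Scanning the whole image with the same set of 3 x 3 filters means we assume that every 3 x 3 patch of the image shares common characteristics.
• Benefit: fewer values to learn, which makes the network easier to train.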
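To make the idea of scanning an image with one shared filter concrete, here is a minimal pure-Python sketch (no padding, stride 1). The function name `conv2d_valid` and the toy image and filter are illustrative assumptions, not taken from the slides' figures.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over a 2D image (no padding, stride 1)."""
    f = len(kernel)
    out_h = len(image) - f + 1
    out_w = len(image[0]) - f + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(f) for dj in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 4 x 4 image containing a diagonal line, and a 3 x 3 diagonal filter.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(conv2d_valid(image, diagonal))  # [[3, 0], [0, 3]]
```

The same nine weights are reused at every position, and the response is largest where the patch matches the filter's pattern.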
Computer Vision
The Max Pooling Layer
Width (W), Height (H)
Padding (P)
Stride (SW, SH)
Filter size (FW, FH)
$W' = \frac{W + 2P - F_W}{S_W} + 1$
$H' = \frac{H + 2P - F_H}{S_H} + 1$
Example: W=4, H=4, P=0, SW=SH=2, FW=FH=2, so W' = H' = (4 + 2*0 - 2)/2 + 1 = 2
CS231n, Andrej Karpathy & Fei-Fei Li
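A small pure-Python sketch of max pooling with the settings of this slide (2 x 2 window, stride 2, no padding). The function name `max_pool_2d` and the input values are illustrative assumptions.

```python
def max_pool_2d(image, f=2, s=2):
    """Max pooling without padding over a 2D list of numbers."""
    h, w = len(image), len(image[0])
    out_h = (h - f) // s + 1
    out_w = (w - f) // s + 1
    return [
        [
            max(image[i * s + di][j * s + dj]
                for di in range(f) for dj in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

image = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool_2d(image))  # [[6, 8], [3, 4]]
```

A 4 x 4 input becomes a 2 x 2 output, in line with the output-size formula.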
Computer Vision
Test: Convolutional Layer I/O
Width (W), Height (H), Padding (P), Stride (SW, SH), Filter size (FW, FH), Depth (D)
Width=5, Height=5, Filter size=3X3, Stride=1X1, Padding=0, Depth=96
$W' = \frac{W + 2P - F_W}{S_W} + 1$
$H' = \frac{H + 2P - F_H}{S_H} + 1$
Input: #samples X Height X Width X Channel = 10X5X5X3
Output: #samples X Height X Width X Depth = 10X3X3X96
# Imports assumed here: the Keras API bundled with TensorFlow
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

randData = np.random.normal(0, 1, (10, 5, 5, 3))  # normally distributed random input: 10 samples, each 3D
model = Sequential()
model.add(Conv2D(filters=96, kernel_size=(3, 3),
                 strides=(1, 1),
                 padding='valid',
                 input_shape=(5, 5, 3)))
print(model.predict(randData).shape)  # check the shape of the output: (10, 3, 3, 96)

Computer Vision
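Test: Max Pooling Layer I/O
Width (W), Height (H), Padding (P), Stride (SW, SH), Pool size (FW, FH)
Width=4, Height=4, Pool size=2X2, Stride=2X2, Padding=0, Depth=3
$W' = \frac{W + 2P - F_W}{S_W} + 1$
$H' = \frac{H + 2P - F_H}{S_H} + 1$
Input: #samples X Height X Width X Depth = 10X4X4X3
Output: #samples X Height X Width X Depth = 10X2X2X3
# Imports assumed here: the Keras API bundled with TensorFlow
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import MaxPooling2D

randData = np.random.normal(0, 1, (10, 4, 4, 3))  # normally distributed random input: 10 samples, each 3D
model = Sequential()
model.add(MaxPooling2D(pool_size=(2, 2),
                       strides=(2, 2),
                       padding='valid',
                       input_shape=(4, 4, 3)))
print(model.predict(randData).shape)  # check the shape of the output: (10, 2, 2, 3)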

Computer Vision
Concept: Receptive Field
How large a field of view can a filter perceive?
Image → 3 X 3 Conv (stride=1) → Feature map 1 → 2 X 2 Conv (stride=1) → Feature map 2
How much of the original image can this filter see?
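The receptive field of a stack of Conv layers can be computed layer by layer. The sketch below is a standard recurrence written under the assumption of no dilation; `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one unit after a stack of conv layers.

    layers: list of (filter_size, stride) tuples, ordered from input to output.
    """
    rf, jump = 1, 1  # jump: distance in input pixels between adjacent units
    for f, s in layers:
        rf += (f - 1) * jump
        jump *= s
    return rf

print(receptive_field([(3, 1)]))          # 3
print(receptive_field([(3, 1), (2, 1)]))  # 4
```

For the stack on the slide (3 X 3 Conv followed by 2 X 2 Conv, both with stride 1), one unit of Feature map 2 sees a 4 x 4 patch of the original image.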
Computer Vision
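WHAT HAVE THE CONVOLUTIONAL LAYERS LEARNT?
Appearance of the first-layer filters
Second-layer filter information (projected back to image space)
Fourth-layer filter information (projected back to image space)
Zeiler and Fergus 2013
Later layers can represent increasingly complex patterns / the machine learns to keep the features that are useful for classification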
Computer Vision
AlexNet
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton (2012)
from a slide made by Bickson
Computer Vision
The Deeper, the Better?
https://medium.com/@Lidinwise/the-revolution-of-depth-facf174924f5
Computer Vision
VGGNet
Karen Simonyan, Andrew Zisserman (2014)
http://www.robots.ox.ac.uk/~vgg/
size: 224
3x3 Conv, 64
3x3 Conv, 64
max pool
size: 112
3x3 Conv, 128
3x3 Conv, 128
max pool
3x3 Conv, 256
3x3 Conv, 256
max pool
3x3 Conv, 512
3x3 Conv, 512
max pool
3x3 Conv, 512
3x3 Conv, 512
max pool
size: 7
Dense, 4096
Dense, 4096
Dense, 1000
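The sizes annotated on the diagram (224, 112, ..., 7) follow from halving the feature map at every max pool. The sketch below assumes that each 3x3 Conv preserves the spatial size and that each max pool is 2x2 with stride 2; the slide does not state these settings, and `vgg_feature_sizes` is a hypothetical helper.

```python
def vgg_feature_sizes(input_size=224, num_pools=5):
    """Spatial size after each max pool, assuming 2x2 pools with stride 2
    and size-preserving 3x3 convolutions in between."""
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // 2)
    return sizes

print(vgg_feature_sizes())  # [224, 112, 56, 28, 14, 7]
```

The first, second, and last entries match the "size: 224", "size: 112", and "size: 7" labels of the diagram.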
Computer Vision
ResNet
Kaiming He et al. (2015)
Before ResNet, we had an issue:
ResNet v1: Deep Residual Learning for Image Recognition (https://arxiv.org/abs/1512.03385)
Computer Vision
ResNet
Residual block: x → Conv 3x3 → Conv 3x3 → F(x); the shortcut adds x back, giving F(x) + x
• Allows incremental learning
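The residual idea can be sketched without any framework. In the snippet below, `f` is a stand-in for the two stacked 3x3 Conv layers; both `residual_block` and `zero_residual` are hypothetical names used only for illustration.

```python
def residual_block(x, f):
    """Return F(x) + x, where f computes a residual F(x) of the same length as x."""
    fx = f(x)
    return [a + b for a, b in zip(fx, x)]

def zero_residual(x):
    """Stand-in for stacked layers that have learnt F(x) = 0."""
    return [0.0 for _ in x]

# With F(x) = 0 the block reduces to the identity mapping.
print(residual_block([1.0, 2.0, 3.0], zero_residual))  # [1.0, 2.0, 3.0]
```

Since the block only has to learn the difference from the identity, adding more blocks need not make the network worse, which is one way to read "incremental learning".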
DEMO
DISTRIBUTED TRAINING
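Single-GPU model training
GPU: Model; inputs $X^{(1)}, X^{(2)}, X^{(3)}, X^{(4)}$
1. Forward pass: obtain the loss of every sample:
$L = l^{(1)} + l^{(2)} + l^{(3)} + l^{(4)}$
2. Backpropagate the total loss so that the weights w receive their gradient:
$g_w = \frac{\partial L}{\partial w} = \frac{\partial l^{(1)}}{\partial w} + \frac{\partial l^{(2)}}{\partial w} + \frac{\partial l^{(3)}}{\partial w} + \frac{\partial l^{(4)}}{\partial w}$
3. Update w using its gradient:
$w := w - \eta g_w$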
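The three steps can be traced on a toy problem. This sketch assumes a one-parameter model $w x$ with a squared-error loss per sample; the data, the learning rate, and the function names are illustrative assumptions.

```python
def sample_grad(w, x, y):
    """d/dw of the per-sample loss l = (w*x - y)^2."""
    return 2.0 * (w * x - y) * x

def sgd_step(w, batch, lr):
    """One update w := w - lr * g_w, where g_w sums the per-sample gradients."""
    g_w = sum(sample_grad(w, x, y) for x, y in batch)
    return w - lr * g_w

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # samples of y = 2x
w = 0.0
for _ in range(50):
    w = sgd_step(w, batch, lr=0.01)
print(round(w, 4))  # 2.0
```

Each call to `sgd_step` sums the four per-sample gradients before updating, exactly as in the expression for $g_w$.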
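Multi-GPU model training
CPU1: $X^{(1)}, X^{(2)}, X^{(3)}, X^{(4)}$
1. Each GPU runs the forward pass and obtains the losses of its own samples:
GPU1 (Model): $X^{(1)}, X^{(2)}$ gives $l^{(1)}, l^{(2)}$
GPU2 (Model): $X^{(3)}, X^{(4)}$ gives $l^{(3)}, l^{(4)}$
2. Each GPU runs backpropagation and obtains its own gradient:
GPU1: $\frac{\partial l^{(1)}}{\partial w} + \frac{\partial l^{(2)}}{\partial w}$
GPU2: $\frac{\partial l^{(3)}}{\partial w} + \frac{\partial l^{(4)}}{\partial w}$
3. Use allreduce so that every GPU obtains the complete gradient of w:
GPU1 and GPU2: $g_w = \frac{\partial l^{(1)}}{\partial w} + \frac{\partial l^{(2)}}{\partial w} + \frac{\partial l^{(3)}}{\partial w} + \frac{\partial l^{(4)}}{\partial w}$
4. Update w using its gradient:
$w := w - \eta g_w$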
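The data-parallel scheme can be simulated on one machine. In this sketch the two "GPUs" are plain Python lists, `allreduce_sum` is a toy stand-in for a real allreduce (for example the one NCCL or Horovod provides), and the model is an assumed one-parameter model $w x$ with squared-error loss.

```python
def local_gradient(w, shard):
    """Sum of per-sample gradients of l = (w*x - y)^2 over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)

def allreduce_sum(values):
    """Toy allreduce: every worker ends up with the sum of all workers' values."""
    total = sum(values)
    return [total for _ in values]

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [batch[:2], batch[2:]]  # samples for "GPU1" and "GPU2"
w, lr = 0.0, 0.01

local = [local_gradient(w, shard) for shard in shards]  # step 2
reduced = allreduce_sum(local)                          # step 3
new_weights = [w - lr * g for g in reduced]             # step 4
print(local)    # [-20.0, -100.0]
print(reduced)  # [-120.0, -120.0]
print(new_weights[0] == new_weights[1])  # True: both replicas stay in sync
```

After the allreduce both workers hold the same gradient as a single GPU processing all four samples, so the replicas apply identical updates.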
Quick DEMO
• https://github.com/horovod/horovod
• https://github.com/horovod/horovod/blob/master/examples/tensorflow2/tensorflow2_keras_mnist.py
