JupyterLab 03 Optimizations
3. Optimizations
Currently, the model is experiencing the checkerboard problem.
Thankfully, we have a few tricks up our generated T-shirt sleeve to resolve this
and generally improve the performance of the model.
Learning Objectives
The goals of this notebook are to:
import torch
import torch.nn as nn
import other_utils, ddpm_utils
from einops.layers.torch import Rearrange

# Visualization tools
import matplotlib.pyplot as plt
from torchview import draw_graph
import graphviz
from IPython.display import Image
dli-69a8471a1f06-53ed30.aws.labs.courses.nvidia.com/lab/lab 1/16
28/04/2024 10:14 03_Optimizations
IMG_SIZE = 16
IMG_CH = 1
BATCH_SIZE = 128
data, dataloader = other_utils.load_transformed_fashionMNIST(IMG_SIZE, BATCH_SIZE)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 26421880/26421880 [00:01<00:00, 13882102.94it/s]
Extracting ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 29515/29515 [00:00<00:00, 329948.99it/s]
Extracting ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 4422102/4422102 [00:00<00:00, 6062736.92it/s]
Extracting ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 5148/5148 [00:00<00:00, 12205922.55it/s]
Extracting ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw
In [2]: nrows = 10
ncols = 15

T = nrows * ncols
B_start = 0.0001
B_end = 0.02
B = torch.linspace(B_start, B_end, T).to(device)
ddpm = ddpm_utils.DDPM(B, device)
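The DDPM class comes from the course's ddpm_utils, so its internals aren't shown here. As a quick sanity check on the schedule itself (a sketch under the standard DDPM formulation, independent of ddpm_utils), the betas above imply a cumulative signal fraction ᾱ_t that decays toward zero as t approaches T:

```python
import torch

T = 150                          # nrows * ncols from above
B = torch.linspace(0.0001, 0.02, T)
a = 1.0 - B                      # alpha_t
a_bar = torch.cumprod(a, dim=0)  # fraction of original signal kept at step t

# At t=0 nearly all signal survives; by t=T-1 most of it has become noise.
print(a_bar[0].item(), a_bar[-1].item())
```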
Since color images have multiple color channels, normalization can have an
interesting impact on the output colors of generated images. Try experimenting
to see the effect!
Learn more about normalization techniques in this blog post by Aakash Bindal.
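To see the channel effect concretely, here is a small sketch (the specific layer and grouping choices are illustrative assumptions, not the notebook's exact configuration): normalizing each channel in its own group equalizes channel statistics, while normalizing all channels together preserves their relative scales, and that difference is exactly what can shift colors in generated images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# A fake 3-channel "image" whose channels have very different scales
x = torch.randn(1, 3, 8, 8) * torch.tensor([1.0, 5.0, 0.1]).view(1, 3, 1, 1)

per_channel = nn.GroupNorm(3, 3, affine=False)   # one group per channel
all_together = nn.GroupNorm(1, 3, affine=False)  # all channels in one group

y1 = per_channel(x)   # each channel normalized independently -> stds near 1
y2 = all_together(x)  # joint normalization -> relative channel scales preserved
```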
3.1.2 GELU
ReLU is a popular choice of activation function because it is computationally
cheap and its gradient is easy to calculate. Unfortunately, it isn't perfect: when
the bias term becomes large and negative, a ReLU neuron "dies" because both its
output and its gradient are zero.
At a slight cost in computational power, GELU seeks to rectify the rectified linear
unit by mimicking the shape of the ReLU function while avoiding a zero gradient.
In this small example with FashionMNIST, it is unlikely we will see any dead
neurons. However, the larger a model gets, the more likely it can face the dying
ReLU phenomenon.
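To make the "dead neuron" point concrete, here is a tiny sketch comparing the gradients that flow through ReLU and GELU at a strongly negative pre-activation:

```python
import torch
import torch.nn as nn

# A strongly negative pre-activation, as if the bias had drifted negative
x_relu = torch.tensor([-3.0], requires_grad=True)
nn.ReLU()(x_relu).backward()

x_gelu = torch.tensor([-3.0], requires_grad=True)
nn.GELU()(x_gelu).backward()

print(x_relu.grad)  # tensor([0.]) — no learning signal flows back
print(x_gelu.grad)  # small but nonzero — the neuron can still recover
```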
class GELUConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, group_size):
        super().__init__()
        layers = [
            nn.Conv2d(in_ch, out_ch, 3, 1, 1),
            nn.GroupNorm(group_size, out_ch),
            nn.GELU()
        ]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)
Enter the einops library and the Rearrange layer. We can assign each layer a
variable and use that to rearrange our values. Additionally, we can use
parentheses () to identify a set of variables that are multiplied together.
We also have p1 and p2 values that are both equal to 2 . The left portion of
the equation before the arrow says "split the height and width dimensions
in half." The right portion of the equation after the arrow says "stack the split
dimensions along the channel dimension."
The code block below sets up a test_image to practice on. Try swapping h
with p1 on the left side of the arrow. What happens? How about when w and
p2 are swapped? What happens when p1 is set to 3 instead of 2 ?
In [3]: test_image = [
    [
        [
            [ 1,  2,  3,  4,  5,  6],
            [ 7,  8,  9, 10, 11, 12],
            [13, 14, 15, 16, 17, 18],
            [19, 20, 21, 22, 23, 24],
            [25, 26, 27, 28, 29, 30],
            [31, 32, 33, 34, 35, 36]
        ]
    ]
]
test_image = torch.tensor(test_image)
print(test_image)
tensor([[[[ 1,  2,  3,  4,  5,  6],
          [ 7,  8,  9, 10, 11, 12],
          [13, 14, 15, 16, 17, 18],
          [19, 20, 21, 22, 23, 24],
          [25, 26, 27, 28, 29, 30],
          [31, 32, 33, 34, 35, 36]]]])
In [4]: Rearrange("b c (h p1) (w p2) -> b (c p1 p2) h w", p1=2, p2=2)(test_image)
Out[4]: tensor([[[[ 1, 3, 5],
[13, 15, 17],
[25, 27, 29]],
[[ 2, 4, 6],
[14, 16, 18],
[26, 28, 30]],
[[ 7, 9, 11],
[19, 21, 23],
[31, 33, 35]],
[[ 8, 10, 12],
[20, 22, 24],
[32, 34, 36]]]])
Next, we can pass this through our GELUConvBlock to let the neural network
decide how it wants to weigh the values within our "pool". Notice the
4*in_chs as a parameter of the GELUConvBlock ? This is because the
channel dimension is now p1 * p2 larger.
TODO: There's an input to the UpBlock that separates it from the
DownBlock . What was it again?
Before diffusion models, this was a problem that plagued natural language
processing. For long dialogues, how can we capture where we are? The goal was
to find a way to uniquely represent a large range of discrete numbers with a
small number of continuous numbers. Using a single float is ineffective since the
neural network will interpret timesteps as continuous rather than discrete.
Researchers ultimately settled on a sum of sines and cosines.
For an excellent explanation of why this works and how this technique was likely
developed, please refer to Jonathan Kernes' Master Positional Encoding.
class SinusoidalPositionEmbedBlock(nn.Module):
def __init__(self, dim):
super().__init__()
self.dim = dim
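The class body is cut off above; a common forward pass for this kind of block, sketched here following the standard transformer-style sinusoidal encoding (the notebook's exact implementation may differ), is:

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionEmbedBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, time):
        # time: (batch,) tensor of timesteps, e.g. scaled to [0, 1]
        half_dim = self.dim // 2
        scale = math.log(10000) / (half_dim - 1)
        freqs = torch.exp(torch.arange(half_dim, device=time.device) * -scale)
        args = time[:, None] * freqs[None, :]               # (batch, half_dim)
        return torch.cat((args.sin(), args.cos()), dim=-1)  # (batch, dim)

emb = SinusoidalPositionEmbedBlock(dim=32)
out = emb(torch.rand(4))
print(out.shape)  # torch.Size([4, 32])
```

Each timestep maps to a unique, smoothly varying vector of bounded values, which is exactly the "small number of continuous numbers" the text describes.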
It looks like the one below has been overrun with FIXMEs. Can you remember
how it was supposed to look?
nn.Linear(emb_dim, FIXME),
nn.Unflatten(1, (FIXME, 1, 1))
]
self.model = nn.Sequential(*layers)
Below is the updated model. Notice the change at the very last line? Another
skip connection has been added from the output of our ResidualConvBlock
to the final self.out block. This connection is surprisingly powerful, and of all
the changes listed above, had the biggest influence on the checkerboard
problem for this dataset.
        # Initial convolution
        self.down0 = ResidualConvBlock(img_chs, down_chs[0], small_group_size)

        # Downsample
        self.down1 = DownBlock(down_chs[0], down_chs[1], big_group_size)
        self.down2 = DownBlock(down_chs[1], down_chs[2], big_group_size)
        self.to_vec = nn.Sequential(nn.Flatten(), nn.GELU())

        # Embeddings
        self.dense_emb = nn.Sequential(
            nn.Linear(down_chs[2]*latent_image_size**2, down_chs[1]),
            nn.ReLU(),
            nn.Linear(down_chs[1], down_chs[1]),
            nn.ReLU(),
            nn.Linear(down_chs[1], down_chs[2]*latent_image_size**2),
            nn.ReLU()
        )

        # Upsample
        self.up0 = nn.Sequential(
            nn.Unflatten(1, (up_chs[0], latent_image_size, latent_image_size)),
            GELUConvBlock(up_chs[0], up_chs[0], big_group_size)  # New
        )
        self.up1 = UpBlock(up_chs[0], up_chs[1], big_group_size)  # New
        self.up2 = UpBlock(up_chs[1], up_chs[2], big_group_size)  # New

        self.out = nn.Sequential(
            nn.Conv2d(2 * up_chs[-1], up_chs[-1], 3, 1, 1),
            nn.GroupNorm(small_group_size, up_chs[-1]),  # New
            nn.ReLU(),
            nn.Conv2d(up_chs[-1], img_chs, 3, 1, 1)
        )

        latent_vec = self.dense_emb(latent_vec)
        t = t.float() / T  # Convert from [0, T] to [0, 1]
        t = self.sinusoidaltime(t)  # New
        temb_1 = self.temb_1(t)
        temb_2 = self.temb_2(t)

        up0 = self.up0(latent_vec)
        up1 = self.up1(up0 + temb_1, down2)
        up2 = self.up2(up1 + temb_2, down1)
        return self.out(torch.cat((up2, down0), 1))  # New
Finally, it's time to train the model. Let's see if all these changes made a
difference.
model.train()
for epoch in range(epochs):
    for step, batch in enumerate(dataloader):
        optimizer.zero_grad()
        x = batch[0].to(device)
        t = torch.randint(0, T, (x.shape[0],), device=device)  # random timesteps
        loss = ddpm.get_loss(model, x, t)
        loss.backward()
        optimizer.step()
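ddpm.get_loss comes from the course's ddpm_utils, so its exact body is not shown here; under the standard DDPM objective it is typically the mean-squared error between the true noise and the model's prediction, as in this sketch (an assumption, not the utility's verbatim code):

```python
import torch
import torch.nn.functional as F

def ddpm_loss_sketch(model, x0, t, a_bar):
    """Noise x0 to step t with the closed-form q(x_t | x_0), then score the model."""
    noise = torch.randn_like(x0)
    ab = a_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * noise  # one-shot forward diffusion
    return F.mse_loss(model(x_t, t), noise)
```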
How about a closer look? Can you recognize a shoe, a purse, or a shirt?
In [15]: model.eval()
plt.figure(figsize=(8,8))
ncols = 3 # Should evenly divide T
for _ in range(10):
ddpm.sample_images(model, IMG_CH, IMG_SIZE, ncols)
3.5 Next
If you don't see a particular class such as a shoe or a shirt, try running the above
code block again. Currently, our model does not accept category input, so the
user can't define what kind of output they would like. Where's the fun in that?
In the next notebook, we will finally add a way for users to control the model!
In [ ]: import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)