
Assignment No 2

SUBMITTED BY: ALEEZA ANJUM


SUBMITTED TO: DR. JAVED IQBAL
REG NO: 20-CS-101
SECTION: ALPHA
ADVANCED TOPICS IN COMPUTER SCIENCES

10/10/2023

DEPARTMENT OF COMPUTER SCIENCE


UNIVERSITY OF ENGINEERING & TECHNOLOGY TAXILA
Question: You have to provide a critical comparison of the following. The comparison should
be based on their architectures from their base research papers, parameter details, and the
pros/cons of using these models.
1. AlexNet
2. GoogLeNet
3. VGG 16/19
4. ResNet 18/50/101/152
5. DenseNet
6. SqueezeNet
Also provide an implementation of these deep learning models in Python on benchmark
datasets.

Ans:
Comparison of AlexNet and GoogLeNet:

AlexNet:
AlexNet, introduced by Krizhevsky et al. in the paper "ImageNet Classification with Deep Convolutional Neural Networks," represents a significant milestone in the development of convolutional neural networks (CNNs) for image classification tasks. Here are some key points about AlexNet based on the information provided in the paper:

Objective: The primary goal of AlexNet was to participate in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) and achieve improved performance on large-scale image classification tasks.

Dataset: The model was trained on the ImageNet dataset, which contains over 15 million labeled high-resolution images belonging to approximately 22,000 categories. The dataset was used in the ILSVRC competitions.

Architecture:
AlexNet consists of a large, deep convolutional neural network with a total of eight layers: five convolutional layers and three fully-connected layers.
The convolutional layers are followed by max-pooling layers, and the final layer is a 1000-way softmax for classification.
The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3.
Rectified Linear Units (ReLUs) were used as the activation function in the network.
Local Response Normalization (LRN) was applied after the first and second convolutional layers.
Overlapping pooling was employed in the pooling layers.
Dropout was applied in the first two fully-connected layers to prevent overfitting.

Training Techniques:
Stochastic Gradient Descent (SGD) was used for training with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005.
Data augmentation was applied to artificially enlarge the dataset, involving image translations and horizontal reflections.
Intensity alterations of the RGB channels in training images were performed using PCA on the set of RGB pixel values.

Results:
AlexNet achieved top-1 and top-5 error rates of 37.5% and 17.0% on the ILSVRC-2010 test set, outperforming the previous state-of-the-art methods.
In the ILSVRC-2012 competition, AlexNet achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Impact: The success of AlexNet demonstrated the effectiveness of deep convolutional neural networks for image classification tasks and paved the way for subsequent advancements in deep learning, particularly in computer vision.

In summary, AlexNet introduced several key architectural and training innovations that significantly improved the accuracy of image classification models, setting a new standard in the field.

GoogLeNet:
GoogLeNet, introduced in the ILSVRC 2014 competition, represents a significant advancement in neural network architecture, particularly with the use of the Inception architecture. The key aspects of GoogLeNet include:

Inception Architecture:
Variants: GoogLeNet utilized different versions of the Inception architecture, including one deeper and wider Inception network, although the latter offered only marginal improvements.
Activation Function: All convolutions, including those inside Inception modules, use rectified linear activation.

Network Structure:
Receptive Field: The network's receptive field is 224x224 in RGB color space with zero mean.
Reduction Layers: The network incorporates "reduce" layers with 1x1 filters before the 3x3 and 5x5 convolutions.
Efficiency: The network is designed for computational efficiency and practicality, ensuring it can run on devices with limited computational resources and a low memory footprint.

Architecture Details:
Depth: The network is 22 layers deep (or 27 layers counting pooling).
Auxiliary Classifiers: To address the vanishing gradient problem, auxiliary classifiers are added to intermediate layers. These classifiers take the form of smaller convolutional networks connected to the output of specific Inception modules.
Training: Asynchronous stochastic gradient descent with 0.9 momentum was used for training. The learning rate schedule decreased the learning rate by 4% every 8 epochs.

Training Methodology:
DistBelief: GoogLeNet was trained using the DistBelief distributed machine learning system.
Training Time: A rough estimate suggests the network could be trained to convergence on a few high-end GPUs within a week, with memory usage being the main limiting factor.
Image Sampling: Various image-patch sampling methods were employed during training, and photometric distortions were used to combat overfitting.

Classification Challenge Results:
ILSVRC 2014: GoogLeNet achieved a top-5 error rate of 6.67%, ranking first in the ILSVRC 2014 classification challenge.
Ensemble Prediction: Seven versions of the same GoogLeNet model were independently trained, and ensemble prediction was performed, contributing to improved performance.

Detection Challenge Results:
ILSVRC 2014 Detection Challenge: GoogLeNet also participated in the detection task, achieving a mean average precision (mAP) of 43.9% without using bounding-box regression.
Ensemble for Detection: An ensemble of 6 GoogLeNets was used for classification in the detection task.

In summary, GoogLeNet's success lies in its innovative Inception architecture, efficient design, and strong performance in both image classification and object detection during the ILSVRC 2014 competition.
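
To make the "reduce layers before 3x3 and 5x5 convolutions" idea concrete, here is a minimal sketch of an Inception-style module in PyTorch. The branch channel counts are illustrative assumptions, not the exact values used in the GoogLeNet paper:

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Minimal Inception-style module: parallel 1x1, 3x3, 5x5, and pooling
    branches, with 1x1 'reduce' convolutions before the expensive filters."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, 64, 1), nn.ReLU(inplace=True))
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 96, 1), nn.ReLU(inplace=True),    # 1x1 reduce
            nn.Conv2d(96, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, 1), nn.ReLU(inplace=True),    # 1x1 reduce
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(inplace=True))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # concatenate all branches along the channel dimension
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)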

Comparison between VGG 16/19 and ResNet 18/50/101/152:

VGG:
VGG, developed by the Visual Geometry Group at the University of Oxford, is a significant neural network architecture known for its simplicity and effectiveness in image classification tasks, particularly during the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

Architecture:
VGG's architecture is characterized by its deep stack of convolutional layers, predominantly using 3x3 convolutional filters for consistency. The network's depth, with versions like VGG16 and VGG19, contributes to its ability to capture intricate features from input images.

Key Features:
a. Consistency in Convolutional Layers: VGG employs 3x3 convolutional filters consistently, promoting spatial hierarchy learning.
b. Deep Stacking: The deep architecture, with up to 19 layers, enables the network to learn complex hierarchical features. However, this depth introduces challenges in training and computational requirements.
c. Pooling Layers: Max-pooling layers are integrated for downsampling feature maps, reducing spatial dimensions while retaining crucial information.

Versions:
a. VGG16: Comprising 16 weight layers, including 13 convolutional and 3 fully connected layers, VGG16 gained popularity for its competitive performance in image classification.
b. VGG19: Extending VGG16, VGG19 incorporates three additional convolutional layers for a deeper representation suitable for more complex tasks.

Applications:
VGG finds applications in diverse computer vision tasks, including image classification, object detection, segmentation, and feature extraction.

Challenges:
VGG faces challenges related to computational intensity and the difficulty of training very deep networks. These challenges spurred subsequent architectures like ResNet to address these issues effectively.

Legacy:
While newer architectures have surpassed VGG in terms of performance and efficiency, its simplicity and effectiveness have solidified its place in the history of deep learning, influencing subsequent advancements in convolutional neural networks.

ResNet:
Abstract:
The paper introduces a residual learning framework to address the challenge of training deeper neural networks. Unlike traditional approaches, the layers are reformulated as learning residual functions with reference to the layer inputs. The authors provide empirical evidence demonstrating that these residual networks are easier to optimize and achieve improved accuracy with increased depth. The proposed method is evaluated on the ImageNet dataset, where residual nets with up to 152 layers outperform shallower networks and win the ILSVRC 2015 classification task.

Introduction:
Deep convolutional neural networks have led to breakthroughs in image classification, and the importance of network depth is highlighted. The question of whether learning better networks is as easy as stacking more layers is raised, and the paper addresses the degradation problem observed with deeper networks.

Related Work:
Residual representations are discussed, citing examples such as VLAD and the Fisher Vector in image recognition, along with the Multigrid method for solving partial differential equations. A comparison is made with "highway networks," which provide shortcut connections with gating functions.

Deep Residual Learning:
The paper introduces the residual learning framework. The degradation problem is addressed by letting the stacked layers fit a residual mapping: formally, the mapping F(x) + x is introduced and realized using feedforward neural networks with "shortcut connections." Identity mapping by shortcuts and projection shortcuts are discussed, and network architectures for plain and residual networks are presented.

Experiments:
Evaluation on the ImageNet 2012 classification dataset is conducted. Plain networks and ResNets with different depths (18, 34, 50, 101, 152 layers) are compared, the effectiveness of identity vs. projection shortcuts is studied, and deeper bottleneck architectures are introduced.

Comparative Analysis: The paper includes a comprehensive comparative analysis with the state-of-the-art methods existing at the time. ResNets with varying depths (18, 34, 50, 101, 152 layers) are compared against plain networks. The results show that as network depth increases, ResNets consistently outperform plain networks. Notably, ResNets with 152 layers achieve superior performance on the ImageNet 2012 classification dataset, winning the ILSVRC 2015 classification task.

Improved Optimization and Accuracy: The empirical evidence presented in the experiments emphasizes that, contrary to conventional thinking, increasing network depth need not lead to optimization difficulties. The introduction of residual learning allows very deep networks to be trained, providing both enhanced accuracy and computational efficiency.
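
The residual mapping F(x) + x described above can be sketched in a few lines of PyTorch. This is a simplified illustration of the shortcut-connection idea, assuming the input and output shapes match (the full blocks, including downsampling shortcuts, appear in the Part-2 implementation):

import torch.nn as nn

class SimpleResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output F(x) is added to the input x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first layer of F(x)
        out = self.bn2(self.conv2(out))           # second layer of F(x)
        return self.relu(out + x)                 # F(x) + x via the identity shortcut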

Comparison between DenseNet and SqueezeNet:

DenseNet:
Abstract:
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output.

Introduction:
As convolutional neural networks (CNNs) become increasingly deep, a new research problem emerges: the vanishing and "washing out" of information as it passes through many layers. Various approaches, including ResNets and Highway Networks, have been proposed to address this issue. The paper proposes an architecture that distills these insights into a simple connectivity pattern, connecting all layers directly to ensure maximum information flow.

Related Work: The exploration of network architectures has been a part of neural network research since their initial discovery. Previous work, such as Highway Networks and ResNets, addressed the challenges of training deep networks.

DenseNets:
Consider a single image passed through a convolutional network with L layers. DenseNet introduces direct connections from any layer to all subsequent layers, resulting in a dense connectivity pattern.

Experiments: DenseNets are evaluated on benchmark datasets, including CIFAR-10, CIFAR-100, SVHN, and ImageNet. The models demonstrate better parameter efficiency, improved information flow, and reduced overfitting on tasks with smaller training set sizes.

Datasets: Experiments are conducted on the CIFAR and SVHN datasets, adopting standard data augmentation schemes. For ImageNet, a DenseNet-BC structure with 4 dense blocks is used.

Training: All networks are trained using stochastic gradient descent (SGD), with batch sizes and learning rates determined per dataset, together with weight decay and Nesterov momentum. Memory-efficient implementation techniques are employed to reduce GPU memory consumption.

Conclusion: DenseNets provide a new convolutional network architecture with dense connectivity, allowing for feature reuse and improved information flow. The models achieve state-of-the-art results on various datasets, demonstrating their effectiveness in terms of accuracy, parameter efficiency, and reduced overfitting. Further exploration of feature transfer with DenseNets is planned as future work.

SqueezeNet:
Abstract:
SqueezeNet presents a deep neural network architecture that achieves high accuracy on image classification tasks with significantly fewer parameters than traditional models. The core innovation is a "squeeze" layer that reduces the number of input channels before feeding into the expensive 3x3 convolutional layers. This design minimizes model size while maintaining performance.

Introduction:
SqueezeNet addresses the challenge of deploying deep neural networks on resource-constrained devices by reducing model size without compromising accuracy. The architecture focuses on efficient use of parameters while preserving representational power, achieved by combining 1x1 convolutions with the "fire" module.

Fire Module:
The key building block in SqueezeNet is the fire module, which combines a squeeze layer (1x1 convolution) to reduce the number of input channels with expand layers (1x1 and 3x3 convolutions) to capture complex patterns. This design aims to strike a balance between computational efficiency and expressive power, ensuring the network's effectiveness in various applications.

SqueezeNet Architecture:
SqueezeNet comprises multiple fire modules stacked sequentially. The architecture incorporates global average pooling and dropout layers to enhance generalization, and skip connections facilitate gradient flow, aiding in training deeper networks. The resulting model is lightweight, making it suitable for real-time applications and scenarios with limited computational resources.

Performance Evaluation:
SqueezeNet achieves competitive accuracy on benchmark datasets like ImageNet with a significantly reduced number of parameters compared to contemporary models. The model's efficiency makes it a compelling choice for applications where computational resources are constrained, such as mobile and embedded systems.

Pros and Cons:
Pros:
Compact model size, suitable for deployment on resource-constrained devices.
Maintains competitive accuracy compared to larger architectures.
Efficient use of parameters through the innovative fire module.
Cons:
May struggle to capture highly intricate patterns compared to larger and more complex models.
Sensitive to hyperparameter choices, requiring careful tuning for optimal performance.
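
To illustrate the two building blocks discussed above, here is a minimal PyTorch sketch of a DenseNet-style layer (which concatenates its input with the new feature maps it produces) and a SqueezeNet fire module (squeeze then expand). The channel counts are illustrative assumptions:

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """DenseNet-style layer: appends growth_rate new feature maps to the
    input, so every later layer sees all earlier features."""
    def __init__(self, in_ch, growth_rate=32):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_ch)
        self.conv = nn.Conv2d(in_ch, growth_rate, 3, padding=1, bias=False)

    def forward(self, x):
        new_features = self.conv(torch.relu(self.bn(x)))
        return torch.cat([x, new_features], dim=1)  # dense connectivity

class Fire(nn.Module):
    """SqueezeNet fire module: 1x1 squeeze followed by parallel 1x1/3x3 expand."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, 1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))  # reduce channels before the 3x3 filters
        return torch.cat([self.relu(self.expand1(s)),
                          self.relu(self.expand3(s))], dim=1)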

Part-2:
AlexNet CIFAR-10 Classifier:
This Python code initializes and configures the PyTorch environment for GPU (CUDA) if
available, ensuring deterministic behavior for CUDA operations using the cuDNN library. It also
includes necessary imports for handling data, defining neural network architectures, and setting
up data transformations for image datasets.
import os
import time
import random

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Subset

from torchvision import datasets


from torchvision import transforms

import matplotlib.pyplot as plt


from PIL import Image

if torch.cuda.is_available():
torch.backends.cudnn.deterministic = True

Model Settings

Setting a random seed


def set_all_seeds(seed):
os.environ["PL_GLOBAL_SEED"] = str(seed)
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

This code defines a function, set_all_seeds, that takes a seed as input and sets the random seed
for various libraries to ensure reproducibility in machine learning experiments. It covers global
seed setting for PyTorch, NumPy, Python's built-in random module, and PyTorch's CUDA
operations on GPU if available.
Setting cuDNN and PyTorch algorithmic behavior to deterministic
def set_deterministic():
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
    # torch.set_deterministic was the pre-1.8 API; newer PyTorch versions
    # use torch.use_deterministic_algorithms instead
    torch.use_deterministic_algorithms(True)

This code snippet sets PyTorch to use deterministic algorithms for CUDA operations if a GPU is
available, ensuring reproducibility in computations by turning off CUDA's nondeterministic
features.
##########################
### SETTINGS
##########################

# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.0001
BATCH_SIZE = 256
NUM_EPOCHS = 40

# Architecture
NUM_CLASSES = 10

# Other
DEVICE = "cuda:0"

set_all_seeds(RANDOM_SEED)

# Deterministic behavior not yet supported by AdaptiveAvgPool2d
# set_deterministic()

Import utility functions

import sys

sys.path.insert(0, "..") # to include ../helper_evaluate.py etc.

from helper_evaluate import compute_accuracy

from helper_data import get_dataloaders_cifar10

from helper_train import train_classifier_simple_v1


We import utility functions for evaluation, for training a simple classifier, and for loading the
CIFAR-10 dataset from helper modules. Additionally, the system path is adjusted to include the
parent directory so these modules can be imported.

Dataset
### Set random seed ###
set_all_seeds(RANDOM_SEED)

##########################
### Dataset
##########################

train_transforms = transforms.Compose([transforms.Resize((70, 70)),
                                       transforms.RandomCrop((64, 64)),
                                       transforms.ToTensor()])

test_transforms = transforms.Compose([transforms.Resize((70, 70)),
                                      transforms.CenterCrop((64, 64)),
                                      transforms.ToTensor()])

train_loader, valid_loader, test_loader = get_dataloaders_cifar10(
    batch_size=BATCH_SIZE,
    num_workers=2,
    train_transforms=train_transforms,
    test_transforms=test_transforms,
    validation_fraction=0.1)

Here we set a random seed, define transformations for the CIFAR-10 training and test data,
and create data loaders for training, validation, and testing using these transformations. The data
loaders are configured with the specified batch size, number of workers, and other parameters.
# Checking the dataset
print('Training Set:\n')
for images, labels in train_loader:
    print('Image batch dimensions:', images.size())
    print('Image label dimensions:', labels.size())
    print(labels[:10])
    break

# Checking the dataset
print('\nValidation Set:')
for images, labels in valid_loader:
    print('Image batch dimensions:', images.size())
    print('Image label dimensions:', labels.size())
    print(labels[:10])
    break

# Checking the dataset
print('\nTesting Set:')
for images, labels in test_loader:
    print('Image batch dimensions:', images.size())
    print('Image label dimensions:', labels.size())
    print(labels[:10])
    break
Here we’re checking and printing the dimensions of image batches and labels for the training,
validation, and testing sets using PyTorch's data loader. It provides insights into the structure of
the datasets used for training a neural network.

Model
##########################
### MODEL
##########################

class AlexNet(nn.Module):

def __init__(self, num_classes):


super(AlexNet, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),

nn.Conv2d(64, 192, kernel_size=5, padding=2),


nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),

nn.Conv2d(192, 384, kernel_size=3, padding=1),


nn.ReLU(inplace=True),

nn.Conv2d(384, 256, kernel_size=3, padding=1),


nn.ReLU(inplace=True),

nn.Conv2d(256, 256, kernel_size=3, padding=1),


nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
)
self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
self.classifier = nn.Sequential(
nn.Dropout(0.5),
nn.Linear(256 * 6 * 6, 4096),
nn.ReLU(inplace=True),
nn.Dropout(0.5),
nn.Linear(4096, 4096),
nn.ReLU(inplace=True),
nn.Linear(4096, num_classes)
)

def forward(self, x):
    x = self.features(x)
    x = self.avgpool(x)
    x = x.view(x.size(0), 256 * 6 * 6)
    logits = self.classifier(x)
    return logits  # softmax is applied by the cross-entropy loss during training

This code defines the architecture of AlexNet, a convolutional neural network (CNN), using the
PyTorch framework. The model consists of convolutional layers, ReLU activation functions,
max-pooling layers, and fully connected layers. The forward pass returns logits; class
probabilities can be obtained via softmax, which the cross-entropy loss applies internally during
training.

torch.manual_seed(RANDOM_SEED)

model = AlexNet(NUM_CLASSES)
model.to(DEVICE)

optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

Training
log_dict = train_classifier_simple_v1(num_epochs=NUM_EPOCHS, model=model,
optimizer=optimizer, device=DEVICE,
train_loader=train_loader,
valid_loader=valid_loader,
logging_interval=50)
This snippet trains the classifier with the train_classifier_simple_v1 helper for the specified
number of epochs (NUM_EPOCHS). It uses the given model, optimizer, and data loaders for
training and validation, logging every 50 batches. The training progress is stored in log_dict.
Evaluation
import matplotlib.pyplot as plt
%matplotlib inline
loss_list = log_dict['train_loss_per_batch']

plt.plot(loss_list, label='Minibatch loss')
plt.plot(np.convolve(loss_list,
                     np.ones(200,)/200, mode='valid'),
         label='Running average')

plt.ylabel('Cross Entropy')
plt.xlabel('Iteration')
plt.legend()
Here we visualize the training loss per batch using matplotlib. It plots both the original minibatch
loss and its running average with a window size of 200 iterations, providing insights into the
training convergence over time.
plt.plot(np.arange(1, NUM_EPOCHS+1), log_dict['train_acc_per_epoch'],
label='Training')
plt.plot(np.arange(1, NUM_EPOCHS+1), log_dict['valid_acc_per_epoch'],
label='Validation')

plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
We’re using Matplotlib to plot the training and validation accuracy over epochs, visualizing the
model's learning progress during training.
with torch.set_grad_enabled(False):

    train_acc = compute_accuracy(model=model,
                                 data_loader=train_loader,
                                 device=DEVICE)

    test_acc = compute_accuracy(model=model,
                                data_loader=test_loader,
                                device=DEVICE)

    valid_acc = compute_accuracy(model=model,
                                 data_loader=valid_loader,
                                 device=DEVICE)

print(f'Train ACC: {train_acc:.2f}%')
print(f'Validation ACC: {valid_acc:.2f}%')
print(f'Test ACC: {test_acc:.2f}%')

In this code snippet, accuracy metrics are computed for the PyTorch model on the training,
test, and validation sets using the compute_accuracy function with the corresponding data
loaders, and then printed.
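
Since the question also asks for parameter details, a quick way to report a model's trainable parameter count in PyTorch is shown below; this small helper is our own addition, not part of the helper modules used above:

def count_parameters(model):
    """Total number of trainable parameters in a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'AlexNet parameters: {count_parameters(model):,}')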

VGG 16:
import time
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader

if torch.cuda.is_available():
torch.backends.cudnn.deterministic = True

This code sets up a PyTorch environment for deep learning, checking for GPU availability and
ensuring deterministic behavior for CUDA operations if a GPU is present. It includes essential
imports for handling datasets and neural network operations.

Settings and Dataset


##########################
### SETTINGS
##########################

# Device
DEVICE = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")
print('Device:', DEVICE)

# Hyperparameters
random_seed = 1
learning_rate = 0.001
num_epochs = 10
batch_size = 128
# Architecture
num_features = 32*32*3  # CIFAR-10 images are 3x32x32 (not used by the VGG16 class below)
num_classes = 10

Here we set up the environment and hyperparameters for a machine learning model using
PyTorch. We specify the device (GPU if available, otherwise CPU), set hyperparameters like the
learning rate and batch size, and define the input dimensions and output classes for the CIFAR-10
classification task.
##########################
### CIFAR-10 DATASET
##########################

# Note transforms.ToTensor() scales input images
# to 0-1 range
train_dataset = datasets.CIFAR10(root='data',
train=True,
transform=transforms.ToTensor(),
download=True)

test_dataset = datasets.CIFAR10(root='data',
train=False,
transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset,
batch_size=batch_size,
shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
batch_size=batch_size,
shuffle=False)

# Checking the dataset
for images, labels in train_loader:
print('Image batch dimensions:', images.shape)
print('Image label dimensions:', labels.shape)
break

Model
##########################
### MODEL
##########################
class VGG16(torch.nn.Module):

def __init__(self, num_features, num_classes):


super(VGG16, self).__init__()

# calculate same padding:
# (w - k + 2*p)/s + 1 = o
# => p = (s(o-1) - w + k)/2

self.block_1 = nn.Sequential(
nn.Conv2d(in_channels=3,
out_channels=64,
kernel_size=(3, 3),
stride=(1, 1),
# (1(32-1)- 32 + 3)/2 = 1
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=64,
out_channels=64,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)

self.block_2 = nn.Sequential(
nn.Conv2d(in_channels=64,
out_channels=128,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=128,
out_channels=128,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)

self.block_3 = nn.Sequential(
nn.Conv2d(in_channels=128,
out_channels=256,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=256,
out_channels=256,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=256,
out_channels=256,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)

self.block_4 = nn.Sequential(
nn.Conv2d(in_channels=256,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)
self.block_5 = nn.Sequential(
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)

self.classifier = nn.Sequential(
nn.Linear(512, 4096),
nn.ReLU(True),
#nn.Dropout(p=0.5),
nn.Linear(4096, 4096),
nn.ReLU(True),
#nn.Dropout(p=0.5),
nn.Linear(4096, num_classes),
)

for m in self.modules():
    if isinstance(m, torch.nn.Conv2d) or isinstance(m, torch.nn.Linear):
        nn.init.kaiming_uniform_(m.weight, mode='fan_in',
                                 nonlinearity='relu')
        if m.bias is not None:
            m.bias.detach().zero_()

#self.avgpool = nn.AdaptiveAvgPool2d((7, 7))


def forward(self, x):

x = self.block_1(x)
x = self.block_2(x)
x = self.block_3(x)
x = self.block_4(x)
x = self.block_5(x)
#x = self.avgpool(x)
x = x.view(x.size(0), -1)
logits = self.classifier(x)
probas = F.softmax(logits, dim=1)

return logits, probas

torch.manual_seed(random_seed)
model = VGG16(num_features=num_features,
num_classes=num_classes)

model = model.to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
This code defines a VGG16 convolutional neural network (CNN) architecture using PyTorch.
The VGG16 model is a deep neural network commonly used for image classification tasks. The
architecture is organized into five convolutional blocks, each followed by max-pooling layers,
and a fully connected classifier. The model is initialized with Kaiming weight initialization, and
the Adam optimizer is used for training. The forward method computes the output logits and
probabilities for a given input. The model is then instantiated, moved to the specified device
(e.g., GPU), and an Adam optimizer is set up for training. The random seed is also set for
reproducibility.

Training
def compute_accuracy(model, data_loader):
model.eval()
correct_pred, num_examples = 0, 0
for i, (features, targets) in enumerate(data_loader):

features = features.to(DEVICE)
targets = targets.to(DEVICE)

logits, probas = model(features)


_, predicted_labels = torch.max(probas, 1)
num_examples += targets.size(0)
correct_pred += (predicted_labels == targets).sum()
return correct_pred.float()/num_examples * 100
def compute_epoch_loss(model, data_loader):
model.eval()
curr_loss, num_examples = 0., 0
with torch.no_grad():
for features, targets in data_loader:
features = features.to(DEVICE)
targets = targets.to(DEVICE)
logits, probas = model(features)
loss = F.cross_entropy(logits, targets, reduction='sum')
num_examples += targets.size(0)
curr_loss += loss

curr_loss = curr_loss / num_examples


return curr_loss

start_time = time.time()
for epoch in range(num_epochs):

model.train()
for batch_idx, (features, targets) in enumerate(train_loader):

features = features.to(DEVICE)
targets = targets.to(DEVICE)

### FORWARD AND BACK PROP


logits, probas = model(features)
cost = F.cross_entropy(logits, targets)
optimizer.zero_grad()

cost.backward()

### UPDATE MODEL PARAMETERS


optimizer.step()

### LOGGING
if not batch_idx % 50:
print ('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
%(epoch+1, num_epochs, batch_idx,
len(train_loader), cost))

model.eval()
with torch.set_grad_enabled(False): # save memory during inference
print('Epoch: %03d/%03d | Train: %.3f%% | Loss: %.3f' % (
epoch+1, num_epochs,
compute_accuracy(model, train_loader),
compute_epoch_loss(model, train_loader)))

print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

It defines functions for computing accuracy and epoch loss of a PyTorch neural network model.
It then performs training over multiple epochs using a specified training loader. In each epoch, it
iterates through batches of training data, computes the forward and backward propagation,
updates the model parameters, and logs the training cost. The code also evaluates and prints the
training accuracy and loss at the end of each epoch. Additionally, it measures and displays the
total training time. The training process is implemented with the cross-entropy loss and uses the
Adam optimizer.

Evaluation
with torch.set_grad_enabled(False): # save memory during inference
print('Test accuracy: %.2f%%' % (compute_accuracy(model,
test_loader)))
The PyTorch context torch.set_grad_enabled(False) disables gradient computation, which
conserves memory during inference. The subsequent line prints the test accuracy of the model on
the test dataset using the compute_accuracy function defined earlier in this section.

VGG 19:
import numpy as np
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
Settings and Dataset
##########################
### SETTINGS
##########################

# Device
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Device:', DEVICE)

# Hyperparameters
random_seed = 1
learning_rate = 0.001
num_epochs = 20
batch_size = 128

# Architecture
num_features = 32*32*3  # CIFAR-10 images are 3x32x32 (not used by the VGG19 class below)
num_classes = 10

##########################
### CIFAR-10 DATASET
##########################

# Note transforms.ToTensor() scales input images
# to 0-1 range
train_dataset = datasets.CIFAR10(root='data',
train=True,
transform=transforms.ToTensor(),
download=True)

test_dataset = datasets.CIFAR10(root='data',
train=False,
transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset,
batch_size=batch_size,
shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
batch_size=batch_size,
shuffle=False)
# Checking the dataset
for images, labels in train_loader:
print('Image batch dimensions:', images.shape)
print('Image label dimensions:', labels.shape)
break
Here we set up configurations and load the CIFAR-10 dataset using PyTorch's DataLoader. We
specify settings such as the device (CPU or GPU) and hyperparameters like the learning rate and
batch size, and define the architecture parameters for the network (3x32x32 inputs, 10 output
classes). The CIFAR-10 dataset is then loaded, transformed to tensors, and split into training and
testing sets. Finally, the DataLoader is used to iterate through the training set, printing the
dimensions of the image batches and their corresponding labels.

Model
##########################
### MODEL
##########################

class VGG19(torch.nn.Module):

def __init__(self, num_features, num_classes):


super(VGG19, self).__init__()

# calculate same padding:
# (w - k + 2*p)/s + 1 = o
# => p = (s(o-1) - w + k)/2

self.block_1 = nn.Sequential(
nn.Conv2d(in_channels=3,
out_channels=64,
kernel_size=(3, 3),
stride=(1, 1),
# (1(32-1)- 32 + 3)/2 = 1
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=64,
out_channels=64,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)
self.block_2 = nn.Sequential(
nn.Conv2d(in_channels=64,
out_channels=128,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=128,
out_channels=128,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)

self.block_3 = nn.Sequential(
nn.Conv2d(in_channels=128,
out_channels=256,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=256,
out_channels=256,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=256,
out_channels=256,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=256,
out_channels=256,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)

self.block_4 = nn.Sequential(
nn.Conv2d(in_channels=256,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)

self.block_5 = nn.Sequential(
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=512,
out_channels=512,
kernel_size=(3, 3),
stride=(1, 1),
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=(2, 2),
stride=(2, 2))
)

self.classifier = nn.Sequential(
nn.Linear(512, 4096),
nn.ReLU(True),
nn.Linear(4096, 4096),
nn.ReLU(True),
nn.Linear(4096, num_classes)
)

for m in self.modules():
    if isinstance(m, torch.nn.Conv2d):
        m.weight.detach().normal_(0, 0.05)
        if m.bias is not None:
            m.bias.detach().zero_()
    elif isinstance(m, torch.nn.Linear):
        m.weight.detach().normal_(0, 0.05)
        m.bias.detach().zero_()

def forward(self, x):

x = self.block_1(x)
x = self.block_2(x)
x = self.block_3(x)
x = self.block_4(x)
x = self.block_5(x)
logits = self.classifier(x.view(-1, 512))
probas = F.softmax(logits, dim=1)

return logits, probas

torch.manual_seed(random_seed)
model = VGG19(num_features=num_features,
              num_classes=num_classes)

model = model.to(DEVICE)

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)


Here we define a VGG19 model for image classification using PyTorch (the architecture has
16 convolutional and 3 fully connected weight layers, for 19 in total). The model consists of five
convolutional blocks (block_1 to block_5) followed by a fully connected classifier. Each block
contains multiple convolutional layers with ReLU activation functions and a max-pooling layer.
The classifier consists of three fully connected layers. The weights of the convolutional and
linear layers are initialized from a normal distribution. The model is set to run on the specified
device (CPU or GPU), and Adam is chosen as the optimizer with the given learning rate.

Training
def compute_accuracy(model, data_loader):
model.eval()
correct_pred, num_examples = 0, 0
for i, (features, targets) in enumerate(data_loader):

features = features.to(DEVICE)
targets = targets.to(DEVICE)

logits, probas = model(features)


_, predicted_labels = torch.max(probas, 1)
num_examples += targets.size(0)
correct_pred += (predicted_labels == targets).sum()
return correct_pred.float()/num_examples * 100

def compute_epoch_loss(model, data_loader):


model.eval()
curr_loss, num_examples = 0., 0
with torch.no_grad():
for features, targets in data_loader:
features = features.to(DEVICE)
targets = targets.to(DEVICE)
logits, probas = model(features)
loss = F.cross_entropy(logits, targets, reduction='sum')
num_examples += targets.size(0)
curr_loss += loss

curr_loss = curr_loss / num_examples


return curr_loss

start_time = time.time()
for epoch in range(num_epochs):

model.train()
for batch_idx, (features, targets) in enumerate(train_loader):

features = features.to(DEVICE)
targets = targets.to(DEVICE)

### FORWARD AND BACK PROP


logits, probas = model(features)
cost = F.cross_entropy(logits, targets)
optimizer.zero_grad()

cost.backward()

### UPDATE MODEL PARAMETERS


optimizer.step()

### LOGGING
if not batch_idx % 50:
print ('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
%(epoch+1, num_epochs, batch_idx,
len(train_loader), cost))

model.eval()
with torch.set_grad_enabled(False): # save memory during inference
print('Epoch: %03d/%03d | Train: %.3f%% | Loss: %.3f' % (
epoch+1, num_epochs,
compute_accuracy(model, train_loader),
compute_epoch_loss(model, train_loader)))

print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))


Here we define functions for computing accuracy and epoch loss, then train the model for the
specified number of epochs. The loop iterates through batches in the training loader, performs
forward and backward propagation, updates the model parameters, and logs the training progress.
Additionally, it calculates and prints the training accuracy and loss at the end of each epoch,
providing insight into the model's performance during training.

Evaluation
with torch.set_grad_enabled(False): # save memory during inference
print('Test accuracy: %.2f%%' % (compute_accuracy(model,
test_loader)))

ResNet 18:
import os
import time

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader

from torchvision import datasets


from torchvision import transforms

import matplotlib.pyplot as plt


from PIL import Image
if torch.cuda.is_available():
torch.backends.cudnn.deterministic = True

Model Settings
##########################
### SETTINGS
##########################

# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.001
BATCH_SIZE = 128
NUM_EPOCHS = 10

# Architecture
NUM_FEATURES = 28*28
NUM_CLASSES = 10

# Other
DEVICE = "cuda:1"
GRAYSCALE = True

This code sets hyperparameters and configurations for a neural network model. It specifies settings
such as random seed, learning rate, batch size, number of epochs, architecture details (number of
features and classes), device for computation (CUDA), and whether the data is grayscale. These
settings are crucial for training and evaluating a neural network on image data.

MNIST Dataset
##########################
### MNIST DATASET
##########################

# Note transforms.ToTensor() scales input images
# to 0-1 range
train_dataset = datasets.MNIST(root='data',
train=True,
transform=transforms.ToTensor(),
download=True)

test_dataset = datasets.MNIST(root='data',
train=False,
transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_dataset,
batch_size=BATCH_SIZE,
shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
batch_size=BATCH_SIZE,
shuffle=False)

# Checking the dataset
for images, labels in train_loader:
print('Image batch dimensions:', images.shape)
print('Image label dimensions:', labels.shape)
break
device = torch.device(DEVICE)
torch.manual_seed(0)

for epoch in range(2):

for batch_idx, (x, y) in enumerate(train_loader):

print('Epoch:', epoch+1, end='')


print(' | Batch index:', batch_idx, end='')
print(' | Batch size:', y.size()[0])

x = x.to(device)
y = y.to(device)
break
##########################
### MODEL
##########################

def conv3x3(in_planes, out_planes, stride=1):


"""3x3 convolution with padding"""
return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
padding=1, bias=False)

class BasicBlock(nn.Module):
expansion = 1

def __init__(self, inplanes, planes, stride=1, downsample=None):


super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = nn.BatchNorm2d(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = nn.BatchNorm2d(planes)
self.downsample = downsample
self.stride = stride

def forward(self, x):


residual = x

out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)

out = self.conv2(out)
out = self.bn2(out)

if self.downsample is not None:


residual = self.downsample(x)

out += residual
out = self.relu(out)

return out

class ResNet(nn.Module):

def __init__(self, block, layers, num_classes, grayscale):


self.inplanes = 64
if grayscale:
in_dim = 1
else:
in_dim = 3
super(ResNet, self).__init__()
self.conv1 = nn.Conv2d(in_dim, 64, kernel_size=7, stride=2,
padding=3,
bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
self.avgpool = nn.AvgPool2d(7, stride=1)
self.fc = nn.Linear(512 * block.expansion, num_classes)

for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, (2. / n)**.5)
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()

def _make_layer(self, block, planes, blocks, stride=1):


downsample = None
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, planes * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(planes * block.expansion),
)

layers = []
layers.append(block(self.inplanes, planes, stride, downsample))
self.inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(self.inplanes, planes))

return nn.Sequential(*layers)

def forward(self, x):


x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)

x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
# because MNIST is already 1x1 here:
# disable avg pooling
#x = self.avgpool(x)
x = x.view(x.size(0), -1)
logits = self.fc(x)
probas = F.softmax(logits, dim=1)
return logits, probas

def resnet18(num_classes):
    """Constructs a ResNet-18 model."""
    model = ResNet(block=BasicBlock,
                   layers=[2, 2, 2, 2],
                   num_classes=num_classes,
                   grayscale=GRAYSCALE)
    return model
This code defines a convolutional neural network (CNN) based on the ResNet architecture for
the MNIST dataset. It begins by setting up the MNIST dataset and data loaders; the short
two-epoch loop above merely fetches one batch per epoch as a sanity check of the loader. The
model is then defined using the ResNet-18 architecture with basic blocks: convolutional layers,
batch normalization, ReLU activations, and residual connections, providing a deeper architecture
that is easier to optimize. The model and optimizer are instantiated as shown below.
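
The training loop in the next subsection references model and optimizer, which are not created anywhere in this section as given. A minimal instantiation, mirroring the other sections (and assuming the same Adam setup with the settings defined above), would be:

torch.manual_seed(RANDOM_SEED)

model = resnet18(NUM_CLASSES)
model.to(DEVICE)

optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)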

Training
def compute_accuracy(model, data_loader, device):
correct_pred, num_examples = 0, 0
for i, (features, targets) in enumerate(data_loader):

features = features.to(device)
targets = targets.to(device)

logits, probas = model(features)


_, predicted_labels = torch.max(probas, 1)
num_examples += targets.size(0)
correct_pred += (predicted_labels == targets).sum()
return correct_pred.float()/num_examples * 100

start_time = time.time()
for epoch in range(NUM_EPOCHS):

model.train()
for batch_idx, (features, targets) in enumerate(train_loader):

features = features.to(DEVICE)
targets = targets.to(DEVICE)

### FORWARD AND BACK PROP


logits, probas = model(features)
cost = F.cross_entropy(logits, targets)
optimizer.zero_grad()

cost.backward()

### UPDATE MODEL PARAMETERS


optimizer.step()

### LOGGING
if not batch_idx % 50:
print ('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
%(epoch+1, NUM_EPOCHS, batch_idx,
len(train_loader), cost))

model.eval()
with torch.set_grad_enabled(False): # save memory during inference
print('Epoch: %03d/%03d | Train: %.3f%%' % (
epoch+1, NUM_EPOCHS,
compute_accuracy(model, train_loader, device=DEVICE)))

print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))


This code defines a function compute_accuracy to calculate the accuracy of a PyTorch neural
network model on a given data loader. The main section then trains the model using a specified
number of epochs, performing forward and backward propagation, updating model
parameters, and logging the training progress, including the cost at regular intervals. The
training time and final training accuracy are also displayed.

Evaluation
with torch.set_grad_enabled(False): # save memory during inference
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader,
device=DEVICE)))
for batch_idx, (features, targets) in enumerate(test_loader):

features = features
targets = targets
break
model.eval()
logits, probas = model(features.to(device)[0, None])
print('Probability 7 %.2f%%' % (probas[0][7]*100))

ResNet 101:
import os
import time

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset


from torch.utils.data import DataLoader
from torch.utils.data.dataset import Subset

from torchvision import datasets


from torchvision import transforms


import matplotlib.pyplot as plt


from PIL import Image

if torch.cuda.is_available():
torch.backends.cudnn.deterministic = True

Settings
##########################
### SETTINGS
##########################

# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.01
NUM_EPOCHS = 50

# Architecture
NUM_CLASSES = 10
BATCH_SIZE = 128
DEVICE = torch.device('cuda:3')
GRAYSCALE = False

Dataset
##########################
### CIFAR-10 Dataset
##########################

# Note transforms.ToTensor() scales input images
# to 0-1 range

train_indices = torch.arange(0, 49000)
valid_indices = torch.arange(49000, 50000)

train_and_valid = datasets.CIFAR10(root='data',
train=True,
transform=transforms.ToTensor(),
download=True)

train_dataset = Subset(train_and_valid, train_indices)
valid_dataset = Subset(train_and_valid, valid_indices)

test_dataset = datasets.CIFAR10(root='data',
train=False,
transform=transforms.ToTensor())

#####################################################
### Data Loaders
#####################################################

train_loader = DataLoader(dataset=train_dataset,
batch_size=BATCH_SIZE,
num_workers=8,
shuffle=True)

valid_loader = DataLoader(dataset=valid_dataset,
batch_size=BATCH_SIZE,
num_workers=8,
shuffle=False)

test_loader = DataLoader(dataset=test_dataset,
batch_size=BATCH_SIZE,
num_workers=8,
shuffle=False)

#####################################################

# Checking the dataset
for images, labels in train_loader:
print('Image batch dimensions:', images.shape)
print('Image label dimensions:', labels.shape)
break

for images, labels in test_loader:


print('Image batch dimensions:', images.shape)
print('Image label dimensions:', labels.shape)
break

for images, labels in valid_loader:


print('Image batch dimensions:', images.shape)
print('Image label dimensions:', labels.shape)
break

Model
##########################
### MODEL
##########################

def conv3x3(in_planes, out_planes, stride=1):


"""3x3 convolution with padding"""
return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
padding=1, bias=False)

class Bottleneck(nn.Module):
expansion = 4

def __init__(self, inplanes, planes, stride=1, downsample=None):


super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1,
bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
stride=stride,
padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1,
bias=False)
self.bn3 = nn.BatchNorm2d(planes * 4)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride

def forward(self, x):


residual = x

out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)

out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)

out = self.conv3(out)
out = self.bn3(out)

if self.downsample is not None:


residual = self.downsample(x)

out += residual
out = self.relu(out)

return out

class ResNet(nn.Module):

def __init__(self, block, layers, num_classes, grayscale):


self.inplanes = 64
if grayscale:
in_dim = 1
else:
in_dim = 3
super(ResNet, self).__init__()
self.conv1 = nn.Conv2d(in_dim, 64, kernel_size=7, stride=2,
padding=3,
bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
self.avgpool = nn.AvgPool2d(7, stride=1, padding=2)
#self.fc = nn.Linear(2048 * block.expansion, num_classes)
self.fc = nn.Linear(2048, num_classes)

for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, (2. / n)**.5)
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()

def _make_layer(self, block, planes, blocks, stride=1):


downsample = None
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, planes * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(planes * block.expansion),
)

layers = []
layers.append(block(self.inplanes, planes, stride, downsample))
self.inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(self.inplanes, planes))

return nn.Sequential(*layers)

def forward(self, x):


x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)

#x = self.avgpool(x)
x = x.view(x.size(0), -1)
logits = self.fc(x)
probas = F.softmax(logits, dim=1)
return logits, probas

def resnet101(num_classes, grayscale):
    """Constructs a ResNet-101 model."""
    model = ResNet(block=Bottleneck,
                   layers=[3, 4, 23, 3],
                   num_classes=num_classes,
                   grayscale=grayscale)
    return model
torch.manual_seed(RANDOM_SEED)

##########################
### COST AND OPTIMIZER
##########################

model = resnet101(NUM_CLASSES, GRAYSCALE)


model.to(DEVICE)

optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

Training
def compute_accuracy(model, data_loader, device):
correct_pred, num_examples = 0, 0
for i, (features, targets) in enumerate(data_loader):

features = features.to(device)
targets = targets.to(device)

logits, probas = model(features)


_, predicted_labels = torch.max(probas, 1)
num_examples += targets.size(0)
correct_pred += (predicted_labels == targets).sum()
return correct_pred.float()/num_examples * 100
start_time = time.time()

# use random seed for reproducibility (here batch shuffling)


torch.manual_seed(RANDOM_SEED)

for epoch in range(NUM_EPOCHS):

model.train()

for batch_idx, (features, targets) in enumerate(train_loader):

### PREPARE MINIBATCH


features = features.to(DEVICE)
targets = targets.to(DEVICE)

### FORWARD AND BACK PROP


logits, probas = model(features)
cost = F.cross_entropy(logits, targets)
optimizer.zero_grad()

cost.backward()

### UPDATE MODEL PARAMETERS


optimizer.step()

### LOGGING
if not batch_idx % 120:
print (f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} | '
f'Batch {batch_idx:03d}/{len(train_loader):03d} |'
f' Cost: {cost:.4f}')

    # no need to build the computation graph for backprop when computing accuracy
    with torch.set_grad_enabled(False):
        train_acc = compute_accuracy(model, train_loader, device=DEVICE)
        valid_acc = compute_accuracy(model, valid_loader, device=DEVICE)
        print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} Train Acc.: {train_acc:.2f}%'
              f' | Validation Acc.: {valid_acc:.2f}%')

elapsed = (time.time() - start_time)/60


print(f'Time elapsed: {elapsed:.2f} min')

elapsed = (time.time() - start_time)/60


print(f'Total Training Time: {elapsed:.2f} min')

Evaluation
with torch.set_grad_enabled(False): # save memory during inference
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader,
device=DEVICE)))

ResNet 152:
Imports
import os
import time

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset


from torch.utils.data import DataLoader

from torchvision import datasets


from torchvision import transforms

import matplotlib.pyplot as plt


from PIL import Image

if torch.cuda.is_available():
torch.backends.cudnn.deterministic = True

Settings
##########################
### SETTINGS
##########################

# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.001
NUM_EPOCHS = 10
# Architecture
NUM_FEATURES = 128*128
NUM_CLASSES = 2
BATCH_SIZE = 128
DEVICE = 'cuda:2' # default GPU device
GRAYSCALE = False

Dataset
Downloading the Dataset

1) Download and unzip the file img_align_celeba.zip, which contains the images in jpeg
format.
2) Download the list_attr_celeba.txt file, which contains the class labels
3) Download the list_eval_partition.txt file, which contains training/validation/test
partitioning info

Preparing the Dataset


df1 = pd.read_csv('list_attr_celeba.txt', sep=r"\s+", skiprows=1,
                  usecols=['Male'])

# Make 0 (female) & 1 (male) labels instead of -1 & 1
df1.loc[df1['Male'] == -1, 'Male'] = 0

df1.head()
df2 = pd.read_csv('list_eval_partition.txt', sep=r"\s+", skiprows=0,
                  header=None)
df2.columns = ['Filename', 'Partition']
df2 = df2.set_index('Filename')

df2.head()
df3 = df1.merge(df2, left_index=True, right_index=True)
df3.head()
df3.to_csv('celeba-gender-partitions.csv')
df4 = pd.read_csv('celeba-gender-partitions.csv', index_col=0)
df4.head()
df4.loc[df4['Partition'] == 0].to_csv('celeba-gender-train.csv')
df4.loc[df4['Partition'] == 1].to_csv('celeba-gender-valid.csv')
df4.loc[df4['Partition'] == 2].to_csv('celeba-gender-test.csv')
img = Image.open('img_align_celeba/000001.jpg')
print(np.asarray(img, dtype=np.uint8).shape)
plt.imshow(img);

Implementing a Custom Dataset Class


class CelebaDataset(Dataset):
"""Custom Dataset for loading CelebA face images"""

def __init__(self, csv_path, img_dir, transform=None):

df = pd.read_csv(csv_path, index_col=0)
self.img_dir = img_dir
self.csv_path = csv_path
self.img_names = df.index.values
self.y = df['Male'].values
self.transform = transform

def __getitem__(self, index):


img = Image.open(os.path.join(self.img_dir,
self.img_names[index]))

if self.transform is not None:


img = self.transform(img)

label = self.y[index]
return img, label

def __len__(self):
return self.y.shape[0]
# Note that transforms.ToTensor()
# already divides pixels by 255. internally
custom_transform = transforms.Compose([transforms.CenterCrop((178, 178)),
                                       transforms.Resize((128, 128)),
                                       #transforms.Grayscale(),
                                       #transforms.Lambda(lambda x: x/255.),
                                       transforms.ToTensor()])
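
To see the rescaling behavior mentioned in the comment above, here is a minimal sketch; the dummy image below is an illustration, not part of the dataset:

# ToTensor() converts a uint8 PIL image in [0, 255] into a float tensor in [0.0, 1.0]
dummy = Image.fromarray(np.full((2, 2, 3), 255, dtype=np.uint8))
print(transforms.ToTensor()(dummy).max())  # tensor(1.)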

train_dataset = CelebaDataset(csv_path='celeba-gender-train.csv',
                              img_dir='img_align_celeba/',
                              transform=custom_transform)

valid_dataset = CelebaDataset(csv_path='celeba-gender-valid.csv',
                              img_dir='img_align_celeba/',
                              transform=custom_transform)

test_dataset = CelebaDataset(csv_path='celeba-gender-test.csv',
                             img_dir='img_align_celeba/',
                             transform=custom_transform)

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=BATCH_SIZE,
                          shuffle=True,
                          num_workers=4)

valid_loader = DataLoader(dataset=valid_dataset,
                          batch_size=BATCH_SIZE,
                          shuffle=False,
                          num_workers=4)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=BATCH_SIZE,
                         shuffle=False,
                         num_workers=4)
torch.manual_seed(0)

# Quick smoke test: fetch one batch per epoch to confirm the loader works
for epoch in range(2):
    for batch_idx, (x, y) in enumerate(train_loader):
        print('Epoch:', epoch+1, end='')
        print(' | Batch index:', batch_idx, end='')
        print(' | Batch size:', y.size()[0])

        x = x.to(DEVICE)
        y = y.to(DEVICE)
        time.sleep(1)
        break

Model
##########################
### MODEL
##########################

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1,
                               bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
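
The 1x1 -> 3x3 -> 1x1 bottleneck stack expands its output channels by expansion = 4. A minimal shape check of the class above (illustrative values only):

block = Bottleneck(inplanes=64, planes=16)  # 64 -> 16 -> 16 -> 64 channels, so no downsample needed
x = torch.randn(1, 64, 32, 32)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])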

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes, grayscale):
        self.inplanes = 64
        if grayscale:
            in_dim = 1
        else:
            in_dim = 3
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(in_dim, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        # For 128x128 inputs, layer4 yields 2048-channel 4x4 feature maps;
        # this pooling setup produces 2x2 maps, hence the
        # 2048 * block.expansion = 8192 inputs to the fully-connected layer
        self.avgpool = nn.AvgPool2d(7, stride=1, padding=2)
        self.fc = nn.Linear(2048 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, (2. / n)**.5)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        logits = self.fc(x)
        probas = F.softmax(logits, dim=1)
        return logits, probas

def resnet152(num_classes, grayscale):
    """Constructs a ResNet-152 model."""
    # ResNet-152 uses [3, 8, 36, 3] bottleneck blocks per stage
    model = ResNet(block=Bottleneck,
                   layers=[3, 8, 36, 3],
                   num_classes=num_classes,
                   grayscale=grayscale)
    return model
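
A quick shape sanity check before training; this is an illustrative sketch (it builds the full 152-layer model, so it takes a moment, and the dummy batch stands in for real CelebA images):

_model = resnet152(NUM_CLASSES, GRAYSCALE)
with torch.no_grad():
    logits, probas = _model(torch.randn(2, 3, 128, 128))  # dummy 128x128 RGB batch
print(logits.shape)  # torch.Size([2, 2])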
torch.manual_seed(RANDOM_SEED)

##########################
### COST AND OPTIMIZER
##########################

model = resnet152(NUM_CLASSES, GRAYSCALE)
model.to(DEVICE)

optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

Training
def compute_accuracy(model, data_loader, device):
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100

start_time = time.time()
for epoch in range(NUM_EPOCHS):

    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):

        features = features.to(DEVICE)
        targets = targets.to(DEVICE)

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()

        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
                  % (epoch+1, NUM_EPOCHS, batch_idx,
                     len(train_loader), cost))

    model.eval()
    with torch.set_grad_enabled(False):  # save memory during inference
        print('Epoch: %03d/%03d | Train: %.3f%% | Valid: %.3f%%' % (
              epoch+1, NUM_EPOCHS,
              compute_accuracy(model, train_loader, device=DEVICE),
              compute_accuracy(model, valid_loader, device=DEVICE)))

    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))


Evaluation
with torch.set_grad_enabled(False):  # save memory during inference
    print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader,
                                                      device=DEVICE)))

for batch_idx, (features, targets) in enumerate(test_loader):
    features = features
    targets = targets
    break

plt.imshow(np.transpose(features[0], (1, 2, 0)))

model.eval()
logits, probas = model(features.to(DEVICE)[0, None])
print('Probability Female %.2f%%' % (probas[0][0]*100))
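
Since class 0 corresponds to female and class 1 to male, the predicted label follows directly from the larger of the two probabilities (a small illustrative addition):

print('Predicted label:', torch.argmax(probas[0]).item())  # 0 = female, 1 = male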

DenseNet:
Imports
import os
import time

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Subset

from torchvision import datasets
from torchvision import transforms

import matplotlib.pyplot as plt
from PIL import Image

if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
##########################
### SETTINGS
##########################
# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.001
BATCH_SIZE = 128
NUM_EPOCHS = 20

# Architecture
NUM_CLASSES = 10

# Other
DEVICE = "cuda:0"
GRAYSCALE = False

CIFAR-10 Dataset
train_indices = torch.arange(0, 48000)
valid_indices = torch.arange(48000, 50000)

train_and_valid = datasets.CIFAR10(root='data',
                                   train=True,
                                   transform=transforms.ToTensor(),
                                   download=True)

train_dataset = Subset(train_and_valid, train_indices)
valid_dataset = Subset(train_and_valid, valid_indices)

test_dataset = datasets.CIFAR10(root='data',
                                train=False,
                                transform=transforms.ToTensor(),
                                download=False)

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=BATCH_SIZE,
                          num_workers=4,
                          shuffle=True)

valid_loader = DataLoader(dataset=valid_dataset,
                          batch_size=BATCH_SIZE,
                          num_workers=4,
                          shuffle=False)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=BATCH_SIZE,
                         num_workers=4,
                         shuffle=False)

device = torch.device(DEVICE)
torch.manual_seed(0)

for epoch in range(2):
    for batch_idx, (x, y) in enumerate(train_loader):
        print('Epoch:', epoch+1, end='')
        print(' | Batch index:', batch_idx, end='')
        print(' | Batch size:', y.size()[0])

        x = x.to(device)
        y = y.to(device)
        break

# Check that shuffling works properly,
# i.e., label indices should be in random order.
# Also, the label order should be different in the second epoch.
for images, labels in train_loader:
    pass
print(labels[:10])

for images, labels in train_loader:
    pass
print(labels[:10])

# Check that the validation and test sets are diverse,
# i.e., that they contain all classes
for images, labels in valid_loader:
    pass
print(labels[:10])

for images, labels in test_loader:
    pass
print(labels[:10])
##########################
### MODEL
##########################

# The following code cell, which implements the DenseNet-121 architecture,
# is a derivative of the code provided at
# https://github.com/pytorch/vision/blob/master/torchvision/models/densenet.py

import re
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as cp
from collections import OrderedDict

def _bn_function_factory(norm, relu, conv):
    def bn_function(*inputs):
        # Concatenate all preceding feature maps along the channel dimension
        concated_features = torch.cat(inputs, 1)
        bottleneck_output = conv(relu(norm(concated_features)))
        return bottleneck_output

    return bn_function
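
Dense connectivity boils down to this channel-wise concatenation. A minimal illustration with dummy tensors (not part of the model):

a, b = torch.randn(1, 32, 8, 8), torch.randn(1, 32, 8, 8)
print(torch.cat([a, b], 1).shape)  # torch.Size([1, 64, 8, 8])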

class _DenseLayer(nn.Sequential):
    def __init__(self, num_input_features, growth_rate, bn_size,
                 drop_rate, memory_efficient=False):
        super(_DenseLayer, self).__init__()
        self.add_module('norm1', nn.BatchNorm2d(num_input_features)),
        self.add_module('relu1', nn.ReLU(inplace=True)),
        self.add_module('conv1', nn.Conv2d(num_input_features,
                                           bn_size * growth_rate,
                                           kernel_size=1, stride=1,
                                           bias=False)),
        self.add_module('norm2', nn.BatchNorm2d(bn_size * growth_rate)),
        self.add_module('relu2', nn.ReLU(inplace=True)),
        self.add_module('conv2', nn.Conv2d(bn_size * growth_rate,
                                           growth_rate,
                                           kernel_size=3, stride=1,
                                           padding=1, bias=False)),
        self.drop_rate = drop_rate
        self.memory_efficient = memory_efficient

    def forward(self, *prev_features):
        bn_function = _bn_function_factory(self.norm1, self.relu1,
                                           self.conv1)
        if self.memory_efficient and any(prev_feature.requires_grad for
                                         prev_feature in prev_features):
            bottleneck_output = cp.checkpoint(bn_function, *prev_features)
        else:
            bottleneck_output = bn_function(*prev_features)
        new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
        if self.drop_rate > 0:
            new_features = F.dropout(new_features, p=self.drop_rate,
                                     training=self.training)
        return new_features

class _DenseBlock(nn.Module):
    def __init__(self, num_layers, num_input_features, bn_size,
                 growth_rate, drop_rate, memory_efficient=False):
        super(_DenseBlock, self).__init__()
        for i in range(num_layers):
            layer = _DenseLayer(
                num_input_features + i * growth_rate,
                growth_rate=growth_rate,
                bn_size=bn_size,
                drop_rate=drop_rate,
                memory_efficient=memory_efficient,
            )
            self.add_module('denselayer%d' % (i + 1), layer)

    def forward(self, init_features):
        features = [init_features]
        for name, layer in self.named_children():
            new_features = layer(*features)
            features.append(new_features)
        return torch.cat(features, 1)
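
Each layer adds growth_rate channels, so a block's output width is num_input_features + num_layers * growth_rate. A minimal shape check with illustrative values:

blk = _DenseBlock(num_layers=6, num_input_features=64, bn_size=4,
                  growth_rate=32, drop_rate=0)
x = torch.randn(1, 64, 32, 32)
print(blk(x).shape)  # torch.Size([1, 256, 32, 32]) -- 64 + 6*32 channels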

class _Transition(nn.Sequential):
    def __init__(self, num_input_features, num_output_features):
        super(_Transition, self).__init__()
        self.add_module('norm', nn.BatchNorm2d(num_input_features))
        self.add_module('relu', nn.ReLU(inplace=True))
        self.add_module('conv', nn.Conv2d(num_input_features,
                                          num_output_features,
                                          kernel_size=1, stride=1,
                                          bias=False))
        self.add_module('pool', nn.AvgPool2d(kernel_size=2, stride=2))
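
Transition layers compress the channel count (by half in DenseNet-BC) and halve the spatial resolution. A quick illustrative check:

trans = _Transition(num_input_features=256, num_output_features=128)
print(trans(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 128, 16, 16])
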
class DenseNet121(nn.Module):
    r"""Densenet-BC model class, based on
    `"Densely Connected Convolutional Networks"
    <https://arxiv.org/pdf/1608.06993.pdf>`_

    Args:
        growth_rate (int) - how many filters to add each layer (`k` in paper)
        block_config (list of 4 ints) - how many layers in each pooling block
        num_init_featuremaps (int) - the number of filters to learn in the
            first convolution layer
        bn_size (int) - multiplicative factor for the number of bottleneck
            layers (i.e., bn_size * k features in the bottleneck layer)
        drop_rate (float) - dropout rate after each dense layer
        num_classes (int) - number of classification classes
        memory_efficient (bool) - If True, uses checkpointing. Much more
            memory efficient, but slower. Default: *False*. See `"paper"
            <https://arxiv.org/pdf/1707.06990.pdf>`_
    """

    def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16),
                 num_init_featuremaps=64, bn_size=4, drop_rate=0,
                 num_classes=1000, memory_efficient=False,
                 grayscale=False):

        super(DenseNet121, self).__init__()

        # First convolution
        if grayscale:
            in_channels = 1
        else:
            in_channels = 3

        self.features = nn.Sequential(OrderedDict([
            ('conv0', nn.Conv2d(in_channels=in_channels,
                                out_channels=num_init_featuremaps,
                                kernel_size=7, stride=2,
                                padding=3, bias=False)),  # bias is redundant when using batchnorm
            ('norm0', nn.BatchNorm2d(num_features=num_init_featuremaps)),
            ('relu0', nn.ReLU(inplace=True)),
            ('pool0', nn.MaxPool2d(kernel_size=3, stride=2, padding=1)),
        ]))

        # Each denseblock
        num_features = num_init_featuremaps
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(
                num_layers=num_layers,
                num_input_features=num_features,
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate,
                memory_efficient=memory_efficient
            )
            self.features.add_module('denseblock%d' % (i + 1), block)
            num_features = num_features + num_layers * growth_rate
            if i != len(block_config) - 1:
                trans = _Transition(num_input_features=num_features,
                                    num_output_features=num_features // 2)
                self.features.add_module('transition%d' % (i + 1), trans)
                num_features = num_features // 2

        # Final batch norm
        self.features.add_module('norm5', nn.BatchNorm2d(num_features))

        # Linear layer
        self.classifier = nn.Linear(num_features, num_classes)

        # Official init from torch repo.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        features = self.features(x)
        out = F.relu(features, inplace=True)
        out = F.adaptive_avg_pool2d(out, (1, 1))
        out = torch.flatten(out, 1)
        logits = self.classifier(out)
        probas = F.softmax(logits, dim=1)
        return logits, probas
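
The listing does not show the model instantiation or optimizer setup before the training loop. Mirroring the ResNet-152 section above, a minimal setup (an assumption, reusing the hyperparameters defined in the settings cell) would be:

torch.manual_seed(RANDOM_SEED)

# Instantiate DenseNet-121 for the 10 CIFAR-10 classes and move it to the GPU
model = DenseNet121(num_classes=NUM_CLASSES, grayscale=GRAYSCALE)
model.to(DEVICE)

optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)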

Training
def compute_acc(model, data_loader, device):
    correct_pred, num_examples = 0, 0
    model.eval()
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(device)
        targets = targets.to(device)

        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        assert predicted_labels.size() == targets.size()
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100
start_time = time.time()

cost_list = []
train_acc_list, valid_acc_list = [], []

for epoch in range(NUM_EPOCHS):

    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):

        features = features.to(DEVICE)
        targets = targets.to(DEVICE)

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()

        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

        #################################################
        ### CODE ONLY FOR LOGGING BEYOND THIS POINT
        #################################################
        cost_list.append(cost.item())
        if not batch_idx % 150:
            print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} | '
                  f'Batch {batch_idx:03d}/{len(train_loader):03d} |'
                  f' Cost: {cost:.4f}')

    model.eval()
    with torch.set_grad_enabled(False):  # save memory during inference

        train_acc = compute_acc(model, train_loader, device=DEVICE)
        valid_acc = compute_acc(model, valid_loader, device=DEVICE)

        print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d}\n'
              f'Train ACC: {train_acc:.2f} | Validation ACC: {valid_acc:.2f}')

        train_acc_list.append(train_acc)
        valid_acc_list.append(valid_acc)

    elapsed = (time.time() - start_time)/60
    print(f'Time elapsed: {elapsed:.2f} min')

elapsed = (time.time() - start_time)/60
print(f'Total Training Time: {elapsed:.2f} min')

Evaluation
plt.plot(cost_list, label='Minibatch cost')
plt.plot(np.convolve(cost_list,
                     np.ones(200)/200, mode='valid'),
         label='Running average')

plt.ylabel('Cross Entropy')
plt.xlabel('Iteration')
plt.legend()
plt.show()

plt.plot(np.arange(1, NUM_EPOCHS+1), train_acc_list, label='Training')
plt.plot(np.arange(1, NUM_EPOCHS+1), valid_acc_list, label='Validation')

plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

with torch.set_grad_enabled(False):
    test_acc = compute_acc(model=model,
                           data_loader=test_loader,
                           device=DEVICE)

    valid_acc = compute_acc(model=model,
                            data_loader=valid_loader,
                            device=DEVICE)

print(f'Validation ACC: {valid_acc:.2f}%')
print(f'Test ACC: {test_acc:.2f}%')

The code and explanation for GoogLeNet and SqueezeNet are provided in the accompanying RAR file.
