Assignment No 2 (Aleeza Anjum CS101)
10/10/2023
Ans:
Comparison of AlexNet and GoogLeNet:

AlexNet:
AlexNet was introduced by Krizhevsky et al. in the paper "ImageNet Classification with Deep Convolutional Neural Networks." It represents a significant milestone in the development of convolutional neural networks (CNNs) for image classification. The key points about AlexNet, based on the information provided in the paper, are as follows.
Objective: The primary goal of AlexNet was to compete in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) and improve performance on large-scale image classification.
Dataset: ImageNet contains over 15 million labeled high-resolution images belonging to approximately 22,000 categories. The ILSVRC competitions use a subset of roughly 1.2 million training images in 1000 categories, and AlexNet was trained on this subset.
Architecture:
AlexNet is a large, deep convolutional neural network with eight learned layers: five convolutional layers followed by three fully-connected layers.
Max-pooling layers follow the first, second, and fifth convolutional layers, and the final layer is a 1000-way softmax for classification.
The first convolutional layer filters the 224x224x3 input image with 96 kernels of size 11x11x3.
Rectified Linear Units (ReLUs) are used as the activation function throughout the network.
Local Response Normalization (LRN) is applied after the first and second convolutional layers.
Overlapping pooling is used in the pooling layers.
Dropout is applied in the first two fully-connected layers to reduce overfitting.
Training Techniques:
Stochastic Gradient Descent (SGD) was used with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005.
Data augmentation artificially enlarged the dataset through image translations and horizontal reflections.
The intensities of the RGB channels in training images were altered using PCA on the set of RGB pixel values.
Results:
AlexNet achieved top-1 and top-5 error rates of 37.5% and 17.0% on the ILSVRC-2010 test set, outperforming the previous state-of-the-art methods.
In the ILSVRC-2012 competition, AlexNet achieved a winning top-5 test error rate of 15.3%, compared with 26.2% for the second-best entry.
Impact: The success of AlexNet demonstrated the effectiveness of deep convolutional neural networks for image classification. It paved the way for subsequent advances in deep learning, particularly in computer vision.
In summary, AlexNet introduced several key architectural and training innovations that significantly improved the accuracy of image classification models and set a new standard in the field.

GoogLeNet:
GoogLeNet was introduced in the ILSVRC 2014 competition. It represents a significant advance in neural network architecture, particularly through its use of the Inception architecture. The key aspects of GoogLeNet are as follows.
Inception Architecture:
Variants: GoogLeNet is one particular incarnation of the Inception architecture. The authors also used one deeper and wider Inception network, but adding it to the ensemble improved the results only marginally.
Activation Function: All convolutions, including those inside Inception modules, use rectified linear activation.
Network Structure:
Receptive Field: The network's receptive field is 224x224, taking RGB color channels with mean subtraction.
Reduction Layers: The network places 1x1 "reduce" convolutions before the 3x3 and 5x5 convolutions to cut the number of input channels.
Efficiency: The network is designed for computational efficiency and practicality. Inference can therefore run on individual devices, including those with limited computational resources and a low memory footprint.
Architecture Details:
Depth: The network is 22 layers deep (27 layers when pooling layers are counted).
Auxiliary Classifiers: To counter the vanishing gradient problem, auxiliary classifiers are attached to intermediate layers. These take the form of smaller convolutional networks connected to the outputs of specific Inception modules.
Training: Asynchronous stochastic gradient descent with 0.9 momentum was used. A fixed learning rate schedule decreased the learning rate by 4% every 8 epochs.
Training Methodology:
DistBelief: GoogLeNet was trained using the DistBelief distributed machine learning system.
Training Time: A rough estimate suggests the network could be trained to convergence on a few high-end GPUs within a week, with memory usage as the main limitation.
Image Sampling: Various image-patch sampling methods were used during training, and photometric distortions helped combat overfitting.
Classification Challenge Results:
ILSVRC 2014: GoogLeNet achieved a top-5 error rate of 6.67%, ranking first in the ILSVRC 2014 classification challenge.
Ensemble Prediction: Seven versions of the same GoogLeNet model were trained independently and their predictions were ensembled, which improved performance.
Detection Challenge Results:
ILSVRC 2014 Detection Challenge: GoogLeNet also competed in the detection task, achieving a mean average precision (mAP) of 43.9% without bounding box regression.
Ensemble for Detection: An ensemble of 6 GoogLeNets was used to classify region proposals in the detection task.
In summary, GoogLeNet's success rests on its innovative Inception architecture, its efficient design, and its strong performance in both image classification and object detection during the ILSVRC 2014 competition.
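The 1x1 "reduce" layers described above are easiest to understand in code. Below is a minimal PyTorch sketch of one Inception module; it is not the authors' implementation. The module has four parallel branches whose outputs are concatenated along the channel axis. Two of the branches use 1x1 convolutions to shrink the channel count before the 3x3 and 5x5 convolutions. The class name InceptionModule and its argument names are hypothetical. The example widths are those we recall for the "inception (3a)" row of the GoogLeNet paper's architecture table.

```python
import torch
import torch.nn as nn


class InceptionModule(nn.Module):
    """Minimal Inception-style module (sketch, hypothetical names)."""

    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, c1, kernel_size=1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 "reduce" layer, then 3x3 convolution
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 "reduce" layer, then 5x5 convolution
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max pooling, then 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Every branch preserves height and width, so outputs concatenate on channels
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)


# Widths of the "inception (3a)" module: 64 + 128 + 32 + 32 = 256 output channels
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)
```

Here is why the reduction matters, ignoring biases. Applying 32 filters of size 5x5 directly to 192 input channels needs 5*5*192*32 = 153,600 weights. Reducing to 16 channels first needs 1*1*192*16 + 5*5*16*32 = 3,072 + 12,800 = 15,872 weights, roughly ten times fewer.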
VGG:
Key Features:
a. Consistency in Convolutional Layers:
VGG uses 3x3 convolutional filters throughout, so that stacked layers build up a spatial hierarchy of features.
b. Deep Stacking:
The deep architecture, with up to 19 weight layers, enables the network to learn complex hierarchical features. However, this depth makes training harder and increases computational requirements.
c. Pooling Layers:
Max-pooling layers downsample the feature maps, reducing spatial dimensions while retaining the most important information.
Versions:
a. VGG16:
VGG16 comprises 16 weight layers: 13 convolutional and 3 fully connected. It gained popularity for its competitive performance in image classification.
b. VGG19:
VGG19 extends VGG16 with three more convolutional layers (16 convolutional and 3 fully connected), giving a deeper representation suited to more complex tasks.
Applications:
VGG is used in diverse computer vision tasks, including image classification, object detection, segmentation, and feature extraction.
Challenges:
VGG is computationally intensive, and very deep plain networks of this kind are difficult to train. These challenges spurred later architectures, such as ResNet, that address them.
Legacy:
Newer architectures have surpassed VGG in performance and efficiency. Even so, its simplicity and effectiveness have secured its place in the history of deep learning, and it influenced later convolutional neural networks.

ResNet:
Introduction:
Deep convolutional neural networks have led to breakthroughs in image classification, and network depth is of central importance. The paper asks whether learning better networks is as easy as stacking more layers. It then addresses the degradation problem: as plain networks get deeper, accuracy saturates and then degrades. This is not caused by overfitting, because adding more layers to a suitably deep plain network leads to higher training error.
Related Work:
The paper discusses residual representations, citing VLAD and Fisher Vectors in image recognition and the Multigrid method for solving Partial Differential Equations. It also contrasts ResNets with "highway networks," whose shortcut connections use gating functions.
Deep Residual Learning:
The paper introduces the residual learning framework. Rather than fitting a desired underlying mapping H(x) directly, a few stacked layers fit the residual mapping F(x) = H(x) - x, so the original mapping becomes F(x) + x. This is realized with feedforward neural networks using "shortcut connections," which add a block's input to its output. The paper discusses two kinds of shortcut. Identity shortcuts add no parameters. Projection shortcuts use 1x1 convolutions to match dimensions when they change. Network architectures for plain and residual networks are presented.
Experiments:
The networks are evaluated on the ImageNet 2012 classification dataset. Plain networks and ResNets with 18 and 34 layers are compared, and the effectiveness of identity vs. projection shortcuts is studied. Deeper bottleneck architectures with 50, 101, and 152 layers are also introduced.
Comparative Analysis:
The paper compares ResNets with the state-of-the-art methods of the time. The 34-layer plain network has higher training error than the 18-layer plain network, whereas the 34-layer ResNet outperforms the 18-layer ResNet. This shows that residual learning lets accuracy improve with depth. The 152-layer ResNet achieves the best results among the tested depths on the ImageNet 2012 classification dataset. An ensemble of ResNets won the ILSVRC 2015 classification task with a 3.57% top-5 error.
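The residual formulation above can be sketched as a small PyTorch module. The block computes relu(F(x) + shortcut(x)). It uses an identity shortcut when the input and output shapes match and a 1x1 projection shortcut otherwise. This is a simplified illustration in the style of the paper's basic block; the class name ResidualBlock is hypothetical.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Sketch of a basic residual block: relu(F(x) + shortcut(x))."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # F(x): two 3x3 convolutions with batch normalization
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch))
        if stride == 1 and in_ch == out_ch:
            # Identity shortcut: no extra parameters
            self.shortcut = nn.Identity()
        else:
            # Projection shortcut: 1x1 convolution matching the shape of F(x)
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.residual(x) + self.shortcut(x))


same = ResidualBlock(64, 64)             # identity shortcut
down = ResidualBlock(64, 128, stride=2)  # projection shortcut
x = torch.randn(2, 64, 16, 16)
print(same(x).shape, down(x).shape)
```

Because F(x) is added to x, a block whose residual branch outputs zeros reduces to an identity mapping (up to the final ReLU). This is why extra residual blocks should not make the network harder to optimize than a shallower one.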
Part-2:
AlexNet CIFAR-10 Classifier:
This Python code checks whether a CUDA GPU is available and, if so, configures the cuDNN library to use deterministic algorithms. It also imports the modules needed to handle data, define neural network architectures, and transform image datasets.
import os
import time
import random

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Subset
from torchvision import datasets, transforms

if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
Model Settings
This code defines a function, set_all_seeds, that takes a seed as input and sets the random seed
for various libraries to ensure reproducibility in machine learning experiments. It covers global
seed setting for PyTorch, NumPy, Python's built-in random module, and PyTorch's CUDA
operations on GPU if available.
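The set_all_seeds function itself is not reproduced in this document. A minimal sketch consistent with the description above could look like the following. It is an assumption, not the original code.

```python
import random

import numpy as np
import torch


def set_all_seeds(seed):
    """Seed Python's random module, NumPy, and PyTorch (CPU and CUDA)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Seeds every visible GPU; silently ignored when CUDA is unavailable
    torch.cuda.manual_seed_all(seed)


# Re-seeding reproduces the same random draws in all three libraries
set_all_seeds(1)
first = (random.random(), np.random.rand(), torch.rand(3))
set_all_seeds(1)
second = (random.random(), np.random.rand(), torch.rand(3))
print(first[0] == second[0], first[1] == second[1], torch.equal(first[2], second[2]))
```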
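# Setting cuDNN and PyTorch algorithmic behavior to deterministic
def set_deterministic():
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
    # torch.set_deterministic() was replaced by this call in newer PyTorch versions.
    # On CUDA >= 10.2 some cuBLAS operations also need the environment variable
    # CUBLAS_WORKSPACE_CONFIG (e.g. ":4096:8") to be set.
    torch.use_deterministic_algorithms(True)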
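This function makes runs reproducible in three ways. It turns off cuDNN's benchmark mode, which may select different algorithms from run to run. When a GPU is available, it forces cuDNN to use deterministic kernels. Finally, it tells PyTorch to use deterministic algorithms and to raise an error for any operation that has no deterministic implementation.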
##########################
### SETTINGS
##########################
# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.0001
BATCH_SIZE = 256
NUM_EPOCHS = 40
# Architecture
NUM_CLASSES = 10
# Other
DEVICE = "cuda:0"
set_all_seeds(RANDOM_SEED)
import sys
Dataset
### Set random seed ###
set_all_seeds(RANDOM_SEED)
##########################
### Dataset
##########################
Here we set a random seed and define the transformations for the CIFAR-10 training and test data. We then use these transformations to create data loaders for training, validation, and testing. The loaders are configured with the specified batch size, number of workers, and related parameters.
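The dataset code is not shown above. A minimal sketch of this step follows, under the following assumptions:

- Training images are augmented with random crops (translations) and horizontal flips.
- All images are normalized with per-channel mean and standard deviation of 0.5.
- The first 1,000 training images are held out for validation.

The helper names cifar10_datasets and make_loaders are hypothetical. torchvision is imported lazily so that the loader logic can be checked offline with random tensors.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset


def cifar10_datasets(root='data'):
    """Load CIFAR-10 with light training-time augmentation (requires torchvision)."""
    from torchvision import datasets, transforms  # imported lazily
    normalize = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # assumed statistics
    train_tf = transforms.Compose([
        transforms.RandomCrop(32, padding=4),  # random translations
        transforms.RandomHorizontalFlip(),     # horizontal reflections
        transforms.ToTensor(),
        normalize])
    test_tf = transforms.Compose([transforms.ToTensor(), normalize])
    train_and_valid = datasets.CIFAR10(root, train=True, transform=train_tf, download=True)
    test_set = datasets.CIFAR10(root, train=False, transform=test_tf, download=True)
    return train_and_valid, test_set


def make_loaders(train_and_valid, test_set, num_valid=1000,
                 batch_size=256, num_workers=0):
    """Hold out the first num_valid training examples for validation."""
    n = len(train_and_valid)
    valid_set = Subset(train_and_valid, range(num_valid))
    train_set = Subset(train_and_valid, range(num_valid, n))
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True,
                              drop_last=True, num_workers=num_workers)
    valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=False,
                              num_workers=num_workers)
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False,
                             num_workers=num_workers)
    return train_loader, valid_loader, test_loader


# Offline check with random tensors standing in for CIFAR-10
fake = TensorDataset(torch.randn(1100, 3, 32, 32), torch.randint(0, 10, (1100,)))
train_loader, valid_loader, test_loader = make_loaders(fake, fake, num_valid=100,
                                                       batch_size=50)
images, labels = next(iter(valid_loader))
print(len(train_loader.dataset), len(valid_loader.dataset), tuple(images.shape))
```

In real use, call make_loaders(*cifar10_datasets()). Because the validation subset shares the training dataset object, it also receives the training augmentation. Building a second CIFAR10 instance with test_tf for validation would avoid that.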
# Checking the dataset
print('Training Set:\n')
for images, labels in train_loader:
    print('Image batch dimensions:', images.size())
    print('Image label dimensions:', labels.size())
    print(labels[:10])
    break
Model
##########################
### MODEL
##########################
class AlexNet(nn.Module):
This code defines the architecture of AlexNet, a convolutional neural network (CNN), using the
PyTorch framework. The model consists of convolutional layers, ReLU activation functions,
max-pooling layers, and fully connected layers, culminating in a softmax output for
classification.
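The class body is not included above. As a sketch only, an AlexNet-style network adapted to 32x32 CIFAR-10 images might look like the code below. It keeps AlexNet's overall structure at a smaller scale: five convolutional layers with ReLU, three max-pooling layers, and three fully connected layers with dropout. The kernel sizes, strides, and channel widths are assumptions, not the original notebook's values. The model returns logits. Softmax is applied implicitly by F.cross_entropy during training, or explicitly with torch.softmax when probabilities are needed.

```python
import torch
import torch.nn as nn


class AlexNet(nn.Module):
    """AlexNet-style CNN for 3x32x32 inputs (sketch; layer widths are assumptions)."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),                             # -> 8x8
            nn.Conv2d(64, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),                             # -> 4x4
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),                             # -> 2x2
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 2 * 2, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)  # logits


model = AlexNet(num_classes=10)
logits = model(torch.randn(2, 3, 32, 32))
probas = torch.softmax(logits, dim=1)
print(logits.shape, probas.sum(dim=1))
```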
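torch.manual_seed(RANDOM_SEED)
model = AlexNet(NUM_CLASSES)
model.to(DEVICE)
# The optimizer is not shown in the original; Adam with LEARNING_RATE is assumed here
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)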
Training
log_dict = train_classifier_simple_v1(num_epochs=NUM_EPOCHS, model=model,
                                      optimizer=optimizer, device=DEVICE,
                                      train_loader=train_loader,
                                      valid_loader=valid_loader,
                                      logging_interval=50)
This snippet trains the classifier for NUM_EPOCHS epochs with the train_classifier_simple_v1 helper function, which is not shown in this document. The helper uses the given model, optimizer, and data loaders for training and validation, and prints progress every 50 minibatches. The training history is returned in log_dict.
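Since the helper itself is not reproduced, here is a hypothetical re-implementation that fits the call above and the log_dict keys used in the Evaluation section. It is a sketch under those assumptions, not the original helper. It uses plain cross-entropy training and computes accuracy once per epoch. It accepts models that return either logits or a (logits, probas) tuple.

```python
import time

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def compute_accuracy(model, data_loader, device):
    """Percentage of correctly classified examples (hypothetical helper)."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for features, targets in data_loader:
            features, targets = features.to(device), targets.to(device)
            logits = model(features)
            if isinstance(logits, tuple):  # some models return (logits, probas)
                logits = logits[0]
            correct += (logits.argmax(dim=1) == targets).sum().item()
            total += targets.size(0)
    return 100.0 * correct / total


def train_classifier_simple_v1(num_epochs, model, optimizer, device,
                               train_loader, valid_loader=None,
                               logging_interval=50):
    """Hypothetical sketch of the training helper called above."""
    log_dict = {'train_loss_per_batch': [], 'train_acc_per_epoch': [],
                'valid_acc_per_epoch': []}
    start_time = time.time()
    for epoch in range(num_epochs):
        model.train()
        for batch_idx, (features, targets) in enumerate(train_loader):
            features, targets = features.to(device), targets.to(device)
            logits = model(features)
            if isinstance(logits, tuple):
                logits = logits[0]
            loss = F.cross_entropy(logits, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            log_dict['train_loss_per_batch'].append(loss.item())
            if not batch_idx % logging_interval:
                print(f'Epoch {epoch+1:03d}/{num_epochs:03d} | '
                      f'Batch {batch_idx:04d}/{len(train_loader):04d} | '
                      f'Loss {loss.item():.4f}')
        log_dict['train_acc_per_epoch'].append(
            compute_accuracy(model, train_loader, device))
        if valid_loader is not None:
            log_dict['valid_acc_per_epoch'].append(
                compute_accuracy(model, valid_loader, device))
        print(f'Time elapsed: {(time.time() - start_time) / 60:.2f} min')
    return log_dict


# Tiny usage example: 64 samples in batches of 16, trained for 2 epochs
torch.manual_seed(0)
X = torch.randn(64, 4)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=16)
net = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(net.parameters(), lr=0.1)
log = train_classifier_simple_v1(2, net, opt, 'cpu', loader, loader)
```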
Evaluation
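import matplotlib.pyplot as plt
%matplotlib inline

loss_list = log_dict['train_loss_per_batch']
plt.plot(loss_list, label='Minibatch loss')
# running average over a window of 200 iterations
plt.plot(np.convolve(loss_list, np.ones(200) / 200, mode='valid'),
         label='Running average')
plt.ylabel('Cross Entropy')
plt.xlabel('Iteration')
plt.legend()
plt.show()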
Here we visualize the training loss per batch using matplotlib. It plots both the original minibatch
loss and its running average with a window size of 200 iterations, providing insights into the
training convergence over time.
plt.plot(np.arange(1, NUM_EPOCHS+1), log_dict['train_acc_per_epoch'],
         label='Training')
plt.plot(np.arange(1, NUM_EPOCHS+1), log_dict['valid_acc_per_epoch'],
         label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
We’re using Matplotlib to plot the training and validation accuracy over epochs, visualizing the
model's learning progress during training.
with torch.set_grad_enabled(False):
    train_acc = compute_accuracy(model=model,
                                 data_loader=train_loader,
                                 device=DEVICE)
    test_acc = compute_accuracy(model=model,
                                data_loader=test_loader,
                                device=DEVICE)
    valid_acc = compute_accuracy(model=model,
                                 data_loader=valid_loader,
                                 device=DEVICE)

print(f'Train ACC: {train_acc:.2f}%')
print(f'Validation ACC: {valid_acc:.2f}%')
print(f'Test ACC: {test_acc:.2f}%')
In this code snippet, the compute_accuracy function computes the model's training, validation, and test accuracies, which are then printed. Each metric must be computed on its own data loader and printed under its own label. Computing train_acc on test_loader, or printing valid_acc on both the training and validation lines, would misreport the results.
VGG 16:
import time
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
This code sets up a PyTorch environment for deep learning, checking for GPU availability and
ensuring deterministic behavior for CUDA operations if a GPU is present. It includes essential
imports for handling datasets and neural network operations.
# Device
DEVICE = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")
print('Device:', DEVICE)
# Hyperparameters
random_seed = 1
learning_rate = 0.001
num_epochs = 10
batch_size = 128
# Architecture
num_features = 784
num_classes = 10
Here we set up the environment and hyperparameters for the model in PyTorch. The code selects the device (a GPU if available, otherwise the CPU), sets hyperparameters such as the learning rate and batch size, and defines the number of output classes. Note that num_features = 784 (28x28) is left over from an MNIST template. CIFAR-10 images are 3x32x32, and the VGG16 model below does not use this value.
##########################
### CIFAR-10 DATASET
##########################
train_dataset = datasets.CIFAR10(root='data',
                                 train=True,
                                 transform=transforms.ToTensor(),
                                 download=True)
test_dataset = datasets.CIFAR10(root='data',
                                train=False,
                                transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)
Model
##########################
### MODEL
##########################
class VGG16(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super().__init__()

        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=3,
                      out_channels=64,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      # same padding: (1*(32-1) - 32 + 3)/2 = 1
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=64,
                      out_channels=64,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_2 = nn.Sequential(
            nn.Conv2d(in_channels=64,
                      out_channels=128,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=128,
                      out_channels=128,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_3 = nn.Sequential(
            nn.Conv2d(in_channels=128,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_4 = nn.Sequential(
            nn.Conv2d(in_channels=256,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_5 = nn.Sequential(
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        # After five 2x2 poolings a 32x32 input is 1x1x512
        self.classifier = nn.Sequential(
            nn.Linear(512, 4096),
            nn.ReLU(True),
            #nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            #nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes),
        )

        for m in self.modules():
            if isinstance(m, torch.nn.Conv2d) or isinstance(m, torch.nn.Linear):
                nn.init.kaiming_uniform_(m.weight, mode='fan_in',
                                         nonlinearity='relu')
                if m.bias is not None:
                    m.bias.detach().zero_()

    def forward(self, x):
        x = self.block_1(x)
        x = self.block_2(x)
        x = self.block_3(x)
        x = self.block_4(x)
        x = self.block_5(x)
        #x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        logits = self.classifier(x)
        probas = F.softmax(logits, dim=1)
        return logits, probas


torch.manual_seed(random_seed)
model = VGG16(num_features=num_features,
              num_classes=num_classes)
model = model.to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
This code defines a VGG16 convolutional neural network (CNN) in PyTorch. VGG16 is a deep neural network commonly used for image classification. The architecture consists of five convolutional blocks, each ending in a max-pooling layer, followed by a fully connected classifier. The weights use Kaiming (He) uniform initialization. The forward method returns both the output logits and the softmax probabilities for a given input. The model is then instantiated with a fixed random seed for reproducibility, moved to the specified device (e.g., a GPU), and paired with an Adam optimizer for training.
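The five-block structure can also be written as a per-block layer count, which makes the difference between VGG16 and VGG19 explicit. The sketch below builds both variants for 32x32 inputs and counts their weight layers. The names make_vgg and VGG_CFG are hypothetical and not part of the original code. Each of the five max-pooling layers halves the spatial size (32, 16, 8, 4, 2, 1). The flattened feature vector therefore has 512 entries, which is why the classifier starts with nn.Linear(512, 4096).

```python
import torch
import torch.nn as nn

# Number of 3x3 conv layers in each of the five blocks (widths 64, 128, 256, 512, 512)
VGG_CFG = {'vgg16': [2, 2, 3, 3, 3], 'vgg19': [2, 2, 4, 4, 4]}


def make_vgg(name, num_classes=10):
    """Build a VGG-style network for 32x32 inputs from VGG_CFG (sketch)."""
    layers, in_ch = [], 3
    for n_convs, out_ch in zip(VGG_CFG[name], [64, 128, 256, 512, 512]):
        for _ in range(n_convs):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves height and width
    classifier = [nn.Flatten(),
                  nn.Linear(512, 4096), nn.ReLU(inplace=True),
                  nn.Linear(4096, 4096), nn.ReLU(inplace=True),
                  nn.Linear(4096, num_classes)]
    return nn.Sequential(*layers, *classifier)


counts = {}
for name in VGG_CFG:
    net = make_vgg(name)
    counts[name] = sum(isinstance(m, (nn.Conv2d, nn.Linear)) for m in net.modules())
    out = net(torch.randn(1, 3, 32, 32))
    print(name, counts[name], tuple(out.shape))
```

The loop prints 16 weight layers (13 convolutional + 3 fully connected) for VGG16 and 19 (16 + 3) for VGG19.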
Training
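def compute_accuracy(model, data_loader):
    model.eval()
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float() / num_examples * 100

def compute_epoch_loss(model, data_loader):
    model.eval()
    curr_loss, num_examples = 0., 0
    with torch.no_grad():
        for features, targets in data_loader:
            features = features.to(DEVICE)
            targets = targets.to(DEVICE)
            logits, probas = model(features)
            loss = F.cross_entropy(logits, targets, reduction='sum')
            num_examples += targets.size(0)
            curr_loss += loss
    return curr_loss / num_examples

start_time = time.time()
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
                  % (epoch+1, num_epochs, batch_idx,
                     len(train_loader), cost))
    model.eval()
    with torch.set_grad_enabled(False):  # save memory during inference
        print('Epoch: %03d/%03d | Train: %.3f%% | Loss: %.3f' % (
            epoch+1, num_epochs,
            compute_accuracy(model, train_loader),
            compute_epoch_loss(model, train_loader)))
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))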
It defines functions for computing accuracy and epoch loss of a PyTorch neural network model.
It then performs training over multiple epochs using a specified training loader. In each epoch, it
iterates through batches of training data, computes the forward and backward propagation,
updates the model parameters, and logs the training cost. The code also evaluates and prints the
training accuracy and loss at the end of each epoch. Additionally, it measures and displays the
total training time. The training process is implemented with the cross-entropy loss and uses the
Adam optimizer.
Evaluation
with torch.set_grad_enabled(False):  # save memory during inference
    print('Test accuracy: %.2f%%' % compute_accuracy(model, test_loader))
The PyTorch function torch.set_grad_enabled(False) disables gradient tracking, which saves memory during inference. The following line prints the model's accuracy on the test set using the compute_accuracy function defined in the Training section above.
VGG 19:
import numpy as np
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
Settings and Dataset
##########################
### SETTINGS
##########################
# Device
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Device:', DEVICE)
# Hyperparameters
random_seed = 1
learning_rate = 0.001
num_epochs = 20
batch_size = 128
# Architecture
num_features = 784
num_classes = 10
##########################
### CIFAR-10 DATASET
##########################
train_dataset = datasets.CIFAR10(root='data',
                                 train=True,
                                 transform=transforms.ToTensor(),
                                 download=True)
test_dataset = datasets.CIFAR10(root='data',
                                train=False,
                                transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)
# Checking the dataset
for images, labels in train_loader:
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break
Here we set up the configuration and load the CIFAR-10 dataset with PyTorch's DataLoader. The settings specify the device (CPU or GPU), hyperparameters such as learning rate and batch size, and the number of output classes (10). The value num_features = 784 is left over from an MNIST template, and the CIFAR-10 model does not use it. CIFAR-10 comes with separate training and test sets, which are loaded and converted to tensors. Finally, we iterate over the training loader once to print the dimensions of an image batch and its labels.
Model
##########################
### MODEL
##########################
class VGG19(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super().__init__()

        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=3,
                      out_channels=64,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      # same padding: (1*(32-1) - 32 + 3)/2 = 1
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=64,
                      out_channels=64,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_2 = nn.Sequential(
            nn.Conv2d(in_channels=64,
                      out_channels=128,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=128,
                      out_channels=128,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_3 = nn.Sequential(
            nn.Conv2d(in_channels=128,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_4 = nn.Sequential(
            nn.Conv2d(in_channels=256,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_5 = nn.Sequential(
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.classifier = nn.Sequential(
            nn.Linear(512, 4096),
            nn.ReLU(True),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Linear(4096, num_classes)
        )

        for m in self.modules():
            if isinstance(m, torch.nn.Conv2d):
                #n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                #m.weight.data.normal_(0, np.sqrt(2. / n))
                m.weight.detach().normal_(0, 0.05)
                if m.bias is not None:
                    m.bias.detach().zero_()
            elif isinstance(m, torch.nn.Linear):
                m.weight.detach().normal_(0, 0.05)
                m.bias.detach().zero_()

    def forward(self, x):
        x = self.block_1(x)
        x = self.block_2(x)
        x = self.block_3(x)
        x = self.block_4(x)
        x = self.block_5(x)
        logits = self.classifier(x.view(-1, 512))
        probas = F.softmax(logits, dim=1)
        return logits, probas


torch.manual_seed(random_seed)
model = VGG19(num_features=num_features,
              num_classes=num_classes)
model = model.to(DEVICE)
# The optimizer is not shown in the original; Adam, as in the VGG16 example, is assumed
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
Training
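def compute_accuracy(model, data_loader):
    model.eval()
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float() / num_examples * 100

def compute_epoch_loss(model, data_loader):
    model.eval()
    curr_loss, num_examples = 0., 0
    with torch.no_grad():
        for features, targets in data_loader:
            features = features.to(DEVICE)
            targets = targets.to(DEVICE)
            logits, probas = model(features)
            loss = F.cross_entropy(logits, targets, reduction='sum')
            num_examples += targets.size(0)
            curr_loss += loss
    return curr_loss / num_examples

start_time = time.time()
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
                  % (epoch+1, num_epochs, batch_idx,
                     len(train_loader), cost))
    model.eval()
    with torch.set_grad_enabled(False):  # save memory during inference
        print('Epoch: %03d/%03d | Train: %.3f%% | Loss: %.3f' % (
            epoch+1, num_epochs,
            compute_accuracy(model, train_loader),
            compute_epoch_loss(model, train_loader)))
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))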
Evaluation
with torch.set_grad_enabled(False):  # save memory during inference
    print('Test accuracy: %.2f%%' % compute_accuracy(model, test_loader))
ResNet 18:
import os
import time

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
Model Settings
##########################
### SETTINGS
##########################
# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.001
BATCH_SIZE = 128
NUM_EPOCHS = 10
# Architecture
NUM_FEATURES = 28*28
NUM_CLASSES = 10
# Other
DEVICE = "cuda:1"
GRAYSCALE = True
This code sets hyperparameters and configurations for a neural network model. It specifies settings
such as random seed, learning rate, batch size, number of epochs, architecture details (number of
features and classes), device for computation (CUDA), and whether the data is grayscale. These
settings are crucial for training and evaluating a neural network on image data.
MNIST Dataset
##########################
### MNIST DATASET
##########################
train_dataset = datasets.MNIST(root='data',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)
test_dataset = datasets.MNIST(root='data',
                              train=False,
                              transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=BATCH_SIZE,
                          shuffle=True)
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=BATCH_SIZE,
                         shuffle=False)
# Sanity check: move one training batch to the device
for x, y in train_loader:
    x = x.to(DEVICE)
    y = y.to(DEVICE)
    break
##########################
### MODEL
##########################
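class BasicBlock(nn.Module):
    expansion = 1
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, 3, stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(planes, planes, 3, 1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
    def forward(self, x):
        residual = x if self.downsample is None else self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += residual
        out = self.relu(out)
        return out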
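class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes, grayscale):
        super().__init__()
        self.inplanes = 64
        in_dim = 1 if grayscale else 3
        self.conv1 = nn.Conv2d(in_dim, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, (2. / n)**.5)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            # 1x1 projection shortcut when the shape changes
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion))
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # because MNIST is already 1x1 here: disable avg pooling
        #x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        logits = self.fc(x)
        probas = F.softmax(logits, dim=1)
        return logits, probas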
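def resnet18(num_classes):
    """Constructs a ResNet-18 model."""
    model = ResNet(block=BasicBlock,
                   layers=[2, 2, 2, 2],
                   num_classes=num_classes,
                   grayscale=GRAYSCALE)
    return model

torch.manual_seed(RANDOM_SEED)
model = resnet18(NUM_CLASSES)
model.to(DEVICE)
# The optimizer is not shown in the original; Adam with LEARNING_RATE is assumed here
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)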
This code defines a convolutional neural network (CNN) based on the ResNet architecture for the MNIST dataset. It first sets up the MNIST dataset and data loaders. It then defines ResNet-18, built from basic residual blocks, and instantiates it for grayscale input. The model uses convolutional layers, batch normalization, ReLU activations, and residual connections, which make a deep architecture easier to optimize. It is then trained for NUM_EPOCHS (10) epochs.
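The layers=[2, 2, 2, 2] argument in resnet18 above determines where the "18" comes from. By the paper's convention, the named depth counts the first convolution, every convolution inside the residual blocks, and the final fully connected layer. A basic block contains two convolutions and a bottleneck block contains three; the 1x1 projection shortcuts are not counted. The worked example below uses a hypothetical helper, resnet_depth. The block configurations are those listed in the ResNet paper for each depth.

```python
def resnet_depth(blocks_per_stage, convs_per_block):
    """Stem conv + convolutions in all residual blocks + final fully connected layer."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


configs = {
    'ResNet-18': ([2, 2, 2, 2], 2),    # BasicBlock
    'ResNet-34': ([3, 4, 6, 3], 2),    # BasicBlock
    'ResNet-50': ([3, 4, 6, 3], 3),    # Bottleneck
    'ResNet-101': ([3, 4, 23, 3], 3),  # Bottleneck
    'ResNet-152': ([3, 8, 36, 3], 3),  # Bottleneck
}
depths = {name: resnet_depth(blocks, convs) for name, (blocks, convs) in configs.items()}
print(depths)
```

For example, ResNet-101 has 3 + 4 + 23 + 3 = 33 bottleneck blocks, and 1 + 3 * 33 + 1 = 101.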
Training
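def compute_accuracy(model, data_loader, device):
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float() / num_examples * 100

start_time = time.time()
for epoch in range(NUM_EPOCHS):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
                  % (epoch+1, NUM_EPOCHS, batch_idx,
                     len(train_loader), cost))
    model.eval()
    with torch.set_grad_enabled(False):  # save memory during inference
        print('Epoch: %03d/%03d | Train: %.3f%%' % (
            epoch+1, NUM_EPOCHS,
            compute_accuracy(model, train_loader, device=DEVICE)))
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))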
Evaluation
with torch.set_grad_enabled(False):  # save memory during inference
    print('Test accuracy: %.2f%%' % compute_accuracy(model, test_loader,
                                                     device=DEVICE))
# Take the first test batch and inspect the prediction for its first image
for batch_idx, (features, targets) in enumerate(test_loader):
    break

model.eval()
with torch.no_grad():
    logits, probas = model(features.to(DEVICE)[0, None])
print('Probability of class 7: %.2f%%' % (probas[0][7]*100))
ResNet 101:
import os
import time

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
Settings
##########################
### SETTINGS
##########################
# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.01
NUM_EPOCHS = 50
# Architecture
NUM_CLASSES = 10
BATCH_SIZE = 128
DEVICE = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
GRAYSCALE = False
Dataset
##########################
### CIFAR-10 Dataset
##########################
train_and_valid = datasets.CIFAR10(root='data',
                                   train=True,
                                   transform=transforms.ToTensor(),
                                   download=True)
test_dataset = datasets.CIFAR10(root='data',
                                train=False,
                                transform=transforms.ToTensor())
# The train/validation split is not shown in the original; holding out
# the first 1,000 training images for validation is an assumption
valid_dataset = torch.utils.data.Subset(train_and_valid, range(0, 1000))
train_dataset = torch.utils.data.Subset(train_and_valid, range(1000, 50000))
#####################################################
### Data Loaders
#####################################################
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=BATCH_SIZE,
                          num_workers=8,
                          shuffle=True)
valid_loader = DataLoader(dataset=valid_dataset,
                          batch_size=BATCH_SIZE,
                          num_workers=8,
                          shuffle=False)
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=BATCH_SIZE,
                         num_workers=8,
                         shuffle=False)
#####################################################
Model
##########################
### MODEL
##########################
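class Bottleneck(nn.Module):
    expansion = 4
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
    def forward(self, x):
        residual = x if self.downsample is None else self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += residual
        out = self.relu(out)
        return out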
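class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes, grayscale):
        super().__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(1 if grayscale else 3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, (2. / n)**.5)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion))
        layers = [block(self.inplanes, planes, stride, downsample)]
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)
    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        #x = self.avgpool(x)  # CIFAR-10 feature maps are already 1x1 here
        x = x.view(x.size(0), -1)
        logits = self.fc(x)
        probas = F.softmax(logits, dim=1)
        return logits, probas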
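##########################
### COST AND OPTIMIZER
##########################
torch.manual_seed(RANDOM_SEED)
model = ResNet(Bottleneck, [3, 4, 23, 3], NUM_CLASSES, GRAYSCALE).to(DEVICE)  # ResNet-101
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # assumed; not shown in the original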
Training
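def compute_accuracy(model, data_loader, device):
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float() / num_examples * 100
for epoch in range(NUM_EPOCHS):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        features, targets = features.to(DEVICE), targets.to(DEVICE)
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        ### LOGGING
        if not batch_idx % 120:
            print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} | '
                  f'Batch {batch_idx:03d}/{len(train_loader):03d} |'
                  f' Cost: {cost:.4f}')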
Evaluation
with torch.set_grad_enabled(False): # save memory during inference
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader,
device=DEVICE)))
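The training loop calls `cost.backward()` on a classification cost computed from `logits`. Assuming that cost is `F.cross_entropy(logits, targets)` (log-softmax followed by negative log-likelihood), the per-example value and its gradient with respect to the logits can be written out by hand; the plain-Python sketch below does that for one example with made-up logits. The function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def grad_wrt_logits(logits, target):
    """d(cross_entropy)/d(logits) = softmax(logits) - one_hot(target)."""
    p = softmax(logits)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]


logits = [2.0, 1.0, 0.1]                   # made-up scores for 3 classes
loss = cross_entropy(logits, target=0)
grad = grad_wrt_logits(logits, target=0)
print(round(loss, 4), [round(g, 4) for g in grad])
```

The gradient entries sum to zero, and `F.cross_entropy` averages the per-example loss over the minibatch by default (`reduction='mean'`). This is also why the models return raw `logits` next to `probas`: computing cross-entropy on already-softmaxed values would apply softmax twice.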
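ResNet-152:
Imports
import os
import time
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True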
Settings
##########################
### SETTINGS
##########################
# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.001
NUM_EPOCHS = 10
# Architecture
NUM_FEATURES = 128*128
NUM_CLASSES = 2
BATCH_SIZE = 128
DEVICE = 'cuda:2' # default GPU device
GRAYSCALE = False
Dataset
Downloading the Dataset
1) Download and unzip the file img_align_celeba.zip, which contains the images in JPEG format.
2) Download the list_attr_celeba.txt file, which contains the class labels.
3) Download the list_eval_partition.txt file, which contains the training/validation/test partitioning info.
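# The header row lists 40 attribute names while each data row starts with a
# filename, so pandas uses the filename column as the index
df1 = pd.read_csv('list_attr_celeba.txt', sep=r"\s+", skiprows=1)
df1 = df1[['Male']]
# Make 0 (not male) and 1 (male) labels instead of -1 and 1
df1.loc[df1['Male'] == -1, 'Male'] = 0
df1.head()
df2 = pd.read_csv('list_eval_partition.txt', sep=r"\s+", skiprows=0,
                  header=None)
df2.columns = ['Filename', 'Partition']
df2 = df2.set_index('Filename')
df2.head()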
df3 = df1.merge(df2, left_index=True, right_index=True)
df3.head()
df3.to_csv('celeba-gender-partitions.csv')
df4 = pd.read_csv('celeba-gender-partitions.csv', index_col=0)
df4.head()
df4.loc[df4['Partition'] == 0].to_csv('celeba-gender-train.csv')
df4.loc[df4['Partition'] == 1].to_csv('celeba-gender-valid.csv')
df4.loc[df4['Partition'] == 2].to_csv('celeba-gender-test.csv')
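The three `to_csv` calls rely on CelebA's partition codes in `list_eval_partition.txt` (0 = training, 1 = validation, 2 = test), the `merge` on the two indexes is an inner join (the pandas default), and the `Male` attribute in `list_attr_celeba.txt` is stored as -1/1 rather than 0/1. The stdlib sketch below mirrors that join-relabel-split logic on four made-up rows; `parse` and `split_by_partition` are hypothetical helpers.

```python
# Toy rows (made-up values) in the "filename value" layout of the CelebA
# annotation files; the real list_attr_celeba.txt also has a count line,
# a header line and 40 attribute columns.
ATTR_MALE = """000001.jpg -1
000002.jpg 1
000003.jpg 1
000004.jpg -1"""

PARTITION = """000001.jpg 0
000002.jpg 0
000003.jpg 1
000004.jpg 2"""


def parse(text):
    """Map filename -> integer value for whitespace-separated lines."""
    return {name: int(value)
            for name, value in (line.split() for line in text.splitlines())}


def split_by_partition(attr_text, partition_text):
    """Inner-join on filename, relabel -1/1 as 0/1, bucket by partition code."""
    male, partition = parse(attr_text), parse(partition_text)
    names = ("train", "valid", "test")              # partition codes 0, 1, 2
    splits = {name: [] for name in names}
    for filename in sorted(male.keys() & partition.keys()):
        label = 1 if male[filename] == 1 else 0     # -1 (not male) -> 0
        splits[names[partition[filename]]].append((filename, label))
    return splits


print(split_by_partition(ATTR_MALE, PARTITION))
```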
img = Image.open('img_align_celeba/000001.jpg')
print(np.asarray(img, dtype=np.uint8).shape)
plt.imshow(img);
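class CelebaDataset(Dataset):
    """Custom Dataset for loading CelebA face images"""

    def __init__(self, csv_path, img_dir, transform=None):
        df = pd.read_csv(csv_path, index_col=0)
        self.img_dir = img_dir
        self.csv_path = csv_path
        self.img_names = df.index.values
        self.y = df['Male'].values
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(os.path.join(self.img_dir, self.img_names[index]))
        if self.transform is not None:
            img = self.transform(img)
        label = self.y[index]
        return img, label

    def __len__(self):
        return self.y.shape[0]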
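# Note that transforms.ToTensor() already divides pixels by 255. internally;
# resizing to 128x128 is assumed here from NUM_FEATURES = 128*128
custom_transform = transforms.Compose([transforms.Resize((128, 128)),
                                       transforms.ToTensor()])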
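The comment above refers to `transforms.ToTensor()`, which converts a PIL image (or a uint8 HxWxC array) into a float tensor laid out as CxHxW and, for 8-bit images, scales values from [0, 255] to [0.0, 1.0]. The plain-Python sketch below mimics those two steps on a tiny 2x2 RGB image with made-up pixel values; `to_tensor_like` is a hypothetical name, not a torchvision function.

```python
def to_tensor_like(image_hwc):
    """Mimic ToTensor on nested lists: HxWxC uint8 values -> CxHxW floats in [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [[[image_hwc[h][w][c] / 255.0 for w in range(width)]
             for h in range(height)]
            for c in range(channels)]


# 2x2 RGB image (made-up values): red, green / blue, white
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
chw = to_tensor_like(image)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2
print(chw[0])                                 # [[1.0, 0.0], [0.0, 1.0]]
```

Because the scaling already happens inside `ToTensor`, adding a separate divide-by-255 step would shrink the inputs a second time.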
train_dataset = CelebaDataset(csv_path='celeba-gender-train.csv',
img_dir='img_align_celeba/',
transform=custom_transform)
valid_dataset = CelebaDataset(csv_path='celeba-gender-valid.csv',
img_dir='img_align_celeba/',
transform=custom_transform)
test_dataset = CelebaDataset(csv_path='celeba-gender-test.csv',
img_dir='img_align_celeba/',
transform=custom_transform)
train_loader = DataLoader(dataset=train_dataset,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=4)
valid_loader = DataLoader(dataset=valid_dataset,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=4)
test_loader = DataLoader(dataset=test_dataset,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=4)
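torch.manual_seed(0)
for epoch in range(2):
    for batch_idx, (x, y) in enumerate(train_loader):
        print('Epoch:', epoch + 1, '| Batch index:', batch_idx,
              '| Batch size:', y.size(0))
        x = x.to(DEVICE)
        y = y.to(DEVICE)
        time.sleep(1)
        break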
Model
##########################
### MODEL
##########################
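class Bottleneck(nn.Module):
    expansion = 4
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # 1x1 projection shortcut or None
    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))    # 1x1 reduce
        out = self.relu(self.bn2(self.conv2(out)))  # 3x3, carries the stride
        out = self.bn3(self.conv3(out))             # 1x1 expand by 4
        if self.downsample is not None:
            residual = self.downsample(x)           # match the shape of out
        out += residual
        return self.relu(out)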
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes, grayscale):
        super(ResNet, self).__init__()
        self.inplanes = 64
        if grayscale:
            in_dim = 1
        else:
            in_dim = 3
        self.conv1 = nn.Conv2d(in_dim, 64, kernel_size=7, stride=2,
                               padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1, padding=2)
        # For 128x128 inputs: 4x4 after layer4 and 2x2 after avgpool,
        # so fc sees 2 * 2 * (512 * expansion) = 2048 * expansion features
        self.fc = nn.Linear(2048 * block.expansion, num_classes)
        # He (fan-out) initialization for convolutions; BatchNorm starts as identity
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, (2. / n)**.5)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            # 1x1 projection so the shortcut matches the block's output shape
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        logits = self.fc(x)
        probas = F.softmax(logits, dim=1)
        return logits, probas
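The initialization loop in `ResNet.__init__` samples each `nn.Conv2d` weight from a normal distribution with standard deviation sqrt(2 / n), where `n = kernel_size[0] * kernel_size[1] * out_channels`; this is He et al.'s ReLU-aware initialization computed from the fan-out. The worked calculation below shows the resulting standard deviations for a few convolution shapes that appear in this network; `he_std_fan_out` is a hypothetical helper.

```python
import math


def he_std_fan_out(kernel_h, kernel_w, out_channels):
    """Std used by the init loop: sqrt(2 / n) with n = k_h * k_w * out_channels."""
    n = kernel_h * kernel_w * out_channels
    return math.sqrt(2.0 / n)


# (kernel_h, kernel_w, out_channels) for some convolutions in the model
shapes = {
    "conv1 (7x7, 64)": (7, 7, 64),
    "layer1 conv2 (3x3, 64)": (3, 3, 64),
    "layer4 conv3 (1x1, 2048)": (1, 1, 2048),
}
for name, shape in shapes.items():
    print(f"{name}: std = {he_std_fan_out(*shape):.4f}")
```

Wider or larger-kernel layers get smaller initial weights, which keeps activation variance roughly constant from layer to layer. `torch.nn.init.kaiming_normal_(weight, mode='fan_out', nonlinearity='relu')` should give the same standard deviation, since its ReLU gain is sqrt(2).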
##########################
### COST AND OPTIMIZER
##########################
Training
def compute_accuracy(model, data_loader, device):
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100
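`compute_accuracy` takes the arg-max of `probas` as the predicted label, counts matches across all minibatches and divides once at the end. Since softmax is monotonic, the arg-max of the logits would give the same labels. A stdlib sketch of the same computation over made-up minibatches (`batch_accuracy` is a hypothetical name):

```python
def batch_accuracy(batches):
    """batches: iterable of (probas, targets); probas is a list of rows."""
    correct, total = 0, 0
    for probas, targets in batches:
        # arg-max of each row, like torch.max(probas, 1)
        predicted = [max(range(len(row)), key=row.__getitem__) for row in probas]
        correct += sum(p == t for p, t in zip(predicted, targets))
        total += len(targets)
    return correct / total * 100


batches = [
    ([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], [0, 1, 1]),  # 2 of 3 correct
    ([[0.3, 0.7]], [1]),                                # 1 of 1 correct
]
print(batch_accuracy(batches))  # 75.0
```

Accumulating counts rather than averaging per-batch accuracies weights every example equally, even when the last minibatch is smaller than the others.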
start_time = time.time()
for epoch in range(NUM_EPOCHS):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
                  % (epoch+1, NUM_EPOCHS, batch_idx,
                     len(train_loader), cost))
    model.eval()
    with torch.set_grad_enabled(False):  # save memory during inference
        print('Epoch: %03d/%03d | Train: %.3f%% | Valid: %.3f%%' % (
              epoch+1, NUM_EPOCHS,
              compute_accuracy(model, train_loader, device=DEVICE),
              compute_accuracy(model, valid_loader, device=DEVICE)))
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
# Grab one test minibatch for inspection
features, targets = next(iter(test_loader))
DenseNet:
Imports
import os
import time
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Subset
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
##########################
### SETTINGS
##########################
# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.001
BATCH_SIZE = 128
NUM_EPOCHS = 20
# Architecture
NUM_CLASSES = 10
# Other
DEVICE = "cuda:0"
GRAYSCALE = False
CIFAR-10 Dataset
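train_indices = torch.arange(0, 48000)
valid_indices = torch.arange(48000, 50000)
train_and_valid = datasets.CIFAR10(root='data',
                                   train=True,
                                   transform=transforms.ToTensor(),
                                   download=True)
# Hold out the last 2,000 training images for validation
train_dataset = Subset(train_and_valid, train_indices)
valid_dataset = Subset(train_and_valid, valid_indices)
test_dataset = datasets.CIFAR10(root='data',
                                train=False,
                                transform=transforms.ToTensor())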
train_loader = DataLoader(dataset=train_dataset,
batch_size=BATCH_SIZE,
num_workers=4,
shuffle=True)
valid_loader = DataLoader(dataset=valid_dataset,
batch_size=BATCH_SIZE,
num_workers=4,
shuffle=False)
test_loader = DataLoader(dataset=test_dataset,
batch_size=BATCH_SIZE,
num_workers=4,
shuffle=False)
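With 48,000 training and 2,000 validation images from `train_indices`/`valid_indices`, the 10,000 CIFAR-10 test images, `BATCH_SIZE = 128`, and the `DataLoader` default `drop_last=False`, `len(loader)` is the ceiling of the dataset size divided by the batch size; that is the denominator printed in the `Batch .../...` log lines. A small sketch of the arithmetic (`batch_stats` is a hypothetical helper):

```python
def batch_stats(num_examples, batch_size):
    """(number of minibatches, size of the last one) when drop_last=False."""
    num_batches = (num_examples + batch_size - 1) // batch_size  # ceiling division
    last = num_examples - (num_batches - 1) * batch_size
    return num_batches, last


BATCH_SIZE = 128
for name, n in [("train", 48_000), ("valid", 2_000), ("test", 10_000)]:
    print(name, batch_stats(n, BATCH_SIZE))
# train (375, 128)
# valid (16, 80)
# test (79, 16)

# Batches logged per epoch by `if not batch_idx % 150:`
print([i for i in range(batch_stats(48_000, BATCH_SIZE)[0]) if i % 150 == 0])
# [0, 150, 300]
```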
device = torch.device(DEVICE)
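torch.manual_seed(0)
for epoch in range(2):
    for batch_idx, (x, y) in enumerate(train_loader):
        print('Epoch:', epoch + 1, '| Batch index:', batch_idx,
              '| Batch size:', y.size(0))
        x = x.to(device)
        y = y.to(device)
        break
# Check that shuffling works properly,
# i.e., label indices should be in random order.
# Also, the label order should be different in the second
# epoch.
for epoch in range(2):
    images, labels = next(iter(train_loader))  # a fresh iterator reshuffles
    print(labels[:10])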
import re
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as cp
from collections import OrderedDict
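def _bn_function_factory(norm, relu, conv):
    # Concatenate all previous feature maps, then BN -> ReLU -> 1x1 conv
    def bn_function(*inputs):
        concated_features = torch.cat(inputs, 1)
        bottleneck_output = conv(relu(norm(concated_features)))
        return bottleneck_output
    return bn_function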
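class _DenseLayer(nn.Sequential):
    def __init__(self, num_input_features, growth_rate, bn_size,
                 drop_rate, memory_efficient=False):
        super(_DenseLayer, self).__init__()
        self.add_module('norm1', nn.BatchNorm2d(num_input_features))
        self.add_module('relu1', nn.ReLU(inplace=True))
        self.add_module('conv1', nn.Conv2d(num_input_features,
                                           bn_size * growth_rate,
                                           kernel_size=1, stride=1,
                                           bias=False))
        self.add_module('norm2', nn.BatchNorm2d(bn_size * growth_rate))
        self.add_module('relu2', nn.ReLU(inplace=True))
        self.add_module('conv2', nn.Conv2d(bn_size * growth_rate,
                                           growth_rate,
                                           kernel_size=3, stride=1,
                                           padding=1, bias=False))
        self.drop_rate = drop_rate
        self.memory_efficient = memory_efficient

    def forward(self, *prev_features):
        bn_function = _bn_function_factory(self.norm1, self.relu1, self.conv1)
        if self.memory_efficient and any(f.requires_grad for f in prev_features):
            # Recompute the bottleneck during backward instead of storing it
            bottleneck_output = cp.checkpoint(bn_function, *prev_features)
        else:
            bottleneck_output = bn_function(*prev_features)
        new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
        if self.drop_rate > 0:
            new_features = F.dropout(new_features, p=self.drop_rate,
                                     training=self.training)
        return new_features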
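class _DenseBlock(nn.Module):
    def __init__(self, num_layers, num_input_features, bn_size,
                 growth_rate, drop_rate, memory_efficient=False):
        super(_DenseBlock, self).__init__()
        for i in range(num_layers):
            layer = _DenseLayer(
                num_input_features + i * growth_rate,
                growth_rate=growth_rate,
                bn_size=bn_size,
                drop_rate=drop_rate,
                memory_efficient=memory_efficient,
            )
            self.add_module('denselayer%d' % (i + 1), layer)

    def forward(self, init_features):
        # Each layer sees the concatenation of all earlier feature maps
        features = [init_features]
        for name, layer in self.named_children():
            new_features = layer(*features)
            features.append(new_features)
        return torch.cat(features, 1)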
class _Transition(nn.Sequential):
    def __init__(self, num_input_features, num_output_features):
        super(_Transition, self).__init__()
        self.add_module('norm', nn.BatchNorm2d(num_input_features))
        self.add_module('relu', nn.ReLU(inplace=True))
        self.add_module('conv', nn.Conv2d(num_input_features,
                                          num_output_features,
                                          kernel_size=1, stride=1,
                                          bias=False))
        self.add_module('pool', nn.AvgPool2d(kernel_size=2, stride=2))
class DenseNet121(nn.Module):
    r"""DenseNet-BC model class, based on
    `"Densely Connected Convolutional Networks"
    <https://arxiv.org/pdf/1608.06993.pdf>`_
    Args:
        growth_rate (int) - how many filters to add each layer (`k` in paper)
        block_config (list of 4 ints) - how many layers in each pooling block
        num_init_featuremaps (int) - the number of filters to learn in the
            first convolution layer
        bn_size (int) - multiplicative factor for number of bottleneck layers
            (i.e. bn_size * k features in the bottleneck layer)
        drop_rate (float) - dropout rate after each dense layer
        num_classes (int) - number of classification classes
        memory_efficient (bool) - If True, uses checkpointing. Much more
            memory efficient, but slower. Default: *False*. See `"paper"
            <https://arxiv.org/pdf/1707.06990.pdf>`_
        grayscale (bool) - if True, expects 1-channel instead of 3-channel input
    """

    def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16),
                 num_init_featuremaps=64, bn_size=4, drop_rate=0,
                 num_classes=1000, memory_efficient=False,
                 grayscale=False):
        super(DenseNet121, self).__init__()
        # First convolution
        if grayscale:
            in_channels = 1
        else:
            in_channels = 3
        self.features = nn.Sequential(OrderedDict([
            # bias is redundant when using batchnorm
            ('conv0', nn.Conv2d(in_channels=in_channels,
                                out_channels=num_init_featuremaps,
                                kernel_size=7, stride=2,
                                padding=3, bias=False)),
            ('norm0', nn.BatchNorm2d(num_features=num_init_featuremaps)),
            ('relu0', nn.ReLU(inplace=True)),
            ('pool0', nn.MaxPool2d(kernel_size=3, stride=2, padding=1)),
        ]))
        # Each denseblock
        num_features = num_init_featuremaps
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(
                num_layers=num_layers,
                num_input_features=num_features,
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate,
                memory_efficient=memory_efficient
            )
            self.features.add_module('denseblock%d' % (i + 1), block)
            num_features = num_features + num_layers * growth_rate
            if i != len(block_config) - 1:
                trans = _Transition(num_input_features=num_features,
                                    num_output_features=num_features // 2)
                self.features.add_module('transition%d' % (i + 1), trans)
                num_features = num_features // 2
        # Linear layer
        self.classifier = nn.Linear(num_features, num_classes)

    def forward(self, x):
        features = self.features(x)
        out = F.relu(features, inplace=True)
        out = F.adaptive_avg_pool2d(out, (1, 1))  # global average pooling
        out = torch.flatten(out, 1)
        logits = self.classifier(out)
        probas = F.softmax(logits, dim=1)
        return logits, probas
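Each dense block adds `growth_rate` channels per layer and each transition halves the count, so the classifier's input width follows directly from `block_config`. The plain-Python sketch below repeats the `num_features` bookkeeping, assuming the standard DenseNet-121 settings from the paper (growth rate k = 32, blocks of 6, 12, 24 and 16 layers, 64 initial feature maps); `densenet_widths` is a hypothetical helper.

```python
def densenet_widths(growth_rate=32, block_config=(6, 12, 24, 16),
                    num_init_featuremaps=64):
    """Channels after each dense block, plus the classifier input width."""
    num_features = num_init_featuremaps
    widths = []
    for i, num_layers in enumerate(block_config):
        num_features = num_features + num_layers * growth_rate  # dense block
        widths.append(num_features)
        if i != len(block_config) - 1:
            num_features = num_features // 2                     # transition
    return widths, num_features


widths, classifier_in = densenet_widths()
print(widths, classifier_in)   # [256, 512, 1024, 1024] 1024
```

Inside every `_DenseLayer`, the 1x1 bottleneck produces `bn_size * growth_rate` maps (4 x 32 = 128 with the paper's bn_size of 4) before the 3x3 convolution reduces them to `growth_rate` new maps.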
Training
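def compute_acc(model, data_loader, device):
    correct_pred, num_examples = 0, 0
    model.eval()
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float() / num_examples * 100

cost_list = []
train_acc_list, valid_acc_list = [], []
for epoch in range(NUM_EPOCHS):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        #################################################
        ### CODE ONLY FOR LOGGING BEYOND THIS POINT
        ################################################
        cost_list.append(cost.item())
        if not batch_idx % 150:
            print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} | '
                  f'Batch {batch_idx:03d}/{len(train_loader):03d} |'
                  f' Cost: {cost:.4f}')
    model.eval()
    with torch.set_grad_enabled(False):  # save memory during inference
        # .item() turns the 0-dim accuracy tensors into floats for plotting
        train_acc = compute_acc(model, train_loader, device=DEVICE).item()
        valid_acc = compute_acc(model, valid_loader, device=DEVICE).item()
        print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d}\n'
              f'Train ACC: {train_acc:.2f} | Validation ACC: {valid_acc:.2f}')
    train_acc_list.append(train_acc)
    valid_acc_list.append(valid_acc)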
Evaluation
plt.plot(cost_list, label='Minibatch cost')
plt.plot(np.convolve(cost_list,
np.ones(200,)/200, mode='valid'),
label='Running average')
plt.ylabel('Cross Entropy')
plt.xlabel('Iteration')
plt.legend()
plt.show()
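The running-average curve uses `np.convolve(cost_list, np.ones(200,)/200, mode='valid')`, which takes the mean of every window of 200 consecutive minibatch costs; `'valid'` keeps only windows that fit entirely, so the curve has `len(cost_list) - 199` points. A stdlib equivalent using a sliding sum (`running_average` is a hypothetical name):

```python
def running_average(values, window):
    """Matches np.convolve(values, np.ones(window)/window, mode='valid')
    up to floating-point rounding, for window <= len(values)."""
    if not 0 < window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]   # slide the window by one
        out.append(total / window)
    return out


costs = [4.0, 2.0, 3.0, 1.0, 5.0]
print(running_average(costs, 2))   # [3.0, 2.5, 2.0, 3.0]
```

Updating a running sum keeps the cost linear in the number of minibatches, regardless of the window size.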
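plt.plot(np.arange(1, NUM_EPOCHS+1), train_acc_list, label='Training')
plt.plot(np.arange(1, NUM_EPOCHS+1), valid_acc_list, label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()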
with torch.set_grad_enabled(False):
    test_acc = compute_acc(model=model,
                           data_loader=test_loader,
                           device=DEVICE)
    valid_acc = compute_acc(model=model,
                            data_loader=valid_loader,
                            device=DEVICE)