CHAPTER 2
INTRODUCTION TO NEURAL NETWORKS
https://reurl.cc/eOyQNM [19:13]
https://reurl.cc/QZ3VD5
Every pixel (neuron) holds a number; the number held in a neuron is called its activation.
https://reurl.cc/Do9D2d [17:03-18:30]
Sigmoid Activation Function
https://reurl.cc/Do9D2d
https://www.youtube.com/watch?v=IHZwWFHWa-w [21:00]
Training progress
Use the testing data to calculate the prediction accuracy.
https://www.youtube.com/watch?v=Ilg3gGewQ5U [13:53]
Backpropagation
https://reurl.cc/QZ3VD5
https://reurl.cc/QZ3VKq
Convolutional Layer
https://reurl.cc/QZ3VKq
Convolutional Neural Networks
https://reurl.cc/KQlDlj [8:36]
Convolution Animations
N.B.: Blue maps are inputs, and cyan maps are outputs.
https://reurl.cc/ZbVMol [10:44]
Pooling Layer
• Pooling is a down-sampling operation on the input. We use pooling for two reasons:
➢ To reduce the input size.
➢ To achieve local translation and rotation invariance.
0.5x down-sampling
https://reurl.cc/4pd873 [10:49]
Pooling: Average Pooling
Computing the output values of a 3×3 average pooling operation on a 5×5 input using 1×1 strides.
Pooling: Max Pooling
Computing the output values of a 3×3 max pooling operation on a 5×5 input using 1×1 strides.
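As a worked illustration of both operations, here is a small NumPy sketch of 3×3 average and max pooling on a 5×5 input with 1×1 strides (the input values are made up for the example; the output is 3×3 since no padding is used):

import numpy as np

x = np.arange(25, dtype=np.float64).reshape(5, 5)  # example 5x5 input
K = 3                                              # 3x3 pooling window
out_avg = np.empty((3, 3))                         # 3x3 output with 1x1 strides
out_max = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        window = x[i:i+K, j:j+K]       # 3x3 window at stride 1
        out_avg[i, j] = window.mean()  # average pooling
        out_max[i, j] = window.max()   # max pooling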
Example 2.3 Convolution Filter
1. Please download 2-3_Convolution_Example.zip and unzip it.
2. Given a face image Iimg, generate Fconv, a 5×5 convolution filter with random coefficients in the range [0, 1].
3. Apply zero padding to the given image, and convolve the image with unit stride.
4. Compute the output by convolving Iimg with Fconv, and calculate the MSE loss (i.e., information loss) between the original image and the convolved one.
Input image Iimg and convolution kernel Fconv:

KERNAL_SIZE = 5
STRIDE = 1
PADDING = (KERNAL_SIZE - STRIDE)/2
PADDING = int(PADDING)
img = cv2.imread('./006_01_01_051_08.png')
img = cv2.resize(img,(28,32))
img = cv2.cvtColor(img,cv2.COLOR_RGB2GRAY)
Conv_Filter = np.random.rand(KERNAL_SIZE,KERNAL_SIZE)
#Conv_Filter = np.random.normal(mean,std,(KERNAL_SIZE,KERNAL_SIZE)) # normal distribution
Conv_Filter = Conv_Filter/np.sum(Conv_Filter)
img_F = img

cv2.cvtColor(): converts an image from one color space to another.
np.random.rand(): returns samples from a uniform distribution over [0, 1); for normally distributed coefficients, use np.random.normal() as in the commented-out alternative.

Inside the convolution loop, element-wise multiply the portion of the image under the window with the convolution filter, then sum the products to generate the output feature value:

new_feature[h,w] = np.sum(aa)

img_S = img_F.astype(np.uint8)
img_new = new_feature.astype(np.uint8)
cv2.rectangle(img_S, (int(w*STRIDE), int(h*STRIDE)), (int(w*STRIDE + KERNAL_SIZE), int(h*STRIDE + KERNAL_SIZE)), (255, 0, 0), 1)
cv2.namedWindow('Conv_process', cv2.WINDOW_NORMAL)
cv2.resizeWindow("Conv_process", 300, 300)
cv2.imshow('Conv_process',img_S)
cv2.namedWindow('Conv_result', cv2.WINDOW_NORMAL)
cv2.resizeWindow("Conv_result", 300, 300)
cv2.imshow('Conv_result',img_new)
cv2.waitKey(100)

ndarray.astype(): copy of the array, cast to a specified type.
cv2.rectangle(): draws a rectangle on an image.
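The slides show only the inner line of the convolution loop. A minimal sketch of the full loop, assuming zero padding via np.pad (the names img_pad, aa, and new_feature follow the fragments above; this is a sketch, not the distributed solution):

import numpy as np

# Continuing from the slide code above (img_F, Conv_Filter,
# KERNAL_SIZE, PADDING are assumed to be defined there).
img_pad = np.pad(img_F, PADDING, mode='constant')  # zero padding
new_feature = np.zeros(img_F.shape)                # same spatial size (unit stride)
for h in range(new_feature.shape[0]):
    for w in range(new_feature.shape[1]):
        # element-wise product of the 5x5 patch with the filter
        aa = img_pad[h:h+KERNAL_SIZE, w:w+KERNAL_SIZE] * Conv_Filter
        new_feature[h, w] = np.sum(aa)
# MSE (information) loss between the original and convolved image
mse = np.mean((img_F.astype(np.float64) - new_feature) ** 2)
print('MSE loss:', mse)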
Example 2.4 Max Pooling
1. Please download 2-4_Pooling_Example.zip and unzip it.
2. Given a face image Iimg, generate Fconv, a 5×5 convolution filter with random coefficients drawn from a normal distribution (std = 0, mean = 2).
3. Use "same" zero padding with unit stride for the convolution kernel Fconv.
4. Use max pooling with a 3×3 kernel size and unit stride.
5. Compute the output by convolving Iimg with Fconv, and calculate the MSE loss (i.e., information loss) between the original image and the convolved one.
Convolution, then max pooling:
Input Image → Output Image
https://www.youtube.com/watch?v=-7scQpJT7uo [8:58]
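A minimal sketch of step 4, applying 3×3 max pooling with unit stride; feat stands in for the convolved feature map from step 3 (a hypothetical name, filled with random values here so the snippet runs on its own):

import numpy as np

feat = np.random.rand(32, 28)   # stand-in for the convolved feature map
K_POOL, STRIDE = 3, 1
out_h = (feat.shape[0] - K_POOL) // STRIDE + 1
out_w = (feat.shape[1] - K_POOL) // STRIDE + 1
pooled = np.empty((out_h, out_w))
for h in range(out_h):
    for w in range(out_w):
        # maximum over each 3x3 window
        pooled[h, w] = feat[h*STRIDE:h*STRIDE+K_POOL,
                            w*STRIDE:w*STRIDE+K_POOL].max()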
ReLU Activation Function
• Issue:
➢ All negative values become zero, which decreases the model's ability to fit or train on the data properly.
https://www.youtube.com/watch?v=-7scQpJT7uo
Leaky ReLU Activation Function
It is an attempt to solve the dying ReLU problem.
https://www.youtube.com/watch?v=IVVVjBSk9N0 [12:55]
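A minimal NumPy sketch of both activations (alpha = 0.01 is a common default for the leaky slope, assumed here):

import numpy as np

def relu(x):
    # ReLU clips negative inputs to zero (the source of the
    # "dying ReLU" issue noted above)
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU keeps a small slope alpha for negative inputs
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > 0, x, alpha * x)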
Loss Functions
• A loss function measures the performance of a prediction model in terms of the difference between the prediction and the ground-truth output.
• From a very simplified perspective, the loss function J can be defined as a function J(ŷ, y) of the prediction ŷ and the ground-truth output y.
https://reurl.cc/V4XQpY
Common Loss Functions
https://www.renom.jp/notebooks/tutorial/basic_algorithm/lossfunction/notebook.html
Classification Loss
• When a neural network is used to predict a class label, we can consider it a classification model.
• The number of output nodes is the number of classes in the data.
• Each node represents a single class.
• The value of each output node represents the probability of that class being the correct class.
https://www.v7labs.com/blog/cross-entropy-loss-guide
Classification Loss
The purpose of Softmax is to normalize the network outputs so that cross-entropy can subsequently be computed against the one-hot ground-truth label.
https://www.v7labs.com/blog/cross-entropy-loss-guide
Softmax Function
• The softmax function
➢ Also known as softargmax or the normalized exponential function.
➢ A function that takes as input a vector of $K$ real numbers and normalizes it into a probability distribution consisting of $K$ probabilities.
• Softmax is often used in neural networks to map the non-normalized output of a network to a probability distribution over predicted output classes.
• The standard (unit) softmax function $\sigma: \mathbb{R}^K \to \mathbb{R}^K$ is defined as follows:

$\sigma(z)_i = \dfrac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dots, K \text{ and } z = (z_1, \dots, z_K) \in \mathbb{R}^K$
• We apply the standard exponential function to each element $z_i$ of the input vector $z$ and normalize these values by dividing by the sum of all these exponentials; this normalization ensures that the components of the output vector $\sigma(z)$ sum to 1.
• Instead of $e$, a different base $b > 0$ can be used; choosing a larger value of $b$ will create a probability distribution that is more concentrated around the positions of the largest input values. Writing $b = e^{\beta}$ or $b = e^{-\beta}$ (for real $\beta$) yields the expressions:

$\sigma(z)_i = \dfrac{e^{\beta z_i}}{\sum_{j=1}^{K} e^{\beta z_j}} \quad \text{or} \quad \sigma(z)_i = \dfrac{e^{-\beta z_i}}{\sum_{j=1}^{K} e^{-\beta z_j}} \quad \text{for } i = 1, \dots, K$
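A small NumPy sketch of the softmax above, including the inverse-temperature $\beta$ (subtracting the maximum before exponentiating is a standard numerical-stability trick, not part of the formula itself):

import numpy as np

def softmax(z, beta=1.0):
    # standard softmax for beta=1; larger beta concentrates the
    # distribution around the largest inputs
    z = beta * np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

print(softmax([1.0, 2.0, 3.0]))  # components are positive and sum to 1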
Binary cross-entropy loss: $E(y, \hat{y}) = \frac{1}{N} \sum_i -[\,y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,]$
$y_i$: target value of sample $i$ (1 for the positive class, 0 for the negative class)
$\hat{y}_i$: the predicted probability
$N$: number of data points
Reference: https://reurl.cc/5pda66
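A direct NumPy translation of the formula above (the eps clipping is an added safeguard against log(0), not part of the slide's formula):

import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y = np.asarray(y, dtype=np.float64)
    y_hat = np.clip(np.asarray(y_hat, dtype=np.float64), eps, 1 - eps)
    # mean of -[y*log(y_hat) + (1-y)*log(1-y_hat)] over the N samples
    return np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))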
Mutual Information
https://reurl.cc/l7d4Lq
• Describes the relationship between two random variables $(X, Y)$ numerically.
• $p(x, y)$ is the joint probability that the two events $x$ and $y$ occur together.
• Similar in form to the entropy function.
• $I(X, Y)$ quantifies how strongly $X$ and $Y$ are related.
Mutual information: $I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \dfrac{p(x, y)}{p(x)\,p(y)}$

Entropy: $H(X) = \sum_{x} p(x) \log \dfrac{1}{p(x)}$
https://reurl.cc/l7d4Lq
Mutual Information Calculation
• We select two attributes, Furry and Herbivorous, as random variables.
[Information table]

$I(F, H) = \sum_{x \in F} \sum_{y \in H} p(F, H) \log \dfrac{p(F, H)}{p(F)\,p(H)}$

[Probability table]
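Since the information and probability tables are figures in the original slides, here is an illustrative NumPy sketch with a made-up 2×2 joint probability table for two binary attributes:

import numpy as np

# Hypothetical joint probability table p(f, h) for Furry x Herbivorous;
# the values are illustrative, not taken from the slide's tables.
p_fh = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_f = p_fh.sum(axis=1, keepdims=True)   # marginal p(f)
p_h = p_fh.sum(axis=0, keepdims=True)   # marginal p(h)
# I(F, H) = sum over f, h of p(f,h) * log(p(f,h) / (p(f) p(h)))
mi = np.sum(p_fh * np.log(p_fh / (p_f * p_h)))
print('I(F, H) =', mi)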
Mutual Information Loss
• Derived from the entropy function $H(X)$:

$I(y, G(\hat{y})) = H(y) - H(y \mid G(\hat{y}))$
LeNet (1998)
Example 2.5 Train the LeNet Network
These numbers denote the number of data samples (10,000 in total). GT: ground truth; Pred: prediction.
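The slides do not show the training code itself; as a reference point, a minimal PyTorch sketch of the classic LeNet-5 architecture for 28×28 grayscale inputs might look as follows (the framework choice and layer details are assumptions, not taken from the example zip):

import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))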