(Sheet 7)
Yehia Zakaria
yehia.Zakaria@eng.asu.edu.eg
Question 2
• Compare Bayes’ and SVM classification techniques.
Question 4
• Write a computer algorithm for a simple linear classifier approach.
Each training sample is a feature vector X_i = (x_1i, x_2i)ᵀ with a label y_i:

Feature vector          Label
X_1 = (2, 4)ᵀ           y_1 = −1
X_2 = (7, 7)ᵀ           y_2 = +1
⋮                       ⋮
X_n = (x_1n, x_2n)ᵀ     y_n

In 2-D, the decision boundary is the line in the x_1-x_2 plane:
w_1·x_1 + w_2·x_2 + w_0 = 0
In 3-D, the decision boundary becomes the plane
w_1·x_1 + w_2·x_2 + w_3·x_3 + w_0 = 0
(the figure showed this plane, g, in the x_1-x_2-x_3 feature space)
• Given labelled training samples {(𝑋1, 𝑦1), … , (𝑋𝑁, 𝑦𝑁)} where 𝑋𝑖 is the feature
vector and 𝑦𝑖 is the label.
1. Initialize the weight vector (W) randomly.
2. Calculate the classification error:
   ε(W) = Σ_i max(0, −y_i·WᵀX_i)
3. Update the weights using any optimization technique.
4. Repeat steps 2 and 3 until ∂ε/∂W ≈ 0.
• After training, given an unknown sample X_u:
o If WᵀX_u > 0 then X_u ∈ C1
o Otherwise X_u ∈ C2
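The four training steps and the test rule above can be sketched in Python (a minimal sketch: the toy data, learning rate, and iteration count are illustrative assumptions, and plain gradient descent is used as the optimization technique):

```python
import numpy as np

# Toy training data (illustrative): rows are feature vectors, labels are +/-1.
X = np.array([[2.0, 4.0], [7.0, 7.0], [1.0, 3.0], [8.0, 6.0]])
y = np.array([-1, 1, -1, 1])

# Append a constant 1 to every sample so w0 is absorbed into W.
Xa = np.hstack([X, np.ones((X.shape[0], 1))])

rng = np.random.default_rng(0)
W = rng.normal(size=Xa.shape[1])   # step 1: random initialization

lr = 0.1                           # assumed learning rate
for _ in range(200):               # steps 2-4
    margins = y * (Xa @ W)
    mis = margins < 0              # misclassified samples
    # step 2: eps(W) = sum_i max(0, -y_i W^T X_i); its gradient w.r.t. W
    # is -(sum of y_i X_i) over the misclassified samples.
    if not mis.any():              # d eps / d W is ~0: stop (step 4)
        break
    grad = -(y[mis, None] * Xa[mis]).sum(axis=0)
    W -= lr * grad                 # step 3: gradient-descent update

# After training: the sign of W^T X_u picks the class.
def classify(x_u):
    return "C1" if np.append(x_u, 1.0) @ W > 0 else "C2"
```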
Question 5
• Describe algorithmically how a multi-class SVM works. What is the role of optimization in
the problem? You need to write an expression for the penalizing objective function.
𝑓(𝑥,𝑊) = 𝑊𝑥 + 𝑏
During the training:
• Initialize the weights for each class randomly.
• Calculate the scores of each class on the training data such that: S_i = f(X, W_i) = W_i · X
• Define a loss function that represents the amount of error on the training data.
• A hinge loss function of the following form is used (for a sample X with true class y):
  L = Σ_{j ≠ y} max(0, S_j − S_y + 1)
• After calculating the per-sample losses, the total loss is calculated by averaging over the N training samples and adding a regularization penalty:
  L = (1/N) Σ_i L_i + λR(W)
During Testing:
During Testing:
• Given an unknown sample X and a trained classifier for classes {C1, C2, …, Cn} with weights W1, W2, …, Wn, the sample is classified based on the maximum value of the dot product W_i · X.
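The scoring, hinge loss, and test-time rule can be sketched as follows (a minimal sketch; the weight matrix, the margin of 1, and the sample values are illustrative assumptions):

```python
import numpy as np

def scores(W, x):
    # S_i = f(X, W_i) = W_i . X, one score per class.
    return W @ x

def hinge_loss(W, x, y_true, delta=1.0):
    # L = sum over j != y_true of max(0, S_j - S_{y_true} + delta)
    s = scores(W, x)
    margins = np.maximum(0.0, s - s[y_true] + delta)
    margins[y_true] = 0.0            # the correct class is skipped
    return margins.sum()

def predict(W, x):
    # Test time: the class with the maximum dot product W_i . X wins.
    return int(np.argmax(scores(W, x)))

# Example with 3 classes and 2-D features (illustrative numbers).
W = np.array([[ 1.0, -0.5],
              [ 0.2,  0.3],
              [-2.0,  1.0]])
x = np.array([3.0, 2.0])
```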
10
Question 6
A neural network is one of the machine learning techniques. Describe this and illustrate the universality theorem.
The machine learning framework can be divided into two stages:
Training: Given a training set of labeled examples: {(𝑥1, 𝑦1), … , (𝑥𝑁, 𝑦𝑁)} estimate the prediction
function 𝒇 by minimizing the prediction error on the training set.
Testing: Apply 𝒇 to unseen test example 𝑥𝑢 and output the predicted value 𝑦𝑢 = 𝑓(𝑥𝑢 ) to classify 𝑥𝑢 .
Universality theorem:
Any continuous function f : ℝᴺ → ℝᴹ can be approximated to arbitrary accuracy by a network with one hidden layer, given a large enough number of hidden neurons.
This proves that a neural network can realize the prediction function, and hence it is one of the machine learning techniques.
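A quick numerical illustration of the theorem (a sketch, not a proof): a single hidden layer of sigmoid neurons with randomly chosen weights, plus a least-squares fit of the linear output layer, already approximates a smooth target closely. The target sin(3x), the 100 hidden neurons, and the weight scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target continuous function f: R -> R to approximate (assumed example).
f = lambda x: np.sin(3 * x)

x = np.linspace(-1, 1, 200).reshape(-1, 1)
H = 100   # number of hidden neurons (assumed)

# One hidden layer of sigmoid neurons with random weights and biases...
W1 = rng.normal(scale=5.0, size=(1, H))
b1 = rng.normal(scale=5.0, size=H)
hidden = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))

# ...and a linear output layer fitted by least squares: with enough
# hidden neurons, the network output tracks f closely on [-1, 1].
W2, *_ = np.linalg.lstsq(hidden, f(x), rcond=None)
max_err = np.abs(hidden @ W2 - f(x)).max()
```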
Question 7
For the input sample (−3, 5), the class scores are:
S_1 = 3
S_2 = −0.3
S_3 = −170
Class: Class 1 (the class with the maximum score)
Question 10
Why are thin and tall (deep) networks preferred to fat and short (shallow) networks?
Because:
• They can automatically learn high-level features.
• They enable transfer learning.
• The modularity of deep neural networks lets layers be reused as building blocks, like LEGO.
Q12: Vanishing Gradient Problem
How can the vanishing gradients problem be overcome? How does this problem affect the training of neural networks?
The vanishing gradients problem happens for several reasons; one is that the activation function has low or zero gradient values. Looking at the gradients of the sigmoid and tanh, only a small range of input values has a non-negligible gradient, and in the case of the sigmoid the gradient is at most 0.25. As a result, the gradients reaching the early layers become vanishingly small, so those layers learn very slowly or not at all.
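The 0.25 bound, and how it compounds with depth, can be checked numerically (the 10-layer depth is an illustrative assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-10, 10, 10001)
# The sigmoid derivative peaks at x = 0 with value 0.25.
peak = sigmoid_grad(x).max()

# Chaining many sigmoid layers multiplies such factors together, so the
# gradient reaching the early layers shrinks at least like 0.25**depth.
depth = 10
upper_bound = peak ** depth
```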
It can be mitigated by:
• Using activation functions with non-saturating gradients, such as ReLU.
• Careful weight initialization (e.g. Xavier initialization).
• Batch normalization.
Q13: Overfitting problem
• What is the overfitting problem? How to avoid such a problem when training neural
networks?
Overfitting happens when the neural network over-tunes its parameters so that they become specific to the training dataset alone. In this case, the error in predicting the training samples becomes very low (almost zero) while the error on the testing dataset increases. The network is said to have failed to generalize.
This can be avoided by:
• Using a validation dataset to check how well the network generalizes.
• Data augmentation to increase variation in the training dataset.
• Using dropout and batch normalization techniques.
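The validation-set check is often combined with early stopping: keep the weights from the epoch with the best validation loss and stop once it stops improving. A minimal sketch (the loss curve and patience value are illustrative assumptions, not real training data):

```python
# Early stopping: return the epoch with the best validation loss,
# stopping once it hasn't improved for `patience` epochs.
def early_stopping_epoch(val_losses, patience=3):
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break                       # validation loss is rising: overfitting
    return best_epoch

# Illustrative curve: validation loss falls, then rises again.
val = [1.0, 0.7, 0.5, 0.45, 0.5, 0.6, 0.8, 1.1]
```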
Question 14
What is the relation between ReLUs and dropout?
The output of ReLU is 0 for all negative inputs, which means that the node is “shut down” during the computation for that sample or batch: it contributes neither to the output nor to backpropagation.
This behavior is similar to the dropout technique, which shuts down a random percentage of the nodes during both the feedforward and backpropagation passes.
Both help the network generalize and achieve better results on the testing dataset.
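The parallel can be sketched directly: ReLU zeroes the units with negative pre-activations, while (inverted) dropout zeroes a random subset and rescales the rest. The vector values and drop probability below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.normal(size=8)

# ReLU zeroes every negative pre-activation: those units are "shut down"
# for this sample and contribute nothing to the output or to backprop.
relu_out = np.maximum(0.0, pre_activations)

# Inverted dropout zeroes a random fraction p of the units instead, and
# rescales the survivors so the expected activation is unchanged
# (dropout is switched off at test time).
p = 0.5
mask = rng.random(pre_activations.shape) >= p
dropout_out = np.where(mask, pre_activations / (1 - p), 0.0)
```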
Q17: Describe how network parameters are initialized
• Uniform distribution in the range [−1/√r, 1/√r].
• Gaussian distribution with mean 0 and standard deviation 1/√r.
• Xavier initialization suggests initializing the weights from a Gaussian distribution with mean 0 and a standard deviation that differs per layer according to:
  σ = √(2 / (r_in + r_out))
where r_in and r_out are the numbers of inputs and outputs of the layer, respectively.
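The three schemes can be sketched in NumPy (the layer fan-in/fan-out values are illustrative assumptions, and r is taken to be the layer fan-in):

```python
import numpy as np

rng = np.random.default_rng(0)
r_in, r_out = 256, 128            # fan-in / fan-out of one layer (assumed)

# Uniform initialization in [-1/sqrt(r), 1/sqrt(r)] with r = fan-in.
bound = 1.0 / np.sqrt(r_in)
W_uniform = rng.uniform(-bound, bound, size=(r_in, r_out))

# Gaussian initialization with mean 0 and std 1/sqrt(r).
W_gauss = rng.normal(0.0, 1.0 / np.sqrt(r_in), size=(r_in, r_out))

# Xavier (Glorot) initialization: the std depends on both fan-in and
# fan-out, sigma = sqrt(2 / (r_in + r_out)), so it differs per layer.
sigma = np.sqrt(2.0 / (r_in + r_out))
W_xavier = rng.normal(0.0, sigma, size=(r_in, r_out))
```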
Question 18
• What are the main three properties of CNN? Which part of the network is related to which property?
The three main properties that CNNs exploit:
1. Some patterns are much smaller than the whole image, so each neuron only needs to connect to a small local region (local connectivity).
2. The same pattern can appear in different regions of the image, so the same filter weights are shared across all positions (parameter sharing).
3. Subsampling the pixels does not change the object, so the image can be downsampled to reduce computation.
The first two properties are related to the convolution layers while the last property is related to the pooling layer.
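Local connectivity, weight sharing, and pooling can be sketched with a tiny convolution (a minimal NumPy sketch; the 4x4 image and 2x2 filter values are illustrative):

```python
import numpy as np

image = np.arange(16.0).reshape(4, 4)      # toy 4x4 input
filt = np.array([[1.0, 0.0],
                 [0.0, -1.0]])             # one shared 2x2 filter

# Convolution: the SAME filter weights scan every local 2x2 patch
# (local connectivity + parameter sharing), giving a 3x3 output.
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = (image[i:i+2, j:j+2] * filt).sum()

# 2x2 max pooling on the image: keep the maximum of each block
# (subsampling halves the resolution without losing the pattern).
pooled = image.reshape(2, 2, 2, 2).max(axis=(1, 3))
```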
Question 21
The following CNN architecture is called AlexNet:
Question 20
For the VGGNet structure given below, assume that the filter size is 3x3 in the convolutional layers with a stride and a padding amount of 1. Discuss and calculate the size of each stage.
Output spatial size of a layer: (N − F + 2P)/S + 1

Conv. layers: F = 3, S = 1, P = 1; max pooling: F = 2, S = 2, P = 0

Layer                  | Input size    | Output size                                | Parameters
1st Conv. layer [64]   | [224x224x3]   | (224 − 3 + 2·1)/1 + 1 = 224 → [224x224x64] | [[3x3x3]+1]x64
2nd Conv. layer [64]   | [224x224x64]  | [224x224x64]                               | [[3x3x64]+1]x64
Max pooling            | [224x224x64]  | (224 − 2 + 0)/2 + 1 = 112 → [112x112x64]   | 0
3rd Conv. layer [128]  | [112x112x64]  | [112x112x128]                              | [[3x3x64]+1]x128
4th Conv. layer [128]  | [112x112x128] | [112x112x128]                              | [[3x3x128]+1]x128
Max pooling            | [112x112x128] | [56x56x128]                                | 0
5th Conv. layer [256]  | [56x56x128]   | [56x56x256]                                | [[3x3x128]+1]x256
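The table’s sizes and parameter counts follow from the formula (N − F + 2P)/S + 1; a quick sketch to reproduce them:

```python
def conv_output(n, f=3, s=1, p=1):
    # Spatial output size of a conv layer: (N - F + 2P) / S + 1
    return (n - f + 2 * p) // s + 1

def conv_params(f, c_in, c_out):
    # Each of the c_out filters has f*f*c_in weights plus one bias.
    return ((f * f * c_in) + 1) * c_out

def pool_output(n, f=2, s=2):
    # Max pooling uses no padding: (N - F) / S + 1
    return (n - f) // s + 1

# First VGG stage: 224x224x3 -> 224x224x64, then pool to 112x112x64.
size1 = conv_output(224)            # spatial size stays 224
params1 = conv_params(3, 3, 64)     # parameters of the 1st conv layer
after_pool = pool_output(224)       # spatial size after max pooling
```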