
Standard notations for Deep Learning

This document proposes a standard for the mathematical notation used in deep learning.
1 Neural Network Notation
General comments:
- Superscript $(i)$ denotes the $i$-th training example, while superscript $[l]$ denotes the $l$-th layer.
Sizes:
- $m$ : number of examples in the dataset
- $n_x$ : input size
- $n_y$ : output size (or number of classes)
- $n_h^{[l]}$ : number of hidden units of the $l$-th layer
- In a for loop, it is possible to denote $n_x = n_h^{[0]}$ and $n_y = n_h^{[\text{number of layers}+1]}$.
- $L$ : number of layers in the network.
Objects:
- $X \in \mathbb{R}^{n_x \times m}$ is the input matrix
- $x^{(i)} \in \mathbb{R}^{n_x}$ is the $i$-th example represented as a column vector
- $Y \in \mathbb{R}^{n_y \times m}$ is the label matrix
- $y^{(i)} \in \mathbb{R}^{n_y}$ is the output label for the $i$-th example
- $W^{[l]} \in \mathbb{R}^{\text{number of units in next layer} \times \text{number of units in previous layer}}$ is the weight matrix; the superscript $[l]$ indicates the layer
- $b^{[l]} \in \mathbb{R}^{\text{number of units in next layer}}$ is the bias vector for the $l$-th layer
- $\hat{y} \in \mathbb{R}^{n_y}$ is the predicted output vector. It can also be denoted $a^{[L]}$, where $L$ is the number of layers in the network.
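As an illustration, the shapes above can be checked with NumPy (a minimal sketch; the sizes $m$, $n_x$, $n_y$, $n_h$ below are arbitrary example values, not part of the standard):

```python
import numpy as np

m, n_x, n_y = 5, 4, 3               # arbitrary example sizes
n_h = 6                             # hidden units in layer 1 (arbitrary)

X = np.random.randn(n_x, m)         # input matrix: one column per example
Y = np.random.randn(n_y, m)         # label matrix
W1 = np.random.randn(n_h, n_x)      # W^[1]: (units in next layer) x (units in previous layer)
b1 = np.random.randn(n_h, 1)        # b^[1]: bias vector for layer 1

x_i = X[:, 0:1]                     # x^(i) as a column vector, shape (n_x, 1)
print(X.shape, W1.shape, b1.shape, x_i.shape)
```

Keeping examples as columns of $X$ is what makes the matrix dimensions of $W^{[l]}$ and $b^{[l]}$ line up in the forward-propagation equations below.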
Common forward propagation equation examples:
- $a = g^{[l]}(W_x x^{(i)} + b_1) = g^{[l]}(z_1)$, where $g^{[l]}$ denotes the $l$-th layer activation function
- $\hat{y}^{(i)} = \mathrm{softmax}(W_h h + b_2)$
- General activation formula: $a_j^{[l]} = g^{[l]}\left(\sum_k w_{jk}^{[l]} a_k^{[l-1]} + b_j^{[l]}\right) = g^{[l]}(z_j^{[l]})$
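The general activation formula can be sketched as a vectorized forward step in NumPy (an illustration only; $g^{[l]}$ is taken here to be the sigmoid, and the layer sizes are arbitrary):

```python
import numpy as np

def sigmoid(z):
    # An example choice of activation g^[l]
    return 1.0 / (1.0 + np.exp(-z))

n_prev, n_l = 4, 3                   # units in layers l-1 and l (arbitrary)
a_prev = np.random.randn(n_prev, 1)  # a^[l-1] as a column vector
W = np.random.randn(n_l, n_prev)     # W^[l]
b = np.random.randn(n_l, 1)          # b^[l]

z = W @ a_prev + b                   # z^[l]_j = sum_k w^[l]_jk * a^[l-1]_k + b^[l]_j
a = sigmoid(z)                       # a^[l] = g^[l](z^[l])
print(a.shape)
```

The matrix product `W @ a_prev` computes every $\sum_k w_{jk}^{[l]} a_k^{[l-1]}$ at once, which is why the per-unit sum in the formula never appears as an explicit loop.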
- $J(x, W, b, y)$ or $J(\hat{y}, y)$ denotes the cost function.
Examples of cost functions:
- $J_{CE}(\hat{y}, y) = -\sum_{i=0}^{m} y^{(i)} \log \hat{y}^{(i)}$
- $J_{1}(\hat{y}, y) = \sum_{i=0}^{m} \left| y^{(i)} - \hat{y}^{(i)} \right|$
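Both example costs translate directly into NumPy (a sketch under the assumption that `y` and `y_hat` are vectors of per-example values, matching the sums above; the numeric values are made up for illustration):

```python
import numpy as np

def cross_entropy_cost(y_hat, y):
    # J_CE(y_hat, y) = - sum_i y^(i) * log(y_hat^(i))
    return -np.sum(y * np.log(y_hat))

def l1_cost(y_hat, y):
    # J_1(y_hat, y) = sum_i |y^(i) - y_hat^(i)|
    return np.sum(np.abs(y - y_hat))

y = np.array([0.0, 1.0, 0.0])        # one-hot label (example values)
y_hat = np.array([0.1, 0.8, 0.1])    # predicted probabilities (example values)

print(cross_entropy_cost(y_hat, y))  # -log(0.8), about 0.223
print(l1_cost(y_hat, y))             # 0.1 + 0.2 + 0.1 = 0.4
```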