Neural net meets a Gaussian process
Guofei Pang
05/11/18
Crunch seminar
Neural net (fully connected)
A fully connected network with an input layer, hidden layers 1 through L, and an output layer; layer l has width N_l, and the weights and biases into layer l+1 are w_{i,j}^{l+1,l} and b_i^{l+1} (e.g. w_{1,N_0}^{1,0}, b_1^1 into the first hidden layer and w_{1,N_L}^{L+1,L}, b_1^{L+1} into the output layer).

y_i^0 = x_i^0,  i = 1, 2, ..., N_0

x_i^1 = \sum_{j=1}^{N_0} w_{i,j}^{1,0} y_j^0 + b_i^1,  y_i^1 = \phi(x_i^1),  i = 1, 2, ..., N_1

x_i^{L+1} = \sum_{j=1}^{N_L} w_{i,j}^{L+1,L} y_j^L + b_i^{L+1},  y_i^{L+1} = \phi(x_i^{L+1}),  i = 1, 2, ..., N_{L+1}
2/18
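The layer-by-layer recursion above can be sketched in a few lines of numpy (a minimal illustration; the layer sizes, tanh activation, and random parameter values are placeholder choices, not from the slides — note that, as on this slide, \phi is applied at every layer including the output):

```python
import numpy as np

def forward(x, weights, biases, phi=np.tanh):
    """Fully connected forward pass: y^0 = x; x^{l+1} = W^{l+1,l} y^l + b^{l+1}; y^{l+1} = phi(x^{l+1})."""
    y = x
    for W, b in zip(weights, biases):
        y = phi(W @ y + b)
    return y

# Example shapes: N0 = 3 inputs, one hidden layer of N1 = 5 units, N2 = 2 outputs
rng = np.random.default_rng(0)
sizes = [3, 5, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
y = forward(rng.standard_normal(sizes[0]), weights, biases)
print(y.shape)  # (2,)
```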
Neural net (fully connected)
The same network, written layer by layer: for l = 0, 1, ..., L and i = 1, 2, ..., N_{l+1},

y_i^0 = x_i^0,  x_i^{l+1} = \sum_{j=1}^{N_l} w_{i,j}^{l+1,l} y_j^l + b_i^{l+1},  y_i^{l+1} = \phi(x_i^{l+1})

Each output is therefore a deterministic function of the inputs and all parameters:

y_1^{L+1} = f_1(x_1^0, x_2^0, ..., x_{N_0}^0; {W^{l+1,l}}, {b^{l+1}})
y_2^{L+1} = f_2(x_1^0, x_2^0, ..., x_{N_0}^0; {W^{l+1,l}}, {b^{l+1}})
............
y_{N_{L+1}}^{L+1} = f_{N_{L+1}}(x_1^0, x_2^0, ..., x_{N_0}^0; {W^{l+1,l}}, {b^{l+1}})
Neural net (fully connected)
In vector form, with parameters W^{1,0}, b^1, ..., W^{L+1,L}, b^{L+1}, the network is the chain

x^0 -> y^0 -> x^1 -> y^1 -> ... -> x^L -> y^L -> x^{L+1} -> y^{L+1},

where y^0 = x^0,  x^{l+1} = W^{l+1,l} y^l + b^{l+1},  y^{l+1} = \phi(x^{l+1}).
5/18
Gaussian process: any finite collection Y = (Y_{x_1}, ..., Y_{x_n}) is jointly Gaussian, Y ~ N(m, K); equivalently, Y_x := f_GP(x) ~ GP(m(x), k(x, x')) with mean function m(x) and covariance function k(x, x').
6/18
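The finite-dimensional view of this definition can be made concrete in numpy (the squared-exponential kernel and its lengthscale are standard example choices, not the NN-induced kernel discussed in this talk):

```python
import numpy as np

def k_se(x, xp, ell=0.3):
    """Squared-exponential covariance k(x, x') = exp(-(x - x')^2 / (2 ell^2))."""
    return np.exp(-(x - xp) ** 2 / (2 * ell ** 2))

x = np.linspace(0.0, 1.0, 50)       # any finite set of inputs ...
m = np.zeros_like(x)                # ... has mean vector m_i = m(x_i)
K = k_se(x[:, None], x[None, :])    # ... and covariance matrix K_ij = k(x_i, x_j)

# Y ~ N(m, K): draws from the GP prior restricted to these 50 inputs
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(m, K + 1e-10 * np.eye(x.size), size=3)
print(samples.shape)  # (3, 50)
```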
Neural net (fully connected) & Gaussian process
7/18
Neural net (fully connected) & Gaussian process
y = f_NN(x; {W}, {b}) ~ GP(0, k_{\sigma_w^2, \sigma_b^2}(x, x'))

when the parameters are drawn i.i.d. from the prior W^{l+1,l} ~ N(0, (\sigma_w^2 / N_l) I), b^{l+1} ~ N(0, \sigma_b^2 I), and the hidden-layer widths N_l -> ∞.
8/18
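A quick Monte Carlo sanity check of this statement (ReLU activation, one hidden layer, and all widths/variances are illustrative choices, not from the slides): over prior draws of the parameters, the output variance at a fixed input x should match the infinite-width value \sigma_b^2 + \sigma_w^2 E[\phi(z)^2] with z ~ N(0, k^0(x, x)), and for ReLU E[\phi(z)^2] = k^0(x, x)/2.

```python
import numpy as np

rng = np.random.default_rng(1)
N0, N1 = 3, 200                       # input dim and (large) hidden width
sigma_w2, sigma_b2 = 1.0, 0.1         # illustrative prior variances
x = rng.standard_normal(N0)
k0_xx = sigma_b2 + sigma_w2 / N0 * (x @ x)   # first-layer variance k^0(x, x)

# First-layer pre-activations are exactly N(0, k^0(x, x)), so sample them
# directly; then draw the second layer from its prior and form the output
draws = 20_000
z1 = rng.normal(0.0, np.sqrt(k0_xx), size=(draws, N1))
y1 = np.maximum(z1, 0.0)                                        # phi = ReLU
W2 = rng.normal(0.0, np.sqrt(sigma_w2 / N1), size=(draws, N1))  # W^{2,1} prior
b2 = rng.normal(0.0, np.sqrt(sigma_b2), size=draws)             # b^2 prior
out = (W2 * y1).sum(axis=1) + b2

analytic = sigma_b2 + sigma_w2 * k0_xx / 2  # infinite-width variance for ReLU
print(out.var(), analytic)                  # agree to Monte Carlo accuracy
```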
x_i^1(x) ~ GP(0, k^0(x, x')),  x_i^2(x) ~ GP(0, k^1(x, x'))

Feed two inputs x and x' through the same one-hidden-layer net (parameters W^{1,0}, b^1, W^{2,1}, b^2):

y_i^0 = x_i,  i = 1, 2, ..., N_0
x_i^1 = \sum_{j=1}^{N_0} w_{i,j}^{1,0} y_j^0 + b_i^1,  y_i^1 = \phi(x_i^1),  i = 1, 2, ..., N_1
x_i^2 = \sum_{j=1}^{N_1} w_{i,j}^{2,1} y_j^1 + b_i^2,  y_i^2 = x_i^2,  i = 1, 2, ..., N_2

k^0(x, x') = Cov(x_i^1(x), x_i^1(x')) = \sigma_b^2 + (\sigma_w^2 / N_0) x \cdot x'   (regardless of the subscript i)

k^1(x, x') = Cov(x_i^2(x), x_i^2(x')) = \sigma_b^2 + \sigma_w^2 F(k^0(x, x'), k^0(x, x), k^0(x', x'))
9/18
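The k^0 formula is easy to verify by Monte Carlo: the pre-activation of any first-layer unit is linear in the Gaussian parameters, so its covariance across prior draws must equal \sigma_b^2 + (\sigma_w^2 / N_0) x \cdot x' exactly (the input dimension and prior variances below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N0 = 4
sigma_w2, sigma_b2 = 1.5, 0.3        # illustrative prior variances
x, xp = rng.standard_normal(N0), rng.standard_normal(N0)

# Draw many first-layer parameter sets from the prior and record the
# pre-activation of one unit for both inputs x and x'
draws = 200_000
W = rng.normal(0.0, np.sqrt(sigma_w2 / N0), size=(draws, N0))  # rows of W^{1,0}
b = rng.normal(0.0, np.sqrt(sigma_b2), size=draws)             # b_i^1
z, zp = W @ x + b, W @ xp + b                                  # x_i^1(x), x_i^1(x')

emp = np.mean(z * zp)                        # empirical Cov(x_i^1(x), x_i^1(x'))
ana = sigma_b2 + sigma_w2 / N0 * (x @ xp)    # k^0(x, x') from the slide
print(emp, ana)                              # agree to Monte Carlo accuracy
```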
Neural net (fully connected) & Gaussian process
With priors W^{l+1,l} ~ N(0, (\sigma_w^2 / N_l) I) (i.i.d.) and hidden-layer widths N_l -> ∞, the NN-induced covariance is built recursively:

k^0(x, x') = Cov(x_i^1(x), x_i^1(x')) = \sigma_b^2 + (\sigma_w^2 / N_0) x \cdot x'

k^l(x, x') = \sigma_b^2 + \sigma_w^2 F(k^{l-1}(x, x'), k^{l-1}(x, x), k^{l-1}(x', x')),  l = 1, 2, ..., L
10/18
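This recursion is straightforward to implement once F is known. For \phi = ReLU (an illustrative choice; the slides leave \phi generic), F has the closed arc-cosine form of Cho & Saul (2009); the \sigma values below are placeholders:

```python
import numpy as np

def F_relu(kxy, kxx, kyy):
    """E[phi(u) phi(v)] for phi = ReLU and zero-mean Gaussian (u, v) with
    Var u = kxx, Var v = kyy, Cov(u, v) = kxy (Cho & Saul, 2009)."""
    c = np.clip(kxy / np.sqrt(kxx * kyy), -1.0, 1.0)
    theta = np.arccos(c)
    return np.sqrt(kxx * kyy) / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

def k_nngp(x, xp, L, sigma_w2=1.6, sigma_b2=0.1):
    """k^0 = sigma_b^2 + sigma_w^2/N0 * x.x', then
    k^l = sigma_b^2 + sigma_w^2 * F(k^{l-1}(x,x'), k^{l-1}(x,x), k^{l-1}(x',x'))."""
    N0 = x.size
    kxy = sigma_b2 + sigma_w2 / N0 * (x @ xp)
    kxx = sigma_b2 + sigma_w2 / N0 * (x @ x)
    kyy = sigma_b2 + sigma_w2 / N0 * (xp @ xp)
    for _ in range(L):
        kxy, kxx, kyy = (sigma_b2 + sigma_w2 * F_relu(kxy, kxx, kyy),
                         sigma_b2 + sigma_w2 * F_relu(kxx, kxx, kxx),
                         sigma_b2 + sigma_w2 * F_relu(kyy, kyy, kyy))
    return kxy

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(3), rng.standard_normal(3)
print(k_nngp(x, xp, L=3))
```

A quick consistency check on F: at x = x' the formula reduces to F(k, k, k) = k/2, matching E[ReLU(z)^2] = Var(z)/2 for zero-mean Gaussian z.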
NNGP: Gaussian process with NN-induced covariance function
11/18
NNGP: Gaussian process with NN-induced covariance function
              NN          GP            Bayesian NN   NNGP
Expressivity  High        Intermediate  High          High
Uncertainty   No          Yes           Yes           Yes
Cost          Low         Intermediate  High          Intermediate
Accuracy      It depends  It depends    It depends    It depends
12/18
Expressivity
Deep neural networks (DNNs) can compactly express highly
complex functions over input space in a way that shallow networks
with one hidden layer and the same number of neurons cannot.
(Regression)
Deep neural networks can disentangle highly curved manifolds in
input space into flattened manifolds in hidden space.
(Classification)
Poole B, Lahiri S, Raghu M, Sohl-Dickstein J, Ganguli S. Exponential expressivity in deep neural networks through transient chaos. In: Advances in Neural Information Processing Systems, 2016, pp. 3360-3368.
14/18
Accuracy (Regression problem)
15/18
Accuracy (Classification problem, J. Lee et al., 2018)
16/18
Computational cost
NNGP: same as a standard GP, i.e., dominated by the O(n^3) factorization of the n-by-n training covariance matrix
17/18
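The cost claim holds because, once the kernel matrix is built, NNGP inference is plain GP regression. A minimal sketch of that step (the squared-exponential kernel below is a placeholder; the NN-induced kernel recursion could be substituted without changing anything else):

```python
import numpy as np

def gp_posterior(X, y, Xs, k, noise=1e-4):
    """Standard GP regression posterior. The O(n^3) Cholesky factorization of
    the n x n training covariance dominates the cost, NN-induced kernel or not."""
    K = k(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    Ks = k(Xs, X)
    mean = Ks @ alpha                                     # posterior mean
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(k(Xs, Xs)) - np.sum(v ** 2, axis=0)     # posterior variance
    return mean, var

def k(A, B):
    """Placeholder squared-exponential kernel on row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(20, 1))
y_train = np.sin(X[:, 0])
Xs = np.linspace(-2, 2, 50)[:, None]
mean, var = gp_posterior(X, y_train, Xs, k)
print(mean.shape, var.shape)  # (50,) (50,)
```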
Future work – to improve, apply, and generalize NNGP
Training an NNGP
Develop NNGPs for other types of NNs, e.g., CNNs, RNNs, GANs
……
18/18