We need neural nets because we do not know the joint distribution.
(How about KDE — kernel density estimation?)
Optimum
Second approximation: the expected error is replaced by its time (sample) average.
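If the "time average" here means replacing the expected error probability by its average over samples — the empirical-risk view — a minimal illustration (the distribution and decision rule below are my own choices, not from the slides):

```python
import random

random.seed(0)

# x ~ Uniform(0,1); the true label is 1(x > 0.5), but it is
# flipped with probability 0.1. The rule "predict 1 iff x > 0.5"
# then has true error probability P_e = 0.1 exactly.
def sample():
    x = random.random()
    y = (x > 0.5)
    if random.random() < 0.1:
        y = not y
    return x, y

def empirical_error(n):
    """Time (sample) average of the 0-1 loss over n draws."""
    errs = sum((x > 0.5) != y for x, y in (sample() for _ in range(n)))
    return errs / n

print(empirical_error(100))     # rough estimate from few samples
print(empirical_error(100000))  # close to the true P_e = 0.1
```

As N grows, the sample average concentrates around the true error probability — which is exactly what the later VC bounds quantify.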
Vapnik–Chervonenkis (VC) dimension
Machine capacity too small (over-determined) vs. machine capacity too large (under-determined), in relation to the input dimension N.
f ∈ 𝔉: the set of all functions that can be realized.
P_e(f*): the true error probability of the selected function f*.
P_e^N(f*): its error estimated from the N training samples.
P_e = min_{f∈𝔉} P_e(f): the best error achievable within the class.
𝕊(𝔉, N): shatter coefficient of the class 𝔉.
If there exists a set 𝔛_N ⊂ 𝔛 of N points such that
{𝑜 ∩ 𝔛_N : 𝑜 ∈ 𝔒} = 𝖕(𝔛_N),
i.e., intersecting 𝔛_N with the sets in 𝔒 yields every subset of 𝔛_N, then 𝔛_N is shattered and the shatter coefficient is 𝕊(𝔉, N) = |𝖕(𝔛_N)| = 2^N (the cardinality of the power set).
Otherwise 𝕊(𝔉, N) < 2^N.
VC dimension definition:
The largest integer k ≥ 1 for which 𝕊(𝔉, k) = 2^k is called the VC dimension of the class 𝔉 and is denoted V_c.
If 𝕊(𝔉, N) = 2^N for every N (in 𝔛), the VC dimension is infinite.
In general, in an L-dimensional input space, the VC dimension of a perceptron is L + 1.
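The L + 1 claim can be checked empirically for L = 2: a linear threshold unit shatters 3 points in general position but cannot realize every labeling of 4 points (the XOR configuration fails). A sketch — the separability test via perceptron convergence is my own heuristic, since the epoch cap only bounds the non-separable case:

```python
from itertools import product

def separable(points, labels, max_epochs=10000):
    """Linear-separability test via the perceptron rule with bias.
    The perceptron converges iff a separating line exists; the
    epoch cap is only hit when the labeling is not separable."""
    w = [0.0, 0.0, 0.0]  # weights + bias for augmented input (x1, x2, 1)
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            x = (x1, x2, 1.0)
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return True
    return False

def shatters(points):
    """True if a linear threshold unit realizes all 2^N labelings."""
    return all(separable(points, labs)
               for labs in product([-1, 1], repeat=len(points)))

print(shatters([(0, 0), (1, 0), (0, 1)]))          # 3 points: shattered
print(shatters([(0, 0), (0, 1), (1, 0), (1, 1)]))  # XOR config: not shattered
```

So the perceptron's VC dimension in 2D is 3 = L + 1, matching the statement above.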
Sauer–Shelah lemma:
Prob{ P_e(f*) − min_{f∈𝔉} P_e(f) > ε } ≤ 8 𝕊(𝔉, N) exp(−Nε² / 128)
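Plugging the Sauer–Shelah bound 𝕊(𝔉, N) ≤ (N + 1)^{V_c} into the right-hand side shows how many samples are needed before the bound says anything nontrivial. The numbers below are purely illustrative:

```python
import math

def vc_bound(n, eps, vc_dim):
    """Upper bound 8 * S(F,N) * exp(-N * eps^2 / 128), with the
    shatter coefficient bounded by Sauer-Shelah: S(F,N) <= (N+1)^V."""
    shatter = (n + 1) ** vc_dim
    return 8 * shatter * math.exp(-n * eps**2 / 128)

# The bound is vacuous (> 1) for small N and shrinks only once the
# exponential term dominates the polynomial shatter coefficient:
for n in (10**3, 10**5, 10**7):
    print(n, vc_bound(n, eps=0.1, vc_dim=3))
```

The polynomial growth of 𝕊(𝔉, N) against the exponential decay exp(−Nε²/128) is why any finite VC dimension eventually guarantees generalization.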
Relation between the number of training samples, the VC dimension, and the error:
NN output and hidden units:
Before the output: a layer of weights (an affine map). The output distribution is not known in advance, but a linear unit does not saturate, so it gives the gradient the best chance to flow.
Logistic (sigmoid) output:
Can be seen as converting the unnormalized log probability z into a probability: σ(z) = 1 / (1 + e^(−z)).
Cost:
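The slide's cost expression is not reproduced here; for a logistic output the standard cost is the negative log-likelihood, which written in terms of the logit z equals softplus((1 − 2y)z) — a numerically stable form. A minimal sketch (function names are mine):

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(x):
    # log(1 + e^x) without overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def logistic_nll(z, y):
    """Negative log-likelihood -log P(y | z) for y in {0, 1};
    algebraically equal to softplus((1 - 2y) * z)."""
    return softplus((1 - 2 * y) * z)

# agrees with the naive -log(sigmoid) form where that is stable:
z, y = 2.0, 1
print(abs(logistic_nll(z, y) - (-math.log(sigmoid(z)))))
```

The softplus form saturates (gradient vanishes) only when the prediction is already correct with high confidence, which is the desired behavior for a cost function.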
Softmax units for a Multinoulli output: an approximate (soft) WINNER-TAKE-ALL.
Maps z to multiple probability values whose sum is 1; the z_i themselves are treated as the unnormalized log probabilities.
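The mapping from unnormalized log probabilities to a distribution can be sketched directly; subtracting max(z) before exponentiating is the standard trick to avoid overflow (a sketch, not a framework implementation):

```python
import math

def softmax(z):
    """Map unnormalized log-probabilities z to probabilities
    summing to 1; subtracting max(z) avoids overflow in exp."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

p = softmax([2.0, 1.0, 0.1])
print(p, sum(p))  # probabilities sum to 1; the largest logit dominates
```

As the gap between the largest logit and the rest grows, the output approaches a one-hot vector — the "approximate winner-take-all" behavior noted above.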