
(a) Use the trigonometric identity

$$\cos\alpha\,\cos\beta = \tfrac{1}{2}\cos(\alpha+\beta) + \tfrac{1}{2}\cos(\alpha-\beta)$$

to write z(x_1, x_2) as a linear combination of terms cos(f_1 x_1 + f_2 x_2) and cos(f_1 x_1 - f_2 x_2).
(b) Show that cos(x), or indeed any continuous function f(x), can be approximated to any accuracy by a linear combination of sign functions as

$$f(x) \simeq f(x_0) + \sum_{i=0}^{N} \left[ f(x_{i+1}) - f(x_i) \right] \left[ \frac{1 + \mathrm{Sgn}(x - x_i)}{2} \right],$$

where the x_i are sequential values of x; the smaller x_{i+1} - x_i, the better the approximation. (A numerical sketch of this construction follows the problem.)
(c) Put your results together to show that z(x_1, x_2) can be expressed as a linear combination of step functions or sign functions whose arguments are themselves linear combinations of the input variables x_1 and x_2. Explain, in turn, why this implies that a three-layer network with sigmoidal hidden units and a linear output unit can approximate any function that can be expressed by a Fourier series.
(d) Does your construction guarantee that the derivative df(x)/dx can be well approximated too?
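The staircase construction in part (b) is easy to verify numerically. Below is a minimal sketch in Python (the interval, grid size, and names are illustrative assumptions, not from the text) that approximates cos(x) by the stated linear combination of sign functions:

```python
import numpy as np

# Staircase approximation from part (b):
#   f(x) ~ f(x_0) + sum_i [f(x_{i+1}) - f(x_i)] * (1 + Sgn(x - x_i)) / 2
# The knot spacing x_{i+1} - x_i controls the accuracy.

def sign_approx(f, knots, x):
    """Approximate f at points x by sign functions anchored at the knots."""
    approx = np.full_like(x, f(knots[0]))
    for x_i, x_next in zip(knots[:-1], knots[1:]):
        step = (1.0 + np.sign(x - x_i)) / 2.0  # 0 left of x_i, 1 right of it
        approx += (f(x_next) - f(x_i)) * step
    return approx

knots = np.linspace(0.0, 2.0 * np.pi, 50)      # sequential values x_0, ..., x_N
x = np.linspace(0.0, 2.0 * np.pi, 1000)
err = np.max(np.abs(sign_approx(np.cos, knots, x) - np.cos(x)))
print(f"max abs error with 50 knots: {err:.4f}")
```

Doubling the number of knots should roughly halve the maximum error, which is the sense in which the smaller x_{i+1} - x_i, the better the approximation.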

Section 6.3
3. Consider a d–n_H–c network trained with n patterns for m_e epochs.
(a) What is the space complexity in this problem? (Consider both the storage of
network parameters as well as the storage of patterns, but not the program
itself.)
(b) Suppose the network is trained in stochastic mode. What is the time complexity? Because this is dominated by the number of multiply-accumulates, use that count as the measure of time complexity. (A rough counting sketch follows this problem.)
(c) Suppose the network is trained in batch mode. What is the time complexity?
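As a sanity check on the counting, here is a small sketch (the constant factor for the backward pass and the function name are assumptions for illustration, not the book's answer):

```python
# Rough operation counts for a fully connected d - n_H - c network.

def network_costs(d, n_H, c, n, m_e):
    # Space: weights and biases for both layers, plus the n stored
    # patterns (inputs and targets); the program itself is excluded.
    n_weights = (d + 1) * n_H + (n_H + 1) * c
    space = n_weights + n * (d + c)

    # Time: one forward pass costs about n_weights multiply-accumulates,
    # and backpropagation costs a small constant multiple of that, so each
    # pattern presentation is O(n_weights). Stochastic mode updates after
    # each pattern; batch mode sums over all n patterns before updating,
    # giving the same leading-order count per epoch.
    macs_per_pattern = 3 * n_weights  # assumed constant factor
    time_total = n * m_e * macs_per_pattern
    return space, time_total

print(network_costs(d=10, n_H=5, c=3, n=1000, m_e=20))
```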
4. Prove that the formula for the sensitivity δ for a hidden unit in a three-layer net (Eq. 21) generalizes to a hidden unit in a four- (or higher-) layer network, where the sensitivity is the weighted sum of sensitivities of units in the next higher layer.
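For reference when attempting the proof, the three-layer hidden-unit sensitivity of Eq. 21 has the form (written here to match the chapter's notation; check the indexing against the text)

$$\delta_j = f'(net_j) \sum_{k=1}^{c} w_{kj}\,\delta_k,$$

and the claim is that the same expression holds for a hidden unit j in a deeper network, with the sum running over the units k of the next higher layer rather than over the output units.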
5. Explain in words why the backpropagation rule for training input-to-hidden
weights makes intuitive sense by considering the dependency upon each of the
terms in Eq. 21.
6. One might guess that the backpropagation learning rules should be inversely related to f'(net); that is, that weight change should be large where the output does not vary. In fact, as shown in Eq. 17, the learning rule is linear in f'(net). Explain intuitively why the learning rule should be linear in f'(net).
7. Show that the learning rule described in Eqs. 17 and 21 works for bias, where x_0 = y_0 = 1 is treated as another input and hidden unit. (A small sketch follows.)
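A minimal sketch of this convention (the shapes, names, and choice of tanh hidden units are assumptions for illustration): prepending x_0 = 1 to the input and y_0 = 1 to the hidden activations makes each bias an ordinary weight in column 0, so the same update rules apply to it unchanged.

```python
import numpy as np

# Bias folded into the weight matrices: column 0 multiplies the constant
# units x_0 = 1 and y_0 = 1. All shapes here are illustrative.

rng = np.random.default_rng(0)
d, n_H, c = 4, 3, 2
W_hidden = rng.normal(size=(n_H, d + 1))   # column 0 holds the hidden biases
W_out = rng.normal(size=(c, n_H + 1))      # column 0 holds the output biases

def forward(x):
    x_aug = np.concatenate(([1.0], x))     # x_0 = 1 acts as another input
    y = np.tanh(W_hidden @ x_aug)          # hidden-unit activations
    y_aug = np.concatenate(([1.0], y))     # y_0 = 1 acts as another hidden unit
    return W_out @ y_aug                   # linear output units

print(forward(rng.normal(size=d)))
```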
