CONTRIBUTED ARTICLE
Syracuse University
Abstract. We study two classes of sigmoids: the simple sigmoids, defined to be odd, asymptotically bounded, completely monotone functions in one variable, and the hyperbolic sigmoids, a proper subset of simple sigmoids and a natural generalization of the hyperbolic tangent. We obtain a complete characterization for the inverses of hyperbolic sigmoids using Euler's incomplete beta functions, and describe composition rules that illustrate how such functions may be synthesized from others. These results are applied to two problems. First, we show that with respect to simple sigmoids the continuous Cohen-Grossberg-Hopfield model can be reduced to the (associated) Legendre differential equations. Second, we show that the effect of using simple sigmoids as node transfer functions in a one-hidden layer feedforward network with one summing output may be interpreted as representing the output function as a Fourier series sine transform evaluated at the hidden layer node inputs, thus extending and complementing earlier results in this area. Copyright © 1996 Elsevier Science Ltd
Keywords: Sigmoid functions, Hypergeometric series, Legendre equation, Cohen-Grossberg-Hopfield model, Additive model, Fourier transform.
The main contributions of the paper are as follows:

● Simple sigmoids, hyperbolic sigmoids and their inverses are completely characterized in Sections 4 and 5.
● Using series inversion techniques, in Section 5, we obtain the series expansions of hyperbolic sigmoids from those of their inverses. These results extend those of Minai and Williams (1993) for the logistic function.
● In Section 4, we study the composition of simple sigmoids via differentiation, addition, multiplication, and functional composition. These results also completely specify the relationship between Euler's incomplete beta function and the parametrized sigmoids.
● In Section 6.1 we show that the continuous Cohen-Grossberg-Hopfield equations belong to the class of non-homogeneous Legendre differential equations if the neural transfer function is a simple sigmoid.
● In Section 6.2 we establish a connection between Fourier transforms and feedforward nets with one summing output and one hidden layer whose nodes contain simple sigmoidal transfer functions.

DEFINITION 2.1 (real analyticity). Let $U \subseteq \mathbb{R}$ be an open set. A function $f: U \to \mathbb{R}$ is said to be real analytic at $x_0 \in U$ if the function may be represented by a convergent power series on some interval of positive radius centered at $x_0$, i.e., $f(x) = \sum_{j=0}^{\infty} a_j (x - x_0)^j$. The function is said to be real analytic on $V \subseteq U$ if it is real analytic at each $x_0 \in V$. ■

DEFINITION 2.2 (monotonicity). A function $f: \mathbb{R} \to \mathbb{R}$ is absolutely monotonic in $(a, b)$ iff it has non-negative derivatives of all orders, i.e., $f \in C^\infty((a, b))$ and,

$$f^{(k)}(x) \ge 0, \quad a < x < b, \quad k = 0, 1, 2, \ldots \qquad (2.1)$$

A function $f: \mathbb{R} \to \mathbb{R}$ is completely monotonic in $(a, b)$ iff $f(-x)$ is absolutely monotonic in $(-b, -a)$. Equivalently, $f$ is completely monotonic in $(a, b)$ iff $f \in C^\infty((a, b))$ and,

$$(-1)^k f^{(k)}(x) \ge 0, \quad a < x < b, \quad k = 0, 1, 2, \ldots \qquad (2.2)$$

A function $f: \mathbb{R} \to \mathbb{R}$ is completely convex in $(a, b)$ iff $f \in C^\infty((a, b))$, and for all non-negative $k$ and $x \in (a, b)$, $(-1)^k f^{(2k)}(x) \ge 0$. ■
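Definition 2.2 can be probed numerically. The sketch below is a heuristic of our own, not part of the paper: by the mean value theorem for finite differences, $\Delta_h^k f(x) = h^k f^{(k)}(\xi)$ for some $\xi \in (x, x + kh)$, so a completely monotonic $f$ must satisfy $(-1)^k \Delta_h^k f(x) \ge 0$. Passing the check on a finite grid does not prove complete monotonicity; failing it does rule it out. The function names and the tolerance are illustrative assumptions.

```python
import math
from typing import Callable


def forward_difference(f: Callable[[float], float], x: float, h: float, k: int) -> float:
    """k-th forward difference: sum_j (-1)^(k-j) C(k, j) f(x + j h)."""
    return sum((-1) ** (k - j) * math.comb(k, j) * f(x + j * h) for j in range(k + 1))


def looks_completely_monotone(
    f: Callable[[float], float], a: float, b: float, max_order: int = 6, samples: int = 20
) -> bool:
    """Necessary-condition check of (-1)^k Delta_h^k f(x) >= 0 on a grid inside (a, b).

    Heuristic only: a small negative tolerance absorbs floating-point cancellation.
    """
    for k in range(max_order + 1):
        h = (b - a) / (2 * (k + 1) * samples)  # keeps x + k*h strictly inside (a, b)
        for i in range(1, samples):
            x = a + i * (b - a) / (2 * samples)
            if (-1) ** k * forward_difference(f, x, h, k) < -1e-12:
                return False
    return True
```

For example, $e^{-x}$ and $1/(1 + x)$ pass on $(0, 1)$, whereas $\tanh$ fails already at $k = 1$ because it is increasing.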
parameters, and the $\beta_i$'s are referred to as the denominator parameters of the GH series. ■

In particular, the classical GH series$^6$ in $z$, ${}_2F_1(\alpha, \beta; \gamma; z)$, is defined by,

$${}_2F_1(\alpha, \beta; \gamma; z) = F(\alpha, \beta; \gamma; z) = \sum_{k=0}^{\infty} \frac{(\alpha)_k (\beta)_k}{(\gamma)_k} \frac{z^k}{k!} \qquad (2.4)$$

REMARK 2.2. In general, the parameters $\alpha_i$ and $\beta_i$, as well as the variable $z$, are allowed to be complex; however, we follow common practice and restrict our attention to real values, i.e., $\forall i: \alpha_i, \beta_i, z \in \mathbb{R}$. Even with this restriction, the hypergeometric function is amazingly versatile. Spanier and Oldham (1987) list over 170 functions that are representable in terms of the hypergeometric function. The hypergeometric function is a periodic table à la Mendeleev for mathematical functions; different functions get neatly pegged into various groups$^7$ by the values of the parameters and the form of the dependent variable.

3. SIMPLE AND HYPERBOLIC SIGMOIDS

DEFINITION 3.1 (simple sigmoids). A function $\sigma: \mathbb{R} \to (-1, 1)$ is said to be a simple sigmoid if it satisfies the following conditions:

1. $\sigma(\cdot)$ is a smooth function, i.e., $\sigma(\cdot)$ is $C^\infty$.
2. $\sigma(\cdot)$ is an odd function, i.e., $\sigma(-x) = -\sigma(x)$.
3. $\sigma(\cdot)$ has $y = \pm 1$ as horizontal asymptotes, i.e., $\lim_{x \to \pm\infty} \sigma(x) = \pm 1$.
4. $\sigma(x)/x$ is a completely convex function in $(0, 1)$. ■

Simple sigmoids are required to be odd smooth functions bounded by horizontal asymptotes; these constraints impose a degree of standardization on the kinds of sigmoids being considered. The following results clarify the implications of the fourth constraint.

PROPOSITION 3.1 (Feller, 1965). A function $f: (0, 1) \to \mathbb{R}$ is absolutely monotone on $(0, 1)$ iff it possesses a power series expansion with non-negative coefficients, converging for $0 < x < 1$. ■

$$(-1)f^{(1)}(x) = a_1 - 2a_2 x + 3a_3 x^2 - \cdots \qquad (3.1)$$

$$f^{(2)}(x) = 2a_2 - 6a_3 x + \cdots$$

where $a_i > 0\ \forall i$. From real analysis we know that each of the series $(-1)^n f^{(n)}(x)$ has the same convergence properties as eqn (3.1). Also, the sum of a convergent infinite alternating series is always less than or equal to the first term. This fact, along with the above equations, implies that $(-1)^k f^{(k)}(x) \ge 0$, i.e., $f(x)$ is completely monotone on $(0, 1)$. ■

COROLLARY 3.1. $\sigma(x)/x$ is a completely convex function in $(0, 1)$ iff $\sigma(\sqrt{x})/\sqrt{x}$ is a completely monotone function in $(0, 1)$.

Proof of Corollary 3.1. If $\sigma(x)/x$ is completely convex in $(0, 1)$, then it has to be analytic in $(0, 1)$ (Widder, 1946). Also, $\sigma(x)/x$ is an even function, implying that its power series expansion will consist only of even powers in $x$, which alternate in sign. From Lemma 3.1$^8$, $\sigma(\sqrt{x})/\sqrt{x}$ will hence be completely monotone in $(0, 1)$. The same argument suffices for the converse. ■

If a simple sigmoid is also strictly increasing, then a much stronger statement can be made, as demonstrated by the following proposition.

PROPOSITION 3.2 (Krantz & Parks, 1992). Let $y = \sigma(x)$ be a strictly increasing simple sigmoid (i.e., $\forall x \in \mathbb{R}$, $\sigma'(x) > 0$). Then:

1. $\eta \equiv \sigma^{-1}: (-1, 1) \to \mathbb{R}$ exists.

$^6$ The classical GH series is referred to as the Gauss function in the literature (Spanier and Oldham, 1987).
$^7$ "There must be many universities today where 95 per cent, if not 100 per cent, of the functions studied by physics, engineering, and even mathematics students, are covered by this single symbol $F(a, b; c; z)$." (W. W. Sawyer, cited by Graham et al., 1989.)
$^8$ Lemma 3.1 appears to be "folklore"; we have been unable to find a reference.
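Lemma 3.1 and Corollary 3.1 can be seen at work on the hyperbolic tangent: if $\tanh$ is simple, the Maclaurin coefficients of $\tanh(\sqrt{z})/\sqrt{z}$ should alternate in sign. The sketch below computes them exactly from the Riccati equation $y' = 1 - y^2$, $y(0) = 0$, which $\tanh$ satisfies; this route is our choice, not the paper's method, and checking finitely many coefficients is evidence rather than proof.

```python
from fractions import Fraction
from typing import List


def tanh_maclaurin(n_max: int) -> List[Fraction]:
    """Exact coefficients a_0..a_{n_max} of tanh(x) = sum_n a_n x^n.

    Matching powers in y' = 1 - y^2 gives
    (n + 1) a_{n+1} = [n == 0] - sum_{i=0}^{n} a_i a_{n-i}.
    """
    a = [Fraction(0)] * (n_max + 1)
    for n in range(n_max):
        conv = sum((a[i] * a[n - i] for i in range(n + 1)), Fraction(0))
        a[n + 1] = (Fraction(1 if n == 0 else 0) - conv) / (n + 1)
    return a


def tanh_over_x_in_z(count: int) -> List[Fraction]:
    """Coefficients c_k of tanh(sqrt(z))/sqrt(z) = sum_k c_k z^k for k < count."""
    a = tanh_maclaurin(2 * count)
    return [a[2 * k + 1] for k in range(count)]
```

The first four coefficients are $1, -1/3, 2/15, -17/315$, and the signs keep alternating for as many terms as one cares to compute.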
A Class of Sigmoid Functions 823
2. $\eta(y)$ is a strictly increasing function, analytic in the interval $(-1, 1)$.
3. $\eta'(y) = 1/\sigma'(\eta(y))$, where $\eta'$ and $\sigma'$ are the first derivatives of $\eta$ and $\sigma$ respectively.
4. $\eta(y)/y$ is absolutely monotone in $(0, 1)$.

REMARK 3.1. If $\sigma(x)/x$ is completely monotone on $(0, 1)$ and $\sigma$ is invertible, then $\eta(y)/y$ is absolutely monotone on $(0, 1)$, where $\eta$ denotes the inverse of $\sigma$. The converse is also true, and is an immediate consequence of Lemma 3.1.

REMARK 3.2. A simple sigmoid has two horizontal asymptotes, hence its inverse (if it exists) will have two vertical asymptotes (i.e., $\lim_{y \to \pm 1} \eta(y) = \pm\infty$). It will be seen that, as they have been defined, sigmoids and their inverses are quite similar; both are odd, increasing, univalent, analytic functions. However, the two differ fundamentally in that sigmoids are asymptotically bounded, while their inverses are not.

Simple sigmoids encompass many of the often used sigmoids described by formulae. The hyperbolic tangent and its close relative, the "exponential" or logistic sigmoid, are often used in neural network theoretical studies and applications. For example, most of the "spin-glass explanations" of the CGH net use the hyperbolic tangent.$^9$ The hyperbolic tangent has, among others, the following properties:

1. It is an odd, strictly increasing analytic function, asymptotically bounded by the lines $y = \pm 1$.
2. Its inverse $\tanh^{-1}(y)$ has a GH expansion given by $yF(1, 1/2; 3/2; y^2)$.
3. The first derivative of $\tanh^{-1}(y)$ is given by $1/(1 - y^2) = {}_1F_0(1; ; y^2)$, i.e., the GH expansion of the first derivative of $\tanh^{-1}(y)$ depends on only one numerator parameter.

It can be shown that many other simple sigmoids, such as that of Elliot (1993), and the Gudermannian (Section 4.2), also have inverses with classical GH or ${}_1F_0$ series representations.$^{10}$ The function $\tanh^{-1}(y)/y$ satisfies a second order linear homogeneous differential equation, with three regular singular points, located at $0$, $1$ and $\infty$. A sigmoid with similar analytical behavior could be expected to have an inverse that is a solution to some second order Fuchsian equation.$^{11}$ Since any second order Fuchsian equation with three singularities can be transformed into the Gauss hypergeometric differential equation, one solution of which is the classical GH series (Klein–Bôcher theorem; Whittaker & Watson, 1927), it follows that the inverses would have classical series expansions. These considerations motivate the following definition.

DEFINITION 3.2 (hyperbolic sigmoids). A function $\sigma: \mathbb{R} \to (-1, 1)$ is said to be a hyperbolic sigmoid function if it satisfies the following conditions:

1. $\sigma$ is a real analytic, odd, strictly increasing sigmoid, such that $\lim_{x \to \infty} \sigma(x) = 1$.
2. Let $\eta: (-1, 1) \to \mathbb{R}$ denote the inverse of $\sigma$, and $\eta'$ its first derivative. Then,
   (a) $\eta(y)/y$ has a Gauss hypergeometric series expansion in $y^2$ with at most three parameters.
   (b) $\eta'(y)$ has a Gauss hypergeometric series expansion in $y^2$ with at most one parameter.

4. CHARACTERIZATION: INVERSE HYPERBOLIC SIGMOIDS

The following result is a complete characterization for the inverses of hyperbolic sigmoids. Proofs are presented in the Appendix.

THEOREM 4.1 (inverses). Let $y = \sigma(x)$ be a hyperbolic sigmoid, and let $\eta: (-1, 1) \to \mathbb{R}$ be its inverse. Then, either

$$\eta(y) = yF(\alpha, 1/2; 3/2; y^2) \qquad (4.1)$$

or

$$\eta(y) = yF(\alpha, -; -; y^2) = \frac{y}{(1 - y^2)^{\alpha}}, \quad \alpha > 0 \qquad (4.2)$$

where, by $F(\alpha, -; -; y^2)$, we mean $F(\alpha, \beta; \beta; y^2)$ ($\beta \in \mathbb{R}$). ■
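The two families of Theorem 4.1 are easy to evaluate numerically. The sketch below sums the Gauss series (2.4) directly, using the ratio of consecutive terms. That is adequate for $|y| < 1$, but `hyp2f1` here is a hypothetical helper rather than a production implementation (SciPy's `scipy.special.hyp2f1` is one library alternative). The checks confirm three things: family (4.1) with $\alpha = 1$ reproduces $\tanh^{-1}(y)$; its derivative is the one-parameter series $(1 - y^2)^{-\alpha}$; and family (4.2) does not depend on the "missing" parameter $\beta$.

```python
import math


def hyp2f1(a: float, b: float, c: float, z: float,
           tol: float = 1e-16, max_terms: int = 100_000) -> float:
    """Truncated Gauss series F(a, b; c; z) = sum (a)_k (b)_k / ((c)_k k!) z^k, |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        # ratio of consecutive terms: t_{k+1}/t_k = (a+k)(b+k) z / ((c+k)(k+1))
        term *= (a + k) * (b + k) * z / ((c + k) * (k + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def eta_first(alpha: float, y: float) -> float:
    """Family (4.1): eta(y) = y F(alpha, 1/2; 3/2; y^2)."""
    return y * hyp2f1(alpha, 0.5, 1.5, y * y)


def eta_second(alpha: float, y: float, beta: float = 0.3) -> float:
    """Family (4.2): eta(y) = y F(alpha, beta; beta; y^2); beta is an arbitrary placeholder."""
    return y * hyp2f1(alpha, beta, beta, y * y)
```

A central difference of `eta_first` at, say, $y = 0.6$ agrees with $(1 - y^2)^{-\alpha}$ to about six digits, which is the one-parameter property required by Definition 3.2(b).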
A proof of Corollary 4.1 may be given along the following lines. If $\sigma$ is a hyperbolic sigmoid, then it is simple on the interval $(-1, 1)$. This follows from Theorem 4.1. The series representation for its inverse in $(-1, 1)$ has non-negative coefficients, and this implies that $\eta(y)/y$ is absolutely monotone (Proposition 3.1). Hence $\sigma(x)/x$ is completely monotone, and therefore simple (Lemma 3.1 and Remark 3.1). The converse is not true: simple sigmoids need not be hyperbolic. The error function $\mathrm{erf}(\cdot)$ is simple, but one can use the study of Carlitz (1963) on this function to show that it does not have an inverse representable by a classical hypergeometric series. It follows that $\mathrm{erf}(\cdot)$ is not a hyperbolic sigmoid, and hence the set of hyperbolic sigmoids is a proper subset of the set of simple sigmoids. ■

For specific values of its parameters, the hypergeometric function often reduces to other well known special functions. When inverse hyperbolic sigmoids are characterized by eqn (4.1), there is an intimate connection with Euler's incomplete beta function:

$$\frac{1}{2} B\!\left(\tfrac{1}{2}, 1 - \alpha; z^2\right) = \int_0^{\tanh^{-1}(z)} \cosh^{2(\alpha - 1)}(t)\, dt. \quad ■$$

Spanier and Oldham (1987) give a detailed description of the many properties of this important special function. The following corollary is an immediate consequence of Theorem 4.1 and Proposition 4.1. It gives the connection between inverse hyperbolic sigmoids and Euler's incomplete beta function.

In the following, we will use $F(\delta)$ as an abbreviation for $F(\delta, \beta; \gamma; z)$. Equation (4.7) follows from the identity:

$$\frac{d^n}{dz^n}\left[z^{\gamma - \alpha + n - 1}(1 - z)^{\alpha + \beta - \gamma} F(\alpha)\right] = (\gamma - \alpha)_n\, z^{\gamma - \alpha - 1}(1 - z)^{\alpha + \beta - \gamma - n} F(\alpha - n) \qquad (4.12)$$

provided … in an inverse hyperbolic sigmoid, with parameter $\alpha - 1$. ■

For inverse hyperbolic sigmoids with "missing" parameters, there is a very simple composition rule.

3. Connections with other indefinite integrals of powers of trigonometric or hyperbolic functions.
4. Connections with statistics via the function $L(p, q)$ (Spanier and Oldham, 1987).

When inverse hyperbolic sigmoids are characterized by eqn (4.2), we can use the identity,

$$\cosh(\tanh^{-1}(y)) = \frac{1}{\sqrt{1 - y^2}} \qquad (4.4)$$

to show that,

$$y \cosh^{2\alpha}(\tanh^{-1}(y)) = \frac{y}{(1 - y^2)^{\alpha}}. \qquad (4.5)$$

The fundamental role played by the hyperbolic tangent is once again evident. Here, it relates the two types of hyperbolic sigmoids defined by eqns (4.1) and (4.2).

4.1. New Inverses from Old

LEMMA 4.1. If $\eta_\alpha: (-1, 1) \to \mathbb{R}$ is an inverse hyperbolic sigmoid, then the functions $\eta_{\alpha+1}$ and $\eta_{\alpha-1}$ defined by:

$$\eta_{\alpha+1}(y) = \frac{y^{2(1 - \alpha)}}{2\alpha} \frac{d}{dy}\left(y^{2\alpha - 1}\eta_\alpha(y)\right) \qquad (4.6)$$

$$\eta_{\alpha-1}(y) = \frac{y^{2\alpha - 1}}{(3 - 2\alpha)(1 - y^2)^{\alpha - 2}} \frac{d}{dy}\left\{\frac{(1 - y^2)^{\alpha - 1}}{y^{2(\alpha - 1)}}\eta_\alpha(y)\right\}, \quad \alpha > 2 \qquad (4.7)$$

The definition of hyperbolic sigmoids implies that their inverses have GH expansions in $y^2$. Theorem 4.2 relaxes this requirement by only requiring GH expansions in some odd, injective $C^\infty$ function $g(y)$. A proof is provided in the Appendix.

$$\eta'(y) = \frac{g'(y)}{(1 - (g(y))^2)^{\alpha}}$$

where $g'(\cdot)$ is the first derivative of $g(\cdot)$. ■
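Two of the formulas in this section can be spot-checked numerically. The first is the incomplete beta connection, in the form $zF(\alpha, 1/2; 3/2; z^2) = \int_0^{\tanh^{-1} z} \cosh^{2(\alpha-1)} t\,dt$ (which equals $\tfrac12 B(\tfrac12, 1-\alpha; z^2)$). The second is the raising rule $\eta_{\alpha+1}(y) = \frac{y^{2(1-\alpha)}}{2\alpha}\frac{d}{dy}\big(y^{2\alpha-1}\eta_\alpha(y)\big)$, which is how we read the partly illegible eqn (4.6). Both follow from the substitution $u = \tanh t$ and a standard differentiation formula for $F$. The sketch uses Simpson's rule and a central difference; the helper names and step sizes are our own assumptions.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-16, max_terms=100_000):
    """Truncated Gauss series F(a, b; c; z) for |z| < 1 (illustrative helper)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) * z / ((c + k) * (k + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3


def eta(alpha, y):
    """Family (4.1): eta_alpha(y) = y F(alpha, 1/2; 3/2; y^2)."""
    return y * hyp2f1(alpha, 0.5, 1.5, y * y)


def beta_side(alpha, z):
    """Integral of cosh^{2(alpha-1)}(t) from 0 to atanh(z)."""
    return simpson(lambda t: math.cosh(t) ** (2 * (alpha - 1)), 0.0, math.atanh(z))


def raise_alpha(alpha, y, h=1e-5):
    """Raising rule (our reading of 4.6), derivative taken by central difference."""
    g = lambda t: t ** (2 * alpha - 1) * eta(alpha, t)
    return y ** (2 * (1 - alpha)) / (2 * alpha) * (g(y + h) - g(y - h)) / (2 * h)
```

With $\alpha = 1$ the raising rule turns $\tanh^{-1}$ into $\eta_2$, which is one concrete instance of generating the integral-$\alpha$ family by differentiation.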
$yF(\alpha, 1/2; 3/2; y^2)$, the function $\tanh(\cdot)$ is noteworthy: first, it corresponds to the case $\alpha = 1$; secondly, all inverse hyperbolic sigmoids with integral values of $\alpha$ may be generated from $\tanh(x)$ by a process of differentiation (Lemma 4.1); and thirdly, it is a function often encountered in neural nets. As was mentioned in the Introduction, the logistic function may be thought of as a translated and scaled version of the hyperbolic tangent.

There is a good example of the hypergeometric composition described in Theorem 4.2. Since $\tan(\beta y)$ is an odd, injective, smooth, increasing function of $y$ (for some constant $\beta > 0$), from Theorem 4.2 one may conclude that, for positive $\alpha$, the function $\tan(\beta y)F(\alpha, 1/2; 3/2; \tan^2(\beta y))$ is the inverse of some real analytic, odd, strictly increasing sigmoid. It turns out that the inverse Gudermannian function$^{14}$ may be obtained from this function by choosing $\alpha = 1$, as follows:

$$\mathrm{gd}^{-1}(y) = \ln(\sec(y) + \tan(y)) = 2\tan(y/2)F(1, 1/2; 3/2; \tan^2(y/2)), \quad -\frac{\pi}{2} < y < \frac{\pi}{2}.$$

Many such examples could be generated.$^{15}$

5. CHARACTERIZATION: HYPERBOLIC SIGMOIDS

It is often desirable and necessary to work with sigmoids themselves, rather than their inverses. In this section, we obtain power series expansions of sigmoids.

If $x = \eta(y)$ is an inverse hyperbolic sigmoid, then $y = \eta^{-1}(x)$ must have a Maclaurin series expansion of the following form:

$$y = \sigma(x) = \sum_{k=0}^{\infty} b_{2k+1} \frac{x^{2k+1}}{(2k+1)!}, \qquad b_{2k+1} = \left.\frac{d^{2k+1}\sigma(x)}{dx^{2k+1}}\right|_{x=0}.$$

… and $yF(\alpha, 1/2; 3/2; y^2)$.

5.1. Hyperbolic Sigmoids of the First Kind

When an inverse hyperbolic sigmoid is of the form $y/(1 - y^2)^{\alpha}$, a remarkably explicit form for the coefficients $\{b_{2k+1}\}$ may be given:

THEOREM 5.1 (hyperbolic sigmoids-I). If the inverse sigmoid is given by $y/(1 - y^2)^{\alpha}$, $\alpha > 0$, then in some neighborhood of the origin we have the valid expansion

$$\sigma(x) = \sum_{k=0}^{\infty} b_{2k+1} \frac{x^{2k+1}}{(2k+1)!}$$

where,

$$b_{2k+1} = \frac{(-1)^k (2k+1)!}{2k+1} \binom{(2k+1)\alpha}{k}. \qquad (5.1)$$

Proof. See the Appendix. ■

5.2. Hyperbolic Sigmoids of the Second Kind

When an inverse hyperbolic sigmoid is of the form $x = yF(\alpha, 1/2; 3/2; y^2)$, the problem is much harder. The Lagrange inversion formula leads to an intractable expression. Kamber's formulae, as presented by Goodman (1983), can be used to give explicit expressions for the coefficients. Unfortunately, the resulting expressions involve determinants, and are of little computational value. The method of repeated differentiation is more successful. The starting point for this line of attack is the observation that if $x = \eta(y)$ is an inverse hyperbolic sigmoid, then:

$$\frac{dx}{dy} = \frac{d}{dy}\eta(y) = \eta'(y) = \frac{1}{(1 - y^2)^{\alpha}}. \qquad (5.2)$$

Note that $dy/dx$ is expressed in terms of $y$; this necessitates the use of the chain rule. For example, to calculate the second derivative:

$$\frac{dy}{dx} = (1 - y^2)^{\alpha} \qquad (5.3)$$

$$\frac{d^2 y}{dx^2} = \frac{dy}{dx}\,\frac{d}{dy}\!\left(\frac{dy}{dx}\right) = (1 - y^2)^{\alpha}\left(\frac{d}{dy}(1 - y^2)^{\alpha}\right). \qquad (5.4)$$

$^{14}$ The inverse Gudermannian function finds use in relating circular and hyperbolic functions, without the use of complex functions.
$^{15}$ The tables of Spanier and Oldham (1987), and Hansen (1975) in particular, contain many such functions and expansions.
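Theorem 5.1 lends itself to a direct check. One can evaluate the truncated series with $b_{2k+1} = (-1)^k (2k)!\binom{(2k+1)\alpha}{k}$, which is what Lagrange inversion of $x = y/(1 - y^2)^{\alpha}$ gives and equals $\frac{(-1)^k (2k+1)!}{2k+1}\binom{(2k+1)\alpha}{k}$. The result is then compared with a Newton solution of $y/(1 - y^2)^{\alpha} = x$. For $\alpha = 1$ there is also the closed form $y = (\sqrt{1 + 4x^2} - 1)/(2x)$. The helper names are hypothetical. The comparison is only meaningful for $|x|$ inside the radius of convergence, which is $1/2$ when $\alpha = 1$.

```python
import math


def generalized_binomial(r: float, k: int) -> float:
    """Binomial coefficient C(r, k) for real r and integer k >= 0."""
    out = 1.0
    for i in range(k):
        out *= (r - i) / (i + 1)
    return out


def b_coeff(alpha: float, k: int) -> float:
    """b_{2k+1} = (-1)^k (2k)! C((2k+1) alpha, k), i.e. sigma^{(2k+1)}(0)."""
    return (-1) ** k * math.factorial(2 * k) * generalized_binomial((2 * k + 1) * alpha, k)


def sigma_series(alpha: float, x: float, n_terms: int = 30) -> float:
    """Truncated Maclaurin series sum_k b_{2k+1} x^{2k+1} / (2k+1)!."""
    return sum(b_coeff(alpha, k) * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n_terms))


def sigma_newton(alpha: float, x: float, iters: int = 50) -> float:
    """Solve y / (1 - y^2)^alpha = x for y in (-1, 1) by Newton's method from y = 0."""
    y = 0.0
    for _ in range(iters):
        g = y / (1 - y * y) ** alpha - x
        # d/dy [y (1 - y^2)^(-alpha)] = (1 + (2 alpha - 1) y^2) / (1 - y^2)^(alpha + 1)
        dg = (1 + (2 * alpha - 1) * y * y) / (1 - y * y) ** (alpha + 1)
        y -= g / dg
    return y
```

For $\alpha = 1$ the first coefficients are $b_1 = 1$, $b_3 = -6$, $b_5 = 240$, i.e. $\sigma(x) = x - x^3 + 2x^5 - \cdots$.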
where $G_n: (-1, 1) \to \mathbb{R}$ is a function satisfying the recursion

While the procedure implicit in Theorem 5.2 is efficient, it does involve the computation of the derivative of $G_n(y)$. Equation (5.6) is a partial difference equation with variable coefficients. Therefore, there is little hope of solving it in any generality and obtaining a closed form expression. Even more sophisticated methods, such as Truesdell's generating function technique and Weisner's group theoretic approach (McBride, 1970), do not give any special insight into the nature of the polynomials $G_n(y)$.$^{16}$

The next theorem offers a somewhat different approach to the method of repeated derivatives.

THEOREM 5.3 (hyperbolic sigmoids-IIB). Let

$$\sigma(x) = \sum_{k=0}^{\infty} b_{2k+1} \frac{x^{2k+1}}{(2k+1)!}$$

be an expansion for a hyperbolic sigmoid, whose inverse is of the form $yF(\alpha, 1/2; 3/2; y^2)$, valid in some neighborhood of the origin. Then $b_{2k} = 0$, and $b_{2k+1} = C(2k+1, k)$, where the sequence $C(n, k)$ satisfies:

$$D^n(y) = D^n(\sigma(x)) = \sum_{k=0}^{n-1} C(n, k)\, y^{2k-n+1}(1 - y^2)^{n\alpha - k}, \quad \text{for } n \ge 1. \qquad (5.8)$$

Theorem 5.3 may be viewed as a generalization of the work of Minai and Williams (1993) on the derivatives of the logistic sigmoid. They obtained relations similar to eqn (5.7).$^{17}$ In general, eqn (5.7) is a partial difference equation with variable coefficients, and the system does not appear to be related to any well known sets of numbers. Obtaining a closed form solution for the numbers $C(n, k)$ appears to be intractable.

6. APPLICATIONS

In this section, we present two applications. The first shows that if the neural network transfer function is a hyperbolic sigmoid, then the dynamical equations describing the CGH neural network (Hopfield & Tank, 1986; Grossberg, 1988) can be transformed into a set of non-homogeneous associated Legendre differential equations. Some conclusions regarding the behavior of the CGH model can be drawn as the outputs saturate (i.e., output $\to \pm 1$).

The second application derives an interesting connection between Fourier transforms and one-hidden layer feedforward nets (one-HL nets). Subject
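The recursion defining $C(n, k)$ (eqn 5.7) is not legible in this copy. A recursion with the required properties can, however, be derived from eqn (5.8) alone. Since $D^{n+1}y = (1 - y^2)^{\alpha}\frac{d}{dy}D^n y$ and $\frac{d}{dy}\big[y^m(1-y^2)^l\big] = m\,y^{m-1}(1-y^2)^l - 2l\,y^{m+1}(1-y^2)^{l-1}$, matching terms gives $C(n+1, k) = (2k - n + 1)\,C(n, k) - 2(n\alpha - k + 1)\,C(n, k-1)$ with $C(1, 0) = 1$. The sketch below implements this derivation, which is not necessarily the paper's own form of (5.7). It reproduces the Maclaurin derivatives of $\tanh$ ($\alpha = 1$). It also reproduces those of $\sin$: $\alpha = 1/2$ gives the inverse $\arcsin y$, so the same series manipulation applies even though $\sin$ is not a sigmoid.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def c_table(alpha: Fraction, n_max: int) -> Dict[Tuple[int, int], Fraction]:
    """C[(n, k)] for 1 <= n <= n_max and 0 <= k <= n - 1, via the derived recursion."""
    C: Dict[Tuple[int, int], Fraction] = {(1, 0): Fraction(1)}
    for n in range(1, n_max):
        for k in range(n + 1):
            # term from C(n, k): the power of y drops by one
            stay = (2 * k - n + 1) * C.get((n, k), Fraction(0))
            # term from C(n, k - 1): the power of y rises by one
            shift = -2 * (n * alpha - k + 1) * C.get((n, k - 1), Fraction(0))
            C[(n + 1, k)] = stay + shift
    return C


def sigmoid_derivatives_at_zero(alpha: Fraction, count: int) -> List[Fraction]:
    """b_1, b_3, ..., b_{2 count - 1}, using b_{2k+1} = C(2k + 1, k)."""
    C = c_table(alpha, 2 * count - 1)
    return [C[(2 * k + 1, k)] for k in range(count)]
```

Exact rational arithmetic (`Fraction`) keeps the numbers $C(n, k)$ free of rounding, which matters because they grow quickly with $n$.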
Equations
The continuous CGH network with $N$ neurons is described by the following dynamics:$^{18}$

$$C_i \frac{du_i}{dt} = -\frac{u_i}{R_i} + \sum_{j=1}^{N} T_{ij} v_j + I_i, \qquad v_i = \sigma(u_i) \qquad (6.1)$$

$$E = -\frac{1}{2}\sum_{i,j} T_{ij} v_i v_j - \sum_i I_i v_i + \sum_i \frac{1}{R_i}\int_0^{v_i} \eta(v)\, dv, \qquad C_i \frac{du_i}{dt} = -\frac{\partial E}{\partial v_i} \qquad (6.2)$$

and substituting eqn (6.3) into eqn (6.1), we get eqn (6.4). The following sequence of operations is applied to eqn (6.4):

1. Substitute $z_i = dv_i/dt$, and differentiate with respect to $v_i$.

$^{18}$ The synaptic weights $T_{ij}$ are assumed to be symmetric. This makes the derivations simpler with no loss in generality.

Finally, substitute $Y_i = z_i(1 - v_i^2)^{\alpha/2}$ in eqn (6.5), yielding

$$(1 - v_i^2)\frac{d^2 Y_i}{dv_i^2} - 2v_i \frac{dY_i}{dv_i} + \left[\nu(\nu + 1) - \frac{\alpha^2}{1 - v_i^2}\right] Y_i - f = 0. \qquad (6.7)$$

Case II. Let $\eta(v_i) = v_i F(\alpha, -; -; v_i^2)$. An analogous approach leads to the very same conclusion as in Case I, i.e., it is possible to transform the continuous CGH equation with the above transfer function to a non-homogeneous associated Legendre equation. However, the right hand side of the transformed equation is complicated and we do not consider this case further.

We emphasize that the link between the continuous CGH equation and the Legendre differential equation is not accidental, given that it can be established for all hyperbolic sigmoidal transfer functions. For $u_i = \tanh^{-1}(v_i)$, $\alpha = 1$, and the above equations have a rather elementary form.

An immediate application of the above transformation is in studying the saturation behavior of the CGH neural net. By saturation, we mean that the
$$Y_i = c_1 P_\nu^{(-\alpha)}(v_i) + c_2 Q_\nu^{(-\alpha)}(v_i)$$

$$z_i = \frac{c_1 P_\nu^{(-\alpha)}(v_i) + c_2 Q_\nu^{(-\alpha)}(v_i)}{(1 - v_i^2)^{\alpha/2}} \qquad (6.9)$$

$$\frac{dv_i}{dt} = \frac{c_1 P_\nu^{(-\alpha)}(v_i) + c_2 Q_\nu^{(-\alpha)}(v_i)}{(1 - v_i^2)^{\alpha/2}}.$$

… continuous function $f(x)$ defined for all real $x$ and satisfying the following properties:

● $f(0) = 1$,
● $f(x) = f(-x)$,
● $f(x)$ is convex for $x > 0$,
● $\lim_{x \to \infty} f(x) = 0$,

is always a characteristic function (Fourier transform) of an absolutely continuous distribution function, i.e.,

$$f(x) = \mathcal{F}(h(t); x) = \int_{-\infty}^{\infty} \exp(ixt)\, h(t)\, dt.$$

Furthermore, the density $h(t)$ is an even function, and is continuous everywhere except possibly at $t = 0$.

The following result connects simple sigmoids with Fourier transforms.

THEOREM 6.1. Let $\sigma(x)$ be a simple sigmoid. If $\sigma(x)/x$ is a convex function, then it is the Fourier transform of an absolutely continuous distribution function, i.e.,

$$\frac{\sigma(x)}{x} = \mathcal{F}(h(t); x) = \int_{-\infty}^{\infty} \exp(ixt)\, h(t)\, dt.$$

which may be solved using the special function $s_{\nu,\alpha}$, defined and described by Babister (1967). Equation (6.14) first arose in the context of solving Poisson's equation in spherical polar coordinates.

REMARK 6.2. In eqn (6.15), $h(t)$ is an even function. Hence the transform is a Fourier cosine transform: the sine component vanishes during the course of an integration.

Consider a one-HL net, with $k$ input nodes, $n$ hidden layer nodes with convex simple sigmoidal transfer functions $\sigma(\cdot)$, and one summing output node. Let $w_{ij}$ denote the weight of the connection between the $i$th node in the hidden layer and the $j$th node in the input layer; similarly, let $c_i$ denote the weight of the connection between the $i$th hidden layer node and the output node. Then the output $O$ may be expressed as,

$$O = \sum_{i=1}^{n} c_i v_i = \sum_{i=1}^{n} c_i \sigma(u_i) = \sum_{i=1}^{n} c_i\, \sigma\!\left(\sum_{j=1}^{k} w_{ij} x_j + \theta_i\right) \qquad (6.17)$$

where $u_i$ and $\theta_i$ are the input and bias for the $i$th hidden node, respectively. Since $\sigma(\cdot)$ is a convex
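Eqn (6.17) is simply the forward pass of a one-hidden-layer net with a single summing output node. A minimal sketch, assuming $\tanh$ as the transfer function (any other simple sigmoid can be passed in); the function name and the argument layout (row $i$ of `W` holds the weights $w_{ij}$ into hidden node $i$) are our own choices.

```python
import math
from typing import Callable, Sequence


def one_hl_output(
    x: Sequence[float],
    W: Sequence[Sequence[float]],
    theta: Sequence[float],
    c: Sequence[float],
    sigma: Callable[[float], float] = math.tanh,
) -> float:
    """O = sum_i c_i * sigma(u_i), where u_i = sum_j W[i][j] * x[j] + theta[i]."""
    total = 0.0
    for w_row, th, c_i in zip(W, theta, c):
        u_i = sum(w_ij * x_j for w_ij, x_j in zip(w_row, x)) + th  # hidden-node input u_i
        total += c_i * sigma(u_i)                                   # summing output node
    return total
```

Note that `zip` silently truncates mismatched lengths; a production version would validate that `W`, `theta` and `c` all have $n$ entries and that each row of `W` has $k$ entries.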
of a time varying function using information relating to its spectral components. Equation (6.19) suggests that one-HL nets with convex simple sigmoidal transfer functions can be thought of as implementing a spectral reconstruction of the output, using the weighted inputs $u_i$ to evaluate the associated pole coefficients (residues) of the Heaviside expansion.

In particular, it can be demonstrated that the results of Gallant and White (1988) are implied by eqn (6.19). In what follows, we shall use $\mathcal{F}_s(h; x)$ and $\mathcal{F}_c(h; x)$ to indicate the Fourier sine and cosine transforms of $h(t)$.

Since $h(t)$, the continuous distribution function corresponding to $\sigma(x)/x$, is an even function (from Polya's theorem), it follows that $\sigma(x) = x\mathcal{F}(h(t); x) = x\mathcal{F}_c(h(t); x)$. Using the property of Fourier transforms that $x\mathcal{F}_c(g(t); x) = \mathcal{F}_s(-g'(t); x)$ (Davies, 1978), we may conclude that $\sigma(x) = \mathcal{F}_s(-h'(t); x)$.

Let $u_i = u + r_i$, where the $r_i$ are appropriate functions of the $x_j$ (since the $u_i$ are functions of the inputs $x_j$):

$$O(u) = \sum_{i=1}^{n} c_i \mathcal{F}_s(-h'(t); u + r_i). \qquad (6.20)$$

From the frequency shifting property of Fourier transforms (Davies, 1978),

$$\mathcal{F}_s(f(t); x + a) = \mathcal{F}_s(f(t)\cos(at); x) + \mathcal{F}_c(f(t)\sin(at); x), \qquad (6.21)$$

it follows that

$$\begin{aligned}
O(u) &= \sum_{i=1}^{n} c_i \mathcal{F}_s(-h'(t); u + r_i) \\
&= \sum_{i=1}^{n} c_i \left\{ \mathcal{F}_s(-h'(t)\cos(r_i t); u) + \mathcal{F}_c(-h'(t)\sin(r_i t); u) \right\} \\
&= \mathcal{F}_s\!\left(\sum_{i=1}^{n} c_i(-h'(t)\cos(r_i t)); u\right) + \mathcal{F}_c\!\left(\sum_{i=1}^{n} c_i(-h'(t)\sin(r_i t)); u\right)
\end{aligned}$$

$$\mathcal{F}^{-1}(O(u)) = -h'(t)\sum_{i=1}^{n} c_i \sin((r_i + u)t). \qquad (6.22)$$

But we may choose $u$ arbitrarily: let $u = 0$, implying

$$r_i = u_i = \sum_{j=1}^{k} w_{ij} x_j + \theta_i,$$

and eqn (6.22) becomes,

$$\mathcal{F}^{-1}(O(u)) = -h'(t)\sum_{i=1}^{n} c_i \sin\!\left(\left(\sum_{j=1}^{k} w_{ij} x_j + \theta_i\right)t\right) \qquad (6.23)$$

Equation (6.23) may be used as a starting point for an analysis identical to that adopted by Gallant and White (1988) in their study of one-HL nets with "cosine squashing" functions. It is then straightforward to show that the weights may be chosen (hardwired) so that the one-HL net embeds, as a special case, a Fourier network, which yields a Fourier series approximation to a given function as its output. In this sense, the results of this section extend those of Gallant and White.

More generally, one can draw similar conclusions by considering sigmoids that are the Laplace transforms of some function; for example, $\tanh(x)/x$ is the Laplace transform of $\mathrm{sgn}(\sin(\pi t/2))$, where $\mathrm{sgn}(x)$ is $+1$, $0$ or $-1$ depending on whether $x$ is greater than, equal to, or less than zero (Spanier & Oldham, 1987). A similar analysis would lead to a connection with real exponential approximation (rather than trigonometric approximation). Efficient algorithms, such as Prony's, exist for certain restricted forms of the exponential approximation problem (Su, 1971).

Also related are the considerations of Marks and Arabshahi (1994) on the multidimensional Fourier transforms of the output of a one-HL feedforward net; they showed that the transform of the output is the sum of certain scaled Dirac delta functions. Here, we view the sigmoid itself as the Fourier transform of some function; the main advantage of our interpretation is the algorithms it suggests for training one-HL nets of the type considered in this section.

Another potential use of eqn (6.23) is in exploring the "goodness" of the approximation obtained by a one-HL net with simple sigmoidal transfer functions. In the last 200 years, much has been learned about the errors associated with exponential and trigonometric approximation, and ways to deal with them; however, consideration of these issues is beyond the scope of this paper.

7. CONCLUSION

We have analyzed the behavior of important classes of sigmoid functions, called simple and hyperbolic sigmoids, instances of which are extensively used as node transfer functions in artificial neural network implementations. We have obtained a complete characterization for the inverses of hyperbolic sigmoids using Euler's incomplete beta functions, and have described composition rules that illustrate how such functions may be synthesized from others. We have obtained power series expansions of hyperbolic sigmoids, and suggested procedures for obtaining coefficients of the expansions. For a large class of node functions, we have shown that the continuous CGH net equations can be reduced to Legendre differential equations. The fact that the connection between Legendre differential equations and the CGH equation holds for such a wide variety of sigmoids, and is not just an accidental consequence of a particular sigmoid, strongly indicates that further exploration of this connection is warranted. Finally, we have shown that a large class of feedforward networks represent the output function as a Fourier series sine transform evaluated at the hidden layer node inputs, thus extending an earlier result due to Gallant and White.

832 A. Menon et al.

REFERENCES

Albertini, F., Sontag, E., & Maillot, V. (1993). Uniqueness of weights for neural networks. In R. Mammone (Ed.), Artificial neural networks with applications in speech and vision. London: Chapman and Hall.
Babister, A. W. (1967). Transcendental functions satisfying nonhomogeneous linear differential equations. New York: Macmillan.
Bohn, E. V. (1963). The transform analysis of linear systems. Reading, MA: Addison-Wesley.
Carlitz, L. (1963). The inverse of the error function. Pacific J. Math., 13(2), 459–470.
Cybenko, G. (1989). Approximation by superposition of a sigmoidal function. Math. Control, Signals and Systems, 2, 303–314.
Davies, B. (1978). Integral transforms and their applications. New York: Springer Verlag.
Diaconis, P., & Shahshahani, M. (1984). On nonlinear functions of linear combinations. SIAM J. Sci. Stat. Comput., 5, 175–191.
Elliot, L. D. (1993). A better activation function for artificial neural networks. Technical Report TR 93-8, Institute for Systems Research, University of Maryland, College Park, MD.
Erdélyi, A., Magnus, W., Oberhettinger, F., & Tricomi, F. G. (1953). Higher transcendental functions (Vol. 1). New York: McGraw-Hill.
Feller, W. (1965). An introduction to probability theory and its applications (Vol. II). New York: John Wiley.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3), 183–192.
Gallant, A. R., & White, H. (1988). There exists a neural network that does not make avoidable mistakes. In IEEE International Conf. on Neural Networks (Vol. 1, pp. 657–664). San Diego, CA.
Goodman, A. W. (1983). Univalent functions (Vol. I). New York: Mariner Publishing Co.
Graham, R. L., Knuth, D. E., & Patashnik, O. (1989). Concrete mathematics. Reading, MA: Addison-Wesley.
Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies Appl. Math., 52, 217–257.
Grossberg, S. (1988). Nonlinear neural networks: principles, in the sciences of complexity. Redwood City, CA: Addison-Wesley.
Hopfield, J. J., & Tank, D. W. (1986). Computing with neural circuits: A model. Science, 233, 625–633.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.
Krantz, S. G., & Parks, H. R. (1992). A primer of real analytic functions. Berlin: Birkhäuser Verlag.
Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4, 4–22.
Macintyre, A., & Sontag, E. (1993). Finiteness results for sigmoidal "neural" networks. In Proc. 25th Annual Symp. Theory Computing, San Diego. New York: Association for Computing Machinery (ACM).
Marks, R. J., & Arabshahi, P. (1994). Fourier analysis and filtering of a single hidden layer perceptron. In Int. Conf. Artificial Neural Networks (IEEE/ENNS), Sorrento, Italy.
McBride, R. E. (1970). Obtaining generating functions (Vol. 21). Berlin: Springer Verlag.
Minai, A., & Williams, R. (1993). On the derivatives of the sigmoid. Neural Networks, 6(6), 845–853.
Minsky, M., & Papert, S. A. (1988). Perceptrons: an introduction to computational geometry. Cambridge, MA: The MIT Press.
Oberhettinger, F. (1973). Fourier transforms of distributions and their inverses. New York: Academic Press.
Pao, Y. H., & Sobajic, D. J. (1987). Metric synthesis and concept discovery with connectionist networks. In Proc. IEEE Systems, Man and Cybernetics Conf., Alexandria, VA.
Polya, G. (1949). Remarks on the characteristic function. In Proc. 4th Berkeley Symp. Math. Statist. & Probab. (pp. 115–123).
Polya, G. (1974). On the zeroes of the derivatives of a function and its analytic character. In R. P. Boas (Ed.), George Polya: collected works (pp. 178–189). Cambridge, MA: MIT Press.
Rainville, E. D. (1964). Intermediate differential equations. New York: Macmillan.
Spanier, J., & Oldham, K. (1987). An atlas of functions. Washington, DC: Hemisphere.
Su, K. L. (1971). Time-domain synthesis of linear networks. Englewood Cliffs, NJ: Prentice-Hall.
Sussmann, H. J. (1992). Uniqueness of weights for minimal feedforward nets with a given input-output map. Neural Networks, 5(4), 589–593.
White, H. (1989). Learning in artificial neural networks: a statistical perspective. Neural Computation, 1, 425–464.
Whittaker, E. T., & Watson, G. N. (1927). Modern analysis (4th ed.). Cambridge: Cambridge University Press.
Widder, D. V. (1946). The Laplace transform. Princeton, NJ: Princeton University Press.
Wilf, H. S. (1989). Generatingfunctionology. New York: Academic Press.

APPENDIX

THEOREM 4.1. Let $y = \sigma(x)$ be a hyperbolic sigmoid, and let $\eta: (-1, 1) \to \mathbb{R}$ be its inverse. Then, either … or …

… and depends on the parameters $\alpha_i$ and $\gamma_i$. (A.7)

$p > q + 1$: ${}_pF_q$ necessarily diverges for all non-zero $z$.
From the definitionof hyperbolicsigmoida,A(x) is representable
by a GH functionwithat mostthreeparameters;wemusttherefore
Sincelim,-.., q(z) - *CO,butis finitein theinterval(–1, 1),it makethe identification,@= 1/2 and~ = 3/2. Fromthe symmetry
follows that if a GH series is to representq(.), then it has to propertiesof the GH function,we neednot considerthe case when
convergein the interval(–1, 1), but divergeat z = *1. a = 1/2, ~ = 3/2. It foUowsthat,
This rules out non-positiverntegralvahkesfor the numerator
parameters;otherwise,theserieswouldconvergefor all z c 91 (and
not just in the interval (–1, l)). Yet, even if the numerator y = xF(a, 1/2; 3/2; X2)
parametersdo not havenon-positiveintegralvalues,in threeof the ~ = d~(x)
abovecases,the numberof numeratorparametersto denominator —= F(a, –; -; X2)= —
(1-;2)0“ (A.16)
ok
parametersis suchthat each seriesconvergesfor aUz (case 1), or
divergeafor aUz (cases 3 and 4). That leavesjust one case to
consider,viz., theclassicalseries,2F1(cq, cq;71,z) = F(cz, 0; ~; z), The parametera cannot take any arbitraryreal value. The
i.e., we maytakeq(x) = xF(a, f3;T; X2). behaviorof q(x) at the endpointsof its interval,requiresthat,
Sinceq(.) has to be a GH serieswithat most three parameters,
some of the parametersare allowed to be “missing”.133other linrq(x)+ *CO*hnll A(x)+ *CO. (A.17)
X-* I
words,case 2 spawnsthe followingpossibilities:
Equations(A.16)and(A.17)takentogetherimplythata >0. This
q(x) = xF(fl, ~; ‘y; X2) + CaseZ(a) (A.8) is a necessarybut not sutlicientcondition. The following two
q(X) = X~(CY, ~; –; X2) + Caac2(b) (A.9) propositionsaUowus to pin downa’s valuemoreprecisely.
q(x)= xl+, –; ~; X2) +- caae2(c) (A.1O)
q(x)= xF(a, –; –; X2) - Casc2(d) (All) PROPOSITION C (Erd&lyi et al., 1953).If a and ~ are dtflerent from
0, – 1, . . . then F(cr, P; ~; z) converges absolutely for z <1. For
q(x)= xF(–, –; 7; X2) - Case2(e) (A.12) z = 1:
q(x)= xF(–, –; –; X2) - CZSC2(f). (A.13)
F(cz,~; 7; z)arwergesobsolutely if(a + /3- 7) <0 (A.18)
PropositionA canbe usedonceagainto weedoutallbuttwoof the F(a, p; ~; Z) convergesconditwnolly ~O<(a + B – y) <1 (A.19)
aboveset, viz., eases 2(a) and 2(d). The restlead to inappropriate
divergenceor convergencebehaviorin the interval.The following F(a, @;~; z) diverges if 1< (a + /3- 7). ■ (A.20)
propertyof GH functions will be needed.
D (Erd61yiet al., 1953).1f(~– a – /3)>0 then
PROPOSITION
PROPOSmrON
B (Spanier& Oldham, 1987). If y = F(rI, ~; y; x),
then r(7)r(7– Q- f7)
‘(a’ 7; 1)= r(7 - a)r(~ -p)
~ = ~F(~ + 1,~+ 1;~ +1; X). ■
dx’y where
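Propositions C and D offer one way to see why family (4.1) needs $\alpha \ge 1$. With $\beta = 1/2$ and $\gamma = 3/2$, the series $F(\alpha, 1/2; 3/2; 1)$ converges to the Gauss value of Proposition D when $\alpha < 1$. In that case $\eta$ would stay finite at $y = 1$. For $\alpha = 1$, by contrast, the partial sums grow without bound (here the terms are exactly $1/(2k+1)$). The sketch below illustrates both cases numerically. Its helper names are hypothetical. Because the terms decay only like $k^{\alpha - 2}$, the convergent case is compared with a loose tolerance.

```python
import math


def partial_sum_at_one(a: float, b: float, c: float, n_terms: int) -> float:
    """Sum of the first n_terms + 1 terms of F(a, b; c; 1)."""
    term, total = 1.0, 1.0
    for k in range(n_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1))
        total += term
    return total


def gauss_value(a: float, b: float, c: float) -> float:
    """Proposition D: F(a, b; c; 1) = G(c) G(c-a-b) / (G(c-a) G(c-b)), valid for c - a - b > 0."""
    return math.gamma(c) * math.gamma(c - a - b) / (math.gamma(c - a) * math.gamma(c - b))
```

For $\alpha = 1/2$ the Gauss value is $\pi/2$ (indeed $yF(1/2, 1/2; 3/2; y^2) = \arcsin y$), while for $\alpha = 1$ the partial sums increase roughly like $\tfrac12 \ln n$.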
Case 2(d) One-parameter GH series: In this case, let

$$\eta(x) = xF(\alpha, -; -; x^2) = x\sum_{k \ge 0} \frac{(\alpha)_k}{k!} x^{2k} = \frac{x}{(1 - x^2)^{\alpha}}.$$

THEOREM 4.2. Let $\sigma: \mathbb{R} \to (-1, 1)$ be a real analytic, odd, strictly increasing sigmoid, such that its inverse $\eta: (-1, 1) \to \mathbb{R}$ has a GH series expansion in some injective, odd, increasing $C^\infty$ function $g(\cdot)$, with at most three parameters, convergent in $(-1, 1)$. Also let $\eta'$ have a GH series expansion in $g(\cdot)$, with at most one parameter. Then, either

$$\eta(y) = g(y)F(\alpha, 1/2; 3/2; (g(y))^2), \qquad \eta'(y) = g'(y)\sum_{k=0}^{\infty} \frac{(\alpha)_k\, (g(y))^{2k}}{k!}, \quad \text{with } \alpha \ge 1, \qquad (A.21)$$

or

$$\eta(y) = g(y)F(\alpha, -; -; (g(y))^2) = \frac{g(y)}{(1 - (g(y))^2)^{\alpha}}, \quad \text{with } \alpha > 0, \qquad (A.22)$$

where $g'(\cdot)$ is the first derivative of $g(\cdot)$.

Proof. The proof of Theorem 4.2 is very similar to that for Theorem 4.1, with $\eta(x) = g(x)F(\alpha, 1/2; 3/2; (g(x))^2)$, …

THEOREM 5.1. If the inverse sigmoid is given by $y/(1 - y^2)^{\alpha}$, $\alpha > 0$, then in some neighborhood of the origin we have the valid expansion

where,

$$b_{2k+1} = \frac{(-1)^k (2k+1)!}{2k+1} \binom{(2k+1)\alpha}{k}. \qquad (A.24)$$

… with $\phi(0) = 1$. Then there is a neighborhood of the origin (in the $t$-plane) in which the equation $u = t\phi(u)$ has exactly one root for $u$. Let

$$f'(u)[\phi(u)]^n = \sum_{k \ge 0} c_k u^k$$

be the Maclaurin expansion of the function $f'(u)[\phi(u)]^n$. Then

$$f(u) = f(0) + \sum_{n=1}^{\infty} \frac{c_{n-1}}{n}\, t^n.$$

Here, $y \equiv u$, $x \equiv t$, and $\phi(u) = (1 - y^2)^{\alpha}$. Take $f(u) = u \equiv y$, and the theorem follows from the Lagrange inversion formula.

THEOREM 5.3. Let

$$\sigma(x) = \sum_{k=0}^{\infty} b_{2k+1} \frac{x^{2k+1}}{(2k+1)!}$$

be an expansion for a hyperbolic sigmoid, with an inverse of the form $yF(\alpha, 1/2; 3/2; y^2)$, valid in some neighborhood of the origin. Then $b_{2k} = 0$, and $b_{2k+1} = C(2k+1, k)$, where we define the sequence $C(n, k)$ as follows. Here, $n$ and $k$ are natural numbers. $D^n(\sigma(x))$, the $n$th derivatives of $\sigma$, are given by:

$$D^n(y) = D^n(\sigma(x)) = \sum_{k=0}^{n-1} C(n, k)\, y^{2k-n+1}(1 - y^2)^{n\alpha - k}.$$

Consider the derivatives of the polynomial $f_{k,l}(x) = x^k(1 - x^2)^l$:

$$\frac{d}{dx} f_{k,l}(x) = kx^{k-1}(1 - x^2)^{l} - 2l\,x^{k+1}(1 - x^2)^{l-1} = (k)f_{k-1,l}(x) + (-2l)f_{k+1,l-1}(x). \qquad (A.27)$$

In eqn (A.27) we have split the effect of the operator
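The binary tree of Figure 3 comes from applying eqn (A.27) over and over. The paper's own tree-building description is cut off here, so the following is our reading. Each derivative step multiplies by $(1 - y^2)^{\alpha}$ (because $dy/dx = (1 - y^2)^{\alpha}$), so a node $f_{k,l}$ splits into a child $f_{k-1,l+\alpha}$ with multiplier $k$ and a child $f_{k+1,l+\alpha-1}$ with multiplier $-2l$. The sketch checks (A.27) by finite differences and generates one level of children. The function names are hypothetical.

```python
from typing import List, Tuple


def f(k: int, l: float, x: float) -> float:
    """f_{k,l}(x) = x^k (1 - x^2)^l."""
    return x ** k * (1 - x * x) ** l


def df_a27(k: int, l: float, x: float) -> float:
    """Derivative of f_{k,l} via (A.27): k f_{k-1,l}(x) - 2 l f_{k+1,l-1}(x)."""
    return k * f(k - 1, l, x) - 2 * l * f(k + 1, l - 1, x)


def children(k: int, l: float, alpha: float) -> List[Tuple[Tuple[int, float], float]]:
    """Children of node f_{k,l} after applying (1 - y^2)^alpha d/dy.

    Each entry is ((k', l'), multiplier).
    """
    return [((k - 1, l + alpha), float(k)), ((k + 1, l + alpha - 1), -2.0 * l)]
```

Starting from the root $f_{0,\alpha}$ (level $n = 1$), the first child carries multiplier $0$ and the second is $f_{1,2\alpha-1}$ with multiplier $-2\alpha$. This matches the single surviving node $C(2,1)f_{1,2\alpha-1}$ at level $n = 2$.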
FIGURE 3. Binary derivation tree for hyperbolic sigmoids. (Levels shown: $n = 1$: $C(1,0)f_{0,\alpha}(x)$; $n = 2$: $C(2,1)f_{1,2\alpha-1}(x)$; $n = 4$: $C(4,0)f_{-3,4\alpha}(x)$, $C(4,1)f_{-1,4\alpha-1}(x)$, $C(4,2)f_{1,4\alpha-2}(x)$, $C(4,3)f_{3,4\alpha-3}(x)$.)