IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 40, NO. 5, SEPTEMBER 1994

Dynamic System Identification with Order Statistics

W. Greblicki and M. Pawlak

Manuscript received June 17, 1992; revised May 15, 1993. The work of W. Greblicki was supported by NSERC Grant A8131 and a KBN Grant. The work of M. Pawlak was supported by NSERC Grant A8131 and the Humboldt Foundation. This paper was presented in part at the IEEE Conference on Information Theory, San Antonio, TX, January 1993.
W. Greblicki is with the Institute of Engineering Cybernetics, Technical University of Wroclaw, 50-370 Wroclaw, Poland.
M. Pawlak is with the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Man., Canada R3T 5V6.
IEEE Log Number 9404932.

Abstract - Systems consisting of linear dynamic and memoryless nonlinear subsystems are identified. The paper deals with systems in which the nonlinear element is followed by a linear element, as well as systems in which the subsystems are connected in parallel. The goal of the identification is to recover the nonlinearity from noisy input-output observations of the whole system; signals interconnecting the elements are not measured. Observed values of the input signal are rearranged in increasing order, and coefficients for the expansion of the nonlinearity in trigonometric series are estimated from the new sequence of observations obtained in this way. Two algorithms are presented, and their mean integrated square error is examined. Conditions for pointwise convergence are also established. For a nonlinearity satisfying the Lipschitz condition, the error converges to zero. The rate of convergence derived for differentiable nonlinear characteristics is insensitive to the roughness of the probability density of the input signal. Results of numerical simulation are also presented.

Index Terms - Nonlinear system identification, nonparametric regression, order statistics, spacings, orthogonal series.

I. INTRODUCTION

REGRESSION analysis is a standard tool used for recovering a general relationship between two random variables. Applied to nonlinear system identification, the analysis makes possible the recovery of the nonlinear characteristic of the system, provided that the nonlinearity can be represented as a regression function. Classically, the relationship between the random variables, i.e., between the input and output signals of the identified system, is postulated to be of a parametric form. There is no particular reason, however, to assume that the data observed in a certain real system are actually related parametrically. One is obviously closer to real situations when one assumes that the identified characteristic, i.e., a regression function in terms of mathematical statistics, is just bounded, or continuous, or satisfies the Lipschitz condition, or is differentiable, or square integrable, and so on. Each of those assumptions is clearly very mild and leads to nonparametric inference, since the class of all possible characteristics cannot be represented in a parametric form. In this paper, our a priori knowledge about the unknown characteristic is nonparametric, and we apply the nonparametric regression methodology to recover it. We refer to [9], [10], [19], [22], [30], [33], [35], [39] for a summary and a large bibliography concerning nonparametric techniques.

In this paper, we identify nonlinear dynamic systems consisting of memoryless nonlinear elements and linear dynamic subsystems. We consider two types of such systems, i.e., a system in which the nonlinear subsystem is followed by the dynamic one, and a system in which the subsystems are connected in parallel. In order to recover the characteristic of the nonlinear subsystem in both cases, we propose two identification algorithms employing Fourier expansions. In order to estimate the coefficients of the trigonometric expansion of the unknown characteristic, i.e., the Fourier regression coefficients, we rearrange the input observations in increasing order. In this way, we partition the range of the input signal into intervals whose ends are determined by consecutive ordered observations. All the intervals (called spacings in the statistical literature) have random length; see [32]. As a result, the estimates of the coefficients, as well as the identification algorithms themselves, have the form of a certain combination of order statistics.

In this way, we obtain computationally simple estimates of the unknown nonlinear characteristic. We examine both global and pointwise properties of our estimates. Imposing some smoothness restrictions on the identified characteristic, we give the rate of convergence of the algorithms to the characteristic. Our estimates converge at a rate insensitive to the roughness of the probability density of the input signal. In fact, we only require the density to be bounded away from zero on its support.

The algorithms are also used to identify a nonlinearity in a system in which the memoryless and the dynamic subsystems are connected in parallel. We also present some results of simulation experiments.

The nonparametric approach to the identification of Hammerstein cascade systems, i.e., systems in which a nonlinear memoryless element is followed by a linear dynamic subsystem, has been proposed by Greblicki and Pawlak [14]-[16]; see also Greblicki [12], Pawlak [31], and Krzyzak [25]-[26]. Wiener systems, i.e., systems in which the subsystems are connected in reverse order, can also be identified nonparametrically; see Greblicki [17]. The identification algorithms introduced in this paper are simpler than those applying orthogonal series examined by Greblicki and Pawlak [15]-[16], Greblicki [12], Pawlak [31], and Krzyzak [25]. We directly expand the unknown nonlinear characteristic into a trigonometric series, while the authors mentioned above expanded some functions depending on the unknown characteristic and then recovered the nonlinearity from the expansions. As a consequence, the algorithms proposed by them have the form of a fraction. Moreover, the convergence rate of our estimates is insensitive to the roughness of the input probability density, a property not observed for their estimates.

From another viewpoint, our approach is called block-oriented, see, e.g., Billings [5], since we assume that the nonlinear dynamic system consists of relatively simple elements and has a known structure. The goal is to recover properties of the subsystems from input-output observations of the whole system. Because the signals interconnecting the subsystems are not measured, the subsystems can be identified only up to some unknown constant, which is also the case in this paper. This approach is an interesting proposition in the area of identification of nonlinear systems; see Bendat [4] and Brillinger [6]. A nonparametric approach to other classes of nonlinear systems has been made by Georgiev [11] who, however, imposed on the system some restrictions rather difficult to verify. For related problems of nonparametric inference from dependent data, we refer to [19], [22], [23], and references cited therein.

II. PRELIMINARIES

In this section, we give some basic results concerning order statistics which will be used in the paper. Suppose that {U_n; n = ..., −1, 0, 1, 2, ...} is a sequence of independent random variables, bounded to the interval [−π, π] and having a probability density denoted by f. We assume, however, that there exists a δ > 0 such that

f(u) ≥ δ > 0,        (1)

all u ∈ (−π, π), i.e., it is assumed that f is bounded away from zero on (−π, π). In further parts of the paper, {U_n} will be the input signal of the identified system. We rearrange the sequence U_1, U_2, ..., U_n into a new one,

U_(1), U_(2), ..., U_(n),

in which U_(1) < U_(2) < ... < U_(n). Ties have zero probability since the U_n's have a density. Moreover, we define U_(0) = −π and U_(n+1) = π. The sequence U_(0), U_(1), U_(2), ..., U_(n+1) is called the order statistics of U_1, U_2, ..., U_n, while U_(n+1) − U_(n), U_(n) − U_(n−1), ..., U_(1) − U_(0) are called spacings. Spacings play an important role in the paper, and we now give some useful results (see [8], [32], [40] for a review of the theory of order statistics and spacings). We shall need the following.

Lemma 1: Let f satisfy (1). Then, for any real p > 0 and any n ≥ 1,

E(U_(i) − U_(i−1))^p ≤ τ_p δ^{−p} n^{−p},

any i = 1, 2, ..., n + 1, some τ_p > 0. Moreover,

E{(U_(i) − U_(i−1))(U_(j) − U_(j−1))} ≤ δ^{−2}((n + 1)(n + 2))^{−1},

any i, j = 1, 2, ..., n + 1 such that i ≠ j.

Remark 1: Owing to (B.3) in Appendix B, we note that τ_p = (e/(e − 1)) p Γ(p) ≤ 1.67 p Γ(p), where Γ(p) is the gamma function.

Proof of Lemma 1: Since, by virtue of (1),

U_(i) − U_(i−1) ≤ δ^{−1}(F(U_(i)) − F(U_(i−1))),

where F is the distribution function of U_1, an application of (B.4) and (B.5) given in Appendix B yields the lemma. □

III. IDENTIFICATION ALGORITHMS

We now identify a nonlinear dynamic system consisting of a nonlinear memoryless subsystem followed by a linear dynamic one, i.e., a Hammerstein system. We present and examine two algorithms recovering the nonlinearity from noisy input-output observations.

The identified cascade system, Fig. 1, is driven by a random process defined in Section II, i.e., by a strictly stationary white random noise {U_n; n = ..., −1, 0, 1, 2, ...}. The input random variables U_n are bounded to the interval [−π, π]. Their probability density, denoted by f, is unknown; it nevertheless satisfies (1).

The system comprises two elements, the first of which is nonlinear memoryless; its characteristic is denoted by m. It means that

W_n = m(U_n).        (2)

We assume that m is Borel measurable and satisfies the Lipschitz condition of order α, i.e., that

|m(u) − m(v)| ≤ γ|u − v|^α,        (3)

some 0 < α ≤ 1 and γ > 0, all u, v in [−π, π].
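As an illustrative aside (not part of the original paper), the ordering and spacing construction of Section II can be sketched in Python; the uniform input density below is our choice and satisfies (1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Input sample; the uniform density f = 1/(2*pi) is bounded away from zero, cf. (1).
U = rng.uniform(-np.pi, np.pi, size=n)

# Order statistics with the conventions U_(0) = -pi and U_(n+1) = pi.
U_ord = np.concatenate(([-np.pi], np.sort(U), [np.pi]))

# Spacings U_(i) - U_(i-1), i = 1, ..., n+1; they are random and sum to 2*pi.
spacings = np.diff(U_ord)

assert spacings.shape == (n + 1,)
assert np.all(spacings >= 0)
assert np.isclose(spacings.sum(), 2 * np.pi)

# Normalized spacings b_i = F(U_(i)) - F(U_(i-1)) add up to 1, so their
# empirical mean is exactly 1/(n+1), matching E d_i = 1/(n+1) in Appendix B.
b = spacings / (2 * np.pi)
assert np.isclose(b.mean(), 1.0 / (n + 1))
```

The same construction applies to any density satisfying (1); only the normalization b_i = F(U_(i)) − F(U_(i−1)) changes.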
Fig. 1. The cascade system.

The dynamic subsystem is described by the state-space equation

X_{n+1} = A X_n + b W_n
V_n = c^T X_n + d W_n        (4)

where X_n is the state vector at time n, and where the matrix A, the vectors b and c, and the constant d are all unknown. We assume, nevertheless, that A is an asymptotically stable matrix, i.e., that all its eigenvalues are inside the unit circle. The system is disturbed by a stationary white random noise {Z_n; n = ..., −1, 0, 1, 2, ...}, and

Y_n = V_n + Z_n.

The noise is independent of the input signal and has zero mean and finite variance. Thus, {Y_n; n = ..., −1, 0, 1, 2, ...} is a stationary stochastic process.

Our goal is to recover m from observations (U_1, Y_1), (U_2, Y_2), ..., (U_n, Y_n) taken at the input and the output of the system. Defining ξ_n = c^T(X_n − EX_n), we get

Y_n = μ(U_n) + ξ_n + Z_n        (5)

where μ(u) = d m(u) + ρ, ρ = c^T EX_n, which means that μ is observed in the presence of the noise ξ_n + Z_n, of which the first component is correlated, while the second is white. Observing that E{Y_n | U_n = u} = μ(u), we shall present two algorithms for estimating the regression of Y_n on U_n, i.e., algorithms which recover μ. That we are able to identify m only up to some unknown constants d and ρ is a simple consequence of the cascade structure of the system and of the fact that the signal W_n interconnecting the subsystems is not accessible. Nevertheless, if f(u) is symmetric and m(u) is odd, then Em(U_n) = 0 and EX_n = 0, and consequently ρ = 0; if, additionally, d = 1, then μ(u) = m(u).

It is worth noting that (5) can be written in the input-output form

Y_n = Σ_{l=0}^{∞} g_l m(U_{n−l}) + Z_n

where g_0 = d and g_l = c^T A^{l−1} b, l ≥ 1. It is also worth noting that {Y_n} need not satisfy any mixing condition; see [1], [2]. Hence, if {m(U_n)} is a sequence of i.i.d. symmetric Bernoulli random variables, then {Y_n} is not strongly mixing [1], [2]. The latter requirement can easily be satisfied by taking m(u) = 1 for u ≥ 0 and m(u) = 0 otherwise, with f(u) being a symmetric density on [−π, π]. This observation should be contrasted with most of the literature on nonparametric estimation for dependent data, where some sort of mixing conditions is assumed; see, e.g., [19], [22], [23].

The identification algorithms proposed in this paper make use of the trigonometric series {e^{iku}; k = 0, ±1, ±2, ...}, orthogonal on the interval [−π, π]. μ is obviously integrable, and

μ(u) = Σ_{k=−∞}^{∞} c_k e^{iku}        (6)

where

c_k = (1/2π) ∫_{−π}^{π} μ(u) e^{−iku} du        (7)

is the kth Fourier coefficient of the expansion of μ. It is worth noting that in [24] the problem of recovering the regression model

Y_n = Σ_{|k|≤q} θ_k e^{ikU_n} + Z_n

has been studied (clearly, e^{iku} can be replaced by other orthogonal functions). Here, one would like to estimate {θ_k} and the model order q. It has been assumed, however, that q is finite and smaller than some known number. Our case is fully nonparametric, since m(u) is represented by Σ_{k=−∞}^{∞} c_k e^{iku}.

We estimate the Fourier coefficients in (7) from the order statistics U_(0), U_(1), U_(2), ..., U_(n+1) rather than from the input sequence itself. In order to estimate the coefficients, we rearrange the sequence of input-output observations (U_1, Y_1), (U_2, Y_2), ..., (U_n, Y_n) into the following one:

(U_(1), Y_[1]), (U_(2), Y_[2]), ..., (U_(n), Y_[n]).

Note that the Y_[i]'s are not ordered; they are just paired with the U_(i)'s. We propose two estimators of c_k, namely,

ĉ_k = (1/2π) Σ_{i=1}^{n} Y_[i] ∫_{U_(i−1)}^{U_(i)} e^{−iku} du

and

c̃_k = (1/2π) Σ_{i=1}^{n} Y_[i] e^{−ikU_(i)} (U_(i) − U_(i−1)).

As a consequence, our identification algorithms are the following:

μ̂(u) = Σ_{|k|≤q(n)} ĉ_k e^{iku}        (8)

and

μ̃(u) = Σ_{|k|≤q(n)} c̃_k e^{iku}.        (9)

IV. CONSISTENCY

In this section, we show that the mean integrated square error (MISE) of both algorithms converges to zero as the number of observations tends to infinity. We also give the rate of the convergence. For algorithm (8), the MISE is defined as

MISE(μ̂) = E ∫_{−π}^{π} |μ̂(u) − μ(u)|² du,

and similarly for (9). We assume that the integer sequence q(n) satisfies the restriction

q(n) → ∞ as n → ∞.        (10)
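The whole procedure, simulating or observing the cascade system, reordering the input-output pairs, estimating the coefficients from the spacings, and summing the truncated series, can be sketched as follows. This is an editorial illustration, not a reproduction of the paper's experiments: the first-order system, the nonlinearity m(u) = sin u, and the use of the Riemann-type coefficient c̃_k as reconstructed in (9) are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, a = 2000, 3, 0.3
m = np.sin                     # odd nonlinearity; with symmetric f, mu(u) = m(u)

# First-order Hammerstein system: X_k = a*X_{k-1} + m(U_k), Y_k = X_k + Z_k.
U = rng.uniform(-np.pi, np.pi, size=n)
X, Y = 0.0, np.empty(n)
for k in range(n):
    X = a * X + m(U[k])
    Y[k] = X + 0.1 * rng.standard_normal()

# Rearrange (U_i, Y_i) so that the inputs are ordered; Y[idx] are the Y_[i]'s.
idx = np.argsort(U)
Uo = np.concatenate(([-np.pi], U[idx]))   # U_(0) = -pi, then U_(1) < ... < U_(n)
Yo = Y[idx]
d = np.diff(Uo)                           # spacings U_(i) - U_(i-1)

# Riemann-type coefficients: c_k ~ (1/2pi) sum_i Y_[i] e^{-ik U_(i)} (U_(i)-U_(i-1)).
ks = np.arange(-q, q + 1)
c = np.array([(Yo * np.exp(-1j * k * Uo[1:]) * d).sum() / (2 * np.pi) for k in ks])

def mu_tilde(u):
    # truncated trigonometric series over |k| <= q
    return np.real(sum(ck * np.exp(1j * k * u) for k, ck in zip(ks, c)))

grid = np.linspace(-3, 3, 61)
mse = np.mean((mu_tilde(grid) - np.sin(grid)) ** 2)
assert mse < 0.05
```

The more accurate variant replaces e^{−ikU_(i)}(U_(i) − U_(i−1)) by the exact integral of e^{−iku} over each spacing; note that no division by a density estimate is needed, which is exactly what the spacings buy.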
Moreover, let

q(n)/n → 0 as n → ∞        (11)

and

q(n)/n^{2α} → 0 as n → ∞.        (12)

Theorem 1: Let f satisfy (1), and let m satisfy (3) with 0 < α ≤ 1. Let A be an asymptotically stable matrix. If (10)-(12) hold, then

MISE(μ̂) → 0 as n → ∞.

The proof rests on a decomposition of the MISE into variance and bias terms (see (14)), on Lemma 1, and on the bounds for the correlated noise {ξ_n} established in Lemmas 2 and 3 of Appendix A, which control terms of the form E{ξ_[i]²(U_(i) − U_(i−1))²} and E{ξ_[i]ξ_[j](U_(i) − U_(i−1))(U_(j) − U_(j−1))}. It leads to the estimate

MISE(μ̂) ≤ δ_6 q(n) n^{−1} + δ_7 q(n) n^{−2α} + Σ_{|k|>q(n)} |c_k|²,        (19)

some positive δ_6 and δ_7, from which the theorem follows. □

Remark 4: The result of Theorem 1 can be extended to the case when the noise process Z_n in (5) is not homogeneous, i.e., when the input-output relationship is given by

Y_n = μ(U_n) + ξ_n + σ(U_n) Z_n,

where σ(u) is a measurable bounded function on (−π, π) and {Z_n} is independent of {U_n}. It is obvious that all the terms in (14), except V_1, remain unaltered, and the modified V_1 can be bounded in the same manner.

The next theorem concerns algorithm (9). It employs the additional restriction

q³(n)/n² → 0 as n → ∞.        (13)

Theorem 2: Let f satisfy (1), and let m satisfy (3) with 0 < α ≤ 1. Let A be an asymptotically stable matrix. If (10), (12), and (13) hold, then

MISE(μ̃) → 0 as n → ∞.

In the proof, applying Lemma 1, one finds V_i ≤ δ'_i n^{−2α} + δ''_i k² n^{−2}, some δ'_i, δ''_i, and finally

MISE(μ̃) ≤ δ_9 q(n) n^{−1} + δ_10 q(n) n^{−2α} + δ_11 q³(n) n^{−2} + Σ_{|k|>q(n)} |c_k|²,        (20)

some δ_9, δ_10, δ_11. Observing that (13) means that q(n)/n^{2/3} → 0 as n → ∞, i.e., that (13) implies (11), we can complete the proof. □

Remark 5: Selecting q(n) = n^ε, we find the estimate (9) consistent for 0 < ε < min(2/3, 2α).

Hence, if α ≥ 1/2, then q(n)/n → 0 and q³(n)/n² → 0 are sufficient for the convergence of MISE(μ̂) and MISE(μ̃) to zero, respectively. Clearly, q³(n)/n² → 0 is more restrictive than q(n)/n → 0. This is due to the fact that c̃_k in (9) is a less accurate approximation of c_k than ĉ_k in (8). However, this scheme fails if m(u) is so rough that α < 1/2. Then q(n)/n^{2α} → 0 is required for μ̂(u), and max(q³(n)/n², q(n)/n^{2α}) → 0 for μ̃(u). Hence, for α < 1/3, the condition q(n)/n^{2α} → 0 has to be assumed for both estimates. If 1/3 < α < 1/2, the condition q³(n)/n² → 0 is less restrictive than q(n)/n^{2α} → 0.

V. POINTWISE CONVERGENCE

In this section, we study the pointwise properties of our estimates. These are summarized in the following theorem.
Theorem 3: Let f satisfy (1), and let m satisfy (3) with 0 < α ≤ 1. Let A be an asymptotically stable matrix. If (10)-(12) hold, then

E(μ̂(u) − μ(u))² → 0 as n → ∞.

The proof starts from the decomposition

E(μ̂(u) − μ(u))² ≤ 8(V_1 + V_2 + V_3 + V_4) + 2(S_{q(n)}(u) − μ(u))²,

where S_q(u) is the qth partial sum of the Fourier series of μ, D_q is the corresponding Dirichlet kernel,

V_1 = var(Z_1) E{Σ_{j=1}^{n} (∫_{U_(j−1)}^{U_(j)} D_{q(n)}(u − v) dv)²},

V_2 = E{Σ_{j=1}^{n} ξ_[j] ∫_{U_(j−1)}^{U_(j)} D_{q(n)}(u − v) dv}²,

V_3 = E{Σ_{j=1}^{n} ∫_{U_(j−1)}^{U_(j)} [μ(U_(j)) − μ(v)] D_{q(n)}(u − v) dv}²,

and V_4 collects the contribution of the end interval (U_(n), U_(n+1)). Concerning the term V_1, note that, by the Cauchy-Schwarz inequality and summation by parts, V_1 can be bounded via sums of the form

Σ_k (d_k − d_{k+1}) L_k + d_q L_q,

where L_k = Σ_{j=1}^{n} E ∫_{U_(j−1)}^{U_(j)} |D_k(u − v)| dv. Noting that L_k ≤ (3/2π) q(n) and applying the results of Appendix B.1, we can obtain that V_1 = O(q(n)/n). Proceeding in a similar way and using the result of Lemma 3, we can also conclude that V_2 = O(q(n)/n). The proof of Theorem 3 is thus complete. □

VI. CONVERGENCE RATES

In order to present the convergence rate of the algorithms, we need the following lemma.

Lemma 4 (Lorentz's inequality [28]; see also Bary [3, pp. 215-217]): Let g, defined on [−π, π], satisfy the Lipschitz condition of order α, 0 < α ≤ 1, i.e.,

|g(x) − g(y)| ≤ c|x − y|^α,

some c > 0, all x, y ∈ [−π, π]. Then

Σ_{|k|>q} |a_k|² ≤ β/q^{2α},

some β > 0 depending only on c, where the a_k's are the Fourier coefficients of g.

From Lorentz's inequality and (3), it follows that Σ_{|k|>q(n)} |c_k|² ≤ β/q(n)^{2α}, some β > 0, provided that 0 < α ≤ 1. From this, (3), (19), and (20), we easily get the following.
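Lorentz's inequality can be checked numerically on a concrete Lipschitz function; g(u) = |u| (Lipschitz of order α = 1, with the classical coefficients c_0 = π/2 and c_k = ((−1)^k − 1)/(πk²) for k ≠ 0) is our illustrative choice, not one used in the paper:

```python
import numpy as np

# Fourier coefficients of g(u) = |u| on [-pi, pi]: zero for even k != 0,
# c_k = -2/(pi*k^2) for odd k.
K = 100000
k = np.arange(1, K + 1)
ck2 = (((-1.0) ** k - 1.0) / (np.pi * k ** 2)) ** 2   # |c_k|^2 for k >= 1

def tail(q):
    # sum over |k| > q of |c_k|^2 (factor 2 accounts for +-k)
    return 2.0 * ck2[q:].sum()

# With alpha = 1, Lorentz's inequality gives tail(q) <= beta / q**2.
qs = [4, 8, 16, 32]
vals = [tail(q) for q in qs]
assert all(v2 < v1 for v1, v2 in zip(vals, vals[1:]))   # tails shrink
assert vals[-1] * 32 ** 2 <= vals[0] * 4 ** 2           # q**2 * tail(q) stays bounded
```

For this particular g the tail actually decays faster than the bound requires (like q^{−3}), which is consistent: Lorentz's inequality is an upper bound, sharp over the whole Lipschitz class rather than for each member.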
Theorem 4: Suppose that f satisfies (1) and m satisfies (3). Let 0 < α ≤ 1/2. For q(n) = n^{2α/(1+2α)},

MISE(μ̂) = O(n^{−4α²/(1+2α)}) and MISE(μ̃) = O(n^{−4α²/(1+2α)}).

Let 1/2 ≤ α ≤ 1. For q(n) = n^{1/(1+2α)},

MISE(μ̂) = O(n^{−2α/(1+2α)}) and MISE(μ̃) = O(n^{−2α/(1+2α)}).

Furthermore, Theorem 3 and Lemma 4 yield the following.

Theorem 5: Suppose that f satisfies (1) and m satisfies (3). Let 1/2 < α ≤ 1. For q(n) = n^{1/2α},

E(μ̂(u) − μ(u))² = O(n^{−(2α−1)/2α}) and E(μ̃(u) − μ(u))² = O(n^{−(2α−1)/2α}).

From the theorems, it follows that the algorithms behave asymptotically very much alike. In particular, for α = 1, the MISE for both estimates is O(n^{−2/3}), provided that q(n) = n^{1/3}, whereas the mean-square error is O(n^{−1/2}) for q(n) = n^{1/2}.

For smoother characteristics, the tail of the Fourier expansion decays faster, and we shall use the following counterpart of Lemma 4.

Lemma 5: Let g have s derivatives on [−π, π], let

|g^{(s)}(x) − g^{(s)}(y)| ≤ c|x − y|^α,

some c > 0, 0 < α ≤ 1, all x, y ∈ [−π, π], and let g^{(r)}(−π) = g^{(r)}(π), r = 0, 1, ..., s − 1. Then

Σ_{|k|>q} |a_k|² ≤ d/q^{2(s+α)},

some d > 0, where the a_k's are the Fourier coefficients of g.

The proof of this lemma results from [28] and integration by parts; see also [9]. There is also a pointwise counterpart of Lemma 5; see [13, Lemma 2]. It is worth noting that the periodicity assumption g^{(r)}(−π) = g^{(r)}(π), r = 0, 1, ..., s − 1, must be required due to the well-known edge effect (Gibbs phenomenon) of trigonometric series; see [3]. In fact, if this condition is not assumed, then

Σ_{|k|>q} |a_k|² = O(q^{−2}),

regardless of how many derivatives of g(x) exist. As a result, the rate of convergence cannot be faster than O(n^{−2/3}). This phenomenon need not occur for other orthogonal series. See also [20] for some techniques for removing edge effects in the context of kernel regression estimates.

Observe now that if m(u) satisfies the conditions of Lemma 5, then also μ(u) = d m(u) + ρ does. Hence, Σ_{|k|>q(n)} |c_k|² ≤ d'/q^{2p}(n), p = s + α, some d' > 0, and, for q(n) = n^{1/(1+2p)},

MISE(μ̂) = O(n^{−2p/(1+2p)}) and MISE(μ̃) = O(n^{−2p/(1+2p)}).

For a smooth characteristic m, i.e., for large p, the rate becomes close to n^{−1}, i.e., the rate typical for parametric inference. Moreover, the rate is independent of the roughness of the input density f, since the result holds for any density satisfying (1). The density can be an arbitrarily rough function and, in particular, can have no derivative. Such a property has not been observed for other regression estimates, even when the regression is observed in the presence of only a white noise. It should be stressed here that, in our identification problem, the noise disturbing the regression has two components, white and correlated.

It is also interesting to compare the rate with those achieved by other estimates. Of those, the kernel estimate is most thoroughly examined in the literature, and therefore results obtained for it are not easily equaled. It is known, see, e.g., Härdle [13], that the classical kernel estimate (the Nadaraya-Watson estimate)

Σ_{j=1}^{n} Y_j K((u − U_j)/h(n)) / Σ_{j=1}^{n} K((u − U_j)/h(n)),        (21)

has been studied; this is clearly a counterpart of our estimate (9). The pointwise properties of this estimate have been examined, and it has been demonstrated that its mean-squared error is of order O(n^{−γ}), where γ = 2 min[p/(2p + 1), s/(2s + 1)]. The density of U is assumed to be defined on a compact set, to be continuously differentiable, and to satisfy condition (1). These rates have been obtained under conditions more comfortable than ours, i.e., for a regression observed in the presence of white noise only, and for independent pairs (U_j, Y_j). Observe that the rate equals that derived by us, provided that the input density has p derivatives [for the estimate in (21)] and that the kernel is suitably selected. Our estimate achieves that rate for any density satisfying (1), i.e., for a very wide class of densities. Therefore, if the density does not have a derivative, the rate obtained for our estimate seems to be better. For example, for m and f having three and one derivatives, respectively, and for a symmetric kernel function (a common choice in practice), the kernel estimates in (21) and (22) converge as fast as O(n^{−2/3}) and O(n^{−4/5}), respectively, while ours have a faster convergence rate. It should also be noted that the rate derived by us equals the best possible one obtained by Stone [38].
Fig. 2. The parallel system.
Finally, we mention that, so far, the following orthogonal series algorithm (in the random design setting) has been studied:

Σ_{|k|≤q(n)} â_k e^{iku} / Σ_{|k|≤q(n)} b̂_k e^{iku},        (23)

where

â_k = (1/2πn) Σ_{j=1}^{n} Y_j e^{−ikU_j} and b̂_k = (1/2πn) Σ_{j=1}^{n} e^{−ikU_j}

are estimates of the coefficients of the expansions of f(·)μ(·) and f(·) in the trigonometric series, respectively; see [12], [13], [15], [16], [21], [25], [31]. The estimate is of a fractional form, and its pointwise convergence rate is sensitive to the roughness of the input density; see [13] and [21] for some specific convergence results in the case of independent data.

VII. OTHER SYSTEMS

Some authors have examined dynamic systems described by (4) and assumed that d = 0. In such a case,

Y_n = ν(U_{n−1}) + ξ̄_n + Z_n,

where ν(u) = c^T b m(u) + c^T A EX_n and ξ̄_n = c^T A(X_n − EX_n). We can say that the regression ν is observed in the presence of the noise ξ̄_n + Z_n. Therefore, one can recover ν from the pairs (U_1, Y_2), (U_2, Y_3), ..., (U_n, Y_{n+1}), i.e., from the order statistics

(U_(1), Y_[2]), (U_(2), Y_[3]), ..., (U_(n), Y_[n+1]).

In the definition of our estimates, Y_[j] should be replaced by Y_[j+1]. One can easily verify that all results in Appendix A given for ξ_n hold also for ξ̄_n.

Another extension would include a general description of the cascade model of the form

Y_n = Σ_{i=0}^{∞} g_i m(U_{n−i}) + Z_n,

with an unknown, summable impulse response {g_i}.

A system of some physical interest is pictured in Fig. 2, where the nonlinear memoryless element (2) and the linear dynamic system (4) are connected in parallel; see [4]. That is,

X_{n+1} = A X_n + b U_n
V_n = c^T X_n + d U_n

and

Y_n = V_n + W_n + Z_n,        (24)

where W_n = m(U_n). Clearly, this model can also be expressed in the signal-plus-noise form as in (5), i.e.,

Y_n = η(U_n) + ξ_n + Z_n,

where η(u) = m(u) + du + ρ, ρ = c^T EX_n, and ξ_n = c^T(X_n − EX_n). Thus, E{Y_n | U_n = u} = η(u), and one can recover m(u) up to a linear function. Hence, let η̂(u) and η̃(u) be estimates of η(u) defined as in (8) and (9), respectively, where now Y_n is generated from the model (24). In order to show that Theorems 1 and 2 hold for η̂(u) and η̃(u), let us first observe that Lemmas 2 and 3 are in force. In fact, since now ξ_n = c^T Σ_{i=1}^{∞} A^{i−1} b (U_{n−i} − EU_1), the proof of Lemma 2 is valid here. To evaluate the other terms in the proofs of Theorems 1 and 2, it remains to observe that if m(u) satisfies (3), then

|η(u) − η(v)| ≤ (γ + d(2π)^{1−α}) |u − v|^α,        (25)

i.e., η(u) is also Lipschitz with the same order as m(u). All of these considerations yield the following consistency result concerning the identification of the parallel model.

Theorem 6: Let f satisfy (1), and let m satisfy (3) with 0 < α ≤ 1. Let the linear dynamic subsystem of the parallel model be asymptotically stable. If (10)-(12) hold, then

MISE(η̂) → 0 as n → ∞.

If, on the other hand, (10), (12), and (13) are satisfied, then

MISE(η̃) → 0 as n → ∞.
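As a hedged numerical illustration of the parallel model (the first-order state equation, the nonlinearity m(u) = sin u, and the use of the Riemann-type coefficient estimator are our assumptions), one can check that the regression η(u) = m(u) + du collects the Fourier coefficients of both branches:

```python
import numpy as np

rng = np.random.default_rng(5)
n, a = 4000, 0.3
U = rng.uniform(-np.pi, np.pi, size=n)

# Parallel system: X_k = a*X_{k-1} + U_k, Y_k = X_k + m(U_k) + Z_k, so that
# eta(u) = E{Y | U = u} = u + sin(u) (here d = 1, E U = 0, hence rho = 0).
X, Y = 0.0, np.empty(n)
for k in range(n):
    X = a * X + U[k]
    Y[k] = X + np.sin(U[k]) + 0.1 * rng.standard_normal()

idx = np.argsort(U)
Uo = np.concatenate(([-np.pi], U[idx]))
d, Yo = np.diff(Uo), Y[idx]

def c_tilde(k):
    # Riemann-type estimate of the kth Fourier coefficient of eta
    return (Yo * np.exp(-1j * k * Uo[1:]) * d).sum() / (2 * np.pi)

# For eta(u) = u + sin(u): c_1 = i(-1)^1/1 + 1/(2i) = -1.5i.
c1 = c_tilde(1)
assert abs(c1 - (-1.5j)) < 0.2
```

Since η is not periodic (η(π) − η(−π) = 2dπ), reconstructing η itself from a truncated series suffers from the edge effect of Section VI; estimating individual coefficients, as above, does not.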
Fig. 3. Error versus n for estimates (8), (9), (23); m(u) = m_1(u).
This is not the case, however, under the assumptions of Lemma 5, since η(π) − η(−π) = 2dπ, and the global rate does not exceed O(n^{−2/3}). Nevertheless, if d is known, then η(u) − du satisfies the conditions of Lemma 5.

VIII. SIMULATION EXAMPLES

In the simulation examples, the following special cases of the cascade model (4) and of the parallel model (24) have been used:

X_n = a X_{n−1} + m(U_n)
Y_n = X_n + Z_n        (26)

and

X_n = a X_{n−1} + U_n
Y_n = X_n + m(U_n) + Z_n.        (27)

Fig. 4. Error versus q for the estimate in (8), for different distributions of U_n; m(u) = m_1(u), n = 128.
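The error-versus-n behavior reported in Fig. 3 can be imitated in outline as follows; the error measure (an average squared deviation on a grid), the replication count, and all the numerical constants are our choices rather than the paper's exact protocol:

```python
import numpy as np

def run_error(n, q, rng, a=-0.2, m=np.sin, reps=10):
    """Average squared error of the Riemann-type series estimate on model (26)."""
    errs = []
    for _ in range(reps):
        U = rng.uniform(-np.pi, np.pi, size=n)
        X, Y = 0.0, np.empty(n)
        for k in range(n):
            X = a * X + m(U[k])            # cascade dynamics (26)
            Y[k] = X + 0.1 * rng.standard_normal()
        idx = np.argsort(U)
        Uo = np.concatenate(([-np.pi], U[idx]))
        d, Yo = np.diff(Uo), Y[idx]
        ks = np.arange(-q, q + 1)
        c = np.array([(Yo * np.exp(-1j * k * Uo[1:]) * d).sum() / (2 * np.pi)
                      for k in ks])
        grid = np.linspace(-3, 3, 41)
        est = np.real(np.exp(1j * np.outer(grid, ks)) @ c)
        errs.append(np.mean((est - m(grid)) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(3)
e_small, e_large = run_error(64, 2, rng), run_error(512, 2, rng)
assert e_large < e_small   # the error decreases as the sample size grows
```

Both the coefficient variance and the Riemann-approximation bias shrink with n, so the averaged error falls roughly in line with the rates of Section VI.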
Fig. 5. Error versus q for the estimate in (23), for different distributions of U_n; m(u) = m_1(u), n = 128.
The error is minimized at q = 2 in all cases. Fig. 5 shows an analogous result for the estimate in (23). A strong sensitivity to different input distributions can be observed. In particular, the estimate exhibits poor performance for the discontinuous density f_2(u). The error is minimized at q = 4 in all cases.

So far, we have employed the characteristic m_1(u), which is well behaved at the boundary points u = ±π, i.e., m_1(−π) = m_1(π) and m_1^{(r)}(−π) = m_1^{(r)}(π), r = 1, 2, .... In order to see a possible deterioration in the estimates' performance for nonperiodic characteristics, let us consider a nonperiodic characteristic m_2(u). Fig. 6 shows m_2(u) along with the estimates (8), (9), and (23). Optimal values of q(n) were used, and they are equal to 3, 2, and 5, respectively. All estimates exhibit poor behavior at the end points; see also Section V for a theoretical discussion concerning the boundary problem.

Now let a = −0.2 in (26) and (27) and n = 128. The nonlinear characteristic is an odd function of the form m_3(u) = 0.5u + 0.5u cos(u) + 2 sin(u) + 0.5u cos(2u) + sin(2u), and Z_n is Gaussian (0; 0.1). The error versus q for the cascade and parallel systems is depicted in Figs. 7 and 8, respectively. Note, moreover, that the error for the parallel model is calculated as (1/n) Σ_{i=1}^{n} [η̂(U_i) − η(U_i)]².

Fig. 6. The characteristic m_2(u) along with the estimates (8), (9), (23); n = 128.
Fig. 7. Cascade dynamical system. Error versus q for estimates (8), (9), (23); m(u) = m_3(u), n = 128.

Fig. 8. Parallel dynamical system. Error versus q for estimates (8), (9), (23); m(u) = m_3(u), n = 128.
The error for the parallel model is about twice as large as the one for the cascade model. To explain this phenomenon, let us write the state equations for both systems in the signal-plus-noise form, i.e., Y_n = m(U_n) + ζ_n, where ζ_n = Σ_{i=1}^{∞} a^i m(U_{n−i}) + Z_n for the cascade model and ζ_n = Σ_{i=0}^{∞} a^i U_{n−i} + Z_n for the parallel one. From this, we can easily calculate the variance of ζ_n for both systems. It is equal to σ² + a² var(m(U))(1 − a²)^{−1} and σ² + var(U)(1 − a²)^{−1}, respectively, where σ² = var(Z). In our particular case, i.e., m(u) = m_3(u), simple algebra yields var(U) > var(m_3(U)). Hence, the parallel model has a smaller signal-to-noise ratio than the cascade one. Clearly, the inverse property can occur as well. In fact, if m(u) is odd and satisfies c_1 u ≤ m(u) ≤ c_2 u for some positive c_1, c_2, c_2 ≥ c_1, then var(U) < var(m(U)) if only c_1 > 1.

All the aforementioned considerations reveal the importance of the selection of the sequence q(n). One has to specify q(n) before our estimators can be applied. The prescription for q(n) given in Theorem 4 is optimal, but only in the asymptotic sense, and it depends on some unknown characteristics of the system. Thus, the natural question arises of how to select an optimal (or nearly optimal) q(n) directly from the data. This problem can be handled by some form of cross-validation technique. For example, the so-called future prediction error (see [10], [19], [21], [22], [26])

CV(q) = (1/n) Σ_{j=1}^{n} |Y_j − μ̂_j(U_j)|²

can be a good candidate for the choice of q(n). Here, μ̂_j(u) is the version of μ̂(u) calculated from all of the data points except the jth. This technique requires a considerable amount of computing, as the estimate has to be formed n times in the computation of CV(q). A simpler and equally efficient technique aims at minimizing the penalized version of the residual error (this is often called the generalized cross-validation technique):
G(q) = (1/n) Σ_{j=1}^{n} |Y_j − μ̂(U_j)|² Λ_n(q),

Fig. 10. Error and G versus q for μ̃(u); m(u) = m_3(u), a = −0.2, n = 128.
where Λ_n(q) is a suitable increasing function of q. One possible prescription for Λ_n(q) is (1 − 2(q + 1/2)/n)^{−1}; see [10], [19], [21], [22], [26] for some other choices.

Fig. 9 illustrates the above approach for the memoryless system using the estimate in (9). The error and G(q) measures are plotted with m(u) = m_1(u) and n = 128. It is seen that the minimum of G(q) agrees with the minimum of the error. Note, however, that G(q) is much larger than the error.

Fig. 10 shows a similar result in the case of the cascade dynamical system. The estimate (9) was used with m(u) = m_3(u), a = −0.2, and n = 128.

The problem of the optimality of q selected with respect to CV(q) or G(q) is left for further studies; see [22] for some related results in the context of kernel estimates.

IX. FINAL REMARKS

In this paper, we have proposed two nonparametric algorithms to recover a nonlinear characteristic of nonlinear dynamic systems. Since the characteristic has been expressed as a regression function, the algorithms are, in fact, nonparametric estimates of a regression function observed in the presence of an additive disturbance. The disturbance is a sum of white and correlated noise; the source of the latter is the dynamic subsystem. The algorithms differ from each other by the way the Fourier coefficients are estimated. Clearly, c̃_k is easier to calculate than ĉ_k. On the other hand, the latter is a more accurate approximation of c_k. Moreover, the conditions imposed on {q(n)} are more restrictive for the estimate using c̃_k. Nevertheless, both algorithms reach the same convergence rate.

We emphasize that the algorithms presented in the paper cannot be directly extended to multiinput systems, since the notion of spacings would have to be generalized to vector spaces. It is, however, known that even a multidimensional space of observations can be partitioned into subsets whose properties are the same as those of uniform spacings.
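The data-driven selection of q(n) by the penalized residual error discussed in the previous section can be sketched as follows; the memoryless test system and the search range for q are our illustrative choices, while the penalty Λ_n(q) = (1 − 2(q + 1/2)/n)^{−1} follows the prescription given there:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
U = rng.uniform(-np.pi, np.pi, size=n)
Y = np.sin(U) + 0.3 * rng.standard_normal(n)   # memoryless system, m(u) = sin(u)

idx = np.argsort(U)
Uo = np.concatenate(([-np.pi], U[idx]))
d, Yo = np.diff(Uo), Y[idx]

def series_estimate(points, q):
    """Riemann-type trigonometric series estimate of the regression."""
    ks = np.arange(-q, q + 1)
    c = np.array([(Yo * np.exp(-1j * k * Uo[1:]) * d).sum() / (2 * np.pi)
                  for k in ks])
    return np.real(np.exp(1j * np.outer(points, ks)) @ c)

def G(q):
    """Residual error penalized by Lambda_n(q) = (1 - 2(q + 1/2)/n)^(-1)."""
    resid = np.mean((Yo - series_estimate(Uo[1:], q)) ** 2)
    return resid / (1.0 - 2.0 * (q + 0.5) / n)

q_star = min(range(1, 9), key=G)          # minimize G over a small grid of orders

grid = np.linspace(-3, 3, 61)
mse = np.mean((series_estimate(grid, q_star) - np.sin(grid)) ** 2)
assert 1 <= q_star <= 8
assert mse < 0.1
```

The penalty charges 2q + 1 fitted coefficients against n observations, so the rule inflates the residual error just enough to discourage overfitting; leave-one-out CV(q) behaves similarly at roughly n times the cost.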
sup
n E m ,I;.. I
( A " ) + =P and sup
nE{O,I;..)
2
k=O
(Ak)'= Q
IE(VIIup,uq,U,,U,II
4- 1
(A.1)
where AI = m(q)
- Em(U,), and that 1Al1 I2M, where M =
sup Im(u)l,we have some c1 dependent only on A , b, and c , any p , any q. Since
EA,= 0, for p , q, r , s all different, we find
~ u p l t=
~ ~s ~~pll 5 ~I12 M ( c T ) ' Q b +
i i
and the first part of the lemma follows.
Verification of the second part is much more difficult. Let us
first observe that (C7Ap-r-1bc%q-s-'bArh~, if s < q < r <p
- CTAp-s-lbCTAq-r-lbhrh,, if r < q < s < p
- io, otherwise.
This yields
= var(m(U,))
r=O
c%P+"bc%q+'-' b
2 y ( p , q , r , s) 5 c2n3, (A.6)
+E
{ :r: c%p-r-lbh,
p,q,r,s=l
i
4- 1
. c TA4- I - bAI I Up,U4,U,, U, (A.2) the following event:
I= 1
for any p , q, r , s 2 1. It is evident that the absolute value of the Apqrs = is the ( i -1) smallest, U4 is the ith smallest,
first term o n the right-hand side of (A.2) does not exceed
U, is the ( j - 1 ) t h smallest, Up is the j t h smallest),
var (m(Ul))(cT>'( A P - ' ) + Q ( b ) +( c T ) + Pb. (A.3)
Concerning the second term o n the right-hand side of (A.21, let i.e., U,, u4,
U,, up have ranks (i - l),i , ( j - l), j , respectively.
+ 1. Then, this
us assume, without loss of generality, that p > q Owing to (A.9, we have
term can be decomposed as follows:
APPENDIX B
B.1 Uniform Spacings

Let d_1, d_2, ..., d_{n+1} be the spacings determined in [0, 1] by n independent random variables distributed uniformly over that interval. Clearly, d_1, d_2, ..., d_{n+1} are random and add up to 1. The joint density of d_i and d_j, see [32, p. 398], is given by

    n(n − 1)(1 − x − y)^{n−2},   x ≥ 0, y ≥ 0, x + y ≤ 1.   (B.1)

Using (B.1), we can get

    E d_i^p = Γ(p + 1)Γ(n + 1)/Γ(n + p + 1),   (B.2)

i = 1, 2, ..., n + 1, for any real p > 0; in particular, E d_i = 1/(n + 1), E d_i^2 = 2/((n + 1)(n + 2)), and so on. Here, Γ(x) is the gamma function. In the proof of Theorems 1 and 2, we need a bound for E d_i^p for any real p > 0 and any n ≥ 1. If p is an integer, it is clear from (B.2) that

    E d_i^p ≤ pΓ(p) n^{−p}

for any i and any n ≥ 1.

Now let p be a real number. Then p can be represented as p = s + γ, where s is an integer and 0 < γ < 1. It is known that, for 0 < γ < 1, Γ(n + γ) ≥ e^{−γ} n^γ Γ(n); invoking this and (B.2), we get the asymptotic expression E d_i^p ≈ pΓ(p) n^{−p} as n → ∞, i.e., n^p E d_i^p → pΓ(p) as n → ∞, together with the bound

    E d_i^p ≤ 1.67 pΓ(p) n^{−p}   (B.3)

for any i, any real p > 0, and any n ≥ 1.

Clearly, b_1, b_2, ..., b_{n+1} are random and add up to 1. The joint distribution of {b_1, b_2, ..., b_{n+1}} is the same, see [40], as that of {d_1, d_2, ..., d_{n+1}} discussed above for uniform spacings, and therefore

    E b_i^p ≤ τ_p n^{−p},   (B.4)

where τ_p = 1.67 pΓ(p), for any i, any p > 0, and any n ≥ 1. Furthermore,

    E b_i b_j = 1/((n + 1)(n + 2))   (B.5)

for any i, j such that i ≠ j.
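The spacing moments quoted above are easy to verify by simulation. The sketch below (illustrative names, small n) draws uniform spacings directly and compares the empirical moments with E d_i = 1/(n + 1), E d_i^2 = 2/((n + 1)(n + 2)), and E d_i d_j = 1/((n + 1)(n + 2)) for i ≠ j.

```python
import random

def uniform_spacings(n):
    """Spacings d_1, ..., d_{n+1} cut out of [0, 1] by n i.i.d. U(0, 1) points."""
    u = sorted(random.random() for _ in range(n))
    pts = [0.0] + u + [1.0]
    return [pts[i + 1] - pts[i] for i in range(n + 1)]

random.seed(1)
n, reps = 9, 100000
s1 = s2 = s12 = 0.0
for _ in range(reps):
    d = uniform_spacings(n)
    s1 += d[0]            # estimates E d_i
    s2 += d[0] * d[0]     # estimates E d_i^2
    s12 += d[0] * d[1]    # estimates E d_i d_j for i != j
m1, m2, m12 = s1 / reps, s2 / reps, s12 / reps
```

By exchangeability of the spacings, checking the first two coordinates suffices; every d_i has the same marginal distribution and every pair (d_i, d_j), i ≠ j, the same joint distribution.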
ACKNOWLEDGMENT
The authors thank the reviewers for their useful comments.
REFERENCES
[1] D. W. K. Andrews, "Non-strong mixing autoregressive processes," J. Appl. Prob., vol. 21, pp. 930-934, 1984.
[2] —, "A nearly independent, but non-strong mixing, triangular array," J. Appl. Prob., vol. 22, pp. 729-731, 1985.
[3] N. K. Bary, A Treatise on Trigonometric Series. Oxford: Pergamon, 1964.
[4] J. S. Bendat, Nonlinear System Analysis and Identification. New York: Wiley, 1990.
[5] S. A. Billings, "Identification of nonlinear systems - A survey," Proc. IEE, vol. 127, pp. 272-285, 1980.
[6] D. R. Brillinger, "The identification of a particular nonlinear time series system," Biometrika, vol. 64, pp. 509-515, 1977.
[7] C. K. Chu and J. S. Marron, "Choosing a kernel regression estimator," Statist. Sci., vol. 6, pp. 404-436, 1991.
[8] H. A. David, Order Statistics, 2nd ed. New York: Wiley, 1981.
[9] L. Devroye and L. Gyorfi, Nonparametric Density Estimation: The L1 View. New York: Wiley, 1985.
[10] R. L. Eubank, Spline Smoothing and Nonparametric Regression. New York: Marcel Dekker, 1988.
[11] A. Georgiev, "Nonparametric kernel algorithm for recovering of functions from noisy measurement with applications," IEEE Trans. Automat. Contr., vol. AC-30, pp. 782-784, 1985.
[12] W. Greblicki, "Non-parametric orthogonal series identification of Hammerstein systems," Int. J. Syst. Sci., vol. 20, pp. 2355-2367, 1989.
[13] W. Greblicki and M. Pawlak, "Fourier and Hermite series estimates of regression functions," Ann. Inst. Statist. Math., vol. 37, pp. 443-459, 1985.
[14] —, "Nonparametric identification of Hammerstein systems," IEEE Trans. Inform. Theory, vol. 35, pp. 409-418, 1989.
[15] —, "Nonparametric identification of a cascade nonlinear time series system," Signal Processing, vol. 22, pp. 61-75, 1991.
[16] —, "Nonparametric identification of a particular nonlinear time series system," IEEE Trans. Signal Processing, vol. 40, pp. 985-989, 1992.
[17] W. Greblicki, "Nonparametric identification of Wiener systems," IEEE Trans. Inform. Theory, vol. 38, pp. 1487-1493, 1992.
[18] D. H. Greene and D. E. Knuth, Mathematics for the Analysis of Algorithms, 3rd ed. Boston: Birkhauser, 1990.
[19] L. Gyorfi, W. Hardle, P. Sarda, and P. Vieu, Nonparametric Curve Estimation from Time Series. Berlin: Springer-Verlag, 1989.
[20] P. Hall and T. E. Wehrly, "A geometric method for removing edge effects from kernel-type nonparametric regression estimators," J. Amer. Statist. Ass., vol. 86, pp. 665-672, Sept. 1991.
[21] W. Hardle, "A law of the iterated logarithm for nonparametric regression function estimators," Ann. Statist., vol. 12, pp. 624-635, 1984.
[22] —, Applied Nonparametric Regression. Cambridge: Cambridge Univ. Press, 1990.
[23] W. Hardle and P. Vieu, "Kernel regression smoothing of time series," J. Time Series, vol. 13, pp. 209-232, 1992.
[24] R. Kannurpatti and G. W. Hart, "Memoryless nonlinear system identification with unknown model order," IEEE Trans. Inform. Theory, vol. 37, pp. 1440-1450, 1991.
[25] A. Krzyzak, "Identification of discrete Hammerstein systems by the Fourier series regression estimate," Int. J. Syst. Sci., vol. 20, pp. 1729-1744, 1989.
[26] —, "On estimation of a class of nonlinear systems by the kernel regression estimate," IEEE Trans. Inform. Theory, vol. 36, pp. 141-152, 1990.
[27] K. C. Li, "Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: Discrete index set," Ann. Statist., vol. 15, pp. 958-975, 1987.
[28] G. G. Lorentz, "Fourier-Koeffizienten und Funktionenklassen," Math. Z., vol. 51, pp. 135-149, 1948.
[29] Y. P. Mack and H. G. Muller, "Convolution type estimators for nonparametric regression," Statist. Prob. Lett., vol. 7, pp. 229-239, 1989.
[30] H. G. Muller, Nonparametric Analysis of Longitudinal Data (Lecture Notes in Statistics, vol. 46). New York: Springer, 1988.
[31] M. Pawlak, "On the series expansion approach to the identification of Hammerstein system," IEEE Trans. Automat. Contr., vol. 36, pp. 763-767, 1991.
[32] R. Pyke, "Spacings," J. Roy. Statist. Soc. B, vol. 27, pp. 395-436, 1965.
[33] G. Roussas, Ed., Nonparametric Functional Estimation and Related Topics (NATO Advanced Study Institute). Dordrecht: Kluwer Academic, 1991.
[34] E. Rafajlowicz, "Nonparametric orthogonal series estimators of regression: A class attaining the optimal convergence rate in L2," Statist. Prob. Lett., vol. 5, pp. 219-224, 1987.
[35] B. L. S. Rao, Nonparametric Functional Estimation. Orlando: Academic, 1983.
[36] L. Rutkowski, "Identification of MISO nonlinear regression in the presence of a wide class of disturbances," IEEE Trans. Inform. Theory, vol. 37, pp. 214-216, 1991.
[37] C. J. Stone, "Consistent nonparametric regression," Ann. Statist., vol. 5, pp. 595-620, 1977.
[38] —, "Optimal global rates of convergence for nonparametric regression," Ann. Statist., vol. 10, pp. 1040-1053, 1982.
[39] G. Wahba, Spline Models for Observational Data. Philadelphia: SIAM, 1990.
[40] S. S. Wilks, Mathematical Statistics. New York: Wiley, 1962.
[41] A. Zygmund, Trigonometric Series. Cambridge: Cambridge Univ., 1959.