Techniques for High-Speed Implementation of
Nonlinear Cancellation
Sanjay Kasturia, Member. IEEE, and Jack H. Winters, Senior Member, IEEE
Absract—Nonlinear cancellation (NLC) can significa
sauce intersymbol interference ISD) in Tongrhaul direct
tion fiberoptic systems and can thereby
“lsperson-imited data rate a
bby NLC Is achieved by suber
previously detected symbols,
{ulres previous deci
snd days in the feedback
‘ata rate ofa detector operating with NLC. In this paper, we
resent techniques for breaking the bottleneck caused Dy the
Feedback lop. First, we show boo ta simplify the loop to avoid
highspeed hitching of analog signals by using multiple dct
son elements, each with different threshold evel. We then
sow how (o se lookahead computation to increase the delay
permissible inthe feedback loop. These techniques permit NLC
{o be implemented in integrated-ireuit form at rates Timited
only bythe detector sithing speeds. These techakques are aso
‘reful in other applications requiring speedup of fedback loops
With decision elements tn the loop (eg. decision feedback
uatzers)
1. Inrropvcrion
fONLINEAR cancellation (NLC) can greatly reduce
patter dependent intersymbol interference (ISD in a
receiver by sublracting interference caused by previously
detected symbols. With the rapid development of optical
Amplifiers, ISI becomes a major factor limiting data rates
in long-haul direct detection fiber-optic systems: NLC of.
fers the potential of increasing the maximum data rate of
these systems. This potential can only be realized if we
devise techniques for implementing NLC. preferably in
Integrated eieuits, a the high rates desired (gigabits /s).
Previous papers [1], [2] have studied the reduction of
ISI achievable with NLC. It was shown in [2} that NLC
‘ean more than double the maximum data rate of gigabits
per second (Gb/s) lightwave systems by canceling post
‘cursor ISP. Because NLC is # nonlinear technique. it can
handle ISL that arses from a variety of impairments, such
as transmit laser chirp, chromatic dispersion, polarization
‘ispersion, and nonideal receiver frequency response.
[NIC can also reduce ISI caused by bandwidth limitations
The attr ae wih the Netown Syuers Resch, Deporte,
"WSt caused by symbols preceding the one curemy atthe decision el
tice yt caning he decom tres snlor 8) ing» ely
pia egeiae 3). 18.
a
el
of the detecor itself (when itis operated at igh data
fats), thereby increasing the maximum wsef operating
fate ofthe detector. However, NLC, as described it [1],
{2}, is dificult wo implomett at high data rates. Fig. |
shows a simple nonlinear canceler. We see thatthe can
celer in Fig. | requires an analog signal tobe switche at
the data rate in the feedback loop. Such a system i i
fcul wo implement because switching tasiens tend 0
comipt the analog signal. In addition, propagation delays
fof elements in the feedback loop limi the maximum data
rate ofthe detector with NLC. Even though the compar~
tor may beable to operate faster the dats rate is Kime
tothe inverse ofthe maximum propagation and process-
ing delay around the feedback loop because the signal
being fed back must ative atthe comparator before the
rextayibol. This speed limitation, imposed bythe pre
ence of feedback. canbe expressed as an iteration bound?
{Gee Pah and Messerschmit []). For NLC tobe useful
in hgh data rate systems, these problems must be ove
In his paper, we develop implementation techniques
for NLC that eliminate these problems atthe expense of
increased circuit complenity. Integrated cigcuits for de
Cision feedback based equalization at rates of several
tizabits/s willbe easly implemented Using the tech-
niques developed in this paper. Multiple detectors Q',
ubteo og deen elo and CN by rege
{oral be compan ite save op 1)where W is the numberof bits fed hack), each witha dif-
ferent threshold level, are used to avoid the switching of
analog signals. The selection of the threshold level (ie
the feedback signal) occurs after threshold detection. This
is accomplished by choosing the output of the detector
corresponding to the correct threshold rather than by
Switching analog signals. The selection algorithm is then
iterated L times so that the maximum delay permissible in
‘generating the feedback signal is increased from one sym
bol period to L symbol periods. We then pull some of the
‘computation outside the loop (using lookahead computa
tion) to minimize propagation delay inthe feedback loop.
With these techniques, NLC can be easily implemented
‘on the detector chip to inerease the maximums data rate of
Gb/s communication systems andior detectors
In Section Il, we briefly describe nonlinear cancella-
tion, and outline some of the problems of implementing
NLC at high data rates. Techniques for overcoming these
problems are developed in Section Il and are ilustated
‘with the example of a simple one-tap NLC. Section IV
briefly describes how to generalize the techniques to more
complex NLC's and presents some more exatnples. Sec-
tion V summarizes this paper.
I. Tir Nowuisear Canceten
Fig. 1 shows an implementation of NLC as discussed
in (2). The threshold detector compares the received si:
nal 10 the threshold (V,) to determine the output bit dy.
Before making the comparison, the NLC generates a cor
rection signal f(d) based on N previously detected bits?
‘
“This signal is fed back to the input to compensate for the
ISI caused by bits transmitted price tothe bit currently at
the detector. We will make the assumption thatthe pre
vious bts have been detected correctly, and will fo con
venience, use a, instead ofthe more correct d, to denote
the detected bts. The reader should note thatthe assum
tion of correct prior decisions ensures that NLC will re
duce ISL. Whereas this assumption is not entieely correc.
NLC does significantly improve performance in spite of
‘occasional (though rage) errors in the Feedback path. This
issue of errors in the feedback path has been addressed by
others [7], [8], and we will not dwell on it any further.
Before proceeding further, we will modify Fig. 1 by
‘moving the feedback signal tothe threshold input of the
‘comparator as shown in Fig. 2. This is done to avoid in
teraction between the received and feedback signals. Here
we see thatthe feedback signal caa take one of many val
ues, none of which is related tothe voltages used t0 rep
resent the logic values, At low data rates, such a signal
can be generated by a random access meinory followed.
by a digital-to-analog converter (see [9]). However, this,
4,9)
py aia oncom Je ftd = 2 wa) se scngue
is etened oa decision felt eqaliaton 'DFE) 6 Becase DFE
— oat ;
i)
technique is impractical at rates of Gb/s, Swartz has de-
vised a circuit [10] for a vatiale gain latch to be used
with the NLC (at Gb/s rates) as shown in Fig. 3 (for the
case of V = 1). The output of the variable gain latch Q
switehes between Vy and V, which are the appropriate
threshold values corresponding to a previously detected
zero or one. Q must have a large dynamic range, should
not have any large transients, and must have low noise
levels. These are severe constraints, and practical circuits
may be unable o satisfy all of them at Gb/s data rates
This approach also becomes more dificult co implement
as more bits are used in the feedback path (larger N) be-
cause then the latch must be able to switch between 2”
walues.
Apart from the problem of generating the analog feed-
back signal, NEC, as shown in Figs. 1,2, and 3, has its
speed limited to the reciprocal ofthe total propagation and
processing delay around the feedback loop. Basically, the
‘decision on the mth bit has to propagate around the feed
back loop and appear atthe input tothe comparator before
the comparator begins processing the next symbol. Fig. 4
‘depicts the timing requirement pictrially, The spoed lim:
itation imposed by the feedback loop in NLC is severe
because the propagation delay of just the threshold detec-
{or itself ean be greater than one symbol petiod in high-
speed detector [11]. [12]. The timing constraints imposed
by the presence of feedback will also reduce the clock:
jitter tolerance ofthe system.eae
rpms EE /as
IIL. Inruemexrarion Tecwsques
A. Avoiding Analog Siitching
A look at Fig. 2 reveals that at most 2° distinct thresh=
old values may be fed back to the comparator. One way
‘of avoiding this threshold feedback is to use 2" distinc.
comparators, with each using one of the 2% possible
threshold values. Now the problem of feeding back the
correct threshold (which depends on the previously de-
‘ected bits) is replaced by selecting the comparator which
uses the correct threshold value. Fig. 5 shows a block
diagram of such a system; note that we have assumed only
one feedback tap (V = 1), and hence we need «wo com-
parators. Fig. $ shows a 2:1 multiplexer being used 10
Select heween the two comparators. Yj and V, are the wo
threshold values corresponding to the previous bit being
a zero or a one. Fig. 6 incorporates 2 gate-level imple
mentation of the 2:1 multiplexer. An integrated cireu
implementation may combine the gates ofthe multiplexer
With the gates implementing the latch inthe D flip-flop.
‘Although Figs. $ and 6 avoid analog feedback, they
have the same time available (7) for processing and prop-
gation around the feedback loop as the implementation
in Fig. 3. However, the threshold detector and variable
ign latch are no longer in the feedback loop. Instead. we
have a multiplexer; because the propagation and process
ing delay is usually less for a multiplexer than for the
‘components it has replaced, the system shown in Fig. 5
should be able to operate faster than the system in Fig. 3
‘The next subsection shows how to increase the time per-
‘mitted for processing and propagation in the feedback
Toop,
B. Eliminating the Feedback Imposed Bortleneck
Going back to Fig, 5, we see that the mth bit is given
by
Angas + Bal o
Fie 6 Removing alg fests: gate evel
where 4, and B, are outputs ofthe two comparators at the
zh symbol period, and the overbar denotes logical com
plement, Note that all the variables are Boolean, with val
les 2e70 or one. If we write out the corresponding equa
tion fora, ,. we have
ayy = Ae yah ®
Using (2) to substitute fora, in (1), We get
fy = Ay Ay y + By
+ BAL dy
= Ay + Bey) On—2
+ BBy Vy» 3
filtay > + ln, @
ABs
where
Silt) = Aya + By
Silt) = AyByy + ByB ys
Equation (3) represents a, in terms ofthe 4°s, Bs, and
a, whereas (1) represents din terms of dy. The mar
jor significance of this difference is that an implementa-
tion based on (3) will permit a total propagation and pro
cessing delay of 27° in the feedback loop. wheress an
‘implementation based on (1) would permit only 7. Equa-
tion (3) looks a lot more complicated than (1), and one
‘might assume that although it permits a larger delay. the
increased complexity will ake up the increase in permit
{ed delay, resulting in a circuit which may be as slow as
‘one implementing (1), However, ths isnot the ease be-
cause f(n). and f(n) can be computed outside the feed
back loop (lookahead computation) and then be provided
as inputs to the Feeback loop. The feedback loop itselfFi. 9 A ede agra,
‘must just implement (4) see Eig. 7], which is quite sim:
le and is identical to the feedback loop in Fig. 6.
‘The extta delay available inthe loop in Fig. 7 may be
used in two ways. Fist, we ean use ito adda latch within
the multiplexer to pipeline the multiplexer and increase
its speed of operation. Second. it ean be used to reduce
the clock rate within the loop to half the data rate. In the
later approach, the reader must note that (3) represents a,
in terms of «,_s, and therefore if we use this equation
repeatedly, string with ay, we will compute ds.
{y. To compute the missing terms, which are the dag «8
‘We mus iterate the same equation, but this time we must
do it starting with a. Hence. we must replicate the feed
back loop twice. Such a system, implementing (4) si
lustrated in Fig. 8. Fig. 9 shows the same system at the
ate level with a few simple changes to aid implementa
tion. In an integrated circuit implementation, the gates for
the multiplexer may be partly inconporated in the gates
‘used to implement the D flip-flop which follows the mul-
tiplexe.
We have just illustrated a technique to increase the pet=
tmissible processing and propagation delay’ in the feedback
loop from Tto 27 without increasing the circuit comple
ity ofthe feedback loop. We did this by “erating” (1)
twsce. Similarly, by iterating (1) L times, we can increase
the permissible delay to LT. and have Z loops operating
inparallel. Parhi, Meng, and Messerschmit [3], [13] have
used similar techniques. i.e, iterating a recursive rela
tionship, o avoid feedback-imposed speed limitations for
recursive and adaptive filters. However, their techniques
ste limited (0 instances of linear recursion. The vector.
zation of the joint process estimator caried out in [13]‘hich my be ppl
breaks down with the introduction of decision feedback
‘Our approach to the problem is capable of handling re-
ceursion with decisions in the feedback loop,
1) Sharing Some Circuitry: In Figs. 8 and 9. the cit-
‘uit to precomputef; and fy has also been replicated. If
this combinatorial circuitry is fast enough to operate atthe
symbol rate T, we can share this circuitry between the
feedback loops instead of replicating it. This concept is
ilysteated in Fig. 10. If necessary. we can pipeline [14]
‘his combinatorial circuitry to speed it up. Pipelining (by
adding flip-flops between serial processing elements) will,
‘add tothe propagation delay and will increase the overall,
Tatency through the receiver but it will not result in a data
rate limitation because this circuitry isnot inthe feedback
path
IV, Generatizarions.
In the preceding section, we illustrated an L-fold in-
«crease (with examples for '= 2) of the permissible delay
inthe feedback loop for a NLC with one-tap feedback.
For the more general case, with N feedback taps, 2" dis-
tinct threshold values are possible. To avoid analog feed
buck, we must replicate the comparator 2” times. If Ay,
denotes the output of the ith comparator during the mth
symbol period, a, is given by
0, Ain And yy
an 8
To increase the permissible delay in the feedback loop to
LET, we must iterate (5) L times. Each iteration involves
selecting the a with the highest subscript from the right
side of the equation, and then substituting for it in terms
of previous bits using the relationship of (5). Afler iter
ating L times, we can expand the right-hand side into a
‘sum-of products form and group the terms as shown be-
Tow:
2 = font Menten
+ AOD 1ty 1-4 0 Btoner +
1 BoB Bactt *T-toner 61s
where each f(m is @ function of Ayn. Aina °° 5
Ads tome) = 1, +" 2 The ft) can be precom-
pated ouside the feedback loop. Basically the nth output
bit is given by one ofthe f with the selection based on
previous bits delayed by L symbol periods. Note that to
‘write the iterated (5) inthe form shown in (6), we fist
expand the righ-hand side of the iterated equation into a
sum-of-produets form. A product temm which contains
‘more than one occurrence of an a, oF is inverse is then
simplified to contain only one occurence. This is always
possible because the a are zero oF one. aja, is equal toa,
And a, equals @, and any product terms which have
1 ate dropped because this is always zero.
ek: parle oops.
‘We could alternatively rewrite (6) as the sum of a se:
lection of terms from its right-hand side:
4, = fir) o
‘where (i ~ 1) is given by the Abit binary representation
[rather than the logical expression as in (6)]
PT ay tty tn ®
‘As in Section II, the L-fold increase in the permissible
loop propagation and processing delay can be used eitherto pipeline the multiplexer, oF to reduce the clock rate in
the feedback loop. With the second approach, the feed
back loop itself must be replicated Ltimes, with each loop
running at 1/Lih the symbol rate. Because the loops run
at 1/L times the symbol rate and are n0 more complex
than the original loop, our technique can achieve an
Lfold increase inthe speed of operation ofa receiver with
NLC. The precomputation ofthe f(n)s can be pipelined
$0 the combinatorial circuitry is shared between the L
feedback loops. Figs. 11-13 sketch the applications of
our techniques to a system with two-bit feedback. Al-
though we have assumed binary symbols inal ofthe pre-
vious sections, asa further generalization, our techniques
can be easily extended to Mary signaling
\V. Sunistany ap Coxctuston
In summary, by using multiple detectors with digital
logic, we can eliminate the analog switch and propagation
delay problems of nonlinear cancellation and decision
feedback equalization. 2” detectors are required with N
feedback bits. but the additional circuitry required for
higher data rates is all-digital logic and thus easily ame-
rable to integration, Thus, we can achieve higher speed
at the expense of increased circuit complexity, Wit these
techniques, NLC and DFE can be easily implemented on
the detector chip to increase the maximum data rate of
Gb/s communication systems and/or detectors
Rerenences
[HTB Bigies A. Geno, RD. Gili and TL, Lim, “Api am
121 101, Wee aad RD. Gin, “Elec signal processing tc
toes in mg a erp nema EEE fant Comma. vt
Sepp stash Sop oe,
(31 SH Wines nd MO Sito, “agent geliaton ofp
161 i Winer Egatzation acter ihre sysems wing +
tra icf Lanne eat a 801.
151 Rx Pan ad. G. Mesenchmit, Abit parle it eel
{61 J.G. Broais Digit Commanicarions,
IPL DL, Dole. J. Mao. . G. Menenstnit, An per
‘hun he er pay adi eck uals" TEEE
Pans Ir Psy, 0h 1720. pp. 290-359 ely 8
18) R.A. Keanedy. BD. 0. Andon, snd RR, itmend, “Tight
‘wounds on the mor probabilities of decison feedback equalizer,
(0) K.D. Fisher JM Cod, and C. Ne Mls, "An apie deison
Feedhack equalizer fr tre chamel suflering fom sine ISL.
Pre It Co Comma Boson MA, dae 1989. 9. 33.7.1-
(10) KD Gi, RG. Smarz, end. H. Wines
Ii) ATAT LOWHSAX Decision Cie 1.7 b/s, Pinay Data
{121 Gigaic Lope 1060214. Data Soe.
113) T.Meng and DG. Messerschmi Abia high sling ae
dap Rr EEE Tras Sigal Process Wl 38, pp ASS
(0411 8 Japp and 5, Aw. “Etecve iain of itl
STE, ete neti ok
furmene a ATAY Bal Earn, Hold
ok H. Waters $37-M°82-S488) wa
{nd fom 1977 4 19H withthe BetoScence
fray, Sie 1981, he hasbeen with ATAT Bell Laboratories, Holmdel
‘Mig kine ae cae gern Ha es
7 tapercondactorsm commusiaton systema, Currently. he 3 volved
"Be Winer 2 mena of Si Xi He fected The Ohi She Un
scrip EeaScene Labo» svat for ousting dseraton