
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 67, NO. 12, DECEMBER 2020 5079

A Pipelined Reduced Complexity Two-Stages Parallel LMS Structure for Adaptive Beamforming

Ghattas Akkad, Ali Mansour, Senior Member, IEEE, Bachar A. ElHassan, Elie Inaty, Member, IEEE, Rafic Ayoubi, Member, IEEE, and Jalal A. Srar

Abstract: In this paper, we propose a reduced complexity parallel least mean square structure (RC-pLMS) for adaptive beamforming and its pipelined hardware implementation. RC-pLMS is formed by two least mean square (LMS) stages operating in parallel (pLMS), where the overall error signal is derived as a combination of the individual stage errors. The pLMS is further simplified to remove the second independent set of weights, resulting in a reduced complexity pLMS (RC-pLMS) design. In order to obtain a pipelined hardware architecture of the proposed RC-pLMS algorithm, we apply the delay and sum relaxation technique (DRC-pLMS). Convergence, stability and quantization effect analyses are performed to determine the upper bound of the step size and to assess the behavior of the system. Computer simulations demonstrate the performance of the proposed RC-pLMS in providing accelerated convergence and a reduced error floor while preserving an LMS-identical $O(N)$ complexity, for an antenna array of $N$ elements. Synthesis and implementation results show that the proposed design achieves a significant increase in the maximum operating frequency over other variants with minimal resource usage. Additionally, the resulting beam radiation pattern shows that the finite precision DRC-pLMS implementation behaves similarly to the infinite precision theoretical results.

Index Terms: LMS, parallel LMS, adaptive beamforming, field programmable gate array, sensor array.

Manuscript received February 27, 2020; revised April 24, 2020; accepted May 9, 2020. Date of publication May 22, 2020; date of current version December 1, 2020. This work was supported in part by l'Agence de l'Innovation de Defense a la Direction Generale de l'Armement–Ministere des Armees (AID–DGA) and in part by Agence Nationale de la Recherche en France (ANR) for ANR-ASTRID – Project under Grant ANR-19-ASTR-0005-03. This article was recommended by Associate Editor G. J. Dolecek. (Corresponding author: Ghattas Akkad.)
Ghattas Akkad and Ali Mansour are with the Lab-STICC, UMR 6285, ENSTA Bretagne, 29200 Brest, France (e-mail: ghattas.akkad@ensta-bretagne.org; mansour@ieee.org).
Bachar A. ElHassan is with the Faculty of Engineering, Lebanese University, Tripoli, Lebanon (e-mail: bachar_elhassan@ul.edu.lb).
Elie Inaty and Rafic Ayoubi are with the Department of Computer Engineering, University of Balamand, Koura, Lebanon (e-mail: elie.inaty@balamand.edu.lb; rafic.ayoubi@balamand.edu.lb).
Jalal A. Srar is with the Electrical and Electronic Department, Misurata University, Misratah, Libya (e-mail: jalal.srar@gmail.com).
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSI.2020.2994812

I. INTRODUCTION

Ever since its inception, adaptive beamforming (ABF) has become an inevitable feature of smart antennas. ABF is employed in various applications such as wireless communication, sonar and radar tracking. It is used for directional signal transmission or reception [1], [2] to ease spectral congestion, increase capacity and allow frequency re-use. ABF can be achieved by applying a spatial filter with varying weights to the array outputs, modeled as a linear combination of the observed noisy signal. The filter weights are concurrently computed by an adaptive algorithm that focuses the main beam towards the direction of the desired signal and attenuates interfering signals [2], as shown in Fig. 1. In Fig. 1, $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ is the input vector, $\mathbf{w}$ is the vector of filter weights, $y$ is the beamformer output, $e$ is the error and $d$ is the desired signal.

Fig. 1. Adaptive Beamforming System [3].

A linear antenna array with $N$ elements points its main beam, with the highest gain, towards the desired user while directing nulls to interfering devices. Thus, a requirement for efficient adaptive beamforming is the ability to continuously adapt to the ever changing signal conditions and user mobility. However, modern wireless communication has imposed challenging constraints on adaptive algorithms when implemented on hardware devices like Field Programmable Gate Arrays (FPGAs). Such constraints include reduced complexity, parallelism, accelerated convergence and low residual error. In contrast to complex blind algorithms [4], non-blind adaptive algorithms such as the least mean square (LMS) and recursive least square (RLS) iteratively minimize the mean square error (MSE) between the filtered output signal and a reference signal. Compared to RLS, with an undesirable $O(N^2)$ quadratic complexity, LMS presents a simple yet attractive $O(N)$ linear complexity structure [3]. However, LMS still suffers from a trade-off between the convergence speed and its residual error floor [5].

Several variants of the classical LMS algorithm have been proposed in order to accelerate the convergence and improve
1549-8328 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Kongu Engineering College. Downloaded on December 20,2021 at 09:57:27 UTC from IEEE Xplore. Restrictions apply.

the beam pointing accuracy [2], [3], [5]–[7]. Concurrently, other modifications focused on presenting a high throughput, pipelined architecture for hardware implementation [8]–[13]. These techniques include the Normalized LMS (NLMS) [14], [15], the Gauss-Seidel Fast Affine Projection algorithm (GS-FAP) [16], the a Priori Error-Feedback LSL algorithm (EF-LSL) using logarithmic arithmetic [17], the least-least mean square (LLMS) [5], the recursive-least mean square algorithm (RLMS) [3], [7], the relaxed look-ahead pipelined LMS [8], the time shared LUT-less LMS architecture [9] and the division free and variable regularized LMS [10]. In the NLMS, the step size, $\mu$, is adjusted in accordance with the input signal power through auto-correlation, thus allowing a faster convergence. However, the NLMS depends highly on the choice of its initial parameters and shows degraded performance in low signal to noise ratio (SNR) conditions [5]. The GS-FAP achieves superior performance over the NLMS; however, it is slightly more complex [16]. The EF-LSL achieves a major reduction in computation time and look up tables; however, it is based on the least square algorithm and requires the use of a logarithmic number system and a dedicated logarithmic arithmetic logic unit (ALU) [17]. The LLMS [5] and RLMS [7] variants are achieved by cascading an LMS/RLS stage with an LMS stage through the use of an estimate of the steering vector and delayed error feedback. This technique shows superior performance over previously proposed LMS and RLS variants at the cost of doubling the computational requirements. However, presenting a pipelined hardware architecture is deemed difficult given the cascading nature of the design's feedback paths and its computational complexity. The previous cascade RLMS algorithm is simplified to present a parallel input structure as presented in [3]. However, such simplification does not reduce the $O(N^2)$ complexity and does not provide an easy to pipeline architecture. The relaxed look-ahead pipelined LMS and the time shared LUT-less LMS discussed in [8], [9] present a pipelined architecture for the classical tapped delay line LMS with no noticeable improvements in the convergence speed nor in the error floor. Furthermore, the pipelined division free variable regularized LMS architecture presented in [10] still presents considerable complexity and requires a dynamic step size and normalization stages compared to the classical LMS.

In this context, the main motivation of our research is to present a low complexity, parallel, pipelined hardware architecture with accelerated convergence and minimal residual error on Field Programmable Gate Arrays (FPGAs). Thus, inspired by the delay feedback technique [3], [5], [7], we propose a two-stage, parallel, least mean square structure (pLMS) for adaptive beamforming. The pLMS is formed of two parallel LMS stages, where the overall error signal is derived as a combination of the individual stage errors. The error signal of the second LMS stage ($LMS_2$) is subject to a one sample delay and is multiplied by the imaginary number $j = \sqrt{-1}$ to combine with the error of the first LMS stage ($LMS_1$). We should mention that recurring samples are caused by the presence of repeated data in the original message signal, analog to digital converter (ADC) impurities [18], [19], quantization errors and low resolution [19]–[22]. Additionally, they can be formed as a consequence of symbol detection errors originating from the digital receivers in low SNR environments [23]. As such, a multiplication by $j$ is proposed for robustness against the resulting error nulls. The proposed architecture is further simplified to obtain the reduced complexity pLMS design (RC-pLMS). The RC-pLMS is achieved by adding a one sample delayed version of the inputs, multiplied by $j$, to $LMS_1$. This removes the second independent set of weights, i.e. the $LMS_2$ filter. In order to present a pipelined, parallel, hardware architecture for the RC-pLMS, we propose the application of the delay and sum relaxed look-ahead technique (DRC-pLMS).

Convergence and stability analyses are performed to determine the upper bound of the step size. The quantization effect analysis is conducted to assess the system performance in finite precision arithmetic. Finally, a hardware implementation of the DRC-pLMS design is carried out in order to study its resource consumption and behavior in finite precision arithmetic. The architecture is implemented using the Q2.15 format [24], i.e. one sign bit, two integer bits and fifteen precision (fractional) bits.

Section II presents a mathematical background of the LMS and the relaxed look-ahead LMS algorithm. Section III details the theoretical derivation of RC-pLMS. Section IV discusses the convergence analysis. Section V handles the quantization effect analysis. Section VI presents the hardware architecture for the DRC-pLMS. Section VII shows the simulation results and discussion. Finally, Section VIII presents the conclusion and future work.

II. MATHEMATICAL BACKGROUND

This section presents a brief background review of the LMS algorithm for complex narrow-band signals incoming from the far field [25]. The antenna architecture [26] is a uniform linear array (ULA) of $N$ equally spaced antenna elements [5], as shown in Fig. 2.

Fig. 2. Uniform Linear Antenna Array.

With the first antenna element acting as a reference, $\theta$ is the angle of arrival and $B$ is the distance between consecutive antenna elements.

Let the input vector, $\mathbf{x}(k) = [x_1(k), x_2(k), \ldots, x_N(k)]^T$, at the discrete time instant, $k$, to the narrow-band beamformer be defined by

$$\mathbf{x}(k) = \mathbf{a}_d s_d(k) + \sum_{l=0}^{N-1} i_l(k) + \mathbf{n}(k) \tag{1}$$


Here, $[\cdot]^T$ denotes the matrix transpose, $s_d(k)$ and $i_l(k)$ are the desired and interfering signals with $l < N$, $\mathbf{a}_d$ is the $N \times 1$ complex array steering vector for the desired signal, and $\mathbf{n}(k)$ is the complex additive white Gaussian noise (CAWGN) vector. A general form of $\mathbf{a}_d$ is given by

$$\mathbf{a}_d = [1, e^{-j\psi}, e^{-j2\psi}, \ldots, e^{-j(N-1)\psi}]^T \tag{2}$$

where $j = \sqrt{-1}$ and

$$\psi = 2\pi \frac{B \sin(\theta)}{\lambda} \tag{3}$$

where $\lambda$ is the signal wavelength. The output, $y(k)$, of the beamformer subject to a linear combiner is given by

$$y(k) = \mathbf{w}^H(k)\mathbf{x}(k) \tag{4}$$

where $[\cdot]^H$ represents the matrix Hermitian transpose and $\mathbf{w}(k)$ is the array weight vector, i.e. the filter coefficients.

A. Least Mean Square Algorithm

The least mean square (LMS) algorithm minimizes the mean square error (MSE), using the steepest descent optimization method [27]–[29], as follows:

$$e(k) = d(k) - y(k) \tag{5}$$

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu_{LMS}\, e^*(k)\mathbf{x}(k) \tag{6}$$

where $e(k)$ is the error signal, $d(k)$ is the reference signal, $*$ is the complex conjugate operator and $\mu_{LMS}$ stands for the gradient descent step or the step size [29]. The optimal weight, $\mathbf{w}_{op_{lms}}$, assuming a wide sense stationary (WSS) process [30], becomes:

$$\mathbf{w}_{op_{lms}} = \mathbf{R}^{-1}\mathbf{p} \tag{7}$$

Here $\mathbf{R} = \mathbf{R}(0)$ and $\mathbf{R}^{-1} = \mathbf{R}^{-1}(0)$ are the input signal auto-correlation matrix and its inverse, respectively, and $\mathbf{p} = \mathbf{p}(0)$ is the cross correlation vector of the input $\mathbf{x}(k)$ and desired signal $d(k)$. $\mathbf{R}(0)$ and $\mathbf{p}(0)$ are defined at lag $\tau = 0$ as

$$\mathbf{R}(\tau) = E[\mathbf{x}(k-\tau)\mathbf{x}^H(k)] \tag{8}$$

$$\mathbf{p}(\tau) = E[d^*(k-\tau)\mathbf{x}(k)] \tag{9}$$

where $E[\cdot]$ is the expectation operator and the lag is $\tau = k_1 - k_2$, with $k_1$ and $k_2$ being the different time instants at which an observation of the random process is taken. While LMS offers minimal computational complexity, it suffers from a trade-off between its convergence speed and its error floor [5].

III. THEORETICAL DERIVATION

This section details the derivation of the proposed pLMS and RC-pLMS adaptive beamformers.

A. Parallel LMS (pLMS)

As previously defined, the pLMS is formed of two parallel LMS stages and is presented in Fig. 3, where the block $jZ^{-1}$ represents a one sample delay and a multiplication by $j$. As shown in Fig. 3, the error signal of $LMS_2$, $e_2(k)$, is subject to a one sample delay and is multiplied by the imaginary number $j$. The resulting term, $j e_2(k-1)$, is then combined with the error of $LMS_1$. Consequently, this allows a parallel operation of both stages, hence the notation pLMS. The pLMS algorithm is presented in Algorithm 1, where $n_t$ is the total number of input samples.

Fig. 3. pLMS Flowchart.

Algorithm 1 Parallel LMS (pLMS)
Input: $d(k)$, $\mathbf{x}(k)$, $n_t$, $\mu$
Output: $e_{pLMS}(k)$, $\mathbf{w}_1(k)$
Initialisation: $e_1(0) = 0$, $e_2(-1) = 0$, $d(-1) = d(0)$, $\mathbf{x}(-1) = \mathbf{x}(0)$, $\mathbf{w}_1(0) = \mathbf{0}$, $\mathbf{w}_2(0) = \mathbf{0}$
1: for $k = 0$ to $n_t$ do
2: $y_{LMS_1}(k) = \mathbf{w}_1^H(k)\mathbf{x}(k)$
3: $e_1(k) = d(k) - y_{LMS_1}(k)$
4: $e_{pLMS}(k) = e_1(k) - j e_2(k-1)$
5: $\mathbf{w}_1(k+1) = \mathbf{w}_1(k) + \mu e_{pLMS}^*(k)\mathbf{x}(k)$
6: $y_{LMS_2}(k) = \mathbf{w}_2^H(k)\mathbf{x}(k)$
7: $e_2(k) = d(k) - y_{LMS_2}(k)$
8: $\mathbf{w}_2(k+1) = \mathbf{w}_2(k) + \mu e_2^*(k)\mathbf{x}(k)$
9: end for
10: return $e_{pLMS}(k)$, $\mathbf{w}_1(k)$

The overall error, $e_{pLMS}(k)$, is defined as

$$e_{pLMS}(k) = e_1(k) - j e_2(k-1) = d(k) - j d(k-1) - \mathbf{w}_1^H(k)\mathbf{x}(k) + j\mathbf{w}_2^H(k-1)\mathbf{x}(k-1) \tag{10}$$

where $e_i(k) = d(k) - \mathbf{w}_i^H(k)\mathbf{x}(k)$, $i$ represents the stage identifier, $y_{LMS_1}(k)$ and $y_{LMS_2}(k)$ are the first and second stage outputs, and $\mathbf{w}_2(k-1)$ is the delayed weight vector of $LMS_2$.

Proposition-I: The recurring samples in the input and/or desired signals are usually caused by the nature of the


input signal, i.e. presence of repeated samples in the original message signal, ADC impurities [18], quantization errors [20]–[22] and low resolution resulting in receiver saturation [19]. Additionally, they can be formed as a consequence of symbol detection errors originating from the digital receivers in low SNR environments [23]. Thus, in the proposed delayed feedback scheme, the use of the present and past errors, in the case of recurring samples, can severely degrade the convergence performance of the pLMS. We can prove that this can be mitigated by multiplying the delayed samples by $j$, which in turn protects against nulls and preserves the accelerated convergence.

Proof: Let $d(k) = a + jb$ and $d(k-1) = c + jd$. For recurring samples, $S(k) = d(k) - d(k-1)$ with $d(k) \approx d(k-1)$, we get $S(k) \approx 0$. However, for $S(k) = d(k) - j d(k-1)$, i.e. the left side term of equation (10), $S(k) = a + jb - jc + d \neq 0$, where $S(k)$ is the new pLMS reference signal, such that $S(k) = d(k) - j d(k-1)$.

The MSE $\xi_{pLMS}$ is now defined as

$$\xi_{pLMS}(k) = E[|e_{pLMS}(k)|^2] = E[e_1(k)e_1^*(k) + j e_1(k)e_2^*(k-1) - j e_1^*(k)e_2(k-1) + e_2(k-1)e_2^*(k-1)] \tag{11}$$

where $|\cdot|$ signifies the complex modulus. Moreover, the first term of (11) can be expressed as

$$E[|e_1(k)|^2] = E[|d(k)|^2] - \mathbf{p}^H\mathbf{w}_1(k) - \mathbf{w}_1^H(k)\mathbf{p} + \mathbf{w}_1^H(k)\mathbf{R}\mathbf{w}_1(k) \tag{12}$$

The last term of (11) can be developed as follows

$$E[|e_2(k-1)|^2] = E[|d(k-1)|^2] - E[d_1(k)\mathbf{x}_1^H(k-1)]\mathbf{w}_2(k-1) - \mathbf{w}_2^H(k-1)E[d_1^*(k)\mathbf{x}(k-1)] + \mathbf{w}_2^H(k-1)E[\mathbf{x}_1(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) \tag{13}$$

In addition, the second term of (11) can be detailed as

$$E[e_1(k)e_2^*(k-1)] = E[d(k)d^*(k-1) - d(k)\mathbf{x}^H(k-1)\mathbf{w}_2(k-1) - d^*(k-1)\mathbf{w}_1^H(k)\mathbf{x}(k) + \mathbf{w}_1^H(k)\mathbf{x}(k)\mathbf{x}^H(k-1)\mathbf{w}_2(k-1)] \tag{14}$$

and the third term of (11) becomes

$$E[e_1^*(k)e_2(k-1)] = E[d^*(k)d(k-1) - d(k-1)\mathbf{x}^H(k)\mathbf{w}_1(k) - d^*(k)\mathbf{w}_2^H(k-1)\mathbf{x}(k-1) + \mathbf{x}^H(k)\mathbf{w}_1(k)\mathbf{w}_2^H(k-1)\mathbf{x}(k-1)] \tag{15}$$

Substituting (12), (13), (14) and (15) in (11), the MSE $\xi_{pLMS}$ becomes

$$\begin{aligned}
\xi_{pLMS}(k) ={}& E[|d(k)|^2] - \mathbf{p}^H\mathbf{w}_1(k) - \mathbf{w}_1^H(k)\mathbf{p} + \mathbf{w}_1^H(k)\mathbf{R}\mathbf{w}_1(k) + E[|d(k-1)|^2] \\
& - E[d_1(k)\mathbf{x}_1^H(k-1)]\mathbf{w}_2(k-1) - \mathbf{w}_2^H(k-1)E[d_1^*(k)\mathbf{x}_1(k-1)] \\
& + \mathbf{w}_2^H(k-1)E[\mathbf{x}_1(k)\mathbf{x}_1^H(k-1)]\mathbf{w}_2(k-1) \\
& + jE[d(k)d^*(k-1)] - jE[d^*(k)d(k-1)] \\
& - jE[d(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) - j\mathbf{w}_1^H(k)E[d^*(k-1)\mathbf{x}(k)] \\
& + j\mathbf{w}_1^H(k)E[\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) + jE[d(k-1)\mathbf{x}^H(k)]\mathbf{w}_1(k) \\
& + j\mathbf{w}_2^H(k-1)E[d^*(k)\mathbf{x}(k-1)] - jE[\mathbf{x}^H(k)\mathbf{w}_1(k)\mathbf{w}_2^H(k-1)\mathbf{x}(k-1)]
\end{aligned} \tag{16}$$

with $\mathbf{w}_1(k)$ being the tap weights of interest [3], [5], [7], [31]. The optimal weight vector, $\mathbf{w}_{op}$, of $\mathbf{w}_1(k)$ can be obtained by differentiating (16) with respect to $\mathbf{w}_1^H(k)$ [5], [7], [32], and setting the resulting pLMS gradient, $\nabla_{pLMS}$, to zero

$$\frac{\partial \xi_{pLMS}(k)}{\partial \mathbf{w}_1^H(k)} = \nabla_{pLMS} = -\mathbf{p} + \mathbf{R}\mathbf{w}_1(k) - jE[d^*(k-1)\mathbf{x}(k)] + jE[\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) \tag{17}$$

The optimal weight vector, $\mathbf{w}_{op}$, becomes

$$\mathbf{w}_{op} = \mathbf{R}^{-1}\mathbf{p} + j\mathbf{R}^{-1}E[d^*(k-1)\mathbf{x}(k)] - j\mathbf{R}^{-1}E[\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) \tag{18}$$

The resulting gradient can be validated by expanding the steepest descent update equation using the stochastic gradient as shown in (6)

$$\begin{aligned}
\mathbf{w}_1(k+1) &= \mathbf{w}_1(k) - \mu\nabla_{pLMS} \\
&= \mathbf{w}_1(k) - \mu[-\mathbf{p} + \mathbf{R}\mathbf{w}_1(k) - jE[d^*(k-1)\mathbf{x}(k)] + jE[\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1)] \\
&= \mathbf{w}_1(k) + \mu\mathbf{x}(k)[d^*(k) - \mathbf{x}^H(k)\mathbf{w}_1(k) + j d^*(k-1) - j\mathbf{x}^H(k-1)\mathbf{w}_2(k-1)]
\end{aligned} \tag{19}$$

Proposition-II: At steady-state, assuming convergence, we can assume $\mathbf{w}_1(k) \approx \mathbf{w}_2(k-1)$ and the pLMS weight update equation defined by (19) can be written as

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\mathbf{x}(k)[d^*(k) - \mathbf{x}^H(k)\mathbf{w}(k) + j d^*(k-1) - j\mathbf{x}^H(k-1)\mathbf{w}(k)] \tag{20}$$

Proof: At steady-state, as $k \to \infty$, assuming that both systems converge and the process is WSS, we get $\mathbf{w}_1(k) \approx \mathbf{w}_1(k-1) \approx \mathbf{w}_{op}$ and $\mathbf{w}_2(k) \approx \mathbf{w}_2(k-1) \approx \mathbf{w}_{op_{lms}}$ [5]. Additionally, the filtered outputs $y_{LMS_1}$ and $y_{LMS_2}$ tend to approach $d(k)$ with both interference and noise signals being suppressed [5], [7], therefore we assume


$d(k-1) \approx y_{LMS_1}(k-1) \approx y_{LMS_2}(k-1)$ [5], [7]. Thus, using (4), (5) and (7) in (18), we get

$$\begin{aligned}
\mathbf{w}_{op} &= \mathbf{w}_{op_{lms}} + j\mathbf{R}^{-1}E[d^*(k-1)\mathbf{x}(k)] - j\mathbf{R}^{-1}E[\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) \\
&\approx \mathbf{w}_{op_{lms}} + j\mathbf{R}^{-1}E[y_{LMS_2}^*(k-1)\mathbf{x}(k)] - j\mathbf{R}^{-1}E[\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) \\
&\approx \mathbf{w}_{op_{lms}} + j\mathbf{R}^{-1}E[\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) - j\mathbf{R}^{-1}E[\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_2(k-1) \\
&\approx \mathbf{w}_{op_{lms}}
\end{aligned} \tag{21}$$

As such, by simplifying (21), we get $\mathbf{w}_{op} \approx \mathbf{w}_{op_{lms}}$ and therefore we assume $\mathbf{w}_1(k) \approx \mathbf{w}_2(k-1)$, hence the use of a single set of weights, $\mathbf{w}(k)$. We can now simplify (19), removing any requirement for computing an independent set of filter coefficients, i.e. an additional filter. As such, (19) becomes

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\mathbf{x}(k)[d^*(k) - \mathbf{x}^H(k)\mathbf{w}(k) + j d^*(k-1) - j\mathbf{x}^H(k-1)\mathbf{w}(k)] \tag{22}$$

The reduced complexity pLMS (RC-pLMS) is now defined by its final update equation as

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\mathbf{x}(k)e_{RCpLMS}^*(k) \tag{23}$$

where $e_{RCpLMS}^*(k)$ is the RC-pLMS error signal defined by

$$e_{RCpLMS}^*(k) = d^*(k) - \mathbf{x}^H(k)\mathbf{w}(k) + j d^*(k-1) - j\mathbf{x}^H(k-1)\mathbf{w}(k) \tag{24}$$

Moreover, by eliminating the need to compute an independent set of coefficients, the system complexity is reduced by $2N+1$ complex multiplications and $N+2$ complex additions. The final system architecture is shown in Fig. 4 and the RC-pLMS is summarized in Algorithm 2, where $\mathbf{u}(k)$ and $S(k)$ are the new system input and reference signal, respectively. As seen in Fig. 4, $LMS_2$ is now replaced by a one sample delayed version of the inputs multiplied by $j$.

Algorithm 2 Reduced Complexity pLMS (RC-pLMS)
Input: $d(k)$, $d(k-1)$, $\mathbf{x}(k)$, $\mathbf{x}(k-1)$, $n_t$, $\mu$
Output: $e_{RCpLMS}(k)$, $\mathbf{w}(k)$
Initialisation: $e_{RCpLMS}(k) = 0$, $d(-1) = d(0)$, $\mathbf{x}(-1) = \mathbf{x}(0)$, $\mathbf{w}(0) = \mathbf{0}$
1: for $k = 0$ to $n_t$ do
2: $\mathbf{u}(k) = \mathbf{x}(k) - j\mathbf{x}(k-1)$
3: $e_{RCpLMS}(k) = S(k) - \mathbf{w}^H(k)\mathbf{u}(k)$
4: $\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\mathbf{x}(k)e_{RCpLMS}^*(k)$
5: end for
6: return $e_{RCpLMS}(k)$, $\mathbf{w}(k)$

Fig. 4. Reduced Complexity pLMS (RC-pLMS).

TABLE I
THEORETICAL COMPLEXITY AND RESOURCE USAGE

Table I presents a comparison of resource complexity, where cMultiply, cAdd, cDivide and RLMSp denote complex multiplication, complex addition, complex division and parallel RLMS, respectively. From Table I, it is clear that the RLMS, RLMSp and RLS require an undesirable complexity of order $O(N^2)$. Additionally, the RLMS and RLMSp require $N+1$ and 1 complex divisions, respectively. On the other hand, compared to the LLMS and pLMS, the RC-pLMS provides similar resource requirements and complexity to the LMS while preserving a linear computational complexity of order $O(N)$.

The MSE cost function is now re-defined as $\xi_{RCpLMS}(k) = E[|e_{RCpLMS}(k)|^2]$ and evaluated in a similar way as (16). The gradient of RC-pLMS becomes

$$\begin{aligned}
\nabla_{RCpLMS} = \frac{\partial \xi_{RCpLMS}(k)}{\partial \mathbf{w}_1^H(k)} ={}& -\mathbf{p} + \mathbf{R}\mathbf{w}_1(k) - E[j d^*(k-1)\mathbf{x}(k)] + E[j\mathbf{x}(k)\mathbf{x}^H(k-1)]\mathbf{w}_1(k) \\
& + E[j d^*(k)\mathbf{x}(k-1)] - E[j\mathbf{x}(k-1)\mathbf{x}^H(k)]\mathbf{w}_1(k) \\
& - E[d^*(k-1)\mathbf{x}(k-1)] + E[\mathbf{x}(k-1)\mathbf{x}^H(k-1)]\mathbf{w}_1(k)
\end{aligned} \tag{25}$$

Assuming WSS, (25) is rewritten as

$$\nabla_{RCpLMS} = -\mathbf{p} + \mathbf{R}\mathbf{w}_1(k) - \mathbf{z}^*(-1) + \mathbf{Q}(1)\mathbf{w}_1(k) + \mathbf{z}(1) - \mathbf{Q}^*(-1)\mathbf{w}_1(k) - \mathbf{p} + \mathbf{R}\mathbf{w}_1(k) \tag{26}$$

The auto-correlation matrix $\mathbf{Q}(1) = j\mathbf{R}(1)$ and the cross-correlation vector $\mathbf{z}(-1) = j\mathbf{p}(-1)$ represent an observation with lag $\tau = 1$. By definition, for a WSS process, a correlation matrix satisfies $\mathbf{Q}(\tau) = \mathbf{Q}^*(-\tau)$ and $\mathbf{z}(\tau) = \mathbf{z}^*(-\tau)$. Therefore, equation (26) can now be simplified to

$$\nabla_{RCpLMS} = -2\mathbf{p} + 2\mathbf{R}\mathbf{w}_1(k) - 2\mathbf{z}(-1) + 2\mathbf{Q}\mathbf{w}_1(k) \tag{27}$$

By setting $\nabla_{RCpLMS}$ to 0, the RC-pLMS optimal weight vector, $\mathbf{w}_{op}$, can be obtained as

$$\begin{aligned}
2(\mathbf{R} + j\mathbf{R}(1))\mathbf{w}_{op} &= 2(\mathbf{p} + j\mathbf{p}(-1)) \\
\mathbf{A}\mathbf{w}_{op} &= (\mathbf{p} + j\mathbf{p}(-1)) \\
\mathbf{w}_{op} &= \mathbf{A}^{-1}(\mathbf{p} + j\mathbf{p}(-1))
\end{aligned} \tag{28}$$
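As an illustrative aside, the RC-pLMS recursion of Algorithm 2 can be sketched in plain Python. This is a minimal floating-point sketch under stated assumptions, not the authors' implementation: the half-wavelength element spacing, the QPSK symbols, the arrival angles (0° for the desired user, 40° for one interferer), the noise level, the step size and all function names (`steering_vector`, `rc_plms`, `simulate`) are hypothetical choices made only for illustration.

```python
import cmath
import math
import random


def steering_vector(n_elements, theta_deg, spacing_over_lambda=0.5):
    """Steering vector of (2)-(3); B / lambda = 0.5 is an assumed spacing."""
    psi = 2.0 * math.pi * spacing_over_lambda * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * n * psi) for n in range(n_elements)]


def rc_plms(x, d, mu):
    """Run the RC-pLMS loop of Algorithm 2 on snapshots x[k] and reference d[k]."""
    w = [0j] * len(x[0])                 # w(0) = 0
    x_prev, d_prev = x[0], d[0]          # x(-1) = x(0), d(-1) = d(0)
    errors = []
    for xk, dk in zip(x, d):
        u = [a - 1j * b for a, b in zip(xk, x_prev)]          # u(k) = x(k) - j x(k-1)
        s = dk - 1j * d_prev                                  # S(k) = d(k) - j d(k-1)
        y = sum(wi.conjugate() * ui for wi, ui in zip(w, u))  # w^H(k) u(k)
        e = s - y                                             # e_RCpLMS(k)
        # w(k+1) = w(k) + mu x(k) e*(k), as in line 4 of Algorithm 2
        w = [wi + mu * xi * e.conjugate() for wi, xi in zip(w, xk)]
        errors.append(e)
        x_prev, d_prev = xk, dk
    return errors, w


def simulate(n_elements=4, n_samples=2000, mu=0.01, seed=0):
    """Assumed toy scenario: one desired user, one interferer, weak noise."""
    rng = random.Random(seed)
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    a_d = steering_vector(n_elements, 0.0)
    a_i = steering_vector(n_elements, 40.0)
    x, d = [], []
    for _ in range(n_samples):
        s_d = rng.choice(qpsk) / math.sqrt(2)
        s_i = rng.choice(qpsk) / math.sqrt(2)
        noise = [complex(rng.gauss(0, 0.01), rng.gauss(0, 0.01))
                 for _ in range(n_elements)]
        x.append([ad * s_d + ai * s_i + nz for ad, ai, nz in zip(a_d, a_i, noise)])
        d.append(s_d)
    errors, w = rc_plms(x, d, mu)
    return errors, w, a_d, a_i
```

Calling `simulate()` returns the error sequence and the final weights; evaluating $\mathbf{w}^H\mathbf{a}$ for the two assumed steering vectors gives a quick check that the beam points at the desired user while the interferer is attenuated.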


Here, $\mathbf{A}$ is the final correlation matrix formed as a linear combination of the auto-correlation matrices at lags 0 and 1, respectively. $\mathbf{A}$ is assumed invertible as a result of the random added noise [30] and is given by

$$\mathbf{A} = \mathbf{R} + j\mathbf{R}(1) \tag{29}$$

Thus, the RC-pLMS presents a low complexity approximation for the pLMS. The behavior of the RC-pLMS is examined and verified through various simulations presented in sub-section VII-A.

IV. STABILITY ANALYSIS

In order to determine the conditions for RC-pLMS to become stable and converge to the optimal weight, a first order convergence and stability analysis is performed. Moreover, the following analysis determines the optimal learning rate, i.e. $\mu$. The analysis is performed by studying the mean coefficient error vector, $\mathbf{v}(k)$ [27], [33], which is given by

$$\mathbf{v}(k) = \mathbf{w}(k) - \mathbf{w}_{op} \tag{30}$$

where $\mathbf{w}(k)$ is the mean weight vector. Thus, from (22), we can proceed as follows

$$\mathbf{w}(k+1) = \mathbf{w}_1(k) + \mu\mathbf{p} - \mu\mathbf{R}\mathbf{w}(k) + j\mu\mathbf{p}(-1) - j\mu\mathbf{R}(1)\mathbf{w}(k) \tag{31}$$

At steady-state and convergence, i.e. as $k \to \infty$, we can assume $\mathbf{w}(k+1) \approx \mathbf{w}(k)$ and $\mathbf{w}(k) \approx \mathbf{w}(k-1)$ for $\mathbf{w}(k) = \mathbf{w}_1(k)$ and $\mathbf{w}(k) = \mathbf{w}_2(k-1)$. Thus, (31) becomes

$$\begin{aligned}
\mathbf{w}(k+1) - \mathbf{w}_{op} ={}& \mathbf{w}(k) - \mathbf{w}_{op} + \mu\mathbf{p} + j\mu\mathbf{R}(1)\mathbf{w}_{op} - \mu\mathbf{R}\mathbf{w}(k) - \mu\mathbf{R}\mathbf{w}_{op} + \mu\mathbf{R}\mathbf{w}_{op} \\
& + j\mu\mathbf{p}(-1) - j\mu\mathbf{R}(1)\mathbf{w}_{op} + \mu\mathbf{R}(1)\mathbf{w}(k)
\end{aligned} \tag{32}$$

where $\mathbf{w}_{op}$ is the optimal weight vector. Using the mean coefficient error vector notation and equations (28), (32), we can write

$$\begin{aligned}
\mathbf{v}(k+1) ={}& (\mathbf{I} - \mu\mathbf{R} - j\mu\mathbf{R}(1))\mathbf{v}(k) - \mu\mathbf{R}\mathbf{A}^{-1}\mathbf{p} - j\mu\mathbf{R}\mathbf{A}^{-1}\mathbf{p}(-1) + \mu\mathbf{p} + j\mu\mathbf{p}(-1) \\
& - \mu\mathbf{R}(1)\mathbf{A}^{-1}\mathbf{p} + \mu\mathbf{R}(1)\mathbf{A}^{-1}\mathbf{p}(-1)
\end{aligned} \tag{33}$$

The previous equation can be simplified to

$$\begin{aligned}
\mathbf{v}(k+1) ={}& (\mathbf{I} - \mu\mathbf{R} - j\mu\mathbf{R}(1))\mathbf{v}(k) + \mu\mathbf{p} - \mu(\mathbf{R} + j\mathbf{R}(1))\mathbf{A}^{-1}\mathbf{p} \\
& + j\mu\mathbf{p}(-1) - \mu(j\mathbf{R} - \mathbf{R}(1))\mathbf{A}^{-1}\mathbf{p}(-1)
\end{aligned} \tag{34}$$

Using the equalities $(j\mathbf{R} - \mathbf{R}(1)) = j(\mathbf{R} + j\mathbf{R}(1))$ and $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$, and simplifying opposite terms, we get

$$\mathbf{v}(k+1) = (\mathbf{I} - \mu\mathbf{R} - j\mu\mathbf{R}(1))\mathbf{v}(k) = (\mathbf{I} - \mu\mathbf{A})\mathbf{v}(k) \tag{35}$$

We now use the eigenvalue decomposition (EVD), where $\mathbf{\Lambda}$ is a diagonal matrix with diagonal entries ($\lambda_i$) equal to the eigenvalues of $\mathbf{A}$, and $\mathbf{O}$ is a unitary matrix whose rows represent the eigenvectors of $\mathbf{A}$. Writing $\mathbf{A} = \mathbf{O}^{-1}\mathbf{\Lambda}\mathbf{O}$ [30], we can rewrite (35) as

$$\mathbf{v}(k+1) = (\mathbf{I} - \mu\mathbf{O}^{-1}\mathbf{\Lambda}\mathbf{O})\mathbf{v}(k) \tag{36}$$

Multiplying both sides of (36) by $\mathbf{O}$, we get

$$\mathbf{O}\mathbf{v}(k+1) = (\mathbf{I} - \mu\mathbf{\Lambda})\mathbf{O}\mathbf{v}(k) \tag{37}$$

Let $\mathbf{u}(k) = \mathbf{O}\mathbf{v}(k)$, where $\mathbf{u}(k)$ is $\mathbf{v}(k)$ in a rotated coordinate system defined by the eigenvectors in $\mathbf{O}$, so that a convergence in $\mathbf{u}(k)$ means a convergence in $\mathbf{v}(k)$ [27], [30]. Then

$$\mathbf{u}(k+1) = (\mathbf{I} - \mu\mathbf{\Lambda})\mathbf{u}(k) \tag{38}$$

Since $(\mathbf{I} - \mu\mathbf{\Lambda})$ is diagonal, stability and convergence are achieved with respect to the convergence of the first order difference equations formed by all $N$ eigenvalues $\lambda_i$, $\forall i$, $i \in \{1, 2, \ldots, N\}$ [30]. Thus, we define a set of $N$ difference equations as follows

$$u_i(k+1) = (1 - \mu\lambda_i)u_i(k) \tag{39}$$

The convergence of the set of $N$ difference equations is achieved if $|1 - \mu\lambda_i| < 1$ [30], [33]. Thus, for the convergence in the mean sense, we require

$$\mu < \frac{1}{\|\lambda_{A,max}\|} \tag{40}$$

where $\|\lambda_{A,max}\|$ is the norm of the maximum eigenvalue of $\mathbf{A}$ and $\|\cdot\|$ is the Euclidean norm, i.e. $\sqrt{\mathrm{Re}\{\lambda\}^2 + \mathrm{Im}\{\lambda\}^2}$. Thus, to ensure convergence and stability, the step size $\mu$ must satisfy (40).

V. QUANTIZATION EFFECT

While the previous study targeted an infinite precision model of the RC-pLMS algorithm, an analysis of the effects of quantization is of crucial importance when targeting digital systems with finite precision [30], [34], [35]. Such errors, if not accounted for, cause the system to diverge as it deviates from its theoretical continuous value [30], [34]. To simplify the analysis of the finite precision case, we assume all the input and reference signals to be Gaussian, zero-mean, uncorrelated and generated from independent identically distributed (i.i.d.) sequences. Additionally, the parameters $\mathbf{u}_q(k)$, $\mathbf{w}_q(k)$, $S_q(k)$, $e_q(k)$ and $y_q(k)$, where the subscript $q$ denotes the quantization process, are defined as follows

$$\mathbf{u}_q(k) = \mathbf{u}(k) + \eta_u(k) \tag{41}$$

$$\mathbf{w}_q(k) = \mathbf{w}(k) + \eta_w(k) \tag{42}$$

$$S_q(k) = S(k) + \eta_S(k) \tag{43}$$

$$y_q(k) = y(k) + \eta_y(k) \tag{44}$$

$$e_q(k) = S_q(k) - y_q(k) \tag{45}$$

where $\eta_u(k)$, $\eta_w(k)$, $\eta_S(k)$ and $\eta_y(k)$ are the input signal, weight vector, reference signal and output quantization errors, respectively. Additionally, $\eta_u(k)$ and $\eta_w(k)$ are assumed to be mutually independent and independent of $\mathbf{u}(k)$ and $\mathbf{w}(k)$, respectively. They are also assumed to be zero mean white


sequences with variance θd2 [30], [34]. Thus, from (45) we For the RC-pLMS design, a finite precision
proceed as follows implementation was performed on the Intel Stratix V
5SGXMABN3F45I4 FPGA model [36] using complex
eq (k) = Sq (k) − yq (k) arithmetic for a 18bits signed fixed point Q2, 15 format,
= S(k) + η S (k) − wH q (k)uq (k) − η y (k) i.e. 1 signed bit, 2 integer bits and 15 precision bits [36].
= S(k) + η S (k) − wH (k)u(k) − wH (k)ηu (k) The Q2, 15 format was chosen with respect to the Stratix V
−ηw (k)u(k)−ηw ∗
(k)ηu (k) − η y (k) (46) digital signal processing (DSP) embedded blocks standard
18bits mode configuration. While the selected FPGA model
By ignoring all quantization error terms higher than the first supports 18bits mode and 27bits mode, in which the latter
∗ (k)η (k) = 0, we get
order, i.e. ηw is suggested for high precision wireless communication
u
applications [36], the 18bits mode is selected. The use of the
eq (k) = e RC p L M S (k) + η Sy (k) − wH (k)ηu (k) − ηw (k)u(k) 18bits mode provides minimal resource usage and highlights
(47) the accuracy and efficiency of the proposed algorithm against
quantization errors. Synthesis and finite precision hardware
where η Sy (k) = η S (k) − η y (k). The new MSE is now defined
simulation results are presented in sections VI-C and VII-A,
as ξ RC p L M Sq (k) = [|eq (k)|2 ] and evaluated as presented
respectively.
in (16) with all quantization error terms higher than the first
order are equated to 0. Consequently, we obtain VI. H ARDWARE A RCHITECTURE
2
ξ RC p L M Sq (k) = E[|eq (k)| ] This section, presents a pipelined RC-pLMS structure
= ξ RC p L M S (k) + E[e RC p L M S (k)η∗Sy (k) based on the delay and sum relaxed look-ahead technique
(DRC-pLMS). It also compares this new algorithm to the delay
+e∗RC p L M S (k)η Sy (k) − e RC p L M S (k)
and sum relaxed look-ahead LMS (DLMS) [8].
×w(k)ηu∗ (k) − e RC p L M S (k)uH (k)ηw

(k)
−e∗RC p L M S (k)wH (k)ηu (k) A. DRC-pLMS
−e∗RC p L M S (k)u(k)ηu (k)] (48) From Algorithm 2, we start with a delay relaxation of D1
samples in the error path and D2 step look-ahead in the weight
For simplification, we assume μ to be small. Therefore, update path to obtain
the quantization error terms ηw (k)u(k) and w(k)ηu (k) become
uncorrelated with each other and with the error $e_{RCpLMS}(k)$ [30]. From [30], (48) can be simplified to

$$\xi_{RCpLMS_q}(k) = E[|e_q(k)|^2] = J_{min}(1 + \rho) + \xi_1(\theta_w^2, \mu) + \xi_2(\theta_d^2) \tag{49}$$

where $J_{min}(1 + \rho)$ is the MSE of the infinite precision algorithm, $\rho$ is the misadjustment, $\xi_1(\theta_w^2, \mu)$ is the quantization error resulting from $\eta_w(k)$ and $\xi_2(\theta_d^2)$ is the quantization error resulting from $\eta_u(k)$ and $\eta_y(k)$ [30]. Following the derivation in [35], at steady state (49) can be written as follows

$$\xi_{RCpLMS_q}(k) = E[|e_q(k)|^2] = \xi_{min} + \frac{1}{2}\mu\,\xi_{min}\,tr(A) + \frac{N\theta_a^2}{2\eta_u(k)\mu} + \frac{1}{\eta_u^2(k)}\left(|w_{op}|^2 + a\right)\theta_d^2 \tag{50}$$

where $\xi_{min}$ is the minimum MSE of the infinite precision LMS, $tr(A)$ is the trace of the correlation matrix $A$, $a$ is a random variable dependent on the inner product of $w_q^H(k)$ and $u_q(k)$, and $\theta_a^2$ is the variance of $a$. From (50), the second term describes the misadjustment parameter $\rho$, where a decrease in $\mu$ leads to a decrease in $\rho$ and an improved performance [30]. However, by analyzing the third term in (50), it is clear that decreasing $\mu$ causes the system to deviate from the infinite precision performance by increasing the quantization error effect. On the other hand, the final term in (50) is only affected by $\eta_u(k)$ and $\eta_y(k)$ [30], [35]. Thus, it can be concluded that $\mu$ may be decreased only down to a certain level, below which the degradation effects of the quantization error become significant [30].

$$w(k + 1) = w(k - D_2) + \mu \sum_{i=0}^{D_2 - 1} e^*_{RCpLMS}(k - D_1 - i)\,x(k - D_1 - i) \tag{51}$$

The previous relaxation is possible assuming the signal is WSS and the gradient estimate does not change much over $D_1$ samples [8]. While the previous assumption allows presenting a pipelined architecture, its hardware overhead is $N(D_2 - 1)$ and becomes unacceptable for larger values of $N$ and $D_2$ [8]. Thus, to reduce the resulting overhead, a sum relaxation of $D_3$ terms is applied, where $1 \le D_3 \le D_2$. Equation (51) then becomes

$$w(k + 1) = w(k - D_2) + \mu \sum_{i=0}^{D_3 - 1} e^*_{RCpLMS}(k - D_1 - i)\,x(k - D_1 - i) \tag{52}$$

Finally, from the error update equation in Algorithm 2 and (52), and assuming $\mu$ is small enough, we obtain

$$e_{RCpLMS}(k) = S(k) - w^H(k - D_2)\,u(k) \tag{53}$$

Thus the relaxed look-ahead RC-pLMS can be summarized by Algorithm 3.

Unlike the classical look-ahead technique, the delay and sum relaxed look-ahead presented does not result in a unique architecture. However, it may be considered as a transformation in the stochastic sense since the average output profile is conserved [8].
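The step-size trade-off described for (50) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it keeps only the second term of (50), which grows with $\mu$, and the third term, which is inversely proportional to $\mu$. The names `excess_mse` and `best_step_size`, the constant `c` (assumed here to lump the input scaling factor in the denominator of the third term) and all numerical values in the usage lines are hypothetical.

```python
import numpy as np

def excess_mse(mu, xi_min, tr_A, N, theta_a2, c):
    """Sum of the mu-dependent terms of (50).

    misadjustment : 0.5 * mu * xi_min * tr(A), grows with mu
    quantization  : N * theta_a2 / (2 * c * mu), grows as mu shrinks
    c is an assumed positive constant standing in for the input scaling factor.
    """
    misadjustment = 0.5 * mu * xi_min * tr_A
    quantization = N * theta_a2 / (2.0 * c * mu)
    return misadjustment + quantization

def best_step_size(xi_min, tr_A, N, theta_a2, c):
    """Step size at which the derivative of excess_mse with respect to mu is zero."""
    return np.sqrt(N * theta_a2 / (c * xi_min * tr_A))

# Hypothetical values, chosen only to make the trade-off visible.
params = dict(xi_min=0.1, tr_A=8.0, N=8, theta_a2=2.0 ** -32, c=1.0)
mu_star = best_step_size(**params)
grid = mu_star * np.logspace(-2, 2, 401)   # step sizes around mu_star
values = excess_mse(grid, **params)
```

Under these assumptions, reducing $\mu$ below `mu_star` increases the total excess error again, which is the behavior the paragraph above attributes to the quantization term.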


Fig. 5. 8-Input DRC-pLMS Beamformer Architecture.

Algorithm 3 Delay and Sum Relaxed Look-Ahead RC-pLMS (DRC-pLMS)

Input: $d(k)$, $d(k-1)$, $D_1$, $D_2$, $D_3$, $x(k)$, $x(k-1)$, $n_t$, $\mu$
Output: $e_{RCpLMS}(k)$, $w(k)$
Conditions: $1 \le D_3 \le D_2$, $D_1 + D_2 < N$
Initialisation: $e_{RCpLMS}(k) = 0$, $d(-1) = d(0)$, $x(-1) = x(0)$, $w(0) = 0$
for $k = 0$ to $n_t$ do
    $S(k) = d(k) - j\,d(k-1)$
    $u(k) = x(k) - j\,x(k-1)$
    $e_{RCpLMS}(k) = S(k) - w^H(k - D_2)\,u(k)$
    $w(k+1) = w(k - D_2) + \mu \sum_{i=0}^{D_3 - 1} e^*_{RCpLMS}(k - D_1 - i)\,x(k - D_1 - i)$
end for
return $e_{RCpLMS}(k)$, $w(k)$

Fig. 6. DRC-pLMS 4-Input Linear Combiner Architecture.
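The steps of Algorithm 3 can be sketched in floating-point NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function name `drc_plms` and the array layout are hypothetical, and two boundary conventions are assumptions not stated in the algorithm, namely that $w(k) = w(0)$ for $k < 0$ and that error samples before time 0 are zero.

```python
import numpy as np

def drc_plms(d, x, mu=2.0 ** -6, D1=4, D2=2, D3=1):
    """Delay and sum relaxed look-ahead RC-pLMS (sketch of Algorithm 3).

    d : (n,) complex desired signal
    x : (n, N) complex array snapshots, one row per time index
    Returns the error sequence e (n,) and the weight history w (n + 1, N).
    """
    d = np.asarray(d, dtype=complex)
    x = np.asarray(x, dtype=complex)
    n, N = x.shape
    if not (1 <= D3 <= D2 and D1 + D2 < N):
        raise ValueError("need 1 <= D3 <= D2 and D1 + D2 < N")
    e = np.zeros(n, dtype=complex)
    w = np.zeros((n + 1, N), dtype=complex)    # w[k] holds w(k), with w(0) = 0
    for k in range(n):
        km1 = max(k - 1, 0)                    # d(-1) = d(0), x(-1) = x(0)
        S = d[k] - 1j * d[km1]                 # modified desired signal
        u = x[k] - 1j * x[km1]                 # modified input vector
        w_del = w[max(k - D2, 0)]              # assumption: w(k) = w(0) for k < 0
        e[k] = S - np.vdot(w_del, u)           # vdot conjugates w_del: w^H u
        grad = np.zeros(N, dtype=complex)
        for i in range(D3):                    # sum relaxation over D3 terms
            t = k - D1 - i
            if t >= 0:                         # assumption: e(t) = 0 for t < 0
                grad += np.conj(e[t]) * x[t]
        w[k + 1] = w_del + mu * grad
    return e, w
```

With the default delays, the first non-zero weight vector is `w[5]`, so the first few error samples simply equal $S(k)$; this matches the latency introduced by the $D_1$ and $D_2$ registers.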

B. DRC-pLMS Architecture

The relaxed look-ahead hardware architecture is implemented using complex arithmetic in an 18-bit signed fixed-point Q2.15 format, i.e. 1 sign bit, 2 integer bits and 15 precision bits, with $D_1 = 4$, $D_2 = 2$ and $D_3 = 1$, i.e. six pipeline stages. The resulting top-level architecture is shown in Fig. 5, where $u_{1..4}(k)$, $u_{5..8}(k)$, $w_{1..4}(k)$ and $w_{5..8}(k)$ are the DRC-pLMS input and weight vectors formed of the first and last 4 elements, respectively. $Z^{-1}$, $Z^{-D_1}$ and $Z^{-D_1-1}$ represent the digital delays, i.e. registers of 1, $D_1$ and $D_1 - 1$ samples, respectively. The Conj block denotes complex conjugation. In addition, $y_1$ and $y_2$ form the intermediate outputs. They are defined as follows

$$y_1(k) = w^H_{1..4}(k)\,u_{1..4}(k) \tag{54}$$

$$y_2(k) = w^H_{5..8}(k)\,u_{5..8}(k) \tag{55}$$

From Fig. 5, an 8-input adaptive beamformer is formed by two 4-input linear combiner blocks and two 4-input weight update blocks. The system's default external inputs $x(k)$ and $d(k)$ remain unchanged and are updated internally to form the modified inputs $u(k)$ and $S(k)$. The resulting stage outputs, $y_1$ and $y_2$, of each linear combiner are then combined to form the final output $y$ and the total error $e_{RCpLMS}$. The resulting error and the external input signals $x_{1..4}(k)$ and $x_{5..8}(k)$ are then used to update the previous filter coefficients.
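The Q2.15 format and the two 4-input linear combiners of (54) and (55) can be sketched as follows. This is a software illustration under stated assumptions, not the hardware design itself: the helper names `quantize_q2_15` and `combiner_outputs` are hypothetical, rounding to nearest with saturation is an assumed quantization rule, and the two stage outputs are assumed to be combined by a plain sum, which is what splitting the inner product $w^H u$ into two halves implies.

```python
import numpy as np

FRAC_BITS = 15   # precision (fractional) bits of Q2.15
INT_BITS = 2     # integer bits; with the sign bit this gives 18 bits per component

def quantize_q2_15(z):
    """Round real and imaginary parts to the Q2.15 grid, saturating at the range limits."""
    scale = 2.0 ** FRAC_BITS
    lo, hi = -(2.0 ** INT_BITS), 2.0 ** INT_BITS - 1.0 / scale

    def q(v):
        return np.clip(np.round(v * scale) / scale, lo, hi)

    z = np.asarray(z, dtype=complex)
    return q(z.real) + 1j * q(z.imag)

def combiner_outputs(w, u):
    """Return (y1, y2, y) for 8-element weight and input vectors."""
    y1 = np.vdot(w[:4], u[:4])   # w_{1..4}^H u_{1..4}, as in (54)
    y2 = np.vdot(w[4:], u[4:])   # w_{5..8}^H u_{5..8}, as in (55)
    return y1, y2, y1 + y2       # assumed combination: y = y1 + y2

# Usage with arbitrary illustrative data.
rng = np.random.default_rng(1)
w = 0.1 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
u = 0.1 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
y1, y2, y = combiner_outputs(quantize_q2_15(w), quantize_q2_15(u))
```

Because each half operates on its own four elements, the two combiners can run in parallel and only the final addition couples them.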


Fig. 7. DRC-pLMS 4-Input Weight Update Architecture.

TABLE II
8-Input RC-pLMS Beamformer Synthesis Results

Fig. 8. RC-pLMS MSE Convergence Comparison.

Fig. 9. RC-pLMS MSE Convergence for Recurring Samples.

From Fig. 5, it becomes clear that by eliminating the input modification blocks forming $u(k)$ and $S(k)$, the classical LMS, subject to the delay and sum relaxation, is obtained. The 4-input linear combiner and 4-input weight update blocks are detailed in Fig. 6 and Fig. 7, respectively. From Fig. 6, we can notice that the multiplication and addition stages each require one clock cycle for all parallel inputs, with a computational complexity of $O(1)$ each. Each complex multiplier is formed of four real multipliers and one complex adder, i.e. the equivalent of two real adders. Additionally, from Fig. 7, the update term is obtained by a right shift of 6 bits, i.e. $\mu = 2^{-6} \approx 0.0156$, omitting the need for an additional multiplier. Both pipeline stages perform parallel operations and have a computational complexity of order $O(1)$.

C. Synthesis Results

Synthesis results of the DRC-pLMS and DLMS are obtained for the Intel Stratix V 5SGXMABN3F45I4 model and shown in Table II. As the VR-SNC-TDNLMS [10] and the LUT-Less Pipelined LMS [9] are implemented for different configurations, i.e. antenna elements and word length, they are only compared with respect to their operating frequency. The look-up tables (LUTs) represent logic cells, while a digital signal processing (DSP) block is formed of transistor-level multipliers, adders and registers. As presented in Table II, in contrast to the VR-SNC-TDNLMS [10] and the LUT-Less Pipelined LMS [9], the DRC-pLMS presents a pipelined structure operating at a maximum frequency of 208.33 MHz. Additionally, the DRC-pLMS and DLMS both required 32 DSP blocks. Thus, compared to the LMS [8], a major increase in convergence speed is achieved for the DRC-pLMS at the cost of a negligible increase in resource usage, i.e. LUTs and registers. Using the proposed hardware architecture, simulation results for the DRC-pLMS and DLMS output beam pattern are shown in Section VII-A.

VII. RESULTS AND DISCUSSION

We conducted several Monte Carlo simulations for $N = 8$ antenna elements with 500 realizations of 500 samples. Simulations are conducted for a message signal and two interferers arriving at angles of arrival (AOA) of 30°, 45° and 80°, respectively. The generated inputs are independent random complex Gaussian sequences [37] of the form $v = v_p + jv_q$, where $v_p$ and $v_q$ are taken from a normal (Gaussian) distribution, $\mathcal{N}$, with mean 0 and variances $\theta^2$ of the form $\mathcal{N}(0, \theta_p^2)$ and $\mathcal{N}(0, \theta_q^2)$, respectively. The subscripts $p$ and $q$ denote the in-phase and quadrature (out-of-phase) components, respectively. Thus, we define $v_m$, $v_{i1}$ and $v_{i2}$ as the independent message


$s_d(k)$, and interfering sequences forming $i(k)$, respectively. The desired signal $d(k)$, represented by $v_d$, is considered as an identical copy of the message signal corrupted by CAWGN noise with a signal-to-noise ratio SNR = 10 dB. The parameters and initial conditions at $k = 0$ are given as $\mu_1 = \mu_2 = \mu_{pLMS} = \mu_{RCpLMS} = \mu_{DRCpLMS} = 2^{-6} \approx 0.0156$, $\mu_{LMS} = 0.5$, $d(-1) = d(0)$, $x(-1) = x(0)$, $\theta_p^2 = 0.01$ and $\theta_q^2 = 0.04$. As for the RLS, the initial parameters are the forgetting factor $\alpha = 0.98$ and the initial matrix $P(0) = 0.5I$, where $I$ is an $N \times N$ identity matrix.

Fig. 10. RC-pLMS and DRC-pLMS MSE Convergence Behavior for Different SNR.

Fig. 11. RC-pLMS and DRC-pLMS Beam Radiation Pattern for Different SNR.

A. Simulation Results

The simulation results are displayed as a function of the MSE convergence and the beam radiation pattern. As shown in Fig. 8, for an SNR of 10 dB and in contrast to the LMS, the pLMS and RC-pLMS presented similar, LLMS-like accelerated convergence behavior with minimal residual error. The system was able to achieve convergence after just 5 iterations. Moreover, compared to the LMS with a step size of 0.5, i.e. the maximum suggested step size [30], the RC-pLMS achieved an accelerated convergence with a minimal step size of 0.0156 in high SNR environments. Moreover, the selected $\mu_{RCpLMS}$ ensures an accelerated convergence and a minimal error floor in low SNR environments, where the use of a smaller step size is mandatory for convergence [30]. Additionally, compared to the RLS, which is of order $O(N^2)$ complexity, our proposed system presented near similar behavior in convergence and error floor but with a reduced complexity of order $O(N)$.

To numerically assess the validity of proposition I, the MSE convergence behavior is simulated, using recurring (repeated) samples, for the RC-pLMS as shown in Fig. 9. From Fig. 9, it is clear that the RC-pLMS, having its delayed inputs multiplied by $j$, preserved its accelerated convergence and low steady-state error. Additionally, the MSE behavior


describes the system similarity in transient (pLMS) and steady-state (RC-pLMS) behaviors, thus validating the approximation in proposition II.

To further assess the performance and the stability of the system for different SNRs, the MSE convergence behavior and beam pattern are simulated for the RC-pLMS and DRC-pLMS with an SNR ranging from −5 dB to 7 dB with a step of 6 dB. From Fig. 10a, it is clear that the RC-pLMS achieved accelerated convergence for an SNR of −5 dB at the cost of a larger residual error. On the other hand, from Fig. 10b, the DRC-pLMS achieves a satisfactory behavior only for SNR values down to 1 dB. Such degradation is a consequence of the sum relaxation adopted in Algorithm 3 and can be avoided for large values of $D_3$, i.e. a high order moving average filter. However, the increase in $D_3$ presents a larger hardware overhead. Additionally, from Fig. 11a, the RC-pLMS is able to achieve an acceptable beam pointing and interference nulling for −5 dB. However, the DRC-pLMS described in Fig. 11b shows acceptable results only down to 1 dB, thus validating the MSE results.

Furthermore, the resulting finite precision beam pattern is obtained for the RC-pLMS, DRC-pLMS and DLMS weights, at 10 dB, following a hardware simulation for the Q2.15 signed format. As shown in Fig. 12, the resulting spatial filter weights steered the main lobe towards the direction of arrival 30° of the desired signal while nulling interferers coming from the preset angles of 45° and 80°, respectively. Additionally, we observe that the resulting main lobes achieved for the 18-bit fixed-point format show near identical results to the theoretical models denoted by RC-pLMS and DRC-pLMS.

Fig. 12. Infinite and Finite Precision Beam Radiation Pattern for an Angle of Arrival 30°.

VIII. CONCLUSION AND FUTURE WORK

In this manuscript, we presented pLMS, a two-stages parallel LMS adaptive beamformer with accelerated convergence, minimal error floor and a pipelined architecture most suitable for a hardware implementation. The pLMS structure was further simplified by adding a delayed version of the input, eliminating the need for computing an independent set of weights. Thus, the reduced complexity pLMS design (RC-pLMS) is obtained. In order to design a pipelined hardware architecture of our proposed RC-pLMS algorithm, we applied the delay and sum relaxation technique (DRC-pLMS). Stability and quantization effect analysis were performed to determine the upper bound of the step size. Experimental results demonstrated the outstanding performance of the proposed RC-pLMS compared to different adaptive algorithms. The mean square error behavior shows that the proposed RC-pLMS achieves convergence after just a few iterations. In contrast to the classical RLS with a computational complexity of order $O(N^2)$, the RC-pLMS achieved a major performance improvement while preserving an LMS-like complexity of order $O(N)$. The beam radiation pattern obtained through hardware simulation shows that the finite precision DRC-pLMS presented behavior similar to the infinite precision theoretical results. Finally, synthesis results show that the DRC-pLMS achieved a significant increase in the maximum operating frequency compared to other LMS variants. Moreover, the presented implementation is obtained at the cost of a marginal increase in resource usage compared to the delayed LMS. Thus, the proposed RC-pLMS overcomes the preset constraints of modern beamformers by presenting a high performance, low computational complexity structure. Future work includes the analysis of the performance of the system for non-stationary processes and wide-band signals.

REFERENCES

[1] A. Mansour, R. Mesleh, and M. Abaza, “New challenges in wireless and free space optical communications,” Opt. Lasers Eng., vol. 89, pp. 95–108, Feb. 2017.
[2] D. Burgos, J. Kunzler, R. Lemos, and H. Silva, “Adaptive beamforming for moving targets using genetic algorithms,” in Proc. Workshop Eng. Appl. Int. Congr. Eng. (WEA), Bogota, Colombia, Oct. 2015, pp. 1–5.
[3] G. Akkad, A. Mansour, B. A. ElHassan, J. Srar, M. Najem, and F. L. Roy, “Low complexity robust adaptive beamformer based on parallel RLMS and Kalman RLMS,” in Proc. 27th Eur. Signal Process. Conf. (EUSIPCO), Coruna, Spain, Sep. 2019, pp. 1–5.
[4] A. Mansour, C. Jutten, and P. Loubaton, “Adaptive subspace algorithm for blind separation of independent sources in convolutive mixture,” IEEE Trans. Signal Process., vol. 48, no. 2, pp. 583–586, Feb. 2000.
[5] J. A. Srar, K.-S. Chung, and A. Mansour, “Adaptive array beamforming using a combined LMS-LMS algorithm,” IEEE Trans. Antennas Propag., vol. 58, no. 11, pp. 3545–3557, Nov. 2010.
[6] F. Shen, F. Chen, and J. Song, “Robust adaptive beamforming based on steering vector estimation and covariance matrix reconstruction,” IEEE Commun. Lett., vol. 19, no. 9, pp. 1636–1639, Sep. 2015.
[7] J. A. Srar, K.-S. Chung, and A. Mansour, “Adaptive array beamforming using a combined LMS-LMS algorithm,” in Proc. IEEE Aerosp. Conf., Tokyo, Japan, Mar. 2010, pp. 1–5.
[8] N. R. Shanbhag and K. K. Parhi, “Relaxed look-ahead pipelined LMS adaptive filters and their application to ADPCM coder,” IEEE Trans. Circuits Syst. II. Analog Digit. Signal Process, vol. 40, no. 12, pp. 753–766, Dec. 1993.
[9] R. K. Sarma, M. T. Khan, R. A. Shaik, and J. Hazarika, “A novel time-shared and LUT-less pipelined architecture for LMS adaptive filter,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 1, pp. 188–197, Jan. 2020.
[10] W. Zhao, J. Q. Lin, S. C. Chan, and H. K.-H. So, “A division-free and variable-regularized LMS-based generalized sidelobe canceller for adaptive beamforming and its efficient hardware realization,” IEEE Access, vol. 6, pp. 64470–64485, 2018.
[11] J. Ma, K. K. Parhi, and E. F. Deprettere, “Annihilation-reordering look-ahead pipelined CORDIC-based RLS adaptive filters and their application to adaptive beamforming,” IEEE Trans. Signal Process., vol. 48, no. 8, pp. 2414–2431, Aug. 2000.


[12] L.-D. Van and W.-S. Feng, “An efficient systolic architecture for the DLMS adaptive filter and its applications,” IEEE Trans. Circuits Syst. II. Analog Digit. Signal Process, vol. 48, no. 4, pp. 359–366, Apr. 2001.
[13] P. K. Meher and S. Y. Park, “Critical-path analysis and low-complexity implementation of the LMS adaptive algorithm,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 3, pp. 778–788, Mar. 2014.
[14] B. D. Van Veen and K. M. Buckley, “Beamforming: A versatile approach to spatial filtering,” IEEE ASSP Mag., vol. 5, no. 2, pp. 4–24, Apr. 1988.
[15] V. H. Nascimento, “The normalized LMS algorithm with dependent noise,” in Proc. Anais Simpòsio Brasileiro, 2001, pp. 1–10.
[16] F. Albu, J. Kadlec, N. Coleman, and A. Fagan, “The Gauss-Seidel fast affine projection algorithm,” in Proc. IEEE Workshop Signal Process. Syst., San Diego, CA, USA, Oct. 2002, pp. 109–114.
[17] Albu, Kadlec, Coleman, and Fagan, “Pipelined implementations of the a priori error-feedback LSL algorithm using logarithmic arithmetic,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Orlando, FL, USA, May 2002, pp. 2681–2684.
[18] A. Zoubir, M. Viberg, R. Chellappa, and S. Theodoridis, Academic Press Library in Signal Processing, Array and Statistical Signal Processing, vol. 3. Amsterdam, The Netherlands: Elsevier, 2013.
[19] A. Farina and L. Ortenzi, “Effect of ADC and receiver saturation on adaptive spatial filtering of directional interference,” Signal Process., vol. 83, no. 5, pp. 1065–1078, May 2003.
[20] Q. Jing, Y. Li, and J. Tong, “Performance analysis of multi-rate signal processing digital filters on FPGA,” EURASIP J. Wireless Commun. Netw., vol. 2019, no. 1, p. 31, Dec. 2019.
[21] W. Shang, Z. Dou, W. Xue, and Y. Li, “Digital beamforming based on FPGA for phased array radar,” in Proc. Prog. Electromagn. Res. Symp. Spring (PIERS), St. Petersburg, Russia, May 2017, pp. 437–440.
[22] V. Seneviratne, A. Madanayake, and L. Bruton, “Multidimensional-DSP beamformers using the ROACH-2 FPGA platform,” Electronics, vol. 6, no. 3, p. 49, 2017.
[23] S. R. Theodore et al., Wireless Communications: Principles and Practice. Upper Saddle River, NJ, USA: Prentice-Hall, 2002, p. 69.
[24] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays (Signals and Communication Technology). Berlin, Germany: Springer-Verlag, 2004.
[25] P. S. Yedavalli, T. Riihonen, X. Wang, and J. M. Rabaey, “Far-field RF wireless power transfer with blind adaptive beamforming for Internet of Things devices,” IEEE Access, vol. 5, pp. 1743–1752, 2017.
[26] L. Xing, J. Zhu, Q. Xu, D. Yan, and Y. Zhao, “A circular beam-steering antenna with parasitic water reflectors,” IEEE Antennas Wireless Propag. Lett., vol. 18, no. 10, pp. 2140–2144, Oct. 2019.
[27] B. Widrow, J. McCool, and M. Ball, “The complex LMS algorithm,” Proc. IEEE, vol. 63, no. 4, pp. 719–720, Apr. 1975.
[28] J. Kim and A. D. Poularikas, “Performance analysis of the adjusted step size NLMS algorithm,” in Proc. 36th Southeastern Symp. Syst. Theory, Atlanta, GA, USA, 2004, pp. 467–471.
[29] D. P. Mandic, S. Kanna, and A. G. Constantinides, “On the intrinsic relationship between the least mean square and Kalman filters [Lecture Notes],” IEEE Signal Process. Mag., vol. 32, no. 6, pp. 117–122, Nov. 2015.
[30] S. Haykin, Adaptive Filter Theory. London, U.K.: Pearson, 2013.
[31] G. Akkad, A. Mansour, B. Elhassan, J. Srar, M. Najem, and F. Leroy, “An efficient non-blind steering vector estimation technique for robust adaptive beamforming with multistage error feedback,” in Intelligent Decision Technologies, vol. 2, pp. 13–23, Jun. 2019.
[32] P. Bouboulis and S. Theodoridis, “Extension of Wirtinger’s calculus to reproducing kernel Hilbert spaces and the complex kernel LMS,” IEEE Trans. Signal Process., vol. 59, no. 3, pp. 964–978, Mar. 2011.
[33] O. Macchi and E. Eweda, “Second-order convergence analysis of stochastic adaptive linear filtering,” IEEE Trans. Autom. Control, vol. 28, no. 1, pp. 76–85, Jan. 1983.
[34] J. A. Srar, K.-S. Chung, and A. Mansour, “LLMS adaptive beamforming algorithm implemented with finite precision,” in Proc. 20th Telecommun. Forum (TELFOR), Belgrade, Serbia, Nov. 2012, pp. 303–306.
[35] C. Caraiscos and B. Liu, “A roundoff error analysis of the LMS adaptive algorithm,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 1, pp. 34–41, Feb. 1984.
[36] Altera. (Oct. 2019). Stratix V Device Handbook Volume 1: Device Interfaces and Integration. Accessed: Apr. 19, 2020. [Online]. Available: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-v/stx5_core.pdf
[37] P. Clarkson and P. White, “Simplified analysis of the LMS adaptive filter using a transfer function approximation,” IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 7, pp. 987–993, Jul. 1987.

Ghattas Akkad received the B.Sc. and M.Sc. degrees in computer engineering from the University of Balamand in 2012 and 2014, respectively. He is currently pursuing the Ph.D. degree in adaptive beamforming with the Ecole Nationale Superieure de Techniques Avancees Bretagne (ENSTA Bretagne), Brest, France, under the supervision of Professor Mansour. He is currently a Certified LabVIEW Developer with the Ecole Nationale Superieure de Techniques Avancees Bretagne (ENSTA Bretagne) and expected to graduate in October 2020. He has published under IEEE, Springer, and EUSIPCO conferences. His research interests are signal processing, adaptive beamforming, signal estimation and detection, neural networks, computer architecture and hardware design, and SoC hardware acceleration.

Ali Mansour (Senior Member, IEEE) received the M.S. degree in electronic electric engineering from Lebanese University in September 1992, the M.Sc. and Ph.D. degrees in signal, image and speech processing from INPG, Grenoble-France, in July 1993 and January 1997, respectively, and the HDR degree (Habilitation a Diriger des Recherches, this is the highest of the higher degrees in the French system) from UBO, Brest, France, in November 2006. He held many positions such as postdoctoral at LTIRF-INPG, Grenoble-France; a Researcher with BMC–RIKEN, Nagoya, Japan; a teacher–researcher position with ENSIETA, Brest, France; a Senior Lecturer with the Department of ECE, Curtin University, Perth, Australia; an Invited Professor with ULCO, Calais, France; and a Professor with Tabuk University, Tabuk, Saudi Arabia. He is currently a Professor with ENSTA-Bretagne. He published numerous refereed publications. He has authored or coauthored several books or book chapters. During his career, he had successfully supervised several research associates and Ph.D. and M.Sc. students. He is interested in blind source separation, high order statistics, signal processing, robotics, telecommunication, biomedical engineering, electronic warfare, and cognitive radio. He had also been the Lead Guest Editor of the EURASIP Journal on Advances in Signal Processing. He was the Vice President of the IEEE Signal Processing Society in Western Australia for two years.

Bachar A. ElHassan received the Diploma degree in engineering from the Faculty of Engineering, Lebanese University, Tripoli, Lebanon, in 1991, the M.S. degree in signal and image processing from the National Polytechnic Institute (INPG), Grenoble, France, in 1992, and the Ph.D. degree in electronics from INPG, France, in 1995, in collaboration with France Telecom. Since 1996, he has been with the Faculty of Engineering, Lebanese University, Tripoli, Lebanon, where he is currently a Full Professor. He served as a Chairman for the Electrical Engineering Department for more than six years. He is also the Founder and the Director of the laboratory of Telecommunication, Networking and Microwaves (LTRM), Doctoral School of Sciences and Technologies (EDST), Lebanese University. His research interests are as follows: image processing, digital communications, intelligence in wireless sensor networks, and Internet of Things. He was the Chair of the IEEE ComSoc Lebanon chapter from January 2013 to December 2016 and the Vice Chair of the IEEE Lebanon Section from January 2017 to December 2018. He has been the Chair of the IEEE Lebanon Section Elected since January 2019. He founded and was the Director of the Development and Training Center, the Order of Engineers and Architects Tripoli, Lebanon, from July 2009 to December 2013. He was the Head of the Employed Engineers Branch, Order of Engineers and Architects, Tripoli, Lebanon, from April 2016 to April 2019. He has also been the member of the ICT Arab Committee since March 2017.


Elie Inaty (Member, IEEE) was born in El-Koura, Lebanon, in June 1975. He received the B.S. and M.S. degrees in electrical engineering from the University of Balamand, El-Koura, in 1996 and 1998, respectively, and the Ph.D. degree from Universite Laval, Quebec, QC, Canada, in 2001. He was an Adjunct Professor with Universite Laval. He is currently an Associate Professor with the University of Balamand. His research interests include code division multiple access and wavelength-division-multiplexing fiber optic communications, network control and resource management issues in optical communication networks, and radio multiple-access techniques.

Rafic Ayoubi (Member, IEEE) received the B.S. degree in electrical engineering and the M.S. and Ph.D. degrees in computer engineering from the University of Louisiana, Lafayette, Louisiana, in 1988, 1990, and 1995, respectively. He joined the University of Balamand, Koura, Lebanon, in 1996, where he is currently an Associate Professor. His current research interests are in the areas of parallel architectures, parallel algorithms, fault tolerance, artificial neural networks, and FPGA technology. In these areas, he published several research articles in several journals and conferences. He has received the first prize in the Second Annual exhibition for Industrial Research Achievements in Lebanon.

Jalal A. Srar was born in Libya, 1970. He received the B.Sc. degree in electronics engineering from Garunis University, Libya, in 1993, the M.Sc. degree in industrial engineering and the M.Sc. degree in communication engineering from the College of Industrial Technology, Libya, in 2001 and 2006, respectively, and the Ph.D. degree in adaptive antenna arrays from Curtin University, Australia, in 2011. From 2001 to 2006, he worked with the Adaptive Antenna Research Group, Hill, Libya. He has been a Lecturer with the Electrical Engineering Department, Misurata University, since 2003. His research interests are beamforming algorithms, adaptive antenna, signal processing for communications, signal processing for early detection of cancer diseases, and finally wideband signal generation for Civil and Military applications.
