
Received: 17 February 2023 | Revised: 30 March 2023 | Accepted: 2 April 2023

DOI: 10.1049/rsn2.12405

ORIGINAL RESEARCH

IET Radar, Sonar & Navigation

Boosting multi‐target recognition performance with multi‐input multi‐output radar‐based angular subspace projection and multi‐view deep neural network

Emre Kurtoğlu1 | Sabyasachi Biswas2 | Ali C. Gurbuz2 | Sevgi Zubeyde Gurbuz1

1 Electrical and Computer Engineering, University of Alabama, Tuscaloosa, AL, USA
2 Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA

Correspondence
Sevgi Zubeyde Gurbuz. Email: szgurbuz@ua.edu

Funding information
National Science Foundation, Grant/Award Numbers: 1931861, 1932547, 2047771

Abstract
Current radio frequency (RF) classification techniques assume only one target in the field of view. Multi‐target recognition is challenging because conventional radar signal processing results in the superposition of target micro‐Doppler signatures, making it difficult to recognise multi‐target activity. This study proposes an angular subspace projection technique that generates multiple radar data cubes (RDC) conditioned on angle (RDC‐ω). This approach enables signal separation in the raw RDC, making possible the utilisation of deep neural networks taking the raw RF data as input, or any other data representation, in multi‐target scenarios. When targets are in close proximity and cannot be separated by classical techniques, the proposed approach boosts the relative signal‐to‐noise ratio between targets, resulting in multi‐view spectrograms that boost the classification accuracy when input to the proposed multi‐view DNN. Our results qualitatively and quantitatively characterise the similarity of multi‐view signatures to those acquired in a single‐target configuration. For a nine‐class activity recognition problem, 97.8% accuracy in a 3‐person scenario is achieved, while utilising a DNN trained on single‐target data. We also present results for two cases of close proximity (sign language recognition and side‐by‐side activities), where the proposed approach has boosted the performance.

KEYWORDS
Deep Learning, Human Activity Recognition, MIMO Radar

1 | INTRODUCTION

Radio frequency (RF) sensors have been gaining an ever‐expanding range of applications, especially due to the emergence of low‐cost solid‐state transceivers and computationally efficient modern graphics processing units. The advent of deep learning has made possible recognition performances that were not possible a decade ago [1]. These applications include perimeter defence and security [2], drone recognition [3], automotive perception [4], both in and out of the cabin, human activity recognition (HAR) [5], gesture recognition [6], sign language recognition [7], and remote vital sign and health monitoring [8], among others.

In these applications, RF sensors transmitting a variety of waveforms can be utilised, including continuous wave (CW), stepped frequency CW (SFCW), frequency modulated CW (FMCW), and impulse‐radio ultra‐wide band (IR‐UWB) modulations. While architecturally CW radars have been low‐complexity, they are limited to only velocity measurements of the target. Impulse‐radio ultra‐wide band transmissions have high range resolution, making them well‐suited to vital sign measurements, while FMCW radars can achieve high‐resolution range and velocity measurements. More recently, advances in automotive radar technologies have made lower‐cost multi‐input multi‐output (MIMO) FMCW radar systems [9] more ubiquitous, now enabling azimuth and elevation angle measurements, in addition to range and velocity, by utilising the phase difference between antenna array elements.

Different types of targets and their motions can be recognised by providing the radar micro‐Doppler (μ‐D) signature

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the
original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
© 2023 The Authors. IET Radar, Sonar & Navigation published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

IET Radar Sonar Navig. 2023;17:1115–1128. wileyonlinelibrary.com/journal/rsn2 1115


17518792, 2023, 7, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/rsn2.12405 by EVIDENCE AID - BELGIUM, Wiley Online Library on [29/09/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1116 | KURTOĞLU ET AL.

[10] of the target as input to a deep neural network (DNN). While the gross translational movement of a target results in a central Doppler shift, micro‐Doppler is a term that refers to smaller frequency modulations centred around the main Doppler shift, which are caused by rotations or vibrations of different parts of the target, known as micro‐motions. Different target types (e.g. car, bicycle, drone) and different human activities (e.g. walking, running, gesturing) result in unique patterns in the μ‐D signature that can be used to classify the target type and its motion. The radar μ‐D signature is computed from the raw RF I/Q data via time‐frequency transformations, such as the short‐time Fourier transform (STFT). Recent works have proposed using higher dimensional RF data representations, such as range‐Doppler (RD) maps versus time, to jointly provide time‐varying range and velocity information from FMCW MIMO radars, for example, [11–13], to the DNN for classification. Such works have shown that the utilisation of multiple domain representations boosts recognition accuracy in comparison with the use of just a single representation.

Unfortunately, much of the current literature is limited to consideration of just single‐target scenarios, even though the presence of multiple targets is typical in real‐world environments. In multi‐target scenarios, the RF μ‐D signature computed from the raw I/Q data will contain the signatures of all targets superimposed upon each other. Consequently, DNNs trained on data with just a single target present will not be able to correctly recognise the targets' activity. Many of the works involving multiple targets focus on counting the number of people present [14–17], while just a few works actually aim at separating the μ‐D signatures. Vishwakarma et al. [18] propose a sparse coding dictionary learning‐based algorithm to separate the μ‐D returns from multiple targets. However, the approach relies on parametric models for simulating the human and fan returns considered as part of a binary classification problem. The Boulic model used to simulate human returns only approximates walking, thus precluding the approach from being effective when generalised human activities, which are not easily represented by a parametric model, are observed. Even in the limited case presented, the separated μ‐D signatures suffer from losses in comparison to their measured counterparts.

Alternatively, Huang et al. [19] apply a multi‐stage separation scheme in which range gating is first applied for preliminary separation, followed by a multi‐task learning network for fine signature separation and recognition. However, because multiple targets can be present at the same range, this approach essentially relies on the DNN to figure out the signal separation solely based on a learnt, data‐driven model. This inherently limits the applicability of the proposed approach to only those target signatures for which the model has been trained, and it will not generalise easily to previously unseen target μ‐D signatures.

Another common approach is to explicitly track the components of the overall μ‐D signature. Wang et al. [20] formulate the problem as one of path planning and propose ant colony optimisation to identify the points corresponding to different paths within the overall μ‐D signature. Simulation‐based results are presented only for sinusoidal μ‐D curves; it is not clear how effective this approach would be on real measured signatures, where the backscatter from each target does not form a distinct curve. Moreover, the μ‐D signatures of more complicated targets, such as humans, are comprised of multiple trajectories corresponding to the movements of each body part. This approach does not address the association problem that would ensue for target signatures comprised of multiple trajectories. A similar challenge is faced by the multiple target tracking approach proposed in Ref. [21], where the coning targets considered similarly consist of distinct sinusoidal trajectories.

To overcome these limitations, Pegoraro et al. [22] proposed a density clustering and tracking‐based technique to separate the μ‐D for up to four people walking back and forth along a corridor. A trajectory association algorithm is utilised to match clusters with tracked trajectories, and a convolutional neural network (CNN) that incorporates a reconstruction loss term in the cost function is utilised to correctly identify the person walking in case tracking fails and clusters of some subjects cannot be separated. Although this method can in principle generalise to realistic target signatures, results are only presented for identification of walking people; HAR for multiple people in a scene is not considered. Moreover, it can potentially suffer from trajectory instability due to missed detections, a probable event in scenes where there is significant clutter and ghost targets resulting from multi‐path reflections. Significant effort is involved in trajectory management, a task that becomes increasingly complicated as the number of targets present increases.

A common limitation of all of the aforementioned works, however, is that they can perform separation only after tracking results are obtained or after radar images are computed from the raw data cube. The computation of tracking trajectories introduces additional delays in the classification process, which is detrimental to real‐time applications. For cases like multiple close targets or gesture recognition for right and left hands, separation of targets in the range‐angle (RA) domain is generally not possible, detrimentally affecting the performance of existing approaches. Additionally, decomposition or target separation made at the image level is not scalable to other data domains, as the raw I/Q signals of the targets are still superimposed. This precludes the utilisation of joint domain classification techniques, which require lower‐level signal separation in the raw radar data and hence in the radar data cube (RDC) itself.

The proposed angular subspace projection‐based separation (ASPS) technique projects the raw radar data onto an angle subspace and generates multiple low‐level RDC‐ω representations for the targets at different aspect angles. Figure 1 shows the fundamental differences between radar signal processing stages with and without the proposed projection method, along with its advantages and disadvantages. In particular, we show that the proposed projection method improves the similarity between a target's original signal and the decomposed multi‐target signal after projection. This enables a DNN trained for the single‐target case to be

FIGURE 1 Conventional versus angular projection‐based radar signal processing (RSP) chain for multi‐target scenarios.

utilisable even in multi‐target scenarios. For a 9‐class μ‐D signature recognition problem, a four‐layer CNN achieves 97.8% when three targets are present within the radar field of view. In cases where multiple targets are positioned in close proximity to each other, the RDC‐ω representation offers multi‐view inputs, which highlight the differences at each angle that can be exploited via a multi‐view DNN to achieve improved classification accuracy. This work expands on preliminary work presented as a conference paper [23] by 1) adding a weighting stage to the projection pipeline, 2) characterising target separability as a function of the number of antenna elements and aspect angle for an expanded number of targets, which includes human activities and several different gaits of a robotic dog, and 3) demonstrating improved classification results for a 9‐class HAR scenario using a 4‐layer CNN and a 10‐class gesture/sign language recognition scenario using a multi‐view DNN on an expanded dataset that has a greater number of samples per class.

The specific contributions of this paper are as follows:

1. We propose the ASPS method to separate and boost the relative signal‐to‐noise ratio (SNR) of targets at different aspect angles, generating raw multi‐view RDC‐ω data representations.
2. The effect of the number of antenna channels and target aspect angle on ASPS performance is evaluated.
3. The effectiveness of ASPS is demonstrated for a HAR application in an end‐to‐end framework.
4. We propose a novel multi‐view DNN that utilises multiple RDC‐ω‐derived micro‐Doppler signatures as its input to boost classification performance when targets are in close proximity, such as when separately considering left and right hand movements for sign language recognition.

The rest of the paper is organised as follows: The signal model used in MIMO FMCW radars and the detection of multiple targets in the range, Doppler and angle domains are explained in Section 2. The proposed ASPS method is presented in Section 3. The experimental setup for the radar sensor and dataset collection is provided in Section 4, and Section 5 presents the performance results of the ASPS method for different experimental and hardware configurations. Section 6 presents the classification results. Finally, Section 7 concludes the paper and discusses future directions.

2 | MULTIPLE TARGET SIGNAL MODEL

In an FMCW radar, the frequency of the transmit signal changes over time, typically as a linear sweep across the bandwidth, so that both range and velocity measurements can be acquired [24]. In MIMO FMCW radar systems, the presence of multiple channels also enables the estimation of the angle of arrival of the radar backscatter from a target. The transmitted FMCW signal, ST(t), can be modelled as

ST(t) = exp( j2πfc·t + jπα·t² )    (1)

where fc is the carrier frequency and α is the chirp rate, defined as the ratio of the bandwidth, B, to the sweep duration, T, as α = B/T.

Now suppose that once transmitted, the signal reflects back from a set of K targets with corresponding ranges Ri and radial velocities vi, i = 1, 2, …, K. The received signal in one channel of the radar system can be expressed as

SR(t) = Σᵢ₌₁ᴷ Ai exp( j2πfc(t − ti) + jπα(t − ti)² )    (2)

where Ai is a complex constant related to the target radar cross‐section, and ti is the round‐trip time delay for the ith target. Each RF sensor channel's raw data is collected as a time stream of in‐phase (I) and quadrature (Q) samples. In FMCW transceivers, the received signal is mixed with a copy of the transmitted signal and then low‐pass filtered to remove unwanted high‐frequency mixing byproducts. The output of the filter is an intermediate frequency (IF) signal SIF(t), given as follows:

SIF(t) = Σᵢ₌₁ᴷ Ai exp( j2παti·t + ϕi ),    (3)

where ϕi is a constant phase term over time and is a function of the time delays, chirp rate, and carrier frequency. Sampling the IF return signal results in N fast‐time samples for each pulse, while computation of its Fourier transform reveals the beat frequencies Fb = αti, which are directly related to the round‐trip travel time and distance between the radar and target. The velocity of a target can be obtained through transmission and coherent processing of P pulses.

Because the pulse repetition interval (PRI) is typically much longer than the analog‐to‐digital converter (ADC) sampling interval, the pulse number is typically referred to as slow‐time, while the ADC samples are referred to as fast‐time. For a MIMO radar with a uniform linear array (ULA) consisting of M virtual channels, an RDC with dimensions N × P × M can be formed. Fourier processing can be utilised to compute various RF data representations from the RDC, such as RD or RA versus time.

Micro‐Doppler can be found by computing the spectrogram (square modulus of the STFT) across slow‐time. However, when multiple targets are present in the radar field‐of‐view (FoV), the received signal is comprised of the backscatter from all the targets in the scene. Thus, the spectrogram consists of the superposition of the micro‐Doppler signatures of all targets. Figure 2 illustrates the resulting μ‐D spectrogram obtained when two people are present in the FoV: one person is walking towards the radar and the other is walking away from the radar. Comparing the multi‐target spectrogram with the one obtained when the same activities are recorded individually in the radar FoV, it can be observed that the μ‐D for each person exhibits the same patterns, but in the multi‐target spectrogram these signatures are overlaid or superimposed so that they interfere with each other. As DNNs for HAR are typically trained on μ‐D signatures recorded when just a single person is present, the activities of the subjects in multi‐target spectrograms will not be accurately classified. Thus, the signatures for each person need to be separated prior to input to a DNN for classification.

3 | ANGULAR SUBSPACE PROJECTION‐BASED SEPARATION

3.1 | Rationale of proposed approach

Rather than immediately computing the μ‐D, the potential separability of targets can be observed through the use of other radar data representations, especially the RD and RA maps. Consider the case of three people walking towards a radar. The RD and RA maps for this scenario are shown in Figure 3. When a cell‐averaging constant false alarm rate (CA‐CFAR) detector is applied to the RD and RA maps, the presence of several distinct targets can be seen. In the RD map, however, despite three targets being present, only two can be detected, as two of the people were located at the same distance from the radar. Consequently, in the RD map, their RF returns fall into the same RD bin and cannot be distinguished. On the other hand, in the CA‐CFAR results for the RA map, three separate targets can now be observed. Hence, multiple targets live in the RD or angle domains and might not always be separable. Our goal is to generate multi‐view raw radar data representations that will enhance target classification even for cases that might not be separable in the RA domain.

Towards this goal, we propose an angular subspace projection technique for signal separation that decomposes the raw complex RDC of multiple targets by generating individual RDCs for each angle subspace. This process strengthens the signals received from the projection angle subspace but weakens the signals received from any other angle. Because the signal separation is accomplished at the RDC level, any desired radar data representation for the separated multi‐view RDCs, including the micro‐Doppler signature and the RD and RA maps, can subsequently be computed. This yields advantages over trajectory tracking‐based approaches in the literature, because the joint use of multiple input representations has given improved classification performance over just using micro‐Doppler signatures. In essence, a more fundamental form of signal separation is being accomplished at the raw radar signal level.

The proposed framework, shown in Figure 4, consists of two main stages. The first stage involves target detection and angular projection. Range‐Doppler maps are obtained through a 2D Fast Fourier Transform (FFT) on the fast and slow‐time dimensions of each channel of the acquired raw RDC. Then, CA‐CFAR adaptive thresholding is applied on the RD maps to detect the targets.

FIGURE 2 μ‐D spectrograms belonging to single‐target and multi‐target cases.

FIGURE 3 Separation of three people (walking towards radar) in the range, Doppler and angle domain.
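As a sketch of these representations (with hypothetical array sizes, not the authors' code), an RD map and a slow‐time spectrogram can be computed from a raw data cube as follows:

```python
import numpy as np

# Hypothetical sketch (sizes are illustrative, not the authors' code):
# given a raw data cube of shape (N fast-time, P slow-time, M channels),
# the RD map is a 2D FFT over fast and slow time, and the micro-Doppler
# spectrogram is the square modulus of a short-time FFT across slow time.
N, P, M = 128, 256, 8
rng = np.random.default_rng(0)
rdc = rng.standard_normal((N, P, M)) + 1j * rng.standard_normal((N, P, M))

# RD map of one channel: range FFT (fast time) then Doppler FFT (slow time)
rd_map = np.fft.fftshift(np.fft.fft2(rdc[:, :, 0]), axes=1)

def spectrogram(x, win=64, hop=16):
    """Square modulus of the STFT with a Hanning window."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)) ** 2

# Slow-time signal of one range bin on one channel
mu_d = spectrogram(rdc[40, :, 0])
print(rd_map.shape, mu_d.shape)   # (128, 256) (13, 64)
```

The window and hop lengths here are arbitrary; in practice they trade time resolution against Doppler resolution in the resulting μ‐D spectrogram.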

FIGURE 4 Proposed end‐to‐end framework of the angular projection method for a classification application.

Angle‐of‐arrival (AoA) of the detected targets are estimated using the MUltiple SIgnal Classification (MUSIC) [25] super‐resolution algorithm. Next, the proposed projection approach, detailed in Section 3.3, is applied with the detected angles to form new, projected RDCs for each target. The second stage involves the implementation of task‐specific processing on each individual target RDC. For the task of classification, individual μ‐D signatures [26] can be generated by applying any desired time‐frequency transformation on the projected RDCs. The μ‐D signatures can then be given as input to a DNN to separately classify each target. Although μ‐D is used as the input representation for classification here, different representations can also be computed from the multi‐view RDC. Even though angular subspace projection is applied to detected targets here, we also show that the proposed projection idea boosts the classification performance even for targets that were not separated at the initial stage, such as right or left‐hand gestures. The details of the computations comprising each stage are presented next.

3.2 | Target detection and AoA estimation

The first stage in decomposing a multi‐target RDC into individual RDCs of each target is the detection of targets in the RD‐angle domain. The number of targets in the radar FoV along with their angles is utilised to apply the projection algorithm. If the user has a priori knowledge of this information, the angles of the targets can be fed directly into the angular projection algorithm without the initial detection stage. Otherwise, a target detection and an angle estimation method is applied.

A 2D FFT can be applied on the fast and slow‐time dimensions of the multi‐target RDC to obtain the RD maps spanning each coherent processing interval, a.k.a. frame. In order to detect the targets in the RD maps, a widely used adaptive thresholding method, CA‐CFAR detection, is applied. Cell‐averaging CFAR extracts noise samples from both leading and lagging RD bins (i.e., training cells) around the cell under test (CUT). The noise power estimate, Pn, can be computed as follows [27]:

Pn = (1/N) Σₘ₌₁ᴺ xₘ    (4)

where N is the number of training cells, and xₘ is the sample in each training cell. Guard cells are chosen adjacent to the CUT, both leading and lagging it. They serve to prevent signal components from leaking into the training cells, which could adversely affect the noise estimate.
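A one‐dimensional CA‐CFAR sketch along a single slice, using the noise estimate of Equation (4) and the standard CA‐CFAR threshold factor, might look like the following (the window sizes and false alarm rate are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

# 1D CA-CFAR sketch (illustrative parameters, not the authors' code):
# the noise power at each cell under test (CUT) is the mean of N training
# cells on both sides (Equation (4)); guard cells next to the CUT are
# excluded; the threshold factor is a = N (Pfa^(-1/N) - 1).
def ca_cfar_1d(x, n_train=8, n_guard=2, pfa=1e-3):
    """Return boolean detections for the squared-magnitude input x."""
    n = len(x)
    N = 2 * n_train                         # total training cells
    a = N * (pfa ** (-1 / N) - 1)           # standard CA-CFAR threshold factor
    det = np.zeros(n, dtype=bool)
    for cut in range(n_train + n_guard, n - n_train - n_guard):
        lead = x[cut - n_guard - n_train : cut - n_guard]
        lag = x[cut + n_guard + 1 : cut + n_guard + 1 + n_train]
        pn = (np.sum(lead) + np.sum(lag)) / N   # noise estimate, Equation (4)
        det[cut] = x[cut] > a * pn
    return det

rng = np.random.default_rng(1)
power = rng.exponential(1.0, 256)       # squared-magnitude noise power
power[100] += 40.0                      # inject a strong target
hits = np.flatnonzero(ca_cfar_1d(power))
print(bool(100 in hits))                # True: the injected target is detected
```

In the framework above, the same windowed averaging is applied around each RD cell rather than along a single vector, but the thresholding logic is identical.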

The threshold factor, a, can be written as follows:

a = N ( Pfa^(−1/N) − 1 )    (5)

where Pfa is the desired false alarm rate.

Once CA‐CFAR is applied on the RD maps, the detected target RD bins are passed through the MUSIC algorithm [25] for angle estimation. This procedure is applied to all frames. To reduce grid effects, clustering is applied in angle space, that is, detections with close angles are clustered into one group. If the number of detections for a particular angle cluster exceeds a pre‐defined threshold, the centre angle of the cluster is added to the list of projection angles, which is then given as input to the projection algorithm.

3.3 | Projection of radar data cubes

The previous detection stage generates angle estimates of each target. In this part, we project the raw multi‐target radar data onto the angular subspace of each target. To do this, let us first define the angular space. If there is a target at a specific angle with respect to the array, the steering vector defines the signal model received at the array. For the case of a ULA, the steering vector, a, can be formulated as follows:

a(θ) = [1  e^(−j2πd·sin(θ)/λ)  …  e^(−j2π(M−1)d·sin(θ)/λ)]ᵀ    (6)

where θ is the aspect angle, λ is the signal wavelength, d is the spacing between array elements, and M is the total number of channels in the ULA. Repeating Equation (6) for each discretised angle θ yields the steering matrix. The projection method takes the lower and upper bounds of the desired projection angular interval (i.e., θl and θu, respectively) as inputs. For a given angle space [θl, θu], we can discretise this space and create a steering matrix B = [a(θ1), a(θ2), …, a(θi), …, a(θS)], where each column is a steering vector for the corresponding aspect angle θi ∈ [θl, θu]. Here, the column space of B spans the angular subspace we want to project onto, and the data at slow‐time index k can be projected onto the column space of B as follows:

x̂ₙₖ = B (BᵀB)⁻¹ Bᵀ xₙₖ    (7)

where x̂ₙₖ is the projected data. Repeating the projection for the relevant fast‐time and slow‐time indexes will construct the projected RDC for the given angle subspace. The angle subspace can be selected as the angular extent of the detected targets. In some cases, the angular subspace can also be designed depending on the application. For example, in radar‐based sign language recognition, to better represent right and left‐hand activities, angular projections for each hand can be done considering an angular subspace that includes all angles at which one hand can be. As the output of this subspace projection, multi‐view RDCs are obtained, and varying feature representations can be computed for each RDC.

3.4 | Micro‐Doppler spectrogram generation

After obtaining the new projected RDCs, one can utilise them in any kind of application, from tracking to classification, since the projection method is applied at the raw I/Q signal level. This work focusses on the visualisation, classification and performance comparison of the μ‐D spectrograms generated from the projected RDCs with different similarity metrics. μ‐D spectrograms are generated from one channel of the RDC by taking the square modulus of the STFT across the slow‐time dimension with a Hanning window.

4 | EXPERIMENTAL SETUP AND DATASET

In this work, Texas Instruments' AWR2243 Cascade MIMO radar, which uses a sawtooth FMCW modulation, is employed as the RF sensor. The frequency of the transmitted signal increases linearly as a function of time during the sweep repetition period, or sweep time, T. The start frequency of the radar system is 77 GHz, and a bandwidth of 4 GHz is used, so the signal sweeps linearly up to 81 GHz. The radar system can be configured as a long‐range radar in the beamforming mode for higher SNR or as a short‐range radar using the MIMO mode for enhanced angular resolution. In this work, the MIMO configuration is utilised.

4.1 | Virtual array generation with multi‐input multi‐output processing

The experimental MIMO radar contains four radar sensor chips cascaded together, each containing three transmit (Tx) and four receive (Rx) antennas, resulting in a total of 12 Tx and 16 Rx channels. Nine of the 12 Tx antennas are in the same vertical position and the remaining three are at different heights. As for the Rx antennas, all of them are located at the same height with a sparse distribution in the horizontal axis, as depicted in Figure 5a. Utilising the relative positions of different Tx‐Rx pairs in MIMO radars, one can form a virtual array with a larger aperture size than what is provided by the hardware. This can be achieved by employing different modulation techniques such as time‐division multiplexing (TDM), binary phase modulation (BPM), and code‐division multiplexing.
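A minimal sketch of TDM virtual array formation is shown below. The geometry is an assumption for a single 3 Tx / 4 Rx chip (not the actual AWR2243 cascade layout): each virtual element sits at the element‐wise sum of a Tx and an Rx position.

```python
import numpy as np

# Simplified TDM-MIMO virtual array sketch (assumed geometry for a single
# 3 Tx / 4 Rx chip, not the actual AWR2243 cascade layout): each virtual
# element sits at the sum of a Tx position and an Rx position.
lam = 3e8 / 79e9                # wavelength at the centre frequency
d = lam / 2                     # half-wavelength unit spacing (assumed)

tx = np.arange(3) * 4 * d       # 3 Tx spaced four units apart (assumed)
rx = np.arange(4) * d           # 4 Rx spaced one unit apart (assumed)

# 3 x 4 = 12 virtual channels forming a contiguous half-wavelength ULA
virtual = np.sort((tx[:, None] + rx[None, :]).ravel())
res_deg = np.rad2deg(lam / (len(virtual) * d))   # ~ lam / (N d) beamwidth
print(len(virtual), round(res_deg, 1))           # 12 channels, about 9.5 deg
```

Applying the same rough λ/(N·d) estimate to the 86‐channel azimuth ULA described above gives about 1.3°, consistent with the 1.4° resolution reported for the full cascade array.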

FIGURE 5 Virtual array generation from multi‐input multi‐output (MIMO) array using time‐division multiplexing (TDM) (a, b) and the experimental setup (c).

depicted in Figure 5b, where 86 of these 192 channels can be TABLE 1 The acquired dataset for different number of targets.
used to form a ULA in the azimuth direction, providing angular Number of targets Angles Number of samples
resolution of as small as 1.4°.
1 0°, �45° 476
�15°, �30° 16
4.2 | Data collection 2 0°, �45° 15

3 0°, �45° 6
In order to assess the effectiveness of the proposed method, a
HAR dataset with nine activities are acquired where five of
them are human activities and four of them belong to a robotic
dog (i.e., Boston Dynamics's Spot). The acquired classes for
human participants can be listed as follows: walking towards
the radar, walking away from the radar, picking up an object
from the ground, sitting down to a chair, standing up from a
chair. The classes for Spot are: walking towards radar, walking
away from radar, crawling towards radar, crawling away from
radar. The experiment is conducted in a 4.5 m � 6.4 m indoor
area. Figure 5c shows the experimental setup and the layout of
the room where the data are acquired. Each of the activities are
performed at the angles of 0° and �45°. Few more samples are
also collected at �15° and �30° in order assess the limitations
of the proposed method which is discussed in Section 5.2.
Each recording for an activity is lasted for 7 s, and walking/
crawling activities are started at 4.5 m away from the radar.
Single and multi‐target cases are considered, and Table 1
summarises the acquired dataset for varying number of targets.
In this work, the μ‐D spectrogram is used as the RF data representation for similarity analysis and classification. It is a time‐versus‐velocity (or Doppler shift frequency) heatmap, and sample μ‐D spectrograms belonging to different classes are provided in Figure 6. Three different datasets are generated from the acquired samples:

• The first dataset consists of single‐activity samples where only one target is present in the radar FoV. This dataset is referred to as the single activity dataset.
• Secondly, the raw I/Q signals of targets located at different aspect angles are merged by summing them up, which allows us to create a scene with multiple targets in a synthetic way. This dataset is referred to as the merged multi‐target dataset throughout the paper.
• Finally, the data of multiple subjects located at different aspect angles are also recorded, which are named the real multi‐target dataset.

FIGURE 6 μ‐D spectrogram samples of different classes.
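The merged multi‐target dataset exploits the linearity of the radar measurement: summing the raw I/Q data cubes of two single‐target recordings yields a synthetic multi‐target scene. A minimal numpy sketch (the array shapes and names are illustrative, not the paper's actual data format):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two single-target radar data cubes (RDCs), here with illustrative
# dimensions (fast-time samples, slow-time pulses, virtual channels).
# Random complex values stand in for recorded I/Q returns.
shape = (64, 128, 8)
rdc_a = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
rdc_b = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# The receiver is linear in the scene's reflectors, so a synthetic
# multi-target RDC is simply the sample-by-sample sum of the raw cubes.
rdc_merged = rdc_a + rdc_b
```

Any downstream representation (range-Doppler maps, μ‐D spectrograms) is then computed from `rdc_merged` exactly as for a real recording.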

Next, we present the performance analysis of the proposed approach and the effect of several radar system parameters. In Section 6, the classification performance for two additional applications and their corresponding datasets is presented.

5 | PERFORMANCE ANALYSIS OF THE ANGULAR SUBSPACE PROJECTION‐BASED SEPARATION METHOD

5.1 | Similarity comparison

Projection of the raw signal onto the angular subspace spanned by B preserves the returned signal strength from targets located in the angular interval spanned by B, while it fades out the return signals of targets located at other angles. In order to quantitatively evaluate the quality of the projected RDCs, their similarity with the original, single‐target spectrograms is compared by applying the following steps:

• Step 1: Raw data with single targets at different aspect angles are recorded, their RDCs are formed and the corresponding μ‐D spectrograms are generated.
• Step 2: Multiple single‐target RDCs are merged by adding them up to create a combined RDC which contains the raw return signals of multiple targets, and the μ‐D spectrogram of the merged RDC is generated.
• Step 3: The combined RDC is projected onto the target angular subspaces. The resulting projected RDCs are used to generate new, projected μ‐D spectrograms for each projection.
• Step 4: The similarity of the projected spectrograms to the original, single‐target spectrograms is compared with various metrics.

Figure 7 illustrates the aforementioned steps with the μ‐D spectrograms for two‐ and three‐target cases. In the first row of Figure 7a, single‐target spectrograms for the walking away (at 0°) and standing up (at +45°) activities and their merging results are presented. After the projection of the combined RDC onto 0° and +45°, it can be seen that the individual targets' signals are recovered almost perfectly. The second row presents the results for the standing up and walking towards activities, where both have mostly positive Doppler frequencies. It can be seen that the projection method is still able to separate the two targets quite well, indicating that the performance of the ASPS method is

FIGURE 7 Projection results for two and three target cases.
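The merging and projection steps above can be sketched with a generic orthogonal projection onto the span of ULA steering vectors. This is a simplified stand‐in for the paper's projection matrix B (half‐wavelength element spacing and a narrowband, channel‐domain snapshot model are assumed; function names are ours):

```python
import numpy as np

def steering_vector(theta_deg, n_elements, d=0.5):
    """ULA steering vector at angle theta_deg; d is spacing in wavelengths."""
    n = np.arange(n_elements)
    return np.exp(-2j * np.pi * d * n * np.sin(np.deg2rad(theta_deg)))

def project_onto_angles(x, thetas_deg, n_elements):
    """Project channel-domain data x (n_elements x snapshots) onto the
    subspace spanned by the steering vectors of the given angles."""
    B = np.stack([steering_vector(t, n_elements) for t in thetas_deg], axis=1)
    P = B @ np.linalg.solve(B.conj().T @ B, B.conj().T)  # P = B (B^H B)^-1 B^H
    return P @ x

# Toy combined scene: two targets at -45 deg and +45 deg on a 16-element ULA.
rng = np.random.default_rng(1)
N, T = 16, 200
x1 = np.outer(steering_vector(-45, N), rng.standard_normal(T))
x2 = np.outer(steering_vector(+45, N), rng.standard_normal(T))
x_combined = x1 + x2

# Projecting onto -45 deg keeps target 1 nearly intact and fades target 2.
x_proj = project_onto_angles(x_combined, [-45], N)
```

Applying the same projector along the channel dimension of a combined RDC yields the angle‐conditioned RDC‐ω from which the projected spectrograms are generated.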



agnostic to the signature of the Doppler frequency caused by the direction of the target's radial motion. Figure 7b presents the projection results for a three‐target case performed at −45°, 0° and +45°. In the first row, while Spot's walking towards and walking away activities seem to be better separated than the others, the projection result for picking up an object has some leftover signals of Spot's activities before and after the picking up an object signature. This is because the projection angle subspaces are not fully orthogonal, so some weak projections from other angles are still observed.

In order to quantitatively assess the similarity of the projected spectrograms to the original (i.e., single‐target) ones, three different similarity metrics are considered, namely, the structural similarity index (SSI), pixelwise mean‐squared error (MSE) and peak signal‐to‐noise ratio (PSNR). Table 2 presents the averaged similarity results across all samples of the merged multi‐target dataset for the two‐ and three‐target cases, where X, Y, Z ∈ {±45°, 0°}. It can be observed that when the target angle and the projection angle match, the resulting spectrograms have higher SSI and PSNR and lower MSE than in the non‐matching case, as expected. Although the presented results are averaged over all samples, having higher SSI and PSNR with lower MSE when the target and the projection angle match is consistent for all individual samples.

TABLE 2 Mean similarity results for the μ‐D spectrograms where X, Y, Z ∈ {±45°, 0°}.

Num. of targets   Target angle   Projection angle   SSI    MSE      PSNR
2                 X              X                  0.59   1.82e3   16.93
                  X              Y                  0.46   3.6e3    13.16
                  Y              X                  0.47   3.56e3   13.2
                  Y              Y                  0.55   2.17e3   16.46
3                 X              X                  0.56   2.22e3   16.43
                  X              Y                  0.45   3.68e3   13.14
                  X              Z                  0.42   4e3      12.8
                  Y              X                  0.46   3.73e3   13.07
                  Y              Y                  0.57   2.06e3   16.52
                  Y              Z                  0.44   3.9e3    12.9
                  Z              X                  0.44   3.95e3   12.88
                  Z              Y                  0.44   3.79e3   12.97
                  Z              Z                  0.53   2.48e3   16.05

Abbreviation: PSNR, peak signal‐to‐noise ratio.

5.2 | Effect of angular difference of the targets

So far, only the angles of −45°, 0° and +45° are considered for the similarity measures. In order to understand the effect of the angular difference, ΔΘ, of the targets on the projection results, more data samples are collected from −30°, −15°, 0°, +15° and +30° for the walking away from the radar activity. These samples are then merged, one by one, with a picking up an object activity sample recorded at −45°, resulting in a varying ΔΘ between the two targets, where ΔΘ ∈ {15°, 30°, 45°, 60°, 75°, 90°}. Table 3 presents the similarity results for the μ‐D spectrograms when the RDCs of the targets at varying angles are merged with the RDC of the target located at −45° and projected onto the target's original angle. It can be observed that as ΔΘ increases, SSI and PSNR increase as well and MSE decreases, meaning that the projected spectrogram resembles the original single‐target spectrogram more closely. Figure 8 shows the geometry of the multi‐target scenario and the resulting projected spectrograms belonging to Table 3. It can be seen that the projection method has a hard time separating the targets when ΔΘ = 15°, where both activities are clearly visible in the resulting spectrogram. When ΔΘ = 30°, although a complete isolation of the two targets is not achieved yet, the picking up an object activity is mostly suppressed

TABLE 3 Similarity results when two activities (i.e., picking up an object and walking away) with an angular difference of ΔΘ are merged, and projected onto the targets' original angles.

ΔΘ     15°      30°      45°      60°      75°      90°
SSI    0.67     0.72     0.77     0.83     0.85     0.85
MSE    1.75e3   0.63e3   0.55e3   0.45e3   0.39e3   0.3e3
PSNR   15.7     20.15    20.73    21.63    22.25    23.36

Abbreviation: PSNR, peak signal‐to‐noise ratio.

FIGURE 8 Generated μ‐D spectrograms after projecting multi‐target (picking up an object and walking away) RDCs onto original target angles.
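The three metrics can be computed directly from the spectrogram images. A numpy sketch (the SSI shown here is a simplified, single‐window form of the structural similarity index; the standard index averages the same statistic over small local windows):

```python
import numpy as np

def mse(a, b):
    """Pixelwise mean-squared error between two spectrogram images."""
    return np.mean((a - b) ** 2)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given dynamic range."""
    return 10.0 * np.log10(peak ** 2 / mse(a, b))

def ssi_global(a, b, peak=255.0):
    """Single-window structural similarity (simplified global variant)."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

# A projected spectrogram that deviates little from the single-target
# original scores a low MSE, a high PSNR and an SSI close to 1.
rng = np.random.default_rng(0)
original = rng.uniform(0, 255, size=(128, 128))
projected = original + rng.normal(0, 2.0, size=(128, 128))
```

In practice a windowed SSIM implementation (e.g. from an image-processing library) would replace `ssi_global`; the ordering of the scores across ΔΘ values is what matters for the analysis above.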

and walking away seems to be the dominant activity. When ΔΘ = 45°, the μ‐D signature of the picking up an object activity is barely noticeable and only a small portion of weak leftover signatures remains. When ΔΘ = 60° or 75°, a complete separation can be observed, with μ‐D spectrograms containing only one target's signatures. From these results, it can be inferred that as the ΔΘ between targets increases, a better separation is achieved and all similarity metrics perform better.

5.3 | Effect of number of antennas

As mentioned in Section 4.1, the created virtual array has 86 channels which form a ULA in the azimuth direction. In any kind of angle estimation method, the angular resolution is proportional to the number of channels in the antenna array. However, it worsens as the aspect angle deviates from the direct line‐of‐sight of the radar. A similar phenomenon can be observed in the ASPS method as well. In some cases, the projected μ‐D spectrograms still contain some portion of the signatures of activities from other angles. This is due to the fact that the steering vector, a(θ), for an angle, θ, is not fully orthogonal to those of the other angles. Figure 9 shows the correlation between the steering vector of angle 0° and the steering vectors of other angles for different numbers of antenna elements. Although there are some ups and downs, especially noticeable for the 4‐element case, the correlation between steering vectors gets smaller as the number of antenna elements increases, which is in favour of the projection algorithm, as a lower correlation between two angles leads to a better separation of the targets located at those angles.

Figure 10 presents the normalised SSI, MSE and PSNR values of the projected and original spectrogram pairs for different numbers of enabled antenna elements. It can be observed that SSI and PSNR increase with the number of antenna elements while MSE decreases, indicating that projections made with a larger antenna aperture resemble the original target signatures more closely. Qualitative results for this observation are presented in Figure 11. Figure 11a shows the μ‐D spectrograms belonging to three different activities at different angles and their merging result. Figure 11b shows the resulting μ‐D spectrograms for projections onto the target angles with different numbers of antenna elements. While the separation is barely noticeable and poor for lower numbers of antenna elements, it gets better as the number of MIMO channels increases. Isolation of individual targets starts to become quite clear after 32 channels, and 72 and 86 channels yield very close results. These qualitative results are found to be in line with the quantitative results obtained in Figure 10.

6 | CLASSIFICATION WITH RDC‐ω REPRESENTATION

In this section, the utilisation of the ASPS‐derived RDC‐ω representation for multi‐target activity classification is presented. First, multi‐person HAR where there is sufficient angular separation is considered. The RDC‐ω representation for each target is input to a DNN trained with data from single targets. Next, two cases where targets are in close proximity are considered: 1) the left and right hands during the articulation of two‐handed gestures/sign language and 2) two people doing different activities next to each other. In these cases, the proposed multi‐view DNN, taking multiple spectrograms generated from the RDC‐ωs as inputs, is shown to boost the multi‐target classification performance.

6.1 | Multi‐person activity recognition

Human activity recognition and indoor monitoring are drawing more attention with the development of state‐of‐the‐art end‐to‐end solutions utilising RF sensing. The human activity recognition dataset described in Section 4 is used to assess the effectiveness of the method. First, a very basic case,

FIGURE 9 Correlation between steering vector of angle 0° and other angles for different number of array elements.

FIGURE 10 Similarity between single target and the projected spectrograms for different number of array elements.
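The correlation behaviour underlying Figure 9 can be reproduced in a few lines of numpy. A sketch under the same ULA assumption (half‐wavelength spacing), showing the normalised correlation between the broadside steering vector and an off‐broadside one shrinking as elements are added:

```python
import numpy as np

def steering_correlation(theta_deg, n_elements, d=0.5):
    """Normalised correlation |a(0)^H a(theta)| / N for a ULA."""
    n = np.arange(n_elements)
    a0 = np.ones(n_elements)  # steering vector at 0 deg (broadside)
    a = np.exp(-2j * np.pi * d * n * np.sin(np.deg2rad(theta_deg)))
    return np.abs(a0.conj() @ a) / n_elements

# Larger apertures decorrelate the 0 deg and 20 deg steering vectors,
# which is what makes the angular projection more selective.
corrs = [steering_correlation(20, n) for n in (4, 16, 86)]
```

The 20° offset and the element counts here are illustrative; sweeping `theta_deg` over a grid for each array size reproduces curves of the kind shown in Figure 9.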

classification of the single activity dataset is considered. μ‐D spectrograms of single targets are fed into a 4‐layer CNN followed by two fully connected layers and one more fully connected layer with a number of nodes equal to the number of classes for classification. There is a Reshape layer before the final Dense layer, so that the output of the model has the shape Tmax × Nclass, where Tmax is the predefined maximum number of detectable targets and Nclass is the number of classes. This modification allows the model to give a prediction for each target. The softmax activation function is often employed in the last layer of a classification network for multi‐class classification problems. It enables normalisation of the output of a network to a probability distribution over the predicted output classes, based on Luce's choice axiom. In this work, in the final classification layer, the sigmoid activation function is preferred over softmax, which is more common for multi‐class classification problems. Such unconventionality is needed because the output values of the softmax must add up to 1; however, when there are fewer targets than Tmax in the scene, all the output values for the non‐existent target nodes should be close to 0, which softmax cannot provide but sigmoid can. In this application, Tmax is set to 3. The trained model achieved a testing accuracy of 98.7% on the single activity dataset. However, when the trained model is tested on the merged multi‐target and real multi‐target datasets, where multiple targets are present in the scene, the accuracies drop down to 5.8% and 8.3%, respectively. Such a performance drop is expected, as the latent space of the multi‐target spectrograms may not resemble any of the individual classes.

Application of the projection idea can be quite useful in this scenario, as it has the capability to isolate the μ‐D signatures of the individual targets. As mentioned earlier, estimation of the target angles is a necessary step in this framework, and the ground truth label of the projected spectrogram will depend on the ground truth class of the original target. If the original target angle and the estimated projection angle match, they will share the same class label. In order to decide whether the estimated angle and the original angle match, the angular grid needs to be divided into angular bins. If the estimated angle and the original one are in the same angular bin, they are said to be matching and will share the same class label; otherwise, the target is regarded as a false detection and is not classified, since its ground truth becomes vague. Figure 12 shows the angle estimation accuracy for varying angular bin widths for the 2‐ and 3‐target cases of the real multi‐target dataset. Accuracy here is defined as the number of correct angle estimations divided by the total number of estimated targets in the dataset. It can be seen that the angle estimation accuracies for the 3‐target case are higher than for the 2‐target case, especially for lower angular bin widths. One reason for this could be that the ground truth of the 3‐target case spans a larger angular interval than the 2‐target case, hence it is more likely for an estimated angle to fall into the detection interval of a ground truth angle.

After obtaining the ground truth labels for the projected spectrograms, they are given as input to the model trained with the single activity dataset. Table 4 presents the prediction accuracies of the projected spectrograms for the real multi‐target dataset for varying angular bin widths. It can be seen that the accuracy reaches its maximum of 97.8% at 10°, which is only 1% lower than the single‐target prediction task, although none of

FIGURE 11 Projection results for varying number of antenna elements.

FIGURE 12 Angle estimation accuracy for different angular tolerance values.

the real multi‐target dataset samples are used in the training stage. This shows the benefit of utilising the ASPS method in a multi‐target scenario by decomposing the multi‐target spectrograms into individual spectrograms, which enables the flexibility to treat them as single‐target spectrograms and classify them in that way. Finally, Table 5 presents the testing accuracy results for different numbers of virtual channels in the MIMO array. It can be observed that the accuracy improves with an increasing number of MIMO channels, since more channels yield a better separation of the μ‐D signatures and the projected spectrograms start to resemble the single‐target samples more closely, as discussed in Section 5.3.

6.2 | Multi‐view deep neural network for multiple targets in close proximity

6.2.1 | American sign language recognition

American Sign Language (ASL) recognition using radars has become an emerging research field, especially with the development of small‐package, commercially available RF sensors. American Sign Language signs are composed of a mixture of various hand movement types (e.g. circular, straight, and back‐and‐forth). While some signs are articulated with one hand, some are articulated using both hands. Separation of the return signals from the left and right hands can be quite useful in order to retrieve the individual characteristics of each hand's motion. However, the rapid change in the spatial position of the hands and the two hands being very close to each other introduce challenging scenarios, and classical representations such as the RA domain cannot separate the right and left hands as two separate targets.

In order to demonstrate the performance of the proposed ASPS approach, an ASL dataset with 10 different signs (YOU, HELLO, WALK, DRINK, FRIEND, KNIFE, WELL, CAR, ENGINEER, MOUNTAIN) is collected from six participants. Moreover, considering that not all commercially available MIMO radars have antenna apertures as large as 86 channels, TI's AWR1642BOOST single‐chip radar with two Tx and four Rx channels is employed as the RF sensor. A virtual array of eight channels is formed using the BPM scheme for MIMO processing. American Sign Language samples are then projected onto left and right angular intervals of (0° to ±30°), (±30° to ±60°) and (±60° to ±90°). Figure 13 shows the original and projected samples for the words HELLO and MOUNTAIN. In the first row, it can be seen that while projecting onto the (0°, 30°) interval strengthens the power of the first negative and second positive peaks, projecting onto (−30°, 0°) strengthens the power of the first positive and the last negative peaks. A similar observation can be made for the word MOUNTAIN in the second row. Although the ASPS method, in this case, cannot completely separate the μ‐D signatures of the left and right hands, the resulting spectrograms can be used to enrich the feature space in learning‐based classification algorithms.

For this 10‐class ASL recognition problem, a multi‐branch DNN model which takes the left and right projected spectrograms as inputs is proposed. The network has two input layers and 4 CNN blocks. The weights of the two branches are shared across corresponding layers to reduce the number of trainable parameters, hence better regularising the model. The two branches are then merged in a concatenation layer and softmax is employed for classification. The proposed network model is presented in Figure 14. The performance of this model is then compared with the baseline model, which is essentially the same network but with a single branch that takes the original spectrograms as inputs. Table 6 presents the classification results when the projection is and is not employed. It can be seen that while the baseline model's mean accuracy is limited to 93.3%, the classification accuracy of the projected spectrograms with the proposed network can go up to 96.9%. The reason projection

TABLE 4 Classification accuracy of the projected spectrograms for the real multi‐target dataset.

Angular bin width   5°     10°    15°    20°    30°    35°    45°
Accuracy (%)        96.9   97.8   93.8   92.2   92.3   88.9   79

TABLE 5 Classification accuracy of the projected spectrograms for varying numbers of multi‐input multi‐output (MIMO) channels.

Number of MIMO channels   4      16    32     72     86
Accuracy (%)              46.8   82    93.1   94.6   94.6

FIGURE 13 μ‐D spectrograms of the projected American Sign Language (ASL) signs.

FIGURE 14 Proposed multi‐view convolutional neural network (CNN) model where Wi denotes the shared weights at the ith layer.
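Weight sharing across the two branches can be illustrated with a single dense layer standing in for the paper's CNN blocks (a deliberately minimal sketch; the real model is convolutional and trained end‐to‐end):

```python
import numpy as np

rng = np.random.default_rng(0)

# Flattened features of the left- and right-projected spectrograms
# (toy sizes; the real inputs are 2D spectrogram images).
x_left = rng.standard_normal(64)
x_right = rng.standard_normal(64)

# One weight matrix is shared by both branches, halving the number of
# trainable parameters relative to two independent branches.
W_shared = rng.standard_normal((32, 64))

h_left = np.maximum(W_shared @ x_left, 0.0)   # ReLU branch activations
h_right = np.maximum(W_shared @ x_right, 0.0)

# The branches merge in a concatenation layer ahead of the softmax head.
h_merged = np.concatenate([h_left, h_right])
```

Because the same `W_shared` processes both views, the network learns one feature extractor that is applied symmetrically to the left and right projections, which is the regularisation effect described above.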

TABLE 6 Classification accuracy (%) comparison results for the American Sign Language (ASL) recognition task for varying projection angle intervals.

No projection   Projection angle interval
                0 to ±30   ±30 to ±60   ±60 to ±90
93.3            96.9       93.9         93.3

TABLE 7 Classification accuracy (%) comparison results for closely located targets for varying projection angle intervals.

No projection   Projection angle interval
                0 to ±5   0 to ±30   ±30 to ±60   ±60 to ±90
91.1            92.9      94.6       92.9         87.5

performs better for angular subspaces of [0, ±30] might be because the articulating ASL signs mostly live in this angular subspace. These results show that although the ASPS method has some limitations in challenging scenarios, such as the separation of the left and right hand signals, it can still be employed to enrich the feature space of the model, which yields a better recognition performance.

6.2.2 | Two people performing activities side‐by‐side

Similar to the ASL case, when the targets are very close to each other or side by side, it is a challenging task to individually isolate and extract each target's μ‐D signal, as depicted in Figure 8. When the spectrograms generated by the ASPS method have major μ‐D components from multiple targets, it is not plausible to feed the resulting projected spectrograms into a DNN model trained with only single‐target samples, as their feature spaces are different from those of the single‐target samples. Therefore, a similar approach can be followed as in the ASL case, and the ASPS method can still be benefited from when there is prior knowledge about the number of targets present in the scene.

In order to demonstrate an alternative use of the ASPS method for closely located targets, a separate HAR dataset is acquired where two targets were performing the same or different activities in the direct line of sight of the radar, very close on the lateral axis and side‐by‐side for stationary activities such as sitting down, standing up and picking up an object. In total, 276 samples of 5 different activities are acquired, including walking towards and away from the radar. The acquired samples are projected onto different pairs of left and right angular intervals: (0° to ±5°), (0° to ±30°), (±30° to ±60°) and (±60° to ±90°). The projected left and right samples are then used to train the multi‐input CNN model presented in Figure 14 by only modifying the number of nodes in the softmax layer according to the number of classes. The performance of the multi‐input CNN model with projected samples is compared with the single‐branch model with no projection. The single‐branch model has the same hyperparameters and an architecture identical to one branch of the multi‐branch CNN model.

Table 7 presents the testing results for the single‐branch baseline model (i.e., no projection) and the proposed multi‐branch CNN with the projected spectrograms. It can be seen that all the projected angular intervals except (±60° to ±90°) outperform the baseline method with no projection. Obtaining lower performance for the (±60° to ±90°) case is an expected result, since all the activities were performed in the direct line of sight of the radar, and hence not much information is present in the higher angular intervals. On the other hand, projection onto (0° to ±30°) yields the highest performance with 94.6%, which is more than a 3% performance gain compared to the baseline method.

7 | DISCUSSION AND CONCLUSIONS

This paper proposes two techniques to address the challenge of multi‐target recognition: 1) an ASPS method to emphasise, at the raw signal level, the data of targets located at different aspect angles with respect to the radar; and 2) a multi‐view DNN, which takes as input spectrograms generated from the multiple RDC‐ω representations produced via ASPS. The approach was demonstrated for several use cases: multi‐person/robot motion recognition, HAR for closely spaced targets and sign language recognition, where ASPS is used to generate RDC‐ω representations for the left and right hands that are then used in a multi‐view network for classification. The significance of the new RDC‐ω representation is that the angle‐dependent boosting of target signatures is accomplished at the level of the raw RDC, not after post‐processing in images as is done with current approaches. The RDC‐ω representation can thus also be used to develop DNNs that directly operate on the raw RDC‐ω data or on 2D/3D radar data representations other than micro‐Doppler, such as RD and RA maps.

The case studies presented in this study show that the proposed method is able to generate individual RDCs for each target, so that the RF data representations newly generated from the projected RDCs can be treated as samples belonging to a single target in a classification task. For a nine‐class activity recognition scenario, the projected multi‐target RF data samples were classified with 97.8% accuracy by a CNN model trained solely on single‐target samples, despite multiple targets being within the field of view. We also characterise the effect of the number of MIMO channels on the performance of the projection method in terms of different similarity metrics and classification accuracy. In the case of close proximity between targets, we show that the multiple RDC‐ω representations can be used in a multi‐input DNN framework to boost classification performance.

In future work, we plan to further investigate the design of DNNs operating on the raw RDC‐ω representations to enable real‐time recognition applications, such as RF‐enabled cyber‐physical human systems for explicit and implicit control of personal assistants and autonomous vehicle in‐cabin driver/passenger monitoring sub‐systems. Future work will also

include multi‐target activity classification under more chal- 10. Chen, V.: The Micro‐doppler Effect in Radar, 2nd ed. Artech House
lenging scenarios such as moving clutter. (2019)
11. Erol, B., Amin, M.G.: Radar data cube processing for human activity
recognition using multisubspace learning. IEEE Trans. Aero. Electron.
AUT HO R CO N TR I BU T I ON Syst. 55(6), 3617–3628 (2019). https://doi.org/10.1109/taes.2019.2910980
Emre Kurtoğlu: Methodology, Software, Formal analysis, 12. Ding, W., Guo, X., Wang, G.: Radar‐based human activity recognition
Investigation, Data curation, Writing – original draft, Writing – using hybrid neural network model with multidomain fusion. IEEE
review and editing, Visualisation. Sabyasachi Biswas: Software, Trans. Aero. Electron. Syst. 57(5), 2889–2898 (2021). https://doi.org/10.
Formal analysis, Investigation, Data curation, Writing – original 1109/taes.2021.3068436
13. Kurtoğlu, E., et al.: Asl trigger recognition in mixed activity/signing se-
draft, Writing – review and editing, Visualisation. Ali C. Gurbuz: quences for rf sensor‐based user interfaces. IEEE Transactions on
Conceptualisation, Methodology, Resources, Writing – review Human‐Machine Systems 52(4), 699–712 (2022). https://doi.org/10.
and editing, Supervision, Funding acquisition. Sevgi Zubeyde 1109/thms.2021.3131675
Gurbuz: Conceptualisation, Methodology, Resources, Writing – 14. Choi, J.W., Yim, D.H., Cho, S.H.: People counting based on an ir‐uwb
radar sensor. IEEE Sensor. J. 17(17), 5717–5727 (2017). https://doi.
review and editing, Supervision, Funding acquisition.
org/10.1109/jsen.2017.2723766
15. Choi, J.W., Quan, X., Cho, S.H.: Bi‐directional passing people counting
ACK NOW L ED GE ME N T S system based on ir‐uwb radar sensors. IEEE Internet Things J. 5(2),
This work was funded in part by the National Science Foun- 512–522 (2018). https://doi.org/10.1109/jiot.2017.2714181
dation Awards #1932547, #1931861 and #2047771. Human 16. Yang, X., et al.: Dense people counting using ir‐uwb radar with a hybrid
studies research was conducted under UA Institutional Review feature extraction method. Geosci. Rem. Sens. Lett. IEEE 16(1), 30–34
(2019). https://doi.org/10.1109/lgrs.2018.2869287
Board Protocol #18‐06‐1271. 17. Bao, R., Yang, Z.: Cnn‐based regional people counting algorithm
exploiting multi‐scale range‐time maps with an ir‐uwb radar. IEEE Sensor.
CO NF LI CT O F I NT E RE ST STA T EM E N T J. 21(12), 13704–13713 (2021). https://doi.org/10.1109/jsen.2021.30
We declare that we have no conflict of interest. 71941
18. Vishwakarma, S., Ram, S.S.: Classification of multiple targets based on
disaggregation of micro‐Doppler signatures. In: 2016 Asia‐Pacific Mi-
DATA AVA IL AB I LI T Y STA T E ME N T crowave Conference (APMC), pp. 1–4 (2016)
The data that support the findings of this study are available 19. Huang, X., et al.: Multi‐person recognition using separated micro‐Doppler
from the corresponding author upon reasonable request. signatures. IEEE Sensor. J. 20(12), 6605–6611 (2020). https://doi.org/10.
1109/jsen.2020.2977170
OR CID 20. Wang, Y., et al.: Micro‐Doppler separation of multi‐target based on aco in
midcourse. J. Eng. 2019(19), 5967–5970 (2019). https://ietresearch.
Emre Kurtoğlu https://orcid.org/0000-0002-3053-3196 onlinelibrary.wiley.com/doi/abs/10.1049/joe.2019.0412
21. Changyu, S., et al.: Multiple target tracking based separation of micro‐
REFE R ENC ES Doppler signals from coning target. In: 2014 IEEE Radar Conference,
pp. 0130–0133 (2014)
1. Gurbuz, S. (ed.) Deep Neural Network Design for Radar Applications.
22. Pegoraro, J., Meneghello, F., Rossi, M.: Multiperson continuous tracking
IET, London (2020)
and identification from mm‐wave micro‐Doppler signatures. IEEE
2. Fioranelli, F., Ritchie, M., Griffiths, H.: Classification of unarmed/armed
Trans. Geosci. Rem. Sens. 59(4), 2994–3009 (2021). https://doi.org/10.
personnel using the netrad multistatic radar for micro‐Doppler and
1109/tgrs.2020.3019915
singular value decomposition features. Geosci. Rem. Sens. Lett. IEEE
23. Kurtoğlu, E., et al.: Rf micro‐Doppler classification with multiple spec-
12(9), 1933–1937 (2015). https://doi.org/10.1109/lgrs.2015.2439393
trograms from angular subspace projections. In: 2022 IEEE Radar Con-
3. Huizing, A., et al.: Deep learning for classification of mini‐uavs using
ference (RadarConf22), pp. 1–6 (2022)
micro‐Doppler spectrograms in cognitive radar. IEEE Aero. Electron.
24. Lin, J.J., et al.: Design of an fmcw radar baseband signal processing
Syst. Mag. 34(11), 46–56 (2019). https://doi.org/10.1109/maes.2019.
system for automotive application. SpringerPlus 5, 1–16 (2016). https://
2933972
springerplus.springeropen.com/articles/10.1186/s40064‐015‐1583‐5
4. Hakobyan, G., Yang, B.: High‐performance automotive radar: a review of
25. Hayes, M.H.: Statistical Digital Signal Processing and Modeling. Wiley
signal processing algorithms and modulation schemes. IEEE Signal
India Pvt. Limited (2009). https://books.google.com.tr/books?id=z0G
Process. Mag. 36(5), 32–44 (2019). https://doi.org/10.1109/msp.2019.
qhOe9GNQC
2911722
26. Chen, V.C., et al.: Micro‐Doppler effect in radar: phenomenon, model,
5. Gurbuz, S.Z., Amin, M.G.: Radar‐based human‐motion recognition with
and simulation study. IEEE Trans. Aero. Electron. Syst. 42(1), 2–21
deep learning: promising applications for indoor monitoring. IEEE
(2006). https://doi.org/10.1109/taes.2006.1603402
Signal Process. Mag. 36(4), 16–28 (2019). https://doi.org/10.1109/msp.
27. Richards, M.A.: Fundamentals of Radar Signal Processing. McGraw‐Hill
2018.2890128
Education (India) Pvt Limited (2005). https://books.google.com/books?
6. Lien, J., et al.: Soli: ubiquitous gesture sensing with millimeter wave radar.
id=qizdSv8MEngC
ACM Trans. Graph. 35(4), 1–19 (2016). https://doi.org/10.1145/
2897824.2925953
7. Gurbuz, S.Z., et al.: Radar‐based methods and apparatus for communi-
cation and interpretation of sign languages. In: U.S. Patent No. How to cite this article: Kurtoğlu, E., et al.: Boosting
11,301,672 B2 (Invention Disclosure Feb. 16, 2018) (2022) multi‐target recognition performance with multi‐input
8. Islam, S.M.M., et al.: Contactless radar‐based sensors: recent advances in multi‐output radar‐based angular subspace projection
vital‐signs monitoring of multiple subjects. IEEE Microw. Mag. 23(7), and multi‐view deep neural network. IET Radar Sonar
47–60 (2022). https://doi.org/10.1109/mmm.2022.3140849
Navig. 17(7), 1115–1128 (2023). https://doi.org/10.
9. Cong, J., et al.: Fast target localization method for fmcw mimo radar via
vdsr neural network. Rem. Sens. 13(10), 1956 (2021). https://doi.org/10. 1049/rsn2.12405
3390/rs13101956
