You are on page 1of 19

Information Fusion 91 (2023) 694–712

Contents lists available at ScienceDirect

Information Fusion
journal homepage: www.elsevier.com/locate/inffus

A survey of identity recognition via data fusion and feature learning


Zhen Qin a,b ,∗, Pengbiao Zhao a,b , Tianming Zhuang a,b , Fuhu Deng a,b ,∗∗, Yi Ding a,b,c ,∗∗,
Dajiang Chen a,b ,∗∗
a Network and Data Security Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, China
b School of Information and Software Engineering, University of Electronic Science and Technology of China, China
c
Ningbo WebKing Technology Joint Stock Co.,Ltd, China

ARTICLE INFO ABSTRACT

Keywords: With the rapid development of the Mobile Internet and the Industrial Internet of Things, a variety of
Identity recognition applications put forward an urgent demand for user and device identity recognition. Digital identity with
Data fusion hidden characteristics is essential for both individual users and physical devices. With the assistance of
Multimodal
multimodalities as well as fusion strategies, identity recognition can be more reliable and robust. In this survey,
Mobile internet
we turn to investigate the concepts and limitations of unimodal identity recognition, the motivation, and
Industrial internet of things
Feature learning
advantages of multimodal identity recognition, and summarize the recognition technologies and applications
via feature level, match score level, decision level, and rank level data fusion strategies. Additionally, we also
discuss the security concerns and future research orientations of learning-based identity recognition, which
enables researchers to achieve a better understanding of the current status of this field and select future
research directions. This survey summarizes and expands the fusion processing technologies and methods for
multi-source and multimodality data, and provides theoretical support for their applications in complicated
scenarios. In addition, it enables researchers to achieve a better understanding of the current research status
of this field and select proper future research directions.

1. Introduction will record the keyboard input. Therefore, Rational use of combined
features for identity recognition to overcome the shortcomings of single
As the Mobile Internet and Industrial Internet of Things developed feature, so as to effectively improve the security and performance of the
rapidly, the scale of users in the social network becomes larger. At the system.
same time, the era of the interconnection of all things introduces a great The concept of modality has been gradually introduced to this re-
number of intelligent terminal devices. With the help of these devices, search field. A modality refers to the way in which something happens
people can freely access the network to complete relevant network or is experienced. To be more specific, such as hearing sounds, feeling
services, leading to mutual interaction and cooperation. Under this textures, viewing objects, and smelling scents, all these sensory actions
circumstance, the significance of user and device identity recognition can be regarded as channels for humans to feel and communicate
tends to be increasingly prominent, which has a wide range of practical with the outside world. The related research is often described as
needs and research values. multimodality if including such modes. In this paper, we summarize
However, merely relying on a single feature for the user or device modal into three aspects: human private characteristics [1–3] (e.g.,
identity recognition faces severe challenges, specifically, it is easy to fingerprint, iris), individual features collected by physical devices (e.g.,
be counterfeited and stolen. For example, identity recognition based personal signature, walking action), and physical fingerprints (e.g.,
on passwords or keys has been a mature and effective method. The
device hardware that commonly used in the Internet of Things) (see
security of passwords or keys completely depends on their confiden-
Fig. 1).
tiality. However, the security threats in the open environment are
However, in the practical application of single feature recognition,
becoming increasingly prominent. Users are vulnerable to surveillance
there exist the following four shortcomings. The first is the similarity
when entering passwords, or when using the keyboard to enter the
between classes. For the recognition system with a large amount of
password in case of computer poisoning, the trojan horse program

∗ Corresponding author at: Network and Data Security Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, China.
∗∗ Corresponding authors.
E-mail addresses: qinzhen@uestc.edu.cn (Z. Qin), pengbiaozhao@std.uestc.edu.cn (P. Zhao), tianming.zhuang@std.uestc.edu.cn (T. Zhuang),
fuhu.deng@uestc.edu.cn (F. Deng), yi.ding@uestc.edu.cn (Y. Ding), djchen@uestc.edu.cn (D. Chen).

https://doi.org/10.1016/j.inffus.2022.10.032
Received 12 December 2021; Received in revised form 30 October 2022; Accepted 31 October 2022
Available online 7 November 2022
1566-2535/© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Z. Qin et al. Information Fusion 91 (2023) 694–712

processing and fusion method, further analyzes the recognition security


concerns, as well as explores research challenges and future directions.
This paper presents an overview of the current status of research on
multimodal identity recognition. The main work of this paper can be
summarized as:

• The modalities of physiological biometrics, behavioral biometrics,


and physical fingerprint for identity recognition are investigated,
while the advantages and disadvantages of unimodal identity
recognition are illustrated.
• The fusion strategies of combining and leveraging feature level,
match score level, decision level, and rank level semantic infor-
mation for multimodal identity recognition are summarized and
their application scenarios are also presented.
• The security concerns introduced by adversarial attacks towards
learning-based identity recognition are studied, and future re-
search orientations of multimodal identity recognition are dis-
cussed.

Sections 2 and 3 introduce unimodal identity recognition and its


limitations. Sections 4 to 6 detail the fusion strategies, multimodal
combination, and security analysis of multimodal identity recognition,
respectively. The challenges and future directions are discussed in
Section 7. Section 8 concludes this work.

2. An introduction of unimodal identity recognition


Fig. 1. Presentation of multiple modalities.
Unimodality means that the person or system relies on one or only
one source of identity feature for identification. There are various
features that can be selected as unimodal features. In this section, we
data, the features between different users may overlap, which will
will provide a detailed description of the commonly used unimodality,
seriously affect the recognition result. The second is the variability of
following the three classifications mentioned above.
characteristics. Due to the reasons of equipment or users, the same
characteristics of the same user might be mutable with times, which
2.1. Physiological biometrics
lacks robustness. The third comes to be sensor noise. Due to the defects
of sensors and other hardware equipment or environmental factors, the
Traditional identity recognition systems based on individual char-
collected data is likely to be disturbed by a large amount of noise,
acteristics, also known as biometric systems, often only use a single
which affects the recognition accuracy to a large extent. The last is the
biological feature for recognition. It can be said that biometric identi-
non-universality of features. Some users are unable to provide specific
fication is the oldest and most widely used identification method. The
features due to private reasons. For instance, people engaged in manual biometric system usually includes two processes: the registration phase
labor may not be able to provide clear fingerprint images, and the and the recognition phase.
deaf cannot provide audio information for voice recognition. In general, In the registration phase, the system accepts the user’s biometric
the improvement of the accuracy and reliability requirements has been features as input, then preprocesses them and finally extracts features
raised in identity recognition nowadays, unimodal identity recognition from them. The extracted features will be stored in the system database
system cannot meet present demands. The recognition stage accepts the biometrics of users and then
In order to move beyond such limitations, researchers proposed mainly does the feature extraction like the register stage. After finishing
an identity recognition scheme using multi modalities, also known as the feature extraction, the system will put this extracted feature to the
multimodal identity recognition. For example, during the fingerprint database to calculate the matching score and identification.
recognition procedure, it is difficult to extract effective information There is various type of biometrics feature that can be extracted
from unclear fingerprints. The multimodal identity recognition system and are proven useful and reliable. Brief comments about them can be
is capable of obtaining other types of biometric features to make up found in Table 1. In this work, we separate them as face-based modal
for the deficiency. Then, by fusing different kinds of features [4–8], recognition and fingerprint-based modal recognition.
the influence of external noise can also be effectively reduced to an
acceptable extent, while the recognition accuracy can be improved 2.1.1. Face-based unimodal recognition
concurrently. Furthermore, the multi-feature system can also improve Face biometrics has always been one of the crucial biometric fea-
the anti-counterfeiting ability of the system, so as to improve the tures, and its application has extensive areas such as security, educa-
security of the system. the use of multiple biometrics can increase the tion, entertainment, health and law enforcement. Face recognition can
difficulty for intruders to construct different features at the same time. be divided into two types: image-based and video-based.
Generally speaking, the identity recognition system based on multi- Image-based face recognition implements the identification from
modality shows the advantages of high reliability, strong robustness, a single image or frame. Appearance-based face recognition was first
and wide practicability. introduced by Murase and Nayar [13], which detects the face re-
The existing surveys related to multimodality fusion [9–12] mainly gion as a whole and managed to present it to a lower dimensional
focus on the commonly-used biometrical characteristics. As a supple- subspace. Model-based face recognition includes 2D methods and 3D
ment, this review focuses on research based on the multimodality data methods. The most famous 2D model is the Active Appearance Model
fusion to achieve identity recognition, discusses the feature represen- (AAM) [14–16], which is a statistical appearance method by con-
tation of physiological biometrics, behavioral biometrics, and physical trolling a set of shape and gray level parameters. AAM employs an
fingerprint in the field, and reviewed the complex scenario modal iterative matching algorithm by utilizing the relationship between the

695
Z. Qin et al. Information Fusion 91 (2023) 694–712

Table 1
Evaluation of physiological characteristics for unimodal identity recognition.
Type Advantage Disadvantage Application field
Face Convenient and efficient can recognize multiple Get influenced by many factors such as lighting Public management, security and
faces at the same time; no need to direct contact condition camera angle and low resolutions. anti-theft, e-commerce
with the device. High recognition accuracy. transactions, asset protection
Iris Non-contact, clean and avoids the infection of Difficult to miniature hardware devices; expensive Coal industry, access control
diseases; impossible to be copied and modified. and hard to promote on large scale; slow system, judicial security
High reliability recognition process
Ear Stable structure and will not be changed by the Easily affected by posture, occlusion, lighting and Forensic area
age, emotion and facial expressions; small size and other issues; low privacy and medium accuracy
constant color cost less storage capacity
Fingerprint Fast recognition speed, most convenient biometrics; Sensitive to the humidity and cleanliness of the Smartphone unlock, criminal
easy to promote and widely used; low rate of fingers; difficult to image for some people with investigation, household
misjudgement and rejection; easy to operate few or no fingerprints. registration management
Palm print Easily be placed in the correct position of the Uniqueness is not very strong, similarity of human Access control system, clerk
scanner; measurement is weakly influenced by the palms is relatively high attendance management
dirt and scars.

Table 2
Evaluation of behavioral characteristics for unimodal identity recognition.
Behavioral Advantage Disadvantage Application filed
biometrics
Voice Convenient and can be realized by only a simple Voice changes are easily affected by sickness; National social security system
microphone; relatively low cost, easy to easily imitated and stolen by impostors; noise
commercialize. sound around the recognition environment
Handwriting Non-invasive, easy to collect signature data, Easy to be imitated, and the handwriting is Business signature, financial area
relatively zero cost affected by the ink material and the user’s mood
at that time
Gait Low cost of acquisition device, farther collected Relatively low reliability, slow collection speed Video surveillance system
distance, gait is not easy to disguise, convenient to
use

image error and model parameter perturbations. 3D face recognition global features methods such as Binarized Statistical Image Features
uses Three-dimension facial characteristic and features such as surface (BSIF) [50] and Histogram [51]. Deep learning-based approaches use
normal [17–19]. Texture based face recognition uses local feature CNN based model citegalbally2013image,czajka2015pupil such as VGG
descriptors such as Histogram of Oriented Gradients (HOG) [20–22], model [52], ResNet model [53] and InceptionV3 model [54,55].
Local Binary Pattern (LBP) [23] and Scale-invariant Feature Transform The iris feature classification stage is the final stage of the iris
(SIFT) [24,25]. recognition system, classification aims to match the level of the given
Video-based face recognition refers to the use of the dynamic and sample and sample already stored in the system database. Tradition
movement of facial features to do identification and recognition. Set- approaches include k-NN method [56] and SVM method [57]. Deep
based methods [26] treated video or frames as a set of image samples learning-based approaches uses CNN based model [58,59], FCN based
such as frame aggregation [27–29], frame selection [30]. Sequenced- model [60,61], DBN model [62,63] and ResNet based model [64].
based methods [31] use the temporal information from a video. Spatio- Unlike other common biometrics like fingerprint and face, the ear
temporal methods [32,33] utilize both motion information and texture can also be used to do identification. The shape of people’s ears presents
information. its uniqueness just like human facial expressions. The structure of the
In addition to the comprehensive use of faces, there is also some ear includes Scapha, Helix, concha, Lobule, and so on. These plentiful
features can be used for recognition. The ear identification system
biometrics for single facial organs, such as the eyes and ears.
needs two main stages. Detection stage and Recognition stage.
The structure of the eye includes the sclera, pupil, cornea, and iris.
The ear identification system initially needs to locate and detect
Among them, the iris has extensive details that can be used for identifi-
the specific ear images from the original face images. 2D ear detection
cation. Iris recognition gained much attention in different areas such as
has various methods such as active contour model [65], AdaBoost
airplane security, industrial areas, and medical institutes. A traditional
algorithm [66–68], structural ear image [69–71] and Canny edge de-
iris recognition system includes those procedures: pre-processing, iris
tector [72]. Besides, 3D ear detection approach includes Histograms
segmentation, iris normalization, feature extraction, feature selection,
of Categorized Shapes (HCS) [73,74], Entropic Binary Particle Swarm
classification, and decision.
Optimization (EBPSO) [75] and other feature extractions [76,77].
The iris segmentation stage aims to reduce the residual parts and Ear recognition stages denote that the system utilizes the richness
maintain the iris part. Traditional approaches include Geodesic Active of features of the human ear scheme to do identification. Popular
Contour (GAC) methods [34–36], Super-pixel Segmentation (SPS) [37], 2D recognition includes sparse representation [78–81], Discrete Cosine
and Morphological [38]. Also, there are several deep learning-based Transforms (DCT) [82], Gabor features [83–86] and neural network
approaches such as U-Net [39,40], VGG [41], R-CNN [42–44] and other methods [87–89]. Also, there are several recognition approaches for 3D
CNN based model [45,46]. ear images such as Structure from Motion (SFM), Shape from Shading
The Iris feature extraction stage is used to obtain the specific (SFS) [90], Shape-based Interest Point Descriptor (SIP) [91] and Local
features from obtaining the iris region. Traditional approaches include Salient Shape Feature [92].
local feature extraction and global feature extraction. Local feature
extraction uses local features methods such as texture features, statis- 2.1.2. Fingerprint-based modal recognition
tical features, Scale-invariant Feature Transform (SIFT) descriptor [47] The fingerprint is undoubtedly one of the most famous biological
and local binary pattern(LBP) [48,49]. Global feature extraction uses characteristics, owing to its uniqueness and consistency. A fingerprint is

696
Z. Qin et al. Information Fusion 91 (2023) 694–712

Table 3
Summary of device characteristics used for unimodal identity recognition.
Radio frequency Methods Advantage Disadvantage
fingerprint
Transient-based RF Bayesian Step Change Detection High detection rate Complex computation
fingerprint
Bayesian Ramp Change Detection
Mean Change Point Detection
Phase Detection
Steady-based RF Passive Radiometric Device Identification Simplify the modulation scheme, resilient to Possible to be effected by the
fingerprint System mobility traveling nodes
Other approaches Power Spectrum Density Uniqueness to wifi devices, effective to only work well on IEEE 802.11a
amplitude transient detection signals

an impression left by the friction ridges of a human finger. Fingerprint statistical measurements for classification. It contains global features
representation schemes can be classified as below. and local features. Global features [128–132] describe the global trait
Global feature based representation represents the global pattern of the handwriting images. Local features [133] describe the key points
of the ridges in a fingerprint and includes global ridge-line frequency and details of the handwriting images.
orientation images and core points, etc. The structure features presents local structure and topology of char-
Local feature based representation [93–97] pays much attention to acteristics on writing such as loops, dots and edges. There are sev-
the local minutiae part of the fingerprint such as local ridge orientations eral structural features [133,134] such as fragments, graphemes and
and local ridge frequency. These distinctive representations outperform strokes.
the global counterparts. Model-based features extract by deep learning-based model from the
Transform-based representation [98–100] uses transform methods image data. The common model for feature extraction is CNN based
such as Digital Wavelet Transform features or Digital Cosine Transform model [135–138] and RNN based model [139,140].
to represent the fingerprint.
Palm print is the most frequently used body part in people’s daily 2.2.3. Gait-based identity recognition
lives, like the fingers, and has a wide variety of biometric features Gait-based recognition is an active behavioral biometric recognition
for identity recognition. The principal lines of palm prints include the that aims to identify people by his or her walking posture. Due to its
thenar region, interdigital region and hypothenar region. There are sev- ease of disguise and noncontact, gait recognition has been used in the
eral feature extractions such as line based approaches, which develop field of surveillance. There are two types of gait recognition. One is
edge detection methods to extract palm lines [101–107], subspace- vision-based and the other is sensor-based gait recognition.
based approaches, which involve principal component analysis (PCA), Vision-based gait recognition refers to the use of cameras to acquire
independent component analysis (ICA), linear discriminant analysis images of human walking actions. Vision-based gait feature extraction
(LDA) [108–113] respectively and statistical approaches including can be classified into Model-based methods and model-free methods.
global statistical [114] and local statistical approaches [115–117]. Model-based methods use body composition or motion model, such as
structural model [141] and motion model [142], to describe walking
2.2. Behavioral biometrics
pattern. Model-free methods extract the representation of the mo-
tion gait images without any model. The common representation are
In addition to the extensive use of human physiological charac-
appearance-based representation [143–145], distribution-based repre-
teristics, some human behavioral characteristics can also be used for
sentation [146] and transformation-based representation [147].
identification. In this subsection, we present three of the most studied
Sensor-based gait recognition refers to the use of sensor equip-
behavioral biometrics as shown in Table 2: voice, handwriting and gait
ment such as inertial sensors and wearable sensors. After collecting
identity recognition.
the sensor data of gait signals, normally it needs data extraction,
cycle segmentation and then conducts gait identification. Recently deep
2.2.1. Voice-based identity recognition
learning has gain extensive success in gait recognition such as CNN
Human’s voice is unique and can provide biometrics characteris-
model [148–150], LSTM model [151,152] and GAN model [153].
tics for systems to determine identity. The voice identification system
includes voice collection, voice preprocessing, feature extraction and
feature classification stages. 2.3. Physical fingerprint
With the development of machine learning and deep learning, more
and more ML/DL methods get involved in voice identification. Machine With the emerging rise of IoT, device identification has become
learning approaches such as Gaussian mixture model [118–120], deci- increasingly important and necessary. In this section, we briefly in-
sion Tree [121,122], support vector machine [123], k-nearest neigh- troduce the radio frequency (RF) fingerprint and some other device
bor [123] and Bayes method [124]. Deep learning approaches such as authentication solutions as shown in Table 3.
CNN [125], RNN [126] and autoencoder [127].
2.3.1. Radio frequency fingerprint
2.2.2. Handwriting-based identity recognition Physical devices can be identified by unique radio circuitry. Radio
Handwriting is a signature for human identification. Even the same fingerprint is used to identify physical wireless devices by extracting
person cannot write two identical handwritings and the style and shape specific structures in the electromagnetic waves emitted from the trans-
of people’s handwriting are influenced by some subjective factors like mitters. RF fingerprinting includes three main categories: transient-
people’s moods and object factors like the material of the signing pen. based RF fingerprinting, steady-based RF fingerprinting and other ap-
Therefore, it is necessary to identify and recognize handwriting. Hand- proaches.
writing identification has been adopted by commercial transactions and Transient-based radio frequency fingerprint identification technol-
some governments. ogy uses the switch from off to on status of the transmitter that occurs
There are three types of features: statistical features, structural fea- before the actual data transmission of the signal. These methods include
tures and model-based features. Statistical features are geometric and Bayesian Step Change Detection (BSCD) [154], Bayesian Ramp Change

697
Z. Qin et al. Information Fusion 91 (2023) 694–712

Fig. 3. Diagram of the multimodal identity recognition process.

may collect incomplete fingerprint data, if collector’s fingerprint is not


Fig. 2. Literature statistics of unimodal identity recognition.
cleaned before scanning, or if the finger is molting.
Not convenient: Unimodal identification requires users to fully
Detection (BRCD) [155], Mean Change Point Detection (MCPD) [155], display this modality in front of the device or sensor. However, this
Phase Detection (PD) [156] and Transient Envelope [157]. ideal situation is difficult to achieve in some realistic scenarios. For
The steady-state-based method focuses on the unique features ex- example, the outbreak of the covid-19 has forced people to wear masks
tracted from the modulation part of the signal. Some approaches were while traveling, but when performing face recognition, they must take
proposed such as Passive Radiometric Device Identification System off the masks. One potential alternative solution is iris recognition,
(PARADIS) [158], which used five specific signal features: frequency nevertheless, the accuracy of recognition will also have been affected
error, synchronized correlation, phase errors, I/Q origin offset, and negatively if the user happens to wear contact lenses.
magnitude. Not universal: Some users may be not able to provide specific
Some of the other methods usually utilize other attributes of the biometric features due to personal factors. For example, some disabled
signal and logical layer, such as measuring the power spectrum density people who have lost their tongue cannot provide the voice information
(PSD) [159] to identify wireless devices. for voice recognition. Besides, gait recognition and iris recognition do
not work for those people who have some physiological disease with
2.3.2. Other device recognition solutions legs and eyes.
There are some other solutions to authenticate the device by tak- Easy to forge: Due to the singularity of the modalities used for
ing full advantage of the physical trait. Chen et al. [160] proposed identification, it is likely that an imposter is able to spoof identification
a lightweight device authentication protocol speaker-to-microphone systems by imitating or stealing certain biometric characteristics of
(S2M) by using the frequency response of a speaker and a microphone others. For example, a face recognition system will be fooled by some
from two wireless IoT devices as the acoustic hardware fingerprint. Lin specific impostors who wear 3D mask.
et al. [161] introduced a device-free authentication system to validate
human identity using WIFI signals via only utilizing a laptop and a 3.2. Motivation for multimodality
Commodity Off-The-Shelf (COTS) router. Aneja et al. [162] found that
device fingerprints can be identified via Inter Arrival Time (IAT) for In addition to the modalities introduced in Section 2, some unique
packets and then propose four models to capture packet information signals possessed by the human body can also be collected by various
on the router based on the layers and methodology used. sensors equipped on smartphones and wearable devices. For exam-
According to statistics in Web of Science database, the number of ple, the information data collected by gravity sensors, gyroscopes,
papers published on different unimodal identification methods in the touchscreens and other devices in smartphones can reflect the unique
past five years is shown in Fig. 2. habits of cellphone users. Wearable devices can detect the user’s heart-
beat, gait and other behavioral characteristics. Next-generation sensors
3. Motivation: multimodal identity recognition design can even collect electroencephalography (EEG) and electromyogram
(EMG) for identification. In our survey, the identification accuracy
This section begins with a discussion of the limitations of unimodal of using these modalities alone is not high, but when they are used
recognition techniques, followed by a brief introduction to common for multimodal identity recognition tasks, they can effectively en-
frameworks and evaluation metrics for multimodal identity recogni- hance recognition and greatly reduce the constraints of the recognition
tion. environment.
Based on the above, the use of multimodality for identity recog-
3.1. Limitation of unimodal recognition nition is very necessary and urgent. It can solve the problems of
unimodal recognition mentioned above. First, when multimodal fea-
tures are used for recognition, the combination of them can weaken the
Although the current unimodal recognition technology has a certain
effect of feature overlap and improve the recognition accuracy. Second,
degree of accuracy and has been fully utilized in some fields, there are
some of modal feature tend to change dramatically in terms of the
still some drawbacks and limitations.
spatio-temporal variability of features. When a modality changes, the
Difficult to obtain intact data: Typically, unimodal data is ac- importance of that feature in the recognition system decreases, which
quired through the sensor, but when the sensor is not regularly main- can effectively counteract the degradation of performance. The third
tained or modality is not sufficiently intact, the data may be disturbed point is that when multiple modalities are introduced when observing
by noise during acquisition. For example, a fingerprint scanning device and describing the same object, some complementary information can

698
Z. Qin et al. Information Fusion 91 (2023) 694–712

be obtained to obtain more robust results in the prediction process. 3.4. Description of evaluation metrics
And in a scenario where a modality is not easily accessible or falls into
failure, the rest of the modal information can continue to provide the A brief overview of the commonly used evaluation metrics for
basic information for prediction and avoid system failure. Finally, the identity recognition is presented below.
multimodal identity recognition system has better defense performance
for various attacks. It is difficult for a single tampered modality to play True positive (TP): The system correctly predicted positive samples
a significant role in many modalities to influence the final decision, as positive.
and even during the identification process, the system can also locate True negative (TN): The system correctly predicted negative sam-
the modal position of the attack by analyzing the abnormal modal ples as negative.
characteristics.
False positive (FP): The system wrongly predicted negative sam-
3.3. Framework and technology ples as positive.
False negative (FN): The system wrongly predicted positive sam-
In recent years, a large number of multimodal models have
ples as negative.
emerged, which have been widely applied due to its robustness and
reliability when encountering modality changes. A typical diagram Recall: The recall for one class c is the proportion of predicted
of the multimodal identity recognition system is given in Fig. 3. positive examples in all the positive samples (True positive and false
Firstly, the information is collected from the modalities by various negative).
devices to get the initial data. After that, pre-processing operations
𝑇 𝑃𝑐
are performed on the data, which generally include abnormal data 𝑅𝑒𝑐𝑎𝑙𝑙𝑐 = (1)
cleaning, noise cleaning and other operations. Then the corresponding 𝑇 𝑃𝑐 + 𝐹 𝑁𝑐
feature extractors of different modalities are used, either by using
Accuracy (ACC) : The rate of the correctly accepted and correctly
manually designed algorithms or by using the most popular deep
rejected matches in the whole system.
neural networks to extract features. Finally, multiple modal features are
fused to form a multi-dimensional information feature database with 𝑇𝑃 + 𝑇𝑁
𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = (2)
multiple users to provide subsequent classification support. When a 𝑇𝑃 + 𝑇𝑁 + 𝐹𝑃 + 𝐹𝑁
user needs to be identified and authenticated, the data is also collected
Precision (PRE): The percentage of true positive examples in the
from the specified modalities, and after the pre-processing operation,
sample predicted by the system to be positive.
the unique modal features obtained are compared with the feature
data in the database by the matching algorithm to finally obtain the 𝑇𝑃
𝑃 𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = (3)
identification results. It is worth noting that, the repeated steps here 𝑇𝑃 + 𝐹𝑃
are used to display different modes, such as mode A to mode Z. These
Unweighted average recall (UAR) : In order to reduce the
three processing steps are usually required to obtain data features
weighted deviation, the accuracy of the balance is called the un-
for training and recognition. However, although there are the same
data processing steps, each modality will generally have a variety of weighted average recall rate, which is the average of the positive recall
different algorithms for each step. rate and the negative recall rate.
The repeated steps here are used to display different modes, such as ∑|𝐶|
𝑟𝑒𝑐𝑎𝑙𝑙𝑐
mode A to mode Z. These three processing steps are usually required 𝑈 𝐴𝑅 = 𝑐=1 (4)
|𝐶|
to obtain data features for training and recognition. However, although
there are the same data processing steps, each modality will generally False acceptance rate (FAR) : The rate of the system wrongly
have a variety of different algorithms for each step. accept the false subjects as correct subjects, which measures the per-
The process from the acquisition of information to the final recogni-
centage of wrongly accepted matches.
tion result is essentially the processing of feature information, in which
the process generally contains these important steps: (1) representation 𝑁𝐹 𝐴
𝐹 𝐴𝑅 = × 100% (5)
of information: representing and summarizing multimodal data in a 𝑁𝐼𝑅𝐴
way that exploits the complementarity and redundancy of multiple where NFA refers to the number of times the wrong object is judged as
modalities; (2) fusion: combining information from multiple modalities the correct object, and NIRA is the number of times the wrong object
for classification or prediction. is judged.
Multimodality is a representation of data using information from
False reject rate (FRR) : The ratio of the system wrongly reject
multiple such entities, often with multiple representations, which can
the correct subjects as false subjects, which measures the percentage of
be an image, a piece of audio, a piece of text or other forms. There
false rejected matches.
are many difficulties: how to combine heterogeneous data; how to deal
with different types of noise; how to deal with missing or anoma- 𝑁𝐹 𝑅
𝐹 𝑅𝑅 = × 100% (6)
lous data, and which are more efficient ways to represent modal 𝑁𝐺𝑅𝐴
features. Currently, most images are represented using features learned where NFR refers to the number of times the correct object is judged as
from trained convolutional neural networks or other network architec- the wrong object, and NGRA is the number of times the wrong object
tures [163]. In the audio domain, many artificially designed acoustic is judged.
features have also been replaced by deep neural networks driven
by speech data [164] and recurrent neural networks for language Equal Error Rate (EER) : Equal Error Rate (EER) is the point on the
analysis [165]. ROC curve that corresponds to the same probability of misclassifying
Multimodal fusion is the combination of information from multiple a positive sample or a negative sample. This point is obtained by
modalities that is used to obtain a classified category or predicted intersecting the ROC curve with the diagonal of the unit square.
outcome, with techniques dating back to work done 32 years ago [166].
Currently, the use of neural networks is a very popular approach to 4. Fusion methods for multimodal identity recognition
deal with multimodal fusion. However, in recent deep neural network
models, the boundary between the representation of modal features and Multimodal fusion is one of the key research points. It can process
fusion has gradually blurred. Therefore, in the subsequent discussion and fuse various types of data information obtained from different
section of this paper, we will focus on the introduction and discussion fusion methods and integrate it into a stable multimodal representa-
of multimodal fusion techniques. tion. In this process, the information from two or more modalities is

699
Z. Qin et al. Information Fusion 91 (2023) 694–712

Fig. 4. Illustration of four fusion levels for multimodal identity recognition.

Table 4
Related work of data fusion in feature level for identity recognition.
Reference Year Modalities Methods Dataset Metrics
A feature selection algorithm to
[167] 2012 Audio reduce acoustic information and FAU Aibo Corpus UAR:45.08%
one classifier used to analyze
Feature extraction with Linear
Face and
[168] 2013 Discriminant Analysis and Feature – ACC:97.50%
signature
selection using GA
FVC_2000 DB1A EER:6.49%
[169] 2018 Fingerprint Multi-level feature fusion
SDUMLA_FP EER:7.29%
PEC_FP EER:4.23%
The cepstral coefficients and
[170] 2019 Voice and face statistical coefficients for voice and – EER:2.81%
Eigenface and SVM for face
PRE:99.96%
[171] 2019 Echo and face Deep Neural Networks –
Recall:88.84%
The nearest neighbor algorithm CASIA V1.0 (iris) ACC:98.33%
[172] 2021 Iris and fingerprint and speedup robust feature for and CASIA V5.0 FAR:0.74%
feature extraction (fingerprint) FRR 0.93%

integrated and it can be used to complete the classification or regression iris and fingerprint are fused. In the process of feature extraction, the
task. nearest neighbor algorithm and speedup robust feature (SURF) is used
For different fusion levels of multimodal identity recognition, the on both fingerprint and Iris data. Then, in the feature selection process,
current fusion methods can be divided into four parts: feature level the extracted features are both optimized by the GA algorithm. Then
fusion, match score level fusion, decision level fusion, and rank level the iris and fingerprint data are trained by the ANN algorithm.
fusion, which will be discussed in detail in this section. Jaswal et al. [173] first combined backtracking search algorithm
and 2D2LDA to fuse fingerprint and palmprint information at feature
4.1. Feature level fusion level. In finger dorsal patterns identification, Attia et al. [174] combine
the information extracted from the finger dorsal surface image with the
The process of combining features vectors obtained through differ- major and minor knuckle pattern regions in feature level.
ent sensors, different biometrics to obtain a new feature vector and fed In Bokade’s research work [175], three modalities of face, ear, and
it into a machine learning classifier for further classification, is called palmprint were fused to form a feature vector with reduced dimension.
feature level fusion. The process of feature level fusion is shown in The dimension reduction and feature extraction were achieved by prin-
feature level fusion in Fig. 4. cipal component analysis and extracted feature vectors were further
Feature level fusion contains more biological information in theory, concatenated to form a combined feature. The combined feature was
but because the dimension and size of feature vectors of different modes further used for the training and inference process. The GAR has been
may not be compatible, low compatibility may result in poor inference improved compared with unimodal approaches.
performance, so normalization is needed before the fusion process. More applications of feature level fusion is shown in Table 4,
Poonguzhali et al. [169] proposed a two-level feature extraction and where the vacant part in the column of the dataset indicates that the
fusion method to extract and fuse fingerprint features at two levels corresponding paper uses self-collected non-public datasets.
of features. The level 1 feature, level 2 feature, and fused two-level In summary, the advantages of feature-level fusion are that it
features are evaluated at three different experiment stages. Then the achieves considerable information compression, facilitates real-time
three kinds of features were further fused to form a fourth kind of fused processing. Because the extracted features are directly related to the
feature. The result shows the combined feature vectors are more ade- decision analysis, the fusion results maximize the feature informa-
quate in the inference process in terms of EER and DI metrics. Kumar tion required for decision analysis. Feature-level fusion generally uses
et al. [172] proposed a feature level method, in which biometrics of distributed or centralized fusion systems.

700
Z. Qin et al. Information Fusion 91 (2023) 694–712

Table 5
Related work of data fusion in match score level for identity recognition.
Reference Year Modalities Methods Dataset Metrics
Image and Weighted-summation operation for
[176] 2008 – EER:2.13%
voice matching scores
Different SIFT and ORB CASIA Multispectral
[177] 2014 features of features extraction Palmprint Image EER:0.36%
palm veins based Database V1.0
NIST BSSR1 EER:0.75%
Face+fingerprint, Particle Swarm Optimization and
[178] 2015
face+speech Genetic Algorithm XM2VTS EER:1.85%
BANCA EER:10.4%
FAR:1.5%
K-means clustering, CASIA-IrisV4
FRR:3.89%
[179] 2020 Iris and fingerprint Decision Tree
and Fuzzy Logic FAR:2.5%
CASIA-FingerprintV5
FRR:5%

Table 6
Related work of data fusion in decision level for identity recognition.
Reference Year Modalities Methods Dataset Metrics
Acoustic and
Naive-Bayes, Support Vector
[167] 2012 linguistic FAU Aibo Corpus UAR: 41.74%
Machine and Logistic Model Tree
features
Fingerprints
Multi-finger feature encrypted by
[180] 2015 of different – FAR:0.59%
hash function
fingers
Face and Joint Encryption and Compression
[181] 2016 FEI and NIST ACC:92%
fingerprint technique for biometric data
Face and Wavelet sub-bands, Nearest palmprint database
[182] 2020 ACC: 98.45%
Palmprint Neighbor Classifier ORL+Yale, IIT-Delhi

4.2. Match score level fusion To sum up, the fusion of match scores can reduce the influence of
various biometric data, complex recognition processes, and underlying
Match score level fusion is also called confidence level fusion. It is diversity. At the same time, it can preserve the measurement of indi-
different from feature layer fusion and is carried out after matching. vidual feature similarity, which can be used in the recognition system
The match module in the identity recognition process will deliver a of legal and illegal users.
match score after the match process, as shown in match score level
fusion in Fig. 4. This score can reflect the similarity between the
inputted feature vector and the existing feature vector in the database. 4.3. Decision level fusion
By fusing the match scores of different modalities, the input object can
be recognized. The fusion process can be completed by a variety of
In the decision level fusion, different biometrics are classified in-
statistical learning techniques.
dependently to obtain the classification decision (accept/reject). These
Normalization operation is often needed before matching score level
decisions from different modalities are fused to make the final decision,
fusion to avoid incompatibility caused by different measurements of
as shown in decision level fusion in Fig. 4. The fusing process can
match scores of different modalities. Through a series of Normalization
methods, match scores from different modalities can be converted to be implemented by a variety of methods, such as AND rule, OR rule,
the same arithmetic range. Majority voting, decision table, Bayesian decision, etc [189].
Dalila et al. [178] proposed to use a hybrid GA-PSO approach in Three kinds of decision level fusion methods were proposed [182],
the process of match score level fusion. By the weighted sum rule, naming Local Decision Fusion method (LDF), Global Decision Fusion
the hybrid GA-PSO optimized the weights which are associated with method (GDF), and Local-Global Decision Fusion method (LGDF),
the modalities in the fusion process to obtain the ideal ERR values. which is performed by utilizing both local and global information.
The result demonstrated that the hybrid approach outperforms the The information was extracted by utilizing wavelet sub-bands of high
traditional approaches and both GA and PSO approaches in terms of and low frequency. Then the nearest neighbor classifier was used to
ERR in three different datasets. classify the different frequencies of sub-bands. The resulting classes are
Aizi et al. [179] proposed a match score level fusion strategy to combined by means of weighted majority voting. The LDF and GDF
fuse the iris and the fingerprint. These two modalities were processed methods realized 9.4%, 11%, 10.6%, and 11.5% of average recognition
separately to form score vectors. Then the fusion is applied at the score rates increase than unimodal. The LGDF method outperforms the
level. For each modality, the score ranges are split into three zones of feature-score hybrid fusion, and a higher best average recognition rate
interest corresponding to the proposed identification method. Then two
of 6.75% is achieved.
kinds of methods were used in the region of interest. The first is based
The applications of decision level fusion are shown in Table 6.
on the decision tree and the weighted sum (BCC), and the second is
based on the fuzzy logic method(BFL). Because of the high flexibility of decision level fusion in information
Sandhya et al. [183] proposed a match score level fusion method processing, the system has low requirements on data transmission band-
for protecting fingerprint and palmprint templates. A binary vector is width, which can effectively reflect the different types of information
generated for each biometric and the match score of each biometric is from each side of the environment and target and process asynchronous
calculated. Then score-level fusion is performed with T-operators. information. However, the decision level fusion ignores the low-level
The applications of match score level fusion are shown in Table 5. interactions between multiple modalities.

701
Z. Qin et al. Information Fusion 91 (2023) 694–712

Table 7
Related work of data fusion in rank level for identity recognition.
Reference Year Modalities Methods Dataset Metrics
Borda Count, Weighted Borda
NIST BSSR1, IITD
Multiple Count, Highest and product of
[184] 2010 touchless palmprint RR:99.41%
palmprints ranks, Bucklin Majority Voting, and
database
a new nonlinear approach
FAR:0.05%
[185] 2011 Palmprints Logistic Regression PolyU database
EER:0.01%
Borda Count and Logistic Kinect gait and face
[186] 2017 Gait, face ACC:96.67%
Regression datasets
Highest Rank, Borda Count,
Different Sfax-Miracl hand
[187] 2017 Logistic Regression/Weighted RR: 99.04%
fingers database
Borda Count
Social Weighted Borda Count, A social interaction
[188] 2020 behavioral Highest database of 241 ACC:99.45%
traits Rank method Twitter users

4.4. Rank level fusion due to the high requirements for data preprocessing and feature ex-
traction, the cost of decision-level integration is relatively high, which
Rank level fusion is another specific type of decision level fu- is also an existing shortcoming, which needs to be further studied and
sion [12]. Rank level fusion is the process of reordering the rank list solved.
from different unimodal match modules, thus making a final decision, Hybrid fusion tries to adopt the advantages of the two kinds of
as shown in rank level fusion in Fig. 4. The reorder process can be fusion above in a common framework. And it has already been used in
achieved by several methods, such as Borda Count, Weighted Borda the fields of multimodal speaker identification and multimedia event
Count, Bucklin Majority Voting, etc. Since the rank list obtained from detection.
unimodal match modules is the rank of possible objects registered in
the database, the compatibility problem of different modalities can be 5. Application: Multimodal combination
avoided.
Rahman et al. [186] proposed a face extraction method based on As mentioned above, multimodal identification technologies have
the HOG feature. Then K-nearest neighbors algorithm is used to classify a very important role in several domain scenarios. According to the
the samples by both face and gait features. Then the rank level fusion modal features used in the real environment, identification technolo-
is performed by Borda count and logistic regression approaches. The gies can be classified into the following two basic types: biometric
two kinds of approaches achieved 93.99% and 96.67% of accuracy. The identification based on biometric features and device identification
applications of rank level fusion are shown in Table 7. based on hardware physical features. In recent years, based on the
Sing et al. [190] proposed a rank level fusion strategy for creating research and application of biometric features, especially multimodal
fuzzy ranks with Gaussian function based on the confidence of clas- biometric features, a variety of multimodal biometric systems with
sifiers. This method can reflect associations between the outputs of different features and different fusion mechanisms have been proposed
different classifiers. The fuzzy ranks are then fused by the confidence and developed. At present, the research on biometric recognition is
factors of the classifier to produce the final result in the recognition mainly focused on the recognition of people. Biometric features can
process. be divided into two categories: one is physiological features, such as
In summary, as Renu et al. suggested [191], rank level fusion is an facial features, fingerprints, and palmprint features, and the other is
adequate option in a multimodal identification system, as problems that behavioral features, such as voice and gait features.
come from normalization and incompatibility are not as prominent as In principle, it is natural that the more features used in recognition,
that in score level fusion, and can fuse more information than decision the deeper the portrayal of the entity and the higher the recognition
level fusion. accuracy. However, from the perspective of practical application, due
As mentioned in the article [192,193], the fusion performed before to various environmental factors, technical factors, economic factors,
matching, such as sensor layer fusion and feature layer fusion, can be etc., the latest research has mainly focused on the fusion of the main
referred to as prior to matching fusion. The fusion after matching, such features rather than adding all features to the recognition system. In
as matching score level fusion, decision level fusion, and rank level this section, we will sort out the above situations separately, describe
fusion, can be called after matching fusion. the latest research progress in recognition scenarios, and summarize the
It is worth mentioning that the fusion methods are also divided into modal combinations commonly used in multimodal identity recognition
early fusion (feature-based), late fusion (decision-based), and hybrid research.
fusion [194].
In early fusion, the low-level information is considered. Features 5.1. Identity recognition based on physiological characteristics
of different biometrics can be effectively extracted and integrated to
boost the performance and achieve a better effect. Although different In many studies, the most commonly used features can be divided
modalities are usually highly correlated, this correlation is difficult to into facial features and hand features.
extract at the feature level. Hinton et al. [195] believe that the corre- Yuan et al. [197] applied full space linear discriminant analysis
lation between the information contained by different data streams is (FSLDA) to the recognition of ear images, face images, and combined
certain at higher levels. Martínez et al. [196] proposed that early fusion ear and face images. A multi-domain expert information fusion scheme
of multimodal data does not adequately demonstrate complementarity based on the rank fusion integration method was proposed [198],
between modalities, but may lead to input from redundant vectors. which mainly integrates the results of different biometric matches by
In late fusion, fusions at a high level are performed. Usually, values using a new rank fusion method based on the characteristics of the face,
obtained by different unimodal are fused to get the final result under ear, and signature. S. Ribaric et al. [199] presented a bimodal biometric
a certain fusion method. As a high-level fusion, the results are of high verification system for physical access control based on the features of
flexibility, strong anti-interference, and good fault tolerance. However, the palmprint and the face.

702
Z. Qin et al. Information Fusion 91 (2023) 694–712

Table 8
Demonstration of sensor-based multimodal fusion methods for identity recognition.
Reference Modalities used Remarks (Pros/Cons)
Vocal cord vibration Limited perceived
[213]
and lip movement distance.
High security, not easy
[214] EEG and keystroke dynamics to be deceived, but small
sample data set.
Password,
The accuracy needs to
[215] keystroke behavior
be improved.
and gesture sliding
Moving steps and Rich data set and accurate
[216]
heart rate recognition rate.
Heart rate, gait and Implicit verification
[217]
breathing without interaction.
Identification of the same
Fig. 5. Statistical display of different multimodal combinations based on human
[212] Lip language person using multiple
features.
languages.

In addition, the most unique feature of facial features is iris informa-


tion. In terms of physiology, this feature is unique, so there are many problem of dimension increase after the fusion of these features by
recognition methods based on this feature. Haripraath et al. [200] in- introducing modularity in attributes. A new multimodal short-term
troduced a multimodal biometric system of iris and palmprint based on memory (LSTM) architecture was described, which seamlessly unified
wavelet packet analysis. The visible texture of a person’s iris and palm- the visual and auditory modes from the beginning of each sequence
print is encoded into a compact two-dimensional wavelet packet coeffi- input [206]. The key idea is to extend the traditional LSTM by sharing
cient sequence to generate a ’feature vector code’ for feature matching. weights not only in time steps, but also in modes. Xin et al. [207]
Chiara Galdi used simultaneous interpreting of image differences from proposed an effective attention guided depth audio face fusion method
different sensors to solve the problem of sensor interoperability [201]. to detect active speakers. Abozaid et al. [170] proposed a multimodal
For hand characteristics, Li et al. [202] used the feature level fusion biometric recognition method based on face and speech recognition
method to fuse the hand biometrics including palmprint, hand shape fusion. Through feature fusion and score fusion, the speech and face
and finger joint print, and named them hand features . An efficient biometric systems are combined into a single multimodal biometric
matching algorithm based on phase correlation function (PCF) was system. A special body conduction sensor throat microphone (TM)
proposed [203]. The algorithm used palmprint and finger knuckle print was studied for joint speech vitality detection (VLD) and automatic
(FKP) to match. Attia et al. [204] use score level fusion method to speaker verification (ASV) [208]. The work in [209] combined the
extract distinctive features from fingerprint and palmprint information advantages of lip movement and speech, captures these two biometric
using PCANet depth learning method. Then multiclass SVM is utilized characteristics simultaneously through the built-in audio device on
to calculate the corresponding matching score. Finally, combine the the smartphone, and integrates them at the data level, solving the
shortcomings of the original system.
matching scores by different rules.
Robert et al. [210] combined facial features with dynamic features
The modality combinations used in the above article are shown
for identity recognition, which can effectively resist the environmental
in Fig. 5. The face is the most used modality (the video is basically
impact when collecting features. A fine tuning of multimodal correla-
taken for human faces), and it can be used for identity recognition
tion in speaker verification system was proposed, which uses sound and
together with almost all other modalities. It can be said that face is
face, but no face is needed after training, and only voice and audio are
one of the most important modalities in the task of identity recognition.
used as speaker verification system [211]. Nawaz et al. [212] studied
Secondly, palm prints, fingerprints, and signatures are also modalities
the relationship between faces and sounds in multiple languages spoken
that have been studied a lot, while ears and iris are often used to assist
by the same group. A security method was presented for verification of
in verification with other modalities. It can also be seen that the most
smart home speakers in the Internet of things using millimeter wave
researched combination is face and voice, including the combination of
(MMW) radar [213] (see Table 8).
audio and video, which is also one of the current research hotspots.
In addition, in practical application scenarios, there was often a lot
In addition to this, the face can be considered a complement to
of identity recognition research based on biometric characteristics ob-
the ear feature when it is combined with the ear together. Also, face
tained through mobile devices. For example, Dee et al. [218] used the
modality is often combined with hand features, signature, and voice sensor’s touch pressure, positioning, and timing functions to develop
features, and this combination of two features is a more common modal biometrics. Gesture keys based on gravity sensors and passwords were
combination at present, which is also related to the current practical used for verification [219]. A fuzzy authentication framework for gait
application scenario. The combination of two modalities can improve information based on a neural network was proposed [220], using a
the accuracy of recognition and increase security, while also taking into 3-D wearable acceleration sensor with orthogonal sensing direction to
account the convenience of use. For the combination of three modalities measure a specific body position. A characterized biometric systems
or even more modalities, it is more often applied in more stringent based on motion and position sensor data is proposed [221]. And
recognition scenarios, while having higher requirements for fusion and a verification scheme combining password, keystroke behavior, and
recognition algorithms. gesture sliding was also proposed [215].
Meanwhile, identity recognition is also an important part of the
5.2. Identity recognition based on sensor data research on wearable devices. Vhaduri et al. [216] proposed a hidden
identity verification mechanism that uses three coarse-grained bio-
In addition to physiological features, dynamic features can also be metric combinations: number of steps moved, heart rate and calorie
used for identification. burn, or metabolic amount. Similarly, an verification mechanism was
One common feature is video or throat movement data of the proposed in [217] based on audio signals of heart rate, gait and
speaker collected by special sensors. Rahul et al. [205] solved the breathing.

703
Z. Qin et al. Information Fusion 91 (2023) 694–712

Finally, the rapid development of sensor technology enables con- network cannot classify the input data correctly and get the wrong out-
sumer electronics (CE) research groups including manufacturers to put result, Lucy. This process is widely used in numerous multimodal
embed various practical sensors into handheld devices. In the next identity recognition frameworks. Here we divide the security of identity
generation CE equipment, in addition to traditional sensors such as recognition into two parts: adversarial attack and adversarial defense.
inertial measurement unit (IMU), camera, fingerprint, and proximity, By introducing some common attack methods and defense measures to
future sensors such as EEG and EMG are also included. A novel multi- analyze the reasons for the attacks and how to remedy them .
modal biometric system was proposed by combining EEG and keystroke
dynamics [214]. This method is not easy to be affected by false signals 6.1. Adversarial attack
and has high security. Behera et al. [222] proposed a new method for
air-based identity recognition by analyzing finger movement and brain In the introduction of identity recognition methods above, deep
activity using sensors in a new generation CE device. The proposed learning is widely used and the identity recognition realized by neural
system in [223] extended finger input verification from touch screens to networks has reached a very high accuracy, which is more robust
any physical surface of IoT devices (such as intelligent access systems to random noise than other machine learning methods [231], but
and IoT devices) based on a touch-sensing technology with vibration in terms of adversarial samples, the neural network cannot achieve
signals. It combines cryptography, behavioral and physiological char- satisfactory results or even achieve correct recognition. The adversarial
acteristics, and surface dependencies to provide a low-cost, tangible, sample [232] is obtained by adding specific noise on the basis of the
and enhanced security solution. original sample. The difference between the two samples is so small
Compared with the static features of the human body, the modalities that humans cannot distinguish the difference with the naked eye. But
that can reflect the human body features collected by sensors are often the adversarial sample will deceive the classifier of the neural network,
rich in diversity and versatility and have better fusion capability at the and use it as a sample input will get the wrong output result. The
fusion level, which can be combined with other multiple modalities. adversarial attack is to use adversarial samples to attack the network,
However, there are still problems such as insufficient differentiation and this kind of attack is difficult to prevent.
of modal data, difficulty in feature extraction, expensive hardware In the past, research on adversarial samples mainly focused on
acquisition equipment, and low universality. interference classification. The methods are mainly embodied as per-
turbing samples in advance, e.g. Nguyen et al. [230] used evolutionary
5.3. Device identification algorithms and gradient ascent to generate deceptive images. The
generated adversarial samples may not be recognized by humans at
The rapid development of the Internet of things involves all as- all, but Deep Neural Networks (DNN) can identify and classify them
pects and effectively promotes the intelligent development of various with high confidence. In this regard, Sabour et al. [233] proposed to
infrastructure fields. However, the frequent use of Internet of things slightly perturb the image in the middle layer of the DNN to generate
devices exposes more vulnerabilities. In the research of the security a new adversarial sample. There is no significant difference between
field of the Internet of things, it is necessary to study the reliable trusted the adversarial sample and the original sample, but the internal repre-
verification of devices, especially in today’s technical environment, it sentation of the adversarial sample is very similar to the interference
is necessary and important to carry out multi-level authentication for image that realizes the perturbation. Goodfellow et al. [234] introduced
highly vulnerable devices. the Fast Gradient Sign Method (FGSM) algorithm which uses network
Sun et al. [224] proposed a unified privacy protection device discov- gradients to generate adversarial samples. The required gradient infor-
ery and verification mechanism for heterogeneous D2D terminals based mation can be calculated by backpropagation of the model. By adding
on identity prefix encryption and ECDH technology. Sharma et al. [225] gradients to the input samples, the loss value is increased during model
proposed a multi-level security mechanism with password based, One training so as to realize the adversarial attack.
Time Password (OTP) based, and certificate-based. Gope et al. [226] Poudel et al. [235] introduced a black box adversarial attack frame-
proposed a lightweight, privacy-protected two-factor authentication work in which the attack model is regarded as a black box. The attacker
scheme for IoT devices based on passwords and Physically Uncloneable can input the public dataset into the model and obtain the output
Functions (PUFs). Zhang et al. [227] can be based on the channel status result, build an alternative model based on the input and output and
information of Wearable Proxy Devices (WPD) and IMD in WBANs, use the adversarial samples generated by the alternative model for
using the special characteristics of received signal strength (RSS) ratio, attacks. Through the above introduction, we know that adversarial
Distinguish legitimate users from attackers. A lightweight device veri- attacks pose a great threat to the application of deep learning, but in
fication protocol named Speaker–Microphone (S2M) is proposed [160] turn Madryin et al. [236] pointed out that deep learning is not mature
that utilizes the frequency response of the speaker and microphone of enough from the perspective of the robustness of neural networks and
two wireless IoT devices as an acoustic hardware fingerprint. a Projected Gradient Descent (PGD) attack method that uses local first-
order information is proposed to prove his view. Croce et al. [237]
6. Security analysis for learning-based identity recognition proposed to test the existing adversarial defense performance through
adversarial attacks. The article introduced two variants based on the
Most of the identification methods mentioned above use relevant above PGD attack, and then combined the proposed new attack with
knowledge in the field of machine learning. However, the above meth- the existing two attacks to form an attack set for testing the robustness
ods rarely take into account security issues while providing identifica- of the adversarial defense.
tion services. Research shows that they are vulnerable to adversarial Based on the fact that there was no effective method to calculate the
attacks [228–230], which will cause errors in the identity recognition slight perturbations in the adversarial sample at that time, Moosavi-
results. Dezfooli et al. [238] proposed the DeepFool algorithm as a tool to
As is shown in Fig. 6, it describes the process of how adversarial accurately calculate the adversarial perturbations, and thus evaluate
attacks work. Researchers obtain some relevant features of Mary, such the robustness of the DNN classifier to the perturbations. From the
as her facial information, audio signals, and text data. Those features article, we can see that appropriate fine-tuning based on DeepFool eval-
are used to be the input samples. However, the attacker adds some uation results can improve network performance. Since the DeepFool
random noise to the samples, which results in some new adversarial algorithm can only calculate against perturbations for a single image,
samples. Then the attacker uses these adversarial samples as input data on this basis Moosavi-Dezfooli et al. [239] proposed an algorithm
for the neural network. Due to the presence of noise, the identification for calculating general perturbations, through which perturbations can

704
Z. Qin et al. Information Fusion 91 (2023) 694–712

Fig. 6. Process of adversarial attack towards learning-based identity recognition. The network above can be regarded as any identification neural network. The yellow and blue
layers of the network are convolution layers used to extract features from input samples, the gray spots are full connection layers used to realize dimension reduction of intermediate
data and the green spots are classifiers which decide the result of output. (For interpretation of the references to color in this figure legend, the reader is referred to the web
version of this article.)

be generated for any image. The experiment proves that The gen- defense. Compared to the attack method, the realization of defense
eral perturbations can be well extended to other network models. Su technology appears to be more open. Defense technology based on
et al. [240] studied an adversarial attack method that can achieve various methods has been proposed to resist adversarial attacks. These
network interference by modifying only one pixel in extreme cases. methods have significantly enhanced the robustness of the network and
Therefore, a one-pixel adversarial perturbations generation method achieved good performance.
based on a differential evolution algorithm is proposed. The attack Papernot et al. [246] proposed a defense mechanism called defen-
only needs the classification probability label output by the model and sive distillation based on knowledge distillation, which can significantly
does not need internal information such as the network structure of the reduce the success rate of adversarial samples against neural network
model, so this method is also regarded as a semi-black box attack. attacks and increase the difficulty of generating adversarial samples.
As one of the adversarial attacks, a Decision-based attack only needs After the introduction of defensive distillation, the generation efficiency
top-1 labels returned by the target network to generate adversarial of adversarial samples under DNN conditions was reduced from 95% to
samples. Based on this, Li et al. [241] first came up with f-mixup 0.5%. In the previous studies on resisting adversarial samples, most of
method to produce adversarial samples in the frequency domain and the methods were used to improve the DNN model. In the previous
then proposed f-attack as a new decision-based attack which achieves section, Madry et al. [236] have introduced the PGD attack method,
SOAT(state of the art) performance. Based on the fact that there are a but unlike the predecessors, they use adversarial attacks for adversarial
few researches on robustness analyses of clustering algorithms against training from the perspective of robust optimization. Training neu-
adversarial noise, Emanuele et al. [242] designed a new black-box ral networks through PGD attacks can greatly improve the network’s
gradient-free attack to fool clustering algorithms. The noise generated performance and improve the network’s resistance to various types of
by a novel optimization algorithm is proved to be effective for super- Adversarial attacks.
vised algorithms as well. Chen et al. [243] were the first to propose Hendrycks et al. [247] adopted self-supervised learning to improve
AoA (Attack on Attention) which is a semantic feature shared by DNNs the uncertainty and robustness of the model. Through research, it is
(Deep Neural Networks). By converting the cross entropy loss into found that the self-supervised method has achieved significant improve-
the attention loss, AoA greatly improved its transferability and could ments in robustness, label damage, and common input damage. Juuti
be easily combined with other transferability-enhancement techniques. et al. [248] proposed the first general technology PRADA to detect
The dataset with 50 000 adversarial samples generated by AOA is model extraction attacks. By analyzing the distribution of users’ con-
named as DAmageNet, which can be considered as a benchmark for tinuous queries, it can identify deviations from the normal distribution
robustness testing and adversarial training. to detect extraction attacks. Experiments show that it can detect all the
In next subsection, we will introduce an adversarial defense method past model extracts attacks.
called defensive distillation. This method has excellent performance With open-set recognition becoming an important aspect of deep
against the current adversarial attacks, but Carlini et al. [244] be- learning, it has been widely used in the real world and proved to be
lieved that this does not mean that defensive distillation significantly vulnerable to adversarial samples. Therefore, it is essential to protect
improves the robustness of neural network, and proposed three new it from adversarial attacks. Shao et al. [249] proposed an OSDN-CAML
adversarial attack algorithms based on different distance metrics (zero (Open-Set Defense Network with Clean-Adversarial Mutual Learning)
norm, two norm, and infinite norm). Neither the original network to defend against adversarial attacks. Extensive experiments have been
nor the defensively distilled network can withstand the attacks of the done to prove the superiority of this method in solving the problem
three algorithms. Sharmin et al. [245] used single-step and multi- above.
step FGSM to attack Artificial Neural Networks (ANN) and Spiking Through the previous work, people have realized that adversarial
Neural Networks (SNN), conducted comprehensive analyses of the two training is an effective means to resist adversarial attacks, but almost
networks based on their performance and proposed a framework for all adversarial training methods in the past have focused on global
generating adversarial attacks based on SNN. Experimental results show defense. For this, Chen et al. [250] proposed the use of smooth ad-
that attacks generated from SNN using this framework are stronger than versarial training (SAT) to solve the problem of specific targets being
those generated from ANN. attacked in actual scenarios. It includes two strategies: smooth distilla-
tion and smooth cross-entropy loss function, which helps SAT achieve
6.2. Adversarial defense the most advanced defense capabilities in different confrontation at-
tacks. Yuan et al. [251] proposed a new method of network defense
In view of the fact that adversarial samples will greatly affect the using adversarial dual network learning of random non-linear image
recognition and classification results of deep learning models, adver- transformation. Random non-linear image transformation is used to in-
sarial attacks have become a major security risk for neural networks. terfere and destroy network noise, and the cleaning network is designed
This has also led more researchers to focus on the field of adversarial to restore the image content destroyed by the graphics transformation.

705
Z. Qin et al. Information Fusion 91 (2023) 694–712

Sheikholeslami et al. [252] introduced a randomization-based sampling complicated circumstances. Most of the lightweight model structures
mechanism for adversarial defense. This method samples the network are hand-crafted, while the performance of the model is completely
middle layer data in the test phase and uses variance minimization dependent on personal ability. We need an automatic method based on
to process sampling results to obtain the sampling probability. Test neural network architecture search to realize the design of a lightweight
samples with high classification uncertainty are regarded as adversarial model.
samples.
In addition, there is another research direction that faces similar 7.2. Future directions
security issues: digital watermarking is the embedding of a set of
information into another data entity, which sounds similar to the Multimodal fusion identity recognition ensures that user informa-
way of fighting attacks. In this regard, Quiring et al. [253] linked tion is safe through multimodal multiple guarantees. At the same time,
the offensive and defensive technology of digital watermarking with it can adapt to application changes in complicated scenes, as well as
the research on the adversarial offensive and adversarial defensive. letting the identification results more accurate. In the future, several
The article shows that the defensive strategy of digital watermark- directions can be further explored.
ing can effectively resist model extraction attacks. Similarly, related
technologies that strengthen machine learning in adversarial defense The convenience and stability of modal acquisition. Some biometrics
have achieved good results in defending against watermarking oracle such as voice, heart sound and ECG, are easily disturbed by environ-
attacks. Akhtar et al. [229] proposed a special framework for the ment, noise and other factors. External interference will cause the loss
general adversarial perturbations in [239], which is used to detect of information, further affecting the accuracy of the recognition result.
general perturbations in images and remove image perturbations to Based on this, flexible wearable devices can be designed to collect
achieve accurate network recognition. The framework uses a Support and improve the robustness of the feature acquisition system. Besides,
Vector Machine (SVM) to detect perturbations and restores the image physical devices are the main source of feature acquisition, so it is
through a Perturbation Rectifying Network (PRN). Among the various significant to choose. Selecting equipment with high stability, high
defense methods proposed above, adversarial training has proved to cost performance, low energy consumption and low noise interference
be the most effective method so far, but Wang et al. [254] found will help improve the accuracy of recognition and certification at
that some correct samples that were misclassified during the training the hardware and peripheral levels. Especially for some elders, these
process had a significant impact on the defense effect. Based on this, equipment must be convenient to use, quick to transport and provide
a method called Misclassification Aware adveRsarial Training (MART) an excellent user experience.
is proposed, which distinguishes misclassified and correctly classified
Different processing approaches for different modes. Different modali-
samples during the training process.
ties are presented in various ways, containing different semantic infor-
mation. In this paper, we present three various models. The approach to
7. Discussion
extracting semantic information more effectively from modals remains
a problem. Finding a general model for different modes or raising
7.1. Challenges
corresponding processing methods in terms of their characteristics.
These research directions need to be further explored.
While dealing with data from multiple information sources in var-
ious complicated scenes, it is likely to face the challenge of low- The selection of fusion strategies. In this paper, we describe four
quality data, including data incompleteness, data inconsistency, limited kinds of fusion methods, feature level fusion, match score level fusion,
computing resources, and so on. decision level fusion, and rank level fusion. However, there has no
Data Incompleteness. In some cases, the difficulty to collect samples unified standard for the selection of a fusion strategy. Whether there
of certain categories results in a small-scale dataset. The approaches to exist differences in fusion methods cor responding to biological charac-
resolving the problem include migration learning, data enhancement, a teristics and physical equipment characteristics is worth discussing. In
priori knowledge, and so on. For example, the idea of transfer learning addition to the existing several fusion strategies, other levels of fusion
can be adapted to train a neural network with larger datasets of methods that can better combine the semantic information and improve
similar fields, and then fine-tune the pre-training network with smaller the recognition results, or deal with low spatial resolution, reasonable
collected data sets. Data enhancement can also be applied to rotate, design of evoked scenarios, and small sample size are also required
translate or generate a generic adversarial network (GAN) to expand to be further explored. Adaptive fusion, soft biometrics and quality
the scale of the training datasets. information fusion, user-specific fusion, or online learning-based fusion
might be promising directions.
Data Inconsistency. While in the actual training process, the data
under some specific situations are difficult to collect, so the alternative Multimodal security and privacy. Multi-biometrics has contributed
is used to replace the ones. Thus, this method might lead to a mismatch significantly to enhance the identity recognition performance. At the
of the data during the training and testing phase. At the same time, same time, it is likely to invade and disclose privacy. The high and
due to limitations of objective factors, data inconsistency is also likely rising amount of multi-biometrics used in identity recognition tasks will
to occur in the collection procedure. To solve the problem, more data increase the risk and concerns. A related challenge is to balance secu-
close to the needs can be synthesized through data expansion. Or we rity and privacy in a reasonable way. Specifically, utilizing a privacy
can also unify all data to another distribution according to the actual security scheme may lead to a decline in identity recognition accuracy.
demands. Thus, targeted encryption algorithms and anti-attack methods can be
further investigated in this research field.
Limited Computational resources. Under complicated situations (such
as edge computing scenarios in industrial Internet and mobile Internet), Data feature space compression. Multi-dimensional features can be
computing resources may be limited. Therefore, it is quite necessary to extracted from multi-modal data. In the process of data fusion and
develop lightweight neural networks to meet the requirements. There matching, it is necessary to calculate the most representative features
exists four main approaches, to be specific, the artificial design of for classification and recognition, so as to obtain a group of precise
lightweight neural network model, automatic design of neural network feature combinations. This is aimed to achieve the effect that different
based on neural network architecture search, CNN model compression pattern points are far away and similar pattern points are close. There-
and automatic model compression based on AutoML. The lightweight fore, a cost-effective compression method in feature space during the
model is required to maintain normal inference performance under process of feature matching and fusion is worth to be explored.

706
Z. Qin et al. Information Fusion 91 (2023) 694–712

8. Conclusion [8] D. Ramachandram, G.W. Taylor, Deep multimodal learning: A survey on recent
advances and trends, IEEE Signal Process. Mag. 34 (6) (2017) 96–108.
As a key technique for identification in Mobile Internet and Indus- [9] M. Ghayoumi, A review of multimodal biometric systems: Fusion methods
and their applications, in: 2015 IEEE/ACIS 14th International Conference on
trial Internet of Things, learning-based multimodal identity recognition
Computer and Information Science, ICIS, IEEE, 2015, pp. 131–136.
can combine and leverage abundant modality features to achieve im- [10] I. Olade, H.-n. Liang, C. Fleming, A review of multimodal facial biomet-
proved identification accuracy, compared with unimodal manner. This ric authentication methods in mobile devices and their application in head
review has provided an introduction to the modalities used for identity mounted displays, in: 2018 IEEE SmartWorld, Ubiquitous Intelligence &
recognition as well as its limitations. Moreover, multimodal identity Computing, Advanced & Trusted Computing, Scalable Computing & Commu-
nications, Cloud & Big Data Computing, Internet of People and Smart City
recognition has been discussed, including a presentation and discussion
Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), IEEE, 2018,
of four fusion strategies for multimodal features and a summary of pp. 1997–2004.
multimodal combinations applied to different scenarios. Furthermore, [11] R. Ryu, S. Yeom, S.-H. Kim, D. Herbert, Continuous multimodal biometric
the security concerns of multimodal identity recognition have been authentication schemes: a systematic review, IEEE Access (2021).
analyzed, and so does the current challenges and potential solutions [12] S.K. Choudhary, A.K. Naik, Multimodal biometric authentication with secured
for multimodal identification. templates—A review, in: 2019 3rd International Conference on Trends in
Electronics and Informatics, ICOEI, IEEE, 2019, pp. 1062–1069.
Moreover, by exploiting complementary information between dif-
[13] H. Murase, S.K. Nayar, Visual learning and recognition of 3-D objects from
ferent modalities, it is capable of satisfying the requirements of identity appearance, Int. J. Comput. Vis. 14 (1) (1995) 5–24.
recognition applications in complex scenarios. [14] T.F. Cootes, G.J. Edwards, C.J. Taylor, Active appearance models, IEEE Trans.
Pattern Anal. Mach. Intell. 23 (6) (2001) 681–685.
CRediT authorship contribution statement [15] Y. Weiwei, Face recognition using constrained active appearance model, in:
2009 Third International Symposium on Intelligent Information Technology
Application Workshops, IEEE, 2009, pp. 348–351.
Zhen Qin: Conceptualization, Writing – review & editing, Super-
[16] M.A. Khan, C. Xydeas, H. Ahmed, Multi-model AAM framework for face image
vision, Funding acquisition. Pengbiao Zhao: Investigation, Writing – modeling, in: 2013 18th International Conference on Digital Signal Processing,
original draft, Writing – review & editing. Tianming Zhuang: Investi- DSP, IEEE, 2013, pp. 1–5.
gation, Writing – original draft, Visualization. Fuhu Deng: Methodol- [17] B. Gökberk, M.O. İrfanoğlu, L. Akarun, 3D shape-based face representation
ogy, Writing – review & editing. Yi Ding: Resources, Writing – review and feature extraction for face recognition, Image Vis. Comput. 24 (8) (2006)
& editing, Project administration. Dajiang Chen: Resources, Writing – 857–869.
[18] H. Song, U. Yang, S. Lee, K. Sohn, 3D face recognition based on facial shape
review & editing, Project administration.
indexes with dynamic programming, in: International Conference on Biometrics,
Springer, 2006, pp. 99–105.
Declaration of competing interest [19] B. Gokberk, H. Dutagaci, A. Ulas, L. Akarun, B. Sankur, Representation plurality
and fusion for 3-D face recognition, IEEE Trans. Syst. Man Cybern. B 38 (1)
The authors declare that they have no known competing finan- (2008) 155–173.
cial interests or personal relationships that could have appeared to [20] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection,
in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern
influence the work reported in this paper.
Recognition (CVPR’05), 1, Ieee, 2005, pp. 886–893.
[21] O. Déniz, G. Bueno, J. Salido, F. De la Torre, Face recognition using histograms
Data availability of oriented gradients, Pattern Recognit. Lett. 32 (12) (2011) 1598–1603.
[22] T.-T. Do, E. Kijak, Face recognition using co-occurrence histograms of oriented
No data was used for the research described in the article. gradients, in: 2012 IEEE International Conference on Acoustics, Speech and
Signal Processing, ICASSP, IEEE, 2012, pp. 1301–1304.
[23] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures
Acknowledgments
with classification based on featured distributions, Pattern Recognit. 29 (1)
(1996) 51–59.
This work was supported in part by the National Natural Science [24] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J.
Foundation of China (No. 62072074, No. 62076054, No. 62027827, Comput. Vis. 60 (2) (2004) 91–110.
No. 61902054, No. 62002047), the Frontier Science and Technology [25] D. Huang, C. Shan, M. Ardabilian, Y. Wang, L. Chen, Local binary patterns
and its application to facial image analysis: a survey, IEEE Trans. Syst. Man.
Innovation Projects of National Key R&D Program (No. 2019QY1405),
Cybern. C 41 (6) (2011) 765–781.
the Sichuan Science and Technology Innovation Platform and Talent
[26] J.R. Barr, K.W. Bowyer, P.J. Flynn, S. Biswas, Face recognition from video: A
Plan (No. 2020JDJQ0020, No. 2022JDJQ0039), the Sichuan Science review, Int. J. Pattern Recognit. Artif. Intell. 26 (05) (2012) 1266002.
and Technology Support Plan (No. 2020YFSY0010, No. 2022YFQ0045, [27] C. Ding, D. Tao, Trunk-branch ensemble convolutional neural networks for
No. 2022YFS0220, No. 2019YJ0636, No. 2021YFG0131), and the video-based face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 40 (4)
Medico-Engineering Cooperation Funds from University of Electronic (2017) 1002–1014.
[28] F. Mokhayeri, E. Granger, G.-A. Bilodeau, Domain-specific face synthesis for
Science and Technology of China (No. ZYGX2021YGLH212, No.
video face recognition from a single sample per person, IEEE Trans. Inf.
ZYGX2022YGRH012). Forensics Secur. 14 (3) (2018) 757–772.
[29] F. Mokhayeri, E. Granger, A paired sparse representation model for robust face
References recognition from a single sample, Pattern Recognit. 100 (2020) 107129.
[30] W. Wang, R. Wang, Z. Huang, S. Shan, X. Chen, Discriminant analysis on
[1] A.K. Jain, S.Z. Li, Handbook of Face Recognition, vol. 1, Springer, 2011. Riemannian manifold of Gaussian distributions for face recognition with image
[2] K.W. Bowyer, M.J. Burge, Handbook of Iris Recognition, Springer, 2016. sets, in: Proceedings of the IEEE Conference on Computer Vision and Pattern
[3] D. Maltoni, D. Maio, A.K. Jain, S. Prabhakar, Handbook of Fingerprint Recognition, 2015, pp. 2048–2057.
Recognition, Springer Science & Business Media, 2009. [31] R.E. Haamer, K. Kulkarni, N. Imanpour, M.A. Haque, E. Avots, M. Breisch, K.
[4] M. Singh, R. Singh, A. Ross, A comprehensive overview of biometric fusion, Nasrollahi, S. Escalera, C. Ozcinar, X. Baro, et al., Changes in facial expression
Inf. Fusion 52 (2019) 187–205. as biometric: a database and benchmarks of identification, in: 2018 13th IEEE
[5] G. Muhammad, F. Alshehri, F. Karray, A. El Saddik, M. Alsulaiman, T.H. International Conference on Automatic Face & Gesture Recognition (FG 2018),
Falk, A comprehensive survey on multimodal medical signals fusion for smart IEEE, 2018, pp. 621–628.
healthcare systems, Inf. Fusion 76 (2021) 355–375. [32] S.T. Kim, D.H. Kim, Y.M. Ro, Spatio-temporal representation for face authen-
[6] G. Slanzi, G. Pizarro, J.D. Velásquez, Biometric information fusion for web user tication by using multi-task learning with human attributes, in: 2016 IEEE
navigation and preferences analysis: An overview, Inf. Fusion 38 (2017) 12–21. International Conference on Image Processing, ICIP, IEEE, 2016, pp. 2996–3000.
[7] J. Li, Q. Wang, Multi-modal bioelectrical signal fusion analysis based on [33] F. Hajati, M. Tavakolian, S. Gheisari, Y. Gao, A.S. Mian, Dynamic texture
different acquisition devices and scene settings: Overview, challenges, and novel comparison using derivative sparse representation: Application to video-based
orientation, Inf. Fusion 79 (2022) 229–247. face recognition, IEEE Trans. Hum.-Mach. Syst. 47 (6) (2017) 970–982.

707
Z. Qin et al. Information Fusion 91 (2023) 694–712

[34] V. Caselles, R. Kimmel, G. Sapiro, Geodesic active contours, in: Proceedings of [61] F. Gaxiola, P. Melin, F. Valdez, J.R. Castro, Person recognition with modular
IEEE International Conference on Computer Vision, IEEE, 1995, pp. 694–699. deep neural network using the iris biometric measure, in: Fuzzy Logic Aug-
[35] V. Caselles, R. Kimmel, G. Sapiro, Geodesic active contours, Int. J. Comput. mentation of Neural and Optimization Algorithms: Theoretical Aspects and Real
Vis. 22 (1) (1997) 61–79. Applications, Springer, 2018, pp. 69–80.
[36] M. Mustafa, A.J. Mohamad Faizal, M.R. Mohd Shafry, A. Illiasaak, Implemen- [62] M. Baqar, A. Ghani, A. Aftab, S. Arbab, S. Yasin, Deep belief networks for iris
tation of geodesic active contour approach for pigment spots segmentation on recognition based on contour detection, in: 2016 International Conference on
the iris surface, J. Comput. Sci. 12 (11) (2016) 564–571, http://dx.doi.org/10. Open Source Systems & Technologies, ICOSST, IEEE, 2016, pp. 72–77.
3844/jcssp.2016.564.571. [63] F. He, Y. Han, H. Wang, J. Ji, Y. Liu, Z. Ma, Deep learning architecture for iris
[37] N. Susitha, R. Subban, Reliable pupil detection and iris segmentation algorithm recognition based on optimal gabor filters and deep belief network, J. Electron.
based on SPS, Cogn. Syst. Res. 57 (2019) 78–84. Imaging 26 (2) (2017) 023005.
[38] T.M. Khan, D.G. Bailey, M.A. Khan, Y. Kong, Real-time iris segmentation [64] Y. Lu, H. Pan, Application of iris images in racial classifications based on dilate
and its implementation on FPGA, J. Real-Time Image Process. 17 (5) (2020) convolution and residual network, IEEE Access 7 (2019) 182395–182405.
1089–1102. [65] R. Deepak, A.V. Nayak, K. Manikantan, Ear detection using active contour
[39] W. Zhang, X. Lu, Y. Gu, Y. Liu, X. Meng, J. Li, A robust iris segmentation model, in: 2016 International Conference on Emerging Trends in Engineering,
scheme based on improved U-net, IEEE Access 7 (2019) 85082–85089. Technology and Science, ICETETS, IEEE, 2016, pp. 1–7.
[40] X. Wu, L. Zhao, Study on iris segmentation algorithm based on dense U-net, [66] S.M. Islam, M. Bennamoun, R. Davies, Fast and fully automatic ear detection
IEEE Access 7 (2019) 123959–123968. using cascaded adaboost, in: 2008 IEEE Workshop on Applications of Computer
[41] S. Lian, Z. Luo, Z. Zhong, X. Lin, S. Su, S. Li, Attention guided U-net for accurate Vision, IEEE, 2008, pp. 1–6.
iris segmentation, J. Vis. Commun. Image Represent. 56 (2018) 296–304. [67] L. Yuan, F. Zhang, Ear detection based on improved adaboost algorithm, in:
[42] S.M. Patil, R.R. Jha, A. Nigam, IpSegNet: deep convolutional neural network 2009 International Conference on Machine Learning and Cybernetics, 4, IEEE,
based segmentation framework for iris and pupil, in: 2017 13th International 2009, pp. 2414–2417.
Conference on Signal-Image Technology & Internet-Based Systems, SITIS, IEEE, [68] A. Abaza, C. Hebert, M.A.F. Harrison, Fast learning ear detection for real-
2017, pp. 184–191. time surveillance, in: 2010 Fourth IEEE International Conference on Biometrics:
[43] M. Korobkin, G. Odinokikh, Y. Efimov, I. Solomatin, I. Matveev, Iris segmen- Theory, Applications and Systems, BTAS, IEEE, 2010, pp. 1–6.
tation in challenging conditions, Pattern Recognit. Image Anal. 28 (4) (2018) [69] S. Prakash, U. Jayaraman, P. Gupta, Connected component based technique for
652–657. automatic ear detection, in: 2009 16th IEEE International Conference on Image
[44] P. Rot, Ž. Emeršič, V. Struc, P. Peer, Deep multi-class eye segmentation for Processing, ICIP, IEEE, 2009, pp. 2741–2744.
ocular biometrics, in: 2018 IEEE International Work Conference on Bioinspired [70] S. Prakash, U. Jayaraman, P. Gupta, Ear localization using hierarchical clus-
Intelligence, IWOBI, IEEE, 2018, pp. 1–8. tering, in: Optics and Photonics in Global Homeland Security V and Biometric
[45] N. Liu, H. Li, M. Zhang, J. Liu, Z. Sun, T. Tan, Accurate iris segmentation Technology for Human Identification VI, 7306, SPIE, 2009, pp. 361–369.
in non-cooperative environments using fully convolutional networks, in: 2016 [71] S. Prakash, P. Gupta, An efficient ear localization technique, Image Vis. Comput.
International Conference on Biometrics, ICB, IEEE, 2016, pp. 1–8. 30 (1) (2012) 38–50.
[46] Y. He, S. Wang, K. Pei, M. Liu, J. Lai, Visible spectral iris segmentation via [72] S. Ansari, P. Gupta, Localization of ear using outer helix curve of the ear,
deep convolutional network, in: Chinese Conference on Biometric Recognition, in: 2007 International Conference on Computing: Theory and Applications
Springer, 2017, pp. 428–435. (ICCTA’07), IEEE, 2007, pp. 688–692.
[47] J. Geng, Y. Li, T. Chian, SIFT based iris feature extraction and matching, [73] J. Zhou, S. Cadavid, M. Abdel-Mottaleb, Histograms of categorized shapes for
in: Geoinformatics 2007: Geospatial Information Science, 6753, International 3D ear detection, in: 2010 Fourth IEEE International Conference on Biometrics:
Society for Optics and Photonics, 2007, p. 67532F. Theory, Applications and Systems, BTAS, IEEE, 2010, pp. 1–6.
[48] Z. He, Z. Sun, T. Tan, Z. Wei, Efficient iris spoof detection via boosted local [74] A. Pflug, P.M. Back, C. Busch, Towards making HCS ear detection robust
binary patterns, in: International Conference on Biometrics, Springer, 2009, pp. against rotation, in: 2012 IEEE International Carnahan Conference on Security
1080–1090. Technology, ICCST, IEEE, 2012, pp. 90–96.
[49] H. Zhang, Z. Sun, T. Tan, Contact lens detection based on weighted LBP, in: [75] M.R. Ganesh, R. Krishna, K. Manikantan, S. Ramachandran, Entropy based
2010 20th International Conference on Pattern Recognition, IEEE, 2010, pp. binary particle swarm optimization and classification for ear detection, Eng.
4279–4282. Appl. Artif. Intell. 27 (2014) 115–128.
[50] K.B. Raja, R. Raghavendra, C. Busch, Binarized statistical features for improved [76] P. Chidananda, P. Srinivas, K. Manikantan, S. Ramachandran, Entropy-
iris and periocular recognition in visible spectrum, in: 2nd International cum-hough-transform-based ear detection using ellipsoid particle swarm
Workshop on Biometrics and Forensics, IEEE, 2014, pp. 1–6. optimization, Mach. Vis. Appl. 26 (2) (2015) 185–203.
[51] H. Demirel, G. Anbarjafari, Iris recognition system using combined histogram [77] J. Lei, X. You, M. Abdel-Mottaleb, Automatic ear landmark localization,
statistics, in: 2008 23rd International Symposium on Computer and Information segmentation, and pose classification in range images, IEEE Trans. Syst. Man
Sciences, IEEE, 2008, pp. 1–4. Cybern. 46 (2) (2015) 165–176.
[52] K. Nguyen, C. Fookes, A. Ross, S. Sridharan, Iris recognition with off-the-shelf [78] L. Yuan, W. Liu, Y. Li, Non-negative dictionary based sparse representation
CNN features: A deep learning perspective, IEEE Access 6 (2017) 18848–18855. classification for ear recognition with occlusion, Neurocomputing 171 (2016)
[53] D. Menotti, G. Chiachia, A. Pinto, W.R. Schwartz, H. Pedrini, A.X. Falcao, A. 540–550.
Rocha, Deep representations for iris, face, and fingerprint spoofing detection, [79] R. Khorsandi, S. Cadavid, M. Abdel-Mottaleb, Ear recognition via sparse
IEEE Trans. Inf. Forensics Secur. 10 (4) (2015) 864–879. representation and gabor filters, in: 2012 IEEE Fifth International Conference on
[54] C.-W. Tan, A. Kumar, Integrating ocular and iris descriptors for fake iris image Biometrics: Theory, Applications and Systems, BTAS, IEEE, 2012, pp. 278–282.
detection, in: 2nd International Workshop on Biometrics and Forensics, IEEE, [80] R. Khorsandi, M. Abdel-Mottaleb, Gender classification using 2-D ear images
2014, pp. 1–4. and sparse representation, in: 2013 IEEE Workshop on Applications of Computer
[55] P. Silva, E. Luz, R. Baeta, H. Pedrini, A.X. Falcao, D. Menotti, An approach Vision, WACV, IEEE, 2013, pp. 461–466.
to iris contact lens detection based on deep image representations, in: 2015 [81] R. Khorsandi, A. Taalimi, M. Abdel-Mottaleb, Robust biometrics recognition
28th SIBGRAPI Conference on Graphics, Patterns and Images, IEEE, 2015, pp. using joint weighted dictionary learning and smoothed L0 norm, in: 2015 IEEE
157–164. 7th International Conference on Biometrics Theory, Applications and Systems,
[56] X. Liu, Y. Bai, Y. Luo, Z. Yang, Y. Liu, Iris recognition in visible spectrum based BTAS, IEEE, 2015, pp. 1–6.
on multi-layer analogous convolution and collaborative representation, Pattern [82] T. Ying, Z. Debin, Z. Baihuan, Ear recognition based on weighted wavelet
Recognit. Lett. 117 (2019) 66–73. transform and DCT, in: The 26th Chinese Control and Decision Conference
[57] H. Rai, A. Yadav, Iris recognition using combined support vector machine and (2014 CCDC), IEEE, 2014, pp. 4410–4414.
hamming distance approach, Expert Syst. Appl. 41 (2) (2014) 588–593. [83] K. Soni, S.K. Gupta, U. Kumar, S.L. Agrwal, A new gabor wavelet transform
[58] Y. Du, T. Bourlai, J. Dawson, Automated classification of mislabeled near- feature extraction technique for ear biometric recognition, in: 2014 6th IEEE
infrared left and right iris images using convolutional neural networks, in: Power India International Conference, PIICON, IEEE, 2014, pp. 1–3.
2016 IEEE 8th International Conference on Biometrics Theory, Applications and [84] A. Tahmasebi, H. Pourghassem, H. Mahdavi-Nasab, An ear identification system
Systems, BTAS, IEEE, 2016, pp. 1–6. using local-gabor features and knn classifier, in: 2011 7th Iranian Conference
[59] F. Marra, G. Poggi, C. Sansone, L. Verdoliva, A deep learning approach for iris on Machine Vision and Image Processing, IEEE, 2011, pp. 1–4.
sensor model identification, Pattern Recognit. Lett. 113 (2018) 46–53. [85] A. Kumar, C. Wu, Automated human identification using ear imaging, Pattern
[60] Z. Zhao, A. Kumar, Towards more accurate iris recognition using deeply learned Recognit. 45 (3) (2012) 956–968.
spatially corresponding features, in: Proceedings of the IEEE International [86] B. Arbab-Zavar, M.S. Nixon, Robust log-gabor filter for ear biometrics, in: 2008
Conference on Computer Vision, 2017, pp. 3809–3818. 19th International Conference on Pattern Recognition, IEEE, 2008, pp. 1–4.

708
Z. Qin et al. Information Fusion 91 (2023) 694–712

[87] D. Sánchez, P. Melin, Optimization of modular granular neural networks using [115] J. You, W.-K. Kong, D. Zhang, K.H. Cheung, On hierarchical palmprint coding
hierarchical genetic algorithms for human recognition using the ear biometric with multiple features for personal identification in large databases, IEEE Trans.
measure, Eng. Appl. Artif. Intell. 27 (2014) 41–56. Circuits Syst. Video Technol. 14 (2) (2004) 234–243.
[88] F.N. Sibai, A. Nuaimi, A. Maamari, R. Kuwair, Ear recognition with feed-forward [116] X.-Q. Wu, K.-Q. Wang, D. Zhang, Wavelet based palm print recognition, in:
artificial neural networks, Neural Comput. Appl. 23 (5) (2013) 1265–1273. Proceedings. International Conference on Machine Learning and Cybernetics, 3,
[89] S. Banerjee, A. Chatterjee, Image set based ear recognition using novel dic- IEEE, 2002, pp. 1253–1257.
tionary learning and classification scheme, Eng. Appl. Artif. Intell. 55 (2016) [117] A. Kumar, H.C. Shen, Palmprint identification using palmcodes, in: Third
37–46. International Conference on Image and Graphics (ICIG’04), IEEE, 2004, pp.
[90] S. Cadavid, M. Abdel-Mottaleb, Human identification based on 3D ear mod- 258–261.
els, in: 2007 First IEEE International Conference on Biometrics: Theory, [118] D.A. Reynolds, R.C. Rose, Robust text-independent speaker identification using
Applications, and Systems, IEEE, 2007, pp. 1–6. Gaussian mixture speaker models, IEEE Trans. Speech Audio Process. 3 (1)
[91] J. Lei, J. Zhou, M. Abdel-Mottaleb, A novel shape-based interest point descriptor (1995) 72–83.
(SIP) for 3D ear recognition, in: 2013 IEEE International Conference on Image [119] S. Chakroborty, G. Saha, Improved text-independent speaker identification using
Processing, IEEE, 2013, pp. 4176–4180. fused MFCC & IMFCC feature sets based on Gaussian filter, Int. J. Signal Process.
[92] X. Sun, G. Wang, 3D ear matching using local salient shape feature, in: 2013 5 (1) (2009) 11–19.
International Conference on Computer-Aided Design and Computer Graphics,
[120] X. Zhao, Y. Wang, D. Wang, Robust speaker identification in noisy and
IEEE, 2013, pp. 377–378.
reverberant conditions, IEEE/ACM Trans. Audio. Speech. Lang. Process. 22 (4)
[93] A. Gyaourova, A. Ross, A novel coding scheme for indexing fingerprint patterns,
(2014) 836–845.
in: Joint IAPR International Workshops on Statistical Techniques in Pattern
[121] K. Manikandan, E. Chandra, Speaker identification using a novel prosody with
Recognition (SPR) and Structural and Syntactic Pattern Recognition, SSPR,
fuzzy based hierarchical decision tree approach, Indian J. Sci. Technol. 9 (44)
Springer, 2008, pp. 755–764.
(2016).
[94] P. Tuyls, A.H. Akkermans, T.A. Kevenaar, G.-J. Schrijen, A.M. Bazen,
[122] P. Dhakal, P. Damacharla, A.Y. Javaid, V. Devabhaktuni, A near real-time
R.N. Veldhuis, Practical biometric authentication with template protection,
automatic speaker recognition architecture for voice-based user interface, Mach.
in: International Conference on Audio-and Video-Based Biometric Person
Learn. Knowl. Extr. 1 (1) (2019) 504–520.
Authentication, Springer, 2005, pp. 436–446.
[95] J. Bringer, V. Despiegel, Binary feature vector fingerprint representation [123] Z.K. Abdul, Kurdish speaker identification based on one dimensional convo-
from minutiae vicinities, in: 2010 Fourth IEEE International Conference on lutional neural network, Comput. Methods Differ. Equ. 7 (4 (Special Issue))
Biometrics: Theory, Applications and Systems, BTAS, IEEE, 2010, pp. 1–6. (2019) 566–572.
[96] L. Sha, F. Zhao, X. Tang, Improved fingercode for filterbank-based fingerprint [124] A. Indumathi, E. Chandra, Speaker identification using bagging techniques, in:
matching, in: Proceedings 2003 International Conference on Image Processing 2015 International Conference on Computers, Communications, and Systems,
(Cat. No. 03CH37429), 2, IEEE, 2003, pp. II–895. ICCCS, IEEE, 2015, pp. 223–229.
[97] H.İ. Öztürk, B. Selbes, Y. Artan, MinNet: Minutia patch embedding network [125] Z. Zhang, L. Wang, A. Kai, T. Yamada, W. Li, M. Iwahashi, Deep neural network-
for automated latent fingerprint recognition, in: Proceedings of the IEEE/CVF based bottleneck feature and denoising autoencoder-based dereverberation for
Conference on Computer Vision and Pattern Recognition, 2022, pp. 1627–1635. distant-talking speaker identification, EURASIP J. Audio Speech Music Process.
[98] M. Tico, P. Kuosmanen, J. Saarinen, Wavelet domain features for fingerprint 2015 (1) (2015) 1–13.
recognition, Electron. Lett. 37 (1) (2001) 21–22. [126] L. Dovydaitis, V.E. Rudžionis, Building LSTM neural network based speaker
[99] T. Amornraksa, S. Tachaphetpiboon, Fingerprint recognition using DCT features, identification system, Comput. Sci. Techn. 6 (1) (2018) 574–580.
Electron. Lett. 42 (9) (2006) 522–523. [127] C. Zhang, K. Koishida, J.H. Hansen, Text-independent speaker verification based
[100] H. Xu, R.N. Veldhuis, A.M. Bazen, T.A. Kevenaar, T.A. Akkermans, B. Gokberk, on triplet convolutional neural network embeddings, IEEE/ACM Trans. Audio
Fingerprint verification using spectral minutiae representations, IEEE Trans. Inf. Speech Language Process. 26 (9) (2018) 1633–1644.
Forensics Secur. 4 (3) (2009) 397–409. [128] L. Schomaker, M. Bulacu, Automatic writer identification using connected-
[101] X. Wu, D. Zhang, K. Wang, B. Huang, Palmprint classification using principal component contours and edge-based features of uppercase western script, IEEE
lines, Pattern Recognit. 37 (10) (2004) 1987–1998. Trans. Pattern Anal. Mach. Intell. 26 (6) (2004) 787–798.
[102] X. Wu, K. Wang, D. Zhang, Palmprint recognition using directional line [129] D. Bertolini, L.S. Oliveira, R. Sabourin, Multi-script writer identification using
energy feature, in: Proceedings of the 17th International Conference on Pattern dissimilarity, in: 2016 23rd International Conference on Pattern Recognition,
Recognition, 2004. ICPR 2004, 4, IEEE, 2004, pp. 475–478. ICPR, IEEE, 2016, pp. 3025–3030.
[103] F. Li, M.K. Leung, X. Yu, Palmprint identification using hausdorff distance, in: [130] A. Seropian, M. Grimaldi, N. Vincent, Writer identification based on the fractal
IEEE International Workshop on Biomedical Circuits and Systems, 2004, IEEE, construction of a reference base, in: ICDAR, 2003, pp. 1163–1167.
2004, pp. S3–3. [131] A. Brink, M. Bulacu, L. Schomaker, How much handwritten text is needed
[104] D.-S. Huang, W. Jia, D. Zhang, Palmprint verification based on principal lines, for text-independent writer verification and identification, in: 2008 19th
Pattern Recognit. 41 (4) (2008) 1316–1328. International Conference on Pattern Recognition, IEEE, 2008, pp. 1–4.
[105] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern
[132] M. Bulacu, L. Schomaker, Text-independent writer identification and verifica-
Anal. Mach. Intell. PAMI-8 (6) (1986) 679–698, http://dx.doi.org/10.1109/
tion using textural and allographic features, IEEE Trans. Pattern Anal. Mach.
TPAMI.1986.4767851.
Intell. 29 (4) (2007) 701–717.
[106] X. Wu, D. Zhang, K. Wang, Fusion of phase and orientation information for
[133] G. Leedham, S. Chachra, Writer identification using innovative binarised
palmprint authentication, Pattern Anal. Appl. 9 (2) (2006) 103–111.
features of handwritten numerals, in: Seventh International Conference on
[107] Z. Sun, T. Tan, Y. Wang, S.Z. Li, Ordinal palmprint represention for personal
Document Analysis and Recognition, 2003. Proceedings, IEEE, 2003, pp.
identification [represention read representation], in: 2005 IEEE Computer
413–416.
Society Conference on Computer Vision and Pattern Recognition (CVPR’05),
[134] V. Pervouchine, G. Leedham, Extraction and analysis of forensic document
1, IEEE, 2005, pp. 279–284.
examiner features used for writer identification, Pattern Recognit. 40 (3) (2007)
[108] T. Connie, A.T.B. Jin, M.G.K. Ong, D.N.C. Ling, An automated palmprint
1004–1013.
recognition system, Image Vis. Comput. 23 (5) (2005) 501–515.
[109] X. Wu, D. Zhang, K. Wang, Fisherpalms based palmprint recognition, Pattern [135] S. Fiel, R. Sablatnig, Writer identification and retrieval using a convolutional
Recognit. Lett. 24 (15) (2003) 2829–2838. neural network, in: International Conference on Computer Analysis of Images
[110] G. Lu, D. Zhang, K. Wang, Palmprint recognition using eigenpalms features, and Patterns, Springer, 2015, pp. 26–37.
Pattern Recognit. Lett. 24 (9–10) (2003) 1463–1467. [136] L. Xing, Y. Qiao, Deepwriter: A multi-stream deep CNN for text-independent
[111] L. Zhang, D. Zhang, Characterization of palmprints by wavelet signatures via writer identification, in: 2016 15th International Conference on Frontiers in
directional context modeling, IEEE Trans. Syst. Man Cybern. B 34 (3) (2004) Handwriting Recognition, ICFHR, IEEE, 2016, pp. 584–589.
1335–1347. [137] R. Nasuno, S. Arai, Writer identification for offline japanese handwritten
[112] S. Ribaric, I. Fratric, A biometric identification system based on eigenpalm and character using convolutional neural network, in: Proceedings of the 5th IIAE
eigenfinger features, IEEE Trans. Pattern Anal. Mach. Intell. 27 (11) (2005) (Institute of Industrial Applications Engineers) International Conference on
1698–1709. Intelligent Systems and Image Processing, 2017, pp. 94–97.
[113] A. Kumar, D. Zhang, Combining fingerprint, palmprint and hand-shape for [138] V. Christlein, D. Bernecker, F. Hönig, A. Maier, E. Angelopoulou, Writer
user authentication, in: 18th International Conference on Pattern Recognition identification using GMM supervectors and exemplar-SVMs, Pattern Recognit.
(ICPR’06), 4, IEEE, 2006, pp. 549–552. 63 (2017) 258–267.
[114] J.S. Noh, K.H. Rhee, Palmprint identification algorithm using hu invariant mo- [139] X.-Y. Zhang, G.-S. Xie, C.-L. Liu, Y. Bengio, End-to-end online writer identi-
ments and otsu binarization, in: Fourth Annual ACIS International Conference fication with recurrent neural network, IEEE Trans. Hum.-Mach. Syst. 47 (2)
on Computer and Information Science (ICIS’05), IEEE, 2005, pp. 94–99. (2016) 285–292.

709
Z. Qin et al. Information Fusion 91 (2023) 694–712

[140] Y. Zhu, Y. Wang, An offline text-independent writer identification system [166] B.P. Yuhas, M.H. Goldstein, T.J. Sejnowski, Integration of acoustic and visual
with sae feature extraction, in: 2016 International Conference on Progress in speech signals using neural networks, IEEE Commun. Mag. 27 (11) (1989)
Informatics and Computing, PIC, IEEE, 2016, pp. 432–436. 65–71.
[141] J.-H. Yoo, M.S. Nixon, Automated markerless analysis of human gait motion [167] S. Planet, I. Iriondo, Comparison between decision-level and feature-level fusion
for recognition and classification, Etri J. 33 (2) (2011) 259–266. of acoustic and linguistic features for spontaneous emotion recognition, in: 7th
[142] W. Lu, W. Zong, W. Xing, E. Bao, Gait recognition based on joint distribution Iberian Conference on Information Systems and Technologies (CISTI 2012),
of motion angles, J. Vis. Lang. Comput. 25 (6) (2014) 754–763. IEEE, 2012, pp. 1–6.
[143] A. Roy, S. Sural, J. Mukherjee, Gait recognition using pose kinematics and pose [168] S. Awang, R. Yusof, M.F. Zamzuri, R. Arfa, Feature level fusion of face and
energy image, Signal Process. 92 (3) (2012) 780–792. signature using a modified feature selection technique, in: 2013 International
[144] D.-Y. Huang, T.-W. Lin, W.-C. Hu, C.-H. Cheng, Gait recognition based on gabor Conference on Signal-Image Technology & Internet-Based Systems, IEEE, 2013,
wavelets and modified gait energy image for human identification, J. Electron. pp. 706–713.
Imaging 22 (4) (2013) 043039. [169] N. Poonguzhali, M. Ezhilarasan, A framework for level-1 and level-2 feature
[145] E.-H. Zhang, H.-B. Ma, J.-W. Lu, Y.-J. Chen, Gait recognition using dynamic gait level fusion, in: 2018 Ieee International Conference on System, Computation,
energy and PCA+ LPP method, in: 2009 International Conference on Machine Automation and Networking (Icscan), IEEE, 2018, pp. 1–6.
Learning and Cybernetics, 1, IEEE, 2009, pp. 50–53. [170] A. Abozaid, A. Haggag, H. Kasban, M. Eltokhy, Multimodal biometric scheme
[146] C.P. Lee, A.W. Tan, S.C. Tan, Gait recognition with transient binary patterns, for human authentication technique based on voice and face recognition fusion,
J. Vis. Commun. Image Represent. 33 (2015) 69–77. Multimedia Tools Appl. 78 (12) (2019) 16345–16361.
[147] S.D. Choudhury, T. Tjahjadi, Silhouette-based gait recognition using procrustes
[171] B. Zhou, Z. Xie, F. Ye, Multi-modal face authentication using deep visual
shape analysis and elliptic Fourier descriptors, Pattern Recognit. 45 (9) (2012)
and acoustic features, in: ICC 2019-2019 IEEE International Conference on
3414–3426.
Communications, ICC, IEEE, 2019, pp. 1–6.
[148] M. Zeng, L.T. Nguyen, B. Yu, O.J. Mengshoel, J. Zhu, P. Wu, J. Zhang,
[172] T. Kumar, S. Bhushan, S. Jangra, Ann trained and WOA optimized feature-level
Convolutional neural networks for human activity recognition using mobile
fusion of iris and fingerprint, Mater. Today: Proceedings 51 (2022) 1–11.
sensors, in: 6th International Conference on Mobile Computing, Applications
[173] G. Jaswal, R.C. Poonia, Selection of optimized features for fusion of palm print
and Services, IEEE, 2014, pp. 197–205.
and finger knuckle-based person authentication, Expert Syst. 38 (2021).
[149] Z. Wu, Y. Huang, L. Wang, X. Wang, T. Tan, A comprehensive study on cross-
view gait based human identification with deep cnns, IEEE Trans. Pattern Anal. [174] A. Attia, Z. Akhtar, Y. Chahir, Feature-level fusion of major and minor dorsal
Mach. Intell. 39 (2) (2016) 209–226. finger knuckle patterns for person authentication, Signal, Image Video Process.
[150] N. Takemura, Y. Makihara, D. Muramatsu, T. Echigo, Y. Yagi, On in- 15 (4) (2021) 851–859.
put/output architectures for convolutional neural network-based cross-view gait [175] G.U. Bokade, R.D. Kanphade, Secure multimodal biometric authentication
recognition, IEEE Trans. Circuits Syst. Video Technol. 29 (9) (2017) 2708–2719. using face, palmprint and ear: a feature level fusion approach, in: 2019
[151] F.J. Ordóñez, D. Roggen, Deep convolutional and lstm recurrent neural net- 10th International Conference on Computing, Communication and Networking
works for multimodal wearable activity recognition, Sensors 16 (1) (2016) Technologies, ICCCNT, IEEE, 2019, pp. 1–5.
115. [176] D.-S. Kim, K.-S. Hong, Multimodal biometric authentication using teeth image
[152] L. Wu, C. Shen, A. van den Hengel, Convolutional lstm networks for video-based and voice in mobile environment, IEEE Trans. Consum. Electron. 54 (4) (2008)
person re-identification, 1, (11) 2016, arXiv preprint arXiv:1606.01609. 1790–1797.
[153] S. Yu, H. Chen, E.B. Garcia Reyes, N. Poh, Gaitgan: Invariant gait feature [177] X. Yan, F. Deng, W. Kang, Palm vein recognition based on multi-algorithm and
extraction using generative adversarial networks, in: Proceedings of the IEEE score-level fusion, in: 2014 Seventh International Symposium on Computational
Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. Intelligence and Design, 1, IEEE, 2014, pp. 441–444.
30–37. [178] C. Dalila, H. Imane, N.-A. Amine, Multimodal score-level fusion using hybrid
[154] O. Tekbas, N. Serinken, O. Ureten, An experimental performance evaluation ga-pso for multibiometric system, Informatica 39 (2) (2015).
of a novel radio-transmitter identification system under diverse environmental [179] K. Aizi, M. Ouslim, Score level fusion in multi-biometric identification based
conditions, Can. J. Electr. Comput. Eng. 29 (3) (2004) 203–209. on zones of interest, J. King Saud University-Comput. Inf. Sci. 34 (1) (2022)
[155] O. Ureten, N. Serinken, Bayesian detection of wi-fi transmitter RF fingerprints, 1498–1509.
Electron. Lett. 41 (6) (2005) 373–374. [180] C. Li, J. Hu, J. Pieprzyk, W. Susilo, A new biocryptosystem-oriented security
[156] K.B. Rasmussen, S. Capkun, Implications of radio fingerprinting on the security analysis framework and implementation of multibiometric cryptosystems based
of sensor networks, in: 2007 Third International Conference on Security and on decision level fusion, IEEE Trans. Inf. Forensics Secur. 10 (6) (2015)
Privacy in Communications Networks and the Workshops-SecureComm 2007, 1193–1206.
IEEE, 2007, pp. 331–340. [181] A.K. Naik, R.S. Holambe, Joint encryption and compression scheme for a
[157] S. Xu, L. Xu, Z. Xu, B. Huang, Individual radio transmitter identification multimodal telebiometric system, Neurocomputing 191 (2016) 69–81.
based on spurious modulation characteristics of signal envelop, in: MILCOM [182] D.R. Devi, K.N. Rao, Decision level fusion schemes for a multimodal biometric
2008-2008 IEEE Military Communications Conference, IEEE, 2008, pp. 1–5. system using local and global wavelet features, in: 2020 IEEE Interna-
[158] V. Brik, S. Banerjee, M. Gruteser, S. Oh, Wireless device identification with ra- tional Conference on Electronics, Computing and Communication Technologies,
diometric signatures, in: Proceedings of the 14th ACM International Conference CONECCT, IEEE, 2020, pp. 1–6.
on Mobile Computing and Networking, 2008, pp. 116–127.
[183] M. Sandhya, Y. Sreenivasa Rao, S. Biswajeet, V. Dilip Kumar, M. Anup Kumar,
[159] W.C. Suski II, M.A. Temple, M.J. Mendenhall, R.F. Mills, Using spectral finger-
A score-level fusion method for protecting fingerprint and palmprint templates,
prints to improve wireless network security, in: IEEE GLOBECOM 2008-2008
in: Security and Privacy, Springer, 2021, pp. 1–11.
IEEE Global Telecommunications Conference, IEEE, 2008, pp. 1–5.
[184] A. Kumar, S. Shekhar, Personal identification using multibiometrics rank-level
[160] D. Chen, N. Zhang, Z. Qin, X. Mao, Z. Qin, X. Shen, X.-Y. Li, S2M: A
fusion, IEEE Trans. Syst. Man Cybern. C 41 (5) (2010) 743–752.
lightweight acoustic fingerprints-based wireless device authentication protocol,
[185] A. Tahmasebi, H. Pourghasem, H. Mahdavi-Nasab, A novel rank-level fusion for
IEEE Internet Things J. 4 (1) (2016) 88–100.
multispectral palmprint identification system, in: 2011 International Conference
[161] C. Lin, J. Hu, Y. Sun, F. Ma, L. Wang, G. Wu, Wiau: An accurate device-free
on Intelligent Computation and Bio-Medical Instrumentation, IEEE, 2011, pp.
authentication system with ResNet, in: 2018 15th Annual IEEE International
208–211.
Conference on Sensing, Communication, and Networking, SECON, IEEE, 2018,
[186] M.W. Rahman, F.T. Zohra, M.L. Gavrilova, Rank level fusion for kinect gait and
pp. 1–9.
face biometrie identification, in: 2017 IEEE Symposium Series on Computational
[162] S. Aneja, N. Aneja, M.S. Islam, IoT device fingerprint using deep learning,
Intelligence, SSCI, IEEE, 2017, pp. 1–7.
in: 2018 IEEE International Conference on Internet of Things and Intelligence
System, IOTAIS, IEEE, 2018, pp. 174–179. [187] S. Ben Jemaa, M. Hammami, H. Ben-Abdallah, Finger surfaces recognition using
[163] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep rank level fusion, Comput. J. 60 (7) (2017) 969–985.
convolutional neural networks, Adv. Neural Inf. Process. Syst. 25 (2012). [188] S.N. Tumpa, M.L. Gavrilova, Score and rank level fusion algorithms for social
[164] G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. behavioral biometrics, IEEE Access 8 (2020) 157663–157675.
Vanhoucke, P. Nguyen, T.N. Sainath, et al., Deep neural networks for acoustic [189] M.V. Kumar, R. Srikantaswamy, Comparative analysis of distinct fusion levels
modeling in speech recognition: The shared views of four research groups, IEEE in multimodal biometrics, Int. J. Comput. Appl. 4 (2015) 1–4.
Signal Process. Mag. 29 (6) (2012) 82–97. [190] J.K. Sing, A. Dey, M. Ghosh, Confidence factor weighted Gaussian function
[165] G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M.A. Nicolaou, B. Schuller, induced parallel fuzzy rank-level fusion for inference and its application to
S. Zafeiriou, Adieu features? end-to-end speech emotion recognition using a face recognition, Inf. Fusion 47 (2019) 60–71.
deep convolutional recurrent network, in: 2016 IEEE International Confer- [191] R. Sharma, S. Das, P. Joshi, Rank level fusion in multibiometric systems, in:
ence on Acoustics, Speech and Signal Processing, ICASSP, IEEE, 2016, pp. 2015 Fifth National Conference on Computer Vision, Pattern Recognition, Image
5200–5204. Processing and Graphics, NCVPRIPG, IEEE, 2015, pp. 1–4.

710
Z. Qin et al. Information Fusion 91 (2023) 694–712

[192] D.R. Devi, K.N. Rao, A multimodal biometric system using partition based dwt [217] W. Cheung, S. Vhaduri, Context-dependent implicit authentication for wearable
and rank level fusion, in: 2016 IEEE International Conference on Computational device users, in: 2020 IEEE 31st Annual International Symposium on Personal,
Intelligence and Computing Research, ICCIC, IEEE, 2016, pp. 1–5. Indoor and Mobile Radio Communications, IEEE, 2020, pp. 1–7.
[193] S. Chaudhary, R. Nath, A multimodal biometric recognition system based on [218] T. Dee, I. Richardson, A. Tyagi, Continuous transparent mobile device touch-
fusion of palmprint, fingerprint and face, in: 2009 International Conference screen soft keyboard biometric authentication, in: 2019 32nd International
on Advances in Recent Technologies in Communication and Computing, IEEE, Conference on VLSI Design and 2019 18th International Conference on
2009, pp. 596–600. Embedded Systems, VLSID, IEEE, 2019, pp. 539–540.
[194] T. Baltrušaitis, C. Ahuja, L.-P. Morency, Multimodal machine learning: A survey [219] L. Xie, H. Xian, X. Tang, W. Guo, F. Hang, N. Fang, G-key: An authentica-
and taxonomy, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2) (2018) 423–443. tion technique for mobile devices based on gravity sensors, in: 2019 IEEE
[195] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with International Conference on Power Data Science, ICPDS, IEEE, 2019, pp.
neural networks, Science 313 (5786) (2006) 504–507. 126–129.
[196] H.P. Martínez, G.N. Yannakakis, Deep multimodal fusion: Combining discrete [220] Z. Qin, G. Huang, H. Xiong, Z. Qin, K.-K.R. Choo, A fuzzy authentication system
events and continuous signals, in: Proceedings of the 16th International based on neural network learning and extreme value statistics, IEEE Trans.
Conference on Multimodal Interaction, 2014, pp. 34–41. Fuzzy Syst. 29 (3) (2019) 549–559.
[197] L. Yuan, Z.-C. Mu, X.-N. Xu, Multimodal recognition based on face and ear, in: [221] D. Frolova, A. Epishkina, K. Kogos, Mobile user authentication using keystroke
2007 International Conference on Wavelet Analysis and Pattern Recognition, 3, dynamics, in: 2019 European Intelligence and Security Informatics Conference,
IEEE, 2007, pp. 1203–1207. EISIC, IEEE, 2019, p. 140.
[198] M.M. Monwar, M.L. Gavrilova, Multimodal biometric system using rank-level [222] S.K. Behera, P. Kumar, D.P. Dogra, P.P. Roy, A robust biometric authentication
fusion approach, IEEE Trans. Syst. Man Cybern. B 39 (4) (2009) 867–878. system for handheld electronic devices by intelligently combining 3D finger
[199] S. Ribaric, I. Fratric, K. Kis, A biometric verification system based on the motions and cerebral responses, IEEE Trans. Consum. Electron. 67 (1) (2021)
fusion of palmprint and face features, in: ISPA 2005. Proceedings of the 4th 58–67.
International Symposium on Image and Signal Processing and Analysis, 2005, [223] X. Yang, S. Yang, J. Liu, C. Wang, Y. Chen, N. Saxena, Enabling finger-touch-
IEEE, 2005, pp. 12–17. based mobile user authentication via physical vibrations on IoT devices, IEEE
[200] S. Hariprasath, T. Prabakar, Multimodal biometric recognition using iris fea- Trans. Mob. Comput. (2021).
ture extraction and palmprint features, in: IEEE-International Conference on [224] Y. Sun, J. Cao, M. Ma, H. Li, B. Niu, F. Li, Privacy-preserving device discovery
Advances in Engineering, Science and Management (ICAESM-2012), IEEE, 2012, and authentication scheme for D2D communication in 3GPP 5G HetNet, in:
pp. 174–179. 2019 International Conference on Computing, Networking and Communications,
[201] C. Galdi, M. Nappi, J.-L. Dugelay, Multimodal authentication on smartphones: ICNC, IEEE, 2019, pp. 425–431.
Combining iris and sensor recognition for a double check of user identity, [225] D.K. Sharma, N. Baghel, S. Agarwal, Multiple degree authentication in sensible
Pattern Recognit. Lett. 82 (2016) 144–153. homes basedon iot device vulnerability, in: 2020 International Conference on
[202] Q. Li, Z. Qiu, D. Sun, Feature-level fusion of hand biometrics for personal Power Electronics & IoT Applications in Renewable Energy and Its Control,
verification based on kernel PCA, in: International Conference on Biometrics, PARC, IEEE, 2020, pp. 539–543.
Springer, 2006, pp. 744–750. [226] P. Gope, B. Sikdar, Lightweight and privacy-preserving two-factor authen-
[203] A. Meraoumia, S. Chitroub, A. Bouridane, Fusion of finger-knuckle-print and tication scheme for IoT devices, IEEE Internet Things J. 6 (1) (2018)
palmprint for an efficient multi-biometric system of person recognition, in: 2011 580–589.
IEEE International Conference on Communications, ICC, IEEE, 2011, pp. 1–5. [227] Z. Zhang, X. Xu, S. Han, Y. Liang, C. Liu, Wearable proxy device-assisted
[204] A. Attia, S. Mazaa, Z. Akhtar, Y. Chahir, Deep learning-driven palmprint and authentication request filtering for implantable medical devices, in: 2020 IEEE
finger knuckle pattern-based multimodal person recognition system, Multimedia Wireless Communications and Networking Conference, WCNC, IEEE, 2020, pp.
Tools Appl. 81 (8) (2022) 10961–10980. 1–6.
[205] R. Kala, H. Vazirani, A. Shukla, R. Tiwari, Fusion of speech and face by [228] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R.
enhanced modular neural network, in: International Conference on Information Fergus, Intriguing properties of neural networks, 2013, arXiv preprint arXiv:
Systems, Technology and Management, Springer, 2010, pp. 363–372. 1312.6199.
[206] J. Ren, Y. Hu, Y.-W. Tai, C. Wang, L. Xu, W. Sun, Q. Yan, Look, listen and [229] N. Akhtar, J. Liu, A. Mian, Defense against universal adversarial perturbations,
learn—A multimodal LSTM for speaker identification, in: Proceedings of the in: Proceedings of the IEEE Conference on Computer Vision and Pattern
AAAI Conference on Artificial Intelligence, AAAI ’16, AAAI Press, 2016, pp. Recognition, 2018, pp. 3389–3398.
3581–3587. [230] A. Nguyen, J. Yosinski, J. Clune, Deep neural networks are easily fooled: High
[207] X. Liu, J. Geng, H. Ling, Y.-m. Cheung, Attention guided deep audio-face fusion confidence predictions for unrecognizable images, in: Proceedings of the IEEE
for efficient speaker naming, Pattern Recognit. 88 (2019) 557–568. Conference on Computer Vision and Pattern Recognition, 2015, pp. 427–436.
[208] M. Sahidullah, D.A.L. Thomsen, R.G. Hautamäki, T. Kinnunen, Z.-H. Tan, R. [231] R. Feinman, R.R. Curtin, S. Shintre, A.B. Gardner, Detecting adversarial samples
Parts, M. Pitkänen, Robust voice liveness detection and speaker verification from artifacts, 2017, arXiv preprint arXiv:1703.00410.
using throat microphones, IEEE/ACM Trans. Audio Speech Lang. Process. 26 [232] N. Akhtar, A. Mian, Threat of adversarial attacks on deep learning in computer
(1) (2017) 44–56. vision: A survey, Ieee Access 6 (2018) 14410–14430.
[209] L. Wu, J. Yang, M. Zhou, Y. Chen, Q. Wang, LVID: A multimodal biometrics [233] S. Sabour, Y. Cao, F. Faghri, D.J. Fleet, Adversarial manipulation of deep
authentication system on smartphones, IEEE Trans. Inf. Forensics Secur. 15 representations, 2015, arXiv preprint arXiv:1511.05122.
(2019) 1572–1585. [234] I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial
[210] R.W. Frischholz, U. Dieckmann, BiolD: a multimodal biometric identification examples, 2014, arXiv preprint arXiv:1412.6572.
system, Computer 33 (2) (2000) 64–68. [235] B. Poudel, W. Li, Black-box adversarial attacks on network-wide multi-
[211] S. Shon, J.R. Glass, Multimodal association for speaker verification, in: step traffic state prediction models, in: 2021 IEEE International Intelligent
INTERSPEECH, 2020, pp. 2247–2251. Transportation Systems Conference, ITSC, IEEE, 2021, pp. 3652–3658.
[212] S. Nawaz, M.S. Saeed, P. Morerio, A. Mahmood, I. Gallo, M.H. Yousaf, A. [236] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep learning
Del Bue, Cross-modal speaker verification and recognition: A multilingual models resistant to adversarial attacks, 2017, arXiv preprint arXiv:1706.06083.
perspective, in: Proceedings of the IEEE/CVF Conference on Computer Vision [237] F. Croce, M. Hein, Reliable evaluation of adversarial robustness with an
and Pattern Recognition, 2021, pp. 1682–1691. ensemble of diverse parameter-free attacks, in: International Conference on
[213] Y. Dong, Y.-D. Yao, Secure mmwave-radar-based speaker verification for IoT Machine Learning, PMLR, 2020, pp. 2206–2216.
smart home, IEEE Internet Things J. 8 (5) (2020) 3500–3511. [238] S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, Deepfool: a simple and accurate
[214] A. Rahman, M.E. Chowdhury, A. Khandakar, S. Kiranyaz, K.S. Zaman, M.B.I. method to fool deep neural networks, in: Proceedings of the IEEE Conference
Reaz, M.T. Islam, M. Ezeddin, M.A. Kadir, Multimodal EEG and keystroke on Computer Vision and Pattern Recognition, 2016, pp. 2574–2582.
dynamics based biometric system using machine learning algorithms, IEEE [239] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, Universal adversarial
Access 9 (2021) 94625–94643. perturbations, in: Proceedings of the IEEE Conference on Computer Vision and
[215] K.-W. Tse, K. Hung, Behavioral biometrics scheme with keystroke and swipe Pattern Recognition, 2017, pp. 1765–1773.
dynamics for user authentication on mobile platform, in: 2019 IEEE 9th [240] J. Su, D.V. Vargas, K. Sakurai, One pixel attack for fooling deep neural
Symposium on Computer Applications & Industrial Electronics, ISCAIE, IEEE, networks, IEEE Trans. Evol. Comput. 23 (5) (2019) 828–841.
2019, pp. 125–130. [241] X.-C. Li, X.-Y. Zhang, F. Yin, C.-L. Liu, Decision-based adversarial attack with
[216] S. Vhaduri, C. Poellabauer, Multi-modal biometric-based implicit authentication frequency mixup, IEEE Trans. Inf. Forensics Secur. 17 (2022) 1038–1052.
of wearable device users, IEEE Trans. Inf. Forensics Secur. 14 (12) (2019) [242] A.E. Cinà, A. Torcinovich, M. Pelillo, A black-box adversarial attack for
3116–3125. poisoning clustering, Pattern Recognit. 122 (2022) 108306.

711
Z. Qin et al. Information Fusion 91 (2023) 694–712

[243] S. Chen, Z. He, C. Sun, J. Yang, X. Huang, Universal adversarial attack on [249] R. Shao, P. Perera, P.C. Yuen, V.M. Patel, Open-set adversarial defense
attention and the resulting dataset damagenet, IEEE Trans. Pattern Anal. Mach. with clean-adversarial mutual learning, Int. J. Comput. Vis. 130 (4) (2022)
Intell. (2020). 1070–1087.
[244] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, [250] J. Chen, X. Lin, H. Xiong, Y. Wu, H. Zheng, Q. Xuan, Smoothing adversarial
in: 2017 Ieee Symposium on Security and Privacy (Sp), IEEE, 2017, pp. 39–57. training for gnn, IEEE Trans. Comput. Soc. Syst. 8 (3) (2020) 618–629.
[245] S. Sharmin, P. Panda, S.S. Sarwar, C. Lee, W. Ponghiran, K. Roy, A compre- [251] J. Yuan, Z. He, Adversarial dual network learning with randomized image
hensive analysis on adversarial robustness of spiking neural networks, in: 2019 transform for restoring attacked images, IEEE Access 8 (2020) 22617–22624.
International Joint Conference on Neural Networks, IJCNN, IEEE, 2019, pp. [252] F. Sheikholeslami, S. Jain, G.B. Giannakis, Efficient randomized defense against
1–8. adversarial attacks in deep convolutional neural networks, in: ICASSP 2019-
[246] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami, Distillation as a defense 2019 IEEE International Conference on Acoustics, Speech and Signal Processing,
to adversarial perturbations against deep neural networks, in: 2016 IEEE ICASSP, IEEE, 2019, pp. 3277–3281.
Symposium on Security and Privacy, SP, IEEE, 2016, pp. 582–597. [253] E. Quiring, D. Arp, K. Rieck, Forgotten siblings: Unifying attacks on machine
[247] D. Hendrycks, M. Mazeika, S. Kadavath, D. Song, Using self-supervised learning learning and digital watermarking, in: 2018 IEEE European Symposium on
can improve model robustness and uncertainty, Adv. Neural Inf. Process. Syst. Security and Privacy (EuroS&P), IEEE, 2018, pp. 488–502.
32 (2019). [254] Y. Wang, D. Zou, J. Yi, J. Bailey, X. Ma, Q. Gu, Improving adversarial robustness
[248] M. Juuti, S. Szyller, S. Marchal, N. Asokan, PRADA: protecting against DNN requires revisiting misclassified examples, in: International Conference on
model stealing attacks, in: 2019 IEEE European Symposium on Security and Learning Representations, 2019.
Privacy (EuroS&P), IEEE, 2019, pp. 512–527.

712

You might also like