
Interpretability of Machine Learning: Recent Advances and Future Prospects

Gao, Lei
Toronto Metropolitan University

Guan, Ling
Toronto Metropolitan University

arXiv:2305.00537v1 [cs.MM] 30 Apr 2023
Abstract—The proliferation of machine learning (ML) has drawn unprecedented interest in the study of various multimedia contents such as text, image, audio and video, among others. Consequently, understanding and learning ML-based representations have taken center stage in knowledge discovery for intelligent multimedia research and applications. Nevertheless, the black-box nature of contemporary ML, especially deep neural networks (DNNs), has posed a primary challenge for ML-based representation learning. To address this black-box problem, studies on the interpretability of ML have attracted tremendous interest in recent years. This paper presents a survey of recent advances and future prospects in the interpretability of ML, with several application examples pertinent to multimedia computing, including text-image cross-modal representation learning, face recognition, and the recognition of objects. It is evidently shown that the study of the interpretability of ML promises an important research direction, one worth further investment.

I. INTRODUCTION

In recent years, machine learning (ML), especially deep neural networks (DNNs) and artificial intelligence (AI) in general, has been utilized broadly and successfully in different multimedia computing tasks, such as audio processing, image classification, computer vision, image retrieval, and healthcare, amongst others [1]-[2]. It is widely acknowledged that such tasks involve the processing of various information streams to gain valuable insights from the input data sources, intermediate decisions, or higher level activities, leading to superior, sometimes unprecedented performance. Despite the extraordinary success of ML and AI in multimedia and other fields which require intelligent processing, the interpretability of ML/AI remains a persistent challenge. Specifically, the black-box nature of contemporary ML architectures has posed a longstanding problem, causing concerns about questionable performance and predictions in real applications [3]. Therefore, there has been an urgent demand to better understand and more effectively learn ML-based representations. To address this black-box problem, interpretable ML (I-ML) methods have recently drawn considerable attention and interest in the ML and intelligent multimedia communities [4]-[5].

Figure 1. The interpretable ML (I-ML).
An illustrative diagram of I-ML is presented in Figure 1 [4]. As consensus suggests [80]-[81], classical neural network (NN)-based models (e.g., neural networks, convolutional neural networks (CNNs), and DNNs in general) exhibit less interpretable characteristics, thus attracting more attention from both the academic and industrial sectors, first attempting to explain the black-box and, more recently, designing new models that are inherently interpretable.

Though all NN-based models stem from the Kurt-Vladimir (K-V) Universal Approximation (UA) theory [6]-[7], deep learning (DL) has dominated the research landscape for the past 10 years in visual computing, natural language processing, video processing, and more [8]-[11]. There are several reasons for the booming popularity of DL-based models: the drastically increased chip processing abilities (e.g., GPU units), the significantly lowered cost of computing hardware, the considerable advances in ML [12], and the knowledge in neurobiological science discovered and accumulated over several decades [72]. In general, DL-based architectures consist of multiple processing layers to learn representations of input data with different levels of abstraction. Combined with certain optimization algorithms (e.g., backpropagation, Adam, etc.), such architectures help reveal the intricate structure in large data sets to develop intelligent systems/machines which attempt to mimic the natural human computing system for information processing. Nevertheless, the end-to-end architecture usually makes the DL-based representations a black-box [13][14], implying that it is difficult to tell what a prediction relies on, and which features or representations play more important roles in a given task. For a similar reason, DNNs have been shown to suffer from a lack of robustness [15]. For example, small changes in an input, sometimes imperceptible to humans, could induce instability in the DNNs, causing undesirable performance [15]. In the last few years, the ML community recognized that either the black-box problem has to be understood and solved, or truly interpretable models have to be conceived for better design and development of ML models. Consequently, the study of explainable and interpretable ML models came into play.

Though the objectives of explainability and interpretability are similar, they take very different approaches. Explainability of ML refers to the capability of understanding the working logic of a ML model after the completion of its design [5], thus attempting to solve the black-box problem by post hoc actions [95]. On the other hand, interpretability of ML usually refers to the ability of users to not only see but also study and understand how inputs are mathematically and/or logically mapped to the outputs [5]. The ultimate goal of studying interpretability is to build a model architecture which is inherently interpretable, thereby avoiding the black-box problem [94]. Either way, the objective of I-ML is to identify the best possible strategies to improve interpretability in intelligent multimedia, and in ML in general. For reference, the number of articles on interpretability/explainability in the period 2000-2019 is plotted in Figure 2 [16].

Figure 2. The number of articles on interpretability and explainability of NNs/DL [16].

As commonly agreed upon, there are two steps in making NN/DL models interpretable: extraction and exhibition [15]. In general, extraction discovers relevant knowledge for an intermediate representation, and exhibition organizes such a representation in a way that is easy for humans to understand, e.g., via visualization of the representation [15]. The core of this survey focuses on finding relationships either contained in the data or learned by the ML model, and is thus more in line with the extraction step of I-ML. Several survey papers have been made available to the ML and intelligent multimedia communities [15]-[19][35][36][73]-[75], mostly focusing on explaining the internal structure of a black-box. While this article will further dig into recent advances along this line of research, emphasis will be given to another class of models which are designed to be inherently interpretable from analytically inspired perspectives.
To complement the review, a down-to-earth approach is utilized in this work, applying the I-ML methods to three multimedia related areas (text-image representation, face recognition and object recognition), in anticipation that such an approach will better inspire readers in their pursuit of ML interpretability from multiple angles in their R & D endeavors. In the rest of this paper, Sections II and III present reviews on the classical NN-based and inherently interpretable models, respectively. In Section IV, evaluations and comparisons of representative models pertinent to multi-modal image and multimedia analysis and recognition are conducted. Section V presents a summary discussion and outlines some future prospects. Section VI draws conclusions.

II. Classical NN Based Methods
In the past several years, classical NN-based models have achieved great success and outperformed humans in numerous difficult tasks, such as image and visual classification, natural language processing and recognition, and board games. However, the black-box nature of the classical models presents a real challenge in explaining the underlying mechanisms and behaviors of the networks. The popular solution to address this problem is to explore interpretability by explaining the inside of the black-box. Hence, for this class of models, the word 'interpretability' refers to the ability to clarify and extract knowledge representations in different layers of NN-based models, as defined in [20]. In this section, the most studied methods in this class, FFNN based and DL based, are surveyed.

FFNN Based Methods
Back in the 1980's, the FFNN was already employed to interpret and design NNs and other highly sophisticated networks [15]. For instance, a FFNN was utilized to construct a global minimum loss function, resulting in a certain level of understanding of the model [21]. In [22], Kuo et al. presented an interpretable feedforward (FF) design using a data-centric approach, by which the network parameters of the current layer are derived according to data statistics from the output of the previous layer in a one-pass manner. Yosinski et al. [34] inspected the activation values of neurons in each layer with respect to different images or videos. They found that live activation values that change for different inputs are helpful for understanding how a given model works, thereby generating an interpretable model.

DL Based Methods
Due to the recent advancement of DNNs, a plethora of works based on pure DL methods have been proposed, forming the mainstream tactics in identifying the explainability of DNNs, especially CNNs. For example, Zhang et al. [20] proposed interpretable CNNs to clarify knowledge representations in high convolution layers. The generated knowledge representation then aids in the understanding of the logic inside a CNN architecture. Zee et al. [23] introduced an interpretable neural network (e.g., a Siamese CNN) with application to face recognition. This network provides an explainable model to better distinguish between the faces of two similar actresses. In [26], a technique called CNN-INTE is introduced and applied to explain deep CNNs. The CNN-INTE method provides global interpretation for any test instances on the hidden layers in the whole feature space, attempting to explain the inner mechanisms of DL-based models. Chen et al. [29] proposed a prototype layer that was then added to a regular CNN architecture. Based on the prototype layer, the network can provide several prototypes for different parts of the input image, resulting in proper interpretation of the model's functionality. In [30], a decision tree is presented to encode decision modes in fully-connected layers. Instead of classification, the decision tree is designed to quantitatively explain the logic for each CNN prediction. Wang et al. [31] associated every output channel in or from each layer with a gate (a non-negative weight) for CNNs, which indicates how critical that channel is in the overall network architecture. By doing so, the network gains the ability to explore and assign meanings to critical nodes so that the CNNs become explainable. Ribeiro et al. [33] introduced Local Interpretable Model-Agnostic Explanations (LIME), an algorithm that provides explanations of the decisions of any ML model. The LIME algorithm estimates the importance of each feature by generating perturbed samples of the input point and using these samples (labeled by the original model) to learn a local approximation to the CNN model.
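The local-surrogate recipe behind LIME is compact enough to sketch. The following is a minimal, illustrative Python version for tabular features, not the authors' released implementation; the function name, the Gaussian perturbation scale, and the kernel width are assumptions made for this example.

```python
import numpy as np

def lime_explain(predict_fn, x, num_samples=1000, kernel_width=0.75, rng=None):
    """Fit a locally weighted linear surrogate around the instance x.

    predict_fn maps an (n, d) array to an (n,) array of scores, e.g., the
    probability of one class from a black-box model. Returns one importance
    coefficient per feature, as in LIME's local explanation.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    # Perturb the instance (a simple Gaussian scheme for tabular data).
    Z = x + rng.normal(scale=0.5, size=(num_samples, d))
    y = predict_fn(Z)                        # labels come from the original model
    # Proximity kernel: samples near x receive larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Weighted least squares: solve (sqrt(w) X) beta = (sqrt(w) y).
    X = np.hstack([np.ones((num_samples, 1)), Z])   # intercept + features
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(X * sw, y * sw[:, 0], rcond=None)
    return beta[1:]                          # per-feature importances
```

The returned coefficients play the role of LIME's importance scores; the full algorithm additionally selects a sparse feature subset and, for images, perturbs superpixels rather than raw features.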

In addition, a grouping-based interpretable neural network (GroupINN) [24] was proposed. By utilizing three different types of layers, the node grouping layer, graph convolutional layer and fully connected layer, GroupINN can learn the node grouping and extract graph features jointly, leading to improved performance on brain data classification. Montavon et al. [25] proposed a layer-wise relevance propagation (LRP) technique, specifically designed for the explanation of CNNs. In essence, the LRP method is rooted in a conservation principle, where each neuron receives a share of the network output and redistributes it to its predecessors in equal amount until the input variables are reached, an operation procedure similar to the autoencoder algorithm.
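As an illustration of this conservation principle, below is a hedged sketch of the basic LRP-0 redistribution rule for a stack of fully-connected ReLU layers; the function name and the epsilon stabilizer are simplifications for the example rather than the exact formulation in [25].

```python
import numpy as np

def lrp_dense(weights, biases, activations, relevance):
    """Propagate output relevance back through dense ReLU layers (LRP-0 rule).

    weights[i] and biases[i] parameterize layer i; activations[i] is that
    layer's input (activations[0] is the network input). relevance starts as
    the relevance at the output, e.g., the score of the predicted class.
    """
    eps = 1e-9
    for W, b, a in zip(weights[::-1], biases[::-1], activations[::-1]):
        z = a @ W + b                                       # layer pre-activations
        s = relevance / np.where(z >= 0, z + eps, z - eps)  # share per output neuron
        relevance = a * (s @ W.T)                           # redistribute to inputs
    return relevance                                        # per-input-feature relevance
```

Because each layer only redistributes the relevance it receives, the total is approximately conserved from the output all the way back to the input features.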
Hooker et al. [27] presented a RemOve And Retrain (ROAR) method to evaluate DL-based interpretability, which is accomplished by verifying how the accuracy of a retrained model degrades as features estimated to be important are removed.
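The ROAR protocol is equally direct to state in code. Below is a minimal sketch for tabular data under stated assumptions; the original benchmark in [27] removes ranked pixels from images, and train_fn and eval_fn here are caller-supplied, hypothetical helpers.

```python
import numpy as np

def roar_curve(train_fn, eval_fn, X_train, y_train, X_test, y_test,
               importance, fractions=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """ROAR: remove the features ranked most important, retrain, re-evaluate.

    importance holds per-feature scores from the interpretability method under
    test; train_fn(X, y) returns a model and eval_fn(model, X, y) its accuracy.
    A faithful importance estimate makes accuracy degrade quickly.
    """
    order = np.argsort(importance)[::-1]        # most important features first
    accuracies = []
    for frac in fractions:
        k = int(frac * X_train.shape[1])
        removed = order[:k]
        # "Remove" by imputing the per-feature training mean.
        Xtr, Xte = X_train.copy(), X_test.copy()
        fill = X_train[:, removed].mean(axis=0)
        Xtr[:, removed] = fill
        Xte[:, removed] = fill
        model = train_fn(Xtr, y_train)          # retrain from scratch each time
        accuracies.append(eval_fn(model, Xte, y_test))
    return list(fractions), accuracies
```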
In [28], the Locality Guided Neural Network (LGNN) method is proposed with application to explainable artificial intelligence (XAI). Since LGNN is able to preserve locality between neighbouring neurons within each layer of a deep network, it can alleviate the black-box nature of current AI methods and make them understandable by humans to some extent. In [32], an interpretable partial substitution (a rule set) is investigated on the network to cover a certain subset of the input space. The proposed model integrates an interpretable partial substitute with any black-box model to introduce transparency into the predictive process at little to no cost.

The schematic graph of the interpretability by FFNN/DL is depicted in Figure 3. In essence, the power of the DL-based methods reviewed in this section is confined by certain limitations when applied to explore NN interpretability, such as the vanishing/exploding gradient problem and manual parameter tuning. These limitations prompt several groups of researchers to investigate model interpretability from alternative angles, which is the focus of the next section.

Figure 3. The interpretability by FFNN/DL.

III. Inherently I-ML Methods
The second class of I-ML models, coined inherently interpretable models, obeys structural knowledge of the domain, such as monotonicity, causality, structural (generative) constraints, additivity, or physical constraints that come from domain knowledge and can, at least, be partially justified by theoretical analysis such as physics laws and/or mathematical formulas [73]. The key members of this I-ML family include physics-informed, model based, algorithm unrolling, and mathematics inspired methods, which will be presented in the following subsections.

Physics-informed NN
In general, physics-informed NNs are trained to handle supervised learning tasks while respecting any given laws of physics described by general nonlinear partial differential equations [69]. The schematic of a physics-informed NN is given in Figure 4, where a fully-connected neural network is adopted to approximate the multi-physics solution u. The derivatives of u are computed with automatic differentiation (AD) and then utilized to formulate the residuals of the governing equations in the loss function. Finally, the parameters of the neural network θ and the unknown partial differential equation (PDE) parameters λ are learned by minimizing the loss function [82]-[83].

Figure 4. A schematic of a physics-informed NN model from [82].
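As a concrete illustration of this recipe, below is a minimal PyTorch sketch for the 1D viscous Burgers equation, u_t + lambda1*u*u_x - lambda2*u_xx = 0, one of the benchmark problems studied in [84]; the layer sizes, the training data, and the collocation points are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Fully-connected network approximating u(x, t); input columns are (x, t).
net = nn.Sequential(nn.Linear(2, 50), nn.Tanh(),
                    nn.Linear(50, 50), nn.Tanh(),
                    nn.Linear(50, 1))
lam = nn.Parameter(torch.tensor([1.0, 0.01]))  # unknown PDE parameters (lambda)

def pde_residual(xt):
    """Residual of u_t + lam[0]*u*u_x - lam[1]*u_xx via automatic differentiation."""
    xt = xt.requires_grad_(True)
    u = net(xt)
    g = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = g[:, :1], g[:, 1:]
    u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x),
                               create_graph=True)[0][:, :1]
    return u_t + lam[0] * u * u_x - lam[1] * u_xx

opt = torch.optim.Adam(list(net.parameters()) + [lam], lr=1e-3)
xt_obs, u_obs = torch.rand(64, 2), torch.zeros(64, 1)  # placeholder observations
xt_col = torch.rand(256, 2)                            # PDE collocation points
for _ in range(1000):
    opt.zero_grad()
    # Data-misfit term plus physics (residual) term, as in Figure 4.
    loss = ((net(xt_obs) - u_obs) ** 2).mean() + (pde_residual(xt_col) ** 2).mean()
    loss.backward()
    opt.step()
```

Minimizing the combined loss fits the observations while forcing the network, through the residual term, to respect the governing equation; the learned lam recovers the unknown PDE parameters.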

There exist several representative physics-informed models. Raissi et al. [84] utilized the physics-informed NN to solve two problems in ML, data-driven solution and data-driven discovery of partial differential equations, reporting promising results for a diverse collection of problems in computational science. In [85], a fractional physics-informed NN model was proposed to address multidimensional forward and inverse problems with forcing terms whose values are only known at randomly scattered spatio-temporal coordinates (black-box forcing terms). Cuomo et al. [86] published a survey paper which summarized that most research in physics-informed NN has focused on customizing this class of models through different activation functions, gradient optimization techniques, neural network structures, and loss function structures in ML.

Model based NN
Studies on the interpretability of model-based NN mainly focus on the construction of models that readily provide insight into the relationships they have learned [70]. A model based NN diagram is shown in Figure 5. In the diagram, there exist three streams according to different domain knowledge: the purely data-driven model (the left stream), model-based ML (the middle stream), and the model-based algorithm without data-driven characteristics (the right stream) [87].

Figure 5. A diagram of model based NN from [87].

Model based architecture has been adopted for DNN research and applications. In [88], a model based NN framework was presented for image reconstruction. This framework provides a systematic approach for deriving deep architectures for inverse problems with an arbitrary structure, generating an interpretable DNN model for a variety of image applications. Shlezinger et al. [89] utilized the integration of model based NN with data-driven pipelines to introduce a general framework for deep learning. The framework can be applied to a broad range of application areas, including ultrasound imaging, optics, digital communications, and tracking of dynamic systems. In [90], a model based NN was proposed for optimized sampling and reconstruction. This algorithm facilitates the joint and continuous optimization of the sampling pattern and the CNN parameters to improve image quality, exhibiting certain levels of interpretability.

Algorithm Unrolling
Different from [69][70], algorithm unrolling addresses model interpretability by providing a concrete and systematic connection between DNNs and the iterative algorithms that are widely used in signal processing [71]. A high-level overview of algorithm unrolling is drawn in Figure 6. Given an iterative algorithm (left), a corresponding deep network (right) is generated by cascading its iterations h. The iteration step h (left) is executed a number of times, resulting in different parameters h1, h2, ... (right). Each iteration h depends on algorithm parameters θ, which are transferred into network parameters θ1, θ2, ... (right). Instead of determining these parameters through cross-validation or analytical derivations, the parameters θ1, θ2, ... are learned from training datasets through end-to-end training. In this way, the network layers naturally inherit interpretability from the iteration procedure.

Figure 6. A high-level overview of algorithm unrolling from [71].
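To ground the idea, here is a hedged sketch of perhaps the best-known instance, an unrolled ISTA (LISTA-style) network for sparse coding in the spirit of [71]; the class name, initialization, and layer count are illustrative assumptions, not the specific systems of [91]-[93].

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """Unroll K iterations of ISTA for sparse coding (y ~ D x, x sparse).

    Each iteration x <- soft(x + W_k(y - D_k x), theta_k) becomes one network
    layer; the per-layer matrices and thresholds are learned end-to-end, so
    layer k keeps the meaning of algorithm iteration k.
    """
    def __init__(self, m, n, num_layers=10):
        super().__init__()
        self.W = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(m, n)) for _ in range(num_layers)])
        self.D = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(m, n)) for _ in range(num_layers)])
        self.theta = nn.ParameterList(
            [nn.Parameter(torch.tensor(0.1)) for _ in range(num_layers)])

    def forward(self, y):                      # y: (batch, m) measurements
        x = torch.zeros(y.shape[0], self.W[0].shape[1], device=y.device)
        for W, D, theta in zip(self.W, self.D, self.theta):
            r = y - x @ D.T                    # residual of the current estimate
            x = x + r @ W                      # gradient-style correction step
            x = torch.sign(x) * torch.relu(x.abs() - theta)  # soft threshold
        return x
```

Training such a model end-to-end replaces hand-tuning of step sizes and thresholds, while each layer remains readable as one iteration of the original solver.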

In [91], a Bayesian based unrolling algorithm was presented with application to single-photon Lidar systems. The resulting algorithm benefits from the advantages of both statistical and learning based frameworks, providing improved network interpretability. Chen et al. [92] proposed a graph unrolling network algorithm for graph signal denoising. The proposed method expanded the original unrolling algorithm to the graph domain and provided an interpretation of the architecture design from a signal processing perspective. In [93], an interpretable unsupervised unrolling algorithm was introduced for hyperspectral pansharpening. In this work, a pansharpening model was first constructed. Then, the iterative steps are unfolded into a deep interpretable iterative generative dual adversarial network.

Mathematics Inspired Methods
By integrating Statistics Guided Optimization (SGO) with NN architecture, this class of models, coined SGO-NN, exhibits model agnostic properties and is ideal for global model interpretability. In fact, the idea also stems from the K-V UA theory [6]-[7], but uses an alternate realization strategy. Instead of going deep, the network goes wider, an approach first acknowledged and practiced by the NN community in the 1980's. Like the DNNs, computational power limited the progress of this class of networks as well until the first decade of the 21st century. However, DNNs quickly dominated the ML landscape, and thus overran further exploration of this alternative approach until recently.

This class of architecture features three characteristics: a) the Kolmogorov-Arnold (K-A) theorem [79] and K-V UA theory [6]-[7] as the foundation; b) certain biological justifications and scientific rules in architecture design; c) powerful optimization methods for a quality training process. Analytically, recent progress in approximation theory solidly verified the K-A theorem/K-V UA theory result that three hidden layers are sufficient for a NN to approximate any nonlinear function under mild conditions [38]. At the same time, a number of practical models with three or fewer layers have emerged and demonstrated their flexibility [39], effectiveness and computational affordability. Examples include PCANet [40], DCTNet [41], CCANet [42], DDCCANet [43][67], and ILMMHA [44]. Chan et al. [40] offered PCANet for image classification. In the PCANet, a classical SGO method, principal component analysis (PCA), is utilized to construct multistage filter banks, leading to easy understanding of the proposed model. In [41], DCTNet was proposed by integrating the discrete cosine transform (DCT) with a NN architecture, resulting in an analytically interpretable model. In [42], a canonical correlation analysis network (CCANet) is presented. The CCANet constructs two-view multistage filter banks by employing the canonical correlation analysis (CCA) algorithm and designs a NN architecture with interpretable properties. In order to explore more discriminant information from given data sets, the within-class and between-class correlation matrices are employed and optimized jointly to construct the convolution layer, laying the foundation for the introduction of a distinct discriminant canonical correlation analysis network (DDCCANet) [43][67]. In [44], an interpretable learning-based multi-modal hashing analysis (ILMMHA) model is proposed. ILMMHA is able to generate an analytically interpretable feature representation, yielding substantially improved performance in the cross-modal (text-image) recognition application.
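As a minimal sketch of how such an SGO step replaces back-propagated filter learning, the following computes one PCANet-style convolution stage directly from patch statistics, in the spirit of [40]; the patch size, filter count, and non-overlapping patch stride are simplifying assumptions.

```python
import numpy as np

def pca_conv_filters(images, k=7, num_filters=8):
    """Build one PCANet-style filter bank from data statistics alone.

    k x k patches are extracted and mean-removed, and the leading principal
    components are reshaped into convolution kernels, so every filter has an
    explicit analytical meaning (an eigenvector of the patch covariance).
    """
    patches = []
    for img in images:                          # images: iterable of 2D arrays
        h, w = img.shape
        for i in range(0, h - k + 1, k):        # non-overlapping, for brevity
            for j in range(0, w - k + 1, k):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())    # remove the patch mean
    X = np.stack(patches)                       # (num_patches, k*k)
    cov = X.T @ X / X.shape[0]                  # patch covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:num_filters]]
    return top.T.reshape(num_filters, k, k)     # kernels of the convolution layer
```

Deeper stages repeat the same closed-form computation on the filtered outputs, and CCANet/DDCCANet substitute a (discriminant) CCA objective for PCA when two views are available, which is exactly what makes these layers analytically interpretable.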

in the 1980’s when the algorithm emerged. The (text-image)-based and multi-view visual-based
extremely long training time for contemporary (face recognition and recognition of objects)
DL algorithms, e.g., in any Resnet architecture, is, examples. The compared algorithms/models are
at least, partly due to the above mentioned issue. grouped into three categories, a) without I-ML
Many ideas have been put forward to alleviate (WO-I-ML), b) with classical NN (C-NN), and c)
this problem. For example, Widrow et al. [37] SGO-NN. Note that, all SGO-NN models used in
proposed a Non-Prop algorithm and demonstrated the experiments consist of three or fewer hidden
that it is much simpler and easier to implement, layers.
and converges much faster than Back-Prop algo-
rithms while producing similar quality of results Cross-modal (Text-image) Recognition
for shallow NNs. Different from Widrow’s idea, The Wiki Database.
the multi-stream nature of neurobiological sig- The Wiki database is captured from various
nal processing has recently motivated a different featured articles in Wikipedia. There are 2,866
Non-Prop algorithm in which the parameters of documents stored in text-image pairs and as-
the network are determined by analytically solv- sociated with supervised semantic labels of 10
ing a SGO problem in each convolutional layer classes. For fair comparison, all documents are
independently, leading to model interpretability further divided into a training subset with 2173
which is mathematically plausible. Therefore, the documents and a testing subset with 693 docu-
mathematical vigor and meaningful interpretation ments as practiced in past studies [46]-[50]. The
of the functionality of the SGO solutions offer SGO-NN model, ILMMHA, uses two classical
an effective vehicle to complement NNs’ ability features as the input (bag-of-visual SIFT (BOV-
to handle big data and have inspired researchers SIFT) [48] and the Latent Dirichlet Allocation
in ML and multimedia communities to make (LDA) [49]). For comparison, the recognition
collective use of their strengths to explore the results by different types of SOTA algorithms are
model interpretability. Note that, since this class tabulated in the Table 1.
of NN models executes the optimization pro-
cess by utilizing SGO-based algorithms instead Recognition accuracies on the Wiki
Table 1.
of Back-Prop, it shortens the calculation pro- database
cess and reduces running time, making SGO- Methods Training # Accuracy Type
NN algorithms more applicable in real tasks. The L2,1 CCA [46] 2173 65.99% WO-I-ML
schematic graph of such a network architecture is MH-DCCM [47] 2173 67.10% WO-I-ML
RE-DNN [50] 2173 63.95% C-NN
illustrated in Figure 7. ILMMHA [44] 2173 74.28% SGO-NN

Since the incumbent SGO principle (multi-


modal hashing (MMH)) is capable of measuring
semantic similarity across multiple variables ef-
fectively, the ILMMHA model yields a new fea-
ture representation of high quality by integrating
the SGO principle with NN architecture. Table 1
Figure 7 The interpretability by SGO and NN
shows that ILMMHA outperforms the others by
architecture.
a large margin, demonstrating the superiority of
IV. Evaluation on Exemplar Applications this SGO-NN model.
ML techniques have been playing a significant
role in information processing with impressive, Visual Examples
sometimes unprecedented success. In the follow- Face Recognition-The ORL Database.
ing subsections, the performances of different In face recognition, experiments are conduced
state-of-the-art (SOTA) I-ML methods pertinent on the Olivetti Research Lab (ORL) database. In
to multi-modal image and multimedia analysis the ORL database, there are 40 people with 10
and recognition are evaluated on cross-modal different images for each subject. The images are

2022
7
captured under different conditions, such as illu- Object Recognition-The Caltech 256 Database.
mination, posing, etc. In this experiment, all 400 Caltech 256 database contains a varying set
samples in the ORL database are utilized with 280 of illumination, movements, backgrounds, etc.
images randomly selected as the training samples The classes are hand-picked to represent a wide
while the remaining images used in testing. The variety of natural and artificial objects in vari-
SGO-NN models are worked on two-view data ous settings. In the experiments, for fair com-
sets (the original image and local binary patterns parison, the same settings used in other studies
(LBP)-based image). The comparison amongst are adopted. Specifically, 60 images are chosen
SOTA algorithms is given in Table 2. from each class as training samples. A relatively
Table 2. Recognition accuracy on the ORL simple DNN architecture, VGG-19, is employed
database to extract DL-based features, which serve as the
Methods Training # Accuracy Type input to the SGO-NN models. The performance
ANFIS-ABC [51] 280 96.00% WO-I-ML of SOTA methods is reported in Table 4.
ESP [52] 280 96.00% WO-I-ML
DL-SE [76] 280 96.08% WO-I-ML Recognition accuracy with other meth-
Table 4.
HMMFA [77] 280 94.17% WO-I-ML
CNN [53] 280 95.92% C-NN ods on the Caltech 256
IKLDA+PNN [54] 280 96.35% C-NN Methods Training # Accuracy Type
LiSSA [55] 280 97.51% C-NN CMFA-SR [62] 15420 76.31% WO-I-ML
PCANet [40] 280 96.28% SGO-NN LLKc [63] 15420 75.36% WO-I-ML
CCANet [42] 280 97.92% SGO-NN Joint fine-tuning [65] 15420 83.80% C-NN
DDCCANet [43] 280 98.50% SGO-NN SMNN+Xception [64] 15420 84.70% C-NN
TransTailor [66] 15420 87.30% C-NN
CCANet [42] 15420 87.82% SGO-NN
DDCCANet [43] 15420 88.34% SGO-NN
Object Recognition-The ETH-80 Database.
As a popular data set for multi-view fea-
The three visual examples clearly show that
ture representation studies, the ETH-80 database
the mathematically plausible DDCCANet brings
consists of 3280 color RGB object samples. All
out the discriminant information from the given
samples are divided into eight classes, including
data sets, resulting in superior performance com-
apples, cars, cows, cups, dogs, horses, pears and
pared with SOTA algorithms as shown in Tables
tomatoes. In our experiments, all samples are
2–4.
normalized at a size of 64 × 64 pixels. Moreover,
In summary, the experimental results illus-
1640 images are randomly chosen to construct
trated in Tables 1–4 clearly show that both I-ML
the training subset while the remaining images
branches operate well in the experiments, with
are used in testing. The raw data (R and G sub-
SGO-NN having a slight edge in the three visual
channel images) is adopted as the two inputs for
examples and being substantially better in cross-
SGO-NN models. For comparison, recognition
model recognition. The results evidently justify
rates by SOTA methods in all three categories
the necessity of incorporating interpretability in
are listed in Table 3.
ML research.
Recognition accuracy on the ETH–80
Table 3.
database V. DISCUSSION AND FUTURE
Methods Training # Accuracy Type PROSPECTS
SSL-TR [56] 1640 93.40% WO-I-ML Discussions
SRC+DPC [57] 1640 94.00% WO-I-ML
SML [58] 1640 94.02% WO-I-ML
In this subsection, we first discuss and em-
RMML [78] 1640 94.25% WO-I-ML phasize on the key points raised in this paper:
TLRDA+PCA [59] 1640 92.00% C-NN
CMCM [60] 1640 92.50% C-NN 1) I-ML gives ML models the ability to ex-
Fine-tuned AlexNet [61] 1640 94.20% C-NN plain or to present their behaviors in un-
CCANet [42] 1640 93.98% SGO-NN derstandable terms to humans, which better
DDCCANet[43] 1640 94.40% SGO-NN
serves human beings and brings benefits to
our society. As a result, there is a growing

8 IEEE Multimedia
interest in both the academic and industrial Future Prospects
sectors in I-ML and insight is being gained This paper prompts us to contemplate further
into working mechanisms of this class of on future prospects of I-ML, especially in ML
ML models, both classical NN based I-ML and the intelligent multimedia communities with
and inherently I-ML. the following proposed considerations:
2) The investigation of the inherently I-ML
models opens up a new front to address 1) Challenges in the Study of I-ML. There
various challenges in model interpretability are still numerous critical challenges to
of ML with the objective of minimizing the understanding the interpretability of ML,
black-box problem from the network design such as:
stage. This class of models obeys structural (a) how to discover more effective solutions
knowledge of the domain and can, at least, that can learn interpretability from hetero-
be partially justified by theoretical analysis geneous ML-based models. Although the
such as physics laws and/or mathematical research on model interpretability is a hot
formulas. The current success and ongoing topic, it is still far from being thoroughly
trend show that these models are expected studied, especially when dealing with lim-
to have a bright future in ML research and ited training data. It is our humble opinion
applications. that extracting complementary features via
3) Based on the K-A theorem/K-V UA theory, heterogeneous models is one of the po-
neurobiological signal processing facts and tential solutions to this issue. It is known
SGO principles, the SGO-NN models have that information fusion is capable of ex-
demonstrated great promise in addressing ploiting complementary and/or consistency
the interpretability problem associated with properties amongst individual features for
contemporary ML, especially that pertinent effective knowledge discovery in multime-
to multi-modal image and multimedia anal- dia computing & synthesis, presenting a so-
ysis and recognition. In these models, not lution to the limited training data problem.
only are more abstract and robust semantics Since heterogeneous models usually pos-
being explored by NN structure, but also sess distinct architectures/algorithms, more
mathematically meaningful interpretations effort should be devoted to study this class
of the functionality of the SGO solutions of ML models.
are put into perspective, vigorously justify- (b) how to reveal the model interpretabil-
ing its superior performance over the other ity when the input data contains temporal
categories of algorithms (WO-I-ML and C- information (such as in video-based ap-
NN). plications), an area which has not been
4) Although SGO-NN is closely aligned with extensively explored using I-ML models.
information fusion, the ETH-80 example For example, temporal features such as
with image pixels as network input demon- skeletons are extracted and combined with
strated its power to handle the complete static features of color and depth to improve
processing pipeline with raw data as input. performance in action recognition. Hence,
Thus it can serve both as an information the demand of simultaneously and collab-
integrator of high level features and an end- oratively processing static and dynamic in-
to-end processor like the DNNs. formation streams presents new challenges
5) Profiting from the parallel computing power to the exploration and design of more pow-
of GPU, the most recent study [68] shows erful I-ML models.
that the calculation time of SGO-NN mod-
els has been greatly shortened, making it 2) Model interpretability based on method-
practically plausible in both academic re- ology fusion. From both the survey and the
search and real-world tasks. exemplar applications presented in this pa-
per, fusion methods integrating SGO prin-
ciples and NN architecture are a promising

2022
9
branch in the study of I-ML, opening a 9. W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F.E.
Alsaadi. “A survey of deep neural network architectures
door for the development of inherently in- and their applications.” Neurocomputing, vol. 234, pp.
terpretable learning models. However, most 11–26, 2017.
10. Y. LeCun, Y. Bengio, and G. Hinton. “Deep learning.”
of the current SGO-NN based methods are Nature, vol. 521, no. 7553, pp. 436–444, 2015.
only able to handle one or two data streams. 11. S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M.P.
To work with data from three or more Reyes, M. Shyu, S. Chen, and S.S. Iyengar. “A survey on
deep learning: Algorithms, techniques, and applications.”
sources, a situation frequently encountered ACM Computing Surveys (CSUR), vol. 51, no. 5, pp. 1–
in the real world, there is a natural demand 36, 2018.
12. Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M.S.
to design new models/algorithms. We antic- Lew. “Deep learning for visual understanding: A review.”
ipate that the study of methodology fusion Neurocomputing, vol. 187, pp. 27–48, 2016.
will continue maximizing the pros and min- 13. V. Buhrmester, D. Munch, and M. Arens. “Analysis of ex-
plainers of black box deep neural networks for computer
imizing the cons of SGO-NN models. vision: A survey.” arXiv preprint arXiv:1911.12116, 2019.
14. T. Langner, R. Strand, H. Ahlstrom, and J. Kullberg.
“Large-scale biometry with interpretable neural network
VI. Conclusions regression on uk biobank body MRI.” Scientific reports,
This paper reviews recent advances and vol. 10, no. 1, pp. 1–9, 2020.
15. X. Huang, D. Kroening, W. Ruan, J. Sharp, Y. Sun,
contemplates the future prospects of ML in- E. Thamo, M. Wu, and X. Yi. “A survey of safety and
terpretability. Performance and comparison on trustworthiness of deep neural networks: Verification,
testing, adversarial attack and defence, and interpretabil-
the collected exemplar applications using basic ity.” Computer Science Review, vol. 37, pp. 1–35, 2020.
NNs/DNNs and the two categories of I-ML meth- 16. F.L. Fang, J. Xiong, M. Li, and G. Wang. “On inter-
ods indicate that I-ML methods had evidently led pretability of artificial neural networks: A survey.” IEEE
Transactions on Radiation and Plasma Medical Sci-
to performance gains. In addition, methodology ences, 2021 (Preprint).
fusion of SGO principles and NN architecture 17. Q. Zhang, and S.C. Zhu. “Visual interpretability for deep
learning: a survey.” arXiv preprint arXiv:1802.00614,
brings about the benefit of reduced computing 2018.
cost. From the survey paper, we reckon that more 18. Y. Zhang, P. Tino, A. Leonardis, and K. Tang. “A sur-
vey on neural network interpretability.” arXiv preprint
efforts should be put into the investigation of the arXiv:2012.14261, 2020.
study of I-ML. 19. L.H. Gilpin, D. Bau, B.Z. Yuan, A. Bajwa, M. Specter,
and L. Kagal. “Explaining explanations: An overview of
interpretability of machine learning.” 2018 IEEE 5th In-
REFERENCES ternational Conference on data science and advanced
1. M.I. Jordan, and T.M. Mitchell. “Machine learning: analytics (DSAA), pp. 80–89, 2018.
Trends, perspectives, and prospects.” Science, vol. 349, 20. Q. Zhang, Y.N. Wu, and S.C. Zhu. “Interpretable convo-
no. 6245, pp. 255–260, 2015. lutional neural networks.” 2018 IEEE CVPR, pp. 8827–
2. S. Das, N. Agarwal, D. Venugopal, F.T. Sheldon, and S. 8836, 2018.
Shiva. “Taxonomy and Survey of Interpretable Machine 21. Q. Nguyen, and M. Hein. “The loss surface of deep and
Learning Method.” 2020 IEEE Symposium Series on wide neural networks.” 2017 ICML, pp. 2603–2612, 2017.
Computational Intelligence (SSCI), pp. 670–677, 2020. 22. C-C.J. Kuo, M. Zhang, S. Li, J. Duan, and Y. Chen. “In-
3. W. Zhu, X. Wang, and W. Gao. “Multimedia intelli- terpretable convolutional neural networks via feedforward
gence: When multimedia meets artificial intelligence.” design.” Journal of Visual Communication and Image
IEEE Transactions on Multimedia, vol. 22, no. 7, pp. Representation, vol. 60, pp. 346–359, 2019.
1823–1835, 2020. 23. T. Zee, G. Gali, and I. Nwogu. “Enhancing human face
4. C. Molnar, G. Casalicchio, and B. Bischl. “Interpretable recognition with an interpretable neural network.” 2019
machine learning: a brief history, state-of-the-art and ICCV Workshops, pp. 1–9, 2019.
challenges.” European Conference on Machine Learning 24. Y. Yan, J. Zhu, M. Duda, E. Solarz, C. Sripada, and D.
and Knowledge Discovery in Databases, pp. 417–431, Koutra. “Groupinn: Grouping-based interpretable neural
2020. network for classification of limited, noisy brain data.”
5. A. Adadi, and M. Berrada. “Peeking inside the black-box: Proceedings of the 25th ACM SIGKDD International
a survey on explainable artificial intelligence (XAI).” IEEE Conference on Knowledge Discovery & Data Mining, pp.
Access, vol. 6, pp. 52138–52160, 2018. 772–782, 2019.
6. K. Hornik, M. Stinchcombe, and H. White. “Multilayer 25. G. Montavon, W. Samek, and K. Muller. “Methods for
feedforward networks are universal approximators.” Neu- interpreting and understanding deep neural networks.”
ral networks, vol. 2, no. 5, pp. 359–366, 1989. Digital signal processing, vol. 73, pp. 1–15, 2018.
7. A. Kolmogorov. “On the representation of continuous 26. X. Liu, X. Wang, and S. Matwin. “Interpretable deep
functions of several variables by superpositions of con- convolutional neural networks via meta-learning.” 2018
tinuous functions of a smaller number of variables.” Pro- IJCNN, pp. 1–9, 2018.
ceedings of the USSR Academy of Sciences, vol. 108, 27. S. Hooker, D. Erhan, P. Kindermans, and B. Kim. “A
pp. 179–182, 1956. benchmark for interpretability methods in deep neural
8. A. Khan, A. Sohail, U. Zahoora, and A.S. Qureshi. “A networks.” 2019 NIPS, pp.1–12, 2019.
survey of the recent architectures of deep convolutional 28. R. Tan, N. Khan, and L. Guan. “Locality guided neu-
neural networks.” Artificial Intelligence Review, vol. 53, ral networks for explainable artificial intelligence.” 2020
no. 8, pp. 5455–5516, 2020. IJCNN, pp. 1–8, 2020.

10 IEEE Multimedia
29. C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. K. allocation.” Journal of machine Learning research, vol. 3,
Su. “This looks like that: deep learning for interpretable no. 1, pp. 993-1022, 2003.
image recognition.” 2019 NIPS, pp. 1–12, 2019. 50. C. Wang, H. Yang, and C. Meinel. “A deep semantic
30. Q. Zhang, Y. Yang, H. Ma, and Y.N. Wu. “Interpreting framework for multimodal representation learning.” Multi-
cnns via decision trees.” 2019 CVPR, pp. 6261–6270. media Tools and Applications, vol. 75, no. 15, pp. 9255-
2019. 9276, 2016.
31. Y. Wang, H. Su, B. Zhang, and X. Hu. “Interpret neural 51. M.R. Rejeesh. “Interest point based face recognition
networks by identifying critical data routing paths.” 2018 using adaptive neuro fuzzy inference system.” Multimedia
CVPR, pp. 8906–8914, 2018. Tools and Applications, vol. 78, no. 16, pp. 22691-22710,
32. T. Wang. “Gaining free or low-cost interpretability with 2019.
interpretable partial substitute.” 2019 ICML, pp. 6505– 52. W. Wei, H. Dai, and W. Liang. “Exponential sparsity pre-
6514, 2019. serving projection with applications to image recognition.”
33. M.T. Ribeiro, S. Singh, and C. Guestrin. “Why should Pattern Recognition, vol. 104, pp. 1-11, 2020.
i trust you? Explaining the predictions of any classifier.” 53. A. Krizhevsky, I. Sutskever, and G. E. Hinton. “ImageNet
Proceedings of the 22nd ACM SIGKDD international classification with deep convolutional neural network.”
conference on knowledge discovery and data mining, pp. 2012 NIPS, pp. 1097-1105, 2012.
1135–1144, 2016. 54. A. Ouyang, Y. Liu, S. Pei, X. Peng, M. He, and Q. Wang.
34. J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. “A hybrid improved kernel LDA and PNN algorithm for
Lipson. “Understanding neural networks through deep efficient face recognition.” Neurocomputing, vol. 393, pp.
visualization.” arXiv preprint arXiv:1506.06579, 2015. 214-222, 2019.
35. B. Aslam, A. Zafar, and U. Khalil. “Development of 55. T. Wang, W.W. Ng, M. Pelillo and S. Kwong. “LiSSA:
integrated deep learning and machine learning algorithm localized stochastic sensitive autoencoders.” IEEE Trans-
for the assessment of landslide hazard potential.” Soft actions on Cybernetics, vol. 51, no. 5, pp. 2748–2760,
Computing, vol. 25, no. 21, pp. 13493–13512, 2021. 2019.
36. Y. Roh, G. Heo, and S.E. Whang. “A survey on data 56. W. Yan, Q. Sun, H. Sun, and Y. Li. “Semi-Supervised
collection for machine learning: a big data-ai integration Learning Framework Based on Statistical Analysis for
perspective.” IEEE Transactions on Knowledge and Data Image Set Classification.” Pattern Recognition, vol. 107,
Engineering, vol. 33, no. 4, pp. 1328–1347, 2019. pp. 1–13, 2020.
37. B. Widrow, A. Greenblatt, Y. Kim, and D. Park. “The no- 57. K. Sharma, and R. Rameshan. “Image Set Classifica-
prop algorithm: A new learning algorithm for multilayer tion Using a Distance-Based Kernel Over Affine Grass-
neural networks.” Neural Networks, vol. 37, pp. 182–188, mann Manifold.” IEEE Transactions on Neural Networks
2013. and Learning Systems, vol. 32, no. 3, pp. 1082–1095,
38. Z. Shen, H. Yang, and S. Zhang. “Neural network ap- 2020.
proximation: Three hidden layers are enough.” Neural 58. E. Vural, and C. Guillemot. “A study of the classification
Networks, vol. 141, pp. 160–173, 2021. of low-dimensional data with supervised manifold learn-
39. Z. Qin, F. Yu, C. Liu, and X. Chen. “How convolutional ing.” Journal of Machine Learning Research, vol. 18, no.
neural networks see the world—A survey of convolu- 1, pp. 5741–5795, 2017.
tional neural network visualization methods.” Mathemati- 59. J. Zhang, Z. Li, P. Jing, Y. Liu, and Y. Su. “Tensor-driven
cal Foundations of Computing, vol. 1, no. 2, pp. 149–180, low-rank discriminant analysis for image set classifica-
2018. tion.” Multimedia Tools and Applications, vol. 78, no. 4,
40. T.H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng and Y. Ma. pp. 4001–4020, 2019.
“PCANet: A simple deep learning baseline for image 60. N. Sogi, T. Nakayama, and K. Fukui. “A method based
classification?” IEEE Transactions on Image Processing, on convex cone model for image-set classification with
vol. 24, no. 12, pp. 5017–5032, 2015. cnn features.” 2018 IJCNN, pp. 1–8, 2018.
41. C.J. Ng, and A.B.J. Teoh. “DCTNet: A simple learning- 61. X. Wu, Y. Wang, H. Tang, and R. Yan. “A structure–time
free approach for face recognition.” 2015 APSIPA, pp. parallel implementation of spike-based deep learning.”
761–768, 2015. Neural Networks, vol. 113, pp. 72–78, 2019.
42. X. Yang, W. Liu, D. Tao, and J. Cheng. “Canonical corre- 62. A. Puthenputhussery, Q. F. Liu, and C. J. Liu. “A sparse
lation analysis networks for two-view image recognition.” representation model using the complete marginal fisher
Information Sciences, vol. 385, pp. 338–352, 2017. analysis framework and its applications to visual recog-
43. L. Gao, Z. Guo, L. Guan. “A Distinct Discriminant nition.” IEEE Transactions on Multimedia, vol. 19, no. 8,
Canonical Correlation Analysis Network based Deep In- pp. 1757–1770, 2017.
formation Quality Representation for Image Classifica- 63. Q. F. Liu, and C. Liu. “A novel locally linear KNN method
tion.” 2020 ICPR, pp. 7595–7600, 2020. with applications to visual recognition.” IEEE transactions
44. L. Gao and L. Guan. “Interpretable learning-based on neural networks and learning systems, vol. 28, no. 9,
multi-modal hashing analysis for Multi-view Feature Rep- pp. 2010–2021, 2017.
resentation Learning.” IEEE MIPR, pp. 1–6, 2022. 64. D. Wang, and K. Mao. “Learning Semantic Text Fea-
45. B. Widrow, and J.C. Aragon. “Cognitive memory.” Neural tures for Web Text-Aided Image Classification.” IEEE
Networks, vol. 41, pp. 3–14, 2013. Transactions on Multimedia, vol. 21, no. 12, pp. 2985–
46. M. Xu, Z. Zhu, X. Zhang, Y. Zhao, and X. Li. “Canonical 2996, 2019.
Correlation Analysis With L2,1-Norm for Multiview Data 65. W. Ge, and Y. Yu. “Borrowing treasures from the
Representation.” IEEE transactions on cybernetics, vol. wealthy: Deep transfer learning through selective joint
50, no. 11, pp. 4772–4782, 2020. fine-tuning.” IEEE CVPR, pp. 1086–1095, 2017.
47. L. Gao, and L. Guan. “A Discriminative Vectorial Frame- 66. B. Liu, Y. Cai, Y. Guo, and X. Chen. “TransTailor: Prun-
work for Multi-modal Feature Representation.” IEEE ing the pre-trained model for improved transfer learning.”
transactions on Multimedia, vol. 24, pp. 1503–1514, 2021 AAAI, vol. 35, no. 10, pp. 8627–8634, 2021.
2022. 67. L. Gao, Z. Guo, and L. Guan. “ODMTCNet: An In-
48. K. Nguyen, D. Le, and D.A. Duong. “Efficient traffic sign terpretable Multi-view Deep Neural Network Architec-
detection using bag of visual words and multi-scales sift.” ture for Image Feature Representation.” arXiv preprint
2013 NIPS, pp. 433–441, 2013. arXiv:2110.14830, 2021.
49. D.M. Blei, A.Y. Ng, and M.I. Jordan. “Latent dirichlet 68. K. Liu, L. Gao, and L. Guan. “A GPU-accelerated Al-

2022
11
gorithm for Distinct Discriminant Canonical Correlation through physics informed neural networks: where we are
Network.” arXiv preprint arXiv:2209.13027, 2022. and what is next.” Journal of Scientific Computing, vol.
69. M. Raissi, P. Perdikaris, and G.E. Karniadakis. “Physics- 92, no. 3, pp. 1–62, 2022.
informed neural networks: A deep learning framework for 87. N. Shlezinger, J. Whang, Y.C. Eldar, and A.G. Di-
solving forward and inverse problems involving nonlinear makis. “Model-based deep learning.” arXiv preprint
partial differential equations.” Journal of Computational arXiv:2012.08405, 2020.
physics, vol. 378, pp. 686–707, 2019. 88. H.K. Aggarwal, M.P. Mani, and M. Jacob. “MoDL:
70. W.J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, Model-based deep learning architecture for inverse prob-
and B. Yu. “Definitions, methods, and applications in lems.” IEEE transactions on medical imaging, vol. 38, no.
interpretable machine learning.” Proceedings of the Na- 2, pp. 394–405, 2018.
tional Academy of Sciences, vol. 116, no. 44, pp. 22071– 89. N. Shlezinger, Y.C. Eldar, and S.P. Boyd. “Model-based
22080, 2019. deep learning: On the intersection of deep learning and
71. V. Monga, Y. Li, and Y.C. Eldar. “Algorithm unrolling: optimization.” IEEE Access, vol. 10, pp. 115384–115398,
Interpretable, efficient deep learning for signal and image 2022.
processing.” IEEE Signal Processing Magazine, vol. 38, 90. H.K. Aggarwal, and M. Jacob. “J-MoDL: Joint model-
no. 2, pp. 18–44, 2021. based deep learning for optimized sampling and re-
72. M.T. Vu, T. Adali, D. Ba, G. Buzsáki, D. Carlson, K. construction.” IEEE journal of selected topics in signal
Heller, C. Liston. “A shared vision for machine learning processing, vol. 14, no. 6, pp. 1151–1162, 2020.
in neuroscience.” Journal of Neuroscience, vol. 38, no. 7, 91. J. Koo, A. Halimi, and S. McLaughlin. “A Bayesian
pp. 1601–1607, 2018. based deep unrolling algorithm for single-photon Lidar
73. C. Rudin. ”Stop explaining black box machine learning systems.” IEEE Journal of Selected Topics in Signal
models for high stakes decisions and use interpretable Processing, vol. 16, no. 4, pp. 762–774, 2022.
models instead.” Nature Machine Intelligence, vol. 1, no. 92. S. Chen, Y.C. Eldar and L. Zhao. “Graph unrolling
5, pp. 206–215, 2019. networks: Interpretable neural networks for graph signal
74. F.K. Doilovic, M. Brcic, and N. Hlupic. “Explainable arti- denoising.” IEEE Transactions on Signal Processing, vol.
ficial intelligence: A survey.” 2018 41st International con- 69, pp. 3699–3713, 2021.
vention on information and communication technology, 93. J. Qu, W. Dong, Y. Li, S. Hou, and Q. Du. “An Inter-
electronics and microelectronics (MIPRO), pp. 0210– pretable Unsupervised Unrolling Network for Hyperspec-
0215, 2018. tral Pansharpening.” IEEE Transactions on Cybernetics
75. P. Angelov, and E. Soares. “Towards explainable deep (Early Access), 2023.
neural networks (xDNN).” Neural Networks, vol. 130, pp. 94. G. Xu, T.D. Duong, Q. Li, S. Liu, and X. Wang. “Causal-
185–194, 2020. ity learning: A new perspective for interpretable machine
76. Y. Zhang, W. Liu, H. Fan, Y. Zou, Z. Cui, and Q. learning.” arXiv preprint arXiv:2006.16789, 2020.
Wang. “Dictionary learning and face recognition based 95. E.M. Kenny, C. Ford, M. Quinn, and M.T. Keane.
on sample expansion.” Applied Intelligence, vol. 52, no. “Explaining black-box classifiers using post-hoc
4, pp. 3766–3780, 2022. explanations-by-example: The effect of explanations
77. S. Zhao, W. Liu, S. Liu, J. Ge, and X. Liang. “A and error-rates in XAI user studies.” Artificial Intelligence,
hybrid-supervision learning algorithm for real-time un- vol. 294, pp. 103459.1–103459.25, 2021.
completed face recognition.” Computers and Electrical
Engineering, vol. 101, pp. 1–15, 2022.
78. N. Sogi, L.S. Souza, B.B. Gatto, and K. Fukui. “Metric Lei Gao is currently with the Department of Electri-
learning with a-based scalar product for image-set recog- cal, Computer, and Biomedical Engineering, Toronto
nition.” IEEE/CVF CVPR Workshops, pp. 850–851, 2020.
Metropolitan University. His research interests include
79. V. Kurkova. “Kolmogorov’s theorem is relevant.” Neural
computation, vol. 3, no. 4, pp. 617–622, 1991. multimedia computing, multimodal information fusion,
80. T. Adali, R.C. Guido, T.K. Ho, K. Müller, and S. Strother. machine learning, and patter recognition. Lei Gao
“Interpretability, Reproducibility, and Replicability.” IEEE
received his Ph.D. degree in the Ryerson University.
Signal Processing Magazine, vol. 39, no. 4, pp. 5–7,
2022. He is a member of IEEE and a member of ACM.
81. Q. Teng, Z. Liu, Y. Song, K. Han, and Y. Lu. “A survey on Contact him at iegaolei@gmail.com.
the interpretability of deep learning in medical diagnosis.”
Multimedia Systems, vol. 28, pp. 2335-2355, 2022.
82. S. Cai, Z. Mao, Z. Wang, M. Yin, and G.E. Karniadakis. Ling Guan is currently a Professor with the Depart-
“Physics-informed neural networks (PINNs) for fluid me- ment of Electrical, Computer, and Biomedical Engi-
chanics: A review.” Acta Mechanica Sinica, vol. 37, no.
neering, Toronto Metropolitan University. He has pub-
12, pp. 1727–1738, 2021.
83. E. Kharazmi, Z. Zhang, and G.E. Karniadakis. “hp- lished extensively in multimedia processing and com-
VPINNs: Variational physics-informed neural networks munications, human centered computing, machine
with domain decomposition.” Computer Methods in Ap-
learning, image and signal processing, and multime-
plied Mechanics and Engineering, vol. 425, pp. 1–24,
2021. dia computing in the immersive environment. Ling
84. M. Raissi, P. Perdikaris, and G.E. Karniadakis.“Physics- Guan received the Ph.D. degree from the University of
informed neural networks: A deep learning framework for British Columbia. He is a Fellow of IEEE, an Elected
solving forward and inverse problems involving nonlinear
partial differential equations.” Journal of Computational Member of the Canadian Academy of Engineering.
physics, vol. 378, pp. 686–707, 2019. Contact him at lguan@ee.torontomu.ca.
85. G. Pang, L. Lu, and G.E. Karniadakis. “fPINNs: Frac-
tional physics-informed neural networks.” SIAM Journal
on Scientific Computing, vol. 41, no. 4, pp. A2603–
A2626, 2019.
86. S. Cuomo, V.S.D. Cola, F. Giampaolo, G. Rozza,
M. Raissi, and F. Piccialli.“Scientific machine learning

12 IEEE Multimedia
