
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 30, 2021
Joint Learning of Latent Similarity and Local Embedding for Multi-View Clustering

Aiping Huang, Member, IEEE, Weiling Chen, Member, IEEE, Tiesong Zhao, Senior Member, IEEE, and Chang Wen Chen, Fellow, IEEE

Abstract— Spectral clustering has been an attractive topic in the field of computer vision due to the extensive growth of applications such as image segmentation, clustering and representation. In this problem, the construction of the similarity matrix is a vital element affecting clustering performance. In this paper, we propose a multi-view joint learning (MVJL) framework to achieve both a reliable similarity matrix and a latent low-dimensional embedding. Specifically, the similarity matrix to be learned is represented as a convex hull of similarity matrices from different views, where the nuclear norm is imposed to capture the principal information of multiple views and improve robustness against noise/outliers. Moreover, an effective low-dimensional representation is obtained by applying local embedding to the similarity matrix, which preserves the local intrinsic structure of the data through dimensionality reduction. With these techniques, we formulate MVJL as a joint optimization problem and derive its mathematical solution with the alternating direction method of multipliers strategy and the proximal gradient descent method. The solution, which consists of a similarity matrix and a low-dimensional representation, is ultimately integrated with spectral clustering or K-means for multi-view clustering. Extensive experimental results on real-world datasets demonstrate that MVJL achieves superior clustering performance over other state-of-the-art methods.

Index Terms— Multi-view clustering, joint learning, feature representation, local embedding, nuclear norm.

Fig. 1. Application examples of multi-view clustering in computer vision.

Manuscript received October 8, 2020; revised April 14, 2021; accepted June 27, 2021. Date of publication July 26, 2021; date of current version July 30, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61901119 and Grant 62001116 and in part by the Natural Science Foundation of Fujian Province under Grant 2019J01222 and Grant 2020J01466. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Kui Jia. (Corresponding author: Tiesong Zhao.)

Aiping Huang and Weiling Chen are with the Fujian Key Laboratory for Intelligent Processing and Wireless Transmission of Media Information, College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China (e-mail: sxxhap@163.com; weiling.chen@fzu.edu.cn).

Tiesong Zhao is with the Fujian Key Laboratory for Intelligent Processing and Wireless Transmission of Media Information, College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China, and also with the Peng Cheng Laboratory, Shenzhen 518055, China (e-mail: t.zhao@fzu.edu.cn).

Chang Wen Chen was with the Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY 14228 USA. He is now with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: chencw@buffalo.edu).

Digital Object Identifier 10.1109/TIP.2021.3096086

I. INTRODUCTION

RECENTLY, most visual data have multiple modalities or are represented by heterogeneous features from different perspectives [1]–[3]; such data are called multi-view data. Relative to traditional single-view clustering methods, multi-view methods access more features and structural information hidden in multi-view data, and have been widely used in computer vision [4], [5]. As illustrated in Fig. 1, a multi-view clustering method was utilized to estimate pseudo labels of unlabeled training data for person re-identification [6], [7]; an efficient and robust image co-segmentation algorithm was designed by coupling with multi-view clustering [8]; and more accurate image annotation and retrieval have benefited from multi-view clustering methods [9], [10]. In addition, a multi-view clustering method based on bipartite matching has been used for video summarization [11]. The past decade has witnessed a boom in multi-view clustering methods based on subspaces [12], [13], matrix factorization [14], multiple kernels [15] and spectra [16].

Among them, multi-view spectral clustering demonstrates promising performance. It contains two main steps: similarity matrix construction and clustering implementation. The first step is of utmost importance because the performance of the spectral clustering algorithm is highly dependent on the quality of the similarity matrix. Although existing similarity matrix construction methods have improved multi-view spectral clustering performance, most of them suffer from the following two problems. First, the reliability of their similarity matrices may be impaired by noise and outliers embedded in raw data; thus, the ultimate clustering performance can be affected. Second, they may fail to exploit the local intrinsic structure embedded in similarity matrices, which makes it difficult to learn an effective feature representation for multi-view clustering.

Fig. 2. Framework of the proposed method. We first extract feature matrices of datasets from m views and use them to construct similarity matrices. With the local embedding technique and the nuclear norm constraint, the fused similarity matrix W and the latent low-dimensional representation Y are generated by MVJL, and the two corresponding strategies MVJL-S and MVJL-K are designed for the final clustering results.

In this paper, we propose an MVJL method that integrates latent similarity learning and the local embedding technique into a unified framework. As demonstrated in Fig. 2, we extract multi-view features, construct a similarity matrix for each view, and optimize an objective function to learn a reliable similarity matrix and an effective low-dimensional representation. Both the similarity matrix and the low-dimensional representation can characterize the underlying clustering structure of multi-view data. By integrating them with spectral clustering or K-means, two clustering strategies, MVJL-S and MVJL-K, are designed for the final results. The major contributions of this paper are summarized as follows:

• The fused similarity matrix based on the nuclear norm. The similarity matrix is iteratively updated until convergence. With the nuclear norm constraint, the MVJL model can capture the principal information of multiple views to enhance its robustness to noise.
• Local embedding for low-dimensional representation. By searching for an effective low-dimensional representation, the MVJL model leverages the local similarity information of samples to reveal the underlying clustering structure.
• The MVJL-S and MVJL-K strategies for multi-view clustering. The MVJL model learns a latent similarity matrix and a low-dimensional embedding, upon which two multi-view clustering strategies, MVJL-S and MVJL-K, are designed. Experimental results validate their superior performance over other popular methods.

The rest of this paper is arranged as follows. Section II reviews works related to this paper. The proposed framework and the corresponding iterative optimization algorithm are developed in Section III. Section IV presents the experimental results and analysis. Conclusions are drawn in Section V.

II. RELATED WORK

Thanks to ubiquitous multi-view data, multi-view clustering [17]–[20] has been extensively investigated during the past decade. It improves clustering performance by capturing the rich information of multiple cues. Many existing multi-view clustering methods are derived from spectral clustering due to its potential performance and well-defined mathematical framework. According to the construction of the similarity matrix, existing spectral-based multi-view clustering methods can be mainly divided into two categories.

The first type employs tools such as Gaussian kernels to construct similarity matrices of different views, and then fuses them to learn a shared similarity matrix of all views. For example, early work [21] proposed a co-training multi-view spectral clustering method in which the eigenvectors from one view are adopted as predictor functions to update the graph of the other view. In [22], the similarities of eigenvectors learned from different views were further enhanced by enforcing these eigenvectors towards a specific common consensus. Recent work [23] minimized the disagreement between different views and constrained the rank of the Laplacian matrix to learn a consensus similarity graph. [24] integrated graph completion and consensus representation learning into a joint framework for incomplete multi-view clustering.

The second type is self-representation-based subspace learning methods, which utilize source data as a dictionary to learn a self-representation matrix for the subsequent construction of the similarity matrix. For example, [25] proposed a latent multi-view subspace clustering model that jointly learns an underlying representation and performs data reconstruction within a unified framework. This method was further generalized in [26] by combining it with a deep neural network. [27] adopted the Hilbert-Schmidt Independence Criterion (HSIC) and intact space learning to construct an intactness-aware similarity matrix. [15] automatically assigned an ideal weight for each view to learn similarity information without additional

parameters. [28] proposed a unified model to jointly learn the kernel representation tensor and affinity matrix.
Although the above similarity matrix construction methods have improved clustering performance, most of them suffer from the impact of noise and outliers embedded in raw data, or ignore the local manifold structure of the similarity matrix, which limits further improvement of the ultimate performance. To this end, an increasing number of multi-view spectral clustering approaches coupled with manifold learning have been proposed. For example, [29] presented a non-negative matrix factorization framework, where consensus manifold regularization was incorporated to maintain the local geometrical structure of all views. [30] utilized adaptive neighborhoods to perform multi-view clustering/classification and local manifold structure learning simultaneously. Inspired by this, [31] employed Laplacian embedding on a pre-learned data representation to preserve the local manifold structure for obtaining a shared similarity matrix for spectral clustering.

Recently, low-rank matrix learning has been widely studied [32]–[34] and benefits many multi-view learning methods. For example, [35] utilized a low-rank decomposition to learn a shared consensus similarity matrix. [36] performed symmetric matrix factorization with a low-rank constraint to characterize the underlying clustering structure of the data. In [37], the low-rank property was encoded by the Tucker decomposition to capture high-order correlations among views. In addition, low-rank matrix learning can also be employed for noise/outlier processing. For example, [38] developed a cross-view low-rank analysis model for outlier detection. In [39], a low-rank constraint was incorporated into marginalized denoising to boost the robustness of the proposed model. [13] utilized a low-rank constraint to guarantee robustness to outliers and sample-specific corruptions.

In this work, we integrate the low-rank constraint and manifold learning into a joint learning model for a reliable similarity matrix and a latent low-dimensional representation. Different from the above low-rank-constraint-based methods, we employ the nuclear norm as a low-rank constraint to learn a reliable similarity matrix, so as to capture the principal information from multi-view data and enhance its robustness against noise/outliers. The proposed model also differs from existing spectral-based multi-view clustering methods in the way it addresses the local structure embedded in multi-view data. We employ local embedding on the shared similarity matrix to learn a low-dimensional representation, thereby excavating the local manifold structure embedded in multi-view data for performance improvement. Both the similarity matrix and the low-dimensional representation characterize the clustering structure of multi-view data. Therefore, two strategies, MVJL-S and MVJL-K, are designed for the final clustering results by integrating them with spectral clustering or K-means. To the best of our knowledge, MVJL is an early attempt at multi-view learning that can produce two different clustering strategies by joint learning.

III. MULTI-VIEW JOINT LEARNING FOR CLUSTERING

Real-world datasets contain various representations in multiple views. On one hand, a similarity matrix can be regarded as a representation of data, and constructing a reliable similarity matrix is beneficial for multi-view clustering. On the other hand, the local structure information in sample space is more capable of reflecting the relationships among samples. Towards this end, we employ a low-rank constraint and manifold learning to formulate the MVJL framework for a reliable similarity matrix and a latent low-dimensional embedding. The details of the framework are elaborated in this section. Table I presents the key notations used in the paper.

TABLE I
SUMMARY OF THE KEY NOTATIONS USED IN THE PAPER

A. Problem Formulation

For a given multi-view dataset $\{X^{(i)}\}_{i=1}^{m}$ with $m$ views and $X^{(i)}\in\mathbb{R}^{d_i\times n}$, MVJL aims to capture a shared low-dimensional embedding $Y=[y_1,\cdots,y_n]\in\mathbb{R}^{k\times n}$ with $k\ll d_i$ $(i=1,2,\cdots,m)$ by preserving sufficient local structure information of samples, and simultaneously learn a fused similarity matrix $W=[W_{ij}]_{n\times n}$ characterizing the similarities among samples, which can be abstracted in the following canonical form:

$$\min_{\alpha,W,Y}\ \frac{1}{2}\Big\|W-\sum_{i=1}^{m}\alpha_i W^{(i)}\Big\|_F^2+\gamma\|W\|_*+\frac{\beta}{4}\sum_{i=1}^{n}\sum_{j=1}^{n}\|y_i-y_j\|_2^2 W_{ij} \quad \text{s.t. } \alpha\ge 0,\ \alpha^T\mathbf{1}=1,\ \mathcal{C}(Y)=0, \tag{1}$$

where $W^{(i)}\in\mathbb{R}^{n\times n}$ is the similarity matrix obtained from $X^{(i)}$ by the Gaussian kernel, and $\alpha=[\alpha_1,\cdots,\alpha_m]^T$ is a weight vector balancing the significance of the similarity matrices of different views. $\mathcal{C}(Y)=0$ is a certain constraint imposed on $Y$, and $\beta,\gamma$ are two nonnegative hyperparameters.
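For reference, the per-view similarity matrices $W^{(i)}$ can be constructed as follows. The paper does not specify the kernel bandwidth, so the median heuristic below is an assumption:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def gaussian_similarity(X, sigma=None):
    """Build an n x n Gaussian-kernel similarity matrix from one view.

    X: (d_i, n) feature matrix of a single view (columns are samples).
    sigma: kernel bandwidth; if None, the median pairwise distance is
    used as a fallback heuristic (an assumption, not from the paper).
    """
    dist = squareform(pdist(X.T))         # pairwise Euclidean distances
    if sigma is None:
        sigma = np.median(dist[dist > 0])
    W = np.exp(-dist**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)                # zero self-similarity (assumed)
    return W

# One similarity matrix per view, as required by Eq. (1):
# W_views = [gaussian_similarity(X) for X in views]  # views: list of (d_i, n)
```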


In Equation (1), the first term ensures learning a shared similarity matrix for different views with a minimal fitting error. By incorporating the idea of principal component analysis, the second term adopts the nuclear norm to enforce the shared similarity matrix to be of low rank. This not only facilitates learning a compact form that compresses the important information into a small proportion of components, but also helps improve robustness to noise and outliers. The last term aims to preserve sufficient local structure information of samples to learn a latent low-dimensional representation. The motivation behind this term is to preserve the inter-sample similarity of the original data after projection onto a low-dimensional subspace. Therefore, the similarity matrix $W$ is embedded as a penalty factor for the sample pairs violating this rule. The weight vector $\alpha$ can be regarded as a regularization parameter for the multi-objective optimization problem, where $\alpha\ge 0$ and $\alpha^T\mathbf{1}=1$.
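To make the three terms of Eq. (1) concrete, a minimal NumPy sketch that evaluates the objective for given variables (constraint handling omitted):

```python
import numpy as np

def mvjl_objective(W, W_views, alpha, Y, beta, gamma):
    """Evaluate the three terms of Eq. (1) for given variables (a sketch)."""
    fused = sum(a * Wi for a, Wi in zip(alpha, W_views))
    fit = 0.5 * np.linalg.norm(W - fused, 'fro')**2       # fitting term
    nuc = gamma * np.linalg.norm(W, 'nuc')                # nuclear norm term
    sq = np.sum(Y**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * Y.T @ Y       # ||y_i - y_j||_2^2
    local = (beta / 4) * np.sum(dist2 * W)                # local embedding term
    return fit + nuc + local
```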
As we know,

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\|y_i-y_j\|_2^2 W_{ij}=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(y_i^T y_i-2y_i^T y_j+y_j^T y_j\big)W_{ij}=2\,\mathrm{Tr}(YDY^T)-2\,\mathrm{Tr}(YWY^T)=2\,\mathrm{Tr}(YL_W Y^T), \tag{2}$$

where $L_W=D-W$ is the Laplacian matrix and $D$ is a diagonal matrix with $D_{ii}=\sum_{j=1}^{n}W_{ij}$. To ensure the existence of the regularized solution for minimizing Equation (2), the constraint $YDY^T=I$ is imposed on $Y$ to remove an arbitrary scaling factor in the embedding.
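A quick numerical check of the identity in Eq. (2) for a symmetric similarity matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
W = rng.random((n, n)); W = (W + W.T) / 2    # symmetric similarity matrix
Y = rng.random((k, n))                       # columns y_i are embeddings

D = np.diag(W.sum(axis=1))                   # D_ii = sum_j W_ij
L = D - W                                    # graph Laplacian L_W

lhs = sum(W[i, j] * np.sum((Y[:, i] - Y[:, j])**2)
          for i in range(n) for j in range(n))
rhs = 2 * np.trace(Y @ L @ Y.T)
assert np.allclose(lhs, rhs)                 # the identity of Eq. (2)
```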
Denote $X=[X^{(1)};X^{(2)};\cdots;X^{(m)}]\in\mathbb{R}^{\sum_{i=1}^{m}d_i\times n}$. The low-dimensional representation $Y$ can be expressed by a linear projection $P\in\mathbb{R}^{\sum_{i=1}^{m}d_i\times k}$ with $Y=P^TX$. Consequently, the MVJL problem is further represented as

$$\min_{\alpha,W,P}\ \frac{1}{2}\Big\|W-\sum_{i=1}^{m}\alpha_i W^{(i)}\Big\|_F^2+\frac{\beta}{2}\mathrm{Tr}(P^TXL_WX^TP)+\gamma\|W\|_* \quad \text{s.t. } \alpha\ge 0,\ \alpha^T\mathbf{1}=1,\ P^TXDX^TP=I. \tag{3}$$

B. Alternating Optimization Algorithm

According to the objective function in Equation (3), we simultaneously seek a latent low-dimensional representation and a fused similarity matrix for multi-view data. Since this function is not jointly convex in all the variables, we develop an efficient alternating optimization algorithm that separates the multi-variable optimization problem into several solvable subproblems.
1) Updating α With Fixed W and P: The subproblem of updating $\alpha$ is equivalent to

$$\min_{\alpha}\ \frac{1}{2}\Big\|W-\sum_{i=1}^{m}\alpha_i W^{(i)}\Big\|_F^2 \quad \text{s.t. } \alpha\ge 0,\ \alpha^T\mathbf{1}=1. \tag{4}$$

Denote $\mathrm{vec}(W)\in\mathbb{R}^{nn\times 1}$ as the vectorization of $W$. Then the subproblem above is equivalently transformed into

$$\min_{\alpha}\ \frac{1}{2}\|\mathrm{vec}(W)-Z\alpha\|_F^2 \quad \text{s.t. } \alpha\ge 0,\ \alpha^T\mathbf{1}=1, \tag{5}$$

where $Z=[\mathrm{vec}(W^{(1)}),\ldots,\mathrm{vec}(W^{(m)})]\in\mathbb{R}^{nn\times m}$. We adopt the alternating direction method of multipliers (ADMM) strategy to solve the above optimization problem. An intermediate variable $\theta$ is introduced to separate $Z$ and $\alpha$, and Equation (5) can be rewritten in the form

$$\min_{\alpha,\theta}\ \frac{1}{2}\|\mathrm{vec}(W)-Z\theta\|_F^2 \quad \text{s.t. } \alpha=\theta,\ \alpha\ge 0,\ \alpha^T\mathbf{1}=1. \tag{6}$$

This is equivalent to minimizing the following augmented Lagrangian:

$$\mathcal{L}_\rho(\alpha,\theta,y)=\frac{1}{2}\|\mathrm{vec}(W)-Z\theta\|_F^2+y^T(\alpha-\theta)+\frac{\rho}{2}\|\alpha-\theta\|_F^2 \quad \text{s.t. } \alpha\ge 0,\ \alpha^T\mathbf{1}=1, \tag{7}$$

which can be solved by the iterative updating rules

$$\alpha^{(t+1)}=\arg\min_{\alpha}\ \mathcal{L}_\rho(\alpha,\theta^{(t)},y^{(t)}), \tag{8}$$

$$\theta^{(t+1)}=\arg\min_{\theta}\ \mathcal{L}_\rho(\alpha^{(t+1)},\theta,y^{(t)}), \tag{9}$$

$$y^{(t+1)}=y^{(t)}+\rho(\alpha^{(t+1)}-\theta^{(t+1)}), \tag{10}$$

where $t$ denotes the number of iterations.

Eliminating the constant terms of $\mathcal{L}_\rho(\alpha,\theta,y)$, the optimization problem with respect to $\alpha$ is written as

$$\min_{\alpha}\ y^T(\alpha-\theta)+\frac{\rho}{2}\|\alpha-\theta\|_F^2 \quad \text{s.t. } \alpha\ge 0,\ \alpha^T\mathbf{1}=1. \tag{11}$$

Through simple algebraic reformulation, it is further simplified to

$$\min_{\alpha}\ \frac{\rho}{2}\Big\|\alpha-\theta+\frac{y}{\rho}\Big\|_F^2 \quad \text{s.t. } \alpha\ge 0,\ \alpha^T\mathbf{1}=1, \tag{12}$$

which can be solved in closed form by the method proposed in [40].
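Problem (12) is the Euclidean projection of $\theta - y/\rho$ onto the probability simplex. A standard sorting-based projection is sketched below; it solves (12) in closed form, though the exact procedure of [40] may differ in form:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}.

    Sorting-based algorithm; solves Eq. (12) with v = theta - y / rho.
    """
    u = np.sort(v)[::-1]                       # sort in descending order
    css = np.cumsum(u)
    idx = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    tau = (1 - css[idx]) / (idx + 1)
    return np.maximum(v + tau, 0)

# alpha-update of Eq. (12):
# alpha = project_simplex(theta - y / rho)
```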
The optimization problem of updating $\theta$ is formulated as

$$\min_{\theta}\ \frac{1}{2}\|\mathrm{vec}(W)-Z\theta\|_F^2+y^T(\alpha-\theta)+\frac{\rho}{2}\|\alpha-\theta\|_F^2. \tag{13}$$

This problem is unconstrained, and the optimal solution is attained at $\partial\mathcal{L}_\rho(\alpha,\theta,y)/\partial\theta=0$. Taking the derivative of $\mathcal{L}_\rho(\alpha,\theta,y)$ with respect to $\theta$, we have

$$\frac{\partial\mathcal{L}_\rho(\alpha,\theta,y)}{\partial\theta}=-Z^T\mathrm{vec}(W)+Z^TZ\theta-y+\rho(\theta-\alpha). \tag{14}$$

Setting $\partial\mathcal{L}_\rho(\alpha,\theta,y)/\partial\theta=0$, the optimal solution is

$$\theta=(Z^TZ+\rho I)^{-1}\big(Z^T\mathrm{vec}(W)+y+\rho\alpha\big). \tag{15}$$

2) Updating P With Fixed α and W: The subproblem of updating $P$ aims to solve the minimization problem

$$\arg\min_{P}\ \mathrm{Tr}(P^TXL_WX^TP) \quad \text{s.t. } P^TXDX^TP=I. \tag{16}$$

Using the Lagrangian multiplier method, $\lambda\in\mathbb{R}^{k\times k}$ is introduced to construct the Lagrangian function

$$\mathcal{L}(P,\lambda)=\mathrm{Tr}(P^TXL_WX^TP)-\mathrm{Tr}\big(\lambda^T(P^TXDX^TP-I)\big). \tag{17}$$

As we know, its optimal value is attained at $\partial\mathcal{L}(P,\lambda)/\partial P=0$ and $\partial\mathcal{L}(P,\lambda)/\partial\lambda=0$. Thus, we obtain

$$\frac{\partial\mathcal{L}(P,\lambda)}{\partial P}=2XL_WX^TP-XDX^TP(\lambda+\lambda^T)=0 \tag{18}$$

and

$$\frac{\partial\mathcal{L}(P,\lambda)}{\partial\lambda}=P^TXDX^TP-I=0. \tag{19}$$

Based on the Karush-Kuhn-Tucker (KKT) conditions, we know that $\lambda=[\lambda_{ij}]_{k\times k}$ is a nonnegative diagonal matrix. Therefore,

$$XL_WX^TP=XDX^TP\lambda \tag{20}$$

implies that for any column $p_i$ $(1\le i\le k)$ of $P$,

$$XL_WX^Tp_i=\lambda_{ii}XDX^Tp_i \tag{21}$$

holds. This illustrates that $p_i$ is a generalized eigenvector of the matrices $XL_WX^T$ and $XDX^T$, and $\lambda_{ii}$ is the corresponding generalized eigenvalue.
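Since Eq. (16) is a minimization, the $k$ generalized eigenvectors associated with the smallest eigenvalues of Eq. (21) are retained. A sketch using SciPy; the small ridge added to keep $XDX^T$ positive definite is an assumption:

```python
import numpy as np
from scipy.linalg import eigh

def update_P(X, W, k):
    """Solve X L_W X^T p = lambda X D X^T p and keep the k generalized
    eigenvectors with the smallest eigenvalues, minimizing Eq. (16)."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X @ L @ X.T
    B = X @ D @ X.T
    B = B + 1e-8 * np.eye(B.shape[0])   # small ridge for stability (assumed)
    vals, vecs = eigh(A, B)             # generalized symmetric eigenproblem
    return vecs[:, :k]                  # columns p_i with smallest lambda_ii
```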
3) Updating W With Fixed α and P: Directly solving for $W$ is a tough task, as it is restricted by a non-differentiable nuclear norm. To this end, the proximal gradient descent method is employed to update $W$. Denote the objective function $J(W)=f(W)+g(W)$, where the differentiable component is

$$f(W)=\frac{1}{2}\Big\|W-\sum_{i=1}^{m}\alpha_i W^{(i)}\Big\|_F^2+\frac{\beta}{2}\mathrm{Tr}(P^TXL_WX^TP)$$

and the non-differentiable component is $g(W)=\gamma\|W\|_*$. The proximal gradient descent method reads the iterative rule as

$$W^{(t+1)}=\mathrm{Prox}_g^{1/L}\Big(W^{(t)}-\frac{1}{L}\nabla f(W^{(t)})\Big), \tag{22}$$

where $L$ is a constant. As we know, $\mathrm{Prox}_g^{1/L}\big(W^{(t)}-\frac{1}{L}\nabla f(W^{(t)})\big)$ is equal to the value of

$$\arg\min_{W}\ \frac{1}{2}\Big\|W-\Big(W^{(t)}-\frac{1}{L}\nabla f(W^{(t)})\Big)\Big\|_F^2+\frac{\gamma}{L}\|W\|_*, \tag{23}$$

which has the closed-form solution

$$W^{(t+1)}=U\Sigma_{\gamma/L}V^T, \tag{24}$$

where $\Sigma_{\gamma/L}$ soft-thresholds the singular values, i.e., $\Sigma_{\gamma/L}=\max\{0,\ \Sigma-\frac{\gamma}{L}I\}$. Herein, $U\Sigma V^T$ is a singular value decomposition of $W^{(t)}-\frac{1}{L}\nabla f(W^{(t)})$ with

$$[\nabla f(W^{(t)})]_{ij}=\Big[W^{(t)}-\sum_{l=1}^{m}\alpha_l W^{(l)}\Big]_{ij}+\frac{\beta}{4}\|y_i-y_j\|_2^2. \tag{25}$$

Thus, all variables have been updated.
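A sketch of this proximal step; `prox_nuclear` implements the singular value thresholding of Eq. (24), and `update_W` performs one iteration of Eq. (22) with the gradient of Eq. (25):

```python
import numpy as np

def prox_nuclear(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*,
    i.e., the closed-form solution of Eq. (24) with tau = gamma / L."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def update_W(W, W_views, alpha, Y, beta, gamma, L_const):
    """One proximal gradient step of Eq. (22), gradient as in Eq. (25)."""
    fused = sum(a * Wi for a, Wi in zip(alpha, W_views))
    # pairwise squared distances ||y_i - y_j||_2^2 between embedding columns
    sq = np.sum(Y**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * Y.T @ Y
    grad = (W - fused) + (beta / 4) * dist2
    return prox_nuclear(W - grad / L_const, gamma / L_const)
```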
Summarizing the aforementioned analysis of the optimal solutions to all subproblems, the algorithm for MVJL is presented in Algorithm 1. Once the optimal W and Y are available, the subsequent clustering strategies MVJL-S and MVJL-K, which integrate with spectral clustering and K-means, respectively, are conducted for the final results.

Algorithm 1 Algorithm for MVJL
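A sketch of the overall loop, assembling the update order of Section III-B from the helper routines above; the single ADMM pass per outer iteration and the fixed iteration count are assumptions rather than the authors' exact procedure:

```python
import numpy as np

def mvjl(W_views, X, k, beta=1.0, gamma=0.1, rho=1.0, L_const=2.0,
         n_iter=50):
    """Sketch of the MVJL alternating optimization (Section III-B).

    Returns the fused similarity matrix W and the embedding Y = P^T X.
    beta and gamma defaults follow Section IV-B; rho, L_const and the
    iteration count are assumptions.
    """
    m = len(W_views)
    Z = np.stack([Wi.reshape(-1) for Wi in W_views], axis=1)  # nn x m
    alpha = np.full(m, 1.0 / m)           # equal initial view weights
    theta, y = alpha.copy(), np.zeros(m)
    W = sum(a * Wi for a, Wi in zip(alpha, W_views))
    for _ in range(n_iter):
        # alpha-step: one ADMM pass, Eqs. (8)-(10)
        alpha = project_simplex(theta - y / rho)              # Eq. (12)
        theta = np.linalg.solve(Z.T @ Z + rho * np.eye(m),
                                Z.T @ W.reshape(-1) + y + rho * alpha)  # Eq. (15)
        y = y + rho * (alpha - theta)                         # Eq. (10)
        P = update_P(X, W, k)                                 # Eq. (21)
        Y = P.T @ X
        W = update_W(W, W_views, alpha, Y, beta, gamma, L_const)  # Eq. (22)
    return W, Y
```

MVJL-S then applies spectral clustering to the learned W, while MVJL-K applies K-means to the columns of Y.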
It is difficult to prove the convergence of our algorithm in general. Fortunately, most sub-optimization problems have closed-form solutions in each iteration, and empirical evidence on real-world data suggests that the proposed algorithm has stable convergence behavior, as shown in Section IV-F2.

The computational complexity of the proposed MVJL method relies on those of updating the optimization variables $\alpha$, $W$ and $P$. To update $\alpha$, the complexity is $O(mn^2)$. The $W$-subproblem leads to a complexity of $O(n^3)$. The complexity of computing $P$ is $O(d^3+nd^2)$, where $d=\sum_{i=1}^{m}d_i$. Because the number $m$ of views is much smaller than $d$ and $n$, the overall computational cost of MVJL is $O(n^3+nd^2+d^3)$.

IV. EXPERIMENTS

In this section, comprehensive experiments on real-world datasets are conducted to evaluate the performance of the proposed methods and comparison algorithms.

A. Test Datasets

Eight publicly available multi-view benchmarks are employed to provide a fair testing platform for all compared methods.

ALOI (http://aloi.science.uva.nl/) is a collection of color images of small objects that were taken at various viewing angles, illumination directions, illumination colors, and object orientations. For each image, RGB color histograms, HSV color histograms, color similarity and Haralick texture features are extracted as four views.

Caltech101 (http://www.vision.caltech.edu/Image_Datasets/Caltech101/) is a database containing 101 categories of images. Following [41], twenty classes totaling 2,386 samples are selected to evaluate the performance of our algorithm. Six different features are extracted as views: 48-D Gabor features, 40-D wavelet moments features, 254-D CENTRIST features,
1984-D HOG features, 512-D GIST features, and 928-D local binary pattern (LBP) features.

HW (http://archive.ics.uci.edu/ml/datasets/Multiple+Features) consists of handwritten numerals from the 0 to 9 digit classes with a total of 2,000 patterns. In this dataset, six public features are available: 76-D Fourier coefficients of the character shapes (FOU), 216-D profile correlations (FAC), 64-D Karhunen-Loève coefficients (KAR), 240-D pixel averages in 2 × 3 windows (PIX), 47-D Zernike moments (ZER) and 6-D morphological (MOR) features.

MNIST (http://yann.lecun.com/exdb/mnist/) is a handwritten digit collection from which 10,000 samples are selected for testing, along with three views of features produced by IsoProjection with 30 dimensions, linear discriminant analysis with 9 dimensions and neighborhood preserving embedding with 30 dimensions.

NUS-WIDE (https://lms.comp.nus.edu.sg/research/NUS-WIDE.htm) is a web image dataset. We select a subset containing 1,600 images of 8 categories in our experiments. Six features are provided for each image: a 64-D color histogram, a 144-D color correlogram, a 75-D edge direction histogram, a 128-D wavelet texture, 225-D blockwise color moments, and a 500-D bag-of-words.

MSRC-v1 (https://www.microsoft.com/en-us/research/project/image-understanding/) contains 210 images in 7 classes composed of trees, buildings, airplanes, cows, faces, cars and bicycles. We extract five visual features from each image: CMT with 24 dimensions, HOG with 576 dimensions, GIST with 512 dimensions, CENTRIST with 254 dimensions, and LBP with 256 dimensions.

ORL (http://www.uk.research.att.com/facedatabase.html) is a face image dataset containing 40 distinct subjects. Each subject has 10 different face images taken under different lighting conditions, facial expressions and facial details (such as glasses or no glasses).

YouTube (http://archive.ics.uci.edu/ml/datasets/YouTube+Multiview+Video+Games+Dataset) is a video dataset, from which 2,000 samples are selected. Each sample is described by six views consisting of audio features (mfcc, volume stream, and spectrogram stream) and visual features (cuboid histogram, hist motion estimate and hog features).

The important statistics of the aforementioned datasets are summarized in Table II, and several sample images from the ALOI, Caltech101, NUS-WIDE and MSRC-v1 datasets are shown in Fig. 3.

TABLE II
A BRIEF CHARACTERIZATION OF ALL TESTED DATASETS

Fig. 3. Sample images from ALOI, Caltech101, NUS-WIDE and MSRC-v1 datasets.

B. Comparison Algorithms and Parameter Settings

To evaluate the effectiveness of the proposed algorithms, we compare them with the following nine state-of-the-art clustering approaches.

K-Means is a classical clustering algorithm that serves as a benchmark for various unsupervised learning methods.
It is inclined to cluster data into spherical distributions and is sensitive to initial values.

MVC [42] proposed two kinds of unsupervised learning schemes (a multi-view EM algorithm and an agglomerative algorithm) for text data with a co-training idea, and proved their superiority over single-view counterparts in the context of web categorization.

MVCC [43] attempted to explore a latent representation matrix of each view and then used them to derive a shared consensus representation for all views. This goal was reached through concept decomposition with local manifold structure regularization.

SwMC [44] is a totally self-weighted multi-view clustering method. It leveraged multiple graphs to assign a reasonable weight to each view according to the importance of the view, and directly achieved final clustering results without further postprocessing.

MvDMF [45] is a deep matrix factorization-based multi-view clustering model. It introduced graph regularizers to guide the consensus representation learning in the final layer such that most shared geometric structures across multiple graphs could be preserved.

MLAN [30] is a multi-view learning model with adaptive neighbors. Implementing multi-view learning in tandem with local structure embedding, the researchers obtained an optimal graph with a certain number of connected components corresponding to exact clustering assignments.

MVKSC [46] unified two or more kernel spectral clustering (KSC) models into a joint framework. Meanwhile, a coupling term was added to exploit the complementary information of multiple views to improve the clustering performance.

MSC-IAS [27] first utilized multi-view information to learn an intact space, constructed an intactness-aware similarity matrix in the space by HSIC, and employed spectral clustering on the obtained similarity matrix to perform the final clustering.

MvSCN [47] encapsulated deep learning for multi-view spectral clustering, where a SiameseNet and QR decomposition were respectively employed to design embedding networks and an orthogonal layer.

Several algorithmic parameters must be set in advance. For the proposed MVJL method, the embedded dimension number is set as the number of clusters, the regularization parameter β is taken as 1, and the weight coefficient γ for local embedding is fixed as 0.1. Regarding all multi-view clustering algorithms for comparison, we implement them through the source codes provided by the authors and adopt their default settings if feasible. For MVCC, we set the regularization parameter α = 100 and two trade-off coefficients β = 100 and γ = 10. For SwMC, we set the penalty coefficient λ for the sum of the c smallest eigenvalues to 1. Regarding MVKSC, the regularization parameters and kernel parameters are set as 1 and 0.1, respectively. For MLAN, the number of adaptive neighbors and the maximal number of iterations are fixed as 9 and 30, respectively. Regarding MvDMF, the parameter γ is set as 0.5, β is fixed to 0.01, and the number of nearest neighbors k is taken as 5. For MSC-IAS, its parameters are fixed as follows: λ₂ = 0.1, the dimension of the intact space d = 500, and the nearest-neighbor number k = 3. For MvSCN, the balance parameter λ and the neighborhood size k are fixed as 1 and 14, respectively. In addition to these specific parameters, all comparison algorithms need to know the number of clusters in advance, whereas MVC and K-means do not face the parameter selection issue. Moreover, since MVC can deal with only two views, we take the first two views of each test dataset as input. For K-means, we concatenate the feature vectors of different views together for the all-view clustering setting.

C. Evaluation Metrics

The clustering performance is evaluated by comparing the obtained cluster labels with the ground truth. Four well-known metrics, including clustering accuracy (ACC) [48], normalized mutual information (NMI) [49], purity (Purity), and adjusted Rand index (ARI) [50], are employed for quantitative comparison. These four metrics favor different properties, which facilitates a comprehensive evaluation of clustering results. For each metric, a higher value indicates better performance. For each experiment, the mean value along with the standard deviation is recorded by repeating each algorithm 30 times with random initializations.
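NMI and ARI can be taken directly from scikit-learn (normalized_mutual_info_score and adjusted_rand_score in sklearn.metrics), while ACC additionally requires aligning predicted cluster labels with the ground truth; a common sketch uses the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one mapping between predicted and true labels,
    found with the Hungarian algorithm."""
    labels_true = np.unique(y_true)
    labels_pred = np.unique(y_pred)
    cost = np.zeros((labels_pred.size, labels_true.size))
    for i, p in enumerate(labels_pred):
        for j, t in enumerate(labels_true):
            cost[i, j] = -np.sum((y_pred == p) & (y_true == t))
    row, col = linear_sum_assignment(cost)   # maximize matched samples
    return -cost[row, col].sum() / y_true.size
```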
D. Experimental Results

TABLE III
A COMPARISON OF ACC AND NMI OF DIFFERENT MULTI-VIEW CLUSTERING ALGORITHMS. THE BEST RESULTS ARE HIGHLIGHTED IN RED BOLDFACE AND THE SECOND BEST ARE MARKED WITH BLUE BOLDFACE. (HIGHER MEANS BETTER)

TABLE IV
A COMPARISON OF PURITY AND ARI OF DIFFERENT MULTI-VIEW CLUSTERING ALGORITHMS. THE BEST RESULTS ARE HIGHLIGHTED IN RED BOLDFACE AND THE SECOND BEST ARE MARKED WITH BLUE BOLDFACE. (HIGHER MEANS BETTER)

In this subsection, we compare the proposed clustering strategies MVJL-K and MVJL-S with the nine other algorithms in terms of the aforementioned four widely used metrics. The detailed clustering results on the eight datasets are reported in Tables III and IV, where the numbers in parentheses are the standard deviations. It should be noted that running MvSCN on MSRC-v1 and ORL encounters matrix inversion errors. Therefore, we report only the performance of the ten remaining methods on these two datasets. For ease of comparison, we highlight the best results in red boldface and the second best in blue boldface. From the two tables, the following observations can be made:

First, multi-view methods are evidently more promising than the single-view baseline, as evidenced by the fact that the maximum performance improvement provided by the multi-view algorithms reaches 54.3%. This suggests that utilizing various multi-view clustering models to explore the hidden complementary information contained in multi-view data can effectively improve clustering performance.

Second, the compared algorithms show respective merits in various scenarios. MVKSC does not perform well on most datasets, which may be caused by its requirement of different parameter settings on different datasets. However, for a fair comparison, we set all parameters to a uniform fixed value in this work because of the addition of new test datasets. MVC is designed especially for document clustering, and the EM applications in the model differ across specific tasks. For image clustering, MVC can also achieve acceptable performance. MVCC provides a flexible fusion strategy. Its clustering performance is competitive with MVC. MSC-IAS is a subspace clustering model that can avoid the information loss caused by insufficient views. By recovering an intact space from
multi-view data, MSC-IAS improves the clustering performance compared with MVC and MVCC. MLAN and SwMC are parameter-free methods. They can automatically allocate the weight for each view without additional parameters. SwMC achieves good performance on Caltech101, HW and MSRC-v1. However, its performance degrades severely on
ALOI, NUS-WIDE and YouTube. In general, MLAN performs promisingly on most datasets but is not as robust as SwMC. MvDMF and MvSCN are deep-learning-based methods. Due to the lack of large amounts of well-defined training data, these methods cannot embody their highlighted advantages in multi-view clustering on the test datasets.

Third, the two proposed clustering strategies MVJL-K and MVJL-S have unique characteristics. For example, the ARI performance improvement of MVJL-S over MVJL-K reaches up to 7.4% on HW, while MVJL-K improves ACC performance by 6.7% compared with MVJL-S on Caltech101. This demonstrates the respective merits of the low-rank structure and the low-dimensional representation in different scenarios. Certainly, there are some cases in which our methods do not perform well. For example, the images in NUS-WIDE cannot be well clustered by MVJL-S. This may be attributed to the challenge of the dataset or to the extracted features being unfavorable to the estimation of similarity matrices. Fortunately, the employment of local embedding can well preserve the points' similarity relationships and maximize the smoothness with respect to the intrinsic manifold structure of the dataset in the low-dimensional embedding subspace. This can alleviate the unreliability of similarity estimation to an extent, which promotes MVJL-K to achieve the best performance on NUS-WIDE. Overall, the proposed MVJL-S and MVJL-K outperform the compared algorithms in terms of performance and robustness, which indicates that MVJL is a promising method for multi-view clustering. The superiority of MVJL is that the multi-view clustering performance can be effectively improved by utilizing the interactions of learning an optimal low-rank structure, assigning a reasonable view weight and finding a low-dimensional representation in a joint framework.

E. Visualization Analysis

Fig. 4. Experimental visualizations of the learned similarity matrix on the four sampled datasets.

Fig. 4 presents the visualizations of the fused similarity matrix on four datasets. Herein, we randomly select a subset containing 2,000 samples from MNIST for better visualization. Evidently, more yellow values in the diagonal blocks indicate a better similarity matrix. In this sense, the similarity matrices learned on HW, MNIST and MSRC-v1 outperform that learned on NUS-WIDE. The possible reasons that the outline of the similarity matrix is not obvious on NUS-WIDE lie in the challenge of the dataset itself or in extracted features that are unfavorable to the estimation of similarity matrices.

Fig. 5. Experimental visualizations of different multi-view clustering methods on a subset of MNIST with 2,000 samples.

Fig. 5 presents an intuitive scatter diagram to illustrate the clustering performance of all algorithms on a subset of MNIST with 2,000 samples. First, we stack the feature vectors of the 3 views together and employ t-distributed stochastic neighbor embedding (t-SNE) [51] to project the original high-dimensional data onto a 2-D space. Then, the mapped 2-D data are colored with the cluster labels obtained by the different methods and the ground truth. Evidently, a better performance should be closer to the ground truth. In this sense, the visualization effects coincide with the clustering results shown in Tables III and IV. The proposed methods can effectively group similar samples into the same clusters, which further validates the effectiveness and superiority of our model.
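A sketch of this visualization procedure; the perplexity and other t-SNE settings are assumptions, as the paper does not report them:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def tsne_scatter(views, labels, title):
    """Project stacked multi-view features to 2-D with t-SNE [51] and
    color the points by cluster labels (sketch of the Fig. 5 procedure)."""
    stacked = np.vstack(views).T    # (n, sum_i d_i): one row per sample
    emb = TSNE(n_components=2, random_state=0).fit_transform(stacked)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap='tab10')
    plt.title(title)
    plt.show()
```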
F. Model Discussions

1) Runtime Analysis: In Table V, we present the running times of the state-of-the-art methods on all test datasets. Due to the matrix inversion exceptions of running MvSCN on MSRC-v1 and ORL, the running times of MvSCN on these two datasets are not recorded. From the table, the single-view method (i.e., K-means) often performs more efficiently than the multi-view methods. The reason is clear: K-means simply concatenates the feature vectors of different views together for all-view clustering. By contrast, multi-view clustering methods need to address the multi-view data fusion problem. Among the multi-view methods, MVC is almost the fastest, while MvDMF consumes the longest time on most datasets. Our MVJL-K and MVJL-S perform stably and exhibit a good running speed compared with the competitors. Considering their superior clustering performance, their time consumption is acceptable for real-world applications.

TABLE V
A COMPARISON OF TIME CONSUMPTION FOR ALL ALGORITHMS (IN SECONDS)

2) Convergence Analysis: To validate the convergence property of our method, we compute the objective function values on four test datasets. The corresponding curves are presented in Fig. 6. From the figure, the objective function values decrease with increasing iteration number and then achieve convergence. Although our method requires tens of iterations to converge, its computational complexity is not high due to its simple model rather than a deep network. This suggests that the proposed algorithm has stable convergence behavior.

Fig. 6. Convergence curves of MVJL on four test datasets.

Fig. 7. The evolution of view weights on four test datasets.

Fig. 7 shows the evolution of the view weights on four test datasets. At the beginning, we treat each view equally and
initialize $\alpha_i=\frac{1}{m}$, where $\alpha_i$ is the $i$-th view weight and $m$ is the number of views. From Fig. 7, the weight curves rapidly converge after a limited number of iterations, which also indicates the stable convergence of the proposed algorithm.

3) Parameter Sensitivity: To investigate the variations of MVJL under different settings, a parameter sensitivity analysis is conducted in this subsection, as shown in Figs. 8 and 9. The accuracy variation curves of MVJL-K and MVJL-S are
marked with solid and dotted lines, respectively. In MVJL, there are two parameters, β and γ, to be tuned. In the parameter sensitivity check, we employ a grid-searching strategy to find the optimal parameters on each test dataset, and then select an appropriate uniform value for each parameter to execute multi-view clustering on all datasets.
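A minimal sketch of such a grid search; `score_fn` stands for a user-supplied routine that runs MVJL with the given parameters and returns a clustering score such as ACC (a hypothetical helper, not from the paper):

```python
from itertools import product

def grid_search(score_fn):
    """Pick the (beta, gamma) pair maximizing score_fn, a callable
    (beta, gamma) -> clustering score. The search ranges below mirror
    those plotted in Figs. 8 and 9."""
    betas = [10.0 ** e for e in range(-5, 5)]    # 1e-5 ... 1e4
    gammas = [10.0 ** e for e in range(-5, 3)]   # 1e-5 ... 1e2
    return max(product(betas, gammas), key=lambda bg: score_fn(*bg))
```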

Fig. 8. Performance of MVJL with the regularization coefficient β varied over $\{1\times10^{-5}, 1\times10^{-4}, \cdots, 1\times10^{4}\}$.

Fig. 8 explicitly shows the performance variation curves on the eight datasets, where the regularization parameter β ranges over $\{1\times10^{-5}, 1\times10^{-4}, \cdots, 1\times10^{4}\}$. The best clustering results of MVJL are attained when $\beta\in[10^{-5},1]$ in most cases. Specifically, MVJL achieves acceptable performance on almost all datasets when β is 1. By contrast, the performance of MVJL-K decreases sharply when β is 10, which occurs because this thresholding value leads to inaccurate generalized eigenvalues and corresponding eigenvectors. Furthermore, MVJL-S performs more robustly than MVJL-K. The potential reason may be that the K-means method is very sensitive to initialization values. Overall, both MVJL-K and MVJL-S achieve promising performance over a wide range.

Fig. 9. Performance of MVJL with the regularization coefficient γ varied over $\{1\times10^{-5}, 1\times10^{-4}, \cdots, 1\times10^{2}\}$.

The parameter sensitivity of MVJL with respect to γ is analyzed in Fig. 9, where the weight coefficient γ for local embedding ranges over $\{1\times10^{-5}, 1\times10^{-4}, \cdots, 1\times10^{2}\}$. It is worth mentioning that a larger γ may lead to learning a trivial low-rank representation solution. On the tested datasets, this phenomenon occurs when γ exceeds $1\times10^{3}$. Nevertheless, both MVJL-K and MVJL-S are stable as the parameter γ varies over the selected wide range of values, which also suggests the robustness of the proposed method.

V. CONCLUSION

In this paper, a principled joint learning method named MVJL was proposed for multi-view clustering. In MVJL, a similarity matrix and a low-dimensional representation shared by all views were jointly learned. This can effectively enhance the robustness to noise and outliers as well as facilitate investigations into the underlying clustering structure of multi-view data. Moreover, an effective alternating optimization with two clustering strategies was proposed to ensure high-quality clustering performance. Extensive experiments performed on eight datasets verified the clear superiority of MVJL. Our model is able to flexibly investigate the complementarity among multiple views for clustering.

REFERENCES

[1] G. Chao and S. Sun, "Semi-supervised multi-view maximum entropy discrimination with expectation Laplacian regularization," Inf. Fusion, vol. 45, pp. 296–306, Jan. 2019.
[2] P. Zhu, Q. Hu, Q. Hu, C. Zhang, and Z. Feng, "Multi-view label embedding," Pattern Recognit., vol. 84, pp. 126–135, Dec. 2018.
[3] S. Yang, L. Li, S. Wang, W. Zhang, Q. Huang, and Q. Tian, "SkeletonNet: A hybrid network with a skeleton-embedding process for multi-view image representation learning," IEEE Trans. Multimedia, vol. 21, no. 11, pp. 2916–2929, Nov. 2019.
[4] F. Fang, L. Li, H. Zhu, and J.-H. Lim, "Combining faster R-CNN and model-driven clustering for elongated object detection," IEEE Trans. Image Process., vol. 29, pp. 2052–2065, 2020.
[5] L. Yang, C. Shen, Q. Hu, L. Jing, and Y. Li, "Adaptive sample-level graph combination for partial multiview clustering," IEEE Trans. Image Process., vol. 29, pp. 2780–2794, 2020.
[6] X. Xin, J. Wang, R. Xie, S. Zhou, W. Huang, and N. Zheng, "Semi-supervised person re-identification using multi-view clustering," Pattern Recognit., vol. 88, pp. 285–297, Apr. 2019.
[7] W. Li, X. Zhu, and S. Gong, "Person re-identification by deep joint learning of multi-loss classification," in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 2194–2200.
[8] Z. Tao, H. Liu, H. Fu, and Y. Fu, "Multi-view saliency-guided clustering for image cosegmentation," IEEE Trans. Image Process., vol. 28, no. 9, pp. 4634–4645, Sep. 2019.
[9] Y. Gong, Q. Ke, M. Isard, and S. Lazebnik, "A multi-view embedding space for modeling internet images, tags, and their semantics," Int. J. Comput. Vis., vol. 106, no. 2, pp. 210–233, 2014.
[10] M. Chi, P. Zhang, Y. Zhao, R. Feng, and X. Xue, "Web image retrieval reranking with multi-view clustering," in Proc. 18th Int. Conf. World Wide Web (WWW), 2009, pp. 1189–1190.
[11] Y. Fu, Y. Guo, Y. Zhu, F. Liu, C. Song, and Z.-H. Zhou, "Multi-view video summarization," IEEE Trans. Multimedia, vol. 12, no. 7, pp. 717–729, Nov. 2010.
[12] Z. Kang et al., "Partition level multiview subspace clustering," Neural Netw., vol. 122, pp. 279–288, Feb. 2020.
[13] W. Zhu, J. Lu, and J. Zhou, "Structured general and specific multi-view subspace clustering," Pattern Recognit., vol. 93, pp. 392–403, Sep. 2019.
[14] Y. Yang, F. Shen, Z. Huang, H. T. Shen, and X. Li, "Discrete nonnegative spectral clustering," IEEE Trans. Knowl. Data Eng., vol. 29, no. 9, pp. 1834–1845, Sep. 2017.
[15] S. Huang, Z. Kang, I. W. Tsang, and Z. Xu, "Auto-weighted multi-view clustering via kernelized graph learning," Pattern Recognit., vol. 88, pp. 174–184, Apr. 2019.
[16] Z. Hu, F. Nie, R. Wang, and X. Li, "Multi-view spectral clustering via integrating nonnegative embedding and spectral embedding," Inf. Fusion, vol. 55, pp. 251–259, Mar. 2020.
[17] J. Wu, Z. Lin, and H. Zha, "Essential tensor learning for multi-view spectral clustering," IEEE Trans. Image Process., vol. 28, no. 12, pp. 5910–5922, Dec. 2019.
[18] M.-M. Cheng, L. Jing, and M. K. Ng, "Tensor-based low-dimensional representation learning for multi-view clustering," IEEE Trans. Image Process., vol. 28, no. 5, pp. 2399–2414, May 2019.
[19] Z. Yang, Q. Xu, W. Zhang, X. Cao, and Q. Huang, "Split multiplicative multi-view subspace clustering," IEEE Trans. Image Process., vol. 28, no. 10, pp. 5147–5160, Oct. 2019.
[20] X. Peng, Z. Huang, J. Lv, H. Zhu, and J. T. Zhou, "Multi-view clustering without parameter selection," in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 5092–5101.
[21] A. Kumar and H. Daume, "A co-training approach for multi-view spectral clustering," in Proc. 28th Int. Conf. Mach. Learn., 2011, pp. 393–400.
[22] A. Kumar, P. Rai, and H. Daume, "Co-regularized multi-view spectral clustering," in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 1413–1421.
[23] K. Zhan, F. Nie, J. Wang, and Y. Yang, "Multiview consensus graph clustering," IEEE Trans. Image Process., vol. 28, no. 3, pp. 1261–1270, Mar. 2019.
[24] J. Wen et al., "Adaptive graph completion based incomplete multi-view clustering," IEEE Trans. Multimedia, early access, Aug. 3, 2020, doi: 10.1109/TMM.2020.3013408.
[25] C. Zhang, Q. Hu, H. Fu, P. Zhu, and X. Cao, "Latent multi-view subspace clustering," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4279–4287.
[26] C. Zhang et al., "Generalized latent multi-view subspace clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 1, pp. 86–99, Jan. 2020.
[27] X. Wang, Z. Lei, X. Guo, C. Zhang, H. Shi, and S. Z. Li, "Multi-view subspace clustering with intactness-aware similarity," Pattern Recognit., vol. 88, pp. 50–63, Apr. 2019.
[28] Y. Chen, X. Xiao, and Y. Zhou, "Jointly learning kernel representation tensor and affinity matrix for multi-view clustering," IEEE Trans. Multimedia, vol. 22, no. 8, pp. 1985–1997, Aug. 2020.
[29] L. Zong, X. Zhang, L. Zhao, H. Yu, and Q. Zho, "Multi-view clustering via multi-manifold regularized non-negative matrix factorization," Neural Netw., vol. 88, pp. 74–89, Apr. 2017.
[30] F. Nie, G. Cai, J. Li, and X. Li, "Auto-weighted multi-view learning for image clustering and semi-supervised classification," IEEE Trans. Image Process., vol. 27, no. 3, pp. 1501–1511, Sep. 2017.
[31] D. Xie, Q. Gao, Q. Wang, X. Zhang, and X. Gao, "Adaptive latent similarity learning for multi-view clustering," Neural Netw., vol. 121, pp. 409–418, Jan. 2020.
[32] R. Xia, Y. Pan, L. Du, and J. Yin, "Robust multi-view spectral clustering via low-rank and sparse decomposition," in Proc. 28th AAAI Conf. Artif. Intell. (AAAI), 2014, pp. 2149–2155.
[33] H. Xu, X. Zhang, W. Xia, Q. Gao, and X. Gao, "Low-rank tensor constrained co-regularized multi-view spectral clustering," Neural Netw., vol. 132, pp. 245–252, Dec. 2020.
[34] J. Wu, X. Xie, L. Nie, Z. Lin, and H. Zha, "Unified graph and low-rank tensor learning for multi-view clustering," in Proc. 34th AAAI Conf. Artif. Intell. (AAAI), 2020, pp. 6388–6395.
[35] Z. Tao, H. Liu, S. Li, Z. Ding, and Y. Fu, "From ensemble clustering to multi-view clustering," in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 2843–2849.
[36] Y. Wang, L. Wu, X. Lin, and J. Gao, "Multiview spectral clustering via structured low-rank matrix factorization," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 10, pp. 4833–4843, Oct. 2018.
[37] Y. Chen, X. Xiao, and Y. Zhou, "Multi-view clustering via simultaneously learning graph regularized low-rank tensor representation and affinity matrix," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2019, pp. 1348–1353.
[38] S. Li, M. Shao, and Y. Fu, "Multi-view low-rank analysis with applications to outlier detection," ACM Trans. Knowl. Discovery Data, vol. 12, no. 3, pp. 32-1–32-22, 2018.
[39] Z. Tao, H. Liu, S. Li, Z. Ding, and Y. Fu, "Marginalized multiview ensemble clustering," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 2, pp. 600–611, Feb. 2020.
[40] F. Nie, X. Wang, and H. Huang, "Clustering and projected clustering with adaptive neighbors," in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2014, pp. 977–986.
[41] Y. Li, F. Nie, H. Huang, and J. Huang, "Large-scale multi-view spectral clustering via bipartite graph," in Proc. 29th AAAI Conf. Artif. Intell. (AAAI), 2015, pp. 2750–2756.
[42] S. Bickel and T. Scheffer, "Multi-view clustering," in Proc. 4th IEEE Int. Conf. Data Mining (ICDM), Nov. 2004, pp. 19–26.
[43] H. Wang, Y. Yang, and T. Li, "Multi-view clustering via concept factorization with local manifold regularization," in Proc. IEEE 16th Int. Conf. Data Mining (ICDM), Dec. 2016, pp. 1245–1250.
[44] F. Nie, J. Li, and X. Li, "Self-weighted multiview clustering with multiple graphs," in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 2564–2570.
[45] H. Zhao, Z. Ding, and Y. Fu, "Multi-view clustering via deep matrix factorization," in Proc. 31st AAAI Conf. Artif. Intell. (AAAI), 2017, pp. 2921–2927.
[46] L. Houthuys, R. Langone, and J. A. K. Suykens, "Multi-view kernel spectral clustering," Inf. Fusion, vol. 44, pp. 46–56, Nov. 2018.
[47] Z. Huang, J. T. Zhou, X. Peng, C. Zhang, H. Zhu, and J. Lv, "Multi-view spectral clustering network," in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019, pp. 2563–2569.
[48] L. Lovasz and M. D. Plummer, Matching Theory. Providence, RI, USA: American Mathematical Society, 2009.
[49] A. Strehl and J. Ghosh, "Cluster ensembles—A knowledge reuse framework for combining partitionings," J. Mach. Learn. Res., vol. 3, pp. 583–617, Dec. 2002.
[50] L. Hubert and P. Arabie, "Comparing partitions," J. Classification, vol. 2, no. 1, pp. 193–218, 1985.
[51] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.

Aiping Huang (Member, IEEE) received the B.S. degree in mathematics and applied mathematics from Putian University, Putian, China, in 2011, and the M.S. degree in basic mathematics from Minnan Normal University, Zhangzhou, China, in 2014. She is currently pursuing the Ph.D. degree with the College of Physics and Information Engineering, Fuzhou University. She worked as a Lecturer with the School of Information Science and Technology, Xiamen University Tan Kah Kee College, from 2014 to 2018. Her research interests include computer vision, image processing, machine learning, and data mining.

Weiling Chen (Member, IEEE) received the B.S. and Ph.D. degrees in communication engineering from Xiamen University, Xiamen, China, in 2013 and 2018, respectively. She is currently a Lecturer with the College of Physics and Information Engineering, Fuzhou University, China. From September 2016 to December 2016, she held a visiting position with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. Her current research interests include image quality assessment, image compression, and underwater acoustic communication.

Tiesong Zhao (Senior Member, IEEE) received the B.S. degree in electrical engineering from the University of Science and Technology of China, Hefei, China, in 2006, and the Ph.D. degree in computer science from the City University of Hong Kong, Hong Kong, in 2011. He worked as a Research Associate with the Department of Computer Science, City University of Hong Kong, from 2011 to 2012, a Postdoctoral Fellow with the Department of Electrical and Computer Engineering, University of Waterloo, from 2012 to 2013, and a Research Scientist with the Ubiquitous Multimedia Laboratory, University at Buffalo, The State University of New York, Buffalo, from 2014 to 2015. He is currently a Minjiang Distinguished Professor with the College of Physics and Information Engineering, Fuzhou University, China. His research interests include multimedia signal processing, coding, quality assessment, and transmission. Due to his contributions in video coding and transmission, he received the Fujian Science and Technology Award for Young Scholars in 2017. He has also been serving as an Associate Editor for IET Electronics Letters since 2019.

Chang Wen Chen (Fellow, IEEE) received the B.S. degree from the University of Science and Technology of China in 1983, the M.S.E.E. degree from the University of Southern California in 1986, and the Ph.D. degree from the University of Illinois at Urbana–Champaign in 1992.

He is currently a Chair Professor of visual computing at The Hong Kong Polytechnic University. Previously, he was an Empire Innovation Professor of Computer Science and Engineering with the University at Buffalo, The State University of New York, from 2008 to 2021. He also served as the Dean of the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, from 2017 to 2020. He was the Allen Henry Endow Chair Professor with the Florida Institute of Technology from 2003 to 2007. He was with the Faculty of Electrical and Computer Engineering, University of Missouri, Columbia, from 1996 to 2003, and the Faculty of Electrical and Computer Engineering, University of Rochester, from 1992 to 1996. His research has been supported by NSF, DARPA, Air Force, NASA, the Whitaker Foundation, Microsoft, Intel, Kodak, Huawei, and Technicolor.

Prof. Chen has been a fellow of SPIE since 2007. He received nine best paper awards or best student paper awards. He has also received several research and professional achievement awards, including the Sigma Xi Excellence in Graduate Research Mentoring Award in 2003, the Alexander von Humboldt Research Award in 2009, the University at Buffalo Exceptional Scholar-Sustained Achievement Award in 2012, The State University of New York System Chancellor's Award for Excellence in Scholarship and Creative Activities in 2016, and the Distinguished ECE Alumni Award from the University of Illinois at Urbana–Champaign in 2019. He has served as the Conference Chair for several major IEEE, ACM, and SPIE conferences related to multimedia video communications and signal processing. From January 2014 to December 2016, he was the Editor-in-Chief of IEEE TRANSACTIONS ON MULTIMEDIA. From January 2006 to December 2009, he also served as the Editor-in-Chief for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY. He has been an Editor for several other major IEEE TRANSACTIONS and journals, including the PROCEEDINGS OF THE IEEE, the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, and the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS.