SPRINGER BRIEFS IN COMPUTER SCIENCE
Haibin Yan
Jiwen Lu
Facial Kinship
Verification
A Machine Learning Approach
SpringerBriefs in Computer Science
Series editors
Stan Zdonik, Brown University, Providence, Rhode Island, USA
Shashi Shekhar, University of Minnesota, Minneapolis, Minnesota, USA
Xindong Wu, University of Vermont, Burlington, Vermont, USA
Lakhmi C. Jain, University of South Australia, Adelaide, South Australia, Australia
David Padua, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
Xuemin (Sherman) Shen, University of Waterloo, Waterloo, Ontario, Canada
Borko Furht, Florida Atlantic University, Boca Raton, Florida, USA
V.S. Subrahmanian, University of Maryland, College Park, Maryland, USA
Martial Hebert, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Katsushi Ikeuchi, University of Tokyo, Tokyo, Japan
Bruno Siciliano, Università di Napoli Federico II, Napoli, Italy
Sushil Jajodia, George Mason University, Fairfax, Virginia, USA
Newton Lee, Newton Lee Laboratories, LLC, Tujunga, California, USA
More information about this series at http://www.springer.com/series/10028
Haibin Yan
Beijing University of Posts and Telecommunications
Beijing, China

Jiwen Lu
Tsinghua University
Beijing, China
Any parent knows that people are always interested in whether their own child
bears some resemblance to either of them, mother or father. Human interest in family
ties and connections, in common ancestry and lineage, is as natural to life as life
itself.
Biometrics has made enormous advances over the last 30 years. When I wrote
my first paper on automated face recognition (in 1985) I could count the active
researchers in the area on the fingers of one hand. Now my fingers do not suffice
even to count the countries in which automatic face recognition is being developed;
they suffice only to count the continents. Surprisingly, even with all the
enormous volume of research in automated face recognition and in face biometrics,
there has not as yet been much work on kinship analysis in biometrics and computer
vision despite the natural interest. And it does seem a rather obvious tack to take.
The work has so far appeared in conference papers and journal papers, and now it
is a book. Those are prudent routes to take in the development of any approach and
technology.
There are many potential applications here. In the context of forensic systems,
the police now use familial DNA to trace offenders, particularly for cold crimes
(those from a long time ago). As we reach systems where automated search will
become routine, one can countenance systems that look for kinship. It could even
putatively help to estimate appearance with age, given a missing subject. There are
many other possible applications too, especially as advertising has noted the potent
abilities of biometric systems.
For now those topics are a long way ahead. First facial kinship analysis needs
exposure and development in terms of science. It needs data and it needs technique.
I am pleased to see that is what is being considered in this new text on Facial
Kinship Verification: A Machine Learning Approach. This will be part of our
future. Enjoy!
Preface
The writing of this book is supported in part by the National Natural Science
Foundation of China (Grant No. 61603048, 61672306), the Beijing Natural Science
Foundation (Grant No. 4174101), the Fundamental Research Funds for the Central
Universities at Beijing University of Posts and Telecommunications (Grant No.
2017RC21) and the Thousand Talents Program for Distinguished Young Scholars.
Contents
3.4 Evaluation 52
  3.4.1 Experimental Setups 52
  3.4.2 Results and Analysis 53
References 61
4 Video-Based Facial Kinship Verification 63
  4.1 Background 63
  4.2 Data Sets 64
  4.3 Evaluation 66
    4.3.1 Experimental Settings 66
    4.3.2 Results and Analysis 68
References 79
5 Conclusions and Future Work 81
  5.1 Conclusions 81
  5.2 Future Work 82
Chapter 1
Introduction to Facial Kinship Verification

1.1 Overview of Facial Kinship Verification
Recent advances in psychology and cognitive sciences [1–3, 11] have revealed that
the human face is an important cue for measuring kin similarity: children usually
look more like their parents than like other adults, because children and their
parents are biologically related and share overlapping genes. Inspired by this
finding, there have been some seminal attempts at kinship verification via human
faces, and computer vision researchers have developed several advanced
computational models to verify human kinship relations via facial image analysis
[7–10, 14–19, 21, 22]. While kinship verification has many potential applications,
such as searching for missing children and social media mining, it is still
challenging to develop a robust kinship verification system for real applications,
because facial images usually exhibit large variations in pose, illumination,
expression, and aging, especially when they are captured in unconstrained
environments.
Kinship verification via facial image analysis is an interesting problem in computer
vision, and in recent years there have been a few seminal studies in the literature
[7–10, 15–19, 21, 22]. Existing kinship verification methods can be broadly
categorized into two classes: feature-based [5, 8, 10, 21, 22] and model-based
[9, 15, 18, 19]. Generally, feature-based methods aim to extract discriminative
feature descriptors that represent facial images effectively, so that stable
kin-related characteristics are well preserved. Existing feature representations
include skin color [8], histograms of gradients [8, 16, 21], Gabor wavelets
[6, 16, 18, 22], gradient orientation pyramids [22], local binary patterns [4, 15],
the scale-invariant feature transform [15, 16, 19], salient parts [10, 17],
self-similarity [12], and dynamic features combined with spatiotemporal appearance
descriptors [5]. Model-based methods usually apply statistical learning techniques
to learn an effective classifier, such as subspace learning [18], metric learning
[15, 19], transfer learning [18], multiple kernel learning [22], and graph-based
fusion [9]. Table 1.1 briefly reviews and compares existing kinship verification
methods for facial kinship modeling over the past five years, where the performance
of each method is evaluated by the mean verification rate. While the verification
rates in this table cannot be compared directly because of differing datasets and
experimental protocols, it is still clear that major progress in kinship
verification has been made in recent years. More recently, Fang et al. [7] extended
kinship verification to kinship classification. In their work, they proposed a
kinship classification approach that reconstructs the query face from a sparse set
of samples among the candidates for family classification, achieving a 15% rank-one
classification rate on a dataset of 50 families.
To the best of our knowledge, there have been only limited attempts to tackle the
problem of facial kinship verification in the literature, yet several potential
applications of facial kinship verification have already been presented. One
representative application of kinship verification is social media analysis. For
example, there are tens of billions of images on the popular Facebook website, and
more than 2.5 billion images are added to the website each month. How to
automatically organize such large-scale data remains a challenging problem in
computer vision and multimedia. There are two key questions to be answered:
(1) who these people are, and (2) what their relations are. Face recognition is an
important approach to address the first question, and kinship verification is a
useful technique to approach the second. When kinship relations are known, it is
possible to automatically create family trees from these social network websites.
Currently, our method achieves a 70–75% verification rate when two face images are
captured from different photos, and a 75–80% verification rate when they come from
the same photo. Compared with state-of-the-art face verification methods, which
usually achieve above a 90% verification rate on the LFW dataset, the performance
of existing facial kinship verification methods is low. However, existing
computational facial kinship verification methods still provide useful information
for analyzing the relation of two persons, because these numbers are not only much
higher than random guessing (50%) but also comparable to the performance of human
observers. Another important application of kinship verification is the search for
missing children. Currently, DNA testing is the dominant approach to verifying the
kin relation of two persons, and it is effective for finding missing children.
However, DNA testing has two limitations: (1) it raises serious privacy concerns,
which may restrict its use in some applications; and (2) its cost is relatively
high. Kinship verification from facial images can remedy these weaknesses because
verifying kinship relations from facial images is very convenient and its cost is
very low. For example, if we want to find a missing child among thousands of
children, it is difficult to use DNA testing to verify their kin relations due to
privacy concerns. With our kinship verification method, however, we can first
quickly identify some candidates whose facial images have high similarity, and then
apply DNA testing to obtain the exact search result.
Table 1.1 Comparisons of existing kinship verification methods presented over the past few years. © 2015 IEEE. Reprinted, with permission, from Ref. [20]

| Method | Feature representation | Classifier | Dataset | Accuracy (%) | Year |
| Fang et al. [8] | Local features from different face parts | KNN | Cornell KinFace | 70.7 | 2010 |
| Zhou et al. [21] | Spatial pyramid local feature descriptor | SVM | 400 pairs (N.A.) | 67.8 | 2011 |
| Xia et al. [18] | Context feature with transfer learning | KNN | UB KinFace | 68.5 | 2012 |
| Guo and Wang [10] | DAISY descriptors from semantic parts | Bayes | 200 pairs (N.A.) | 75.0 | 2012 |
| Zhou et al. [22] | Gabor gradient orientation pyramid | SVM | 400 pairs (N.A.) | 69.8 | 2012 |
| Kohli et al. [12] | Self-similarity of Weber face | SVM | IIITD KinFace | [64.2, 74.1] | 2012 |
| Somanath and Kambhamettu [16] | Gabor, HOG and SIFT | SVM | VADANA KinFace | [67.0, 80.2] | 2012 |
| Dibeklioglu et al. [5] | Dynamic features + spatiotemporal appearance features | SVM | UVA-NEMO, 228 spontaneous pairs | 72.9 | 2013 |
| | | | UVA-NEMO, 287 posed pairs | 70.0 | 2013 |
| Lu et al. [15] | Local feature with metric learning | SVM | KinFaceW-I | 69.9 | 2014 |
| | | | KinFaceW-II | 76.5 | 2014 |
| Guo et al. [9] | LBP feature | Logistic regressor + fusion | 322 pairs (available upon request) | 69.3 | 2014 |
| Yan et al. [20] | Mid-level features by discriminative learning | SVM | KinFaceW-I | 70.1 | 2015 |
| | | | KinFaceW-II | 77.0 | 2015 |
| | | | Cornell KinFace | 71.9 | 2015 |
| | | | UB KinFace | 67.3 | 2015 |
| Kohli et al. [13] | Hierarchical representation | Filtered contractive DBN | WVU | 90.8 | 2017 |
This book aims to give a timely summary of past achievements as well as to
introduce some emerging techniques for facial kinship verification, especially
from the machine learning perspective. We also give some suggestions to readers
who are interested in conducting research in this area by presenting some new
facial kinship datasets and benchmarks.
The remainder of this book is organized as follows:
Chapter 2 presents the state-of-the-art feature learning techniques for facial kin-
ship verification. Specifically, conventional well-known facial feature descriptors are
briefly reviewed. However, these feature representation methods are hand-crafted
and not data-adaptive. In this chapter, we introduce the feature learning approach
and show how to employ it for facial kinship verification.
Chapter 3 presents the state-of-the-art metric learning techniques for facial kinship
verification. Specifically, conventional similarity learning methods, such as principal
component analysis, linear discriminant analysis, and locality preserving projections
are briefly reviewed. However, these similarity learning methods are hand-crafted
and not data-adaptive. In this chapter, we introduce the metric learning approach and
show how to use it for facial kinship verification.
Chapter 4 introduces some recent advances in video-based facial kinship
verification. Compared with a single image, a face video provides more information
to describe the appearance of a human face, since it can capture the face of the
person of interest under different poses, expressions, and illuminations. Moreover,
face videos can be captured much more easily in real applications because extensive
surveillance cameras are installed in public areas. Hence, it is desirable to
employ face videos to determine the kin relations of persons. However, it is also
challenging to exploit the discriminative information of face videos because
intra-class variations within a face video are usually larger than those in a
single still image. In this chapter, we show some recent efforts in video-based
facial kinship verification.
Chapter 5 finally concludes the book with some suggestions for prospective
researchers in this area, including the commonly used benchmark datasets and the
standard evaluation protocols. Moreover, we also discuss some potential directions
for future work on facial kinship verification.
References
1. Alvergne, A., Oda, R., Faurie, C., Matsumoto-Oda, A., Durand, V., Raymond, M.: Cross-
cultural perceptions of facial resemblance between kin. J. Vis. 9(6), 1–10 (2009)
2. Dal Martello, M., Maloney, L.: Where are kin recognition signals in the human face? J. Vis.
6(12), 1356–1366 (2006)
3. DeBruine, L., Smith, F., Jones, B., Craig Roberts, S., Petrie, M., Spector, T.: Kin recognition
signals in adult faces. Vis. Res. 49(1), 38–43 (2009)
4. Deng, W., Hu, J., Guo, J.: Extended SRC: undersampled face recognition via intraclass variant
dictionary. IEEE Trans. Pattern Anal. Mach. Intell. 34(9), 1864–1870 (2012)
5. Dibeklioglu, H., Salah, A.A., Gevers, T.: Like father, like son: facial expression dynamics for
kinship verification. In: IEEE International Conference on Computer Vision, pp. 1497–1504
(2013)
6. Du, S., Ward, R.K.: Improved face representation by nonuniform multilevel selection of gabor
convolution features. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39(6), 1408–1419 (2009)
7. Fang, R., Gallagher, A.C., Chen, T., Loui, A.: Kinship classification by modeling facial feature
heredity, pp. 2983–2987 (2013)
8. Fang, R., Tang, K., Snavely, N., Chen, T.: Towards computational models of kinship verification.
In: IEEE International Conference on Image Processing, pp. 1577–1580 (2010)
9. Guo, Y., Dibeklioglu, H., van der Maaten, L.: Graph-based kinship recognition. In: Interna-
tional Conference on Pattern Recognition, pp. 4287–4292 (2014)
10. Guo, G., Wang, X.: Kinship measurement on salient facial features. IEEE Trans. Instrum. Meas.
61(8), 2322–2325 (2012)
11. Kaminski, G., Dridi, S., Graff, C., Gentaz, E.: Human ability to detect kinship in strangers’
faces: effects of the degree of relatedness. Proc. R. Soc. B: Biol. Sci. 276(1670), 3193–3200
(2009)
12. Kohli, N., Singh, R., Vatsa, M.: Self-similarity representation of weber faces for kinship classi-
fication. In: IEEE International Conference on Biometrics: Theory, Applications, and Systems,
pp. 245–250 (2012)
13. Kohli, N., Vatsa, M., Singh, R., Noore, A., Majumdar, A.: Hierarchical representation learning
for kinship verification. IEEE Trans. Image Process. 26(1), 289–302 (2017)
14. Lu, J., Hu, J., Zhou, X., Zhou, J., Castrillòn-Santana, M., Lorenzo-Navarro, J., Bottino, A.G.,
Vieira, T.: Kinship verification in the wild: the first kinship verification competition. In:
IAPR/IEEE Joint Conference on Biometrics, pp. 1–6 (2014)
15. Lu, J., Zhou, X., Tan, Y.P., Shang, Y., Zhou, J.: Neighborhood repulsed metric learning for
kinship verification. IEEE Trans. Pattern Anal. Mach. Intell. 34(2), 331–345 (2014)
16. Somanath, G., Kambhamettu, C.: Can faces verify blood-relations? In: IEEE International
Conference on Biometrics: Theory, Applications and Systems, pp. 105–112 (2012)
17. Xia, S., Shao, M., Fu, Y.: Toward kinship verification using visual attributes. In: IEEE Inter-
national Conference on Pattern Recognition, pp. 549–552 (2012)
18. Xia, S., Shao, M., Luo, J., Fu, Y.: Understanding kin relationships in a photo. IEEE Trans.
Multimed. 14(4), 1046–1056 (2012)
19. Yan, H., Lu, J., Deng, W., Zhou, X.: Discriminative multimetric learning for kinship verification.
IEEE Trans. Inf. Forensics Secur. 9(7), 1169–1178 (2014)
20. Yan, H., Lu, J., Zhou, X.: Prototype-based discriminative feature learning for kinship verifica-
tion. IEEE Trans. Cybern. 45(11), 2535–2545 (2015)
21. Zhou, X., Hu, J., Lu, J., Shang, Y., Guan, Y.: Kinship verification from facial images under
uncontrolled conditions. In: ACM International Conference on Multimedia, pp. 953–956 (2011)
22. Zhou, X., Lu, J., Hu, J., Shang, Y.: Gabor-based gradient orientation pyramid for kinship
verification under uncontrolled environments. In: ACM International Conference on
Multimedia, pp. 725–728 (2012)
Chapter 2
Feature Learning for Facial Kinship
Verification
Abstract In this chapter, we discuss feature learning techniques for facial kinship
verification. We first review two well-known hand-crafted facial descriptors, local
binary patterns (LBP) and the Gabor feature. Then, we introduce a compact binary
face descriptor (CBFD) method which learns face descriptors directly from raw
pixels. Unlike LBP, which samples small-size neighboring pixels and computes binary
codes with a fixed coding strategy, CBFD samples large-size neighboring pixels and
learns a feature filter to obtain binary codes automatically. Subsequently, we
present a prototype-based discriminative feature learning (PDFL) method to learn
mid-level discriminative features from low-level descriptors for kinship
verification. Unlike most existing prototype-based feature learning methods, which
learn the model with a strongly labeled training set, this approach works on a
large unsupervised generic set combined with a small labeled training set. To
better use multiple low-level features for mid-level feature learning, a multiview
PDFL (MPDFL) method is further proposed to learn multiple mid-level features to
improve the verification performance.
Local binary patterns (LBP) is a popular texture descriptor for feature
representation. Inspired by its success in texture classification, Ahonen et
al. [1] proposed an LBP-based feature representation method for face recognition.
The basic idea is as follows: for each pixel, its eight neighboring pixels are
thresholded into 1 or 0 by comparing them with the center pixel. The binary
sequence of the eight neighbors is then converted into a decimal number (the bit
pattern starts at the upper-left corner and moves clockwise around the center
pixel), and the 256-bin histogram of the processed image is used as the texture
descriptor. To capture the dominant features, the descriptor was extended to a
parametric form (P, R), in which P gray-scale values are equally distributed on a
circle of radius R to form circularly symmetric neighbor sets.
sets. To better characterize the property of texture information, uniform pattern is
Fig. 2.1 The LBP and ILBP operators where a Original image patch b Results of LBP operator
c Results of ILBP operator
defined according to the number of spatial transitions that are bitwise 0/1 changes in
the calculated binary values. If there are at most two bitwise transitions from 0 to 1
or vice versa, the binary value is called a uniform pattern.
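As a concrete sketch (our own illustration, not the book's code; the function names and use of NumPy are assumptions), the basic 3×3 LBP operator and the uniform-pattern test described above can be written as:

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 LBP: threshold the 8 neighbors against the center pixel,
    reading bits clockwise starting from the upper-left corner."""
    img = np.asarray(img, dtype=np.int32)
    H, W = img.shape
    # neighbor offsets, clockwise from the upper-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:H-1, 1:W-1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1+dy:H-1+dy, 1+dx:W-1+dx]
        codes |= (neigh >= center).astype(np.int32) << (7 - bit)
    return codes  # one code in [0, 255] per interior pixel

def lbp_histogram(codes):
    """The 256-bin histogram used as the texture descriptor."""
    return np.bincount(codes.ravel(), minlength=256)

def is_uniform(code):
    """A pattern is uniform if its circular bit string has
    at most two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2
```

For example, `0b11110000` is uniform (two transitions around the circle) while the alternating pattern `0b01010101` is not.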
Recently, Jin et al. [22] developed an improved LBP (ILBP) method. Unlike LBP,
ILBP makes better use of the central pixel: it takes the mean gray value of all
elements in the patch as the threshold, and the newly generated binary value of
the central pixel is placed at the leftmost position of the original binary
string. The corresponding decimal value range thus changes from [0, 255] to
[0, 510] for a 3×3 operator. Figure 2.1 shows the basic ideas of LBP and ILBP.
Since LBP and ILBP are robust to illumination changes, they have been widely used
in many semantic understanding tasks such as face recognition and facial
expression recognition.
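Under the same caveat as above (our own sketch, not the book's code), the 3×3 ILBP operator can be written as:

```python
import numpy as np

def ilbp_code(patch):
    """ILBP code of a 3x3 patch: all nine pixels (center included) are
    thresholded against the mean gray value of the patch, and the center
    bit is placed at the leftmost position of the binary string."""
    patch = np.asarray(patch, dtype=float)
    t = patch.mean()
    # neighbors clockwise from the upper-left corner, as in basic LBP
    neighbors = [(0, 0), (0, 1), (0, 2), (1, 2),
                 (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [patch[1, 1] > t] + [patch[y, x] > t for (y, x) in neighbors]
    code = 0
    for b in bits:
        code = (code << 1) | int(b)
    return code
```

With a strict `>` threshold, a constant patch yields code 0, and the all-ones code 511 is impossible because not all nine pixels can exceed their own mean, which is consistent with the ILBP range [0, 510].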
Gabor wavelets are a popular feature extraction method for visual signal
representation: discriminative information is extracted by convolving the original
image with a set of Gabor kernels of different scales and orientations. A 2-D
Gabor wavelet kernel is the product of an elliptical Gaussian envelope and a
complex plane wave, defined as [31]:

ψ_{μ,ν}(z) = (‖k_{μ,ν}‖² / σ²) exp(−‖k_{μ,ν}‖² ‖z‖² / (2σ²)) [exp(i k_{μ,ν} · z) − exp(−σ²/2)]

where μ and ν define the orientation and scale of the Gabor kernels, z = (x, y)
is the variable in the spatial domain, ‖·‖ denotes the norm operator, and the
wave vector k_{μ,ν} is defined as follows:

k_{μ,ν} = k_ν e^{iφ_μ},  with k_ν = k_max / f^ν and φ_μ = πμ/8.

The Gabor representation of an image I is then the convolution
O_{μ,ν}(z) = I(z) ∗ ψ_{μ,ν}(z), where O_{μ,ν}(z) is the convolution result
corresponding to the Gabor kernel at orientation μ and scale ν. Figure 2.2 shows
the real part of the Gabor kernels.

Fig. 2.2 Real part of Gabor kernels at eight orientations and five scales [31]
Usually, five spatial frequencies and eight orientations are used for Gabor
feature representation, so a total of 40 Gabor kernels are applied at each pixel
of an image; the computational cost is therefore generally expensive. Moreover,
only the magnitudes of the Gabor wavelet coefficients are used as features,
because the phase information is sensitive to inaccurate alignment.
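A minimal NumPy sketch of this 40-kernel bank follows; the parameter values k_max = π/2, f = √2, σ = 2π and the kernel size are common defaults we assume here, not values fixed by the book:

```python
import numpy as np

def gabor_kernel(mu, nu, kmax=np.pi/2, f=np.sqrt(2), sigma=2*np.pi, size=31):
    """2-D Gabor kernel at orientation mu (0..7) and scale nu (0..4):
    an elliptical Gaussian envelope times a complex plane wave."""
    k = kmax / f**nu                  # scale-dependent frequency magnitude
    phi = np.pi * mu / 8              # orientation angle
    half = size // 2
    y, x = np.mgrid[-half:half+1, -half:half+1]
    znorm2 = x**2 + y**2
    envelope = (k**2 / sigma**2) * np.exp(-k**2 * znorm2 / (2 * sigma**2))
    wave = np.exp(1j * k * (x*np.cos(phi) + y*np.sin(phi))) - np.exp(-sigma**2 / 2)
    return envelope * wave

def gabor_magnitudes(img, kernels):
    """Convolve via FFT and keep only the magnitude responses,
    since the phase is sensitive to misalignment."""
    F = np.fft.fft2(img)
    return [np.abs(np.fft.ifft2(F * np.fft.fft2(kern, s=img.shape)))
            for kern in kernels]

# 5 scales x 8 orientations = 40 kernels applied at each pixel
bank = [gabor_kernel(mu, nu) for nu in range(5) for mu in range(8)]
```

Keeping only `np.abs(...)` of each response mirrors the practice described above of discarding the alignment-sensitive phase.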
There have been a number of feature learning methods proposed in recent years
[3, 16, 18, 20, 26, 38]. Representative feature learning methods include sparse
auto-encoders [3], denoising auto-encoders [38], restricted Boltzmann machine [16],
convolutional neural networks [18], independent subspace analysis [20], and
reconstruction independent component analysis [26]. Recently, there has also been
some work on feature learning-based face representation, and some of it has
achieved reasonably good performance in face recognition. For example, Lei et
al. [29] proposed a discriminant face descriptor (DFD) method that learns an image
filter under the LDA criterion to obtain LBP-like features. Cao et al. [7]
presented a learning-based (LE) feature representation method by applying the
bag-of-words (BoW) framework. Hussain et al. [19] proposed a local quantized
pattern (LQP) method that modifies LBP with a learned coding strategy. Compared
with hand-crafted feature descriptors, learning-based feature representation
methods usually show better recognition performance because more data-adaptive
information can be exploited in the learned features.
Compared with real-valued feature descriptors, binary codes have three advantages:
(1) they save memory, (2) they are faster to compute and match, and (3) they are
robust to local variations. Recently, there has been an increasing interest
in binary code learning in computer vision [14, 40, 43, 44]. For example, Weiss et
al. [44] proposed an efficient binary coding learning method by preserving the simi-
larity of original features for image search. Norouzi et al. [36] learned binary codes
by minimizing a triplet ranking loss for similar pairs. Wang et al. [43] presented a
binary code learning method by maximizing the similarity of neighboring pairs and
minimizing the similarity of non-neighboring pairs for image retrieval. Trzcinski et
al. [40] obtained binary descriptors from patches by learning several linear projec-
tions based on pre-defined filters during training. However, most existing binary
code learning methods are developed for similarity search [14, 40] and visual
tracking [30]. While binary features such as LBP and Haar-like descriptors have
been used in face recognition and have achieved encouraging performance, most of
them are hand-crafted. In this chapter, we first review the compact binary face
descriptor method, which learns binary features directly from raw pixels for face
representation. Then, we introduce a feature learning approach that learns
mid-level discriminative features from low-level descriptors for kinship
verification.
While binary features have proven to be very successful in face recognition [1],
most existing binary face descriptors are hand-crafted (e.g., LBP and its
extensions), and they suffer from several limitations: the fixed coding strategy
is not data-adaptive, only small-size neighborhoods are sampled, and the resulting
histogram bins are unevenly distributed (see Fig. 2.3a).
Fig. 2.3 The bin distributions of a LBP and b our CBFD method, computed on the
FERET training set, which consists of 1002 images from 429 subjects. For a fair
comparison, both use the same number of histogram bins (59 in this figure). We
clearly see from this figure that the histogram distribution is uneven for LBP
and more uniform for our CBFD method. © 2015 IEEE. Reprinted, with permission,
from Ref. [35]
Fig. 2.4 An example of extracting a pixel difference vector (PDV) from the
original face image. For any pixel in the image, we first identify its neighbors
in a (2R + 1) × (2R + 1) window, where R is a parameter defining the neighborhood
size (R = 1 in this figure for easy illustration). Then, the difference between
the center point and the neighboring pixels is computed as the PDV. © 2015 IEEE.
Reprinted, with permission, from Ref. [35]
To address these limitations, the compact binary local feature learning
method [35] was proposed to learn face descriptors directly from raw pixels.
Unlike LBP, which samples small-size neighboring pixels and computes binary codes
with a fixed coding strategy, we sample large-size neighboring pixels and learn a
feature filter to obtain binary codes automatically. Let X = [x_1, x_2, ..., x_N]
be the training set containing N samples, where x_n ∈ R^d is the nth pixel
difference vector (PDV) and 1 ≤ n ≤ N. Unlike most previous feature learning
methods [18, 26], which use the original raw pixel patch to learn the feature
filters, we use PDVs for feature learning because a PDV measures the difference
between the center point and the neighboring pixels within a patch, so it better
describes how pixel values change and implicitly encodes important visual patterns
such as edges and lines in face images. Moreover, PDVs have been widely used in
many local face feature descriptors, such as the hand-crafted LBP [1] and the
learning-based DFD [29]. Figure 2.4 illustrates how to extract one PDV from the
original face image. For any pixel in the image, we first identify its neighbors
in a (2R + 1) × (2R + 1) window, where R is a parameter defining the neighborhood
size. Then, the difference between the center point and the neighboring pixels is
computed as the PDV. In our experiments, R is set to 3, so each PDV is a
48-dimensional feature vector.
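A small sketch of PDV extraction (our own illustration; the sign convention, neighbor minus center, is an assumption on our part):

```python
import numpy as np

def extract_pdvs(img, R=3):
    """Extract one pixel difference vector (PDV) per interior pixel:
    the differences between the (2R+1)x(2R+1) neighbors and the center
    pixel, giving a ((2R+1)^2 - 1)-dimensional vector (48-D for R = 3)."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    center_idx = (2*R + 1) * R + R  # flat index of the patch center
    pdvs = []
    for y in range(R, H - R):
        for x in range(R, W - R):
            patch = img[y-R:y+R+1, x-R:x+R+1].ravel()
            # drop the center entry (its difference with itself is zero)
            pdvs.append(np.delete(patch - patch[center_idx], center_idx))
    return np.array(pdvs)  # shape: (num_interior_pixels, (2R+1)**2 - 1)
```

For R = 3 each vector is 48-dimensional, matching the dimensionality stated above.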
Our CBFD method aims to learn K hash functions to map and quantize each x_n
into a binary vector b_n = [b_n1, ..., b_nK] ∈ {0, 1}^{1×K}, which encodes more
compact and discriminative information. Let w_k ∈ R^d be the projection vector for
the kth function; the kth binary code b_nk of x_n is computed as

b_nk = 0.5 × (sgn(w_k^T x_n) + 1) (2.4)
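Eq. (2.4) maps the sign of each projection into {0, 1}; as a sketch (our own code, not the book's):

```python
import numpy as np

def cbfd_binarize(W, X):
    """Eq. (2.4): b_nk = 0.5 * (sgn(w_k^T x_n) + 1).
    W is d x K (one projection vector per hash function), X is d x N
    (one PDV per column); returns the K x N binary code matrix.
    Note: np.sign(0) = 0 would map to 0.5, but with real-valued
    projections an exactly zero response is a measure-zero event."""
    return 0.5 * (np.sign(W.T @ X) + 1)
```

A positive projection yields bit 1 and a negative projection yields bit 0, so each hash function splits the PDV space by a hyperplane through the origin.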
1. The learned binary codes are compact. Since large-size neighborhoods are
sampled, there is some redundancy in the sampled vectors, and making the codes
compact reduces this redundancy.
2. The learned binary codes preserve the energy of the original sample vectors
well, so little information is lost in the binary code learning step.
3. The learned binary codes are evenly distributed, so each bin in the histogram
conveys more discriminative information, as shown in Fig. 2.3b.
The objective function for learning the kth hash function is formulated as:

min J(w_k) = J_1(w_k) + λ_1 J_2(w_k) + λ_2 J_3(w_k)
           = −(1/N) Σ_{n=1}^{N} (b_nk − μ_k)²
             + λ_1 Σ_{n=1}^{N} ((b_nk − 0.5) − w_k^T x_n)²
             + λ_2 (Σ_{n=1}^{N} (b_nk − 0.5))² (2.5)

where N is the number of PDVs extracted from the whole training set; μ_k is the
mean of the kth binary code over all PDVs in the training set, which is recomputed
and updated in each iteration of our method; and λ_1 and λ_2 are two parameters
that balance the effects of the different terms to make a good trade-off in the
objective function.
The physical meanings of the different terms in (2.5) are as follows:
1. The first term J_1 ensures that the variance of the learned binary codes is
maximized, so that only a few bins are needed to represent the original PDVs in
the learned binary codes.
2. The second term J_2 ensures that the quantization loss between the original
features and the encoded binary codes is minimized, which reduces the information
loss in the learning process.
3. The third term J_3 ensures that the feature bins of the learned binary codes
are distributed as evenly as possible, so that they are more compact and
informative and the discriminative power is enhanced.
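To make the three terms concrete, here is a sketch (our own code, with the binarization of (2.4) folded in) that evaluates J = J_1 + λ_1 J_2 + λ_2 J_3 for a given filter W, summed over all K hash functions:

```python
import numpy as np

def cbfd_objective(W, X, lam1, lam2):
    """Evaluate the CBFD objective of eq. (2.5) summed over all K functions.
    W: d x K projection matrix, X: d x N matrix of PDVs."""
    N = X.shape[1]
    B = 0.5 * (np.sign(W.T @ X) + 1)          # K x N binary codes, eq. (2.4)
    mu = B.mean(axis=1, keepdims=True)        # per-bit means, updated each iteration
    J1 = -np.sum((B - mu)**2) / N             # negative variance: maximize spread
    J2 = np.sum(((B - 0.5) - W.T @ X)**2)     # quantization loss
    J3 = np.sum(((B - 0.5).sum(axis=1))**2)   # evenness: per-bit sums near zero
    return J1 + lam1 * J2 + lam2 * J3
```

Note that J_1 ≤ 0 (it rewards variance) while J_2 and J_3 are non-negative penalties, so the minimization trades code spread against quantization loss and bin balance.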
min_W J(W) = J_1(W) + λ_1 J_2(W) + λ_2 J_3(W)
           = −(1/N) × tr((B − U)^T (B − U))
             + λ_1 ‖(B − 0.5) − W^T X‖²_F
             + λ_2 ‖(B − 0.5) 1_{N×1}‖²_F (2.7)

where B = 0.5 × (sgn(W^T X) + 1) ∈ {0, 1}^{K×N} is the binary code matrix and
U ∈ R^{K×N} is the mean matrix, whose columns each repeat the mean vector of all
binary bits in the training set.
To our knowledge, (2.7) is an NP-hard problem due to the non-linear sgn(·)
function. To address this, we relax the sgn(·) function as its signed magnitude [14,
43] and rewrite J1 (W ) as follows:
J_1(W) = −(1/N) × (tr(W^T X X^T W) − 2 × tr(W^T X M^T W) + tr(W^T M M^T W))   (2.8)
where M ∈ R^{N×d} is the mean matrix whose rows each repeat the mean of all PDVs in the training set.
Similarly, J_3(W) can be re-written as:
J_3(W) = tr(W^T X 1_{N×1} 1_{1×N} X^T W) − N × tr(1_{1×K} W^T X 1_{N×1}) + H   (2.9)
where H = 0.5 × 1_{1×N} × 1_{N×1} × 0.5 is a constant that is not influenced by W.
Combining (2.7)–(2.9), we have the following objective function for our CBFD model:
min_W J(W) = tr(W^T Q W) − 2 × tr((B − 0.5) × X^T W) − λ_2 × N × tr(1_{1×K} W^T X 1_{N×1})
subject to W^T W = I,   (2.14)
where
Q = −(1/N) × (X X^T − 2 X M^T + M M^T) + λ_2 × X 1_{N×1} 1_{1×N} X^T.   (2.11)
We use the gradient descent method with the curvilinear search algorithm in [45] to solve for W. Algorithm 2.1 summarizes the detailed procedure of the proposed CBFD method.
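The constrained update can be illustrated with a simple stand-in: a plain gradient step followed by a QR retraction back onto the set of matrices satisfying W^T W = I. This is only a hedged sketch of the idea behind the curvilinear search of [45], applied to a toy quadratic objective, not the book's algorithm.

```python
import numpy as np

# Minimal stand-in for optimizing W under the constraint W^T W = I:
# gradient step, then retraction onto the feasible set via QR decomposition.
# The quadratic objective tr(W^T A W) below is a toy example.

def retract(W):
    """Project W onto the set of matrices with orthonormal columns via QR."""
    Q, R = np.linalg.qr(W)
    return Q * np.sign(np.diag(R))  # sign fix keeps columns consistently oriented

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
A = A + A.T                          # symmetric matrix for the toy objective
W = retract(rng.standard_normal((8, 3)))
for _ in range(100):
    grad = 2 * A @ W                 # gradient of tr(W^T A W)
    W = retract(W - 0.05 * grad)     # descend, then restore W^T W = I
```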
Fig. 2.5 The flow-chart of the CBFD-based face representation and recognition method. For each training face, we first divide it into several non-overlapped regions and learn the feature filter and dictionary for each region individually. Then, we apply the learned filter and dictionary to extract a histogram feature for each block, and concatenate these into a longer feature vector for face representation. Finally, the nearest neighbor classifier is used to measure the sample similarity. © [2015] IEEE. Reprinted, with permission, from Ref. [35]
Having obtained the learned feature projection matrix W, we first project each PDV into a low-dimensional feature vector. Unlike many previous feature learning methods [26, 27], which usually perform feature pooling on the learned features directly, we apply an unsupervised clustering method to learn a codebook from the training set so that the learned codes are more data-adaptive. In our implementation, the conventional K-means method is applied to learn the codebook due to its simplicity. Then, each learned binary code feature is pooled as a bin, and all PDVs within the same face image are represented as a histogram feature for face representation.
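The codebook and pooling step can be sketched as follows; the small K-means implementation, sizes, and random data are illustrative stand-ins, not the book's code.

```python
import numpy as np

# Illustrative sketch: cluster projected PDV features with plain K-means,
# then pool one face's PDVs into a normalized histogram over the codewords.

def kmeans(X, k, iters=20, seed=0):
    """Very small K-means on the rows of X; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = X[labels == j].mean(0)
    return centers

def pool_histogram(feats, centers):
    """Assign each feature to its nearest codeword and histogram the counts."""
    d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(centers)).astype(float)
    return hist / hist.sum()                 # L1-normalized face histogram

rng = np.random.default_rng(1)
train_feats = rng.standard_normal((200, 6))  # projected PDVs from a training set
centers = kmeans(train_feats, k=8)
face_feats = rng.standard_normal((50, 6))    # PDVs from one face image
h = pool_histogram(face_feats, centers)
```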
Previous studies have shown that different face regions carry different structural information, and it is desirable to learn position-specific features for face representation.
Motivated by this finding, we divide each face image into many non-overlapped local
regions and learn a CBFD feature descriptor for each local region. Finally, features
extracted from different regions are combined to form the final representation for the
whole face image. Figure 2.5 illustrates how to use the CBFD for face representation.
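The region-wise pipeline of Fig. 2.5 can be sketched as follows, with a plain 8-bin intensity histogram standing in for the learned CBFD histogram of each region; all names and sizes are illustrative.

```python
import numpy as np

# Illustrative sketch: split a face image into non-overlapping blocks,
# describe each block with a fixed-length histogram, and concatenate the
# per-block histograms into one long vector for the whole face.

def block_features(img, grid=(4, 4), bins=8):
    h, w = img.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            feats.append(hist / max(hist.sum(), 1))  # normalize each block
    return np.concatenate(feats)                     # final face representation

face = np.random.default_rng(2).integers(0, 256, size=(64, 64))
v = block_features(face)
```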
Recently, there has been growing interest in unsupervised feature learning and deep
learning in computer vision and machine learning, and a variety of feature learning
methods have been proposed in the literature [10, 16, 18, 21, 23, 25, 28, 33, 38, 41, 49, 52] to learn feature representations from raw pixels.
Fig. 2.6 Pipeline of our proposed kinship verification approach. First, we construct a set of face samples from the Labeled Faces in the Wild (LFW) dataset as the prototypes and represent each face image from the kinship dataset as a combination of these prototypes in the hyperplane space. Then, we use the labeled kinship information and learn mid-level features in the hyperplane space to extract more semantic information for feature representation. Finally, the learned hyperplane parameters are used to represent face images in both the training and testing sets as a discriminative mid-level feature for kinship verification. © [2015] IEEE. Reprinted, with permission, from Ref. [51]
Generally, feature learning
methods exploit some prior knowledge such as smoothness, sparsity, and temporal
and spatial coherence [4]. Representative feature learning methods include sparse
auto-encoder [38, 41], restricted Boltzmann machines [16], independent subspace
analysis [8, 28], and convolutional neural networks [25]. These methods have been
successfully applied in many computer vision tasks such as image classification [25],
human action recognition [21], face recognition [18], and visual tracking [42].
Unlike previous feature learning methods, which learn features directly from raw pixels, we propose learning mid-level discriminative features from low-level descriptors [51], where each entry in the mid-level feature vector is the corresponding decision value from one support vector machine (SVM) hyperplane. We formulate an optimization objective function on the learned features so that face samples with a kin relation are expected to have similar decision values from these hyperplanes. Hence, our method is complementary to the existing feature learning methods. Figure 2.6 shows the pipeline of our proposed approach.
Low-level feature descriptors such as LBP [2] and scale-invariant feature transform (SIFT) [32] are usually ambiguous and not discriminative enough for kinship verification, especially when face images are captured in the wild, because there are large variations in face images captured in such scenarios. Unlike most pre-
vious kinship verification work where low-level hand-crafted feature descriptors [12,
13, 15, 24, 34, 39, 47, 48, 50, 53, 54] such as local binary pattern (LBP) [2, 9] and
Gabor features [31, 54] are employed for face representation, we expect to extract
more semantic information from low-level features to better characterize the relation
of face images for kinship verification.
To achieve this, we learn mid-level feature representations by using a large unsu-
pervised dataset and a small supervised dataset because it is difficult to obtain a
large number of labeled face images with kinship labels for discriminative feature
learning. First, we construct a set of face samples with unlabeled kinship relation
from the Labeled Faces in the Wild (LFW) dataset [17] as the reference set. Then, each
sample in the training set with a labeled kin relation is represented as a mid-level
feature vector. Finally, we formulate an objective function by minimizing the distances between intra-class samples (those with a kinship relation) and maximizing the distances between inter-class neighboring samples in the mid-level feature space. Unlike most existing prototype-based feature
learning methods which learn the model with a strongly labeled training set, our
method works on a large unsupervised generic set combined with a small labeled
training set because unlabeled data can be easily collected. The following details the
proposed approach.
Let Z = [z_1, z_2, …, z_N] ∈ R^{d×N} be an unlabeled reference image set, where N and d are the number of samples and the feature dimension of each sample, respectively. Assume S = {(x_1, y_1), …, (x_i, y_i), …, (x_M, y_M)} is the training set containing M pairs of face images with kinship relations (positive image pairs), where x_i ∈ R^d and y_i ∈ R^d are the face images of the ith pair. Different from most existing
feature learning methods which learn feature representations from raw pixels, we
aim to learn a set of mid-level features which are obtained from a set of prototype
hyperplanes. For each training sample in S, we apply the linear SVM to learn the
weight vector w to represent it as follows:
w = Σ_{n=1}^{N} α_n l_n z_n = Σ_{n=1}^{N} β_n z_n = Zβ   (2.15)
where β_n = α_n l_n.
Assume we have learned K linear SVM hyperplanes; then the mid-level feature representations of x_i and y_i can be represented as f(x_i) = [w_1^T x_i, …, w_K^T x_i]^T and f(y_i) = [w_1^T y_i, …, w_K^T y_i]^T. We formulate the following objective function:
max H = H_1 + H_2 − H_3
      = (1/(Mk)) Σ_{i=1}^{M} Σ_{t_1=1}^{k} ‖f(x_i) − f(y_i^{t_1})‖²_2
      + (1/(Mk)) Σ_{i=1}^{M} Σ_{t_2=1}^{k} ‖f(x_i^{t_2}) − f(y_i)‖²_2
      − (1/M) Σ_{i=1}^{M} ‖f(x_i) − f(y_i)‖²_2
s.t. ‖β_k‖_1 ≤ γ, k = 1, 2, …, K   (2.20)
where y_i^{t_1} represents the t_1-th k-nearest neighbor of y_i and x_i^{t_2} denotes the t_2-th k-nearest neighbor of x_i, respectively. The objectives of H_1 and H_2 in (2.20) are to push the mid-level feature representations of y_i^{t_1} and x_i, and of x_i^{t_2} and y_i, as far apart as possible if they are originally near to each other in the low-level feature space. The physical meaning of H_3 in (2.20) is to make x_i and y_i close to each other in the mid-level feature space. We enforce the sparsity constraint on β_k such that only a sparse set of support vectors from the unlabeled reference dataset is selected to learn each hyperplane, because we assume each sample can be sparsely reconstructed from the reference set, which is inspired by the work in [23]. In our work, we apply the same parameter γ to all hyperplanes to reduce the number of parameters in our proposed model, so that the complexity of the proposed approach is reduced.
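The mid-level representation described above can be sketched as follows: each entry of f(x) is the decision value of one prototype hyperplane w_k = Zβ_k expressed over the reference set Z, as in (2.15). The data and sizes below are illustrative placeholders (no SVM training is performed here).

```python
import numpy as np

# Entry k of f(x) is the decision value w_k^T x of one prototype hyperplane,
# with w_k = Z beta_k expanded over the unlabeled reference set Z.

rng = np.random.default_rng(3)
d, N, K = 10, 40, 5
Z = rng.standard_normal((d, N))      # reference set, one sample per column
Beta = rng.standard_normal((N, K))   # column k holds the coefficients beta_k

def mid_level(x, Z, Beta):
    """f(x) = [w_1^T x, ..., w_K^T x]^T with w_k = Z beta_k."""
    return Beta.T @ (Z.T @ x)        # K-dimensional mid-level feature

x = rng.standard_normal(d)
f = mid_level(x, Z, Beta)
```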
Combining (2.18)–(2.20), we simplify H_1(B) to the following form:
H_1(B) = (1/(Mk)) Σ_{i=1}^{M} Σ_{t_1=1}^{k} ‖B^T Z^T x_i − B^T Z^T y_i^{t_1}‖²_2
       = (1/(Mk)) Σ_{i=1}^{M} Σ_{t_1=1}^{k} tr(B^T Z^T (x_i − y_i^{t_1})(x_i − y_i^{t_1})^T Z B)
       = tr(B^T F_1 B)   (2.21)
where
F_1 = (1/(Mk)) Σ_{i=1}^{M} Σ_{t_1=1}^{k} Z^T (x_i − y_i^{t_1})(x_i − y_i^{t_1})^T Z.   (2.22)
Similarly, we have H_2(B) = tr(B^T F_2 B) and H_3(B) = tr(B^T F_3 B),   (2.23)
where
F_2 = (1/(Mk)) Σ_{i=1}^{M} Σ_{t_2=1}^{k} Z^T (x_i^{t_2} − y_i)(x_i^{t_2} − y_i)^T Z   (2.24)
F_3 = (1/M) Σ_{i=1}^{M} Z^T (x_i − y_i)(x_i − y_i)^T Z.   (2.25)
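As a hedged sketch, F_3 in (2.25) can be computed directly from its definition: it averages the outer products of the low-level differences between the M kin pairs, projected through the reference set Z. All matrices below are random placeholders.

```python
import numpy as np

# Direct computation of F3 = (1/M) sum_i Z^T (x_i - y_i)(x_i - y_i)^T Z.

rng = np.random.default_rng(4)
d, N, M = 6, 20, 10
Z = rng.standard_normal((d, N))      # unlabeled reference set
X = rng.standard_normal((d, M))      # parent images x_i, one per column
Y = rng.standard_normal((d, M))      # child images y_i

F3 = np.zeros((N, N))
for i in range(M):
    diff = (X[:, i] - Y[:, i])[:, None]   # d x 1 difference vector
    F3 += Z.T @ diff @ diff.T @ Z         # Z^T (x_i - y_i)(x_i - y_i)^T Z
F3 /= M                                   # the 1/M factor in Eq. (2.25)
```

By construction F_3 is symmetric positive semi-definite, since it is a sum of projected outer products.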
min_{A,B} H(A, B) = Σ_{k=1}^{K} ‖G a_k − G β_k‖² + λ Σ_{k=1}^{K} β_k^T β_k
s.t. A^T A = I_{K×K},
     ‖β_k‖_1 ≤ γ, k = 1, 2, …, K.   (2.27)
When A is fixed, (2.27) reduces to:
min_B H(B) = Σ_{k=1}^{K} ‖G a_k − G β_k‖² + λ Σ_{k=1}^{K} β_k^T β_k
s.t. ‖β_k‖_1 ≤ γ, k = 1, 2, …, K.   (2.28)
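A single β_k update for (2.28) can be sketched as follows: solve the ridge-regularized least-squares part in closed form, then rescale onto the ℓ1 ball ‖β‖_1 ≤ γ. The rescaling is only a crude stand-in for a proper sparse solver, and all data and sizes are illustrative.

```python
import numpy as np

# One beta_k update for min ||G a_k - G beta||^2 + lam ||beta||^2
# s.t. ||beta||_1 <= gamma. Ridge closed form, then a crude l1 rescaling.

rng = np.random.default_rng(5)
N, lam, gamma = 30, 0.1, 3.0
G = rng.standard_normal((N, N))
a = rng.standard_normal(N)                  # one column a_k of A

# closed-form ridge solution of the unconstrained problem
beta = np.linalg.solve(G.T @ G + lam * np.eye(N), G.T @ (G @ a))
if np.abs(beta).sum() > gamma:              # pull back inside the l1 ball
    beta *= gamma / np.abs(beta).sum()
```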