I Know That Person: Generative Full Body and Face De-Identification of People in Images
Abstract

…images of humans that thwart human and machine-based recognition. We introduce a model that de-identifies both primary biometric and often neglected soft biometric and non-biometric personal identifiers. Assuming that the silhouette of the person is known, we synthesize an alternative appearance that fits the silhouette and can therefore be seamlessly integrated into the original image. Our model is capable of producing results at varying levels of detail, depending on the input resolution, output resolution requirements and the available segmentation of the human figure. If only a low-resolution segmented human blob is available, e.g. obtained in a surveillance scenario using background subtraction, the model produces a low-detail de-identified version of the blob. On the other hand, if a detailed segmentation at the garment level is available, the model produces a highly detailed de-identified version of the blob, considering each garment separately.

2. Related work

The majority of research on de-identification of humans in images still focuses on de-identifying primary biometric features, predominantly the face. The earliest approaches to face de-identification involved applying naive transformations such as blurring or pixelization [30]. While naive transformations do thwart human recognition, it has been shown [8] that they can be effectively circumvented using computer vision, by employing a classifier trained on images to which the same transformation has been applied (so-called parrot recognition).

A higher level of privacy protection can be obtained by replacing the face with another, unrelated face, as e.g. in the work of Bitouk et al. [4]. They introduce a system that replaces faces by selecting a similar face from a database of real face images and adjusting the pose, lighting, and skin tone to match the original face. However, the construction of such a database for real applications is ethically and legally problematic, as rendering the faces of real people in de-identified sequences can potentially portray those people in a negative light. Although a number of large-scale face databases are available to the research community (e.g. MS-Celeb-1M [10], FaceScrub [22], CelebA [18], etc.), the data is licensed only for research purposes, with disclaimers that the data providers do not own the data.

A viable alternative is to use synthetic face images for face replacement, as e.g. in the work of Newton et al. [21], who propose the k-same face de-identification algorithm. In the algorithm, each face image is replaced with an average of the k most similar faces from a database in which each person known to the system is represented with a single image. The algorithm is irreversible and offers a reasonable level of privacy protection, with the best possible success rate for re-identification being 1/k. Several improvements to the k-same algorithm aim to improve the data utility of the output, e.g. k-same select [7] and model-based k-same [9].

An important question to ask when considering identity protection through face de-identification is how much information can be mined about the individual even if the face is completely obfuscated. Oh et al. [23] propose a system for "faceless" person recognition and show that individuals in a social media setting can be identified even if their faces are blacked out and their clothing varies across images. Although this shows that a higher level of de-identification than face-only de-identification is needed, works on full-body de-identification are scarce. In an early work, Park and Trivedi [24] introduce a method for tracking and privacy protection in videos, with a de-identification scheme that covers individual human bounding boxes with colored rectangles. Agrawal and Narayanan [2] propose a method for tracking, segmenting and de-identifying individuals in videos. Two de-identification strategies are considered, one based on exponential pixel blurring and another based on line integral convolution. Although privacy-preserving, these schemes do not emphasize data utility and naturalness of images.

Our model for face and full body de-identification builds on the recently introduced generative adversarial networks (GANs) [6]. GANs represent a promising new direction in artificial image synthesis, having enabled the synthesis of highly realistic images of various classes (e.g. faces, interiors, handwritten digits [6, 27]). In the GAN framework, two networks are trained: a generator and a discriminator. While the generator attempts to create an artificial image of the target class, the discriminator attempts to discern whether an image is artificial or real. Essentially, the two networks are engaged in a min-max game against each other. A number of extensions and improvements to GANs have been proposed, most notably the architecture known as the deep convolutional generative adversarial network (DCGAN) [27], which introduces several architectural constraints and improves the stability of the training process. There have also been works applying GANs in a conditional setting, i.e. conditioning the generated output on some input. For example, Zhu et al. [37] propose a method for user-guided visual manipulation of generated images. Pathak et al. [26] introduce an inpainting technique that conditions the generated output, which represents a missing image region, on its surroundings. Isola et al. [14] propose a general image-to-image translation framework suitable for a variety of tasks, including map-to-image translation, edge-to-image translation and image colorization.

Given that deep networks have become a go-to solution for detection and recognition tasks in computer vision, the GAN framework is especially interesting in the context of de-identification, as it is tailored toward maximizing the potential of fooling a deep network (the discriminator). By generating synthetic human images that fool the discriminator, we ensure not only that the generated humans look natural, but also that a deep network cannot discern them from real images, making re-identification harder.

In this work, we assume that some kind of segmentation of the person to be de-identified is known (e.g. obtained using a person detector combined with a segmentation algorithm, or roughly estimated using background subtraction) and do not study how to obtain it. Rather, we propose a de-identification method that can work with segmentations of varying levels of detail: from simple foreground/background blob segmentations to precise garment-level annotations.

Our model builds on generative adversarial networks (GANs), attempting to generate synthetic samples from the distribution of all possible images that generated the query segmentations. In general, the GAN framework consists of two parametric functions, a generator and a discriminator, modeled by deep neural networks. Given data x, the generator maps its input variables z (noise with a prior p_z(z)) to the data space using the mapping G(z, θ_g), where θ_g are generator parameters. The discriminator D(x, θ_d) with parameters θ_d takes data x as input and outputs a scalar that represents the probability that x was sampled from the real data-generating distribution p_data, and not from the generator distribution p_g. The two networks are trained simultaneously, so that D maximizes the probability of correctly identifying generated samples, while G attempts to minimize it. The discriminator and the generator are engaged in a min-max game with the objective function:

    L_GAN(G, D) = E_{x∼p_data(x)}(log D(x)) + E_{z∼p_z(z)}(log(1 − D(G(z)))).    (1)

The GAN objective is:

    G*_GAN = arg min_G max_D L_GAN(G, D).    (2)

The GAN framework can be extended to model dependence between pairs of images, i.e. to make the generator output image conditioned on an input image, which is especially useful for various image-to-image translation tasks. Let x denote the input image, and let y be the image conditioned on x that we wish the generator to output. Both generator and discriminator take a pair of vectors as input: the discriminator takes the two mutually dependent images x and y, while the generator takes the conditional image x and a noise vector z. The objective function of a conditional GAN can be expressed as:

    L_cGAN(G, D) = E_{x,y∼p_data(x,y)}(log D(x, y)) + E_{x∼p_data(x), z∼p_z(z)}(log(1 − D(x, G(x, z)))).    (3)

As shown in [14, 26], the quality of the images output by the generator can be improved by requiring them to be close to ground-truth images according to some distance metric d:

    L_d(G) = E_{x,y∼p_data(x,y), z∼p_z(z)}(d(y, G(x, z))).    (4)

The conditional GAN objective can then be expressed as:

    G*_cGAN = arg min_G max_D L_cGAN(G, D) + λ L_d(G).    (5)

3.1. Full body de-identification

In our model, we perform full body de-identification using a conditional GAN that generates synthetic human images. In order to ensure that the synthetic images look natural and fit in the original scene, the generative process is guided by the segmentations of the persons that are to be de-identified. The conditional GAN is trained on pairs of human segmentations and human images, with the goal of outputting realistic-looking synthetic human images conditioned on the extracted segmentation. Depending on the availability of data in particular applications, the network can be trained to operate on segmentations with varying levels of detail, from simple silhouette-defining blobs obtained e.g. using background subtraction to full body segmentations with detailed tags for individual garments.

3.1.1 Network architecture

The structure of our full body de-identification network is based on the architectures described in [31, 27, 14].

Generator: The generator is an encoder-decoder network with skip connections, as in the U-Net architecture [31]. There are eight encoder and eight decoder layers. The skip connections connect each encoder layer i with the decoder layer n − i, where n is the total number of layers (implemented as a concatenation of layer activations). Each layer consists of the following operations: convolution/deconvolution, batch normalization that normalizes the inputs to zero mean and unit variance [12], and a leaky ReLU non-linearity [19]. We use filters of size 5×5 with a stride of 2 and zero padding of 2. In effect, each convolution downsamples the image by a factor of 2, while each deconvolution upsamples the image by a factor of 2.

Let conv-n denote a convolutional layer with n channels, batch normalization and leaky ReLU activations, deconv-n a deconvolution layer with n channels, batch normalization and leaky ReLU activations, and deconv-dropout-n a deconvolution layer with n channels, a 50% dropout rate, batch normalization and leaky ReLU activations. The architectures of the encoder and the decoder are shown in Table 1.

Discriminator: The discriminator consists of four convolutional layers with batch normalization and leaky ReLU activations.
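As a concrete reading of objectives (1), (3) and (5), the quantities minimized by the two networks can be sketched in a few lines of numpy. This is an illustration, not the paper's implementation: the function names are ours, and the distance d is taken to be the L1 distance, as in [14].

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Negation of the discriminator's objective in Eq. (3):
    D maximizes log D(x, y) + log(1 - D(x, G(x, z)))."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake, y, g_out, lam=100.0):
    """Generator's side of Eq. (5): the adversarial term of Eq. (3)
    plus lambda * L_d(G) of Eq. (4), with d chosen as the L1 distance."""
    adversarial = np.mean(np.log(1.0 - d_fake))
    reconstruction = np.mean(np.abs(y - g_out))
    return adversarial + lam * reconstruction
```

In an actual training loop, d_real and d_fake would be discriminator outputs on batches of (x, y) and (x, G(x, z)) pairs, and the two losses would be minimized in alternation.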
Table 1. Architectures of the encoder and the decoder.

Layer index   Encoder    Decoder
1             conv-64    deconv-dropout-512
2             conv-128   deconv-dropout-1024
3             conv-256   deconv-dropout-1024
4             conv-512   deconv-1024
5             conv-512   deconv-1024
6             conv-512   deconv-512
7             conv-512   deconv-256
8             conv-512   deconv-128
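The downsampling/upsampling arithmetic implied by Table 1 (eight stride-2 convolutions followed by eight stride-2 deconvolutions) can be checked with a short script. The 256×256 input resolution used here is an assumption for illustration, not something stated in this excerpt.

```python
# Channel counts transcribed from Table 1; the 256x256 input side is an
# assumed example resolution.
ENCODER = [64, 128, 256, 512, 512, 512, 512, 512]
DECODER = [512, 1024, 1024, 1024, 1024, 512, 256, 128]

def feature_map_sizes(side=256):
    """Return (channels, spatial side) after each layer: each stride-2
    convolution halves the side, each stride-2 deconvolution doubles it."""
    sizes = []
    for ch in ENCODER:
        side //= 2
        sizes.append((ch, side))
    for ch in DECODER:
        side *= 2
        sizes.append((ch, side))
    return sizes

def skip_target(i, n=16):
    """U-Net skip connection: encoder layer i feeds decoder layer n - i."""
    return n - i
```

With a 256×256 input, the encoder bottoms out at a 1×1 feature map after its eighth layer, and the decoder restores the original resolution; the 1024-channel decoder widths are consistent with concatenating 512-channel skip activations onto 512-channel deconvolution outputs.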
Table 2. Trained network variants.

Iorig               Iseg        Network name
unchanged           garments    clothing
unchanged           blob only   clothing-mono
background masked   garments    clothing-nobg
background masked   blob only   clothing-mono-nobg

Figure 4. Example faces synthesized by DCGAN.
Table 3. Experimental setups.

Experiment name      Network name         Test set
clothing             clothing             clothing-garments
clothing-nobg        clothing-nobg        clothing-garments
clothing-mono        clothing-mono        clothing-blobs
clothing-mono-nobg   clothing-mono-nobg   clothing-blobs
naiveseg             clothing-nobg        clothing-naiveseg
bsblobs              clothing-nobg        bsblobs
bsblobs-naiveseg     clothing-nobg        bsblobs-naiveseg
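The experiment configuration of Table 3 can also be expressed as a small lookup structure; this is merely a transcription of the table for reference, with names mirroring the table entries.

```python
# Experiment name -> (network used, test set), transcribed from Table 3.
EXPERIMENTS = {
    "clothing":           ("clothing",           "clothing-garments"),
    "clothing-nobg":      ("clothing-nobg",      "clothing-garments"),
    "clothing-mono":      ("clothing-mono",      "clothing-blobs"),
    "clothing-mono-nobg": ("clothing-mono-nobg", "clothing-blobs"),
    "naiveseg":           ("clothing-nobg",      "clothing-naiveseg"),
    "bsblobs":            ("clothing-nobg",      "bsblobs"),
    "bsblobs-naiveseg":   ("clothing-nobg",      "bsblobs-naiveseg"),
}

def network_for(experiment):
    """Network evaluated in a given experimental setup."""
    return EXPERIMENTS[experiment][0]
```

Note that the last three setups all reuse the clothing-nobg network and differ only in the test-time segmentations fed to it.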
Table 4. Perceptual study results.

Experiment name      Garment colors and         Garment shapes     Mean naturalness   Mean recognizability
                     textures similar [% yes]   similar [% yes]    score ± σ          score ± σ
clothing             40.0                       93.3               5.6 ± 2.6          3.9 ± 2.5
clothing-nobg        15.2                       93.3               5.4 ± 2.6          4.2 ± 2.6
clothing-mono        13.3                       32.4               3.6 ± 2.3          2.4 ± 1.7
clothing-mono-nobg   21.1                       50.0               4.3 ± 2.1          2.8 ± 1.9
naiveseg             4.7                        23.1               3.2 ± 2.1          2.0 ± 1.5
bsblobs              7.6                        37.1               1.9 ± 1.3          1.5 ± 0.9
bsblobs-naiveseg     2.9                        9.6                2.5 ± 2.1          1.5 ± 1.1
…appear to be sensitive to noise, and that there is a cross-dataset loss in performance. Furthermore, the face detector failed on the majority of images in these datasets, contributing to the loss of naturalness.

A visualization of the merits of performing face synthesis as a separate step in our model is shown in Fig. 8. Note how the de-identified full body image lacks facial details, and how the naturalness is improved when a synthetic face is added.

As our training set includes images of real people, potentially generating images that closely resemble these people would present an ethical problem. However, we have found that the generated images do not resemble the training set originals. Facial details in the generated images are lost (see Fig. 8), and garment information in the network is learned from many images, resulting in a diverse pool of potential renderings for each garment.

4.5. A perceptual study

Given a series of de-identified outputs obtained with the experimental setups outlined in Table 3, we performed a perceptual study to quantitatively evaluate how humans perceive and respond to the de-identified images produced by our model. We randomly sampled five de-identified images from each of the seven experimental setups, resulting in a total of 35 images. The users were asked four questions about pairs of original and de-identified images: (i) whether the coloring and the textures of the garments in the two images are similar [yes/no], (ii) whether most garments are similarly shaped/tailored [yes/no], (iii) whether the de-identified image looks realistic, on a scale of 1-10 (1 - not realistic at all, 10 - completely realistic), and (iv) to what degree the de-identified person is recognizable based on garments or face, on a scale of 1-10 (1 - not recognizable at all, 10 - completely recognizable). A total of 21 users participated in the study. The results are summarized in Table 4.

Ideally, our model should produce outputs that are perceptually distant from the originals (low garment color and texture similarity and low recognizability), while offering a high degree of naturalness. Although it is desirable for garment shape similarity to be low, we do not believe garment shape to be particularly identity-revealing, as a lot of people wear similarly tailored clothes. Overall, the highest mean naturalness score (5.6 on a scale of 1 to 10) is obtained for the clothing experimental setup (see Table 3). The second highest mean naturalness score (5.4) is obtained using a similar setup, clothing-nobg. Both the clothing and clothing-nobg setups generate full body de-identifications with garment shapes similar to those in the original image, which is to be expected given that the input segmentation exactly outlines individual garments. Interestingly, removing the background seems to have increased the capacity of the clothing-nobg network for varying garment colors and textures. Still, the mean recognizability scores are approximately the same and do not reflect the perceived difference in garment colors and textures. We believe that these recognizability scores reflect the users' perceptual grouping according to the Gestalt principle of similarity: even though asked to score how well they could recognize the person based on the face and garment colors and textures, the users tended to consider the images similar, and therefore the person recognizable, if they viewed the garment shapes as similar. This is supported by the Pearson correlation coefficient between garment shape similarity (column 3 of Table 4) and mean recognizability score (column 5 of Table 4), which is a high 0.94.

Networks trained just on black and white blob segmentations (clothing-mono and clothing-mono-nobg) achieved lower mean naturalness scores than their garment-trained counterparts. Mean recognizability scores were also lower. Removing the background of the original images yielded a higher naturalness score.

Applying the naive segmentation algorithm to the blobs from the CCP dataset (naiveseg) yielded lower similarity of color, texture and shape and a lower mean recognizability score compared to de-identification based on blobs only (clothing-mono), but at the cost of a slightly lower mean naturalness score. Using naive segmentation on background subtraction blobs from the Human3.6M dataset (bsblobs-naiveseg) somewhat improved the naturalness of the de-identification compared to bsblobs. Overall, the naturalness score obtained on background subtraction blobs is quite low, which we attribute mainly to noisy
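The 0.94 correlation between garment-shape similarity and mean recognizability discussed in Sec. 4.5 can be re-derived directly from the Table 4 columns:

```python
import math

# Columns 3 and 5 of Table 4: garment shapes similar [% yes] and mean
# recognizability score, one entry per experimental setup (top to bottom).
shape_similar = [93.3, 93.3, 32.4, 50.0, 23.1, 37.1, 9.6]
recognizability = [3.9, 4.2, 2.4, 2.8, 2.0, 1.5, 1.5]

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(shape_similar, recognizability)  # approximately 0.94
```

The computed value agrees with the 0.94 reported in the text to two decimal places.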
background subtraction input that generated noisy output (see Fig. 7), and to the fact that the face detector failed on over 90% of images from the Human3.6M dataset, so the de-identified outputs did not have faces rendered.

4.6. Re-identification performance

To quantitatively support our perceptual study findings, we perform a series of experiments aimed at mimicking a simple re-identification attack scenario. We assume that the attacker has a gallery of images (in our case, all 2098 images from the CCP dataset), among which is the original image of the de-identified person, and study how likely it is that the person can be re-identified using the de-identified image as a query (probe). We compare the performance for the de-identified outputs of the first five of our seven experiments (i.e. we consider only the CCP dataset outputs, see Table 3) to the performance of common naive de-identification techniques, including (i) blurring the person (σ = 10), (ii) pixelization of the person, and (iii) covering the face with a black box and leaving the rest of the body as is. Each person is represented by a concatenation of a 3D histogram of RGB color components (20 bins per component) and a weighted gradient orientation histogram (20 bins). We measure k-nearest neighbor retrieval performance using the sum of histogram intersections of the normalized color and gradient components as the distance measure.

Experimental results are summarized in Table 5. By applying our model, the likelihood of re-identification using color and gradient similarity is minimal even if the attacker has the exact original image used to generate the de-identified image. In contrast, the re-identification results for blurring, pixelization and the black-box method support the intuition that individuals are still recognizable from cues other than face similarity.

Table 5. Re-identification k-nn retrieval performance.

                            Accuracy [%]
Experiment name       k = 1   k = 3   k = 5   k = 10
clothing                0.0     1.1     3.2     3.2
clothing-nobg           1.1     1.1     1.1     2.1
clothing-mono           0.0     0.0     2.1     2.1
clothing-mono-nobg      0.0     0.0     0.0     1.1
naiveseg                0.0     0.0     0.0     0.0
blur                   37.9    48.4    56.8    62.1
pixelized              87.4    97.9    97.9   100.0
black-rect            100.0   100.0   100.0   100.0

Note that our representation does not consider silhouette shape, which could be used to trivially re-identify the de-identified images in this example, as our gallery images are the exact originals from which the de-identified images were generated. However, we assume that in real applications it is unlikely that the attacker has the original images.

4.7. Discussion

Applying our model to full body segmentations of any level of detail, from simple blobs and naively segmented blobs to full garment-based segmentations, produces de-identified outputs that are distant from the original images in terms of color and gradient similarity, as illustrated by our re-identification experiments. Simultaneously, our perceptual study shows that the more detailed the segmentation input, the higher the naturalness of the de-identified output. However, naturalness and recognizability are also highly correlated (Pearson correlation coefficient of 0.97). As noted earlier, we believe that the recognizability scores were influenced by the garment shape similarity between pairs of original and de-identified images (correlation coefficient of 0.94). In real scenarios, our intuition is that garment shape is not in itself as identifying as garment colors or textures, but this intuition remains to be corroborated in further experiments. As the users recognized the de-identified garment shapes as similar to the real-world shapes in the original images, we believe that the lack of complete naturalness perceived by the users is due to the de-identified colors and textures. As noted by Zhu et al. [37], GANs still have limitations (low resolution of generated results, blur and lack of texture), and they perform better on structured datasets than on more general imagery. We expect further improvements to the GAN state of the art to drive up the naturalness of the images generated by our model.

As several experimental setups and network training strategies were tried out (see Tables 2 and 3), our summary recommendation is to use as detailed an input segmentation as possible when generating the de-identified images. When only noisy blobs are available, it might be beneficial in terms of output naturalness to smooth them as much as possible and to employ a naive segmentation strategy.

5. Conclusion

We have introduced a model for full body and face de-identification of humans in images that enables synthesizing artificial human images that fit an input segmentation. We have shown that the model is useful for various kinds of segmentations, from simple noisy background subtraction blobs to highly detailed garment-level segmentations. In a perceptual study, we found that our model generates images that look reasonably natural (the best-performing setup scored 5.6 for naturalness on a scale of 1 to 10), while offering a solid level of identity protection. A quantitative exploration of de-identification performance found that the de-identified images produced by our model share very little color and gradient similarity with the original images, in contrast to naive methods including blurring, pixelization and covering the face with a black rectangle.
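The attack representation used in Sec. 4.6 (a 3D RGB color histogram concatenated with a gradient orientation histogram, compared by histogram intersection) can be sketched as follows. This is a simplified illustration, not the exact experimental code: in particular, it uses an unweighted orientation histogram and plain numpy gradients.

```python
import numpy as np

def person_descriptor(img):
    """img: HxWx3 uint8 array. Concatenate a normalized 3D RGB histogram
    (20 bins per component) with a 20-bin gradient orientation histogram."""
    color_hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(20, 20, 20),
                                   range=((0, 256),) * 3)
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    orient = np.arctan2(gy, gx)  # gradient orientation in [-pi, pi]
    orient_hist, _ = np.histogram(orient, bins=20, range=(-np.pi, np.pi))
    feats = [color_hist.ravel(), orient_hist.astype(float)]
    # Normalize each component so intersections are comparable.
    return np.concatenate([f / max(f.sum(), 1e-9) for f in feats])

def similarity(a, b):
    """Sum of histogram intersections of two normalized descriptors."""
    return np.minimum(a, b).sum()

def rank_gallery(probe, gallery):
    """Gallery indices ordered most-similar-first (k-nn retrieval)."""
    scores = [similarity(probe, g) for g in gallery]
    return np.argsort(scores)[::-1]
```

Under this scheme, a de-identified probe re-identifies its subject only if the original gallery image lands among the top k retrieved indices, which is the accuracy reported in Table 5.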
References

[1] A. Acquisti, R. Gross, and F. Stutzman. Faces of Facebook: Privacy in the age of augmented reality. Blackhat USA Technical Security Conference, August 2011.
[2] P. Agrawal and P. J. Narayanan. Person de-identification in videos. IEEE Transactions on Circuits and Systems for Video Technology, 21(3):299–310, March 2011.
[3] E. Ahmed, M. Jones, and T. K. Marks. An improved deep learning architecture for person re-identification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[4] D. Bitouk, N. Kumar, S. Dhillon, P. Belhumeur, and S. K. Nayar. Face swapping: Automatically replacing faces in photographs. ACM Trans. Graph., 27(3):39:1–39:8, 2008.
[5] G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.
[6] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. In Proc. Conf. on Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
[7] R. Gross, E. Airoldi, B. Malin, and L. Sweeney. Privacy Enhancing Technologies: 5th International Workshop, chapter Integrating Utility into Face De-identification, pages 227–242. Springer Berlin Heidelberg, 2006.
[8] R. Gross, L. Sweeney, J. F. Cohn, F. De la Torre, and S. Baker. Protecting Privacy in Video Surveillance, chapter Face De-identification, pages 129–146. Springer Publishing Company Incorporated, 2009.
[9] R. Gross, L. Sweeney, F. de la Torre, and S. Baker. Model-based face de-identification. In Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW), pages 161–161, 2006.
[10] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition, pages 87–102. Springer International Publishing, Cham, 2016.
[11] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[12] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 448–456. JMLR.org, 2015.
[13] C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1325–1339, July 2014.
[14] P. Isola, J. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. CoRR, abs/1611.07004, 2016.
[15] I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[16] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
[17] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, May 2015.
[18] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proc. Int. Conf. on Computer Vision (ICCV), 2015.
[19] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814. Omnipress, 2010.
[20] A. Narayanan, J. Huey, and E. W. Felten. A Precautionary Approach to Big Data Privacy, pages 357–385. Springer Netherlands, Dordrecht, 2016.
[21] E. M. Newton, L. Sweeney, and B. Malin. Preserving privacy by de-identifying face images. IEEE Transactions on Knowledge and Data Engineering, 17(2):232–243, February 2005.
[22] H. W. Ng and S. Winkler. A data-driven approach to cleaning large face datasets. In Proc. Int. Conf. on Image Processing (ICIP), pages 343–347, 2014.
[23] S. J. Oh, R. Benenson, M. Fritz, and B. Schiele. Faceless person recognition: privacy implications in social media. CoRR, abs/1607.08438, 2016.
[24] S. Park and M. M. Trivedi. A track-based human movement analysis and privacy protection system adaptive to environmental contexts. In Advanced Video and Signal Based Surveillance (AVSS 2005), pages 171–176, September 2005.
[25] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In Proceedings of the British Machine Vision Conference (BMVC), 2015.
[26] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. Efros. Context encoders: Feature learning by inpainting. 2016.
[27] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.
[28] H. A. Rashwan, A. Solanas, D. Puig, and A. Martínez-Ballesté. Understanding trust in privacy-aware video surveillance systems. International Journal of Information Security, 15(3):225–234, 2016.
[29] D. Reid, S. Samangooei, C. Chen, M. Nixon, and A. Ross. Soft biometrics for surveillance: an overview. In Machine Learning: Theory and Applications, pages 327–352. Elsevier, 2013.
[30] S. Ribarić, A. Ariyaeeinia, and N. Pavešić. De-identification for privacy protection in multimedia content: A survey. Signal Processing: Image Communication, 47:131–151, 2016.
[31] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation, pages 234–241. Springer International Publishing, Cham, 2015.
[32] P. Viola and M. Jones. Robust real-time object detection. International Journal of Computer Vision, 2001.
[33] M. J. Wilber, V. Shmatikov, and S. J. Belongie. Can we still avoid automatic face detection? CoRR, abs/1602.04504, 2016.
[34] T. Xiao, H. Li, W. Ouyang, and X. Wang. Learning deep feature representations with domain guided dropout for person re-identification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[35] Y. Xu, T. Price, J.-M. Frahm, and F. Monrose. Virtual U: Defeating face liveness detection by building virtual models from your public photos. In 25th USENIX Security Symposium (USENIX Security 16), pages 497–512, Austin, TX, 2016. USENIX Association.
[36] W. Yang, P. Luo, and L. Lin. Clothing co-parsing by joint image segmentation and labeling. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pages 3182–3189, 2014. IEEE Computer Society.
[37] J.-Y. Zhu, P. Krähenbühl, E. Shechtman, and A. A. Efros. Generative visual manipulation on the natural image manifold. In Proceedings of the European Conference on Computer Vision (ECCV), 2016.