Interpretable Partitioned Embedding for Customized Fashion Outfit Composition

Zunlei Feng†, Zhenyun Yu†, Yezhou Yang‡, Yongcheng Jing†, Junxiao Jiang♭, Mingli Song†
† Zhejiang University, Hangzhou, China
‡ Arizona State University, Phoenix, Arizona
♭ Alibaba Group, Hangzhou, China

arXiv:1806.04845v4 [cs.CV] 21 Jun 2018
Abstract
as a whole. There are two characteristics of such deep neural network based embedding methods: 1) the embedding is unexplainable; 2) the embedding carries no attribute information. Unfortunately, in many practical applications, it is necessary to understand the importance of the different attributes in an outfit composition for designers, businesses and consumers. That is to say, an interpretable and partitioned embedding is vital for a practical fashion outfit composition system.

To address the aforementioned problems, we propose a partitioned embedding network in this paper. The proposed network architecture consists of three components: an auto-encoder module, a supervised attributes module, and a multi-independent module. The auto-encoder module serves to encode all useful information into the embedding. In the supervised attributes module, multiple attribute labels are adopted to ensure that different parts of the overall embedding correspond to different attributes. In the multi-independent module, to ensure that each part of the embedding relates only to the corresponding attribute, a mutually independent constraint is taken into account. Then, considering that matching items may appear multiple times in different outfits, we propose a fashion composition graph to model matching relationships in outfits with the extracted partitioned embeddings. Meanwhile, an attribute matching map, which learns the importance of the different attributes in the composition, is also built. A comparison between the framework of our model and the multi-modal embedding based method [15] is shown in Figure 1.

To summarize, our work has three primary contributions: 1) We present a partitioned embedding network, which can extract interpretable embeddings of fashion outfit items. 2) We put forward a weakly-supervised fashion outfit composition model, which depends solely on a large number of outfits, without the quality scores of outfits that other methods require. Besides, our model can be extended to a dataset with annotated quality scores. 3) An iterative and customized fashion outfit composition scheme is given. Since fashion trends alter direction quickly and dramatically, our model can keep up with fashion trends by easily incorporating new fashion outfit datasets.

2. Related Work

Embedding Methods. In recent years, several models [19, 23, 26] have been proposed that can be used to get embeddings of clothes. Vittayakorn et al. [26] extract five basic features (color, texture, shape, parse and style) of outfit appearance and concatenate them to form a vector representing the outfit. They use the vector to learn outfit similarity and to analyze visual trends in fashion. Matzen et al. [19] adopt deep learning methods to train several attribute classifiers and use high-level features of the trained classifiers to create a visual embedding of clothing style. Then, using the embedding, millions of Instagram photos of people sampled worldwide are analyzed to study spatio-temporal trends in clothing around the globe. Simo-Serra et al. [23] train a classifier network with ranking as the constraint to extract discriminative feature representations, and also use high-level features of the classifier network as embeddings of fashion style. Other embedding methods that are related to our work include the Auto-Encoder [?], the Variational Auto-Encoder (VAE) [14] and the Predictability Minimization (PM) model [21]. Auto-encoders and VAEs are used to encode embeddings from unlabeled data. The encoded embeddings usually contain mixed and unexplainable features of the original images. To get partitioned embeddings, Schmidhuber [21] adopted an adversarial operation to obtain embeddings whose units are independent. However, the independent units are not meaningful.

Fashion Outfit Composition. As described above, due to the difficulty of modeling outfit composition, there are few works [12, 15, 22] studying fashion outfit composition. Iwata et al. [12] propose a topic model to recommend "Tops" for "Bottoms". The goal of this work is to compose fashion outfits automatically, which is challenging because many aspects of fashion outfits, such as compatibility and aesthetics, must be modeled. Veit et al. [25] use a Siamese Convolutional Neural Network (CNN) architecture to learn clothing matching from the Amazon co-purchase dataset, focusing on the representative problem of learning compatible clothing style. Simo-Serra et al. [22] introduce a Conditional Random Field (CRF) to learn about different outfits, types of people and settings. The model is used to predict how fashionable a person looks in a particular photograph. Li et al. [15] use quality scores as labels and multi-modal embeddings as features to train a grading model. The framework of Li's approach is shown in Figure 1(a). However, the mixed multi-modal embedding used by Li's model is unexplainable, which leads to unexplainable outfit composition.

3. The Proposed Method

To overcome the limitations of existing fashion outfit composition approaches, we propose an interpretable partitioned embedding method to achieve customized fashion outfit composition. In Section 3.1, the partitioned embedding network is first presented, showing how to partition an embedding into interpretable parts that correspond to different attributes. Then, we introduce how to build composition relationships in our proposed composition graph and attribute maps with the interpretable and partitioned embedding.

3.1. Interpretable Partitioned Embedding

Figure 2. The framework of the partitioned embedding network. The overall network architecture consists of three components: an auto-encoder module, a supervised attributes module and a multi-independent module. The auto-encoder module serves to encode all useful information into the embedding. It is composed of encoder networks Enet 1, Enet 2, ..., Enet N and a decoder network DNet. In the supervised attributes module, attribute networks Anet 1, Anet 2, ..., Anet N−1 and labels Label 1, Label 2, ..., Label N−1 are used to ensure that different parts of the overall embedding correspond to different attributes. In the multi-independent block, adversarial prediction networks Pnet 1, Pnet 2, ..., Pnet N are adopted to make sure that different parts of the whole embedding are independent.

Let I denote an item of an outfit and R(I) denote the encoded embedding of item I. To make the extracted embedding interpretable and partitionable, there are two constraints that should be satisfied: on one hand, the fixed parts of the whole
embedding should correspond to specific attributes; on the other hand, different parts of the whole embedding should be mutually independent. Thus, the embedded attribute embedding R(I) of item I can be described as below:

    R(I) = [r_1; r_2; ...; r_N],    (1)
    s.t.  P(r_i | {r_j, j ≠ i}) = P(r_i)

where r_i corresponds to the different parts of the embedding and N is the total number of attributes. The condition P(r_i | {r_j, j ≠ i}) = P(r_i) implies that r_i does not depend on {r_j, j ≠ i}. Figure 2 shows the framework of the interpretable partitioned embedding network. The whole embedding network is an auto-encoder network which embeds the items of outfits into another feature space. The whole encoder network is composed of the attribute encoder networks Enet 1, Enet 2, ..., Enet N, which embed an original item into the different parts r_k, k ∈ {1, 2, ..., N} of the whole embedding. The decoder network DNet decodes a whole embedding back to the original image. In the process of decoding, the attribute networks Anet 1, Anet 2, ..., Anet N−1 and labels Label 1, Label 2, ..., Label N−1 serve to learn useful features related to the corresponding attributes. Meanwhile, the mutually independent constraint is taken into account, which can not only ensure that different parts of the embedding relate solely to the corresponding attribute, but can also extract embeddings of indefinite attributes, such as texture and style. As a consequence, the loss of the partitioned embedding network is defined as follows:

    L = L_i(I, I′) + α L_t(I, Label) + β L_d(r_1, r_2, ..., r_N),    (2)

where α and β are balancing parameters, I′ is the decoded image of item I, L_i(I, I′) is the auto-encoder loss, L_t(I, Label) = (1/N) Σ_{k=1}^{N} loss(I, Label_k) is the summed loss over the auxiliary attribute losses loss(I, Label_k), and L_d(r_1, r_2, ..., r_N) is the mutually independent loss. Inspired by the adversarial approach in [21], we adopt an adversarial operation to meet the mutually independent constraint. The mutually independent loss L_d(r_1, r_2, ..., r_N) consists of a prediction loss L_p and an encoding loss L_e. In the prediction stage, each prediction net Pnet i tries to predict the corresponding embedding part r_i as accurately as possible, i.e., to maximize predictability. The prediction loss L_p is defined as:

    L_p = Σ_{i=1}^{N} E[r_i − f_i(r_1, ..., r_{i−1}, r_{i+1}, ..., r_N)]^2,    (3)

where f_i is the function represented by prediction net Pnet i. In the encoding stage, all encoder nets Enet i try to make all prediction nets f_i fail to predict the corresponding embedding parts, i.e., to minimize predictability. The encoding loss L_e is thus defined as:

    L_e = − Σ_{i=1}^{N} E[r_i − f_i(r_1, ..., r_{i−1}, r_{i+1}, ..., r_N)]^2.    (4)

In summary, the auto-encoder loss L_i(I, I′) makes sure that the whole embedding contains all the information in the item of an outfit. The attribute sum loss L_t(I, Label) serves to learn useful information related to the corresponding attributes. The mutually independent loss L_d(r_1, r_2, ..., r_N) ensures that different parts of the embedding relate only to the corresponding attributes.
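The adversarial losses of Eqs. (3) and (4) can be illustrated numerically. Below is a minimal pure-Python sketch, assuming each part r_i is a plain list of floats; the mean-of-other-parts predictor is a hypothetical stand-in for the prediction nets Pnet i, and the expectation E[·] is approximated by a per-dimension average:

```python
# Toy illustration of the mutually independent loss, Eqs. (3)-(4).

def prediction_loss(parts, predictors):
    """L_p of Eq. (3): squared error of each predictor f_i trying to
    reconstruct part r_i from all the other parts (E[.] as a mean)."""
    total = 0.0
    for i, r_i in enumerate(parts):
        others = [r for j, r in enumerate(parts) if j != i]
        pred = predictors[i](others)
        total += sum((a - b) ** 2 for a, b in zip(r_i, pred)) / len(r_i)
    return total

def encoding_loss(parts, predictors):
    """L_e of Eq. (4): the encoders minimize predictability, i.e. the
    negated prediction loss."""
    return -prediction_loss(parts, predictors)

def mean_predictor(others):
    """Hypothetical predictor: element-wise mean of the other parts."""
    n = len(others)
    return [sum(vals) / n for vals in zip(*others)]

# Three toy embedding parts r_1..r_3 (the third is unlike the others).
parts = [[1.0, 2.0], [1.0, 2.0], [4.0, 8.0]]
preds = [mean_predictor] * 3
lp = prediction_loss(parts, preds)
le = encoding_loss(parts, preds)
```

With these toy parts, L_p is dominated by the error on the third part; training the encoders against L_e would push the parts toward being mutually unpredictable, which is the independence constraint of Eq. (1).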
Figure 3. Encoded shapes of hats by the VAE model with uniform sampling of the latent embedding.

3.1.1 Attribute Partition

As aforementioned, there are a large number of attributes [26, 17, 19] for describing fashion, such as category, color, shape, texture, style and so on. Considering that some attributes are indefinable (such as texture and style), we classify those attributes into the same class as remaining attributes. The information of the remaining attributes can be extracted with the mutually independent constraint. To get the importance of attributes in fashion clothing, we administered a questionnaire among 30 professional stylists. According to the results of the questionnaire, we split the attributes into four classes and rank them as follows: category, color, shape, and remaining attributes (texture, style and others). Thus, in the experiment, we split the whole dataset into different groups according to the category. So, the number of attributes N equals 3. Enet 1, Enet 2 and Enet 3 are the encoding networks of the color attribute, shape attribute and remaining attributes (texture, style and so on), respectively. Anet 1, Anet 2 and Anet 3 are the supervised networks of the color attribute, shape attribute and remaining attributes, respectively. Pnet 1, Pnet 2 and Pnet 3 are the partitioning networks of the color attribute, shape attribute and remaining attributes, respectively.

Color embedding extraction. As described above, color is a primary element in fashion outfit composition. To get the color label, we adopt the recently proposed color theme extraction method [7] to extract color themes. This method can extract any number of ranked color themes with a large span. We modify it to extract colors only in the foreground area of the item. Meanwhile, we adopt the top-5 extracted color themes of an item as the color label. In the experiment, we adopt Generative Adversarial Nets (GAN) [9] to extract the color embedding. So, the color attribute supervised network Anet 1 is a GAN architecture, which includes a generative color model Gc and a color discriminative model Dc. The input and output of network Gc are Part 1 and the corresponding color themes, respectively. The input and output of network Dc are color themes and a discriminant value, respectively. The architectures of these two networks are summarized in Table 1.

Shape embedding extraction. To get an explicit mapping relationship between shape and embedded codes, we adopt a Variational Auto-Encoder model [14] to encode and decode shape information. We conduct a toy experiment to validate that the shape network can encode all shapes in the shape space. The toy VAE network encodes the original hat items into latent embeddings with two parameters. Then, through uniform sampling of the two parameters, the corresponding encoded shapes are shown in Figure 3, where we can see that almost all kinds of hat shapes are included. Meanwhile, shape information is encoded and decoded relatively satisfactorily. For each item of an outfit, we use basic open-close operations and threshold methods to get the mask of the item as the shape label. The input and output of the shape attribute network Anet 2 are Part 2 and the corresponding shape mask, respectively.

Remaining attributes embedding extraction. As described above, indefinable attributes (texture, style and others) are classified into the same class as remaining attributes. To extract the corresponding information of the remaining attributes, the mutually independent constraint is taken into account. In this article, an adversarial method is adopted to realize the mutually independent constraint. For each attribute-corresponding embedding part Part i, there is a prediction network Pnet i that ensures Part i is independent of all other parts {Part j, j ≠ i}. All the prediction networks share the same architecture. The input and output of prediction network Pnet i are {Part j, j ≠ i} and Part i, respectively.

Let C_i denote a Convolution-BatchNorm-ReLU layer with i filters. CD_j denotes a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 30% and j filters. DC_k denotes a Deconvolution-ReLU layer with k filters, and FC_t denotes a fully connected layer with t neurons. All convolutions use 4×4 spatial filters with stride 2. Convolutions in both the encoder and the discriminator downsample by a factor of 2, whereas in the decoder they upsample by a factor of 2. The input image size of Enet 1, Enet 2 and Enet 3 is 128×128×3. All the network architectures are summarized in Table 1.

Figure 4. Fashion outfit composition graph. In the graph, all items are classified into five classes according to the category. For each category, items are clustered into different cluster centers according to the attributes' importance. With these cluster centers as vertexes, the edges and weights of the fashion composition graph are learned from the outfit dataset. Meanwhile, attribute maps between items are built, which model the significance of the different attributes.

3.2. Fashion Outfit Composition

Considering that some matching items may appear many times in different outfits, we propose a fashion composition graph to model matching relationships in outfits. For each category, we first cluster all items into N_p cluster centers P_k, k ∈ {1, 2, ..., N_p} according to the color embedding. Then, for each cluster P_k, the items that belong to it are clustered into N_q cluster centers Q_k, k ∈ {1, 2, ..., N_q} according to the shape embedding. Lastly, the items belonging to each shape cluster center are clustered into N_r cluster centers R_k, k ∈ {1, 2, ..., N_r} according to the remaining embedding. Then, we can get N = N_p N_q N_r cluster centers C_k, k ∈ {1, 2, ..., N}. After getting all clustering centers, the fashion composition graph G is defined as follows:

    G = (V, E, W),    (5)

where V is the vertex set of all cluster centers C_k, E is the edge set and W is the weight set.

At the initial stage, all vertexes v ∈ V have no connections and all weights w ∈ W are equal to zero. If item I ∈ v_i and item I′ ∈ v_j appear in the same outfit and there is no connection between v_i and v_j, a connection is created and the weight w_{i,j} is set to one. If the corresponding vertices v_i and v_j already have a connection, the weight between them is updated as follows:

    w_{i,j} = w′_{i,j} + α,    (6)

where w′_{i,j} is the weight at the last stage. Figure 4 shows an example of the connection process.

Attribute Matching Map. After obtaining the interpretable and partitioned embeddings, for each category, the parts of the embeddings corresponding to every attribute class are clustered into several clusters. In the process of building the fashion outfit graph G, an attribute composition score map M of the different attributes is also built, which is defined as:

    M = {{U_x^c, A}, {H_y^c, B}, {O_z^c, D}},    (7)
    c ∈ {1, 2, ..., C}, x ∈ {1, 2, ..., X},
    y ∈ {1, 2, ..., Y}, z ∈ {1, 2, ..., Z}

where C is the number of categories; X, Y and Z are the cluster numbers of the color feature, shape feature and remaining features, respectively; U_x^c is the x-th color attribute cluster of category c; H_y^c is the y-th shape attribute cluster of category c; O_z^c is the z-th other-attribute cluster of category c; and A, B, D are score value sets in which all initial score values are equal to zero. When an item I of category c and an item I′ of category c′ appear together in the same outfit, the score value a(i, j) between color cluster U_i^c (r_1 ∈ U_i^c, R(I) = [r_1; r_2; r_3]) and color cluster U_j^{c′} (r′_1 ∈ U_j^{c′}, R(I′) = [r′_1; r′_2; r′_3]) is updated using the following equation:

    a(i, j) = a′(i, j) + 1,  a(i, j) ∈ A,    (8)

where a′(i, j) is the score value at the last stage; b(i, j) ∈ B and d(i, j) ∈ D are updated in the same way.

After getting the outfit composition graph G and the attribute map M, the composition score S of an outfit u is defined as follows:

    S(u) = S_1 + α S_2,    (9)

    S_1 = ( Σ_{k=1}^{K} w_k ) / ( K σ(w_1, w_2, ..., w_K) ),

    S_2 = Σ_{t=1}^{3} ( Σ_{k=1}^{K} s_k^t ) / ( K σ(s_1^t, s_2^t, ..., s_K^t) ),

where K is the number of item pairs (edges) of outfit u in the graph, s_k^t is the attribute map score of the k-th pair for attribute class t, and σ(·) denotes the standard deviation.
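As a concrete sketch of the bookkeeping in Eqs. (6), (8) and (9): assuming items have already been assigned to cluster-center vertices and color clusters, the co-occurrence updates and the normalized score terms can be written as follows (the data structures, names and the σ = 0 fallback are our illustration, not the paper's implementation):

```python
from collections import defaultdict
from itertools import combinations
from statistics import pstdev

ALPHA = 1.0                      # weight increment of Eq. (6)
weights = defaultdict(float)     # W: vertex pair -> edge weight
color_scores = defaultdict(int)  # score set A of Eq. (7)

def observe_outfit(items):
    """For every pair of items in one outfit, apply Eq. (6) to the edge
    weight and Eq. (8) to the color attribute map."""
    for a, b in combinations(items, 2):
        edge = tuple(sorted((a["vertex"], b["vertex"])))
        weights[edge] += ALPHA                 # w_ij = w'_ij + alpha
        pair = tuple(sorted((a["color"], b["color"])))
        color_scores[pair] += 1                # a(i,j) = a'(i,j) + 1

def norm_term(vals):
    """The sum / (K * sigma) term shared by S_1 and S_2 in Eq. (9);
    the sigma == 0 fallback is our assumption."""
    k, sigma = len(vals), pstdev(vals)
    return sum(vals) / (k * sigma) if sigma > 0 else sum(vals) / k

# Toy usage: the same hat/top pairing observed in two outfits.
outfit = [{"vertex": "hat_3", "color": "red_1"},
          {"vertex": "top_7", "color": "blue_2"}]
observe_outfit(outfit)
observe_outfit(outfit)
s1 = norm_term([weights[("hat_3", "top_7")], 1.0])  # K = 2 toy edges
```

Sorting each vertex pair makes the update symmetric, matching the undirected graph G of Eq. (5).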
Figure 5. Composition visual comparison among different methods.
Figure 7. Items clustering visual results with different parts of embedding.
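The groupings in Figure 7 come from clustering on a single slice of the partitioned embedding. A minimal sketch with toy one-dimensional data and a tiny k-means (our illustration; the paper does not specify the clustering algorithm or cluster counts here):

```python
# Cluster items by one part of the partitioned embedding only.

def kmeans_1d(values, centers, iters=20):
    """Tiny k-means on scalars: returns final centers and assignments."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    assign = [min(range(len(centers)), key=lambda i: abs(v - centers[i]))
              for v in values]
    return centers, assign

# Hypothetical "remaining attributes" parts of four bag embeddings:
remaining_part = [0.1, 0.2, 0.9, 1.0]
centers, labels = kmeans_1d(remaining_part, centers=[0.0, 1.0])
# Bags 0 and 1 fall into one cluster, bags 2 and 3 into the other.
```

Clustering each attribute slice separately is what lets the same item land in different clusters for color, shape and the remaining attributes.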
Figure 8. Outfit composition with user-specified attributes (color or shape).

class. To verify the effectiveness of the mutually independent constraint, we use the parts of the embeddings corresponding to the remaining attributes to cluster bag items. From Figure 7(c), we can observe that bags in the same class usually have some attributes in common. Bags in class 1 and class 2 have similar textures and bags in class 3 have similar styles, which demonstrates that the remaining feature includes other useful attribute features. Meanwhile, bags in the same class usually have multifarious shapes and colors, which proves that the mutually independent constraint works.

4.4. Personalized Attributes Composition

In the stage of building the composition graph, we cluster the attribute-corresponding parts of the embeddings into different-level cluster centers, which are set as vertexes in the graph. In the testing stage, users can specify their preferred attributes, such as "color=red" and "shape=circle", and our model can give a series of items with the specified attributes, together with interpretable matching scores, for users to choose from. Figure 5 shows some ordered recommended matching items with five composition scores. For example, the sum composition score of the recommended hat in the purple dashed box is 51.87. The weight composition score in the blue box is 16.45. The attribute matching scores corresponding to color, shape and the remaining attributes are 14.36, 11.74 and 9.32, respectively. From the attribute matching scores, we can see that the color attribute and the shape attribute contribute more to the final composition than the remaining attributes, which is consistent with our questionnaire's result. For the specified color attribute, the recommended items are almost indiscriminate in color. For the specified shape, all recommended items have similar shapes that are in line with the original specified shape.

Figure 9. Fashion trends across different years.

4.5. Outfit Composition's Extension

As described above, our fashion composition graph is weakly supervised and only depends on an outfit dataset without matching score labels. Our model can also be extended to a dataset with matching score labels. After normalizing the favorite user scores, the fashion composition graph with scores is built using Eq. (6) by setting α to the normalized score of the corresponding outfit. Outfit composition results with score labels are given in Figure 5, Figure 6 and Table 2.

Fashion trends are known for their quick and dramatic variation [1]. The proposed technique enables us to keep up with fashion trends over time. When a new trend outfit dataset joins, the old weights W of the composition graph are divided by √Nm, where Nm is the maximum weight of the graph. Then, the fashion composition graph is updated following Eq. (6) with α > 1. In the experiments, we classify our whole outfit dataset into four parts according to years: dataset1 (2007-2009), dataset2 (2010-2012), dataset3 (2013-2015) and dataset4 (2016-2017). We use our proposed iterative model with dataset1 (2007-2009) as the basic dataset to get the initial outfit composition graph. Then, by constantly adding new datasets into the basic dataset, we can get new outfit composition graphs, by which fashion trends are kept up with. Figure 9 gives the visually most popular outfits over the years. We can see that our iterative model can keep up with fashion trends tightly.

5. Conclusion

Fashion style tells a lot about one's personality. Interpretable embeddings of items are vital in design, marketing and clothing retrieval. In this paper, we propose a partitioned embedding network, which can extract interpretable embeddings from items. With attribute labels as the constraint, different parts of the embedding are restrained to relate to the corresponding attributes. With the multi-independent constraint, different parts of the embedding are restrained to relate only to the corresponding attributes. Then, using the extracted partitioned embeddings, a composition graph with an attribute matching map is built. When users specify their preferred attributes, such as color and shape, our model can recommend desirable outfits with interpretable attribute matching scores. Meanwhile, extensive experiments demonstrate that the interpretable and partitioned embedding is helpful for designers, businesses and consumers to better understand composition in outfits. In applications, people's skin color and stature have great influence on outfit composition. Thus, personalization will be taken into consideration in our future work. Further, straightforward composition relationships among items are another direction of our future work.

References

[1] K. Abe, T. Suzuki, S. Ueta, A. Nakamura, Y. Satoh, and H. Kataoka. Changing fashion cultures. arXiv preprint arXiv:1703.07920, 2017.
[2] L. Bossard, M. Dantone, C. Leistner, C. Wengert, T. Quack, and L. Van Gool. Apparel classification with style. In Asian Conference on Computer Vision, pages 321–335. Springer, 2012.
[3] H. Chen, A. Gallagher, and B. Girod. Describing clothing by semantic attributes. Computer Vision–ECCV 2012, pages 609–623, 2012.
[4] Q. Chen, J. Huang, R. Feris, L. M. Brown, J. Dong, and S. Yan. Deep domain adaptation for describing people based on fine-grained clothing attributes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5315–5324, 2015.
[5] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.
[6] W. Di, C. Wah, A. Bhardwaj, R. Piramuthu, and N. Sundaresan. Style finder: Fine-grained clothing style detection and retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 8–13, 2013.
[7] Z. Feng, W. Yuan, C. Fu, J. Lei, and M. Song. Finding intrinsic color themes in images with human visual perception. Neurocomputing, 2017.
[8] J. Fu, J. Wang, Z. Li, M. Xu, and H. Lu. Efficient clothing retrieval with semantic-preserving visual phrases. In Asian Conference on Computer Vision, pages 420–431. Springer, 2012.
[9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[10] M. Hadi Kiapour, X. Han, S. Lazebnik, A. C. Berg, and T. L. Berg. Where to buy it: Matching street clothing photos in online shops. In Proceedings of the IEEE International Conference on Computer Vision, pages 3343–3351, 2015.
[11] J. Huang, R. S. Feris, Q. Chen, and S. Yan. Cross-domain image retrieval with a dual attribute-aware ranking network. In Proceedings of the IEEE International Conference on Computer Vision, pages 1062–1070, 2015.
[12] T. Iwata, S. Wanatabe, and H. Sawada. Fashion coordinates recommender system using photographs from fashion magazines. In IJCAI, volume 22, page 2262, 2011.
[13] M. H. Kiapour, K. Yamaguchi, A. C. Berg, and T. L. Berg. Hipster wars: Discovering elements of fashion styles. In European Conference on Computer Vision, pages 472–488. Springer, 2014.
[14] D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[15] Y. Li, L. Cao, J. Zhu, and J. Luo. Mining fashion outfit composition using an end-to-end deep learning approach on set data. IEEE Transactions on Multimedia, 2017.
[16] Q. Liu, S. Wu, and L. Wang. Deepstyle: Learning user preferences for visual recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 841–844. ACM, 2017.
[17] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang. Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1096–1104, 2016.
[18] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[19] K. Matzen, K. Bala, and N. Snavely. Streetstyle: Exploring world-wide clothing styles from millions of photos. arXiv preprint arXiv:1706.01869, 2017.
[20] J. Morovic. To develop a universal gamut mapping algorithm. 1998.
[21] J. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863–879, 1992.
[22] E. Simo-Serra, S. Fidler, F. Moreno-Noguer, and R. Urtasun. Neuroaesthetics in fashion: Modeling the perception of fashionability. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 869–877, 2015.
[23] E. Simo-Serra and H. Ishikawa. Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 298–307, 2016.
[24] L. L. Thurstone. A law of comparative judgment. Psychological Review, 34(4):273, 1927.
[25] A. Veit, B. Kovacs, S. Bell, J. McAuley, K. Bala, and S. Belongie. Learning visual clothing style with heterogeneous dyadic co-occurrences. In Proceedings of the IEEE International Conference on Computer Vision, pages 4642–4650, 2015.
[26] S. Vittayakorn, K. Yamaguchi, A. C. Berg, and T. L. Berg. Runway to realway: Visual analysis of fashion. In Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pages 951–958. IEEE, 2015.
[27] X. Wang and T. Zhang. Clothes search in consumer photos via color matching and attribute learning. In Proceedings of the 19th ACM International Conference on Multimedia, pages 1353–1356. ACM, 2011.
[28] K. Yamaguchi, M. H. Kiapour, L. E. Ortiz, and T. L. Berg. Parsing clothing in fashion photographs. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3570–3577. IEEE, 2012.
[29] K. Yamaguchi, T. Okatani, K. Sudo, K. Murasaki, and Y. Taniguchi. Mix and match: Joint model for clothing and attribute recognition. In BMVC, pages 51–1, 2015.