CONNECTION SCIENCE
2021, VOL. 33, NO. 2, 341–358
https://doi.org/10.1080/09540091.2020.1822780

ClothGAN: generation of fashionable Dunhuang clothes using generative adversarial networks

Qiang Wu, Baixue Zhu, Binbin Yong, Yongqiang Wei, Xuetao Jiang, Rui Zhou and Qingguo Zhou

School of Information & Engineering, Lanzhou University, Lanzhou, People's Republic of China

CONTACT Qingguo Zhou zhouqg@lzu.edu.cn

ABSTRACT
Clothing is one of the symbols of human civilisation. Clothing design is an art form that combines practicality and artistry. The Dunhuang clothes culture has a long history which represents ancient Chinese aesthetics. Artificial intelligence (AI) technology has recently been applied to multiple areas and is also drawing increasing attention in fashion. However, little research has been done on the usage of AI for the creation of clothing, especially in traditional culture. Exploring the intersection of computer science and Dunhuang clothing design is challenging, as it is a cross-history interaction between AI and Chinese classical culture. In this paper, we propose ClothGAN, an innovative framework for "designing" new patterns and styles of clothes based on a generative adversarial network (GAN) and a style transfer algorithm. Besides, we built the Dunhuang clothes dataset and conducted experiments to generate new patterns and styles of clothes with Dunhuang elements. We evaluated the clothing works generated by different models by computing the inception score (IS), the human preference score (HPS) and a general score combining IS and HPS. The results show that our framework outperformed others in these designing works.

ARTICLE HISTORY
Received 28 June 2020
Accepted 8 September 2020

KEYWORDS
Dunhuang clothes; designing clothes; GAN; style transfer; Dunhuang element

1. Introduction
From ancient times to the present, the patterns and styles of clothing have reflected the human pursuit of beauty. Dress has been one of the embodiments of human civilisation. As time goes on, human clothing changes, but the pursuit of beauty remains constant. Dunhuang is internationally well known. The Mogao Caves at Dunhuang are a cluster of 492 caves containing 45,000 m² of frescoes and 2,415 stucco statues, a precious art heritage of the world. The frescoes and colourful sculptures at the Mogao Caves preserve rich and charming images and materials of medieval costumes spanning thousands of years (Jinshi, 2010), which could be considered a source of inspiration for many costume designers. Artificial intelligence (AI) (Soto-Morettini, 2017), a concept first proposed in 1955, is intelligence displayed by machines driven by data and algorithms, in contrast to the natural intelligence demonstrated by humans and animals. AI techniques have experienced a wave of ups and downs. In recent years, AI technology, especially deep learning (LeCun et al., 2015; Schmidhuber, 2015), has been widely used in computer vision (He et al., 2015), natural language processing (Devlin et al., 2018), reinforcement learning (Silver et al., 2016), healthcare (R. Zhou et al., 2019), smart cities (Wu et al., 2019), time-series forecasting (Yong et al., 2020), and intelligent systems (Zhang et al., 2017; Q. Zhou et al., 2017).
Deep learning has begun to carry out creative artistic design. In 2014, generative adversarial networks (GAN) (Goodfellow et al., 2014) were proposed, which make AI algorithms much more creative. GAN can perform image-to-image translation (Isola et al., 2017), text-to-image generation (Xu et al., 2018), and other creative applications. Furthermore, GAN networks have obtained excellent performance in image restoration (Lehtinen et al., 2018; Radford et al., 2015), super-resolution image processing (Redmon et al., 2016), etc. Capturing the internal characteristics of a convolutional neural network (CNN) (He et al., 2015; Redmon et al., 2016) is also a practical method for the artistic creation of pictures, beyond traditional image recognition and processing.
There is an ambitious relationship between computers and creativity. AARON is a robotic system, developed by the artist and programmer Harold Cohen (Zhang et al., 1995), which can pick up a paintbrush with its robotic arm and paint on canvas on its own. Simon Colton's Painting Fool (Colton et al., 2015) is much more autonomous than AARON. Although the software does not physically apply paint to canvas, it simulates many styles digitally, from collage to painting strokes. Whether a computer paints physically or generates artistic pictures digitally, creativity is inseparable from computing power and algorithms.
In 2015, a computer vision program named DeepDream (Mordvintsev et al., 2015) was created by Google; it uses a CNN structure to find and enhance patterns in images, creating deliberately over-processed images with a dream-like, hallucinogenic appearance. The style transfer algorithm (Gatys et al., 2016) was proposed in 2016; it integrates the content of one image with the style of another to generate new photos that have both content and style, based on the different features of the two images. Besides, the method can adjust the weights of the content and the style in the new image.
In recent years, with the rapid development of deep learning, some researchers have begun to study the art and fashion field based on the GAN framework (Arriagada, 2020; Cui et al., 2018; Elgammal et al., 2017; Kato et al., 2019). However, current art design research based on the GAN network has three shortcomings: (1) it rarely explores models other than the GAN model; (2) it is seldom related to the traditional aesthetics in human history; (3) the clothing works generated by recent methods are relatively simple and unsatisfactory in terms of aesthetics.
To overcome these shortcomings, we summarise our contributions in this paper as
follows:

• We innovatively combine AI algorithms with Dunhuang clothing culture to design clothing for protecting and inheriting Dunhuang clothing culture.
• We fuse the creative capability of the GAN framework with the feature-extraction capability of the CNN structure used in the style transfer algorithm.
• We propose the ClothGAN framework to automatically generate new patterns and styles of clothing with Dunhuang elements.
• We build the Dunhuang dataset, which includes 52,908 images of Dunhuang art.
• The results of the works generated by ClothGAN show that our method can produce more acceptable clothing works than other models.

The remainder of this paper is organised as follows. Section 2 introduces the related
works, including GAN theory, style transfer algorithm and art-related works. Section 3
defines the clothing generation problem and describes the datasets. Section 4 details
our ClothGAN framework. The experiments and results are shown in Section 5. Section 6
concludes the paper and discusses future work.

2. Related works
In this section, we introduce fashion and AI-related works, GAN theory, DeepDream, and the style transfer algorithm, which are related to machine clothing design.

2.1. Art and AI


While rapidly developing and being widely used in areas such as computer vision and natural language processing (NLP), AI has begun to enter the field of art and fashion through step-by-step trial and exploration.
In 2015, a computer vision program named DeepDream (Mordvintsev et al., 2015) was created by Google; it uses a CNN structure to find and enhance patterns in images, creating deliberately over-processed images with a dream-like, hallucinogenic appearance. The DeepDream program could ask the computer to use its "imagination". In general, art and fashion need vision. With the continuous changes and upgrades of the GAN network, the machine has creativity in addition to its imagination. Some researchers try to use this creativity of computers to assist in clothing-related work.
Virtual clothing display can improve the design efficiency of clothing designers, which plays an essential role in clothing design. FashionGAN (Cui et al., 2018) was proposed to take a desired clothing sketch and a specified fabric image as input; it can then automatically display a virtual clothing image consistent with the input clothing sketch and fabric image in shape and texture. In addition to assisting the designer with a quick display of clothing, the images generated by GAN can be used for fashion design or even to directly design clothing. A fashion designer can be inspired by these images generated from the GAN structure. CAN: Creative Adversarial Networks (Elgammal et al., 2017) was developed to create novel figures simulating a given distribution. GANs-based clothes design (Kato et al., 2019) was developed as a method of generating clothing design images.
Many researchers have begun to use the “imagination” and “creativity” of the computer
based on deep learning in the art and fashion field.

2.2. Generative adversarial networks


Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) consist of two parts: a generative model and a discriminative model. The generative model can be understood as a "forger", trying to deceive the discriminative model by constructing fake data; the discriminative model can be understood as a discriminator, trying its best to discriminate whether the data come from real samples or from simulated data constructed by the generative model. Both models improve their ability through continuous adversarial learning. The generator needs to learn the distribution of the data, and the discriminator needs to indicate whether the data come from real data or not (as shown in Figure 1).

Figure 1. The structure of GAN: the goal of the generator is to make the generated data closer to the real data, and the discriminator is trained to distinguish whether data are real or not.
The generative adversarial problem is defined as follows: the probability density function of the known target high-dimensional data is $P_{data}(x)$, and we let the probability density function of the generator be $P_G(x;\theta)$, where $\theta$ denotes the parameters of the probability density function. We need to find a $\theta$ such that $P_G(x;\theta)$ gets as close as possible to $P_{data}(x)$. We draw $m$ samples $x^1, x^2, x^3, \ldots, x^m$ from $P_{data}(x)$. Finding $\theta$ is then the process of maximising the likelihood function $L=\prod_{i=1}^{m} P_G(x^i;\theta)$. The specific derivation is as follows:

$$
\begin{aligned}
\theta^{*} &= \arg\max_{\theta}\prod_{i=1}^{m} P_G(x^i;\theta) = \arg\max_{\theta}\sum_{i=1}^{m}\log P_G(x^i;\theta) \qquad x^1, x^2, \ldots, x^m \sim P_{data}(x)\\
&\approx \arg\max_{\theta}\,\mathbb{E}_{x\sim P_{data}}\left[\log P_G(x;\theta)\right]\\
&= \arg\max_{\theta}\int_{X} P_{data}(x)\log P_G(x;\theta)\,dx-\int_{X} P_{data}(x)\log P_{data}(x)\,dx
\end{aligned}
\tag{1}
$$

In fact, this is equivalent to the problem of minimising the KL divergence between $P_{data}(x)$ and $P_G(x;\theta)$.
The entire optimisation objective function is given in Equation (2):

$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}\left[\log D(x)\right]+\mathbb{E}_{z\sim p_z(z)}\left[\log\left(1-D(G(z))\right)\right] \tag{2}$$

where $G$ is the generator, $D$ is the discriminator, and $V(D,G)$ is the value function. We can train $G$ by minimising $\log(1-D(G(z)))$. That is to say, the discriminator $D$ and the generator $G$ play a minimax game on the value function $V(D,G)$.
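To make Equation (2) concrete, the following is a minimal, hypothetical TensorFlow 2 sketch of one adversarial training step. The tiny dense architectures, optimisers and noise dimension are illustrative assumptions only, not the configuration used later in this paper; the generator loss uses the common non-saturating form.

```python
import tensorflow as tf

# Hypothetical toy networks; the real architectures are described in Section 5.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
    tf.keras.layers.Reshape((28, 28, 1)),
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),  # logit for real/fake
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images, noise_dim=100):
    z = tf.random.normal([tf.shape(real_images)[0], noise_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(z, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # D maximises log D(x) + log(1 - D(G(z))); G is trained with the
        # non-saturating surrogate: maximise log D(G(z)).
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```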
At first, the generator and the discriminator of GAN were composed of deep neural networks (DNN). DCGAN (Radford et al., 2015) combines CNN with the adversarial framework, which allows relatively more stable training, especially in computer vision.

2.3. DeepDream
DeepDream (Mordvintsev et al., 2015) is an experiment that visualises what a neural network has learned. Similar to children watching clouds and trying to explain random shapes, DeepDream over-interprets and enhances the patterns it sees in an image.
How the CNN framework works is difficult to explain in theory. Zeiler (2014) proposed using the gradient ascent method to visualise the characteristics of each layer of the network. A noise image is fed into the network; when the reverse update is performed, the network weights are fixed, but the pixel values of the initial image are updated. This way of "training the image" visualises the network, and it is the foundation of DeepDream and of the later style transfer algorithm.
When DeepDream analyses these neurons by feeding in many pictures, the features that the neurons detect can look unrealistic, because many features are difficult for human eyes to recognise. The algorithm forces the neural network to produce something that does not exist in the picture, which resembles dreams and hallucinations.
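As a hedged illustration of this gradient-ascent idea (network weights frozen, input pixels updated), here is a small TensorFlow 2 sketch. The choice of InceptionV3, of the "mixed3" layer and of the step size are assumptions made only for illustration, not details taken from DeepDream itself.

```python
import tensorflow as tf

# Pre-trained backbone with frozen weights; only the input image is updated.
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
# Hypothetical choice of an intermediate layer whose activations we amplify.
dream_model = tf.keras.Model(inputs=base.input,
                             outputs=base.get_layer("mixed3").output)

def dream_step(image, step_size=0.01):
    with tf.GradientTape() as tape:
        tape.watch(image)
        activations = dream_model(image)
        loss = tf.reduce_mean(activations)   # "enhance whatever this layer sees"
    grads = tape.gradient(loss, image)
    grads /= tf.math.reduce_std(grads) + 1e-8
    # Gradient ascent on the pixels, keeping values in the network's input range.
    return tf.clip_by_value(image + step_size * grads, -1.0, 1.0)

# Usage: start from an image (or noise) scaled to [-1, 1] and iterate.
image = tf.random.uniform((1, 224, 224, 3), minval=-1.0, maxval=1.0)
for _ in range(50):
    image = dream_step(image)
```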

2.4. Style transfer algorithm


An image has both content and style, and the styles and contents are not the same in different photos. The style transfer algorithm (Gatys et al., 2016) takes the style of an image a and the content of an image p to produce a new image x that has both the style of a and the content of p (as shown in Figure 6). When training CNN classifiers (Simonyan & Zisserman, 2014), the feature maps near the input layer contain more detailed information such as the texture of the image, while the feature maps near the output layer contain more content information. When applying the style of the picture a (style) on top of the content of the picture p (content), we use a gradient descent algorithm to update the target image x (target). Thus, the shallow convolution layers have response values similar to those of a, and the deeper convolution layers have responses similar to those of p. This ensures that the generated image x has a style similar to a and content similar to p.

3. Problems and datasets


3.1. Problem definition
The goal of the clothing generation ("design") problem is to generate new patterns and styles of clothing through algorithms and datasets, which can be defined as:

NewCloth = G(F) (3)

where G is the generation function to “design” new clothing; F is the clothing dataset.
Our Dunhuang clothing generation problem inherits and extends the clothing “design”
problem, which is driven by a clothing dataset, and adds Dunhuang style to these new
patterns from the Dunhuang dataset.

NewCloth = ST(G(F), D) (4)

where G is the generation function to create new clothing; F is a clothing dataset; D is the
Dunhuang dataset; ST is the style transfer function.

3.2. Dataset
Our study mainly depends on two datasets. One is the Dunhuang dataset that is collected
and built by us and the other is an open-source dataset Fashion Mnist (Xiao et al., 2017).

3.2.1. Dunhuang dataset


We took about 10 months to collect and collate the 52,908 images of the Dunhuang dataset (as shown in Figure 2). These images are not limited to clothes; the textures and styles in other types of Dunhuang images are also worthy of reference for clothing design.
Here are the steps of building Dunhuang Dataset:

• We collected 3,500 open-source images from the Internet and manually labelled them with six broad categories: buddha statues, dances, figures, animals, architecture, and decorative patterns.
• We built a web and smartphone application to collect Dunhuang-related images and labels as a crowdsourcing platform, which gathers pictures and tags from interested people. They could upload the open-source Dunhuang images they discovered and label them with the six broad categories we defined. We collected 49,408 Dunhuang images from the application.
• We used the ResNet50 (He et al., 2015) model as the backbone to perform intelligent classification (6 classes) in the TensorFlow (Abadi et al., 2016) framework, because 10,040 pictures were without labels.
• We unified the resolution of the pictures in the Dunhuang dataset to 64 × 64 (a minimal preprocessing sketch follows this list).
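The following is a minimal sketch of the preprocessing described above, assuming TensorFlow is used both for the resizing step and for the ResNet50-based labelling of the 10,040 unlabelled pictures; the file-loading helper, the classifier head and the (commented) training call are hypothetical and shown only to illustrate the pipeline.

```python
import tensorflow as tf

IMG_SIZE = 64       # unified resolution of the Dunhuang dataset
NUM_CLASSES = 6     # buddha statues, dances, figures, animals, architecture, decorative patterns

def load_and_resize(path):
    # Decode an image file and resize it to 64 x 64, scaled to [0, 1].
    image = tf.io.decode_image(tf.io.read_file(path), channels=3, expand_animations=False)
    return tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0

# Hypothetical 6-class classifier with a ResNet50 backbone for the unlabelled pictures.
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                          input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")
classifier = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(labelled_images, labels, epochs=...)                # train on the labelled subset
# predicted = classifier.predict(unlabelled_images).argmax(axis=-1)  # label the remaining images
```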

3.2.2. Fashion Mnist


Fashion Mnist is an open-source costume image dataset (Xiao et al., 2017). The dataset consists of a training set of 60,000 images and a test set of 10,000 images. As shown in Figure 3, each image is a 28 × 28 greyscale image associated with a label from 10 categories (T-shirt, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot).
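For reference, Fashion Mnist can be loaded directly through the Keras dataset helper; the scaling to [0, 1] below is our own illustrative choice rather than a step prescribed by the dataset.

```python
import tensorflow as tf

# 60,000 training and 10,000 test greyscale images of size 28 x 28, labels in 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.astype("float32") / 255.0   # scale pixel values to [0, 1]
x_test = x_test.astype("float32") / 255.0
print(x_train.shape, y_train.shape)           # (60000, 28, 28) (60000,)
```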
There is a large amount of complex background and missing content in the Dunhuang data, which could make the clothing generation process relatively tricky. As a result, we used the Fashion Mnist dataset to assist the clothing generation process in order to make the generated clothing content more reasonable and stable. Besides, Fashion Mnist has 2D images with 28 × 28 resolution, which is more suitable for fast computing.
We intend to use the Fashion Mnist dataset as the data source of the generator of our
method, which could generate new clothing patterns. Meanwhile, we plan to apply the
Dunhuang dataset for adding the Dunhuang elements into the generated clothing.

Figure 2. The Dunhuang dataset contains 52,908 images of Dunhuang and six main categories.

4. ClothGAN framework
We name our framework ClothGAN for its clothing creativity from the GAN and its integration of content and style from the style transfer algorithm. The ClothGAN framework is developed to leverage the "creativity" (generation) capability of GAN together with the ability of a CNN (Simonyan & Zisserman, 2014) to extract image content and texture. The framework
explores the design and generation of new clothing patterns and adds the elements of the
Dunhuang clothing style. As shown in Figure 4, our framework is composed of two major
modules: a GAN module and a style transfer module.

4.1. GAN module


This module is a crucial part of the ClothGAN framework. As in the traditional GAN method, the GAN module has two counterparts: a generator and a discriminator. Inspired by DCGAN (Radford et al., 2015), we use CNNs as the main structure of both the generator and the discriminator. As shown in Figure 5, the generator is composed of multi-layer CNNs that we designed to generate new clothing patterns. The discriminator uses multi-layer CNNs and a fully connected neural network to classify whether the input image is from real data or not.

Figure 3. The open-source Fashion Mnist dataset.

Figure 4. The ClothGAN framework is composed of two major modules: a GAN module and a style transfer module. The GAN module is trained to generate new clothing patterns from Fashion Mnist images and random noise, and the style transfer neural network is applied to add Dunhuang elements to the generated works.

Figure 5. The GAN process: input noise and a constraint word to the generator, and use the output image as input to the discriminator. By iterating this process, the module can generate new patterns of clothing images.

• Generator
Inspired by Conditional GAN (Mirza & Osindero, 2014), we feed the generator not only random noise but also a constraint word describing what the generator should produce, such as a coat, pants, or a skirt. The generator learns G : {z, c} → ŷ, where z is random noise, c is a constraint word, and ŷ is an image produced by the generator.
• Discriminator
We respectively input the images of the Fashion Mnist dataset and the pictures produced by the generator into the discriminator. The discriminator is trained to classify whether the input image is real or not: D : {y, ŷ} → real/fake, where y is real data from the Fashion Mnist dataset and ŷ is a generated image from the generator.
If the content of the generated image is not from the Fashion Mnist dataset, or the class of the generated image is not the same as the constraint word, the discriminator should recognise it as a fake image.
Here are the steps of how the discriminator works:
(1) Train a small Fashion Mnist recognition network using three CNN layers and one fully connected (FC) layer.
(2) Determine whether the category of the image from the generator matches the constraint word. If they do not match, directly return False to the generator; if they match, pass the image to the discriminator for further identification.

The GAN module training procedure is similar to a two-player game with the following objective function:

$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x\sim p_{data}}\left[\log D(x)\right]+\mathbb{E}_{z\sim p_z}\left[\log\left(1-D(G(z,c))\right)\right] \tag{5}$$

where $z$ is random noise with distribution $p_z$ and $c$ is the constraint word. $D$ is trained to improve its capability of classifying images as real or fake. $G$ minimises this objective function, which is equivalent to maximising $\log D(G(z,c))$.

Figure 6. Images a and p pass through different CNN networks, and style and content are extracted from different layers of the CNN. The style of a is fused with the content of p.

The GAN module has three main functions: (1) it can generate new clothing patterns from the generator; (2) the constraint word can specify the type of generation; (3) after the adversarial training between G and D, the discriminator can classify whether data are from the Fashion Mnist dataset or not.
We focus on the first two functions, i.e. the generation capability guided by our wishes, which can "design" new patterns of clothing works.
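To make the generator and discriminator interfaces above concrete, the following is a small, hypothetical sketch of how a constraint word (encoded as a class index) can be injected into a DCGAN-style generator and discriminator through an embedding layer. The layer sizes and the embedding approach are illustrative assumptions and do not reproduce the exact architecture reported in Section 5.

```python
import tensorflow as tf

NOISE_DIM, NUM_CLASSES, EMB_DIM = 100, 10, 16

def build_generator():
    # Inputs: random noise z and the constraint word c (encoded as a class index).
    z = tf.keras.Input(shape=(NOISE_DIM,))
    c = tf.keras.Input(shape=(), dtype=tf.int32)
    c_emb = tf.keras.layers.Embedding(NUM_CLASSES, EMB_DIM)(c)
    h = tf.keras.layers.Concatenate()([z, c_emb])
    h = tf.keras.layers.Dense(7 * 7 * 64, activation="relu")(h)
    h = tf.keras.layers.Reshape((7, 7, 64))(h)
    h = tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(h)
    out = tf.keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid")(h)
    return tf.keras.Model([z, c], out)

def build_discriminator():
    # Inputs: a 28 x 28 image and the constraint word; output: a real/fake logit.
    x = tf.keras.Input(shape=(28, 28, 1))
    c = tf.keras.Input(shape=(), dtype=tf.int32)
    c_emb = tf.keras.layers.Embedding(NUM_CLASSES, 28 * 28)(c)
    c_map = tf.keras.layers.Reshape((28, 28, 1))(c_emb)
    h = tf.keras.layers.Concatenate()([x, c_map])   # stack the label map as an extra channel
    h = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(h)
    h = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(h)
    h = tf.keras.layers.Flatten()(h)
    logit = tf.keras.layers.Dense(1)(h)
    return tf.keras.Model([x, c], logit)

generator, discriminator = build_generator(), build_discriminator()
```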

4.2. Style transfer module


The style transfer module directly uses a generated image from the generator of the GAN module as a content image and a Dunhuang image from the Dunhuang dataset as a style image. The module exploits the characteristics of different layers of a CNN: the first few layers extract style and the last few layers extract content. We fuse the content and style of the two images by minimising a content loss and a style loss using stochastic gradient descent (SGD) (Robbins & Monro, 1951).
We apply an image a for Dunhuang style extraction and an image p, generated from the GAN module, for clothing content extraction. We intend to generate x = Fusion(a, p) (as shown in Figure 6).
The training procedure of the style transfer module uses the following objective functions (Equations (6)–(8)):

• Content loss
$$L_{content}(p, x, l)=\frac{1}{2}\sum_{i,j}\left(F_{ij}^{l}-P_{ij}^{l}\right)^{2} \tag{6}$$

• Style loss
$$E_{l}=\frac{1}{4N_{l}^{2}M_{l}^{2}}\sum_{i,j}\left(G_{ij}^{l}-A_{ij}^{l}\right)^{2} \tag{7}$$

• Total loss
$$L_{total}(p, a, x)=\alpha L_{content}(p, x)+\beta L_{style}(a, x) \tag{8}$$

where $F_{ij}^{l}$ is the activation at position $j$ of the $i$th feature map in CNN layer $l$; $E_{l}$ is the mean squared difference between the Gram matrix (Kokkendorff, 2004) of the layer-$l$ feature maps of the style image and the Gram matrix of $F^{l}$; $\alpha$ and $\beta$ are hyperparameters that adjust the ratio of image content and style.

Figure 7. We applied some contents of clothing and styles of Dunhuang images and generated new clothing with a 3D presentation.

This module has two main functions: (1) it can extract the content part and the style part from images, and (2) it can mix the content and style parts to generate a new image by training.
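The following is a minimal, hypothetical TensorFlow sketch of the content, style and total losses above, computed on VGG19 feature maps. The particular layer names, the Gram-matrix normalisation and the default weights are illustrative assumptions rather than the exact configuration of our experiments.

```python
import tensorflow as tf

vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
# Hypothetical layer choices: shallow layers for style, a deeper layer for content.
style_layers = ["block1_conv1", "block2_conv1", "block3_conv1"]
content_layer = "block4_conv2"
extractor = tf.keras.Model(vgg.input,
                           [vgg.get_layer(n).output for n in style_layers + [content_layer]])

def gram_matrix(feats):
    # Gram matrix of a feature map: correlations between channels (cf. Equation (7)).
    b, h, w, c = tf.unstack(tf.shape(feats))
    flat = tf.reshape(feats, [b, h * w, c])
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def total_loss(x, content_target, style_gram_targets, alpha=0.717, beta=0.283):
    # x is the image being optimised, scaled to [0, 1]; alpha/beta default to the
    # hand-tuned values reported in Section 5.2.
    feats = extractor(tf.keras.applications.vgg19.preprocess_input(x * 255.0))
    style_feats, content_feat = feats[:-1], feats[-1]
    # Content loss (Equation (6)): squared difference of deep feature maps.
    l_content = 0.5 * tf.reduce_sum(tf.square(content_feat - content_target))
    # Style loss (Equation (7)): squared difference of Gram matrices, averaged over layers.
    l_style = tf.add_n([tf.reduce_mean(tf.square(gram_matrix(f) - g))
                        for f, g in zip(style_feats, style_gram_targets)]) / len(style_layers)
    # Total loss (Equation (8)).
    return alpha * l_content + beta * l_style
```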

4.3. Algorithm
ClothGAN is a two-stage algorithm including a GAN stage and a style transfer stage (Algorithm 1).

5. Results
We trained our ClothGAN framework on the public Fashion Mnist dataset (Xiao et al., 2017) and the Dunhuang dataset that we built. The works shown in Figure 7 are "designed" by our framework.

5.1. Settings
We set up our experimental environment as follows:

• Dataset: the Fashion Mnist dataset and the Dunhuang dataset.
• Computing framework: TensorFlow 2.0.
• Computing hardware: a server with a 2.5 GHz CPU (16 cores) and two NVIDIA GeForce GTX 1080Ti GPU devices.

Algorithm 1 ClothGAN
1: Stage 1: the GAN stage:
2: Initialise the hyperparameters of the Generator (G) and the Discriminator (D)
3: Input the random noise z and the constraint word c into G
4: repeat
5:   repeat
6:     Fix G
7:     Update D parameters
8:   until D's accuracy of classification between real images and generated pictures is greater than 90%
9:   repeat
10:    Fix D
11:    Update G parameters
12:  until D's accuracy of classification between real pictures and generated pictures is lower than 50%
13: until Episode end
14: Stage 2: the style transfer stage:
15: Make G generate N clothing images
16: Choose one image from the N images, marked as p
17: Choose one Dunhuang image, marked as a
18: Put p into the VGG19 neural network to extract content
19: Put a into the VGG19 neural network to extract style
20: repeat
21:   Ltotal(p, a, x) = αLcontent(p, x) + βLstyle(a, x)
22:   Update parameters by SGD
23: until Episode end
24: x is the final design output

5.2. Experiments
We conducted experiments with a two-stage process, as detailed in Algorithm 1. The first-stage process was to train the GAN module for generating new clothing patterns based on the Fashion Mnist dataset, and the second-stage process was to apply the style transfer algorithm for adding the Dunhuang elements using the Dunhuang dataset.

• The first stage
In the first stage, we used 60,000 images with resolution 28 × 28 from the Fashion Mnist dataset. The discriminator is composed of eight convolutional layers with sixteen 3 × 3 filters, connected to two fully connected (FC) layers, to classify the input image as real or fake. Meanwhile, we set eight CNN layers with sixteen 3 × 3 filters in the generator. We conducted 1000 epochs to train the discriminator and the generator alternately. We leveraged the creative ability of the generator and collected the images produced by the generator after training, marked as the set p.
• The second stage
In the second stage, we use two VGG19 (Simonyan & Zisserman, 2014) networks as specialised neural networks to extract the features of style and content. a is a style image from the Dunhuang dataset, and p is a content image generated from the GAN module of ClothGAN. We constrain the noisy image x through the loss function (Equation (8)) so that it gradually approaches the content image p while approaching the style image a. We conducted the SGD algorithm to iteratively update the image x for 300 epochs. The hyperparameters α and β of the loss function (Equation (8)) are tuned by hand.
• The hyperparameters
In the first stage:
Generator: CNN layers: 8; number of filters: 16; FC layers: 3; learning rate: 0.01.
Discriminator: CNN layers: 8; number of filters: 16; FC layers: 2; learning rate: 0.01.
In the second stage:
α = 0.717, β = 0.283

5.3. Baseline models


We chose three baseline models, developed for clothing and art design, and compared the works generated by these models with those of our ClothGAN framework:

• DCGAN (Radford et al., 2015) combines CNN networks with adversarial networks, which allows relatively more stable training in computer vision.
• Fashion-GAN (Cui et al., 2018) is an image-to-image method: given a fashion sketch and a fabric image as input, the corresponding garment can be displayed quickly and automatically.
• GANs-based Clothes Design (Kato et al., 2019) is a method for generating clothing images for pattern makers.

5.4. Evaluation method


Measuring the quality of works generated by different models is a relatively difficult problem, especially in the field of aesthetics, since aesthetics has many aspects, such as composition and colour, and there has not been a particularly useful measurement method. Simple numerical evaluation criteria are not a complete representation of the quality and artistic value of the final generated image. Thus, we applied the inception score (IS) and the human preference score (HPS), respectively, and analysed these scores to evaluate the generated images.
Here are the reasons we chose IS and HPS to evaluate these art works.

• IS estimates the image score relatively objectively through technical means but lacks
subjective factors.
• HPS subjectively evaluates the score of the image but lacks objective support.

Table 1. The average IS score of every model.

Model                Fashion Mnist    Dunhuang    Fashion Mnist and Dunhuang
DCGAN                3.51             2.12        2.82
Fashion-GAN          3.78             2.14        2.71
GANs-based Design    4.14             2.25        3.11
ClothGAN             3.75             2.63        4.13

5.4.1. Inception score (IS)


The Inception model (Barratt & Sharma, 2018), which has been trained on the ImageNet (Deng et al., 2009) dataset, has a particular ability to recognise objects. We feed the images generated by the creative models into the Inception model, which finally produces a softmax vector. We take both clarity and diversity into consideration.

• Clarity: input the generated image x into Inception V3 (Barratt & Sharma, 2018) and obtain a 1000-dimensional vector corresponding to the probability that the image belongs to each category; the maximum probability should be as large as possible, while the entropy of the p(y | x) distribution should be as small as possible.
• Diversity: if a model can generate sufficiently diverse images, then the distribution of the generated images over the categories should be uniform, i.e. the entropy of p(y) should be as large as possible.

In general, an ideal generated image minimises −p(y | x) log p(y | x) while maximising −p(y) log p(y).

$$IS(G)=\exp\left(\mathbb{E}_{x\sim p_g}\, D_{KL}\big(p(y\,|\,x)\,\|\,p(y)\big)\right) \tag{9}$$
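A minimal, hypothetical sketch of how Equation (9) can be estimated from a batch of generated images with a pre-trained InceptionV3 classifier; the resizing and the small epsilon for numerical stability are illustrative choices.

```python
import numpy as np
import tensorflow as tf

inception = tf.keras.applications.InceptionV3(weights="imagenet")  # 1000-way softmax output

def inception_score(images):
    # images: float tensor in [0, 1], shape (N, H, W, 3); resized to InceptionV3's input size.
    x = tf.image.resize(images, (299, 299)) * 255.0
    p_yx = inception.predict(tf.keras.applications.inception_v3.preprocess_input(x))
    p_y = np.mean(p_yx, axis=0, keepdims=True)            # marginal class distribution p(y)
    kl = np.sum(p_yx * (np.log(p_yx + 1e-16) - np.log(p_y + 1e-16)), axis=1)
    return float(np.exp(np.mean(kl)))                     # IS(G) = exp(E_x KL(p(y|x) || p(y)))
```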

5.4.2. Human preference score


We performed a human preference survey on the website of our laboratory to compare works designed by the different methods.

5.5. Results
We applied the Fashion Mnist dataset and the Dunhuang dataset, respectively, to the baseline generative models and compared these "design" works with those of our ClothGAN model.
Here are the results of the experiments:

• IS score
We sampled 50 images from each model, produced from different datasets: the Fashion Mnist dataset, the Dunhuang dataset, and both combined (using the Fashion Mnist data first, then the Dunhuang data). As shown in Table 1, from the average IS score of every model, we can see that although the inception score (IS) of our model ranked second place, it differs by only 0.2% from that of the GANs-based Design model.
• HPS score
Twenty volunteers participated in our survey online. We randomly chose 15 images from every generated model (60 photos in total). Each volunteer saw all 60 images (without knowing which model each photo came from) and scored the images on a scale of 0–5. We collected the scores and computed the average score of each model. Table 2 lists the HPS score of every volunteer for the different models; our ClothGAN obviously outperformed the other models. In other words, volunteers preferred the clothing works generated by the ClothGAN framework.

Table 2. The HPS score of every volunteer according to different models.

ID              DCGAN   Fashion-GAN   GANs-based Design   ClothGAN
1               2.75    2.46          2.01                3.94
2               3.08    3.07          2.37                4.63
3               2.83    2.61          2.10                4.11
4               3.30    4.29          2.60                4.88
5               3.10    3.10          2.39                4.67
6               2.70    2.35          1.95                3.83
7               4.10    3.09          3.46                4.66
8               4.01    2.94          3.37                4.48
9               2.90    2.74          2.17                4.25
10              3.50    3.21          2.80                4.79
11              3.02    2.96          2.30                4.50
12              3.07    3.19          2.39                4.77
13              3.43    3.68          2.73                5.31
14              3.21    3.15          2.46                4.72
15              3.20    2.77          2.34                4.29
16              3.29    2.85          2.61                4.38
17              2.92    1.99          1.96                3.42
18              3.10    3.25          2.43                4.83
19              2.80    2.45          2.03                3.93
20              3.10    3.11          2.39                4.67
Average Score   3.17    2.91          2.44                4.45

• IS and HPS
A single evaluation using IS or HPS alone is not very comprehensive. We normalise the scores of IS and HPS and add them together to comprehensively reflect the performance of each model (a minimal sketch of this normalisation is shown after Table 3). As shown in Table 3, the ClothGAN framework obtains an obvious advantage among the models.

Table 3. The general score of all models.

Score           DCGAN   Fashion-GAN   GANs-based Design   ClothGAN
IS              0.24    0.31          0.16                0.29
HPS             0.26    0.18          0.20                0.36
General score   0.50    0.49          0.36                0.65
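The paper does not spell out the exact normalisation used for the general score; the sketch below shows one simple possibility, dividing each metric by its sum across models and adding the two normalised values. This is only an illustrative assumption and is not guaranteed to reproduce the exact numbers in Table 3.

```python
def general_scores(is_scores, hps_scores):
    # Assumed normalisation: divide each model's score by the sum over all models,
    # then add the two normalised values to obtain a combined score.
    is_norm = {m: v / sum(is_scores.values()) for m, v in is_scores.items()}
    hps_norm = {m: v / sum(hps_scores.values()) for m, v in hps_scores.items()}
    return {m: round(is_norm[m] + hps_norm[m], 2) for m in is_scores}

# Example call with the combined-dataset IS column of Table 1 and the average HPS of Table 2.
is_scores = {"DCGAN": 2.82, "Fashion-GAN": 2.71, "GANs-based Design": 3.11, "ClothGAN": 4.13}
hps_scores = {"DCGAN": 3.17, "Fashion-GAN": 2.91, "GANs-based Design": 2.44, "ClothGAN": 4.45}
print(general_scores(is_scores, hps_scores))
```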

Figure 8 shows the works designed by our model in a 3D presentation.



Figure 8. The 3D overall effect and partially enlarged view of the clothing designed by our model.

6. Conclusion
Clothing design is an art form that combines practicality and artistry, which is challeng-
ing for researchers. In this paper, we proposed the Clothing-STGAN model for “creatively”
designing clothing, in addition, considering the Chinese traditional artistry: Dunhuang
elements. Our framework leverages a GAN framework and a style transfer paradigm, gen-
erating clothing blending of old and new beauty. The Clothing-STGAN outperformed other
models by evaluating the works generated from experiments and online surveys. However,
the images generated from our framework are not high-resolution and the designed gar-
ment styles are not rich. In the future, we will try to find a better method to handle the
complex background and missing content of the Dunhuang dataset, directly extracting the
clothing information from the dataset, and we will generate more high-resolution works of
clothing in the future. Meanwhile, we will stick to collect and label Dunhuang images to the
Dunhuang dataset, which will be open-source for researchers.
AI technology is drastically changing the nature of creative processes. New technologies are playing very significant roles in creative activities such as art, music, architecture, and clothing design. We believe that we must aim at more meaningful relations between computers and creativity by incorporating traditional culture.

Acknowledgments
This work was supported by Ministry of Education – China Mobile Research Foundation under Grant
No. MCM20170206, The Fundamental Research Funds for the Central Universities under Grant No.
lzujbky-2019-kb51 and lzujbky-2018-k12, National Natural Science Foundation of China under Grant
No. 61402210, Major National Project of High Resolution Earth Observation System under Grant No.
30-Y20A34-9010-15/17, State Grid Corporation of China Science and Technology Project under Grant
No. SGGSKY00WYJS2000062, Program for New Century Excellent Talents in University under Grant
No. NCET-12-0250, Strategic Priority Research Program of the Chinese Academy of Sciences with
Grant No. XDA03030100, Google Research Awards and Google Faculty Award. We also gratefully
acknowledge the support of NVIDIA Corporation with the donation of the Jetson TX1 used for this
research.

Disclosure statement
No potential conflict of interest was reported by the authors.

ORCID
Qiang Wu http://orcid.org/0000-0003-0655-0479

References
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard,
M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasude-
van, V., Warden, P., . . . Zheng, X. (2016, November 2–4). Tensorflow: A system for large-scale
machine learning. OSDI’16:Proceedings of the 12th USENIX conference on Operating Systems Design
and Implementation, Savannah, GA, USA (pp. 770–778).
Arriagada, L. (2020). CG-Art: Demystifying the anthropocentric bias of artistic creativity. Connection
Science. https://doi.org/10.1080/09540091.2020.1741514
Barratt, S., & Sharma, R. (2018). A note on the inception score. arXiv:1801.01973
Colton, S., Halskov, J., Ventura, D., Gouldstone, I., Cook, M., & Ferrer, B. P. (2015). The painting fool sees!
New projects with the automated painter. International Conference on Computational Creativity,
Utah, USA (pp. 189–196). ICCC.
Cui, Y. R., Liu, Q., Gao, C. Y., & Su, Z. (2018). FashionGAN: Display your fashion design
using conditional generative adversarial nets. Computer Graphics Forum, 37(7), 109–119.
https://doi.org/10.1111/cgf.2018.37.issue-7
Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image
database. In 2009 IEEE conference on computer vision and pattern recognition (CVPR), Miami, FL, USA
(pp. 248–255). IEEE.
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers
for language understanding. arxiv:1810.04805
Elgammal, A. M., Liu, B., Elhoseiny, M., & Mazzone, M. (2017). CAN: Creative adversarial networks, gen-
erating “art” by learning about styles and deviating from style norms. In International conference on
computer and communications (ICCC), Atlanta, GA, USA. ICCC.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks.
In Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA
(pp. 2414–2423). IEEE. https://doi.org/10.1109/CVPR.2016.265
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Ben-
gio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems
(pp. 2672–2680). Curran Associates, Inc.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. Preprint,
arXiv:1512.03385.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings
of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA (pp. 436–444).
IEEE.
Jinshi, F. (2010). The caves of Dunhuang. The Dunhuang Academy.
Kato, N., Osone, H., Oomori, K., Ooi, C. W., & Ochiai, Y. (2019). FashionGAN: GANs-based clothes design: Pattern maker is all you need to design clothing. In AH2019: Proceedings of the 10th augmented human international conference (pp. 1–7). Association for Computing Machinery.
Kokkendorff, S. L. (2004). Gram matrix analysis of finite distance spaces in constant curvature. Discrete
and Computational Geometry, 31, 515–543. https://doi.org/10.1007/s00454-004-0806-2
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/
10.1038/nature14539
Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M., & Aila, T. (2018). Noise2Noise:
Learning image restoration without clean data. In International conference on machine learning
(ICML). PMLR.

Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. CoRR. abs/1411.1784
Mordvintsev, A., Olah, C., & Tyka, M. (2015). DeepDream – a code example for visualizing neural networks.
Google Research. https://ai.googleblog.com/2015/07/deepdream-code-example-for-visualizing.
html
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolu-
tional generative adversarial networks. Computer science.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016, June 27–30). You only look once: Unified, real-
time object detection. In 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las
Vegas, NV, USA (pp. 779–788). IEEE.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics,
22(3), 400–407. https://doi.org/10.1214/aoms/1177729586
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
https://doi.org/10.1016/j.neunet.2014.09.003
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J.,
Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbren-
ner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016).
Mastering the game of go with deep neural networks and tree search. Nature, 529, 484–489.
https://doi.org/10.1038/nature16961
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recog-
nition. Computer science.
Soto-Morettini, D. (2017). Reverse engineering the human: Artificial intelligence and acting theory.
Connection Science, 29(1), 64–76. https://doi.org/10.1080/09540091.2016.1271398
Wu, Q., Shen, J., Yong, B., Wu, J., Li, F., Wang, J., & Zhou, Q. (2019). Smart fog based
workflow for traffic control networks. Future Generation Computer Systems, 97, 825–835.
https://doi.org/10.1016/j.future.2019.02.058
Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine
learning algorithms. arXiv:cs.LG/1708.07747
Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., & He, X. (2018). AttnGAN: Fine-grained text to
image generation with attentional generative adversarial networks. In 2018 IEEE/CVF conference on
computer vision and pattern recognition (CVPR) (pp. 1316–1324). https://doi.org/10.1109/CVPR.2018.
00143
Yong, B., Xu, Z., Shen, J., Chen, H., Wu, J., Li, F., & Zhou, Q. (2020). A novel Monte Carlo-based neural
network model for electricity load forecasting. International Journal of Embedded Systems, 12(4),
522–533. https://doi.org/10.1504/IJES.2020.107631
Zeiler, M. D. (2014). Visualizing and understanding convolutional networks. The European conference
on computer vision (ECCV). Lecture Notes in Computer Science (Vol. 97, pp. 818–833). Springer.
Zhang, G., Yong, B., Chen, H., & Zhou, Q. (1995). The further exploits of Aaron, painter. Stanford
Humanities Review, 4(2), 141–158. https://doi.org/10.5555/212154.212174
Zhang, G., Yong, B., Chen, H., & Zhou, Q. (2017). Intelligent monitor system based on cloud and con-
volutional neural networks. The Journal of Supercomputing, 73(7), 3260–3276. https://doi.org/10.
1007/s11227-016-1934-1
Zhou, Q., Zhou, R., Yong, B., Wang, X., Zhang, G., Jiang, H., & Li, K. C. (2017). L4eRTL: A robust and secure
real-time architecture with L4 microkernel and para-virtualised PSE51 partitions. International
Journal of Embedded Systems, 9(6), 583–594. https://doi.org/10.1504/IJES.2017.088040
Zhou, R., Li, X., Yong, B., Shen, Z., Wang, C., Zhou, Q., Cao, Y., & Li, K. C. (2019). Arrhythmia recognition
and classification through deep learning-based approach. International Journal of Computational
Science and Engineering, 19(4), 506–517. https://doi.org/10.1504/IJCSE.2019.101897
