
Exploring Painting Synthesis with Diffusion Models

Da Yi, Chao Guo, Tianxiang Bai

The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences;
School of Artificial Intelligence, University of Chinese Academy of Sciences
Beijing, China
yida2019@ia.ac.cn, guochao2014@ia.ac.cn, baitianxiang2014@ia.ac.cn
 

2021 IEEE 1st International Conference on Digital Twins and Parallel Intelligence (DTPI) | 978-1-6654-3337-2/21/$31.00 ©2021 IEEE | DOI: 10.1109/DTPI52967.2021.9540115

Abstract—As a significant composition of art, fine art painting is becoming a research hotspot in the machine learning community. With their unique aesthetic value, paintings have quite different representations from natural images, making them irreplaceable. Meanwhile, the lack of training data is common in painting-related machine learning tasks. Therefore, the synthesis of fine art paintings is meaningful and challenging work. There are two main types of generative models for image synthesis: generative adversarial networks (GANs) and likelihood-based models. GAN-based models can obtain high-quality samples but usually sacrifice diversity and training stability. Diffusion models are a class of likelihood-based models and have recently been shown to achieve state-of-the-art quality on image synthesis tasks. In this paper, we explore generating fine art paintings with diffusion models. We carried out experiments on partial impression paintings from the Wikiart dataset. The results demonstrate that the diffusion model can generate high-quality samples, and it is easy to train to cover more of the target distribution than GAN-based methods.

Keywords—painting synthesis, image generation, diffusion models

I. INTRODUCTION

Over the past few years, fine art painting has become a hot research topic in the machine learning community. As an information carrier, painting is an enduring form of art creation, and it undertakes various social functions such as aesthetic cognition, aesthetic education, and aesthetic entertainment. During a painting's creation, the painter can use painting techniques to highlight the object of interest and omit unimportant details. This gives a painting a quite different representation from real images and makes it irreplaceable. On the other hand, the lack of training data is common in painting-related tasks, while recent work [19] has shown that paintings can be used as a form of data augmentation to improve a model's robustness. Therefore, the synthesis of high-quality fine art paintings has extremely high aesthetic value and research significance [1], [2]. As far as we know, only a few studies [17] have focused on generating fine art paintings.

There are two main types of image generation models proposed by previous works: generative adversarial networks (GANs) and likelihood-based models. GANs currently perform best on most image synthesis tasks [5]. However, GANs usually sacrifice diversity to obtain high-quality samples and are often difficult to train. The diffusion model [6] is a class of likelihood-based generative models. Recent research [7] shows that diffusion models achieve state-of-the-art quality on image synthesis tasks and capture more diversity than GAN-based models. Therefore, we explore the potential of diffusion models in fine art painting synthesis and compare them with GAN-based methods to evaluate their performance.

Fine art painting is a creative expression with various styles, including baroque, rococo, romanticism, etc. We evaluate the diffusion model on partial impression paintings (Fig. 1) from the Wikiart dataset created by Van Gogh or Claude Monet.

The main contributions of this work are as follows:

• We explore generating fine art paintings using diffusion models. The results demonstrate that they can generate high-quality samples similar to real paintings.

• We contrast diffusion models with GAN-based models and find that diffusion models are easy to train while maintaining a higher coverage of the target distribution than GAN-based models in the painting generation task.

Fig. 1. Impressionist artworks¹ painted by Claude Monet (a-d) and Vincent Van Gogh (e-h).

II. RELATED WORKS

A. Machine Learning in Fine Art Paintings

Many machine learning tasks on paintings have been studied, such as object detection [8], fine art painting classification [10], etc. Due to the unique aesthetic value of artworks, the content understanding [11] and aesthetic evaluation [9] of paintings have also been discussed. Some robot-based research explores the creation of real artworks [12]. Style transfer tasks [22] render a content image in different styles. This process involves reorganizing an image's style and content, which differs from painting generation tasks.

¹ The fine art paintings shown here are from the Wikiart dataset: https://www.wikiart.org.

This work is supported in part by Skywork Intelligence Culture & Technology LTD.


Fig. 2. Latent samples from the denoising diffusion model. A cosine schedule is applied to sample at linearly spaced values of t from T to 0.

B. Fine Art Paintings Generation

One of the most challenging tasks of machine learning in art is fine art painting generation. Machado et al. [13] present a painting generation system by combining a neural network for image classification with a genetic algorithm. Elgammal et al. [14] propose a novel loss structure for GANs to generate machine-original paintings that deviate from homogeneous styles. Tan et al. [3], [4] proposed a model called ArtGAN, in which the label information is propagated back to the generator for more efficient learning. Results show that the model can generate fine art paintings conditioned on artist, genre, and style. Recent research [15] proposes a GAN-based model focused on East Asian art. The results of existing studies still have a large gap with real fine art paintings.

C. Image Generation Models

Generative models can produce representative data by simulating the observed distribution. There are two main types of image generation models proposed by previous works: GANs and likelihood-based models, both of which have potential for painting generation tasks.

GANs have achieved great success in natural image generation [5], [16]. Instead of estimating the maximum likelihood, the GAN-based framework trains the model through an adversarial process. Inspired by painting-related works [3], [4], some researchers generate paintings using GANs [16], [18], which are state-of-the-art on natural image generation tasks. However, GANs usually sacrifice diversity to obtain high-quality samples and easily collapse without appropriate hyperparameters [5].

Diffusion models are a class of generative models based on the likelihood function. Diffusion models generate samples by iteratively denoising noisy images and are trained with a reweighted variational lower bound. Recent research shows that diffusion models achieve state-of-the-art quality on image synthesis tasks [7]. Furthermore, improved diffusion models can capture more diversity than state-of-the-art GANs [23]. Therefore, we explore the potential of diffusion models in fine art painting synthesis. To evaluate their performance, we compare the diffusion model with VQGAN [18], a state-of-the-art GAN-based model on natural image generation tasks.

III. METHODOLOGY

In this section, we first demonstrate how painting generation is modeled as a denoising diffusion process. Then we introduce diffusion models and the corresponding settings for generating fine art paintings. Finally, we carefully select the dataset to evaluate the performance.

A. Assumptions

To generate paintings, we establish an iterative denoising process from pure noise x_T to the painting x_0 (Fig. 2). We consider a data distribution x_0 ~ q(x_0), in which q is a forward noising process: q adds Gaussian noise from x_1 to x_T, in which 1, ..., T represent the different steps t of the iterative process. With variance β_t ∈ (0, 1), the process can be written as follows:

  q(x_1, ..., x_T | x_0) := ∏_{t=1}^{T} q(x_t | x_{t-1})                (1)

  q(x_t | x_{t-1}) := N(x_t; √(1 − β_t) x_{t-1}, β_t I)                 (2)

If T is large enough and the variance β_t is well behaved, we can consider the final noise x_T to be an isotropic Gaussian distribution. Therefore, once the exact reverse distribution q(x_{t-1} | x_t) is known, x_T ~ N(0, I) can be sampled, and we can generate sample painting images from x_T by reversing the forward noising process.

B. Diffusion Models

As described above, the objective of diffusion models is to learn the denoising step from x_t to x_{t-1}. Given this, the training objective can be parameterized as a predictive function ε_θ(x_t, t), which computes the distribution of noise in x_t. During the training process, each training sample x_t is randomly produced from three factors: a real sample x_0, a noise sample ε, and a time step t. Therefore, the loss function of the model can be defined as:

  L_simple = E_{t, x_0, ε}[ ‖ε − ε_θ(x_t, t)‖² ]                        (3)

This is the mean-squared error between the predicted noise and the real noise. Ho et al. [7] demonstrate that, through a reasonable approximation of the relevant parameters, one can obtain samples from the noise predictor ε_θ(x_t, t).
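The training procedure implied by (3) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: `eps_model` stands in for the trained noise predictor ε_θ, and we assume the cosine noise schedule that Sec. IV reports using, writing ᾱ_t = ∏_{s≤t}(1 − β_s) for the cumulative signal level implied by (1)-(2).

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    # Cumulative signal level alpha_bar_t for the cosine noise schedule
    # (an assumption here; the schedule choice is reported in Sec. IV):
    # alpha_bar(t) = cos^2(((t/T + s) / (1 + s)) * pi / 2), normalized so
    # that alpha_bar_0 = 1. It relates to (2) via alpha_bar_t = prod(1 - beta_s).
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f / f[0]

def training_loss(eps_model, x0, T, rng):
    """One Monte-Carlo estimate of L_simple = E[||eps - eps_theta(x_t, t)||^2]."""
    alpha_bar = cosine_alpha_bar(T)
    t = int(rng.integers(1, T + 1))            # random time step t
    eps = rng.standard_normal(x0.shape)        # real noise epsilon
    # Closed form implied by (1)-(2):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))             # stand-in for a 64x64 painting
# Dummy predictor that always outputs zero noise, so the loss stays near E[eps^2]:
loss = training_loss(lambda x, t: np.zeros_like(x), x0, T=4000, rng=rng)
```

With the dummy zero predictor the loss is roughly E[ε²] ≈ 1; a real model would be trained to drive it toward zero.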
The above diffusion models need to sample through thousands of diffusion steps. Therefore, it takes a long time for the model to generate high-quality samples.
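The cost comes from the reverse loop itself: one network evaluation per step, T times. A generic DDPM ancestral-sampling loop with the fixed variance β_t can be sketched as follows; this is an illustration, not the authors' implementation, and `eps_model` again stands in for the trained noise predictor.

```python
import numpy as np

def sample(eps_model, shape, beta, rng):
    """Ancestral sampling: start from x_T ~ N(0, I) and apply the learned
    reverse step T times, which is why generation is slow for large T."""
    alpha = 1.0 - beta
    alpha_bar = np.cumprod(alpha)
    x = rng.standard_normal(shape)                    # x_T, pure noise
    for t in range(len(beta) - 1, -1, -1):
        eps_hat = eps_model(x, t)                     # predicted noise in x_t
        # Posterior mean of x_{t-1} given x_t and the predicted noise:
        mean = (x - beta[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(beta[t]) * noise           # fixed variance beta_t
    return x

rng = np.random.default_rng(0)
beta = np.linspace(1e-4, 0.02, 1000)                  # illustrative linear schedule
x_gen = sample(lambda x, t: np.zeros_like(x), (8, 8), beta, rng)
```

Reducing the number of reverse steps used at sampling time shortens exactly this loop.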

To improve the sample quality with fewer diffusion steps, Nichol and Dhariwal [6] change the variance of the original noise Gaussian distribution from a constant β_t to a neural network Σ_θ(x_t, t) that needs to be trained. Furthermore, they propose a hybrid loss to train both ε_θ(x_t, t) and Σ_θ(x_t, t); it reduces the number of sampling steps while the sample quality barely drops:

  L_hybrid = L_simple + λ L_vlb                                          (4)

The variational lower bound L_vlb can be obtained by treating the diffusion model as a VAE. We adopt this hybrid loss and the relevant parameters to conduct our experiments.

C. Dataset for Evaluation

We conduct our experiments using partial data from the Wikiart dataset. Wikiart is an artworks dataset collected from the wikiart.org website and contains more than 80,000 fine art paintings. To generate a particular painting style, we specifically select 1,993 images from the dataset by two famous impressionist painters: Van Gogh and Claude Monet. Considering the excessive computational consumption and time cost of training on high-resolution images, we resize the training images to 64×64, which is the same resolution used in [7].

IV. EXPERIMENTS

A. Implementation Details

To evaluate the performance of the diffusion model, we compare it with VQGAN [18], one of the state-of-the-art GAN-based models on natural image generation tasks.

a) The Diffusion Model: The structure of the model in our experiments is similar to that proposed by Nichol and Dhariwal [6]. In detail, we train the model with 4000 diffusion steps and the Adam optimizer [20], and we use 4 attention heads and 128 channels at the attention layers. We train the models for 116K iterations. The learning rate of our experiments is 10⁻⁴, and the batch size is 128. Considering the resolution of the training data, we choose a cosine schedule to add noise, which has been shown to work better at lower resolutions [6].

b) The VQGAN: Our training strategy is the same as the original model. In the VQGAN architecture, the downsampling step m is 5, and the adaptive weight λ is set to 0, which empirically leads to better results [18]. After obtaining the codebooks, we use the transformer to learn with temperature t = 1.0 and a top-k cutoff k = 100.

B. Results and Discussion

Fig. 2 shows the denoising process in our experiments. We sample from diffusion step T down to 0. We reduce the diffusion steps T used in sampling to speed up the process, since the L_hybrid model can maintain the sample quality with fewer sampling steps than it was trained with. During the sampling process, we set T to 250, which is sufficient for us to get a high-quality sample in a few minutes.

Fig. 3. Samples from the diffusion model (64×64).

The sample paintings of the diffusion model are shown in Fig. 3. By comparing the results with samples from the training dataset (Fig. 4), we argue that samples generated by the diffusion model reach a similar granular level to real fine art paintings. Meanwhile, the landscape paintings (Fig. 4, columns c and d) generated by the diffusion model have a solid color and smooth brushwork similar to Monet's paintings, which is considered to mimic the painter's unique characteristics [21]. It demonstrates that the diffusion model can generate paintings according to the painter's specific style.

Fig. 4. Comparison between the training dataset (columns a and b) and the diffusion model (columns c and d).

Besides, we qualitatively compare our samples with samples generated by VQGAN, as shown in Fig. 5. Compared with the diffusion model, the samples from VQGAN are poor at the granular level, and objects in some samples are hardly recognizable (Fig. 5, columns a and b, top row), while its construction results are quite good. Moreover, almost all samples generated by VQGAN are landscape paintings. Other classes of samples in the dataset (e.g., portrait, bridge, town, etc.) are ignored during the sampling process. We argue that this phenomenon might be caused by an inadequate selection of hyperparameters.

PRGHO¶VSHUIRUPDQFH:HKRSHRXUZRUNFDQRIIHULQVLJKWLQWR
model's performance. We hope our work can offer insight into
ILQHDUWSDLQWLQJVV\QWKHVLV
fine art paintings synthesis.

5()(5(1&(6
REFERENCES
>@ )<:DQJ³3DUDOOHODUWIURPLQWHOOLJHQWDUWWRDUWLVWLFLQWHOOLJHQFH´
[1] F.-Y. Wang, "Parallel art: from intelligent art to artistic intelligence,"
7KH Alfred
The $OIUHG North
1RUWK Whitehead
:KLWHKHDG College,
&ROOHJH Tech.
7HFK 5HS  the
Rep., 2017, WKH Alfred
$OIUHG
1RUWK:KLWHKHDG$FDGHP\
North Whitehead Academy.
>@
[2] C.&*XR</X</LQ)=KXRDQG)<:DQJ³3DUDOOHODUWDUWLVWLF
Guo, Y. Lu, Y. Lin, F. Zhuo, and F.-Y. Wang, "Parallel art: artistic
FUHDWLRQXQGHUKXPDQPDFKLQHFROODERUDWLRQ´&KLQHVH-RXUQDORI,Q
creation under human-machine collaboration," Chinese Journal of In­
WHOOLJHQW6FLHQFHDQG7HFKQRORJ\YROQRSS±
telligent Science and Technology, vol. 1, no. 4, pp. 335-341, 2019.
>@
[3] W.:57DQ&6&KDQ+($JXLUUHDQG.7DQDND$UW*$1$UW
R. Tan, C. S. Chan, H. E. Aguirre, and K. Tanaka, "ArtGAN: Art­
ZRUNV\QWKHVLVZLWKFRQGLWLRQDOFDWHJRULFDO*$1VLQ,(((,Q
work synthesis with conditional categorical GANs," in 2017 IEEE In­
 WHUQDWLRQDO&RQIHUHQFHRQ,PDJH3URFHVVLQJ
ternational Conference on Image Processing (ICIP), ,&,3 SS±
2017, pp. 3760-
)LJ&RPSDULVRQEHWZHHQ94*$1>@
Fig. FROXPQVDDQGE
5. Comparison between VQGAN[ l8] (columns a and b)DQGWKH
and the 
3764.
GLIIXVLRQPRGHO
diffusion model (columns c and d).
FROXPQVFDQGG >@ :57DQ&6&KDQ+($JXLUUHDQG.7DQDND,PSURYHGDUW
[4] W. R. Tan, C. S. Chan, H. E. Aguirre, and K. Tanaka, "Improved art­
WKH contrary,
the FRQWUDU\ samples
VDPSOHV from
IURP the
WKH diffusion
GLIIXVLRQ model
PRGHO have
KDYH JDQ for
gan IRU conditional
FRQGLWLRQDO synthesis
V\QWKHVLV of
RI natural
QDWXUDO image
LPDJH and
DQG artwork,"
DUWZRUN IEEE
,(((
VLJQLILFDQWO\PRUHFODVVHVZKLFKVKRZVWKDWGLIIXVLRQPRGHOV 7UDQVDFWLRQVRQ,PDJH3URFHVVLQJYROQRSS±
Transactions on Image Processing, vol. 28, no. 1, pp. 394-409, 2018.
significantly more classes, which shows that diffusion models
>@
[5] A.$&UHVZHOO7:KLWH9'XPRXOLQ.$UXONXPDUDQ%6HQJXSWD
Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta,
DUH easy
are HDV\ to
WR train
WUDLQ to
WR cover
FRYHU more
PRUH target
WDUJHW distribution
GLVWULEXWLRQ than
WKDQ the
WKH DQG$$%KDUDWK*HQHUDWLYHDGYHUVDULDOQHWZRUNV$QRYHUYLHZ
and A. A. Bharath, "Generative adversarial networks: An overview,"
94*$1
VQGAN. ,(((6LJQDO3URFHVVLQJ0DJD]LQHYROQRSS±
IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 53-65, 2018.
>@
[6] A.$1LFKRODQG3'KDULZDO,PSURYHGGHQRLVLQJGLIIXVLRQSUREDELOLV
Nichol and P. Dhariwal, "Improved denoising diffusion probabilis-
 :HIXUWKHUPDNHD9LVXDO7XULQJ7HVWZLWKSDUWLFLSDQWV
We further make a Visual Turing Test with 3 3 participants WLFPRGHOVDU;LYSUHSULQWDU;LY
tic models," arXiv preprint arXiv:2102.09672, 2021.
WRHYDOXDWHWKHTXDOLW\RIRXUVDPSOHV$PRQJWKHSDUWLFLSDQWV
to evaluate the quality of our samples. Among the participants, >@
[7] -+R$-DLQDQG3$EEHHO'HQRLVLQJGLIIXVLRQSUREDELOLVWLFPRG
J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic mod­
QLQH are
nine DUH experienced
H[SHULHQFHG in
LQ painting,
SDLQWLQJ nine
QLQH are
DUH AI
$, engineers
HQJLQHHUV HOVDU;LYSUHSULQWDU;LY
els," arXiv preprint arXiv:2006.11239, 2020.
IDPLOLDU with
familiar ZLWK image
LPDJH generation,
JHQHUDWLRQ and
DQG the
WKH rest
UHVW do
GR not
QRW have
KDYH >@
[8] N.1:HVWODNH+&DLDQG3+DOO³'HWHFWLQJSHRSOHLQDUWZRUNZLWK
Westlake, H. Cai, and P. Hall, "Detecting people in artwork with
UHOHYDQWH[SHULHQFH7KHWHVWVHWFRQVLVWVRISDLQWLQJVVSOLW FQQV´LQ(XURSHDQ&RQIHUHQFHRQ&RPSXWHU9LVLRQSS±
cnns," in European Conference on Computer Vision, 2016, pp. 825-
relevant experience. The test set consists of 30 paintings, split

841.
HYHQO\EHWZHHQKXPDQSDLQWLQJVGLIIXVLRQPRGHOSDLQWLQJV
evenly between human paintings, diffusion model paintings, >@ </X&*XR</LQ)=XRDQG)<:DQJ&RPSXWDWLRQDODHV
[9] Y. Lu, C. Guo, Y. Lin, F. Zuo, and F.-Y. Wang, "Computational aes­
DQG94*$1SDLQWLQJV(DFKSDUWLFLSDQWQHHGVWRDQVZHUWKH
and VQGAN paintings. Each participant needs to answer the WKHWLFVRIILQHDUWSDLQWLQJV7KHVWDWHRIWKHDUWDQGRXWORRN$FWD$X
thetics of fine art paintings: The state of the art and outlook," Acta Au­
TXHVWLRQ:KRFUHDWHGWKHSDLQWLQJVDKXPDQRUFRPSXWHU"
question: Who created the paintings, a human or computer? WRPDWLFD6LQLFDYROQRSS±
tomatica Sinica, vol. 46, no. 11, pp. 2239-2259, 2020.
>@ (&HWLQLFDQG6*UJLF³*HQUHFODVVLILFDWLRQRISDLQWLQJV´LQ
[10] E. Cetinic and S. Grgic, "Genre classification of paintings," in 2016
7$%/(,5
TABLE I. RESULT 9,68$/TuRING
OF THE VISUAL
(68/72)7+( 785,1*TEST
7(67 ,QWHUQDWLRQDO6\PSRVLXP(/0$5SS±
International Symposium EIMAR, 2016, pp. 201-204.
>@ ( Cetinic,
[11] E. &HWLQLF T.7 Lipic,
/LSLF and
DQG S.
6 Grgic,
*UJLF "A
³$ deep
GHHS learning
OHDUQLQJ perspective
SHUVSHFWLYH onRQ
 Average
$YHUDJH Stddev
6WGGHY EHDXW\VHQWLPHQWDQGUHPHPEUDQFHRIDUW´,((($FFHVVYROSS
beauty, sentiment, and remembrance of art," IEEE Access, vol. 7, pp.
VQGAN
94*$1 ͲǤʹ͵͸
0.236 ͲǤʹͺ͸
0.286
±
73694-73710, 2019.
>@ &*XR7%DL</X</LQ*;LRQJ;:DQJDQG)<:DQJ
[12] C. Guo, T. Bai, Y. Lu, Y. Lin, G. Xiong, X. Wang, and F.-Y. Wang,
Diffusion Model
'LIIXVLRQ0RGHO ͲǤ͸͹͸
0.676 ͲǤʹ͵Ͷ
0.234
³6N\ZRUNGDYLQFL A
"Skywork-davinci: $ novel
QRYHO cpss-based
FSVVEDVHG painting
SDLQWLQJ support
VXSSRUW system,"
V\VWHP´ in
LQ
Human Paintings
+XPDQ3DLQWLQJV ͲǤ͸ͺʹ
0.682 ͲǤʹʹͻ
0.229 3URFHHGLQJVRIWKH,(((WK,QWHUQDWLRQDO&RQIHUHQFHRQ$XWRPDWLRQ
Proceedings of the IEEE 16th International Conference on Automation
6FLHQFHDQG(QJLQHHULQJ,(((SS±
Science and Engineering. IEEE, 2020, pp. 673--678.
[13] P. Machado, J. Romero, and B. Manaris, "Experiments in computational aesthetics," in The Art of Artificial Evolution, Springer, 2008, pp. 381-415.
[14] A. Elgammal, B. Liu, M. Elhoseiny, and M. Mazzone, "CAN: Creative adversarial networks, generating "art" by learning about styles and deviating from style norms," arXiv preprint arXiv:1706.07068, 2017.
[15] A. Xue, "End-to-end Chinese landscape painting creation using generative adversarial networks," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 3863-3871.
[16] T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila, "Training generative adversarial networks with limited data," arXiv preprint arXiv:2006.06676, 2020.

Table I shows the frequency with which each category of paintings was judged to be human art. The diffusion model performs observably better than the VQGAN at fooling participants and is close to the real paintings. We further use a two-tailed t-test for statistical analysis. The statistical result shows no significant difference between the diffusion model paintings and the human paintings, while the former differ markedly from the VQGAN paintings (p < 0.001). This demonstrates that the diffusion model can generate high-quality fine art paintings similar to artist-created paintings.
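As a sketch of the statistical analysis described above: a two-tailed two-sample t-test compares per-painting "judged human" rates between groups. The `welch_t` helper and the score vectors below are illustrative assumptions (the paper does not publish its raw ratings); the values are chosen only to roughly match the group means in Table I.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic; the two-tailed p-value is then
    read from Student's t distribution with the Welch degrees of freedom."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

# Hypothetical per-painting scores: fraction of participants who judged
# each painting to be human-made. Illustrative values, NOT the study's data.
diffusion = [0.70, 0.65, 0.72, 0.60, 0.71, 0.68]
human     = [0.69, 0.66, 0.74, 0.63, 0.70, 0.67]
vqgan     = [0.20, 0.25, 0.30, 0.22, 0.28, 0.18]

t_dh = welch_t(diffusion, human)  # small |t|: no significant difference
t_dv = welch_t(diffusion, vqgan)  # large |t|: p << 0.001
```

A |t| near zero (as for diffusion vs. human here) cannot reject the null hypothesis of equal means, while a very large |t| (diffusion vs. VQGAN) corresponds to p < 0.001.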
[17] I. Santos, L. Castro, N. Rodriguez-Fernandez, A. Torrente-Patino, and A. Carballal, "Artificial Neural Networks and Deep Learning in the Visual Arts: A review," Neural Computing and Applications, pp. 1-37, 2021.
[18] P. Esser, R. Rombach, and B. Ommer, "Taming transformers for high-resolution image synthesis," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12873-12883.
[19] H. Lin, M. van Zuijlen, S. C. Pont, M. W. Wijntjes, and K. Bala, "What Can Style Transfer and Paintings Do For Model Robustness?," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11028-11037.

Meanwhile, we found some problems during the experiments: the diffusion models recall some existing images from the training dataset during sampling. We speculate that this overfitting phenomenon is caused by insufficient training data, and it can be mitigated by expanding the training dataset. Dhariwal et al. [6] propose to avoid overfitting by increasing the size of the diffusion model, and a latent-space interpolation experiment has been used to demonstrate that the diffusion models generate novel images instead of memorizing the training data.
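The latent-space interpolation check mentioned above can be sketched as follows. `slerp` is the standard spherical interpolation between two Gaussian noise latents; the 4-dimensional vectors are illustrative stand-ins for real image-sized latents, and this is not the cited authors' code.

```python
import math

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors z0 and z1."""
    dot = sum(a * b for a, b in zip(z0, z1))
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    # Angle between the two latents, clamped for numerical safety.
    omega = math.acos(max(-1.0, min(1.0, dot / (norm0 * norm1))))
    s = math.sin(omega)
    return [(math.sin((1 - t) * omega) * a + math.sin(t * omega) * b) / s
            for a, b in zip(z0, z1)]

# Two illustrative 4-d noise latents (real models use image-sized tensors).
z0 = [0.5, -1.2, 0.3, 0.7]
z1 = [-0.4, 0.8, 1.1, -0.2]

# Decoding each intermediate latent with the trained sampler reveals
# memorization: an overfit model jumps between training images along the
# path, while a well-fit model morphs smoothly through novel images.
midpoints = [slerp(z0, z1, t) for t in (0.25, 0.5, 0.75)]
```

Spherical (rather than linear) interpolation is used so that intermediate latents keep a norm typical of Gaussian noise, which the sampler expects.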
[20] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.

V. CONCLUSION
This paper explores generating fine art paintings using diffusion models. The results demonstrate that they can generate samples similar to real paintings. We qualitatively demonstrate, through the contrast experiment, that the diffusion models cover more of the target distribution than the GAN-based models on the painting generation task. In future work, we will conduct more quantitative experiments on a larger-scale dataset with more varied paintings to evaluate the

[21] A. Elgammal, Y. Kang, and M. Den Leeuw, "Picasso, Matisse, or a Fake? Automated Analysis of Drawings at the Stroke Level for Attribution and Authentication," in Proceedings of the AAAI Conference on Artificial Intelligence, 2018, vol. 32, no. 1.
[22] Y. Jing, Y. Yang, Z. Feng, J. Ye, Y. Yu, and M. Song, "Neural style transfer: A review," IEEE Transactions on Visualization and Computer Graphics, vol. 26, no. 11, pp. 3365-3385, 2019.
[23] P. Dhariwal and A. Nichol, "Diffusion models beat gans on image synthesis," arXiv preprint arXiv:2105.05233, 2021.