Professional Documents
Culture Documents
Abstract—Modeling layout is an important rst step for graphic design. Recently, methods for generating graphic layouts have
progressed, particularly with Generative Adversarial Networks (GANs). However, the problem of specifying the locations and sizes of
design elements usually involves constraints with respect to element attributes, such as area, aspect ratio and reading-order.
Automating attribute conditional graphic layouts remains a complex and unsolved problem. In this paper, we introduce
Attribute-conditioned Layout GAN to incorporate the attributes of design elements for graphic layout generation by forcing both the
generator and the discriminator to meet attribute conditions. Due to the complexity of graphic designs, we further propose an element
arXiv:2009.05284v1 [cs.CV] 11 Sep 2020
dropout method to make the discriminator look at partial lists of elements and learn their local patterns. In addition, we introduce
various loss designs following different design principles for layout optimization. We demonstrate that the proposed method can
synthesize graphic layouts conditioned on different element attributes. It can also adjust well-designed layouts to new sizes while
retaining elements’ original reading-orders. The effectiveness of our method is validated through a user study.
1 I NTRODUCTION
GAN model, we use the wireframe rendering layer similar to 3 ATTRIBUTE -C ONDITIONED L AYOUT GAN
LayoutGAN [7] that rasterizes all parameterized elements into 3.1 Design Representation
wireframe images. In addition to feeding all the elements to the
Instead of generating pixel-level images as a layout representation,
discriminator to learn their global patterns, we also introduce a
the model synthesizes a graphic layout comprised of a set of N
novel element dropout strategy which randomly removes some
design elements represented as (p1 , θ1 ),[· · · , (pN , θ]N ), where
of the generated elements and feeds them to the discriminator.
p is a vector of the class label and θ = θ 1 , ..., θ m denotes a
As there lacks global layout information delivered by a complete
set of m geometric parameters that can be in various forms for
set of elements, the discriminator is forced to exploit local cues,
different graphic layouts [7]. We assume an element is a rectangle,
such as alignment and non-overlapping of the remaining elements
or bounding box represented [ by its normalized
] center coordinates,
to better learn local patterns. We further propose several hand-
width and height, i.e., θ = xC , yC , w, h , which is very common in
crafted loss functions following different design principles, such
various graphic designs.
as non-overlapping, alignment and preservation of reading-orders
to facilitate model optimization.
3.2 Attribute Description
We apply our model to synthesize attribute conditional graphic
layouts of different aspect ratios and retarget existing layouts to A good professional design, as exampled in Figure 2a, is built on
new sizes while preserving elements’ reading-orders. For evalua- a layout in which all design elements are arranged in proper sizes
tion, we automatically generate a number of graphic designs, and and locations according to their content-based attributes such as
compare the generated results with those produced by template- area, aspect ratio and reading-order.
based methods, novice designers and professional designers.
Experimental comparisons show that our automatic designs are 3.2.1 Expected Area
better than those produced by template-based methods and novice The size of a design element in the layout usually highly depends
designers, and sometimes comparable to professional designs. on its content. For example, for a text element, the area of the
To sum up, this paper comprises the following contributions: generated bounding box should match with the text length, so
1) A conditional generator that synthesizes attribute conditional that the target text can be rendered into the generated region with
structured data for layout design. 2) A conditional discriminator proper font size to ensure readability and aesthetic soundness.
with element dropout that exploits and learns both global and local Otherwise, as shown in Figure 2b, generating a small bounding
layout patterns. 3) Several hand-crafted loss functions based on box (in yellow) for a large amount of text would make the
various design principles for layout optimization. rendered text too small, thus ruining a design. The expected area,
determined by elements’ content, is thus an essential attribute to
be considered for layout design.
Fig. 3: Overall architecture of attribute-conditioned layout GAN. The generator takes graphic elements with class labels, random
geometric parameters and attributes as input. An encoder followed by a stacked relation module embeds the input and contextually
refines the embedded features for each element. The refined features are then decoded back to geometric parameters to assemble
layouts. The discriminator takes generated structured data as input and renders all elements and a random partial list of them into two
wireframe images, upon which two CNNs learn to capture both global and local behaviours of elements respectively. The discriminator
is further tasked for attribute reconstruction to enhance discrimination.
the image element is thus a necessary attribute to be preserved in graphic element as a function of its relations with all the other
the output layout. elements in the design, and outputs p and contextually refined
θ ′ to assemble graphic layouts G(z ) = (p1 , θ1′ ), · · · , (pN , θN ′ )
type c for element i. As a result, the discriminator outputs both a 3.6 Generator Optimization
probability of the input belonging to real layouts, and an estimated We introduce several losses based on different design principles
area distribution over all element types. for layout optimization.
I (x, y, c) = max pi,c Fθi (x, y). (2) Ladv = − log DΘa (GΘg (z)) − log DΘ′a (GΘg (z)). (7)
i∈[1...N]
The wireframe discriminator mainly perceives global patterns 3.6.2 Margin Area Loss
of all elements. However, local behaviours such as alignment We control the area of the generated bounding box for different
and non-overlapping are also important for a good design. To elements by feeding their expected area attributes to the generator.
better capture those local rules, we further introduce element We denote si and s′i as the input expected area and the area of
dropout strategy by randomly removing some input elements the predicted bounding box for element i respectively. si and s′i
during the rendering process. Denote r as a N-dimension vector are expected to be close in value, while remaining some degree of
of independent Bernoulli random variables, each of which has a flexibility to better balance different design principles for layout
probability b of being 1. This vector is sampled and multiplied optimization. We thus introduce a margin area loss:
with the rendered grayscale image Fθ (x, y) to produce another N ( ′ )
si − si
rendered image I ′ : Larea = ∑ k −α , (8)
i=1 si
ri ∼ Bernoulli(b), (3)
where k(x) = max(0, x) (implemented as ReLU activation), and
I ′ (x, y, c) = max ri pi,c Fθi (x, y). (4) α ≥ 0 is a preset parameter to control the allowable relative area
i∈[1...N]
difference, which is set as 0.3 in our experiments.
Figure 3 depicts the architecture of the discriminator, which
consists of both a global and a local branch. The global branch 3.6.3 Overlapping Loss
renders all input elements to wireframe images and uses CNNs
Well-designed layouts typically avoid overlapping elements. We
to measure and optimize global patterns. Meanwhile, the local
offer an overlapping loss to penalize overlapping between any
branch randomly removes some elements during rendering and
element pairs:
adopts additional CNNs to focus on local patterns by looking at N
si ∩ s j
partial lists of elements. The discriminator is thus able to well Lover = ∑ ∑ , (9)
capture both the global and the local layout patterns. i=1 ∀ j6=i si
Fig. 4: Overall pipeline for automatic advertisement design. Given a set of design elements, the system first samples a number of
locations for the product image, and then feeds the sampled locations and elements’ attributes to layout GAN to generate a number of
layouts, which are further grouped and ranked for final advertisement rendering and recommendation.
TABLE 1: Spatial analysis of layouts from different models. TABLE 2: Comparisons of using different dropout probabilities.
TABLE 5: Comparisons of average inference time. [5] A. Brock, J. Donahue, and K. Simonyan, “Large scale gan training for
high fidelity natural image synthesis,” arXiv preprint arXiv:1809.11096,
Methods Template-based Energy-based Our method 2018.
[6] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture
Average inference time (s) 0.32 ∼2400.00 0.69 for generative adversarial networks,” arXiv preprint arXiv:1812.04948,
2018.
[7] J. Li, J. Yang, A. Hertzmann, J. Zhang, and T. Xu, “LayoutGAN: Gen-
the normal and the professional users are consistent. Concretely, erating graphic layouts with wireframe discriminators,” in International
Conference on Learning Representations, 2019.
though not at the level of professional designs, our automatic [8] A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with
designs achieve better rankings compared to designs produced by auxiliary classifier gans,” in Proceedings of the 34th International Con-
the template-based baseline and also those by novice designers. ference on Machine Learning, 2017, pp. 2642–2651.
[9] N. Hurst, W. Li, and K. Marriott, “Review of automatic document
This suggests that our generated designs are generally preferred formatting,” in Proceedings of the 9th ACM symposium on Document
by participants. engineering. ACM, 2009, pp. 99–108.
We further adapt our method to adjusting existing layouts to [10] Z. Bylinskii, N. W. Kim, P. O’Donovan, S. Alsheikh, S. Madan,
new sizes while retaining elements’ reading-orders. The first two H. Pfister, F. Durand, B. Russell, and A. Hertzmann, “Learning visual
importance for graphic designs and data visualizations,” in Proceedings
rows in Figure 12 provide retargets produced by our method and of the 30th Annual ACM Symposium on User Interface Software and
the model variant without considering the reading-order constraint Technology. ACM, 2017, pp. 57–69.
respectively when given a specific design (middle in red) as [11] B. Deka, Z. Huang, C. Franzen, J. Hibschman, D. Afergan, Y. Li,
input. We also compare with results from the energy-based model J. Nichols, and R. Kumar, “Rico: A mobile app dataset for building
data-driven design applications,” in Proceedings of the 30th Annual ACM
proposed by O’Donovan et al. [16], as shown in the third row. One Symposium on User Interface Software and Technology. ACM, 2017,
can see that our method can properly specify the locations and pp. 845–854.
sizes of different elements while retaining original reading-orders, [12] B. Kovacs, P. Donovan, K. Bala, and A. Hertzmann, “Context-aware
thus generating more appealing and adequate retargets compared asset search for graphic design,” IEEE Transactions on Visualization and
Computer Graphics, vol. PP, pp. 1–1, 2018.
to the other two counterparts. [13] D. Ren, B. Lee, and M. Brehmer, “Charticulator: Interactive construc-
At last, we compare the average inference time of producing tion of bespoke chart layouts,” IEEE transactions on visualization and
an automatic graphic layout by using different methods, i.e., the computer graphics, vol. 25, no. 1, pp. 789–799, 2018.
[14] S. J. Harrington, J. F. Naveda, R. P. Jones, P. Roetling, and N. Thakkar,
template-based baseline, the energy-based model [16] and our
“Aesthetic measures for automated document layout,” in Proceedings of
method. As shown in Table 5, our method is greatly faster or at the 2004 ACM symposium on Document engineering. Citeseer, 2004,
least comparable in terms of speed compared to the alternatives, pp. 109–111.
while creating better designs. [15] C. G. Morcilllo, V. J. Martin, D. V. Fernandez, J. J. C. Sanchez, and
J. A. Albusac, “Gaudii: An automated graphic design expert system,” in
Proceedings of the National Conference on Articial Intelligence, 2010.
6 C ONCLUSION [16] P. O’Donovan, A. Agarwala, and A. Hertzmann, “Learning layouts for
single-pagegraphic designs,” IEEE transactions on visualization and
In this paper, we have proposed a novel attribute-conditioned computer graphics, vol. 20, no. 8, pp. 1200–1213, 2014.
layout GAN to synthesize graphic layouts by incorporating el- [17] ——, “Designscape: Design with interactive layout suggestions,” in
ement attributes as conditions into both the generator and the Proceedings of the 33rd annual ACM conference on human factors in
computing systems. ACM, 2015, pp. 1221–1224.
discriminator. In addition, we introduce a novel element dropout [18] X. Pang, Y. Cao, R. W. Lau, and A. B. Chan, “Directing user attention
strategy to enhance the capability of the discriminator in capturing via visual flow on web designs,” ACM Transactions on Graphics (TOG),
local patterns, as well as several hand-crafted loss designs to vol. 35, no. 6, p. 240, 2016.
facilitate layout optimization. We apply our model to generate [19] A. Swearngin, M. Dontcheva, W. Li, J. Brandt, M. Dixon, and A. J.
Ko, “Rewire: Interface design assistance from examples,” in Proceedings
attribute-conditioned layouts and adjust existing ones to new of the 2018 CHI Conference on Human Factors in Computing Systems.
sizes. Experimental results well demonstrate the effectiveness ACM, 2018, p. 504.
of our method. In this work, we define the reading-order by [20] P. O’Donovan, J. Lı̄beks, A. Agarwala, and A. Hertzmann, “Exploratory
simply following the left-to-right and top-to-down viewing order font selection using crowdsourced attributes,” ACM Transactions on
Graphics (TOG), vol. 33, no. 4, p. 92, 2014.
commonly used in banner ads. However, for designs containing [21] S. Qi, Y. Zhu, S. Huang, C. Jiang, and S.-C. Zhu, “Human-centric
many elements, defining the reading-order could be a complex indoor scene synthesis using stochastic grammar,” in Proceedings of the
problem. Future work include determining more content-adaptive IEEE Conference on Computer Vision and Pattern Recognition, 2018,
pp. 5899–5908.
reading-orders using theories such as visual attention and salience
[22] L.-F. Yu, S.-K. Yeung, C.-K. Tang, D. Terzopoulos, T. F. Chan, and S. J.
for automatic graphic design. Osher, “Make it home: automatic optimization of furniture arrangement,”
in ACM Transactions on Graphics (TOG), vol. 30, no. 4. ACM, 2011,
p. 86.
R EFERENCES [23] P. Merrell, E. Schkufza, Z. Li, M. Agrawala, and V. Koltun, “Interactive
[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, furniture layout using interior design guidelines,” ACM Transactions on
S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Graphics (TOG), vol. 30, no. 4, p. 87, 2011.
in Advances in neural information processing systems, 2014, pp. 2672– [24] M. Fisher, D. Ritchie, M. Savva, T. Funkhouser, and P. Hanrahan,
2680. “Example-based synthesis of 3d object arrangements,” ACM Transac-
[2] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of tions on Graphics (TOG), vol. 31, no. 6, p. 135, 2012.
GANs for improved quality, stability, and variation,” in International [25] K. Wang, M. Savva, A. X. Chang, and D. Ritchie, “Deep convolutional
Conference on Learning Representations, 2018. priors for indoor scene synthesis,” ACM Transactions on Graphics
[3] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation (TOG), vol. 37, no. 4, p. 70, 2018.
learning with deep convolutional generative adversarial networks,” arXiv [26] H. Fan, H. Su, and L. J. Guibas, “A point set generation network for 3d
preprint arXiv:1511.06434, 2015. object reconstruction from a single image.” in Proceedings of the IEEE
[4] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Conference on Computer Vision and Pattern Recognition, vol. 2, no. 4,
Metaxas, “Stackgan++: Realistic image synthesis with stacked genera- 2017, p. 6.
tive adversarial networks,” IEEE Transactions on Pattern Analysis and [27] R. Kumar, J. O. Talton, S. Ahmad, and S. R. Klemmer, “Bricolage:
Machine Intelligence, 2018. example-based retargeting for web design,” in Proceedings of the
10