Seamless Satellite-image Synthesis (SSS)
University of Leeds
Figure 1: Our system, SSS, takes input vector cartographic data (left) and generates realistic and seamless high-resolution satellite images (right; the image y′_3 is 8,192 pixels wide, approximately 67 megapixels) which are continuous over space and scale. Given the input, SSS generates massive, spatially seamless images over a range of scales (y′_1, y′_2, and y′_3) by blending the results of lower-resolution networks.
Abstract
We introduce Seamless Satellite-image Synthesis (SSS), a novel neural architecture to create scale- and space-continuous satellite textures from cartographic data. While 2D map data is cheap and easily synthesized, accurate satellite imagery is expensive and often unavailable or out of date. Our approach generates seamless textures over arbitrarily large spatial extents which are consistent through scale-space. To overcome tile size limitations in image-to-image translation approaches, SSS learns to remove seams between tiled images in a semantically meaningful manner. Scale-space continuity is achieved by a hierarchy of networks conditioned on style and cartographic data. Our qualitative and quantitative evaluations show that our system improves over the state of the art in several key areas. We show applications to texturing procedurally generated maps and interactive satellite image manipulation.
CCS Concepts
• Computing methodologies → Texturing; Image processing;
1. Introduction

Satellite images have revolutionized the way we visualise our environment. Massive, high-resolution satellite images are essential to many modern endeavors, including realistic virtual environments and regional or city planning. However, these images can be prohibitively expensive, out of date, or of low quality. Further, continuous satellite images covering large areas are often unavailable due to limited sensor resolution or orbital mechanics. In contrast, cartographic (map) information is frequently accessible and available over massive continuous scales; further, it can be easily manually edited or synthesized. In this paper we address the problem of generating satellite images from map data at arbitrarily large resolutions.

Convolutional neural networks are the state of the art for image-to-image translation; such networks can translate from cartographic images to satellite images. However, these networks are unable to generate outputs over arbitrary areas: the resolution of these synthetic satellite image tiles remains limited by available memory and compute. Arranging many such small tiles to cover a large map results in inconsistencies (seams) between the tiles. Further, if a larger number of tiles are joined together and viewed at multiple scales, the result lacks continuity and style coordination.
Inspired by image-to-image translation systems such as Pix2Pix [IZZE17] and SPADE [PLWZ19], Seamless Satellite-image Synthesis (SSS, Figure 1) creates satellite image tiles by conditioning generation on cartographic data. These tiles are of fixed resolution (256 × 256 pixels in our implementation) and exhibit good continuity internally, but would contain spatial and scale artifacts if naïvely tiled over a larger area. Spatial discontinuities between adjacent tiles are caused by a lack of coordination between network evaluations, resulting in a lack of continuity for properties not determined by the cartographic input (e.g., the color of a building roof or the species of a tree). We introduce a network which removes such spatial discontinuities by leveraging learned semantic domain knowledge as well as the cartographic data. Scale discontinuities between a larger set of synthetic satellite tiles arise from unnatural style variations over larger areas (e.g., the distribution of cars throughout a city, or different camera sensors). To address this, SSS uses a hierarchy of networks synchronized by color guidance.

The SSS system contains three components: two neural architectures, which generate tiles at different resolutions (map2sat) and remove seams between those tiles (seam2cont), as well as an interface allowing the generated textures to be explored interactively. SSS contributes:
• A neural approach to generate massive scale-and-space continuous images conditioned on cartographic data.
• A novel architecture to train and evaluate image-to-image generation networks at high resolutions with reduced memory use.
• An application to texture procedurally generated cartographic maps, allowing the synthesis of textures for entirely novel areas following user style guidance.
• An application to interactive exploration of endless satellite images generated on-the-fly from cartographic data.
The textures generated by SSS have many applications, including interactive city planning, virtual environments, and procedural modeling. During city planning, our textures can help stakeholders (e.g., homeowners, planners, and motorists) understand proposed developments; for example, in sit-down sessions stakeholders can interactively edit a map and be shown birds-eye-view (satellite) “photos” within seconds. Further, virtual environments may be constructed using SSS's output, e.g., texturing massive terrains on-the-fly in a flight simulator. Finally, the streets and open areas of urban procedural models [PM01] of unbounded extent may be photo-realistically textured using our approach.

In the remainder of this work we discuss the corpus surrounding our work in Section 2, introduce our system in Section 3, and give the details of the neural architectures and their coordination in Section 4, before evaluating our results qualitatively and quantitatively in Section 5. Source code and weights for our system are available online at https://github.com/Misaliet/Seamless-Satellite-image-Synthesis.
2. Related Work

Texture synthesis is the generation of texture images from exemplars or lower-dimension representations. We refer the reader to Wei et al. [WLKT09] for an overview. Non-neural approaches from the AI Winter create images with local and stationary properties using techniques such as Markov Random Fields to generate pixels [EL99, WL00] or patches [LLX∗01, EF01] from an exemplar image. Variations on these themes generate larger, more varied textures [DSB∗12]. Procedural texture languages [MZWVG07] are well adapted for urban domains and can be used to guide synthesis [GAD∗20].

With the advent of Convolutional Neural Networks (CNNs), texture synthesis has increasingly become data-driven, creating images by gradient descent in feature space [GEB15] or with Generative Adversarial Networks (GANs) [ZZB∗18]. Another approach to texture generation is to transform an input image to a different style or domain while retaining the structure. Such image-to-image translation is implemented by the influential UNet architecture [RFB15], which passes the input through an “hourglass”-shaped network of convolutional layers; this allows the network to aggregate and coordinate features within the bottleneck. UNets have been applied to generate satellite images [IZZE17] from map data; however, the resolution of the output tiles was limited by available memory. Image-to-image networks typically disentangle the structure from the style to enable style transfer [GEB16]. These have been applied to generate terrains [GDG∗17] and within chained sequences [KGS∗18] for architecture-domain-specific texture pipelines.

Label-to-image translation is a variation of image-to-image translation; the input is a label-map in which each pixel has a specific value, such as cartographic data where an area is divided into distinct classes such as streets, buildings, or vegetation. Again, we note the application of hourglass networks [WLZ∗18]. In this paper we build on Semantic Image Synthesis with Spatially-Adaptive Normalization (SPADE) [PLWZ19], which uses a simpler V-shaped architecture. From a latent code, successive SPADE Residual Blocks generate feature-maps of increasing resolution. Within these blocks, a SPADE unit introduces learned per-label style information using Spatially-Adaptive Normalization (a sketch follows at the end of this section).

Texture blending or inpainting approaches give a way to combine multiple smaller images into a larger texture. These may match small image patches [LLX∗01, EF01, BSFG09], or minimize a function over a grid of pixels using seam energy minimization [AS07, KSE∗03, RKB04], exemplars [CPT03], or a Poisson formulation [PGB03]. Although such local approaches produce excellent pixel-level results, they do not contain the higher, semantic-level knowledge needed to blend textures and images with complex structures. We may overcome this limitation with user input [ADA∗04], large datasets [HE07], or distributions learned by neural architectures [ZZL19, ZWS20, BSP∗19]; these learn distributions for the blended area using techniques such as guided Poisson constraints [WZZH19] and symmetry [DCX∗18]. Domain-specific image blending has been studied for domains such as microscopy [LCH∗15], façade texturing [KGS∗18], and satellite generation. TileGAN [FAW19] searches a library of latent encodings of adjacent micro-tiles to create seamless images, rather than learning a domain-specific blending function.

In contrast to relatively recent satellite images, applications of cartography date back to antiquity. Today, there are high-quality, publicly available maps of much of the world [Fou21, Ser21]. We may also generate maps with a variety of techniques, including procedural systems; see Vanegas et al. [VAW∗10] for an overview. In particular, we may use rule- and parameter-controlled geometric algorithms to recreate the distribution of streets [CEW∗08], buildings [PM01], and city parcels [VKW∗12]. These procedural systems …
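As a reference for the SPADE discussion above, a minimal sketch of a SPADE-style normalization unit follows, assuming the standard formulation of [PLWZ19]; the layer widths and kernel sizes are illustrative, not those of our implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Spatially-Adaptive Normalization [PLWZ19]: the label map predicts a
    per-pixel scale (gamma) and shift (beta) applied after normalization."""
    def __init__(self, feature_channels, label_channels, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(feature_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feature_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feature_channels, 3, padding=1)

    def forward(self, x, label_map):
        # Resize the label map to the current feature resolution.
        seg = F.interpolate(label_map, size=x.shape[2:], mode='nearest')
        h = self.shared(seg)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```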
Figure 2: SSS uses a hierarchy of networks for different scales. A pair of networks is shown above, which process input cartographic maps into large seamless images at a single scale level, z. First, a vector map of arbitrary extent is processed to create an image tile of map data. The map2sat network then performs a translation to the satellite texture domain, before the seam2cont network removes visible tiling artifacts to create a seamless image; the result is combined with neighboring tiles to create an image of arbitrary resolution (right). Subsequent, smaller-scale map2sat networks use previous outputs as scale guidance (bottom left).
… variation, leading to chaotic swings in variation between adjacent tiles (e.g., roads suddenly changing color). Additionally, our interactive system allows users to explore a dynamically generated satellite image at a range of scales; in this application, it is necessary to generate imagery at a given scale without an overly onerous number of generator network evaluations. SSS uses a scale-space hierarchy of generator networks to create realistic variation over areas at multiple scales (Section 4.1). By evaluating the hierarchy tiles lazily, the interactive system is able to quickly evaluate tiles at arbitrary scales, as in the sketch below.
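A minimal sketch of the lazy hierarchy evaluation mentioned above; the cache keying and generator callback are illustrative assumptions, not our implementation.

```python
class LazyTileCache:
    """Cache hierarchy tiles; a tile at scale z needs its parent at scale
    z - 1 for guidance, so a request recurses up the hierarchy once."""
    def __init__(self, generate_tile, f=4):
        self.generate_tile = generate_tile  # callback: (z, x, y, parent) -> tile
        self.f = f                          # linear scale factor between levels
        self.cache = {}

    def get(self, z, x, y):
        key = (z, x, y)
        if key not in self.cache:
            # Coarser scales must exist first; the coarsest scale has no parent.
            parent = self.get(z - 1, x // self.f, y // self.f) if z > 1 else None
            self.cache[key] = self.generate_tile(z, x, y, parent)
        return self.cache[key]
```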
The second problem is that simple tiling of the synthetic satellite images to cover a large area does not produce natural images, but creates an image with seams; state-of-the-art CNN architectures do not provide spatial continuity between adjacent tiles. This is largely because of their reliance on spatially narrow bottleneck layers, which allow tile-wide coordination of features at the cost of limiting the maximum tile size. Adding additional layers to increase the tile size becomes prohibitively expensive due to the memory requirements of these large layers. Another source of seams in tiled CNN outputs is the padding [AKM∗21] around layer boundaries, where synthetic values (i.e., zero or flipped) are used to pad the convolution input; these introduce seam artifacts around the perimeter of network outputs. We address this problem in the following section.
4.1. Spatial Continuity: seam2cont Network

The limited resolution of CNN image synthesis techniques, such as those we use in the map2sat network, leads to seam discontinuities when we join multiple tiles together. Here we introduce the seam2cont network, which removes the seams and allows tiling over an arbitrarily large area. Given tiles generated by our image-to-image network, we introduce a novel architecture which blends pixels from these existing tiles to create a seam-free image. We found blending (masking) a sufficient approach to seam removal: the tile content is well aligned by strong cartographic conditioning, and blending over a wide area allows the network to identify semantically plausible locations for each transition.

For the area of one tile i, we use two independent, overlapping layers of map2sat evaluation to synthesize tiles s_i and t_i. The areas of s_i and t_i are offset so that for each tile i there is a seam-free option, as shown in Figure 3. Our goal is to blend s_i and t_i such that the result has no internal seams (as t_i), but also no seams with neighboring tiles (as s_i). The quarters of s_i are continuous across the tile boundary because they come from the same output tile of a single map2sat evaluation. The problem of removing seams is thus transformed into a masking (image blending) problem, where the mask predominantly selects pixels from t_i in the center and from s_i in the periphery; in between, it should select pixels which maximize realism. We propose a method to learn to generate the mask, m, which selects how to blend s_i and t_i. By learning a mask generator, seam2cont can learn the semantic meanings of the inputs, producing higher quality results than optimizing for gradient [KSE∗03] or search [FAW19].

For one tile area i, seam2cont takes the map image x_i (including label and instance data), the generated satellite image with seams (s_i), and the generated satellite image without seams (t_i). The output of the seam2cont generator is a mask, m_i, which is used to blend the input satellite images. The output of the system is a seamlessly tileable satellite image, y′_i, with a feature distribution similar to the seamless ground-truth images.
Figure 4 shows the architecture of the seam2cont network. We use a U-Net [RFB15, ZZP∗17] as our backbone generator M to generate a mask m_i from the inputs x_i, s_i, and t_i:

    m_i = M(x_i, s_i, t_i)    (1)

To evaluate the quality of the generated mask, an output tile y′_i is blended from s_i and t_i by B:

    y′_i = B(s_i, t_i, m_i) = s_i · m_i + t_i · (1 − m_i)    (2)

M is an 8-layer UNet which we train with three goals for the blended image y′_i: (1) tileable, with s_i near the borders; (2) realistic; and (3) internally seamless. We construct a differentiable loss function for these three goals which can be back-propagated to improve M during training.

To address goal (1), the mask should select s_i at the edge of the tile to ensure continuity with adjacent tiles. We implement this with F, which forces the outside part of the generated mask m_i to be white, guided by a constant circular mask c (Figure 4):

    m′_i = F(m_i) = m_i · (1 − c) + c    (3)

and we can modify Equation 2 to give the masked output:

    y′_i = B(s_i, t_i, m′_i) = s_i · m′_i + t_i · (1 − m′_i)    (4)

To support goal (2), we use a discriminator D_realism, a PatchGAN [IZZE17], to reward M for the realism of the masked output. D_realism is trained to discriminate between generated images and ground-truth images without seams. At evaluation time, it takes the result y′_i and the constant circular mask c; conditioning on c allows the discriminator to locate the output seams [TSK∗19] and increases accuracy. The loss from D_realism is then:

    L_realism = E_{x_i, s_i, t_i} [ −D_realism(y′_i, c) ]    (5)

Following the standard GAN architecture, the loss from D_realism is used during training to update M towards realism.

Goal (3) is to remove internal seams from y′_i. To achieve this, we train an image-to-image CNN to locate seams and use it as a discriminator, D_seams; it encourages M to generate a mask m_i which removes seams from the blended result. It is an image-to-image translation network which learns to identify seams in patches of y′_i, outputting a grayscale image that is white (1) where there is no seam and black (0) where there is a seam. D_seams provides an adversarial loss when training M by constructing an L2 loss between the discriminator's output and the desired (seamless; 1) result. Following CUT [PAAEZ20], we extract random patches, P, from the masked output to contribute to the loss:

    L_seams = || D_seams(P(y′_i)) − 1 ||    (6)

Finally, following Pix2Pix [IZZE17], we add an L1 loss to the ground truth in order to improve training stability:

    L_recons = | y′_i − y_i |    (7)
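A sketch of Equations 1 to 4 in PyTorch, assuming the generator M ends in a sigmoid, its inputs are concatenated along the channel axis, and the circular mask c is precomputed; tensor shapes are (batch, channels, 256, 256).

```python
import torch

def blend(s, t, m):
    # Equation 2/4: per-pixel convex combination of the two offset tiles.
    return s * m + t * (1 - m)

def seam2cont_forward(M, x, s, t, c):
    """x: map labels, s/t: offset map2sat tiles, c: constant circular mask.
    Returns the border-forced mask m'_i and the seamless blend y'_i."""
    m = M(torch.cat([x, s, t], dim=1))    # Equation 1 (concatenation assumed)
    m_prime = m * (1 - c) + c             # Equation 3: force s at the borders
    return m_prime, blend(s, t, m_prime)  # Equation 4
```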
Figure 4: The data-flow when testing the seam2cont network (black arrows). The inputs are the map label image x_i and the generated satellite images s_i and t_i. M generates a mask m_i, which is combined with a fixed circular mask c to create m′_i. This final mask blends s_i and t_i to create our spatially continuous, seamless image y′_i. Cyan arrows: the two discriminators contribute losses when training M.
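The three training terms (Equations 5 to 7) can be combined as below; the discriminator interfaces, the patch extractor P, and the loss weights are assumptions for illustration, not our reference implementation.

```python
import torch

def training_losses(D_realism, D_seams, P, y_blend, y_gt, c,
                    w_realism=1.0, w_seams=1.0, w_recons=10.0):
    # Equation 5: adversarial realism loss; c (expanded to the batch size)
    # tells the discriminator where seams could be.
    loss_realism = -D_realism(torch.cat([y_blend, c], dim=1)).mean()
    # Equation 6: the seam detector should report "no seam" (1) everywhere
    # on random patches of the blended output.
    seam_map = D_seams(P(y_blend))
    loss_seams = ((seam_map - 1.0) ** 2).mean()
    # Equation 7: L1 reconstruction against the ground truth for stability.
    loss_recons = (y_blend - y_gt).abs().mean()
    return (w_realism * loss_realism + w_seams * loss_seams
            + w_recons * loss_recons)
```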
… remove seams in real-world satellite images, or we may insert additional sub-modules to create satellite images at smaller scales. The general complexity of the entire SSS pipeline depends on the scale level, z = k, and the linear scale factor, f, used between each sub-network. For example, a scale factor of f = 4 between each sub-network implies that the smaller-scale sub-network has f² = 16 tiles covering the area of a single larger-scale tile. To generate one scale, all larger scales must also be generated. The worst-case number of map2sat evaluations, assuming we generate 4 tiles at z = k, is then:

    ∑_{i=1}^{k} 4 / (f²)^(i−1)    (13)

and the worst-case number of seam2cont evaluations is:

    ∑_{i=1}^{k} ( √( 4 / (f²)^(i−1) ) − 1 )²    (14)
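Equations 13 and 14 can be checked numerically; the ceilings below are our assumption, since evaluation counts are integral.

```python
import math

def worst_case_evaluations(k, f=4):
    """Worst-case network evaluations to show 4 tiles at scale z = k with
    linear scale factor f between sub-networks (Equations 13 and 14)."""
    map2sat = sum(math.ceil(4 / (f ** 2) ** (i - 1)) for i in range(1, k + 1))
    seam2cont = sum((math.ceil(math.sqrt(4 / (f ** 2) ** (i - 1))) - 1) ** 2
                    for i in range(1, k + 1))
    return map2sat, seam2cont

# Example: k = 3, f = 4 gives 4 + 1 + 1 = 6 map2sat evaluations
# and a single seam2cont evaluation at the finest scale.
print(worst_case_evaluations(3))
```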
Figure 7: SSS combines a series of map2sat and seam2cont networks. Here we show a 3-scale (z_max = 3) pipeline. map2sat sub-networks generate satellite image tiles which are processed by the seam2cont networks to remove spatial discontinuities.

Figure 8: The geometry of SSS tiles. Left, right columns: network outputs at scales z = n and z = n + 1. Scale-space continuity is provided by passing a blurred crop of y_n (green, bottom left) to map2sat_{n+1} (cyan, mid-right).
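A sketch of the color guidance in Figure 8, assuming the guidance is a Gaussian-blurred crop of the coarser output, upsampled to the finer network's input resolution; the kernel size is illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur

def color_guidance(y_n, crop_box, out_size=256, kernel=21):
    """Blur a crop of the coarser-scale output y_n (C, H, W) so that only
    low-frequency color survives, then upsample it to guide map2sat_{n+1}."""
    x0, y0, x1, y1 = crop_box
    crop = y_n[:, y0:y1, x0:x1]
    blurred = gaussian_blur(crop, kernel_size=kernel)
    return F.interpolate(blurred.unsqueeze(0), size=(out_size, out_size),
                         mode='bilinear', align_corners=False).squeeze(0)
```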
5. Results

In the following results we use SSS with z_max = 3 and a scale factor of f = 4 to create a 512×512 pixel image at z_1, a 2,048×2,048 pixel image at z_2, and an 8,192×8,192 pixel image at z_3.

Figure 9 shows the output of our technique for real-world map guidance images; further results are given in our Additional Materials (see folder sss_8k_images/). Figure 10 shows the output of our technique for synthetic guidance images generated using a procedural modeling system. Figure 11 illustrates the style continuity of the SSS pipeline.

Figure 12 shows our interactive results. We allow users to move around the map, zoom in or out, and generate corresponding satellite images in different styles. We also provide operations to modify the map, such as randomly replacing a single tile in the map input. All networks (map2sat and seam2cont) for 3 scales can be loaded into the 12GB memory of an Nvidia Titan V graphics card. Generation times for a 512 × 512 pixel window vary between 2.4 and 3.42 seconds depending on the scale; smaller scales require additional network evaluations.

6. Evaluation

6.1. Methodology

Evaluating the quality of composite images synthesized by neural networks is a difficult problem; there is no canonical method to assess the performance of image stitching. Traditional edge detection techniques, such as the Sobel filter and Canny edge detection, can be used for seam detection but are not optimized for tile-based systems where the seam locations can be anticipated. In addition, these algorithms output edge images rather than comparable values.

To evaluate our seam2cont network results, we use two methods. First, we design a qualitative user study survey: we ask users to identify the best system for removing seams between tiles, to examine the effectiveness of our seam2cont network against other seam-removal algorithms. Second, we introduce a novel algorithm, MoT (Mean over Tiles), to measure the ability to remove seams in images. We evaluate against the following baselines: traditional image blending with a soft mask, Graphcut [KSE∗03], Generative Image Inpainting [YLY∗19], and TileGAN [FAW19].

For traditional image blending, we use s as input and blend s with t using the soft mask shown in the bottom-right corner of Figure 13 (c) to obtain an image without seams. We also use s and t to run the Graphcut algorithm for the seam-removal task by splicing the center part of t into s; in order to remove seams entirely, we use a small 10 × 10 pixel area, shown in the bottom-right corner of Figure 13 (d), as a buffer. We train the Generative Image Inpainting model, which uses gated convolutions [YLY∗19], for the image inpainting task, using our generated seams dataset s and the diamond-shaped mask shown in the bottom-right corner of Figure 13 (e); this lets the model learn to generate the center diamond area, which covers the seams. Generative Image Inpainting does synthesize good results, but it always changes the original content structure in s, since it has no map data input.
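For reference, the fixed soft-mask baseline above amounts to blending with a precomputed feathered mask; the feather width here is illustrative, not the exact mask of Figure 13 (c).

```python
import numpy as np

def soft_mask(size=256, feather=32):
    """Fixed soft mask: 1 near the tile border (keep s), 0 in the centre
    (keep t), with a linear feather in between."""
    idx = np.arange(size)
    dist = np.minimum(idx, size - 1 - idx)      # distance to nearest border
    d2 = np.minimum.outer(dist, dist)           # per-pixel border distance
    return np.clip(1.0 - d2 / feather, 0.0, 1.0)

def blend_baseline(s, t):
    """s, t: float arrays of shape (H, W, 3)."""
    m = soft_mask(s.shape[0])[..., None]
    return s * m + t * (1 - m)
```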
Figure 9: Results from the SSS pipeline. Rows: examples of generated satellite images from real-world map data. Columns: left to right show scales z = 1, 2, 3. Inset are the corresponding map inputs.

Figure 10: Results of our system on synthetic, procedurally generated map data.

Last, we train PGGAN [KALL17] with ground-truth satellite images at 256×256 pixel resolution to implement TileGAN over the satellite image domain. TileGAN can generate large seamless images by searching the latent space of the pre-trained PGGAN model to create tiles that are stitched into a large image. It is also not an image-to-image translation system, but it can take a small low-resolution image as guidance to generate large, high-resolution images.

Figure 13 shows selected representative output tiles for each of the baselines. Traditional image blending in Figure 13 (c) generally does a good job, but since it uses a fixed mask, the rough shape …
Figure 15: Selected MoT algorithm results: difference value response (y-axis) against pixel position (x-axis, 0 to 256) for the ground truth, the ground truth with seams, our map2sat without seam2cont, and our whole pipeline. Images with strong vertical and horizontal seams have a strong response at the tile boundary at 128 pixels.
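This section names MoT (Mean over Tiles) without reproducing its full definition; the following is only a plausible reading consistent with Figure 15, not the authors' reference implementation: average the absolute difference between adjacent pixel columns at each x-position over many aligned tiles, so that a vertical seam at x = 128 appears as a spike.

```python
import numpy as np

def mot_response(tiles):
    """tiles: array (N, H, W) of grayscale tiles aligned to the tile grid.
    Returns a per-column seam response; an assumed reconstruction of MoT."""
    diffs = np.abs(np.diff(tiles.astype(np.float64), axis=2))  # (N, H, W-1)
    return diffs.mean(axis=(0, 1))  # mean over tiles and rows per x-position
```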
References

[DSB∗12] DARABI S., SHECHTMAN E., BARNES C., GOLDMAN D. B., SEN P.: Image melding: combining inconsistent images using patch-based synthesis. ACM SIGGRAPH 31, 4 (2012), 1–10.

[EF01] EFROS A. A., FREEMAN W. T.: Image quilting for texture synthesis and transfer. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (2001), pp. 341–346.

[EL99] EFROS A. A., LEUNG T. K.: Texture synthesis by non-parametric sampling. In IEEE ICCV (1999), vol. 2, IEEE, pp. 1033–1038.

[FAW19] FRÜHSTÜCK A., ALHASHIM I., WONKA P.: TileGAN: synthesis of large-scale non-homogeneous textures. ACM SIGGRAPH 38, 4 (2019), 1–11.

[Fou21] OPENSTREETMAP FOUNDATION: OpenStreetMap, 2021. URL: https://www.openstreetmap.org.

[GAD∗20] GUEHL P., ALLEGRE R., DISCHLER J.-M., BENES B., GALIN E.: Semi-procedural textures using point process texture basis functions. In CGF (2020), vol. 39, Wiley Online Library, pp. 159–171.

[GDG∗17] GUÉRIN É., DIGNE J., GALIN É., PEYTAVIE A., WOLF C., BENES B., MARTINEZ B.: Interactive example-based terrain authoring with conditional generative adversarial networks. ACM TOG 36, 6 (2017), 1–13.

[GEB15] GATYS L., ECKER A. S., BETHGE M.: Texture synthesis using convolutional neural networks. In NIPS (2015), pp. 262–270.

[GEB16] GATYS L. A., ECKER A. S., BETHGE M.: Image style transfer using convolutional neural networks. In IEEE CVPR (2016), pp. 2414–2423.

[geo] GeoPackage. URL: https://www.geopackage.org/. Accessed: 2010-09-30.

[HE07] HAYS J., EFROS A. A.: Scene completion using millions of photographs. ACM SIGGRAPH 26, 3 (July 2007). doi:10.1145/1276377.1276382.

[IZZE17] ISOLA P., ZHU J.-Y., ZHOU T., EFROS A. A.: Image-to-image translation with conditional adversarial networks. IEEE CVPR (2017).

[KAH∗20] KARRAS T., AITTALA M., HELLSTEN J., LAINE S., LEHTINEN J., AILA T.: Training generative adversarial networks with limited data. arXiv preprint arXiv:2006.06676 (2020).

[KALL17] KARRAS T., AILA T., LAINE S., LEHTINEN J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017).

[KB15] KINGMA D. P., BA J.: Adam: A method for stochastic optimization. In ICLR (2015). URL: http://arxiv.org/abs/1412.6980.

[KGS∗18] KELLY T., GUERRERO P., STEED A., WONKA P., MITRA N. J.: FrankenGAN: guided detail synthesis for building mass models using style-synchonized GANs. ACM TOG 37, 6 (2018), 1–14.

[KSE∗03] KWATRA V., SCHÖDL A., ESSA I., TURK G., BOBICK A.: Graphcut textures: image and video synthesis using graph cuts. ACM SIGGRAPH 22, 3 (2003), 277–286.

[LCH∗15] LEGESSE F., CHERNAVSKAIA O., HEUKE S., BOCKLITZ T., MEYER T., POPP J., HEINTZMANN R.: Seamless stitching of tile scan microscope images. Journal of Microscopy 258, 3 (2015), 223–232.

[LLX∗01] LIANG L., LIU C., XU Y.-Q., GUO B., SHUM H.-Y.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graph. 20, 3 (July 2001), 127–150. doi:10.1145/501786.501787.

[Mas21] ORDNANCE SURVEY: OS MasterMap Topography, 2021. URL: https://www.ordnancesurvey.co.uk/business-government/products/mastermap-topography.

[MZWVG07] MÜLLER P., ZENG G., WONKA P., VAN GOOL L.: Image-based procedural modeling of facades. ACM SIGGRAPH 26, 3 (2007), 85.

[PAAEZ20] PARK T., EFROS A. A., ZHANG R., ZHU J.-Y.: Contrastive learning for unpaired image-to-image translation. ECCV (2020).

[PGB03] PÉREZ P., GANGNET M., BLAKE A.: Poisson image editing. In ACM SIGGRAPH 2003 Papers (2003), pp. 313–318.

[PLWZ19] PARK T., LIU M.-Y., WANG T.-C., ZHU J.-Y.: Semantic image synthesis with spatially-adaptive normalization. In IEEE CVPR (2019), pp. 2337–2346.

[PM01] PARISH Y. I. H., MÜLLER P.: Procedural modeling of cities. In Proceedings of SIGGRAPH 2001 (2001), pp. 301–308. doi:10.1145/383259.383292.

[RFB15] RONNEBERGER O., FISCHER P., BROX T.: U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (2015), Springer, pp. 234–241.

[RKB04] ROTHER C., KOLMOGOROV V., BLAKE A.: "GrabCut": interactive foreground extraction using iterated graph cuts. ACM SIGGRAPH 23, 3 (2004), 309–314.

[Ser21] DIGIMAP ORDNANCE SURVEY WEB MAPPING SERVICE: Digimap, 2021. URL: https://digimap.edina.ac.uk/.

[TSK∗19] TETERWAK P., SARNA A., KRISHNAN D., MASCHINOT A., BELANGER D., LIU C., FREEMAN W. T.: Boundless: Generative adversarial networks for image extension. In IEEE ICCV (2019), pp. 10521–10530.

[VAW∗10] VANEGAS C. A., ALIAGA D. G., WONKA P., MÜLLER P., WADDELL P., WATSON B.: Modelling the appearance and behaviour of urban spaces. In CGF (2010), vol. 29, Wiley Online Library, pp. 25–42.

[VKW∗12] VANEGAS C. A., KELLY T., WEBER B., HALATSCH J., ALIAGA D. G., MÜLLER P.: Procedural generation of parcels in urban modeling. In CGF Eurographics (2012), vol. 31, Wiley Online Library, pp. 681–690.

[WL00] WEI L.-Y., LEVOY M.: Fast texture synthesis using tree-structured vector quantization. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (2000), SIGGRAPH '00, ACM Press/Addison-Wesley, pp. 479–488. doi:10.1145/344779.345009.

[WLKT09] WEI L.-Y., LEFEBVRE S., KWATRA V., TURK G.: State of the art in example-based texture synthesis. In CGF Eurographics (2009), The Eurographics Association. doi:10.2312/egst.20091063.

[WLZ∗18] WANG T.-C., LIU M.-Y., ZHU J.-Y., TAO A., KAUTZ J., CATANZARO B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In IEEE CVPR (2018), pp. 8798–8807.

[WZZH19] WU H., ZHENG S., ZHANG J., HUANG K.: GP-GAN: Towards realistic high-resolution image blending. In Proceedings of the 27th ACM International Conference on Multimedia (2019), pp. 2487–2495.

[YLY∗19] YU J., LIN Z., YANG J., SHEN X., LU X., HUANG T. S.: Free-form image inpainting with gated convolution. In IEEE ICCV (2019), pp. 4471–4480.

[ZWS20] ZHANG L., WEN T., SHI J.: Deep image blending. In IEEE WACV (2020), pp. 231–240.

[ZZB∗18] ZHOU Y., ZHU Z., BAI X., LISCHINSKI D., COHEN-OR D., HUANG H.: Non-stationary texture synthesis by adversarial expansion. ACM SIGGRAPH 37, 4 (July 2018). doi:10.1145/3197517.3201285.

[ZZL19] ZHAN F., ZHU H., LU S.: Spatial fusion GAN for image synthesis. In IEEE CVPR (2019), pp. 3653–3662.

[ZZP∗17] ZHU J.-Y., ZHANG R., PATHAK D., DARRELL T., EFROS A. A., WANG O., SHECHTMAN E.: Toward multimodal image-to-image translation. In NIPS (2017).