
IET Computer Vision

Special Section: Computer Vision for the Creative Industries

Advances in colour transfer

ISSN 1751-9632


Received on 24th November 2019
Revised 10th June 2020
Accepted on 13th July 2020
E-First on 25th September 2020
doi: 10.1049/iet-cvi.2019.0920
www.ietdl.org

François Pitié1
1Electronic & Electrical Engineering Department, Trinity College Dublin, Ireland
E-mail: pitief@tcd.ie

Abstract: Colour grading is an essential step in movie post-production, which is done in the industry by experienced artists on
expensive edit hardware and software suites. This paper presents a review of the advances made to automate this process.
The review looks in particular at how the state-of-the-art in optimal transport and deep learning has advanced some of the
fundamental problems of colour transfer, and how far we still are from being able to automatically grade images.

1 Introduction

Colour grading is the artistic process of changing the colour attributes of an image through the adjustment of colour controls, such as white balance, exposure, brightness, contrast or gamma. It is an essential step in image editing and it is used at different points in the movie production pipeline. It is first used to colour correct shots, to reproduce accurately what was recorded. It is then used to colour balance all the shots and establish continuity throughout the movie. Colour grading can also be used later on towards achieving specific practical effects, such as when filming nighttime scenes in daylight. Lastly, colour grading is used to establish a desired visual artistic style. The task is to change the original colours to match the director's desired colour scheme, whether it is the monochromatic colour scheme of The Matrix, the variations of pink in Grand Budapest Hotel or the complementary orange/blue colours of most action movies.

Currently, in the industry, colour grading is achieved by experienced artists, called colourists, who use expensive edit hardware and software suites to manually match the colour between the frames. Typical colour grading operations include adjusting the exposure, brightness and contrast, the gammas, calibrating the white point or setting a colour mapping curve to non-linearly affect the colour channels. More complex colour grading will require retouching specific areas of the images. Typically the object of interest (an actor's face or a car) would be masked out, tracked throughout the shot, and graded separately from the rest of the scene. Grading then becomes a fairly complex operation and it would therefore be beneficial to automate this task in some way.

In the academic community, the problem of automating this process was first approached as an example-based colour transfer application. The idea, coined by Reinhard et al. [1] (2002), is to find a colour mapping that, when applied to an input image, can change the colour palette to match the one of a target image (see Fig. 1). Early works mainly looked at per-channel solutions [1, 2], but it is only when the problem was linked to the theory of optimal transport (OT) [Morovic and Sun [3] (2003), Pitié et al. [4] (2005)] that satisfying solutions for colour images were found. Most of the literature that followed is still based on or related to the mathematical framework of OT.

Colour transfer does not, however, stop at just finding a mapping that links two colour distributions. In practice, a number of issues also need to be addressed. In a series of papers on the topic, Pitié et al. [4–8] (2005, 2007, 2008) identified some of these specific issues. One is that applying a non-linear colour mapping can cause artefacts that need to be filtered out in post-processing steps. Another is that any difference of content between the input and target images can cause issues during the colour transfer, and mitigating strategies must be put in place.

In this review, we study the recent advances in colour transfer, OT and deep learning for the application of colour grading images, and see how the literature has advanced on these issues.

Organisation of the review: we will first give some context to the application and go through the core colour grading operations used by the artists and see how these impact the colour distributions (Section 2). The colour transfer problem is then introduced in Section 3 and linked to the theory of OT in Section 4. Recent advances will then be discussed in Sections 5 and 6, and assessed in Section 7.

Fig. 1  Colour transfer example. The source image is recoloured to match the colour palette of a target image

2 Background

Colour correction usually refers to the adjustment of white and black levels, exposure, contrast and white balance that are applied to restore the image's original and unprocessed colours. The aim of colour correction is to provide a visually consistent baseline on which subsequent colour adjustments can be accurately applied. All footage and scenes are first carefully matched during that step to create visual consistency throughout scenes.

Part of the need for colour correction stems from colour management issues. On the same production, footage may come from different cameras, each camera using its own native input colour space (e.g. Red Log Film, BMD Film Log, S-Log 2 etc.). Also, various footages may be sent around to different post-production teams, which may apply their own colour corrections or might be working in different colour spaces. Still, the final output has to look the same on any display, let it be a TV display or a cinema display. Adopting consistent colour management practices has been a big challenge for the industry, and to standardise the colour management workflows for production and post-production, the Academy of Motion Picture Arts and Sciences introduced in 2014 the Academy Colour Encoding System (ACES), which aimed at becoming the industry standard for managing colour across the entire film making process. Typically artists will start from predefined profiles and then work their way to match a precise colour palette.

Fig. 2  Colour grading tools in DaVinci Resolve. On top, colour wheels for controlling the RGB lift, gain and gamma values. On the bottom, the user-defined luminance curve and RGB histograms

2.1 Basic colour grading controls

The basic colour grading operations available to colourists are usually presented to the artists through colour wheel interfaces (see Fig. 2), which help the artist set the lift, gain and gamma for each colour channel. Histograms for each colour channel give the artist useful feedback on their grading operation. Assuming floating-point bit depth, the basic colour grading operation is defined as follows:

\[
t: [0, 1]^3 \to [0, 1]^3, \qquad
\begin{pmatrix} u_r \\ u_g \\ u_b \end{pmatrix} \mapsto
\begin{pmatrix} v_r \\ v_g \\ v_b \end{pmatrix} =
\begin{pmatrix}
t_r(u_r) = (a_r u_r + b_r)^{\gamma_r} \\
t_g(u_g) = (a_g u_g + b_g)^{\gamma_g} \\
t_b(u_b) = (a_b u_b + b_b)^{\gamma_b}
\end{pmatrix} \quad (1)
\]

The offsets b_r, b_g and b_b are sometimes called lifts and control the overall brightness for each channel.

Gamma: The gamma values γ_r, γ_g and γ_b can be used to convert to display gamma, which is usually 2.2, but they can also be used to control the midtones.

Gain: The gain values a_r, a_g and a_b can be used to control the colour tint, temperature and exposure. For instance, the exposure can be changed as follows:

t(u) = 2^exposure × u. (2)

The white balance/tint can be changed by setting the gains to [a_r, a_g, a_b] = [1/w_r, 1/w_g, 1/w_b], where w = [w_r, w_g, w_b] defines the red, green and blue (RGB) values of white. Typically the artist would get these values by sampling in the image a pixel that should be white (using for instance a calibration card).

The gain, lift and gamma parameters are interdependent, but it may be easier to think that lift primarily affects the shadows, whilst gamma affects the midtones and gain affects the highlights.

Colour matrix: Cross-channel interactions can also be defined through a simple affine model:

t(u) = Au + b (3)

where A is the 3 × 3 matrix of all cross-channel linear dependencies. For instance, colour desaturation can be expressed as:

\[
t(u) =
\begin{pmatrix}
(1-s)\,0.2 + s & (1-s)\,0.7 & (1-s)\,0.1 \\
(1-s)\,0.2 & (1-s)\,0.7 + s & (1-s)\,0.1 \\
(1-s)\,0.2 & (1-s)\,0.7 & (1-s)\,0.1 + s
\end{pmatrix}
\begin{pmatrix} u_r \\ u_g \\ u_b \end{pmatrix} \quad (4)
\]

where s ∈ [0, 2] controls the amount of colour saturation.

Curves: Colourists may need to go beyond these basic controls and manually define one-dimensional (1D) piece-wise polynomial colour mappings by directly specifying spline control points (see Fig. 2 bottom).

Colour lookup table (LUT): The combined effect of all successive grading operations can be baked into a single LUT. The LUT can be 1D if the colour manipulations are done independently on each channel. The LUT needs to be 3D if cross-channel operations are involved. LUTs are commonly distributed to achieve colour space conversion or to apply a specific grading profile.

Secondary and localised grading: Following a global colour grade, the colourists may need to proceed to a secondary grade, which is applied to selected areas of the image. For instance, a specific grade could be applied to the yellow pixels only. Alternatively, the grade may need to be applied to a particular object or person. In that case, the artist would need to pull a mask of the object and track it throughout the video.
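To make these controls concrete, below is a minimal numpy sketch of the primary grading operations of (1), (2) and (4). The function and parameter names are our own illustration, not any grading suite's API:

```python
import numpy as np

def lift_gain_gamma(img, lift, gain, gamma):
    """Per-channel grade of eq. (1): t_c(u_c) = (a_c * u_c + b_c) ** gamma_c.
    img: float RGB image in [0, 1], shape (h, w, 3); lift, gain, gamma: length-3."""
    v = np.asarray(gain) * img + np.asarray(lift)    # affine gain/lift part
    return np.clip(v, 0.0, 1.0) ** np.asarray(gamma) # gamma part, clipped to [0, 1]

def expose(img, stops):
    """Exposure change of eq. (2): t(u) = 2**exposure * u."""
    return 2.0 ** stops * img

def desaturate(img, s):
    """Desaturation matrix of eq. (4); s in [0, 2], s = 1 leaves the image unchanged."""
    M = (1.0 - s) * np.array([[0.2, 0.7, 0.1]] * 3) + s * np.eye(3)
    return img @ M.T

# example: slightly warm shadows, cooler highlights, display gamma 1/2.2
img = np.random.rand(8, 8, 3)
out = lift_gain_gamma(img, lift=[0.02, 0.0, -0.01],
                      gain=[1.05, 1.0, 0.95], gamma=[1 / 2.2] * 3)
```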
2.2 Colour grading and colour statistics

There is a key relationship between histograms, brightness, contrast and the colour grading that is applied. Colourists have some intuition for it, as they monitor the effect of the colour grade on the histograms. Let us detail the mathematics of this relationship.

Assume that we apply an affine colour mapping u ↦ v = t(u) = au + b on the luminance alone. The mean intensity and variance are directly affected by the transform as follows:

var(v) = var(t(u)) = a²var(u) (5)

mean(v) = mean(t(u)) = a × mean(u) + b. (6)

What we learn from this is that changing the lift b directly affects the overall brightness of the image. Conversely, changing the gain a affects the variance and thus the overall contrast of the image.

This relationship can be extended to curves and histograms as well. Assume that the 1D curve is increasing monotonically. That is, if we rank luminance values from the darkest to the brightest u0 < u1 < u2 < ⋯ < un, then the mapping does not change the order: t(u0) < t(u1) < t(u2) < ⋯ < t(un).

Most of the basic colour grading operations are increasing monotonically, thus most final grades are increasing monotonically too. Now, under such colour transformations, the proportion of pixels that have a brightness less than a particular value u0 is the same as the proportion of pixels that have a brightness less than v0 = t(u0) after transformation. Thus, we have this simple relationship:

P(u < u0) = P(v < t(u0)), ∀u0 ∈ [0, 1]. (7)

The probability P(u < u0) corresponds to the cumulative distribution function Hu(u0) (e.g. cumulative histogram):

Hu(u) = Hv(t(u)). (8)

Assuming continuous distributions, differentiating leads to the following relationship between the probability density functions (pdfs):

hu(u) = t′(u) hv(t(u)). (9)

In practice, the continuous pdfs hu and hv are approximated by histograms.

Fig. 3  OT map for continuous distributions

The colour mapping operates on the cumulative histogram as a warp and on the histogram as a re-scaling and warp.
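These relationships are straightforward to verify numerically. A small self-contained check with simulated luminance values (the distribution choice is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.beta(2.0, 5.0, 100_000)   # simulated luminance samples in [0, 1]
a, b = 1.5, 0.1                   # gain and lift of the affine grade
v = a * u + b                     # v = t(u)

# eqs (5) and (6): the variance scales with a^2, the mean is mapped affinely
assert np.isclose(v.var(), a ** 2 * u.var())
assert np.isclose(v.mean(), a * u.mean() + b)

# eq (8): cumulative histograms satisfy H_u(u0) = H_v(t(u0)) for any threshold u0
u0 = 0.3
assert np.isclose(np.mean(u < u0), np.mean(v < a * u0 + b))
```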

2.3 Remarks

At this point, we can already make two remarks.

Firstly, colour statistics can be gathered without taking notice of the exact locations of the sampled colours. The colour mean, variance and histograms are all relatively insensitive to motion and stay similar between consecutive frames. This is interesting as colour matching is usually done between pictures with similar content that are not pixel identical (e.g. two frames from a sequence, two images of a similar cityscape).

Secondly, extremely stretched colour mappings do have an impact on the image noise. Noise is always present in images, due to a combination of sensor thermal noise, debayer algorithms and subsequent compression artefacts, and applying stretched colour mappings may enhance the noise artefacts. For instance, it is well known by colour artists that noisy low light images are difficult to grade, as increasing the exposure has the effect of increasing the noise accordingly (see (5)).

Fig. 4  OT assignment for discrete distributions

Fig. 5  Permutations of an assignment do not change the effect on the transformed distribution. This assignment is a permutation of Fig. 4 but both assignments produce the same distribution

3 Colour transfer

We are now interested in the inverse problem of automatically finding the colour grading operations to apply to a picture, to match the colour grade of another one. This is the colour transfer problem, as first coined by Reinhard et al. [1] (2002).

If we denote as ui the colour samples of the original image and vi the colour samples of the target palette image, the colour palettes of the original and target pictures then correspond to the distributions/histograms hu and hv. The problem is to find a colour mapping u ↦ t(u) such that the new colour distribution ht(u) matches the target palette hv.

3.1 Simple cases

Regression: if we have pixel correspondences between the source scene and the target scene, it is possible to obtain the grading parameters through optimisation. Consider for instance that pixels ui = (ri, gi, bi)i in the original image are matched to pixel colours vi = (ri′, gi′, bi′)i in the target image; then we can recover all the affine colour transform parameters with least squares:

\[
\begin{pmatrix}
a_{rr} & a_{gr} & a_{br} \\
a_{rg} & a_{gg} & a_{bg} \\
a_{rb} & a_{gb} & a_{bb} \\
b_r & b_g & b_b
\end{pmatrix}
= (X^\top X)^{-1} X^\top Y \quad (10)
\]

where the matrix X = [ri, gi, bi, 1]i stacks the source colours with a constant 1 and Y = [ri′, gi′, bi′]i stacks the target colours of the matching pixels. This is the approach used in many colour correction techniques (see for instance Hwang et al. [9], 2014).
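A minimal numpy sketch of this least-squares fit; fit_affine_colour_transform is a hypothetical helper assuming the matched colour samples are already stacked row-wise:

```python
import numpy as np

def fit_affine_colour_transform(src, tgt):
    """Solve eq. (10): recover A (3x3) and b (3,) such that tgt ~ src @ A.T + b.
    src, tgt: (n, 3) arrays of matching RGB samples."""
    X = np.hstack([src, np.ones((len(src), 1))])     # rows [r_i, g_i, b_i, 1]
    theta, *_ = np.linalg.lstsq(X, tgt, rcond=None)  # (4, 3) parameter matrix of (10)
    A, b = theta[:3].T, theta[3]
    return A, b

A, b = fit_affine_colour_transform(np.random.rand(500, 3), np.random.rand(500, 3))
recoloured = np.random.rand(64, 64, 3) @ A.T + b     # apply v = A u + b per pixel
```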

Fig. 6  For discrete distributions, the one-to-one assignment constraint can be a problem as a single colour may need to be mapped into multiple colours

Fig. 7  In the case where c(u, v) = ∥u − v∥², the Monge–Kantorovitch map is the gradient of a convex function. This means that the optimal colour transformation contains no colour rotation/inversion

Moments matching: if we do not have correspondences, we can still derive from (6) a simple mechanism for matching the grade between two images. Given two images, we measure the means mean(u), mean(v) and variances var(u) and var(v) for each of the channels. Then we can recover the estimates for the gain and lift per channel with:

\[ a = \sqrt{\operatorname{var}(v)/\operatorname{var}(u)} \quad (11) \]

b = mean(v) − a ⋅ mean(u). (12)

We can even estimate the full affine colour transform parameters by matching the means μu, μv and covariance matrices Σu and Σv of both distributions. A popular method [10–14] is to find the mapping that realigns the principal axes of Σv to those of Σu:

\[ t(u) = \Sigma_v^{1/2}\,\Sigma_u^{-1/2}\,(u - \mu_u) + \mu_v. \quad (13) \]

The square root matrices Σu^{1/2} and Σv^{1/2} are uniquely defined as Σu^{1/2} = Pu Du^{1/2} Puᵀ, with Σu = Pu Du Puᵀ the spectral decomposition of Σu.

Note that the estimated mapping is not necessarily the same as the one that would be found using least squares on matching pixels. The intention here is to match the target palette, not the exact transformation of each pixel. In fact, many mappings can achieve the transfer of colour distributions.
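Both estimators take a few lines of numpy; a sketch (the helper names are ours, and the PSD square root uses the spectral decomposition given above):

```python
import numpy as np

def sqrtm_psd(S):
    """Unique PSD square root: S^{1/2} = P D^{1/2} P^T, with S = P D P^T."""
    d, P = np.linalg.eigh(S)
    return (P * np.sqrt(np.maximum(d, 0.0))) @ P.T

def moments_transfer(u, v):
    """Per-channel gain/lift matching of eqs (11)-(12). u, v: (n, 3) samples."""
    a = np.sqrt(v.var(axis=0) / u.var(axis=0))
    b = v.mean(axis=0) - a * u.mean(axis=0)
    return lambda x: a * x + b

def principal_axes_transfer(u, v):
    """Affine mapping of eq. (13), realigning the principal axes of the covariances."""
    mu_u, mu_v = u.mean(axis=0), v.mean(axis=0)
    T = sqrtm_psd(np.cov(v.T)) @ np.linalg.inv(sqrtm_psd(np.cov(u.T)))
    return lambda x: (x - mu_u) @ T.T + mu_v
```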

Histogram matching: 1D curves/LUTs can also be estimated from the 1D histograms of both images. Starting from the relationship of (8), the histogram matching 1D curve can be found using the inversion of the cumulative histogram Hv [15]:

t(u) = Hv⁻¹(Hu(u)) (14)

where Hv⁻¹(α) = inf{v : Hv(v) ≥ α}. This 1D mapping can easily be solved by using discrete look-up tables.

Going beyond 1D LUTs and estimating full 3D LUTs turns out to be significantly more complicated. We could apply this 1D non-linear pdf matching separately for each channel [16] but the results are poor. It is only with the works of Morovic and Sun [3] (2003) and Pitié [5] (2005) that practical solutions were proposed.
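A discrete numpy version of (14), a sketch built on cumulative histograms and two monotonic table look-ups (made range-agnostic on purpose, so the same helper can be reused on projected colours in Section 4.4):

```python
import numpy as np

def match_1d(u, v, n_bins=1024):
    """1D pdf transfer of eq. (14): t = H_v^{-1} o H_u on one channel.
    u, v: 1D sample arrays; returns the samples of u remapped onto v's palette."""
    lo = min(u.min(), v.min())
    hi = max(u.max(), v.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    centres = 0.5 * (edges[1:] + edges[:-1])
    Hu = np.cumsum(np.histogram(u, edges)[0]) / u.size  # cumulative histogram H_u
    Hv = np.cumsum(np.histogram(v, edges)[0]) / v.size  # cumulative histogram H_v
    lut = np.interp(Hu, Hv, centres)                    # H_v^{-1}(H_u(.)) as a LUT
    return np.interp(u, centres, lut)
```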

3.2 Problem with N > 1

For continuous distributions (Fig. 3), the relationship on the pdfs of (9) extends to the N-dimensional case as follows:

hu(u) = hv(t(u)) |det Jt(u)| (15)

where Jt(u) is the Jacobian of t taken at u. This is a non-trivial constraint and the simple relationship we had for cumulative histograms does not hold anymore for N > 1 dimensions, as order and monotonicity are not defined anymore (e.g. we cannot say that blue > orange or that yellow > purple).

The idea of monotonicity is actually pivotal and, without it, there are infinitely many solutions to (15). To understand this, consider the discrete distribution problem, with two images of identical size: the distribution transfer problem becomes an assignment problem (see Fig. 4). Given two sets of n colour samples {ui}i=1:n and {vi}i=1:n from both images, any assignment of colour ui to vσ(i), for any permutation σ, will work:

t: ui → vσ(i). (16)

In other words, assigning pixels in the input image to randomly selected pixels of the target image would be a valid colour grade, as the generated image would exactly follow the target distribution. See for instance how both assignments in Figs. 4 and 5 produce the same target palette. Random assignments are unlikely to produce interesting colour grading and the assignment in Fig. 4 is probably more desirable than the one in Fig. 5.

This is a real practical problem: recolouring the sky in green and the grass in blue may still globally produce the correct target colour palette, but locally the colours will have been swapped.

4 Optimal transport

To solve the colour transfer problem, we need to do more than just matching colour distributions. Other conditions need to be added to bring interesting properties to the colour mappings. The most suitable mathematical framework to do this is OT. The link between colour transfer and OT was first made by Morovic and Sun [3] (2003) and Pitié et al. [4] (2005).

4.1 Monge–Kantorovitch formulation

In the historical formulation of OT by Monge (1781), the mapping is not only required to match the target distribution but also to minimise its displacement cost.

For discrete distributions, the Monge OT problem is to find the permutation/assignment σ that minimises:

\[ E(\sigma) = \sum_i c(u_i, v_{\sigma(i)}) \quad (17) \]

where c(u, v) is the transportation cost function, which is typically c(u, v) = ∥u − v∥² or c(u, v) = ∥u − v∥.

For continuous distributions, the Monge solution minimises the following objective function:

\[ E(t) = \int_u c(u, t(u))\, h_u(u)\, du. \quad (18) \]

In colour transfer, this OT colour map will give the desired palette while minimising the overall colour changes. As we will see later, the optimal map has very interesting geometrical properties. One can think of the displacement cost as a very reasonable prior on the transformation: we want as few changes as possible.

The Monge problem is actually very difficult to solve in its original formulation as it is combinatorial in nature. Also, the one-to-one assignment constraint causes issues when multiple pixels share the same colour values (see Fig. 6).

Real practical colour transfer difficulties arise because of the one-to-one mapping constraints. Consider, for instance, the case where the input image U contains a large region with sky but the target image V only contains a small patch of sky. The assignment will have to assign some of the pixels in the sky region to something else, hence creating visual artefacts.

Kantorovitch's optimal transportation formulation (1942) offers a relaxation of the one-to-one mapping constraint. Instead of estimating a function u ↦ t(u), we are looking to estimate a transportation plan π(u, v) that indicates the proportion of pixels with colour u that are mapped to a particular colour v.

For discrete distributions, the transportation plan π is optimised as follows:

\[ \hat{\pi} = \inf_{\pi} \sum_{i,j} \pi(u_i, v_j)\, \| u_i - v_j \|^2 \quad (19) \]

where π is essentially a joint histogram huv(u, v) of correspondences between colours u and v, with ∀j, Σi π(ui, vj) = hv(vj) and ∀i, Σj π(ui, vj) = hu(ui).

Similarly, for continuous distributions, π needs to minimise:

\[ E(\pi) = \int_{u,v} \pi(u, v)\, c(u, v)\, du\, dv \quad (20) \]

and π(u, v) is a joint pdf of correspondences between u and v.

Importantly, for continuous distributions, the Kantorovitch solution is actually the Monge solution, i.e. the best map is indeed a one-to-one mapping: π(u, v) = δ(t(u) − v). In practice, the continuous case can be approximated by increasing the number of pixels n. For large datasets, the discrete Kantorovitch solution is almost a one-to-one assignment.

OT is a rich mathematical problem. It originated in optimal transportation and allocation of resources applications (e.g. how to minimise the cost of moving mining ores to factories), but it is now used in a wide range of fields and major mathematical advances have been made in recent years [17–20]. Reference books on OT include the monographs by Villani [19, 20] (2003, 2009) and more recently by Peyré and Cuturi [21] (2018).

One notable feature of OT is that it defines very useful distances between histograms and probability measures. These OT distances are also known as the Wasserstein distance in statistics and the earth mover's distance (EMD) in computer vision [22]:

\[ W_p^p(U, V) = \min_{\pi} \int_{u,v} \pi(u, v)\, \| u - v \|^p\, du\, dv. \quad (21) \]

4.2 Solving the OT problem

The cost of computing the Kantorovitch OT solution is still very expensive as its complexity grows as O(n³ log(n)) with the number of points n [23]. The discretised problem can be numerically solved by linear programming using the simplex algorithm [24]. Several specialised algorithms for solving the transport problem also exist, notably the northwest corner method and Vogel's approximation [25]. Recently, in a landmark paper, Cuturi [26] showed that a further entropic relaxation of OT could lead to a much more efficient implementation using Sinkhorn's algorithm. A detailed review of this technique and other OT solvers can be found in [21].
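To give a feel for the entropic relaxation, here is a minimal Sinkhorn sketch (our notation: hu, hv are the two discretised histograms and C the pairwise cost matrix; eps and the iteration count are illustrative):

```python
import numpy as np

def sinkhorn_plan(hu, hv, C, eps=0.05, n_iters=500):
    """Entropy-regularised OT plan (Cuturi [26]): pi = diag(a) K diag(b),
    with K = exp(-C / eps), alternately rescaled to match both marginals."""
    K = np.exp(-C / eps)
    a = np.ones_like(hu)
    for _ in range(n_iters):
        b = hv / (K.T @ a)        # enforce the target marginal h_v
        a = hu / (K @ b)          # enforce the source marginal h_u
    return a[:, None] * K * b[None, :]
```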
A notable special case is the 1D case, where the mapping is actually simply the one we defined in (14).

Another interesting case is when both distributions are multivariate Gaussians: the OT solution is then linear and admits a closed-form solution [27]. For two multivariate Gaussians (MVG) N(μu, Σu) and N(μv, Σv), the linear OT solution is defined as:

\[ t(u) = \Sigma_u^{-1/2}\,\big(\Sigma_u^{1/2}\,\Sigma_v\,\Sigma_u^{1/2}\big)^{1/2}\,\Sigma_u^{-1/2}\,(u - \mu_u) + \mu_v. \quad (22) \]

This gives a better alternative to the principal axis method of (13). The merits of this linear mapping for colour transfer are discussed by Pitié in [7, 8] (2007).
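Eq. (22) is only a few lines of numpy/scipy; a sketch (the MVG statistics are estimated from colour samples, and the helper name is ours):

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def mkl_transfer(u, v):
    """Closed-form linear OT between Gaussians, eq. (22). u, v: (n, 3) samples."""
    mu_u, mu_v = u.mean(axis=0), v.mean(axis=0)
    Su, Sv = np.cov(u.T), np.cov(v.T)
    Su_h = sqrtm(Su).real                       # Sigma_u^{1/2}
    Su_ih = inv(Su_h)                           # Sigma_u^{-1/2}
    T = Su_ih @ sqrtm(Su_h @ Sv @ Su_h).real @ Su_ih
    return lambda x: (x - mu_u) @ T.T + mu_v    # T is symmetric positive definite
```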
4.3 Geometrical properties of the OT maps

Despite its mathematical and numerical complexity, OT offers a useful framework for the colour transfer problem. The main interest is that OT maps possess very desirable geometrical properties.

The first one is that, assuming continuous pdfs and c(u, v) = ∥u − v∥², the OT solution always exists and is unique. Hence, if we take a continuous approximation of the colour distributions, there is always an unambiguous OT map. Moreover, the OT map is the gradient of a convex function [18, 28]:

t = ∇ϕ where ϕ: ℝᴺ → ℝ is convex. (23)

This property is the extension of monotonicity to N-dimensional functions. We pointed out earlier that, in practice, the 1D curves used by colourists are most likely monotonic and preserve the order of the pixel values. We cannot order colours in ℝ³, but this property is the closest we can get to it. This means that bright areas of a picture will remain bright after transformation and that colour inversions or rotations will not happen. Fig. 7 shows the OT mapping when matching multivariate Gaussian distributions. The OT mapping is only made of pulling and pushing, with no curl or rotation.

4.4 Using OT in colour transfer

OT offers a solid mathematical framework for estimating the colour transfer mapping between colour distributions, and the geometrical properties of the optimal colour maps are a good match to the kind of operations that colourists do.

The practical details of how to use OT are however key. One issue is the size of the distribution. As the cost of computing the OT grows very quickly with the number of samples, Morovic and Sun [3] (2003), and many other works, first discretise the colour distributions into a handful of clusters (e.g. with k-means). Since the number of clusters is relatively low, the OT plan π(u, v) is not a straightforward one-to-one mapping and input colour clusters will be assigned to multiple target colour clusters. Morović proposes to randomly sample the target colours according to the proportions given by the plan π(u, v). This process is, in essence, similar to a randomised dithering.

The first solution to work on very large datasets was proposed by Pitié et al. [4, 6] (2005). The proposed solution is to approximate the original Monge–Kantorovitch OT problem by a sliced approximation over 1D distributions. The idea is to break down the problem into a succession of simultaneous 1D distribution transfer problems. For a series of colour directions eₙ ∈ ℝ³, the colours are projected as u ∈ ℝ³ ↦ uᵀeₙ ∈ ℝ. The projection amounts to applying a colour filter of colour e and looking at the resulting intensity image. The advantage of the projection is that it brings the problem down to a 1D colour transfer problem, which we know how to solve using (14). Intuitively, we are trying to solve the problem by iterating through all the possible colour components of the picture, e.g. cycling through not only red, green and blue, but also magenta, cyan, orange, purple, yellow, turquoise, grey etc. This approximation of the original OT problem has since been studied in detail as the sliced Wasserstein distance [21, 29, 30] (2012, 2015, 2018).

Fig. 8  Effect of the colour grading post-processing for varying parameters

Fig. 9  Results of colour grading post-processing using Pitié et al. [6] and Levin et al. [31]

The method is computationally very efficient and can deal with a very large number of data points; the distribution does not need to be segmented into clusters, while being very close to the OT solution [6].
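A compact sketch of this sliced strategy, reusing the match_1d helper from Section 3.1 (the random-rotation scheme below is a simplification of the rotation sequences used in [6]):

```python
import numpy as np

def sliced_transfer(u, v, n_iters=20, seed=0):
    """Sliced OT colour transfer in the spirit of IDT [6]: repeatedly rotate both
    colour point clouds, apply the 1D transfer of eq. (14) along each axis,
    and rotate back. u, v: (n, 3) colour samples."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthonormal basis
        pu, pv = u @ Q, v @ Q                          # project on 3 directions e_n
        for k in range(3):
            pu[:, k] = match_1d(pu[:, k], pv[:, k])    # 1D pdf transfer per slice
        u = pu @ Q.T                                   # back to RGB coordinates
    return u
```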
5 Colour grading post-processing

Figs. 8 and 9 show that applying a colour mapping may produce some grainy artefacts. The artefacts are caused by a stretch of the colour mapping, as any slight variations in the original image become magnified. As discussed in Section 2.2, the image variance is affected by the stretch as var(au) = a²var(u). Stretches in the colour map can be due to large differences in the content of the images (see Fig. 10) or simply because the desired transfer is actually extreme. It is often difficult to fully address this problem before applying the transfer.

5.1 Gradient matching

This problem was first recognised by Pitié et al. [4, 6] (2005). The proposed solution was inspired by the work of Pérez et al. [32] (2003) on Poisson image editing. The idea is to loosely match the gradient field of the graded picture to the one of the original picture. That way the structure of the original image is preserved. We present here a slightly simplified version of the energy optimised in [4]:

\[ E(W) = \iint_\Omega \| \nabla W - \nabla U \|^2 + \lambda\, \| W - t(U) \|^2 \, dx\, dy. \quad (24) \]

The term ∥∇W − ∇U∥² forces the input image gradient to be preserved and ∥W − t(U)∥² preserves the constructed colour mapping (for simplicity the spatial indices x, y have been omitted).

Fig. 10  For continuous distributions, one region of the input space can be mapped to many regions by non-linearly stretching the mapping. This kind of
stretch will happen in colour transfer when objects have different sizes in the input and target images

Fig. 11  Estimation of Levin et al. [31]’s local gain maps arr, arg, arb, agr, agg, agb, abr, abg, abb and offset maps br, bg, bb for the image pairs of Fig. 8, with the
explicit approach of Pitié [33]. The gain and offset maps are shown as a colour picture (e.g. red = arr /5 + 0.5, green = arg /5 + 0.5, and blue = arb /5 + 0.5)

The original paper [4] also includes weight fields which control where the gradient does not need to match the input image. The energy can be rewritten as a quadratic function:

\[ E(W) = \operatorname{tr}(W^\top L W) + \lambda\, \| W - t(U) \|^2 \quad (25) \]

where the sparse matrix L is the Laplacian matrix.

One issue with this approach is that by fixing the gradient values, we are essentially assuming that the mapping is locally a simple offset t(u) = u + b. This means that, locally, the image contrast stays the same, regardless of t(u). We can see on the last row of Fig. 9 that the portrait looks slightly blurred. The reason is that the overall contrast is not coherent with the local contrast.
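The minimiser of this quadratic energy is the solution of a sparse linear system. A scipy sketch for a single channel, under our simplified formulation (no weight fields of [4]):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def grain_filter(U, tU, lam=0.1):
    """Minimise eq. (24) for one channel: ||grad W - grad U||^2 + lam ||W - t(U)||^2.
    Normal equations: (L + lam I) W = L U + lam t(U), with L = G^T G the image
    Laplacian built from forward-difference gradient operators."""
    h, w = U.shape
    D = lambda n: sp.diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    G = sp.vstack([sp.kron(sp.eye(h), D(w)),     # horizontal differences
                   sp.kron(D(h), sp.eye(w))])    # vertical differences
    L = (G.T @ G).tocsc()
    A = L + lam * sp.eye(h * w, format="csc")
    W = spsolve(A, L @ U.ravel() + lam * tU.ravel())
    return W.reshape(h, w)
```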
5.2 Matting Laplacian

To be able to change the contrast, another approach is to make local approximations of the colour grading with affine colour transforms. As affine transforms have a constant stretch |det Jt(u)| = |Σv|^{1/2}/|Σu|^{1/2}, they do not produce visible artefacts (besides a possible increase of the noise). By only making local affine approximations, the impact on the overall colour distribution is minimal. Ideally, each local affine transform should be consistent with the neighbouring affine transformations. We can formalise this using the Matting Laplacian framework of Levin et al. [31] (2008). The idea is to jointly optimise for the local affine parameters Ai, bi at each pixel i and the filtered image W:

\[ E(W, A, b) = \sum_i \Big( \sum_{j \in N_i} \| A_i u_j + b_i - w_j \|^2 + \lambda\, \| w_i - t(U)_i \|^2 + \epsilon\, \| A_i \|_2^2 \Big) \quad (26) \]

where uj, wj and t(U)i are the colours in U, W and the graded image t(U) at pixels j and i. Ni is the neighbourhood at location i, and ∥Ai∥²₂ = Σkl Ai(k, l)² is the L2 norm. The parameter λ controls the fidelity to the graded image t(U).

At this point, following the original approach of Levin et al., we can integrate A, b out of the problem and generate a marginalised estimate of W:

E(W) = min_{A,b} E(W, A, b). (27)

This leads to a quadratic energy function:

\[ E(W) = \operatorname{tr}(W^\top L_M W) + \lambda\, \| W - t(U) \|^2 \quad (28) \]

where the sparse Matting Laplacian matrix L_M is computed from the input image U [31]. Alternatively, as proposed in Pitié [33] (2016), we can explicitly solve for A, b by minimising:

E(A, b) = min_W E(W, A, b) (29)

and recover the filtered image W by simply applying the local affine transforms:

wi = Ai ui + bi. (30)

The Matting Laplacian approach is very successful at preserving the original image structures and it has now become the method of choice for reducing artefacts in colour transfer and style transfer applications [34–38] (2013, 2016, 2017, 2018, 2019).

Results for various values of λ are shown in Fig. 8. Good parameters are typically around λ = 10⁻³ and ϵ = 10⁻³. The idea is that you want to keep ϵ as low as possible to obtain optimal sharpness. The value for λ depends on the quality of the initial colour grade. Low values of λ guarantee maximum fidelity to the original image structure.

Existing works use the implicit derivation from Levin's original paper (see (28)), without ever computing the actual values for the gains and offsets. Directly estimating the gain and offset values as in Pitié [33] would, however, be more interesting for the colourists, as these maps can easily be edited and applied at different resolutions. Fig. 11 shows the estimated gain and offset maps for the image pair of Fig. 8.
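As a flavour of this explicit route, below is a box-window sketch of per-pixel affine grade estimation. It is closer in spirit to a multichannel guided filter than to the exact Matting Laplacian minimisation of (29), but it produces the same kind of editable gain/offset maps:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_affine_maps(U, tU, radius=8, eps=1e-3):
    """Fit w_i ~ A_i u_i + b_i over box windows (a guided-filter-style sketch,
    not the exact Matting Laplacian solution). U, tU: (h, w, 3) float images."""
    size = lambda x: (2 * radius + 1,) * 2 + (1,) * (x.ndim - 2)
    mean = lambda x: uniform_filter(x, size=size(x))
    mU, mT = mean(U), mean(tU)
    # per-window 3x3 covariance of U and cross-covariance of (U, t(U))
    cUU = mean(U[..., :, None] * U[..., None, :]) - mU[..., :, None] * mU[..., None, :]
    cUT = mean(U[..., :, None] * tU[..., None, :]) - mU[..., :, None] * mT[..., None, :]
    A = np.linalg.solve(cUU + eps * np.eye(3), cUT)   # (h, w, 3, 3) gain maps
    b = mT - np.einsum('hwij,hwi->hwj', A, mU)        # (h, w, 3) offset maps
    return mean(A), mean(b)                           # average overlapping windows

def apply_local_affine(U, A, b):
    return np.einsum('hwij,hwi->hwj', A, U) + b       # eq. (30): w_i = A_i u_i + b_i
```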
6 Colour transfer beyond optimal transport

Colour transfer does not stop at OT. The literature on colour transfer is large [see the review by Faridul et al. [39] (2014)] and a number of fundamental problems still remain unsolved.

6.1 Colour transfer challenges

Recall that colour transfer is about finding a transformation of an input image that can match the colour palette of a target image. Carefully looking at some of the assumptions we made so far will help us identify key outstanding points that need to be addressed.

6.1.1 Colour palettes are not colour distributions: The first assumption we made was that the colour palette could be approximated by the colour distribution. This is not always true. As we have seen in Section 4, many colour mappings can achieve a colour distribution transfer, but not all will produce the correct assignments, and cross-matches between different objects are possible.

If OT gives reasonable estimates, these are not always accurate. For instance, the linear OT map of (22) cannot estimate the desaturation of (4). The reason is that the linear OT map has to be symmetric positive definite and the matrix in (4) is not. In practice, there is not much difference in the final grade because the distributions do match, but both images would be different in some subtle ways, as seen in Fig. 12.

Contextual information thus needs to be exploited to guide the colour transfer. This could be done using pixel correspondences for instance.

6.1.2 Distributions rarely match: The second assumption made in the OT approach is that the colour distributions should match. This, again, is not necessarily true. Images used in colour transfer applications are never identical. They could be frames taken from different shots or they could be radically different images. Frequently objects will differ in size (e.g. the amount of visible sky), some objects may have moved or may only be present in one image. In all these cases, the OT map will be affected. We must therefore take care of the distributions before OT can be applied.

6.1.3 Colour transfer is more than colour mapping: The last assumption is that one-to-one colour mappings can transform the grade of the image. This is true in first approximation for most tasks, but this is not always true. Take for instance the application of matching lighting conditions for pictures taken at different times of day, under different weather conditions. Properly re-lighting images would require information about the 3D scene geometry, materials, lights setup etc., and no global or local colour grade could possibly account for complex shadows and intricate lighting effects.

Fig. 12  Example where the OT solution differs from the ground truth mapping. On the left, the original image; in the middle, the desaturated image following (4) with s = 1. On the right, the result of the OT solution when using the middle image as a target. Brightness values do differ. At the locations highlighted in red and green, the grey values are 0.4863 and 0.465 in the ground truth but 0.4756 and 0.4471 in the OT solution

Fig. 13  Limitations of using colour mappings for colour transfer. In these examples, the colour mappings are locally estimated as optimal affine colour
transforms (see Section 5.2). Despite a perfect pixel alignment, the estimated colour maps cannot capture the day time to night time transition (especially the
artificial lights)

Fig. 14  Style transfer results (from [62]). The stylisation effect on the top row is impressive. By increasing the content fidelity λ, bottom row, a more realistic
colour grading can be obtained. Stylisation artefacts, however, can still be visible

An example of this problem is demonstrated in Fig. 13, where we try to match the first image of the sequence to photographs taken later during the day. All the pixel locations are aligned and we can directly estimate local affine transforms between both images. To do this we use Levin's Matting Laplacian framework. Even in this best case scenario, where we can find correspondences for each pixel, it is clear that the quality of the grade deteriorates with the day-to-night transition. What is happening is that there is no affine colour transform that can predict how pixels will be illuminated at night.

This means that colour transfer may need to operate in a different space than the colour space, as colour alone is not enough to identify how a pixel should be recoloured.

6.2 Guiding the colour assignments

In this section, we look at the solutions proposed in the literature to guide the colour transfer and deal with colour mismatch and content differences between images.

6.2.1 Pixel correspondences: One approach to avoid misassignments is to establish pixel correspondences between the input and target images and use these to constrain the colour mapping estimation.

This is the approach used in works such as Oliveira et al. [40] (2015), Hwang et al. [9] (2014), Park et al. [41] (2016), Vazquez-Corral and Bertalmio [42] (2014) and Frigo et al. [43] (2015). In these papers, pixel correspondences are used to regress basic global affine colour transform parameters and the gammas. The aim is to achieve a consistent white balance and gamma across different images or a video. When matching vastly different images, classic tracking methods will fail. Dedicated matching algorithms can be found in the works of Shih et al. [38] (2013) and Efros and Freeman [44] (2001), and more recently in Liu et al. [45] (2018), but these only give rough, sparse, noisy matches.

6.2.2 Supplying semantic masks: Another possible approach, used by deep learning works [34–37], is to produce semantic label maps for the images. Modern semantic segmentation algorithms such as DeepLab [46] (2018) can generate masks for common classes (e.g. sky, grass, trees, buildings, cars, people etc.), which can then be used to apply per-class colour transfers. We still need to deal with possible content mismatches, as some labels might be present in the input image but not in the target image. Luan et al. [36] (2017) proposed to avoid these orphan labels by constraining the input semantic labels to be chosen among the labels of the reference style image. The idea is to force the semantic segmentation algorithm to find the nearest available label (e.g. choosing 'sea' instead of 'lake'). The semantic maps do not need to be precise to be useful, as they are mainly here to compensate for large content variations in the distributions. Also, any error is usually dealt with by the post-processing filter.

6.2.3 Using simpler colour models: As artefacts mainly arise with non-smooth colour mappings, a solution to mitigate the impact of content mismatch is to impose some smoothness on the colour mapping. For instance, considering a linear mapping as in (22) will avoid any risk of visible artefacts. Usually, some non-linearity is necessary when the styles between the input and target images are very different. Then a popular way to obtain smooth non-linear functions is to consider thin plate splines (TPS), as in Grogan et al. [47–49] (2012, 2017). There is also the option to approximate the estimated OT map as a TPS [see Frigo et al. [43] (2015)].

Alternatively, we can impose smoothness on the distributions themselves, as this will have the implicit effect of smoothing the resulting colour map. This can be done by smoothing the histograms or by employing a kernel density estimate (see Silverman [50]) with a large bandwidth h:

\[ h_u(u) \simeq \frac{1}{nh} \sum_{i=1}^{n} K_h\!\left( \frac{u - u_i}{h} \right) \quad (31) \]

where Kh is the kernel function. Increasing h increases the smoothness of the pdf estimate and of the resulting OT map. The most popular approach is probably to explicitly model the distribution as a Gaussian mixture model [48, 49, 51–53]:

\[ h_u(u) \simeq \sum_{i=1}^{n_c} w_i\, N(\mu_i, \Sigma_i). \quad (32) \]

6.2.4 Cluster equalisation: To deal with objects changing size between images, a solution is to reduce the relative size of distribution clusters. In this way, the variations in colour proportions are limited [see Pitié et al. [4] (2005) and Neumann and Neumann [54] (2005)]. This strategy is especially suited when working with GMMs. For instance, Grogan et al. [49] (2017) limit the GMM to 50 clusters and normalise all the weights to wi = 1.

Directly manipulating the distributions comes however with many drawbacks. One is that small clusters have their relative importance increased, which increases the number of orphan clusters that do not have a match in the target image. Also, if the distributions were actually correct to start with, we are now taking the risk of biasing the colour transfer. The best solution is often to leave most of the correction process to the post-processing filter.
6.2.5 Relaxing the optimal transport constraints: A number of works have tried to jointly address the optimal transfer problem and the need to preserve the original picture structure [see Papadakis et al. [55, 56] (2010), Freedman and Kisilev [57] (2010)]. In Ferradans et al. [58] (2013), for instance, the OT problem is reworked to relax the mass conservation constraint and allow for additional regularisation terms. The joint optimisation is usually made possible because the structure-preserving energies defined in Section 5 yield simple quadratic regularisation terms. If, as in Papadakis et al. [55], we want to match the original gradient (Pitié et al. [6]), the regularisation term to include is:

tr(WᵀLW). (33)

Similarly, as proposed in deep photo style transfer [36], Levin et al.'s [31] regularisation penalty can be included by simply adding this regularisation term to the overall objective function:

tr(WᵀL_MW). (34)

6.3 Neural colour transfer

As discussed in Section 6.1.3, colour transformations that operate on the colours alone cannot handle difficult relighting situations. Early works quickly understood that working in different transformed representations could help achieve better style transfer. For instance, some elements of image contrast and style are transferred in the bilateral space in Bae et al. [59] (2006), in the Laplacian pyramid in Li et al. [60] (2005) and in the Haar pyramid in Sunkavalli et al. [61] (2010).

6.3.1 Neural style transfer: The recent advances in deep neural networks have however given us much more powerful feature transforms. Gatys et al. [62] (2016) showed in their seminal work on neural style transfer that the distributions of deep features can very accurately encode the visual style of an image. From the input image U and target image V, the deep feature maps ϕ(U) and ϕ(V) can be obtained by evaluating the forward pass of a pre-trained deep convolutional neural network, such as VGG-19 [63], and stacking all the tensor values at each layer in the network as one column vector. Gatys et al. then proposed to achieve style transfer by engineering a picture W, such that the content of W matches the content of U and the style of V:

E(W) = E_style(W, V) + λE_content(W, U) (35)

where the content is compared in the deep feature space:

E_content(W, U) = ∥ϕ(W) − ϕ(U)∥²₂ (36)

and the styles are represented by the Gram matrix of the feature vectors:

E_style(W, V) = ∥ϕ(W)ϕ(W)ᵀ − ϕ(V)ϕ(V)ᵀ∥²₂. (37)

This term is the Frobenius norm of the difference between the Gram matrices of the neural network responses. Minimising this quantity corresponds to minimising the maximum mean discrepancy between the two distributions [64]. The image W itself can be optimised by gradient descent. Backpropagation is used to compute the gradient of the loss function (35) with respect to the input image pixel values.

Results reproduced from [62] are shown in Fig. 14. While the method shows impressive performance for artistic stylisation (see top row), the artefacts introduced by the style transfer make this approach inappropriate for direct use as a colour transfer method.
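For one layer, the losses of (35)-(37) reduce to a few array operations. A numpy sketch on extracted feature maps (obtaining FW, FU, FV from a VGG-19 forward pass is assumed to happen elsewhere):

```python
import numpy as np

def gram(F):
    """Gram matrix phi phi^T of a feature map F of shape (channels, h*w)."""
    return F @ F.T

def style_content_losses(FW, FU, FV):
    """Single-layer versions of eqs (36) and (37); FW, FU, FV are the feature
    maps of the engineered image W, the input U and the style target V."""
    e_content = np.sum((FW - FU) ** 2)            # eq. (36)
    e_style = np.sum((gram(FW) - gram(FV)) ** 2)  # eq. (37), squared Frobenius norm
    return e_style, e_content

# total loss of eq. (35): E = e_style + lam * e_content, minimised over W by
# backpropagating through the feature extractor.
```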
6.4 Deep photo style transfer

Luan et al. [36] adapted the neural style transfer approach to also include specific losses tailored to the colour transfer problem. In particular, Luan et al. use the Matting Laplacian regularisation

Fig. 15  Colour grading results for the image pairs 1, 2, 3 of [36], see Table 1 for the list of methods

term. They also introduce semantic segmentation masks to guide the colour transfer.

A few colour transfer examples are shown in Figs. 15–21. Luan et al.'s technique is labelled as DPST.

Fig. 16  Colour grading results for the image pairs 4, 5 of [36], see Table 1 for the list of methods

What is remarkable is that complex colour inversions are possible. For instance, in Fig. 16-④ and Fig. 18-⑳, DPST can selectively recolour the buildings at night time.

One notable issue with the approach is that generating a picture requires a costly optimisation. Also, gradient descent is not guaranteed to reach the global minimum. Looking at the results, we find that strong re-colouring choices are being made and colours tend to be too intense. We suspect that this is because of a poor convergence of the gradient descent optimisation. Examples showing optimisation failures can be seen in Fig. 20-37 or Fig. 19-44.
6.5 Using an encoder–decoder architecture for a closed-form solution

To avoid this slow optimisation framework, Li et al. [35] have trained a decoder network that, given some deep features φ, can predict a pre-image X such that ϕ(X) = φ. The network is trained using an encoder–decoder architecture (see Fig. 22). The encoder is VGG-19 and the decoder learns to reconstruct the input image. The idea is then to start from the input image U, compute its deep features ϕ(U), apply a simple linear pdf transfer on the deep feature distributions and then come back to the image domain using the decoder network.
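The linear feature-space transfer at the heart of this approach is a whitening/colouring transform, i.e. the principal-axis mapping of (13) applied to encoder features. A numpy sketch (the encoder and decoder themselves are assumed given):

```python
import numpy as np

def whiten_colour(FU, FV, eps=1e-8):
    """Linear pdf transfer on deep features, as used at each PhotoWCT level [35].
    FU, FV: (channels, n) feature matrices of the input and target images."""
    def matpow(S, p):
        d, P = np.linalg.eigh(S)                     # spectral decomposition
        return (P * np.maximum(d, eps) ** p) @ P.T
    mu_u, mu_v = FU.mean(1, keepdims=True), FV.mean(1, keepdims=True)
    white = matpow(np.cov(FU), -0.5) @ (FU - mu_u)   # whitening of the input
    return matpow(np.cov(FV), +0.5) @ white + mu_v   # colouring with the target

# the decoder then maps the transferred features back to the image domain.
```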

7 Results

The difficulty with evaluating colour transfer approaches is the lack of an accepted objective benchmark on which the research community could compare their results. Part of the reason is that any evaluation requires an artistic assessment of the transfer. In fact, recent works do start to include user studies to report user preference scores.

In this review, we are mainly interested in the recent advances in colour transfer that address complex colour style transfers for images. We are thus not going to review colour corrections that only necessitate global affine or gamma corrections; we will instead consider example-based colour style transfer applications.
Fig. 17  Colour grading results for the image pairs 13, 15 of [36], see Table 1 for the list of methods

We selected a number of challenging input/target image pairs from the list of 60 image pairs provided in [36]. The images are presented in Figs. 15–21. The circled numbers on the figures correspond to the pair index in that dataset.

The list of evaluated methods and their labels in the figures are detailed in Table 1. The range of methods covers most of the significant advances and approaches in colour transfer. We indicate with the superscript ∗ when the method is followed by the Matting Laplacian filter as a post-process.

IDT (Pitié et al. [6], 2005) is a reference in the field and is our baseline approximation of the OT solution.

GMM (Grogan and Dahyot [48], 2019). The method takes a Gaussian mixture model approximation of the colour distribution and also makes a TPS approximation for the colour mapping. The technique also equalises the colour cluster sizes, as discussed in Section 6.2.4. The technique operates by matching the 50 most important colour clusters in both images. It is particularly faithful when assigning colour matches (see Fig. 15-② for instance). The cluster size equalisation makes it however prone to mismatch (see for instance the colour inversion in Fig. 17-⑬). Note that Grogan and Dahyot [48] are interested in an interactive application, thus it is expected that the colourist will manually fix these incorrect assignments [48].

One other issue is that TPS is too smooth to be able to capture changes across regions that have close colours (see for instance Fig. 16-④, where the sky and the building are mapped together instead of separately).

OTreg (Ferradans et al. [58], 2013). The method is based on a regularised OT framework. The framework relaxes the constraint that cluster sizes need to match and also adds a penalty to preserve the image structure. The method operates on colour clusters, which can cause some banding artefacts (see Fig. 23), as, contrary to Grogan and Dahyot

Fig. 18  Colour grading results for the image pairs 20, 24 of [36], see Table 1 for the list of methods

[48], there is no implicit smoothness imposed on the transform. Note however that this can be easily addressed by using the Matting Laplacian filter as a post-process (OTreg*). For most cases, the results, after filtering, are very similar to the baseline IDT approach of Pitié et al. [6].

MKL (Pitié and Kokaram [8], 2007). This is the linear OT solution. The colour distributions are approximated as multivariate Gaussians. The results are similar to the baseline IDT approach of Pitié et al. [6] but they usually lack the contrast or the colour accuracy of non-linear transforms.

DPST (Luan et al. [36], 2017). The method is the deep photo style transfer approach discussed in Section 6.4. The method gives impressive results on some difficult image pairs, especially for day-to-night transitions (see Fig. 16-④ and Fig. 18-⑳). The convergence of the method is an issue, however, and strong artefacts can be seen in Fig. 20-37 and Fig. 19-44.

PWCT (Li et al. [35], 2018). The closed-form deep colour transfer approach (see Section 6.5). The encoder–decoder structure allows for a fast colour transfer. However, the raw results are very noisy (see Fig. 24). Once the images have been post-processed (see PWCT*), the results are quite good and differ in interesting ways from the baseline OT solutions. For instance, contrary to GMM, OTreg, MKL and IDT, the method can, to some extent, deal with day-to-night transitions (see the bridge in Fig. 16-④, the house in Fig. 18-⑳ and the sky in Fig. 16-⑤ and Fig. 18-24).

The method relies, however, heavily on the Matting Laplacian filter, and noisy artefacts still show through after filtering (see the textured sky in Fig. 15-② for instance).

Semantic labelling. In addition to the state-of-the-art, we also wanted to see how providing semantic labels improves the quality of the colour transfer. We modified the code for IDT, MKL and OTreg to also operate on semantic masks. We indicate that a method relies on semantic masks with the subscript s. For instance, in IDTs∗, we apply IDT independently on each of the semantic labels and then post-process the composite image using the Matting Laplacian filter.

Fig. 19  Colour grading results for the image pairs 44, 57 of [36], see Table 1 for the list of methods

Most of the day-to-night transfers can be suitably achieved if semantic masks are given, even when using the baselines IDT and MKL. Also, potential issues related to content changes disappear (see for instance Fig. 17-⑮), and colour inversions are possible (see Fig. 20-49 and Fig. 20-50). New artefacts can, however, be produced if the masks are incorrect (e.g. the sky in Fig. 21-59 is split into different segments, creating the white band on the right).

Also, providing semantic masks nullifies the need for regularising the OT solution or for equalising the cluster weights. Indeed the regularisation of OTregs∗ acts as a bias in the estimation, and in most situations, the simple IDTs∗ is superior (e.g. the colours are better reproduced in Fig. 20-50, Fig. 15-① or Fig. 15-②).

8 Discussion and conclusion

As we have seen in this review, OT is still the cornerstone of colour transfer applications. It is almost always necessary, because precise, dense and complete pixel correspondence maps are simply never available. Abstracting the analysis to distributions allows us to extend solutions to all pixels in the image. Two of the long-standing issues with OT methods are that: (i) due to content changes, distributions rarely match; and (ii) monotonic OT solutions cannot cope with complex transfers, like day-to-night transitions.

The addition of semantic masks seems to make an instant improvement to all OT methods. It is a very efficient and affordable way of dealing with content mismatches and complex localised grading. The advantages are also that semantic masks can easily be edited by the colourist and integrated into their workflow. When using semantic masks, it seems to be advantageous to stick to simple, unbiased methods (e.g. IDT [6], MKL [8]), as making assumptions about the shape of the transform and colour distributions leads to biases in the estimate and can cause failures in edge cases.

The recent advances in deep learning offer the most promising avenue to go beyond colour mapping. By working directly in the deep feature space, complex transforms become possible. The approach is still relatively novel, and there is scope to address a number of issues. For instance, while the optimisation scheme of Luan et al. [36] is not practical, the encoder–decoder approach of Li et al. [35] is quite imprecise and relies heavily on the post-processing filter. A better approach is still required.
Fig. 20  Colour grading results for the image pairs 37, 49, 50 of [36], see Table 1 for the list of methods

Lastly, Levin's Matting Laplacian regularisation can be very effective for suppressing colour grading artefacts. We propose that explicitly estimating its local affine parameter fields [33] can provide colourists a simple way of editing non-global colour transfer maps. Yet, as we have seen in this review, the Matting Laplacian cannot work for day-to-night transitions or other complex transfers. At its core, the Matting Laplacian assumes that the colour transfer can be explained by a simple local one-to-one colour mapping. This is however not always the case. The solution to go beyond the Matting Laplacian will probably come from neural network approaches, as they seem to hold the key to complex image manipulations.

Fig. 21  Colour grading results for the image pairs 59, 60 of [36], see Table 1 for the list of methods

Table 1  Labels of the methods used in the evaluation

Fig. 22  Architecture of PhotoWCT [35]. PhotoWCT is based on a classic VGG19 encoder/decoder architecture, with max pooling and unpooling. A linear
distribution transfer is performed at the feature level using a principal axis transfer method (see (13))
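In code, the principal axis transfer of Fig. 22 amounts to a whitening-colouring transform of the encoder activations. A minimal sketch, assuming the content and style features have been reshaped to (C, N) arrays (channels × pixels) and using an illustrative eps regulariser:

```python
import numpy as np

def wct(feat_c, feat_s, eps=1e-5):
    # Whitening-colouring transform between content and style features,
    # as performed at each encoder level of PhotoWCT [35].
    c = feat_c.shape[0]
    fc = feat_c - feat_c.mean(axis=1, keepdims=True)
    mu_s = feat_s.mean(axis=1, keepdims=True)
    fs = feat_s - mu_s
    # Whiten the content features to an identity covariance.
    dc, ec = np.linalg.eigh(fc @ fc.T / fc.shape[1] + eps * np.eye(c))
    whitened = (ec / np.sqrt(dc)) @ ec.T @ fc
    # Re-colour them along the principal axes of the style covariance.
    ds, es = np.linalg.eigh(fs @ fs.T / fs.shape[1] + eps * np.eye(c))
    return (es * np.sqrt(ds)) @ es.T @ whitened + mu_s
```

The imprecision noted above, and the artefacts of Fig. 24, arise when the decoder reconstructs the image from such linearly transformed features, which helps explain why the Matting Laplacian post-filter carries so much of the load.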

Fig. 23  Banding artefacts can arise when working with clusters, such as in Ferradans et al. [58] (2013)

Fig. 24  Results of the encoder–decoder-based colour transfer of Li et al. [35] (2018), prior to post-processing. Severe artefacts need to be smoothed out by
the Matting Laplacian filter

9 References
[1] Reinhard, E., Stark, M., Shirley, P., et al.: 'Photographic tone reproduction for digital images', ACM Trans. Graph. (Proc. ACM SIGGRAPH 2002), 2002, 21, (3), pp. 267–276
[2] Neumann, L., Neumann, A.: 'Color style transfer techniques using hue, lightness and saturation histogram matching'. Computational Aesthetics, Girona, Spain, 2005a, pp. 111–122
[3] Morovic, J., Sun, P.-L.: 'Accurate 3D image colour histogram transformation', Pattern Recognit. Lett., 2003, 24, (11), pp. 1725–1735
[4] Pitié, F., Kokaram, A., Dahyot, R.: 'N-dimensional probability density function transfer and its application to colour transfer'. Int. Conf. on Computer Vision (ICCV'05), Beijing, China, 2005a
[5] Pitié, F., Kokaram, A., Dahyot, R.: 'Towards automated colour grading'. 2nd IEE European Conf. on Visual Media Production (CVMP'05), London, UK, 2005b
[6] Pitié, F., Kokaram, A., Dahyot, R.: 'Automated colour grading using colour distribution transfer', J. Comput. Vis. Image Understand., 2007, 107, (1–2), pp. 123–137
[7] Pitié, F., Kokaram, A., Dahyot, R.: 'Enhancement of digital photographs using color transfer techniques', Image Processing Series (CRC Press, Boca Raton, FL, USA, 2008), pp. 295–321
[8] Pitié, F., Kokaram, A.: 'The linear Monge-Kantorovitch linear colour mapping for example-based colour transfer'. 4th European Conf. on Visual Media Production (CVMP 2007), London, UK, 2007, pp. 1–9
[9] Hwang, Y., Lee, J.-Y., So Kweon, I., et al.: 'Color transfer using probabilistic moving least squares'. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 3342–3349
[10] Abadpour, A., Kasaei, S.: 'A fast and efficient fuzzy color transfer method'. Proc. of the IEEE Symp. on Signal Processing and Information Technology, Rome, Italy, 2004, pp. 491–494
[11] Abadpour, A., Kasaei, S.: 'An efficient PCA-based color transfer method', J. Vis. Commun. Image Represent., 2007, 18, (1), pp. 15–34
[12] Kotera, H.: 'A scene-referred color transfer for pleasant imaging on display'. Proc. of the IEEE Int. Conf. on Image Processing, Genoa, Italy, 2005, pp. 5–8
[13] Pitié, F.: 'Statistical signal processing techniques for visual post-production'. Ph.D. thesis, University of Dublin, Trinity College, 2006
[14] Trussell, H.J., Vrhel, M.J.: 'Color correction using principal components', vol. 1452 (SPIE, San Jose, California, USA, 1991), pp. 2–9
[15] Gonzalez, R.C., Woods, R.E.: 'Digital image processing' (Addison Wesley, Boston, USA, 1992)
[16] Grundland, M., Dodgson, N.A.: 'Color histogram specification by histogram warping'. Proc. of the SPIE Color Imaging X: Processing, Hardcopy, and Applications, San Jose, California, USA, 2005, vol. 5667, pp. 610–624
[17] Evans, L.C.: 'Partial differential equations and Monge-Kantorovich mass transfer', Current Dev. Math., 1998, 1997, pp. 65–126
[18] Gangbo, W., McCann, R.: 'The geometry of optimal transport', Acta Mathematica, 1996, 177, pp. 113–161
[19] Villani, C.: 'Topics in optimal transportation', vol. 58 of Graduate Studies in Mathematics (American Mathematical Society, Providence, RI, 2003)
[20] Villani, C.: 'Optimal transport: old and new', vol. 338 (Springer Verlag, Berlin-Heidelberg-New York-Tokyo, 2009), https://doi.org/10.1007/978-3-540-71050-9
[21] Peyré, G., Cuturi, M.: 'Computational optimal transport'. arXiv:1803.00567 [stat], 2018
[22] Rubner, Y., Tomasi, C., Guibas, L.J.: 'The earth mover's distance as a metric for image retrieval', Int. J. Comput. Vis., 2000, 40, (2), pp. 99–121
[23] Pele, O., Werman, M.: 'Fast and robust earth mover's distances'. 2009 IEEE 12th Int. Conf. on Computer Vision, Kyoto, Japan, 2009, pp. 460–467
[24] Hitchcock, F.L.: 'The distribution of a product from several sources to numerous localities', J. Math. Phys., 1941, 20, pp. 224–230
[25] Wu, N., Coppins, R.: 'Linear programming and extensions' (McGraw-Hill, New York, 1981)
[26] Cuturi, M.: 'Sinkhorn distances: lightspeed computation of optimal transport'. Proc. of the 26th Int. Conf. on Neural Information Processing Systems – Volume 2, NIPS'13, Lake Tahoe, USA, 2013, pp. 2292–2300
[27] Olkin, I., Pukelsheim, F.: 'The distance between two random vectors with given dispersion matrices', Linear Algebr. Appl., 1982, 48, pp. 257–263
[28] Brenier, Y.: 'Polar factorization and monotone rearrangement of vector-valued functions', Commun. Pure Appl. Math., 1991, 44, (4), pp. 375–417
[29] Bonneel, N., Rabin, J., Peyré, G., et al.: 'Sliced and Radon Wasserstein barycenters of measures', J. Math. Imaging Vis., 2015, 51, (1), pp. 22–45
[30] Rabin, J., Peyré, G., Delon, J., et al.: 'Wasserstein barycenter and its application to texture mixing'. Scale Space and Variational Methods in Computer Vision, Ein-Gedi, Israel, 2012 (LNCS), pp. 435–446
[31] Levin, A., Lischinski, D., Weiss, Y.: 'A closed-form solution to natural image matting', IEEE Trans. Pattern Anal. Mach. Intell., 2008, 30, pp. 228–242
[32] Pérez, P., Blake, A., Gangnet, M.: 'Poisson image editing', ACM Trans. Graph. (SIGGRAPH'03), 2003, 22, (3), pp. 313–318
[33] Pitié, F.: 'An alternative matting Laplacian'. 2016 IEEE Int. Conf. on Image Processing (ICIP), Phoenix, Arizona, USA, 2016, pp. 3623–3627
[34] Gatys, L.A., Ecker, A.S., Bethge, M., et al.: 'Controlling perceptual factors in neural style transfer'. 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 3730–3738
[35] Li, Y., Liu, M.-Y., Li, X., et al.: 'A closed-form solution to photorealistic image stylization'. ECCV, Munich, Germany, 2018
[36] Luan, F., Paris, S., Shechtman, E., et al.: 'Deep photo style transfer'. 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 6997–7005
[37] Penhouet, S., Sanzenbacher, P.: 'Automated deep photo style transfer'. arXiv:1901.03915 [cs], 2019
[38] Shih, Y.-C., Paris, S., Durand, F., et al.: 'Data-driven hallucination of different times of day from a single outdoor photo', ACM Trans. Graph., 2013, 32, pp. 200:1–200:11
[39] Faridul, H.S., Pouli, T., Chamaret, C., et al.: 'A survey of color mapping and its applications', Eurographics (State of the Art Reports), 2014, 3, http://dx.doi.org/10.2312/egst.20141035
[40] Oliveira, M., Sappa, A.D., Santos, V.: 'A probabilistic approach for color correction in image mosaicking applications', IEEE Trans. Image Process., 2015, 24, (2), pp. 508–523
[41] Park, J., Tai, Y.-W., Sinha, S.N., et al.: 'Efficient and robust color consistency for community photo collections'. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA, 2016, pp. 430–438
[42] Vazquez-Corral, J., Bertalmío, M.: 'Color stabilization along time and across shots of the same scene, for one or several cameras of unknown specifications', IEEE Trans. Image Process., 2014, 23, (10), pp. 4564–4575
[43] Frigo, O., Sabater, N., Delon, J., et al.: 'Motion driven tonal stabilization'. 2015 IEEE Int. Conf. on Image Processing (ICIP), Québec City, Canada, 2015, pp. 3372–3376
[44] Efros, A.A., Freeman, W.T.: 'Image quilting for texture synthesis and transfer'. Proc. of the 28th Annual Conf. on Computer Graphics and Interactive Techniques, SIGGRAPH '01, New York, NY, USA, 2001, pp. 341–346
[45] Liu, J., Yang, W., Sun, X., et al.: 'Photo stylistic brush: robust style transfer via superpixel-based bipartite graph', IEEE Trans. Multimed., 2018, 20, (7), pp. 1724–1737
[46] Chen, L., Papandreou, G., Kokkinos, I., et al.: 'DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs', IEEE Trans. Pattern Anal. Mach. Intell., 2018, 40, (4), pp. 834–848
[47] Grogan, M., Dahyot, R.: 'Robust registration of Gaussian mixtures for colour transfer'. arXiv:1705.06091 [cs], 2017
[48] Grogan, M., Dahyot, R.: 'L2 divergence for robust colour transfer', Comput. Vis. Image Underst., 2019, 181, pp. 39–49
[49] Grogan, M., Dahyot, R., Smolic, A.: 'User interaction for image recolouring using L2'. Proc. of the 14th European Conf. on Visual Media Production (CVMP 2017), London, UK, 2017, p. 6
[50] Silverman, B.W.: 'Density estimation for statistics and data analysis' (Chapman and Hall, Boca Raton, FL, USA, 1986)
[51] Jeong, K., Jaynes, C.: 'Object matching in disjoint cameras using a color transfer approach', Mach. Vis. Appl., 2008, 19, (5–6), pp. 443–455
[52] Xiang, Y., Zou, B., Li, H.: 'Selective color transfer with multi-source images', Pattern Recognit. Lett., 2009, 30, (7), pp. 682–689
[53] Xu, S., Zhang, Y., Zhang, S., et al.: 'Uniform color transfer'. IEEE Int. Conf. on Image Processing 2005, Genoa, Italy, 2005, vol. 3, p. II-940
[54] Neumann, L., Neumann, A.: 'Color style transfer techniques using hue, lightness and saturation histogram matching'. Proc. of Computational Aesthetics in Graphics, Visualization and Imaging, Girona, Spain, 2005b, pp. 111–122
[55] Papadakis, N., Provenzi, E., Caselles, V.: 'A variational model for histogram transfer of color images', IEEE Trans. Image Process., 2011, 20, (6), pp. 1682–1695
[56] Papadakis, N.: 'Optimal transport for image processing'. Habilitation thesis (Habilitation à diriger des recherches), Université de Bordeaux, 2015
[57] Freedman, D., Kisilev, P.: 'Object-to-object color transfer: optimal flows and SMSP transformations'. 2010 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 2010, pp. 287–294
[58] Ferradans, S., Papadakis, N., Peyré, G., et al.: 'Regularized discrete optimal transport', SIAM J. Imag. Sci., 2014, 7, (3), pp. 1853–1882
[59] Bae, S., Paris, S., Durand, F.: 'Two-scale tone management for photographic look', ACM Trans. Graph. (Proc. ACM SIGGRAPH 2006), 2006, 25, (3), pp. 637–645
[60] Li, Y., Sharan, L., Adelson, E.H.: 'Compressing and companding high dynamic range images with subband architectures', ACM Trans. Graph., 2005, 24, (3), pp. 836–844
[61] Sunkavalli, K., Johnson, M.K., Matusik, W., et al.: 'Multi-scale image harmonization', ACM Trans. Graph., 2010, 29, (4), pp. 125:1–125:10
[62] Gatys, L.A., Ecker, A.S., Bethge, M.: 'Image style transfer using convolutional neural networks'. 2016 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, USA, 2016, pp. 2414–2423
[63] Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition'. Int. Conf. on Learning Representations, San Diego, CA, USA, 2015
[64] Li, Y., Wang, N., Liu, J., et al.: 'Demystifying neural style transfer'. Proc. of the 26th Int. Joint Conf. on Artificial Intelligence, IJCAI'17, Melbourne, Australia, 2017, pp. 2230–2236
