Advances in Colour Transfer: Francois Pitié
Francois Pitié1
1Electronic & Electrical Engineering Department, Trinity College Dublin, Ireland
E-mail: pitief@tcd.ie
Abstract: Colour grading is an essential step in movie post-production, which is done in the industry by experienced artists on
expensive edit hardware and software suites. This paper presents a review of the advances made to automate this process.
The review looks in particular at how the state-of-the-art in optimal transport and deep learning has advanced some of the
fundamental problems of colour transfer, and how far we still are from being able to automatically grade images.
1 Introduction

Colour grading is the artistic process of changing the colour attributes of an image, through the adjustment of colour controls, such as white balance, exposure, brightness, contrast or gamma. It is an essential step in image editing and it is used at different points in the movie production pipeline. It is first used to colour correct shots to reproduce accurately what was recorded. It is then used to colour balance all the shots and establish continuity throughout the movie. Colour grading can also be used later on towards achieving specific practical effects, such as when filming nighttime scenes in daylight. Lastly, colour grading is used to establish a desired visual artistic style. The task is to change the original colours, to match the director's desired colour scheme, whether it is the monochromatic colour scheme of The Matrix, the variations of pink in Grand Budapest Hotel or the complementary orange/blue colours of most action movies.

Currently, in the industry, colour grading is achieved by experienced artists, called colourists, who use expensive edit hardware and software suites to manually match the colour between the frames. Typical colour grading operations include adjusting the exposure, brightness and contrast, gammas, calibrating the white point or setting a colour mapping curve to non-linearly affect the colour channels. More complex colour grading will require retouching specific areas of the images. Typically the object of interest (an actor's face or a car) would be masked out, tracked throughout the shot, and graded separately from the rest of the scene. Grading then becomes a fairly complex operation and it would therefore be beneficial to automate this task in some way.

In the academic community, the problem of automating this process was first approached as an example-based colour transfer application. The idea, coined by Reinhard et al. [1] (2002), is to find a colour mapping that, when applied to an input image, can change the colour palette to match the one of a target image (see Fig. 1). Early works mainly looked at per-channel solutions [1, 2], but it is only when the problem was linked to the theory of optimal transport (OT) [Morovic and Sun [3] (2003), Pitié et al. [4] (2005)] that satisfying solutions for colour images were found. Most of the literature that followed is still based on or related to the mathematical framework of OT.

Colour transfer does not, however, stop at just finding a mapping that links two colour distributions. In practice, a number of issues also need to be addressed. In a series of papers on the topic, Pitié et al. [4–8] (2005, 2007, 2008) identified some of these specific issues. One is that applying a non-linear colour mapping can cause artefacts that need to be filtered out in post-processing steps. Another is that any difference of content between the input and target images can cause issues during the colour transfer, and mitigating strategies must be put in place.

In this review, we study the recent advances in colour transfer, OT and deep learning for the application of colour grading images, and see how the literature has advanced on these issues.

Organisation of the review: we will first give some context to the application and go through the core colour grading operations used by the artists and see how these impact the colour distributions (Section 2). The colour transfer problem is then introduced in Section 3 and linked to the theory of OT in Section 4. Recent advances will then be discussed in Sections 5 and 6, and assessed in Section 7.

2 Background

Colour correction usually refers to the adjustment of white and black levels, exposure, contrast and white balance that are applied to restore the image's original and unprocessed colours. The aim of colour correction is to provide a visually consistent baseline on which subsequent colour adjustments can be accurately applied. All footage and scenes are first carefully matched during that step to create visual consistency throughout scenes.

Part of the need for colour correction stems from colour management issues. On the same production, footage may come from different cameras, each camera using its own native input colour space (e.g. Red Log Film, BMD Film Log, S-Log 2 etc.).
Also, various footages may be sent around to different post-
production teams, which may apply their own colour corrections or
might be working in different colour spaces. Still, the final output
has to look the same on any display, let it be a TV display or
cinema display. Adopting consistent colour management practices
has been a big challenge for the industry, and to standardise the
colour management workflows for production and post-production,
the Academy of Motion Picture Arts and Sciences introduced in
2014 the Academy Colour Encoding System (ACES), which aimed
at becoming the industry standard for managing colour across the entire film making process. Typically artists will start from predefined profiles and then work their way to match a precise colour palette.

Fig. 1 Colour transfer example. The source image is recoloured to match the colour palette of a target image

2.1 Basic colour grading controls

The basic colour grading operations available to colourists are usually presented to the artists through colour wheel interfaces (see Fig. 2), which help the artist set the lift, gain and gamma for each colour channel. Histograms for each colour channel give the artist useful feedback on their grading operation. Assuming floating-point bitdepth, the basic colour grading operation is defined as follows:

$$t: [0,1]^3 \to [0,1]^3, \quad \begin{pmatrix} u_r \\ u_g \\ u_b \end{pmatrix} \mapsto \begin{pmatrix} v_r \\ v_g \\ v_b \end{pmatrix} = \begin{pmatrix} t_r(u_r) = (a_r u_r + b_r)^{\gamma_r} \\ t_g(u_g) = (a_g u_g + b_g)^{\gamma_g} \\ t_b(u_b) = (a_b u_b + b_b)^{\gamma_b} \end{pmatrix} . \tag{1}$$

The offsets $b_r$, $b_g$, and $b_b$ are sometimes called lifts and control the overall brightness for each channel.

Gamma: The gamma values $\gamma_r$, $\gamma_g$, and $\gamma_b$ can be used to convert to display gamma, which is usually 2.2, but they can also be used to control the midtones.

Gain: The gain values $a_r$, $a_g$, and $a_b$ can be used to control the colour tint, temperature, and exposure. For instance, the exposure can be changed as follows:

$$t(u) = 2^{\text{exposure}} \times u . \tag{2}$$

The white balance/tint can be changed by setting the gains to $[a_r, a_g, a_b] = [1/w_r, 1/w_g, 1/w_b]$, where $w = [w_r, w_g, w_b]$ defines the red, green, and blue (RGB) values of white. Typically the artist would get these values by sampling in the image a pixel that should be white (using for instance a calibration card).

The gain, lift, and gamma parameters are interdependent, but it may be easier to think that lift primarily affects the shadows, whilst gamma affects the midtones, and gain affects the highlights.

Colour matrix: Cross-channel interactions can also be defined through a simple affine model:

$$t(u) = Au + b \tag{3}$$

where $A$ is the 3 × 3 matrix of all cross-channel linear dependencies. For instance, colour desaturation can be expressed as:

$$t(u) = \begin{pmatrix} (1-s)\,0.2 + s & (1-s)\,0.7 & (1-s)\,0.1 \\ (1-s)\,0.2 & (1-s)\,0.7 + s & (1-s)\,0.1 \\ (1-s)\,0.2 & (1-s)\,0.7 & (1-s)\,0.1 + s \end{pmatrix} \begin{pmatrix} u_r \\ u_g \\ u_b \end{pmatrix} \tag{4}$$

where $s \in [0, 2]$ controls the amount of colour saturation.

Curves: Colourists may need to go beyond these basic controls and manually define one-dimensional (1D) piece-wise polynomial colour mappings by directly specifying spline control points (see Fig. 2 bottom).

Colour lookup table (LUT): The combined effect of all successive grading operations can be baked into a single LUT. The LUT can be 1D if the colour manipulations are done independently on each channel. The LUT needs to be 3D if cross-channel operations are involved. LUTs are commonly distributed to achieve colour space conversion or to apply a specific grading profile.

Secondary and localised grading: Following a global colour grade, the colourists may need to proceed to a secondary grade, which is applied to selected areas of the image. For instance, the specific grade could be applied to the yellow pixels only. Alternatively, the grade may need to be applied to a particular object, or person. In that case, the artist would need to pull a mask of the object and track it throughout the video.

2.2 Colour grading and colour statistics

There is a key relationship between histograms, brightness, contrast and the colour grading that is applied. Colourists have some intuition for it, as they monitor the effect of the colour grade on the histograms. Let us detail the mathematics of this relationship.

Assume that we apply an affine colour mapping $u \mapsto v = t(u) = au + b$ on the luminance alone. The mean intensity and variance are directly affected by the transform as follows:

$$\mathrm{var}(v) = \mathrm{var}(t(u)) = a^2\,\mathrm{var}(u) \tag{5}$$

$$\mathrm{mean}(v) = \mathrm{mean}(t(u)) = a \times \mathrm{mean}(u) + b . \tag{6}$$

What we learn from this is that changing the lift $b$ directly affects the overall brightness of the image. Conversely, changing the gain $a$ affects the variance and thus the overall contrast of the image.
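The basic controls of (1)–(4) are straightforward to prototype. The sketch below is a toy illustration in Python/NumPy (not production grading code, which would typically operate on log-encoded footage through a LUT); it implements the per-channel lift/gain/gamma operation of (1) and the desaturation matrix of (4):

```python
import numpy as np

def grade(u, gain, lift, gamma):
    """Per-channel grading of (1): v_c = (a_c * u_c + b_c) ** gamma_c.
    u is an (..., 3) float RGB array with values in [0, 1]."""
    v = np.clip(np.asarray(gain) * u + np.asarray(lift), 0.0, 1.0)
    return v ** np.asarray(gamma)

def desaturate(u, s):
    """Colour matrix of (4): s = 0 collapses to grey, s = 1 is the identity."""
    A = (1.0 - s) * np.array([[0.2, 0.7, 0.1]] * 3) + s * np.eye(3)
    return u @ A.T

# one teal pixel, graded with a warm gain, a small red lift and a display gamma
u = np.array([[0.2, 0.5, 0.6]])
v = grade(u, gain=[1.1, 1.0, 0.9], lift=[0.02, 0.0, 0.0], gamma=[1.0, 1.0, 1 / 2.2])
g = desaturate(u, s=0.0)  # all three channels collapse onto the luma
```

With `s = 0` every pixel lands on its luma (here 0.2R + 0.7G + 0.1B), and with `s = 1` the matrix is the identity, matching the behaviour described after (4).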
2.3 Remarks
At this point, we can already make two remarks.
Firstly, colour statistics can be gathered without taking notice of
the exact locations of the sampled colours. The colour mean,
variance, and histograms are all relatively insensitive to motion and
stay similar between consecutive frames. This is interesting as
colour matching is usually done between pictures with similar
content but not pixel identical (e.g. two frames from a sequence,
two images of a similar cityscape).
Fig. 4 OT assignment for discrete distributions

Secondly, extremely stretched colour mappings do have an impact on the image noise. Noise is always present in images, due to a combination of sensor thermal noise, debayer algorithms and subsequent compression artefacts, and applying stretched colour mappings may enhance the noise artefacts. For instance, it is well known by colour artists that noisy low-light images are difficult to grade, as increasing the exposure has the effect of increasing the noise accordingly (6).
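The relationships (5) and (6), and the noise amplification they imply, can be checked numerically. A minimal sketch, with synthetic luminance samples standing in for a real image:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(0.3, 0.05, size=100_000)  # noisy low-light luminance samples

a, b = 4.0, 0.1            # strong gain (+2 stops of exposure) and a small lift
v = a * u + b

# (6): the mean is scaled by the gain and shifted by the lift
assert np.isclose(v.mean(), a * u.mean() + b)
# (5): the variance -- and hence the noise power -- grows with the gain squared
assert np.isclose(v.var(), a**2 * u.var())
```

Doubling the exposure quadruples the noise variance, which is exactly why noisy low-light footage is hard to push.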
3 Colour transfer
We are now interested in the inverse problem of automatically
finding the colour grading operations to apply to a picture, to
match the colour grade of another one. This is the colour transfer
problem, as first coined by Reinhard et al. [1] (2002).
If we denote as $u_i$ the colour samples of the original image and $v_i$ the colour samples of the target palette image, the colour palettes of the original and target pictures then correspond to the distributions/histograms $h_u$ and $h_v$. The problem is to find a colour mapping $u \mapsto t(u)$ such that the new colour distribution $h_{t(u)}$ matches the target palette $h_v$.
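For a single channel, this distribution-matching problem has a classical closed-form answer: map each value through the input CDF and back through the inverse target CDF. A minimal sketch of this per-channel transfer, in the spirit of the early per-channel solutions [1, 2] (the function names are ours):

```python
import numpy as np

def match_channel(u, v, n=256):
    """Map channel u so that its histogram matches channel v's,
    via CDF (quantile) matching -- the 1D optimal transport map."""
    qs = np.linspace(0.0, 1.0, n)
    src = np.quantile(u, qs)          # inverse CDF of the input
    dst = np.quantile(v, qs)          # inverse CDF of the target
    return np.interp(u, src, dst)     # t = F_v^-1 o F_u

def match_palette(U, V):
    """Independent per-channel transfer on (N, 3) colour sample lists."""
    return np.stack([match_channel(U[:, c], V[:, c]) for c in range(3)], axis=1)
```

Applying the mapping independently on each channel ignores the cross-channel correlations of the palette, which is precisely the limitation that motivated the OT formulations discussed next.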
$$\Sigma_u^{1/2} = P_u D_u^{1/2} P_u^\top, \quad \text{with } \Sigma_u = P_u D_u P_u^\top \text{ the spectral decomposition of } \Sigma_u .$$

Note that the estimated mapping is not necessarily the same as the one that would be found using least squares with matching pixels. The intention here is to match the target palette, not the exact transformation of each pixel. In fact, many mappings can achieve the transfer of colour distributions.

$$t: u_i \to v_{\sigma(i)} \tag{16}$$

$$E(\sigma) = \sum_i c\!\left(u_i, v_{\sigma(i)}\right) \tag{17}$$

where $c(u, v)$ is the transportation cost function, which is typically $c(u, v) = \|u - v\|^2$ or $c(u, v) = \|u - v\|$. For continuous distributions, the Monge solution minimises the continuous counterpart of this objective function.
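For small sample sets, the discrete problem (16)–(17) can be solved exactly by the Hungarian algorithm. A sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
U = rng.random((50, 3))   # input colour samples u_i
V = rng.random((50, 3))   # target colour samples v_j

# cost c(u, v) = ||u - v||^2, as in (17)
C = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)

row, sigma = linear_sum_assignment(C)   # optimal permutation sigma
E = C[row, sigma].sum()                 # minimal transport cost E(sigma)

# spot-check: no random permutation does better
for _ in range(100):
    p = rng.permutation(50)
    assert C[np.arange(50), p].sum() >= E - 1e-9
```

For full images, $N$ runs into the millions and the exact assignment becomes intractable, which is why approximate schemes such as IDT [6] matter in practice.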
Fig. 9 Results of colour grading post-processing using Pitié et al. [6] and Levin et al. [31]

method is computationally very efficient and can deal with a very large number of data points, and the distribution does not need to be segmented into clusters, while being very close to the OT solution [6].

5 Colour grading post-processing

Figs. 8 and 9 show that applying a colour mapping may produce some grainy artefacts. The artefacts are caused by a stretch of the colour mapping, as any slight variations in the original image become magnified. As discussed in Section 2.2, the image variance is affected by the stretch as $\mathrm{var}(au) = a^2\,\mathrm{var}(u)$. Stretches in the colour map can be due to large differences in the content of the images (see Fig. 10) or simply because the desired transfer is actually extreme. It is often difficult to fully address this problem before applying the transfer.

5.1 Gradient matching

This problem was first recognised by Pitié et al. [4, 6] (2005). The proposed solution was inspired by the work of Pérez et al. [32] (2003) on Poisson image editing. The idea is to loosely match the gradient field of the graded picture to the one of the original picture. That way the structure of the original image is preserved. We present here a slightly simplified version of the energy optimised in [4]:

$$E(W) = \iint_\Omega \|\nabla W - \nabla U\|^2 + \lambda\,\|W - t(U)\|^2 \; \mathrm{d}x\,\mathrm{d}y . \tag{24}$$

The term $\|\nabla W - \nabla U\|^2$ forces the input image gradient to be preserved and $\|W - t(U)\|^2$ preserves the constructed colour mapping (for simplicity, the spatial indices $x, y$ have been omitted).
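Setting the derivative of (24) to zero gives a linear system: with $V = t(U)$ and $L = D^\top D$ the Laplacian built from a difference operator $D$, the minimiser satisfies $(L + \lambda I)W = LU + \lambda V$. Below is a toy 1D version of this idea (our own simplification; the actual filter works on 2D images and includes weight fields):

```python
import numpy as np

def gradient_preserving_filter(u, v, lam):
    """Minimise ||grad(w) - grad(u)||^2 + lam * ||w - v||^2 for a 1D signal,
    a toy analogue of (24) with v = t(u).
    Normal equations: (L + lam * I) w = L u + lam * v, with L = D^T D."""
    n = len(u)
    D = np.diff(np.eye(n), axis=0)   # forward-difference operator
    L = D.T @ D                      # 1D graph Laplacian
    return np.linalg.solve(L + lam * np.eye(n), L @ u + lam * v)

# a "graded" signal v: stretched and noisy, whose grain we want to suppress
x = np.linspace(0.0, 1.0, 200)
u = np.sin(4 * x)                                             # original signal
v = 2 * u + 0.3 * np.random.default_rng(2).normal(size=200)  # mapped + noise
w = gradient_preserving_filter(u, v, lam=0.1)
```

A large `lam` returns the mapped signal essentially unchanged, while a small `lam` keeps the gradients (and hence the grain) of the original, which is the trade-off the weight fields of [4] control spatially.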
Fig. 11 Estimation of Levin et al. [31]'s local gain maps $a_{rr}, a_{rg}, a_{rb}, a_{gr}, a_{gg}, a_{gb}, a_{br}, a_{bg}, a_{bb}$ and offset maps $b_r, b_g, b_b$ for the image pairs of Fig. 8, with the explicit approach of Pitié [33]. The gain and offset maps are shown as a colour picture (e.g. red = $a_{rr}/5 + 0.5$, green = $a_{rg}/5 + 0.5$, and blue = $a_{rb}/5 + 0.5$)
The original paper [4] also includes weight fields which control where the gradient does not need to match the input image. The energy can be rewritten as a quadratic function:

$$E(W) = \mathrm{tr}\!\left(W^\top L W\right) + \|V - W\|^2 \tag{25}$$

where the sparse matrix $L$ is the Laplacian matrix.

One issue with this approach is that, by fixing the gradient values, we are essentially assuming that the mapping is locally a simple offset $t(u) = u + b$. This means that, locally, the image contrast stays the same, regardless of $t(u)$. We can see on the last row of Fig. 9 that the portrait looks slightly blurred. The reason is that the overall contrast is not coherent with the local contrast.

5.2 Matting Laplacian

To be able to change the contrast, another approach is to make local approximations of the colour grading with affine colour transforms. As affine transforms have a constant stretch $\det J_t(u) = |\Sigma_v|^{1/2} / |\Sigma_u|^{1/2}$, they do not produce visible artefacts (besides a possible increase of the noise). By only making local affine approximations, the impact on the overall colour distribution is minimal. Ideally, each local affine transform should be consistent with the neighbouring affine transformations. We can formalise this using the Matting Laplacian framework of Levin et al. [31] (2008). The idea is to jointly optimise for the local affine parameters $A_i, b_i$ at each pixel $i$ and the filtered image $W$.
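The local affine model at the heart of this framework is easy to probe on its own: over a single window of an aligned image pair, the parameters $A, b$ are a plain least-squares fit. A toy sketch of that per-window fit (our own illustration, not Levin et al.'s joint optimisation, which couples all windows through the Matting Laplacian):

```python
import numpy as np

def local_affine(U, V, i0, i1, j0, j1):
    """Least-squares affine model v = A u + b over one window of an
    aligned (H, W, 3) image pair."""
    u = U[i0:i1, j0:j1].reshape(-1, 3)
    v = V[i0:i1, j0:j1].reshape(-1, 3)
    X = np.hstack([u, np.ones((len(u), 1))])    # design matrix [u | 1]
    M, *_ = np.linalg.lstsq(X, v, rcond=None)   # (4, 3): stacked A^T and b
    return M[:3].T, M[3]                        # A is 3x3, b is length 3

# synthetic pair: V genuinely is an affine grade of U, plus a little noise
rng = np.random.default_rng(3)
U = rng.random((32, 32, 3))
A_true = np.array([[0.9, 0.1, 0.0], [0.0, 1.1, 0.0], [0.0, 0.0, 0.8]])
b_true = np.array([0.05, 0.0, 0.02])
V = U @ A_true.T + b_true + 0.001 * rng.normal(size=U.shape)

A, b = local_affine(U, V, 0, 16, 0, 16)   # recovers A_true, b_true closely
```

The nine entries of each recovered $A$ and the three of $b$ are exactly the local gain and offset maps visualised in Fig. 11.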
Fig. 12 Example when the OT solution differs from the ground truth mapping. On the left, the original image; in the middle, the desaturated image following (4) with s = 1. On the right, the results of the OT solution when using the middle image as a target. Brightness values do differ. At the locations highlighted in red and green, the grey values are 0.4863 and 0.465 in the ground truth, but 0.4756 and 0.4471 in the OT solution
Fig. 14 Style transfer results (from [62]). The stylisation effect on the top row is impressive. By increasing the content fidelity λ, bottom row, a more realistic
colour grading can be obtained. Stylisation artefacts, however, can still be visible
could possibly account for complex shadows and intricate lighting effects.

An example of this problem is demonstrated in Fig. 13, where we try to match the first image of the sequence to photographs taken later during the day. All the pixel locations are aligned and we can directly estimate local affine transforms between both images. To do this we use Levin's Matting Laplacian framework. Even in this best case scenario, where we can find correspondences for each pixel, it is clear that the quality of the grade deteriorates with the day-to-night transition. What is happening is that there is no affine colour transform that can predict how pixels will be illuminated at night.

This means that colour transfer may need to operate in a different space than the colour space, as colour alone is not enough to identify how a pixel should be recoloured.

6.2 Guiding the colour assignments

In this section, we look at the solutions proposed in the literature to guide the colour transfer and deal with colour mismatch and content differences between images.

6.2.1 Pixel correspondences: One approach to avoid misassignments is to establish pixel correspondences between the input and target images and use these to constrain the colour
term. They also introduce semantic segmentation masks to guide the colour transfer.

A few colour transfer examples are shown in Figs. 15–21. Luan et al.'s technique is labelled as DPST. What is remarkable is that complex colour inversions are possible. For instance, in Fig. 16-⩾ and Fig. 18-⑳, DPST can selectively recolour the buildings at night time.

One notable issue with the approach is that generating a picture requires a costly optimisation. Also, gradient descent is not guaranteed to reach the global minimum. Looking at the results, we find that strong re-colouring choices are being made and colours tend to be too intense. We suspect that this is because of a poor convergence of the gradient descent optimisation. Examples showing optimisation failures can be seen in Fig. 20-37 or Fig. 19-44.

6.5 Using an encoder–decoder architecture for a closed-form solution

To avoid this slow optimisation framework, Li et al. [35] have trained a decoder network that, given some deep features $\varphi$, can predict a pre-image $X$ such that $\phi(X) = \varphi$. The network is trained using an encoder–decoder architecture (see Fig. 22). The encoder is VGG-19 and the decoder learns to reconstruct the input image. The idea is then to start from the input image $U$, compute its deep features $\phi(U)$, apply a simple linear pdf transfer on the deep feature distributions and then come back to the image domain using the decoder network.

7 Results

The difficulty with evaluating colour transfer approaches is the lack of an accepted objective benchmark on which the research community could compare their results. Part of the reason is that any evaluation requires an artistic assessment of the transfer. In fact, recent works do start to include user studies to report user preference scores.

In this review, we are mainly interested in the recent advances in colour transfer that address complex colour style transfers for
images. We are thus not going to review colour corrections that only necessitate global affine or gamma corrections. We will instead consider example-based colour style transfer applications. We selected a number of challenging input/target image pairs from the list of 60 image pairs provided in [36]. The images are presented in Figs. 15–21. The circled numbers on the figures correspond to the pair index in that dataset.

The list of evaluated methods and their label in the figures are detailed in Table 1. The range of methods covers most of the significant advances and approaches in colour transfer. We indicate with the superscript ∗ when the method is followed by the Matting Laplacian filter as a post-process.

IDT (Pitié et al. [6], 2005) is a reference in the field and is our baseline approximation of the OT solution.

GMM (Grogan and Dahyot [48], 2019). The method takes a Gaussian mixture model approximation of the colour distribution and also makes a TPS (thin-plate spline) approximation for the colour mapping. The technique also equalises the colour cluster sizes, as discussed in Section 6.2.4.

The technique operates by matching the 50 most important colour clusters in both images. It is particularly faithful when assigning colour matches (see Fig. 15-② for instance). The cluster size equalisation makes it however prone to mismatch (see for instance the colour inversion in Fig. 17-⑬). Note that Grogan and Dahyot [48] are interested in an interactive application, thus it is expected that the colourist will manually fix these incorrect assignments [48].

One other issue is that TPS is too smooth to be able to capture changes across regions that have close colours (see for instance Fig. 16-⩾, where the sky and the building are mapped together, instead of separately).

OTreg (Ferrandans et al. [58], 2013). The method is based on a regularised OT framework. The framework relaxes the constraint that cluster sizes need to match and also adds a penalty to preserve the image structure.

The method operates on colour clusters, which can cause some banding artefacts (see Fig. 23), as, contrary to Grogan and Dahyot [48], there is no implicit smoothness imposed on the transform. Note however that this can be easily addressed by using the Matting Laplacian filter as a post-process (OTreg*). For most cases, the results, after filtering, are very similar to the baseline IDT approach of Pitié et al. [6].

MKL (Pitié and Kokaram [8], 2007). This is the linear OT solution. The colour distributions are approximated as multivariate Gaussians.

The results are similar to the baseline IDT approach of Pitié et al. [6] but they usually lack the contrast or the colour accuracy of non-linear transforms.

DPST (Luan et al. [36], 2017). The method is the deep photo style transfer approach discussed in Section 6.4. The method gives impressive results on some difficult image pairs, especially for day-to-night transitions (see Fig. 16-⩾ and Fig. 18-⑳). The convergence of the method is an issue, however, and strong artefacts can be seen in Fig. 20-37, Fig. 19-44.

PWCT (Li et al. [35], 2018). Closed-form deep colour transfer approach (see Section 6.5).

The encoder–decoder structure allows for a fast colour transfer. However, the raw results are very noisy (see Fig. 24). Once the images have been post-processed (see PWCT*), the results are quite good and differ in interesting ways from the baseline OT solutions. For instance, contrary to GMM, OTreg, MKL, IDT, the method can, to some extent, deal with day-to-night transitions (see the bridge in Fig. 16-⩾, the house in Fig. 18-⑳ and the sky in Fig. 16-⩽ and Fig. 18-24).

The method relies, however, heavily on the Matting Laplacian filter and noisy artefacts still show through after filtering (see the textured sky in Fig. 15-② for instance).

Semantic labelling. In addition to the state-of-the-art, we also wanted to see how providing semantic labels improves the quality of the colour transfer. We modified the code for IDT, MKL and OTreg to also operate on semantic masks. We indicate that a method relies on semantic masks with the subscript s. For instance, in IDTs∗, we apply IDT independently on each of the semantic labels and then post-process the composite image using the Matting Laplacian filter.

Most of the day-to-night transfers can be suitably achieved if semantic masks are given, even when using the baselines IDT and MKL. Also, potential issues related to content changes disappear (see for instance Fig. 17-⑮), and colour inversions are possible (see Fig. 20-49 and Fig. 20-50). New artefacts can, however, be produced if the masks are incorrect (e.g. the sky in Fig. 21-59 is split into different segments, creating the white band on the right).

Also, providing semantic masks nullifies the need for regularising the OT solution or for equalising the cluster weights. Indeed the regularisation of OTregs∗ acts as a bias in the estimation, and in most situations, the simple IDTs∗ is superior (e.g. the colours are better reproduced in Fig. 20-50, Fig. 15-① or Fig. 15-②).

8 Discussion and conclusion

As we have seen in this review, OT is still the cornerstone of colour transfer applications. It is almost always necessary because precise, dense and complete pixel correspondence maps are simply never available. Abstracting the analysis to distributions allows us to extend solutions to all pixels in the image. Two of the long-standing issues with OT methods are that: (i) due to content changes, distributions rarely match; and (ii) monotonic OT solutions cannot cope with complex transfers, like in day-to-night transitions.

The addition of semantic masks seems to make an instant improvement to all OT methods. It is a very efficient and affordable way of dealing with content mismatches and complex localised grading. The advantages are also that semantic masks can easily be edited by the colourist and integrated into their workflow. When using semantic masks, it seems to be advantageous to stick to simple, unbiased, methods (e.g. IDT [6], MKL [8]), as making assumptions about the shape of the transform and colour distributions leads to biases in the estimate and can cause failures in edge cases.

The recent advances in deep learning offer the most promising avenue to go beyond colour mapping. By working directly in the deep feature space, complex transforms become possible. The approach is still relatively novel, and there is scope to address a number of issues. For instance, if the optimisation scheme of Luan
et al. [36] is not practical, the encoder–decoder approach of Li et al. [35] is quite imprecise and relies heavily on the post-processing filter. A better approach is still required.

Lastly, Levin's Matting Laplacian regularisation can be very effective for suppressing colour grading artefacts. We propose that explicitly estimating its local affine parameter fields [33] can provide colourists a simple way of editing non-global colour transfer maps. Yet, as we have seen in this review, the Matting Laplacian cannot work for day-to-night transitions or other complex transfers. At its core, the Matting Laplacian assumes that the colour transfer can be explained by a simple local one-to-one colour mapping. This is however not always the case. The solution to go beyond the Matting Laplacian will probably come from neural network approaches, as they seem to hold the key to complex image manipulations.

Fig. 20 Colour grading results for the image pairs 37, 49, 50 of [36], see Table 1 for the list of methods

IET Comput. Vis., 2020, Vol. 14 Iss. 6, pp. 304-322
This is an open access article published by the IET under the Creative Commons Attribution-NonCommercial-NoDerivs License
(http://creativecommons.org/licenses/by-nc-nd/3.0/)
Fig. 23 Banding artefacts can arise when working with clusters, such as in Ferrandans et al. [58] (2013)
Fig. 24 Results of the encoder–decoder-based colour transfer of Li et al. [35] (2018), prior to post-processing. Severe artefacts need to be smoothed out by
the Matting Laplacian filter