
University of Ljubljana

Faculty of Mathematics and Physics

Department of Physics

Seminar Ib - 1st year, 2nd cycle

Eulerian Video Magnification

Author: Tilen Brecelj


Mentor: doc. dr. Daniel Svenšek
Ljubljana, April 2013

Abstract

The seminar presents Eulerian Video Magnification, a computational method that reveals subtle temporal motions and colour changes in videos that are impossible or very difficult to see with the naked eye. It describes how this method analyses a video sequence by applying spatial decomposition and temporal filtering to the video frames, amplifying the perceived motions or colour changes that would otherwise remain unseen.
Contents

1 Introduction

2 Spatial Resolution of the Human Eye and Cameras

3 The basic idea

4 Mathematical basis
4.1 Signal amplification
4.2 Amplification bounds

5 Filter and parameters selection

6 Sensitivity to noise

7 Conclusion

References

1 Introduction

Eulerian Video Magnification [1] is a computational method that reveals subtle temporal motions and colour changes in videos that are impossible or very difficult to see with the naked eye. The method analyses a video sequence by applying spatial decomposition and temporal filtering, which reveals changes in specific temporal intervals, to the video frames. The perceived subtle motions or colour changes are then amplified and displayed in a way that makes them easily visible to the naked eye. Subtle, (almost) invisible motion can give us a lot of important information. For example, human skin slightly changes colour with heartbeats and blood circulation; this cannot be seen with the naked eye, but can serve to extract the pulse rate or diagnose an irregularity in blood flow. There are also many interesting motions with very low spatial amplitudes that cannot be seen with the naked eye, but can reveal a lot about certain mechanisms or mechanical behaviours, which can serve different fields of research such as medicine, biology, and engineering.

2 Spatial Resolution of the Human Eye and Cameras

To get an idea of why something cannot be seen with the naked eye but can be perceived by a camera, let us take a look at the capabilities of the human eye and of cameras to distinguish subtle motions. Let us compare their spatial resolutions. Spatial resolution describes the ability of an imaging device or organ to distinguish between two points located at a small angular distance: the smaller the angular distance, the higher the spatial resolution and the more detail can be seen. To describe spatial resolution mathematically, we can use the Rayleigh criterion. When light enters a lens, a diffraction pattern is produced. The Rayleigh criterion defines the minimal distance between the diffraction patterns of two distinguishable objects as the distance at which the diffraction minimum of one source point coincides with the diffraction maximum of the other source point. According to the Rayleigh criterion, the minimal angular distance Θ that must separate two distinguishable points of an object is
Θ = 1.22 λ/D, (1)
where λ is the wavelength of the light and D is the diameter of the light-gathering region - the eye pupil or the camera aperture. In our case the wavelength will be taken as λ = 550 nm, because it is the approximate wavelength of the human eye's greatest sensitivity. The eye pupil's diameter depends on the brightness of the environment and varies from 3 to 9 mm, but for our calculations the smallest possible diameter will be used, at which our angular resolution is the smallest, so Dh = 3 mm. Inserting these two parameters into equation 1 and presuming the best observational conditions and very good sight, we get the human spatial resolution, Θh = 2.24 × 10−4 rad or 1.28 × 10−2 deg, which is approximately 46”. For a better illustration, this is the angular distance between two points that are 1 mm apart and about 4.46 m away from the eye. The properties of the human eye's light-perceptive cells have no influence on the spatial resolution. The spatial resolution on the light-perceptive cells can be expressed as ∆l = Θf, where f is the focal length of

the eye and is approximately 22 mm. Inserting the angular resolution calculated above and the focal length, we get ∆l ≈ 5 × 10−6 m. In comparison, the size of a light-perceptive cell is about 2 × 10−6 m, small enough that the resolution is indeed limited by the pupil's diameter and not by the cells themselves.
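As a quick sanity check, these numbers can be reproduced with a few lines of Python (a minimal sketch; the values of λ, the pupil diameter, and the focal length are the ones assumed above):

```python
import math

wavelength = 550e-9   # m, approximate peak sensitivity of the human eye
pupil_d = 3e-3        # m, smallest pupil diameter
focal_eye = 22e-3     # m, approximate focal length of the eye

# Rayleigh criterion, equation (1)
theta = 1.22 * wavelength / pupil_d
print(f"angular resolution: {theta:.2e} rad = "
      f"{math.degrees(theta) * 3600:.0f} arcsec")

# linear resolution on the light-perceptive cells
print(f"resolution on the retina: {theta * focal_eye:.1e} m")
```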
Let us now take a look at the spatial resolution of cameras, which can also depend on their detector properties. That is why we have to discuss the spatial resolution on the camera detector, ∆l, as a consequence of both the camera aperture size and the sensor pixel size. The spatial resolution on the detector is expressed as ∆l = Θf, where f is the focal length of the camera. Inserting equation 1, we get

∆l = 1.22 (f/D) λ, (2)

where f/D is a typical property of cameras called the focal ratio (or relative aperture), which for middle-class cameras varies from 1.4 to 22. We will take the smallest ratio for our calculations, to get the minimal distance ∆l (and thus the greatest spatial resolution). Inserting these numbers into equation 2, we get ∆l = 0.94 µm. A typical detector with a pixel size of approximately 5 µm would not be able to distinguish two points this close, which means that the resolution of a middle-class camera is not determined by the diameter of its aperture but by the pixel size of its detector. Let us assume that the distance between two photons on the detector, ∆ldet, must be two pixel sizes for them to be perceived as photons from different sources, so ∆ldet ≈ 10 µm for an average middle-class camera. This way they can be detected by two pixels that lie one pixel apart. Knowing that a typical focal length f of a middle-class camera is about 15 mm, we can calculate the spatial resolution of an average camera, determined by the pixel size, as Θ = ∆ldet/f. This estimate leads to Θ ≈ 6.7 × 10−4 rad or 3.8 × 10−2 deg, which is approximately 2'17". This means that the Eulerian video magnification algorithm is not based on the camera's capability to resolve unseen motion, but on other principles.
Like the detection of motion, the detection of colour changes is not based on the capabilities of the camera either (the signal to be revealed frequently has smaller amplitude changes than the noise), so the colour magnification algorithm is also based on other principles.
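The camera-side numbers can be checked the same way (again a sketch; the focal ratio, pixel size, and focal length are the typical middle-class values used above):

```python
wavelength = 550e-9    # m
focal_ratio = 1.4      # f/D, the smallest (fastest) middle-class value
pixel_pitch = 5e-6     # m, typical sensor pixel size
focal_cam = 15e-3      # m, typical focal length

# diffraction-limited spot on the detector, equation (2)
dl = 1.22 * focal_ratio * wavelength
print(f"diffraction limit on the detector: {dl * 1e6:.2f} um")  # ~0.94 um

# pixel-limited angular resolution: two photons two pixel sizes apart
theta_cam = 2 * pixel_pitch / focal_cam
print(f"pixel-limited angular resolution: {theta_cam:.1e} rad")  # ~6.7e-4 rad
```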

3 The basic idea

The basic approach of Eulerian Video Magnification for colour magnification is to amplify the variation of the colour values at every spatial location (pixel) in a temporal frequency band that suits a certain phenomenon. For example, amplifying the skin colour in a band of temporal frequencies that includes plausible human heart rates reveals the variation in redness as blood flows through the face - figure 1.

Figure 1: An example of how Eulerian Video Magnification amplifies subtle colour changes caused by heartbeats and blood flow. In (a) we can see four frames of the original input video sequence; in (b) we can see the same frames with the colours amplified by a factor α = 100; in (c) we can see a vertical scan line from the original video (top) and from the amplified video (bottom), plotted over time - the colour variation in the bottom figure is clearly visible. Source: [1]

As mentioned before, Eulerian Video Magnification can also reveal low-amplitude mo-
tion. Figure 2 shows how subtle motion of a blood vessel can be magnified.

Figure 2: These two figures show how Eulerian Video Magnification can amplify subtle motions of blood vessels arising from blood flow - to reduce the motion magnification of irrelevant objects, only the area around the wrist was amplified. Figure (a) shows how a slice of the wrist evolves in time; figure (b) shows the same slice evolving in time after the motion was magnified by a factor α = 10. The temporal filter was tuned to a frequency band that includes the heart rate - 0.88 Hz (53 bpm). Source: [1]

The technique used to extract and reveal the wanted signal is localised spatial pooling¹ and bandpass filtering². The whole process is shown schematically in figure 3. To obtain a good quality output video in a reasonable time, the video should, before analysis, be downsampled and filtered with a spatial lowpass filter (a Laplacian pyramid³) to reduce the noise and boost the subtle changes in the video. The video is then decomposed into different bands of wavenumbers k (k = 2π/λ). After the spatial processing, temporal processing is performed on each band of wavenumbers: a bandpass temporal filter is applied to each band to extract (pass) the motions that fall within the frequency band of the observed phenomenon. The passed wavenumber bands are then magnified by different amounts, for two reasons: firstly, they might have different signal-to-noise ratios, and secondly, they might contain wavenumbers for which the linear approximation used in the motion magnification does not hold. In the second case the magnification is reduced to suppress artifacts. After the magnification, the magnified bandpassed signals are added back to the original signal.
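A minimal sketch of this pipeline in Python may make the steps concrete (an illustration only: a single Gaussian spatial band stands in for the Laplacian pyramid, and the passband, amplification factor, and clip dimensions are arbitrary assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import butter, filtfilt

def magnify(frames, fps, f_lo, f_hi, alpha, sigma=5.0):
    """Amplify temporal variations of a grayscale video of shape (T, H, W)."""
    # Spatial decomposition: keep a coarse (low-wavenumber) band, where
    # the linear approximation used for motion magnification holds best.
    coarse = np.stack([gaussian_filter(f, sigma) for f in frames])

    # Temporal bandpass filtering along the time axis of every pixel.
    b, a = butter(2, [f_lo, f_hi], btype="band", fs=fps)
    bandpassed = filtfilt(b, a, coarse, axis=0)

    # Amplify the filtered signal and add it back to the original video.
    return frames + alpha * bandpassed

# Example: amplify ~1 Hz variations (a plausible heart rate) by a factor 50.
video = np.random.rand(120, 64, 64)  # placeholder 4 s clip at 30 fps
output = magnify(video, fps=30, f_lo=0.8, f_hi=1.2, alpha=50)
```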

Figure 3: The Eulerian video magnification framework. The video is first decomposed into different bands of wavenumbers. The same temporal bandpass filter is then applied to all the bands, passing only the changes that occur within the temporal frequency interval of the observed phenomenon. The filtered bands of wavenumbers are then amplified by a given factor α, added back to the original signal, and collapsed to generate the output video. Source: [1]

¹ Spatial pooling identifies frequently observed patterns and memorizes them as coincidences; patterns that are significantly similar to each other are treated as the same coincidence. [8]
² A band-pass filter is a filter that passes frequencies within a certain range and rejects (attenuates) frequencies outside that range. [9]
³ A Laplacian pyramid is a set of band-pass filters used while pooling pixels into one. [10]

4 Mathematical basis

4.1 Signal amplification

Let us now describe mathematically how Eulerian motion magnification works for translational motion in one dimension (this analysis can then be generalized directly to two dimensions). The image intensity function I(x, t) gives the image intensity at location x at time t. After a translational motion lasting a time t, the initial image intensity function I(x, 0) = f(x) evolves into I(x, t) = f(x + δ(t)), where δ(t) is the displacement function. The goal of Eulerian video magnification is to obtain the magnified intensity function

Î(x, t) = f(x + (1 + α)δ(t)), (3)

with the magnification factor α. If the image intensity function can be approximated
by a first order Taylor series expansion about x, we can express it at time t as
I(x, t) ≈ f(x) + δ(t) ∂f(x)/∂x. (4)

When we apply a broadband temporal bandpass filter to the image intensity function I(x, t) over the whole area of interest, we get the intensity variation function B(x, t), which expresses the change of the image intensity over the area of interest after time t. In other words, by applying the broadband temporal bandpass filter we pick from equation 4 everything except f(x). If the displacement δ(t) is within the passband⁴ of the broadband temporal bandpass filter, we can express the intensity variation function B(x, t) as

B(x, t) = δ(t) ∂f(x)/∂x. (5)

To get the amplified intensity function Ĩ(x, t), all we have to do is sum the original intensity function I(x, t) and the intensity variation function B(x, t) amplified by the factor α:

Ĩ(x, t) = I(x, t) + αB(x, t). (6)

Inserting equations 4 and 5 into equation 6, we can express the amplified intensity function as

Ĩ(x, t) ≈ f(x) + (1 + α)δ(t) ∂f(x)/∂x. (7)

Finally, if the first order Taylor expansion of the temporally bandpassed image intensity function Ĩ(x, t) is a good approximation of the magnified intensity function Î(x, t), the magnified motion is

Ĩ(x, t) ≈ f(x + (1 + α)δ(t)). (8)

This means that the spatial displacement δ(t) of the initial image, described by the intensity function f(x), has after time t been amplified to the displacement (1 + α)δ(t).
⁴ A passband is the range of frequencies that can pass through a filter without being attenuated. [11]

Figure 4 shows how this method works in practice for a single sinusoid and what the amplified signal looks like. We can notice that the signal approximated by the first order Taylor series expansion at time t + 1 matches the input signal at the same time very well.
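This sinusoid experiment is easy to reproduce numerically (a sketch under the same assumptions as figure 4: f(x) = cos(kx), a finite-difference temporal filter, and α = 1):

```python
import numpy as np

k, delta, alpha = 1.0, np.pi / 8, 1.0
x = np.linspace(0, 2 * np.pi, 1000)

f_t0 = np.cos(k * x)            # I(x, t) = f(x)
f_t1 = np.cos(k * (x + delta))  # I(x, t+1) = f(x + delta)

# finite-difference temporal filter: B(x, t) = f(x + delta) - f(x)
B = f_t1 - f_t0

# magnified approximation (equation 6) vs. true magnified motion (equation 3)
approx = f_t0 + (1 + alpha) * B
true = np.cos(k * (x + (1 + alpha) * delta))
print("max approximation error:", np.max(np.abs(approx - true)))
```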

Figure 4: The figure shows how spatial translation can be approximated and amplified in one dimension (the same applies in two dimensions). The input signal is shown at time t as I(x, t) = f(x) and at time t + 1 as I(x, t + 1) = f(x + δ); the signal approximated by the first order Taylor series expansion at time t + 1 is shown as Ĩ(x, t) ≈ f(x) + δ(t) ∂f(x)/∂x, and the magnified approximation of the signal at time t + 1 is shown as Ĩ(x, t) ≈ f(x) + (1 + α)B(x, t), for α = 1. The temporal filter used to get B(x, t) is a finite difference filter that subtracts the curves f(x) and f(x + δ). Source: [1]

All of the above assumptions hold in practice for smooth images (i.e. images with slow
changes in contrast) and small motions. The inaccuracy of the magnification increases
with the magnification factor α and the displacement function δ(t). This is illustrated in
figure 5.

Figure 5: The figures illustrate motion amplification of a one-dimensional signal for different wavenumbers and magnification factors α. For the images on the left side, λ = 2π and δ(1) = π/8; for the images on the right side, λ = π and δ(1) = π/8. Figure (a) shows the true displacement of I(x, 0) from equation 3 at time t = 1 for different amplification factors α, denoted by different colours. Figure (b) shows the amplified displacement produced by the filter from equation 7, with the amplification factors α coloured as in figure (a). Referencing equation 13, the red curves of the two plots correspond to (1 + α)δ(t) = λ/4 for the left plot and (1 + α)δ(t) = λ/2 for the right plot, showing the artifacts introduced in the motion magnification when the bound on (1 + α) is exceeded by factors of 2 and 4, respectively. Source: [1]

4.2 Amplification bounds

Let us now derive the maximum value of α as a function of the wavenumber k for a given observed motion δ(t), such that the amplification error does not become too large. Let us start by approximately equating the processed signal Ĩ(x, t) (equation 7) and the true magnified motion Î(x, t) (equation 3):

f(x) + (1 + α)δ(t) ∂f(x)/∂x ≈ f(x + (1 + α)δ(t)). (9)

Let us take for example f (x) = cos(kx) and denote β = 1 + α. We get

cos(kx) − βkδ(t) sin(kx) = cos(kx) cos(βkδ(t)) − sin(kx) sin(βkδ(t)). (10)

Hence, the following should hold

cos(βkδ(t)) ≈ 1 (11)

sin(βkδ(t)) ≈ βkδ(t) (12)

If we want, for example, our small angle approximation to hold within 10%, we need βkδ(t) ≤ π/4 (as sin(π/4) ≈ 0.9 · π/4). Expressed with the spatial wavelength of the moving signal, λ = 2π/k, this gives

(1 + α)δ(t) ≤ λ/8. (13)

This gives the largest motion magnification factor α that magnifies the motion within the given accuracy. Figure 6 shows how the error increases with an increasing magnification factor α and displacement δ(t).
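The bound of equation 13 is straightforward to evaluate in code (a sketch; the example wavelength and displacement are arbitrary):

```python
def max_alpha(wavelength, delta):
    """Largest amplification factor allowed by equation (13):
    (1 + alpha) * delta <= wavelength / 8."""
    return wavelength / (8 * delta) - 1

# Example: a spatial wavelength of 40 pixels and a 0.5-pixel motion
# allow an amplification factor of at most 9.
print(max_alpha(40, 0.5))  # -> 9.0
```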

Figure 6: Motion magnification error as a function of wavelength, computed as the L₁-norm (distance) between the true motion-amplified signal (figure 5(a)) and the temporally filtered result (figure 5(b)): (a) for different values of δ(t), with α = 1; (b) for different values of α, with δ(t) = 2. The triangular marks on each curve represent the cutoff point derived in equation 13. Source: [1]

In practice, the amplification factor α is fixed (at a value of our choosing) for spatial bands that are within the bound of equation 13 and is attenuated linearly for higher wavenumbers, as shown in figure 7. This way we avoid the undesirable artifacts that would arise at high wavenumbers, seen in figure 6(b).

Figure 7: The amplification factor α as a function of the spatial wavelength λ. The amplification factor is fixed for spatial bands within the bound derived in equation 13 and is attenuated linearly for higher wavenumbers. Source: [1]

5 Filter and parameters selection

To amplify motion from a video, the user has to select some of the basic parameters of the Eulerian video magnification algorithm.
Firstly, a temporal bandpass filter that suits the observed phenomenon must be selected to extract the desired motions or signals - the importance of matching the temporal filter passband to the temporal frequency interval of the observed phenomenon can be seen in figure 8.

Figure 8: (a) shows a frame from a video in which blobs oscillate at different temporal frequencies, as noted under each blob. Figure (b) shows the spatio-temporal slices of the magnified motion after an ideal temporal bandpass filter of 1-3 Hz was applied, amplifying only the motions occurring within the specified passband. Source: [1]

There exist many different temporal filters that can be used, depending on the signal to be extracted. For broad but subtle motion magnification, temporal filters with a broad passband are preferred. For colour amplification, for example of blood flow, narrow temporal passband filters are preferred, as they produce a less noisy result. Amplifying motions at specific temporal frequencies (as in colour amplification) demands ideal temporal bandpass filters with sharp cutoff frequencies. There are also other types of filters, for example IIR filters⁵, that are used for both motion and colour magnification. Figure 9 shows some types of temporal filters.
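As an illustration, an ideal bandpass filter with sharp cutoff frequencies can be implemented directly in the frequency domain, while SciPy provides the Butterworth design (a sketch; the 0.8-1.2 Hz passband is just an example suited to heart rates):

```python
import numpy as np
from scipy.signal import butter

def ideal_bandpass(signal, fps, f_lo, f_hi):
    """Ideal temporal bandpass filter with sharp cutoff frequencies,
    applied to a 1D pixel intensity trace in the frequency domain."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    spectrum = np.fft.rfft(signal)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0  # zero the out-of-band bins
    return np.fft.irfft(spectrum, n=len(signal))

# A Butterworth filter instead gives a maximally flat passband response.
b, a = butter(1, [0.8, 1.2], btype="band", fs=30)
```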

Figure 9: Different types of temporal filters. (a) and (b) show ideal filters with sharp cutoff frequencies, (c) shows a so-called Butterworth filter, which has as flat a frequency response in the passband as possible [13], and (d) shows a second-order IIR filter with a broad passband. Under each image, a note in brackets specifies the video on which the filter was used; all the videos with the names noted in the brackets can be found at [3]. Some sequences from these videos can also be seen in figure 1 (face) and figure 2 (pulse detection). Source: [1]

Secondly, the user must select the amplification factor α and a wavenumber cutoff (specified by a spatial wavelength λc) beyond which an attenuated version of α is used - α can either be forced to zero for all λ < λc (this was used for the human pulse amplification shown in figure 1) or scaled down linearly to zero; a sketch of both variants follows below. Note that equation 13 is an example derived for a sine wave under a small-angle approximation that holds within 10%; it is not a rule for how α and λc are connected and can only be used as a guide. Various combinations of α and λc can be used, including those that violate the bound, to exaggerate specific motions or colour changes at the cost of increased noise or more artifacts.
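A possible implementation of this attenuation (a sketch covering both variants described above, the hard cutoff and the linear ramp):

```python
def attenuated_alpha(wavelength, alpha, lambda_c, hard_cutoff=False):
    """Attenuate the amplification factor for spatial wavelengths
    below the cutoff lambda_c, as in figure 7."""
    if wavelength >= lambda_c:
        return alpha                      # within the bound: full amplification
    if hard_cutoff:
        return 0.0                        # force alpha to zero below the cutoff
    return alpha * wavelength / lambda_c  # linear ramp down to zero

print(attenuated_alpha(80, alpha=50, lambda_c=16))  # -> 50 (unattenuated)
print(attenuated_alpha(8, alpha=50, lambda_c=16))   # -> 25.0 (ramped down)
```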
⁵ IIR (infinite impulse response) filters are filters whose impulse response is non-zero over an infinitely long time interval. [12]

6 Sensitivity to noise

To extract as much information from a video as possible, it should be filmed in the best conditions, i.e. in a sufficiently bright environment and with a low ISO value, to produce as little noise as possible. But even so, Eulerian video magnification can amplify signals that have smaller magnitude variations than the noise inherent in the video. Let us assume that the noise is zero-mean, white, and equally distributed over the image. With suitable spatial lowpass filters, the subtle signals can be enhanced over the filtered area. By computing the sum x of N pixels as x = Nx₀, where x₀ is the signal, and the sum of the noise power σ² over N pixel values as σ² = Nσ₀², where σ₀² is the average noise power of a pixel, we can increase the signal-to-noise ratio as

x/σ = Nx₀/√(Nσ₀²) = √N · x₀/σ₀. (14)

We can see that the ratio increases with the square root of the number of pixels. As the number of pixels is proportional to the size of the area over which the filtering is performed, N ∝ r, we can conclude that the bigger the area of averaging, the greater the signal-to-noise ratio. Considering equation 14 and the relation between the number of pixels N and the area size r, we can also estimate the area size that needs to be filtered to reveal the signal at a certain noise power level. The importance of a correctly chosen filter size is shown in figure 10.
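A quick numerical check of equation 14 (a sketch: a constant signal of amplitude 0.01 buried in unit-variance Gaussian noise, recovered by pooling N pixels):

```python
import numpy as np

rng = np.random.default_rng(0)
signal, sigma0 = 0.01, 1.0

for n_pixels in (1, 100, 10_000):
    # noise of an N-pixel patch, averaged over the patch (spatial pooling)
    noise = rng.normal(0, sigma0, size=(1_000, n_pixels)).mean(axis=1)
    snr = signal / noise.std()  # grows as sqrt(N), per equation (14)
    print(f"N = {n_pixels:6d}: SNR ~ {snr:.3f}")
```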

Figure 10: This figure shows the importance of proper spatial pooling. (a) shows a frame from a video on which colour magnification was performed with σ = 0.1 pixel noise added. (b) and (c) show intensity traces over time for the pixel marked blue in (a). (b) shows the trace obtained when the (noisy) sequence is processed with the same spatial filter used to process the original face sequence; in this trace the pulse signal is not visible. (c) shows the trace when using a filter with the filter size estimated from equation 14; the pulse is visible as periodic peaks about one second apart. Source: [1]

7 Conclusion

The seminar describes Eulerian video magnification, a straightforward method that amplifies subtle signal changes (either colour changes or translational motions) by performing spatial and temporal processing, with no need for the feature tracking and motion estimation used by other, computationally expensive methods with similar goals (for example the Lagrangian method of processing). The seminar also compares the spatial resolution of the human eye and of cameras. Further, it describes the basic idea of the Eulerian video magnification algorithm, discusses its mathematical basis, and derives the bounds of magnification within which the magnification does not produce too many artifacts. In the end we become acquainted with some typically used filters and get an idea of how to deal with noise. With further improvements, Eulerian video magnification could in the future be used in different fields of science to reveal subtle motions or colour changes invisible to the naked eye, which could help us explore subtle dynamical processes, understand certain mechanisms, diagnose illnesses, etc.

References
[1] Hao-Yu Wu et al.: Eulerian Video Magnification for Revealing Subtle Changes in the World. ACM Transactions on Graphics (TOG) - SIGGRAPH 2012 Conference Proceedings, Volume 31, Issue 4, Article No. 65, July 2012

[2] Hao-Yu Wu et al.: Eulerian Video Processing and Medical Applications. Master of Engineering thesis, Massachusetts Institute of Technology, June 2012

[3] http://people.csail.mit.edu/mrub/vidmag/

[4] http://hyperphysics.phy-astr.gsu.edu/hbase/phyopt/raylei.html

[5] http://webphysics.davidson.edu/physlet_resources/bu_semester2/c27_rayleigh.html

[6] http://www.wikilectures.eu/index.php/Resolution_of_human_eye

[7] http://en.wikipedia.org/wiki/Angular_resolution

[8] http://en.wikipedia.org/wiki/Hierarchical_temporal_memory

[9] http://en.wikipedia.org/wiki/Band-pass_filter

[10] http://www.cs.utah.edu/~arul/report/node12.html

[11] http://en.wikipedia.org/wiki/Passband

[12] http://en.wikipedia.org/wiki/Infinite_impulse_response

[13] http://en.wikipedia.org/wiki/Butterworth_filter

[14] http://www.cambridgeincolour.com/tutorials/cameras-vs-human-eye.htm

[15] http://en.wikipedia.org/wiki/Retina

Note: All the web pages were active in April 2013.
