Al-Naji2018 Article AnEfficientMotionMagnification

Machine Vision and Applications (2018) 29:585–600
https://doi.org/10.1007/s00138-018-0916-0
ORIGINAL PAPER
An efficient motion magnification system for real-time applications

Ali Al-Naji1,2 · Sang-Heon Lee1 · Javaan Chahl1,3
Received: 19 October 2016 / Revised: 23 January 2018 / Accepted: 23 January 2018 / Published online: 13 February 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018
Abstract
The human eye cannot see subtle motion signals that fall outside human visual limits, due to either limited resolution of intensity
variations or lack of sensitivity to lower spatial and temporal frequencies. Yet, these invisible signals can be highly informative
when amplified to be observable by a human operator or an automatic machine vision system. Many video magnification
techniques have recently been proposed to magnify and reveal these signals in videos and image sequences. Limitations,
including noise level, video quality and long execution time, are associated with the existing video magnification techniques.
Therefore, there is value in developing a new magnification method where these issues are the main consideration. This study
presents a new magnification method that outperforms other magnification techniques in terms of noise removal, video quality
at large magnification factor and execution time. The proposed method is compared with four methods, including Eulerian
video magnification, phase-based video magnification, Riesz pyramid for fast phase-based video magnification and enhanced
Eulerian video magnification. The experimental results demonstrate the superior performance of the proposed magnification
method regarding all video quality metrics used. Our method is also 60–70% faster than Eulerian video magnification, whereas
other competing methods take longer to execute than Eulerian video magnification.
Keywords Video magnification techniques · Wavelet video decomposition · Objective video quality metrics
1 Introduction of blood from the heart to the head via the carotid arteries
that can also be used to measure cardiac activity [4–7]. Sim-
Small-amplitude changes in video and image sequences are ilarly, the arterial pulse in different locations of the human
difficult to perceive by the human eye because of our lim- body is normally difficult for humans to see, but the motion
ited spatiotemporal sensitivity and conflicting stabilization can be magnified to measure heart rate and beat length [8,9].
and luminance invariance mechanisms. These changes may Also, breath motion which also has low spatial amplitude
contain useful information which can be used in many appli- can be amplified to measure the baby’s respiratory activ-
cations, especially in the biomedical field. For example, ity [10,11]. Therefore, several studies have been proposed
blood circulation causes invisible skin colour changes that to reveal imperceptible motion in videos. The first study
can be amplified to measure heart rate [1–3]. Another exam- by Liu et al. [12] proposed a motion magnification tech-
ple is the subtle head motion resulting from the cardiac cycle nique based on Lagrangian perspective to magnify subtle
motions in video sequences to detect interesting mechanical
B Ali Al-Naji behaviour. However, their study is computationally expen-
ali_abdulelah_noori.al-naji@mymail.unisa.edu.au sive, with long execution time (10 h at that time) because
Sang-Heon Lee it relied on optical flow calculations and a feature tracking
Sang-Heon.Lee@unisa.edu.au algorithm. Also, it is difficult to make the image artefact-
Javaan Chahl free due to noise magnification obtained from their study.
Javaan.Chahl@unisa.edu.au A new magnification technique based on Eulerian perspec-
1 School of Engineering, University of South Australia,
tive, the Eulerian video magnification (EVM) technique, was
Mawson Lakes, SA 5095, Australia proposed by Wu et al. [13] to reveal temporal variations in
2 Electrical Engineering Technical College, Middle Technical
videos that are invisible to the human eye. This technique is
University, Al Doura, Baghdad 10022, Iraq based on Eulerian perspective, where properties of a voxel
3 Joint and Operations Analysis Division, Defence Science and
of fluid, including velocity and pressure evolve over time.
Technology Group, Melbourne, Victoria 3207, Australia This study applied the Laplacian pyramid decomposition
123
586 A. Al-Naji et al
Fig. 1 An example of magnified videos using different magnification methods
technique to decompose source video into different spatial mid, instead of using a complex steerable pyramid to reduce
frequency bands, followed by temporal filtering of theses over-completeness, execution time and cost to construct. The
spatial frequency bands. The resulting signals are then mag- main limitation in their work is that the Riesz pyramid fails
nified and added back to the original signals to enhance to maintain the power of the input signal, which can cause
spatiotemporal information in video. Finally, output video minor artefacts, and it still takes longer to execute than the
is reconstructed by collapsing the pyramid. Although EVM EVM. Furthermore, their method also suffers from some of
succeeds in amplifying motion and colour changes in videos the limitations of EVM [19]. Another study [20] proposed
and eliminates the need for costly optical flow computation a post-processing technique to improve EVM that is called
[12], it only supports small magnification factors at high spa- enhanced EVM (E2VM). E2VM relies on using EVM as a
tial frequencies and linearly increases the noise level with spatiotemporal motion analyser with image warping between
increasing magnification factor. Additionally, during skin the input and magnified videos to magnify motion without
colour magnification, some undesirable motion will also be noise amplification. E2VM also supports larger magnifica-
magnified. To solve these problems, Wadhwa et al. [14] pro- tion factors better than EVM and is significantly less affected
posed a new Eulerian method based on complex steerable by frame noise, because it does not involve modification of
pyramids [15,16], which was inspired by phase-based optical pixel values. This is, however, time-consuming and some
flow methods [17,18]. Their proposed method supports larger magnification specifications may be lost during image warp-
magnification factors and has fewer artefacts and less noise ing. Furthermore, because the E2VM uses the EVM as part
compared to EVM. Their proposed method can also be used of its algorithm, any failure in the EVM limits the quality of
to attenuate and remove low-amplitude motion in the case the results. Figure 1 shows an example of magnified videos
of colour magnification. However, it is more complex, costly obtained via different magnification methods.
to construct, over-complete and takes longer to execute than To resolve these performance problems, we proposed a
the EVM. In addition, the magnified videos based on their new magnification method to extract invisible motion infor-
method may be incoherent, if some frame sequences have mation from videos based on wavelet decomposition, the
a noisy phase signal. Therefore, they extended their work Chebyshev band-pass filter and image de-noising. This study
[19] using a new compact image pyramid, the Riesz pyra- contributes to the improvement of the video magnifica-
123
An efficient motion magnification system for real-time applications 587
tion techniques in performance, video quality, complexity

and execution to be more suitable for real-time applica-
tions. The proposed method is called an efficient motion
magnification system (EMMS) that outperforms the other
magnification methods in terms of noise removal, output
video quality, overall performance at large magnification fac-
tors and reduced time to execute. The proposed EMMS may
have the potential to be accepted in demanding real-time
video processing applications. The EMMS was compared to
other magnification methods in four studies, including EVM
[13], phase-based video magnification [14], Riesz pyramid
for fast phase-based video magnification [19] and E2VM
[20]. To prescribe most of the image characteristics in the
video quality test, thirteen full-reference objective metrics
of video quality were used in this study to test which mag-
nification method provided highest video quality, and one
no-reference objective metric was used to compare their per-
formance with different magnification factors.
2 The proposed method
In this section, the proposed block diagram of EMMS is intro-

duced as shown in Fig. 2 and the main algorithms of the
proposed method are discussed. As shown in Fig. 2, the
RGB frame sequences of the source video are first converted
to YCb Cr colour space to separate the intensity informa-
tion from the colour information. The intensity channel
(Y) is resized down 50% by using the Lanczos resampling
method [21] to reduce processing time. The Y channel is then
decomposed into different spatial frequency bands using a
multiresolution pyramid analysis. The main algorithms of
the proposed EMMS are discussed below. Fig. 2 The block diagram of the EMMS
2.1 Multiresolution pyramidal wavelet

decomposition f (t) = a j,n ω j,n (t) + b j,n ϕ j,n (t) (1)
Multiresolution pyramid analysis allows to decompose image where the wavelet function ω j,n (x) and scaling function
into sequence of image transformations with different spatial ϕ j,n (x) satisfy the shifting and dilation and span orthogo-
resolution to extract the information content in an image. The nally complimentary spaces at a set of integers T , as follows:
multiresolution decomposition based on wavelet transform is
an efficient way to provide a time-frequency representation of ω j,n (t) = 2−0.5 j ω t − 2− j n , j, n ∈ T (2)
an image and to analyse an image in horizontal, vertical and
diagonal detailed with different resolution levels. Compared ϕ j,n (t) = 2−0.5 j ϕ t − 2− j n , j, n ∈ T (3)
with Fourier-based methods, the wavelet decomposition has
local support in both space and frequency domain, while The shifting and dilation of ω (t) that produces the orthonor-
Fourier-based methods are local in frequency, but has global mal basis, ω j,n (t) , are called wavelets which have the
support in the space domain [22]. Another advantage is the characteristics of f (t) in both the frequency and spatial
availability of fast algorithms [22]. domains. a j,n and b j,n are the coefficients of ω j,n (t) and
Generally, the wavelet transform can represent any func- ϕ j,n (t), and these coefficients are expressed as
tion, f (t), in terms of two sets of basis functions: the mother

wavelet function, ω j,n (t), and the scaling function, ϕ j,n (t), a j,n = H2n−k b j−1,k , b j,n = L 2n−k bm−1,k (4)
as follows [22]: k k
123
Fig. 3 Filter structure used to determine the wavelet coefficients
where H and L are defined as

(L)n = 20.5 ϕ (t − 2) ϕ (2t) dx (5)
(H )l = (−1)l .L −l+1 (6)
where L and H are low-pass and high-pass filters correspond- Fig. 4 The RMSE for the magnified image sequences of baby video
ing to the coefficients (L)n and (H )n , respectively. Thus, the using a Butterworth and Chebyshev band-pass filter with the same mag-
discrete wavelet version of Eq. (1) becomes nification parameters

b j−1,l ( f ) = [L 2n−l .b j,n ( f ) + H2n−l .a j,n ( f )] (7) 2.2 Temporal filtering
n
Temporal filtering based on a zero-phase Chebyshev Type I
For the orthogonal wavelet bases, the wavelet function band-pass filter is applied on each level of the wavelet pyra-
becomes: mid to extract the frequency bands of interest based on the
frequency characteristics of the image obtained from wavelet
∞
pyramid at different levels. Compared with a Butterworth
ω (t) dt = 0 (8)
−∞ filter, a Chebyshev Type I band-pass has attenuation rate
beyond the passband better than the Butterworth filter and
Using the coefficients b j,n at a specific level j, we can cal- can achieve a sharper transition faster than that achieved by
culate the coefficients at level j − 1 using a filter structure as the same order Butterworth filter as well as requiring less
shown in Fig. 3. time to execute [28]. In addition, the Chebyshev Type I band-
As shown in Fig. 3, two levels of decomposition for pass filter provides a root mean square error (RMSE) smaller
one-dimensional (1-D) scheme are depicted. According to than the Butterworth band-pass filter during the magnifica-
Mallat [23], the wavelet decomposition of a 2-D scheme tion process as shown in Fig. 4, which contributes to reducing
can be accomplished by applying the 1-D scheme along the the processing artefacts and thus increases the quality of the
rows and columns of the image separately. To reconstruct magnified video.
the wavelet pyramid, inverse discrete wavelet transformer The magnitude response of the low-pass Chebyshev Type
(DWT) is applied in which the low-frequency sub-band of I is given by [29]
the finer level is reconstructed from the four sub-bands of 2
the coarser level. The reconstruction is achieved by placing 1
H N e jw = (9)
ω
zeroes between each sample of sub-bands and then convolv- 1 + ε2 PN ωc
ing the resulting signals via low-pass and high-pass filters
and adding them to obtain the reconstructed image.
In this study, we have used the 2-D DWT based on Haar N is number of filter order, ωc is a cut-off frequency,
where
PN ωωc is the Chebyshev polynomial, and ε is a control
family, since this family is the simplest of all wavelet families
to implement and outperforms other families (Daubechies, parameter on the mount of band-pass ripple. The ripple is
Coiflet and Symlet) in edge detection [24], feature extrac- often given in dB as below:
tion [25] and image compression [26]. As information loss
in the image with the Haar family is less, reconstruction of Ripple = 20 log 1 + ε2 (10)
the image is of better quality than that obtained from other
wavelet families. Further details of the wavelet Haar family A band-pass Chebyshev Type I filter is constructed by
can be found in [27]. subtracting two low-pass Chebyshev Type I filters. The
123
Table 1 Performance comparison of the video magnification methods

EVM [13] Phase-based method Fast phase-based E2VM [20] Proposed method
[14] method [19] (EMMS)
Decomposition Laplacian pyramid Complex steerable Riesz pyramid Laplacian pyramid Wavelet pyramid
pyramid
Noise Magnified Translated Translated Minimized via Removed
post-processing
−2
Over- 3
4 image 2h/(1 − 2 j )a (3–4 ≈ 20–80% faster ≈ 15–20% slower ≈ 60–70% faster
completeness times slower than than phase-based than EVM than EVM
EVM with octave method (octave
bandwidth, 2 bandwidth, 2
orientations) orientations)
Magnification Support small Support large Support large Support medium Support large
magnification magnification magnification magnification magnification
factor factor b with factor factor factor
sub-octave
bandwidth filters
Exact for Linear ramps Sinusoid Sinusoid Linear ramps Linear ramps
ah is the number of orientation bands, and j is the number of filters per octave for each orientation
b Phase-based method does not support large magnification factors with (octave bandwidth filter, two orientations)
zero-phase is optionally used with the Chebyshev filter to which make them impractical for real-time use. The objec-
eliminate processing noise at low and high frequencies and tive quality metrics are computational models that are used to
to remove the phase delay involved. assess the quality of a distorted image by comparing it with
the original image (reference image). The objective quality
2.3 Magnification metrics can be divided into three categories depending on
the existence of the reference image: full-reference quality
In the magnification process, the extracted band-passed sig- metrics, no-reference quality metrics and reduced-reference
nals are multiplied by a magnification factor (MF) to amplify quality metrics. Because no individual metric can cover all
signals of interest. The magnified signals are then collapsed image characteristics and performs well for all groups of
by using wavelet pyramid reconstruction. The magnified sig- distortions, this study uses thirteen objective full-reference
nals are then filtered by using an improved adaptive median quality metrics classified into three categories: metrics based
filter algorithm [30] that maintains excellent image edge and on mathematical model, metrics based on characteristics of
detail information, and increases signal-to-noise ratio for the the human visual system and metrics based on natural scene
processed signal better than the tradition median filters [30]. statistics, and one objective no-reference quality metric to
The output signals are resized up and added back to the input compare the video quality of the magnified video from the
signals to obtain a processed Y channel. Finally, the pro- EMMS with other magnification methods and select which
cessed Y channel is then concatenated with the original Cb method provides better quality and performance.
and Cr channels and converted to RGB to obtain the final
output. 3.1 Full-reference metrics based on mathematical
Table 1 shows the key differences between the proposed models
EMMS and the other magnification methods.
Peak signal-to-noise ratio (PSNR) and mean squared error
(MSE) are the most commonly used metric for measuring
3 Video quality metrics a video quality because they are mathematically simple to
calculate, easy to deal with for optimization purposes and
The quality metrics generally divide into two categories: they have clear physical meaning [31]. They are based on a
subjective and objective. The subjective quality metrics are simple pixel-wise comparison between the reference image
measured by asking a number of expert observers to look at and the distorted image. A good-quality image has higher
each distorted image and rate its quality. Although subjec- PSNR and a lower MSE. To calculate PSNR for each frame
tive metrics are the most natural, accurate and reliable way of in the magnified video. Let F = { f i |i = 1, 2, 3, . . . , N }
measuring the image quality, they have several drawbacks, be the original image signal with size of M × N , and G =
including inconvenience, time-consuming and expensive, {gi |i = 1, 2, 3, . . . , N } be the M × N distorted image signal
123
which represents the image signal for each magnified frame, σ F G 2 F̄ Ḡ 2σ F σG

UQI = (14)
the PSNR and MSE between F and G are defined by: σ F σG F̄ 2 + Ḡ 2 (σ F )2 + (σG )2
2552 The first component in (14) represents a correlation coef-

PSNR = 10 log10 (11)
MSE (F, G) ficient between F and G with range [1, − 1], the second
1 component measures the mean luminance between F and G
M N
MSE = [F − G]2 (12) with range of [1, 0], and the third component computes the
MN
m=1 n=1 contrast between F and G with range of [1, 0], where the
best value is “1” when F = G. As claimed in [33,34], the
where PSNR measured in decibels and the best image quality
UQI performs better than MSE in different distortion types.
occurred when the MSE is close to zero. Generally, a pro-
(b) Structural similarity (SSIM)
cessed image with PSNR greater than 30 dB is considered
SSIM is a metric based on HVS features proposed by
acceptable to human eyes [32].
Wang et al. [35] to measure the variation of structural infor-
3.2 Full-reference metrics based on characteristics mation between F and G. The SSIM is further improvement
of the human visual system of the UQI, which also extracts three separate components
from the F and G to obtain the similarity score between
(a) Universal quality index (UQI) F and G. SSIM is implemented by using a window of
UQI is a human visual system (HVS) feature-based metric 11 × 11 circular-symmetric Gaussian weighting function,
proposed by Wang and Bovik [33] to model and measure that is moved pixel by pixel over the image, instead of 8 × 8
the distortion of the image based on three components: loss window used in the UQI. The SSIM is expressed as function
of correlation, luminance and contrast distortion. The UQI of three components:
calculates the quality index of the distorted image as:
SSIM (F, G) = [l (F, G)]α [c (F, G)]β [s (F, G)]γ (15)
4σ F G F̄ Ḡ
UQI (F, G) = 2 (13)
σ F + σG2 F̄ 2 Ḡ 2
where α, β, γ > 0 are parameters used to adjust the lumi-
where nance component l (F, G), contrast component c (F, G) and
structural component s (F, G). The comparison functions of
1
N these components are computed as follows:
F̄ = fi ,
N
i=1 2μ F μG + C1
l (F, G) = (16)
1
N
(μ F )2 + (μG )2 + C1
Ḡ = gi ,
N 2σ F σG + C2
i=1 c (F, G) = (17)
(σ F )2 + (σG )2 + C2
1
N
2
σ F2 = f i − F̄ , 2σ F G + C3
N −1 s (F, G) = (18)
i=1 2σ F σG + C3
1
N
2
σG2 = gi − Ḡ , where μ F and μG are the mean luminance of the F
N −1
i=1 and G, respectively. C1, C2 and C3 are squared mul-
1 tiply pixel values (255 for 8-bit grayscale images) by a
N

σF G = f i − F̄ gi − Ḡ small (K 1) to avoid results instability when
N −1 2 constant

i=1 μ F + μG , (σ F2 + σG2 ) and 2σ F σG are very close to zero.
2
To simplify the expression, a specific form of the SSIM with

where F̄ and Ḡ are the mean luminance, σ F and σG are α = β = γ = 1 and C3 = C2/2 is given by [35,36]:
viewed as standard deviations of the contrast and the lumi-
nance, and σ F G represents the linear correlation between F [2μ F μG + C1] [2σ F G + C2]
and G. The range of UQI is between [1, − 1] which mea- SSIM =

(μ F ) + (μG )2 + C1 (σ F )2 + (σG )2 + C2
2
sures how close the contrast and the mean luminance are
between F and G. The best value “1” is achieved only when (19)
σ F = σG for all i = 1, 2, 3, . . . , N , whereas the worst value
“− 1” occurs when gi = 2 F̄ − f i for all i = 1, 2, 3, . . . , N . The SSIM index ranges between 1 and 0, where the best value
The UQI can be expressed as criteria of three components 1 is achieved only if F = G. The SSIM index corresponds
by: to UQI when C1 = C2 = C3 = 0 and α = β = γ = 1.
123
(c) Multiscale structural similarity (MS-SSIM) of the phase congruency maps S pc (F, G), and the similarity
MS-SSIM extents structural similarity, which has better measure of the gradient magnitude maps Sgs (F, G):
results than its single-scale version. In the MS-SSIM, struc-
tural similarity is accomplished on each scale of both F and 2 pc1 pc2 + C5
S pc (F, G) = (23)
G, which is filtered and weighted. The original F and G are ( pc1 )2 + ( pc2 )2 + C5
weighted at scale one and iteratively down-sampling by fac- 2gs1 gs2 + C6
Sgs (F, G) = (24)
tor 2, where the highest scale is p. The comparison function (gs1 )2 + (gs2 )2 + C6
of all scales is then combined to obtain the final score as
follows: [36] where C5 and C6 are positive constants depending on the
dynamic range of the phase congruency and the gradient

α p p
β p
γ p
MS-SSIM = l p (F, G) c p (F, G) s p (F, G) magnitude values, respectively. The total similarity measure,
j=1
St , is then computed as a combination of S pc (F, G) and
(20)
Sgs (F, G):
where l p (F, G) , c p (F, G) and s p (F, G) are the lumi-

α
β
St = S pc (F, G) Sgs (F, G) (25)
nance, contrast and structural distortions for the jth scale
obtained from each single-scale SSIM index. MS-SSIM is
where α and β are positive parameters used to adjust the pc
more flexible and accurate than its single-scale version when
and gs features. Because the HVS is very sensitive to the low-
incorporating variations of viewing conditions [36].
level features [37,39], the maximum value ( pcmax ) of pc1 and
(d) Gradient similarity measure (GSM)
pc2 is used as a weighting function to obtain a single similar-
GSM is a metric based on low-level HVS features pro-
ity score at each location w, St (w) . The FSIM is accordingly
posed by Liu et al. [37] to perceive image quality. Similar
defined for whole image spatial domain τ (w ∈ τ ) as
to the SSIM, the GSM considers luminance and contrast–
structural changes in F and G and uses gradient similarity
w St (w) pc max (w)
(gs ) to measure the changes in the contrast and structure in FSIM = (26)
the images. Therefore, the GSM is defined as a combination w pc max (w)
of a luminance comparison and a contrast/structural compar- The FSIM ranges within [1, 0], where one is achieved for
ison [37,38]: identical images.
(f) Spectral residual similarity (SRSIM)
GSM (F, G) = [l (F, G)]α [cgs (F, G)]β [sgs (F, G)]γ Zhang and Li [40] proposed a new quality metric similar
(21) to the FSIM, except they used a specific visual HVS feature,
2gs F gsG + C4 spectral residual visual saliency (vs ), instead of phase con-
GSM = (22)
(gs F )2 + (gsG )2 + C4 gruency used in the FSIM. With same assumptions, they used
the similarity measure function to the visual spectral residual
where gs F and gs G are the gradient similarity values for the and gradient magnitude features as follows:
image F and G, respectively, and C4 is a free parameter to
avoid the denominator being zero. The range of the GSM is 2vs1 vs2 + C7
Svs (F, G) = (27)
[1, 0], where the best value “1” is achieved only if F = G. (vs1 )2 + (vs2 )2 + C7
According to Liu et al. [37], the GSM outperforms the SSIM 2gs1 gs2 + C6
Sgs (F, G) = (28)
index because the gradient information (low-level HVS fea- (gs1 )2 + (gs2 )2 + C6
tures) of the image allows it to detect edge variations around
image regions which cannot be detected by the SSIM. where Svs (F, G) is the similarity measure of the visual
(e) Feature similarity (FSIM) spectral residual maps, Sgs (F, G) is the similarity measure
FSIM is another metric based on low-level HVS features of the gradient magnitude maps, and C7is another positive
proposed by Zhang et al. [39]. FSIM uses two low-level parameter to increase vs stability. The total similarity mea-
features, including phase congruency ( pc ) and gradient mag- sure, St , is also computed as a combination of Svs (F, G) and
nitude (gs ), to compute the local similarity map between F Sgs (F, G):
and G. These features are extracted from the luminance chan-
α
β
nels of colour images. To compute the FSIM, let pc1 and pc2 St = Svs (F, G) Sgs (F, G) (29)
represent the phase congruency maps extracted from F and
G, and gs1 and gs2 represent the gradient magnitude maps For a given position w, if anyone of F (w) and G (w) has a
extracted from F and G. The similarity measure is applied to high visual spectral residual value, the position w will have
these components separately to obtain the similarity measure a high impact on the HVS when evaluating the similarity
123
between images. Therefore, the maximum value (vs max ) of contrast detection thresholds are computed by using wavelet-
vs1 and vs2 is used as a weighting function to obtain a single based models of visual masking to determine if the errors
similarity score. The SRSIM is accordingly defined for a (distortions E = G − F) are perceivable by the HVS. If
whole image at a spatial domain τ (w ∈ τ ) as the errors are below the threshold of detection, the distorted
image G has a perfect visual fidelity (VSNR = ∞) and

w St (w) vs max (w) then the algorithm terminates. When the errors are above
SRSIM = (30)
w vs max (w)
the threshold of detection, a second stage is performed by
using the ratio of the root mean square (RMS) contrast to the
The SRSIM ranges within [1, 0], where the values close to weighted values of the distortions of the perceived contrast
one indicate the best similarity. d pc and disruption of global precedence dgp . The visual dis-
(g) Noise quality measure (NQM) tortion (V D) is defined as the following linear combination
NQM is a metric based on HVS features proposed by of d pc and dgp
Damera-Venkata et al. [41] to quantify linear frequency dis- dgp
tortion and noise injection between F and G. NQM spatially V D = αd pc + (1 − α) √ (33)
2
analyses the variations in visual effects including: contrast
sensitivity, local luminance, contrast interaction and contrast For supra-threshold distortions, d pc is relatively invariant
masking between the reference and distorted images. In the with spatial frequency and is approximated to the RMS con-
NQM algorithm, the reference image is passed through a trasts of the distortions, R (E) ,
restoration algorithm to obtain a model restored image based
on the same parameters that were used to restore a degraded d pc = R (E) (34)
image. Nonlinear space-frequency processing based on Peli’s
contrast pyramid [42] is performed to obtain the simulated The quantity dgp is the distance between the actual con-
images, F̂ (x, y) and Ĝ (x, y). The SNR is then calculated trasts of G and the global precedence contrasts computed for
based on the difference between the simulated images to pro- the same total RMS distortion contrast in an M-dimensional
duce NQM in decibels: space as given below
M 0.5
⎡ ⎤ 2
dgp = R E f m − R E f m (35)
⎢ x y F̂ (x, y) ⎥
NQM = 10 log10 ⎣ 2 ⎦ (31)

m=1
x y F̂ (x, y) − Ĝ (x, y)
where R E f m represents the actual contrast of the dis-
tortions within the band centred at f m computed in the
An image with a higher NQM has a better quality than an first stage and R represents a vector of global precedence-
image with lower NQM. preserving distortion contrasts. The VSNR, in decibels, is
(h) Weighted signal-to-noise ratio (WSNR) thus given
WSNR is a HVS-based metric that is defined in deci- 2
bels as the ratio R (F) R (F)
of the mean weighted signal power VSNR = 10 log10 = 20 log10 (36)
weighted Psignal to the mean weighted noise power VD VD
(weighted Pnoise ) [43]
where R (F) denotes the respective RMS contrasts of the
original image F.
weighted Psignal
WSNR = 10 log10 (32)
weighted Pnoise
3.3 Full-reference metrics based on natural scene
A low-pass or band-pass contrast sensitivity function (CSF) statistics
[44] is used as a weighting function. CSF is a linear spa-
tially invariant approximation which is gained by using the (a) Information fidelity criterion (IFC)
frequency response of the contrast sensitivity of the HVS. Sheikh et al. [47] used the Gaussian scale mixture (GSM)
Therefore, WSNR cannot determine nonlinear and spatially model in the wavelet domain as natural scene statistics [48]
varying effects of HVS between F and G [45]. model to extract mutual information between the input of the
(i) Visual signal-to-noise ratio (VSNR) channel (the reference image) and the output of the channel
VSNR is a metric based on near-threshold and supra- (the distorted image). One sub-band of the wavelet decom-
threshold features of the HVS proposed by Chandler and position
of an image was modelled as a random field GSM,

Hemami [46] to capture the distortions in the wavelet domain. C = C i : i ∈ I where C is the product of two station-
VSNR operates in two processing stages. In the first stage, ary random fields that are independent of each other and I
123
denotes the set of spatial indices for the random field. The GSM source model C at given S as follows
GSM source model can be expressed as
1 N
M
N N N si 2 E k

C = S U = si u i : i I I C ; A |s = log2 1 + (43)
(37) 2 σn 2
i=1 k=1
1 N
M
where S = {si : i I } is a random field GSM of positive N N N ςi 2 si 2 E k
I C ; B |s = log2 1 + 2 (44)
scalars, and U is a Gaussian vector (zero mean and covariance 2 σn + σv 2
i=1 k=1
of σu2 . The distortion model is obtained by adding gain and
additive noise in each sub-band as shown below:
where I C N ; A N |s N and I C N ; B N |s N represent the

mutual visual information of the reference and distorted
D = ς C + v = ςi v i : i I (38)
images, respectively, that can be extracted from HVS out-
put by the brain for a particular channel. Values of E k are

where ς = ς i : i ∈ I is a deterministic scalar gain the eigenvalues of the covariance of U , where σn2 and σv2 are
and v is a stationary additive zero-mean Gaussian noise variances of the Gaussian noise for the C and D, respectively.
with covariance of σv2 . Let C N = (C1 , C2 , C3 . . . , C N ) Visual information obtained independently from each chan-
denote elements of the GSM source C at given S and nel is incorporated to obtain the VIF criterion as follows for
D N = (D1 , D2 , D3 . . . , D N ) denote elements of the dis- j ∈ channel:
tortion model D. The mutual information between C and D

for one sub-band is denoted a j I (C N , j ; A N , j |s N , j )
VIF = (45)
N j I (C N , j ; B N , j |s N , j )
I C N ; D N |s N = I (Ci ; Di |si )
i=1 where C N , j represent N elements of the random
field C j that

1
N
ςi2 si2 σu2 denotes the coefficients from channel j, I C ; A N , j |s N , j
N , j
= log2 1 + (39)
2 σv2 represents the mutual visual information between the input
i=1
and the output of the HVS channel without distortion for the
To find the mutual information for all sub-bands, each sub- reference signal A from channel j, and I C N , j ; B N , j |s N , j
band must be independently implemented and then IFC represents the mutual visual information between the input
obtained by summing all sub-bands ( j ∈ sub-band) of the distortion channel and the output of the HVS channel
for the distorted signal B from channel j. The range of VIF
IFC = I C N , j ; D N , j |s N , j (40) is [0, 1] where the best values are close to one.
j
3.4 No-reference metric based on natural scene
where C N , j denotes coefficients from the random field C j statistics
of the jth sub-band, and similarly for D N , j and s N , j . IFC
is in the range from zero (no information fidelity) to infinity Mittal et al. [50] proposed a no-reference quality metric
(perfect information fidelity). based on natural scene statistics to quantify natural image
(b) Visual information fidelity (VIF) statistics without the need for any reference image, which
VIF was designed by Sheikh and Bovik [49] to improve is called blind/reference less image spatial quality evaluator
the performance of IFC in estimating image quality based on (BRISQUE). BRISQUE operates in the spatial domain, and it
the same principles as the IFC. The main difference is that is statistically better than the full-reference PSNR and SSIM.
VIF compared the information obtained from the output of a BRISQUE ranges between (0–100), where the value of “0”
natural image with the amount of information extracted from indicates high spatial quality and a value of “100” indicates
HVS. Let A and B denote the visual signals extracted from low spatial quality.
the HVS after added noise ℵ and ℵ (zero-mean uncorrelated
multivariate Gaussian). 3.5 Experimentation
A = C +ℵ (Reference image) (41) 3.6 Source videos and software

B = D+ℵ (Distorted image) (42)
Four source videos were used for system validation as shown
Visual information of the reference and distorted image can in Fig. 5. The first video is baby1 video captured by a digital
be extracted from the output of the HVS (A and B) and the camera (Canon EOS 60D) with a resolution of 960 × 544
123
Fig. 5 Source videos, a baby1 video, b baby2 video, c guitar video and d camera video
pixel, 301 frames and a frame rate of 30 fps [13]. The second videos obtained from four existing magnification techniques
video is baby2 video captured by a digital camera (Nikon [13,14,19,20] to determine which method has the best video
D5300) with a resolution of 960 × 540 pixel, 301 frames and quality outcome. The video quality metrics (PSNR, MSE,
a frame rate of 60 fps. The third video is guitar video with a WSNR, VSNR, UQI, SSIM, MSSIM, GSM, RSIM, FSIM,
resolution of 432 × 192 pixel, 300 frames and a frame rate of IFC, VIF and NQM) were chosen based on three categories
600 fps. The fourth video is camera video with a resolution to cover most of the image characteristics in the video quality
of 512 × 384 pixel, 1000 frames and a frame rate of 300 fps. test. Table 2 shows the averages for each metric of all frame
Both guitar and camera videos were captured by a digital sequences of the baby1 video and their ranks according to
camera (Casio Exilim EX F1)[13]. the best video quality for all magnification methods. Table 3
The experiments and data analysis were carried out by shows the video quality test of the baby2 video. Tables 4 and 5
using MATLAB software environment (MathWorks, Inc., show video quality outcomes for the guitar video and camera
Natick, USA). MATLAB software was used in this study video, respectively, where each row in these tables represents
because it is one of the most popular scientific high-level the averages for each metric of all frame sequences. The ranks
programming environments for a large number of scientific of methods from the highest “1” to the lowest “5” according
and engineering applications [51,52]. to their video quality scores were also shown in these tables.
As can be seen from Tables 2, 3, 4 and 5, the proposed
3.7 Results and discussion EMMS significantly outperforms other magnification meth-
ods in all video quality metrics. Also, the EMMS ranked first
The temporal band-pass filter processing was performed on regarding the metrics based on mathematical model, charac-
each level of the wavelet pyramid by applying a fifth-order teristics of the human visual system, natural scene statistics,
Chebyshev Type I band-pass filter with 0.5 dB of passband while the other methods may outperform each other in some
ripple and different selected frequencies in conformity with quality metrics and not others.
the source video. Both baby1 and baby2 source videos were To evaluate the performance of all methods in different
magnified with same parameters (magnification = 20×, cut- magnification factors, one no-reference video quality score
off frequency = 16, 0.4–0.05 Hz band-pass filter and eight (BRISQUE) was used to determine which method has better
pyramid levels), while guitar video was magnified by 50×, spatial quality for larger magnification factor. The averaged
10 Hz of cut-off frequency of 10, 72–92 Hz of band-pass BRISQUE for all frame sequences of the source videos
filter and eight pyramid levels, and camera video was mag- was 36.59, 15.39, 8.72 and 21.25 for baby1, baby2, guitar
nified by 100×, 15 Hz of cut-off frequency, 36–62 Hz of andcamera, respectively. We magnified both baby1 video and
band-pass filter and eight pyramid levels. The magnified baby2 video with nine magnification factors ranged between
videos based on the EMMS were compared to the output 5–50 (5, 10, 15, 20, 25, 30, 35, 40, 45 and 50) for each method
123
Table 2 Methods comparison based on video quality metrics for “baby1 video”
Video quality metrics EVM [13] Phase-based method Fast phase-based E2VM [20] Proposed EMMS
(octave band-pass) [14] method [19]
1 PSNR Metrics based on 31.208 (4) 32.892 (3) 31.027 (5) 34.911 (2) 39.444 (1)
mathematical
model
2 MSE 49.199 (4) 34.166 (3) 51.198 (5) 21.023 (2) 7.4884 (1)
3 UQI Metrics based on 0.4147 (5) 0.6170 (3) 0.6290 (2) 0.6121 (4) 0.6867 (1)
characteristics of
the human visual
system
4 SSIM 0.8823 (5) 0.9477 (4) 0.9516 (3) 0.9602 (2) 0.9704 (1)
5 MSSIM 0.9105 (5) 0.9749 (4) 0.9819 (2) 0.9774 (3) 0.9888 (1)
6 GSM 0.9888 (5) 0.9964 (4) 0.9972 (3) 0.9977 (2) 0.9984 (1)
7 FSIM 0.8961 (5) 0.9663 (4) 0.9743 (3) 0.9820 (2) 0.9881 (1)
8 SRSIM 0.9496 (5) 0.9804 (4) 0.9864 (3) 0.9882 (2) 0.9901 (1)
9 NQM 18.877 (4) 22.756 (2) 18.682 (5) 21.386 (3) 28.308 (1)
10 WSNR 27.027 (4) 30.635 (3) 26.635 (5) 30.644 (2) 38.084 (1)
11 VSNR 18.682 (5) 21.327 (3) 21.268 (4) 25.627 (2) 32.135 (1)
12 IFC Metrics based on 1.4038 (5) 1.8614 (4) 2.1299 (3) 2.2291 (2) 2.5271 (1)
natural scene
statistics
13 VIF 0.4717 (5) 0.4723 (4) 0.5663 (3) 0.5932 (2) 0.6549 (1)
Table 3 Methods comparison based on video quality metrics for “baby2 video”
mathematical
model
2 MSE 203.329 (3) 256.175 (5) 220.106 (4) 55.016 (2) 43.412 (1)
characteristics of
the human visual
system
4 SSIM 0.8703 (3) 0.8378 (5) 0.8530 (4) 0.9553 (2) 0.9573 (1)
5 MSSIM 0.8923 (5) 0.9072 (4) 0.9242 (3) 0.9910 (2) 0.9947 (1)
6 GSM 0.9833 (5) 0.9896 (4) 0.9908 (3) 0.9933 (2) 0.9972 (1)
7 FSIM 0.8773 (5) 0.8983 (4) 0.9116 (3) 0.9911 (2) 0.9943 (1)
8 SRSIM 0.9062 (5) 0.9545 (4) 0.9606 (3) 0.9960 (2) 0.9975 (1)
9 NQM 13.422 (5) 15.840 (4) 17.574 (3) 24.594 (2) 29.113 (1)
10 WSNR 24.754 (5) 26.295 (4) 27.528 (3) 37.493 (2) 42.546 (1)
11 VSNR 11.246 (5) 11.726 (4) 12.612 (3) 24.637 (2) 26.056 (1)
natural scene
statistics
13 VIF 0.4615 (3) 0.3584 (5) 0.3786 (4) 0.6286 (2) 0.6560 (1)
123
Table 4 Methods comparison based on video quality metrics for “guitar video”
mathematical
model
2 MSE 35.201 (3) 216.243 (5) 209.804 (4) 18.019 (2) 14.522 (1)
characteristics of
the human visual
system
4 SSIM 0.9449 (3) 0.8534 (4) 0.8568 (3) 0.9701 (2) 0.9750 (1)
5 MSSIM 0.9758 (4) 0.9651 (5) 0.9773 (3) 0.9909 (2) 0.9953 (1)
6 GSM 0.9958 (3) 0.9890 (4) 0.9884 (5) 0.9978 (2) 0.9980 (1)
7 FSIM 0.9749 (3) 0.9291 (4) 0.9206 (5) 0.9864 (2) 0.9878 (1)
8 SRSIM 0.9881 (3) 0.9632 (4) 0.9577 (5) 0.9932 (2) 0.9937 (1)
9 NQM 23.284 (3) 20.284 (5) 20.293 (4) 28.330 (2) 32.351 (1)
10 WSNR 31.889 (4) 33.456 (3) 27.937 (5) 36.624 (2) 42.434 (1)
11 VSNR 27.369 (3) 23.567 (5) 26.216 (4) 34.179 (2) 40.832 (1)
natural scene
statistics
13 VIF 0.6205 (3) 0.4842 (5) 0.5394 (4) 0.7444 (2) 0.7987 (1)
Table 5 Methods comparison based on video quality metrics for “camera video”
mathematical
model
2 MSE 95.530 (3) 193.553 (5) 146.067 (4) 24.410 (2) 13.111 (1)
characteristics of
the human visual
system
4 SSIM 0.8790 (4) 0.8433 (5) 0.8844 (3) 0.9662 (2) 0.9731 (1)
5 MSSIM 0.8875 (5) 0.9520 (4) 0.9719 (3) 0.9762 (2) 0.9913 (1)
6 GSM 0.9863 (5) 0.9917 (4) 0.9948 (3) 0.9980 (2) 0.9990 (1)
7 FSIM 0.9185 (5) 0.9361 (4) 0.9566 (3) 0.9882 (2) 0.9942 (1)
8 SRSIM 0.9595 (5) 0.9704 (4) 0.9811 (3) 0.9955 (2) 0.9976 (1)
9 NQM 16.136 (5) 23.845 (2) 20.578 (4) 23.375 (3) 29.465 (1)
10 WSNR 23.817 (5) 28.908 (3) 26.861 (4) 31.153 (2) 37.888 (1)
11 VSNR 20.173 (5) 21.958 (4) 24.985 (3) 30.873 (2) 39.628 (1)
natural scene
statistics
13 VIF 0.5174 (3) 0.3447 (5) 0.4458 (4) 0.7287 (2) 0.7882 (1)
123
Fig. 6 The no-reference quality score at different magnification factors using five magnification methods for the source videos a baby1 video, b
baby2 video, c guitar video and d cameravideo
to evaluate their performance regarding increased magnifica- BRISQUE for the EMMS remains in proximity with aver-
tion factors as shown in Fig. 6a, b, while we magnified both aged BRISQUE of the source videos (guitar and camera)
guitar video and camera video at nine different magnifica- which means the EMMS had the best video quality score
tion factors ranged between 10–100 (10, 20, 30, 40, 50, 60, than all methods with the increase of the magnification fac-
70, 80, 90 and 100) as shown in Fig. 6c, d. tors from 10 to 100.
It is evident from Fig. 6 that there was a proportional lin- The execution time for each method for all source videos
ear relationship between the magnification factor and the is computed on a PC with Intel i5–4570 processor running
video quality score (BRISQUE) and this linear relation- at 3.2 GHz with 8 GB of RAM under Windows 8 operating
ship demonstrates that the proposed EMMS had the best system as shown in Table 6.
value of the quality score (the closest to zero) among other It is noted from Table 6 that the EMMS performs the mag-
methods with increasing the magnification factors. From nification process in 50.68 and 48.85 s for baby1 and baby2
Fig. 6a, b, the averaged BRISQUE for the EMMS, phased- videos, respectively, which were approximately 1.7× (70%)
based method and fast phase-based method increased slightly faster than EVM, making it suitable for real-time use. The
with the increase in the magnification factors from 5 to 50 and execution times of the EMMS were also approximately 4.3×
remained in proximity with averaged BRISQUE of the source (330%) faster than phased-based method, 2.3× (130%) faster
videos (baby1 and baby2). In contrast, it varies significantly than fast phase-based method and 2× (100%) faster than
after 20× magnification for other methods, which means E2VM. From the guitar video, EMMS was approximately
that the EMMS, phased-based method and fast phase-based faster 1.6× (60%), 2.7× (170%), 2.1× (110%), and 1.96×
method worked more robustly with larger magnification fac- (96%) than the EVM, phase-based method, fast phase-based
tors than did the other methods. From Fig. 6c, d, the averaged method and E2VM, respectively. From the camera video,
123
Table 6 Execution time comparison in seconds for all magnification methods

Video magnification methods Baby1 video (s) Baby2 video (s) Guitar video (s) Camera video (s)
EVM 85.67 82.52 19.77 100.76

Phase-based methoda 213.17 211.74 33.84 243.18
Fast phase-based method 117.45 115.73 26.37 130.24
E2VM 101.32 97.84 24.15 120.05
EMMS 50.68 48.54 12.31 59.29
a With four orientations octave bandwidth pyramid
EMMS was approximately faster 1.7× (70%), 4.1× (310%), References

2.2× (120%), and 2× (100%) than the EVM, phase-based
method, fast phase-based method and E2VM, respectively. 1. Poh, M.-Z., Mcduff, D., Picard, R.W.: Non-contact, automated car-
diac pulse measurements using video imaging and blind source
separation. Opt. Soc. Am. 18(10), 10762–10774 (2010)
2. Poh, M.-Z., Mcduff, D.J., Picard, R.W.: Advancements in noncon-
tact, multiparameter physiological measurements using a webcam.
IEEE Trans. Biomed. Eng. 58(1), 7–11 (2011)
3. Verkruysse, W., Svaasand, L.O., Nelson, J.S.: Remote plethys-
4 Conclusion mographic imaging using ambient light. Opt. Express 16(26),
21434–21445 (2008)
4. Balakrishnan G., Durand F., Guttag J.: Detecting pulse from head
Video magnification has attracted wide attention in the motions in video. In: 2013 IEEE Conference on Computer Vision
field of biomedical applications because of its capacity of and Pattern Recognition (CVPR), pp. 3430–3437 (2013)
extracting useful information in video and image sequences. 5. Shan, L., Yu, M.: Video-based heart rate measurement using head
motion tracking and ICA. In: 2013 6th International Congress on
However, the existing video magnification methods suffer Image and Signal Processing (CISP), pp. 160–164. IEEE (2013)
from decreased video quality and increased noise level when 6. Irani, R., Nasrollahi, K., Moeslund, T.B.: Improved pulse detection
the magnification is increased as well as taking a long time to from head motions using DCT. In: 2014 International Conference
execute. In order to solve these problems, several improve- on Computer Vision Theory and Applications (VISAPP), pp. 118–
124 (2014)
ments have been made based on wavelet decomposition, 7. Al-Naji, A., Chahl, J.: Contactless cardiac activity detection based
Chebyshev band-pass filter, image de-noising and resam- on head motion magnification. Int. J. Image Graph. 17(1), 1–18
pling method to enhance noise removal, video quality and (2017)
execution time. The magnified videos of five magnifica- 8. He, X., Goubran, R.A., Liu, X.P.: Wrist pulse measurement and
analysis using Eulerian video magnification. In: 2016 IEEE-EMBS
tion methods are compared quantitatively in terms of their International Conference on Biomedical and Health Informatics
video quality, the performance at different magnification (BHI). pp. 41–44. IEEE (2016)
factors and the execution time. The results based on four 9. Al-Naji, A., Chahl, J.: Non-contact heart activity measurement sys-
source videos indicate that the proposed method has better tem based on video imaging analysis. Int. J. Pattern Recognit. Artif.
Intell. 31(2), 1–21 (2017)
video quality metrics related to most of image character- 10. Al-Naji, A., Chahl, J.: Remote respiratory monitoring system based
istics and could yield better performance than all current on developing motion magnification technique. Biomed. Signal
methods with the potential for larger magnification. More- Process. Control 29, 1–10 (2016)
over, the proposed EMMS has low execution time (60–70 11. Ali, A., Kim, G., Sang-Hoen, L., Javaan, C.: Real time apnoea
monitoring of children using the Microsoft Kinect sensor: a pilot
faster than EVM) compared to other methods. Thus, the pro- study. Sensors 17(2), 286 (2017)
posed method is the best candidate for real-time biomedical 12. Liu, C., Torralba, A., Freeman, W.T., Durand, F., Adelson, E.H.:
applications. Further improvement on the proposed method Motion magnification. ACM Trans. Graph. (TOG) 24(3), 519–526
based on another multiresolution pyramidal decomposition (2005)
13. Wu, H.Y., Rubinstein, M., Shih, E., Guttag, J.V., Durand, F.,
and advanced image de-noising methods may be a sub- Freeman, W.T.: Eulerian video magnification for revealing subtle
ject of future work. changes in the world. ACM Trans. Graph. (TOG) 31(4), 65 (2012)
14. Wadhwa, N., Rubinstein, M., Durand, F., Freeman, W.T.: Phase-
based video motion processing. ACM Trans. Graph. (TOG) 32(4),
Compliance with ethical standards 80 (2013)
15. Portilla, J., Simoncelli, E.P.: A parametric texture model based on
Conflict of interest The authors declare that they have no conflict of joint statistics of complex wavelet coefficients. Int. J. Comput. Vis.
interest. 40(1), 49–70 (2000)
16. Simoncelli, E.P., Freeman, W.T., Adelson, E.H., Heeger, D.J.:
Ethical standard Ethical approval for the experimental work was given Shiftable multiscale transforms. IEEE Trans. Inf. Theory 38(2),
by the UniSA Human Research Ethics Committee. 587–607 (1992)
123
17. Fleet, D.J., Jepson, A.D.: Computation of component image veloc- the Thirty-Seventh Asilomar Conference on Signals, Systems and
ity from local phase information. Int. J. Comput. Vis. 5(1), 77–104 Computers, pp. 1398–1402 (2003)
(1990) 37. Liu, A., Lin, W., Narwaria, M.: Image quality assessment based on
18. Gautama, T., Van Hulle, M.M.: A phase-based approach to the gradient similarity. IEEE Trans. Image Process. 21(4), 1500–1512
estimation of the optical flow field using spatial filtering. IEEE (2012)
Trans. Neural Netw. 13(5), 1127–1136 (2002) 38. Gondal, I., Murshed, M.: A novel color image fusion QoS measure
19. Wadhwa, N., Rubinstein, M., Durand, F., Freeman, W.T.: Riesz for multi-sensor night vision applications. In: 2010 IEEE Sym-
pyramid for fast phase-based video magnification. In: 2014 IEEE posium on Computers and Communications (ISCC), pp. 399–404
International Conference on Computational Photography (ICCP), (2010)
pp. 1–10. IEEE (2014) 39. Zhang, L., Mou, X., Zhang, D.: FSIM: a feature similarity index
20. Liu, L., Lu, L., Luo, J., Zhang, J., Chen, X.: Enhanced Eulerian for image quality assessment. IEEE Trans. Image Process. 20(8),
video magnification. In: 2014 7th International Congress on Image 2378–2386 (2011)
and Signal Processing (CISP), pp. 50–54. IEEE (2014) 40. Zhang, L., Li, H.: SR-SIM: a fast and high performance IQA index
21. Madhukar, B., Narendra, R.: Lanczos resampling for the digital based on spectral residual. In: 19th IEEE International Conference
processing of remotely sensed images. In: Proceedings of Interna- on Image Processing, pp. 1473–1476 (2012)
tional Conference on VLSI, Communication, Advanced Devices, 41. Damera-Venkata, N., Kite, T.D., Geisler, W.S., Evans, B.L., Bovik,
Signals and Systems and Networking (VCASAN-2013), pp. 403– A.C.: Image quality assessment based on a degradation model.
411. Springer (2013) IEEE Trans. Image Process. 9(4), 636–650 (2000)
22. Kaymaz, E., Lerner, B.T., Campbell, W.J., Le Moigne, J., Pierce, 42. Peli, E.: Contrast in complex images. J. Opt. Soc. Am. A 7(10),
J.F.: Registration of satellite imagery utilizing the low-low compo- 2032–2040 (1990)
nents of the wavelet transform. In: 25th Annual AIPR Workshop on 43. Dodangeh, M., Figeiredo, I.N., Goncalves, G.: Spatially Adaptive
Emerging Applications of Computer Vision, International Society Total Variation Deblurring with Split Bregman Technique, pp. 1–
for Optics and Photoncs, pp. 45–54 (1997) 24. University of Coimbra, Paço das Escolas (2016)
23. Mallat, S.G.: A theory for multiresolution signal decomposition: 44. Mitsa, T., Varkur, K.L.: Evaluation of contrast sensitivity functions
the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. for the formulation of quality measures incorporated in halftoning
11(7), 674–693 (1989) algorithms. In: International Conference on Acoustics, Speech, and
24. Singh, R., Vasquez, R.E., Singh, R.: Comparison of Daubechies, Signal Processing, pp. 301–304. IEEE (1993)
Coiflet, and Symlet for edge detection. In: Proceedings SPIE 3074, 45. Slanina, M., Ricny, V.: A comparison of full-reference image qual-
Visual Information Processing VI, pp. 151–159 (1997). https://doi. ity assessment methods. In: Radioelektronika 2006 Conference
org/10.1117/12.280616 Proceedings. Slovak Technical University in Bratislava, pp. 165–
25. Dettori, L., Semler, L.: A comparison of wavelet, ridgelet, 168 (2006). ISBN 80-227-2388-6
and curvelet-based texture classification algorithms in computed 46. Chandler, D.M., Hemami, S.S.: VSNR: a wavelet-based visual
tomography. Comput. Biol. Med. 37(4), 486–498 (2007) signal-to-noise ratio for natural images. IEEE Trans. Image Pro-
26. Kale, V.U., Khalsa, N.N.: Performance evaluation of various cess. 16(9), 2284–2298 (2007)
wavelets for image compression of natural and artificial images. 47. Sheikh, H.R., Bovik, A.C., De Veciana, G.: An information fidelity
Int. J. Comput. Sci. Commun. 1(1), 179–184 (2010) criterion for image quality assessment using natural scene statistics.
27. Stanković, R.S., Falkowski, B.J.: The Haar wavelet transform: Its IEEE Trans. Image Process. 14(12), 2117–2128 (2005)
status and achievements. Comput. Electr. Eng. 29(1), 25–44 (2003) 48. Srivastava, A., Lee, A.B., Simoncelli, E.P., Zhu, S.C.: On advances
28. Sandhu, M., Kaur, S., Kaur, J.: A study on design and implementa- in statistical modeling of natural images. J. Math. Imaging Vis.
tion of Butterworth, Chebyshev and elliptic filter with Matlab. Int. 18(1), 17–33 (2003)
J. Emerg. Technol. Eng. Res. 4(6), 111–114 (2016) 49. Sheikh, H.R., Bovik, A.C.: Image information and visual quality.
29. Podder, P., Mehedi Hasan, M., Rafiqul Islam, M., Sayeed, M.: IEEE Trans. Image Process. 15(2), 430–444 (2006)
Design and implementation of Butterworth, Chebyshev-I and ellip- 50. Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image qual-
tic filter for speech signal analysis. Int. J. Comput. Appl. 98(7), ity assessment in the spatial domain. IEEE Trans. Image Process.
12–18 (2014) 21(12), 4695–4708 (2012)
30. Yu, W., Ma, Y., Zheng, L., Liu, K.: Research of improved adaptive 51. Lyshevski, S.E.: Engineering and Scientific Computations Using
median filter algorithm. In: Proceedings of the 2015 International MATLAB. Wiley, Hoboken (2005)
Conference on Electrical and Information Technologies for Rail 52. Abbas, E.I., Noori, A.A.: High resolution direction-of-arrival esti-
Transportation, pp. 27–34. Springer (2016) mation using genetic algorithm. Eng. Technol. J. 27(9), 1746–1754
31. Wang, Z., Lu, L., Bovik, A.C.: Video quality assessment based on (2009)
structural distortion measurement. Sig. Process. Image Commun.
19(2), 121–132 (2004) Ali Al-Naji received the Bach-
32. Chen, T.S., Chang, C.C., Hwang, M.S.: A virtual image cryptosys- elor of Medical Instrumentation
tem based upon vector quantization. IEEE Trans. Image Process. Engineering Techniques from the
7(10), 1485–1488 (1998) Electrical Engineering Technical
33. Wang, Z., Bovik, A.C.: A universal image quality index. IEEE College, Middle Technical Uni-
Signal Process. Lett. 9(3), 81–84 (2002) versity, Baghdad, Iraq, in 2005,
34. Ruikar, J.D., Sinha, A.K., Chaudhury, S.: Image quality assessment and the M.S. degree in Electrical
algorithms: study and performance comparison. In: 2014 Inter- & Electronic Engineering Depart-
national Conference on Electronics and Communication Systems ment from the Technology Uni-
(ICECS), pp. 1–4 (2014) versity, Baghdad, Iraq, in 2008.
35. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image qual- He is currently working toward
ity assessment: from error visibility to structural similarity. IEEE the Ph.D. degree in the Depart-
Trans. Image Process. 13(4), 600–612 (2004) ment of Electrical & Information
36. Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural sim- Engineering, University of South
ilarity for image quality assessment. In: Conference Record of Australia (UniSA). His research
123
interests include biomedical engineering, computer vision systems and Javaan Chahl completed his Doc-
microcontroller applications. torate at the Australian National
University. He joined the Defence
Science and Technology Group
(DST Group) as a Research Sci-
entist in 1999. In 2011 Javaan
Sang-Heon Lee received a B.ESc
joined RMIT University as found-
degree in aeronautical engineer-
ing Professor of Unmanned Aerial
ing from Inha University, Korea,
Vehicles. In 2012 Javaan became
M.ESc in mechatronics from the
Chair of Sensor Systems, a Joint
University of New south Wales,
appointment between DST Group
and a Ph.D. degree in System
and the University of South Aus-
Engineering from Australian
tralia. Javaan is a member of the
National University. He is cur-
IEEE, the Institute of Engineers
rently a program director and a
Australia and was elected Fellow
senior lecturer in the School of
of the Royal Aeronautical Society in 2014.
Engineering, University of South
Australia. His current research
focus is on development of effi-
cient algorithm for green supply
chain and digital image process-
ing in agricultural and medical applications. His main research inter-
ests are in the area of discrete-event systems, fuzzy logic control and
neural networks. He has published over 100 papers in academic jour-
nals and conferences.
123

Al-Naji2018 Article AnEfficientMotionMagnification

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Al-Naji2018 Article AnEfficientMotionMagnification

Uploaded by

Copyright:

Available Formats

Machine Vision and Applications (2018) 29:585–600

An efficient motion magnification system for real-time applications

Fig. 1 An example of magnified videos using different magnification methods

tion techniques in performance, video quality, complexity

2 The proposed method

In this section, the proposed block diagram of EMMS is intro-

2.1 Multiresolution pyramidal wavelet

Fig. 3 Filter structure used to determine the wavelet coefficients

where H and L are defined as

(H )l = (−1)l .L −l+1 (6)

Table 1 Performance comparison of the video magnification methods

which represents the image signal for each magnified frame, σ F G 2 F̄ Ḡ 2σ F σG

2552 The first component in (14) represents a correlation coef-

To simplify the expression, a specific form of the SSIM with

where l p (F, G) , c p (F, G) and s p (F, G) are the lumi-

A = C +ℵ (Reference image) (41) 3.6 Source videos and software

Table 6 Execution time comparison in seconds for all magnification methods

EVM 85.67 82.52 19.77 100.76

EMMS was approximately faster 1.7× (70%), 4.1× (310%), References

You might also like