while promoting smooth depth maps elsewhere.

Let Q(x) : Ω → ℝ be a functional that represents the inverse depth value at pixel x = (x, y)ᵀ. The data term associated with each label at pixel x is computed as:

U(x, Q(x)) = (1/n) Σ_i D_i(x, b_i(Q(x))),    (11)

where n is the number of focal stack frames. The energy functional we seek to minimize is:

E_Q = ∫_Ω λU(x, Q(x)) + ‖T(x)∇Q(x)‖_ε dx,    (12)

where T(x) is a symmetric, positive definite diffusion tensor as suggested in [30] and defined as exp(−α|∇I|^β) nnᵀ + n⊥(n⊥)ᵀ, where n = ∇I/|∇I| and n⊥ is a unit vector perpendicular to n, and ‖z‖_ε is a Huber norm defined as:

‖z‖_ε = ‖z‖₂²/(2ε)  if ‖z‖₂ ≤ ε;  ‖z‖₁ − ε/2  otherwise.    (13)

Following the approach of [25, 20], the data term and smoothness term in (12) are decoupled through an auxiliary functional A(x) : Ω → ℝ and split into two minimization problems which are alternately optimized:

E_A = ∫_Ω λU(x, A(x)) + (1/2θ)(A(x) − Q(x))² dx,    (14)

E_Q = ∫_Ω (1/2θ)(A(x) − Q(x))² + ‖T(x)∇Q(x)‖_ε dx.    (15)

As θ → 0, it can be shown [9] that this minimization is equivalent to minimizing equation 12. Equation 14 is minimized using a point-wise search over the depth labels, which can handle fine structures without resorting to linearizing the data term or a coarse-to-fine approach such as [25]. Similar to [20], we perform a single Newton step around the minimum point to achieve sub-sample accuracy. Equation 15 is similar to the ROF image denoising problem [22] with an anisotropic Huber norm and is minimized using a primal-dual algorithm [10] through the Legendre-Fenchel transform.

7. Handling Bokeh

Defocusing highlights and other saturated regions create sharp circular expansions known as bokeh. This effect can cause artifacts if not properly accounted for during image alignment. Optical flow algorithms will explain the bokeh expansion as if it were caused by parallax and, after alignment, will contract or expand the bokeh to match the reference frame, resulting in a physically incorrect aligned focal stack and "bleeding" artifacts in the all-in-focus image, e.g., see the top row in Figure 3.

To solve this problem, we propose a technique to detect bokeh regions by looking for bright areas and measure each pixel's expansion through the focal stack. The measure of expansion is then used to regularize the optical flow in regions where bokeh is present.
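The point-wise label search with a single Newton refinement used to minimize equation 14 can be sketched as follows. This is a minimal NumPy illustration under assumed array shapes; the function name, uniform label spacing, and the clipping of the Newton step are my own additions, not the authors' implementation:

```python
import numpy as np

def minimize_EA(U, Q, labels, lam=0.001, theta=2.0):
    """Point-wise minimization of E_A (eq. 14): exhaustive search over
    inverse-depth labels plus one Newton step for sub-sample accuracy.

    U: (L, H, W) data-cost volume U(x, label); Q: (H, W) current estimate;
    labels: (L,) inverse-depth value of each label (uniform, L >= 3 assumed).
    """
    L, H, W = U.shape
    h = labels[1] - labels[0]                  # label spacing
    # Cost of each label: data term plus quadratic coupling to Q.
    cost = lam * U + (labels[:, None, None] - Q[None]) ** 2 / (2.0 * theta)
    best = np.argmin(cost, axis=0)             # (H, W) winning label index
    # One Newton step around the winner via central finite differences.
    inner = np.clip(best, 1, L - 2)
    yy, xx = np.mgrid[0:H, 0:W]
    c0 = cost[inner - 1, yy, xx]
    c1 = cost[inner, yy, xx]
    c2 = cost[inner + 1, yy, xx]
    grad = (c2 - c0) / (2.0 * h)
    curv = (c2 - 2.0 * c1 + c0) / h ** 2
    step = np.where(curv > 1e-12, -grad / np.maximum(curv, 1e-12), 0.0)
    refined = labels[inner] + np.clip(step, -h, h)   # keep refinement local
    # Keep the discrete winner where it sits on the label-range boundary.
    return np.where(best == inner, refined, labels[best])
```

For a cost that is locally quadratic in the label value, the single Newton step recovers the continuous minimizer exactly, which is why it suffices for sub-sample accuracy here.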
7.1. Bokeh Detector

Bokeh occurs in bright regions of the image which expand through the focal stack. As a discrete approximation of how much each pixel in each frame expands, we propose a voting scheme where every pixel votes for the pixels that it flows to in every other frame. Pixels with the most votes correspond to sources of expansion in the stack. Specifically, we first compute a low-regularized optical flow and use the concatenation technique of Section 4 to compute all-pair F_ij for all i, j ∈ [n].

Let p_i(u, v) be the pixel at (u, v) in frame i. p_i(u, v) will vote for pixels p_j(u + F_ij(u, v)_x, v + F_ij(u, v)_y) for all j ≠ i. To avoid discretization artifacts, each vote is spread to pixels in a small neighborhood with weights according to a Gaussian falloff. Let u′ = u + F_ij(u, v)_x and v′ = v + F_ij(u, v)_y; the total vote for pixel p_j(s, t) is computed as:

V_j(s, t) = (1/(n − 1)) Σ_{i≠j} Σ_{u,v} exp(−((u′ − s)² + (v′ − t)²)/(2σ²))    (16)

For a given frame j in the focal stack, we then threshold V_j > τ_v and the color intensity I_j(s, t) > τ_I to detect the sources of bokeh expansion. To detect bokeh pixels in all the frames in the stack, the maximum votes are propagated back to the corresponding pixels in every other frame, and a bokeh confidence map for each frame is generated as:

K_i(s, t) = max_{j≠i} (W_{F_ji}(V_j))(s, t)    (17)

7.2. Bokeh-aware Focal Stack Alignment

Since optical flow estimates are inaccurate at bokeh boundaries, special care is needed to infer flow in these regions (e.g., by locally increasing flow regularization). In our implementation, we perform "flow inpainting" as a post-processing step, which makes correcting bokeh independent of the underlying optical flow algorithms used. For each F_in, we mask out areas with high K_i, denoted by Ω with boundary ∂Ω, and interpolate the missing flow field values by solving Fourier's heat equation:

min_{F′_x, F′_y} ∬_Ω ‖∇F′_x‖² + ‖∇F′_y‖² du dv    (18)

s.t. F′_x|_∂Ω = F_x|_∂Ω,  F′_y|_∂Ω = F_y|_∂Ω    (19)

This can be turned into a linear least squares problem on discrete pixels by computing gradients of pixels using finite differences. The boundary condition can also be encoded as a least squares term in the optimization, which can be solved efficiently.

7.3. Bokeh-aware All-in-Focus Image Stitching

The previous flow interpolation step allows us to preserve the original shapes of the bokehs in the focal stack. However, since bokehs have high-contrast contours in every frame, the sharpness measure used to stitch the all-in-focus image tends to select bokeh contours and therefore produces bleeding artifacts (Figure 3) around bokeh regions. To fix this, we incorporate the bokeh detector into a modified data term E′ of the previous MRF formulation as follows:

E′_i(x_i = j) = αE_i(x_i) + (1 − α)C_i(x_i)  if K_i(s, t) > 0;  E_i(x_i)  otherwise    (20)

where E_i(x_i) is the original data term, C_i(x_i) = I_i(s, t) is a color intensity term so that larger bokehs are penalized more, and α is a balancing constant.

8. Experiments

We now describe implementation details, runtime, results, evaluations, and applications.

Implementation details Flows in Section 4 are computed using Ce Liu's [16] implementation of optical flow (based on Brox et al. [7] and Bruhn et al. [8]) with parameters (α, ratio, minWidth, outer-, inner-, SOR-iterations) = (0.03, 0.85, 20, 4, 1, 40). Flows in Section 7.1 are computed using the same implementation with α = 0.01. In Section 5, each color channel is scaled to the 0-1 range, the weight λ = 0.04, and µ = 13. In Section 6.1, we blur the all-in-focus using a radius starting from 0 up to 6.5 pixels in 0.25 increments. The exponential constant α = 2, and µ = 15. For Levenberg-Marquardt, the nearest and farthest depths are set to 10 and 32. The focal depth and aperture are set to 2 and 3. In the refinement step 6.2, we quantize the inverse depth into 32 bins lying between the minimum and maximum estimated depths from the calibration step. The balancing term λ = 0.001. The tensor constants are α = 20, β = 1, and the Huber constant ε = 0.005. The decoupling constant starts at θ_0 = 2 and θ_{n+1} = θ_n(1 − 0.01n), n ← n + 1 until θ ≤ 0.005. The Newton step is computed from the first-order and second-order central finite differences of the data term plus the quadratic decoupling term. In Section 7.1, equation 16, the standard deviation σ = 3, and the thresholds τ_v = 5, τ_I = 0.5. In Section 7.3, the balancing term α = 0.7.

Runtime We evaluate runtime on the "balls" dataset (Figure 7) with 25 frames at 640x360 pixels on a single CPU core of an Intel i7-4770 @ 3.40 GHz. The complete pipeline takes about 20 minutes, which includes computing optical flows (8 minutes), detecting bokehs (48 seconds), focal stack alignment (45 seconds), bokeh-aware all-in-focus stitching (14 seconds), focal stack calibration (8 minutes), and depth map refinement (3 minutes).
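The vote accumulation of equation 16 in Section 7.1 can be sketched directly. This is a slow, minimal NumPy illustration, not the paper's implementation; the flows_to_j argument, the truncation radius, and the rounding of the target location are my own simplifications:

```python
import numpy as np

def bokeh_vote_map(flows_to_j, sigma=3.0, radius=3):
    """Accumulate votes V_j (eq. 16): every pixel in every other frame
    votes, with a Gaussian falloff, for the pixel it flows to in frame j.

    flows_to_j: list of (H, W, 2) arrays F_ij giving, for each frame i != j,
                the (x, y) flow from frame i into frame j.
    """
    H, W, _ = flows_to_j[0].shape
    V = np.zeros((H, W))
    for F in flows_to_j:
        for v in range(H):
            for u in range(W):
                # Target location (u', v') of pixel (u, v) in frame j.
                up = u + F[v, u, 0]
                vp = v + F[v, u, 1]
                s0, t0 = int(round(up)), int(round(vp))
                # Spread the vote over a small neighborhood (Gaussian falloff),
                # truncated at `radius` pixels for efficiency.
                for t in range(max(0, t0 - radius), min(H, t0 + radius + 1)):
                    for s in range(max(0, s0 - radius), min(W, s0 + radius + 1)):
                        d2 = (up - s) ** 2 + (vp - t) ** 2
                        V[t, s] += np.exp(-d2 / (2.0 * sigma ** 2))
    return V / len(flows_to_j)   # average over the n-1 voting frames
```

Pixels with large V_j (and bright color intensity) are then thresholded as in Section 7.1 to locate bokeh expansion sources.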
Figure 5. Multiple datasets captured with a hand-held Samsung Galaxy phone. From left to right (number of frames in parentheses): plants (23), bottles (31), fruits (30), metals (33), window (27), telephone (33). Top row shows the all-in-focus stitch. Bottom row shows the reconstructed depth maps.
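The flow inpainting of Section 7.2 (equations 18 and 19) amounts to solving a discrete Laplace equation inside the masked region with Dirichlet boundary conditions. A minimal sketch using Jacobi iterations on a single flow channel; the paper instead solves the equivalent linear least-squares system, and the function name and iteration count here are illustrative:

```python
import numpy as np

def inpaint_flow(F, mask, iters=2000):
    """Fill flow values inside the masked (high-K_i) region Omega by solving
    the discrete Laplace equation with Dirichlet boundary conditions
    (eqs. 18-19), via simple Jacobi iterations.

    F: (H, W) one flow channel; mask: boolean array, True inside Omega.
    """
    G = F.astype(float).copy()
    for _ in range(iters):
        # Average of the four neighbors (values replicated at the image border).
        P = np.pad(G, 1, mode='edge')
        avg = 0.25 * (P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:])
        # Update only inside Omega; values on and outside the boundary
        # stay fixed, enforcing the Dirichlet condition of eq. 19.
        G[mask] = avg[mask]
    return G
```

A harmonic (e.g., linear) flow field is a fixed point of this update, so the fill is smooth and matches the surrounding flow exactly on the region boundary.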
The majority (75%) of the runtime is spent on computing optical flow and rendering the focal stack, which are part of the calibration and refinement steps. We believe these costs can be brought down substantially with an optimized, real-time optical flow implementation, e.g., [32], which reduces the optical flow runtime to 36 seconds.

Experiments We present depth map results and all-in-focus images in Figure 5 for the following focal stack datasets (number of frames in parentheses): plants (23), bottles (31), fruits (30), metals (33), window (27), telephone (33). For each dataset, we continuously captured images of size 640x360 pixels using a Samsung Galaxy S3 phone during auto-focusing. The results validate the proposed algorithm and the range of scenes in which it works, e.g., on specular surfaces in fruits and metals. The window dataset shows a particularly interesting result: the depth map successfully captures the correct depth for rain drops on the window.

We demonstrate our aligned focal stack results in the supplementary video and through a comparison of all-in-focus photos generated by our method, Adobe Photoshop CS5, and HeliconFocus in Figure 3. We use the same sequences as input to these programs. The kitchen sequence (Figure 3, top) was captured with a Nikon D80 at focal length 22mm by taking multiple shots while sweeping the focal depth. The motion of the camera contains large translation and some rotation. Photoshop and HeliconFocus fail to align the focal stack frames and produce alignment artifacts in the all-in-focus photos, as shown in the zoom-in boxes, whereas our method produces much fewer artifacts. The bottom row shows all-in-focus results from a sequence captured with a Samsung Galaxy S3 phone, with hand-held camera motion and almost no lateral parallax. The motion in this sequence is dominated by the magnification change, which is a global transformation, and all three techniques can align the frames equally well. However, our method is able to produce an all-in-focus image that does not suffer from bokeh bleeding, for example on the ceiling lights, while Photoshop and HeliconFocus show the bleeding problem on the ceiling as well as near the concrete column.

Evaluations We evaluated our algorithm on 14-frame focal stack sequences with no, small, and large camera motions. We captured these sequences using a Nikon D80 with an 18-135mm lens at 22mm (our mobile phone does not allow manual control of focus to generate such ground truth). The scene consists of 3 objects placed on a table: a box of playing cards at 12 inches from the camera's center of projection, a bicycling book at 18.5 inches, and a cook book at 28 inches. The background is at 51 inches. The first sequence contains no camera motion, and only the magnification change caused by lens breathing during focusing. The closest focus is at the box of playing cards and the furthest is at the background.
The second sequence has a small 0.25-inch translational motion generated by moving the camera on a tripod from left to right between each sweep shot. The third sequence has a large 1-inch translational motion generated similarly. Since our depth reconstruction is up to an affine ambiguity in inverse depth, we cannot quantify an absolute metric error. Instead, we solve for 2-DoF affine parameters α and β in 1/ŝ = αs + β such that they fit the depth of the box of playing cards s_box and the background s_bg to the ground-truth depth values ŝ_box and ŝ_bg. The depths of the two books averaged over each surface are reported in Table 1 and the corresponding depth maps are shown in Figure 6.

We also compare depth maps from our algorithm to a representative of previous work by replacing optical flow alignment with affine alignment (Figure 6). We apply the same concatenation scheme to affine alignment to handle the defocus variation. Results show that affine alignment cannot handle even small amounts of scene parallax. It produces many depth artifacts in the small-motion sequence and completely fails to estimate reasonable focal depths in the large-motion sequence. Errors are shown in Table 2. For evaluations of the calibration process, please see our supplementary materials.

Table 1. Results from our method.
Motion:              none    small   large
Bike book (18.5)     16.94   16.57   16.71
Cook book (28)       24.58   22.82   22.85
RMS Error (inches)   2.66    3.91    3.86

Table 2. Results from affine alignment.
Motion:              none    small   large
Bike book (18.5)     16.92   17.63   Failed
Cook book (28)       24.43   50.96   Failed
RMS Error (inches)   2.76    16.25   Failed

Figure 6. Our evaluation focal stack is shown on top. Next rows show depth maps from our alignment algorithm vs the affine alignment algorithm in three different scenarios: a static scene, and a scene with a small (0.25-inch) and a large (1-inch) camera motion. Depth map estimation using affine alignment produces higher errors and more artifacts around areas with strong image gradients. The calibration completely fails in the large motion case.

Figure 7. Given a recovered all-in-focus (top left) and a depth map (top right), we can synthetically render any frame in the focal stack and compare to qualitatively verify the calibration process. The middle right image is a synthetic rendering of the scene produced by blurring the all-in-focus according to the estimated depth and camera parameters to match the real image in the middle left. The bottom left and right images show a refocusing application which simulates a larger aperture to decrease the depth-of-field and changes of focus.

Application The reconstructed depth map enables interesting rerendering capabilities such as increasing the aperture size to amplify the depth-of-field effect as shown in Figure 7, extending the focus beyond the recorded set, and synthesizing a small-baseline perspective shift.

9. Conclusion

We introduced the first depth from focus (DfF) method capable of handling images from mobile phones and other hand-held cameras. We formulated a novel uncalibrated DfD problem and proposed a new focal stack aligning algorithm to account for scene parallax. Our approach has been demonstrated on a range of challenging cases and produces high quality results.
References

[1] Sameer Agarwal, Keir Mierle, and Others. Ceres solver. https://code.google.com/p/ceres-solver/.
[2] Simon Baker and Iain Matthews. Equivalence and efficiency of image alignment algorithms. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–1090. IEEE, 2001.
[3] Rami Ben-Ari. A unified approach for registration and depth in depth from defocus. 2014.
[4] V. Michael Bove Jr et al. Entropy-based depth from focus. JOSA A, 10(4):561–566, 1993.
[5] Yuri Boykov and Vladimir Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(9):1124–1137, 2004.
[6] Yuri Boykov, Olga Veksler, and Ramin Zabih. Fast approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 23(11):1222–1239, 2001.
[7] Thomas Brox, Andrés Bruhn, Nils Papenberg, and Joachim Weickert. High accuracy optical flow estimation based on a theory for warping. In Computer Vision – ECCV 2004, pages 25–36. Springer, 2004.
[8] Andrés Bruhn, Joachim Weickert, and Christoph Schnörr. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. International Journal of Computer Vision, 61(3):211–231, 2005.
[9] Antonin Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1-2):89–97, 2004.
[10] Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
[11] Trevor Darrell and Kwangyoen Wohn. Pyramid based depth from focus. In Computer Vision and Pattern Recognition, 1988. Proceedings CVPR'88., Computer Society Conference on, pages 504–509. IEEE, 1988.
[12] John Ens and Peter Lawrence. A matrix based method for determining depth from focus. In Computer Vision and Pattern Recognition, 1991. Proceedings CVPR'91., IEEE Computer Society Conference on, pages 600–606. IEEE, 1991.
[13] Ray A. Jarvis. A perspective on range finding techniques for computer vision. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (2):122–139, 1983.
[14] Vladimir Kolmogorov and Ramin Zabih. What energy functions can be minimized via graph cuts? Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(2):147–159, 2004.
[15] Eric Krotkov. Focusing. International Journal of Computer Vision, 1(3):223–237, 1988.
[16] Ce Liu. Beyond pixels: exploring new representations and applications for motion analysis. PhD thesis, Massachusetts Institute of Technology, 2009.
[17] Yifei Lou, Paolo Favaro, Andrea L. Bertozzi, and Stefano Soatto. Autocalibration and uncalibrated reconstruction of shape from defocus. In CVPR, 2007.
[18] Aamir Saeed Malik and Tae-Sun Choi. A novel algorithm for estimation of depth map using image focus for 3D shape recovery in the presence of noise. Pattern Recognition, 41(7):2200–2225, 2008.
[19] Shree K. Nayar and Yasuo Nakagawa. Shape from focus. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(8):824–831, 1994.
[20] Richard A. Newcombe, Steven J. Lovegrove, and Andrew J. Davison. DTAM: dense tracking and mapping in real-time. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2320–2327. IEEE, 2011.
[21] Alex Paul Pentland. A new sense for depth of field. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (4):523–531, 1987.
[22] Leonid I. Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992.
[23] Rajiv Ranjan Sahay and Ambasamudram N. Rajagopalan. Dealing with parallax in shape-from-focus. Image Processing, IEEE Transactions on, 20(2):558–569, 2011.
[24] Nitesh Shroff, Ashok Veeraraghavan, Yuichi Taguchi, Oncel Tuzel, Amit Agrawal, and R. Chellappa. Variable focus video: reconstructing depth and video for dynamic scenes. In Computational Photography (ICCP), 2012 IEEE International Conference on, pages 1–9. IEEE, 2012.
[25] Frank Steinbrucker, Thomas Pock, and Daniel Cremers. Large displacement optical flow computation without warping. In Computer Vision, 2009 IEEE 12th International Conference on, pages 1609–1614. IEEE, 2009.
[26] Gopal Surya and Murali Subbarao. Depth from defocus by changing camera aperture: a spatial domain approach. In Computer Vision and Pattern Recognition, 1993. Proceedings CVPR'93., 1993 IEEE Computer Society Conference on, pages 61–67. IEEE, 1993.
[27] Jay Martin Tenenbaum. Accommodation in computer vision. Technical report, DTIC Document, 1970.
[28] Masahiro Watanabe and Shree K. Nayar. Telecentric optics for computational vision. In Computer Vision – ECCV'96, pages 439–451. Springer, 1996.
[29] Masahiro Watanabe and Shree K. Nayar. Telecentric optics for focus analysis. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(12):1360–1365, 1997.
[30] Manuel Werlberger, Werner Trobin, Thomas Pock, Andreas Wedel, Daniel Cremers, and Horst Bischof. Anisotropic Huber-L1 optical flow. In BMVC, volume 1, page 3, 2009.
[31] Yalin Xiong and Steven A. Shafer. Moment and hypergeometric filters for high precision computation of focus, stereo and optical flow. International Journal of Computer Vision, 22(1):25–59, 1997.
[32] Christopher Zach, Thomas Pock, and Horst Bischof. A duality based approach for realtime TV-L1 optical flow. In Pattern Recognition, pages 214–223. Springer Berlin Heidelberg, 2007.
[33] Quanbing Zhang and Yanyan Gong. A novel technique of image-based camera calibration in depth-from-defocus. In Intelligent Networks and Intelligent Systems, 2008. ICINIS'08. First International Conference on, pages 483–486. IEEE, 2008.
[34] Changyin Zhou, Daniel Miau, and Shree K. Nayar. Focal sweep camera for space-time refocusing. 2012.