
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-9, NO. 4, JULY 1987

A New Sense for Depth of Field


ALEX PAUL PENTLAND

Abstract-This paper examines a novel source of depth information: focal gradients resulting from the limited depth of field inherent in most optical systems. Previously, autofocus schemes have used depth of field to measure depth by searching for the lens setting that gives the best focus, repeating this search separately for each image point. This search is unnecessary, for there is a smooth gradient of focus as a function of depth. By measuring the amount of defocus, therefore, we can estimate depth simultaneously at all points, using only one or two images. It is proved that this source of information can be used to make reliable depth maps of useful accuracy with relatively minimal computation. Experiments with realistic imagery show that measurement of these optical gradients can provide depth information roughly comparable to stereo disparity or motion parallax, while avoiding image-to-image matching problems.

Index Terms-Focus, human vision, image understanding, range sensing, shape-from-focus.

I. INTRODUCTION

OUR subjective impression is that we view our surroundings in sharp, clear focus. This impression is reinforced by the virtually universal photographic tradition¹ of making images that are everywhere in focus, i.e., that have infinite depth of field. Unfortunately, this photographic tradition and our feeling of a sharply focused world seem to have led vision researchers, in both human and machine vision, to largely ignore the fact that in biological systems the images that fall on the retina are typically quite badly focused everywhere except within the central fovea [2], [3]. There is a gradient of focus, ranging from nearly perfect focus at the point of regard to almost complete blur at points on distant objects.

It is puzzling that biological visual systems first employ an optical system that produces a degraded image, and then go to great lengths to undo this blurring and present us with a subjective impression of sharp focus. This is especially peculiar because it is just as easy to start out with everything in perfect focus.² Why, then, does Nature prefer to employ a lens system in which most of the image is blurred?

This paper reports the finding that the gradient of focus inherent in biological and most other optical systems is a useful source of depth information, proves that these focal gradients may be used to recover a depth map (i.e., distances between viewer and points in the scene) by means of a few, simple transformations of the image, and shows that with additional computation the reliability of this depth information may be internally checked.

This source of depth information differs markedly from that used in automatic focusing methods. Autofocus methods all measure depth by searching for the lens setting that gives the best focus at a particular point [1]. The limitations of the basic method, therefore, are that it measures depth at only one point at a time, and that it requires modifying the lens setting in order to search for the setting that yields the best focus. Autofocus methods can, of course, be improved either by storing the images acquired at each lens setting and then searching in the stored images for the best focal state at each point, or by employing a large number of specialized focus-measuring devices that conduct a parallel search for the best lens setting. The first alternative, however, involves acquiring and storing, e.g., 30 or more images, while the second requires sophisticated parallel hardware.

In contrast, the method described here requires no search, so that only one or two images are required: rather than search for the best focus, we simply measure the error in focus (the "focal gradient"), and use that to estimate depth. The difference between autofocus techniques and this method, therefore, is analogous to the difference between convergence and stereopsis: both autofocus and depth-from-convergence change the camera parameters to measure depth at a single point, whereas both this method and stereopsis utilize the error signal (blur and disparity, respectively) to estimate depth.

Surprisingly, the idea of using focal gradients to infer depth appears to have never before been investigated:³ we have been unable to discover any investigation of it in either the human vision literature or in the somewhat more scattered machine vision literature.

This paper proves that this novel depth cue can be used to make overconstrained (i.e., reliable) estimates of depth with an accuracy that is roughly comparable to that available by use of stereo or motion cues (assuming human parameters), and demonstrates the performance of two simple, potentially real-time algorithms. Finally, experiments showing that people make significant use of this depth information are presented.

Manuscript received July 30, 1985; revised January 15, 1987. Recommended for acceptance by W. B. Thompson. This work was supported by the National Science Foundation under Grant DCR-83-12766, the Defense Advanced Research Projects Agency under Contract MDA 903-83-C-0027, and by a grant from the Systems Development Foundation.
The author is with the Artificial Intelligence Center, SRI International, Menlo Park, CA 94025 and the Center for the Study of Language and Information, Stanford University, Stanford, CA 94305.
IEEE Log Number 8613854.

¹A practice established in large part by Ansel Adams and others in the famous "f/64 Club."
²With a small aperture, everything would be in perfect focus. Nor would this substantially affect the sensitivity of the eye. Neural mechanisms standardly account for 10 000 to 1 brightness differences; the iris provides only a 5 to 1 range of brightness control.
³Several authors have, however, mentioned the theoretical possibility of such information.




II. THE FOCAL GRADIENT

Most biological lens systems are exactly focused⁴ at only one distance along each radius from the lens into the scene. The locus of exactly focused points forms a doubly curved, approximately spherical surface in three-dimensional space. Only when objects in the scene intersect this surface is their image exactly in focus; objects distant from this surface of exact focus are blurred, an effect familiar to photographers as depth of field.

The amount of defocus or blurring depends solely on the distance to the surface of exact focus and the characteristics of the lens system; as the distance between the imaged point and the surface of exact focus increases, the imaged objects become progressively more defocused. If we could measure the amount of blurring at a given point in the image, therefore, it seems possible that we could use our knowledge of the parameters of the lens system to compute the distance to the corresponding point in the scene.

The distance D to an imaged point is related to the parameters of the lens system and the amount of defocus by the following equation, which is developed in the Appendix:

    D = Fv₀ / (v₀ − F − σf)    (1)

where v₀ is the distance between the lens and the image plane (e.g., the film location in a camera), f the f-number of the lens system, F the focal length of the lens system, and σ the spatial constant of the point spread function (i.e., the radius of the imaged point's "blur circle"), which describes how an image point is blurred by the imaging optics. The point spread function may be usefully approximated by a two-dimensional Gaussian G(r, σ) with spatial constant σ and radial distance r. The validity of using a Gaussian to describe the point spread function is discussed in the Appendix.

In most situations, the only unknown on the right-hand side of (1) is σ, the point spread function's spatial parameter. Thus, we can use (1) to solve for absolute distance given only that we can measure σ, i.e., the amount of blur at a particular image point.

Measurement of σ presents a problem, however, for the image data are the result of both the characteristics of the scene and those of the lens system. To disentangle these factors, we can either look for places in the image with known characteristics (e.g., sharp edges), or we can observe what happens when we change some aspect of the lens system. In the following discussion both of these general strategies for measurement of σ are described: the use of sharp edges, and comparison across different aperture settings. Both approaches require only one view of the scene.

A. Using Sharp Discontinuities

Image data are determined both by scene characteristics and the properties of the lens system; e.g., how fast image intensity changes depends upon both how scene radiance changes and the diameter of the blur circle. If we are to measure the blur circle, therefore, we must already know the scene's contribution to the image. At edges, which are sharp discontinuities in the image formation process, the rate of change we observe in the image is due primarily to the point spread function; because it is possible to recognize sharp discontinuities with some degree of confidence [4], [5], the image data surrounding them can be used to determine the focus. These observations lead to the following scheme for recovering the viewer-to-scene⁵ distance at points of discontinuity.

1) Mathematical Details: To calculate the spatial constant of the point spread function requires a measure of the rate at which image intensity is changing; the widespread use of zero-crossings of the Laplacian to find edges [6] suggests using the slope of the Laplacian across the zero-crossing as a measure of rate of change.

Consider a vertical step edge in the image of magnitude δ at position (x₀, y₀), as defined by

    I(x, y) = k + δ  if x ≥ x₀;
    I(x, y) = k      if x < x₀.

These intensity values are then convolved with the point spread function at that point in the image, which we model as a Gaussian with spatial constant σ.

In the sharp-discontinuity case, then, we define the values C(x, y) to be the Laplacian of the convolution of raw image intensities I(x, y) with a Gaussian point spread function with spatial constant σ (as in [6]). This yields the equations

    C(x, y) = ∇²(G(r, σ) ⊗ I(x, y))
            = ∫∫ ∇²G(√((x − u)² + (y − v)²), σ) I(u, v) du dv
            = δ (dG(x − x₀, σ)/dx)    (2)

where G(x − x₀, σ) is a one-dimensional Gaussian centered at the point x₀. For such an edge the slope of the function C(x, y) at the point of the zero-crossing is equal to the maximum rate of change in image intensity, and so we can use it to estimate σ.

An estimate of σ can be formed as follows:

    C(x, y) = δ (dG(x, σ)/dx) = −(δx / (√(2π) σ³)) exp(−x²/(2σ²))    (3)

where x, y, and δ are as before, and for convenience x₀ is taken to be zero.

⁴"Exact focus" is taken here to mean "has the minimum variance point spread function"; the phrase "measurement of focus" is taken to mean "characterize the point spread function."
⁵When the discontinuity is in depth, as at an occluding contour, the distance measured is to the nearer side of the discontinuity.

Taking the absolute value and then the natural log, we find

    ln |C(x, y)/x| = ln(δ / (√(2π) σ³)) − x²/(2σ²).    (4)

We can formulate (4) as

    Ax² + B = C

where

    A = −1/(2σ²),   B = ln(δ / (√(2π) σ³)),   C = ln |C(x, y)/x|.    (5)

Interpreting (5) as a linear regression in x², one can then obtain a maximum-likelihood estimate of the constants A and B, and thus obtain σ. The solution of this linear regression is

    A = Σᵢ (xᵢ² − x̄²) Cᵢ / Σᵢ (xᵢ² − x̄²)²,   B = C̄ − x̄²A    (6)

where x̄² is the mean of the xᵢ² and C̄ is the mean of the Cᵢ. From A in (6), one can obtain the following estimate of the value of the spatial constant σ:

    σ = (−2A)^(−1/2).

Having estimated σ, (1) can now be used to find the distance to the imaged point; note that there are two solutions, one corresponding to a point in front of the locus of exact focus, the other corresponding to a point behind it. This ambiguity is generally unimportant because we can arrange things so that the surface of exact focus is nearer to the sensor than any of the objects in the field of view.
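For concreteness, the regression of (4)-(6) and the conversion to distance via (1) can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation evaluated in Section III; the function names and the handling of units are assumptions, and in particular σ is obtained here in pixels and must be converted to the physical units of F and v₀ (via the calibration constant discussed in the Appendix) before (1) is applied.

import numpy as np

def sigma_from_edge(x, C):
    # x: signed pixel offsets from a zero-crossing of the Laplacian-of-Gaussian image.
    # C: the corresponding values C(x, y) of eq. (2), sampled across the edge.
    # Fit ln|C/x| = A*x^2 + B (eqs. (4)-(5)); then sigma = (-2A)^(-1/2) as in (6).
    x = np.asarray(x, dtype=float)
    C = np.asarray(C, dtype=float)
    keep = (x != 0) & (C != 0)              # ln|C/x| is undefined at the crossing itself
    t = x[keep] ** 2
    y = np.log(np.abs(C[keep] / x[keep]))
    A = np.sum((t - t.mean()) * (y - y.mean())) / np.sum((t - t.mean()) ** 2)
    if A >= 0:                              # a blurred edge must yield A = -1/(2 sigma^2) < 0
        raise ValueError("no consistent blur estimate at this edge")
    return np.sqrt(-1.0 / (2.0 * A))

def depth_from_sigma(sigma, F, v0, f_number):
    # Equation (1): D = F*v0 / (v0 - F - sigma*f); sigma, F, and v0 must share units.
    return F * v0 / (v0 - F - sigma * f_number)

As a purely hypothetical usage example, for an F = 50 mm, f/4 lens with v₀ = 51 mm, a measured blur of σ = 0.05 mm would give D = (50)(51)/(51 − 50 − 0.2) ≈ 3.2 m.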
B. Comparison Across Differing Apertures

The limiting factor in the previous method is the requirement that the scene characteristics must be known before focus can be measured; this restricts the applicability of the method to special points such as step discontinuities. If, however, we had two images of exactly the same scene, but with different depth of field, we could factor out the contribution of the scene to the two images (as the contribution is the same), and measure the focus directly.

Fig. 1 shows one method of taking a single view of the scene and producing two images that are identical except for aperture size and therefore depth of field. This lens system uses a half-silvered mirror (or comparable contrivance) to split the original image into two identical images, which are then directed through lens systems with different aperture size. Because change in aperture does not affect the position of image features, the result is two images that are identical except⁶ for their focal gradient (amount of depth of field), and so there is no difficulty in matching points in one image to points in the other. Fig. 1(b) and (c) shows a pair of such images. Alternatively, one could rig a video or CCD camera so that alternate frames employ a different aperture; as long as no significant motion occurs between frames the result will again be two images identical except for depth of field.

Fig. 1. Images identical except for depth of field. (a) Production: the light from a single view is split into two identical images and directed through two lens systems with different aperture size. Alternatively, one can vary the aperture between alternate frames from a standard video or CCD camera. In either case the two resulting images are identical except for depth of field, as shown in (b) and (c). These images are of a mirrored bottle on a checkered plane, redigitized from [14].

Because differing aperture size causes differing focal gradients, the same point will be focused differently in the two images; for our purposes the critical fact is that the magnitude of this difference is a simple function of the distance between the viewer and the imaged point. To obtain an estimate of depth, therefore, one need only compare corresponding points in the two images and measure this change in focus. Because the two images are identical except for aperture size they may be compared directly; i.e., there is no matching problem as there is with stereo or motion algorithms. Thus it is possible to recover the absolute distance D by simple point-by-point comparison of the two images, as described below.

1) Mathematical Details: We start by taking a patch f₁(r, θ) centered at (x₀, y₀) within the first image I₁(x, y):

    f₁(r, θ) = I₁(x₀ + r cos θ, y₀ + r sin θ)

and calculate its two-dimensional Fourier transform F₁(λ, θ). The same is done for a patch f₂(r, θ) at the corresponding point in the second image, yielding F₂(λ, θ). Again, note that there is no matching problem, as the images are identical except for depth of field.

⁶Their overall brightness might also differ.

Now consider the relation of f₁ to f₂. Both cover the same region in the image, so that if there were no blurring both would be equal to the same intensity function f₀(r, θ). However, because there is blurring (with spatial constants σ₁ and σ₂), the result is

    f₁(r, θ) = f₀(r, θ) ⊗ G(r, σ₁)
    f₂(r, θ) = f₀(r, θ) ⊗ G(r, σ₂).    (7)

(One point of caution is that (7) may be substantially in error in cases with a large amount of defocus, as points neighboring the patches f₁, f₂ will be "spread out" into the patches by differing amounts. This problem can be avoided by using patches whose edges trail off smoothly, e.g., f₁(r, θ) = I₁(x₀ + r cos θ, y₀ + r sin θ) G(r, ω) for an appropriate spatial parameter ω.)

First we note that

    f(r, θ) = exp(−πr²)  and  F(λ, θ) = exp(−πλ²)

are a Fourier transform pair, and that if f(r, θ) and F(λ, θ) are a Fourier pair then so, up to a constant factor, are the scaled functions f(r/a, θ) and F(aλ, θ).

Thus (7) may be used to derive the following relationship between F₁ and F₂ (the Fourier transforms of the image patches f₁ and f₂) and F₀ (the transform of the (hypothetical) unblurred image patch f₀):

    F₁(λ, θ) = F₀(λ, θ) G(λ, 1/(2πσ₁))
    F₂(λ, θ) = F₀(λ, θ) G(λ, 1/(2πσ₂)).    (8)

Thus,⁷

    F₁(λ)/F₂(λ) = G(λ, 1/(2πσ₁)) / G(λ, 1/(2πσ₂)) = exp(λ² 2π²(σ₂² − σ₁²))    (9)

where

    Fᵢ(λ) = ∫ Fᵢ(λ, θ) dθ.

Thus, given F₁(λ) and F₂(λ) we can find σ₁ and σ₂, as follows. Taking the natural log of (9) we obtain

    λ² 2π²(σ₂² − σ₁²) = ln F₁(λ) − ln F₂(λ).

We may formulate this as Aλ² = B, where

    A = 2π²(σ₂² − σ₁²)
    B = ln F₁(λ) − ln F₂(λ),

i.e., as a linear regression equation in λ².

If σ₁ = 0 (a pinhole camera) then we have A = 2π²σ₂² and we may use (1) to solve directly for depth, e.g.,

    D = Fv₀ / (v₀ − F − σ₂f)    (10)

where f is the f-number of the imaging system. This technique has been implemented and is evaluated in the following section. More generally, if we use three or more views rather than only two, we have

    Aᵢⱼ = 2π²(σᵢ² − σⱼ²)    (11)

and we may solve this system of equations for each of the σᵢ, and thus obtain depth by use of (10).

2) Checking the Answer: Overconstraint: When we have three or more views we obtain an estimate of D for each σᵢ. Thus our solution for depth is overconstrained; all of the estimates of D from each of the σᵢ must produce the same estimate of distance, otherwise the estimates must be in error. This can occur, for instance, when there is insufficient high-frequency information in the image patch to enable the change in focus to be calculated. The important point is that this overconstraint allows us to check our answer: if the equations disagree, then we know not to trust our answer. If, on the other hand, the equations agree, then we can know (to within measurement error) that our answer must be correct.

⁷Note that we need only consider the amplitude of the transforms in these calculations.
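A compact numerical realization of the pinhole variant of this comparison (σ₁ ≈ 0) is sketched below in Python. It is an illustrative sketch rather than the algorithm evaluated in Section III; the patch windowing, the least-squares fit through the origin, and the small guard constants are assumptions, and the recovered σ₂ is expressed in pixels, so converting it to absolute distance through (10) again requires the calibration discussed in the Appendix.

import numpy as np

def log_amplitude(patch):
    # Log magnitude spectrum of a square gray-level patch; a windowed, discrete
    # stand-in for the transforms F_i(lambda, theta) of Section II-B.
    n = patch.shape[0]
    win = np.outer(np.hanning(n), np.hanning(n))   # smooth fall-off, as advised after eq. (7)
    return np.log(np.abs(np.fft.fft2(patch * win)) + 1e-12)

def sigma_from_aperture_pair(patch_small_ap, patch_large_ap):
    # Treat the small-aperture patch as an approximately pinhole view (sigma1 ~ 0) and
    # fit ln F1 - ln F2 = 2*pi^2*sigma2^2 * lambda^2 (eq. (9)) over all 2-D frequencies.
    n = patch_small_ap.shape[0]
    fy, fx = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
    lam2 = (fx**2 + fy**2).ravel()                 # squared radial frequency, cycles/pixel
    diff = (log_amplitude(patch_small_ap) - log_amplitude(patch_large_ap)).ravel()
    use = lam2 > 0                                 # drop the DC term
    A = np.sum(diff[use] * lam2[use]) / np.sum(lam2[use]**2)   # LS slope through the origin
    return np.sqrt(max(A, 0.0) / (2.0 * np.pi**2)) # sigma2 in pixels

Because the two images are registered by construction, patches taken at the same coordinates in the pair of Fig. 1(b) and (c) could be passed to such a routine directly, with no matching step.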
C. Accuracy

Possibly the major question concerning the usefulness of focal gradient information is whether such information can be sufficiently accurate. There are two major issues to be addressed: first, can we estimate the variance σ of the point spread function with sufficient accuracy, and second, does this translate into a reasonable degree of accuracy in the estimation of depth.

Recent research aimed at estimating the point spread function has shown that it may be accurately recovered from unfamiliar images despite the presence of normal image noise [6], [7]. Further, it appears that humans can estimate the width of the point spread function to within a few percent [9], [10]. These findings, together with the results of estimating σ reported in the next section, show that accurate estimation of σ is practical given sufficient image resolution.

The second issue is whether the available accuracy at estimating σ translates into a reasonable accuracy in estimating depth. Fig. 2(a) shows the theoretical error curve for the human eye, assuming the accuracy at estimating σ measured in [5] (approximately 2 percent error). It can be seen that reasonable accuracy is available out to several meters. This curve should be compared to the accuracy curve for stereopsis, shown in Fig. 2(b), again assuming human parameters as measured for complex tasks, e.g., that disparities of one millimeter can be reliably detected at a distance of one meter. It can be seen that the accuracies are comparable over a substantial range of distances.
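As a rough, first-order check on the shape of the curve in Fig. 2(a) (a sketch added here for clarity, not a result reported in the paper), differentiating (1) with respect to σ gives

    ∂D/∂σ = Fv₀f / (v₀ − F − σf)² = fD² / (Fv₀),

so a given uncertainty Δσ in the blur estimate produces a depth error ΔD ≈ fD²Δσ/(Fv₀), which grows roughly as the square of the distance; this is consistent with the rapidly rising error curve of Fig. 2(a).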

Fig. 2. Accuracy at estimating distance (error in meters versus distance in meters), assuming human visual system parameters, using (a) focal gradient information, and (b) stereopsis.

Fig. 3. Subjective impression of depth versus the magnitude of the focal gradient for two conditions, near fixation and far fixation.

D. Human Perception

We have recently reported evidence demonstrating that people make use of the depth information contained in focal gradients [10]; interestingly, the importance of this optical gradient does not appear to have been reported previously. The hypothesis that the human visual system makes significant use of this cue to depth has been investigated in two experiments.

In the first experiment, pictures of naturalistic scenes with various magnitudes of focal gradient information⁸ were randomly presented to subjects, who were then asked to report their subjective impression of the three-dimensionality of these scenes. Two experimental conditions were investigated: one, where subjects were to fixate the nearest object in the scene (a sphere) while more distant objects were out of focus by varying amounts, and two, where the subjects were to fixate the farthest object in the scene while the nearest objects were out of focus by varying amounts.

Fig. 3 shows the results of this experiment; it can be seen that the subjective impression of three-dimensionality increases strongly with increasing magnitude of the focal gradient, as long as the farther objects are more defocused. When the nearer objects are more defocused, no increase in subjective impression of depth results. These results indicate that people interpret increasing defocus as increasing distance, perhaps because they more typically fixate on the nearest object in the field of view.

In the second experiment, subjects were shown a rightward-rotating wireframe (Necker) cube displayed in perspective on a CRT. Such a display may be perceived either as a rigid object rotating to the right, or (surprisingly) as a wobbling, nonrigid object rotating to the left. Normally subjects see the rigid interpretation most of the time, but when we introduced a focal gradient that favored the nonrigid interpretation, the nonrigid interpretation was seen almost as often as the rigid one!

An experiment demonstrating the importance of depth of field in human perception can be easily performed by the reader. First make a pinhole camera by poking a small, clean hole through a piece of stiff paper or metal. Imposition of a pinhole in the line of sight causes the depth of field to be very large, thus effectively removing this depth cue from the image. Close one eye and view the world through the pinhole, holding it as close as possible to the surface of your eye, and note your impression of depth (for those of you with glasses, things will look sharper if you are doing it correctly). Now quickly remove the pinhole and view the world normally (still using only one eye). The change in the sense of depth is remarkable; many observers report that the change is nearly comparable to the difference between monocular and binocular viewing, or the change which occurs when a stationary object begins to move.

The effect of the pinhole is not due to the change in the field of view, as can be demonstrated by comparing the percept obtained through the pinhole to the percept obtained through a viewing tube which occludes a similar portion of the scene. The effect is also not because the pinhole makes the eye's accommodative state (focal length) irrelevant,⁹ as accommodation is a very poor source of depth information in humans [11].

Finally, it is interesting to note that the human visual system provides the information needed by the aperture-comparison focal gradient technique in at least two ways.

⁸I.e., various amounts of depth of field were used. See [14] for examples of these stimuli.
⁹Further, accommodation affects the entire scene at once; thus removing the accommodation cue can only change one's impression of the average distance, not of the depth relations throughout the scene.

First, the depth of field for the red-green retinal cells is different from that for the blue retinal cells, because of one diopter¹⁰ of chromatic aberration in the lens. This provides two simultaneous views of the scene with dissimilar depth of field, albeit in different spectral bands. Second, the focal length of the human eye is constantly varying in a sinusoidal fashion at a frequency of about 2 Hz [9]. The range of variation depends upon the average accommodation [9], but can be almost one diopter under some conditions. Thus, two views of the same scene with substantially different depth of field are obtained within two hundred and fifty milliseconds, approximately the duration of a typical fixation.

III. EVALUATION

A. Using Sharp Edges

The first method of deriving depth from the focal gradient, by measuring apparent blur near sharp discontinuities, was implemented in a straightforward manner and evaluated on the image shown in Fig. 4. In this image the optical system had a smaller depth of field than is currently typical in vision research; this was done because the algorithm requires that the digitization adequately resolve the point spread function.

Fig. 4 also shows the depth estimates which were obtained when the algorithm was applied to this image. Part (a) of this figure shows all the sharp discontinuities identified [3]. It was found that there was considerable variability in the depth estimates obtained along these contours, perhaps resulting from the substantial noise (3 of 8 bits) which was present in the digitized image values. To minimize this variability the zero-crossing contours were segmented at points of high curvature, and the depth values were averaged within the zero-crossing segments. Fig. 4(b), (c), and (d) shows the zero-crossing segments that have large, medium, and small depth values, respectively. It can be seen that the image is properly segmented with respect to depth, with the exception of one small segment near the top of (c). This example demonstrates that this depth estimation technique, which requires little computation beyond the calculation of zero-crossings, can be employed to order sharp edges by their depth values.

Fig. 4. An indoor image of a sand castle, refrigerator, and door, together with depth estimates for its zero-crossing segments. (a) All the sharp discontinuities found. (b), (c), and (d) show the zero-crossing segments that have large, medium, and small depth values, respectively. It can be seen that the image is properly segmented with respect to depth, with the exception of one small segment near the top of (c).

B. Comparison of Different Apertures

The second technique, comparing two images identical except for aperture, can be implemented in many different ways. We will describe a very simple algorithm that is amenable to an inexpensive, real-time implementation.

In this algorithm two images are acquired as shown in Fig. 1(a); they are identical except for their depth of field and thus the amount of focal gradient present, as shown in Fig. 1(b) and (c). These images (which were redigitized from [14]) are then convolved with a small Laplacian filter, providing an estimate of their local high-frequency content. The outputs of the Laplacian filter are then summed over a small area and normalized by dividing them by the mean local image brightness, obtained by convolving the original images with a Gaussian filter. It appears that a region of 4 × 4 pixels is sufficient to obtain stable estimates of high-frequency content. Fig. 5(a) and (b) shows the normalized high-frequency content of Fig. 1(b) and (c).

Finally, the estimated high-frequency content of the blurry, large-aperture image is divided by that of the sharp, small-aperture image, i.e., each point of Fig. 5(a) is divided by the corresponding point in Fig. 5(b). This produces a "focal disparity" map, analogous to a stereo disparity map, that measures the change in focus between the two images and whose values are monotonically related to depth by (1). Fig. 5(c) shows the disparity map produced from Fig. 1(b) and (c); intensity in this figure is proportional to depth. Areas of Fig. 5(c) that are black have insufficient high-frequency energy in the sharp-focus image to make an estimate of depth.

It can be seen that this disparity map is fairly accurate.

¹⁰Diopters are the reciprocal of focal length (i.e., D = 1/F). One diopter is approximately the strength of one's first pair of glasses.
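The pipeline just described can be sketched in a few lines of Python using standard convolution routines; this is an illustrative reconstruction rather than the original implementation, and taking the magnitude of the Laplacian before the local sum, the Gaussian width, the energy threshold, and the small guard constants are assumptions.

import numpy as np
from scipy import ndimage

def focal_disparity(img_sharp, img_blurry, window=4, brightness_sigma=4.0):
    # Sketch of the Section III-B pipeline: Laplacian -> local sum -> brightness
    # normalization -> point-by-point ratio of the two normalized images.
    def high_freq(img):
        lap = np.abs(ndimage.laplace(img.astype(float)))       # local high-frequency content
        local = ndimage.uniform_filter(lap, size=window)        # pooled (here, a mean) over ~4x4 pixels
        mean_brightness = ndimage.gaussian_filter(img.astype(float), brightness_sigma)
        return local / (mean_brightness + 1e-6)                 # normalize by local brightness
    e_sharp = high_freq(img_sharp)
    e_blurry = high_freq(img_blurry)
    disparity = e_blurry / (e_sharp + 1e-6)                     # "focal disparity", monotonic in depth
    disparity[e_sharp < 1e-3] = 0.0                             # too little energy to estimate (black in Fig. 5(c))
    return disparity

Per pixel this amounts to the convolutions and three divides counted in Section III-C below.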

Fig. 5. (a) and (b) show the normalized high-frequency content of Fig. 1(b) and (c), respectively. (c) shows the focal disparity map (analogous to a stereo disparity map) obtained by comparing (a) and (b); brightness is proportional to depth.

Note that points reflected in the bottle are estimated as being further away than points along the edge of the bottle; this is not a mistake, because for these points the distance traveled by the light is further than for those along the edge of the bottle. This algorithm, in common with stereo and motion algorithms, does not "know" about mirrored surfaces.

C. A Real-Time Implementation?

It is worth pointing out that it appears that depth estimates could be obtained throughout a 256 × 256 image in less than a half-second, using only standard hardware. A minimum of one convolution per image is required for this technique, together with a left shift and four subtractions for the Laplacian, and three divides (or table look-ups) for the normalization and comparison. If special convolution hardware is available, one can use two convolutions per image, one Laplacian and one Gaussian, leaving only three divides for the normalization and comparison. Frame buffers that can convolve image data in parallel with image acquisition are now available at a reasonable price, leaving as few as three operations per pixel to calculate the disparity map.

IV. DISCUSSION

The most striking aspect of these two algorithms is that absolute depth can be recovered from a single view with no image-to-image matching problem, perhaps the major source of error in stereo and motion algorithms. Furthermore, no special scene characteristics need be assumed, so that the techniques are generally applicable. The second most striking aspect of the algorithms is their simplicity: it appears that a real-time implementation could be accomplished relatively cheaply.

Measurement of the focal gradients associated with limited depth of field appears to be capable of producing depth estimates that are comparable to edge- or feature-based stereo and motion algorithms. The mathematics of the aperture-comparison technique shows it to be potentially more reliable than stereo or motion, since there is no correspondence problem and one can obtain an internal check on the answer, although (as discussed above) it has somewhat less accuracy. In practice this accuracy may be further limited by constraints on resolution.

The sharp-edge algorithm appears to have potential for useful depth-plane segmentation, although it is probably not accurate enough to produce a depth map. I believe that this algorithm will be of some interest because most of the work, finding and measuring the slope of zero-crossings, is already being done for other purposes. Thus this type of depth-plane segmentation can be done almost as a side effect of edge finding or other operations. (Note: after initial airing of these results [11], [12], P. Grossman of GEC Research, Wembley, Middx., England, has reported achieving an accuracy of ±1.25 centimeters, using a simple version of this technique together with a standard lens system. See [13].)

The aperture-comparison algorithm provides considerably stronger information about the scene because it overconstrains scene depth, allowing an internal check on the algorithm's answer. Thus it provides depth information with a reliability comparable to the best that is theoretically available from three-or-more-image stereo and motion algorithms, although with less depth resolution. The major limitation in measuring focal gradient depth information in this manner appears to be ensuring sufficient high-frequency information to measure the change between images; this requires having both adequate image resolution and high-frequency scene content.

A. Summary

In summary, we have described a new source of depth information, the focal gradient, that can provide depth information roughly comparable to stereo disparity or motion parallax, while avoiding the image-to-image matching problems that have plagued stereo and motion algorithms. We have proven that the limited depth of field inherent in most optical systems can be used to make reliable depth maps of useful accuracy with relatively minimal computation, and have successfully demonstrated this technique on realistic imagery.

APPENDIX

For a thin lens,

    1/u + 1/v = 1/F    (12)

where u is the distance between a point in the scene and the lens, v the distance between the lens and the plane on which the image is in perfect focus, and F the focal length of the lens. Thus,

    u = Fv / (v − F).    (13)

Fig. 6. Geometry of imaging. v₀ is the distance between the image plane and the lens, u₀ is the distance between the lens and the locus of perfect focus, and r is the radius of the lens. When a point at distance u > u₀ is projected through the lens, it focuses at a distance v < v₀, so that a blur circle is formed.

For a particular lens, F is a constant. If we then fix the distance v between the lens and the image plane to the value v = v₀, we have also determined a locus of points at distance u = u₀ that will be in perfect focus, i.e.,

    u₀ = Fv₀ / (v₀ − F).    (14)

We may now explore what happens when a point at a distance u > u₀ is imaged. Fig. 6 shows the situation in which a lens of radius r is used to project a point at distance u onto an image plane at distance v₀ behind the lens. Given this configuration, the point would be focused at distance v behind the lens, but in front of the image plane. Thus, a blur circle is formed on the image plane. Note that a point at distance u < u₀ also forms a blur circle; throughout this paper we assume that the lens system is focused on the nearest point so that u is always greater than u₀. This restriction is not necessary in the second algorithm, as overconstraint on the distance solution allows determination of whether D = u > u₀ or D = u < u₀.

From the geometry of Fig. 6 we see that

    tan θ = r/v = σ/(v₀ − v).    (15)

Combining (13) and (15) and substituting the distance D for the variable u, we obtain

    D = Frv₀ / (rv₀ − F(r + σ))

or

    D = Fv₀ / (v₀ − F − σf)

where f is the f-number of the lens.
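The consistency of (13)-(15) with (1) is easy to check numerically; the Python fragment below uses illustrative values of our own choosing (not parameters from the paper) and the f-number convention f = F/r implied by the step from (15) to (1) above.

# Numerical check of the Appendix geometry (illustrative values only).
F  = 0.050                         # focal length, meters
r  = 0.010                         # lens radius, meters
v0 = 0.052                         # lens-to-image-plane distance, meters
u0 = F * v0 / (v0 - F)             # eq. (14): distance of the plane of exact focus (1.3 m here)
u  = 3.0                           # a scene point farther away than u0

v = F * u / (u - F)                # thin lens, eq. (13) solved for v
sigma = r * (v0 - v) / v           # similar triangles, eq. (15): blur-circle radius
f = F / r                          # f-number convention used in the derivation above
D = F * v0 / (v0 - F - sigma * f)  # eq. (1)
print(u, D)                        # the two distances agree to rounding error

That the printed distances agree simply reflects that (1) is the thin-lens relation rewritten in terms of the blur-circle radius.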
The blurring of the image is better described by the point spread function than by a blur circle, although the blurring is bounded by the blur circle radius in the sense that the point spread function is less than some threshold outside of the blur circle. The point spread function is due primarily to diffraction effects, which for any particular wavelength produce wave cancellation and reinforcement, resulting in intensity patterns qualitatively similar to the sinc function, sin r / r, but with different amplitudes and periods for the "rings" around the central peak.

The point spread function describes the image intensity I(μ, ν) caused by a single coherent point-source light in terms of the parameters of the lens system. It is described [3] by

    I(μ, ν) = (kμ/(2F²))² (U₁²(μ, ν) + U₂²(μ, ν))

where

    Uₘ(μ, ν) = Σ_{s=0..∞} (−1)ˢ (μ/ν)^(m+2s) J_(m+2s)(ν),
    μ = (k/f²)(v − v₀),   ν = (k/f) r,   k = 2π/λ,

where λ is the wavelength of the light, r is the distance from the center of the point spread function, Jₙ(ν) is the Bessel function of the first kind and order n, and v₀, v, f, and F are as before.

The "rings" produced by this function vary in amplitude, width, and position with different states of focus and with different wavelengths. As wavelength varies these rings change position by as much as 90 degrees, so that the blue-light troughs become positioned over the red-light peaks, etc. Further, change in wavelength results in substantial changes in the amplitude of the various rings. Although this point spread function is quite complex, and the sum over different wavelengths even more so, our analysis shows that for white light the sum of the various functions obtained at different wavelengths has the general shape of a two-dimensional Gaussian.

Sampling effects caused by digitization are typically next in importance after the diffraction effects. The effect of sampling may be accounted for in the point spread function by convolving the above diffraction-produced point spread function with functions of the form sin r/r.

Other factors, such as chromatic aberration, movement, and diffusion of photographic emulsion, may also be accounted for in the final point spread function by additional convolutions.

The net effect, in light of the central limit theorem and our analysis of the sum of single-wavelength focus patterns, is almost certainly best described by a two-dimensional Gaussian G(r, σ) with spatial constant σ. The spatial constant σ of the point spread function will be proportional to the radius of the blur circle; however, the constant of proportionality will depend on the particulars of the optics, sampling, etc. In this paper the radius of the blur circle and the spatial constant of the point spread function have been treated as identical; in practical application where recovery of absolute distance is desired, the constant of proportionality k must be determined for the system and included in (1) as follows:

    D = Fv₀ / (v₀ − F − σkf).

REFERENCES

[1] R. A. Jarvis, "A perspective on range-finding techniques for computer vision," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-5, pp. 122-139, Mar. 1983.
[2] H. Crane, "A theoretical analysis of the visual accommodation system in humans," NASA Ames Research Center, Final Rep. NAS 2-2760, 1966.
[3] M. Born and E. Wolf, Principles of Optics. London: Pergamon, 1965.
[4] A. Pentland, "The visual inference of shape: Computation from local features," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, 1982.
[5] A. Witkin, "Intensity-based edge classification," in Proc. Amer. Ass. Artificial Intell., Pittsburgh, PA, Aug. 1982.
[6] E. Hildreth, "Implementation of a theory of edge detection," M.I.T. AI Lab. Tech. Rep. 579, Apr. 1980.
[7] K. T. Knox and B. J. Thomson, "Recovery of images from atmospherically degraded short-exposure photographs," Astrophys. J., vol. 193, pp. L45-L48, 1974.
[8] J. B. Morton and H. C. Andrews, "A posteriori method of image restoration," J. Opt. Soc. Amer., vol. 69, no. 2, pp. 280-290, 1979.
[9] A. Pentland, "Uniform extrafoveal sensitivity to pattern differences," J. Opt. Soc. Amer., Nov. 1978.
[10] -, The Focal Gradient: Optics Ecologically Salient (Supplement to Investigative Ophthalmology and Visual Science), Apr. 1985.
[11] -, "Depth of scene from depth of field," in Proc. Image Understanding Workshop, Palo Alto, CA, Sept. 1982.
[12] -, "A new sense for depth of field," presented at the Int. Joint Conf. Artificial Intell., Los Angeles, CA, Aug. 1985.
[13] P. Grossman, "Depth from focus," in Proc. Alvey Committee Meeting on Machine Vision, Univ. Sussex, Brighton, England, Sept. 1985.
[14] M. Potmesil and I. Chakravarty, "Synthetic image generation with a lens and aperture camera model," ACM Trans. Graphics, vol. 1, no. 2, pp. 85-108, Apr. 1982.

Alex Paul Pentland received the Ph.D. degree from the Massachusetts Institute of Technology, Cambridge, in 1982.
He then joined SRI International's Artificial Intelligence Center, Menlo Park, CA. He has taught in both the Departments of Computer Science and Psychology at Stanford University, Stanford, CA, and is currently Program Manager for the Visual Communication project at Stanford's Center for the Study of Language and Information. He has done research in artificial intelligence, machine vision, human vision, and graphics. He recently finished a book entitled From Pixels to Predicates, published by Ablex Publishers, Norwood, NJ.
In 1984 Dr. Pentland won the Best Paper prize from the American Association for Artificial Intelligence for his work using fractal functions to model complex natural scenes.
