
3D Scene Sensing

COMP.SGN-320
3D and Virtual Reality
Autumn 2022
Introduction
• Previous lectures were about how the HVS
perceives visual scenes in 3D and how
such scenes are formally represented
• This one is about how to acquire the
content to be represented
• Computer graphics content is typically
modelled, not sensed (TBD in the next
lecture)

8.11.2022 2
Introduction

[Diagram: real content is acquired by Sensing, synthetic content by Modeling; both lead to a Representation, followed by Coding, Rendering and Display]

11/8/22 3
Depth sensing: Passive
• Passive Range Sensors
– Do not require any direct control or contact
over the observation to acquire depth data
– Rely only on sources of indirect natural
occurrence such as ambient light
– Various approaches: Depth from focus, Depth
from Shape, Depth from motion, etc.
Depth Capture: Active
• Use properties of some controlled sensing
signal emitted by the sensing device to estimate
the properties of observed objects
• Distance, shape, pose, direction of motion, etc.
• Active Triangulation, Interferometry, Structured
Light, Time-of-Flight
Capture models

Pinhole, lens distortion, thin lens

8.11.2022 6
Pinhole camera model (again)
• Ideal model based on the concept of the
pinhole camera (camera obscura)
• No lens
• No defocus blur, images are perfectly
sharp
• Equivalent to a camera with a small
aperture (large f-number)

8.11.2022 7
Pinhole model equation (again)
[Figure: pinhole geometry — optical center, optical axis, principal point, focal length f, image plane with point I(u,v), scene point P(x,y,z)]

• Each point $P(x, y, z)$ in the scene is mapped to a
2D point $I(u, v)$ on the image plane
• Using similar triangles, $(u, v) = \left(f\,\dfrac{x}{z},\ f\,\dfrac{y}{z}\right)$
11/8/22 8
Pinhole model equation (again)
• $(u, v) = \left(f_x\,\dfrac{x}{z} + c_x,\ \ f_y\,\dfrac{y}{z} + c_y\right)$
• Accounts for
– Different pixel size in (x,y) directions
– Image plane intersection point $(c_x, c_y)$
• Note the coordinate space
• Pinhole model: origin in the middle of the image (plane)
• Image indexing: origin in one of the corners
– Matlab: top left
– OpenGL textures: bottom left

11/8/22 9
Pinhole model as a matrix

$$\begin{bmatrix} uz \\ vz \\ z \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \qquad \frac{1}{z}\begin{bmatrix} uz \\ vz \\ z \end{bmatrix} = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

11/8/22 10
Camera intrinsics
• The internal parameters of the camera are referred to as
camera intrinsics
• The matrix is often referred to as K
• K moves points from global 𝑥, 𝑦, 𝑧 -space to the
camera coordinate system
• Division by z (in some sources, k) projects the
points to (u, v) coordinates on the image plane

11/8/22 11
Camera pose in space

[Figure: the camera's optical center at O(xc, yc, zc) in the global frame, with optical axis, principal point, image plane I(u,v) and scene point P(x,y,z)]

• The virtual camera is located at some coordinates in the global space, facing some direction
11/8/22 12
Camera pose in space
[Figure: the same pinhole geometry with the camera at the origin]

• The perspective projection equation works only if the camera is at the origin (0, 0, 0)

11/8/22 13
Camera pose in space

[Figure: the camera rotated and translated in the global frame; optical center, optical axis, principal point, image plane I(u,v), scene point P(x,y,z)]

• Solution: move everything else in the scene by the opposite translation and rotation
• Functionally equivalent: it is the relative pose that matters

11/8/22 14
Camera projection matrix
• Combines both camera intrinsics and extrinsics
• Describes completely how a point with position (𝑥, 𝑦, 𝑧) in the world
coordinate system is captured by a camera setup
$P = K\,\left[\,R_{3\times3}\ |\ T_{3\times1}\,\right]$
• The pixel position $(u, v)$ of world point $(x, y, z)$ is found in homogeneous coordinate notation:
$$\begin{bmatrix} u' \\ v' \\ k \end{bmatrix} = P \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}, \qquad (u, v) = \left(u'/k,\ v'/k\right)$$
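As a concrete illustration (not part of the original slides), a minimal NumPy sketch of this projection; all numeric values below are made-up examples, not calibration data from the lecture.

```python
import numpy as np

# Example intrinsics K and extrinsics [R | T] (made-up values for illustration)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                        # rotation from world to camera frame
T = np.array([[0.1], [0.0], [0.0]])  # translation, 3x1

P = K @ np.hstack([R, T])            # 3x4 camera projection matrix

X_world = np.array([0.5, -0.2, 4.0, 1.0])  # homogeneous world point (x, y, z, 1)
u_p, v_p, k = P @ X_world                  # homogeneous pixel coordinates (u', v', k)
u, v = u_p / k, v_p / k                    # divide by k to get the pixel position
print(u, v)
```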

11/8/22 17
Multi-camera configurations
• The camera projection matrices can be used to project
data between multiple cameras
– “What does the scene shot from camera A look like from
camera position B?”
– E.g. range sensor data to a color camera
• Need to know
– Intrinsic parameters of all cameras
– Extrinsic parameters between all cameras (relative to each
other)
– Distance of the points from the camera(!)

11/8/22 18
Lens distortion models
• Pinhole model does not account for any lens effects
• Even with small aperture, lenses still affect the direction
of incoming light rays
– Not because they are supposed to, but because optics are not
ideal
• Can be considered
– Explicitly effect by effect (physics)
– Implicitly by the resulting net effect (image processing)

8.11.2022 19
Radial distortion
Barrel distortion Pincushion distortion

$x_{\text{distorted}} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$

$y_{\text{distorted}} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$

8.11.2022 20
Tangential distortion
• Caused by misalignment of the lens relative to the sensor

$x_{\text{distorted}} = x + \left[\,2 p_1 x y + p_2 (r^2 + 2x^2)\,\right]$

$y_{\text{distorted}} = y + \left[\,p_1 (r^2 + 2y^2) + 2 p_2 x y\,\right]$
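A small sketch applying the radial and tangential models above to normalized image coordinates; the coefficient values in the example call are arbitrary, not from the slides.

```python
import numpy as np

def distort(x, y, k1, k2, k3, p1, p2):
    """Apply radial + tangential distortion to normalized coordinates (x, y)."""
    r2 = x**2 + y**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
    y_d = y * radial + p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
    return x_d, y_d

# Example: a point near the image corner with mild barrel distortion (made-up values)
print(distort(0.4, 0.3, k1=-0.2, k2=0.05, k3=0.0, p1=0.001, p2=-0.0005))
```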

8.11.2022 21
Camera calibration
• Extraction of camera parameters
• Parameters correspond to the pinhole model
• Intrinsic parameters
– How the camera forms the image
– Camera matrix; lens distortions
• Extrinsic parameters (relative to something)
– Position
– Rotation

08/11/2022 3D Media Technology


Calibration
• Self-calibration
– Rely on the captured images only
– Fundamental matrix reconstruction (contains information
about both the camera intrinsic parameters and the
relative motion between frames)
• Manual calibration
– Using images of special calibration patterns to identify
the camera parameters
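The slides that follow use the Caltech/Bouguet MATLAB toolbox; as a rough sketch of the same pattern-based workflow, here is an OpenCV equivalent. The checkerboard size, square size and image location are assumptions for illustration only.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)    # inner corners of the checkerboard (assumed)
square = 0.025      # square size in metres (assumed)

# 3D coordinates of the corners in the checkerboard's own coordinate frame
obj = np.zeros((pattern[0] * pattern[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for fname in glob.glob("calib_images/*.jpg"):      # assumed image location
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:                                      # corner extraction
        obj_points.append(obj)
        img_points.append(corners)

# Model fit: intrinsics K, distortion coefficients, and per-image extrinsics
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
print("K =\n", K, "\ndistortion =", dist.ravel())
```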

08/11/2022 3D Media Technology


Input data

https://www.vision.caltech.edu/bouguetj/calib_doc/htmls/example.html

8.11.2022 24
Corner extraction

8.11.2022 25
Corner extraction

8.11.2022 26
Model fit

8.11.2022 27
Intrinsic calibration

8.11.2022 28
Calibration quality

8.11.2022 29
Calibration quality (corrected)

8.11.2022 30
Extrinsic (single camera)

8.11.2022 31
Distortion models

8.11.2022 32
Stereo calibration

8.11.2022 33
Extrinsic stereo parameters

8.11.2022 34
Thin lens model
• Real cameras have lenses

• Unlike a pinhole, a lens system doesn’t image objects
sharply at every distance
– Technically, only one distance is imaged perfectly sharp
– Practically, if the blur size < pixel size, things appear sharp

https://en.wikipedia.org/wiki/File:Lens3.svg

8.11.2022 35
Thin lens model
• Objects at different distances create blurs of
different sizes

https://en.wikipedia.org/wiki/File:DefocusBlur.png

8.11.2022 36
Sensing methods

Passive - Active

8.11.2022 37
Triangulation

• If an observer looks at some point of interest from
several locations and measures the relative angles and
the displacement between the observations, then the
distance to the point can be calculated by simple
geometric triangulation
Stereo
• Most common passive depth sensing method
• Mimics the way a human observer (partially) perceives
depth, i.e. binocular disparity estimation
• Finding correspondences between images taken
from different perspectives
• Main question: what is a correspondence?

8.11.2022 39
Stereo matching (sparse)
• Sometimes it is enough to find depth information
which doesn’t cover the whole scene
– Object detection
– Self-calibration
– Depth range estimation
• Certain things in images are easier to detect and
match
• Focusing on them is fast and robust
=> Feature matching

8.11.2022 40
Feature matching
• Invariant features
– Gradient-based, e.g. corners
– Scale-space based, e.g. SIFT (DoG),
Laplacians, multiscale differentiators

8.11.2022 41
Feature matching

8.11.2022 42
Interest point detection
• Define a measure of how interesting a point (and
its neighborhood) is
• For example:
– Change of gradient direction along an edge contour (Kitchen-
Rosenfeld)
– Change of gradient direction along an edge contour over a
bicubic polynomial surface (Zuniga-Haralick)
– Magnitude of change of gradient (Harris-Stephens)
– Response to elementary corner models (Trajkovic-Hedley)
• Threshold the values to pick interesting points

8.11.2022 43
Harris-Stephens corners
• Detect patches where the intensity changes significantly
for a shift in any direction over the image patch I
• First-order Taylor approximation of the shifted patch

8.11.2022 44
Harris-Stephens corners

• If C (the local autocorrelation matrix) has two large enough
eigenvalues, any choice of shift direction gives
a high score m
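A minimal NumPy/SciPy sketch of the Harris–Stephens response; the smoothing scale and the constant k below are typical textbook values, not parameters from the slides.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.5, k=0.04):
    """Harris-Stephens corner response R = det(C) - k * trace(C)^2 per pixel."""
    Iy, Ix = np.gradient(img.astype(float))
    # Elements of the local autocorrelation matrix C, smoothed over a window
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy**2
    trace = Sxx + Syy
    return det - k * trace**2   # large positive values -> both eigenvalues large

# Corners are then picked by thresholding (and non-maximum suppression):
# corners = harris_response(image) > threshold
```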

8.11.2022 45
Harris-Stephens corners

8.11.2022 46
Feature computation
• Computing a description of the local neighborhood
• E.g. describing the intensity distribution around the point
or aggregate of filter responses etc.
• Depending on the feature, it may (try to) be invariant to
– Scale
– Rotation
– Perspective
– Exposure
• Similarity between points is determined based on
similarity of the descriptors

8.11.2022 47
Rotational invariance

8.11.2022 48
Rotational invariance
• E.g. rotate image patches to align the gradient direction
with a common axis

• Use descriptors based on local statistics

8.11.2022 49
Scale invariance

8.11.2022 50
Scale invariance
• Searching for features in multiple scales

8.11.2022 51
Affine invariance

8.11.2022 52
Camera self-calibration
• Rely on the captured images only
• Fundamental matrix reconstruction

08/11/2022 3D Media Technology


Fundamental matrix
• The fundamental matrix is a shortcut between
the images of two cameras without knowing
everything about the parameters
• Contains both information about camera
intrinsic parameters and relative motion between
cameras / frames

8.11.2022 54
The epipolar geometry

[Figure: cameras C and C’ viewing scene point X, with image points x and x’, and epipoles e and e’]

C, C’, x, x’ and X are coplanar

The points e and e’ are called epipoles


8.11.2022 55
The epipolar geometry

All points on the epipolar plane π project onto l and l’

8.11.2022 56
The epipolar geometry

epipoles e, e’ = intersection of the baseline with the image plane = projection of the other camera's projection center

an epipolar plane = plane containing the baseline (a 1-D family)

an epipolar line = intersection of an epipolar plane with the image (epipolar lines always come in corresponding pairs)

8.11.2022 57
Fundamental matrix
• The fundamental matrix satisfies the condition
that for any corresponding points $\mathbf{u}_A$ and $\mathbf{u}_B$ in
two images A and B

$\mathbf{u}_B^{\top} F\, \mathbf{u}_A = 0$

$\begin{bmatrix} u_B & v_B & 1 \end{bmatrix} F \begin{bmatrix} u_A \\ v_A \\ 1 \end{bmatrix} = 0$
8.11.2022 58
Examples of epipolar lines

8.11.2022 59
Examples of epipolar lines

For parallel camera images there is no projection of the optical centre onto the other camera's image. In this case, the epipoles are at infinity.
8.11.2022 60
Fundamental matrix
• Algebraic representation of epipolar geometry
• For a given imaging geometry, a point, x, on one image
corresponds to a line, l’, on the other image

• Geometric derivation: the point x maps to the epipolar line $\mathbf{l}' = F\mathbf{x}$ on the other image

• Mapping from a 2-D image to a 1-D family of lines
(F has rank 2)

8.11.2022 61
Finding F
• If you need to find the fundamental matrix, you
likely don’t know much of the cameras

$\begin{bmatrix} u_B & v_B & 1 \end{bmatrix} F \begin{bmatrix} u_A \\ v_A \\ 1 \end{bmatrix} = 0$

• Let’s start with point correspondences


8.11.2022 62
Finding F
• Feature matching can give us (tentative)
corresponding points

8.11.2022 63
Finding F
• We need 7 correspondences (theoretically)
– F has 9 entries, but the rank-2 constraint and the irrelevant overall scale remove two degrees of freedom

• Those should, however, be correct, but how to guarantee that?

8.11.2022 64
Random Sample Consensus
(RANSAC)
1. Obtain list of tentative matches
2. Select a few
3. Recover the fundamental matrix
4. See if a significant number of tentative matches
agree
5. If not, goto 2.
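In practice this loop is readily available; a sketch using OpenCV's RANSAC-based estimator (the matched point arrays pts_a, pts_b are assumed to come from the feature-matching step, and the threshold/confidence values are typical defaults):

```python
import cv2
import numpy as np

def estimate_F(pts_a, pts_b):
    """pts_a, pts_b: Nx2 arrays of tentative correspondences from feature matching."""
    pts_a = np.float32(pts_a)
    pts_b = np.float32(pts_b)
    # RANSAC: repeatedly fit F from small samples, keep the largest consensus set
    F, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC,
                                     ransacReprojThreshold=1.0, confidence=0.99)
    inliers = mask.ravel() == 1
    return F, pts_a[inliers], pts_b[inliers]
```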

8.11.2022 65
Essential matrix
• If you know the cameras, but not their pose
• Applying the inverse of the camera matrix K to the image
coordinates normalizes them; the essential matrix
$E = K_B^{\top} F K_A$ plays the role of F for normalized coordinates

8.11.2022 66
Essential matrix
• Unlike F, can be decomposed to R and t (kinda)
• Singular Value Decomposition of E leads to 4 possible
solutions (after skipping some matrix magic)
• Only one of them produces all points in front of both
cameras
• Even then, scale is not recovered
– We know in which direction the camera is, but not how far
– Can’t be solved without some extra information

8.11.2022 67
Stereo matching (dense)

• The scene is captured from two viewpoints (a left and a right view)
• The distance of objects is shown as their horizontal displacement (disparity)

[Figure: left and right views of the same scene; corresponding points are shifted horizontally by the disparity]
8.11.2022 68
Stereo camera pose
• To do efficient stereo, certain requirements are in place
– Relative camera pose should be $R = I_{3\times3}$, $\mathbf{t} = \begin{bmatrix} b & 0 & 0 \end{bmatrix}^{\top}$
– I.e. the optical axes are parallel (in 3D space)
– Camera locations differ only in the horizontal direction
– Horizontal pixel rows are parallel

• No matter how well cameras are physically positioned,


manufacturing tolerances alone prevent them from being
ideally oriented

8.11.2022 69
Stereo rectification
• Usually before doing any correspondence search, stereo
pairs are rectified
– Not an absolute requirement, but makes the task easier to
implement
• Rectification is the process of post processing already
captured data to be as if it had been captured with an
ideal setup
• Necessary input: intrinsic and extrinsic calibration data
• Since we don’t have depth information, only 2D images,
what can be done?

8.11.2022 70
Stereo rectification
[Figure: left and right cameras with rotations R and translations tx, ty, tz; rotating the virtual cameras around their optical centers aligns them]

• Can’t change the camera location (due to parallax)
• Rotation around the optical center is possible
• Reordering (resampling) of the captured rays
• Rotation does reveal areas which weren’t captured

8.11.2022 71
Stereo triangulation
• The relation between the displacement (disparity) and the depth is
$z = \dfrac{b \cdot f}{d}, \qquad d = x_L^{*} - x_R^{*}$

[Figure: point P = (x, z) imaged as $P_L' = (x_L, f)$ and $P_R' = (x_R, f)$ by two cameras with optical centers at (0, 0) and (b, 0), focal length f and baseline b]
8.11.2022 72
Stereo matching (dense)
• Practical stereo pipeline includes
– Matching cost computation
– Cost aggregation
– Disparity computation
– Disparity refinement

Rectified stereo pair → [Matching cost computation] → cost volume → [Cost aggregation] → cost volume → [Disparity computation] → disparity map → [Disparity refinement] → disparity map

8.11.2022 73
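To make the four stages concrete, here is a minimal NumPy/SciPy sketch: absolute-difference matching cost, box-filter aggregation and winner-takes-all disparity selection. The window size and disparity range are arbitrary example values, and refinement is omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def block_matching(left, right, max_disp=64, win=9):
    """left, right: rectified grayscale images (float arrays of equal size)."""
    h, w = left.shape
    cost = np.full((h, w, max_disp), np.inf)
    for d in range(max_disp):
        # Matching cost computation: C(x, y, d) = |L(x, y) - R(x - d, y)|
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # Cost aggregation: average the cost over a local window
        cost[:, d:, d] = uniform_filter(diff, size=win)
    # Disparity computation: winner-takes-all along the disparity axis
    return np.argmin(cost, axis=2)

# depth = baseline * focal_length / disparity   (disparity refinement not shown)
```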
Global methods
• Based on e.g. Hidden Markov models
• Minimize the cost volume function together with an
additional smoothness term
• $C_{\text{smooth}}$ defines a smoothness penalty, e.g. with
respect to the 4-connected neighborhood

[Figure: cost volume with axes x, y, d]
Matching cost computation
• In order to determine which points are the same,
a similarity measure (matching cost) must be
defined
• Several variants exist
– L1, L2
– Normalized cross correlation
– Census
– …

8.11.2022 75
Dense stereo - Cost volume
• 3D cost volume matrix, specified as the per-pixel
differences of the left and right channels
$C_{\text{cost}}(x, y, d) = L(x, y) - R(x - d, y)$
• Dissimilarity metrics, such as SSD or SAD
• More comprehensive metrics
– mutual information
– cross-correlation
– rank-transformed window difference

[Figure: cost volume with axes width, height and disparity; each cell stores a dissimilarity]
Cost aggregation
• Per-pixel matching costs are usually not reliable
– same pixel value can appear multiple times in
different points
– noise in images
– uniform areas
– repetitive patterns
– non-Lambertian reflections in the scene

8.11.2022 77
Cost aggregation
• Assumption: nearby pixels are likely parts of the
same object and thus with similar depths
• Enforced by cost aggregation
• Implemented by filtering the costs
– Averaging, edge-aware, color-guided…
– The more filtering, the more robust result, but less
details are preserved
Global versus local methods
• The bulk of computations for the local methods
goes for the cost volume aggregation
• Global methods are more precise for the price of
higher complexity
• Global methods spend time optimizing the cost
volume based on some criteria
Graph cuts
• Energy minimization via graph theory

[Figure: pixels (x, y) and disparity labels arranged as a 3D graph between source s and sink t; the minimum cut assigns a disparity label L(p) to each pixel p]
Cost aggregation
• Pure Cost Volume usually produces extremely
noisy depth estimates
– noise in images
– uniform areas
– repetitive patterns
– non-Lambertian reflections in the scene

Cost aggregation
• Aggregated cost is the filtered cost volume: aggregation is per-slice filtering of the cost volume to make it more consistent within each disparity slice
– Performing this operation on the cost volume is equivalent to block matching
• For example, using a local neighborhood:

$\tilde{C}(x, y, d) = \sum_{(x', y') \in \Omega(x, y)} C(x', y', d)\, W(x', y')$

where $\Omega(x, y)$ is the windowed neighborhood of $(x, y)$ and $W(x', y')$ is the weighting function inside the window
Cost aggregation
• Local neighborhood filtering
– Bias-variance trade-off in determining the window size
– Simple windowing (block-based, cross-shaped, star-shaped)
– Color-weighted windowing
• In block matching the weights are constant over the whole window; in the Gaussian case they follow the normal distribution; in color-weighted filtering the weights are computed from the associated color image
Cost aggregation
• Local neighborhood filtering
– Bilateral filtering
• This approach is also called cross-bilateral filtering: in addition to the spatial distance (as in a Gaussian filter), the difference in intensity/color is also taken into account when computing the contribution of a pixel to the filtering result
– Spatial proximity $\Delta s$ (Euclidean distance) between pixels p and q in coordinate space (u, v)
– Color similarity between pixels p and q: $\Delta c = |L_p - L_q| + |a_p - a_q| + |b_p - b_q|$
– Weights: $W(p, q) = \exp(-\Delta s / \gamma_s)\,\exp(-\Delta c / \gamma_c)$
– A suitable starting point for the values of $\gamma_s$ and $\gamma_c$ is for instance 10 (assuming 8-bit image intensities)
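A sketch of these cross-bilateral weights for one pixel neighborhood. The exponential form and the gamma values follow the reconstruction above and are therefore approximate, not an exact transcription of the course assignment.

```python
import numpy as np

def cross_bilateral_weights(center_lab, center_uv, patch_lab, patch_uv,
                            gamma_s=10.0, gamma_c=10.0):
    """W(p, q) for all window pixels q around pixel p.

    *_lab are CIELAB colors, *_uv are (u, v) coordinates; gamma values ~10
    are the starting point assumed above.
    """
    ds = np.linalg.norm(patch_uv - center_uv, axis=1)   # spatial proximity
    dc = np.abs(patch_lab - center_lab).sum(axis=1)     # color similarity
    return np.exp(-ds / gamma_s) * np.exp(-dc / gamma_c)

# The aggregated cost for pixel p at disparity d is then
# sum_q W(p, q) * C(q, d) / sum_q W(p, q).
```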
Examples of cost aggregation

[Figure: aggregation results with different filters and window sizes; panel (g): color weights]
Disparity Computation / Optimization
• Convert the cost volume to a disparity map
• Local method: Winner Takes All (WTA)
$D(x, y) = \arg\min_d \tilde{C}(x, y, d)$
– For sub-pixel estimation, a quadratic function can be fitted to the cost vector along the disparity dimension
• Confidence measures
– Left-to-Right correspondence: marks occluded pixels
– Peak Ratio: shows how ‘sharp’ the minimum is, PKR = |MINb - MINa| / MINb

[Figure: cost values along the disparity axis, with the best minimum MINa and the second-best minimum MINb]
Stereo matching summary
• Passive, low energy consumption
• Uses conventional cameras, cheap
• Depth precision drops with distance (disparity ∝ 1/depth)
• Small baseline: robust, low depth resolution
• Large baseline: occlusions, higher resolution
• Sensitive to texture of the scene

8.11.2022 87
Estimating depth

• Estimate the 3D point position from point


correspondences
• Triangulation methods: polynomial, linear, etc.
• Dense depth field
– PDF methods
– MRF methods
– Neighborhood based methods
– Dealing with occlusions

08/11/2022
Occlusion filling and refinement
• Occlusion filling is required for areas
where there is no line of sight for both
cameras
Problems
• Not all content is visible from both cameras
– Occluded areas have no correct matching result
– Occlusions must be at least detected, and optionally
filled with some reasonable data
• Matching points relies on texture
– No texture, no matches, no depth
– Texture can be superimposed on the scene with projected
lighting (“active stereo”)

8.11.2022 90
Occlusion filling and refinement
• Methods include
– Median / weighted median
– Color segmentation of the associated image
(smoothness, plane fitting)
– Iterative filling by propagating high-confidence
disparity from well-matched areas
Stereo depth resolution
• Due to the inverse relation (disparity ~ 1/depth),
depth resolution is a function of depth
$z = \dfrac{b \cdot f}{d}$

8.11.2022 92
Multiview depth
• What if you have multiple views
with arbitrary camera locations?
• Rectification gets problematic

• Plane sweeping [1]
– Sweep hypothetical depth planes through the volume
– At each reference image pixel, which plane provides the lowest
cost?
– That is the best estimate of depth for that pixel
[1] Collins, R.T., "A space-sweep approach to true multi-image matching," Computer Vision and Pattern Recognition, 1996.
Proceedings CVPR '96, 1996 IEEE Computer Society Conference on , pp.358,363, 18-20 Jun 1996

8.11.2022 93
MULTIVIEW DEPTH ESTIMATION

Plane sweeping
• Starting data/parameters
– Multiple camera views
– Calibration data for each camera
– Depth range
– Plane density
– Plane orientations
• Set of planes $P = \{\,\Pi_m = (\mathbf{n}_m, d_m)\ |\ m = 1, 2, \ldots, M\,\}$

[Figure: reference camera $c_r$ and second camera $c_s$ observing a swept plane with normal $\mathbf{n}_m$ at distance $d_m$]

O. Suominen and A. Gotchev, "Efficient cost volume sampling for plane sweeping based multiview depth estimation," 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 1075-1079.

8.11.2022 94
MULTIVIEW DEPTH ESTIMATION

Plane sweeping
For each plane in $P$:
1. Back-project the reference view onto the plane
2. Project to the other cameras
3. Compute the matching cost between the projected views
4. Apply cost volume aggregation
• Depth $D(u, v)$ is found at the lowest-cost plane

8.11.2022 95
MULTIVIEW DEPTH ESTIMATION

Plane sweeping
For each plane in $P$:
1. Back-project the reference view onto the plane

$\mathbf{p}_m = \dfrac{d_m}{\langle \mathbf{n}_m, \mathbf{i}_r \rangle}\, \mathbf{i}_r$

8.11.2022 96
MULTIVIEW DEPTH ESTIMATION

Plane sweeping
For each plane in $P$:
2. Project to the other cameras

$\mathbf{i}_s = \dfrac{f}{z}\, R^{\top} (\mathbf{p}_m - \mathbf{t})$

8.11.2022 97
MULTIVIEW DEPTH ESTIMATION

Plane sweeping
For each plane in $P$:
3. Compute the matching cost between the projected views
4. Apply cost volume aggregation

$C(\mathbf{u}_r, \Pi_m) = \sum_{s=1}^{S} \sum_{\mathbf{u} \in W} \left| I_s(\mathbf{u}_s + \mathbf{u}) - I_r(\mathbf{u}_r + \mathbf{u}) \right|$

8.11.2022 98
MULTIVIEW DEPTH ESTIMATION

Plane sweeping
• After steps 1–4 have been repeated for every plane in $P$, the depth $D(u, v)$ is found at the lowest-cost plane

8.11.2022 99
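As a simplified sketch of the idea (not the sampling scheme of the cited paper), the sweep can be implemented with the standard plane-induced homography $H = K_s (R + \mathbf{t}\mathbf{n}^{\top}/d) K_r^{-1}$ for a fronto-parallel plane $\mathbf{n}\cdot X = d$ in reference-camera coordinates, warping the second view onto each depth plane:

```python
import numpy as np
import cv2

def plane_sweep(ref, src, K_r, K_s, R, t, depths, win=9):
    """ref, src: grayscale views; X_src = R @ X_ref + t maps reference to source coords."""
    h, w = ref.shape
    n = np.array([0.0, 0.0, 1.0])        # fronto-parallel plane normal
    cost = np.zeros((h, w, len(depths)), np.float32)
    for i, d in enumerate(depths):
        # Homography induced by the plane n . X = d (reference camera coordinates)
        H = K_s @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_r)
        # Sample the source view at H x_r for every reference pixel x_r
        warped = cv2.warpPerspective(src, H, (w, h), flags=cv2.WARP_INVERSE_MAP)
        diff = np.abs(ref.astype(np.float32) - warped.astype(np.float32))
        cost[:, :, i] = cv2.blur(diff, (win, win))   # cost aggregation over a window
    # Depth D(u, v) is found at the lowest-cost plane
    return np.asarray(depths)[np.argmin(cost, axis=2)]
```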
“SEMI-ACTIVE” SENSORS

8.11.2022 100
Optical markers
• Sometimes the objective can be to capture 3D
information from known objects
• Then it may be enough to have only a few
captured depth points
• However, those points should be robust and
always the same points
– Rules out feature based matching
• Optical markers can be used to make sure
certain points are very reliably detected

8.11.2022 101
Case: Motion capture
• Movie & game industry standard for creating digital
animations
• Optical markers are worn by an actor
• Markers are tracked through cameras
• Known object model is fit on the tracked point cloud

8.11.2022 102
Optical markers
• Most optical tracking systems operate in the (near)
infrared spectrum to avoid distracting human observers
• Infrared cameras are positioned around the capture
space

8.11.2022 103
Optical markers
• Location of cameras is pre-calibrated
• Each camera has its own IR source
– Optical markers are retro-reflective spheres
• Can also be done with active markers
– Higher cost, more maintenance (batteries)
– Allows larger distances, more robust

8.11.2022 104
Optical markers:
Blob detection
• Each marker is seen as a high intensity spot on each
camera
• For all camera images:
1) Threshold the image to detect bright blobs
2) Remove too small or too large blobs
3) Remove non-circular blobs
4) Extract the centroid of each blob (assumed to be a
marker)
• End result: a number of 2D locations for markers on
each camera plane
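A rough OpenCV sketch of steps 1–4; the threshold value and the size/circularity limits are arbitrary examples, not parameters of any particular tracking system.

```python
import cv2
import numpy as np

def detect_markers(ir_image, thresh=200, min_area=10, max_area=2000, min_circ=0.7):
    """Return 2D centroids of bright, roughly circular blobs in an IR camera image."""
    # 1) Threshold the image to detect bright blobs
    _, binary = cv2.threshold(ir_image, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        area = cv2.contourArea(c)
        perim = cv2.arcLength(c, True)
        if area < min_area or area > max_area or perim == 0:
            continue                                  # 2) remove too small / too large blobs
        if 4 * np.pi * area / perim**2 < min_circ:
            continue                                  # 3) remove non-circular blobs
        m = cv2.moments(c)                            # 4) extract the centroid of each blob
        centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centroids
```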

8.11.2022 105
Optical markers:
3D reconstruction
• Each camera location is known based on pre-
calibration
• It becomes the familiar triangulation problem
• Each marker position marks a ray of potential
locations from each camera

8.11.2022 106
Optical markers:
3D reconstruction
• However, there are no identifying features in the
markers
• Additionally: noise, calibration error
– Rays from each marker do not intersect correctly
even when markers are labeled correctly
• Which bright spot is which?
• Which configuration of markers in 3D best
explains the detected 2D marker locations?

=> Optimization problem


8.11.2022 107
Optical markers:
3D reconstruction
• Optimization criteria: reprojection error
– Project assumed position of marker to each camera
– Measure distance of projection to actual observation on
each image plane
• Multiple markers: average reprojection error over
all markers

8.11.2022 108
Optical markers:
Model fitting
• Optimization result: probable locations of
markers in 3D space
• Prior information: Properties of the object the
markers are attached to
• Simple case: object is rigid and has a rigid
constellation of a few markers attached to it
– Object has 6 degrees of freedom
– Just need to match the observed constellation to the
template
– Iterative Closest Point (ICP) or similar
8.11.2022 109
Optical markers:
Model fitting
• A more challenging case: Object consists of several
parts moving in relation to each other with markers in all
parts
– Such as a human actor
• Still, certain constraints apply
– Joints are in known locations and can rotate in specific ways
– Bones have fixed length
• The outcome: full motion of the actor
– Expressed as the individual movement of joints and bones
– Can be mapped to virtual models with the same skeletal
structure

8.11.2022 110
Coded aperture
• Lens blur is also a depth cue
• Distance of object can be determined from the size of
the blur
• For a single point light source, it’s simple
– Measure the size of the circle
– Focus blur (from symmetric aperture)
is symmetric

8.11.2022 111
Coded aperture

• Real scene has multiple


sources of light
• The lens blurs overlap on the
image sensor
• Difficult if not impossible to tell
them apart
• Inverse problem
– Observe blurred image
– Figure out which scene would
create the image

8.11.2022 112
Coded aperture

• The inversion can be helped by introducing a


more unique blur pattern => aperture code

C. Zhou and S. Nayar, "What are good apertures for defocus deblurring?," Computational Photography (ICCP), 2009 IEEE International
Conference on, San Francisco, CA, 2009, pp. 1-8.

8.11.2022 113
Structured light

• Variant of stereo matching


with an active component
• The goal of stereo is to
identify the projection of the
same point on two different
image planes


8.11.2022 114
Structured light

• Replace the other camera


from a stereo pair with a
projector (light source)
– A projector can be thought of as
a camera in reverse
– 2D LCD on image plane
creates points in 3D space
• Project something unique so
it can be identified by the
camera

8.11.2022 115
Gray code pattern
• Something unique can be a complex pattern
(e.g. pseudorandom)
• There are ways to unambiguously label points in
the scene
– Avoids correspondence search
– Needs time multiplexing

http://www.cs.middlebury.edu/~schar/papers/structlight/p1.html

8.11.2022 116
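One common way to produce such time-multiplexed, unambiguous labels is a binary-reflected Gray code over the projector columns; a small illustrative sketch (the pattern resolution is an arbitrary example):

```python
import numpy as np

def gray_code_patterns(width=1024):
    """Stripe patterns whose per-column bit sequence is a binary-reflected Gray code.

    Returns an array of shape (n_bits, width); each row is one pattern to project
    (tiled over the image height), with values 0 or 255.
    """
    n_bits = int(np.ceil(np.log2(width)))
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)                              # Gray code of each column index
    bits = (gray[None, :] >> np.arange(n_bits)[:, None]) & 1
    return (bits * 255).astype(np.uint8)

# Decoding: threshold each camera frame, read the per-pixel bit sequence over time,
# and convert the Gray code back to a column index (no correspondence search needed).
```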
Time-of-flight cameras

ACTIVE SENSING

8.11.2022 117
Sampling via scanning
• A 1D range detector is used with a 1D or 2D deflection
unit (rotating mirrors)

[Figure: a laser range detector and a rotating mirror scanning a target]

8.11.2022 118
Sampling via scanning

• Scanning takes time


– Several minutes per scene for large FOV, high
resolution scanners (Leica C10)
– Real time for narrow FOV, low resolution
(Velodyne VLP-16)
Sensor: Velodyne VLP-16 / Leica ScanStation C10
Range (m): 100 / 300
Vertical FOV: 30° / 270°
Vertical resolution: 2° / -
Horizontal FOV: 360° / 360°
Horizontal resolution: 0.1° at 5 Hz, 0.2° at 10 Hz, 0.4° at 20 Hz / -
Accuracy: ±3 cm / 3 mm
Price: 8000 USD / 50 000 USD +

Extra: Velodyne sensors are used in Google’s self-driving car

8.11.2022 119
Sampling via matrix
• Same distance measurement principles
• Instead of a single sensitive element, matrix of
elements (CCD / CMOS sensor)
• Some methods require very sensitive or very
fast sensing
– Doing it in a matrix is more challenging, more
expensive
– Typically, depth sensors have lower spatial resolution
to compensate

8.11.2022 120
Interferometry

• Constructive and destructive


super-position of beams of
coherent light
• Very short optical wavelengths used
– Suitable for ranges of some millimeters
• Accurate and easy to implement by current CMOS
technology
• Holography, metalloscopy, microscopic imaging
Time-of-Flight Classification
1. Pulsed Modulation
– A very short pulse is emitted by the system, and basic
goal is to detect exactly the arrival time of reflected
light pulse
2. Optical Shutter Sensor Technology
– Indirect measurement of the time of flight using a fast
shutter technique
3. Continuous Modulation
– Phase difference between sent and received signals
of a continuous modulation wave (CW)
Pulsed Modulation Systems
• A very short pulse is emitted by the system, and the goal
is to detect the exact arrival time of reflected light pulse
• ToF is detected by correlation with the start-stop signal
generator clock

[Figure: laser pulser, TDC (time-to-digital converter), receiver channels, optics and target]
Pulsed Modulation Systems
• Short period of the pulse
• Effective transmission of high amount of energy
• Improves quality of scattered signal
• Enables operation of long distance range and use of
cheap low-sensitivity sensors

Pulsed Modulation Systems
• Short pulse periods allow emitting signals of
low optical power (important for eye safety)
• Difficulties in providing good repetition rates of
short high energy pulses
• Currently, the practical implementation requires
expensive laser diodes
– The repetition rate is too low (~10 kHz)
for real-time camera solutions
• Currently used in laser range-finders
Direct time of flight measurement
• Measuring the round trip time, the distance is
calculated by

$d = \dfrac{t}{2}\, c\,; \qquad c = 3 \times 10^{8}\ \mathrm{m/s}$
• 1 ns corresponds to a distance of 15 cm
• Time measurement is critical and has to be very
accurate
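As a quick numerical check of the formula above (using the speed-of-light value from the slide):

```python
C = 3e8   # speed of light, m/s

def pulse_distance(round_trip_time_s):
    """Distance from the measured round-trip time of the pulse: d = (t / 2) * c."""
    return round_trip_time_s / 2 * C

print(pulse_distance(1e-9))   # 1 ns of round-trip time -> 0.15 m (15 cm)
```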

08.11.22 126
Direct time of flight measurement

• Amplitude of reflected signal depends on


surface properties, surface orientation,
distance
• Conventional detection of time of impulse
by comparison with constant threshold is
dependent on amplitude
• Solution: Comparing derivative to zero

08.11.22 127
Indirect time of flight
measurement
• The problem of measuring time can be converted to
measuring amplitude
• E.g use of an optical shutter in front of the receiver,
triggered to close a short time after the laser pulse was
emitted
– The earlier the impulse returns, the more of it will be allowed to
pass through the shutter to the receiver

08.11.22 128
Optical Shutter Systems
• Based on indirect measurement of the time of flight using
a fast shutter technique
• The basic concept utilizes a short Near-Infra-Red (NIR)
light pulse, which represents a light front-wave called
“light wall”
• Light wall has properties of a spherical thick
surface with a finite width

8.11.2022 129
Optical Shutter Systems

• Light hits the objects in the scene as a “light wall”


• Reflected light wall is distorted in a way which
resembles shape of the object
• If the sensor shutter is fast, it can track the shape
of the imprinted light and find depth of the scene
• The shutter is based on a standard CCD sensor
• The depth range and resolution depend on the
operating speed of the CCD

8.11.2022 130
Continuous Wave Methods
• CW techniques are based on the idea that the phase of
the reflected signal is shifted according to the distance to
the target
• It is possible to use the light waves directly for the
measurement (interferometry)
• Another possibility is to modulate the optical carrier
(laser) intensity using RF (radio frequency) signals
– A.k.a. “optical RF interferometry”

08.11.22 131
Continuous Wave Methods
• Phase shift measurement
– Light is modulated with sine wave
– Received signal contains sine wave
– Phase is compared to phase of sender signal with
electrical mixer

[Figure: an oscillator drives the modulated transmitter; the received signal and the oscillator signal are combined in a mixer and low-pass filtered to obtain phase and amplitude]

08.11.22 132
Continuous Wave Methods
• Problem:
– Ambiguity: (modulo λ/2), we need a large λ
– Accuracy: (proportional to λ), we need a small λ
• Solution:
– use two or several sine waves

[Figure: a short wave (accuracy) and a long wave (uniqueness) are combined into the modulation signal / modulated light]

08.11.22 133
Continuous Wave Methods
• Chirp Modulation
– Light is modulated with sine wave of linear growing
frequency
– Mixing of sender and receiver signal results in beat
frequency proportional to delay and thus proportional
to distance

chirp signal

frequency of
above signal

08.11.22 134
Photonic Mixer Device Camera

• Photonic Mixer Devices (PMD) are ToF cameras utilizing


CW modulation which can simultaneously capture IR &
range data
• Their active range sensor integrates a demodulation
system on the standard imaging sensor of CMOS
technology
• PMD is specific to the manufacturer PMDTec™
• It is commonly used in scientific publications as a
synonym for all CW based devices
PMD Camera Hardware
Framework
• A typical PMD camera consists of imaging sensor
based on CMOS chip (Photonic Detector) with
integrated correlation unit called Photonic Mixer
• Active illumination provided by integrated
illumination unit based on Near-Infra-Red (NIR)
emitting diodes
• Phase modulator (usually a sine-wave generator)
provides the electric current to the diodes which
emit a modulated NIR wave
PMD Camera Hardware
Framework
[Figure: object illuminated by modulated light from a photo-emitting diode driven by a phase modulator; the reflection passes through the optical lens to the CMOS photonic detector with its photonic correlator (photonic sensor chip), producing a depth map and an intensity image]

• The reflected modulated light is received through optical lenses and directed to the sensor surface
• The electrical signal from each CMOS pixel is correlated with the signal of the sine-wave generator to demodulate the phase shift of the travelled light
• The value is read out to the output interface
Signal Demodulation Principle
• A typical ToF device beams a continuously modulated harmonic light signal (typically in the near-infrared wavelength range) at moment $t$ and senses back the delayed light reflection from the objects in the scene at time $t + t_d$
• PMD cameras retrieve the modulation parameters from the photonic mixer, which cross-correlates the emitted $s(t)$ and reflected $r(t)$ signals:

$s(t, \omega) = C + A \cos(\omega t + \varphi)$
$r(t - t_d, \omega) = C_R + A_R \cos(\omega (t - t_d) + \varphi_R)$
$q(t, \tau) = r(t + t_d) \otimes s(t + \tau)$
$Q(\tau) = \int q(t, \tau)\, dt = C_Q + A_Q \cos(\omega \tau + \varphi_Q)$

where $\omega = 2\pi f$, $f = 1/T$, $A$ denotes amplitude, $\varphi$ phase delay, $C$ modulation offset (modulation contrast) and $T$ the modulation period
• $\varphi_Q$ is the phase from which the range can be determined; it is estimated by sampling $Q(\tau)$ at four points $Q_0, Q_1, Q_2, Q_3$ corresponding to $\tau = 0, \pi/2, \pi, 3\pi/2$
• The distance can be measured only inside $D \in [0, D_{MAX}]$ with $D_{MAX} = c_L/(2f)$, e.g. 7.5 m for $f = 20$ MHz

[Fig. 1: A typical Time-of-flight (ToF) device and its components]

Signal Demodulation Principle
• A 4-sampling demodulation procedure called 4-tap lock pixel is used in the case of a sinusoidal signal
• It samples the Q signal in half-period steps, i.e. with modulation shifts [0°, 90°, 180°, 270°] with respect to the reference signal
• After trigonometric operations this yields the sub-amplitude $A_Q$, the offset $C_Q$ and, most importantly, the phase $\varphi_Q$ of $Q(\tau)$:

Amplitude (intensity of the incident light): $A_Q = \dfrac{1}{2}\sqrt{(Q_0 - Q_2)^2 + (Q_3 - Q_1)^2}$
Contrast (offset): $C_Q = (Q_0 + Q_1 + Q_2 + Q_3)/4$
Phase shift: $\varphi_Q = \tan^{-1}\dfrac{Q_0 - Q_2}{Q_3 - Q_1}$

• Finally, the measured distance D is proportional to the estimated phase:

$D_Q = k_1 \dfrac{\varphi_Q\, c_L}{4 \pi f} + k_2$

where $c_L$ is the speed of light in dry air ($c_L \approx 2.98 \times 10^{8}$ m/s); the parameters $k_1$ and $k_2$ are determined by calibration (compensating for phase and data offset delays in the sensor), with $k_1 = 1$, $k_2 = 0$ in the ideal case
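A minimal NumPy sketch of the 4-tap computation following the reconstructed formulas above; the calibration constants k1 and k2 are left at their ideal values, and the phase is wrapped to [0, 2π) so the distance stays within the unambiguous range.

```python
import numpy as np

C_L = 2.998e8   # speed of light in dry air, m/s

def demodulate(Q0, Q1, Q2, Q3, f_mod=20e6, k1=1.0, k2=0.0):
    """Per-pixel amplitude, offset, phase and distance from the four phase frames."""
    offset = (Q0 + Q1 + Q2 + Q3) / 4.0
    amplitude = np.sqrt((Q0 - Q2) ** 2 + (Q3 - Q1) ** 2) / 2.0
    phase = np.mod(np.arctan2(Q0 - Q2, Q3 - Q1), 2 * np.pi)   # phi_Q in [0, 2*pi)
    distance = k1 * phase * C_L / (4.0 * np.pi * f_mod) + k2
    return amplitude, offset, phase, distance

# D_MAX = C_L / (2 * f_mod) ~ 7.5 m for a 20 MHz modulation frequency.
```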
Signal Demodulation

[Figure: captured scene, sine contrast (offset), intensity of the incident light, and the captured depth data]
Time-of-Flight
• Sensitive to interference from sunlight due to low
signal to noise ratio of the emitted light
• Highly reflective surfaces can saturate the
sensor
• Too absorbent surfaces don’t return enough light
for measurement
• Too slanted surfaces reflect the light away from
the sensor
• Too detailed surfaces can scatter the light

8.11.2022 141
Camera Depth Resolution
Calibration
• Depth estimation by PMD cameras in most
cases produces noisy results
• Such noise has complex nature and depends on
many different aspects of ToF capturing
– Ambient light, wrap-around error, sensor noise,
wiggling, motion, etc
CWT Camera Depth Resolution
• The active range resolution depends on the chosen modulation
frequency: NIR wave with 20 MHz is about 40cm - 7.5m
• Reflected signal attenuates depending on the distance
• Amplitude attenuation is used as confidence metric
CWT Camera Wrap-Around Error
• PMD cameras cannot directly detect whether a
measurement of depth is beyond the operating range
• Demodulation process of phase shift experiences
aliasing which arises from the periodicity of modulation
signal
• Depth aliasing is called wrap-around error

[Figure: multi-frequency modulation approach for depth unwrapping]
Other CWT errors

• “Wiggling error” results from


mistaken assumptions in
demodulation
• If the object is changing position
between the phase sampling, the
sensor might receive mixed
modulation signals
• Ambient light

8.11.2022 145
CWT Camera Object Reflectivity
Noise
• Dark painted objects have low reflectivity for
NIR wavelengths, and may be masked by ambient
signals
• Reflectivity from glossy objects can easily over-saturate
the detector

[Figure: object of normal reflectivity, dark object of low reflectivity, glossy object of high reflectivity]
More examples of noise sources
[Figure: (a) two measurements with different integration times; (b) different light reflections: (1) low reflectivity, (2) trapped light, (3) mosquito noise, (4) low incidence light, (5) multi-path interference; (c) regions with corresponding artifacts; (d) zoomed versions]
Fixed pattern noise in ToF sensors
[Figure: (a) test scene, (b) raw frame Q0(x,y), (c) amplitude, (d) offset, (e) noisy phase delay, (f) ground-truth phase delay]

• Fixed pattern noise is present in every digital sensor and has some specifics for ToF sensors
Technical issues

Synchronization, interfaces,
bandwidth

8.11.2022 149
Camera synchronization
• If anything in the scene is dynamic, synchronization of
cameras is essential for 3D sensing
• Frames have to be
taken at the same time
– Motion of objects
between frames
– Motion of camera

08/11/2022
Camera synchronization
• Requires implementing some kind of triggering
mechanism (master-slave, etc.)
• Best case: sync two sensors on the same chip
(e.g. mobile phone)
• Worst case: sync several different sensors
attached to different devices (e.g. an acquisition
cluster)

08/11/2022
Camera interfaces (external)
• Gigabit Ethernet
• USB 3.0
• CameraLink
• CoaxPress
• FireWire

8.11.2022 152
Camera sync: external
• Some cameras allow sending an external trigger
signal which takes the shot
• Daisy chaining: one camera triggers another
• Special-purpose hardware
– Device dedicated to propagating external
synchronization signals for triggering the cameras;
– Precise synchronization;
– Expensive, technically complex, not very flexible

08/11/2022
Camera sync: software
• Send the sync signal through the image data
interface/channel
• Depends on the interface
• Affected by latency, quality of the connection,
support by devices

08/11/2022
Data throughput
• Sensors produce lots of data (24 bit FHD@30Hz
~180 MB/s)
• That data should go somewhere from the sensor
• Interface bandwidth
– Gigabit Ethernet (125 MB/s)
– USB 3.0 (625 MB/s)
– CameraLink (255-680 MB/s)
– CoaxPress (780 MB/s)
– FireWire (98 MB/s)
– Internal custom bus (?? MB/s)
08/11/2022
Data throughput
• Storage bandwidth
– HDD (60-150 MB/s)
– SSD SATA3 (200-550 MB/s)
– SSD PCI-express ( 2700 MB/s)
– Gigabit network stream ( 125 MB/s)
• Bandwidth limits the choice of
– frame rate ( 24, 30, 48, 60, 120 Hz...)
– frame resolution (VGA, HD, FullHD, 4K, 8K...)
– dynamic range (8b, 12b, 16b...)
– number of cameras (stereo, 2+2, 40, 300...)
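A small helper for this kind of back-of-the-envelope bandwidth budgeting; the first example reproduces the ~180 MB/s figure from the previous slide, the second is a made-up multi-camera scenario.

```python
def stream_bandwidth_mb_s(width, height, bits_per_pixel, fps, cameras=1):
    """Raw (uncompressed) sensor data rate in MB/s."""
    bytes_per_frame = width * height * bits_per_pixel / 8
    return cameras * bytes_per_frame * fps / 1e6

print(stream_bandwidth_mb_s(1920, 1080, 24, 30))      # ~186 MB/s for 24-bit FHD @ 30 Hz
print(stream_bandwidth_mb_s(1920, 1080, 24, 30, 40))  # a 40-camera rig vs. interface limits
```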

08/11/2022
