
Human vision and depth perception

COMP.SGN.320 3D and Virtual Reality


Autumn 2021

Yuta Miyanishi, Atanas Gotchev


3D visualization

• The ultimate goal of a 3D display is to replicate the 3D scene such that it is visually indistinguishable from reality.
• 3D perception
– 3D scene – light (𝐿) – human observer
• 3D visualization
– 3D display – emitted light (𝐿′) – human observer

[Figure: a 3D scene and a 3D display showing a 3D image each send light to the human observer.]
Why human vision?
• A brute-force way of imaging: capture & reconstruct everything but it
requires infinite resources
• The end-user of 2D/3D displays is a human
– For efficient display design, we should understand how visual content
affects the perception
– Perceptual artifacts in 3D images can cause general dizziness and
nausea, which may impair visual experience and usability.
• Perceptual coding
– Perceptual data loss: not all sensory input is processed
– Cognitive data loss: we do not pay attention to all events registered by
the brain
– Then why store/transmit/process such data?
– One can store/process only the perceptually “important” part
Vision and light
What is Light?
• Light: electromagnetic wave or photons or rays?
• Wave properties:
– Amplitude/intensity

– Frequency/wavelength

– (Propagation) direction

– Phase

– Polarization (vectorial)
Spectrum of light
• Intensity [W]
• Wavelength [nm]

• Visible light
– 380-750 nm
What is Light?
• Ray parametrization: intensity, wavelength, direction etc.
• Plenoptic function (light field):

𝑃(𝑥, 𝑦, 𝑧, 𝜃, 𝜙, 𝜆, 𝑡)
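(position 𝑥, 𝑦, 𝑧; direction 𝜃, 𝜙; wavelength 𝜆; time 𝑡)
[Figure: a ray passing through a point (𝑥, 𝑦, 𝑧) in the direction (𝜃, 𝜙).]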
The physiology of the visual system
The visual system
• Layered (but bi-directional) neural system
– Receive and process light from the world
– Control visual motor functions
– Extract & reconstruct information about the world
• Anatomically consists of
– Eyes
• Refractive structure
• Retina
• Optic nerve
– LGN
– Visual cortex (= “brain”)
• V1, V2, …, MT, MST, …
Anatomy of the eye
• Cornea: responsible for ~70% of entire refractive power
• Iris: circular opening (= pupil) at the center, surrounded by muscles
so that the pupil size can be controlled around 2–8 mm
• Crystalline lens + zonules + ciliary muscles: focus-tunable lens,
main role in accommodation
• Aqueous humor: water-like liquid that works like blood to maintain the lens and cornea
Anatomy of the eye
• Vitreous body: jelly-like filling between the lens and retina
• Retina: membrane with layers of neurons, which detects light and
processes the signals
Accommodation
• Accommodation: function to increase the refractive power of the eye
• Unaccommodated (= resting) for a distant object
– Ciliary muscle is relaxed & stretched, zonules flatten the lens
• Accommodative effort for a near object causes
– Contraction of ciliary muscle
– Release of zonular tension
– Lens tries to be “more spherical”
– Increase in the refractive power
Retina: structure
• The retina consists of layers of photoreceptors and other neurons.
• 4 types of photoreceptors: L-, M-, S-cones and rods (= “duplex retina”)
• Various neurons relay signals from photoreceptors

Outside of eyeball
Photoreceptors

Neural signals

Light from pupil

Inside eyeball
Retina: photoreceptors
• Rods
– Extremely sensitive to the light
– Functionally work only under very dim environment (scotopic vision)
• Cones
– Less sensitive to light; function under bright environment (photopic vision)
– Three types of cones (L-, M-, S-) with different spectral response curves
support human color vision

• The outer segment of rods and cones contains photopigment, which chemically reacts to photons
• Detection of a photon “consumes” photopigment, which must be regenerated in the inner segment
Retina: non-uniformity
Retina is not uniform in neural structure and its function.
Fovea: the central 1.0-mm region; cones are tightly packed
Subtends ~2 deg = width of thumb at arm’s length
Retina: non-uniformity
Retina is not uniform in neural structure and its function.
Periphery: regions > 10 deg from fovea; mostly rods
“Resolution” is low, but very sensitive to light (and motion)
Retina: signal processing
The flow of signals
1. Photoreceptors (~10⁸)
2. Bipolar cells (+ horizontal cells & amacrine cells)
3. Retinal ganglion cells (~10⁶)

The retina is not just a sensor; it processes the signals with its neural circuits.

Not only the distribution of cones/rods but also the circuits after light detection affect the functional variation across the retina.
Retina: signal processing
Bipolar cells receive input from either cones or rods
• In periphery, a bipolar cell is connected to tens of photoreceptors
– Pooling of signals / convergence of information
– Supports high visual sensitivity to light, but bad “resolution”
• In fovea, bipolar cells are connected to single cones
– High “resolution”, but low sensitivity to light
• ON bipolar cells and OFF bipolar cells
– React to an increase or a decrease of the signal from a cone in fovea

Horizontal cells and amacrine cells (not shown) are “modulators”
Retina: non-uniformity
Retinal region   Photoreceptor type   “Resolution”   Light sensitivity
Fovea            Mostly cones         High           Low
Periphery        Mostly rods          Low            High
Foveal vision
The pictures below (original + distorted) are perceived to be nearly identical
because of poor acuity in peripheral vision

Original Distorted only in periphery


Freeman & Simoncelli (2011)
Eye movements
• Saccades
– Voluntary, frequent (3-4 per second) and fast (~1000 deg/s) jumps of
the fixation point from one spot to another
• Smooth pursuit
– Voluntary tracking of moving object
• Micro-saccades
– Involuntary, small and jerk-like eye movements during fixation
– Prevent the visual field from “fading” and improve the ability to catch fine details
• Binocular vergence
– Two eyes move in opposite directions
– Rotate eyes inward (convergence) and outward (divergence)
– Important role in binocular depth perception: to fixate objects at various
depths
Eye movement and foveal vision
Retina: signal processing
Each retinal ganglion cell gathers signals from bipolar cells (and
amacrine cells)
• (RGC’s) Receptive field: retinal region in which visual stimuli
influence the neuron’s activity
• Major types of RGCs have an antagonistic
center-surround receptive field
Example: ON-center-OFF-surround receptive field
• RGC increases firing rate when central region gets stimulated
& surrounding region is in dark
• RGC decreases firing rate when central region is in dark
& surrounding region is stimulated

[Figure: center-surround antagonism, showing an ON-center and an OFF-center receptive field]

• This type of receptive field works as a spatial filter
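The spatial-filtering role of a center-surround receptive field can be sketched with a difference of Gaussians, a common textbook model that is used here as an assumption (the slides do not name a specific model). The kernel widths, the function names and the step-edge stimulus are all illustrative choices. A narrow excitatory center minus a wide inhibitory surround sums to zero, so the model cell ignores uniform illumination and responds only near a luminance edge.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def on_center_kernel(sigma_center=1.0, sigma_surround=3.0, radius=9):
    """ON-center/OFF-surround model: excitatory center minus inhibitory surround.

    The sigmas are hypothetical values chosen only for illustration.
    """
    center = gaussian_kernel(sigma_center, radius)
    surround = gaussian_kernel(sigma_surround, radius)
    return [c - s for c, s in zip(center, surround)]

def filter_signal(signal, kernel):
    """Correlate the signal with the kernel, repeating edge samples at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * signal[j]
        out.append(acc)
    return out

# A luminance step: a dark region followed by a bright region.
step = [0.0] * 20 + [1.0] * 20
response = filter_signal(step, on_center_kernel())
```

In this sketch the response is zero inside both uniform regions, negative just on the dark side of the edge and positive just on the bright side, which is the edge-enhancing behaviour that such a filter is meant to illustrate.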


Contrast – The Hermann grid
The Hermann grid explained

High contrast area intensifies “whiteness”

Lower contrast area (less black): the perceived level of white is decreased
The Hermann grid explained

Fovea (gazed): small receptive field, no excitation/inhibition
Periphery: higher inhibition, perceived darker


Is the Hermann grid explained?


https://www.illusionsindex.org/i/hermann-grid
The Visual Cortex
• Primary visual cortex (V1) receives signals
from retina via the LGN
• There is a clear mapping of input from retina
to V1 neurons (retinotopy)
– Foveal region has very large area in V1
(cortical magnification)
• Visual input is further processed in V1
– Selectivity to a line stimulus’s orientation
– The differences between two retinas are
detected
• Visual information is then handed to the next
regions, where retinotopy is no longer kept,
having more complex selectivity and larger
receptive fields, etc.

http://what-when-how.com/neuroscience
Dynamic range of human vision
• Human vision works over a (very) wide range
From faint starlight (< 10⁻⁴ cd/m²) to intense sunlight (> 10⁵ cd/m²)
• The mechanisms
– Physical regulation by the pupil
– Two photoreceptor classes, two modes of vision: photopia and scotopia
– Changes in cone sensitivity (biochemical adaptation)
– Retinal processing of signals
– Cortical gain control

• Conventional displays offer a small fraction of the range
– Cinema: max. ~55 cd/m²
– TV: max. 100-500 cd/m²
Spatial vision
Visual angle and spatial frequency

Visual angle
• The angle subtended by an object (at the retina)
• Directly dictates the apparent size
[Figure credit: Wikipedia (en), “Visual angle”]
Spatial frequency
• Number of periodic spatial modulations per unit angle (or length)
• “cycles/degree” (cpd) is commonly used in studying vision
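As a worked illustration of both quantities, the sketch below computes the visual angle of an object from its size and viewing distance, and the highest spatial frequency a pixel grid can display. The thumb width, arm length, pixel pitch and viewing distance are hypothetical values chosen for the example, and the function names are not from the course material.

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle in degrees of an object seen frontally.

    size and distance must use the same length unit.
    """
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))

def max_cycles_per_degree(pixel_pitch, distance):
    """Highest spatial frequency (cycles/degree) a pixel grid can show.

    One cycle needs at least two pixels (one bright, one dark).
    """
    pixels_per_degree = 1.0 / visual_angle_deg(pixel_pitch, distance)
    return pixels_per_degree / 2.0

# Hypothetical 2 cm wide thumb held at 57 cm (roughly arm's length).
thumb_deg = visual_angle_deg(2.0, 57.0)

# Hypothetical display: 0.25 mm pixel pitch viewed from 60 cm (both in cm).
display_cpd = max_cycles_per_degree(0.025, 60.0)
```

With these assumed numbers the thumb subtends about 2 degrees, in line with the rule of thumb quoted for the fovea, and the display tops out at roughly 21 cpd.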
Visual acuity
Visual acuity
• The smallest visual feature size (in angle)
that a patient can reliably distinguish
– In Snellen letters, the width of the strokes
– Typical good vision: VA of ~1 arcmin (= 1/60 deg)
• Usually tested for foveal vision
= VA is related well to the foveal cone spacing

Snellen chart
Contrast sensitivity
Contrast sensitivity
• “Extension” of visual acuity
• Lowest contrast (Michelson contrast) that an observer can reliably
distinguish at various spatial frequencies of gratings
– Contrast thresholds as a function of spatial frequency
= Contrast sensitivity function
– Highly dependent on location in the visual field and other factors
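A minimal sketch of the quantities involved: Michelson contrast is (Lmax − Lmin) / (Lmax + Lmin), and sensitivity is conventionally reported as the reciprocal of the threshold contrast. The grating parameters below are arbitrary example values, and the helper names are illustrative.

```python
import math

def michelson_contrast(l_max, l_min):
    """Michelson contrast of a pattern with the given luminance extremes."""
    return (l_max - l_min) / (l_max + l_min)

def sine_grating(mean_luminance, contrast, cycles_per_degree, samples_per_degree=1600):
    """Luminance profile of a sinusoidal grating over one degree of visual angle."""
    return [
        mean_luminance
        * (1.0 + contrast * math.sin(2.0 * math.pi * cycles_per_degree * i / samples_per_degree))
        for i in range(samples_per_degree)
    ]

def contrast_sensitivity(threshold_contrast):
    """Sensitivity is the reciprocal of the lowest detectable contrast."""
    return 1.0 / threshold_contrast

# Example grating: mean 50 cd/m^2, contrast 0.2, 4 cycles/degree (assumed values).
grating = sine_grating(mean_luminance=50.0, contrast=0.2, cycles_per_degree=4)
measured = michelson_contrast(max(grating), min(grating))
```

Measuring the generated grating recovers the contrast it was built with. A contrast sensitivity function is then obtained by repeating a threshold measurement at several spatial frequencies.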

[Figure: example gratings with high contrast / low frequency, low contrast / low frequency, high contrast / high frequency, and low contrast / high frequency]
Contrast sensitivity function
[Figure: contrast thresholds plotted against spatial frequency]
Color vision
“Colors you see are really in you”
Color is not a physical property
• Spectral radiance & reflectance are physical properties
• Color is what a human observer perceives in the mind when seeing it under
each condition
• Yet color is related to a physical property

“The dress”
One image, two different percepts
Blue-black vs. white-gold
Subjectivity of color
Physically identical “color” (= a spectrum of light) may be perceived as
different colors

Purves & Lotto (2003)


Subjectivity of color
Physically identical apparent color (“physical” color) may be perceived
as different surface colors

Purves & Lotto (2003)


Subjectivity of color

Purves & Lotto (2003)


Subjectivity of color
Physically different “colors” may be perceived as the same color

Purves & Lotto (2003)


Color is not the wavelength

“Colors”

• There are colors which cannot be represented by any single wavelength (“extraspectral” colors)
• Different combinations of wavelengths (spectrum) can be
perceived as the same color
• The same spectrum can be perceived as different colors
Basis of color vision: three types of cones

• Most mammals are dichromats (L/M cones and S cones)


• Primates are trichromats
• L, M, S cones are sensitive to long, medium and short wavelength bands
• S cones appear much less frequently (~4%); the L/M ratio varies among
individuals

https://en.wikipedia.org/wiki/Color_vision
Radiance, Luminance and Brightness

Radiance
Radiant flux emitted, reflected, transmitted or received by a surface, per unit solid angle and per unit projected area,
measured in W·sr⁻¹·m⁻²

https://www.cs.utah.edu/~gk/papers/vis02/talk/slide005.html
Radiance, Luminance and Brightness
Luminance
• Radiance “corrected” by how visible each wavelength is for the
“average human observer”
• cd/m² = nit

https://www.cs.utah.edu/~gk/papers/vis02/talk/slide005.html
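The “correction” can be written as a weighted integral. The form below is the standard photometric definition, stated here as background rather than taken from the slides: $V(\lambda)$ denotes the CIE photopic luminous efficiency function (the “average human observer”), $L_{e,\lambda}$ the spectral radiance, and $K_m = 683$ lm/W the maximum luminous efficacy.

```latex
L_v = K_m \int V(\lambda)\, L_{e,\lambda}(\lambda)\, d\lambda ,
\qquad K_m = 683\ \mathrm{lm/W}
```

The integral runs over the visible range, and $L_v$ comes out in cd/m² when $L_{e,\lambda}$ is given in W·sr⁻¹·m⁻² per unit wavelength.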
Radiance, Luminance and Brightness
Brightness
• Perception elicited by the luminance of a visual target
• Brightness may vary even for patches with the same luminance

https://www.cs.utah.edu/~gk/papers/vis02/talk/slide005.html
3 cone types, 3 primaries
For any (radiant) spectrum, we can calculate how much it stimulates
each cone type by computing its inner product with the cone sensitivity:
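$L = \int d\lambda\, A_L(\lambda)\, I(\lambda), \quad M = \int d\lambda\, A_M(\lambda)\, I(\lambda), \quad S = \int d\lambda\, A_S(\lambda)\, I(\lambda)$

$A_L(\lambda)$: cone sensitivity of the L cone

$I(\lambda)$: spectrum of incoming light from a particular surface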
The whole spectrum $I(\lambda)$ is then represented as 3 numbers (L, M, S).

Note that the converse is not true: some (L, M, S) triples cannot be produced by any
spectrum. E.g., (0, 1, 0) is not possible, because the M cone cannot be activated without
also activating the S cone, the L cone, or both.
Color metamers
• Color stimuli that have different spectral radiant power distribution
but are perceived as identical by an observer
• Clearly there are many different spectra which yield the same
(L,M,S).
• 2D/3D displays usually have 3 channels: spectral compression.
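The cone-response integrals and the metamer idea can be sketched numerically. Everything quantitative below is an assumption made for illustration: the cone sensitivities are stand-in Gaussian curves (not measured cone fundamentals), and the three display primaries and the target spectrum are hypothetical. The sketch computes (L, M, S) as discrete inner products, then solves a 3×3 linear system for primary weights so that the display spectrum yields the same (L, M, S) as a very different target spectrum.

```python
import math

WAVELENGTHS = list(range(380, 751))  # nm, 1 nm steps over the visible range

def gaussian(peak, width):
    """Sampled Gaussian curve over WAVELENGTHS (a stand-in, not measured data)."""
    return [math.exp(-0.5 * ((w - peak) / width) ** 2) for w in WAVELENGTHS]

# Hypothetical cone sensitivities A_L, A_M, A_S.
CONES = [gaussian(565, 45), gaussian(540, 40), gaussian(445, 25)]

# Hypothetical red, green and blue display primaries.
PRIMARIES = [gaussian(610, 20), gaussian(530, 20), gaussian(460, 15)]

def lms(spectrum):
    """Cone responses (L, M, S): inner product of each sensitivity with I(lambda)."""
    return [sum(a * i for a, i in zip(cone, spectrum)) for cone in CONES]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve the 3x3 system m * x = rhs with Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [[rhs[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution

# Narrowband light at the M-cone peak still stimulates all three cone types.
green = lms(gaussian(540, 5))

# Match a narrowband "yellow" target with a mix of the three primaries.
target = gaussian(580, 10)
primary_lms = [lms(p) for p in PRIMARIES]
matrix = [[primary_lms[j][i] for j in range(3)] for i in range(3)]
weights = solve3(matrix, lms(target))
display = [sum(w * p[k] for w, p in zip(weights, PRIMARIES))
           for k in range(len(WAVELENGTHS))]
```

The two spectra `target` and `display` differ strongly as functions of wavelength yet produce the same cone responses, which is what makes them metamers for this model observer. A negative entry in `weights` would indicate that the target cannot be shown with physically realizable amounts of these primaries.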
CIE 1931 Standard

[Figure: CIE 1931 functions X, Y and Z plotted against wavelength in nm. Annotations: abs difference of L and M; luminance (Y); S-cone.]
Color Gamut
Color “blindness”

• Color blindness was not recognized until the 18th century
• Anomalous trichromacy: partial malfunction of L (protanomaly),
M (deuteranomaly) or S (tritanomaly) cones
• Dichromacy: full loss of one type of cone (protanopia, deuteranopia, tritanopia)
• Monochromacy: full loss of two cone types (extremely rare)

Normal vision Deuteranopia

http://www.colourblindawareness.org/colour-blindness/
Color “blindness”
• Not that rare (especially for males)
• Difference in red-green axis may be difficult to distinguish for
those who don’t have “perfect” cones.
• Color universal design
Depth perception
Perception of 3D world
• Ability to perceive the structure of space is a fundamental goal of
the visual system
• Depth perception: the functions to reconstruct (or rather guess)
the 3D structure of the world from 2D retinal image(s)
• Depth cues: visual information sources about the 3D structure of
the world
– Many are from retinal image(s)
– Extra-retinal cues are also used as a supplement

https://giphy.com/mlb/
Depth cues: classification
• Monocular cues can be obtained even from one retinal image
– Pictorial cues
• Occlusion
• Size and position of objects
• Shading and shadows
• Linear perspective
• Texture gradient
• Aerial perspective
• Defocus blur
– Motion cues
• Head-motion parallax
• Kinetic depth effects / Structure from motion
• Binocular cues are from the difference between two retinal images
– Retinal binocular disparity
– Monocular occlusions (Da Vinci stereopsis)
Pictorial cue: linear perspective

Piero della Francesca, Città Ideale 1470-1490


Depth cues:
Linear perspective

Leonardo da Vinci, The Last Supper, 1490s


Depth cues:
Occlusion

Raffaello Sanzio. La disputa del sacramento, 1508, 1541


Depth cues:
Shading

Antonio Canova, Amore e Psiche, 1787-1793


Depth cues:
Texture Gradient

Gustave Caillebotte. Paris Street, Rainy Day, 1877


Depth cues:
Known size of objects

Gustave Doré, Divina Commedia, 1861


Pictorial cues as a trick
Motion cues: head-motion parallax
• A laterally moving viewpoint (head) often gives relative metric
information about the 3D scene
• Direction and speed of apparent motions indicate the signed relative
depths of objects
Motion cues: structure from motion

• Even without a moving viewpoint, motion in a scene may give depth information
Focus cues
Retinal defocus blur and accommodation state

Held, Robert T. and Cooper, Emily A. and Banks, Martin S., 2012
Neural link between accommodation and vergence

• Accommodation and binocular vergence are controlled through the same neural path
• Need for accommodation evokes convergence and vice versa
• Near reflex
– Lens accommodates
– Eyes converge
– Pupil constricts
Stereopsis

binocular depth perception


Stereopsis
Stereopsis = binocular depth perception
(from the Greek στερεός (stereos), meaning "solid", and ὄψις (opsis), meaning "appearance, sight")
• The function to get precise percepts of depth from the difference between
two retinal images (= retinal disparity)
Retinal Disparity
Binocular vergence
Binocular vergence: “simultaneous movement of both eyes in opposite
directions”
• Especially horizontal vergence is critical in stereopsis
• Images of the fixated point are placed at each eye’s fovea by vergence

[Figure: convergence (eyes rotating inward) and divergence (eyes rotating outward) relative to the fixated point]

The fixated point is imaged at the fovea (the origin in the retina) in both eyes, thus it has zero disparity
Geometry of binocular disparity
• Retinal position of image is measured by angle
• A feature that is imaged at the same position in both retinas (including
the fixated point) has zero disparity
• Horopter: the set of points in the world that have zero disparity (in
each convergence state)
– Theoretically it is the circle that runs through the eyes and the fixated
point (the Vieth-Müller circle)
– Points on the VM circle should be perceived as single points (because of
the same retinal position)
– Horopter that is obtained in experiments = empirical horopter
– Points on the empirical horopter are actually perceived as single points

[Figure: the VM circle and the empirical horopter]

Wikipedia (en) “Horopter”


Geometry of binocular disparity
• When a point has moderate disparity, single vision of the point is
perceived
• When a point has too large disparity, the point is perceived doubled
= double vision (diplopia)
• Panum’s fusional area
– The region in front of & behind
(empirical) horopter
– Single vision is obtained for points
in Panum’s fusional area
Stereoacuity
Stereoacuity = measured by stereothreshold, the smallest binocular
disparity that can be perceived
• Humans’ stereoscopic depth perception is remarkably precise
• A typical stereothreshold of 10 arcsec (0.003 deg) corresponds to
a bump of about 0.8 mm at a distance of 1 m
• 3-5% of the population lacks stereoscopic depth perception
(= stereoblindness)

Histogram of stereothreshold (arcsec)


(Coutant & Westheimer 1993)
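The link between a stereothreshold and a depth step can be checked with the usual small-angle approximation Δd ≈ δ · d² / I, where δ is the disparity in radians, d the viewing distance and I the separation of the eyes. This is a sketch under that approximation; the 6.5 cm eye separation is the value used elsewhere in these slides, and the function name is illustrative.

```python
import math

def depth_step_for_disparity(disparity_arcsec, distance_m, eye_separation_m=0.065):
    """Depth difference (m) that produces the given binocular disparity.

    Uses the small-angle approximation delta_d = disparity * d^2 / I,
    valid when the depth step is much smaller than the viewing distance.
    """
    disparity_rad = math.radians(disparity_arcsec / 3600.0)
    return disparity_rad * distance_m ** 2 / eye_separation_m

# A 10 arcsec stereothreshold at a viewing distance of 1 m, in millimetres.
bump_mm = 1000.0 * depth_step_for_disparity(10.0, 1.0)
```

With these inputs the depth step comes out at roughly three quarters of a millimetre, consistent with the figure of about 0.8 mm quoted above. Because of the d² term, the same threshold corresponds to a far coarser depth step at larger distances.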
Dynamic informativeness of depth cues
For example,
• Aerial perspective
– Atmospheric effect (scattering) on light propagation
= Appears (almost) only on objects at a (very) long distance
• Binocular disparity
– Difference between two views from 6.5-cm-apart eyes
= Almost zero for objects at a long distance

Watching mountains
• Binocular disparity ~0
• Rich pictorial cues

Watching a ball in a hand
• Binocular disparity: informative
• Rich pictorial cues
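The two situations can be compared numerically by computing the vergence angle for each viewing distance and taking the difference as the relative disparity between two depths. The distances below (a ball at 40 to 45 cm, mountains at 5 to 6 km) are hypothetical, the points are assumed to lie straight ahead of the observer, and the function names are illustrative.

```python
import math

def vergence_angle_deg(distance_m, eye_separation_m=0.065):
    """Angle between the two lines of sight when fixating a point straight ahead."""
    return math.degrees(2.0 * math.atan(eye_separation_m / (2.0 * distance_m)))

def relative_disparity_arcsec(near_m, far_m, eye_separation_m=0.065):
    """Disparity between two points at different distances, in arcsec."""
    difference_deg = (vergence_angle_deg(near_m, eye_separation_m)
                      - vergence_angle_deg(far_m, eye_separation_m))
    return difference_deg * 3600.0

# Hypothetical ball held in the hand: front at 0.40 m, back at 0.45 m.
ball = relative_disparity_arcsec(0.40, 0.45)

# Hypothetical mountain ridges at 5 km and 6 km.
mountains = relative_disparity_arcsec(5000.0, 6000.0)
```

Under these assumptions the ball produces a disparity of thousands of arcsec, far above a 10 arcsec stereothreshold, while a full kilometre of depth between the ridges stays below 1 arcsec and is therefore invisible to stereopsis.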
Additional reading
• Student resources of Sensation & Perception, 5th ed.
https://learninglink.oup.com/access/sensation-and-perception-5e-student-resources
• Handbook of Visual Optics Vol.1 (fully available online via Andor)
https://andor.tuni.fi/permalink/358FIN_TAMPO/1j3mh4m/alma9910645729505973
• Training course “3D Displays and the Human Visual System”,
http://www.full-parallax-imaging.eu/TS1/?page_id=11
• “Human Vision and Color Perception”, Olympus microscopy center,
https://www.olympus-lifescience.com/en/microscope-resource/primer/lightandcolor/humanvisionintro/
• “Color Vision”, Handprint, http://www.handprint.com/HP/WCL/wcolor.html
• B. Wandell. Foundations of vision. Sinauer Associates, Inc., Sunderland, Massachusetts,
1995
• Schreiber, K. M., Hillis, J. M., Filippini, H. R., Schor, C. M., & Banks, M. S. (2008). The
surface of the empirical horopter. Journal of Vision, 8(3, 7), 1-20.
http://journalofvision.org/8/3/7/article.aspx

• See the course page in Moodle for additional reading
