Signals
Acquisition, representation
and Compression
Note on the use of images and external sources of information
• Most of the pictures used throughout this presentation were not created by the
author of the presentation. They were obtained from the sources referenced at
the end, found through Internet searches. For the sake of clarity and
simplicity of the slides, the sources are not cited individually in each
slide.
• The author would like to thank all of the researchers who have published and
made their content available
• 3D content acquisition
• 3D content representation
• Stereo and multiview video compression
• Texture-based coding
• Converting from 2D to 3D
• the key is to consider several depth cues such as motion parallax, perspective or
camera motion to obtain a depth map
• the separation can be smaller or larger to accentuate the 3D effect of the displayed
material
• it will need to be varied for different focal length lenses and by the distance from
cameras to the subject
• nonetheless, depending on age and other factors, other baseline values may
have to be taken into account by content producers
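The dependence on focal length and camera-to-subject distance follows from the standard pinhole stereo relation, disparity = baseline × focal length / depth. A minimal sketch (all numeric values below are illustrative, not from the slides):

```python
# Sketch of the pinhole stereo relation: on-screen disparity grows with
# camera baseline and focal length, and shrinks with subject distance.

def disparity_px(baseline_m: float, focal_px: float, depth_m: float) -> float:
    """Horizontal disparity in pixels for a point at depth_m metres."""
    return baseline_m * focal_px / depth_m

# A wider baseline or a longer lens yields a larger disparity, hence a
# stronger 3D effect; nearer objects separate more between the two views:
near = disparity_px(baseline_m=0.065, focal_px=1000.0, depth_m=2.0)
far = disparity_px(baseline_m=0.065, focal_px=1000.0, depth_m=20.0)
assert near > far
```

This is why the separation must be re-tuned whenever the lens or the shooting distance changes.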
[Figure: % of viewers vs. baseline distance]
2019/2020 Processamento e Codificação de Informação Multimédia - MIEEC 6
2D-to-3D conversion
• Why 2D to 3D conversion?
• there is still limited content originally captured in stereoscopic format
• whilst 3D-enabled TV sets are already being introduced at a rather large scale
• Where/When?
• Post-production: cinema and TV
• Broadcasting: live content and legacy material
• TV set: legacy content owned by the viewer or 2D linear programming being broadcast
• How?
• Fully manual
• Semi-automatic
• Automatic off-line
• Automatic in real-time
• From a single-view video (normal flat 2D video), estimate depth and render a
new virtual view
• Input: 2D video sequence
• the right video sequence will be the new virtually created sequence for the
stereoscopic display
• the depth map is a grayscale picture, assigning values of brightness to each pixel
• the value of brightness assigned to a pixel specifies the distance of that
pixel from the camera or from the viewer
• this map must be obtained for each image of the input 2D video sequence
• The resulting stereo/3D video is generated from the corresponding depth maps
and the original 2D video
• by shifting each pixel of the 2D image to the left or to the right depending on
the corresponding depth map value and the type of stereo view (left or right)
• most important and difficult step is filling unknown areas that appear after image
objects have been shifted
• temporal inpainting
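A minimal sketch of spatial hole filling after pixel shifting, assuming holes are marked with a sentinel value and filled by propagating the nearest valid pixel from the left along each row (real systems use more elaborate spatial or temporal inpainting):

```python
import numpy as np

HOLE = -1  # sentinel marking disoccluded (unknown) pixels

def fill_holes_row(row: np.ndarray) -> np.ndarray:
    """Fill holes in one image row by extending the last seen pixel."""
    out = row.copy()
    last = 0  # assumed background value if the row starts with a hole
    for i, v in enumerate(out):
        if v == HOLE:
            out[i] = last  # extend the background into the hole
        else:
            last = v
    return out

row = np.array([10, 10, HOLE, HOLE, 30, 30])
print(fill_holes_row(row))  # [10 10 10 10 30 30]
```

Propagating from one side only is a crude heuristic; it works because disocclusions tend to expose background, which neighbours the hole on one side.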
• Off-line
• Automatic, semi-automatic or manual
• if automatically generated, then converted content may need to be manually
corrected
• On-line
• must be fully automatic
• Assigning depth
• usually a manual operation performed by the editor based on depth cues
• Displace objects
• shifting pixels according to the depth map
• could be manual or semi-automatic operation
• Fill-in holes
• filling unknown areas that appear after image objects have been shifted
according to their attributed depths
• semi-automatic operation with human supervision
• Generate depth
• create a draft overall depth map
• automatic operation
• Select objects
• semi-automatic operation
• Edit depth map
• fine-tune the depth information for each foreground object
• manual or semi-automatic
• Render new view
• automatic
• called depth image based rendering
• Fill-in holes
• automatic operation with human supervision
• a commonly used model assumes that the closer an object is to the bottom of
the image, the closer it is to the viewer
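That heuristic can be sketched as a draft depth map that is simply a vertical gradient, brightest (closest) at the bottom row; this is purely illustrative, as real estimators combine several depth cues:

```python
import numpy as np

def draft_depth_map(height: int, width: int) -> np.ndarray:
    """Draft depth map: 0 (far) at the top row, 255 (near) at the bottom."""
    ramp = np.linspace(0, 255, height)
    return np.tile(ramp[:, None], (1, width)).astype(np.uint8)

d = draft_depth_map(4, 3)
print(d[0, 0], d[-1, 0])  # 0 255  (top row farthest, bottom row closest)
```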
• directly on 3D TV sets
• mobile devices
• Two main approaches can be employed relying on two different ways of representing
depth
• Depth Image Based Rendering (DIBR)
• it directly uses the constructed depth maps to create the new (shifted) image
• Image Warping
• depth is represented by a mathematical function (as a transformation)
• Input signals
• single original image
• usually taken to be the left image
• depth data
• map, structure or transformation to apply
• Output
• stereoscopic pair
• the depth data provides disparity values that are used to displace pixels
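A minimal sketch of the DIBR displacement step under simplifying assumptions (a linear depth-to-disparity mapping, integer shifts, and a HOLE sentinel for disoccluded pixels):

```python
import numpy as np

HOLE = -1  # positions left empty after shifting, to be filled later

def render_right_view(image: np.ndarray, depth: np.ndarray,
                      max_disp: int = 4) -> np.ndarray:
    """Shift each pixel left by a disparity derived from its depth value."""
    h, w = image.shape
    out = np.full((h, w), HOLE, dtype=image.dtype)
    for y in range(h):
        for x in range(w):
            # nearer pixels (larger depth value) are displaced further
            d = int(depth[y, x] / 255 * max_disp)
            if 0 <= x - d < w:
                out[y, x - d] = image[y, x]
    return out  # disoccluded positions remain HOLE and need filling

img = np.array([[1, 2, 3, 4]])
dep = np.array([[0, 0, 255, 255]])
print(render_right_view(img, dep, max_disp=1))  # [[ 1  3  4 -1]]
```

Note how the near pixel (value 3) overwrites a far one and a hole opens where the shifted region moved away, which is exactly the situation the hole-filling step must handle.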
• Determining correct depth order for all the objects in the scene
• Once the two images have been obtained, how to fit in the left and right
images (or the 3D visual information) into one stream?
• for storage or transmission purposes
• two alternatives
• spatial
• side-by-side
• top-bottom
• temporal
• doubled frame rates
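The spatial alternatives can be sketched as follows, assuming each view is decimated to half resolution so the stereo pair fits in one ordinary frame (the temporal alternative instead alternates full-resolution frames at double rate):

```python
import numpy as np

def pack_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Drop every other column of each view, place them side by side."""
    return np.hstack([left[:, ::2], right[:, ::2]])

def pack_top_bottom(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Drop every other row of each view, stack them vertically."""
    return np.vstack([left[::2, :], right[::2, :]])

L = np.zeros((4, 4), dtype=np.uint8)
R = np.ones((4, 4), dtype=np.uint8)
assert pack_side_by_side(L, R).shape == (4, 4)  # same size as one input frame
assert pack_top_bottom(L, R).shape == (4, 4)
```

The receiver performs the inverse operation, separating and upsampling the two half-resolution views.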
• Multiview Simulcasting
• Multiview Video
• MVC (extension of H.264)
• MV-HEVC standards
• Texture + Depth
• 2D (Texture) + Depth
• MPEG-C standard
• Multiview+Depth (MVD)
• 3D-HEVC standard
• the stereo signal is a multiplex of the two images into a single frame or sequence
of frames
• signalling data referring to the frame compatible formats has been specified
in H.264 as Supplemental Enhancement Information (SEI) messages
[Figure: preparing content (side-by-side and top-bottom packing) and recovering content at the receiver]
• MPEG-2 Video, MPEG-4 Visual and the MVC standards offer full stereo coding
solutions with increased compression efficiency adopting this approach
• Multiview Format
• offers the viewer the possibility to move in front of the screen, changing
viewpoint freely with multiple perspectives available
• the basic case of multiview is stereo video (N = 2), generating the sensation
of depth by delivering a different image to each of the viewer's eyes
• inter-view prediction
• view 0 is coded using a conventional H.264/AVC approach
• it can be decoded either by an H.264/AVC or an H.264 MVC decoder
• constitutes the base or reference level
• the other views are coded in a similar way but
• reference images in those other views are themselves predicted using the I or
P images from the adjacent view
[Figure: MVC prediction structure across images (horizontal axis) and views (vertical axis) — base view 0 with conventional I/P/B coding; dependent view 1 coded with B pictures only, additionally predicted from view 0]
• the depth map is a grayscale image indicating the distance of objects to the
camera
• brighter areas indicate objects closer to the camera
• it can be seen as metadata as it provides information about the image
• only one view (2D) is coded and transmitted alongside the metadata
• all necessary views for 3D display are generated from the received data, such as
using depth image based rendering (DIBR) (recall 2D-to-3D conversion)
• 2D texture + depth
• one single 2D view and metadata with the corresponding depth map
• depth enables the receiver to generate the missing neighbouring view
• advantages
• 2D video is backward compatible with legacy devices
• limitations
• difficulties in rendering the second image and filling holes due to occlusions (see
2D-to-3D conversion)
• complex decoders
• the 2D views and the depth maps (greyscale images!) are encoded with MVC
• key points
• an MVC + D decoder can reuse H.264/AVC or MVC hardware decoder
implementation modules
• requires typically about twice the bit rate of 2D video coded by H.264/AVC
• Because the depth map is a grayscale image, it can be encoded using usual
video encoders
• and then both channels are joined together for transmission
• the main channel with the 2D textured image and the auxiliary channel with depth
information on a pixel basis
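A sketch of preparing the depth channel for an ordinary video encoder: metric depth quantized to an 8-bit grayscale plane. Mapping 1/Z linearly is an assumption here (a common convention); it makes nearer objects brighter and spends more levels on the near range, where disparity changes fastest:

```python
import numpy as np

def depth_to_gray(z: np.ndarray, z_near: float, z_far: float) -> np.ndarray:
    """Quantize metric depth to 8 bits: nearest -> 255, farthest -> 0."""
    inv = 1.0 / np.clip(z, z_near, z_far)
    inv_near, inv_far = 1.0 / z_near, 1.0 / z_far
    return np.round(255 * (inv - inv_far) / (inv_near - inv_far)).astype(np.uint8)

z = np.array([1.0, 2.0, 100.0])      # metres
print(depth_to_gray(z, 1.0, 100.0))  # nearest pixel -> 255, farthest -> 0
```

The resulting grayscale plane is a normal image sequence, so any standard video encoder can compress it before both channels are multiplexed.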
[Figure: main 2D texture channel and auxiliary depth channel, each encoded (MPEG-4) and joined for transmission]
• MV-HEVC
• Simple stereo/multi view extension of the HEVC standard
• includes encoding of depth maps as an additional color plane
• 3D-HEVC
• more efficient video + depth coding (in relation to MV-HEVC)
• Scalable stereo/multiview
• http://www.yuvsoft.com/stereo-3d-technologies/2d-to-s3d-conversion-process/
• https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201110gls.html