
An Evaluation Framework for 360-degree Video Compression
Xiaoyu Xiu, Yuwen He, Yan Ye, Bharath Vishwanath
InterDigital Communications, 9710 Scranton Road, #250, San Diego, CA 92121, USA
{xiaoyu.xiu, yuwen.he, yan.ye, bharath.vishwanath}@interdigital.com

978-1-5386-0462-5/17/$31.00 ©2017 IEEE. VCIP 2017, Dec. 10 – 13, 2017, St Petersburg, U.S.A.

Abstract—360-degree video is emerging as a new way of offering an immersive visual experience. It can be viewed on dedicated head-mounted devices as well as on conventional 2D displays. Because a wide field of view requires higher resolution, efficient compression of 360-degree video becomes crucial. It is of significant interest to evaluate how different projection formats affect the compression efficiency of 360-degree video. A main technical challenge is that the input video is captured in a given native projection format, and this native format has an obvious edge over the other projection formats. In this paper, an evaluation framework is proposed to reduce the bias towards the native projection format when comparing different projection formats and their impact on 360-degree video compression. Additionally, a quality metric called area weighted spherical PSNR (AW-SPSNR) is proposed for objective 360-degree video quality evaluation. The proposed evaluation framework was included in the common test conditions defined by the Joint Video Exploration Team (JVET) for its exploration of 360-degree video compression technologies.

Keywords—360-degree video, virtual reality, video compression, video quality evaluation

I. INTRODUCTION

Interest in deploying virtual reality (VR) and augmented reality (AR) applications is growing. As a result, 360-degree video is emerging as a new way of providing an immersive viewing experience, because it allows users to interactively change their viewpoint and view any part of the captured content. Recently, the Joint Video Exploration Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) began exploring potential video coding technologies for 360-degree video compression [1].

Conventional video codecs are not designed to handle spherical video. Therefore, before coding, 360-degree video must be projected onto a 2D plane using a projection format such as equirectangular projection (ERP) [2], cubemap projection (CMP) [3], equal-area projection (EAP) [4] or octahedron projection (OHP) [5].

In the existing workflow of 360-degree video compression, the original content from 360-degree video capture devices is represented in a given projection format, called the native projection format. When projection formats are compared in terms of compression performance, content in the native format can be compressed directly. All other projection formats must first go through a projection format conversion step, that is, conversion from the native projection format to the coding projection format. This conversion introduces loss, so the native projection format always enjoys a favorable bias: it does not suffer any conversion loss. A test methodology for comparing projection formats must therefore be carefully designed to overcome this bias. It should evaluate how well a given projection format represents 360-degree video on the sphere, and how “friendly” that projection format is to coding.

Depending on the projection format, the samples on the projected 2D plane can correspond to different sampling densities on the sphere. For ERP, for example, the sampling density is infinite at the top and bottom of the 2D picture, which correspond to the north and south poles of the sphere, respectively. For projection formats with multiple faces (e.g., CMP and OHP), the sampling density is higher at the face boundaries than at the face centers. In conventional 2D video, by contrast, the sampling density is uniform across the 2D plane. The traditional peak signal-to-noise ratio (PSNR) weighs the distortion at every sample location equally, so it cannot give a meaningful quality measurement for projected spherical video. Spherical PSNR (SPSNR) was proposed in [6] to measure the quality of spherical video. SPSNR takes a set of locations uniformly sampled on the sphere and finds their corresponding locations on the projected 2D plane for a given projection format. It then calculates the distortion between the reference signal and the test signal at those locations. In [7], weighted-to-spherically-uniform PSNR (WS-PSNR) was proposed to measure spherical video quality directly in the projection domain, by assigning different weights to the samples on the 2D projection plane.

SPSNR uses only a limited number of sampling points, so it may not give a good estimate of spherical video quality. In [6], only 655,362 samples on the sphere are used. For a projected 2D video in 4K resolution (e.g., 3840×1920), this is less than 10% of all samples. Such a low ratio cannot give a reliable evaluation for all content. A second drawback of SPSNR concerns interpolation. A sampling location on the sphere may not project to an integer location on the 2D sampling grid. In that case, either the value of the nearest-neighbor sample is used, or interpolation is applied to calculate the value at the fractional location. Neither option is ideal. The nearest neighbor may not be accurate enough. Interpolation adds its own error to the metric evaluation, and it makes the result depend on the chosen interpolation filter.

To resolve these problems, an evaluation framework is proposed for comparing different projection formats and their impact on 360-degree video compression. The framework avoids the bias towards the native projection format in which the original content is represented. It was agreed to be included in the common test conditions for 360-degree video compression defined by the JVET [1]. In addition, an area weighted spherical PSNR (AW-SPSNR) metric is proposed for evaluating 360-degree video quality. AW-SPSNR overcomes both shortcomings of SPSNR: it uses all available samples in the projected 2D plane, and it does not rely on interpolation.

II. THE PROPOSED EVALUATION FRAMEWORK FOR COMPRESSING 360-DEGREE VIDEO

As discussed earlier, the native format has a favorable bias when the compression performance of different projection formats is evaluated. The other formats must go through an additional format conversion step before compression, and the native format does not. To overcome this bias, it is proposed to use a very high resolution signal (e.g., at least 8K) in the native projection format as the ground truth. This gives a sufficiently good representation of the original spherical signal. The ground truth signal can be in any projection format (not only ERP), as long as its resolution is high enough to represent the samples on the sphere well. The ground truth signal is then converted to the coding projection format at a lower resolution, where compression is applied. The coding projection format can be any projection format, including ERP. Projection formats differ in how many samples they spend on different parts of the sphere. Going from a high resolution ground truth to a lower resolution coding projection brings out these differences in sampling pattern and their effect on coding. Thus, the proposed framework can effectively reduce the bias towards the native projection format in which the ground truth is represented.

Fig. 1 depicts the proposed framework for evaluating different projection geometries and compression algorithms for 360-degree video. For simplicity of explanation, we assume that the ground truth signal is in 8K resolution, which is the highest resolution currently available for original 360-degree content [8]. As shown in Fig. 1, the 8K ground truth signal in the native projection format is first converted into the coding projection format at a lower resolution. It is then encoded and decoded. Finally, it is converted back into 8K resolution in the native projection format. Any quality evaluation metric suitable for spherical content can be applied between the original and the reconstructed 8K video, e.g., SPSNR [6], WS-PSNR [7], or the proposed AW-SPSNR discussed in Sec. III. Because the metrics are applied end to end, they capture the distortion from both the forward and the backward projection format conversions, as well as the distortion from compression.

Fig. 1 The proposed evaluation framework for 360-degree video compression. Diagram blocks: high res. native projection format (e.g. 8K) → converted to coding projection format (e.g. 4K) → CODEC → reconstructed in coding projection format → reconstructed high res. native projection format. A quality evaluation metric (e.g. SPSNR, WS-PSNR, AW-SPSNR, etc.) is computed between the original and the reconstructed high res. native-format signals.

III. AREA WEIGHTED SPHERICAL PSNR

In this section, AW-SPSNR is proposed. AW-SPSNR considers the entire set of samples available in the projected 2D plane. SPSNR goes from the sphere to the projected plane. AW-SPSNR instead starts from a sample in the projected plane and weighs the distortion at that sample by the area the sample covers on the sphere. For ERP, for example, a sample at the equator (the vertical center of the 2D plane) covers a larger area on the sphere, so the distortion there is given a larger weight. A sample toward the north or south pole (the top or bottom of the 2D plane) covers a very small area on the sphere, so the distortion there is given a very small weight. With this area-based weighting, AW-SPSNR is calculated as

$$\mathrm{AW\text{-}SPSNR} = 10\log_{10}\frac{MAX^2}{\sum_{y=0}^{H-1}\sum_{x=0}^{W-1} w(x,y)\,\big(I(x,y)-I'(x,y)\big)^2} \qquad (1)$$

where MAX is the maximum sample value; W and H are the width and height of the 2D projection picture; I(x,y) and I'(x,y) are the original and reconstructed samples at (x,y) on the 2D plane; and w(x,y) is the normalized weight of the sample at (x,y). The normalized weight is computed from A(x,y), the non-normalized weight equal to the area the sample covers on the sphere:

$$w(x,y) = \frac{A(x,y)}{\sum_{j=0}^{H-1}\sum_{i=0}^{W-1} A(i,j)} \qquad (2)$$

Let P be the point on the sphere onto which the 2D sample location (x,y) is projected. The spherical area of the sample can be derived as

$$A(x,y) = \cos\theta \cdot d\theta \cdot d\phi \qquad (3)$$

where (θ, φ) are the latitude and longitude of P. They are calculated from the 2D coordinate (x,y) using the geometry mapping function from the projection format to the sphere, i.e.,

$$\theta = f_\theta(x,y), \qquad \phi = f_\phi(x,y) \qquad (4)$$
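ERP is a simple case of the mapping in (4). Suppose sample (x, y) sits at the normalized position ((x + 0.5)/W, (y + 0.5)/H), with longitude increasing to the right and latitude increasing upward. This sample-centre convention is an assumption here; the paper does not state it. Under it, the mapping is linear:

$$\theta = \left(\frac{1}{2} - \frac{y + 0.5}{H}\right)\pi, \qquad \phi = \left(\frac{x + 0.5}{W} - \frac{1}{2}\right) 2\pi$$

Here θ depends only on y and φ depends only on x. Each sample spans π/H in latitude and 2π/W in longitude, so the area in (3) shrinks with cos θ toward the top and bottom rows.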

The values of dθ and dφ therefore follow from the derivatives of θ and φ with respect to x and y. Over one sample step (Δx = Δy = 1), they give the area element

$$d\theta \cdot d\phi = \left|\frac{\partial\theta}{\partial x}\frac{\partial\phi}{\partial y}-\frac{\partial\theta}{\partial y}\frac{\partial\phi}{\partial x}\right|\Delta x\,\Delta y \qquad (5)$$

In projection formats such as ERP and EAP, θ depends only on y and φ only on x, so (5) reduces to dθ = |∂θ/∂y| and dφ = |∂φ/∂x|. Substituting (5) into (3) gives the spherical area covered by each 2D projection sample. AW-SPSNR and WS-PSNR are built on a similar concept: both assign different weights to the samples in the projection picture. However, the two methods of calculating the weights were developed independently, and for a given projection format they produce different weights. Readers can refer to [7] for how the WS-PSNR weights are derived.

IV. SIMULATION RESULTS

In this section, simulation results are provided to verify the proposed evaluation methodology in Fig. 1. All results are generated with the 360-degree video conversion tool 360Lib [9]. In total, eight test sequences in 8K resolution and two test sequences in 4K resolution, all in ERP format, are tested [8][10]. SPSNR and AW-SPSNR are used as the quality evaluation metrics.

Two sets of tests are conducted. The first is a conversion-only test that bypasses the CODEC and reconstruction blocks in Fig. 1. The high resolution ground truth ERP signal is converted to each coding projection format at a lower resolution. It is then converted back and compared with the original high resolution ERP signal to calculate the quality metrics. This test examines the conversion loss of each projection format. The second is a conversion + coding test, which examines the impact of each projection format on 360-degree video compression. The compression results are collected with the HEVC reference software HM-16.12 [11] as the codec. The sequences are coded in the random-access (RA) configuration with four quantization parameters (QPs): 22, 27, 32 and 37. Compression efficiency is evaluated with the Bjøntegaard delta (BD) rate [12]. In this paper, a negative number means a BD-rate reduction, i.e., the average bit rate saving at the same SPSNR/AW-SPSNR.

A. Projection Conversion without Compression

Table 2 shows the SPSNR and AW-SPSNR values from the conversion-only test when the coding projection format in Fig. 1 is set to ERP, CMP, EAP and OHP, respectively. First, under the proposed evaluation framework, ERP is not the best projection format. It actually has the second lowest average conversion quality. This shows that simply changing the resolution (i.e., the overall sampling rate on the sphere) can expose the sampling inefficiency of ERP. One example is the extreme oversampling of the poles, a well-known problem of this projection format. As shown in Table 2, OHP gives the highest average quality in the conversion-only test, followed by CMP, ERP and EAP. OHP has more projection faces than the other formats (eight, versus six for CMP). The samples on its 2D plane therefore correspond to more uniform sampling densities on the sphere, which leads to less quality loss when converting from the sphere to the projection domain.

Second, Table 2 shows that the best projection format is highly content dependent, and that quality can vary substantially across projection formats. For the sequence “Basketball”, for example, EAP underperforms OHP by about 6 dB (5.99 dB in SPSNR and 6.03 dB in AW-SPSNR). Under both metrics, OHP is the best format for “Basketball”, “Chairlift”, “Skate in lot”, “Dancing” and “Driving”. EAP is the best format for “Jam Session”, “Skate Trick” and “Train”.

Third, Table 2 shows that AW-SPSNR tracks SPSNR very closely for all projection formats and all sequences. The difference is usually within 0.02 dB and is at most 0.09 dB (EAP on “Chairlift”). AW-SPSNR also avoids the sparse sampling issue, and it needs no decision on whether to interpolate or which interpolation filter to use. It is therefore considered a more suitable metric for evaluating the quality of 360-degree video.

B. Projection Conversion with Compression

Two tests on the two 4K test sequences illustrate how the bias towards the native projection format can affect test results. In the first test, the original 4K ERP signal is converted to the coding projection format at the same resolution, coded, and converted back to 4K ERP for metric calculation. When the coding projection format is ERP, no projection format conversion is performed in this test, and the 4K ERP signal is coded directly. The second test follows the framework in Fig. 1, with the coding projection format at a lower resolution. The original 4K ERP signal is converted to the coding projection format with 75% of the 4K sample count, coded, and converted back to 4K ERP for metric calculation. Two coding projection formats are used, OHP and ERP, and OHP is compared against the ERP anchor. Table 1 shows the resulting BD-rates in terms of SPSNR and AW-SPSNR for both tests.

In the first test, where the bias exists, ERP appears to perform significantly better than OHP. In the second test, which uses the proposed evaluation framework, the sampling inefficiency of ERP is exposed, and ERP becomes inferior to OHP in coding performance. This shows the importance of reducing the bias towards the native projection format and confirms the effectiveness of the proposed evaluation framework.

Table 1 BD-rate in terms of SPSNR and AW-SPSNR for OHP compared to the ERP anchor, using different evaluation methods (Test 1: same-resolution conversion; Test 2: proposed framework)

                 Test 1              Test 2
Sequence         SPSNR    AW-SPSNR   SPSNR    AW-SPSNR
Bicyclist        5.6%     5.5%       -4.5%    -4.6%
Building         18.4%    18.5%      -5.1%    -5.0%

Next, conversion + compression results are provided to compare the projection formats under the proposed methodology. Following the framework in Fig. 1, the ERP coding performance is used as the anchor, and the other formats are compared with ERP. Table 3 shows the BD-rates in terms of SPSNR and AW-SPSNR for CMP, EAP and OHP against the ERP anchor. As in the conversion-only results in Table 2, the optimal projection format in Table 3 is highly content dependent. CMP is the best format for “Basketball” and “Chairlift”. EAP is the best format for “Skate in lot”, “Skate Trick” and “Driving”. ERP is the best format for “Train” and “Dancing”. ERP and EAP are roughly tied for “Jam Session”.
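The BD-rate values in Tables 1 and 3 follow the Bjøntegaard method [12]. The Python sketch below illustrates the classic four-point form under stated assumptions. For each curve, log10 of the bit rate is interpolated by the cubic in PSNR (SPSNR or AW-SPSNR here) that passes through the four QP points. The average gap between the test and anchor curves is then taken over their overlapping quality range. The function names are hypothetical, and this is not the tool used to produce the paper's tables. Simpson's rule is exact for cubics, so a single Simpson panel evaluates each integral exactly.

```python
import math


def _interp(xs, ys, x):
    """Evaluate the Lagrange polynomial through (xs, ys) at x.

    With four points this is the unique cubic through them.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(psnr, rates, lo, hi):
    """Mean of the interpolated log10(rate) curve over the quality interval [lo, hi].

    A single Simpson panel is exact for a cubic, so no finer grid is needed.
    """
    log_rates = [math.log10(r) for r in rates]

    def curve(q):
        return _interp(psnr, log_rates, q)

    return (curve(lo) + 4.0 * curve(0.5 * (lo + hi)) + curve(hi)) / 6.0


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Bjontegaard delta rate in percent (negative = bit-rate saving at equal quality)."""
    if not (len(anchor_rates) == len(anchor_psnr) == len(test_rates) == len(test_psnr) == 4):
        raise ValueError("this sketch expects exactly four rate points per curve")
    # Integrate only where both rate-quality curves are defined.
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("the two rate-quality curves do not overlap")
    delta = (_mean_log_rate(test_psnr, test_rates, lo, hi)
             - _mean_log_rate(anchor_psnr, anchor_rates, lo, hi))
    return (10.0 ** delta - 1.0) * 100.0
```

As a sanity check, a test curve that needs 10% less bit rate than the anchor at every anchor quality point gives a BD-rate of -10%.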
Table 2 SPSNR and AW-SPSNR metrics (in dB) for ERP, CMP, EAP and OHP when only projection conversion is applied

                 ERP               CMP               EAP               OHP
Sequence         SPSNR  AW-SPSNR   SPSNR  AW-SPSNR   SPSNR  AW-SPSNR   SPSNR  AW-SPSNR
Basketball       45.58  45.59      46.24  46.22      41.23  41.23      47.22  47.26
Chairlift        47.42  47.40      48.06  48.05      44.51  44.42      48.67  48.71
Jam Session      51.45  51.45      51.74  51.76      52.66  52.64      52.12  52.15
Skate in lot     46.48  46.50      47.05  47.05      47.40  47.39      50.02  50.07
Skate Trick      44.83  44.80      45.09  45.11      46.07  46.05      45.53  45.53
Train            45.64  45.65      46.01  46.06      46.65  46.66      46.11  46.12
Dancing          47.62  47.63      48.09  48.10      47.80  47.81      48.40  48.38
Driving          48.36  48.35      49.03  48.98      49.95  49.89      51.13  51.12
Average          47.17  47.17      47.66  47.67      47.03  47.01      48.65  48.67
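The observations drawn from Table 2 can be rechecked mechanically. These are the format ranking by average quality, the best format per sequence, and the largest gap between SPSNR and AW-SPSNR. The Python sketch below copies the Table 2 values in the table's column order. The helper names are illustrative only and are not part of 360Lib.

```python
# Table 2 values in dB, in column order:
# ERP (SPSNR, AW-SPSNR), CMP (...), EAP (...), OHP (...).
TABLE2 = {
    "Basketball":   (45.58, 45.59, 46.24, 46.22, 41.23, 41.23, 47.22, 47.26),
    "Chairlift":    (47.42, 47.40, 48.06, 48.05, 44.51, 44.42, 48.67, 48.71),
    "Jam Session":  (51.45, 51.45, 51.74, 51.76, 52.66, 52.64, 52.12, 52.15),
    "Skate in lot": (46.48, 46.50, 47.05, 47.05, 47.40, 47.39, 50.02, 50.07),
    "Skate Trick":  (44.83, 44.80, 45.09, 45.11, 46.07, 46.05, 45.53, 45.53),
    "Train":        (45.64, 45.65, 46.01, 46.06, 46.65, 46.66, 46.11, 46.12),
    "Dancing":      (47.62, 47.63, 48.09, 48.10, 47.80, 47.81, 48.40, 48.38),
    "Driving":      (48.36, 48.35, 49.03, 48.98, 49.95, 49.89, 51.13, 51.12),
}
FORMATS = ("ERP", "CMP", "EAP", "OHP")


def spsnr(row, fmt):
    """SPSNR column of a format."""
    return row[2 * FORMATS.index(fmt)]


def aw_spsnr(row, fmt):
    """AW-SPSNR column of a format."""
    return row[2 * FORMATS.index(fmt) + 1]


def ranking_by_average_spsnr(table):
    """Formats sorted from highest to lowest average SPSNR."""
    avg = {f: sum(spsnr(r, f) for r in table.values()) / len(table) for f in FORMATS}
    return sorted(FORMATS, key=avg.get, reverse=True)


def best_format_per_sequence(table):
    """Format with the highest SPSNR for each sequence."""
    return {seq: max(FORMATS, key=lambda f: spsnr(row, f)) for seq, row in table.items()}


def largest_metric_gap(table):
    """(|SPSNR - AW-SPSNR| in dB, sequence, format) for the largest gap."""
    return max((abs(spsnr(r, f) - aw_spsnr(r, f)), seq, f)
               for seq, r in table.items() for f in FORMATS)
```

With these values, the ranking comes out as OHP, CMP, ERP, EAP. The largest metric gap is 0.09 dB, for EAP on “Chairlift”.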
Table 3 BD-rate using SPSNR and AW-SPSNR metrics for CMP, EAP and OHP compared to the ERP anchor. The ERP anchor converts 8K ERP to 4K ERP and compresses 4K ERP.

                 CMP                 EAP                 OHP
Sequence         SPSNR    AW-SPSNR   SPSNR    AW-SPSNR   SPSNR    AW-SPSNR
Basketball       -5.6%    -5.5%      32.8%    32.8%      -2.4%    -2.6%
Chairlift        -16.8%   -17.0%     9.5%     10.1%      -1.9%    -2.2%
Skate in lot     -6.5%    -6.2%      -14.1%   -14.0%     -1.2%    -1.2%
Jam Session      3.5%     3.1%       0.1%     0.1%       14.8%    14.4%
Train            11.6%    11.5%      1.3%     1.3%       26.4%    26.5%
Skate Trick      10.5%    10.2%      -2.8%    -2.9%      5.5%     5.4%
Dancing          1.3%     1.3%       6.4%     6.3%       13.0%    12.9%
Driving          14.2%    14.6%      -7.9%    -7.7%      39.6%    39.7%
Average          1.5%     1.5%       3.2%     3.3%       11.7%    11.6%
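The Python sketch below makes the area weighting of Section III concrete for ERP. It is a minimal, hypothetical implementation and not the 360Lib code. It assumes a single picture component stored as equally sized lists of rows, 8-bit samples by default (MAX = 255), and sample centres at latitude θ = (0.5 − (y + 0.5)/H)π. Under that convention, each sample spans π/H in latitude and 2π/W in longitude. The sketch evaluates (3) per row, normalizes the result with (2), and applies (1).

```python
import math


def erp_row_weights(width, height):
    """Normalized AW-SPSNR weight w(x, y) shared by every sample in row y of an ERP picture.

    Assumed convention: row y is centred at latitude
    theta = (0.5 - (y + 0.5) / height) * pi, and each sample spans
    d_theta = pi / height and d_phi = 2 * pi / width on the unit sphere.
    """
    d_theta = math.pi / height
    d_phi = 2.0 * math.pi / width
    # Eq. (3): A(x, y) = cos(theta) * d_theta * d_phi; for ERP it depends on y only.
    row_area = [math.cos((0.5 - (y + 0.5) / height) * math.pi) * d_theta * d_phi
                for y in range(height)]
    # Eq. (2): divide by the total area of all width x height samples.
    total_area = width * sum(row_area)
    return [a / total_area for a in row_area]


def aw_spsnr_erp(orig, recon, max_value=255):
    """Eq. (1) for one ERP component given as equally sized lists of rows."""
    height, width = len(orig), len(orig[0])
    weights = erp_row_weights(width, height)
    weighted_mse = 0.0
    for y in range(height):
        for x in range(width):
            diff = orig[y][x] - recon[y][x]
            weighted_mse += weights[y] * diff * diff
    if weighted_mse == 0.0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(max_value ** 2 / weighted_mse)
```

The weights sum to one. A uniform error of one code value therefore gives 10·log10(255²) ≈ 48.13 dB, the same as plain PSNR. The two metrics diverge only when the errors are concentrated near the poles or near the equator.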
V. CONCLUSION

In this paper, an evaluation framework is presented to compare different projection formats and their impact on 360-degree video compression. Simulations following the proposed framework are conducted using the 360Lib software tool and HM-16.12. The results show that the proposed framework can effectively reduce the bias towards the native projection format. It does so by revealing the sampling efficiency of each projection format and how that efficiency affects compression performance. Additionally, the AW-SPSNR metric is proposed for the quality evaluation of 360-degree video. Unlike SPSNR, AW-SPSNR uses all the available samples in the 2D projection plane and needs no interpolation during the calculation. It is therefore asserted to be a more suitable metric for 360-degree video quality evaluation.

REFERENCES

[1] E. Alshina, J. Boyce, A. Abbas, Y. Ye, “JVET common test conditions and evaluation procedure for 360 video,” JVET-F1030, Oct. 2016, Chengdu, China.
[2] Equirectangular projection, https://en.wikipedia.org/wiki/Equirectangular_projection/
[3] Cubemap, https://en.wikipedia.org/wiki/Cube_mapping/
[4] Lambert cylindrical equal-area projection, https://en.wikipedia.org/wiki/Lambert_cylindrical_equal-area_projection/
[5] Octahedron, https://en.wikipedia.org/wiki/Octahedron/
[6] M. Yu, H. Lakshman, B. Girod, “A framework to evaluate omnidirectional video coding schemes,” IEEE International Symposium on Mixed and Augmented Reality, 2015.
[7] Y. Sun, A. Lu, L. Yu, “WS-PSNR for 360 video quality evaluation,” MPEG document m38551, June 2016, Geneva, Switzerland.
[8] A. Abbas, B. Adsumilli, “New GoPro test sequences for virtual reality video coding,” JVET-D0026, Oct. 2016, Chengdu, China.
[9] 360Lib-1.0 reference software, https://jvet.hhi.fraunhofer.de/svn/svn_360Lib/tags/360Lib-1.0/
[10] A. Abbas, “GoPro test sequences for virtual reality video coding,” JVET-C0021, May 2016, Geneva, Switzerland.
[11] HM-16.12 reference software, https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.12/
[12] G. Bjøntegaard, “Calculation of average PSNR differences between RD-curves,” VCEG-M33, Mar. 2001.

