spatial-temporal quality metric for video has been introduced and subjectively evaluated. This metric relies on the image data of each video frame and quantifies its spatial-temporal distortion. In our case we aim at quantifying the distortion caused by playback rate variations and, thus, we focus on temporal information only.

Increasing or decreasing the media playback rate, denoted as µ, results in a perceptual distortion in audio and/or video. This distortion depends on the actual multimedia content, especially on the temporal features for which the playback rate is increased or decreased. Therefore, we propose the following metrics for measuring the distortion caused by modifying the playback rate of audio and video:

• Audio: the spectral energy of an audio frame for the c-th channel is denoted by f_{a_c}(x).

• Video: the average length of motion vectors between two consecutive frames is denoted by f_v(x).

These metrics allow us to quantify the distortion for audio and video when increasing or decreasing the playback rate for a specific content section. This is done by comparing how much of each temporal feature has been experienced by the user during the content section with and without the playback rate change. We differentiate between increasing and decreasing the playback rate when calculating our distortion metrics because our hypothesis is that increasing the playback rate may have a different impact on the QoE than decreasing it.

In order to determine our metrics we calculate the first and the last frame number of the content section for which the playback rate shall be changed. The first frame of the i-th content section for which the playback rate is changed is given by F_s(t_{s_i}, t_{e_i}) = \lfloor t_{s_i} \cdot fps_{\mu_0} \rfloor, where t_{s_i} and t_{e_i} denote the start and end of the i-th content section in seconds, and fps_{\mu_0} represents the frames per second at the nominal playback rate µ0. For determining the last frame of the i-th content section for which the playback rate has been increased or decreased we use the function \tilde{F}_e(t_{s_i}, t_{e_i}) as depicted in Equation 1.

\tilde{F}_e(t_{s_i}, t_{e_i}) = \lfloor F_s(t_{s_i}, t_{e_i}) + (t_{e_i} - t_{s_i}) \cdot fps_{\Delta\mu} \rfloor    (1)

fps_{\Delta\mu} denotes the frames per second at the changed playback rate ∆µ (∆µ = µ0 + δµ, where δµ denotes the change of the playback rate). With δµ equal to zero, \tilde{F}_e(t_{s_i}, t_{e_i}) yields the last frame of the i-th content section without any playback rate change, which is denoted by F_e(t_{s_i}, t_{e_i}). In the following, we introduce the metrics for the distortion in audio (d_{a_i}) and the distortion in video (d_{v_i}) for the i-th content section. Equation 2 denotes the distortion metric for video.

d_{v_i} = \frac{1}{\sum_{j=1}^{|F|} f_v(j)} \left( \sum_{j=F_s(t_{s_i}, t_{e_i})}^{\tilde{F}_e(t_{s_i}, t_{e_i})} f_v(j) - \sum_{j=F_s(t_{s_i}, t_{e_i})}^{F_e(t_{s_i}, t_{e_i})} f_v(j) \right)    (2)

d_{v_i} may be any value in the interval [−1, 1]. |F| denotes the overall number of frames. For audio we followed the same principle as for video, with the difference that we used the spectral energy of the audio frames for each audio channel, obtained by the Fourier transform. The Fourier transform of a single audio frame of the i-th content section is denoted by \hat{a}_{c_i} (cf. Equation 3), where c denotes the audio channel. Equation 4 defines our distortion metric for audio. Note that we take all audio channels into account.

\hat{a}_{c_i}(k) = \sum_{j=N_i}^{M_i} e^{-2\pi i \frac{jk}{M}} f_{a_c}(j)    (3)

s_i = \sum_{c=1}^{C} \left( \sum_{u=F_{s_i}}^{\tilde{F}_{e_i}} \sum_{k=0}^{S_f} |\hat{a}_{c_u}(k)| - \sum_{u=F_{s_i}}^{F_{e_i}} \sum_{k=0}^{S_f} |\hat{a}_{c_u}(k)| \right)    (4)

C denotes the number of available audio channels and S_f denotes the highest frequency. Finally, d_{a_i} denotes the distortion in audio for the i-th content section.

d_{a_i} = \frac{1}{\sum_{c=1}^{C} s_{e_c}} \, s_i    (5)

s_{e_c} denotes the overall spectral energy of channel c. Again, d_{a_i} may be any value in the interval [−1, 1]. If there are multiple content sections for which the playback rate is changed, we use the averages of the introduced metrics, denoted by d_v and d_a, respectively.

3. SUBJECTIVE QUALITY ASSESSMENT USING CROWDSOURCING

To validate our metrics and to investigate their correlation with the QoE, we conducted a subjective quality assessment using crowdsourcing, referred to in the following as the study [10]. We first describe the key aspects of the study, followed by the screening/filtering of the participants and the statistical analysis of the results.

3.1. Participants, Stimuli, Methodology, and Assessment Platform

For conducting our user study we selected the crowdsourcing platform Microworkers¹. Microworkers allows hosting so-called campaigns to which subjects (called microworkers) can subscribe. These campaigns include a detailed description of the task and ask each participant to hand in a proof in order to verify their participation. The duration of the study is approximately 15 minutes. We have found that the typical amount of money paid for a task of about 15 minutes is approximately $0.20. Therefore, we have set a slightly higher compensation of $0.25 as an extra motivation for each participant [11]. Figure 1 depicts the evaluation methodology used to conduct the study. The introduction explains the task and the test procedure. Furthermore,

¹ http://www.microworkers.com, last access: July 2014.
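Read literally, the frame-index functions (Equation 1) and the video distortion metric (Equation 2) above can be sketched as follows. This is a toy sketch: the helper names (`first_frame`, `last_frame`, `video_distortion`) and the list-based representation of f_v are ours, not from the paper.

```python
import math

def first_frame(ts, fps_nominal):
    # F_s(ts_i, te_i) = floor(ts_i * fps_mu0): first frame of the section.
    return math.floor(ts * fps_nominal)

def last_frame(ts, te, fps_nominal, fps_changed):
    # Equation 1: F~_e = floor(F_s + (te_i - ts_i) * fps_delta_mu);
    # with fps_changed == fps_nominal this yields F_e (no rate change).
    return math.floor(first_frame(ts, fps_nominal) + (te - ts) * fps_changed)

def video_distortion(fv, ts, te, fps_nominal, fps_changed):
    # Equation 2: motion experienced with vs. without the rate change,
    # normalised by the total motion of the whole sequence.
    # fv[j] is the average motion-vector length between frames j and j+1.
    start = first_frame(ts, fps_nominal)
    end_changed = last_frame(ts, te, fps_nominal, fps_changed)
    end_nominal = last_frame(ts, te, fps_nominal, fps_nominal)
    return (sum(fv[start:end_changed + 1])
            - sum(fv[start:end_nominal + 1])) / sum(fv)
```

For a constant-motion toy sequence, halving the frame rate over a one-second section removes half of that section's motion and yields a small negative d_v, while δµ = 0 yields exactly 0 by construction.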
2014 Sixth International Workshop on Quality of Multimedia Experience (QoMEX)
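The audio metric of Section 2 (Equations 3–5) admits a similar sketch. Again the function names are ours, and the plain O(M²) DFT below merely stands in for whatever FFT implementation is actually used; it is a sketch of the spectral-energy comparison, not the authors' code.

```python
import cmath

def frame_spectrum(frame):
    # Equation 3: magnitude spectrum |a^_c(k)| of one audio frame via a
    # plain DFT (bins k = 0 .. M/2, i.e. up to the highest frequency S_f).
    M = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * n * k / M)
                    for n, x in enumerate(frame)))
            for k in range(M // 2 + 1)]

def audio_distortion(channels, start, end_changed, end_nominal):
    # Equations 4-5: spectral energy experienced with vs. without the
    # playback rate change, summed over all channels and normalised by
    # the overall spectral energy of the channels.
    s_i = 0.0
    total = 0.0
    for frames in channels:  # one list of audio frames per channel
        sums = [sum(frame_spectrum(f)) for f in frames]
        s_i += (sum(sums[start:end_changed + 1])
                - sum(sums[start:end_nominal + 1]))
        total += sum(sums)
    return s_i / total
```

As with d_v, showing fewer audio frames than the nominal playback would have shown makes the metric negative, and equal frame ranges give 0.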
Fig. 2. MOS and 95% CI for (d_v, d_a, µ).

3.3. Statistical Analysis of the Results

After screening the participants and their responses, the ratings for each stimulus presentation were subjected to statistical significance tests. According to the Central Limit Theorem we assume that the ratings of the participants are normally distributed. Nevertheless, we conducted a Shapiro-Wilk test to assess whether the ratings are not normally distributed. The null hypothesis (H0), stating that no normal distribution is present, was rejected for each configuration of playback rates of the stimulus presentations. Using the metrics introduced in Section 2, the distortion caused by increasing or decreasing the playback rates for the selected content sections is expressed by d_a, the average distortion in the frequency domain over the audio channels, and d_v, the average distortion of motion for video.

Figure 2 depicts for each triple (d_v, d_a, µ) the assessed Mean Opinion Score (MOS). It can be observed that playback rates near the nominal playback rate of µ = 1 cause only a slight drop in QoE. A Student's t-test supports this finding, showing no significant difference in MOS between the reference of µ = 1 and the following playback rates: µ = 0.8 (p = 0.93, t = −0.083); µ = 1.2 (p = 0.92, t = 0.096); µ = 1.4 (p = 0.81, t = 0.42); µ = 1.6 (p = 0.22, t = 1.23); µ = 1.8 (p = 0.16, t = 1.41). These results indicate that the users could not notice a significant difference for playback rates µ ∈ [0.8, 1.8].

For the other playback rates it can be observed that the QoE degrades significantly. A Student's t-test revealed a significant difference in MOS between the nominal playback rate of µ = 1 and the media playback rates with the following values: µ = 0.5 (p = 0.00, t = 4.5217); µ = 0.6 (p = 0.002, t = 3.2); µ = 2 (p = 0.03, t = 2.19). These results provide evidence that users perceived a significant difference between the reference and the test conditions.

In the following we investigate how well our metrics correlate with the MOS for the different playback rate configurations. The Pearson correlation coefficient for d_v and the QoE ratings is ρ = 0.43. For d_a the Pearson correlation coefficient is ρ = 0.679. If we take the absolute values |d_a| and |d_v| for calculating the Pearson correlation coefficient we obtain the following values for the linear correlation between the metrics and the QoE scores: for |d_a| and the QoE ratings ρ = −0.5549, and for |d_v| and the QoE ratings ρ = −0.9565. For the metric in the audio domain, both d_a and |d_a| show a low linear correlation with the obtained MOS. For the metric in the video domain we obtained contrary results: taking d_v there is a low linear correlation between the metric and the QoE ratings, but taking |d_v| there exists a high negative linear correlation. Nevertheless, if we want to retain the ability to distinguish whether the playback rate has been decreased or increased, we have to use the signed metrics. The low correlation between the signed metrics and the QoE scores shows that assuming a linear relationship between the distortion metrics and the assessed QoE may not be appropriate. Therefore, we try to find a model that explains this correlation better than a linear model, which will be discussed in the next sections.

Finally, the results indicate that with an increase in |d_v| and |d_a| the QoE is reduced. Interestingly, the QoE does not decrease linearly. For playback rate changes in the range of [0.8, 1.8] the QoE remains high compared to the reference. Increasing or decreasing the playback rate further causes a huge drop in the QoE.

4. QOE UTILITY MODEL FOR AMP

In [5] several AMP algorithms were assessed regarding their impact on the QoE with respect to QoS parameters such as the initial playback delay, loss rate, underflow time ratio, and the playback rate, by introducing cross traffic. The actual impact on the playback of the content was not taken into account. Furthermore, there is the need for a model which can easily be combined with other QoE metrics in order to assess the QoE of a system that uses AMP. The presented results of the conducted subjective quality assessment using crowdsourcing gave us a first impression of how the QoE degrades with an increase or decrease in the playback rate when selecting content sections with a short time duration. With the knowledge that the Pearson correlation between the QoE scores and the audio/video metric is low, we try to find a function which allows us to approximate the QoE more precisely than a linear function could. Therefore, we need a function that estimates the QoE from the distortions for audio and video and returns the coefficient of degradation, i.e., ζ : R × R → [0, 1]. Therefore, let

\zeta(x, y)_\theta = e^{-\frac{1}{2} \left( \frac{x - \theta_1}{\theta_2} \right)^2} e^{-\frac{1}{2} \left( \frac{y - \theta_3}{\theta_4} \right)^2}

be the two-dimensional function of degradation with the parameter vector θ that describes the relationship between the
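The two-dimensional degradation function ζ(x, y)_θ is straightforward to evaluate. The sketch below plugs in the fitted parameter vector θ* reported in Section 4 (Equation 8) to illustrate the paper's observation that, because θ4 < θ2, an equal amount of distortion degrades the QoE more in the audio dimension than in the video dimension; the variable names are ours.

```python
import math

def zeta(x, y, theta):
    # zeta(x, y)_theta = exp(-((x-t1)/t2)^2 / 2) * exp(-((y-t3)/t4)^2 / 2),
    # where x is the video distortion d_v and y the audio distortion d_a.
    t1, t2, t3, t4 = theta
    return (math.exp(-0.5 * ((x - t1) / t2) ** 2)
            * math.exp(-0.5 * ((y - t3) / t4) ** 2))

# Parameter vector theta* fitted in Section 4 (cf. Equation 8).
theta_star = (0.0011, 0.0482, -0.0004, 0.0184)

no_distortion = zeta(0.0011, -0.0004, theta_star)        # peak value 1.0
video_only = zeta(0.0011 + 0.02, -0.0004, theta_star)    # +0.02 in d_v
audio_only = zeta(0.0011, -0.0004 + 0.02, theta_star)    # +0.02 in d_a
```

Since θ4 = 0.0184 is smaller than θ2 = 0.0482, `audio_only` is smaller than `video_only`: the same distortion magnitude costs more QoE in audio.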
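Fitting θ amounts to minimising a least-squares cost f(θ) over 3-tuples (d_v, d_a, z), where z is the assessed QoE. The paper uses Polak-Ribière conjugate gradients with multiple restarts; the sketch below, with names of our choosing, substitutes plain numeric gradient descent with a backtracking step size, which is simpler but minimises the same kind of cost function.

```python
import math

def zeta(x, y, t):
    # The two-dimensional Gaussian degradation function defined above.
    return (math.exp(-0.5 * ((x - t[0]) / t[1]) ** 2)
            * math.exp(-0.5 * ((y - t[2]) / t[3]) ** 2))

def cost(theta, samples):
    # f(theta): sum of squared errors over 3-tuples (dv, da, z).
    return sum((zeta(x, y, theta) - z) ** 2 for x, y, z in samples)

def fit(samples, theta0, lr=0.5, steps=200, h=1e-6):
    # Numeric-gradient descent with backtracking; the paper itself uses
    # Polak-Ribiere conjugate gradients with uniformly distributed restarts.
    theta = list(theta0)
    c = cost(theta, samples)
    for _ in range(steps):
        grad = []
        for i in range(len(theta)):  # central-difference gradient
            tp, tm = theta[:], theta[:]
            tp[i] += h
            tm[i] -= h
            grad.append((cost(tp, samples) - cost(tm, samples)) / (2 * h))
        step = lr
        while step > 1e-12:  # halve the step until the cost decreases
            cand = [t - step * g for t, g in zip(theta, grad)]
            cc = cost(cand, samples)
            if cc < c:
                theta, c = cand, cc
                break
            step /= 2
        else:
            break  # no improving step found: numerically at a minimum
    return theta
```

On synthetic samples drawn from a known θ, a run of `fit` from a perturbed starting point strictly reduces the cost, which is all this sketch is meant to demonstrate.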
of 3-tuples with z representing the QoE assessed for (d_a, d_v). Therefore, we try to find the parameter vector θ* that minimizes our f(θ). For determining the conjugated search directions we use the method proposed by Polak-Ribière [20]. In order to find a near-optimal vector θ we use multiple instances of the conjugate gradient algorithm with starting points uniformly distributed in the interval ]0, 1] for all θ_i. We selected the θ that provided the lowest cost in terms of our cost function f(θ).

Using the conjugate gradient method we fitted ζ(x, y)_θ to the responses received during our study discussed in Section 3 and found the following values for the parameter vector θ* = (0.0011, 0.0482, −0.0004, 0.0184)^T. Equation 8 depicts the fully instantiated utility model.

QoE(d_v, d_a)_{\theta^*} = QoE_{wo} \cdot e^{-\frac{1}{2} \left( \frac{x - 0.0011}{0.0482} \right)^2} e^{-\frac{1}{2} \left( \frac{y + 0.0004}{0.0184} \right)^2}    (8)

Figure 3 depicts QoE(d_v, d_a)_{θ*} using the fitted ζ. An interesting finding is that a distortion in audio impacts the QoE more than the same amount of distortion in video. This can be observed by comparing the second and fourth components of θ or by taking a look at Figure 3.

To test how well our utility model fits the actual data, we conducted an analysis of variance (ANOVA) on how the fitted model reflects the variability of the actual data. The ratio of the sum of squares of the model and sum of squares

5. DISCUSSION AND CONCLUSION

As mentioned in Section 3, we use only the first 51 seconds of Big Buck Bunny as multimedia content. This video sequence is presented with different configurations of content sections and playback rates for these content sections. Recency effects may occur and participants may unwittingly provide unreliable ratings. Even shuffling the stimulus presentations in a random fashion does not avoid recency effects in this case, especially when extreme conditions are presented consecutively (e.g., first with a playback rate of µ = 0.5 and then with µ = 1.4). An option is to use different video sequences. However, this may have an influence on the actual task because participants may like or dislike them, which may have an impact on the provided QoE rating. Therefore, we decided to use only a single video sequence for this subjective quality assessment. Another possibility, which would have increased the duration of the crowdsourced study, is to introduce dummy video sequences that allow the participants to forget the last real stimulus presentation. Nevertheless, further subjective quality assessments have to be conducted in order to support the findings presented in this paper (e.g., [21]).

The results of the subjective quality assessment using crowdsourcing lead us to the hypothesis that the correlation between the distortion in the video/audio domain and the QoE of playback rate variations can be described by a non-linear model. The introduced model led us to the finding that audio plays an important role when increasing or decreasing the playback rate. This is depicted in Figure 3 and denoted in Equation 8. Comparing our results to the results obtained by other subjective quality assessments which assess the QoE of playback variations for video only [6, 7, 22], we can see that altering the playback rate for the combination of audio and video has a very different impact on the QoE. Note that for video only, human perception is more tolerant of playback rate variations. This is not the case for audio and the combination of audio and video, as shown in this paper.

The contribution of this paper is twofold. First, we have shown that there is a significant difference between the impact of increasing and the impact of decreasing the playback rate on the QoE. An interesting finding is that increasing the playback rate for specific content sections has a lower impact on the QoE than decreasing the playback rate by the reciprocal of the increase. Second, we have introduced metrics that measure the distortion in audio and video caused by increasing or decreasing the playback rate. With these metrics we derive a utility model.

Future work comprises the use of the obtained utility model to determine content sections that minimize the impact of playback variations on the QoE. Considering the use case of carrying out the synchronization in IDMS, the buffer contents may be used to determine appropriate content sections for overcoming the identified asynchronism.

Acknowledgments: This work was supported in part by the EC in the context of the SocialSensor (FP7-ICT-287975) and QUALINET (COST IC 1003) projects and partly performed in the Lakeside Labs research cluster at AAU.

6. REFERENCES

[1] M. Montagud, F. Boronat, H. Stokking, and R. Brandenburg, "Inter-destination multimedia synchronization: schemes, use cases and standardization," Multimedia Systems, vol. 18, pp. 459–482, 2012.
[2] T. Hossfeld, M. Seufert, M. Hirth, T. Zinner, P. Tran-Gia, and R. Schatz, "Quantification of YouTube QoE via crowdsourcing," in IEEE ISM, 2011, pp. 494–499.
[3] M. Kalman, E. Steinbach, and B. Girod, "Adaptive media playout for low-delay video streaming over error-prone channels," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 6, pp. 841–851, 2004.
[4] M. Yuang, S. Liang, and Y. Chen, "Dynamic video playout smoothing method for multimedia applications," Multimedia Tools and Applications, vol. 6, no. 1, pp. 47–60, 1998.
[5] M. Li, "QoE-based performance evaluation for adaptive media playout systems," Advances in Multimedia, vol. 2013, p. 7, 2013.
[6] Y.-F. Ou, Y. Zhou, and Y. Wang, "Perceptual quality of video with frame rate variation: A subjective study," in IEEE ICASSP, 2010, pp. 2446–2449.
[7] Q. Huynh-Thu and M. Ghanbari, "Perceived quality of the variation of the video temporal resolution for low bit rate coding," in Picture Coding Symposium, 2007.
[8] W. Lin and C.-C. J. Kuo, "Perceptual visual quality metrics: A survey," Journal of Visual Communication and Image Representation, vol. 22, no. 4, 2011.
[9] Y. Wang, T. Jiang, S. Ma, and W. Gao, "Novel spatio-temporal structural information based video quality metric," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 7, pp. 989–998, 2012.
[10] B. Rainer and C. Timmerer, "Self-organized inter-destination multimedia synchronization for adaptive media streaming," in 22nd ACM Multimedia, 2014.
[11] M. Hirth, T. Hossfeld, and P. Tran-Gia, "Anatomy of a crowdsourcing platform - using the example of Microworkers.com," in 5th IMIS, June 2011, pp. 322–329.
[12] M. Waltl, C. Timmerer, B. Rainer, and H. Hellwagner, "Sensory effect dataset and test setups," in 4th QoMEX. IEEE, 2012, pp. 115–120.
[13] "Rec. ITU-R BT.500-11," Tech. Rep.
[14] ITU-T Recommendation P.910, "Subjective video quality assessment methods for multimedia applications," International Telecommunication Union, Geneva, Switzerland, Tech. Rep., Apr. 2008.
[15] B. Rainer, M. Waltl, and C. Timmerer, "A web based subjective evaluation platform," in 5th QoMEX. IEEE, Jul. 2013, pp. 24–25.
[16] W. Verhelst and M. Roelands, "An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech," in IEEE ICASSP, vol. 2, Apr. 1993, pp. 554–557.
[17] T. Hossfeld, C. Keimel, M. Hirth, B. Gardlo, J. Habigt, K. Diepold, and P. Tran-Gia, "Best practices for QoE crowdtesting: QoE assessment with crowdsourcing," IEEE Transactions on Multimedia, 2013.
[18] F. Pereira, "A triple user characterization model for video adaptation and quality of experience evaluation," in IEEE 7th Workshop on Multimedia Signal Processing, 2005, pp. 1–4.
[19] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, ser. Frontiers in Applied Mathematics. SIAM, 1995, no. 16.
[20] E. Polak, Computational Methods in Optimization: A Unified Approach. New York: Academic Press, 1971.
[21] B. Rainer and C. Timmerer, "A subjective evaluation using crowdsourcing of adaptive media playout utilizing audio-visual content features," in IEEE QCMAN 2014, H. Lutfiyya and P. Cholda, Eds. IEEE, May 2014.
[22] Z. Lu, W. Lin, B. C. Seng, S. Kato, E. Ong, and S. Yao, "Perceptual quality evaluation on periodic frame-dropping video," in IEEE ICIP, vol. 3, 2007, pp. 433–436.